# Deepgram API

Proxies the approved Deepgram launch surface through Cohesivity. Cohesivity injects the upstream Deepgram Token server-side; tenant requests authenticate only with the Cohesivity application key or an edge session token. Do not send a Deepgram key.

## Prerequisites

Provision this resource before use. Edge requests to an unprovisioned resource return an error.

### Provision

    curl -s -X POST https://cohesivity.ai/api/resources/deepgram-api \
      -H "Authorization: Bearer <coh_management_key>"

### Delete

    curl -s -X DELETE https://cohesivity.ai/api/resources/deepgram-api \
      -H "Authorization: Bearer <coh_management_key>"

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Official Docs

https://developers.deepgram.com/docs — read the relevant speech-to-text, text-to-speech, and Voice Agent docs before coding.

> **Server-side only.** `coh_application_key` is a secret. Call this from your `vercel-hosting` API routes, `cloudflare-workers`, or your own server tier — never from a browser, mobile app, or other client-side code. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/deepgram-api
- **Auth:** `coh_application_key` as the `key` query parameter, or a Cohesivity edge session token in `Authorization: Bearer <token>`
- **Upstream auth safety:** do not send a Deepgram API key. Cohesivity strips tenant `Authorization`, cookies, and `key` before adding `Authorization: Token <platform key>` upstream.
- **Claimed billing:** fluid-only after rate, duration, and concurrency checks. There is no fixed monthly Deepgram STT/TTS/Voice Agent bucket.
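The two auth modes above can be sketched in Python as a small URL and header builder. This is a minimal sketch, not an official client: `edge_url` and `session_headers` are hypothetical helper names, and only the base URL, the `key` query parameter, and the `Authorization: Bearer <token>` header come from this page.

```python
from urllib.parse import urlencode

EDGE_BASE = "https://cohesivity.ai/edge/deepgram-api"


def edge_url(path, app_key=None, scheme="https", **query):
    """Build an edge URL. Pass app_key for query auth; omit it when using a session token."""
    params = dict(query)
    if app_key is not None:
        # This is the Cohesivity application key, never a Deepgram key.
        params["key"] = app_key
    base = EDGE_BASE.replace("https", scheme, 1)  # use scheme="wss" for WebSocket endpoints
    suffix = "?" + urlencode(params) if params else ""
    return f"{base}{path}{suffix}"


def session_headers(session_token):
    """Headers for the alternative auth mode: an edge session token as a Bearer credential."""
    return {"Authorization": f"Bearer {session_token}"}
```

For example, `edge_url("/v1/listen", app_key=key, scheme="wss", encoding="linear16", sample_rate=16000)` yields a streaming STT URL. Because the application key is a secret, build these URLs only in server-side code.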

## Supported Endpoints

- `POST /v1/listen` — prerecorded English nova-3 speech-to-text. Omit `model` / `language` or set `model=nova-3&language=en`; other query options are blocked.
- `WS /v1/listen` — streaming English nova-3 speech-to-text. Requires `encoding=linear16`, explicit integer `sample_rate`, and mono audio (`channels=1` or omitted) so duration can be metered exactly.
- `POST /v1/speak` — Aura-2 English text-to-speech. JSON body is exactly `{ "text": "..." }`; max 2000 characters. Omitted model defaults to `aura-2-thalia-en`; other allowed models are Aura-2 English voices.
- `WS /v1/agent/converse` — Deepgram Voice Agent with Deepgram `flux-general-en` listen using `version: "v2"`, Aura-2 English speak, OpenAI think provider from the launch allowlist, and first Settings audio declared as linear16 input/output with wav output.

Everything else is rejected before upstream, including `/v2/listen`, standalone Flux outside Voice Agent, model/admin/key/token/billing endpoints, BYO provider keys, custom provider endpoints, custom headers, callbacks/webhooks, and tool/function/MCP connector surfaces.
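Since the `/v1/speak` body is constrained to a single `text` field of at most 2000 characters, a client can check it before spending a request. The sketch below is a hypothetical client-side preflight, not Cohesivity's validation logic: `build_speak_body` is an invented name, and rejecting empty text and counting characters with Python's `len` are assumptions that this page does not confirm.

```python
import json

MAX_TTS_CHARS = 2000  # documented limit for POST /v1/speak


def build_speak_body(text):
    """Return the JSON body for POST /v1/speak, or raise ValueError before any request is sent."""
    # Assumption: empty text is treated as invalid; the page does not say either way.
    if not isinstance(text, str) or not text:
        raise ValueError("text must be a non-empty string")
    # Assumption: len() (Unicode code points) approximates the server-side character count.
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TTS_CHARS}")
    # The body carries only the "text" field, as the endpoint description requires.
    return json.dumps({"text": text})
```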

## Voice Agent Policy

- First client message must be a JSON `Settings` message. Cohesivity validates it before opening the upstream Voice Agent socket.
- First Settings must include top-level `audio.input` and `audio.output` objects using `encoding: "linear16"` with explicit integer sample rates from 8000 through 48000; `audio.output.container` must be `"wav"`.
- Listen provider must include explicit `type: "deepgram"`, model `flux-general-en`, and `version: "v2"`. Omit `language` for `flux-general-en` because the model carries English selection. `nova-3`, `flux-general-multi`, and every other listen model are blocked for Voice Agent.
- Speak provider must include explicit `type: "deepgram"` and an approved Aura-2 English voice: `aura-2-amalthea-en`, `aura-2-andromeda-en`, `aura-2-apollo-en`, `aura-2-arcas-en`, `aura-2-aries-en`, `aura-2-asteria-en`, `aura-2-athena-en`, `aura-2-atlas-en`, `aura-2-aurora-en`, `aura-2-callista-en`, `aura-2-cora-en`, `aura-2-cordelia-en`, `aura-2-delia-en`, `aura-2-draco-en`, `aura-2-electra-en`, `aura-2-harmonia-en`, `aura-2-helena-en`, `aura-2-hera-en`, `aura-2-hermes-en`, `aura-2-hyperion-en`, `aura-2-iris-en`, `aura-2-janus-en`, `aura-2-juno-en`, `aura-2-jupiter-en`, `aura-2-luna-en`, `aura-2-mars-en`, `aura-2-minerva-en`, `aura-2-neptune-en`, `aura-2-odysseus-en`, `aura-2-ophelia-en`, `aura-2-orion-en`, `aura-2-orpheus-en`, `aura-2-pandora-en`, `aura-2-phoebe-en`, `aura-2-pluto-en`, `aura-2-saturn-en`, `aura-2-selene-en`, `aura-2-thalia-en`, `aura-2-theia-en`, `aura-2-vesta-en`, or `aura-2-zeus-en`.
- Think provider must include explicit `type: "open_ai"` with exactly one of `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, or `gpt-4o-mini`.
- Session duration caps (STT / Voice Agent): Ephemeral 120s / 180s, Claimed Free 600s / 600s, Claimed Plus 1800s / 1800s, and Claimed Pro 1800s / 1800s.
- Google/Gemini, Anthropic, Groq, Bedrock, custom providers, custom endpoints, custom headers, BYO credentials (including bare `key` fields), multiple think providers, callbacks, and tools/functions are blocked on initial Settings and on later UpdateSpeak/UpdateThink messages.
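The Settings rules above lend themselves to a local check before the socket is opened. The following is a rough sketch of such a check, assuming the message is already parsed into a Python dict. `check_settings` is a hypothetical name, the voice test is a loose pattern match rather than the full allowlist, and the edge remains the authority on what is accepted.

```python
ALLOWED_THINK_MODELS = {
    "gpt-5", "gpt-5-mini", "gpt-5-nano",
    "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-4o", "gpt-4o-mini",
}


def check_settings(settings):
    """Return a list of problems found in a first Settings message (empty means none found)."""
    problems = []
    if settings.get("type") != "Settings":
        problems.append("type must be 'Settings'")

    audio = settings.get("audio") or {}
    for side in ("input", "output"):
        cfg = audio.get(side)
        if not isinstance(cfg, dict):
            problems.append(f"audio.{side} must be an object")
            continue
        if cfg.get("encoding") != "linear16":
            problems.append(f"audio.{side}.encoding must be 'linear16'")
        rate = cfg.get("sample_rate")
        # An exact type check rejects floats, strings, and booleans.
        if type(rate) is not int or not 8000 <= rate <= 48000:
            problems.append(f"audio.{side}.sample_rate must be an integer from 8000 through 48000")
        if side == "output" and cfg.get("container") != "wav":
            problems.append("audio.output.container must be 'wav'")

    agent = settings.get("agent") or {}

    def provider(role):
        return (agent.get(role) or {}).get("provider") or {}

    listen = provider("listen")
    if (listen.get("type"), listen.get("model"), listen.get("version")) != ("deepgram", "flux-general-en", "v2"):
        problems.append("listen provider must be deepgram / flux-general-en / v2")
    if "language" in listen:
        problems.append("listen provider should omit language")

    speak = provider("speak")
    voice = str(speak.get("model", ""))
    # Loose pattern only; the authoritative voice list is the one in the policy above.
    if speak.get("type") != "deepgram" or not (voice.startswith("aura-2-") and voice.endswith("-en")):
        problems.append("speak provider must be deepgram with an Aura-2 English voice")

    think = provider("think")
    if think.get("type") != "open_ai" or think.get("model") not in ALLOWED_THINK_MODELS:
        problems.append("think provider must be open_ai with an allowlisted model")
    return problems
```

This sketch does not scan for blocked fields such as custom endpoints, headers, or tool definitions, so an empty result does not guarantee acceptance.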

## Examples

- Prerecorded STT: `POST https://cohesivity.ai/edge/deepgram-api/v1/listen?key=<coh_application_key>` with audio body and `Content-Type: audio/wav`
- Streaming STT: `wss://cohesivity.ai/edge/deepgram-api/v1/listen?encoding=linear16&sample_rate=16000&key=<coh_application_key>`
- TTS: `POST https://cohesivity.ai/edge/deepgram-api/v1/speak?key=<coh_application_key>` with body `{ "text": "Cohesivity gives agents speech APIs without provider keys." }`
- Voice Agent: `wss://cohesivity.ai/edge/deepgram-api/v1/agent/converse?key=<coh_application_key>` then send this first `Settings` JSON message before audio/control frames:

```json
{
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 24000 },
    "output": { "encoding": "linear16", "sample_rate": 24000, "container": "wav" }
  },
  "agent": {
    "listen": { "provider": { "type": "deepgram", "model": "flux-general-en", "version": "v2" } },
    "think": { "provider": { "type": "open_ai", "model": "gpt-5-nano" }, "prompt": "Reply briefly." },
    "speak": { "provider": { "type": "deepgram", "model": "aura-2-thalia-en" } },
    "greeting": "Hello."
  }
}
```
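The prerecorded STT example from the list above can also be expressed with Python's standard library. This sketch only constructs the request object and does not send it. `prerecorded_request` is a hypothetical helper name, and the response format is not described on this page, so consult the Deepgram docs before parsing it.

```python
from urllib.parse import urlencode
from urllib.request import Request

EDGE_BASE = "https://cohesivity.ai/edge/deepgram-api"


def prerecorded_request(wav_bytes, app_key):
    """Build (but do not send) the POST /v1/listen request for a WAV payload."""
    # Only the Cohesivity key goes in the query; model and language are left at their defaults.
    url = f"{EDGE_BASE}/v1/listen?{urlencode({'key': app_key})}"
    return Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={"Content-Type": "audio/wav"},
    )
```

A server-side process could pass the result to `urllib.request.urlopen` to perform the call.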

## Billing and Usage

- Failed upstream responses and upstream connection failures synchronously revoke the preflight reservation and do not burn quota or fluid.
- Prerecorded STT settles only when Deepgram returns a successful response with parseable duration metadata; otherwise Cohesivity revokes the reservation and returns a settlement error instead of guessing.
- Streaming STT duration is calculated from forwarded binary linear16 audio bytes and sample rate. Text/control frames are not counted as audio.
- TTS settles from the validated request text character count only after a successful upstream response.
- Voice Agent settles from accepted socket wall-clock duration, capped by the tier session limit.
- Runtime availability follows Cohesivity runtime release and promotion. This page documents the candidate behavior in code; it does not imply a candidate is stable/default-serving before promotion.
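The streaming STT duration rule above follows from the audio format: linear16 is 16-bit PCM, so mono audio uses 2 bytes per sample. The sketch below shows that arithmetic for client-side estimates. The function names are hypothetical, and any rounding Cohesivity applies when settling is not specified here.

```python
BYTES_PER_SAMPLE = 2  # linear16 = 16-bit PCM, mono


def linear16_seconds(audio_bytes, sample_rate):
    """Estimate the duration in seconds of mono linear16 audio from its byte count."""
    if type(sample_rate) is not int or sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer")
    return audio_bytes / (BYTES_PER_SAMPLE * sample_rate)


def max_bytes_for_cap(cap_seconds, sample_rate):
    """Largest mono linear16 byte count that fits within a session duration cap."""
    return cap_seconds * BYTES_PER_SAMPLE * sample_rate
```

At 16000 Hz, 32000 bytes of audio corresponds to 1 second, and a 120-second cap corresponds to 3,840,000 bytes.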

## Launch Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user. OpenAI, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks, and Deepgram has no fixed monthly usage bucket for claimed tiers.

**Ephemeral**

- STT requests: 10 per ephemeral tenant lifetime before claim or expiry
- TTS requests: 10 per ephemeral tenant lifetime before claim or expiry
- TTS characters: 5000 per ephemeral tenant lifetime before claim or expiry
- Voice Agent sessions: 3 per ephemeral tenant lifetime before claim or expiry
- concurrent STT sessions: 1 max at once
- concurrent TTS requests: 1 max at once
- concurrent Voice Agent sessions: 1 max at once
- STT requests: 2 per minute
- TTS requests: 2 per minute
- Voice Agent sessions: 1 per minute

**Claimed Free**

- concurrent STT sessions: 2 max at once
- concurrent TTS requests: 2 max at once
- concurrent Voice Agent sessions: 1 max at once
- STT requests: 10 per minute
- TTS requests: 5 per minute
- Voice Agent sessions: 1 per minute

**Claimed Plus**

- concurrent STT sessions: 10 max at once
- concurrent TTS requests: 5 max at once
- concurrent Voice Agent sessions: 5 max at once
- STT requests: 60 per minute
- TTS requests: 20 per minute
- Voice Agent sessions: 5 per minute

**Claimed Pro**

- concurrent STT sessions: 25 max at once
- concurrent TTS requests: 10 max at once
- concurrent Voice Agent sessions: 10 max at once
- STT requests: 150 per minute
- TTS requests: 60 per minute
- Voice Agent sessions: 10 per minute
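One way to stay under the per-minute limits listed above is to space requests evenly on the client. The sketch below encodes those limits as data and derives a minimum gap between requests. The tier and kind keys are hypothetical labels, and even spacing is a conservative assumption, since this page does not say how the per-minute window is measured.

```python
# Per-minute request limits copied from the tier lists above.
PER_MINUTE = {
    "ephemeral": {"stt": 2, "tts": 2, "voice_agent": 1},
    "free": {"stt": 10, "tts": 5, "voice_agent": 1},
    "plus": {"stt": 60, "tts": 20, "voice_agent": 5},
    "pro": {"stt": 150, "tts": 60, "voice_agent": 10},
}


def min_spacing_seconds(tier, kind):
    """Seconds to wait between requests so that even pacing never exceeds the per-minute limit."""
    return 60.0 / PER_MINUTE[tier][kind]
```

For instance, `min_spacing_seconds("free", "tts")` gives 12 seconds between TTS requests on Claimed Free. Concurrency and lifetime caps still apply separately.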

### Notes

- Deepgram is fluid-only for claimed accounts after strict endpoint, rate, duration, and concurrency checks. Ephemeral tenants get small STT/TTS/Voice Agent lifetime caps during the 72-hour claim window.
- Cohesivity exposes only English nova-3 standalone STT, Aura-2 English TTS, and Voice Agent, over `POST /v1/listen`, `WS /v1/listen`, `POST /v1/speak`, and `WS /v1/agent/converse`. Voice Agent listen is restricted to `flux-general-en` with provider version `v2`, and first Settings must use linear16 input/output audio with wav output. Standalone Flux and `/v2/listen` remain blocked. Admin/model/key/billing surfaces, BYO credentials, custom providers, callbacks, and tool/function endpoints are blocked.