# Deepgram API

Proxies the approved Deepgram launch surface through Cohesivity. Cohesivity injects the upstream Deepgram Token server-side; tenant requests authenticate only with the Cohesivity application key or an edge session token. Do not send a Deepgram key.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

    curl -s -X POST https://cohesivity.ai/api/resources/deepgram-api \
      -H "Authorization: Bearer "

### Delete

    curl -s -X DELETE https://cohesivity.ai/api/resources/deepgram-api \
      -H "Authorization: Bearer "

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Official Docs

Read the relevant speech-to-text, text-to-speech, and Voice Agent docs at https://developers.deepgram.com/docs before coding.

> **Server-side only.** `coh_application_key` is a secret. Call this from your `vercel-hosting` API routes, `cloudflare-workers`, or your own server tier, never from a browser, mobile app, or other client-side code. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/deepgram-api
- **Auth:** `coh_application_key` as the **key** query parameter, or a Cohesivity edge session token in `Authorization: Bearer `
- **Upstream auth safety:** do not send a Deepgram API key. Cohesivity strips tenant `Authorization`, cookies, and `key` before adding `Authorization: Token ` upstream.
- **Claimed billing:** fluid-only after rate, duration, and concurrency checks. There is no fixed monthly Deepgram STT/TTS/Voice Agent bucket.

## Supported Endpoints

- `POST /v1/listen`: prerecorded English nova-3 speech-to-text. Omit `model` / `language` or set `model=nova-3&language=en`; other query options are blocked.
- `WS /v1/listen`: streaming English nova-3 speech-to-text.
  Requires `encoding=linear16`, explicit integer `sample_rate`, and mono audio (`channels=1` or omitted) so duration can be metered exactly.
- `POST /v1/speak`: Aura-2 English text-to-speech. JSON body is exactly `{ "text": "..." }`; max 2000 characters. Omitted model defaults to `aura-2-thalia-en`; other allowed models are Aura-2 English voices.
- `WS /v1/agent/converse`: Deepgram Voice Agent with Deepgram `flux-general-en` listen using `version: "v2"`, Aura-2 English speak, OpenAI think provider from the launch allowlist, and first Settings audio declared as linear16 input/output with wav output.

Everything else is rejected before upstream, including `/v2/listen`, standalone Flux outside Voice Agent, model/admin/key/token/billing endpoints, BYO provider keys, custom provider endpoints, custom headers, callbacks/webhooks, and tool/function/MCP connector surfaces.

## Voice Agent Policy

- First client message must be a JSON `Settings` message. Cohesivity validates it before opening the upstream Voice Agent socket.
- First Settings must include top-level `audio.input` and `audio.output` objects using `encoding: "linear16"` with explicit integer sample rates from 8000 through 48000; `audio.output.container` must be `"wav"`.
- Listen provider must include explicit `type: "deepgram"`, model `flux-general-en`, and `version: "v2"`. Omit `language` for `flux-general-en` because the model carries English selection. `nova-3`, `flux-general-multi`, and every other listen model are blocked for Voice Agent.
- Speak provider must include explicit `type: "deepgram"` and an approved Aura-2 English voice: `aura-2-amalthea-en`, `aura-2-andromeda-en`, `aura-2-apollo-en`, `aura-2-arcas-en`, `aura-2-aries-en`, `aura-2-asteria-en`, `aura-2-athena-en`, `aura-2-atlas-en`, `aura-2-aurora-en`, `aura-2-callista-en`, `aura-2-cora-en`, `aura-2-cordelia-en`, `aura-2-delia-en`, `aura-2-draco-en`, `aura-2-electra-en`, `aura-2-harmonia-en`, `aura-2-helena-en`, `aura-2-hera-en`, `aura-2-hermes-en`, `aura-2-hyperion-en`, `aura-2-iris-en`, `aura-2-janus-en`, `aura-2-juno-en`, `aura-2-jupiter-en`, `aura-2-luna-en`, `aura-2-mars-en`, `aura-2-minerva-en`, `aura-2-neptune-en`, `aura-2-odysseus-en`, `aura-2-ophelia-en`, `aura-2-orion-en`, `aura-2-orpheus-en`, `aura-2-pandora-en`, `aura-2-phoebe-en`, `aura-2-pluto-en`, `aura-2-saturn-en`, `aura-2-selene-en`, `aura-2-thalia-en`, `aura-2-theia-en`, `aura-2-vesta-en`, or `aura-2-zeus-en`.
- Think provider must include explicit `type: "open_ai"` with exactly one of `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, or `gpt-4o-mini`.
- Session duration caps (STT / Voice Agent): Ephemeral 120s / 180s, Claimed Free 600s / 600s, Claimed Plus 1800s / 1800s, and Claimed Pro 1800s / 1800s.
- Google/Gemini, Anthropic, Groq, Bedrock, custom providers, custom endpoints, custom headers, BYO credentials (including bare `key` fields), multiple think providers, callbacks, and tools/functions are blocked on initial Settings and on later UpdateSpeak/UpdateThink messages.

## Examples

- Prerecorded STT: `POST https://cohesivity.ai/edge/deepgram-api/v1/listen?key=` with audio body and `Content-Type: audio/wav`
- Streaming STT: `wss://cohesivity.ai/edge/deepgram-api/v1/listen?encoding=linear16&sample_rate=16000&key=`
- TTS: `POST https://cohesivity.ai/edge/deepgram-api/v1/speak?key=` with body `{ "text": "Cohesivity gives agents speech APIs without provider keys." }`
- Voice Agent: `wss://cohesivity.ai/edge/deepgram-api/v1/agent/converse?key=` then send this first `Settings` JSON message before audio/control frames:

```json
{
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 24000 },
    "output": { "encoding": "linear16", "sample_rate": 24000, "container": "wav" }
  },
  "agent": {
    "listen": { "provider": { "type": "deepgram", "model": "flux-general-en", "version": "v2" } },
    "think": {
      "provider": { "type": "open_ai", "model": "gpt-5-nano" },
      "prompt": "Reply briefly."
    },
    "speak": { "provider": { "type": "deepgram", "model": "aura-2-thalia-en" } },
    "greeting": "Hello."
  }
}
```

## Billing and Usage

- Failed upstream responses and upstream connection failures synchronously revoke the preflight reservation and do not burn quota or fluid.
- Prerecorded STT settles only when Deepgram returns a successful response with parseable duration metadata; otherwise Cohesivity revokes the reservation and returns a settlement error instead of guessing.
- Streaming STT duration is calculated from forwarded binary linear16 audio bytes and sample rate. Text/control frames are not counted as audio.
- TTS settles from the validated request text character count only after a successful upstream response.
- Voice Agent settles from accepted socket wall-clock duration, capped by the tier session limit.
- Runtime availability follows Cohesivity runtime release and promotion. This page documents the candidate behavior in code; it does not imply a candidate is stable/default-serving before promotion.

## Launch Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user; OpenAI, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks; Deepgram has no fixed monthly usage bucket for claimed tiers.


**Ephemeral**

- stt requests: 10 per ephemeral tenant lifetime before claim or expiry
- tts requests: 10 per ephemeral tenant lifetime before claim or expiry
- tts characters: 5000 per ephemeral tenant lifetime before claim or expiry
- voice agent sessions: 3 per ephemeral tenant lifetime before claim or expiry
- concurrent stt sessions: 1 max at once
- concurrent tts requests: 1 max at once
- concurrent voice agent sessions: 1 max at once
- stt requests: 2 per minute
- tts requests: 2 per minute
- voice agent sessions: 1 per minute

**Claimed Free**

- concurrent stt sessions: 2 max at once
- concurrent tts requests: 2 max at once
- concurrent voice agent sessions: 1 max at once
- stt requests: 10 per minute
- tts requests: 5 per minute
- voice agent sessions: 1 per minute

**Claimed Plus**

- concurrent stt sessions: 10 max at once
- concurrent tts requests: 5 max at once
- concurrent voice agent sessions: 5 max at once
- stt requests: 60 per minute
- tts requests: 20 per minute
- voice agent sessions: 5 per minute

**Claimed Pro**

- concurrent stt sessions: 25 max at once
- concurrent tts requests: 10 max at once
- concurrent voice agent sessions: 10 max at once
- stt requests: 150 per minute
- tts requests: 60 per minute
- voice agent sessions: 10 per minute

### Notes

- Deepgram is fluid-only for claimed accounts after strict endpoint, rate, duration, and concurrency checks. Ephemeral tenants get small STT/TTS/Voice Agent lifetime caps during the 72-hour claim window.
- Cohesivity exposes only English nova-3 standalone STT, Aura-2 English TTS, and Voice Agent over `POST`/`WS /v1/listen`, `POST /v1/speak`, and `WS /v1/agent/converse`. Voice Agent listen is restricted to `flux-general-en` with provider version `v2`, and first Settings must use linear16 input/output audio with wav output. Standalone Flux and `/v2/listen` remain blocked. Admin/model/key/billing surfaces, BYO credentials, custom providers, callbacks, and tool/function endpoints are blocked.
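
Because Cohesivity rejects a non-conforming first `Settings` message before it opens the upstream socket, it may help to run a local pre-check in server-side code first. The following is a minimal sketch, not Cohesivity's validator: `check_settings`, `example_settings`, and `_obj` are hypothetical names, the constants are copied from the Voice Agent Policy on this page, and the speak-voice test is a coarse `aura-2-*-en` pattern match rather than the full approved list. The edge remains the authoritative check.

```python
# Hypothetical client-side pre-check; the authoritative validation runs on the edge.
ALLOWED_THINK_MODELS = {
    "gpt-5", "gpt-5-mini", "gpt-5-nano",
    "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-4o", "gpt-4o-mini",
}


def _obj(value):
    """Return value if it is a dict, else an empty dict, so lookups never raise."""
    return value if isinstance(value, dict) else {}


def example_settings():
    """The first Settings message from the Examples section, as a fresh dict."""
    return {
        "type": "Settings",
        "audio": {
            "input": {"encoding": "linear16", "sample_rate": 24000},
            "output": {"encoding": "linear16", "sample_rate": 24000, "container": "wav"},
        },
        "agent": {
            "listen": {"provider": {"type": "deepgram", "model": "flux-general-en", "version": "v2"}},
            "think": {"provider": {"type": "open_ai", "model": "gpt-5-nano"}, "prompt": "Reply briefly."},
            "speak": {"provider": {"type": "deepgram", "model": "aura-2-thalia-en"}},
            "greeting": "Hello.",
        },
    }


def check_settings(settings):
    """Return a list of human-readable policy problems (empty if none were found)."""
    problems = []
    settings = _obj(settings)
    if settings.get("type") != "Settings":
        problems.append("type must be 'Settings'")

    audio = _obj(settings.get("audio"))
    for side in ("input", "output"):
        cfg = _obj(audio.get(side))
        if cfg.get("encoding") != "linear16":
            problems.append(f"audio.{side}.encoding must be 'linear16'")
        rate = cfg.get("sample_rate")
        # type(...) is int rejects floats and booleans, matching "explicit integer".
        if type(rate) is not int or not 8000 <= rate <= 48000:
            problems.append(f"audio.{side}.sample_rate must be an integer from 8000 through 48000")
    if _obj(audio.get("output")).get("container") != "wav":
        problems.append("audio.output.container must be 'wav'")

    agent = _obj(settings.get("agent"))
    listen = _obj(_obj(agent.get("listen")).get("provider"))
    if (listen.get("type"), listen.get("model"), listen.get("version")) != (
        "deepgram", "flux-general-en", "v2",
    ):
        problems.append("listen provider must be deepgram / flux-general-en / v2")

    speak = _obj(_obj(agent.get("speak")).get("provider"))
    voice = str(speak.get("model", ""))
    # Assumption: pattern match only; the approved voices are enumerated in the policy.
    if speak.get("type") != "deepgram" or not (voice.startswith("aura-2-") and voice.endswith("-en")):
        problems.append("speak provider must be deepgram with an Aura-2 English voice")

    think = _obj(_obj(agent.get("think")).get("provider"))
    if think.get("type") != "open_ai" or think.get("model") not in ALLOWED_THINK_MODELS:
        problems.append("think provider must be open_ai with an allowlisted model")
    return problems


if __name__ == "__main__":
    print(check_settings(example_settings()))  # prints []
```

An empty list means the message passed these local checks only. Blocked fields such as custom endpoints, headers, callbacks, and tools are not inspected here, so the edge may still reject a message that passes.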