# AI Gateway

Cohesivity's own AI Gateway gives agents one Cohesivity endpoint for approved OpenAI, Anthropic, Google, and xAI models. Tenant requests authenticate only with a Cohesivity application key or edge session token; Cohesivity handles provider routing, credential isolation, rate enforcement, usage logs, and wallet settlement.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

    curl -s -X POST https://cohesivity.ai/api/resources/ai-gateway \
      -H "Authorization: Bearer "

### Delete

    curl -s -X DELETE https://cohesivity.ai/api/resources/ai-gateway \
      -H "Authorization: Bearer "

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Provider Model Docs

Read provider model docs for model behavior, prompting, context windows, and image options when needed, but code only against Cohesivity's documented routes and model aliases below. Do not call provider URLs directly from tenant code.

> **Server-side only.** `coh_application_key` is a secret. Call this from your `vercel-hosting` API routes, `cloudflare-workers`, or your own server tier; never from a browser, mobile app, or other client-side code. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/ai-gateway
- **Auth:** `coh_application_key` as the **key** query parameter, or a Cohesivity edge session token in `Authorization: Bearer `
- **Format:** OpenAI-compatible JSON request and response shapes for the supported endpoints below. Provider-specific responses may be normalized where Cohesivity needs consistent embeddings, image, usage, and billing data.
- **Credential safety:** do not send provider keys.
  Cohesivity removes tenant `Authorization`, cookies, `key`, OpenAI headers, `x-api-key`, Google key headers, Anthropic headers, xAI auth headers, and gateway auth headers before applying Cohesivity-managed provider credentials.

## Supported Endpoints

- `POST /v1/chat/completions`: chat completions for allowlisted openai, anthropic, google, and xai text models. Requests must set `max_tokens` or `max_completion_tokens`.
- `POST /v1/embeddings`: embeddings for allowlisted openai and google embedding models.
- `POST /v1/images/generations`: image generation for allowlisted openai, google, and xai image models after claim.

Every other endpoint or method is rejected before model execution: model listing, `/responses`, `/messages`, `/ai/run`, audio, files, batches, realtime, edits, variations, vector stores, assistants, provider-native passthrough, arbitrary paths, and hidden-cost hosted features are blocked for launch.

## Launch Model Access

- **Tenant model aliases:** every request model must use the company prefix `openai/`, `anthropic/`, `google/`, or `xai/`. Bare model names and provider-native route slugs are blocked on the tenant surface.
- **Ephemeral:** request-capped, no wallet billing. Chat models: `openai/gpt-5-nano`, `anthropic/claude-haiku-4.5`, `google/gemini-3.1-flash-lite`, and `xai/grok-4.3`. Embeddings: `openai/text-embedding-3-small`, `openai/text-embedding-3-large`, and `google/gemini-embedding-2-preview`. Images are not available before claim. Lifetime caps are 100/25/100/25/100/25/25 respectively, with 5/2/5/2/5/2/2 per-minute bursts.
- **Claimed Free:** wallet-fluid with per-minute bursts. Chat: `openai/gpt-5-nano`, `openai/gpt-5.4-nano`, `openai/gpt-5.4-mini`, `openai/gpt-5.4`, `anthropic/claude-haiku-4.5`, `anthropic/claude-sonnet-4.6`, `google/gemini-3.1-flash-lite`, `google/gemini-3.5-flash`, and `xai/grok-4.3`.
  Embeddings: `openai/text-embedding-3-small`, `openai/text-embedding-3-large`, `google/gemini-embedding-2-preview`, and `google/gemini-embedding-2`. Images: `openai/gpt-image-2`, `google/gemini-3.1-flash-image-preview`, `google/nano-banana-2`, and `xai/grok-imagine-image`.
- **Claimed Plus / Pro:** wallet-fluid with higher per-minute bursts and the full launch allowlist. Adds `openai/gpt-5.5`, `anthropic/claude-opus-4.7`, `google/gemini-3.1-pro-preview`, `google/gemini-3-pro-image-preview`, `google/nano-banana-pro`, and `xai/grok-imagine-image-quality`.
- Claimed AI Gateway has no lifetime or monthly request bucket; wallet balance plus the per-account UTC-minute burst policy are the governing limits.

## Common Mistakes

- **Using provider-native endpoints or model slugs.** This offering supports only the three endpoint shapes above under `/edge/ai-gateway`, and model names must use Cohesivity's company-prefixed aliases.
- **Omitting bounded output on chat.** Set `max_tokens` or `max_completion_tokens`; Cohesivity rejects unbounded text output before provider execution so wallet guard capacity is meaningful.
- **Sending provider keys.** Tenant-supplied provider auth headers are stripped. Store no provider secrets in tenant apps.
- **Streaming embeddings or images.** Streaming is enabled only for allowlisted chat-completions models with terminal usage; embeddings, images, and partial image delivery must be non-streaming for launch.
- **Sizing xai images.** `xai/grok-imagine-image` and `xai/grok-imagine-image-quality` use their documented default image shape at launch; omit `size` and `resolution`.
- **Requesting hidden-cost features.** Audio, files, hosted web/file search, code interpreter, computer use, batches, realtime, assistants, vector stores, edits, variations, and provider passthrough fields are blocked.
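Several of the mistakes listed under Common Mistakes can be caught before a request leaves your server. The following is a minimal sketch of such a pre-check in Python, not part of any Cohesivity SDK: `check_request`, `ALLOWED_PATHS`, and `ALIAS_PREFIXES` are hypothetical names introduced here, and the rules are only the ones stated in this document. The gateway remains the authority on what is accepted.

```python
# Hypothetical client-side pre-check; names are illustrative, not a Cohesivity API.
ALLOWED_PATHS = {
    "/v1/chat/completions",
    "/v1/embeddings",
    "/v1/images/generations",
}
ALIAS_PREFIXES = ("openai/", "anthropic/", "google/", "xai/")


def check_request(path: str, body: dict) -> list[str]:
    """Return a list of problems; an empty list means no documented rule was hit."""
    problems = []
    if path not in ALLOWED_PATHS:
        problems.append(f"unsupported endpoint: {path}")

    model = body.get("model", "")
    if not isinstance(model, str) or not model.startswith(ALIAS_PREFIXES):
        problems.append("model must use a company-prefixed alias")

    if path == "/v1/chat/completions":
        # Chat output must be bounded by one of the two token limits.
        if "max_tokens" not in body and "max_completion_tokens" not in body:
            problems.append("chat requests must set max_tokens or max_completion_tokens")
    elif body.get("stream"):
        # Embeddings and images are non-streaming at launch.
        problems.append("streaming is only available for chat completions")

    if (
        path == "/v1/images/generations"
        and isinstance(model, str)
        and model.startswith("xai/")
    ):
        # xai image models use their default shape; size fields should be omitted.
        for field in ("size", "resolution"):
            if field in body:
                problems.append(f"omit {field} for xai image models")
    return problems
```

Calling `check_request("/v1/chat/completions", body)` before sending lets the application fail fast with a readable message instead of waiting for a gateway rejection. The check does not know which models a given tier may use, so a request that passes here can still be refused for tier or rate reasons.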
## Examples

- openai chat: `POST https://cohesivity.ai/edge/ai-gateway/v1/chat/completions?key=` with body `{ "model": "openai/gpt-5-nano", "messages": [{"role":"user","content":"Write one haiku about infrastructure."}], "max_completion_tokens": 80 }`
- streaming chat: same endpoint with any allowlisted chat model, for example `{ "model": "openai/gpt-5-nano", "messages": [{"role":"user","content":"Stream a short answer."}], "max_completion_tokens": 80, "stream": true }`. Cohesivity ensures terminal usage is included before finalizing billing.
- anthropic chat: `POST https://cohesivity.ai/edge/ai-gateway/v1/chat/completions?key=` with body `{ "model": "anthropic/claude-haiku-4.5", "messages": [{"role":"user","content":"Summarize this release note."}], "max_tokens": 120 }`
- google embeddings: `POST https://cohesivity.ai/edge/ai-gateway/v1/embeddings?key=` with body `{ "model": "google/gemini-embedding-2", "input": "Cohesivity gives agents infrastructure." }`
- openai image after claim: `POST https://cohesivity.ai/edge/ai-gateway/v1/images/generations?key=` with body `{ "model": "openai/gpt-image-2", "prompt": "A polished product dashboard for usage analytics", "size": "1024x1024", "quality": "medium", "n": 1 }`
- google image after claim: same image endpoint with body `{ "model": "google/nano-banana-2", "prompt": "A polished product dashboard for usage analytics", "n": 1 }`
- xai image after claim: same image endpoint with body `{ "model": "xai/grok-imagine-image", "prompt": "A polished product dashboard for usage analytics", "n": 1 }`

## Streaming

Allowlisted chat-completions models across openai, anthropic, google, and xai may stream at launch. Cohesivity forwards OpenAI-compatible SSE chunks to tenants and watches for terminal provider usage; if the stream ends, errors, or is canceled before usage arrives, the preflight reservation is revoked and no wallet debit is finalized. Embeddings, images, and partial image delivery stay non-streaming.
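The chat examples and the streaming behavior described above can be sketched server-side in Python using only the standard library. This is an illustration under assumptions, not an official client: `build_chat_request` and `read_stream` are hypothetical helper names, the chunk layout (`choices[].delta.content` plus a final `usage` object) is the usual OpenAI-compatible streaming shape, and the `data: [DONE]` end marker is assumed rather than confirmed by this document.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://cohesivity.ai/edge/ai-gateway"


def build_chat_request(key, model, prompt, max_completion_tokens, stream=False):
    """Build (but do not send) a chat-completions request with the key as a query parameter."""
    url = f"{BASE_URL}/v1/chat/completions?" + urllib.parse.urlencode({"key": key})
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,  # bounded output is required
    }
    if stream:
        body["stream"] = True
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_stream(lines):
    """Collect text deltas and the terminal usage object from SSE lines.

    Accepts any iterable of str or bytes lines. Returns (text, usage), where
    usage is None if the stream ended before a usage object arrived.
    """
    text, usage = [], None
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                text.append(piece)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text), usage
```

A server route could send the request with `urllib.request.urlopen(req)` and pass the response object, which iterates line by line, to `read_stream`. A `usage` of `None` corresponds to the case described above where the stream ended before terminal usage arrived. Keep the key on the server: the request object embeds it in the URL.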
## Billing and Usage

- Claimed AI Gateway usage is fluid-only after model-tier and rate checks. There is no fixed monthly AI Gateway request or token bucket.
- Cohesivity records request counters, per-model burst counters, input/output token counters, cached token counters, image token counters, and recent-event metadata when providers return billable usage. Prompts, images, and raw request bodies are not stored in usage events.
- Wallet debit is finalized only from parseable successful provider usage or cost. A successful billable response without parseable usage/cost returns a Cohesivity settlement error after revoking the preflight reservation instead of guessing a charge.
- Failed provider responses do not burn quota or fluid; Cohesivity revokes the preflight counters synchronously before returning the provider failure.

## Launch Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user; OpenAI, AI Gateway, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks; AI Gateway and Deepgram have no fixed monthly usage bucket for claimed tiers.
**Ephemeral**

- openai/gpt-5-nano requests: 100 per ephemeral tenant lifetime before claim or expiry
- anthropic/claude-haiku-4.5 requests: 25 per ephemeral tenant lifetime before claim or expiry
- google/gemini-3.1-flash-lite requests: 100 per ephemeral tenant lifetime before claim or expiry
- xai/grok-4.3 requests: 25 per ephemeral tenant lifetime before claim or expiry
- openai/text-embedding-3-small requests: 100 per ephemeral tenant lifetime before claim or expiry
- openai/text-embedding-3-large requests: 25 per ephemeral tenant lifetime before claim or expiry
- google/gemini-embedding requests: 25 per ephemeral tenant lifetime before claim or expiry
- openai/gpt-5-nano requests: 5 per minute
- anthropic/claude-haiku-4.5 requests: 2 per minute
- google/gemini-3.1-flash-lite requests: 5 per minute
- xai/grok-4.3 requests: 2 per minute
- openai/text-embedding-3-small requests: 5 per minute
- openai/text-embedding-3-large requests: 2 per minute
- google/gemini-embedding requests: 2 per minute

**Claimed Free**

- openai/text-embedding-3-small requests: 60 per minute
- openai/text-embedding-3-large requests: 20 per minute
- openai/gpt-5-nano requests: 30 per minute
- openai/gpt-5.4-nano requests: 30 per minute
- openai/gpt-5.4-mini requests: 15 per minute
- openai/gpt-5.4 requests: 5 per minute
- openai/gpt-image-2 requests: 5 per minute
- anthropic/claude-haiku-4.5 requests: 15 per minute
- anthropic/claude-sonnet-4.6 requests: 5 per minute
- google/gemini-embedding requests: 20 per minute
- google/gemini-3.1-flash-lite requests: 30 per minute
- google/gemini-3.5-flash requests: 10 per minute
- google/gemini-flash-image requests: 5 per minute
- xai/grok-4.3 requests: 15 per minute
- xai/grok-imagine-image requests: 5 per minute

**Claimed Plus**

- openai/text-embedding-3-small requests: 300 per minute
- openai/text-embedding-3-large requests: 100 per minute
- openai/gpt-5-nano requests: 100 per minute
- openai/gpt-5.4-nano requests: 100 per minute
- openai/gpt-5.4-mini requests: 60 per minute
- openai/gpt-5.4 requests: 20 per minute
- openai/gpt-image-2 requests: 20 per minute
- openai/gpt-5.5 requests: 10 per minute
- anthropic/claude-haiku-4.5 requests: 60 per minute
- anthropic/claude-sonnet-4.6 requests: 20 per minute
- anthropic/claude-opus-4.7 requests: 10 per minute
- google/gemini-embedding requests: 100 per minute
- google/gemini-3.1-flash-lite requests: 100 per minute
- google/gemini-3.5-flash requests: 30 per minute
- google/gemini-flash-image requests: 20 per minute
- google/gemini-3.1-pro-preview requests: 20 per minute
- google/gemini-pro-image requests: 10 per minute
- xai/grok-4.3 requests: 60 per minute
- xai/grok-imagine-image requests: 20 per minute
- xai/grok-imagine-image-quality requests: 10 per minute

**Claimed Pro**

- openai/text-embedding-3-small requests: 1000 per minute
- openai/text-embedding-3-large requests: 300 per minute
- openai/gpt-5-nano requests: 300 per minute
- openai/gpt-5.4-nano requests: 300 per minute
- openai/gpt-5.4-mini requests: 200 per minute
- openai/gpt-5.4 requests: 60 per minute
- openai/gpt-image-2 requests: 60 per minute
- openai/gpt-5.5 requests: 40 per minute
- anthropic/claude-haiku-4.5 requests: 200 per minute
- anthropic/claude-sonnet-4.6 requests: 60 per minute
- anthropic/claude-opus-4.7 requests: 40 per minute
- google/gemini-embedding requests: 300 per minute
- google/gemini-3.1-flash-lite requests: 300 per minute
- google/gemini-3.5-flash requests: 100 per minute
- google/gemini-flash-image requests: 60 per minute
- google/gemini-3.1-pro-preview requests: 60 per minute
- google/gemini-pro-image requests: 30 per minute
- xai/grok-4.3 requests: 200 per minute
- xai/grok-imagine-image requests: 60 per minute
- xai/grok-imagine-image-quality requests: 30 per minute

### Notes

- AI Gateway is fluid-only for claimed accounts after model-tier and per-minute checks.
  Ephemeral tenants get only the listed openai, anthropic, google, and xai starter models with lifetime and burst caps.
- Cohesivity exposes only `POST /v1/chat/completions`, `POST /v1/embeddings`, and `POST /v1/images/generations` through Cohesivity AI Gateway. Allowlisted OpenAI, Anthropic, Google, and xAI chat models may stream when terminal usage is returned; model listing, provider-native passthrough, tenant-supplied provider keys, audio, files, batches, realtime, vector stores, assistants, edits, variations, embeddings streaming, and image streaming are blocked before model execution.
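An application that wants to stay under the per-minute bursts listed under Launch Rate Limits can keep a local counter and decline to send once a bucket is used up. The sketch below is one possible approach, not Cohesivity's enforcement logic: `MinuteBudget` is a hypothetical class, the limits are a subset copied from the Claimed Free list, and it assumes the burst window resets on UTC minute boundaries. Claimed buckets are account-scoped and shared across projects, so a per-process counter can only undercount real usage.

```python
import time

# Subset of the Claimed Free per-minute bursts from the list above.
CLAIMED_FREE_PER_MINUTE = {
    "openai/gpt-5-nano": 30,
    "openai/gpt-5.4": 5,
    "anthropic/claude-haiku-4.5": 15,
    "xai/grok-4.3": 15,
}


class MinuteBudget:
    """Local fixed-window counter keyed by bucket name, reset each UTC minute."""

    def __init__(self, limits, clock=time.time):
        self.limits = limits
        self.clock = clock  # injectable for tests; returns Unix seconds
        self.window = None
        self.counts = {}

    def try_acquire(self, bucket):
        """Return True and count the request if the bucket has budget left this minute."""
        # Unix time is UTC-based, so integer division by 60 gives the UTC minute index.
        minute = int(self.clock() // 60)
        if minute != self.window:
            self.window, self.counts = minute, {}
        limit = self.limits.get(bucket)
        if limit is None:
            return False  # unknown bucket: refuse locally rather than guess
        used = self.counts.get(bucket, 0)
        if used >= limit:
            return False
        self.counts[bucket] = used + 1
        return True
```

A caller would construct `MinuteBudget(CLAIMED_FREE_PER_MINUTE)` once and call `try_acquire(model)` before each request, queueing or delaying the request when it returns `False`. The gateway's own counters remain authoritative, so the application should still handle a rate rejection from the gateway.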