# AI Gateway

Cohesivity's AI Gateway gives agents a single Cohesivity endpoint for approved OpenAI, Anthropic, Google, and xAI models. Tenant requests authenticate only with a Cohesivity application key or an edge session token; Cohesivity handles provider routing, credential isolation, rate enforcement, usage logging, and wallet settlement.

## Prerequisites

Edge calls require this resource to be provisioned first; unprovisioned calls return an error.

### Provision

    curl -s -X POST https://cohesivity.ai/api/resources/ai-gateway \
      -H "Authorization: Bearer <coh_management_key>"

### Delete

    curl -s -X DELETE https://cohesivity.ai/api/resources/ai-gateway \
      -H "Authorization: Bearer <coh_management_key>"

Provisioning happens once, before the application runs; the running application does not provision its own resources.
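For a setup script written in Python rather than shell, the two curl calls above can be mirrored with the standard library. This is a minimal sketch, not an official client: `resource_request` and `send` are hypothetical helper names, and only the URL, the two methods, and the Bearer header come from the curl examples.

```python
import urllib.request

# Management URL copied from the curl examples above.
RESOURCE_URL = "https://cohesivity.ai/api/resources/ai-gateway"


def resource_request(method: str, management_key: str) -> urllib.request.Request:
    """Build (but do not send) a provision (POST) or delete (DELETE) request."""
    if method not in ("POST", "DELETE"):
        raise ValueError("only POST (provision) and DELETE (delete) are documented")
    return urllib.request.Request(
        RESOURCE_URL,
        method=method,
        headers={"Authorization": f"Bearer {management_key}"},
    )


def send(req: urllib.request.Request) -> int:
    """Send once from a setup step (not from the running app); return the HTTP status."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Run `send(resource_request("POST", "<coh_management_key>"))` once before deploying; `urlopen` raises `urllib.error.HTTPError` if the management API returns an error status.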

## Provider Model Docs

Refer to each provider's model docs for model behavior, prompting, context windows, and image options. The supported interface, however, is defined only by the Cohesivity routes and model aliases documented below; provider URLs are not part of it.

> **Server-side only.** `coh_application_key` is a secret; it cannot be held safely in browser JS, mobile bundles, or any other client-side code. Make calls that use it from a Railway-hosted server, `cloudflare-workers`, or your own server tier. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/ai-gateway
- **Auth:** `coh_application_key` in the `key` query parameter, or a Cohesivity edge session token in `Authorization: Bearer <token>`
- **Format:** OpenAI-compatible JSON request and response shapes for the supported endpoints below. Provider-specific responses may be normalized where Cohesivity needs consistent embeddings, image, usage, and billing data.
- **Credential safety:** provider keys are unnecessary. Cohesivity removes tenant `Authorization`, cookies, `key`, OpenAI headers, `x-api-key`, Google key headers, Anthropic headers, xAI auth headers, and gateway auth headers before applying Cohesivity-managed provider credentials.
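A minimal Python sketch of the two documented auth modes follows. `edge_request`, `server_side_request`, and the `COH_APPLICATION_KEY` environment variable name are hypothetical; the base URL, the `key` query parameter, and the Bearer header come from the list above. The request carries no provider keys, since the gateway would strip them anyway.

```python
import json
import os
import urllib.parse
import urllib.request

BASE_URL = "https://cohesivity.ai/edge/ai-gateway"


def edge_request(path, body, *, app_key=None, session_token=None):
    """Build a JSON POST to the gateway using exactly one documented auth mode."""
    if (app_key is None) == (session_token is None):
        raise ValueError("pass exactly one of app_key or session_token")
    url = BASE_URL + path
    headers = {"Content-Type": "application/json"}
    if app_key is not None:
        # Mode 1: coh_application_key in the `key` query parameter.
        url += "?" + urllib.parse.urlencode({"key": app_key})
    else:
        # Mode 2: Cohesivity edge session token as a Bearer token.
        headers["Authorization"] = f"Bearer {session_token}"
    # No provider keys are set: tenant-supplied provider auth is stripped anyway.
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def server_side_request(path, body):
    # Hypothetical env var name: read the secret on the server, never in client code.
    return edge_request(path, body, app_key=os.environ["COH_APPLICATION_KEY"])
```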

## Supported Endpoints

- `POST /v1/chat/completions`: chat completions for allowlisted OpenAI, Anthropic, Google, and xAI text models. Requests must set `max_tokens` or `max_completion_tokens`.
- `POST /v1/embeddings`: embeddings for allowlisted OpenAI and Google embedding models.
- `POST /v1/images/generations`: image generation for allowlisted OpenAI, Google, and xAI image models; available only after the tenant is claimed.

Every other endpoint or method is rejected before model execution: model listing, `/responses`, `/messages`, `/ai/run`, audio, files, batches, realtime, edits, variations, vector stores, assistants, provider-native passthrough, arbitrary paths, and hidden-cost hosted features are blocked at launch.
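Because anything outside these three routes is rejected before model execution, a client can fail fast with a local check. The sketch below assumes an exact method-and-path match; `is_supported` is a hypothetical helper, and the gateway remains the authority.

```python
# The only method/path pairs listed above as supported.
SUPPORTED = {
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/embeddings"),
    ("POST", "/v1/images/generations"),
}


def is_supported(method: str, path: str) -> bool:
    """Client-side mirror of the endpoint allowlist (exact path match only).

    Anything else, e.g. GET /v1/models or POST /v1/responses, is rejected
    by the gateway before model execution.
    """
    return (method.upper(), path) in SUPPORTED
```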

## Launch Model Access

- **Tenant model aliases:** every request model must use the company prefix `openai/`, `anthropic/`, `google/`, or `xai/`. Bare model names and provider-native route slugs are blocked on the tenant surface.
- **Ephemeral:** request-capped, no wallet billing. Chat models: `openai/gpt-5-nano`, `anthropic/claude-haiku-4.5`, `google/gemini-3.1-flash-lite`, and `xai/grok-4.3`. Embeddings: `openai/text-embedding-3-small`, `openai/text-embedding-3-large`, and `google/gemini-embedding-2-preview`. Images are not available before claim. Lifetime caps are 100/25/100/25/100/25/25 respectively, with 5/2/5/2/5/2/2 per-minute bursts.
- **Claimed Free:** wallet-fluid with per-minute bursts. Chat: `openai/gpt-5-nano`, `openai/gpt-5.4-nano`, `openai/gpt-5.4-mini`, `openai/gpt-5.4`, `anthropic/claude-haiku-4.5`, `anthropic/claude-sonnet-4.6`, `google/gemini-3.1-flash-lite`, `google/gemini-3.5-flash`, and `xai/grok-4.3`. Embeddings: `openai/text-embedding-3-small`, `openai/text-embedding-3-large`, `google/gemini-embedding-2-preview`, and `google/gemini-embedding-2`. Images: `openai/gpt-image-2`, `google/gemini-3.1-flash-image-preview`, `google/nano-banana-2`, and `xai/grok-imagine-image`.
- **Claimed Plus / Pro:** wallet-fluid with higher per-minute bursts and the full launch allowlist. Adds `openai/gpt-5.5`, `anthropic/claude-opus-4.7`, `google/gemini-3.1-pro-preview`, `google/gemini-3-pro-image-preview`, `google/nano-banana-pro`, and `xai/grok-imagine-image-quality`.
- Claimed AI Gateway has no lifetime or monthly request bucket; wallet balance plus the per-account UTC-minute burst policy are the governing limits.
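The tier lists above can be encoded as data for a local pre-flight check. This is a sketch under stated assumptions: the helper names are hypothetical, the split of the Plus/Pro additions into chat and image models is inferred from the model names, and the gateway's own allowlist is authoritative if the two ever disagree.

```python
PREFIXES = ("openai/", "anthropic/", "google/", "xai/")

EPHEMERAL = {
    "chat": {"openai/gpt-5-nano", "anthropic/claude-haiku-4.5",
             "google/gemini-3.1-flash-lite", "xai/grok-4.3"},
    "embeddings": {"openai/text-embedding-3-small", "openai/text-embedding-3-large",
                   "google/gemini-embedding-2-preview"},
    "images": set(),  # images are not available before claim
}

FREE = {
    "chat": EPHEMERAL["chat"] | {"openai/gpt-5.4-nano", "openai/gpt-5.4-mini",
                                 "openai/gpt-5.4", "anthropic/claude-sonnet-4.6",
                                 "google/gemini-3.5-flash"},
    "embeddings": EPHEMERAL["embeddings"] | {"google/gemini-embedding-2"},
    "images": {"openai/gpt-image-2", "google/gemini-3.1-flash-image-preview",
               "google/nano-banana-2", "xai/grok-imagine-image"},
}

# Assumption: chat vs image placement of the Plus/Pro additions is inferred from names.
PLUS_PRO = {
    "chat": FREE["chat"] | {"openai/gpt-5.5", "anthropic/claude-opus-4.7",
                            "google/gemini-3.1-pro-preview"},
    "embeddings": FREE["embeddings"],
    "images": FREE["images"] | {"google/gemini-3-pro-image-preview",
                                "google/nano-banana-pro",
                                "xai/grok-imagine-image-quality"},
}

TIERS = {"ephemeral": EPHEMERAL, "free": FREE, "plus": PLUS_PRO, "pro": PLUS_PRO}


def check_model(tier: str, kind: str, model: str) -> None:
    """Raise ValueError if `model` is not usable for `kind` on `tier`."""
    if not model.startswith(PREFIXES):
        raise ValueError(f"{model!r}: use a company-prefixed alias such as 'openai/...'")
    if model not in TIERS[tier][kind]:
        raise ValueError(f"{model!r} is not allowlisted for {kind} on the {tier} tier")
```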

## Common Mistakes

- **Using provider-native endpoints or model slugs.** This offering supports only the three endpoint shapes above under `/edge/ai-gateway`, and model names require Cohesivity's company-prefixed aliases.
- **Omitting bounded output on chat.** Chat completions need `max_tokens` or `max_completion_tokens` set; Cohesivity rejects unbounded text output before provider execution so that the preflight wallet reservation has a known upper bound.
- **Sending provider keys.** Tenant-supplied provider auth headers are stripped, so provider secrets serve no purpose in tenant apps.
- **Streaming embeddings or images.** Streaming is enabled only for allowlisted chat-completions models with terminal usage; embeddings, images, and partial image delivery must be non-streaming for launch.
- **Setting a size on xAI images.** `xai/grok-imagine-image` and `xai/grok-imagine-image-quality` use their documented default image shape at launch; `size` and `resolution` are not supported yet.
- **Requesting hidden-cost features.** Audio, files, hosted web/file search, code interpreter, computer use, batches, realtime, assistants, vector stores, edits, variations, and provider passthrough fields are blocked.
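Several of these mistakes are detectable from the request body alone. The hypothetical `precheck` below flags three of them (unbounded chat output, streaming outside chat completions, and `size`/`resolution` on xAI image models); it is a convenience sketch, not a copy of Cohesivity's validation.

```python
from typing import List


def precheck(endpoint: str, body: dict) -> List[str]:
    """Return human-readable problems with a request body (empty list if none found)."""
    problems = []
    model = body.get("model", "")
    if endpoint == "/v1/chat/completions":
        # Chat output must be bounded before the gateway will call a provider.
        if "max_tokens" not in body and "max_completion_tokens" not in body:
            problems.append("chat requires max_tokens or max_completion_tokens")
    elif body.get("stream"):
        # Embeddings and images are non-streaming at launch.
        problems.append("streaming is only available on chat completions")
    if endpoint == "/v1/images/generations" and model.startswith("xai/"):
        # xAI image models use their default shape; sizing fields are unsupported.
        for field in ("size", "resolution"):
            if field in body:
                problems.append(f"{field} is not supported for {model} at launch")
    return problems
```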

## Examples

- openai chat: `POST https://cohesivity.ai/edge/ai-gateway/v1/chat/completions?key=<coh_application_key>` with body `{ "model": "openai/gpt-5-nano", "messages": [{"role":"user","content":"Write one haiku about infrastructure."}], "max_completion_tokens": 80 }`
- streaming chat: same endpoint with any allowlisted chat model, for example `{ "model": "openai/gpt-5-nano", "messages": [{"role":"user","content":"Stream a short answer."}], "max_completion_tokens": 80, "stream": true }`. Cohesivity ensures terminal usage is included before finalizing billing.
- anthropic chat: `POST https://cohesivity.ai/edge/ai-gateway/v1/chat/completions?key=<coh_application_key>` with body `{ "model": "anthropic/claude-haiku-4.5", "messages": [{"role":"user","content":"Summarize this release note."}], "max_tokens": 120 }`
- google embeddings: `POST https://cohesivity.ai/edge/ai-gateway/v1/embeddings?key=<coh_application_key>` with body `{ "model": "google/gemini-embedding-2", "input": "Cohesivity gives agents infrastructure." }`
- openai image after claim: `POST https://cohesivity.ai/edge/ai-gateway/v1/images/generations?key=<coh_application_key>` with body `{ "model": "openai/gpt-image-2", "prompt": "A polished product dashboard for usage analytics", "size": "1024x1024", "quality": "medium", "n": 1 }`
- google image after claim: same image endpoint with body `{ "model": "google/nano-banana-2", "prompt": "A polished product dashboard for usage analytics", "n": 1 }`
- xai image after claim: same image endpoint with body `{ "model": "xai/grok-imagine-image", "prompt": "A polished product dashboard for usage analytics", "n": 1 }`
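The first example can be sent from Python with only the standard library. In this sketch `chat` is a hypothetical helper; the URL, `key` parameter, and body fields come from the examples above, and the response parsing assumes the OpenAI-compatible `choices[0].message.content` and `usage` shape the gateway is documented to return. `urlopen` is injectable so the helper can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

CHAT_URL = "https://cohesivity.ai/edge/ai-gateway/v1/chat/completions"


def chat(app_key, model, prompt, max_completion_tokens=80, urlopen=urllib.request.urlopen):
    """POST one non-streaming chat completion and return (text, usage)."""
    body = {
        "model": model,  # must be a company-prefixed alias, e.g. "openai/gpt-5-nano"
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,  # required: bounded output
    }
    req = urllib.request.Request(
        CHAT_URL + "?" + urllib.parse.urlencode({"key": app_key}),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        payload = json.loads(resp.read())
    # OpenAI-compatible response: first choice's message text plus the usage block.
    return payload["choices"][0]["message"]["content"], payload.get("usage")
```

From server-side code: `text, usage = chat("<coh_application_key>", "openai/gpt-5-nano", "Write one haiku about infrastructure.")`.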

## Streaming

Allowlisted chat-completions models across OpenAI, Anthropic, Google, and xAI may stream at launch. Cohesivity forwards OpenAI-compatible SSE chunks to tenants and watches for terminal provider usage; if the stream ends, errors, or is canceled before usage arrives, the preflight reservation is revoked and no wallet debit is finalized. Embeddings, images, and partial image delivery stay non-streaming.
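On the client side, a streamed chat response can be consumed by reading SSE `data:` lines until `data: [DONE]`. The sketch below assumes the standard OpenAI streaming chunk shape (text in `choices[].delta.content`, usage in a final chunk with empty `choices`); `read_chat_stream` is a hypothetical name. A `None` usage means the stream stopped before terminal usage arrived, which is also the case in which no wallet debit is finalized.

```python
import json


def read_chat_stream(lines):
    """Accumulate streamed chat text and terminal usage from SSE lines.

    Returns (text, usage); usage is None if no usage chunk was seen.
    """
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ":" comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]  # terminal usage chunk (OpenAI-style, empty choices)
    return "".join(parts), usage
```

With `urllib`, the input can be `(raw.decode("utf-8") for raw in resp)`, since an HTTP response iterates line by line.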

## Billing and Usage

- Claimed AI Gateway usage is fluid-only after model-tier and rate checks. There is no fixed monthly AI Gateway request or token bucket.
- Cohesivity records request counters, per-model burst counters, input/output token counters, cached token counters, image token counters, and recent-event metadata when providers return billable usage. Prompts, images, and raw request bodies are not stored in usage events.
- Wallet debit is finalized only from parseable successful provider usage or cost. A successful billable response without parseable usage/cost returns a Cohesivity settlement error after revoking the preflight reservation instead of guessing a charge.
- Failed provider responses do not burn quota or fluid; Cohesivity revokes the preflight counters synchronously before returning the provider failure.
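To make the settlement rules concrete, here is a toy model of the reserve-then-settle flow. It is illustrative only: `ToyWallet`, `settle`, and `SettlementError` are hypothetical names, the amounts are made up, and Cohesivity's actual ledger, counters, and error format are not described in this document.

```python
from typing import Optional


class SettlementError(Exception):
    """Successful provider response, but no parseable usage/cost to bill from."""


class ToyWallet:
    """Illustrative reserve -> finalize/revoke model; not Cohesivity's implementation."""

    def __init__(self, balance: float):
        self.balance = balance
        self.reserved = 0.0

    def settle(self, estimate: float, provider_ok: bool, cost: Optional[float]) -> float:
        self.reserved += estimate  # preflight reservation before the provider call
        try:
            if not provider_ok:
                return 0.0  # failed provider response: nothing is debited
            if cost is None:
                # Never guess a charge when usage/cost cannot be parsed.
                raise SettlementError("no parseable usage/cost; nothing charged")
            self.balance -= cost  # debit exactly the parsed cost
            return cost
        finally:
            self.reserved -= estimate  # reservation released on every path
```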

## Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user. OpenAI, AI Gateway, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks, and AI Gateway and Deepgram have no fixed monthly usage bucket for claimed tiers.

**Ephemeral**

- openai/gpt-5-nano requests: 100 per ephemeral tenant lifetime before claim or expiry; 5 per minute
- anthropic/claude-haiku-4.5 requests: 25 per ephemeral tenant lifetime before claim or expiry; 2 per minute
- google/gemini-3.1-flash-lite requests: 100 per ephemeral tenant lifetime before claim or expiry; 5 per minute
- xai/grok-4.3 requests: 25 per ephemeral tenant lifetime before claim or expiry; 2 per minute
- openai/text-embedding-3-small requests: 100 per ephemeral tenant lifetime before claim or expiry; 5 per minute
- openai/text-embedding-3-large requests: 25 per ephemeral tenant lifetime before claim or expiry; 2 per minute
- google/gemini-embedding requests: 25 per ephemeral tenant lifetime before claim or expiry; 2 per minute

**Claimed Free**

- openai/text-embedding-3-small requests: 60 per minute
- openai/text-embedding-3-large requests: 20 per minute
- openai/gpt-5-nano requests: 30 per minute
- openai/gpt-5.4-nano requests: 30 per minute
- openai/gpt-5.4-mini requests: 15 per minute
- openai/gpt-5.4 requests: 5 per minute
- openai/gpt-image-2 requests: 5 per minute
- anthropic/claude-haiku-4.5 requests: 15 per minute
- anthropic/claude-sonnet-4.6 requests: 5 per minute
- google/gemini-embedding requests: 20 per minute
- google/gemini-3.1-flash-lite requests: 30 per minute
- google/gemini-3.5-flash requests: 10 per minute
- google/gemini-flash-image requests: 5 per minute
- xai/grok-4.3 requests: 15 per minute
- xai/grok-imagine-image requests: 5 per minute

**Claimed Plus**

- openai/text-embedding-3-small requests: 300 per minute
- openai/text-embedding-3-large requests: 100 per minute
- openai/gpt-5-nano requests: 100 per minute
- openai/gpt-5.4-nano requests: 100 per minute
- openai/gpt-5.4-mini requests: 60 per minute
- openai/gpt-5.4 requests: 20 per minute
- openai/gpt-image-2 requests: 20 per minute
- openai/gpt-5.5 requests: 10 per minute
- anthropic/claude-haiku-4.5 requests: 60 per minute
- anthropic/claude-sonnet-4.6 requests: 20 per minute
- anthropic/claude-opus-4.7 requests: 10 per minute
- google/gemini-embedding requests: 100 per minute
- google/gemini-3.1-flash-lite requests: 100 per minute
- google/gemini-3.5-flash requests: 30 per minute
- google/gemini-flash-image requests: 20 per minute
- google/gemini-3.1-pro-preview requests: 20 per minute
- google/gemini-pro-image requests: 10 per minute
- xai/grok-4.3 requests: 60 per minute
- xai/grok-imagine-image requests: 20 per minute
- xai/grok-imagine-image-quality requests: 10 per minute

**Claimed Pro**

- openai/text-embedding-3-small requests: 1000 per minute
- openai/text-embedding-3-large requests: 300 per minute
- openai/gpt-5-nano requests: 300 per minute
- openai/gpt-5.4-nano requests: 300 per minute
- openai/gpt-5.4-mini requests: 200 per minute
- openai/gpt-5.4 requests: 60 per minute
- openai/gpt-image-2 requests: 60 per minute
- openai/gpt-5.5 requests: 40 per minute
- anthropic/claude-haiku-4.5 requests: 200 per minute
- anthropic/claude-sonnet-4.6 requests: 60 per minute
- anthropic/claude-opus-4.7 requests: 40 per minute
- google/gemini-embedding requests: 300 per minute
- google/gemini-3.1-flash-lite requests: 300 per minute
- google/gemini-3.5-flash requests: 100 per minute
- google/gemini-flash-image requests: 60 per minute
- google/gemini-3.1-pro-preview requests: 60 per minute
- google/gemini-pro-image requests: 30 per minute
- xai/grok-4.3 requests: 200 per minute
- xai/grok-imagine-image requests: 60 per minute
- xai/grok-imagine-image-quality requests: 30 per minute
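Because claimed limits are per UTC minute, a client can pace itself with a fixed-window counter keyed on the current UTC minute, plus an optional lifetime cap for ephemeral tenants. This is a sketch assuming windows align to UTC minute boundaries; `MinuteBudget` is a hypothetical name. Claimed buckets are shared across every project on the account, so a per-process counter can undercount, and the gateway's limits stay authoritative.

```python
from datetime import datetime, timezone
from typing import Optional


class MinuteBudget:
    """Client-side throttle: fixed UTC-minute window plus an optional lifetime cap."""

    def __init__(self, per_minute: int, lifetime: Optional[int] = None):
        self.per_minute = per_minute
        self.lifetime = lifetime
        self.window = None  # current UTC minute as "YYYY-MM-DDTHH:MM"
        self.in_window = 0
        self.total = 0

    def try_acquire(self, now: Optional[datetime] = None) -> bool:
        """Count one request if both caps allow it; return False otherwise."""
        now = now or datetime.now(timezone.utc)
        minute = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
        if minute != self.window:
            self.window, self.in_window = minute, 0  # new UTC minute: reset burst count
        if self.in_window >= self.per_minute:
            return False
        if self.lifetime is not None and self.total >= self.lifetime:
            return False
        self.in_window += 1
        self.total += 1
        return True
```

For example, `MinuteBudget(per_minute=2, lifetime=25)` matches the ephemeral `anthropic/claude-haiku-4.5` caps, and `MinuteBudget(per_minute=30)` matches Claimed Free `openai/gpt-5-nano`.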

### Notes

- AI Gateway is fluid-only for claimed accounts after model-tier and per-minute checks. Ephemeral tenants get only the listed OpenAI, Anthropic, Google, and xAI starter models, with lifetime and burst caps.
- Cohesivity exposes only `POST /v1/chat/completions`, `POST /v1/embeddings`, and `POST /v1/images/generations` through the AI Gateway. Allowlisted OpenAI, Anthropic, Google, and xAI chat models may stream when terminal usage is returned; model listing, provider-native passthrough, tenant-supplied provider keys, audio, files, batches, realtime, vector stores, assistants, edits, variations, embeddings streaming, and image streaming are blocked before model execution.