# OpenAI API

Proxies OpenAI Responses, embeddings, and image generation through Cohesivity. Cohesivity injects the upstream OpenAI API key server-side; tenant requests authenticate with the existing Cohesivity application key or an edge session token.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```
curl -s -X POST https://cohesivity.ai/api/resources/openai-api \
  -H "Authorization: Bearer "
```

### Delete

```
curl -s -X DELETE https://cohesivity.ai/api/resources/openai-api \
  -H "Authorization: Bearer "
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Official Docs

https://platform.openai.com/docs — read the relevant Responses, embeddings, and image generation docs before coding.

> **Server-side only.** `coh_application_key` is a secret. Call this from your `vercel-hosting` API routes, `cloudflare-workers`, or your own server tier — never from a browser, mobile app, or other client-side code. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/openai-api
- **Auth:** `coh_application_key` as the **key** query parameter, or a Cohesivity edge session token in `Authorization: Bearer `
- **Format:** OpenAI-compatible JSON request and response shapes for the supported endpoints below
- **Upstream auth safety:** do not send an OpenAI key. Cohesivity removes tenant-supplied `Authorization`, cookies, `key`, and OpenAI organization/project/beta headers before adding platform upstream auth.
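As a convenience, the two auth modes above can be captured in a small request-building helper. This is a minimal stdlib-only sketch; the function name and the placeholder key values are illustrative, not part of the Cohesivity API.

```python
# Sketch: building an edge request with either auth mode described above.
# Key and token values here are placeholders, never real credentials.
from typing import Optional, Tuple, Dict
from urllib.parse import urlencode

EDGE_BASE = "https://cohesivity.ai/edge/openai-api"

def edge_request(path: str, *, application_key: Optional[str] = None,
                 session_token: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an edge call.

    Exactly one of `application_key` (sent as the `key` query parameter)
    or `session_token` (sent as a Bearer header) must be provided.
    Never attach an OpenAI key; Cohesivity strips tenant auth upstream.
    """
    if bool(application_key) == bool(session_token):
        raise ValueError("provide exactly one of application_key or session_token")
    url = EDGE_BASE + path
    headers = {"Content-Type": "application/json"}
    if application_key:
        url += "?" + urlencode({"key": application_key})
    else:
        headers["Authorization"] = f"Bearer {session_token}"
    return url, headers
```

Keeping the two modes mutually exclusive mirrors the auth bullet above: one credential per request, never both, and never an upstream OpenAI key.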
## Supported Endpoints

- `GET /v1/models` — returns a Cohesivity-synthesized OpenAI-compatible model list for the tenant tier
- `GET /v1/models/` — returns a synthesized model object only when that model is available to the tenant tier
- `POST /v1/responses` — text Responses with allowed GPT-5-family models, function tools, and the gated `image_generation` hosted tool
- `POST /v1/embeddings` — `text-embedding-3-small` and, after claim, `text-embedding-3-large`
- `POST /v1/images/generations` — direct JSON image generation with `model: "gpt-image-2"` after claim

Endpoint calls outside this list are rejected before reaching upstream. POST endpoints must use a JSON object body so Cohesivity can inspect model, tool, streaming, and service-tier fields before forwarding.

## Launch Model Access

- **Ephemeral:** Responses text `gpt-5-nano` only; embeddings `text-embedding-3-small` only; no image generation. Each allowed OpenAI family has a 100-request tenant-lifetime hard cap and a 5/minute burst cap.
- **Claimed Free:** Responses text `gpt-5-nano`, `gpt-5-mini`, `gpt-5.4-nano`, `gpt-5.4-mini`; embeddings `text-embedding-3-small` and `text-embedding-3-large`; image generation `gpt-image-2`.
- **Claimed Plus / Pro:** all Free OpenAI models plus `gpt-5`, `gpt-5.1`, `gpt-5.2`, `gpt-5.4`, and `gpt-5.5`, with higher per-minute limits.
- `GET /v1/models` is the source of truth for base model IDs available to the current tenant. Exact dated snapshots of an allowed text slug, such as `gpt-5-nano-YYYY-MM-DD`, may also pass policy.
- Denied model families stay denied for every tier: IDs containing `pro`, `codex`, `chat`, `chatgpt`, `search`, `realtime`, `audio`, `tts`, `transcrib`, `moderation`, or `video` are blocked unless a future Cohesivity launch contract explicitly adds them.
- Non-default `service_tier`, `priority`, and `data_residency` variants are blocked for launch. Omit those fields, or leave `service_tier` as `auto` / `default`.
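The denied-family and dated-snapshot rules above can be mirrored client-side so bad model IDs fail fast before a round trip. This is a convenience sketch under the stated policy, not the enforcement itself; `GET /v1/models` remains the source of truth, and the function name is illustrative.

```python
# Sketch of a client-side preflight for the launch model policy above:
# reject denied families, then accept allowed base IDs or exact dated
# snapshots of an allowed text model (base-YYYY-MM-DD).
import re

DENIED_SUBSTRINGS = ("pro", "codex", "chat", "chatgpt", "search", "realtime",
                     "audio", "tts", "transcrib", "moderation", "video")

def model_may_pass(model_id: str, allowed_base_ids: set) -> bool:
    """True if `model_id` is an allowed base ID, or an exact dated snapshot
    of one, and does not hit any denied model family."""
    if any(s in model_id for s in DENIED_SUBSTRINGS):
        return False
    if model_id in allowed_base_ids:
        return True
    snapshot = re.fullmatch(r"(.+)-\d{4}-\d{2}-\d{2}", model_id)
    return bool(snapshot and snapshot.group(1) in allowed_base_ids)
```

Populate `allowed_base_ids` from a live `GET /v1/models` response rather than hardcoding it, since the catalog is tier-filtered.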
## Common Mistakes

- **Calling Chat Completions or another OpenAI endpoint.** This offering supports only the five endpoint shapes listed above. Use `POST /v1/responses` for text generation.
- **Using a model not returned by `GET /v1/models` or an exact dated snapshot of one of those text models.** The model catalog is tier-filtered. A model that is valid on Plus or Pro may still be blocked on Ephemeral or Claimed Free tenants.
- **Sending a non-JSON POST body.** Responses, embeddings, and image generation requests must use a JSON object body so Cohesivity can enforce model, tool, streaming, and billing policy.
- **Putting `gpt-image-2` in the top-level Responses `model`.** Use an allowed text model at the top level and put `gpt-image-2` only inside the `image_generation` tool.
- **Omitting the image tool model.** Responses `image_generation` tools must explicitly set `model: "gpt-image-2"` so upstream default-model changes cannot alter cost or behavior.
- **Streaming images or embeddings.** Streaming is supported only for `POST /v1/responses` with `stream: true`; image-generation and embeddings streaming are blocked.
- **Requesting transparent image backgrounds.** `background: "transparent"` is blocked for `gpt-image-2`; use an opaque background or omit the field.

## Responses API Tools

- The top-level `model` must be an allowed text model for the tenant tier. `gpt-image-2` is never valid as the top-level Responses model.
- Function tools (`type: "function"`) may pass through because they do not create OpenAI-hosted external cost by themselves.
- Hosted `image_generation` tools are allowed only after claim and must set exactly `model: "gpt-image-2"`. Omitted image tool models are blocked to avoid upstream default-model roulette.
- `background: "transparent"` is blocked for `gpt-image-2`; use an opaque background or omit the field.
- Hosted web search, file search, code interpreter, computer use, shell/container, MCP/connectors, and all other hosted tools are blocked for launch.

## Examples

- List allowed models: `GET https://cohesivity.ai/edge/openai-api/v1/models?key=`
- Model detail: `GET https://cohesivity.ai/edge/openai-api/v1/models/gpt-5-nano?key=`
- Responses text: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-nano", "input": "Write one haiku about infrastructure." }`
- Responses streaming: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-nano", "input": "Stream a short answer.", "stream": true }`
- Embeddings: `POST https://cohesivity.ai/edge/openai-api/v1/embeddings?key=` with body `{ "model": "text-embedding-3-small", "input": "Cohesivity gives agents infrastructure." }`
- Direct image generation after claim: `POST https://cohesivity.ai/edge/openai-api/v1/images/generations?key=` with body `{ "model": "gpt-image-2", "prompt": "A polished product dashboard for usage analytics", "size": "1024x1024", "quality": "medium", "n": 1 }`
- Responses image tool after claim: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-mini", "input": "Create one product hero image.", "tools": [{ "type": "image_generation", "model": "gpt-image-2", "size": "1024x1024" }] }`

## Image Generation

Image generation is available only after claim. Use `POST /v1/images/generations` with `model: "gpt-image-2"` for direct image requests, or use a Responses `image_generation` tool with the same exact tool model. Ephemeral tenants must claim before using either image path. Transparent backgrounds and image-generation streaming are blocked.

## Streaming

Streaming is supported only on `POST /v1/responses` with `stream: true`. Cohesivity forwards the OpenAI SSE stream as-is and watches for a terminal `response.completed` or `response.incomplete` event with `response.usage`.
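Since the stream is forwarded as-is, a client can watch for the same terminal event. The sketch below assumes OpenAI's documented SSE framing (`data:` lines carrying JSON events with a `type` field); the sample events in the usage note are synthetic.

```python
# Sketch: scanning a forwarded Responses SSE stream for the terminal
# usage event described above. Assumes OpenAI's `data: {json}` framing.
import json

TERMINAL_TYPES = {"response.completed", "response.incomplete"}

def terminal_usage(sse_lines):
    """Return the `response.usage` object from the terminal event, or None
    if the stream ended without one (no debit is finalized in that case)."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") in TERMINAL_TYPES:
            return event.get("response", {}).get("usage")
    return None
```

A `None` result corresponds to the revoked-reservation case: the stream ended, errored, or was canceled before terminal usage arrived.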
If the stream ends, errors, or is canceled before terminal usage arrives, the preflight reservation is revoked and no wallet debit is finalized.

## Response Format

Non-streaming billable responses are returned in OpenAI-compatible JSON shapes. Responses text may appear in `output_text` when OpenAI includes it; otherwise inspect the `output` content parts. Embeddings responses return `data[]` entries with vectors. Direct image-generation responses return the upstream image payload in `data[]`; Responses image-tool output stays in the Responses output structure.

## Structured Output

Responses request fields supported by OpenAI can pass through when they do not conflict with Cohesivity policy. Structured-output options are therefore allowed behind the same model, tool, endpoint, streaming, and service-tier checks described above.

## Billing and Usage

- Claimed OpenAI usage is fluid-only after tier and rate checks. There is no fixed monthly OpenAI token bucket.
- Cohesivity records request counters, endpoint/model-family counters, input/output token counters, cached-token counters, and image token counters when OpenAI returns them. Prompts, images, and raw request bodies are not stored in usage events.
- Wallet debit is finalized only from parseable OpenAI-returned `usage`. A successful billable non-streaming response without parseable usage returns a Cohesivity settlement error after revoking the preflight reservation instead of guessing a charge.
- Failed upstream responses do not burn quota or fluid; Cohesivity revokes the preflight counters synchronously before returning the upstream failure.

## Launch Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user; OpenAI, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks; Deepgram has no fixed monthly usage bucket for claimed tiers.
**Ephemeral**

- GPT-5 nano Responses requests: 100 per ephemeral tenant lifetime before claim or expiry
- text-embedding-3-small requests: 100 per ephemeral tenant lifetime before claim or expiry
- GPT-5 nano Responses requests: 5 per minute
- text-embedding-3-small requests: 5 per minute

**Claimed Free**

- GPT-5 nano Responses requests: 30 per minute
- GPT-5.4 nano Responses requests: 30 per minute
- GPT-5 mini Responses requests: 15 per minute
- GPT-5.4 mini Responses requests: 15 per minute
- text-embedding-3-small requests: 60 per minute
- text-embedding-3-large requests: 20 per minute
- image generation requests: 5 per minute

**Claimed Plus**

- GPT-5 nano Responses requests: 100 per minute
- GPT-5.4 nano Responses requests: 100 per minute
- GPT-5 mini Responses requests: 60 per minute
- GPT-5.4 mini Responses requests: 60 per minute
- GPT-5 / 5.1 / 5.2 Responses requests: 30 per minute
- GPT-5.4 Responses requests: 20 per minute
- GPT-5.5 Responses requests: 10 per minute
- text-embedding-3-small requests: 300 per minute
- text-embedding-3-large requests: 100 per minute
- image generation requests: 20 per minute

**Claimed Pro**

- GPT-5 nano Responses requests: 300 per minute
- GPT-5.4 nano Responses requests: 300 per minute
- GPT-5 mini Responses requests: 200 per minute
- GPT-5.4 mini Responses requests: 200 per minute
- GPT-5 / 5.1 / 5.2 Responses requests: 100 per minute
- GPT-5.4 Responses requests: 60 per minute
- GPT-5.5 Responses requests: 40 per minute
- text-embedding-3-small requests: 1000 per minute
- text-embedding-3-large requests: 300 per minute
- image generation requests: 60 per minute

### Notes

- OpenAI is fluid-only for claimed accounts after tier and per-minute checks. Ephemeral tenants get only GPT-5 nano Responses text and text-embedding-3-small, each capped to 100 lifetime requests and 5/minute.
- Streaming Responses are allowed and settle only from terminal OpenAI usage events.
- Image streaming, chat completions, audio, realtime, files, vector stores, assistants, batch, fine-tuning, web search, code interpreter, computer use, MCP/connectors, and shell/container tools are blocked for launch.
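The document does not specify how a rate-limited call is rejected; the sketch below assumes the conventional HTTP 429 status and wraps an arbitrary `send` callable with jittered exponential backoff. Both the status code and the function names are assumptions, not documented Cohesivity behavior.

```python
# Sketch: client-side backoff for the per-minute limits above.
# HTTP 429 as the rate-limit signal is an assumption; `send` stands in
# for whatever function performs the actual edge request.
import random
import time

def call_with_backoff(send, max_attempts=5, base_delay=1.0):
    """Retry `send()` on an assumed 429 status with jittered exponential
    backoff; return the first non-429 response, or the last response."""
    resp = send()
    for attempt in range(1, max_attempts):
        if resp.get("status") != 429:
            return resp
        # Sleep base * 2^(attempt-1), scaled by a jitter factor in [0.5, 1.0).
        time.sleep(base_delay * (2 ** (attempt - 1)) * (0.5 + random.random() / 2))
        resp = send()
    return resp
```

Backing off client-side matters most for Ephemeral tenants, since exceeding any authoritative hard cap pauses the tenant as a whole.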