# OpenAI API

Proxies OpenAI Responses, embeddings, and image generation through Cohesivity. Cohesivity injects the upstream OpenAI API key server-side; tenant requests authenticate with the existing Cohesivity application key or an edge session token.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```
curl -s -X POST https://cohesivity.ai/api/resources/openai-api \
  -H "Authorization: Bearer "
```

### Delete

```
curl -s -X DELETE https://cohesivity.ai/api/resources/openai-api \
  -H "Authorization: Bearer "
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Official Docs

https://platform.openai.com/docs — read the relevant Responses, embeddings, and image generation docs before coding.

> **Server-side only.** `coh_application_key` is a secret. Call this from your `vercel-hosting` API routes, `cloudflare-workers`, or your own server tier — never from a browser, mobile app, or other client-side code. See the canonical key-secrecy directive in `.cohesivity` for details.

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/openai-api
- **Auth:** `coh_application_key` as the **key** query parameter, or a Cohesivity edge session token in `Authorization: Bearer `
- **Format:** OpenAI-compatible JSON request and response shapes for the supported endpoints below
- **Upstream auth safety:** do not send an OpenAI key. Cohesivity removes tenant-supplied `Authorization`, cookies, `key`, and OpenAI organization/project/beta headers before adding platform upstream auth.
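As a convenience, the two auth modes above can be captured in a small request-building helper. This is a minimal stdlib-only sketch; the function name and the placeholder key values are illustrative, not part of the Cohesivity API.

```python
# Sketch: building an edge request with either auth mode described above.
# Key and token values here are placeholders, never real credentials.
from typing import Optional, Tuple, Dict
from urllib.parse import urlencode

EDGE_BASE = "https://cohesivity.ai/edge/openai-api"

def edge_request(path: str, *, application_key: Optional[str] = None,
                 session_token: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an edge call.

    Exactly one of `application_key` (sent as the `key` query parameter)
    or `session_token` (sent as a Bearer header) must be provided.
    Never attach an OpenAI key; Cohesivity strips tenant auth upstream.
    """
    if bool(application_key) == bool(session_token):
        raise ValueError("provide exactly one of application_key or session_token")
    url = EDGE_BASE + path
    headers = {"Content-Type": "application/json"}
    if application_key:
        url += "?" + urlencode({"key": application_key})
    else:
        headers["Authorization"] = f"Bearer {session_token}"
    return url, headers
```

Keeping the two modes mutually exclusive mirrors the auth bullet above: one credential per request, never both, and never an upstream OpenAI key.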
## Supported Endpoints

- `GET /v1/models` — returns a Cohesivity-synthesized OpenAI-compatible model list for the tenant tier
- `GET /v1/models/` — returns a synthesized model object only when that model is available to the tenant tier
- `POST /v1/responses` — text Responses with allowed GPT-5-family models, function tools, and the gated `image_generation` hosted tool
- `POST /v1/embeddings` — `text-embedding-3-small` and, after claim, `text-embedding-3-large`
- `POST /v1/images/generations` — direct JSON image generation with `model: "gpt-image-2"` after claim

Endpoint calls outside this list are rejected before reaching upstream. POST endpoints must use a JSON object body so Cohesivity can inspect model, tool, streaming, and service-tier fields before forwarding.

## Launch Model Access

- **Ephemeral:** Responses text `gpt-5-nano` only; embeddings `text-embedding-3-small` only; no image generation. Each allowed OpenAI family has a 100-request tenant-lifetime hard cap and a 5/minute burst cap.
- **Claimed Free:** Responses text `gpt-5-nano`, `gpt-5-mini`, `gpt-5.4-nano`, `gpt-5.4-mini`; embeddings `text-embedding-3-small` and `text-embedding-3-large`; image generation `gpt-image-2`.
- **Claimed Plus / Pro:** all Free OpenAI models plus `gpt-5`, `gpt-5.1`, `gpt-5.2`, `gpt-5.4`, and `gpt-5.5`, with higher per-minute limits.
- `GET /v1/models` is the source of truth for base model IDs available to the current tenant. Exact dated snapshots of an allowed text slug, such as `gpt-5-nano-YYYY-MM-DD`, may also pass policy.
- Denied model families stay denied for every tier: IDs containing `pro`, `codex`, `chat`, `chatgpt`, `search`, `realtime`, `audio`, `tts`, `transcrib`, `moderation`, or `video` are blocked unless a future Cohesivity launch contract explicitly adds them.
- Non-default `service_tier`, `priority`, and `data_residency` variants are blocked for launch. Omit those fields, or leave `service_tier` as `auto` / `default`.
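The denied-family and dated-snapshot rules above can be mirrored client-side so bad model IDs fail fast before a round trip. This is a convenience sketch under the stated policy, not the enforcement itself; `GET /v1/models` remains the source of truth, and the function name is illustrative.

```python
# Sketch of a client-side preflight for the launch model policy above:
# reject denied families, then accept allowed base IDs or exact dated
# snapshots of an allowed text model (base-YYYY-MM-DD).
import re

DENIED_SUBSTRINGS = ("pro", "codex", "chat", "chatgpt", "search", "realtime",
                     "audio", "tts", "transcrib", "moderation", "video")

def model_may_pass(model_id: str, allowed_base_ids: set) -> bool:
    """True if `model_id` is an allowed base ID, or an exact dated snapshot
    of one, and does not hit any denied model family."""
    if any(s in model_id for s in DENIED_SUBSTRINGS):
        return False
    if model_id in allowed_base_ids:
        return True
    snapshot = re.fullmatch(r"(.+)-\d{4}-\d{2}-\d{2}", model_id)
    return bool(snapshot and snapshot.group(1) in allowed_base_ids)
```

Populate `allowed_base_ids` from a live `GET /v1/models` response rather than hardcoding it, since the catalog is tier-filtered.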
## Common Mistakes

- **Calling Chat Completions or another OpenAI endpoint.** This offering supports only the five endpoint shapes listed above. Use `POST /v1/responses` for text generation.
- **Using a model not returned by `GET /v1/models` or an exact dated snapshot of one of those text models.** The model catalog is tier-filtered. A model that is valid on Plus or Pro may still be blocked on Ephemeral or Claimed Free tenants.
- **Sending a non-JSON POST body.** Responses, embeddings, and image generation requests must use a JSON object body so Cohesivity can enforce model, tool, streaming, and billing policy.
- **Putting `gpt-image-2` in the top-level Responses `model`.** Use an allowed text model at the top level and put `gpt-image-2` only inside the `image_generation` tool.
- **Omitting the image tool model.** Responses `image_generation` tools must explicitly set `model: "gpt-image-2"` so upstream default-model changes cannot alter cost or behavior.
- **Streaming images or embeddings.** Streaming is supported only for `POST /v1/responses` with `stream: true`; image-generation and embeddings streaming are blocked.
- **Requesting transparent image backgrounds.** `background: "transparent"` is blocked for `gpt-image-2`; use an opaque background or omit the field.

## Responses API Tools

- The top-level `model` must be an allowed text model for the tenant tier. `gpt-image-2` is never valid as the top-level Responses model.
- Function tools (`type: "function"`) may pass through because they do not create OpenAI-hosted external cost by themselves.
- Hosted `image_generation` tools are allowed only after claim and must set exactly `model: "gpt-image-2"`. Omitted image tool models are blocked to avoid upstream default-model roulette.
- `background: "transparent"` is blocked for `gpt-image-2`; use an opaque background or omit the field.
- Hosted web search, file search, code interpreter, computer use, shell/container, MCP/connectors, and all other hosted tools are blocked for launch.

## Examples

- List allowed models: `GET https://cohesivity.ai/edge/openai-api/v1/models?key=`
- Model detail: `GET https://cohesivity.ai/edge/openai-api/v1/models/gpt-5-nano?key=`
- Responses text: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-nano", "input": "Write one haiku about infrastructure." }`
- Responses streaming: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-nano", "input": "Stream a short answer.", "stream": true }`
- Embeddings: `POST https://cohesivity.ai/edge/openai-api/v1/embeddings?key=` with body `{ "model": "text-embedding-3-small", "input": "Cohesivity gives agents infrastructure." }`
- Direct image generation after claim: `POST https://cohesivity.ai/edge/openai-api/v1/images/generations?key=` with body `{ "model": "gpt-image-2", "prompt": "A polished product dashboard for usage analytics", "size": "1024x1024", "quality": "medium", "n": 1 }`
- Responses image tool after claim: `POST https://cohesivity.ai/edge/openai-api/v1/responses?key=` with body `{ "model": "gpt-5-mini", "input": "Create one product hero image.", "tools": [{ "type": "image_generation", "model": "gpt-image-2", "size": "1024x1024" }] }`

## Image Generation

Image generation is available only after claim. Use `POST /v1/images/generations` with `model: "gpt-image-2"` for direct image requests, or use a Responses `image_generation` tool with the same exact tool model. Ephemeral tenants must claim before using either image path. Transparent backgrounds and image-generation streaming are blocked.

## Streaming

Streaming is supported only on `POST /v1/responses` with `stream: true`. Cohesivity forwards the OpenAI SSE stream as-is and watches for a terminal `response.completed` or `response.incomplete` event with `response.usage`.
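Since the stream is forwarded as-is, a client can watch for the same terminal event. The sketch below assumes OpenAI's documented SSE framing (`data:` lines carrying JSON events with a `type` field); the sample events in the usage note are synthetic.

```python
# Sketch: scanning a forwarded Responses SSE stream for the terminal
# usage event described above. Assumes OpenAI's `data: {json}` framing.
import json

TERMINAL_TYPES = {"response.completed", "response.incomplete"}

def terminal_usage(sse_lines):
    """Return the `response.usage` object from the terminal event, or None
    if the stream ended without one (no debit is finalized in that case)."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") in TERMINAL_TYPES:
            return event.get("response", {}).get("usage")
    return None
```

A `None` result corresponds to the revoked-reservation case: the stream ended, errored, or was canceled before terminal usage arrived.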
If the stream ends, errors, or is canceled before terminal usage arrives, the preflight reservation is revoked and no wallet debit is finalized.

## Response Format

Non-streaming billable responses are returned in OpenAI-compatible JSON shapes. Responses text may appear in `output_text` when OpenAI includes it; otherwise inspect the `output` content parts. Embeddings responses return `data[]` entries with vectors. Direct image-generation responses return the upstream image payload in `data[]`; Responses image-tool output stays in the Responses output structure.

## Structured Output

Responses request fields supported by OpenAI can pass through when they do not conflict with Cohesivity policy. Structured-output options are therefore allowed behind the same model, tool, endpoint, streaming, and service-tier checks described above.

## Billing and Usage

- Claimed OpenAI usage is fluid-only after tier and rate checks. There is no fixed monthly OpenAI token bucket.
- Cohesivity records request counters, endpoint/model-family counters, input/output token counters, cached-token counters, and image token counters when OpenAI returns them. Prompts, images, and raw request bodies are not stored in usage events.
- Wallet debit is finalized only from parseable OpenAI-returned `usage`. A successful billable non-streaming response without parseable usage returns a Cohesivity settlement error after revoking the preflight reservation instead of guessing a charge.
- Failed upstream responses do not burn quota or fluid; Cohesivity revokes the preflight counters synchronously before returning the upstream failure.

## Launch Rate Limits

Ephemeral tenants pause as a whole if any authoritative hard cap below is exceeded. Claimed tiers use account-scoped buckets shared across every project owned by the Cohesivity user; OpenAI, Deepgram, and Exa are fluid-only after tier, rate, and concurrency checks; Deepgram has no fixed monthly usage bucket for claimed tiers.
**Ephemeral**

- GPT-5 nano Responses requests: 100 per ephemeral tenant lifetime before claim or expiry
- text-embedding-3-small requests: 100 per ephemeral tenant lifetime before claim or expiry
- GPT-5 nano Responses requests: 5 per minute
- text-embedding-3-small requests: 5 per minute

**Claimed Free**

- GPT-5 nano Responses requests: 30 per minute
- GPT-5.4 nano Responses requests: 30 per minute
- GPT-5 mini Responses requests: 15 per minute
- GPT-5.4 mini Responses requests: 15 per minute
- text-embedding-3-small requests: 60 per minute
- text-embedding-3-large requests: 20 per minute
- image generation requests: 5 per minute

**Claimed Plus**

- GPT-5 nano Responses requests: 100 per minute
- GPT-5.4 nano Responses requests: 100 per minute
- GPT-5 mini Responses requests: 60 per minute
- GPT-5.4 mini Responses requests: 60 per minute
- GPT-5 / 5.1 / 5.2 Responses requests: 30 per minute
- GPT-5.4 Responses requests: 20 per minute
- GPT-5.5 Responses requests: 10 per minute
- text-embedding-3-small requests: 300 per minute
- text-embedding-3-large requests: 100 per minute
- image generation requests: 20 per minute

**Claimed Pro**

- GPT-5 nano Responses requests: 300 per minute
- GPT-5.4 nano Responses requests: 300 per minute
- GPT-5 mini Responses requests: 200 per minute
- GPT-5.4 mini Responses requests: 200 per minute
- GPT-5 / 5.1 / 5.2 Responses requests: 100 per minute
- GPT-5.4 Responses requests: 60 per minute
- GPT-5.5 Responses requests: 40 per minute
- text-embedding-3-small requests: 1000 per minute
- text-embedding-3-large requests: 300 per minute
- image generation requests: 60 per minute

### Notes

- OpenAI is fluid-only for claimed accounts after tier and per-minute checks. Ephemeral tenants get only GPT-5 nano Responses text and text-embedding-3-small, each capped to 100 lifetime requests and 5/minute.
- Streaming Responses are allowed and settle only from terminal OpenAI usage events.
- Image streaming, chat completions, audio, realtime, files, vector stores, assistants, batch, fine-tuning, web search, code interpreter, computer use, MCP/connectors, and shell/container tools are blocked for launch.
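The document does not specify how a rate-limited call is rejected; the sketch below assumes the conventional HTTP 429 status and wraps an arbitrary `send` callable with jittered exponential backoff. Both the status code and the function names are assumptions, not documented Cohesivity behavior.

```python
# Sketch: client-side backoff for the per-minute limits above.
# HTTP 429 as the rate-limit signal is an assumption; `send` stands in
# for whatever function performs the actual edge request.
import random
import time

def call_with_backoff(send, max_attempts=5, base_delay=1.0):
    """Retry `send()` on an assumed 429 status with jittered exponential
    backoff; return the first non-429 response, or the last response."""
    resp = send()
    for attempt in range(1, max_attempts):
        if resp.get("status") != 429:
            return resp
        # Sleep base * 2^(attempt-1), scaled by a jitter factor in [0.5, 1.0).
        time.sleep(base_delay * (2 ** (attempt - 1)) * (0.5 + random.random() / 2))
        resp = send()
    return resp
```

Backing off client-side matters most for Ephemeral tenants, since exceeding any authoritative hard cap pauses the tenant as a whole.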