# Google Generative AI API (Gemini)

Proxies Google's Generative AI API. Text generation, image generation, vision (image analysis), streaming — no separate Google API key required.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```shell
curl -s -X POST https://cohesivity.ai/api/resources/google-generative-ai-api \
  -H "Authorization: Bearer "
```

### Delete

```shell
curl -s -X DELETE https://cohesivity.ai/api/resources/google-generative-ai-api \
  -H "Authorization: Bearer "
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Common Mistakes

- **Using a deprecated model.** `gemini-2.0-flash` is no longer available. Use `gemini-3-flash-preview` for text.
- **Not setting `responseMimeType` for JSON output.** Without `generationConfig.responseMimeType: "application/json"`, Gemini wraps JSON in markdown code blocks (```json ... ```). Set it explicitly for reliable JSON parsing.
- **Setting `maxOutputTokens` too low.** Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from `maxOutputTokens` before any output is generated. Set `maxOutputTokens` to at least 1500 even for short responses. For large structured responses (multi-item lists, itineraries, detailed JSON), use 4000-10000. A value of 200 may produce only a few words. Truncated responses still show `finishReason: STOP` with no indication of truncation.

## Official Docs

https://ai.google.dev/gemini-api/docs

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/google-generative-ai-api
- **Auth:** `coh_application_key` as the **key** query parameter
- **Format:** same as Google's Gemini API

## Models

Recommended models by use case. All models available through `GET /v1beta/models` are supported.
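As a minimal offline sketch of the model listing above: the helper below builds the `GET /v1beta/models` call through the edge and extracts the model names. The function names and the `parse_models` split are illustrative, not part of the Cohesivity API.

```python
import json
import urllib.request

# Edge base URL from the Edge Usage section.
BASE = "https://cohesivity.ai/edge/google-generative-ai-api"

def parse_models(body: dict) -> list:
    """Extract model names from a /v1beta/models response body."""
    return [m["name"] for m in body.get("models", [])]

def list_models(key: str) -> list:
    """GET /v1beta/models through the edge; key is the coh_application_key."""
    url = f"{BASE}/v1beta/models?key={key}"
    with urllib.request.urlopen(url) as resp:
        return parse_models(json.load(resp))
```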
**Text generation** (https://ai.google.dev/gemini-api/docs/text-generation):

- `gemini-3-flash-preview` — fast and intelligent
- `gemini-3.1-pro-preview` — most intelligent
- `gemini-3.1-flash-lite-preview` — fastest

**Image generation / editing** (https://ai.google.dev/gemini-api/docs/image-generation):

- `gemini-3.1-flash-image-preview` — fast image generation and editing
- `gemini-3-pro-image-preview` — higher-quality image generation and editing

**Video generation** (https://ai.google.dev/gemini-api/docs/video):

- `veo-3.1-generate-preview` — video generation (use the `predictLongRunning` method)
- `veo-3.1-fast-generate-preview` — faster video generation

**Text-to-speech** (https://ai.google.dev/gemini-api/docs/speech-generation):

- `gemini-2.5-pro-preview-tts` — higher-quality TTS
- `gemini-2.5-flash-preview-tts` — faster TTS

**Live audio-to-audio** (https://ai.google.dev/gemini-api/docs/live-api):

- `gemini-2.5-flash-native-audio-latest` — real-time bidirectional audio (stable)
- `gemini-3.1-flash-live-preview` — latest preview (may have availability issues)

## Gemini Live (Real-time Audio)

Gemini Live uses a WebSocket connection for bidirectional audio streaming. Connect via:

```
WS wss://cohesivity.ai/edge/google-generative-ai-api/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=
```

The WebSocket relays directly to Google's Gemini Live API. Send and receive messages in the same format as the Google API — Cohesivity injects the API key and proxies all frames.

Setup message (send first after connecting):

```json
{
  "setup": {
    "model": "models/gemini-2.5-flash-native-audio-latest",
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "voiceConfig": {
          "prebuiltVoiceConfig": { "voiceName": "Puck" }
        }
      }
    }
  }
}
```

After `setupComplete`, send text or audio:

```json
{
  "clientContent": {
    "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }],
    "turnComplete": true
  }
}
```

The response includes text (thinking) plus audio (`inlineData` with `audio/pcm`).
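Sending these frames requires a WebSocket client library; as a minimal offline sketch, the two messages above can be built programmatically (the helper names are illustrative):

```python
import json

def setup_message(model: str = "gemini-2.5-flash-native-audio-latest",
                  voice: str = "Puck") -> str:
    """Build the initial Gemini Live setup frame (sent once after connecting)."""
    return json.dumps({
        "setup": {
            "model": f"models/{model}",
            "generationConfig": {
                "responseModalities": ["AUDIO"],
                "speechConfig": {
                    "voiceConfig": {
                        "prebuiltVoiceConfig": {"voiceName": voice}
                    }
                },
            },
        }
    })

def text_turn(text: str) -> str:
    """Build a clientContent frame carrying one completed user text turn."""
    return json.dumps({
        "clientContent": {
            "turns": [{"role": "user", "parts": [{"text": text}]}],
            "turnComplete": True,
        }
    })
```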
See the official docs for the full protocol.

## Examples

- List models: `GET https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models?key=`
- Generate text: `POST https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models/gemini-2.5-flash:generateContent?key=` with body `{ "contents": [{ "parts": [{ "text": "Hello" }] }] }`
- Generate image: `POST https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models/gemini-2.5-flash-image:generateContent?key=` with body `{ "contents": [{ "parts": [{ "text": "A landscape painting" }] }], "generationConfig": { "responseModalities": ["IMAGE", "TEXT"] } }`

## Image Generation

Use `gemini-2.5-flash-image` with `responseModalities: ["IMAGE", "TEXT"]` in `generationConfig`. The response includes `inlineData` with the image as base64 PNG.

Response parts:

- Text part: `{ "text": "Here is your image" }`
- Image part: `{ "inlineData": { "mimeType": "image/png", "data": "" } }`

To save the image, decode the base64 `data` field.

## Streaming

Streaming is supported. Use the `streamGenerateContent` action instead of `generateContent`.

- **Raw passthrough (default):** the proxy preserves the upstream response as-is. If you want Google SSE frames specifically, pass `?alt=sse`.
- **Normalized Cohesivity mode:** append `?coh_stream=normalized` to receive canonical SSE frames. Cohesivity automatically forces `alt=sse` upstream in this mode. Frame types:
  - `data: {"delta":"text chunk"}` — a text chunk from the model
  - `data: {"error":"error message"}` — an error from the model (rate limit, safety filter, etc.)
  - `data: {"done":true}` — stream complete (always the last frame)

## Response Format

The response is deeply nested. Extract the text from:

`response.candidates[0].content.parts[0].text`

## JSON Output

To get reliable JSON from Gemini, set `generationConfig.responseMimeType: "application/json"` in the request body. Without this, Gemini wraps JSON in markdown code blocks.
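Putting the Response Format and JSON Output notes together, a hedged sketch of the request body and the extraction path (the helper names are illustrative, and the `maxOutputTokens` default follows the Token Budget guidance):

```python
import json

def json_request(prompt: str, max_tokens: int = 3000) -> dict:
    """Request body asking Gemini for raw JSON instead of markdown-fenced JSON."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            # Leave headroom for internal thinking tokens (see Token Budget).
            "maxOutputTokens": max_tokens,
        },
    }

def extract_json(response: dict):
    """Walk the nested response to the first candidate's text and parse it."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)
```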
## Token Budget

WARNING: Set `maxOutputTokens` to at least 1500 even for short responses. Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from this budget before any output is generated. A configured limit of 200 may produce only a few words.

Scaling guide:

- 1500 for simple responses
- 3000-5000 for structured JSON (lists, objects)
- 6000-10000 for large outputs (multi-day itineraries, detailed reports)

Truncated responses still show `finishReason: STOP` with no indication of truncation.

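The scaling guide can be captured as a small lookup; the tier names are illustrative, and the values are the upper ends of the ranges above (generous budgets are safer, since truncation is silent):

```python
def pick_max_output_tokens(kind: str) -> int:
    """Map a rough output-size tier to a maxOutputTokens budget."""
    budgets = {
        "simple": 1500,      # short prose answers
        "structured": 5000,  # JSON lists / objects
        "large": 10000,      # itineraries, detailed reports
    }
    return budgets[kind]
```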