# Google Generative AI API (Gemini)

Proxies Google's Generative AI API. Text generation, image generation, vision (image analysis), streaming — no separate Google API key required.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```shell
curl -s -X POST https://cohesivity.ai/api/resources/google-generative-ai-api \
  -H "Authorization: Bearer "
```

### Delete

```shell
curl -s -X DELETE https://cohesivity.ai/api/resources/google-generative-ai-api \
  -H "Authorization: Bearer "
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Common Mistakes

- **Using a deprecated model.** `gemini-2.0-flash` is no longer available. Use `gemini-3-flash-preview` for text.
- **Not setting `responseMimeType` for JSON output.** Without `generationConfig.responseMimeType: "application/json"`, Gemini wraps JSON in markdown code blocks (```json ... ```). Set it explicitly for reliable JSON parsing.
- **Setting `maxOutputTokens` too low.** Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from `maxOutputTokens` before any output is generated. Set `maxOutputTokens` to at least 1500 even for short responses. For large structured responses (multi-item lists, itineraries, detailed JSON), use 4000-10000. A value of 200 may produce only a few words. Truncated responses still show `finishReason: STOP` with no indication of truncation.

## Official Docs

https://ai.google.dev/gemini-api/docs

## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/google-generative-ai-api
- **Auth:** `coh_application_key` as the **key** query parameter
- **Format:** same as Google's Gemini API

## Models

Recommended models by use case. All models available through `GET /v1beta/models` are supported.
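As a minimal offline sketch of the model listing above: the helper below builds the `GET /v1beta/models` call through the edge and extracts the model names. The function names and the `parse_models` split are illustrative, not part of the Cohesivity API.

```python
import json
import urllib.request

# Edge base URL from the Edge Usage section.
BASE = "https://cohesivity.ai/edge/google-generative-ai-api"

def parse_models(body: dict) -> list:
    """Extract model names from a /v1beta/models response body."""
    return [m["name"] for m in body.get("models", [])]

def list_models(key: str) -> list:
    """GET /v1beta/models through the edge; key is the coh_application_key."""
    url = f"{BASE}/v1beta/models?key={key}"
    with urllib.request.urlopen(url) as resp:
        return parse_models(json.load(resp))
```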
**Text generation** (https://ai.google.dev/gemini-api/docs/text-generation):

- `gemini-3-flash-preview` — fast and intelligent
- `gemini-3.1-pro-preview` — most intelligent
- `gemini-3.1-flash-lite-preview` — fastest

**Image generation / editing** (https://ai.google.dev/gemini-api/docs/image-generation):

- `gemini-3.1-flash-image-preview` — fast image generation and editing
- `gemini-3-pro-image-preview` — higher-quality image generation and editing

**Video generation** (https://ai.google.dev/gemini-api/docs/video):

- `veo-3.1-generate-preview` — video generation (use the `predictLongRunning` method)
- `veo-3.1-fast-generate-preview` — faster video generation

**Text-to-speech** (https://ai.google.dev/gemini-api/docs/speech-generation):

- `gemini-2.5-pro-preview-tts` — higher-quality TTS
- `gemini-2.5-flash-preview-tts` — faster TTS

**Live audio-to-audio** (https://ai.google.dev/gemini-api/docs/live-api):

- `gemini-2.5-flash-native-audio-latest` — real-time bidirectional audio (stable)
- `gemini-3.1-flash-live-preview` — latest preview (may have availability issues)

## Gemini Live (Real-time Audio)

Gemini Live uses a WebSocket connection for bidirectional audio streaming. Connect via:

```
WS wss://cohesivity.ai/edge/google-generative-ai-api/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent?key=
```

The WebSocket relays directly to Google's Gemini Live API. Send and receive messages in the same format as the Google API — Cohesivity injects the API key and proxies all frames.

Setup message (send first after connecting):

```json
{
  "setup": {
    "model": "models/gemini-2.5-flash-native-audio-latest",
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "voiceConfig": {
          "prebuiltVoiceConfig": { "voiceName": "Puck" }
        }
      }
    }
  }
}
```

After `setupComplete`, send text or audio:

```json
{
  "clientContent": {
    "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }],
    "turnComplete": true
  }
}
```

The response includes text (thinking) plus audio (`inlineData` with `audio/pcm`).
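Sending these frames requires a WebSocket client library; as a minimal offline sketch, the two messages above can be built programmatically (the helper names are illustrative):

```python
import json

def setup_message(model: str = "gemini-2.5-flash-native-audio-latest",
                  voice: str = "Puck") -> str:
    """Build the initial Gemini Live setup frame (sent once after connecting)."""
    return json.dumps({
        "setup": {
            "model": f"models/{model}",
            "generationConfig": {
                "responseModalities": ["AUDIO"],
                "speechConfig": {
                    "voiceConfig": {
                        "prebuiltVoiceConfig": {"voiceName": voice}
                    }
                },
            },
        }
    })

def text_turn(text: str) -> str:
    """Build a clientContent frame carrying one completed user text turn."""
    return json.dumps({
        "clientContent": {
            "turns": [{"role": "user", "parts": [{"text": text}]}],
            "turnComplete": True,
        }
    })
```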
See the official docs for the full protocol.

## Examples

- List models: `GET https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models?key=`
- Generate text: `POST https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models/gemini-2.5-flash:generateContent?key=` with body `{ "contents": [{ "parts": [{ "text": "Hello" }] }] }`
- Generate image: `POST https://cohesivity.ai/edge/google-generative-ai-api/v1beta/models/gemini-2.5-flash-image:generateContent?key=` with body `{ "contents": [{ "parts": [{ "text": "A landscape painting" }] }], "generationConfig": { "responseModalities": ["IMAGE", "TEXT"] } }`

## Image Generation

Use `gemini-2.5-flash-image` with `responseModalities: ["IMAGE", "TEXT"]` in `generationConfig`. The response includes `inlineData` with the image as base64 PNG.

Response parts:

- Text part: `{ "text": "Here is your image" }`
- Image part: `{ "inlineData": { "mimeType": "image/png", "data": "" } }`

To save the image, decode the base64 `data` field.

## Streaming

Streaming is supported. Use the `streamGenerateContent` action instead of `generateContent`.

- **Raw passthrough (default):** the proxy preserves the upstream response as-is. If you want Google SSE frames specifically, pass `?alt=sse`.
- **Normalized Cohesivity mode:** append `?coh_stream=normalized` to receive canonical SSE frames. Cohesivity automatically forces `alt=sse` upstream in this mode. Frame types:
  - `data: {"delta":"text chunk"}` — a text chunk from the model
  - `data: {"error":"error message"}` — an error from the model (rate limit, safety filter, etc.)
  - `data: {"done":true}` — stream complete (always the last frame)

## Response Format

The response is deeply nested. Extract the text from:

`response.candidates[0].content.parts[0].text`

## JSON Output

To get reliable JSON from Gemini, set `generationConfig.responseMimeType: "application/json"` in the request body. Without this, Gemini wraps JSON in markdown code blocks.
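Putting the Response Format and JSON Output notes together, a hedged sketch of the request body and the extraction path (the helper names are illustrative, and the `maxOutputTokens` default follows the Token Budget guidance):

```python
import json

def json_request(prompt: str, max_tokens: int = 3000) -> dict:
    """Request body asking Gemini for raw JSON instead of markdown-fenced JSON."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            # Leave headroom for internal thinking tokens (see Token Budget).
            "maxOutputTokens": max_tokens,
        },
    }

def extract_json(response: dict):
    """Walk the nested response to the first candidate's text and parse it."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)
```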
## Token Budget

WARNING: Set `maxOutputTokens` to at least 1500 even for short responses. Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from this budget before any output is generated. A configured limit of 200 may produce only a few words.

Scaling guide:

- 1500 for simple responses
- 3000-5000 for structured JSON (lists, objects)
- 6000-10000 for large outputs (multi-day itineraries, detailed reports)

Truncated responses still show `finishReason: STOP` with no indication of truncation.

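The scaling guide can be captured as a small lookup; the tier names are illustrative, and the values are the upper ends of the ranges above (generous budgets are safer, since truncation is silent):

```python
def pick_max_output_tokens(kind: str) -> int:
    """Map a rough output-size tier to a maxOutputTokens budget."""
    budgets = {
        "simple": 1500,      # short prose answers
        "structured": 5000,  # JSON lists / objects
        "large": 10000,      # itineraries, detailed reports
    }
    return budgets[kind]
```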