# Google Generative Language API (Gemini)

Proxies Google's Generative Language API. Generate content and list models; no separate Google API key is required.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```bash
curl -s -X POST https://cohesivity.ai/api/resources/google-generative-language-api \
  -H "Authorization: Bearer <token>"
```

### Delete

```bash
curl -s -X DELETE https://cohesivity.ai/api/resources/google-generative-language-api \
  -H "Authorization: Bearer <token>"
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Common Mistakes

- **Using a deprecated model.** `gemini-2.0-flash` is deprecated and will return errors. Use `gemini-2.5-flash` instead.
- **Not setting `responseMimeType` for JSON output.** Without `generationConfig.responseMimeType: "application/json"`, Gemini wraps JSON in markdown code blocks (```json ... ```). Set it explicitly for reliable JSON parsing.
- **Setting `maxOutputTokens` too low.** Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from `maxOutputTokens` before any output is generated. Set `maxOutputTokens` to at least 1500 even for short responses; for large structured responses (multi-item lists, itineraries, detailed JSON), use 4000-10000. A value of 200 may produce only a few words, and truncated responses still show `finishReason: STOP` with no indication of truncation.

## Official Docs

Read https://ai.google.dev/gemini-api/docs before coding.
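The Common Mistakes above can be folded into a small request builder. A minimal Python sketch, assuming the edge URL and body shape documented in this file; the `build_generate_request` helper name is ours, and no request is actually sent:

```python
import json

EDGE_BASE = "https://cohesivity.ai/edge/google-generative-language-api"

def build_generate_request(prompt: str, want_json: bool = False,
                           max_output_tokens: int = 1500) -> dict:
    """Build a generateContent body that avoids the common mistakes above."""
    config = {
        # Floor of 1500: Gemini 2.5 "thinking" tokens are deducted first.
        "maxOutputTokens": max(max_output_tokens, 1500),
    }
    if want_json:
        # Without this, Gemini wraps JSON output in markdown code blocks.
        config["responseMimeType"] = "application/json"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": config,
    }

# Use gemini-2.5-flash, not the deprecated gemini-2.0-flash.
url = f"{EDGE_BASE}/v1beta/models/gemini-2.5-flash:generateContent"
body = build_generate_request("List three colors as JSON.", want_json=True)
print(json.dumps(body))
```

POST the printed body to `url` (with your `key` query parameter) to issue the request.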
## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/google-generative-language-api
- **Auth:** `coh_application_key` as the **key** query parameter
- **Format:** same as Google's Gemini API

## Examples

- List models: `GET https://cohesivity.ai/edge/google-generative-language-api/v1beta/models?key=<coh_application_key>`
- Generate: `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:generateContent?key=<coh_application_key>` with body `{ "contents": [{ "parts": [{ "text": "Hello" }] }] }`

## Streaming

Streaming is supported. Use the `streamGenerateContent` action instead of `generateContent`.

- **Raw passthrough (default):** the proxy preserves the upstream response as-is. If you want Google SSE frames specifically, pass `?alt=sse`.
- **Normalized Cohesivity mode:** append `?coh_stream=normalized` to receive canonical SSE frames. Cohesivity automatically forces `alt=sse` upstream in this mode. Frame types:
  - `data: {"delta":"text chunk"}`: a text chunk from the model
  - `data: {"error":"error message"}`: an error from the model (rate limit, safety filter, etc.)
  - `data: {"done":true}`: stream complete (always the last frame)

Examples:

- `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:streamGenerateContent?coh_stream=normalized&key=<coh_application_key>`
- `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse&key=<coh_application_key>`

## Response Format

The response is deeply nested. Extract the text from `response.candidates[0].content.parts[0].text`.

## JSON Output

To get reliable JSON from Gemini, set `generationConfig.responseMimeType: "application/json"` in the request body. Without this, Gemini wraps JSON in markdown code blocks.

## Token Budget

WARNING: Set `maxOutputTokens` to at least 1500 even for short responses. Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from this budget before any output is generated.
A configured limit of 200 may produce only a few words. Scaling guide:

- 1500 for simple responses
- 3000-5000 for structured JSON (lists, objects)
- 6000-10000 for large outputs (multi-day itineraries, detailed reports)

Truncated responses still show `finishReason: STOP` with no indication of truncation.
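The scaling guide above can be encoded as a small helper, and since `finishReason` stays `STOP` even when output is cut off, truncation is better detected by comparing token usage to the configured budget. A Python sketch; the helper names are ours, and the `usageMetadata` field names (`candidatesTokenCount`, `thoughtsTokenCount`) are assumptions based on the Gemini API that you should verify against actual proxy responses:

```python
def pick_max_output_tokens(kind: str) -> int:
    """Map a response size class to a maxOutputTokens budget (scaling guide above)."""
    return {"simple": 1500, "structured": 5000, "large": 10000}[kind]

def may_be_truncated(usage: dict, budget: int) -> bool:
    """Heuristic truncation check.

    finishReason is STOP even for truncated responses, so instead compare
    output plus thinking tokens against the configured budget.
    Field names are assumed from the Gemini API's usageMetadata.
    """
    used = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)
    return used >= budget

budget = pick_max_output_tokens("structured")
suspect = may_be_truncated(
    {"candidatesTokenCount": 4200, "thoughtsTokenCount": 800}, budget
)
```

If `may_be_truncated` returns true, retry with a larger budget rather than trusting the `STOP` finish reason.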
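The normalized streaming frames documented in the Streaming section can be consumed with a small parser. A Python sketch over already-received SSE lines (transport omitted); `read_normalized_stream` is our name, and it assumes exactly the three frame types listed above:

```python
import json
from typing import Iterable

def read_normalized_stream(lines: Iterable[str]) -> str:
    """Collect text from coh_stream=normalized SSE frames.

    Handles the three documented frame types:
    {"delta": ...}, {"error": ...}, and {"done": true}.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines between SSE events
        frame = json.loads(line[len("data: "):])
        if "delta" in frame:
            chunks.append(frame["delta"])
        elif "error" in frame:
            raise RuntimeError(f"stream error: {frame['error']}")
        elif frame.get("done"):
            break  # done is always the last frame
    return "".join(chunks)

sample = [
    'data: {"delta":"Hel"}',
    "",
    'data: {"delta":"lo"}',
    "",
    'data: {"done":true}',
]
result = read_normalized_stream(sample)
```

In a real client, feed this the decoded lines of the HTTP response body from the `coh_stream=normalized` endpoint.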