# Google Generative Language API (Gemini)

Proxies Google's Generative Language API. Generate content and list models; no separate Google API key is required.

## Prerequisites

Provision this resource before use. Edge requests without provisioning will error.

### Provision

```bash
curl -s -X POST https://cohesivity.ai/api/resources/google-generative-language-api \
  -H "Authorization: Bearer <token>"
```

### Delete

```bash
curl -s -X DELETE https://cohesivity.ai/api/resources/google-generative-language-api \
  -H "Authorization: Bearer <token>"
```

**Important:** Provision this resource now, before building or running the application. Provisioning is the agent's job, not the application's.

## Common Mistakes

- **Using a deprecated model.** `gemini-2.0-flash` is deprecated and will return errors. Use `gemini-2.5-flash` instead.
- **Not setting `responseMimeType` for JSON output.** Without `generationConfig.responseMimeType: "application/json"`, Gemini wraps JSON in markdown code blocks (```json ... ```). Set it explicitly for reliable JSON parsing.
- **Setting `maxOutputTokens` too low.** Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from `maxOutputTokens` before any output is generated. Set `maxOutputTokens` to at least 1500 even for short responses; for large structured responses (multi-item lists, itineraries, detailed JSON), use 4000-10000. A value of 200 may produce only a few words, and truncated responses still show `finishReason: STOP` with no indication of truncation.

## Official Docs

Read https://ai.google.dev/gemini-api/docs before coding.
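The Common Mistakes above can be folded into a small request builder. A minimal Python sketch, assuming the edge URL and body shape documented in this file; the `build_generate_request` helper name is ours, and no request is actually sent:

```python
import json

EDGE_BASE = "https://cohesivity.ai/edge/google-generative-language-api"

def build_generate_request(prompt: str, want_json: bool = False,
                           max_output_tokens: int = 1500) -> dict:
    """Build a generateContent body that avoids the common mistakes above."""
    config = {
        # Floor of 1500: Gemini 2.5 "thinking" tokens are deducted first.
        "maxOutputTokens": max(max_output_tokens, 1500),
    }
    if want_json:
        # Without this, Gemini wraps JSON output in markdown code blocks.
        config["responseMimeType"] = "application/json"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": config,
    }

# Use gemini-2.5-flash, not the deprecated gemini-2.0-flash.
url = f"{EDGE_BASE}/v1beta/models/gemini-2.5-flash:generateContent"
body = build_generate_request("List three colors as JSON.", want_json=True)
print(json.dumps(body))
```

POST the printed body to `url` (with your `key` query parameter) to issue the request.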
## Edge Usage

- **Base URL:** https://cohesivity.ai/edge/google-generative-language-api
- **Auth:** `coh_application_key` as the **key** query parameter
- **Format:** same as Google's Gemini API

## Examples

- List models: `GET https://cohesivity.ai/edge/google-generative-language-api/v1beta/models?key=<coh_application_key>`
- Generate: `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:generateContent?key=<coh_application_key>` with body `{ "contents": [{ "parts": [{ "text": "Hello" }] }] }`

## Streaming

Streaming is supported. Use the `streamGenerateContent` action instead of `generateContent`.

- **Raw passthrough (default):** the proxy preserves the upstream response as-is. If you want Google SSE frames specifically, pass `?alt=sse`.
- **Normalized Cohesivity mode:** append `?coh_stream=normalized` to receive canonical SSE frames. Cohesivity automatically forces `alt=sse` upstream in this mode. Frame types:
  - `data: {"delta":"text chunk"}`: a text chunk from the model
  - `data: {"error":"error message"}`: an error from the model (rate limit, safety filter, etc.)
  - `data: {"done":true}`: stream complete (always the last frame)

Examples:

- `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:streamGenerateContent?coh_stream=normalized&key=<coh_application_key>`
- `POST https://cohesivity.ai/edge/google-generative-language-api/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse&key=<coh_application_key>`

## Response Format

The response is deeply nested. Extract the text from `response.candidates[0].content.parts[0].text`.

## JSON Output

To get reliable JSON from Gemini, set `generationConfig.responseMimeType: "application/json"` in the request body. Without this, Gemini wraps JSON in markdown code blocks.

## Token Budget

WARNING: Set `maxOutputTokens` to at least 1500 even for short responses. Gemini 2.5 models use internal "thinking" tokens (300-1000) that are deducted from this budget before any output is generated.
A configured limit of 200 may produce only a few words. Scaling guide:

- 1500 for simple responses
- 3000-5000 for structured JSON (lists, objects)
- 6000-10000 for large outputs (multi-day itineraries, detailed reports)

Truncated responses still show `finishReason: STOP` with no indication of truncation.
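The scaling guide above can be encoded as a small helper, and since `finishReason` stays `STOP` even when output is cut off, truncation is better detected by comparing token usage to the configured budget. A Python sketch; the helper names are ours, and the `usageMetadata` field names (`candidatesTokenCount`, `thoughtsTokenCount`) are assumptions based on the Gemini API that you should verify against actual proxy responses:

```python
def pick_max_output_tokens(kind: str) -> int:
    """Map a response size class to a maxOutputTokens budget (scaling guide above)."""
    return {"simple": 1500, "structured": 5000, "large": 10000}[kind]

def may_be_truncated(usage: dict, budget: int) -> bool:
    """Heuristic truncation check.

    finishReason is STOP even for truncated responses, so instead compare
    output plus thinking tokens against the configured budget.
    Field names are assumed from the Gemini API's usageMetadata.
    """
    used = usage.get("candidatesTokenCount", 0) + usage.get("thoughtsTokenCount", 0)
    return used >= budget

budget = pick_max_output_tokens("structured")
suspect = may_be_truncated(
    {"candidatesTokenCount": 4200, "thoughtsTokenCount": 800}, budget
)
```

If `may_be_truncated` returns true, retry with a larger budget rather than trusting the `STOP` finish reason.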
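The normalized streaming frames documented in the Streaming section can be consumed with a small parser. A Python sketch over already-received SSE lines (transport omitted); `read_normalized_stream` is our name, and it assumes exactly the three frame types listed above:

```python
import json
from typing import Iterable

def read_normalized_stream(lines: Iterable[str]) -> str:
    """Collect text from coh_stream=normalized SSE frames.

    Handles the three documented frame types:
    {"delta": ...}, {"error": ...}, and {"done": true}.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines between SSE events
        frame = json.loads(line[len("data: "):])
        if "delta" in frame:
            chunks.append(frame["delta"])
        elif "error" in frame:
            raise RuntimeError(f"stream error: {frame['error']}")
        elif frame.get("done"):
            break  # done is always the last frame
    return "".join(chunks)

sample = [
    'data: {"delta":"Hel"}',
    "",
    'data: {"delta":"lo"}',
    "",
    'data: {"done":true}',
]
result = read_normalized_stream(sample)
```

In a real client, feed this the decoded lines of the HTTP response body from the `coh_stream=normalized` endpoint.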