# Responses API

> The Gloo AI Responses API (v1) — the new standard for building on Gloo AI. OpenAI-compatible, multimodal, and built for text, vision, and image generation.

The **Responses API** is the recommended way to build on Gloo AI. It exposes the entire Gloo AI model catalog through a single, OpenAI-compatible `/responses` endpoint — text, vision, reasoning, tool use, and native image generation — with one request and response shape that works the same across every provider.

<Note>
  **New to Gloo AI? Start here.** The Responses API is the standard surface going forward. If you already use [Completions V2](/api-guides/completions-v2), it remains fully supported and backwards-compatible — see [Moving from Completions to Responses](#moving-from-completions-to-responses).
</Note>

## Why the Responses API

The Responses API offers a more capable, forward-looking request shape that the broader ecosystem is standardizing on:

* **One shape for every modality.** Text, image input (vision), and image generation all use the same `input` array and the same typed `output` array — no separate endpoints or bespoke payloads per capability.
* **OpenAI-compatible.** The request and response formats mirror the OpenAI Responses API, so existing tooling, SDKs, and mental models carry over directly. Point your base URL at Gloo and go.
* **Typed, structured output.** Responses come back as an `output[]` array of typed items (`message`, `image_generation_call`, reasoning, tool calls) instead of a single opaque `choices[].message.content` string — easier to parse, and extensible as new item types arrive.
* **Built for multimodal.** Native image generation and image input are first-class, not bolted on. The same endpoint that answers a text prompt can return a generated image.

If you are starting a new integration, build on the Responses API.

## Which API should I use?

| Use case                                             | Recommended endpoint                         |
| :--------------------------------------------------- | :------------------------------------------- |
| New integration with OpenAI-compatible tooling       | **Responses API (v1)**                       |
| Vision (image input) or image generation             | **Responses API (v1)**                       |
| Direct model selection with typed, structured output | **Responses API (v1)**                       |
| Intelligent auto-routing across models               | [Completions V2](/api-guides/completions-v2) |
| Values-aligned (`tradition`) responses               | [Completions V2](/api-guides/completions-v2) |
| Grounded / RAG completions with source attribution   | [Completions V2](/api-guides/completions-v2) |

When in doubt, start with the Responses API — it's the default for new work, and Completions V2 stays available whenever you need routing, values-alignment, or grounding.

## Endpoint

**URL:** `https://platform.ai.gloo.com/ai/v1/responses`

**Operation:** `POST`

```bash theme={null}
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-anthropic-claude-sonnet-4.6",
    "input": [
      { "role": "user", "content": "Explain prompt caching in two sentences." }
    ]
  }'
```

Authentication is identical to the rest of the platform: send a Bearer access token obtained with your Client ID and Client Secret. See [Manage API Credentials](/studio/manage-api-credentials).
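
For callers who are not using curl, the same request can be assembled with Python's standard library. This is a minimal sketch rather than an official client: the helper name `build_request` is hypothetical, and the access token is assumed to have been obtained already.

```python
import json
import urllib.request

RESPONSES_URL = "https://platform.ai.gloo.com/ai/v1/responses"


def build_request(access_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /responses endpoint.

    `build_request` is an illustrative helper, not part of any Gloo SDK.
    """
    body = {
        "model": model,
        "input": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the response body is the JSON document described under Response format.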

## Request format

The Responses API uses `input` instead of `messages` and renames a few other fields. If you know the chat-completions format, the mapping is small:

| Responses API       | Chat Completions equivalent | Notes                                                                   |
| :------------------ | :-------------------------- | :---------------------------------------------------------------------- |
| `input`             | `messages`                  | A string, or an array of input items (`role` + `content`).              |
| `instructions`      | `system` message            | Top-level system / developer instructions.                              |
| `max_output_tokens` | `max_tokens`                | Cap on generated tokens.                                                |
| `model`             | `model`                     | A Gloo model ID — see [Supported Models](/api-guides/supported-models). |

| Parameter           | Type             | Required? | Description                                                                    |
| :------------------ | :--------------- | :-------- | :----------------------------------------------------------------------------- |
| `model`             | string           | Yes       | Gloo model ID (e.g. `gloo-anthropic-claude-sonnet-4.6`).                       |
| `input`             | string \| array  | Yes       | A plain string prompt, or an array of typed input items.                       |
| `instructions`      | string           | No        | System-level instructions applied to the request.                              |
| `max_output_tokens` | integer          | No        | Maximum number of tokens to generate.                                          |
| `temperature`       | float            | No        | Sampling temperature.                                                          |
| `top_p`             | float            | No        | Nucleus sampling.                                                              |
| `tools`             | array            | No        | Tool / function definitions. See [Tool Use](/api-guides/tool-use).             |
| `tool_choice`       | string \| object | No        | Controls tool selection.                                                       |
| `stream`            | boolean          | No        | Stream the response as SSE events (default `false`).                           |
| `reasoning`         | object           | No        | Reasoning controls (e.g. effort) for capable models.                           |
| `response_format`   | object           | No        | Structured-output / JSON schema controls.                                      |
| `prompt_cache_key`  | string           | No        | Improves cache-hit routing — see [Prompt Caching](/api-guides/prompt-caching). |
| `image_generation`  | object           | No        | Image-generation options (`quality`, `size`) for image-capable models.         |

<Info>
  Conversation state is managed client-side by passing the full `input` history on each request. The `previous_response_id` parameter (used by OpenAI's Responses API for server-side conversation chaining) is not supported.
</Info>
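
Because conversation state lives on the client, each follow-up request has to resend the earlier turns. The sketch below shows one way to grow the `input` array between requests. It assumes that a prior assistant reply can be replayed as a plain `{"role": "assistant", "content": ...}` item; the helper name `next_input` is hypothetical.

```python
from typing import Any


def next_input(
    history: list[dict[str, Any]], assistant_text: str, user_text: str
) -> list[dict[str, Any]]:
    """Return a new input array: prior turns, the last assistant reply, the new user turn.

    Assumption: assistant turns are accepted as role/content items, like user turns.
    The original history list is left unmodified.
    """
    return history + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": user_text},
    ]
```

The returned list is what goes in the `input` field of the next request.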

<Info>
  Today the Responses API gives you direct, OpenAI-compatible access to the exact `model` you specify. Gloo's intelligent auto-routing, `model_family` selection, `tradition` (values-aligned) responses, and guardrailed safety layers are **planned for the Responses API**. In the meantime, those capabilities are available now on [Completions V2](/api-guides/completions-v2) / [Grounded Completions](/api-guides/grounded-completions), which remain fully supported.
</Info>

## Response format

Responses return a typed `output[]` array. Each item has a `type`; a normal text answer arrives as a `message` item, a generated image as an `image_generation_call` item.

```json theme={null}
{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Prompt caching reuses processed prompt tokens..." }
      ]
    }
  ],
  "usage": { "input_tokens": 24, "output_tokens": 38, "total_tokens": 62 }
}
```
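
Reading the answer means walking `output[]` and picking the items you care about by `type`. The following sketch collects the text of every `output_text` block inside `message` items, based only on the response shape shown above; the helper name `extract_text` is hypothetical.

```python
def extract_text(response: dict) -> str:
    """Concatenate all output_text blocks from message items in output[]."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip image_generation_call, reasoning, tool calls, etc.
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)
```

Skipping unrecognized item types, rather than failing on them, keeps the parser working if new item types are added to `output[]`.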

### Streaming

Set `"stream": true` to receive the response as Server-Sent Events. Each event has an `event:` line and a `data:` line carrying a typed JSON payload:

```text theme={null}
event: response.created
data: {"type":"response.created","response":{"id":"resp_a1b2c3","object":"response",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"Prompt "}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"caching "}

event: response.output_text.done
data: {"type":"response.output_text.done","item_id":"msg_...","text":"Prompt caching reuses..."}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_a1b2c3",...,"output":[...],"usage":{...}}}
```

Key event types to handle:

* `response.created` — emitted once at the start of the response.
* `response.output_text.delta` / `response.output_text.done` — `delta` events carry incremental text chunks for a `message` item; the `done` event carries the item's complete text.
* `response.output_item.added` / `response.output_item.done` — emitted for typed items in `output[]` (including `image_generation_call` and `function_call`).
* `response.completed` — emitted once when the response is finished; the final payload includes the full `output[]` array and `usage`.
* Error events — surfaced as event types beginning with `error` (e.g. `error`); treat any unknown event as terminal and close the stream.
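
The event handling above can be sketched as a small parser over raw SSE lines. This is an illustrative outline, not a reference client: the function names are hypothetical, the lines are assumed to be already decoded to text, and unknown events end the loop because that is the guidance in the list above.

```python
import json
from typing import Iterable, Iterator

# Event types named in the list above. Anything else is treated as terminal,
# following the guidance in that list.
KNOWN_EVENTS = {
    "response.created",
    "response.output_text.delta",
    "response.output_text.done",
    "response.output_item.added",
    "response.output_item.done",
    "response.completed",
}


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line ends the current event.
            if data_lines:
                yield event_name or "message", json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
    if data_lines:  # stream ended without a trailing blank line
        yield event_name or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> str:
    """Accumulate output_text deltas until the response completes."""
    chunks = []
    for event_name, payload in iter_sse_events(lines):
        if event_name.startswith("error"):
            raise RuntimeError(f"stream error: {payload}")
        if event_name not in KNOWN_EVENTS:
            break  # unknown event: stop reading
        if event_name == "response.output_text.delta":
            chunks.append(payload["delta"])
        elif event_name == "response.completed":
            break
    return "".join(chunks)
```

In a real client, `lines` would come from the HTTP response body read line by line.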

## Multimodal

### Image input (vision)

Pass images as input items alongside text. Any vision-capable model accepts them:

```bash theme={null}
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3.1-pro",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "Describe this image." },
        { "type": "input_image", "image_url": "https://example.com/photo.jpg" }
      ]}
    ]
  }'
```

Images may be supplied as a remote URL or a base64 `data:` URI.
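
For local files, the base64 form can be built in a few lines. The sketch below assumes the `data:` URI is passed in the same `image_url` field as a remote URL; the helper name `image_item_from_bytes` is hypothetical.

```python
import base64


def image_item_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an input_image item using a base64 data: URI.

    Assumption: the data URI is supplied via the `image_url` field.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "input_image", "image_url": f"data:{mime_type};base64,{encoded}"}
```

The returned item goes in the `content` array next to an `input_text` item, as in the curl example above.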

### Image generation

Image-capable models return a generated image as an `image_generation_call` output item (base64 result). Optional `image_generation` controls (`quality`, `size`) are forwarded to providers that support them.

```bash theme={null}
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3-pro-image",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "A watercolor painting of a lighthouse at sunset." }
      ]}
    ]
  }'
```

The generated image comes back as a typed `image_generation_call` item in `output[]`. The `result` field carries the image as a base64-encoded string:

```json theme={null}
{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-google-gemini-3-pro-image",
  "output": [
    {
      "type": "image_generation_call",
      "id": "ig_...",
      "status": "completed",
      "result": "iVBORw0KGgoAAAANSUhEUgAA...<truncated base64 PNG>..."
    }
  ],
  "usage": { "input_tokens": 14, "output_tokens": 1290, "total_tokens": 1304 }
}
```

Decode the base64 `result` to get the image bytes.
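
A minimal sketch of that decoding step, based on the response shape shown above (the helper name `decode_generated_images` is hypothetical):

```python
import base64


def decode_generated_images(response: dict) -> list[bytes]:
    """Return the decoded bytes of every image_generation_call item in output[]."""
    images = []
    for item in response.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images
```

Each entry can then be written to disk, for example with `pathlib.Path("lighthouse.png").write_bytes(images[0])`. The `.png` extension is an assumption taken from the sample response; the actual format may depend on the model.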

See [Supported Models](/api-guides/supported-models) for which models support image input and image generation.

## Pricing & token spend

The Responses API is billed per token at each model's standard rates — there is no separate or premium price for using `/responses`. Cost is driven entirely by the `model` you select and the tokens you consume:

```
total_cost = input_tokens  × input_rate
           + cached_tokens  × cache_read_rate
           + output_tokens  × output_rate
```

* **Per-model rates.** Input and output rates vary by model. The live rates are listed on the [Supported Models](/api-guides/supported-models) page and are available programmatically via `GET /platform/v2/models`.
* **Prompt caching** reduces input cost when prompt prefixes repeat — cached tokens are billed at a discounted `cache_read` rate. See [Prompt Caching](/api-guides/prompt-caching).
* **Image generation** is billed using the image model's token accounting; check the model's rates on the models endpoint.
* A **5.5% Studio markup** applies to all segments, consistent with the rest of the platform.
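
As a worked illustration, the formula and markup can be combined in one helper. Several things here are assumptions rather than documented behavior: rates are expressed per token, `input_tokens` counts only uncached input, and the 5.5% markup is applied as a multiplier on the sum of all three segments. The helper name and any rates passed to it are hypothetical.

```python
def estimate_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    *,
    input_rate: float,
    cache_read_rate: float,
    output_rate: float,
    markup: float = 0.055,
) -> float:
    """Estimate request cost from token counts and per-token rates.

    Assumptions: rates are per token, input_tokens excludes cached tokens,
    and the markup multiplies the summed segments.
    """
    base = (
        input_tokens * input_rate
        + cached_tokens * cache_read_rate
        + output_tokens * output_rate
    )
    return base * (1 + markup)
```

Treat the result as an estimate only; the billing dashboard remains the source of truth for actual spend.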

Track real spend in the [Gloo Studio billing dashboard](https://studio.ai.gloo.com/billing) and [API usage](/studio/api-usage).

## Supported models

The Responses API works across the full Gloo AI catalog — Anthropic, OpenAI, Google, and open-source families — including the multimodal and image-generation models. Use the **Model ID** as the `model` field. The complete, live list (with capabilities and pricing) is on the [Supported Models](/api-guides/supported-models) page.

## Moving from Completions to Responses

[Completions V2](/api-guides/completions-v2) (`/ai/v2/chat/completions`) is **fully supported and backwards-compatible**: existing integrations continue to work unchanged. It is also where Gloo's auto-routing, `model_family` selection, `tradition` (values-aligned) responses, and grounded/guardrailed completions live today, until those capabilities reach the Responses API.

The Responses API is the recommended surface for new work because it standardizes on the OpenAI-compatible Responses shape and makes multimodal (vision + image generation) first-class. The migration is mostly a rename:

| Completions V2                       | Responses API            |
| :----------------------------------- | :----------------------- |
| `POST /ai/v2/chat/completions`       | `POST /ai/v1/responses`  |
| `messages`                           | `input`                  |
| `system` role message                | `instructions`           |
| `max_tokens`                         | `max_output_tokens`      |
| `choices[].message.content` (string) | `output[]` (typed items) |

Completions V2:

```json theme={null}
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 256
}
```

Responses API (equivalent):

```json theme={null}
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "instructions": "You are a helpful assistant.",
  "input": [
    { "role": "user", "content": "Hello" }
  ],
  "max_output_tokens": 256
}
```
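
The renames in the table can be applied mechanically. The sketch below converts a Completions V2 request body into the Responses API shape. It covers only the fields in the mapping table, assumes system message content is a plain string, and does not carry over Completions-only features such as auto-routing, `model_family`, or `tradition`. The helper name `completions_to_responses` is hypothetical.

```python
def completions_to_responses(payload: dict) -> dict:
    """Translate a Completions V2 request body into a Responses API body.

    Handles only the documented renames: messages -> input,
    system message -> instructions, max_tokens -> max_output_tokens.
    """
    system_parts = []
    input_items = []
    for message in payload.get("messages", []):
        if message.get("role") == "system":
            system_parts.append(message["content"])
        else:
            input_items.append(message)

    converted = {"model": payload["model"], "input": input_items}
    if system_parts:
        # Multiple system messages are joined; this is a choice of this sketch.
        converted["instructions"] = "\n\n".join(system_parts)
    if "max_tokens" in payload:
        converted["max_output_tokens"] = payload["max_tokens"]
    return converted
```

Applied to the Completions V2 body above, this produces the equivalent Responses API body shown next to it.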

<Note>
  Auto-routing, `model_family`, `tradition`, and grounded/guardrailed responses are **planned for the Responses API**. Until then, use [Completions V2](/api-guides/completions-v2) when you need them, and choose the Responses API for OpenAI-compatible, multimodal, direct-model integrations.
</Note>

## Related Documentation

* [Supported Models](/api-guides/supported-models) — model IDs, capabilities, and live pricing
* [Prompt Caching](/api-guides/prompt-caching) — reduce cost and latency with cached prompt prefixes
* [Tool Use](/api-guides/tool-use) — function calling
* [Completions V2](/api-guides/completions-v2) — routing, values-alignment, and backwards-compatibility