Responses API

The Responses API is the recommended way to build on Gloo AI. It exposes the entire Gloo AI model catalog through a single, OpenAI-compatible /responses endpoint — text, vision, reasoning, tool use, and native image generation — with one request and response shape that works the same across every provider.

New to Gloo AI? Start here. The Responses API is the standard surface going forward. If you already use Completions V2, it remains fully supported and backwards-compatible — see Moving from Completions to Responses.

Why the Responses API

The Responses API offers a more capable, forward-looking request shape that the broader ecosystem is standardizing on:

One shape for every modality. Text, image input (vision), and image generation all use the same input array and the same typed output array — no separate endpoints or bespoke payloads per capability.
OpenAI-compatible. The request and response formats mirror the OpenAI Responses API, so existing tooling, SDKs, and mental models carry over directly. Point your base URL at Gloo and go.
Typed, structured output. Responses come back as an output[] array of typed items (message, image_generation_call, reasoning, tool calls) instead of a single opaque choices[].message.content string — easier to parse, and extensible as new item types arrive.
Built for multimodal. Native image generation and image input are first-class, not bolted on. The same endpoint that answers a text prompt can return a generated image.

If you are starting a new integration, build on the Responses API.

Which API should I use?

| Use case | Recommended endpoint |
| --- | --- |
| New integration with OpenAI-compatible tooling | Responses API (v1) |
| Vision (image input) or image generation | Responses API (v1) |
| Direct model selection with typed, structured output | Responses API (v1) |
| Intelligent auto-routing across models | Completions V2 |
| Values-aligned (`tradition`) responses | Completions V2 |
| Grounded / RAG completions with source attribution | Completions V2 |

When in doubt, start with the Responses API — it’s the default for new work, and Completions V2 stays available whenever you need routing, values-alignment, or grounding.

Endpoint

URL: https://platform.ai.gloo.com/ai/v1/responses
Operation: POST

curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-anthropic-claude-sonnet-4.6",
    "input": [
      { "role": "user", "content": "Explain prompt caching in two sentences." }
    ]
  }'

Authentication is identical to the rest of the platform — a Bearer access token from your Client ID / Client Secret. See Manage API Credentials.
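For readers who prefer Python over curl, the same request can be sketched with only the standard library. This is a minimal illustration, not an official client: the helper name `build_responses_request` and the placeholder token are assumptions, and the function only builds the request so that sending it stays an explicit step.

```python
import json
import urllib.request

RESPONSES_URL = "https://platform.ai.gloo.com/ai/v1/responses"


def build_responses_request(access_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /responses endpoint."""
    payload = {
        "model": model,
        "input": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as resp:
#       body = json.load(resp)
request = build_responses_request(
    "example-token",  # placeholder, not a real credential
    "gloo-anthropic-claude-sonnet-4.6",
    "Explain prompt caching in two sentences.",
)
```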

Request format

The Responses API uses input instead of messages and renames a few other fields. If you know the chat-completions format, the mapping is small:

| Responses API | Chat Completions equivalent | Notes |
| --- | --- | --- |
| `input` | `messages` | A string, or an array of input items (`role` + `content`). |
| `instructions` | `system` message | Top-level system / developer instructions. |
| `max_output_tokens` | `max_tokens` | Cap on generated tokens. |
| `model` | `model` | A Gloo model ID; see Supported Models. |

| Parameter | Type | Required? | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Gloo model ID (e.g. `gloo-anthropic-claude-sonnet-4.6`). |
| `input` | string \| array | Yes | A plain string prompt, or an array of typed input items. |
| `instructions` | string | No | System-level instructions applied to the request. |
| `max_output_tokens` | integer | No | Maximum number of tokens to generate. |
| `temperature` | float | No | Sampling temperature. |
| `top_p` | float | No | Nucleus sampling. |
| `tools` | array | No | Tool / function definitions. See Tool Use. |
| `tool_choice` | string \| object | No | Controls tool selection. |
| `stream` | boolean | No | Stream the response as SSE events (default `false`). |
| `reasoning` | object | No | Reasoning controls (e.g. effort) for capable models. |
| `response_format` | object | No | Structured-output / JSON schema controls. |
| `prompt_cache_key` | string | No | Improves cache-hit routing; see Prompt Caching. |
| `image_generation` | object | No | Image-generation options (`quality`, `size`) for image-capable models. |

Conversation state is managed client-side by passing the full input history on each request. The previous_response_id parameter (used by OpenAI’s Responses API for server-side conversation chaining) is not supported.
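Because history lives on the client, each request has to carry every earlier turn. The sketch below shows one way to keep that history in Python. The helper names are hypothetical, and it assumes an assistant turn can be sent back as a plain `{"role": "assistant", "content": ...}` input item, which this page does not spell out.

```python
from typing import Dict, List

InputItem = Dict[str, str]


def append_turn(history: List[InputItem], role: str, text: str) -> List[InputItem]:
    """Return a new history list with one more turn appended."""
    return history + [{"role": role, "content": text}]


def build_payload(model: str, history: List[InputItem]) -> dict:
    """Every request carries the whole conversation in `input`."""
    return {"model": model, "input": list(history)}


history: List[InputItem] = []
history = append_turn(history, "user", "What is prompt caching?")
# Once the first response arrives, store the assistant's reply text as well
# (assumed item shape, see the note above).
history = append_turn(history, "assistant", "It reuses processed prompt tokens.")
history = append_turn(history, "user", "How does it affect cost?")

payload = build_payload("gloo-anthropic-claude-sonnet-4.6", history)
```

The second request built this way contains all three turns, so the model sees the earlier exchange without any server-side state.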

Today the Responses API gives you direct, OpenAI-compatible access to the exact model you specify. Gloo’s intelligent auto-routing, model_family selection, tradition (values-aligned) responses, and guardrailed safety layers are planned for the Responses API. In the meantime, those capabilities are available now on Completions V2 / Grounded Completions, which remain fully supported.

Response format

Responses return a typed output[] array. Each item has a type; a normal text answer arrives as a message item, a generated image as an image_generation_call item.

{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Prompt caching reuses processed prompt tokens..." }
      ]
    }
  ],
  "usage": { "input_tokens": 24, "output_tokens": 38, "total_tokens": 62 }
}
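
Reading the answer means walking output[] rather than indexing a single string. A minimal sketch, assuming the response has already been parsed into a Python dict with the shape shown above (`extract_text` is an illustrative helper, not part of any SDK):

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate every output_text block found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning, tool calls, images, and so on
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


sample_response = {
    "id": "resp_example",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Prompt caching reuses processed prompt tokens..."}
            ],
        }
    ],
}

text = extract_text(sample_response)
```

Filtering on `type` keeps the helper safe when new item types appear in output[].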

Streaming

Set "stream": true to receive the response as Server-Sent Events. Each event has an event: line and a data: line carrying a typed JSON payload:

event: response.created
data: {"type":"response.created","response":{"id":"resp_a1b2c3","object":"response",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"Prompt "}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"caching "}

event: response.output_text.done
data: {"type":"response.output_text.done","item_id":"msg_...","text":"Prompt caching reuses..."}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_a1b2c3",...,"output":[...],"usage":{...}}}

Key event types to handle:

response.created: emitted once at the start of the response.
response.output_text.delta / response.output_text.done: delta events carry incremental text for a message item; the done event carries that item's completed text.
response.output_item.added / response.output_item.done: emitted for typed items in output[] (including image_generation_call and function_call).
response.completed: emitted once when the response is finished; the final payload includes the full output[] array and usage.
Error events: surfaced as event types that begin with error (e.g. error). Treat an error event, or any event type you do not recognize, as terminal and close the stream.
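
The event handling described above can be sketched as a small parser over raw SSE lines. This is an illustration under stated assumptions: the function names are hypothetical, events are assumed to be separated by blank lines as in standard SSE, and the sample stream is hand-written rather than captured from the API.

```python
import json
from typing import Iterable, Iterator, Tuple

# Event types this sketch recognizes but does not need to act on.
PASS_THROUGH_EVENTS = {
    "response.created",
    "response.output_text.done",
    "response.output_item.added",
    "response.output_item.done",
}


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines."""
    event_name = ""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line closes the current event.
            yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "", []
    if data_lines:  # stream ended without a trailing blank line
        yield event_name, json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas until a terminal event arrives."""
    chunks = []
    for event_name, payload in iter_sse_events(lines):
        if event_name == "response.output_text.delta":
            chunks.append(payload["delta"])
        elif event_name not in PASS_THROUGH_EVENTS:
            # response.completed, error events, and anything unrecognized
            # all end the stream in this sketch.
            break
    return "".join(chunks)


sample_stream = [
    'event: response.created',
    'data: {"type": "response.created", "response": {"id": "resp_a1b2c3"}}',
    '',
    'event: response.output_text.delta',
    'data: {"type": "response.output_text.delta", "delta": "Prompt "}',
    '',
    'event: response.output_text.delta',
    'data: {"type": "response.output_text.delta", "delta": "caching "}',
    '',
    'event: response.completed',
    'data: {"type": "response.completed", "response": {"id": "resp_a1b2c3"}}',
    '',
]

streamed_text = collect_text(sample_stream)
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body instead of a list.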

Multimodal

Image input (vision)

Pass images as input items alongside text. Any vision-capable model accepts them:

curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3.1-pro",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "Describe this image." },
        { "type": "input_image", "image_url": "https://example.com/photo.jpg" }
      ]}
    ]
  }'

Images may be supplied as a remote URL or a base64 data: URI.
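
For local files, the base64 data: URI form can be built with the standard library. The sketch below assumes the data URI goes in the same `image_url` field as a remote URL, which this page implies but does not state outright. The helper names and the default MIME type are illustrative.

```python
import base64


def image_item_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an input_image item using a data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "input_image", "image_url": f"data:{mime_type};base64,{encoded}"}


def vision_input(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> list:
    """Build the `input` array for one user turn with text plus an image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                image_item_from_bytes(image_bytes, mime_type),
            ],
        }
    ]


# In practice the bytes would come from a file, for example:
#   with open("photo.jpg", "rb") as f:
#       items = vision_input("Describe this image.", f.read(), "image/jpeg")
items = vision_input("Describe this image.", b"abc")
```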

Image generation

Image-capable models return a generated image as an image_generation_call output item (base64 result). Optional image_generation controls (quality, size) are forwarded to providers that support them.

curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3-pro-image",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "A watercolor painting of a lighthouse at sunset." }
      ]}
    ]
  }'

The generated image comes back as a typed image_generation_call item in output[]. The result field carries the image as a base64-encoded string:

{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-google-gemini-3-pro-image",
  "output": [
    {
      "type": "image_generation_call",
      "id": "ig_...",
      "status": "completed",
      "result": "iVBORw0KGgoAAAANSUhEUgAA...<truncated base64 PNG>..."
    }
  ],
  "usage": { "input_tokens": 14, "output_tokens": 1290, "total_tokens": 1304 }
}

Decode the base64 result to get the image bytes. See Supported Models for which models support image input and image generation.
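
Decoding can be sketched in a few lines of Python. The helper below is illustrative: it assumes a parsed response dict with the shape shown above and only keeps items whose status is completed. The sample payload is a stand-in, not real model output.

```python
import base64
from typing import Any, Dict, List


def decode_images(response: Dict[str, Any]) -> List[bytes]:
    """Return the decoded bytes of every completed image_generation_call item."""
    images = []
    for item in response.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("status") == "completed":
            images.append(base64.b64decode(item["result"]))
    return images


# Stand-in payload: just the 8-byte PNG signature, base64-encoded.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_response = {
    "output": [
        {
            "type": "image_generation_call",
            "id": "ig_example",
            "status": "completed",
            "result": base64.b64encode(png_signature).decode("ascii"),
        }
    ]
}

images = decode_images(sample_response)
# Writing to disk is then a plain binary write, for example:
#   with open("lighthouse.png", "wb") as f:
#       f.write(images[0])
```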

Pricing & token spend

The Responses API is billed per token at each model’s standard rates — there is no separate or premium price for using /responses. Cost is driven entirely by the model you select and the tokens you consume:

total_cost = input_tokens  × input_rate
           + cached_tokens  × cache_read_rate
           + output_tokens  × output_rate

Per-model rates. Input and output rates vary by model. The live rates are listed on the Supported Models page and are available programmatically via GET /platform/v2/models.
Prompt caching reduces input cost when prompt prefixes repeat — cached tokens are billed at a discounted cache_read rate. See Prompt Caching.
Image generation is billed using the image model’s token accounting; check the model’s rates on the models endpoint.
A 5.5% Studio markup applies to all segments, consistent with the rest of the platform.

Track real spend in the Gloo Studio billing dashboard and API usage.
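
As a rough illustration of the formula above, the estimate can be computed in a few lines. Several things here are assumptions rather than documented behavior: the rates are made-up per-token dollar values, input_tokens is taken to exclude cached tokens, and the 5.5% markup is applied as a single multiplier on the token subtotal. Check real rates on the models endpoint before relying on any numbers.

```python
STUDIO_MARKUP = 0.055  # 5.5% markup, applied to the whole subtotal (assumption)


def estimate_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_rate: float,
    cache_read_rate: float,
    output_rate: float,
) -> float:
    """Estimate request cost from token counts and per-token rates."""
    subtotal = (
        input_tokens * input_rate
        + cached_tokens * cache_read_rate
        + output_tokens * output_rate
    )
    return subtotal * (1 + STUDIO_MARKUP)


# Hypothetical rates in dollars per token, not real Gloo pricing.
cost = estimate_cost(
    input_tokens=1000,
    cached_tokens=500,
    output_tokens=200,
    input_rate=3e-6,
    cache_read_rate=3e-7,
    output_rate=1.5e-5,
)
```

With these example numbers the subtotal is 0.00615 and the estimate after markup is roughly 0.0065 dollars.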

Supported models

The Responses API works across the full Gloo AI catalog — Anthropic, OpenAI, Google, and open-source families — including the multimodal and image-generation models. Use the Model ID as the model field. The complete, live list (with capabilities and pricing) is on the Supported Models page.

Moving from Completions to Responses

Completions V2 (/ai/v2/chat/completions) is fully supported and backwards-compatible, so existing integrations continue to work unchanged. It is also where Gloo’s auto-routing, model_family selection, tradition (values-aligned) responses, and grounded/guardrailed completions live today while those capabilities make their way to the Responses API. The Responses API is the recommended surface for new work because it standardizes on the OpenAI-compatible Responses shape and makes multimodal (vision + image generation) first-class. The migration is mostly a rename:

| Completions V2 | Responses API |
| --- | --- |
| `POST /ai/v2/chat/completions` | `POST /ai/v1/responses` |
| `messages` | `input` |
| `system` role message | `instructions` |
| `max_tokens` | `max_output_tokens` |
| `choices[].message.content` (string) | `output[]` (typed items) |

// Completions V2
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 256
}

// Responses API (equivalent)
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "instructions": "You are a helpful assistant.",
  "input": [
    { "role": "user", "content": "Hello" }
  ],
  "max_output_tokens": 256
}

Auto-routing, model_family, tradition, and grounded/guardrailed responses are planned for the Responses API. Until then, reach for Completions V2 when you need them, and choose the Responses API for OpenAI-compatible, multimodal, direct-model integrations.
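
The rename table can also be applied mechanically. The helper below is a hypothetical sketch, not a Gloo-provided migration tool: it moves system messages into instructions, renames messages and max_tokens, and copies every other field through unchanged. Fields that only Completions V2 understands (for example model_family or tradition) would need to be removed by the caller.

```python
from typing import Any, Dict


def completions_to_responses(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a Completions V2 request body into a Responses API body."""
    system_parts = []
    input_items = []
    for message in payload.get("messages", []):
        if message.get("role") == "system":
            system_parts.append(message["content"])
        else:
            input_items.append(dict(message))

    # Copy everything except the two fields that get renamed.
    converted = {k: v for k, v in payload.items() if k not in ("messages", "max_tokens")}
    converted["input"] = input_items
    if system_parts:
        # Multiple system messages are joined; that choice is an assumption.
        converted["instructions"] = "\n\n".join(system_parts)
    if "max_tokens" in payload:
        converted["max_output_tokens"] = payload["max_tokens"]
    return converted


completions_body = {
    "model": "gloo-anthropic-claude-sonnet-4.6",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 256,
}

responses_body = completions_to_responses(completions_body)
```

On the response side there is no equivalent one-line rename, since choices[].message.content becomes a typed output[] array that has to be walked item by item.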

Related Documentation

Supported Models: model IDs, capabilities, and live pricing
Prompt Caching: reduce cost and latency with cached prompt prefixes
Tool Use: function calling
Completions V2: routing, values-alignment, and backwards-compatibility
