/responses endpoint — text, vision, reasoning, tool use, and native image generation — with one request and response shape that works the same across every provider.
New to Gloo AI? Start here. The Responses API is the standard surface going forward. If you already use Completions V2, it remains fully supported and backwards-compatible — see Moving from Completions to Responses.
Why the Responses API
The Responses API offers a more capable, forward-looking request shape that the broader ecosystem is standardizing on:- One shape for every modality. Text, image input (vision), and image generation all use the same
inputarray and the same typedoutputarray — no separate endpoints or bespoke payloads per capability. - OpenAI-compatible. The request and response formats mirror the OpenAI Responses API, so existing tooling, SDKs, and mental models carry over directly. Point your base URL at Gloo and go.
- Typed, structured output. Responses come back as an
output[]array of typed items (message,image_generation_call, reasoning, tool calls) instead of a single opaquechoices[].message.contentstring — easier to parse, and extensible as new item types arrive. - Built for multimodal. Native image generation and image input are first-class, not bolted on. The same endpoint that answers a text prompt can return a generated image.
Which API should I use?
| Use case | Recommended endpoint |
|---|---|
| New integration with OpenAI-compatible tooling | Responses API (v1) |
| Vision (image input) or image generation | Responses API (v1) |
| Direct model selection with typed, structured output | Responses API (v1) |
| Intelligent auto-routing across models | Completions V2 |
Values-aligned (tradition) responses | Completions V2 |
| Grounded / RAG completions with source attribution | Completions V2 |
Endpoint
URL:https://platform.ai.gloo.com/ai/v1/responses
Operation: POST
Request format
The Responses API usesinput instead of messages, and a few renamed fields. If you know the chat-completions format, the mapping is small:
| Responses API | Chat Completions equivalent | Notes |
|---|---|---|
input | messages | A string, or an array of input items (role + content). |
instructions | system message | Top-level system / developer instructions. |
max_output_tokens | max_tokens | Cap on generated tokens. |
model | model | A Gloo model ID — see Supported Models. |
| Parameter | Type | Required? | Description |
|---|---|---|---|
model | string | Yes | Gloo model ID (e.g. gloo-anthropic-claude-sonnet-4.6). |
input | string | array | Yes | A plain string prompt, or an array of typed input items. |
instructions | string | No | System-level instructions applied to the request. |
max_output_tokens | integer | No | Maximum number of tokens to generate. |
temperature | float | No | Sampling temperature. |
top_p | float | No | Nucleus sampling. |
tools | array | No | Tool / function definitions. See Tool Use. |
tool_choice | string | object | No | Controls tool selection. |
stream | boolean | No | Stream the response as SSE events (default false). |
reasoning | object | No | Reasoning controls (e.g. effort) for capable models. |
response_format | object | No | Structured-output / JSON schema controls. |
prompt_cache_key | string | No | Improves cache-hit routing — see Prompt Caching. |
image_generation | object | No | Image-generation options (quality, size) for image-capable models. |
Conversation state is managed client-side by passing the full
input history on each request. The previous_response_id parameter (used by OpenAI’s Responses API for server-side conversation chaining) is not supported.Today the Responses API gives you direct, OpenAI-compatible access to the exact
model you specify. Gloo’s intelligent auto-routing, model_family selection, tradition (values-aligned) responses, and guardrailed safety layers are planned for the Responses API. In the meantime, those capabilities are available now on Completions V2 / Grounded Completions, which remain fully supported.Response format
Responses return a typedoutput[] array. Each item has a type; a normal text answer arrives as a message item, a generated image as an image_generation_call item.
Streaming
Set"stream": true to receive the response as Server-Sent Events. Each event has an event: line and a data: line carrying a typed JSON payload:
response.created— emitted once at the start of the response.response.output_text.delta/response.output_text.done— incremental text tokens for amessageitem.response.output_item.added/response.output_item.done— emitted for typed items inoutput[](includingimage_generation_callandfunction_call).response.completed— emitted once when the response is finished; the final payload includes the fulloutput[]array andusage.- Error events — surfaced as event types prefixed with
error.(e.g.error); treat any unknown event as terminal and close the stream.
Multimodal
Image input (vision)
Pass images as input items alongside text. Any vision-capable model accepts them:data: URI.
Image generation
Image-capable models return a generated image as animage_generation_call output item (base64 result). Optional image_generation controls (quality, size) are forwarded to providers that support them.
image_generation_call item in output[]. The result field carries the image as a base64-encoded string:
result to get the image bytes. Optional image_generation controls (quality, size) are forwarded to providers that support them.
See Supported Models for which models support image input and image generation.
Pricing & token spend
The Responses API is billed per token at each model’s standard rates — there is no separate or premium price for using/responses. Cost is driven entirely by the model you select and the tokens you consume:
- Per-model rates. Input and output rates vary by model. The live rates are on the Supported Models page and programmatically on
GET /platform/v2/models. - Prompt caching reduces input cost when prompt prefixes repeat — cached tokens are billed at a discounted
cache_readrate. See Prompt Caching. - Image generation is billed using the image model’s token accounting; check the model’s rates on the models endpoint.
- A 5.5% Studio markup applies to all segments, consistent with the rest of the platform.
Supported models
The Responses API works across the full Gloo AI catalog — Anthropic, OpenAI, Google, and open-source families — including the multimodal and image-generation models. Use the Model ID as themodel field. The complete, live list (with capabilities and pricing) is on the Supported Models page.
Moving from Completions to Responses
Completions V2 (/ai/v2/chat/completions) is fully supported and backwards-compatible — existing integrations continue to work unchanged, and it’s where Gloo’s auto-routing, model_family selection, tradition (values-aligned) responses, and grounded/guardrailed completions live today, while those capabilities make their way to the Responses API.
The Responses API is the recommended surface for new work because it standardizes on the OpenAI-compatible Responses shape and makes multimodal (vision + image generation) first-class. The migration is mostly a rename:
| Completions V2 | Responses API |
|---|---|
POST /ai/v2/chat/completions | POST /ai/v1/responses |
messages | input |
system role message | instructions |
max_tokens | max_output_tokens |
choices[].message.content (string) | output[] (typed items) |
Auto-routing,
model_family, tradition, and grounded/guardrailed responses are planned for the Responses API. Until then, reach for Completions V2 when you need them today — and choose the Responses API for OpenAI-compatible, multimodal, direct-model integrations.Related Documentation
- Supported Models — model IDs, capabilities, and live pricing
- Prompt Caching — reduce cost and latency with cached prompt prefixes
- Tool Use — function calling
- Completions V2 — routing, values-alignment, and backwards-compatibility

