The Responses API is the recommended way to build on Gloo AI. It exposes the entire Gloo AI model catalog through a single, OpenAI-compatible /responses endpoint — text, vision, reasoning, tool use, and native image generation — with one request and response shape that works the same across every provider.
New to Gloo AI? Start here. The Responses API is the standard surface going forward. If you already use Completions V2, it remains fully supported and backwards-compatible — see Moving from Completions to Responses.

Why the Responses API

The Responses API offers a more capable, forward-looking request shape that the broader ecosystem is standardizing on:
  • One shape for every modality. Text, image input (vision), and image generation all use the same input array and the same typed output array — no separate endpoints or bespoke payloads per capability.
  • OpenAI-compatible. The request and response formats mirror the OpenAI Responses API, so existing tooling, SDKs, and mental models carry over directly. Point your base URL at Gloo and go.
  • Typed, structured output. Responses come back as an output[] array of typed items (message, image_generation_call, reasoning, tool calls) instead of a single opaque choices[].message.content string — easier to parse, and extensible as new item types arrive.
  • Built for multimodal. Native image generation and image input are first-class, not bolted on. The same endpoint that answers a text prompt can return a generated image.
If you are starting a new integration, build on the Responses API.

Which API should I use?

| Use case | Recommended endpoint |
| --- | --- |
| New integration with OpenAI-compatible tooling | Responses API (v1) |
| Vision (image input) or image generation | Responses API (v1) |
| Direct model selection with typed, structured output | Responses API (v1) |
| Intelligent auto-routing across models | Completions V2 |
| Values-aligned (tradition) responses | Completions V2 |
| Grounded / RAG completions with source attribution | Completions V2 |
When in doubt, start with the Responses API — it’s the default for new work, and Completions V2 stays available whenever you need routing, values-alignment, or grounding.

Endpoint

URL: https://platform.ai.gloo.com/ai/v1/responses
Operation: POST
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-anthropic-claude-sonnet-4.6",
    "input": [
      { "role": "user", "content": "Explain prompt caching in two sentences." }
    ]
  }'
Authentication is identical to the rest of the platform — a Bearer access token from your Client ID / Client Secret. See Manage API Credentials.
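The same call can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names build_responses_request and send are hypothetical, and the access token is assumed to have been obtained already from your Client ID / Client Secret.

```python
import json
import urllib.request

RESPONSES_URL = "https://platform.ai.gloo.com/ai/v1/responses"


def build_responses_request(access_token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /responses endpoint."""
    payload = {
        "model": model,
        "input": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.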

Request format

The Responses API uses input instead of messages, and a few renamed fields. If you know the chat-completions format, the mapping is small:
| Responses API | Chat Completions equivalent | Notes |
| --- | --- | --- |
| input | messages | A string, or an array of input items (role + content). |
| instructions | system message | Top-level system / developer instructions. |
| max_output_tokens | max_tokens | Cap on generated tokens. |
| model | model | A Gloo model ID; see Supported Models. |

| Parameter | Type | Required? | Description |
| --- | --- | --- | --- |
| model | string | Yes | Gloo model ID (e.g. gloo-anthropic-claude-sonnet-4.6). |
| input | string or array | Yes | A plain string prompt, or an array of typed input items. |
| instructions | string | No | System-level instructions applied to the request. |
| max_output_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | float | No | Sampling temperature. |
| top_p | float | No | Nucleus sampling. |
| tools | array | No | Tool / function definitions. See Tool Use. |
| tool_choice | string or object | No | Controls tool selection. |
| stream | boolean | No | Stream the response as SSE events (default false). |
| reasoning | object | No | Reasoning controls (e.g. effort) for capable models. |
| response_format | object | No | Structured-output / JSON schema controls. |
| prompt_cache_key | string | No | Improves cache-hit routing; see Prompt Caching. |
| image_generation | object | No | Image-generation options (quality, size) for image-capable models. |
Conversation state is managed client-side by passing the full input history on each request. The previous_response_id parameter (used by OpenAI’s Responses API for server-side conversation chaining) is not supported.
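A minimal sketch of client-side conversation state might look like the following. The helper names add_turn and build_payload are hypothetical, and replaying earlier assistant replies as plain { role: "assistant", content: ... } items is an assumption based on the OpenAI-compatible input shape, not something this page spells out.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, text: str) -> List[Message]:
    """Return a new history list with one more turn appended."""
    return history + [{"role": role, "content": text}]


def build_payload(model: str, history: List[Message]) -> dict:
    """Send the whole history as `input`; previous_response_id is never set."""
    return {"model": model, "input": list(history)}


history: List[Message] = []
history = add_turn(history, "user", "What is prompt caching?")
# Assumption: the assistant's reply text is replayed as a plain assistant turn.
history = add_turn(history, "assistant", "It reuses processed prompt tokens.")
history = add_turn(history, "user", "When does it help most?")
payload = build_payload("gloo-anthropic-claude-sonnet-4.6", history)
```

Each request carries every prior turn, so the payload grows with the conversation.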
Today the Responses API gives you direct, OpenAI-compatible access to the exact model you specify. Gloo’s intelligent auto-routing, model_family selection, tradition (values-aligned) responses, and guardrailed safety layers are planned for the Responses API. In the meantime, those capabilities are available now on Completions V2 / Grounded Completions, which remain fully supported.

Response format

Responses return a typed output[] array. Each item has a type; a normal text answer arrives as a message item, a generated image as an image_generation_call item.
{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Prompt caching reuses processed prompt tokens..." }
      ]
    }
  ],
  "usage": { "input_tokens": 24, "output_tokens": 38, "total_tokens": 62 }
}
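Because output[] is a list of typed items, reading the answer means filtering by type rather than indexing a single string. The sketch below assumes only the fields shown in the sample response above; extract_text is a hypothetical helper name.

```python
def extract_text(response: dict) -> str:
    """Concatenate every output_text block found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip image_generation_call, reasoning, tool calls, etc.
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


sample_response = {
    "id": "resp_...",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Prompt caching reuses processed prompt tokens..."}
            ],
        }
    ],
}
answer = extract_text(sample_response)
```

Unknown item types are skipped instead of raising, which keeps the parser tolerant of new item types.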

Streaming

Set "stream": true to receive the response as Server-Sent Events. Each event has an event: line and a data: line carrying a typed JSON payload:
event: response.created
data: {"type":"response.created","response":{"id":"resp_a1b2c3","object":"response",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"Prompt "}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_...","delta":"caching "}

event: response.output_text.done
data: {"type":"response.output_text.done","item_id":"msg_...","text":"Prompt caching reuses..."}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_a1b2c3",...,"output":[...],"usage":{...}}}
Key event types to handle:
  • response.created — emitted once at the start of the response.
  • response.output_text.delta / response.output_text.done — delta events carry incremental text chunks for a message item; the done event carries that item's complete text.
  • response.output_item.added / response.output_item.done — emitted for typed items in output[] (including image_generation_call and function_call).
  • response.completed — emitted once when the response is finished; the final payload includes the full output[] array and usage.
  • Error events — surfaced as event types prefixed with error. (e.g. error); treat any unknown event as terminal and close the stream.
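The event handling above can be sketched as a small parser over the lines of an SSE stream. This is an illustration under stated assumptions: iter_sse_events and collect_text are hypothetical helpers, the sample stream is a simplified stand-in with a made-up item_id, and only the event names listed above are handled.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group event:/data: lines into (event_name, payload) pairs."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            yield event_name or "message", json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
    if data_lines:  # flush a final event that lacks a trailing blank line
        yield event_name or "message", json.loads("\n".join(data_lines))


def collect_text(lines: Iterable[str]) -> str:
    """Accumulate text deltas until response.completed arrives."""
    parts: List[str] = []
    for name, payload in iter_sse_events(lines):
        if name == "response.output_text.delta":
            parts.append(payload["delta"])
        elif name == "response.completed":
            break
        elif name.startswith("error"):
            raise RuntimeError(f"stream error: {payload}")
    return "".join(parts)


SAMPLE_STREAM = [
    "event: response.created",
    'data: {"type":"response.created","response":{"id":"resp_a1b2c3"}}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","item_id":"msg_1","delta":"Prompt "}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","item_id":"msg_1","delta":"caching "}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","response":{"id":"resp_a1b2c3"}}',
    "",
]
streamed_text = collect_text(SAMPLE_STREAM)
```

In a real client the lines would come from the HTTP response body, decoded line by line.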

Multimodal

Image input (vision)

Pass images as input items alongside text. Any vision-capable model accepts them:
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3.1-pro",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "Describe this image." },
        { "type": "input_image", "image_url": "https://example.com/photo.jpg" }
      ]}
    ]
  }'
Images may be supplied as a remote URL or a base64 data: URI.
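For local files, the base64 data: URI form can be built as sketched below. The helper names are hypothetical, and placing the data URI in the same image_url field used for remote URLs is an assumption; confirm it against the API reference before relying on it.

```python
import base64


def image_item_from_bytes(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an input_image item using a base64 data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumption: data URIs are passed in image_url, like remote URLs.
    return {"type": "input_image", "image_url": f"data:{mime_type};base64,{encoded}"}


def vision_input(prompt: str, image_item: dict) -> list:
    """Build the `input` array for one user turn with text plus one image."""
    return [
        {
            "role": "user",
            "content": [{"type": "input_text", "text": prompt}, image_item],
        }
    ]


item = image_item_from_bytes(b"abc", "image/png")
request_input = vision_input("Describe this image.", item)
```

In practice image_bytes would come from reading a file in binary mode, with mime_type matching the file's actual format.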

Image generation

Image-capable models return a generated image as an image_generation_call output item (base64 result). Optional image_generation controls (quality, size) are forwarded to providers that support them.
curl -X POST 'https://platform.ai.gloo.com/ai/v1/responses' \
  -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gloo-google-gemini-3-pro-image",
    "input": [
      { "role": "user", "content": [
        { "type": "input_text", "text": "A watercolor painting of a lighthouse at sunset." }
      ]}
    ]
  }'
The generated image comes back as a typed image_generation_call item in output[]. The result field carries the image as a base64-encoded string:
{
  "id": "resp_...",
  "object": "response",
  "model": "gloo-google-gemini-3-pro-image",
  "output": [
    {
      "type": "image_generation_call",
      "id": "ig_...",
      "status": "completed",
      "result": "iVBORw0KGgoAAAANSUhEUgAA...<truncated base64 PNG>..."
    }
  ],
  "usage": { "input_tokens": 14, "output_tokens": 1290, "total_tokens": 1304 }
}
Decode the base64 result to get the image bytes. See Supported Models for which models support image input and image generation.
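Decoding can be sketched as follows, assuming only the type and result fields shown in the sample response. extract_images is a hypothetical helper, and the sample result is a stand-in containing just the 8-byte PNG signature rather than a real image.

```python
import base64
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_images(response: dict) -> List[bytes]:
    """Return decoded bytes for every image_generation_call item that has a result."""
    images = []
    for item in response.get("output", []):
        if item.get("type") == "image_generation_call" and item.get("result"):
            images.append(base64.b64decode(item["result"]))
    return images


sample_image_response = {
    "output": [
        {
            "type": "image_generation_call",
            "status": "completed",
            # Stand-in payload: only the PNG signature, not a full image.
            "result": base64.b64encode(PNG_SIGNATURE).decode("ascii"),
        }
    ]
}
decoded_images = extract_images(sample_image_response)
```

Each returned bytes object can be written to disk as-is, for example with pathlib.Path("image.png").write_bytes(decoded_images[0]).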

Pricing & token spend

The Responses API is billed per token at each model’s standard rates — there is no separate or premium price for using /responses. Cost is driven entirely by the model you select and the tokens you consume:
total_cost = input_tokens  × input_rate
           + cached_tokens  × cache_read_rate
           + output_tokens  × output_rate
  • Per-model rates. Input and output rates vary by model. Live rates are listed on the Supported Models page and available programmatically via GET /platform/v2/models.
  • Prompt caching reduces input cost when prompt prefixes repeat — cached tokens are billed at a discounted cache_read rate. See Prompt Caching.
  • Image generation is billed using the image model’s token accounting; check the model’s rates on the models endpoint.
  • A 5.5% Studio markup applies to all segments, consistent with the rest of the platform.
Track real spend in the Gloo Studio billing dashboard and API usage.
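As a worked illustration of the formula, the sketch below estimates the cost of a single request. The rates used are made-up placeholders, not Gloo prices; rates are assumed to be expressed per token; and applying the 5.5% markup as a single multiplier on the summed segments is an assumption about how the markup is computed.

```python
def estimate_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_rate: float,
    cache_read_rate: float,
    output_rate: float,
    markup: float = 0.055,
) -> float:
    """Sum the three billed segments, then apply the Studio markup."""
    base = (
        input_tokens * input_rate
        + cached_tokens * cache_read_rate
        + output_tokens * output_rate
    )
    # Assumption: the markup multiplies the total of all segments.
    return base * (1 + markup)


# Hypothetical per-token rates, for illustration only.
example_cost = estimate_cost(
    input_tokens=1000,
    cached_tokens=0,
    output_tokens=500,
    input_rate=0.000003,
    cache_read_rate=0.0000003,
    output_rate=0.000015,
)
```

The token counts would normally be taken from the usage object of a response, and the rates from the models endpoint.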

Supported models

The Responses API works across the full Gloo AI catalog — Anthropic, OpenAI, Google, and open-source families — including the multimodal and image-generation models. Use the Model ID as the model field. The complete, live list (with capabilities and pricing) is on the Supported Models page.

Moving from Completions to Responses

Completions V2 (/ai/v2/chat/completions) is fully supported and backwards-compatible: existing integrations continue to work unchanged. It is also where Gloo’s auto-routing, model_family selection, tradition (values-aligned) responses, and grounded/guardrailed completions live today, until those capabilities reach the Responses API. The Responses API is the recommended surface for new work because it standardizes on the OpenAI-compatible Responses shape and makes multimodal (vision + image generation) first-class. The migration is mostly a rename:
| Completions V2 | Responses API |
| --- | --- |
| POST /ai/v2/chat/completions | POST /ai/v1/responses |
| messages | input |
| system role message | instructions |
| max_tokens | max_output_tokens |
| choices[].message.content (string) | output[] (typed items) |
// Completions V2
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ],
  "max_tokens": 256
}
// Responses API (equivalent)
{
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "instructions": "You are a helpful assistant.",
  "input": [
    { "role": "user", "content": "Hello" }
  ],
  "max_output_tokens": 256
}
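The rename can be automated with a small converter. The sketch below covers only the fields in the mapping table and assumes system messages carry plain string content; completions_to_responses is a hypothetical helper, and Completions-only options (auto-routing, model_family, tradition) are deliberately not carried over.

```python
def completions_to_responses(body: dict) -> dict:
    """Translate a Completions V2 request body into a Responses API body."""
    converted = {"model": body["model"]}
    system_parts = []
    input_items = []
    for message in body.get("messages", []):
        if message.get("role") == "system":
            system_parts.append(message["content"])  # becomes `instructions`
        else:
            input_items.append(message)  # becomes an `input` item
    if system_parts:
        converted["instructions"] = "\n\n".join(system_parts)
    converted["input"] = input_items
    if "max_tokens" in body:
        converted["max_output_tokens"] = body["max_tokens"]
    return converted


completions_body = {
    "model": "gloo-anthropic-claude-sonnet-4.6",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 256,
}
responses_body = completions_to_responses(completions_body)
```

Applied to the Completions V2 example above, the converter yields the equivalent Responses API body shown beside it.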
Auto-routing, model_family, tradition, and grounded/guardrailed responses are planned for the Responses API. Until then, use Completions V2 when you need them, and choose the Responses API for OpenAI-compatible, multimodal, direct-model integrations.