Changelog - Gloo

July 2026

Grounded Responses — RAG on the Responses API

Grounding comes to the Responses API. The new POST /ai/v1/grounded/responses endpoint accepts the standard Responses API request extended with grounding controls, retrieves relevant passages from your uploaded publisher content, and returns the familiar typed output[] — plus a sources_returned flag (or X-Sources-Returned header when streaming) confirming the answer was grounded.

Same shape, grounded — no need to switch to the chat-completions format to get publisher-grounded answers.
Grounding controls — rag_publisher scopes retrieval to your content, sources_limit (1–10) tunes how many sources are injected, and certainty_threshold sets a minimum similarity bar for sources.
Explicit grounding signal — every response reports whether sources were used, including the zero-sources case.
Docs — A new Grounded Responses guide covers the request format, streaming, and how it compares to Grounded Completions.

June 2026

The Responses API (v1): Recommended for OpenAI-Compatible Integrations

The new Responses API (POST /ai/v1/responses) is our recommended surface for OpenAI-compatible, multimodal integrations. It’s OpenAI-compatible and exposes the full model catalog through a single endpoint for text, vision, reasoning, tool use, and native image generation, with one typed request/response shape across every provider. It gives direct access to the exact model you request; Gloo’s guardrails, output moderation, values-alignment, and routing stay on Completions V2 (below).

Multimodal in one endpoint — image input (vision) and image generation alongside text, no separate endpoints or bespoke payloads.
OpenAI-compatible — use the official OpenAI SDKs (responses.create); just point the base URL at https://platform.ai.gloo.com/ai/v1.
Typed, structured output — responses return an output[] array of typed items (messages, tool calls, generated images, reasoning).
Completions V2 stays supported — fully backwards-compatible, and still the home of Gloo’s guardrails, output moderation, auto-routing, model_family, tradition, and grounded completions.
Docs — A new Responses API guide covers usage, pricing, tool use, and migrating from Completions.

Prompt Caching Is Widely Available

A major cost-saving feature for all customers. When you send repeated prompt prefixes — system instructions, few-shot examples, large context blocks — the platform caches them and serves subsequent requests at a fraction of the cost. Cache read tokens are priced well below standard input tokens.

Studio — The Usage page now breaks spend down by cache segment (fresh, cache read, cache write), so you can see exactly what caching is saving you. Model cards show cache pricing alongside standard rates.
Programmatic access — Cache token counts and pricing are now reported programmatically, so savings can be calculated automatically.
Docs — A new Prompt Caching guide covers how caching works across Anthropic, OpenAI, DeepSeek, and Gemini, with code examples and a supported-models matrix.

GlooCode v1.6 — Web UI & More

Four GlooCode releases shipped this week (v1.5.3 → v1.6.2), a significant milestone for the product.

Web UI — GlooCode now includes a built-in web interface with 30+ themes, file diffing, session management, and support for 18 languages.
Faster, more reliable updates — gloocode update now performs direct binary swaps, and install.sh always performs a fresh install.
New built-in agent optimizations — Better and more intelligent default agent models.
More reliable releases — Hardened testing and guaranteed-fresh installers on every release.

Platform Security & Trust

Org-level access controls — Administrative tooling to immediately revoke platform access at the organization level.
GDPR right to erasure — Enterprise customers can request full deletion of user data across the platform, along with data export for subject-access requests.

Data Engine

Cache cost reporting — Usage reporting and CSV exports now break spend down by cache segment.
Performance under load — Improved reliability and throughput at scale.

Billing

Free Plan retired — New users must add a payment method during onboarding to activate their Pay-as-you-go plan and unlock the Studio Playground, API access, and other features. Existing paid plans are unaffected. See Billing & Plans for the updated lineup.
Enterprise billing flexibility — Enterprise admins can now switch to pay-as-you-go billing.
Automatic subscription hygiene — Downgrades automatically clean up associated subscriptions.

Week of May 26, 2026

11 New AI Models

Claude Opus 4.8 — Anthropic’s latest, available across all providers with fast-mode support
GPT-5.3 Codex — OpenAI’s reasoning-focused coding model with 400K context
Claude Opus 4.7 Fast & 4.6 Fast — Fast-output Opus variants for speed-sensitive workloads
Qwen 3.5 Plus, Qwen 3.5 27B, Qwen 3 235B A22B variants — Expanded Qwen family, including thinking/reasoning variants
GLM-4.7 Flash — Z-AI’s fast inference model
MiMo V2 Flash & V2.5 — Xiaomi’s new MoE models optimized for reasoning, coding, and agentic tasks
Gemini 3.1 Flash Lite — promoted from preview to GA

Studio

Spend visibility for all orgs — Every organization can now see its spend breakdown on the Usage page.
Enterprise → pay-as-you-go — Enterprise admins can switch to flexible billing directly from the Billing or Plans page.
Smarter usage charts — Date labels respect your timezone, and charts clearly indicate hourly, daily, weekly, or monthly granularity.
Improved GlooCode usage filters — Faster, more reliable filter dropdowns.

Completions

Prompt caching (beta) — Now works consistently across all completion types, with cache hit rates and token counts in usage reporting.
More reliable streaming — Improved stream stability for reasoning models, including during long thinking pauses.
Smarter guardrails — Tuned content filtering reduces unnecessary blocks, with graduated warnings replacing hard blocks in more cases.
Clearer errors — Provider issues now surface as clear, actionable errors.

GlooCode

v1.5.0 → v1.5.2 — GlooCode now runs on an improved background service for more reliable sessions; existing installations upgrade via gloocode update.
More reliable releases — Hardened release process and installer delivery.

Data Engine

Enterprise self-serve billing — Enterprises can transition billing directly in Studio.
Timezone-aware metrics — Usage reporting now supports timezone-aware grouping with correct daylight-saving handling.
Ingestion refinements — Improved YouTube ingestion consistency for publisher metadata and status tracking.

Documentation

Restructured and refreshed console documentation and developer reference for better discoverability.

Week of May 18, 2026

9 New Models in the Catalog

Gemini 3.5 Flash — 1M context, major coding gains at high speed
Gemma 4 31B — Google’s most capable open-weight model, with vision
GPT OSS 20B — OpenAI’s open-weight MoE model for consumer hardware
Ministral 3B / 8B / 14B — Mistral’s edge-to-server lineup
Qwen 3.7 Max — 1M context for agentic workloads
Qwen 3.5 397B and 122B — Large MoE models
Gemini 3.1 Flash Lite — promoted to GA

GlooCode v1.5.0 — Sub-Agents & Persistent Sessions

GlooCode is Gloo’s autonomous AI coding agent — it writes, tests, and ships code while intelligently optimizing for cost and quality.

Improved background service — More reliable sessions and sign-in.
Explore sub-agent — Fast codebase context-gathering for more relevant results.
Session persistence — Sessions persist through long, context-heavy work.
Org-level usage analytics — Workspace, user, project, and branch tracking, so organizations can see exactly who’s using what.
Studio integration — New GlooCode product and get-started pages with a cost comparison calculator and agent overview, plus a usage analytics dashboard with user/branch filters and cache hit rate metrics.

Studio

New Data Engine pages — Dedicated product pages for ingestion, search, enrichment, and a hub overview, each with polished motion design and market context.
Community section — A new Summer Challenge page with toolkit cards for YouVersion and Gloo AI APIs, judging criteria, and a responsive timeline.

Enterprise Billing Overhaul

Dedicated enterprise billing view with Pro upgrade path
Billing history scoped to org admins
Per-cache-segment cost breakdowns and per-model output limits across all providers

Reliability & Quality

Continued investment in platform resilience: more stable streaming for reasoning models, more reliable spend-limit handling, hardened content enrichment, clearer provider-error reporting, structured-output validation for applicable models, and faster recovery during provider capacity events. Studio also received fixes for mobile layout, usage filters, and billing views.

April 21 – May 15, 2026

New Features

GlooCode v1.0 — Gloo’s autonomous AI coding agent launched as a single native binary for macOS and Linux. Its specialized agent roles (plan, build, explore, review, QA) are each optimized for cost and quality.
Citations — Grounded completions now return source citations including title, URL, author, publisher, date, and content snippets.
20+ new models — Including GPT-5.5, GPT-5.5 Pro, GPT-5.1, Claude Opus 4.1, DeepSeek V4 Flash, DeepSeek V4 Pro, DeepSeek R1-0528, Devstral 2, Qwen 3.6 Max Preview (1M context), Kimi K2.6 (vision), and others.
Model Details pages — Every model in the catalog now has a dedicated Studio page with specs, pricing comparison, modalities, and capability flags.
Enterprise provisioning — Enterprise accounts are provisioned and activated automatically, including plan changes and cancellations.
Usage page model breakdown — New “By Capability” and “By Model” views with human-readable names, cross-provider merging, stacked input/output bars, and per-credential filtering.
Multilingual safety messaging — French and Dutch support for safety responses and corrections; refusal language now respects the user’s language.
Prompt caching — Automatic across supported providers, with full cache-hit visibility.

Bug Fixes

Completions streaming responses are more resilient, with retries for empty responses
Tool-call-only streams are no longer misclassified
Spend-limit responses return clear status and retry guidance
Structured output is now forwarded consistently for applicable models
Model refusals now surface explicitly so callers can handle them programmatically
Rate-limit handling improved for long-standing customer accounts

Improvements

Significant streaming latency reductions across the completions pipeline
All error codes enriched with name, category, and description
Studio organization administration surfaces revamped with detail pages, stat cards, search, sorting, and inline editing
Polished page transitions across Studio

February 11 – April 20, 2026

New Features

Multi-provider routing — Integrations with Anthropic, OpenAI, Google Gemini, DeepSeek, and AWS Bedrock, with automatic health-based failover that shifts traffic when a provider degrades and seamlessly continues responses on a backup — without the caller ever seeing the failure.
13 new models — Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.7 (available same day as Anthropic’s release), GPT-5.2 Pro, GPT-5.4, GPT-4.1, GPT-4.1-mini, GPT-5-Nano, Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1, and Llama Guard 4 12B.
Public model catalog — Pricing, speed ratings, and capability flags (tools, streaming, reasoning, vision) for all available models are now publicly accessible.
Vision support — Image inputs across all providers.
Structured error responses — Every error includes an error code, fault attribution (client/provider/platform), and a retryable flag.
Trace IDs — Every response includes a trace ID for request correlation.
Self-serve Pro tier — Plan selection, upgrades, promo codes, and payment management entirely in-product. A developer can discover, evaluate, pay, and build without a sales conversation.
Spending limits — Per-organization budget caps with a contact-sales path for over-cap requests.
In-product support — A Support button in the Studio sidebar opens a chat with Fin, Gloo’s AI support agent.

Bug Fixes

OpenAI tool conversion issue resolved
Content updates appear in search and grounded completions much faster
Organization auto-activation extended to all new organizations
Prayer guardrail detection improved

Improvements

Major latency reductions across the request pipeline
Streaming reliability upgrades — faster first responses and more robust mid-stream handling
YouTube ingestion overhauled with automated reliability monitoring
Content enrichment hardened with automatic retries
Studio Content Library rebuilt — instant loads, with live updates for in-progress items

January 25 – February 10, 2026

New Features

Spending limits — Weekly budget caps with retry guidance when limits are reached
New models — Claude Sonnet 4, Llama 4 Maverick, and DeepSeek R1 added to the catalog
Smarter guardrails — Edge cases now receive contextual guidance toward appropriate alternatives instead of hard blocks
Batch file upload — Upload multiple files to Data Engine in a single request
Performance tracking — Response-time measurement to drive latency improvements
Payment method gating — Payment method required before platform access is granted
Per-organization billing — Spending controls and billing attribution scoped to individual organizations

Bug Fixes

YouTube transcription failures resolved across multiple scenarios including duplicate detection, live streams, playlists, and timeouts
Batch ingestion now reports per-item errors with full transparency

Improvements

Better handling of large document batches
Smarter model routing for cost and performance

January 18 – 24, 2026

New Features

Enterprise organization controls — Plan-based search access and admin controls for publisher settings
Payment method onboarding — Self-serve customers add payment methods during signup, with the flow adapting to organization type
JSON file uploads — JSON files can now be uploaded and processed in Data Engine
Sitemap fallback — Sites with broken sitemaps now fall back to direct page crawling instead of failing

Bug Fixes

Hardened input validation in Data Engine ingestion
Search failures resolved for publishers without enrichment configured
Content upload failures for new publishers resolved
Content Library progress indicators fixed
Access denied errors for customers with valid credentials resolved
Prayer response guidelines for Faith Assistant now trigger reliably

Improvements

AI model upgrade in Faith Assistant delivers better responses at lower cost
Full light mode support across Studio
User role badge added to the profile dropdown
Long API keys display correctly in the Studio API Key modal

January 11 – 17, 2026

New Features

Streamlined billing onboarding — Connecting a payment account now provisions metering automatically, with optional pay-as-you-go enrollment, so organizations activate without manual setup
Automatic organization activation — Organizations activate automatically when onboarding completes
Dynamic search routing — Completions automatically route to the correct data collection based on publisher configuration
Context window error guidance — Requests exceeding a model’s context limit now return the limit, the affected model, and suggested higher-capacity alternatives

Bug Fixes

Content enrichment status now reflects actual completion state
5 broken models removed from the catalog

Improvements

Guardrail accuracy improved for theological content

​July 2026

​Grounded Responses — RAG on the Responses API

​June 2026

​The Responses API (v1): Recommended for OpenAI-Compatible Integrations

​Prompt Caching Is Widely Available

​GlooCode v1.6 — Web UI & More

​Platform Security & Trust

​Data Engine

​Billing

​Week of May 26, 2026

​11 New AI Models

​Studio

​Completions

​GlooCode

​Data Engine

​Documentation

​Week of May 18, 2026

​9 New Models in the Catalog

​GlooCode v1.5.0 — Sub-Agents & Persistent Sessions

​Studio

​Enterprise Billing Overhaul

​Reliability & Quality

​April 21 – May 15, 2026

​New Features

​Bug Fixes

​Improvements

​February 11 – April 20, 2026

​New Features

​Bug Fixes

​Improvements

​January 25 – February 10, 2026

​New Features

​Bug Fixes

​Improvements

​January 18 – 24, 2026

​New Features

​Bug Fixes

​Improvements

​January 11 – 17, 2026

​New Features

​Bug Fixes

​Improvements

July 2026

Grounded Responses — RAG on the Responses API

June 2026

The Responses API (v1): Recommended for OpenAI-Compatible Integrations

Prompt Caching Is Widely Available

GlooCode v1.6 — Web UI & More

Platform Security & Trust

Data Engine

Billing

Week of May 26, 2026

11 New AI Models

Studio

Completions

GlooCode

Data Engine

Documentation

Week of May 18, 2026

9 New Models in the Catalog

GlooCode v1.5.0 — Sub-Agents & Persistent Sessions

Studio

Enterprise Billing Overhaul

Reliability & Quality

April 21 – May 15, 2026

New Features

Bug Fixes

Improvements

February 11 – April 20, 2026

New Features

Bug Fixes

Improvements

January 25 – February 10, 2026

New Features

Bug Fixes

Improvements

January 18 – 24, 2026

New Features

Bug Fixes

Improvements

January 11 – 17, 2026

New Features

Bug Fixes

Improvements