June 2026
Prompt Caching Is Widely Available
A major cost-saving feature for all customers. When you send repeated prompt prefixes — system instructions, few-shot examples, large context blocks — the platform caches them and serves subsequent requests at a fraction of the cost. Cache read tokens are priced well below standard input tokens.- Studio — The Usage page now breaks spend down by cache segment (fresh, cache read, cache write), so you can see exactly what caching is saving you. Model cards show cache pricing alongside standard rates.
- Programmatic access — Cache token counts and pricing are now reported programmatically, so savings can be calculated automatically.
- Docs — A new Prompt Caching guide covers how caching works across Anthropic, OpenAI, DeepSeek, and Gemini, with code examples and a supported-models matrix.
GlooCode v1.6 — Web UI & More
Four GlooCode releases shipped this week (v1.5.3 → v1.6.2), a significant milestone for the product.- Web UI — GlooCode now includes a built-in web interface with 30+ themes, file diffing, session management, and support for 18 languages.
- Faster, more reliable updates —
gloocode updatenow performs direct binary swaps, andinstall.shalways performs a fresh install. - New built-in agent optimizations — Better and more intelligent default agent models.
- More reliable releases — Hardened testing and guaranteed-fresh installers on every release.
Platform Security & Trust
- Org-level access controls — Administrative tooling to immediately revoke platform access at the organization level.
- GDPR right to erasure — Enterprise customers can request full deletion of user data across the platform, along with data export for subject-access requests.
Data Engine
- Cache cost reporting — Usage reporting and CSV exports now break spend down by cache segment.
- Performance under load — Improved reliability and throughput at scale.
Billing
- Enterprise billing flexibility — Enterprise admins can now switch to pay-as-you-go billing.
- Automatic subscription hygiene — Downgrades automatically clean up associated subscriptions.
Week of May 26, 2026
11 New AI Models
- Claude Opus 4.8 — Anthropic’s latest, available across all providers with fast-mode support
- GPT-5.3 Codex — OpenAI’s reasoning-focused coding model with 400K context
- Claude Opus 4.7 Fast & 4.6 Fast — Fast-output Opus variants for speed-sensitive workloads
- Qwen 3.5 Plus, Qwen 3.5 27B, Qwen 3 235B A22B variants — Expanded Qwen family, including thinking/reasoning variants
- GLM-4.7 Flash — Z-AI’s fast inference model
- MiMo V2 Flash & V2.5 — Xiaomi’s new MoE models optimized for reasoning, coding, and agentic tasks
- Gemini 3.1 Flash Lite — promoted from preview to GA
Studio
- Spend visibility for all orgs — Every organization can now see its spend breakdown on the Usage page.
- Enterprise → pay-as-you-go — Enterprise admins can switch to flexible billing directly from the Billing or Plans page.
- Smarter usage charts — Date labels respect your timezone, and charts clearly indicate hourly, daily, weekly, or monthly granularity.
- Improved GlooCode usage filters — Faster, more reliable filter dropdowns.
Completions
- Prompt caching (beta) — Now works consistently across all completion types, with cache hit rates and token counts in usage reporting.
- More reliable streaming — Improved stream stability for reasoning models, including during long thinking pauses.
- Smarter guardrails — Tuned content filtering reduces unnecessary blocks, with graduated warnings replacing hard blocks in more cases.
- Clearer errors — Provider issues now surface as clear, actionable errors.
GlooCode
- v1.5.0 → v1.5.2 — GlooCode now runs on an improved background service for more reliable sessions; existing installations upgrade via
gloocode update. - More reliable releases — Hardened release process and installer delivery.
Data Engine
- Enterprise self-serve billing — Enterprises can transition billing directly in Studio.
- Timezone-aware metrics — Usage reporting now supports timezone-aware grouping with correct daylight-saving handling.
- Ingestion refinements — Improved YouTube ingestion consistency for publisher metadata and status tracking.
Documentation
- Restructured and refreshed console documentation and developer reference for better discoverability.
Week of May 18, 2026
9 New Models in the Catalog
- Gemini 3.5 Flash — 1M context, major coding gains at high speed
- Gemma 4 31B — Google’s most capable open-weight model, with vision
- GPT OSS 20B — OpenAI’s open-weight MoE model for consumer hardware
- Ministral 3B / 8B / 14B — Mistral’s edge-to-server lineup
- Qwen 3.7 Max — 1M context for agentic workloads
- Qwen 3.5 397B and 122B — Large MoE models
- Gemini 3.1 Flash Lite — promoted to GA
GlooCode v1.5.0 — Sub-Agents & Persistent Sessions
GlooCode is Gloo’s autonomous AI coding agent — it writes, tests, and ships code while intelligently optimizing for cost and quality.- Improved background service — More reliable sessions and sign-in.
- Explore sub-agent — Fast codebase context-gathering for more relevant results.
- Session persistence — Sessions persist through long, context-heavy work.
- Org-level usage analytics — Workspace, user, project, and branch tracking, so organizations can see exactly who’s using what.
- Studio integration — New GlooCode product and get-started pages with a cost comparison calculator and agent overview, plus a usage analytics dashboard with user/branch filters and cache hit rate metrics.
Studio
- New Data Engine pages — Dedicated product pages for ingestion, search, enrichment, and a hub overview, each with polished motion design and market context.
- Community section — A new Summer Challenge page with toolkit cards for YouVersion and Gloo AI APIs, judging criteria, and a responsive timeline.
Enterprise Billing Overhaul
- Dedicated enterprise billing view with Pro upgrade path
- Billing history scoped to org admins
- Per-cache-segment cost breakdowns and per-model output limits across all providers
Reliability & Quality
Continued investment in platform resilience: more stable streaming for reasoning models, more reliable spend-limit handling, hardened content enrichment, clearer provider-error reporting, structured-output validation for applicable models, and faster recovery during provider capacity events. Studio also received fixes for mobile layout, usage filters, and billing views.April 21 – May 15, 2026
New Features
- GlooCode v1.0 — Gloo’s autonomous AI coding agent launched as a single native binary for macOS and Linux. Its specialized agent roles (plan, build, explore, review, QA) are each optimized for cost and quality.
- Citations — Grounded completions now return source citations including title, URL, author, publisher, date, and content snippets.
- 20+ new models — Including GPT-5.5, GPT-5.5 Pro, GPT-5.1, Claude Opus 4.1, DeepSeek V4 Flash, DeepSeek V4 Pro, DeepSeek R1-0528, Devstral 2, Qwen 3.6 Max Preview (1M context), Kimi K2.6 (vision), and others.
- Model Details pages — Every model in the catalog now has a dedicated Studio page with specs, pricing comparison, modalities, and capability flags.
- Enterprise provisioning — Enterprise accounts are provisioned and activated automatically, including plan changes and cancellations.
- Usage page model breakdown — New “By Capability” and “By Model” views with human-readable names, cross-provider merging, stacked input/output bars, and per-credential filtering.
- Multilingual safety messaging — French and Dutch support for safety responses and corrections; refusal language now respects the user’s language.
- Prompt caching — Automatic across supported providers, with full cache-hit visibility.
Bug Fixes
- Completions streaming responses are more resilient, with retries for empty responses
- Tool-call-only streams are no longer misclassified
- Spend-limit responses return clear status and retry guidance
- Structured output is now forwarded consistently for applicable models
- Model refusals now surface explicitly so callers can handle them programmatically
- Rate-limit handling improved for long-standing customer accounts
Improvements
- Significant streaming latency reductions across the completions pipeline
- All error codes enriched with name, category, and description
- Studio organization administration surfaces revamped with detail pages, stat cards, search, sorting, and inline editing
- Polished page transitions across Studio
February 11 – April 20, 2026
New Features
- Multi-provider routing — Integrations with Anthropic, OpenAI, Google Gemini, DeepSeek, and AWS Bedrock, with automatic health-based failover that shifts traffic when a provider degrades and seamlessly continues responses on a backup — without the caller ever seeing the failure.
- 13 new models — Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.7 (available same day as Anthropic’s release), GPT-5.2 Pro, GPT-5.4, GPT-4.1, GPT-4.1-mini, GPT-5-Nano, Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1, and Llama Guard 4 12B.
- Public model catalog — Pricing, speed ratings, and capability flags (tools, streaming, reasoning, vision) for all available models are now publicly accessible.
- Vision support — Image inputs across all providers.
- Structured error responses — Every error includes an error code, fault attribution (client/provider/platform), and a retryable flag.
- Trace IDs — Every response includes a trace ID for request correlation.
- Self-serve Pro tier — Plan selection, upgrades, promo codes, and payment management entirely in-product. A developer can discover, evaluate, pay, and build without a sales conversation.
- Spending limits — Per-organization budget caps with a contact-sales path for over-cap requests.
- In-product support — A Support button in the Studio sidebar opens a chat with Fin, Gloo’s AI support agent.
Bug Fixes
- OpenAI tool conversion issue resolved
- Content updates appear in search and grounded completions much faster
- Organization auto-activation extended to all new organizations
- Prayer guardrail detection improved
Improvements
- Major latency reductions across the request pipeline
- Streaming reliability upgrades — faster first responses and more robust mid-stream handling
- YouTube ingestion overhauled with automated reliability monitoring
- Content enrichment hardened with automatic retries
- Studio Content Library rebuilt — instant loads, with live updates for in-progress items
January 25 – February 10, 2026
New Features
- Spending limits — Weekly budget caps with retry guidance when limits are reached
- New models — Claude Sonnet 4, Llama 4 Maverick, and DeepSeek R1 added to the catalog
- Smarter guardrails — Edge cases now receive contextual guidance toward appropriate alternatives instead of hard blocks
- Batch file upload — Upload multiple files to Data Engine in a single request
- Performance tracking — Response-time measurement to drive latency improvements
- Payment method gating — Payment method required before platform access is granted
- Per-organization billing — Spending controls and billing attribution scoped to individual organizations
Bug Fixes
- YouTube transcription failures resolved across multiple scenarios including duplicate detection, live streams, playlists, and timeouts
- Batch ingestion now reports per-item errors with full transparency
Improvements
- Better handling of large document batches
- Smarter model routing for cost and performance
January 18 – 24, 2026
New Features
- Enterprise organization controls — Plan-based search access and admin controls for publisher settings
- Payment method onboarding — Self-serve customers add payment methods during signup, with the flow adapting to organization type
- JSON file uploads — JSON files can now be uploaded and processed in Data Engine
- Sitemap fallback — Sites with broken sitemaps now fall back to direct page crawling instead of failing
Bug Fixes
- Hardened input validation in Data Engine ingestion
- Search failures resolved for publishers without enrichment configured
- Content upload failures for new publishers resolved
- Content Library progress indicators fixed
- Access denied errors for customers with valid credentials resolved
- Prayer response guidelines for Faith Assistant now trigger reliably
Improvements
- AI model upgrade in Faith Assistant delivers better responses at lower cost
- Full light mode support across Studio
- User role badge added to the profile dropdown
- Long API keys display correctly in the Studio API Key modal
January 11 – 17, 2026
New Features
- Streamlined billing onboarding — Connecting a payment account now provisions metering automatically, with optional pay-as-you-go enrollment, so organizations activate without manual setup
- Automatic organization activation — Organizations activate automatically when onboarding completes
- Dynamic search routing — Completions automatically route to the correct data collection based on publisher configuration
- Context window error guidance — Requests exceeding a model’s context limit now return the limit, the affected model, and suggested higher-capacity alternatives
Bug Fixes
- Content enrichment status now reflects actual completion state
- 5 broken models removed from the catalog
Improvements
- Guardrail accuracy improved for theological content

