Handling Streaming Failures

When you stream Completions V2 responses (stream: true), failures can arrive after the HTTP 200 text/event-stream response has already started. By then your application may have rendered partial output to the user, so a naive “just retry the request” strategy can erase visible text, duplicate it, or silently replace one answer with a different one. This guide covers how to:

classify stream-time failures using the structured error object
choose between preserving partial output, bounded retry, application-level continuation, and surfacing an interruption
apply concrete retry ceilings and backoff
avoid duplicating tool-call side effects
log the right fields for support and observability

The reference implementations are in Python and TypeScript. Building in Go, Java, PHP, or another language? The language-neutral streaming error object table and decision tree below are the source of truth — port the same logic.

Streaming Failure Phases

A streamed completion can end abnormally in four distinct ways. Handle each differently:

Phase	What you observe	Handling
Pre-stream HTTP error	Non-2xx HTTP response before any SSE bytes	Standard HTTP error handling — see Errors
Stream error (Signal A)	SSE event with `choices[0].finish_reason: "error"` and a structured `error` object	The retry/continuation logic in this guide
Transport disconnect	The connection drops with no structured SSE error	Treat as plausibly transient; same decision tree as Signal A
Content filter (Signal B)	SSE event with `finish_reason: "content_filter"` and no `error` object	Terminal. Surface to the user; never retry or continue
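The table can be turned into a small classifier. This is a minimal Python sketch that assumes you have already captured three things: the HTTP status, the last parsed SSE event as a dict, and any transport exception message. `Phase` and `classify_phase` are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Hypothetical labels for the four rows of the phase table."""

    PRE_STREAM_HTTP = "pre_stream_http"
    STREAM_ERROR = "stream_error"                  # Signal A
    TRANSPORT_DISCONNECT = "transport_disconnect"
    CONTENT_FILTER = "content_filter"              # Signal B


def classify_phase(
    http_status: Optional[int],
    last_event: Optional[dict],
    transport_error: Optional[str],
) -> Optional[Phase]:
    """Return the phase a stream ended in, or None for a normal completion."""
    if http_status is not None and not 200 <= http_status < 300:
        return Phase.PRE_STREAM_HTTP               # no SSE bytes were sent
    choice = ((last_event or {}).get("choices") or [{}])[0]
    finish_reason = choice.get("finish_reason")
    if finish_reason == "content_filter":
        return Phase.CONTENT_FILTER                # terminal; carries no error object
    if finish_reason == "error":
        return Phase.STREAM_ERROR                  # branch on the structured error object
    if transport_error is None and finish_reason in ("stop", "length", "tool_calls"):
        return None
    # Dropped connection, or the stream closed without any finish_reason.
    return Phase.TRANSPORT_DISCONNECT
```

Only the Signal A and transport-disconnect results should feed the retry and continuation logic. The other two phases are routed elsewhere: standard HTTP handling for pre-stream errors, and surfacing the stop to the user for the content filter.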

Two distinct stream-time signals

Split detection into two signals so your client never misclassifies a content-filter stop as a clean early completion — or as a retryable error:

Signal A — structured stream error. choices[0].finish_reason is "error" and the event carries the structured error object described below. This is the only path that retry and continuation logic applies to.
Signal B — content-filter termination. choices[0].finish_reason is "content_filter". The event carries no error object. It is terminal and non-retryable: surface it to the user, and never retry or continue automatically.

The streaming error object

A Signal A event includes a structured error object with nine fields, plus a top-level trace_id on the SSE chunk itself. This is the canonical reference — every section below branches on these fields.

{
  "id": "chatcmpl-error-abc123",
  "object": "chat.completion.chunk",
  "created": 1769792268,
  "model": "gloo-anthropic-claude-sonnet-4.6",
  "choices": [
    {
      "delta": { "content": null, "role": null },
      "finish_reason": "error",
      "index": 0
    }
  ],
  "error": {
    "message": "An unexpected provider error occurred mid-stream.",
    "type": "internal_error",
    "code": 3001,
    "name": "INTERNAL_ERROR",
    "category": "provider_error",
    "description": "An unexpected provider error occurred mid-stream.",
    "fault": "provider",
    "retryable": true,
    "trace_id": "trace-id"
  },
  "trace_id": "trace-id"
}

Field	Branch on it?	Notes
`error.retryable`	Yes — primary retry signal	Whether retrying the same request may succeed
`error.fault`	Yes — secondary	Responsible party: `client`, `provider`, or `internal`
`error.code`	Yes — classification after `retryable`/`fault`	Numeric internal code, such as `3001`
`error.name`	Yes — classification	Symbolic name, such as `INTERNAL_ERROR`
`error.type`	No — observability/display only	Stable wire-format error type
`error.category`	No — observability/display only	`client_error`, `provider_error`, or `platform_error`
`error.description`	No — display only	Stable explanation of the error code
`error.message`	No — display only	Request-specific customer-facing message
`error.trace_id`	No — observability	See trace ID precedence
top-level `trace_id`	No — observability	Present on every SSE chunk; see trace ID precedence
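A small typed wrapper is one way to keep the branching fields separate from the display-only ones. This is a sketch: `StreamErrorInfo` and `parse_stream_error` are hypothetical names. Defaulting a missing `retryable` to `False` is a conservative assumption on our part, not documented server behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamErrorInfo:
    """Hypothetical typed view of the nine-field `error` object."""

    # Fields you branch on.
    retryable: bool
    fault: str
    code: int
    name: str
    # Fields for observability and display only.
    type: str
    category: str
    description: str
    message: str
    trace_id: Optional[str]


def parse_stream_error(chunk: dict) -> Optional[StreamErrorInfo]:
    """Extract the error object from a parsed SSE chunk, or None if absent."""
    err = chunk.get("error")
    if not err:
        return None
    return StreamErrorInfo(
        # A missing flag means "not retryable", so a malformed object
        # can never start a retry loop.
        retryable=bool(err.get("retryable", False)),
        fault=str(err.get("fault", "")),
        code=int(err.get("code", 0)),
        name=str(err.get("name", "")),
        type=str(err.get("type", "")),
        category=str(err.get("category", "")),
        description=str(err.get("description", "")),
        message=str(err.get("message", "")),
        # Error-level trace ID first, then the chunk's top-level one.
        trace_id=err.get("trace_id") or chunk.get("trace_id"),
    )
```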

Read the Whole Error Object

The same error.code can represent different situations, so never branch on code alone. 3001 / INTERNAL_ERROR is the clearest example:

A retryable stream-time provider anomaly surfaces as code: 3001, name: "INTERNAL_ERROR", fault: "provider", retryable: true.
A platform-internal 3001 surfaces with fault: "internal" and retryable: false.

Branch on error.retryable first, then use fault, code, and name for classification and observability:

error.retryable — should this request be retried at all?
error.fault — was it a provider issue or internal to the platform?
error.code / error.name — what exactly happened (for logs, metrics, alerts)?

Treating every 3001 / INTERNAL_ERROR as non-retryable will drop recoverable streams; treating every 3001 as retryable will loop on genuine platform errors. Read error.retryable.
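The rule fits in a few lines. In this sketch, which uses a hypothetical `decide_on_error` helper, only `retryable` decides whether to retry. The `fault`, `code`, and `name` fields only build a label for logs and metrics, so the two `3001` variants described above get opposite decisions.

```python
from typing import NamedTuple


class ErrorDecision(NamedTuple):
    retry: bool   # decided by error.retryable alone
    label: str    # fault/code/name, for logs, metrics, and alerts


def decide_on_error(error: dict) -> ErrorDecision:
    """Branch on `retryable` first; fault, code, and name only classify."""
    # Treat a missing or non-boolean flag as "do not retry".
    retry = error.get("retryable") is True
    label = f'{error.get("fault", "unknown")}:{error.get("code")}:{error.get("name")}'
    return ErrorDecision(retry=retry, label=label)


provider_3001 = {"code": 3001, "name": "INTERNAL_ERROR", "fault": "provider", "retryable": True}
internal_3001 = {"code": 3001, "name": "INTERNAL_ERROR", "fault": "internal", "retryable": False}
print(decide_on_error(provider_3001))  # ErrorDecision(retry=True, label='provider:3001:INTERNAL_ERROR')
print(decide_on_error(internal_3001))  # ErrorDecision(retry=False, label='internal:3001:INTERNAL_ERROR')
```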

Decision Tree

Two independent axes govern recovery — keep them separate so “attempt” is never ambiguous:

Which operation? Driven by whether content was already shown to the user.
- Not shown → full retry (a fresh stream).
- Shown → Continue (application-level continuation — never a full retry, which would force erasing or duplicating visible text). Continuation is exclusive to the post-content phase.
How many attempts? Driven by context (ceilings below). Always use full-jitter backoff; always stop early on retryable: false.

The flow:

Stream fails before any visible content
- Retry only when error.retryable is true, or the failure is a transport error that is plausibly transient.
- Ceiling: 2 full-retry attempts for live UX.
Stream fails after visible content (post-content recovery)
- Preserve the partial output. Never silently replace the visible answer.
- If a side-effecting tool call was emitted or executed this turn, skip the automatic attempt (see Tool Calls and Side Effects) and surface the interruption directly.
- Otherwise attempt exactly one automatic Continue (continuation).
- If the automatic Continue fails or was skipped, mark the answer interrupted and surface a user-initiated Continue (primary, non-destructive) and an explicitly labelled, destructive Try again (full retry, secondary).
- “At most one” means one automatic attempt — never a silent loop. User-initiated actions are user-governed and don’t count against the automatic budget.
- For high-stakes, factual, or doctrinal answers, steer away from automatic Continue toward Try again or explicit user review (see Application-Level Continuation).
Background/batch workflow
- Up to 3 full-retry attempts with jittered exponential backoff and an overall job timeout. Partial output can be discarded safely because nothing was shown to a user.
error.retryable is false
- Do not retry unchanged. Surface the error, change the request or model, or escalate with the trace ID.
finish_reason is "content_filter" (Signal B)
- Terminal and non-retryable. Surface it; never retry or continue.
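The decision tree can also be written as a pure function that your stream loop calls after each attempt. This sketch makes a few assumptions. The caller passes `retryable=True` for a plausibly transient transport drop. `side_effect_tool_call` comes from your own per-turn tool bookkeeping. `Action` and `next_action` are hypothetical names.

```python
from enum import Enum
from typing import Optional

LIVE_FULL_RETRIES = 2
BACKGROUND_FULL_RETRIES = 3


class Action(Enum):
    COMPLETE = "complete"
    FULL_RETRY = "full_retry"
    AUTO_CONTINUE = "auto_continue"
    SURFACE_INTERRUPTION = "surface_interruption"   # show user Continue / Try again
    SURFACE_ERROR = "surface_error"                 # stop; change request or escalate
    SURFACE_CONTENT_FILTER = "surface_content_filter"


def next_action(
    *,
    finish_reason: Optional[str],
    retryable: bool,
    content_shown: bool,
    side_effect_tool_call: bool,
    background: bool,
    full_retries_used: int,
    auto_continue_used: bool,
) -> Action:
    """Map one finished or failed stream attempt to the next step."""
    if finish_reason in ("stop", "length", "tool_calls"):
        return Action.COMPLETE
    if finish_reason == "content_filter":
        return Action.SURFACE_CONTENT_FILTER    # terminal: never retry or continue
    if not retryable:
        return Action.SURFACE_ERROR             # do not retry unchanged
    if content_shown and not background:
        # Post-content recovery: at most one automatic Continue, and none at
        # all if a side-effecting tool call fired in this turn.
        if side_effect_tool_call or auto_continue_used:
            return Action.SURFACE_INTERRUPTION
        return Action.AUTO_CONTINUE
    ceiling = BACKGROUND_FULL_RETRIES if background else LIVE_FULL_RETRIES
    if full_retries_used >= ceiling:
        return Action.SURFACE_ERROR
    return Action.FULL_RETRY
```

Keeping the policy in one side-effect-free function makes the ceilings easy to unit-test, separately from any networking code.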

Application-Level Continuation

Completions V2 does not expose true stream resumption — there is no cursor, offset, replay token, or resume endpoint. What your application can do instead is application-level continuation: a new request that includes the partial output as context and asks the model to continue. Send the partial output inside a new user message:

{
  "messages": [
    {
      "role": "user",
      "content": "Explain Romans 8 in simple terms."
    },
    {
      "role": "user",
      "content": "The previous streamed answer was interrupted after this text:\n\nRomans 8 teaches that...\n\nContinue from that point without repeating the text above."
    }
  ],
  "auto_routing": true,
  "stream": true
}

Do not end the continuation request with an assistant message containing the partial text — final assistant turns are not supported uniformly across models and routing modes, while the user-message pattern works everywhere.

Continuation can drift. A continuation is a new request: it can repeat, drift, or complete differently from the failed stream. Because Continue appends with no visual seam, the consumer reads the result as one continuous answer. Two consequences for you as the developer:

High-stakes, factual, or doctrinal content: avoid automatic Continue. Prefer Try again (a clean replacement) or explicit user review, since drift can introduce unflagged errors mid-answer.
The prompt contract is best-effort. Instructing the model to “continue without repeating” usually works, but your application should still de-duplicate overlap at the seam.
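Here is a minimal sketch of seam de-duplication. It assumes the overlap is an exact textual repeat, which is what happens when the model re-emits the last words it was shown. `merge_at_seam` and the 8-character `min_overlap` threshold are illustrative choices, not documented behavior. A paraphrased repeat will not be caught.

```python
def merge_at_seam(partial: str, continuation: str, min_overlap: int = 8) -> str:
    """Append `continuation` to `partial`, dropping text the model repeated.

    Finds the longest suffix of `partial` that is also a prefix of
    `continuation` and keeps a single copy. Overlaps shorter than
    `min_overlap` characters are ignored, so a coincidental match such as
    one space or one letter is not trimmed.
    """
    max_len = min(len(partial), len(continuation))
    for size in range(max_len, min_overlap - 1, -1):
        if partial.endswith(continuation[:size]):
            return partial + continuation[size:]
    return partial + continuation


print(merge_at_seam("Romans 8 teaches that believers",
                    "teaches that believers are not condemned."))
# Romans 8 teaches that believers are not condemned.
```

In a live UI, buffer the first few dozen characters of the continuation before you render them. That way, trimmed overlap never flashes on screen.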

Retry Budgets and Backoff

Recommended defaults:

Context	Operation	Ceiling
Live, no visible content yet	Full retry	2 attempts
Live, visible partial content	Automatic Continue (continuation)	1 attempt, then user-initiated controls
Background/batch	Full retry	3 attempts

Use full-jitter exponential backoff between attempts.
Cap live UX delays aggressively (a couple of seconds at most); use a broader cap for background jobs.
Stop early the moment an error reports retryable: false.
Do not blindly retry five times for live streams — generic rate-limit retry examples are not a live-streaming UX policy.

Both reference implementations use the same full-jitter formula:

import random

def full_jitter_delay(attempt: int, base: float = 0.5, cap: float = 2.0) -> float:
    """Full-jitter backoff: random() * min(cap, base * 2**attempt)."""
    return random.random() * min(cap, base * (2**attempt))

/** Full-jitter backoff: random() * min(cap, base * 2**attempt). */
export function fullJitterDelayMs(attempt: number, baseMs = 500, capMs = 2_000): number {
  return Math.random() * Math.min(capMs, baseMs * 2 ** attempt);
}
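To check what these ceilings mean in wall-clock time, sum the upper bound of each jittered sleep. The sketch below does that with the defaults shown above (`base` 0.5 s, live cap 2 s, background cap 30 s). `max_delay` and `worst_case_total_wait` are hypothetical helper names.

```python
def max_delay(attempt: int, base: float = 0.5, cap: float = 2.0) -> float:
    """Upper bound of the full-jitter delay for one attempt index."""
    return min(cap, base * (2**attempt))


def worst_case_total_wait(retries: int, base: float = 0.5, cap: float = 2.0) -> float:
    """Sum of upper bounds over `retries` backoff sleeps (attempts 0..retries-1)."""
    return sum(max_delay(a, base, cap) for a in range(retries))


# Live: 2 retries; sleeps are drawn from [0, 0.5) and [0, 1.0).
print(worst_case_total_wait(2))            # 1.5
# Background: 3 retries; sleeps are drawn from [0, 0.5), [0, 1.0), [0, 2.0).
print(worst_case_total_wait(3, cap=30.0))  # 3.5
```

With these defaults, neither cap is reached within the recommended retry budgets. The live cap would first take effect on a fourth attempt (`max_delay(3)` is 2.0 rather than 4.0). Backoff therefore adds at most about 1.5 s to a live request, plus the time spent on the attempts themselves.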

Rate Limits

Rate limits are a separate concern from generic retryable stream failures:

An HTTP 429 before the stream starts is a normal rate-limit response. Respect Retry-After or X-RateLimit-Reset when the response actually includes those headers, and back off with jitter. See Rate Limits.
A rate limit that occurs after the stream has started (for example, an upstream provider limit) arrives as a Signal A stream error such as 2004 / RATE_LIMIT. Handle it through the normalized SSE error object — stream-time SSE error events do not carry Retry-After, X-RateLimit-Reset, or provider rate-limit headers.
Rate-limit retries should use jittered backoff and must not run indefinitely.
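For the pre-stream 429 case, the following is a sketch of honoring Retry-After while keeping the wait bounded. `rate_limit_delay` is a hypothetical helper. It parses only Retry-After, in its two standard HTTP forms (delta-seconds or HTTP-date). This page does not specify the value format of X-RateLimit-Reset, so the sketch deliberately leaves that header alone rather than guess. If the server asks for longer than `cap`, the helper returns `None` so a live UI can surface the limit instead of stalling.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _jitter(attempt: int, base: float, cap: float) -> float:
    return random.random() * min(cap, base * (2**attempt))


def rate_limit_delay(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 0.5,
    cap: float = 2.0,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Seconds to wait before retrying a pre-stream HTTP 429, or None to give up."""
    raw = headers.get("Retry-After")
    if raw is None:
        return _jitter(attempt, base, cap)       # no hint: plain full jitter
    raw = raw.strip()
    if raw.isdigit():
        wait = float(raw)                        # delta-seconds form
    else:
        try:
            when = parsedate_to_datetime(raw)    # HTTP-date form
        except (TypeError, ValueError, IndexError):
            return _jitter(attempt, base, cap)   # unparseable header
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are UTC
        wait = max(0.0, (when - (now or datetime.now(timezone.utc))).total_seconds())
    if wait > cap:
        return None                              # longer than this context tolerates
    # Small jitter on top avoids lockstep retries; never earlier than asked, never past cap.
    return min(cap, wait + random.random() * base)
```

A `requests` response's `headers` object is case-insensitive, so it can be passed in directly. For background jobs, pass a larger `cap`.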

Partial Output UX

When recovery involves text a user has already seen:

Keep partial text visible. Never make displayed output disappear silently.
Mark the answer as interrupted if recovery fails, so the user knows it is incomplete.
Offer user-visible controls where appropriate: Continue (primary, appends) and Try again (secondary, explicitly labelled as replacing the current answer).
If full retry is chosen, make the replacement explicit — the user should understand the visible answer is being replaced, not extended.
Retain enough application state to know whether any content was displayed; that single flag drives the choice between full retry and continuation.
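The rules above imply only a small amount of UI state. This sketch models it with a hypothetical `AnswerView` class. The `content_displayed` flag is the one that selects continuation over full retry. The control labels are placeholders for whatever your design system uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnswerView:
    """Hypothetical UI state for one streamed answer."""

    text: str = ""
    content_displayed: bool = False   # drives full retry vs. continuation
    interrupted: bool = False

    def append(self, delta: str) -> None:
        """Render streamed text; Continue output also arrives through here."""
        if delta:
            self.text += delta
            self.content_displayed = True

    def mark_interrupted(self) -> None:
        """Recovery failed: keep the partial text and flag it as incomplete."""
        self.interrupted = True

    def controls(self) -> List[str]:
        """User-initiated actions to render beside an interrupted answer."""
        if not self.interrupted:
            return []
        # Primary action appends; secondary action says plainly that it replaces.
        return ["Continue", "Try again (replaces this answer)"]

    def replace(self, new_text: str) -> None:
        """Try again: an explicit, user-initiated replacement, never a silent one."""
        self.text = new_text
        self.content_displayed = bool(new_text)
        self.interrupted = False
```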

Tool Calls and Side Effects

Full retries and continuations can duplicate side effects if the model already emitted a tool call and your application executed it — sending an email twice, charging a card twice.

Use idempotency keys or operation IDs for tool execution, so a replayed call can be detected and dropped.
Record, per turn, whether a tool call was emitted and whether it was executed, before any retry decision.
Do not automatically replay side-effecting tool calls unless your application can prove the prior attempt did not execute.
If a side-effecting tool call fired in the interrupted turn, skip the automatic Continue and surface the interruption — prefer fail-fast or explicit user confirmation over silent recovery.
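One way to implement the idempotency rule is to key each execution by an operation ID and record the result. The sketch below makes several assumptions. `IdempotentToolRunner` is hypothetical. The ID is derived from your own turn ID, the tool name, and the canonicalized arguments; if your tool calls already carry stable IDs, use those instead. An in-memory dict stands in for what would need to be a durable store.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentToolRunner:
    """Hypothetical wrapper that runs each side-effecting tool call at most once."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}   # production: a durable store

    @staticmethod
    def operation_id(turn_id: str, name: str, arguments: Dict[str, Any]) -> str:
        # Canonical JSON so argument order does not change the ID.
        canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{turn_id}:{name}:{canonical}".encode()).hexdigest()

    def executed(self, turn_id: str, name: str, arguments: Dict[str, Any]) -> bool:
        """True if this exact call already ran; check before any retry decision."""
        return self.operation_id(turn_id, name, arguments) in self._results

    def run(self, turn_id: str, name: str, arguments: Dict[str, Any],
            fn: Callable[..., Any]) -> Any:
        op_id = self.operation_id(turn_id, name, arguments)
        if op_id in self._results:
            return self._results[op_id]      # replay detected: return, do not re-execute
        result = fn(**arguments)
        self._results[op_id] = result        # record the execution for later checks
        return result
```

This sketch has two limits. First, two legitimately identical calls within one turn collapse into one. Second, a crash after `fn` runs but before the result is recorded would still allow a replay. Where the downstream API accepts its own idempotency key, forward the operation ID to it as well.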

Observability Checklist

Log the full streaming error object, including the non-branching fields error.type, error.category, and error.description, even though retry decisions only use retryable, fault, code, and name.

Trace IDs. Log one normalized field named trace_id, with this precedence:

event.error?.trace_id ?? event.trace_id ?? response.headers.get("x-sentry-trace-id")

If event.error.trace_id and the top-level event.trace_id both exist and differ, log both and include both in support requests.

Recommended fields per failed or recovered stream:

normalized trace_id (plus the secondary trace ID on mismatch)
error.code, error.name, error.type, error.fault, error.retryable
selected model (or requested model) and routing mode
whether content was displayed, and the partial output length
attempt number and backoff delay
whether continuation or full retry was used
whether tool calls were emitted or executed
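The checklist maps naturally onto one flat, queryable log record. This is a sketch with hypothetical names (`stream_log_record`, `log_stream_failure`, and the logger name). It assumes the chunk's `model` field reflects the model that served the request, and it falls back to the requested model otherwise.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("completions.streaming")  # hypothetical logger name

ERROR_FIELDS = ("code", "name", "type", "category", "description", "message", "fault", "retryable")


def stream_log_record(
    *,
    event: Optional[Dict[str, Any]],
    header_trace_id: Optional[str],
    requested_model: str,
    routing_mode: str,
    content_displayed: bool,
    partial_length: int,
    attempt: int,
    backoff_delay_s: float,
    recovery: str,             # "none", "continuation", or "full_retry"
    tool_call_emitted: bool,
    tool_call_executed: bool,
) -> Dict[str, Any]:
    """Flatten the checklist fields for one failed or recovered stream."""
    event = event or {}
    error = event.get("error") or {}
    error_trace, top_trace = error.get("trace_id"), event.get("trace_id")
    record: Dict[str, Any] = {
        # Normalized trace ID: error-level, then top-level, then response header.
        "trace_id": error_trace or top_trace or header_trace_id,
        # The whole error object, including fields that never drive retries.
        **{f"error_{key}": error.get(key) for key in ERROR_FIELDS},
        "model": event.get("model") or requested_model,
        "routing_mode": routing_mode,
        "content_displayed": content_displayed,
        "partial_length": partial_length,
        "attempt": attempt,
        "backoff_delay_s": round(backoff_delay_s, 3),
        "recovery": recovery,
        "tool_call_emitted": tool_call_emitted,
        "tool_call_executed": tool_call_executed,
    }
    if error_trace and top_trace and error_trace != top_trace:
        record["secondary_trace_id"] = top_trace   # log both; send both to support
    return record


def log_stream_failure(record: Dict[str, Any]) -> None:
    # One JSON line per stream keeps the record easy to search and aggregate.
    logger.warning("stream_recovery %s", json.dumps(record, sort_keys=True))
```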

Summary Table

Condition	Recommended action	Retry ceiling	UX note
Pre-stream retryable error (`408`, `429`, `5xx`)	Full retry with backoff	2 (live)	Nothing rendered yet; retry is invisible
Stream error before first token	Full retry if `retryable: true`	2 full retries	Show a loading state, not an error, until the budget is spent
Stream error after partial output	One automatic Continue, then user controls	1 automatic continuation	Preserve partial text; mark interrupted on failure
Rate limit (HTTP 429 pre-stream)	Back off; honor `Retry-After`/`X-RateLimit-Reset` if present	Bounded, jittered	Consider a “busy, retrying” indicator
Rate limit (stream event `2004`)	Treat as Signal A via `error` object	Same as stream phase	No retry headers exist on SSE errors
Content filter (`finish_reason: "content_filter"`)	Surface; never retry or continue	0	Terminal signal, not an error object
Non-retryable error (`retryable: false`)	Stop; surface, change request/model, or escalate	0	Include `trace_id` in support requests
Side-effecting tool workflow	Skip automatic recovery; fail fast or confirm	0 automatic	Require explicit user action
Background/batch workflow	Full retry with backoff and job timeout	3 full retries	Partial output can be discarded safely
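The background/batch row pairs full retries with an overall job timeout. The following is a generic sketch of that pairing. `run_background_job` is a hypothetical helper. `attempt_fn` is assumed to run one complete stream and return a result, to return `None` for a retryable failure, and to raise for a non-retryable one. The deadline only decides whether to start another attempt; the duration of each individual attempt is left to your HTTP client's timeout.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_background_job(
    attempt_fn: Callable[[], Optional[T]],
    *,
    max_full_retries: int = 3,
    job_timeout_s: float = 120.0,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Full retries with full-jitter backoff, bounded by an overall deadline."""
    deadline = clock() + job_timeout_s
    for retry in range(max_full_retries + 1):        # first attempt + retries
        result = attempt_fn()
        if result is not None:
            return result
        if retry == max_full_retries:
            break                                     # retry budget spent
        delay = random.random() * min(cap, base * (2**retry))
        if clock() + delay >= deadline:
            break                                     # next attempt would start too late
        sleep(delay)                                  # partial output is simply dropped
    return None
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting.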

Reference Implementations

The modules below implement the full decision tree: SSE parsing, the two signals, trace-ID normalization, retry ceilings, full-jitter backoff, the one-automatic-Continue rule, and the tool-call side-effect guard. Copy either one into your project as a small client module. Usage:

result = stream_with_recovery(
    "https://platform.ai.gloo.com/ai/v2/chat/completions",
    {"Authorization": f"Bearer {access_token}"},
    {
        "messages": [{"role": "user", "content": "Explain Romans 8 in simple terms."}],
        "auto_routing": True,
        "stream": True,
    },
    on_delta=lambda text: print(text, end="", flush=True),
)
# result.status is "complete", "content_filter", "interrupted", or "failed"
# result.text always contains everything safe to display

const result = await streamWithRecovery(
  "https://platform.ai.gloo.com/ai/v2/chat/completions",
  { Authorization: `Bearer ${accessToken}` },
  {
    messages: [{ role: "user", content: "Explain Romans 8 in simple terms." }],
    auto_routing: true,
    stream: true,
  },
  { onDelta: (text) => process.stdout.write(text) },
);
// result.status is "complete", "content_filter", "interrupted", or "failed"
// result.text always contains everything safe to display

Recovery:

# Standard library plus `requests`:

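"""Streaming recovery client for Gloo AI Completions V2."""

# Lets `str | None` and `list[dict]` annotations work on Python versions before 3.10.
from __future__ import annotations

import json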
import random
import time
from dataclasses import dataclass

import requests

LIVE_PRE_TOKEN_RETRIES = 2     # full-retry attempts before any content is visible
BACKGROUND_RETRIES = 3         # full-retry attempts for background/batch jobs
LIVE_BACKOFF_CAP_S = 2.0       # keep live UX delays short
BACKGROUND_BACKOFF_CAP_S = 30.0


def full_jitter_delay(attempt: int, base: float = 0.5, cap: float = LIVE_BACKOFF_CAP_S) -> float:
    """Full-jitter backoff: random() * min(cap, base * 2**attempt)."""
    return random.random() * min(cap, base * (2**attempt))


@dataclass
class StreamOutcome:
    """Everything one stream attempt tells you, including observability fields."""

    partial_text: str = ""
    received_content: bool = False
    tool_call_emitted: bool = False
    finish_reason: str | None = None
    error: dict | None = None
    trace_id: str | None = None
    secondary_trace_id: str | None = None  # set when error-level and top-level IDs differ
    transport_error: str | None = None
    http_status: int | None = None

    @property
    def completed(self) -> bool:
        return self.finish_reason in ("stop", "length", "tool_calls")

    @property
    def content_filtered(self) -> bool:
        return self.finish_reason == "content_filter"

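    @property
    def retryable(self) -> bool:
        if self.http_status is not None and self.http_status != 200:
            # Pre-stream HTTP error: only 408, 429, and 5xx are worth retrying.
            return self.http_status in (408, 429) or self.http_status >= 500
        if self.error is not None:
            # Structured stream error: the server's flag is authoritative.
            return bool(self.error.get("retryable"))
        # Bare transport drops are plausibly transient.
        return self.transport_error is not None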


@dataclass
class RecoveryResult:
    # "complete" | "content_filter" | "interrupted" | "failed"
    status: str
    text: str
    last_outcome: StreamOutcome
    used_continuation: bool = False
    full_retries_used: int = 0


def stream_once(url: str, headers: dict, body: dict, on_delta=None) -> StreamOutcome:
    """Run one streaming request and fold every SSE event into a StreamOutcome."""
    outcome = StreamOutcome()
    try:
        with requests.post(url, headers=headers, json=body, stream=True, timeout=120) as response:
            outcome.http_status = response.status_code
            if response.status_code != 200:
                # Pre-stream HTTP error: no SSE events will follow.
                outcome.transport_error = f"HTTP {response.status_code}"
                return outcome
            # Lowest-precedence trace source; SSE events override it below.
            outcome.trace_id = response.headers.get("x-sentry-trace-id")
            # SSE is UTF-8, but requests decodes text/* without a charset as
            # ISO-8859-1, which would garble non-ASCII deltas.
            response.encoding = "utf-8"
            for line in response.iter_lines(decode_unicode=True):
                if not line or not line.startswith("data:"):
                    continue
                payload = line[len("data:"):].strip()
                if payload == "[DONE]":
                    break
                _apply_event(outcome, json.loads(payload), on_delta)
                if outcome.finish_reason is not None:
                    break
    except (requests.RequestException, ValueError) as exc:
        # ValueError also covers a malformed SSE payload (json.JSONDecodeError).
        outcome.transport_error = str(exc)
    if outcome.finish_reason is None and outcome.error is None and outcome.transport_error is None:
        # The stream closed without a finish_reason: treat it as a transport drop.
        outcome.transport_error = "stream ended before a finish_reason"
    return outcome


def _apply_event(outcome: StreamOutcome, event: dict, on_delta=None) -> None:
    # trace_id precedence: error-level, then top-level, then response header.
    error_trace = (event.get("error") or {}).get("trace_id")
    top_trace = event.get("trace_id")
    outcome.trace_id = error_trace or top_trace or outcome.trace_id
    if error_trace and top_trace and error_trace != top_trace:
        outcome.secondary_trace_id = top_trace  # log both, send both to support

    choice = (event.get("choices") or [{}])[0]
    delta = choice.get("delta") or {}
    if delta.get("content"):
        outcome.partial_text += delta["content"]
        outcome.received_content = True
        if on_delta:
            on_delta(delta["content"])
    if delta.get("tool_calls") or delta.get("function_call"):
        outcome.tool_call_emitted = True
    if choice.get("finish_reason"):
        outcome.finish_reason = choice["finish_reason"]
    if event.get("error"):
        outcome.error = event["error"]


def continuation_messages(messages: list[dict], partial_text: str) -> list[dict]:
    """Provider-neutral continuation: partial output goes in a NEW user message."""
    return [
        *messages,
        {
            "role": "user",
            "content": (
                "The previous streamed answer was interrupted after this text:\n\n"
                f"{partial_text}\n\n"
                "Continue from that point without repeating the text above."
            ),
        },
    ]


def stream_with_recovery(
    url: str, headers: dict, body: dict, *, background: bool = False, on_delta=None
) -> RecoveryResult:
    """One streamed completion with bounded recovery.

    Live mode: 2 full retries before the first token; after visible content,
    exactly one automatic Continue, then surface user controls.
    Background mode: up to 3 full retries; partial output is discarded safely
    because nothing was shown to a user.
    """
    max_full_retries = BACKGROUND_RETRIES if background else LIVE_PRE_TOKEN_RETRIES
    backoff_cap = BACKGROUND_BACKOFF_CAP_S if background else LIVE_BACKOFF_CAP_S

    full_retries = 0
    while True:
        outcome = stream_once(url, headers, body, on_delta)

        if outcome.completed:
            return RecoveryResult("complete", outcome.partial_text, outcome,
                                  full_retries_used=full_retries)
        if outcome.content_filtered:
            # Terminal non-error signal: surface it; never retry or continue.
            return RecoveryResult("content_filter", outcome.partial_text, outcome,
                                  full_retries_used=full_retries)
        if not outcome.retryable:
            # retryable=false: do not retry unchanged.
            return RecoveryResult("failed", outcome.partial_text, outcome,
                                  full_retries_used=full_retries)

        if outcome.received_content and not background:
            # Content is on screen: a full retry would erase or duplicate it.
            return _recover_after_partial(url, headers, body, outcome, on_delta)

        if full_retries >= max_full_retries:
            return RecoveryResult("failed", outcome.partial_text, outcome,
                                  full_retries_used=full_retries)
        time.sleep(full_jitter_delay(full_retries, cap=backoff_cap))
        full_retries += 1


def _recover_after_partial(
    url: str, headers: dict, body: dict, first: StreamOutcome, on_delta=None
) -> RecoveryResult:
    """Post-content recovery: keep the partial text, try one automatic Continue."""
    if first.tool_call_emitted:
        # Side-effect guard: an automatic attempt could replay the tool call.
        # Mark the answer interrupted and hand control to the user.
        return RecoveryResult("interrupted", first.partial_text, first)

    time.sleep(full_jitter_delay(0))
    continuation_body = {**body, "messages": continuation_messages(body["messages"], first.partial_text)}
    second = stream_once(url, headers, continuation_body, on_delta)
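    if second.completed:
        return RecoveryResult("complete", first.partial_text + second.partial_text,
                              second, used_continuation=True)
    if second.content_filtered:
        # A filtered continuation is terminal: surface it, never offer Continue.
        return RecoveryResult("content_filter", first.partial_text + second.partial_text,
                              second, used_continuation=True)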

    # The one automatic Continue failed. Keep everything visible, mark the
    # answer interrupted, and surface user-initiated Continue / Try again.
    return RecoveryResult("interrupted", first.partial_text + second.partial_text,
                          second, used_continuation=True)

// Browser and Node 18+ compatible; no dependencies.
/** Streaming recovery client for Gloo AI Completions V2. */

export interface StreamError {
  message: string;
  type: string;
  code: number;
  name: string;
  category: string;
  description: string;
  fault: "client" | "provider" | "internal";
  retryable: boolean;
  trace_id?: string;
}

export interface StreamEvent {
  choices?: Array<{
    delta?: {
      content?: string | null;
      role?: string | null;
      tool_calls?: unknown[];
      function_call?: unknown;
    };
    finish_reason?: string | null;
    index?: number;
  }>;
  error?: StreamError;
  trace_id?: string;
}

export interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

export interface StreamOutcome {
  partialText: string;
  receivedContent: boolean;
  toolCallEmitted: boolean;
  finishReason: string | null;
  error: StreamError | null;
  traceId: string | null;
  /** Set when error-level and top-level trace IDs differ; log both. */
  secondaryTraceId: string | null;
  transportError: string | null;
  httpStatus: number | null;
}

export type RecoveryStatus = "complete" | "content_filter" | "interrupted" | "failed";

export interface RecoveryResult {
  status: RecoveryStatus;
  text: string;
  lastOutcome: StreamOutcome;
  usedContinuation: boolean;
  fullRetriesUsed: number;
}

export const LIVE_PRE_TOKEN_RETRIES = 2; // full-retry attempts before any content is visible
export const BACKGROUND_RETRIES = 3; // full-retry attempts for background/batch jobs
export const LIVE_BACKOFF_CAP_MS = 2_000; // keep live UX delays short
export const BACKGROUND_BACKOFF_CAP_MS = 30_000;

/** Full-jitter backoff: random() * min(cap, base * 2**attempt). */
export function fullJitterDelayMs(attempt: number, baseMs = 500, capMs = LIVE_BACKOFF_CAP_MS): number {
  return Math.random() * Math.min(capMs, baseMs * 2 ** attempt);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function newOutcome(): StreamOutcome {
  return {
    partialText: "",
    receivedContent: false,
    toolCallEmitted: false,
    finishReason: null,
    error: null,
    traceId: null,
    secondaryTraceId: null,
    transportError: null,
    httpStatus: null,
  };
}

const completed = (o: StreamOutcome) =>
  o.finishReason === "stop" || o.finishReason === "length" || o.finishReason === "tool_calls";
const contentFiltered = (o: StreamOutcome) => o.finishReason === "content_filter";
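// Pre-stream HTTP errors: only 408, 429, and 5xx are worth retrying.
// Structured stream errors carry their own flag; bare transport drops
// are plausibly transient.
const isRetryable = (o: StreamOutcome): boolean => {
  if (o.httpStatus !== null && (o.httpStatus < 200 || o.httpStatus >= 300)) {
    return o.httpStatus === 408 || o.httpStatus === 429 || o.httpStatus >= 500;
  }
  if (o.error) return o.error.retryable === true;
  return o.transportError !== null;
};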

function applyEvent(outcome: StreamOutcome, event: StreamEvent, onDelta?: (text: string) => void): void {
  // trace_id precedence: error-level, then top-level, then response header.
  const errorTrace = event.error?.trace_id;
  const topTrace = event.trace_id;
  outcome.traceId = errorTrace ?? topTrace ?? outcome.traceId;
  if (errorTrace && topTrace && errorTrace !== topTrace) {
    outcome.secondaryTraceId = topTrace; // log both, send both to support
  }

  const choice = event.choices?.[0];
  const delta = choice?.delta;
  if (delta?.content) {
    outcome.partialText += delta.content;
    outcome.receivedContent = true;
    onDelta?.(delta.content);
  }
  if (delta?.tool_calls?.length || delta?.function_call) {
    outcome.toolCallEmitted = true;
  }
  if (choice?.finish_reason) {
    outcome.finishReason = choice.finish_reason;
  }
  if (event.error) {
    outcome.error = event.error;
  }
}

/** Run one streaming request and fold every SSE event into a StreamOutcome. */
export async function streamOnce(
  url: string,
  headers: Record<string, string>,
  body: Record<string, unknown>,
  onDelta?: (text: string) => void,
): Promise<StreamOutcome> {
  const outcome = newOutcome();
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json", ...headers },
      body: JSON.stringify(body),
    });
    outcome.httpStatus = response.status;
    if (!response.ok || !response.body) {
      // Pre-stream HTTP error: no SSE events will follow.
      outcome.transportError = `HTTP ${response.status}`;
      return outcome;
    }
    // Lowest-precedence trace source; SSE events override it.
    outcome.traceId = response.headers.get("x-sentry-trace-id");

    const reader: ReadableStreamDefaultReader<Uint8Array> = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";
    readLoop: while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      let newlineIndex: number;
      while ((newlineIndex = buffer.indexOf("\n")) >= 0) {
        const line = buffer.slice(0, newlineIndex).trim();
        buffer = buffer.slice(newlineIndex + 1);
        if (!line.startsWith("data:")) continue;
        const payload = line.slice("data:".length).trim();
        if (payload === "[DONE]") break readLoop;
        applyEvent(outcome, JSON.parse(payload) as StreamEvent, onDelta);
        if (outcome.finishReason !== null) break readLoop;
      }
    }
    await reader.cancel();
  } catch (err) {
    outcome.transportError = err instanceof Error ? err.message : String(err);
  }
  if (outcome.finishReason === null && outcome.error === null && outcome.transportError === null) {
    // The stream closed without a finish_reason: treat it as a transport drop.
    outcome.transportError = "stream ended before a finish_reason";
  }
  return outcome;
}

/** Provider-neutral continuation: partial output goes in a NEW user message. */
export function continuationMessages(messages: ChatMessage[], partialText: string): ChatMessage[] {
  return [
    ...messages,
    {
      role: "user",
      content:
        "The previous streamed answer was interrupted after this text:\n\n" +
        `${partialText}\n\n` +
        "Continue from that point without repeating the text above.",
    },
  ];
}

/**
 * One streamed completion with bounded recovery.
 *
 * Live mode: 2 full retries before the first token; after visible content,
 * exactly one automatic Continue, then surface user controls.
 * Background mode: up to 3 full retries; partial output is discarded safely
 * because nothing was shown to a user.
 */
export async function streamWithRecovery(
  url: string,
  headers: Record<string, string>,
  body: Record<string, unknown> & { messages: ChatMessage[] },
  options: { background?: boolean; onDelta?: (text: string) => void } = {},
): Promise<RecoveryResult> {
  const { background = false, onDelta } = options;
  const maxFullRetries = background ? BACKGROUND_RETRIES : LIVE_PRE_TOKEN_RETRIES;
  const backoffCapMs = background ? BACKGROUND_BACKOFF_CAP_MS : LIVE_BACKOFF_CAP_MS;

  let fullRetries = 0;
  while (true) {
    const outcome = await streamOnce(url, headers, body, onDelta);

    if (completed(outcome)) {
      return {
        status: "complete",
        text: outcome.partialText,
        lastOutcome: outcome,
        usedContinuation: false,
        fullRetriesUsed: fullRetries,
      };
    }
    if (contentFiltered(outcome)) {
      // Terminal non-error signal: surface it; never retry or continue.
      return {
        status: "content_filter",
        text: outcome.partialText,
        lastOutcome: outcome,
        usedContinuation: false,
        fullRetriesUsed: fullRetries,
      };
    }
    if (!isRetryable(outcome)) {
      // retryable=false: do not retry unchanged.
      return {
        status: "failed",
        text: outcome.partialText,
        lastOutcome: outcome,
        usedContinuation: false,
        fullRetriesUsed: fullRetries,
      };
    }

    if (outcome.receivedContent && !background) {
      // Content is on screen: a full retry would erase or duplicate it.
      return recoverAfterPartial(url, headers, body, outcome, onDelta);
    }

    if (fullRetries >= maxFullRetries) {
      return {
        status: "failed",
        text: outcome.partialText,
        lastOutcome: outcome,
        usedContinuation: false,
        fullRetriesUsed: fullRetries,
      };
    }
    await sleep(fullJitterDelayMs(fullRetries, 500, backoffCapMs));
    fullRetries += 1;
  }
}

/** Post-content recovery: keep the partial text, try one automatic Continue. */
async function recoverAfterPartial(
  url: string,
  headers: Record<string, string>,
  body: Record<string, unknown> & { messages: ChatMessage[] },
  first: StreamOutcome,
  onDelta?: (text: string) => void,
): Promise<RecoveryResult> {
  if (first.toolCallEmitted) {
    // Side-effect guard: an automatic attempt could replay the tool call.
    // Mark the answer interrupted and hand control to the user.
    return {
      status: "interrupted",
      text: first.partialText,
      lastOutcome: first,
      usedContinuation: false,
      fullRetriesUsed: 0,
    };
  }

  await sleep(fullJitterDelayMs(0));
  const continuationBody = {
    ...body,
    messages: continuationMessages(body.messages, first.partialText),
  };
  const second = await streamOnce(url, headers, continuationBody, onDelta);
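  if (completed(second)) {
    return {
      status: "complete",
      text: first.partialText + second.partialText,
      lastOutcome: second,
      usedContinuation: true,
      fullRetriesUsed: 0,
    };
  }
  if (contentFiltered(second)) {
    // A filtered continuation is terminal: surface it, never offer Continue.
    return {
      status: "content_filter",
      text: first.partialText + second.partialText,
      lastOutcome: second,
      usedContinuation: true,
      fullRetriesUsed: 0,
    };
  }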

  // The one automatic Continue failed. Keep everything visible, mark the
  // answer interrupted, and surface user-initiated Continue / Try again.
  return {
    status: "interrupted",
    text: first.partialText + second.partialText,
    lastOutcome: second,
    usedContinuation: true,
    fullRetriesUsed: 0,
  };
}

Related Pages

Errors Reference

Error codes, the AI error object, and streaming error delivery.

Rate Limits

HTTP 429 handling and backoff guidance.

Completions V2 Guide

Routing modes, streaming support, and request parameters.

Tool Use Guide

Function calling across routing modes, including streaming.