stream: true), failures can arrive after the HTTP 200 text/event-stream response has already started. By then your application may have rendered partial output to the user, so a naive “just retry the request” strategy can erase visible text, duplicate it, or silently replace one answer with a different one.
This guide covers how to:
- classify stream-time failures using the structured
errorobject - choose between preserving partial output, bounded retry, application-level continuation, and surfacing an interruption
- apply concrete retry ceilings and backoff
- avoid duplicating tool-call side effects
- log the right fields for support and observability
The reference implementations are in Python and TypeScript. Building in Go, Java, PHP, or another language? The language-neutral streaming error object table and decision tree below are the source of truth — port the same logic.
Streaming Failure Phases
A streamed completion can fail in four distinct ways. Handle each differently:| Phase | What you observe | Handling |
|---|---|---|
| Pre-stream HTTP error | Non-2xx HTTP response before any SSE bytes | Standard HTTP error handling — see Errors |
| Stream error (Signal A) | SSE event with choices[0].finish_reason: "error" and a structured error object | The retry/continuation logic in this guide |
| Transport disconnect | The connection drops with no structured SSE error | Treat as plausibly transient; same decision tree as Signal A |
| Content filter (Signal B) | SSE event with finish_reason: "content_filter" and no error object | Terminal. Surface to the user; never retry or continue |
Two distinct stream-time signals
Split detection into two signals so your client never misclassifies a content-filter stop as a clean early completion — or as a retryable error:- Signal A — structured stream error.
choices[0].finish_reasonis"error"and the event carries the structurederrorobject described below. This is the only path that retry and continuation logic applies to. - Signal B — content-filter termination.
choices[0].finish_reasonis"content_filter". The event carries noerrorobject. It is terminal and non-retryable: surface it to the user, and never retry or continue automatically.
The streaming error object
A Signal A event includes a structurederror object with nine fields, plus a top-level trace_id on the SSE chunk itself. This is the canonical reference — every section below branches on these fields.
| Field | Branch on it? | Notes |
|---|---|---|
error.retryable | Yes — primary retry signal | Whether retrying the same request may succeed |
error.fault | Yes — secondary | Responsible party: client, provider, or internal |
error.code | Yes — classification after retryable/fault | Numeric internal code, such as 3001 |
error.name | Yes — classification | Symbolic name, such as INTERNAL_ERROR |
error.type | No — observability/display only | Stable wire-format error type |
error.category | No — observability/display only | client_error, provider_error, or platform_error |
error.description | No — display only | Stable explanation of the error code |
error.message | No — display only | Request-specific customer-facing message |
error.trace_id | No — observability | See trace ID precedence |
top-level trace_id | No — observability | Present on every SSE chunk; see trace ID precedence |
Read the Whole Error Object
The sameerror.code can represent different situations, so never branch on code alone. 3001 / INTERNAL_ERROR is the clearest example:
- A retryable stream-time provider anomaly surfaces as
code: 3001,name: "INTERNAL_ERROR",fault: "provider",retryable: true. - A platform-internal
3001surfaces withfault: "internal"andretryable: false.
error.retryable first, then use fault, code, and name for classification and observability:
error.retryable— should this request be retried at all?error.fault— was it aproviderissue orinternalto the platform?error.code/error.name— what exactly happened (for logs, metrics, alerts)?
Decision Tree
Two independent axes govern recovery — keep them separate so “attempt” is never ambiguous:- Which operation? Driven by whether content was already shown to the user.
- Not shown → full retry (a fresh stream).
- Shown → Continue (application-level continuation — never a full retry, which would force erasing or duplicating visible text). Continuation is exclusive to the post-content phase.
- How many attempts? Driven by context (ceilings below). Always use full-jitter backoff; always stop early on
retryable: false.
- Stream fails before any visible content
- Retry only when
error.retryableistrue, or the failure is a transport error that is plausibly transient. - Ceiling: 2 full-retry attempts for live UX.
- Retry only when
- Stream fails after visible content (post-content recovery)
- Preserve the partial output. Never silently replace the visible answer.
- If a side-effecting tool call was emitted or executed this turn, skip the automatic attempt (see Tool Calls and Side Effects) and surface the interruption directly.
- Otherwise attempt exactly one automatic Continue (continuation).
- If the automatic Continue fails or was skipped, mark the answer interrupted and surface a user-initiated Continue (primary, non-destructive) and an explicitly labelled, destructive Try again (full retry, secondary).
- “At most one” means one automatic attempt — never a silent loop. User-initiated actions are user-governed and don’t count against the automatic budget.
- For high-stakes, factual, or doctrinal answers, steer away from automatic Continue toward Try again or explicit user review (see Application-Level Continuation).
- Background/batch workflow
- Up to 3 full-retry attempts with jittered exponential backoff and an overall job timeout. Partial output can be discarded safely because nothing was shown to a user.
error.retryableisfalse- Do not retry unchanged. Surface the error, change the request or model, or escalate with the trace ID.
finish_reasonis"content_filter"(Signal B)- Terminal and non-retryable. Surface it; never retry or continue.
Application-Level Continuation
Completions V2 does not expose true stream resumption — there is no cursor, offset, replay token, or resume endpoint. What your application can do instead is application-level continuation: a new request that includes the partial output as context and asks the model to continue. Send the partial output inside a new user message:Retry Budgets and Backoff
Recommended defaults:| Context | Operation | Ceiling |
|---|---|---|
| Live, no visible content yet | Full retry | 2 attempts |
| Live, visible partial content | Automatic Continue (continuation) | 1 attempt, then user-initiated controls |
| Background/batch | Full retry | 3 attempts |
- Use full-jitter exponential backoff between attempts.
- Cap live UX delays aggressively (a couple of seconds at most); use a broader cap for background jobs.
- Stop early the moment an error reports
retryable: false. - Do not blindly retry five times for live streams — generic rate-limit retry examples are not a live-streaming UX policy.
Rate Limits
Rate limits are a separate concern from generic retryable stream failures:- An HTTP
429before the stream starts is a normal rate-limit response. RespectRetry-AfterorX-RateLimit-Resetwhen the response actually includes those headers, and back off with jitter. See Rate Limits. - A rate limit that occurs after the stream has started (for example, an upstream provider limit) arrives as a Signal A stream error such as
2004 / RATE_LIMIT. Handle it through the normalized SSEerrorobject — stream-time SSE error events do not carryRetry-After,X-RateLimit-Reset, or provider rate-limit headers. - Rate-limit retries should use jittered backoff and must not run indefinitely.
Partial Output UX
When recovery involves text a user has already seen:- Keep partial text visible. Never make displayed output disappear silently.
- Mark the answer as interrupted if recovery fails, so the user knows it is incomplete.
- Offer user-visible controls where appropriate: Continue (primary, appends) and Try again (secondary, explicitly labelled as replacing the current answer).
- If full retry is chosen, make the replacement explicit — the user should understand the visible answer is being replaced, not extended.
- Retain enough application state to know whether any content was displayed; that single flag drives the choice between full retry and continuation.
Tool Calls and Side Effects
Full retries and continuations can duplicate side effects if the model already emitted a tool call and your application executed it — sending an email twice, charging a card twice.- Use idempotency keys or operation IDs for tool execution, so a replayed call can be detected and dropped.
- Record, per turn, whether a tool call was emitted and whether it was executed, before any retry decision.
- Do not automatically replay side-effecting tool calls unless your application can prove the prior attempt did not execute.
- If a side-effecting tool call fired in the interrupted turn, skip the automatic Continue and surface the interruption — prefer fail-fast or explicit user confirmation over silent recovery.
Observability Checklist
Log the full streaming error object — including the non-branching fieldserror.type, error.category, and error.description — even though retry decisions only use retryable, fault, code, and name.
Trace IDs. Log one normalized field named trace_id, with this precedence:
event.error.trace_id and the top-level event.trace_id both exist and differ, log both — and include both in support requests.
Recommended fields per failed or recovered stream:
- normalized
trace_id(plus the secondary trace ID on mismatch) error.code,error.name,error.type,error.fault,error.retryable- selected model (or requested model) and routing mode
- whether content was displayed, and the partial output length
- attempt number and backoff delay
- whether continuation or full retry was used
- whether tool calls were emitted or executed
Summary Table
| Condition | Recommended action | Retry ceiling | UX note |
|---|---|---|---|
Pre-stream retryable error (408, 429, 5xx) | Full retry with backoff | 2 (live) | Nothing rendered yet; retry is invisible |
| Stream error before first token | Full retry if retryable: true | 2 full retries | Show a loading state, not an error, until the budget is spent |
| Stream error after partial output | One automatic Continue, then user controls | 1 automatic continuation | Preserve partial text; mark interrupted on failure |
| Rate limit (HTTP 429 pre-stream) | Back off; honor Retry-After/X-RateLimit-Reset if present | Bounded, jittered | Consider a “busy, retrying” indicator |
Rate limit (stream event 2004) | Treat as Signal A via error object | Same as stream phase | No retry headers exist on SSE errors |
Content filter (finish_reason: "content_filter") | Surface; never retry or continue | 0 | Terminal signal, not an error object |
Non-retryable error (retryable: false) | Stop; surface, change request/model, or escalate | 0 | Include trace_id in support requests |
| Side-effecting tool workflow | Skip automatic recovery; fail fast or confirm | 0 automatic | Require explicit user action |
| Background/batch workflow | Full retry with backoff and job timeout | 3 full retries | Partial output can be discarded safely |
Reference Implementations
The modules below implement the full decision tree: SSE parsing, the two signals, trace-ID normalization, retry ceilings, full-jitter backoff, the one-automatic-Continue rule, and the tool-call side-effect guard. Copy either one into your project as a small client module. Usage:Related Pages
Errors Reference
Error codes, the AI error object, and streaming error delivery.
Rate Limits
HTTP 429 handling and backoff guidance.
Completions V2 Guide
Routing modes, streaming support, and request parameters.
Tool Use Guide
Function calling across routing modes, including streaming.

