# Streaming AI Responses in Real Time

> Learn how to stream Gloo AI completions token-by-token using Server-Sent Events, with patterns for terminal rendering and a server-side proxy.

## Overview

The Gloo AI completions API supports streaming responses, so instead of waiting for the full answer, your application receives tokens one at a time as the model generates them. This creates a faster, more interactive user experience and is the standard pattern for chat and content-generation products.

In this tutorial you'll build a streaming client from scratch: parsing the SSE wire protocol, accumulating tokens, handling errors, and rendering output as it arrives. You'll also build a server-side proxy that shields your API credentials from the browser.

## What You'll Build

By the end of this tutorial, you'll have a complete streaming implementation featuring:

* **SSE stream parser** that reads tokens as they arrive from the API
* **Token accumulator** that assembles the full response with timing and token count
* **Streaming-aware error handler** that catches auth and rate-limit errors before reading the stream
* **Terminal renderer** that displays tokens in real time with a typing effect
* **Server-side proxy** that relays the stream to browser clients without exposing your credentials

## Understanding Server-Sent Events

When you set `"stream": true` in a completions request, the API switches from a single JSON response to an SSE stream. Each token arrives as a line formatted `data: <json>`, with blank lines separating events:

```
data: {"choices":[{"delta":{"content":"The"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" resurrection"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":" is"},"finish_reason":null}]}

data: {"choices":[{"delta":{"content":"..."},"finish_reason":"stop"}]}
```

The stream ends when a chunk arrives with a non-null `finish_reason` (typically `"stop"`). The parser you build in Step 3 also treats a `data: [DONE]` line as an end-of-stream sentinel.
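As a rough illustration of the wire format, the sketch below walks the four sample events shown above, joins each `delta.content`, and stops at the first non-null `finish_reason`. `SAMPLE_STREAM` and `read_sample_stream` are hypothetical names used only for this illustration; they are not part of the starter project.

```python
import json
from typing import Optional, Tuple

# The four sample events from above, as they would appear on the wire.
SAMPLE_STREAM = (
    'data: {"choices":[{"delta":{"content":"The"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"delta":{"content":" resurrection"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"delta":{"content":" is"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"delta":{"content":"..."},"finish_reason":"stop"}]}\n'
    "\n"
)


def read_sample_stream(raw: str) -> Tuple[str, Optional[str]]:
    """Join delta content until a chunk reports a non-null finish_reason."""
    text = ""
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # blank separator lines carry no payload
        choice = json.loads(line[len("data: "):])["choices"][0]
        text += choice["delta"].get("content", "")
        if choice["finish_reason"] is not None:
            return text, choice["finish_reason"]
    return text, None


full_text, finish_reason = read_sample_stream(SAMPLE_STREAM)
print(full_text)      # The resurrection is...
print(finish_reason)  # stop
```

The sketch reads a complete string for simplicity. A real client sees the same lines one at a time as they arrive over the open connection.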

## Two Approaches: Direct vs. Proxy

This tutorial covers two ways to consume the stream:

| Approach     | How it works                                           | When to use                                                  |
| ------------ | ------------------------------------------------------ | ------------------------------------------------------------ |
| **Terminal** | Your server calls the API directly and prints tokens   | Background jobs, CLIs, server-side rendering                 |
| **Proxy**    | A lightweight server relays SSE to any external client | Web apps, any case where browser JS would expose credentials |

## Prerequisites

Before starting, ensure you have:

* A Gloo AI Studio account with API credentials
* Your Client ID and Client Secret from the [API Credentials page](/studio/manage-api-credentials)
* **Authentication setup** — complete the [Authentication Tutorial](/tutorials/authentication) first

<Info>
  The starter project includes a pre-built auth module. You don't need to implement authentication in this tutorial — it's already working in the starter code.
</Info>

## Getting Started with the Starter Project

This tutorial uses a hands-on approach where you'll build the streaming client incrementally. The starter code provides complete scaffolding with TODO markers guiding each step.

### Download the Starter Code

Choose your preferred language and download the starter project:

<CardGroup cols={3}>
  <Card title="Python" icon="python" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/python">
    Python 3.9+ · requests · Flask
  </Card>

  <Card title="JavaScript" icon="js" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/javascript">
    Node.js 18+ · native fetch · Express
  </Card>

  <Card title="TypeScript" icon="code" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/typescript">
    TypeScript 5+ · typed SSE chunks
  </Card>

  <Card title="PHP" icon="php" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/php">
    PHP 8.1+ · cURL write callback
  </Card>

  <Card title="Go" icon="golang" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/go">
    Go 1.20+ · bufio.Scanner · http.Flusher
  </Card>

  <Card title="Java" icon="java" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/starter/java">
    Java 17+ · HttpClient · Maven
  </Card>
</CardGroup>

### Quick Setup

<CodeGroup>
  ```bash Python theme={null}
  cd starter/python
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```

  ```bash JavaScript theme={null}
  cd starter/javascript
  npm install
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```

  ```bash TypeScript theme={null}
  cd starter/typescript
  npm install
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```

  ```bash PHP theme={null}
  cd starter/php
  composer install
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```

  ```bash Go theme={null}
  cd starter/go
  go mod download
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```

  ```bash Java theme={null}
  cd starter/java
  mvn clean compile
  cp .env.example .env
  # Edit .env with your GLOO_CLIENT_ID and GLOO_CLIENT_SECRET
  ```
</CodeGroup>

### Test Your Setup

Run the entry point — it should load your credentials and confirm the stubs are in place:

<CodeGroup>
  ```bash Python theme={null}
  python main.py
  ```

  ```bash JavaScript theme={null}
  npm start
  ```

  ```bash TypeScript theme={null}
  npm start
  ```

  ```bash PHP theme={null}
  composer start
  ```

  ```bash Go theme={null}
  go run main.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.Main"
  ```
</CodeGroup>

You should see credentials load successfully, followed by `NotImplementedError` (or equivalent) from the first stub — confirming that setup is complete and you're ready to implement.

## Architecture Overview

### Component Architecture

<div style={{fontSize: '18px'}}>
  ```mermaid theme={null}
  %%{init: {'theme':'dark', 'themeVariables': { 'fontSize':'18px'}}}%%
  graph TB
      Auth["Auth Module<br/><br/>OAuth2 client credentials<br/>Token caching + refresh<br/>Pre-built — no changes needed"]

      StreamClient["Streaming Client<br/><br/>Make Streaming Request<br/>Parse SSE Line<br/>Extract Token Content<br/>Stream Completion<br/>Handle Stream Error"]

      Renderer["Terminal Renderer<br/><br/>Render Stream to Terminal<br/>Unbuffered stdout output<br/>Typing effect"]

      Proxy["Proxy Server<br/><br/>POST /api/stream<br/>Relays SSE upstream<br/>Keeps credentials server-side"]

      BrowserClient["Browser Client<br/><br/>fetch() + ReadableStream<br/>Markdown re-parse pattern<br/>Pre-built index.html"]

      GlooAPI["Gloo AI API<br/><br/>POST /ai/v2/chat/completions<br/>stream: true<br/>SSE response"]

      Auth --> StreamClient
      Auth --> Proxy
      StreamClient --> Renderer
      StreamClient --> GlooAPI
      Proxy --> GlooAPI
      BrowserClient --> Proxy

      classDef prebuilt fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#9ca3af
      classDef teaching fill:#4a5568,stroke:#cbd5e0,stroke-width:3px,color:#fff
      classDef external fill:#1a202c,stroke:#cbd5e0,stroke-width:2px,color:#9ca3af

      class Auth,BrowserClient prebuilt
      class StreamClient,Renderer,Proxy teaching
      class GlooAPI external
  ```
</div>

### Implementation Roadmap

|  Step | What You Build                  | Track    | Validates                                                  |
| :---: | ------------------------------- | -------- | ---------------------------------------------------------- |
| **1** | Environment setup               | Shared   | Auth loads; streaming endpoint reachable                   |
| **2** | Handle stream errors            | Shared   | 401/403/429 errors thrown before stream read               |
| **3** | Streaming request + SSE parsing | Shared   | HTTP connection opens; SSE lines parsed; `[DONE]` detected |
| **4** | Token extraction + accumulation | Shared   | Token text extracted; full response assembled with timing  |
| **5** | Render stream to terminal       | Terminal | Tokens print live to terminal                              |
| **6** | Proxy stream handler            | Proxy    | SSE relayed through server                                 |
|  7 †  | Testing & browser demo          | Proxy    | End-to-end validation                                      |

<Note>
  † No new implementation — run the demo, test the proxy via API, and explore the browser client.
</Note>

Steps 1–5 build the streaming client. Step 6 adds the server-side proxy. Step 7 walks through the browser demo.

Let's get started!

***

## Step 1: Environment Setup & Auth Verification

The starter project includes a pre-built auth module that handles OAuth2 client credentials. Before implementing any streaming logic, confirm it works with the streaming endpoint.

### What You'll Verify

1. Credentials load correctly from `.env`
2. A token can be obtained from the Gloo AI auth server
3. A request to the completions endpoint returns `200 OK` with `Content-Type: text/event-stream`

### Testing Your Setup

Run the Step 1 checkpoint now — it should pass with the pre-built auth:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step1_auth_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step1
  ```

  ```bash TypeScript theme={null}
  npm run test:step1
  ```

  ```bash PHP theme={null}
  php tests/Step1AuthTest.php
  ```

  ```bash Go theme={null}
  go run tests/step1_auth.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step1AuthTest"
  ```
</CodeGroup>

### ✓ Checkpoint: Auth Verification

Your output should look similar to the following:

```
🧪 Testing: Environment Setup & Auth Verification

✓ GLOO_CLIENT_ID loaded
✓ GLOO_CLIENT_SECRET loaded

Test 1: Obtaining access token...
✓ Access token obtained
  Expires in: 3600 seconds

Test 2: Token caching (ensure_valid_token)...
✓ Token cached correctly — same token returned on consecutive calls

Test 3: Verifying streaming endpoint...
✓ Status: 200 OK
✓ Content-Type: text/event-stream; charset=utf-8

✅ Auth and streaming endpoint verified.
   Next: Making the Streaming Request
```

**If tests fail**, check:

* `.env` file exists in the language directory (not just `.env.example`)
* `GLOO_CLIENT_ID` and `GLOO_CLIENT_SECRET` are set correctly
* You've completed the [Authentication Tutorial](/tutorials/authentication) prerequisites

***

## Step 2: Streaming-Aware Error Handling

Now implement the stream error handler, a focused function that maps HTTP status codes to descriptive exceptions before any stream data is read.

### Key Concepts

#### Two-Phase Error Handling

Streaming introduces two distinct error phases:

**Phase 1 — Pre-stream (before reading bytes)**: The HTTP status tells you everything. A 401 means bad token; a 429 means slow down. Check the status immediately and throw a specific error before touching the body. This is what the stream error handler does.

**Phase 2 — Mid-stream (while reading bytes)**: The connection is live when something fails — network drop, server restart, timeout. Catch these in the accumulation loop with a try/catch around the read loop. If you've already accumulated partial text, preserve it and return what you have rather than discarding the work.

Separating these phases makes errors debuggable: pre-stream errors have status codes; mid-stream errors have partial content.
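The mid-stream phase can be sketched as follows. This is a minimal illustration, not the starter project's accumulation loop: `accumulate_tokens` and `flaky_stream` are hypothetical names, and catching `OSError` is an assumption that suits libraries whose network errors derive from it (the built-in `ConnectionError` and `TimeoutError` do). Check which exception types your HTTP library actually raises.

```python
from typing import Iterable, Iterator, List, Tuple


def accumulate_tokens(tokens: Iterable[str]) -> Tuple[str, bool]:
    """Join tokens; on a mid-stream failure, keep the text received so far.

    Returns (text, completed), where completed is False if the stream broke.
    """
    parts: List[str] = []
    try:
        for token in tokens:
            parts.append(token)
    except OSError:
        # Mid-stream failure: return the partial text instead of discarding it.
        return "".join(parts), False
    return "".join(parts), True


def flaky_stream() -> Iterator[str]:
    """Hypothetical stand-in for a connection that drops after two tokens."""
    yield "Hello"
    yield " wor"
    raise ConnectionError("connection reset mid-stream")


text, completed = accumulate_tokens(flaky_stream())
print(repr(text), completed)  # 'Hello wor' False
```

Returning a `completed` flag alongside the text lets the caller decide whether to show the partial answer, retry, or both.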

### Implementation Guide

Open your streaming client file and find the error handler method. It's a small, focused function with one case per status code. Review the TODO comments, then implement the function:

<CodeGroup>
  ```python Python theme={null}
  # File: streaming/stream_client.py
  def handle_stream_error(status_code: int, response_body: str = "") -> None:
      if status_code == 401:
          raise Exception("Authentication failed (401): Invalid or expired token")
      elif status_code == 403:
          raise Exception("Authorization failed (403): Insufficient permissions")
      elif status_code == 429:
          raise Exception("Rate limit exceeded (429): Too many requests")
      elif status_code != 200:
          raise Exception(f"API error ({status_code}): {response_body[:200]}")
  ```

  ```javascript JavaScript theme={null}
  // File: src/streaming/streamClient.js
  export function handleStreamError(statusCode, responseBody = "") {
    if (statusCode === 401) {
      throw new Error("Authentication failed (401): Invalid or expired token");
    } else if (statusCode === 403) {
      throw new Error("Authorization failed (403): Insufficient permissions");
    } else if (statusCode === 429) {
      throw new Error("Rate limit exceeded (429): Too many requests");
    } else if (statusCode !== 200) {
      throw new Error(`API error (${statusCode}): ${String(responseBody).slice(0, 200)}`);
    }
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/streaming/streamClient.ts
  export function handleStreamError(statusCode: number, responseBody = ""): void {
    if (statusCode === 401) {
      throw new Error("Authentication failed (401): Invalid or expired token");
    } else if (statusCode === 403) {
      throw new Error("Authorization failed (403): Insufficient permissions");
    } else if (statusCode === 429) {
      throw new Error("Rate limit exceeded (429): Too many requests");
    } else if (statusCode !== 200) {
      throw new Error(`API error (${statusCode}): ${String(responseBody).slice(0, 200)}`);
    }
  }
  ```

  ```php PHP theme={null}
  // File: src/Streaming/StreamClient.php
  public static function handleStreamError(int $statusCode, string $responseBody = ''): void
  {
      if ($statusCode === 401) {
          throw new \RuntimeException('Authentication failed (401): Invalid or expired token');
      } elseif ($statusCode === 403) {
          throw new \RuntimeException('Authorization failed (403): Insufficient permissions');
      } elseif ($statusCode === 429) {
          throw new \RuntimeException('Rate limit exceeded (429): Too many requests');
      } elseif ($statusCode !== 200) {
          $preview = substr($responseBody, 0, 200);
          throw new \RuntimeException("API error ({$statusCode}): {$preview}");
      }
  }
  ```

  ```go Go theme={null}
  // File: pkg/streaming/client.go
  func HandleStreamError(statusCode int, responseBody string) error {
      switch statusCode {
      case http.StatusOK:
          return nil
      case http.StatusUnauthorized:
          return fmt.Errorf("authentication failed (401): invalid or expired token")
      case http.StatusForbidden:
          return fmt.Errorf("authorization failed (403): insufficient permissions")
      case http.StatusTooManyRequests:
          return fmt.Errorf("rate limit exceeded (429): too many requests")
      default:
          preview := responseBody
          if len(preview) > 200 {
              preview = preview[:200]
          }
          return fmt.Errorf("API error (%d): %s", statusCode, preview)
      }
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/streaming/StreamClient.java
  public static void handleStreamError(int statusCode, String responseBody) {
      switch (statusCode) {
          case 200 -> { /* ok */ }
          case 401 -> throw new RuntimeException(
              "Authentication failed (401): Invalid or expired token"
          );
          case 403 -> throw new RuntimeException(
              "Authorization failed (403): Insufficient permissions"
          );
          case 429 -> throw new RuntimeException(
              "Rate limit exceeded (429): Too many requests"
          );
          default -> {
              String preview = responseBody != null && responseBody.length() > 200
                  ? responseBody.substring(0, 200) : responseBody;
              throw new RuntimeException("API error (" + statusCode + "): " + preview);
          }
      }
  }
  ```
</CodeGroup>

The code does the following:

* Throws an authentication error on 401, which indicates a missing, expired, or malformed token
* Throws an authorization error on 403, which indicates a valid token that lacks permission for this resource
* Throws a rate limit error on 429, which indicates the API rejected the request because too many were sent
* Throws a generic error for any other non-200 status, including the response body for diagnostic context
* Returns without throwing on 200 so the caller can proceed to read the stream

### ✓ Checkpoint: Error Handling

Run the error handling test:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step2_error_handling_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step2
  ```

  ```bash TypeScript theme={null}
  npm run test:step2
  ```

  ```bash PHP theme={null}
  php tests/Step2ErrorHandlingTest.php
  ```

  ```bash Go theme={null}
  go run tests/step2_error_handling.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step2ErrorHandlingTest"
  ```
</CodeGroup>

Your output should look similar to the following:

```
🧪 Testing: Streaming Error Handling

Test 1: handle_stream_error(401)...
✓ 401 raises: Authentication failed (401): Invalid or expired token
Test 2: handle_stream_error(403)...
✓ 403 raises: Authorization failed (403): Insufficient permissions
Test 3: handle_stream_error(429)...
✓ 429 raises: Rate limit exceeded (429): Too many requests
Test 4: handle_stream_error(200) — success, no exception...
✓ 200 OK — no exception raised
Test 5: handle_stream_error(500)...
✓ 500 throws with body: API error (500): Internal Server Error

✅ Two-phase error handling working.
   Next: Streaming Requests & SSE Parsing
```

**If tests fail**, check:

* Status 200 must **not** raise an exception
* The error message for non-200 includes the status code
* The response body is truncated (first 200 chars) to avoid enormous error messages

***

## Step 3: Streaming Requests & SSE Parsing

Time to wire up the streaming connection. You'll open a persistent HTTP connection to the completions API and write the parser that converts raw SSE lines into something you can actually work with.

### What You'll Implement

1. A function to initiate a streaming request
2. A function to parse individual SSE lines

### Making the Streaming Request

#### Why `stream: true` Changes Everything

Without `stream: true`, the API buffers the entire response and returns it as a single JSON object. With `stream: true`, it switches to SSE mode: the connection stays open and bytes arrive incrementally as the model generates them.

This is why you return the **raw response object** rather than parsed JSON — the body isn't fully available yet. The caller will read it line by line in the next steps.

#### Fail Fast Before Reading

Checking the HTTP status code *before* starting to read the stream is important for a clean user experience. A 401 response will never produce SSE data — it returns a JSON error body. If you skipped the status check and tried to parse lines from a 401 response, you'd get confusing parse errors instead of a clear "authentication failed" message.
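To see why the order matters, consider what an SSE reader makes of a JSON error body. The sketch below is illustrative only: `ERROR_BODY` is a hypothetical payload (the real error shape may differ), and both helper functions are invented for this example. With a lenient parser the error body does not even fail loudly; it simply looks like a stream with no events.

```python
# Hypothetical 401 body; the real error payload may be shaped differently.
ERROR_BODY = '{"error": "invalid_token"}'


def count_sse_data_lines(body: str) -> int:
    """Count lines that look like SSE data events."""
    return sum(1 for line in body.splitlines() if line.startswith("data: "))


def describe_response(status_code: int, body: str) -> str:
    """Check the status first so the caller gets a clear message."""
    if status_code != 200:
        return f"HTTP {status_code}: {body[:200]}"
    return f"{count_sse_data_lines(body)} data event(s)"


# Without a status check, the error body just looks like an empty stream.
print(count_sse_data_lines(ERROR_BODY))    # 0
# With the check, the failure is explicit.
print(describe_response(401, ERROR_BODY))  # HTTP 401: {"error": "invalid_token"}
```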

#### Implementation Guide

Still in the same streaming client file, find the streaming request method, review the TODO comments, then implement the changes outlined in the code block:

<CodeGroup>
  ```python Python theme={null}
  # File: streaming/stream_client.py
  def make_streaming_request(message: str, token: str):
      headers = {
          "Authorization": f"Bearer {token}",
          "Content-Type": "application/json",
      }
      payload = {
          "messages": [{"role": "user", "content": message}],
          "auto_routing": True,
          "stream": True,
      }
      response = requests.post(API_URL, headers=headers, json=payload, stream=True)
      handle_stream_error(
          response.status_code,
          response.text if response.status_code != 200 else "",
      )
      return response
  ```

  ```javascript JavaScript theme={null}
  // File: src/streaming/streamClient.js
  export async function makeStreamingRequest(message, token) {
    const response = await fetch(API_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: message }],
        auto_routing: true,
        stream: true,
      }),
    });

    if (!response.ok) {
      const body = await response.text();
      handleStreamError(response.status, body);
    }

    return response;
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/streaming/streamClient.ts
  export async function makeStreamingRequest(message: string, token: string): Promise<Response> {
    const response = await fetch(API_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: message }],
        auto_routing: true,
        stream: true,
      }),
    });

    if (!response.ok) {
      const body = await response.text();
      handleStreamError(response.status, body);
    }

    return response;
  }
  ```

  ```php PHP theme={null}
  // File: src/Streaming/StreamClient.php
  public static function makeStreamingRequest(string $message, string $token, callable $writeCallback): void
  {
      $payload = json_encode([
          'messages' => [['role' => 'user', 'content' => $message]],
          'auto_routing' => true,
          'stream' => true,
      ]);

      $headers = [
          'Authorization: Bearer ' . $token,
          'Content-Type: application/json',
      ];

      $statusCode = 0;

      $ch = curl_init(self::API_URL);
      curl_setopt_array($ch, [
          CURLOPT_POST => true,
          CURLOPT_POSTFIELDS => $payload,
          CURLOPT_HTTPHEADER => $headers,
          CURLOPT_RETURNTRANSFER => false,
          CURLOPT_HEADERFUNCTION => function ($ch, $header) use (&$statusCode) {
              if (preg_match('/HTTP\/\d+\.?\d*\s+(\d+)/', $header, $m)) {
                  $statusCode = (int)$m[1];
              }
              return strlen($header);
          },
          CURLOPT_WRITEFUNCTION => function ($ch, $data) use (&$statusCode, $writeCallback) {
              if ($statusCode && $statusCode !== 200) {
                  self::handleStreamError($statusCode, $data);
              }
              $writeCallback($data);
              return strlen($data);
          },
      ]);

      $result = curl_exec($ch);
      $error = $result === false ? curl_error($ch) : '';
      curl_close($ch);

      if ($error) {
          throw new \RuntimeException("Streaming request failed: {$error}");
      }
  }
  ```

  ```go Go theme={null}
  // File: pkg/streaming/client.go
  func MakeStreamingRequest(message, token string) (*http.Response, error) {
      payload := map[string]any{
          "messages":     []map[string]string{{"role": "user", "content": message}},
          "auto_routing": true,
          "stream":       true,
      }

      body, err := json.Marshal(payload)
      if err != nil {
          return nil, fmt.Errorf("failed to encode request: %w", err)
      }

      req, err := http.NewRequest(http.MethodPost, apiURL, bytes.NewReader(body))
      if err != nil {
          return nil, fmt.Errorf("failed to create request: %w", err)
      }
      req.Header.Set("Authorization", "Bearer "+token)
      req.Header.Set("Content-Type", "application/json")

      client := &http.Client{}
      resp, err := client.Do(req)
      if err != nil {
          return nil, fmt.Errorf("streaming request failed: %w", err)
      }

      if resp.StatusCode != http.StatusOK {
          bodyBytes, _ := io.ReadAll(resp.Body)
          resp.Body.Close()
          return nil, HandleStreamError(resp.StatusCode, string(bodyBytes))
      }

      return resp, nil
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/streaming/StreamClient.java
  public static HttpResponse<java.io.InputStream> makeStreamingRequest(
      String message, String token
  ) {
      String payload = GSON.toJson(Map.of(
          "messages", List.of(Map.of("role", "user", "content", message)),
          "auto_routing", true,
          "stream", true
      ));

      HttpRequest request = HttpRequest.newBuilder()
          .uri(URI.create(API_URL))
          .header("Authorization", "Bearer " + token)
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(payload))
          .build();

      try {
          HttpResponse<java.io.InputStream> response = HTTP_CLIENT.send(
              request, HttpResponse.BodyHandlers.ofInputStream()
          );
          if (response.statusCode() != 200) {
              String body = new String(response.body().readAllBytes(), StandardCharsets.UTF_8);
              handleStreamError(response.statusCode(), body);
          }
          return response;
      } catch (Exception e) {
          if (e instanceof RuntimeException re) throw re;
          throw new RuntimeException("Streaming request failed: " + e.getMessage(), e);
      }
  }
  ```
</CodeGroup>

The code does the following:

* Sets `Authorization` and `Content-Type` headers using the provided token
* Builds the request payload with `stream: true` to enable SSE mode and `auto_routing: true` to let Gloo select the best model
* Checks the HTTP status before reading any response data, raising a descriptive error for non-200 responses
* Returns the raw response object so the caller can iterate its body line by line

<Note>
  **PHP note**: `curl_exec` does not hand control back to your code between the response headers and the body. The header callback captures the status code, and the check runs inside the write callback when the first data chunk arrives. This is the idiomatic PHP pattern for streaming with cURL.
</Note>
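Once the raw response is returned, the caller reads it line by line. A minimal Python sketch of that read loop follows, assuming the response exposes `iter_lines()` the way a `requests.Response` opened with `stream=True` does (lines arrive as `bytes` by default). `iter_text_lines` and `FakeResponse` are hypothetical names; the fake exists only so the sketch runs without a network call.

```python
from typing import Iterator, List


def iter_text_lines(response) -> Iterator[str]:
    """Yield decoded lines from a streaming response as they arrive.

    `response` is assumed to provide iter_lines(), as requests.Response does.
    """
    for raw in response.iter_lines():
        # requests yields bytes unless asked to decode; normalize to str.
        yield raw.decode("utf-8") if isinstance(raw, bytes) else raw


class FakeResponse:
    """Hypothetical stand-in for a streaming HTTP response."""

    def __init__(self, lines: List[bytes]) -> None:
        self._lines = lines

    def iter_lines(self) -> Iterator[bytes]:
        return iter(self._lines)


fake = FakeResponse([b'data: {"choices":[]}', b"", b"data: [DONE]"])
lines = list(iter_text_lines(fake))
print(lines)  # ['data: {"choices":[]}', '', 'data: [DONE]']
```

Because `iter_text_lines` is a generator, nothing is read until the caller iterates, which keeps memory use flat regardless of response length.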

### Parsing SSE Lines

#### The SSE Wire Format

SSE is a simple text protocol in which events are separated by blank lines. In the Gloo AI stream, each event is a single line starting with `data: `:

```
data: {"id":"...","choices":[{"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"...","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"...","choices":[{"delta":{"content":" world"},"finish_reason":"stop"}]}
```

Blank lines are separators, not errors — they're common and should be silently skipped. Lines that don't start with `data: ` (such as `event:` or `:` comment lines) should also be skipped.

#### Defensive Parsing

The JSON parse is wrapped in a try/catch. Mid-stream network hiccups can produce partial lines — you don't want a single malformed chunk to crash the entire stream. Return `null` for unparseable lines and let the accumulation loop move on.
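Defensive parsing covers a line that is genuinely malformed. A related case is a valid line split across two reads, which can happen when you consume raw chunks (as in the PHP write callback) rather than a line iterator. One common way to handle it is to buffer text until a newline arrives. The sketch below assumes chunks are already decoded to text; `LineBuffer` is a hypothetical helper, not part of the starter project.

```python
from typing import List


class LineBuffer:
    """Reassemble complete lines from arbitrarily split text chunks."""

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, chunk: str) -> List[str]:
        """Return every complete line received so far; keep the remainder."""
        self._pending += chunk
        # Everything before the last newline is complete; the tail is not.
        *complete, self._pending = self._pending.split("\n")
        return [line.rstrip("\r") for line in complete]


buffer = LineBuffer()
first = buffer.feed('data: {"choices":[{"delta":{"con')
second = buffer.feed('tent":"Hi"}}]}\n\ndata: [DONE]\n')
print(first)   # []
print(second)  # ['data: {"choices":[{"delta":{"content":"Hi"}}]}', '', 'data: [DONE]']
```

Each line returned by `feed()` can then be passed to the SSE line parser unchanged, so the parser never sees a half-delivered event.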

#### Implementation Guide

You're still working with the streaming client file. Find the SSE line parser method, review the TODO comments, then implement:

<CodeGroup>
  ```python Python theme={null}
  # File: streaming/stream_client.py
  def parse_sse_line(line: str):
      if not line or not line.strip():
          return None
      if not line.startswith("data: "):
          return None
      data = line[6:]  # strip 'data: ' prefix
      if data.strip() == "[DONE]":
          return "[DONE]"
      try:
          return json.loads(data)
      except json.JSONDecodeError:
          return None
  ```

  ```javascript JavaScript theme={null}
  // File: src/streaming/streamClient.js
  export function parseSseLine(line) {
    if (!line || !line.trim()) return null;
    if (!line.startsWith("data: ")) return null;
    const data = line.slice(6); // strip 'data: '
    if (data.trim() === "[DONE]") return "[DONE]";
    try {
      return JSON.parse(data);
    } catch {
      return null;
    }
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/streaming/streamClient.ts
  export function parseSseLine(line: string): null | "[DONE]" | SSEChunk {
    if (!line || !line.trim()) return null;
    if (!line.startsWith("data: ")) return null;
    const data = line.slice(6); // strip 'data: '
    if (data.trim() === "[DONE]") return "[DONE]";
    try {
      return JSON.parse(data) as SSEChunk;
    } catch {
      return null;
    }
  }
  ```

  ```php PHP theme={null}
  // File: src/Streaming/StreamClient.php
  public static function parseSseLine(string $line): mixed
  {
      if (!$line || !trim($line)) {
          return null;
      }
      if (!str_starts_with($line, 'data: ')) {
          return null;
      }
      $data = substr($line, 6); // strip 'data: '
      if (trim($data) === '[DONE]') {
          return '[DONE]';
      }
      // json_decode() returns null for malformed JSON, so no try/catch is needed
      return json_decode($data, true);
  }
  ```

  ```go Go theme={null}
  // File: pkg/streaming/client.go
  func ParseSSELine(line string) any {
      if strings.TrimSpace(line) == "" {
          return nil
      }
      if !strings.HasPrefix(line, "data: ") {
          return nil
      }
      data := line[6:] // strip "data: "
      if strings.TrimSpace(data) == "[DONE]" {
          return "[DONE]"
      }
      var chunk SSEChunk
      if err := json.Unmarshal([]byte(data), &chunk); err != nil {
          return nil
      }
      return &chunk
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/streaming/StreamClient.java
  public static Object parseSseLine(String line) {
      if (line == null || line.isBlank()) return null;
      if (!line.startsWith("data: ")) return null;
      String data = line.substring(6); // strip "data: "
      if (data.trim().equals("[DONE]")) return "[DONE]";
      try {
          return GSON.fromJson(data, new TypeToken<Map<String, Object>>() {}.getType());
      } catch (Exception e) {
          return null;
      }
  }
  ```
</CodeGroup>

The code does the following:

* Returns `null` for blank lines and lines that don't start with `data: `, signalling the caller to skip to the next line
* Strips the `data: ` prefix to isolate the raw JSON payload
* Detects the `[DONE]` sentinel before attempting JSON parsing and returns it as a string to signal the end of the stream
* Parses the payload as JSON and returns the result, or `null` if parsing fails — never throws on malformed input

### ✓ Checkpoint: Streaming Request & SSE Parsing

Run the validation test for this step:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step3_sse_parsing_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step3
  ```

  ```bash TypeScript theme={null}
  npm run test:step3
  ```

  ```bash PHP theme={null}
  php tests/Step3SseParsingTest.php
  ```

  ```bash Go theme={null}
  go run tests/step3_sse_parsing.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step3SseParsingTest"
  ```
</CodeGroup>

Your output should look similar to the following:

```
🧪 Testing: Streaming Request & SSE Line Parsing

✓ Token obtained

Test 1: parse_sse_line — blank line...
✓ Blank line → None
Test 2: parse_sse_line — non-data line...
✓ Non-data line → None
Test 3: parse_sse_line — [DONE] sentinel...
✓ data: [DONE] → '[DONE]'
Test 4: parse_sse_line — valid JSON data line...
✓ data: {json} → parsed dict
Test 5: parse_sse_line — malformed JSON...
✓ Malformed JSON → None (gracefully handled)

Test 6: make_streaming_request() — live connection...
✓ Streaming connection opened (status 200)
Test 7: Iterating SSE lines and detecting stream termination...
✓ Processed 5 lines, 2 data chunks
✓ Stream terminated cleanly (finish_reason=stop)

Test 8: Bad credentials → authentication error before reading stream...
✓ Bad credentials caught (pre-stream): Authorization failed (403): Insufficient permissions

✅ Streaming request and SSE parsing working.
   Next: Token Extraction & Accumulation
```

**If tests fail**, check:

* The streaming request function sets `stream` to `true` in the payload
* The SSE line parser strips exactly 6 characters (`"data: "` has a space after the colon)
* The `[DONE]` check happens before the JSON parse

***

## Step 4: Token Extraction & Accumulation

Next you'll add the pieces to pull the token out of each parsed SSE chunk, and the accumulation loop that stitches everything together into a complete result.

### What You'll Implement

1. A function to extract token content from a parsed SSE chunk
2. A function to collect the full stream into a result object

### Extracting Token Content

#### Why Content Can Be Absent

Not every SSE chunk carries text. The first chunk establishes the role (`delta: {"role": "assistant"}`), while the final chunk carries the finish reason with an empty or absent delta. Only chunks in the middle carry actual content.

This is why the extractor returns an empty string rather than throwing: an absent `content` field is completely normal. The accumulation loop skips empty strings when counting tokens.

#### Null-Safe Navigation

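Different languages handle missing keys differently. In Python, `.get()` returns `None` without raising; in JavaScript/TypeScript, optional chaining (`?.`) does the same; in PHP, the null coalescing operator (`??`) supplies a default. In Go the chunk is a typed struct, so missing content maps to the zero value (an empty string). In Java the chunk is parsed into a `Map<String, Object>`, so each lookup is checked for `null` explicitly. The goal in all languages is the same: never throw when a field is absent.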

#### Implementation Guide

Still in the streaming client file, find the token content extractor, review the TODO comments, then implement:

<CodeGroup>
  ```python Python theme={null}
  # File: streaming/stream_client.py
  def extract_token_content(chunk: dict) -> str:
      try:
          choices = chunk.get("choices", [])
          if not choices:
              return ""
          delta = choices[0].get("delta", {})
          return delta.get("content") or ""
      except (IndexError, AttributeError, KeyError):
          return ""
  ```

  ```javascript JavaScript theme={null}
  // File: src/streaming/streamClient.js
  export function extractTokenContent(chunk) {
    try {
      const choices = chunk?.choices;
      if (!choices || choices.length === 0) return "";
      const content = choices[0]?.delta?.content;
      return content || "";
    } catch {
      return "";
    }
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/streaming/streamClient.ts
  export function extractTokenContent(chunk: SSEChunk): string {
    try {
      const choices = chunk?.choices;
      if (!choices || choices.length === 0) return "";
      const content = choices[0]?.delta?.content;
      return content || "";
    } catch {
      return "";
    }
  }
  ```

  ```php PHP theme={null}
  // File: src/Streaming/StreamClient.php
  public static function extractTokenContent(array $chunk): string
  {
      try {
          $choices = $chunk['choices'] ?? [];
          if (empty($choices)) {
              return '';
          }
          $content = $choices[0]['delta']['content'] ?? '';
          return $content ?: '';
      } catch (\Throwable) {
          return '';
      }
  }
  ```

  ```go Go theme={null}
  // File: pkg/streaming/client.go
  func ExtractTokenContent(chunk *SSEChunk) string {
      if chunk == nil || len(chunk.Choices) == 0 {
          return ""
      }
      return chunk.Choices[0].Delta.Content
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/streaming/StreamClient.java
  @SuppressWarnings("unchecked")
  public static String extractTokenContent(Map<String, Object> chunk) {
      try {
          List<Map<String, Object>> choices =
              (List<Map<String, Object>>) chunk.get("choices");
          if (choices == null || choices.isEmpty()) return "";
          Map<String, Object> delta = (Map<String, Object>) choices.get(0).get("delta");
          if (delta == null) return "";
          Object content = delta.get("content");
          return content != null ? content.toString() : "";
      } catch (Exception e) {
          return "";
      }
  }
  ```
</CodeGroup>

The code does the following:

* Returns an empty string immediately if `choices` is absent or empty
* Reads `delta.content` from the first choice, returning an empty string if the field is absent or `null`, as it is in the role-only first chunk and the final `finish_reason` chunk
* Handles any unexpected chunk structure by returning an empty string rather than throwing, keeping the accumulation loop running cleanly
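
To make the three chunk shapes described earlier concrete, here is a small self-contained sketch. The chunk dictionaries are hand-written examples that assume the shapes described above; real payloads may carry additional fields. The extractor is a simplified stand-in for the one in the tutorial.

```python
def extract_token_content(chunk: dict) -> str:
    """Return delta.content from the first choice, or "" when it is absent."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    delta = choices[0].get("delta") or {}
    return delta.get("content") or ""

# Illustrative chunk shapes (assumed, not captured from a live stream).
role_chunk = {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]}
text_chunk = {"choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]}
final_chunk = {"choices": [{"delta": {}, "finish_reason": "stop"}]}

tokens = [extract_token_content(c) for c in (role_chunk, text_chunk, final_chunk)]
print(tokens)  # ['', 'Hello', '']
```

Only the middle chunk contributes text, which is why the accumulation loop checks for a non-empty string before incrementing its counter.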

### Accumulating the Full Response

#### Two Ways to Consume a Stream

You can either accumulate all tokens into a string (what the function in this step does) or print each token immediately as it arrives (what the function in Step 5 does). The choice depends on whether you need the full text before taking action:

* **Accumulate**: useful when you need to parse the full response, log it, or return it from an API
* **Print immediately**: useful for CLI tools and browser UIs where you want the typing effect
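
One way to keep both options open is to expose the tokens as a generator and let the caller decide how to consume it. The sketch below is an illustration of that idea, not the tutorial's implementation; `iter_tokens` and `sample_chunks` are hypothetical names, and the chunks are hand-written.

```python
from typing import Iterable, Iterator

def iter_tokens(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield each non-empty delta.content from already-parsed chunks."""
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content

def sample_chunks() -> list:
    # Hand-written chunks standing in for a parsed stream.
    return [
        {"choices": [{"delta": {"role": "assistant"}}]},
        {"choices": [{"delta": {"content": "Hello"}}]},
        {"choices": [{"delta": {"content": " world"}}]},
        {"choices": [{"delta": {}, "finish_reason": "stop"}]},
    ]

# Accumulate: nothing is usable until the stream has ended.
full_text = "".join(iter_tokens(sample_chunks()))

# Print immediately: each token is shown the moment it is yielded.
for token in iter_tokens(sample_chunks()):
    print(token, end="", flush=True)
print()
```

Both consumers read the same generator; the only difference is whether the caller waits for the last token before acting.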

#### The Line Buffer (JS/TS/PHP)

In Python, Go, and Java, the code reads the response one line at a time (`iter_lines()`, `bufio.Scanner`, and `BufferedReader.readLine()` respectively). In JavaScript, TypeScript, and PHP, you read raw bytes and split on `\n` yourself. This requires a **line buffer**: keep the incomplete final line in a variable and prepend it to the data from the next read. Without it, an event that straddles two reads is split across two parse calls, neither half parses, and the token is silently dropped.

```js theme={null}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? ""; // save incomplete last line
```
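
The same technique can be sketched in Python to show what happens when a read ends in the middle of a line. This is an illustration under assumptions: `iter_lines_from_chunks` is a hypothetical helper, and the byte chunks are hand-written. The incremental decoder plays the role of `TextDecoder` with `{ stream: true }`, holding back a multi-byte character that is split across reads.

```python
import codecs
from typing import Iterable, Iterator

def iter_lines_from_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield complete lines from byte chunks that may end mid-line."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)  # holds back partial multi-byte characters
        lines = buffer.split("\n")
        buffer = lines.pop()             # keep the incomplete last line
        yield from lines
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer                     # trailing line without a final newline

# One SSE event split across two reads, in the middle of the JSON payload.
reads = [
    b'data: {"choices":[{"delta":{"con',
    b'tent":"Hi"}}]}\n\ndata: [DONE]\n',
]
lines = list(iter_lines_from_chunks(reads))
print(lines)
```

The first read yields nothing because it contains no newline; the second read completes the event, so the parser only ever sees whole lines.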

#### Implementation Guide

Open the streaming client file and find the accumulation loop method. This one brings together everything from the previous steps, with the TODO comments showing each stage. Take a moment to trace through the structure before implementing.

<CodeGroup>
  ```python Python theme={null}
  # File: streaming/stream_client.py
  def stream_completion(message: str, token: str) -> dict:
      start_time = time.time()
      response = make_streaming_request(message, token)

      full_text = ""
      token_count = 0
      finish_reason = "unknown"

      try:
          for raw_line in response.iter_lines(decode_unicode=True):
              chunk = parse_sse_line(raw_line)
              if chunk is None:
                  continue
              if chunk == "[DONE]":
                  break
              content = extract_token_content(chunk)
              if content:
                  full_text += content
                  token_count += 1
              choices = chunk.get("choices", [])
              if choices and choices[0].get("finish_reason"):
                  finish_reason = choices[0]["finish_reason"]
      except Exception:
          if not full_text:
              raise  # nothing received, so surface the error
          # otherwise preserve the partial output

      duration_ms = int((time.time() - start_time) * 1000)
      return {
          "text": full_text,
          "token_count": token_count,
          "duration_ms": duration_ms,
          "finish_reason": finish_reason,
      }
  ```

  ```javascript JavaScript theme={null}
  // File: src/streaming/streamClient.js
  export async function streamCompletion(message, token) {
    const startTime = Date.now();
    const response = await makeStreamingRequest(message, token);

    let fullText = "";
    let tokenCount = 0;
    let finishReason = "unknown";

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    try {
      readLoop: while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          const chunk = parseSseLine(line);
          if (chunk === null) continue;
          // A plain break would only leave the inner for loop.
          if (chunk === "[DONE]") break readLoop;

          const content = extractTokenContent(chunk);
          if (content) {
            fullText += content;
            tokenCount += 1;
          }

          const choices = chunk?.choices;
          if (choices && choices[0]?.finish_reason) {
            finishReason = choices[0].finish_reason;
          }
        }
      }
    } finally {
      reader.releaseLock();
    }

    return {
      text: fullText,
      token_count: tokenCount,
      duration_ms: Date.now() - startTime,
      finish_reason: finishReason,
    };
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/streaming/streamClient.ts
  export async function streamCompletion(message: string, token: string): Promise<StreamResult> {
    const startTime = Date.now();
    const response = await makeStreamingRequest(message, token);

    let fullText = "";
    let tokenCount = 0;
    let finishReason = "unknown";

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    try {
      readLoop: while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          const chunk = parseSseLine(line);
          if (chunk === null) continue;
          // A plain break would only leave the inner for loop.
          if (chunk === "[DONE]") break readLoop;

          const content = extractTokenContent(chunk);
          if (content) {
            fullText += content;
            tokenCount += 1;
          }

          const choices = chunk?.choices;
          if (choices && choices[0]?.finish_reason) {
            finishReason = choices[0].finish_reason;
          }
        }
      }
    } finally {
      reader.releaseLock();
    }

    return {
      text: fullText,
      token_count: tokenCount,
      duration_ms: Date.now() - startTime,
      finish_reason: finishReason,
    };
  }
  ```

  ```php PHP theme={null}
  // File: src/Streaming/StreamClient.php
  public static function streamCompletion(string $message, string $token): array
  {
      $startTime = microtime(true);

      $fullText = '';
      $tokenCount = 0;
      $finishReason = 'unknown';
      $lineBuffer = '';
      $done = false;

      $writeCallback = function (string $data) use (
          &$fullText, &$tokenCount, &$finishReason, &$lineBuffer, &$done
      ): void {
          if ($done) return;

          $lineBuffer .= $data;
          $lines = explode("\n", $lineBuffer);
          $lineBuffer = array_pop($lines); // save incomplete last line

          foreach ($lines as $line) {
              $chunk = self::parseSseLine($line);
              if ($chunk === null) continue;
              if ($chunk === '[DONE]') { $done = true; break; }

              $content = self::extractTokenContent($chunk);
              if ($content !== '') {
                  $fullText .= $content;
                  $tokenCount++;
              }
              $choices = $chunk['choices'] ?? [];
              if (!empty($choices) && !empty($choices[0]['finish_reason'])) {
                  $finishReason = $choices[0]['finish_reason'];
              }
          }
      };

      self::makeStreamingRequest($message, $token, $writeCallback);

      return [
          'text' => $fullText,
          'token_count' => $tokenCount,
          'duration_ms' => (int)((microtime(true) - $startTime) * 1000),
          'finish_reason' => $finishReason,
      ];
  }
  ```

  ```go Go theme={null}
  // File: pkg/streaming/client.go
  func StreamCompletion(message, token string) (*StreamResult, error) {
      start := time.Now()

      resp, err := MakeStreamingRequest(message, token)
      if err != nil {
          return nil, err
      }
      defer resp.Body.Close()

      var (
          fullText     strings.Builder
          tokenCount   int
          finishReason = "unknown"
      )

      scanner := bufio.NewScanner(resp.Body)
      for scanner.Scan() {
          line := scanner.Text()
          parsed := ParseSSELine(line)
          if parsed == nil {
              continue
          }
          if s, ok := parsed.(string); ok && s == "[DONE]" {
              break
          }
          chunk, ok := parsed.(*SSEChunk)
          if !ok {
              continue
          }

          content := ExtractTokenContent(chunk)
          if content != "" {
              fullText.WriteString(content)
              tokenCount++
          }

          if len(chunk.Choices) > 0 && chunk.Choices[0].FinishReason != nil {
              finishReason = *chunk.Choices[0].FinishReason
          }
      }

      if err := scanner.Err(); err != nil && fullText.Len() == 0 {
          return nil, fmt.Errorf("error reading stream: %w", err)
      }

      return &StreamResult{
          Text:         fullText.String(),
          TokenCount:   tokenCount,
          DurationMs:   time.Since(start).Milliseconds(),
          FinishReason: finishReason,
      }, nil
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/streaming/StreamClient.java
  @SuppressWarnings("unchecked")
  public static StreamResult streamCompletion(String message, String token) {
      Instant start = Instant.now();
      HttpResponse<java.io.InputStream> response = makeStreamingRequest(message, token);

      StringBuilder fullText = new StringBuilder();
      int tokenCount = 0;
      String finishReason = "unknown";

      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(response.body(), StandardCharsets.UTF_8)
      )) {
          String line;
          while ((line = reader.readLine()) != null) {
              Object parsed = parseSseLine(line);
              if (parsed == null) continue;
              if ("[DONE]".equals(parsed)) break;

              Map<String, Object> chunk = (Map<String, Object>) parsed;
              String content = extractTokenContent(chunk);
              if (!content.isEmpty()) {
                  fullText.append(content);
                  tokenCount++;
              }

              List<Map<String, Object>> choices = (List<Map<String, Object>>) chunk.get("choices");
              if (choices != null && !choices.isEmpty()) {
                  Object fr = choices.get(0).get("finish_reason");
                  if (fr != null && !fr.toString().equals("null")) {
                      finishReason = fr.toString();
                  }
              }
          }
      } catch (Exception e) {
          if (fullText.length() == 0) {
              throw new RuntimeException("Stream read failed: " + e.getMessage(), e);
          }
          // Preserve partial output on mid-stream error
      }

      long durationMs = Instant.now().toEpochMilli() - start.toEpochMilli();
      return new StreamResult(fullText.toString(), tokenCount, durationMs, finishReason);
  }
  ```
</CodeGroup>

The code does the following:

* Records the start time before opening the stream so elapsed duration includes connection overhead
* Initializes accumulators for the full text, token count, and finish reason
* Iterates the stream line by line, parsing each with the SSE parser and skipping `null` lines
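* Records the `finish_reason` whenever a chunk carries a non-null value
* Stops the loop when the `[DONE]` sentinel arrives or the stream ends
* Preserves partial output if the read fails mid-stream (in the Python, Go, and Java versions), raising the error only when no text was received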
* Returns a single result object containing the assembled text, token count, elapsed duration in milliseconds, and finish reason
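
The partial-output path is easiest to understand with a source that fails partway through. The sketch below is a simplified, network-free stand-in for the accumulation loop; `accumulate` and `dropped_connection` are hypothetical names, and the lines are hand-written. It narrows the handler to `OSError` for the illustration.

```python
import json
import time
from typing import Iterable

def accumulate(lines: Iterable[str]) -> dict:
    """Collect SSE lines into one result, keeping partial text if the read fails."""
    start = time.time()
    text, token_count, finish_reason = "", 0, "unknown"
    try:
        for line in lines:
            if not line or not line.startswith("data: "):
                continue
            data = line[6:]
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                continue
            choices = chunk.get("choices") if isinstance(chunk, dict) else None
            if not choices:
                continue
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                text += content
                token_count += 1
            if choices[0].get("finish_reason"):
                finish_reason = choices[0]["finish_reason"]
    except OSError:
        if not text:
            raise  # nothing received, so surface the failure
    return {
        "text": text,
        "token_count": token_count,
        "duration_ms": int((time.time() - start) * 1000),
        "finish_reason": finish_reason,
    }

def dropped_connection():
    """Hypothetical line source that fails after two content chunks."""
    yield 'data: {"choices":[{"delta":{"content":"Partial"}}]}'
    yield 'data: {"choices":[{"delta":{"content":" answer"}}]}'
    raise ConnectionError("connection reset")

result = accumulate(dropped_connection())
print(result["text"], result["finish_reason"])
```

Because the stream never delivered a `finish_reason`, the result keeps the default `"unknown"`, which gives callers a way to tell a truncated response from a complete one.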

### ✓ Checkpoint: Token Extraction & Accumulation

Run the validation test for this step:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step4_accumulation_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step4
  ```

  ```bash TypeScript theme={null}
  npm run test:step4
  ```

  ```bash PHP theme={null}
  php tests/Step4AccumulationTest.php
  ```

  ```bash Go theme={null}
  go run tests/step4_accumulation.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step4AccumulationTest"
  ```
</CodeGroup>

Your output should look similar to the following:

```
🧪 Testing: Token Extraction & Accumulation

Test 1: extract_token_content — normal chunk...
✓ Normal chunk → 'Hello'
Test 2: extract_token_content — null content delta...
✓ Null content → ''
Test 3: extract_token_content — empty delta (role-only chunk)...
✓ Empty delta → ''
Test 4: extract_token_content — no choices...
✓ Empty choices → ''
Test 5: extract_token_content — finish_reason chunk...
✓ finish_reason chunk → '' (no content tokens from finish chunk)

Test 6: stream_completion — full response assembly...
✓ Delta content extraction working
✓ Null delta handled gracefully
✓ finish_reason detected: stop
✓ Duration tracked: 2098ms
✓ Token count: 2 tokens
  Response preview: '1 2 3 4 5'

✅ Full response assembled.
   Next: Typing-Effect Renderer
```

**If tests fail**, check:

* The token content extractor returns `""` (not `None`/`null`) when content is absent
* The accumulation loop reads `finish_reason` from `choices[0]`, not from the top-level chunk
* The line buffer is in place for JS/TS (`buffer = lines.pop()`) and PHP (`$lineBuffer = array_pop($lines)`)

***

## Step 5: Typing-Effect Terminal Renderer

Now implement the terminal renderer, a function that prints each token immediately to stdout without a newline, creating a live typing effect in the terminal.

This step demonstrates an important pattern: consuming the stream directly rather than accumulating it first. The renderer calls the streaming request, SSE parsing, and token extraction functions, but skips the accumulation loop entirely.

### Key Concepts

#### Unbuffered Output

By default, most languages buffer stdout, which means output is held until the buffer fills or the program exits. For a typing effect you need every token to appear immediately. Each language has its own way to force this:

| Language                | Unbuffered write                                                   |
| ----------------------- | ------------------------------------------------------------------ |
| Python                  | `print(content, end="", flush=True)`                               |
| JavaScript / TypeScript | `process.stdout.write(content)`                                    |
| PHP                     | `echo $content; ob_flush(); flush();`                              |
| Go                      | `fmt.Fprint(os.Stdout, content)` (stdout is unbuffered by default) |
| Java                    | `System.out.print(content); System.out.flush();`                   |
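
As a rough illustration of the Python row, the sketch below writes tokens to any text stream and flushes after each one. `write_tokens` is a hypothetical helper, not part of the tutorial code; passing an in-memory stream makes the behavior easy to check without a terminal.

```python
import io
import sys
from typing import Iterable, TextIO

def write_tokens(tokens: Iterable[str], out: TextIO = sys.stdout) -> int:
    """Write each token with no newline, flushing after every write."""
    count = 0
    for token in tokens:
        out.write(token)
        out.flush()  # push the token out now instead of waiting for the buffer
        count += 1
    out.write("\n")
    out.flush()
    return count

# Capture output in memory to confirm that only the tokens and one newline are written.
capture = io.StringIO()
written = write_tokens(["Hello", " streaming", " world"], out=capture)
print(written, repr(capture.getvalue()))
```

Calling `write_tokens(...)` without the `out` argument writes to the terminal instead.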

#### Direct Stream Consumption vs. Accumulation

The stream completion function from Step 4 accumulates everything and returns once the stream is complete. The terminal renderer function prints as it goes, so the user sees output before the model has finished generating. Both patterns are valid; the right choice depends on whether the output needs to be complete before it's useful.

### Implementation Guide

Open the renderer file referenced in the code block. Unlike the streaming client, this file has a single method to implement. Review the TODO comments, then implement:

<CodeGroup>
  ```python Python theme={null}
  # File: browser/renderer.py
  def render_stream_to_terminal(message: str, token: str) -> None:
      print(f"Prompt: {message}\n")
      print("Response: ", end="", flush=True)

      response = make_streaming_request(message, token)

      total_tokens = 0
      finish_reason = "unknown"

      for raw_line in response.iter_lines(decode_unicode=True):
          chunk = parse_sse_line(raw_line)
          if chunk is None:
              continue
          if chunk == "[DONE]":
              break
          content = extract_token_content(chunk)
          if content:
              print(content, end="", flush=True)
              total_tokens += 1
          choices = chunk.get("choices", [])
          if choices and choices[0].get("finish_reason"):
              finish_reason = choices[0]["finish_reason"]

      print()
      print(f"\n[{total_tokens} tokens, finish_reason={finish_reason}]")
  ```

  ```javascript JavaScript theme={null}
  // File: src/browser/renderer.js
  export async function renderStreamToTerminal(message, token) {
    process.stdout.write(`Prompt: ${message}\n\nResponse: `);

    const response = await makeStreamingRequest(message, token);
    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    let buffer = "";
    let totalTokens = 0;
    let finishReason = "unknown";

    try {
      readLoop: while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          const chunk = parseSseLine(line);
          if (chunk === null) continue;
          // A plain break would only leave the inner for loop.
          if (chunk === "[DONE]") break readLoop;

          const content = extractTokenContent(chunk);
          if (content) {
            process.stdout.write(content);
            totalTokens += 1;
          }

          const choices = chunk?.choices;
          if (choices && choices[0]?.finish_reason) {
            finishReason = choices[0].finish_reason;
          }
        }
      }
    } finally {
      reader.releaseLock();
    }

    process.stdout.write(`\n\n[${totalTokens} tokens, finish_reason=${finishReason}]\n`);
  }
  ```

  ```typescript TypeScript theme={null}
  // File: src/browser/renderer.ts
  export async function renderStreamToTerminal(message: string, token: string): Promise<void> {
    process.stdout.write(`Prompt: ${message}\n\nResponse: `);

    const response = await makeStreamingRequest(message, token);
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();

    let buffer = "";
    let totalTokens = 0;
    let finishReason = "unknown";

    try {
      readLoop: while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          const chunk = parseSseLine(line);
          if (chunk === null) continue;
          // A plain break would only leave the inner for loop.
          if (chunk === "[DONE]") break readLoop;

          const content = extractTokenContent(chunk);
          if (content) {
            process.stdout.write(content);
            totalTokens += 1;
          }

          const choices = chunk?.choices;
          if (choices && choices[0]?.finish_reason) {
            finishReason = choices[0].finish_reason;
          }
        }
      }
    } finally {
      reader.releaseLock();
    }

    process.stdout.write(`\n\n[${totalTokens} tokens, finish_reason=${finishReason}]\n`);
  }
  ```

  ```php PHP theme={null}
  // File: src/Browser/Renderer.php
  public static function renderStreamToTerminal(string $message, string $token): void
  {
      echo "Prompt: {$message}\n\nResponse: ";

      $totalTokens = 0;
      $finishReason = 'unknown';
      $lineBuffer = '';
      $done = false;

      $writeCallback = function (string $data) use (
          &$totalTokens, &$finishReason, &$lineBuffer, &$done
      ): void {
          if ($done) return;

          $lineBuffer .= $data;
          $lines = explode("\n", $lineBuffer);
          $lineBuffer = array_pop($lines);

          foreach ($lines as $line) {
              $chunk = StreamClient::parseSseLine($line);
              if ($chunk === null) continue;
              if ($chunk === '[DONE]') { $done = true; break; }

              $content = StreamClient::extractTokenContent($chunk);
              if ($content !== '') {
                  echo $content;
                  if (ob_get_level() > 0) ob_flush();
                  flush();
                  $totalTokens++;
              }
              $choices = $chunk['choices'] ?? [];
              if (!empty($choices) && !empty($choices[0]['finish_reason'])) {
                  $finishReason = $choices[0]['finish_reason'];
              }
          }
      };

      StreamClient::makeStreamingRequest($message, $token, $writeCallback);

      echo "\n\n[{$totalTokens} tokens, finish_reason={$finishReason}]\n";
  }
  ```

  ```go Go theme={null}
  // File: pkg/browser/renderer.go
  func RenderStreamToTerminal(message, token string) error {
      fmt.Printf("Prompt: %s\n\nResponse: ", message)

      resp, err := streaming.MakeStreamingRequest(message, token)
      if err != nil {
          return err
      }
      defer resp.Body.Close()

      totalTokens := 0
      finishReason := "unknown"

      scanner := bufio.NewScanner(resp.Body)
      for scanner.Scan() {
          line := scanner.Text()
          parsed := streaming.ParseSSELine(line)
          if parsed == nil {
              continue
          }
          if s, ok := parsed.(string); ok && s == "[DONE]" {
              break
          }
          chunk, ok := parsed.(*streaming.SSEChunk)
          if !ok {
              continue
          }

          content := streaming.ExtractTokenContent(chunk)
          if content != "" {
              fmt.Fprint(os.Stdout, content)
              totalTokens++
          }

          if len(chunk.Choices) > 0 && chunk.Choices[0].FinishReason != nil {
              finishReason = *chunk.Choices[0].FinishReason
          }
      }

      fmt.Printf("\n\n[%d tokens, finish_reason=%s]\n", totalTokens, finishReason)
      // Report read errors such as a dropped connection; nil on a clean end of stream.
      return scanner.Err()
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/browser/Renderer.java
  @SuppressWarnings("unchecked")
  public static void renderStreamToTerminal(String message, String token) {
      System.out.print("Prompt: " + message + "\n\nResponse: ");
      System.out.flush();

      HttpResponse<java.io.InputStream> response = StreamClient.makeStreamingRequest(message, token);

      int totalTokens = 0;
      String finishReason = "unknown";

      try (BufferedReader reader = new BufferedReader(
          new InputStreamReader(response.body(), StandardCharsets.UTF_8)
      )) {
          String line;
          while ((line = reader.readLine()) != null) {
              Object parsed = StreamClient.parseSseLine(line);
              if (parsed == null) continue;
              if ("[DONE]".equals(parsed)) break;

              Map<String, Object> chunk = (Map<String, Object>) parsed;
              String content = StreamClient.extractTokenContent(chunk);
              if (!content.isEmpty()) {
                  System.out.print(content);
                  System.out.flush();
                  totalTokens++;
              }

              List<Map<String, Object>> choices = (List<Map<String, Object>>) chunk.get("choices");
              if (choices != null && !choices.isEmpty()) {
                  Object fr = choices.get(0).get("finish_reason");
                  if (fr != null && !fr.toString().equals("null")) {
                      finishReason = fr.toString();
                  }
              }
          }
      } catch (Exception e) {
          System.err.println("Error reading stream: " + e.getMessage());
      }

      System.out.println("\n\n[" + totalTokens + " tokens, finish_reason=" + finishReason + "]");
  }
  ```
</CodeGroup>

The code does the following:

* Prints the user's message as a prompt header before the response begins
* Opens the stream and iterates SSE lines directly, without an accumulation loop, so tokens are available to print as soon as they arrive
* Writes each token to stdout without a trailing newline and flushes immediately, producing a token-by-token typing effect
* Prints a summary line with the total token count and finish reason after the stream ends

### ✓ Checkpoint: Terminal Renderer

Run the validation test:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step5_renderer_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step5
  ```

  ```bash TypeScript theme={null}
  npm run test:step5
  ```

  ```bash PHP theme={null}
  php tests/Step5RendererTest.php
  ```

  ```bash Go theme={null}
  go run tests/step5_renderer.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step5RendererTest"
  ```
</CodeGroup>

Your output should look similar to the following:

```
🧪 Testing: Typing-Effect Renderer

✓ Token obtained

Test 1: render_stream_to_terminal() — streaming to terminal...
Prompt: Reply with exactly: Hello streaming world

Response: Hello streaming world

[2 tokens, finish_reason=stop]
✓ Prompt header printed
✓ Response label printed
✓ Token summary found: 2 tokens, finish_reason=stop

✅ Typing-effect renderer working.
   Next: Server-Side Proxy
```

<Note>
  With a short prompt like this, tokens arrive so quickly that the typing effect may not be visible — the response appears all at once. That's expected. In production, longer AI responses make the effect clear: each token renders as it arrives rather than waiting for the full response. This is the pattern your chat UI will use.
</Note>

**If tests fail**, check:

* Each token is written with no trailing newline
* `flush()` or equivalent is called after each write
* The summary line format is `[N tokens, finish_reason=X]`

***

## Step 6: Server-Side Proxy

In this step you'll implement the proxy server's stream handler. This is the route that receives requests from browser clients, forwards them upstream to Gloo AI with a server-side auth token, and pipes the SSE response back.

### Key Concepts

#### Why a Proxy?

Browser JavaScript cannot safely include API credentials because anything in client code is visible to anyone who opens DevTools. A proxy server is the standard solution: the browser POSTs to your server, your server adds the auth token and POSTs to Gloo AI, and the SSE stream flows back through your server to the browser.

An additional benefit: the proxy can add rate limiting, logging, and multi-tenant auth logic without touching client code.

#### SSE Headers That Matter

Three headers tell the browser (and any reverse proxies like nginx) that this is a live stream, not a buffered response:

| Header              | Value               | Why                                                  |
| ------------------- | ------------------- | ---------------------------------------------------- |
| `Content-Type`      | `text/event-stream` | Identifies the SSE protocol                          |
| `Cache-Control`     | `no-cache`          | Prevents browser caching of the stream               |
| `X-Accel-Buffering` | `no`                | Disables nginx buffering so bytes arrive immediately |
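
As a small illustration, the headers can be kept in one place alongside a helper that frames a payload as a single SSE event. This is a sketch under assumptions: `SSE_HEADERS` and `format_sse_event` are hypothetical names, not part of the tutorial code.

```python
import json

# The three headers from the table above.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}

def format_sse_event(payload: dict) -> str:
    """Serialize a payload as one SSE event: a data line plus a blank line."""
    # json.dumps escapes quotes and newlines, so the event stays on one line.
    return f"data: {json.dumps(payload)}\n\n"

event = format_sse_event({"error": 'upstream said "no"\nretry later'})
print(event, end="")
```

Serializing with `json.dumps` matters for error messages in particular: a raw newline or quote interpolated into the `data:` line would break the event framing or produce invalid JSON for the browser.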

#### Language-Specific Flushing

Each language needs an explicit flush mechanism to push bytes to the client immediately:

| Language              | Flush mechanism                                          |
| --------------------- | -------------------------------------------------------- |
| Python (Flask)        | `yield` from a generator — Flask flushes on each `yield` |
| JavaScript/TypeScript | `res.write()` — Express sends immediately                |
| PHP                   | `flush()` after each write                               |
| Go                    | `flusher.Flush()` — requires `http.Flusher` interface    |
| Java                  | `out.flush()` after each write                           |
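
For the generator-based row, the relay can be pictured as a function that re-emits each upstream line as its own event. The sketch below is pure Python so it runs without Flask; `relay_sse` is a hypothetical name, and the upstream lines are hand-written to resemble what a line iterator over the response would produce.

```python
from typing import Iterable, Iterator

def relay_sse(upstream_lines: Iterable[bytes]) -> Iterator[str]:
    """Re-emit each non-empty upstream line as its own SSE event."""
    for raw in upstream_lines:
        if not raw:
            continue  # skip blank separators; they are re-added below
        yield raw.decode("utf-8") + "\n\n"

# Hand-written upstream lines (illustrative only).
upstream = [
    b'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    b"",
    b"data: [DONE]",
]
frames = list(relay_sse(upstream))
print(frames)
```

Each yielded string is one complete event, so a server that sends every yielded chunk as it is produced delivers tokens to the browser one event at a time.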

### Implementation Guide

Open the proxy server file referenced in the code block. The server setup and routing are already in place. Find the stream handler method (or route handler, depending on the language), review the TODO comments, and implement the relay logic:

<CodeGroup>
  ```python Python theme={null}
  # File: proxy/server.py
  @app.route("/api/stream", methods=["POST", "OPTIONS"])
  def stream_proxy():
      if request.method == "OPTIONS":
          return Response(status=204)

      request_data = request.get_json() or {}

      def generate():
          try:
              auth_token = ensure_valid_token()
              headers = {
                  "Authorization": f"Bearer {auth_token}",
                  "Content-Type": "application/json",
              }
              payload = {**request_data, "stream": True}

              with requests.post(
                  API_URL, headers=headers, json=payload, stream=True
              ) as resp:
                  if resp.status_code != 200:
                      yield f'data: {{"error": "API error {resp.status_code}"}}\n\n'
                      return

                  for line in resp.iter_lines():
                      if line:
                          decoded = line.decode("utf-8")
                          yield f"{decoded}\n\n"

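          except Exception as e:
              # json.dumps escapes quotes and newlines so the frame stays valid JSON
              # (assumes `import json` at the top of the file).
              error_payload = json.dumps({"error": str(e)})
              yield f"data: {error_payload}\n\n"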

      return Response(
          generate(),
          mimetype="text/event-stream",
          headers={
              "Cache-Control": "no-cache",
              "X-Accel-Buffering": "no",
          },
      )
  ```

  ```javascript JavaScript theme={null}
  // File: src/proxy/server.js
  app.post("/api/stream", async (req, res) => {
    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("X-Accel-Buffering", "no");
    res.setHeader("Connection", "keep-alive");

    try {
      const token = await ensureValidToken();
      const payload = { ...req.body, stream: true };

      const upstream = await fetch(API_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });

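      if (!upstream.ok) {
        const errText = await upstream.text();
        // JSON.stringify escapes quotes and newlines in the upstream error body.
        const error = `API error ${upstream.status}: ${errText.slice(0, 100)}`;
        res.write(`data: ${JSON.stringify({ error })}\n\n`);
        return; // the finally block ends the response
      }

      const reader = upstream.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // A chunk can end mid-line, so keep the trailing fragment for the next read.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop();
        for (const line of lines) {
          if (line.trim()) {
            res.write(`${line}\n\n`);
          }
        }
      }
      // Relay a final line that arrived without a trailing newline.
      if (buffer.trim()) res.write(`${buffer}\n\n`);

      reader.releaseLock();
    } catch (err) {
      res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    } finally {
      res.end();
    }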
  });
  ```

  ```typescript TypeScript theme={null}
  // File: src/proxy/server.ts
  app.post("/api/stream", async (req: Request, res: Response): Promise<void> => {
    res.setHeader("Content-Type", "text/event-stream");
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("X-Accel-Buffering", "no");
    res.setHeader("Connection", "keep-alive");

    try {
      const token = await ensureValidToken();
      const payload = { ...req.body, stream: true };

      const upstream = await fetch(API_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });

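      if (!upstream.ok) {
        const errText = await upstream.text();
        // JSON.stringify escapes quotes and newlines in the upstream error body.
        const error = `API error ${upstream.status}: ${errText.slice(0, 100)}`;
        res.write(`data: ${JSON.stringify({ error })}\n\n`);
        return; // the finally block ends the response
      }

      const reader = upstream.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // A chunk can end mid-line, so keep the trailing fragment for the next read.
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";
        for (const line of lines) {
          if (line.trim()) {
            res.write(`${line}\n\n`);
          }
        }
      }
      // Relay a final line that arrived without a trailing newline.
      if (buffer.trim()) res.write(`${buffer}\n\n`);

      reader.releaseLock();
    } catch (err: unknown) {
      const message = err instanceof Error ? err.message : String(err);
      res.write(`data: ${JSON.stringify({ error: message })}\n\n`);
    } finally {
      res.end();
    }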
    }
  });
  ```

  ```php PHP theme={null}
  // File: src/Proxy/Server.php
  public static function handle(): void
  {
      $corsOrigin = $_ENV['PROXY_CORS_ORIGIN'] ?? 'http://localhost:3000';
      header('Access-Control-Allow-Origin: ' . $corsOrigin);
      header('Access-Control-Allow-Headers: Content-Type, Authorization');
      header('Access-Control-Allow-Methods: POST, OPTIONS');

      if ($_SERVER['REQUEST_METHOD'] === 'OPTIONS') {
          http_response_code(204);
          exit;
      }

      $path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
      if ($path === '/health') {
          http_response_code(200);
          header('Content-Type: application/json');
          echo json_encode(['status' => 'ok', 'service' => 'completions-streaming-proxy']);
          exit;
      }

      if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
          http_response_code(405);
          exit;
      }

      header('Content-Type: text/event-stream');
      header('Cache-Control: no-cache');
      header('X-Accel-Buffering: no');

      while (ob_get_level() > 0) ob_end_flush();

      try {
          $authToken = TokenManager::ensureValidToken();
          $body = json_decode(file_get_contents('php://input'), true) ?? [];
          $body['stream'] = true;

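          $buffer = '';
          $ch = curl_init(self::API_URL);
          curl_setopt_array($ch, [
              CURLOPT_POST => true,
              CURLOPT_POSTFIELDS => json_encode($body),
              CURLOPT_HTTPHEADER => [
                  'Authorization: Bearer ' . $authToken,
                  'Content-Type: application/json',
              ],
              CURLOPT_RETURNTRANSFER => false,
              CURLOPT_WRITEFUNCTION => function ($ch, $data) use (&$buffer) {
                  // A chunk can end mid-line, so keep the trailing fragment for the next call.
                  $buffer .= $data;
                  $lines = explode("\n", $buffer);
                  $buffer = array_pop($lines);
                  foreach ($lines as $line) {
                      if (trim($line) !== '') {
                          echo $line . "\n\n";
                          flush();
                      }
                  }
                  return strlen($data);
              },
          ]);

          curl_exec($ch);
          curl_close($ch);

          // Relay a final line that arrived without a trailing newline.
          if (trim($buffer) !== '') {
              echo $buffer . "\n\n";
              flush();
          }
      } catch (\Throwable $e) {
          // json_encode escapes quotes and newlines; addslashes does not produce valid JSON.
          echo 'data: ' . json_encode(['error' => $e->getMessage()]) . "\n\n";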
          flush();
      }
  }
  ```

  ```go Go theme={null}
  // File: pkg/proxy/server.go
  func streamProxy(w http.ResponseWriter, r *http.Request) {
      corsOrigin := os.Getenv("PROXY_CORS_ORIGIN")
      if corsOrigin == "" {
          corsOrigin = "http://localhost:3000"
      }
      w.Header().Set("Access-Control-Allow-Origin", corsOrigin)
      w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
      w.Header().Set("Access-Control-Allow-Methods", "POST, OPTIONS")

      if r.Method == http.MethodOptions {
          w.WriteHeader(http.StatusNoContent)
          return
      }
      if r.Method != http.MethodPost {
          http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
          return
      }

      w.Header().Set("Content-Type", "text/event-stream")
      w.Header().Set("Cache-Control", "no-cache")
      w.Header().Set("X-Accel-Buffering", "no")

      flusher := w.(http.Flusher)

      token, err := auth.EnsureValidToken()
      if err != nil {
          fmt.Fprintf(w, "data: {\"error\": \"%s\"}\n\n", err.Error())
          flusher.Flush()
          return
      }

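      body, _ := io.ReadAll(r.Body)
      var reqBody map[string]any
      json.Unmarshal(body, &reqBody)
      // An empty, null, or non-object body leaves reqBody nil; writing to a nil map panics.
      if reqBody == nil {
          reqBody = map[string]any{}
      }
      reqBody["stream"] = true
      payload, _ := json.Marshal(reqBody)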

      req, _ := http.NewRequest(http.MethodPost, apiURL, bytes.NewReader(payload))
      req.Header.Set("Authorization", "Bearer "+token)
      req.Header.Set("Content-Type", "application/json")

      resp, err := (&http.Client{}).Do(req)
      if err != nil {
          fmt.Fprintf(w, "data: {\"error\": \"%s\"}\n\n", err.Error())
          flusher.Flush()
          return
      }
      defer resp.Body.Close()

      if resp.StatusCode != http.StatusOK {
          fmt.Fprintf(w, "data: {\"error\": \"API error %d\"}\n\n", resp.StatusCode)
          flusher.Flush()
          return
      }

      scanner := bufio.NewScanner(resp.Body)
      for scanner.Scan() {
          if line := scanner.Text(); line != "" {
              fmt.Fprintf(w, "%s\n\n", line)
              flusher.Flush()
          }
      }
  }
  ```

  ```java Java theme={null}
  // File: src/main/java/com/gloo/streaming/proxy/ProxyServer.java
  @Override
  public void handle(HttpExchange exchange) {
      try {
          exchange.getResponseHeaders().add("Access-Control-Allow-Origin", corsOrigin);
          exchange.getResponseHeaders().add("Access-Control-Allow-Headers", "Content-Type, Authorization");
          exchange.getResponseHeaders().add("Access-Control-Allow-Methods", "POST, OPTIONS");

          if ("OPTIONS".equals(exchange.getRequestMethod())) {
              exchange.sendResponseHeaders(204, -1);
              return;
          }
          if (!"POST".equals(exchange.getRequestMethod())) {
              exchange.sendResponseHeaders(405, -1);
              return;
          }

          exchange.getResponseHeaders().add("Content-Type", "text/event-stream");
          exchange.getResponseHeaders().add("Cache-Control", "no-cache");
          exchange.getResponseHeaders().add("X-Accel-Buffering", "no");
          exchange.sendResponseHeaders(200, 0);

          OutputStream out = exchange.getResponseBody();

          try {
              String authToken = TokenManager.ensureValidToken();
              String rawBody = new String(
                  exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8
              );

              Map<String, Object> body = GSON.fromJson(rawBody, Map.class);
              if (body == null) body = new java.util.HashMap<>();
              body.put("stream", true);

              HttpRequest request = HttpRequest.newBuilder()
                  .uri(URI.create(API_URL))
                  .header("Authorization", "Bearer " + authToken)
                  .header("Content-Type", "application/json")
                  .POST(HttpRequest.BodyPublishers.ofString(GSON.toJson(body)))
                  .build();

              HttpResponse<java.io.InputStream> upstream = HTTP_CLIENT.send(
                  request, HttpResponse.BodyHandlers.ofInputStream()
              );

              if (upstream.statusCode() != 200) {
                  String errMsg = "data: {\"error\": \"API error " + upstream.statusCode() + "\"}\n\n";
                  out.write(errMsg.getBytes(StandardCharsets.UTF_8));
                  out.flush();
                  return;
              }

              try (BufferedReader reader = new BufferedReader(
                  new InputStreamReader(upstream.body(), StandardCharsets.UTF_8)
              )) {
                  String line;
                  while ((line = reader.readLine()) != null) {
                      if (!line.isBlank()) {
                          out.write((line + "\n\n").getBytes(StandardCharsets.UTF_8));
                          out.flush();
                      }
                  }
              }
          } catch (Exception e) {
              // GSON escapes the message; String.valueOf guards against a null getMessage().
              String errMsg = "data: " + GSON.toJson(Map.of("error", String.valueOf(e.getMessage()))) + "\n\n";
              out.write(errMsg.getBytes(StandardCharsets.UTF_8));
              out.flush();
          } finally {
              out.close();
          }
      } catch (Exception e) {
          System.err.println("Handler error: " + e.getMessage());
      }
  }
  ```
</CodeGroup>

The code does the following:

* Sets `Content-Type: text/event-stream`, `Cache-Control: no-cache`, and `X-Accel-Buffering: no` before writing any response data
* Handles `OPTIONS` preflight requests immediately so browsers can POST cross-origin
* Retrieves a fresh auth token using the pre-built token manager, keeping credentials server-side
* Reads the incoming request body, injects `stream: true`, and forwards the request to the Gloo AI API
* Relays each non-blank SSE line to the client and flushes immediately so tokens reach the browser as they arrive
* Writes a structured error SSE frame if the upstream request fails, avoiding a silent stream close

<Note>
  PHP, Go, and Java use a generic HTTP handler that receives all request methods, so they include an explicit 405 check before the streaming logic. Python, JavaScript, and TypeScript restrict methods when the route is registered (`POST` and `OPTIONS` in the Flask example, `POST` in Express), so the framework rejects other methods automatically.
</Note>
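
The structured error frame is easiest to get right when a JSON encoder builds it, because error messages often contain quotes or newlines that would otherwise break the `data:` line. A minimal Python sketch, where `sse_error_frame` is a hypothetical helper name rather than a function from the starter project:

```python
import json


def sse_error_frame(message):
    """Build one SSE frame whose payload is valid JSON, whatever the message contains."""
    # json.dumps escapes quotes, backslashes, and newlines, so the frame stays
    # on a single `data:` line and the client can always parse it as JSON.
    return f"data: {json.dumps({'error': message})}\n\n"
```

A handler could then emit `sse_error_frame(str(e))` from its exception branch instead of interpolating the message into a string by hand.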

### ✓ Checkpoint: Proxy Server

Run the proxy server validation test:

<CodeGroup>
  ```bash Python theme={null}
  python tests/step6_proxy_test.py
  ```

  ```bash JavaScript theme={null}
  npm run test:step6
  ```

  ```bash TypeScript theme={null}
  npm run test:step6
  ```

  ```bash PHP theme={null}
  php tests/Step6ProxyTest.php
  ```

  ```bash Go theme={null}
  go run tests/step6_proxy.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.tests.Step6ProxyTest"
  ```
</CodeGroup>

Your output should look similar to the following:

```
🧪 Testing: Server-Side Proxy

Test 1: Starting proxy server on port 3001...
 * Serving Flask app 'proxy.server'
 * Debug mode: off
✓ Proxy server running at http://localhost:3001

Test 2: /health endpoint...
✓ /health returns: {'service': 'completions-streaming-proxy', 'status': 'ok'}

Test 3: POST /api/stream — Content-Type header...
✓ Content-Type: text/event-stream; charset=utf-8

Test 4: SSE line format (data: prefix)...
✓ All lines have 'data: ' prefix (3 data chunks received)
✓ Stream terminated cleanly (finish_reason=stop)

Test 5: CORS headers on response...
✓ Access-Control-Allow-Origin: http://localhost:3000

✅ Proxy server relaying SSE end-to-end.
   Proxy complete: credentials stay server-side, client receives SSE.
```

**If tests fail**, check:

* CORS headers are set before sending the response headers (Java)
* `X-Accel-Buffering: no` is present (required to disable nginx buffering)
* Go: the flusher interface assertion must succeed — this panics if the `ResponseWriter` doesn't support flushing
* PHP: clear any existing output buffers before setting SSE headers

***

## Step 7: Testing Your Complete Implementation

With all six steps implemented, you can now run the full demo, test the proxy server via API, and explore the browser demo.

### Run the Demo Script

The entry point runs both examples back-to-back: first it accumulates a full response and prints it, then it streams a second response to the terminal with a typing effect.

<CodeGroup>
  ```bash Python theme={null}
  python main.py
  ```

  ```bash JavaScript theme={null}
  npm start
  ```

  ```bash TypeScript theme={null}
  npm start
  ```

  ```bash PHP theme={null}
  composer start
  ```

  ```bash Go theme={null}
  go run main.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.Main"
  ```
</CodeGroup>

Your output should look similar to:

```
Streaming AI Responses in Real Time

Environment variables loaded

Example: Streaming a completion (accumulate full text)...

Full response:
The resurrection of Jesus Christ is a cornerstone of Christian 
faith, holding profound significance for believers. It's not 
merely a historical event but a theological truth that reshapes 
our understanding of God, humanity, and...

Received 16 tokens in 6864ms
  Finish reason: stop

Example: Typing-effect rendering...
Prompt: Tell me about Christian discipleship.

Response: Christian discipleship is a transformative journey of 
following Jesus Christ, learning from His teachings, and striving 
to live a life that reflects His character and mission...

[11 tokens, finish_reason=stop]
```

### Test the Proxy Server via API

Start the proxy server in one terminal:

<CodeGroup>
  ```bash Python theme={null}
  python proxy/server.py
  ```

  ```bash JavaScript theme={null}
  npm run proxy
  ```

  ```bash TypeScript theme={null}
  npm run proxy
  ```

  ```bash PHP theme={null}
  composer proxy
  ```

  ```bash Go theme={null}
  go run cmd/proxy/main.go
  ```

  ```bash Java theme={null}
  mvn -q compile exec:java -Dexec.mainClass="com.gloo.streaming.proxy.ProxyServer"
  ```
</CodeGroup>

Then send a request from another terminal using curl:

```bash theme={null}
curl -X POST http://localhost:3001/api/stream \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "auto_routing": true}'
```

You will see the SSE stream arrive line by line:

```
data: {"id": "gen-abc123", "choices": [{"delta": {"content": "Hello", "function_call": null, "refusal": null, "role": "assistant", "tool_calls": null}, "finish_reason": null, "index": 0, "logprobs": null, "native_finish_reason": null}], "created": 1774527271, "model": "google/gemini-2.5-flash", "object": "chat.completion.chunk", "service_tier": null, "system_fingerprint": null, "usage": null, "provider": "Gloo AI", "ttft_ms": 940.61}

data: {"id": "gen-abc123", "choices": [{"delta": {"content": "! How", "function_call": null, "refusal": null, "role": "assistant", "tool_calls": null}, "finish_reason": null, "index": 0, "logprobs": null, "native_finish_reason": null}], "created": 1774527271, "model": "google/gemini-2.5-flash", "object": "chat.completion.chunk", "service_tier": null, "system_fingerprint": null, "usage": null, "provider": "Gloo AI"}

data: {"id": "gen-abc123", "choices": [{"delta": {"content": " can I help you today?", "function_call": null, "refusal": null, "role": "assistant", "tool_calls": null}, "finish_reason": null, "index": 0, "logprobs": null, "native_finish_reason": null}], "created": 1774527271, "model": "google/gemini-2.5-flash", "object": "chat.completion.chunk", "service_tier": null, "system_fingerprint": null, "usage": null, "provider": "Gloo AI"}

data: {"id": "gen-abc123", "choices": [{"delta": {"content": "", "function_call": null, "refusal": null, "role": "assistant", "tool_calls": null}, "finish_reason": "stop", "index": 0, "logprobs": null, "native_finish_reason": "STOP"}], "created": 1774527271, "model": "google/gemini-2.5-flash", "object": "chat.completion.chunk", "service_tier": null, "system_fingerprint": null, "usage": null, "provider": "Gloo AI"}
```

Each `data:` line carries one JSON-encoded chunk, and the new text is in `choices[0].delta.content`. The final chunk signals the end of the stream with a non-null `finish_reason`.
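
To make the chunk shape concrete, here is a small Python sketch that extracts the text and `finish_reason` from one line of the output above. `parse_sse_line` is a hypothetical name, and the `[DONE]` check is a defensive assumption: some OpenAI-style APIs send that sentinel, but the sample output here does not show one.

```python
import json


def parse_sse_line(line):
    """Return (content, finish_reason) for a `data:` line, or None for other lines."""
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    # Assumption: skip an OpenAI-style [DONE] sentinel if the API ever sends one.
    if not payload or payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    choice = choices[0]
    # `delta` or `content` may be missing or null, so fall back to an empty string.
    content = (choice.get("delta") or {}).get("content") or ""
    return content, choice.get("finish_reason")
```

Calling it on each line of the curl output and joining the returned `content` values reconstructs the full reply.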

### Browser Demo

The browser demo is a standalone HTML file separate from the language starter projects — no install step required.

<Card title="frontend-example/" icon="github" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/completions-streaming/frontend-example">
  Download or clone this directory alongside your language starter
</Card>

The file connects to the proxy over HTTP, so it works with **any language's proxy server**.

With the proxy already running on port 3001, serve the browser client from the `frontend-example/` directory using whichever tool you have available:

```bash theme={null}
# Node
npx serve

# Python
python -m http.server 3000

# PHP
php -S localhost:3000
```

<Warning>
  Do not open `index.html` directly via `File > Open`. When loaded as a `file://` URL, the browser sends `Origin: null`, which does not match the origin the proxy allows, so the request fails the CORS check. You must serve the file over HTTP so the origin is `http://localhost:3000`.
</Warning>

Then open `http://localhost:3000` in your browser, type a question, and click **Send** — tokens appear one by one as they arrive from the proxy.

<img src="https://mintcdn.com/gloo-b243725a/FO0xxOeSEWhVs-PF/images/completions-streaming-browser-demo.png?fit=max&auto=format&n=FO0xxOeSEWhVs-PF&q=85&s=bdb7326d5cb30e9fdd58f3da889c3a60" alt="Gloo AI Streaming Demo browser page showing a streamed response to &#x22;What is my purpose in life&#x22;" width="1756" height="1420" data-path="images/completions-streaming-browser-demo.png" />

#### How the Browser Connects to the Stream

Browsers have a built-in API called `EventSource` designed for receiving server-sent events — but it only supports `GET` requests. Since the completions API requires a `POST` body containing the message text, `EventSource` can't be used here. Instead, the demo page uses `fetch()` with a `ReadableStream`, which supports any HTTP method:

```js theme={null}
// File: frontend-example/index.html
const response = await fetch("http://localhost:3001/api/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: message }] }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
```

Reading from the `ReadableStream` follows the same pattern you used in the terminal renderer: the same line buffer, SSE parser, and token extractor apply.

#### Markdown Rendering

AI responses often contain markdown. Inserting raw tokens directly into the DOM produces broken mid-stream output — `**bo` appears before `ld**` closes the bold span. The correct pattern is to **accumulate tokens and re-parse the full buffer on each token**:

```js theme={null}
// File: frontend-example/index.html
let buffer = "";

// On each token:
buffer += content;
outputEl.innerHTML = DOMPurify.sanitize(marked.parse(buffer));
```

`marked.parse()` runs on every token — slightly redundant but always produces valid HTML. `DOMPurify.sanitize()` prevents XSS from any HTML in the AI response.

<Tip>
  For production, serve the browser client from the same origin as the proxy, or set `PROXY_CORS_ORIGIN` in your `.env` to match your frontend domain.
</Tip>

<Tip>
  For React applications, the [Vercel AI SDK](https://sdk.vercel.ai/) `useChat` hook handles streaming, markdown rendering, and state management out of the box — it's a higher-level alternative to building this pattern manually.
</Tip>

***

## Troubleshooting

**Stream hangs and never produces output**
: Verify `"stream": true` is in the request payload. Without it, the API returns a single JSON response only after generation finishes, so a client that is waiting for `data:` lines appears to hang.

**Garbled or split tokens**
: The line buffer is missing or incorrect. In JS/TS/PHP, raw bytes must be accumulated and split on `\n` before parsing. Make sure `buffer = lines.pop()` saves the incomplete last fragment.
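
A minimal Python sketch of that buffering pattern, assuming the chunks have already been decoded to text (for example with an incremental UTF-8 decoder). `LineBuffer` is a hypothetical name, not a class from the starter project.

```python
class LineBuffer:
    """Reassemble complete lines from arbitrarily split network chunks."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Add a decoded chunk and return the complete, non-blank lines it finished."""
        self._pending += chunk
        lines = self._pending.split("\n")
        # The last element is either "" (the chunk ended on a newline) or an
        # incomplete fragment; keep it for the next call.
        self._pending = lines.pop()
        return [line.rstrip("\r") for line in lines if line.strip()]
```

Feeding it `'data: {"a"'` returns nothing, and the line only comes back once a later chunk supplies the rest of it and the newline.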

**`Authentication failed (401)`**
: Your `.env` file is missing `GLOO_CLIENT_ID` or `GLOO_CLIENT_SECRET`, or the values are incorrect. Run the Step 1 checkpoint to verify credentials load correctly.

**Browser blocks direct API calls (CORS error)**
: Browsers enforce same-origin policy. Direct calls from browser JavaScript to `platform.ai.gloo.com` will be blocked. Use the proxy server (Step 6) so API calls happen server-side.

**`Failed to fetch` when serving the browser demo on a port other than 3000**
: The proxy allows requests only from `http://localhost:3000` by default. If your file server uses a different port (e.g. VS Code / Cursor Live Server on port 5500, or `python -m http.server 8080`), the page's origin won't match the `Access-Control-Allow-Origin` value the proxy returns, and the browser blocks the request. Fix: set `PROXY_CORS_ORIGIN` in your `.env` to the **exact** origin shown in your browser's address bar, then restart the proxy.

```bash theme={null}
# .env — must be an exact match including hostname
PROXY_CORS_ORIGIN=http://127.0.0.1:5500  # Cursor / VS Code Live Server
```

Note that `http://localhost:5500` and `http://127.0.0.1:5500` are treated as different origins by the browser even though they resolve to the same address. Copy the origin directly from the address bar to avoid a mismatch.
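
The comparison is a plain string match on scheme, host, and port. The Python sketch below illustrates why the two addresses differ; `origin_of` and `origin_allowed` are hypothetical helpers that approximate how a browser derives an origin, not code from the proxy.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return the scheme://host[:port] origin of a URL, roughly as a browser sends it."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443}
    host = parts.hostname or ""
    # Browsers omit the port from the Origin header when it is the scheme default.
    if parts.port is None or parts.port == default_ports.get(parts.scheme):
        return f"{parts.scheme}://{host}"
    return f"{parts.scheme}://{host}:{parts.port}"


def origin_allowed(page_url, allowed_origin):
    """Exact string match, mirroring a single-origin CORS allow-list."""
    return origin_of(page_url) == allowed_origin
```

With `PROXY_CORS_ORIGIN=http://127.0.0.1:5500`, a page opened at `http://localhost:5500/index.html` fails this check even though both names point at the same machine.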

**PHP output appears all at once**
: PHP's output buffering is active. Call `ob_end_flush()` (or `while (ob_get_level() > 0) ob_end_flush()`) before the SSE loop to disable buffering.

**Go panics on `w.(http.Flusher)`**
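: Your `http.ResponseWriter` doesn't implement `http.Flusher`. This shouldn't happen with the standard `net/http` server, but it can happen when middleware or a test wrapper wraps the writer in a type that lacks a `Flush` method. Pass the original `http.ResponseWriter` through to the handler, or use the two-value form `flusher, ok := w.(http.Flusher)` and return an error when `ok` is false.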

**Mid-stream disconnect loses all output**
: Wrap the read loop in try/catch (or check errors in Go). If `fullText` already has content when the error occurs, return it rather than re-raising — partial responses are usually more useful than nothing.
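
One way to sketch this in Python, assuming the tokens arrive from an iterator that may raise partway through. `accumulate_with_partial` is a hypothetical name, and returning the error alongside the text is one design choice among several.

```python
def accumulate_with_partial(tokens):
    """Join streamed tokens; on a mid-stream failure, return what arrived so far.

    Returns (text, error): error is None when the stream completed normally.
    """
    parts = []
    try:
        for token in tokens:
            parts.append(token)
    except Exception as exc:  # e.g. a dropped connection surfacing mid-iteration
        if parts:
            return "".join(parts), exc
        raise  # nothing was received, so there is no partial text worth returning
    return "".join(parts), None
```

Returning the exception together with the partial text lets the caller show what arrived while still logging or displaying that the response was cut short.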

**Broken markdown mid-stream**
: Do not insert raw tokens into `innerHTML`. Accumulate the full buffer and call `marked.parse(buffer)` on every token — this ensures the markdown is always valid HTML at each step.

***

## View the Completed Project

If you want to see a working reference before or after completing the steps, the final project is available in the tutorial repository:

<Card title="Completed Project" href="https://github.com/gloo-ai/gloo-ai-docs-cookbook/tree/main/completions-streaming/final" icon="github">
  Browse the complete implementation in all six languages — Python, JavaScript, TypeScript, PHP, Go, and Java.
</Card>

***

## Next Steps

* **[Grounded Completions](/tutorials/completions-grounded)** — add retrieved context from your content library to improve response accuracy
* **[Tool Use](/tutorials/completions-tool-use)** — combine streaming with function calling for real-time tool-augmented responses
* **[Completions API reference](/api-guides/completions-v2)** — explore all available parameters including `tradition`, `model_family`, and `model`
