
# Part 1: Set Up the Pipeline

> Create a publisher, ingest content with metadata, and verify indexing — the foundation of an end-to-end RAG pipeline on Gloo AI.

This is **Part 1 of the Build an End-to-End RAG Pipeline series**. Across three parts, you'll connect Gloo AI's features into one continuous workflow: the publisher and content you set up here are the same ones you'll manage in Part 2 and harden in Part 3.

In this part you'll create a publisher, upload content with descriptive metadata, and verify that it's fully indexed and ready for retrieval.

## Pipeline at a Glance

<Steps>
  <Step title="Publisher setup (Studio)">
    Create the publisher that owns your content — covered below.
  </Step>

  <Step title="Ingest content with metadata">
    Upload files and enrich them — covered below, with a deep dive in [Upload Files to Data Engine](/tutorials/upload-files).
  </Step>

  <Step title="Verify indexing">
    Poll item status until your content is searchable — covered below.
  </Step>

  <Step title="Semantic search">
    Query your content — deep dive: [Building Custom Search](/tutorials/search).
  </Step>

  <Step title="Grounded completions with sources">
    Answer questions from your content with citations — deep dive: [Grounded Completions with RAG](/tutorials/completions-grounded).
  </Step>

  <Step title="Content lifecycle">
    Update, bulk-edit, and delete content — [Part 2](/tutorials/rag-pipeline-part-2).
  </Step>

  <Step title="Verification, errors & resilience">
    Error handling and retry patterns — [Part 3](/tutorials/rag-pipeline-part-3).
  </Step>
</Steps>

## Prerequisites

Before starting, ensure you have:

* A Gloo AI Studio account
* Your Client ID and Client Secret from the [API Credentials page](/studio/manage-api-credentials)
* **Authentication setup**: complete the [Authentication Tutorial](/tutorials/authentication) first

<Info>
  All API calls in this series use Bearer token authentication via the OAuth2 client credentials flow. The snippets below include a minimal token fetch; see the [Authentication Tutorial](/tutorials/authentication) for token caching and expiration handling.
</Info>
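The token caching mentioned above can be sketched as follows. This is a minimal illustration, not the Authentication Tutorial's implementation: `TokenCache` and `fetch_token` are hypothetical names, and the sketch assumes the token response carries the standard OAuth2 `expires_in` field (in seconds). The fetch function is injected so the caching logic can be exercised without a network call.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires.

    Hypothetical helper: `fetch_token` is any callable that returns the parsed
    JSON body of the token response. Assumes that body contains `access_token`
    and the standard OAuth2 `expires_in` (seconds) field.
    """

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        clock: Callable[[], float] = time.time,
        safety_margin_seconds: float = 60.0,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin_seconds
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when there is no token yet, or it is within the safety margin of expiry.
        if self._token is None or self._clock() >= self._expires_at - self._safety_margin:
            body = self._fetch_token()
            self._token = body["access_token"]
            # Assumption: fall back to one hour if `expires_in` is absent.
            self._expires_at = self._clock() + float(body.get("expires_in", 3600))
        return self._token
```

With something like this in place, `fetch_token` could wrap the token request shown in Step 2, and each API call would read its Bearer value from `cache.get()` instead of a one-off token.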

***

## Step 1: Create Your Publisher

Content in the Data Engine belongs to a **publisher**. Create one in Gloo AI Studio:

1. In [Gloo AI Studio](https://studio.ai.gloo.com), click your user account in the bottom-left corner and select **Manage Organizations**
2. Select the organization you want to add the publisher to, then click **View Publishers**
3. Click **Create Publisher** and give the new publisher a name
4. Copy the **Publisher ID** (a UUID) — every API call in this series uses it

See [Manage Publishers](/studio/manage-publishers) for the full Studio walkthrough.

<Tip>
  Use one dedicated publisher for this series. Parts 2 and 3 operate on the content you upload here, and a dedicated publisher keeps those operations cleanly separated from your production content.
</Tip>

## Step 2: Upload Content with a Producer ID

Upload a file with **POST** `/ingestion/v2/files`. The `producer_id` query parameter attaches your own stable identifier to the item, which is what makes the pipeline manageable later: re-running the upload detects a duplicate instead of creating a copy, and in Part 2 you'll update, bulk-edit, and delete these same items.

This series uses a short Markdown article as its sample content: grab [`building-stronger-communities.md` from the cookbook repository](https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/blob/main/rag-pipeline-part-1/sample_files/building-stronger-communities.md) and save it next to your script. (Any Markdown, text, PDF, or Word file of your own works too — just adjust the filename.) Then upload it:

<CodeGroup>
  ```python Python theme={null}
  import requests

  CLIENT_ID = "your_client_id"
  CLIENT_SECRET = "your_client_secret"
  PUBLISHER_ID = "your_publisher_id"
  PRODUCER_ID = "rag-pipeline-part1-building-stronger-communities"

  # Get an access token (see the Authentication tutorial)
  token = requests.post(
      "https://platform.ai.gloo.com/oauth2/token",
      data={"grant_type": "client_credentials", "scope": "api/access"},
      auth=(CLIENT_ID, CLIENT_SECRET),
  ).json()["access_token"]

  # Upload the file with a stable producer ID
  with open("building-stronger-communities.md", "rb") as f:
      response = requests.post(
          "https://platform.ai.gloo.com/ingestion/v2/files",
          headers={"Authorization": f"Bearer {token}"},
          params={"producer_id": PRODUCER_ID},
          files={"files": ("building-stronger-communities.md", f)},
          data={"publisher_id": PUBLISHER_ID},
      )
  result = response.json()

  # A fresh upload returns the new item ID in "ingesting";
  # re-uploading the same content returns it in "duplicates".
  item_id = (result["ingesting"] or result["duplicates"])[0]
  print(f"Item ID: {item_id}")
  ```

  ```javascript JavaScript theme={null}
  import { readFile } from "node:fs/promises";

  const CLIENT_ID = "your_client_id";
  const CLIENT_SECRET = "your_client_secret";
  const PUBLISHER_ID = "your_publisher_id";
  const PRODUCER_ID = "rag-pipeline-part1-building-stronger-communities";

  // Get an access token (see the Authentication tutorial)
  const tokenResponse = await fetch("https://platform.ai.gloo.com/oauth2/token", {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      Authorization: `Basic ${Buffer.from(`${CLIENT_ID}:${CLIENT_SECRET}`).toString("base64")}`,
    },
    body: new URLSearchParams({ grant_type: "client_credentials", scope: "api/access" }),
  });
  const token = (await tokenResponse.json()).access_token;

  // Upload the file with a stable producer ID
  const form = new FormData();
  form.append("publisher_id", PUBLISHER_ID);
  form.append(
    "files",
    new Blob([await readFile("building-stronger-communities.md")]),
    "building-stronger-communities.md"
  );

  const response = await fetch(
    `https://platform.ai.gloo.com/ingestion/v2/files?producer_id=${encodeURIComponent(PRODUCER_ID)}`,
    { method: "POST", headers: { Authorization: `Bearer ${token}` }, body: form }
  );
  const result = await response.json();

  // A fresh upload returns the new item ID in "ingesting";
  // re-uploading the same content returns it in "duplicates".
  const itemId = (result.ingesting.length ? result.ingesting : result.duplicates)[0];
  console.log(`Item ID: ${itemId}`);
  ```

  ```typescript TypeScript theme={null}
  import { readFileSync } from "node:fs";

  const CLIENT_ID = "your_client_id";
  const CLIENT_SECRET = "your_client_secret";
  const PUBLISHER_ID = "your_publisher_id";
  const PRODUCER_ID = "rag-pipeline-part1-building-stronger-communities";

  interface UploadResponse {
    ingesting: string[];
    duplicates: string[];
  }

  (async () => {
    // Get an access token (see the Authentication tutorial)
    const tokenResponse = await fetch("https://platform.ai.gloo.com/oauth2/token", {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        Authorization: `Basic ${Buffer.from(`${CLIENT_ID}:${CLIENT_SECRET}`).toString("base64")}`,
      },
      body: new URLSearchParams({ grant_type: "client_credentials", scope: "api/access" }),
    });
    const token = ((await tokenResponse.json()) as { access_token: string }).access_token;

    // Upload the file with a stable producer ID
    const form = new FormData();
    form.append("publisher_id", PUBLISHER_ID);
    form.append(
      "files",
      new Blob([readFileSync("building-stronger-communities.md")]),
      "building-stronger-communities.md"
    );

    const response = await fetch(
      `https://platform.ai.gloo.com/ingestion/v2/files?producer_id=${encodeURIComponent(PRODUCER_ID)}`,
      { method: "POST", headers: { Authorization: `Bearer ${token}` }, body: form }
    );
    const result = (await response.json()) as UploadResponse;

    // A fresh upload returns the new item ID in "ingesting";
    // re-uploading the same content returns it in "duplicates".
    const itemId = (result.ingesting.length ? result.ingesting : result.duplicates)[0];
    console.log(`Item ID: ${itemId}`);
  })();
  ```

  ```php PHP theme={null}
  <?php

  $clientId = 'your_client_id';
  $clientSecret = 'your_client_secret';
  $publisherId = 'your_publisher_id';
  $producerId = 'rag-pipeline-part1-building-stronger-communities';

  // Get an access token (see the Authentication tutorial)
  $ch = curl_init('https://platform.ai.gloo.com/oauth2/token');
  curl_setopt_array($ch, [
      CURLOPT_RETURNTRANSFER => true,
      CURLOPT_POST => true,
      CURLOPT_USERPWD => "$clientId:$clientSecret",
      CURLOPT_POSTFIELDS => http_build_query([
          'grant_type' => 'client_credentials',
          'scope' => 'api/access',
      ]),
  ]);
  $token = json_decode(curl_exec($ch), true)['access_token'];
  curl_close($ch);

  // Upload the file with a stable producer ID
  $url = 'https://platform.ai.gloo.com/ingestion/v2/files?producer_id=' . urlencode($producerId);
  $ch = curl_init($url);
  curl_setopt_array($ch, [
      CURLOPT_RETURNTRANSFER => true,
      CURLOPT_POST => true,
      CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"],
      CURLOPT_POSTFIELDS => [
          'publisher_id' => $publisherId,
          'files' => new CURLFile('building-stronger-communities.md', 'text/markdown'),
      ],
  ]);
  $result = json_decode(curl_exec($ch), true);
  curl_close($ch);

  // A fresh upload returns the new item ID in "ingesting";
  // re-uploading the same content returns it in "duplicates".
  $itemId = ($result['ingesting'] ?: $result['duplicates'])[0];
  echo "Item ID: $itemId\n";
  ```

  ```go Go theme={null}
  package main

  import (
  	"bytes"
  	"encoding/json"
  	"fmt"
  	"mime/multipart"
  	"net/http"
  	"net/url"
  	"os"
  	"strings"
  )

  const (
  	clientID     = "your_client_id"
  	clientSecret = "your_client_secret"
  	publisherID  = "your_publisher_id"
  	producerID   = "rag-pipeline-part1-building-stronger-communities"
  )

  func main() {
  	// Get an access token (see the Authentication tutorial)
  	form := url.Values{"grant_type": {"client_credentials"}, "scope": {"api/access"}}
  	req, _ := http.NewRequest("POST", "https://platform.ai.gloo.com/oauth2/token",
  		strings.NewReader(form.Encode()))
  	req.SetBasicAuth(clientID, clientSecret)
  	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
  	resp, err := http.DefaultClient.Do(req)
  	if err != nil {
  		panic(err)
  	}
  	defer resp.Body.Close()
  	var tokenData struct {
  		AccessToken string `json:"access_token"`
  	}
  	json.NewDecoder(resp.Body).Decode(&tokenData)

  	// Upload the file with a stable producer ID
  	fileBytes, _ := os.ReadFile("building-stronger-communities.md")
  	var buf bytes.Buffer
  	writer := multipart.NewWriter(&buf)
  	writer.WriteField("publisher_id", publisherID)
  	part, _ := writer.CreateFormFile("files", "building-stronger-communities.md")
  	part.Write(fileBytes)
  	writer.Close()

  	uploadURL := "https://platform.ai.gloo.com/ingestion/v2/files?producer_id=" +
  		url.QueryEscape(producerID)
  	req, _ = http.NewRequest("POST", uploadURL, &buf)
  	req.Header.Set("Authorization", "Bearer "+tokenData.AccessToken)
  	req.Header.Set("Content-Type", writer.FormDataContentType())
  	resp, err = http.DefaultClient.Do(req)
  	if err != nil {
  		panic(err)
  	}
  	defer resp.Body.Close()

  	// A fresh upload returns the new item ID in "ingesting";
  	// re-uploading the same content returns it in "duplicates".
  	var result struct {
  		Ingesting  []string `json:"ingesting"`
  		Duplicates []string `json:"duplicates"`
  	}
  	json.NewDecoder(resp.Body).Decode(&result)
  	itemID := append(result.Ingesting, result.Duplicates...)[0]
  	fmt.Println("Item ID:", itemID)
  }
  ```

  ```java Java theme={null}
  import com.google.gson.Gson;
  import com.google.gson.JsonObject;
  import java.net.URI;
  import java.net.URLEncoder;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.Base64;

  public class UploadContent {
      static final String CLIENT_ID = "your_client_id";
      static final String CLIENT_SECRET = "your_client_secret";
      static final String PUBLISHER_ID = "your_publisher_id";
      static final String PRODUCER_ID = "rag-pipeline-part1-building-stronger-communities";

      public static void main(String[] args) throws Exception {
          HttpClient http = HttpClient.newHttpClient();
          Gson gson = new Gson();

          // Get an access token (see the Authentication tutorial)
          String basicAuth = Base64.getEncoder()
              .encodeToString((CLIENT_ID + ":" + CLIENT_SECRET).getBytes(StandardCharsets.UTF_8));
          HttpRequest tokenRequest = HttpRequest.newBuilder()
              .uri(URI.create("https://platform.ai.gloo.com/oauth2/token"))
              .header("Content-Type", "application/x-www-form-urlencoded")
              .header("Authorization", "Basic " + basicAuth)
              .POST(HttpRequest.BodyPublishers.ofString(
                  "grant_type=client_credentials&scope=api/access"))
              .build();
          String token = gson
              .fromJson(http.send(tokenRequest, HttpResponse.BodyHandlers.ofString()).body(),
                  JsonObject.class)
              .get("access_token").getAsString();

          // Upload the file with a stable producer ID (manual multipart body)
          String boundary = "----GlooBoundary";
          var body = new java.io.ByteArrayOutputStream();
          body.write(("--" + boundary + "\r\n"
              + "Content-Disposition: form-data; name=\"publisher_id\"\r\n\r\n"
              + PUBLISHER_ID + "\r\n").getBytes(StandardCharsets.UTF_8));
          body.write(("--" + boundary + "\r\n"
              + "Content-Disposition: form-data; name=\"files\"; "
              + "filename=\"building-stronger-communities.md\"\r\n"
              + "Content-Type: text/markdown\r\n\r\n").getBytes(StandardCharsets.UTF_8));
          body.write(Files.readAllBytes(Path.of("building-stronger-communities.md")));
          body.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));

          HttpRequest uploadRequest = HttpRequest.newBuilder()
              .uri(URI.create("https://platform.ai.gloo.com/ingestion/v2/files?producer_id="
                  + URLEncoder.encode(PRODUCER_ID, StandardCharsets.UTF_8)))
              .header("Authorization", "Bearer " + token)
              .header("Content-Type", "multipart/form-data; boundary=" + boundary)
              .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
              .build();
          JsonObject result = gson
              .fromJson(http.send(uploadRequest, HttpResponse.BodyHandlers.ofString()).body(),
                  JsonObject.class);

          // A fresh upload returns the new item ID in "ingesting";
          // re-uploading the same content returns it in "duplicates".
          var ingesting = result.getAsJsonArray("ingesting");
          String itemId = (ingesting.size() > 0
              ? ingesting
              : result.getAsJsonArray("duplicates")).get(0).getAsString();
          System.out.println("Item ID: " + itemId);
      }
  }
  ```
</CodeGroup>

### What You'll See

```
Item ID: 822308cc-72fc-478d-a2eb-fbdf01a6a15d
```

Keep this item ID — the next two steps use it, and Parts 2 and 3 operate on this same item.
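The one-line ID lookup in the snippets assumes the response always has a non-empty `ingesting` or `duplicates` list. A slightly more defensive sketch is shown below; `extract_item_id` is a hypothetical helper, and the assumption that a failed upload may return a body without either list is mine rather than something this page states.

```python
from typing import Any, Mapping, Tuple


def extract_item_id(result: Mapping[str, Any]) -> Tuple[str, bool]:
    """Return (item_id, is_duplicate) from an upload response body.

    Relies only on the two fields described above: `ingesting` for fresh
    uploads and `duplicates` for re-uploads of the same content.
    """
    ingesting = result.get("ingesting") or []
    duplicates = result.get("duplicates") or []
    if ingesting:
        return ingesting[0], False
    if duplicates:
        return duplicates[0], True
    # Neither list has an entry: surface the raw body instead of an IndexError.
    raise ValueError(f"Upload response contained no item ID: {dict(result)!r}")
```

Calling `item_id, is_duplicate = extract_item_id(response.json())` also tells you whether the run created a new item or matched an existing one.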

<Tip>
  For multi-file uploads, batch processing, and supported file types, see the [Upload Files to Data Engine](/tutorials/upload-files) deep dive.
</Tip>

## Step 3: Enrich with Metadata

Attach descriptive metadata with **PATCH** `/engine/v2/item`. Good metadata pays off downstream: titles and authors appear in search results and source citations, and tags let you organize and bulk-manage content in Part 2.

<CodeGroup>
  ```python Python theme={null}
  metadata = {
      "publisher_id": PUBLISHER_ID,
      "item_id": item_id,
      "item_title": "Building Stronger Communities Through Service",
      "item_summary": "Practical guidance for starting and sustaining community service efforts.",
      "author": ["Gloo AI Docs Team"],
      "item_tags": ["community", "service", "rag-pipeline-series"],
  }
  response = requests.patch(
      "https://platform.ai.gloo.com/engine/v2/item",
      headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
      json=metadata,
  )
  response.raise_for_status()
  print("Metadata set")
  ```

  ```javascript JavaScript theme={null}
  const metadata = {
    publisher_id: PUBLISHER_ID,
    item_id: itemId,
    item_title: "Building Stronger Communities Through Service",
    item_summary: "Practical guidance for starting and sustaining community service efforts.",
    author: ["Gloo AI Docs Team"],
    item_tags: ["community", "service", "rag-pipeline-series"],
  };
  const metadataResponse = await fetch("https://platform.ai.gloo.com/engine/v2/item", {
    method: "PATCH",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(metadata),
  });
  if (!metadataResponse.ok) throw new Error(`HTTP ${metadataResponse.status}`);
  console.log("Metadata set");
  ```

  ```typescript TypeScript theme={null}
  const metadata = {
    publisher_id: PUBLISHER_ID,
    item_id: itemId,
    item_title: "Building Stronger Communities Through Service",
    item_summary: "Practical guidance for starting and sustaining community service efforts.",
    author: ["Gloo AI Docs Team"],
    item_tags: ["community", "service", "rag-pipeline-series"],
  };
  const metadataResponse = await fetch("https://platform.ai.gloo.com/engine/v2/item", {
    method: "PATCH",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(metadata),
  });
  if (!metadataResponse.ok) throw new Error(`HTTP ${metadataResponse.status}`);
  console.log("Metadata set");
  ```

  ```php PHP theme={null}
  $metadata = [
      'publisher_id' => $publisherId,
      'item_id' => $itemId,
      'item_title' => 'Building Stronger Communities Through Service',
      'item_summary' => 'Practical guidance for starting and sustaining community service efforts.',
      'author' => ['Gloo AI Docs Team'],
      'item_tags' => ['community', 'service', 'rag-pipeline-series'],
  ];
  $ch = curl_init('https://platform.ai.gloo.com/engine/v2/item');
  curl_setopt_array($ch, [
      CURLOPT_RETURNTRANSFER => true,
      CURLOPT_CUSTOMREQUEST => 'PATCH',
      CURLOPT_HTTPHEADER => [
          "Authorization: Bearer $token",
          'Content-Type: application/json',
      ],
      CURLOPT_POSTFIELDS => json_encode($metadata),
  ]);
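  curl_exec($ch);
  $statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  curl_close($ch);
  // Fail loudly on a non-2xx response instead of reporting success
  if ($statusCode < 200 || $statusCode >= 300) {
      throw new RuntimeException("HTTP $statusCode");
  }
  echo "Metadata set\n";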
  ```

  ```go Go theme={null}
  metadata := map[string]any{
  	"publisher_id": publisherID,
  	"item_id":      itemID,
  	"item_title":   "Building Stronger Communities Through Service",
  	"item_summary": "Practical guidance for starting and sustaining community service efforts.",
  	"author":       []string{"Gloo AI Docs Team"},
  	"item_tags":    []string{"community", "service", "rag-pipeline-series"},
  }
  payload, _ := json.Marshal(metadata)
  req, _ = http.NewRequest("PATCH", "https://platform.ai.gloo.com/engine/v2/item",
  	bytes.NewReader(payload))
  req.Header.Set("Authorization", "Bearer "+tokenData.AccessToken)
  req.Header.Set("Content-Type", "application/json")
  resp, err = http.DefaultClient.Do(req)
  if err != nil {
  	panic(err)
  }
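  resp.Body.Close()
  // Fail loudly on a non-2xx response instead of reporting success
  if resp.StatusCode < 200 || resp.StatusCode >= 300 {
  	panic(fmt.Sprintf("HTTP %d", resp.StatusCode))
  }
  fmt.Println("Metadata set")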
  ```

  ```java Java theme={null}
  String metadataJson = """
      {
        "publisher_id": "%s",
        "item_id": "%s",
        "item_title": "Building Stronger Communities Through Service",
        "item_summary": "Practical guidance for starting and sustaining community service efforts.",
        "author": ["Gloo AI Docs Team"],
        "item_tags": ["community", "service", "rag-pipeline-series"]
      }
      """.formatted(PUBLISHER_ID, itemId);
  HttpRequest metadataRequest = HttpRequest.newBuilder()
      .uri(URI.create("https://platform.ai.gloo.com/engine/v2/item"))
      .header("Authorization", "Bearer " + token)
      .header("Content-Type", "application/json")
      .method("PATCH", HttpRequest.BodyPublishers.ofString(metadataJson))
      .build();
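  HttpResponse<String> metadataResponse =
      http.send(metadataRequest, HttpResponse.BodyHandlers.ofString());
  // Fail loudly on a non-2xx response instead of reporting success
  if (metadataResponse.statusCode() / 100 != 2) {
      throw new IllegalStateException("HTTP " + metadataResponse.statusCode());
  }
  System.out.println("Metadata set");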
  ```
</CodeGroup>

<Note>
  Metadata can be set as soon as the item ID exists — you don't need to wait for ingestion to finish. You can also target a single item by `producer_id` instead of `item_id`.
</Note>
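The choice between `item_id` and `producer_id` can be made explicit when building the request body. The sketch below is illustrative only: `build_metadata_payload` is a hypothetical helper, and it assumes the body key for targeting by producer ID is literally `producer_id`, which you should confirm against the API reference.

```python
from typing import Any, Optional


def build_metadata_payload(
    publisher_id: str,
    *,
    item_id: Optional[str] = None,
    producer_id: Optional[str] = None,
    **fields: Any,
) -> dict:
    """Build a PATCH /engine/v2/item body that targets exactly one item.

    Assumption: when targeting by producer ID, the body key is `producer_id`.
    Fields whose value is None are left out of the payload.
    """
    if (item_id is None) == (producer_id is None):
        raise ValueError("Provide exactly one of item_id or producer_id")
    payload: dict = {"publisher_id": publisher_id}
    if item_id is not None:
        payload["item_id"] = item_id
    else:
        payload["producer_id"] = producer_id
    payload.update({key: value for key, value in fields.items() if value is not None})
    return payload
```

The result can be passed as `json=` to the `requests.patch` call shown above, for example `build_metadata_payload(PUBLISHER_ID, producer_id=PRODUCER_ID, item_tags=["community"])`.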

## Step 4: Verify Indexing

Ingestion is **asynchronous**: the upload response means your file is queued, not yet searchable. Poll **GET** `/engine/v2/items/{item_id}` until `status` reaches `COMPLETED`, which typically takes a few minutes for a small file. While processing, you'll see intermediate states such as `QUEUED` and `CHUNKING`.

<CodeGroup>
  ```python Python theme={null}
  import time

  POLL_INTERVAL_SECONDS = 15
  POLL_TIMEOUT_SECONDS = 600

  deadline = time.time() + POLL_TIMEOUT_SECONDS
  while time.time() < deadline:
      item = requests.get(
          f"https://platform.ai.gloo.com/engine/v2/items/{item_id}",
          headers={"Authorization": f"Bearer {token}"},
      ).json()
      status = item.get("status", "unknown")
      print(f"Status: {status}")

      if status.upper() == "COMPLETED":
          print(f"Indexed: {item['item_title']} (tags: {', '.join(item['item_tags'])})")
          break
      if status.upper() in ("FAILED", "ERROR"):
          raise RuntimeError(f"Ingestion failed with status: {status}")

      time.sleep(POLL_INTERVAL_SECONDS)
  else:
      raise TimeoutError(f"Not indexed within {POLL_TIMEOUT_SECONDS}s")
  ```

  ```javascript JavaScript theme={null}
  const POLL_INTERVAL_MS = 15_000;
  const POLL_TIMEOUT_MS = 600_000;
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  const deadline = Date.now() + POLL_TIMEOUT_MS;
  let indexed = false;
  while (Date.now() < deadline) {
    const itemResponse = await fetch(
      `https://platform.ai.gloo.com/engine/v2/items/${itemId}`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    const item = await itemResponse.json();
    const status = item.status ?? "unknown";
    console.log(`Status: ${status}`);

    if (status.toUpperCase() === "COMPLETED") {
      console.log(`Indexed: ${item.item_title} (tags: ${item.item_tags.join(", ")})`);
      indexed = true;
      break;
    }
    if (["FAILED", "ERROR"].includes(status.toUpperCase())) {
      throw new Error(`Ingestion failed with status: ${status}`);
    }

    await sleep(POLL_INTERVAL_MS);
  }
  if (!indexed) throw new Error(`Not indexed within ${POLL_TIMEOUT_MS / 1000}s`);
  ```

  ```typescript TypeScript theme={null}
  const POLL_INTERVAL_MS = 15_000;
  const POLL_TIMEOUT_MS = 600_000;
  const sleep = (ms: number): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, ms));

  interface ItemMetadata {
    status: string;
    item_title: string | null;
    item_tags: string[];
  }

  const deadline = Date.now() + POLL_TIMEOUT_MS;
  let indexed = false;
  while (Date.now() < deadline) {
    const itemResponse = await fetch(
      `https://platform.ai.gloo.com/engine/v2/items/${itemId}`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    const item = (await itemResponse.json()) as ItemMetadata;
    const status = item.status ?? "unknown";
    console.log(`Status: ${status}`);

    if (status.toUpperCase() === "COMPLETED") {
      console.log(`Indexed: ${item.item_title} (tags: ${item.item_tags.join(", ")})`);
      indexed = true;
      break;
    }
    if (["FAILED", "ERROR"].includes(status.toUpperCase())) {
      throw new Error(`Ingestion failed with status: ${status}`);
    }

    await sleep(POLL_INTERVAL_MS);
  }
  if (!indexed) throw new Error(`Not indexed within ${POLL_TIMEOUT_MS / 1000}s`);
  ```

  ```php PHP theme={null}
  const POLL_INTERVAL_SECONDS = 15;
  const POLL_TIMEOUT_SECONDS = 600;

  $deadline = time() + POLL_TIMEOUT_SECONDS;
  $indexed = false;
  while (time() < $deadline) {
      $ch = curl_init("https://platform.ai.gloo.com/engine/v2/items/$itemId");
      curl_setopt_array($ch, [
          CURLOPT_RETURNTRANSFER => true,
          CURLOPT_HTTPHEADER => ["Authorization: Bearer $token"],
      ]);
      $item = json_decode(curl_exec($ch), true);
      curl_close($ch);

      $status = $item['status'] ?? 'unknown';
      echo "Status: $status\n";

      if (strtoupper($status) === 'COMPLETED') {
          echo "Indexed: {$item['item_title']} (tags: " . implode(', ', $item['item_tags']) . ")\n";
          $indexed = true;
          break;
      }
      if (in_array(strtoupper($status), ['FAILED', 'ERROR'], true)) {
          throw new RuntimeException("Ingestion failed with status: $status");
      }

      sleep(POLL_INTERVAL_SECONDS);
  }
  if (!$indexed) {
      throw new RuntimeException('Not indexed within ' . POLL_TIMEOUT_SECONDS . 's');
  }
  ```

  ```go Go theme={null}
  // Continues inside main() from Step 2; add "time" to that import block.
  const (
  	pollInterval = 15 * time.Second
  	pollTimeout  = 600 * time.Second
  )

  deadline := time.Now().Add(pollTimeout)
  indexed := false
  for time.Now().Before(deadline) {
  	req, _ := http.NewRequest("GET",
  		"https://platform.ai.gloo.com/engine/v2/items/"+itemID, nil)
  	req.Header.Set("Authorization", "Bearer "+tokenData.AccessToken)
  	resp, err := http.DefaultClient.Do(req)
  	if err != nil {
  		panic(err)
  	}
  	var item struct {
  		Status    string   `json:"status"`
  		ItemTitle string   `json:"item_title"`
  		ItemTags  []string `json:"item_tags"`
  	}
  	json.NewDecoder(resp.Body).Decode(&item)
  	resp.Body.Close()
  	fmt.Println("Status:", item.Status)

  	if strings.EqualFold(item.Status, "COMPLETED") {
  		fmt.Printf("Indexed: %s (tags: %s)\n", item.ItemTitle, strings.Join(item.ItemTags, ", "))
  		indexed = true
  		break
  	}
  	if strings.EqualFold(item.Status, "FAILED") || strings.EqualFold(item.Status, "ERROR") {
  		panic("Ingestion failed with status: " + item.Status)
  	}

  	time.Sleep(pollInterval)
  }
  if !indexed {
  	panic("Not indexed within timeout")
  }
  ```

  ```java Java theme={null}
  // Continues inside main(); needs `import java.time.Duration;` and `import java.time.Instant;`
  Duration pollInterval = Duration.ofSeconds(15);
  Duration pollTimeout = Duration.ofSeconds(600);

  Instant deadline = Instant.now().plus(pollTimeout);
  boolean indexed = false;
  while (Instant.now().isBefore(deadline)) {
      HttpRequest statusRequest = HttpRequest.newBuilder()
          .uri(URI.create("https://platform.ai.gloo.com/engine/v2/items/" + itemId))
          .header("Authorization", "Bearer " + token)
          .GET()
          .build();
      JsonObject item = gson
          .fromJson(http.send(statusRequest, HttpResponse.BodyHandlers.ofString()).body(),
              JsonObject.class);
      String status = item.has("status") ? item.get("status").getAsString() : "unknown";
      System.out.println("Status: " + status);

      if (status.equalsIgnoreCase("COMPLETED")) {
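          // Collect the tags so the output matches the other languages
          java.util.List<String> tags = new java.util.ArrayList<>();
          item.getAsJsonArray("item_tags").forEach(tag -> tags.add(tag.getAsString()));
          System.out.println("Indexed: " + item.get("item_title").getAsString()
              + " (tags: " + String.join(", ", tags) + ")");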
          indexed = true;
          break;
      }
      if (status.equalsIgnoreCase("FAILED") || status.equalsIgnoreCase("ERROR")) {
          throw new IllegalStateException("Ingestion failed with status: " + status);
      }

      Thread.sleep(pollInterval.toMillis());
  }
  if (!indexed) {
      throw new IllegalStateException("Not indexed within timeout");
  }
  ```
</CodeGroup>

### What You'll See

For a fresh upload (about 6 minutes for the sample file):

```
Status: QUEUED
Status: CHUNKING
...
Status: COMPLETED
Indexed: Building Stronger Communities Through Service (tags: community, service, rag-pipeline-series)
```

If you re-run against already-indexed content, the first poll returns `COMPLETED` immediately. The metadata you set in Step 3 comes back on the same response, confirming that the title, author, and tags are attached to the indexed item.
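If you want to exercise the polling logic without hitting the API, one option is to separate the loop from the HTTP call. The sketch below takes that approach under stated assumptions: `wait_until_indexed` and `fetch_item` are hypothetical names, and the failure statuses are the same `FAILED` and `ERROR` values the snippets above check for.

```python
import time
from typing import Callable

TERMINAL_FAILURES = {"FAILED", "ERROR"}


def wait_until_indexed(
    fetch_item: Callable[[], dict],
    *,
    interval_seconds: float = 15.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll `fetch_item` until its `status` is COMPLETED, then return the item."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        item = fetch_item()
        status = str(item.get("status", "unknown")).upper()
        if status == "COMPLETED":
            return item
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Ingestion failed with status: {status}")
        sleep(interval_seconds)
    raise TimeoutError(f"Not indexed within {timeout_seconds}s")
```

In real use, `fetch_item` would be a small function that performs the GET request from the Python snippet and returns its parsed JSON; in a test, it can return canned status dictionaries.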

## Run the Complete Example

The cookbook contains the full pipeline — upload, metadata, and polling with token caching and error handling — as one runnable program in all six languages. From the [cookbook repository](https://github.com/GlooDeveloper/gloo-ai-docs-cookbook), install dependencies, copy `.env.example` to `.env` and add your credentials, then run it:

<CodeGroup>
  ```bash Python theme={null}
  cd rag-pipeline-part-1/python
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  python main.py
  ```

  ```bash JavaScript theme={null}
  cd rag-pipeline-part-1/javascript
  npm install
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  npm start
  ```

  ```bash TypeScript theme={null}
  cd rag-pipeline-part-1/typescript
  npm install
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  npm start
  ```

  ```bash PHP theme={null}
  cd rag-pipeline-part-1/php
  composer install
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  php index.php
  ```

  ```bash Go theme={null}
  cd rag-pipeline-part-1/go
  go mod tidy
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  go run main.go
  ```

  ```bash Java theme={null}
  cd rag-pipeline-part-1/java
  mvn compile
  cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
  mvn -q exec:java
  ```
</CodeGroup>

You'll see:

```
Step 1: Uploading sample content...
  Queued for ingestion: 822308cc-72fc-478d-a2eb-fbdf01a6a15d

Step 2: Setting item metadata...
  Metadata set: title, summary, author, 3 tags

Step 3: Verifying indexing (polling)...
  Status: QUEUED
  Status: COMPLETED

Pipeline content is indexed and ready.
  Item ID:  822308cc-72fc-478d-a2eb-fbdf01a6a15d
  Title:    Building Stronger Communities Through Service
  Author:   Gloo AI Docs Team
  Tags:     community, service, rag-pipeline-series
  Status:   COMPLETED
```

## Working Code Sample

<Card title="View Complete Code" icon="github" href="https://github.com/GlooDeveloper/gloo-ai-docs-cookbook/tree/main/rag-pipeline-part-1">
  Clone or browse the complete working examples for all 6 languages (JavaScript, TypeScript, Python, PHP, Go, Java) with setup instructions and the sample content file.
</Card>

<Tip>
  The code snippets above are **simplified and self-contained** — designed for readability and easy copy-paste. The cookbook examples add token caching, duplicate handling, and structured error handling. Both implement the same APIs and patterns.
</Tip>

## Troubleshooting

### Error: 401 Unauthorized

Your access token is missing, expired, or malformed. Tokens expire after one hour — re-run the token request. See the [Authentication Tutorial](/tutorials/authentication).

### Error: 400 publisher\_not\_found

The `publisher_id` doesn't exist or isn't accessible with your credentials. Copy the Publisher ID (a UUID) from **Studio > Data Engine > Publishers**.

### Error: 400 too\_many\_files

`producer_id` applies to a single file. When uploading multiple files in one request, omit it.
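If you want producer IDs for several files, one approach is to send one request per file, each with its own ID. The sketch below only plans those uploads; `plan_uploads` is a hypothetical helper, and the slug scheme (lowercase letters, digits, and hyphens) is an assumption, since this page does not state which characters a producer ID may contain.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_uploads(paths: Iterable[str], prefix: str) -> List[Tuple[str, str]]:
    """Pair each file path with a stable producer ID for one-request-per-file uploads."""
    plan: List[Tuple[str, str]] = []
    seen = set()
    for path in paths:
        # Derive the ID from the filename so re-runs produce the same value.
        slug = re.sub(r"[^a-z0-9]+", "-", Path(path).stem.lower()).strip("-")
        producer_id = f"{prefix}-{slug}"
        if producer_id in seen:
            raise ValueError(f"Duplicate producer ID: {producer_id}")
        seen.add(producer_id)
        plan.append((path, producer_id))
    return plan
```

Each `(path, producer_id)` pair can then drive a separate call to the single-file upload shown in Step 2.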

### Polling times out

Larger files take longer to process. Increase the timeout, or check **Studio > Ingestion Analytics** to see whether the publisher is still processing.

## Next Steps

Your pipeline now has indexed, metadata-rich content. Wire it into retrieval:

1. **[Building Custom Search](/tutorials/search)** — query this content semantically with the Search API
2. **[Grounded Completions with RAG](/tutorials/completions-grounded)** — answer questions from this content with source citations
3. **[Part 2: Content Lifecycle](/tutorials/rag-pipeline-part-2)** — update, bulk-edit, and delete the items you created here
4. **[Part 3: Verification, Error Handling & Resilience](/tutorials/rag-pipeline-part-3)** — production-grade retry and verification patterns
