This is Part 2 of the Build an End-to-End RAG Pipeline series. In Part 1 you ingested content and verified it was indexed. Real content doesn’t stay still: titles get corrected, tags get reapplied in bulk, and outdated material gets removed. This part covers those lifecycle operations and shows how to run them safely against a publisher that may also hold content you don’t want to touch. The key idea is scoping: every edit and delete here targets the exact items this recipe created, identified by the item IDs you captured when you uploaded them, and never the whole publisher.

Pipeline at a Glance

  1. Publisher setup (Studio): Create the publisher that owns your content. Covered in Part 1.
  2. Ingest content with metadata: Upload files and enrich them. Covered in Part 1.
  3. Verify indexing: Poll item status until your content is searchable. Covered in Part 1.
  4. Semantic search: Query your content. Deep dive: Building Custom Search.
  5. Grounded completions with sources: Answer questions from your content with citations. Deep dive: Grounded Completions with RAG.
  6. Content lifecycle: Update, bulk-edit, and delete content. Covered below.
  7. Verification, errors & resilience: Error handling and retry patterns. Covered in Part 3.

Prerequisites

Before starting, ensure you have:
  • A Gloo AI Studio account
  • Your Client ID and Client Secret from the API Credentials page
  • A publisher with credentials — see Part 1 to create one
  • Familiarity with uploading and indexing content from Part 1
This recipe seeds its own sample content, so it runs standalone — you don’t need to have completed Part 1 first. It reuses Part 1’s upload and indexing patterns for that seeding step.

How Scoping Keeps This Safe

A publisher can hold content from many sources. Bulk edits and deletes are powerful, so this recipe never operates publisher-wide. Instead it follows one rule:
Keep the item ID the API returns when you upload each item, and scope every bulk edit and delete to that exact list of item IDs.
The upload response gives you each item’s ID directly (Part 1, Step 2). Those IDs are authoritative handles for content you own, so building your bulk operations from them guarantees you only ever touch your own items.
There’s also a lookup endpoint, POST /engine/v2/publisher/{publisher_id}/items/by-producer, that resolves your producer IDs back to item IDs — handy when a separate process only has the producer IDs it assigned. Be aware it’s cached (~5 minutes) and isn’t invalidated when items change, so right after an upload or delete it can return stale or already-deleted item IDs. Treat the upload response (or a GET on the item) as the source of truth; don’t rely on by-producer immediately after a write.
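One way to guard against stale lookup results is to confirm every candidate ID with a GET before using it. The sketch below is illustrative only: `split_live_and_gone` and the `get_status` callable are hypothetical names, not part of the Gloo API. It assumes `get_status` returns the HTTP status code of GET /engine/v2/items/{item_id} for the given ID.

```python
from typing import Callable, Iterable, List, Tuple


def split_live_and_gone(
    item_ids: Iterable[str],
    get_status: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Partition candidate IDs into those that exist and those that return 404.

    get_status is a hypothetical callable supplied by the caller; it should
    return the HTTP status code of a GET on the item.
    """
    live: List[str] = []
    gone: List[str] = []
    for item_id in item_ids:
        status = get_status(item_id)
        if status == 404:
            gone.append(item_id)  # stale or already-deleted ID
        elif 200 <= status < 300:
            live.append(item_id)  # confirmed by the authoritative read
        else:
            # Anything else (auth failure, server error) is not evidence either way.
            raise RuntimeError(f"Unexpected status {status} for item {item_id}")
    return live, gone
```

In practice `get_status` could wrap `requests.get(...).status_code` against the item endpoint with your bearer token. Only the `live` list would then be passed on to a bulk edit or delete.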

Step 1: Set Up Sample Content

This recipe manages three short articles, each uploaded under a stable producer ID. The upload response returns each item’s ID — capture it; that’s the handle every later step uses.
Producer ID                                  Initial title
rag-pipeline-part2-volunteer-onboarding      Onboarding New Volunteers
rag-pipeline-part2-measuring-impact          Measuring Community Impact
rag-pipeline-part2-sustaining-engagement     Sustaining Long-Term Engagement
The files live in sample_files/. Upload each one, set its initial metadata, and keep the returned item_id in a map keyed by producer ID:
import requests

CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
PUBLISHER_ID = "your_publisher_id"

SEED_ITEMS = [
    {"file": "volunteer-onboarding.md",
     "producer_id": "rag-pipeline-part2-volunteer-onboarding",
     "item_title": "Onboarding New Volunteers",
     "item_tags": ["volunteers", "rag-pipeline-series"]},
    {"file": "measuring-community-impact.md",
     "producer_id": "rag-pipeline-part2-measuring-impact",
     "item_title": "Measuring Community Impact",
     "item_tags": ["measurement", "rag-pipeline-series"]},
    {"file": "sustaining-engagement.md",
     "producer_id": "rag-pipeline-part2-sustaining-engagement",
     "item_title": "Sustaining Long-Term Engagement",
     "item_tags": ["engagement", "rag-pipeline-series"]},
]

# Get an access token (see the Authentication tutorial)
token_response = requests.post(
    "https://platform.ai.gloo.com/oauth2/token",
    data={"grant_type": "client_credentials", "scope": "api/access"},
    auth=(CLIENT_ID, CLIENT_SECRET),
)
token_response.raise_for_status()  # stop here if the credentials are rejected
token = token_response.json()["access_token"]

# Upload each file and KEEP the returned item_id — your authoritative handle
mapping = {}
for item in SEED_ITEMS:
    with open(f"sample_files/{item['file']}", "rb") as f:
        upload = requests.post(
            "https://platform.ai.gloo.com/ingestion/v2/files",
            headers={"Authorization": f"Bearer {token}"},
            params={"producer_id": item["producer_id"]},
            files={"files": (item["file"], f)},
            data={"publisher_id": PUBLISHER_ID},
        )
    upload.raise_for_status()  # fail fast instead of parsing an error body
    result = upload.json()
    # Take the first ID under "ingesting", falling back to "duplicates" when that list is empty
    item_id = (result["ingesting"] or result["duplicates"])[0]
    mapping[item["producer_id"]] = item_id
    # Set initial metadata (see Part 1, Step 3)
    requests.patch(
        "https://platform.ai.gloo.com/engine/v2/item",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        json={"publisher_id": PUBLISHER_ID, "item_id": item_id,
              "item_title": item["item_title"], "item_tags": item["item_tags"]},
    ).raise_for_status()
    print(f"Uploaded {item['producer_id']} -> {item_id}")

item_ids = list(mapping.values())
# Wait for every item to finish indexing before editing (see Part 1, Step 4).

What You’ll See

Uploaded rag-pipeline-part2-volunteer-onboarding -> 0bfc3629-137d-4797-aac6-a2836ca39930
Uploaded rag-pipeline-part2-measuring-impact -> 8e9f12bc-4ef2-4e77-b1aa-293935a67c9a
Uploaded rag-pipeline-part2-sustaining-engagement -> ca849be9-cf4c-4423-9369-3793736cd75a
Ingestion is asynchronous — wait until every item reaches COMPLETED (the polling loop from Part 1, Step 4) before editing it. The complete cookbook program includes this wait.
The following steps reuse token, PUBLISHER_ID, mapping, and item_ids from here.
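If the edits or the cleanup run in a separate process from the upload, the captured IDs need to outlive the script that created them. A minimal sketch, assuming a local JSON file is an acceptable store; the helper names and the file path are hypothetical and not part of the cookbook:

```python
import json
from pathlib import Path
from typing import Dict, List


def save_mapping(mapping: Dict[str, str], path: str) -> None:
    """Write the producer_id -> item_id map to disk as JSON."""
    Path(path).write_text(json.dumps(mapping, indent=2), encoding="utf-8")


def load_item_ids(path: str) -> List[str]:
    """Read the saved map back and return only the item IDs, in saved order."""
    mapping = json.loads(Path(path).read_text(encoding="utf-8"))
    return list(mapping.values())
```

Calling `save_mapping(mapping, "seed_items.json")` right after the upload loop, and `load_item_ids("seed_items.json")` in the later process, keeps bulk operations scoped to the IDs the upload returned without going through the cached by-producer lookup.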

Step 2: Update a Single Item

Correct one item’s metadata with PATCH /engine/v2/item. Identify the target by item_id (or by producer_id — either works for a single item). Only the fields you send change; everything else is left as-is.
target_id = mapping["rag-pipeline-part2-volunteer-onboarding"]
response = requests.patch(
    "https://platform.ai.gloo.com/engine/v2/item",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "publisher_id": PUBLISHER_ID,
        "item_id": target_id,
        "item_title": "Onboarding New Volunteers: A First-Day Playbook",
        "item_summary": "A practical first-day checklist for welcoming and retaining new volunteers.",
    },
)
response.raise_for_status()
print("Single item updated")
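Because only the fields you send change, it can help to build the request body from just the fields you intend to update. The helper below is a hypothetical sketch, not part of the API: it enforces the identifier rule described in Troubleshooting (publisher_id plus either item_id or producer_id) and drops any field left as None, so it never reaches the request.

```python
from typing import Any, Dict, Optional


def build_item_patch(
    publisher_id: str,
    item_id: Optional[str] = None,
    producer_id: Optional[str] = None,
    **fields: Any,
) -> Dict[str, Any]:
    """Build a single-item PATCH body containing only the fields to change."""
    if not publisher_id:
        raise ValueError("publisher_id is required")
    if not (item_id or producer_id):
        raise ValueError("either item_id or producer_id is required")

    body: Dict[str, Any] = {"publisher_id": publisher_id}
    # Prefer item_id when both identifiers are given (a choice made for this sketch).
    if item_id:
        body["item_id"] = item_id
    else:
        body["producer_id"] = producer_id

    # Omit unset fields entirely so they are not part of the update.
    body.update({name: value for name, value in fields.items() if value is not None})
    return body
```

For example, `build_item_patch(PUBLISHER_ID, item_id=target_id, item_title="New title")` yields a body with three keys, which can be passed as `json=` to the PATCH call above.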

Step 3: Bulk-Edit Multiple Items

Apply changes to several items at once with PATCH /engine/v2/items?publisher_id=.... The request has two parts: a filter selecting which items to touch, and a list of ops describing the changes. Here the filter is the exact item_ids you captured in Step 1 — that’s what keeps the operation scoped to your content. Each op has an op (append, replace, or remove), a field, and a value. Below, every item gets a reviewed-q2-2026 tag appended and its author replaced.
response = requests.patch(
    "https://platform.ai.gloo.com/engine/v2/items",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    params={"publisher_id": PUBLISHER_ID},
    json={
        "filter": {"item_ids": item_ids},
        "ops": [
            {"op": "append", "field": "item_tags", "value": ["reviewed-q2-2026"]},
            {"op": "replace", "field": "author", "value": ["Community Programs Team"]},
        ],
    },
)
response.raise_for_status()
result = response.json()
print(f"Matched {result['total_matched']}, patched {result['total_patched']}, "
      f"failed {result['total_failed']}")

What You’ll See

Matched 3, patched 3, failed 0
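Printing the counts is fine for a walkthrough, but a script may want to stop when they disagree with what it asked for. The following is a rough sketch that assumes the response carries the three counters printed above; `check_bulk_result` is a hypothetical helper name.

```python
from typing import Any, Dict, Sequence


def check_bulk_result(result: Dict[str, Any], expected_ids: Sequence[str]) -> None:
    """Raise if the bulk patch matched a different number of items or reported failures."""
    expected = len(set(expected_ids))  # ignore accidental duplicate IDs
    problems = []
    if result["total_matched"] != expected:
        problems.append(
            f"matched {result['total_matched']} items, expected {expected}"
        )
    if result["total_failed"]:
        problems.append(f"{result['total_failed']} items failed to patch")
    if problems:
        raise RuntimeError("Bulk patch problem: " + "; ".join(problems))
```

Calling `check_bulk_result(result, item_ids)` right after `response.json()` turns a short match count (for instance, when a previous run already deleted some items) into an explicit error instead of a line of output that is easy to miss.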

Step 4: Verify Your Changes

Re-fetch each item with GET /engine/v2/items/{item_id} to confirm the edits.
Edits succeed immediately, but the read path is eventually consistent: a freshly patched item can take a few seconds to reflect the change on a subsequent GET. Verify by re-fetching until the change appears, rather than reading once.
import time

def get_item(item_id):
    r = requests.get(
        f"https://platform.ai.gloo.com/engine/v2/items/{item_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    r.raise_for_status()
    return r.json()

for item_id in item_ids:
    item = get_item(item_id)
    for _ in range(20):  # read-after-write retry
        if "reviewed-q2-2026" in (item.get("item_tags") or []):
            break
        time.sleep(3)
        item = get_item(item_id)
    print(item["item_title"])
    print(f"  author: {', '.join(item.get('author') or [])}")
    print(f"  tags:   {', '.join(item.get('item_tags') or [])}")

What You’ll See

Onboarding New Volunteers: A First-Day Playbook
  author: Community Programs Team
  tags:   volunteers, rag-pipeline-series, reviewed-q2-2026
Measuring Community Impact
  author: Community Programs Team
  tags:   measurement, rag-pipeline-series, reviewed-q2-2026
Sustaining Long-Term Engagement
  author: Community Programs Team
  tags:   engagement, rag-pipeline-series, reviewed-q2-2026
The single-item update from Step 2 (the new title on the first item) and the bulk edits from Step 3 (new author and added tag on all three) are both reflected.
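The read-after-write retry in the loop above can be pulled into a small reusable helper. This is a sketch with hypothetical names (`fetch_until`, `is_ready`). Unlike the inline loop, which prints whatever it last fetched, this version raises once the attempts run out, so a change that never appears does not pass silently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_until(
    fetch: Callable[[], T],
    is_ready: Callable[[T], bool],
    attempts: int = 20,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_ready(value) is true, waiting `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        value = fetch()
        if is_ready(value):
            return value
        if attempt < attempts - 1:  # no point sleeping after the final try
            sleep(delay)
    raise TimeoutError(f"condition not met after {attempts} attempts")
```

With the `get_item` function from this step, a call might look like `fetch_until(lambda: get_item(item_id), lambda item: "reviewed-q2-2026" in (item.get("item_tags") or []))`. The `sleep` parameter exists only so the wait can be replaced in tests.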

Step 5: Delete Items

Remove items with DELETE /engine/v2/items, passing the item_ids to delete. This both demonstrates deletion and cleans up the content this recipe created.
response = requests.delete(
    "https://platform.ai.gloo.com/engine/v2/items",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"item_ids": item_ids},
)
response.raise_for_status()
deletion = response.json()
print(f"Requested {deletion['total_requested']}, deleted {deletion['total_deleted']}, "
      f"failed {deletion['total_failed']}")

# GET is authoritative for deletion: it returns 404 once an item is gone.
for item_id in item_ids:
    r = requests.get(
        f"https://platform.ai.gloo.com/engine/v2/items/{item_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    print(f"{item_id} -> {r.status_code}")

What You’ll See

Requested 3, deleted 3, failed 0
0bfc3629-137d-4797-aac6-a2836ca39930 -> 404
8e9f12bc-4ef2-4e77-b1aa-293935a67c9a -> 404
ca849be9-cf4c-4423-9369-3793736cd75a -> 404
A GET on the item ID is authoritative for deletion — it returns 404 as soon as the item is gone. Don’t use the by-producer lookup to confirm a delete: it’s cached and can keep returning the deleted item’s ID for a few minutes.

Run the Complete Example

The cookbook runs the whole lifecycle — seed, update, bulk-edit, verify, and delete — as one program in all six languages. From the cookbook repository, install dependencies, copy .env.example to .env and add your credentials, then run it:
cd rag-pipeline-part-2/python
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env       # then add your Client ID, Secret, and Publisher ID
python main.py
You’ll see:
Step 1: Seeding sample content...
  Uploaded rag-pipeline-part2-volunteer-onboarding -> 0bfc3629-137d-4797-aac6-a2836ca39930
  Uploaded rag-pipeline-part2-measuring-impact -> 8e9f12bc-4ef2-4e77-b1aa-293935a67c9a
  Uploaded rag-pipeline-part2-sustaining-engagement -> ca849be9-cf4c-4423-9369-3793736cd75a
  Waiting for ingestion to complete (this can take a few minutes)...
  All seed items indexed.

Step 2: Updating a single item...
  Updated title and summary for rag-pipeline-part2-volunteer-onboarding

Step 3: Bulk-editing all seeded items...
  Matched 3, patched 3, failed 0

Step 4: Verifying changes...
  Onboarding New Volunteers: A First-Day Playbook
    author: Community Programs Team
    tags:   volunteers, rag-pipeline-series, reviewed-q2-2026
  Measuring Community Impact
    author: Community Programs Team
    tags:   measurement, rag-pipeline-series, reviewed-q2-2026
  Sustaining Long-Term Engagement
    author: Community Programs Team
    tags:   engagement, rag-pipeline-series, reviewed-q2-2026

Step 5: Deleting items (cleanup)...
  Requested 3, deleted 3, failed 0
  All items confirmed deleted: true

Lifecycle complete. The publisher is back to its pre-recipe state.

Working Code Sample

Clone or browse the complete working examples for all 6 languages (JavaScript, TypeScript, Python, PHP, Go, Java) with setup instructions and the sample content files.
The code snippets above are simplified and self-contained — designed for readability and easy copy-paste. The cookbook examples add token caching, the indexing wait, and structured error handling. Both implement the same APIs and patterns.

Troubleshooting

Error: 400 Invalid request — Item ID or Producer ID with Publisher ID required

A single-item PATCH /engine/v2/item needs publisher_id plus either item_id or producer_id. Include one of the identifiers.

Bulk patch reports fewer matched than expected

The filter didn’t match all your items. Confirm your item_ids are the ones returned by the uploads and that the items still exist (a prior run may have deleted them).

Verification shows stale metadata

Reads are eventually consistent. Re-fetch until the change appears (as shown in Step 4) rather than reading once immediately after an edit.

Editing right after upload returns 404 item_not_found

If you looked the item up with by-producer, its result may be stale (it’s cached ~5 minutes and not invalidated on writes). Use the item_id from the upload response instead — that’s authoritative.

Next Steps

You can now manage content through its full lifecycle, safely scoped to the items you own. To round out the pipeline:
  1. Part 3: Verification, Error Handling & Resilience — interpret API error responses and add retry patterns for production
  2. Building Custom Search — surface your updated content to users
  3. Grounded Completions with RAG — answer questions from your content with citations