Pipeline at a Glance
Ingest content with metadata
Upload files and enrich them — covered below, with a deep dive in Upload Files to Data Engine.
Semantic search
Query your content — deep dive: Building Custom Search.
Grounded completions with sources
Answer questions from your content with citations — deep dive: Grounded Completions with RAG.
Content lifecycle
Update, bulk-edit, and delete content — Part 2.
Verification, errors & resilience
Error handling and retry patterns — Part 3.
Prerequisites
Before starting, ensure you have:
- A Gloo AI Studio account
- Your Client ID and Client Secret from the API Credentials page
- Authentication setup - Complete the Authentication Tutorial first
All API calls in this series use Bearer token authentication via the OAuth2 client credentials flow. The snippets below include a minimal token fetch; see the Authentication Tutorial for token caching and expiration handling.
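As a rough illustration of the client credentials flow mentioned above, the sketch below builds and sends a token request using only the Python standard library. The token URL is a placeholder, and sending the credentials via HTTP Basic auth is an assumption (OAuth2 also allows them in the request body); take the real values from the Authentication Tutorial. The function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: placeholder URL. Use the token endpoint from the Authentication Tutorial.
TOKEN_URL = "https://example.invalid/oauth2/token"


def build_token_request(client_id: str, client_secret: str,
                        token_url: str = TOKEN_URL) -> urllib.request.Request:
    """Build an OAuth2 client-credentials token request (RFC 6749, section 4.4)."""
    # Assumption: credentials go in an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_access_token(client_id: str, client_secret: str,
                       token_url: str = TOKEN_URL) -> str:
    """Send the request and return the access_token field of the JSON response."""
    request = build_token_request(client_id, client_secret, token_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

The returned token would then be sent as `Authorization: Bearer <token>` on each API call. This sketch fetches a new token every time and does no caching or expiry handling.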
Step 1: Create Your Publisher
Content in the Data Engine belongs to a publisher. Create one in Gloo AI Studio:
- In Gloo AI Studio, click your user account in the bottom-left corner and select Manage Organizations
- Select the organization you want to add the publisher to, then click View Publishers
- Click Create Publisher and give the new publisher a name
- Copy the Publisher ID (a UUID) — every API call in this series uses it
Step 2: Upload Content with a Producer ID
Upload a file to POST /ingestion/v2/files. The producer_id query parameter attaches your own stable identifier to the item. This is what makes the pipeline manageable later: re-running the upload detects a duplicate instead of creating a copy, and in Part 2 you’ll update, bulk-edit, and delete these same items.
This series uses a short Markdown article as its sample content: grab building-stronger-communities.md from the cookbook repository and save it next to your script. (Any Markdown, text, PDF, or Word file of your own works too — just adjust the filename.) Then upload it:
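One possible shape for the upload request, as a minimal standard-library sketch. Only the path and the producer_id query parameter come from this tutorial. The multipart field names `files` and `publisher_id`, the placeholder base URL, and the function name are assumptions for illustration; check the API reference for the actual request format.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: placeholder base URL. Replace with the real API host.
API_BASE = "https://example.invalid"


def build_upload_request(api_base: str, token: str, publisher_id: str,
                         producer_id: str, filename: str, content: bytes,
                         boundary: str = "") -> urllib.request.Request:
    """Build a multipart POST to /ingestion/v2/files with a producer_id query parameter."""
    boundary = boundary or uuid.uuid4().hex
    query = urllib.parse.urlencode({"producer_id": producer_id})
    parts = [
        # Assumption: the publisher is identified by a form field named "publisher_id".
        f'--{boundary}\r\nContent-Disposition: form-data; name="publisher_id"\r\n\r\n'
        f"{publisher_id}\r\n".encode(),
        # Assumption: the file part is named "files". Filenames containing quotes
        # are not escaped in this sketch.
        f'--{boundary}\r\nContent-Disposition: form-data; name="files"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode(),
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{api_base}/ingestion/v2/files?{query}",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the sample file with `pathlib.Path("building-stronger-communities.md").read_bytes()`, pass the bytes as `content`, and hand the request to `urllib.request.urlopen`. Reusing the same producer_id on a later run is what lets the service recognize the duplicate.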
What You’ll See
Step 3: Enrich with Metadata
Attach descriptive metadata with PATCH /engine/v2/item. Good metadata pays off downstream: titles and authors appear in search results and source citations, and tags let you organize and bulk-manage content in Part 2.
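A minimal sketch of the metadata request, assuming a JSON body. The path and the choice between item_id and producer_id come from this tutorial; the body layout, the `publisher_id` field, and the metadata key names (`title`, `author`, `tags`) are hypothetical stand-ins, so confirm the real field names in the API reference.

```python
import json
import urllib.request
from typing import Optional


def build_metadata_request(api_base: str, token: str, publisher_id: str,
                           metadata: dict, item_id: Optional[str] = None,
                           producer_id: Optional[str] = None) -> urllib.request.Request:
    """Build a PATCH to /engine/v2/item that targets one item by item_id or producer_id."""
    if (item_id is None) == (producer_id is None):
        raise ValueError("pass exactly one of item_id or producer_id")
    # Assumption: metadata fields sit at the top level of the JSON body.
    payload = dict(metadata, publisher_id=publisher_id)
    if item_id is not None:
        payload["item_id"] = item_id
    else:
        payload["producer_id"] = producer_id
    return urllib.request.Request(
        f"{api_base}/engine/v2/item",
        data=json.dumps(payload).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Requiring exactly one identifier keeps the request unambiguous about which item it updates.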
Metadata can be set as soon as the item ID exists; you don’t need to wait for ingestion to finish. You can also target a single item by producer_id instead of item_id.
Step 4: Verify Indexing
Ingestion is asynchronous: the upload response means your file is queued, not searchable. Poll GET /engine/v2/items/{item_id} until status reaches COMPLETED, typically a few minutes for a small file. While processing you’ll see intermediate states such as CHUNKING.
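The polling loop can be sketched as below. `fetch_status` is a hypothetical caller-supplied function that performs the GET and returns the item's status string; the 15-minute timeout and 10-second interval are arbitrary defaults, not documented limits. Only COMPLETED and CHUNKING are named in this tutorial, so the sketch treats every other value as "still processing". A real client would probably also stop early on whatever failure status the API reports.

```python
import time
import urllib.parse
from typing import Callable


def status_url(api_base: str, item_id: str) -> str:
    """URL of the item status endpoint, GET /engine/v2/items/{item_id}."""
    return f"{api_base}/engine/v2/items/{urllib.parse.quote(item_id, safe='')}"


def wait_until_completed(fetch_status: Callable[[], str],
                         timeout_s: float = 900.0,
                         interval_s: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> str:
    """Call fetch_status until it returns COMPLETED or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "COMPLETED":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"item still {status!r} after {timeout_s} seconds")
        sleep(interval_s)  # wait before asking again
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise in tests without real waiting.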
What You’ll See
A fresh upload of the sample file takes about 6 minutes to reach COMPLETED, while an item that is already indexed reports COMPLETED immediately. The metadata you set in Step 3 round-trips on the same response, confirming that title, author, and tags are attached to the indexed item.
Run the Complete Example
The cookbook contains the full pipeline (upload, metadata, and polling with token caching and error handling) as one runnable program in all six languages. From the cookbook repository, install dependencies, copy .env.example to .env and add your credentials, then run it:
Working Code Sample
View Complete Code
Clone or browse the complete working examples for all 6 languages (JavaScript, TypeScript, Python, PHP, Go, Java) with setup instructions and the sample content file.
Troubleshooting
Error: 401 Unauthorized
Your access token is missing, expired, or malformed. Tokens expire after one hour, so re-run the token request. See the Authentication Tutorial.
Error: 400 publisher_not_found
The publisher_id doesn’t exist or isn’t accessible with your credentials. Copy the Publisher ID (a UUID) from Studio > Data Engine > Publishers.
Error: 400 too_many_files
producer_id applies to a single file. When uploading multiple files in one request, omit it.
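One way to avoid this error is to check on the client side before sending. The helper below is a hypothetical sketch, not part of any SDK: it builds the upload query string and refuses to combine producer_id with more than one file.

```python
import urllib.parse
from typing import Optional


def upload_query(file_count: int, producer_id: Optional[str] = None) -> str:
    """Return the query string for an upload, allowing producer_id only for a single file."""
    if file_count < 1:
        raise ValueError("at least one file is required")
    if producer_id is not None and file_count > 1:
        raise ValueError("producer_id applies to a single file; omit it for multi-file uploads")
    if producer_id is None:
        return ""
    return urllib.parse.urlencode({"producer_id": producer_id})
```

Failing locally gives a clearer message than waiting for the 400 response.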
Polling times out
Larger files take longer to process. Increase the timeout, or check Studio > Ingestion Analytics to see whether the publisher is still processing.
Next Steps
Your pipeline now has indexed, metadata-rich content. Wire it into retrieval:
- Building Custom Search: query this content semantically with the Search API
- Grounded Completions with RAG — answer questions from this content with source citations
- Part 2: Content Lifecycle — update, bulk-edit, and delete the items you created here
- [Part 3: Verification, Error Handling & Resilience](/tutorials/rag-pipeline-part-3): production-grade retry and verification patterns

