This recipe shows how to upload files directly to the Gloo AI Data Engine using the Upload Files API. You’ll learn to upload single files, batch multiple files, associate custom IDs, and add metadata to your content.
Upload once. Search instantly. Build faster.
The Upload Files API lets you send any file directly to Gloo for real-time processing and indexing. Within minutes, your content becomes searchable, context-aware, and ready for AI-driven interactions.
Prerequisites
Before starting, ensure you have:
Gloo AI API credentials (Client ID and Client Secret)
A publisher ID from Gloo AI Studio
The Upload Files API requires Bearer token authentication. If you haven't set up authentication yet, follow the Authentication Tutorial to learn how to exchange your credentials for access tokens and manage token expiration.
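As a quick refresher, the token exchange is an OAuth2 client-credentials call, mirroring the complete example later in this recipe. A minimal sketch, assuming the `requests` library is installed; the `token_expired` helper and its 60-second refresh margin are an illustrative convention, not API behavior:

```python
import time

import requests

TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"


def fetch_token(client_id, client_secret):
    """Exchange client credentials for a bearer token."""
    response = requests.post(
        TOKEN_URL,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data={"grant_type": "client_credentials", "scope": "api/access"},
        auth=(client_id, client_secret),
    )
    response.raise_for_status()
    token = response.json()
    # Record an absolute expiry time so we can refresh proactively.
    token["expires_at"] = int(time.time()) + token["expires_in"]
    return token


def token_expired(token, margin=60, now=None):
    """Treat a token as expired `margin` seconds before its real expiry."""
    now = time.time() if now is None else now
    return now > token.get("expires_at", 0) - margin
```

Caching the token and checking `token_expired` before each upload avoids re-authenticating on every request.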
Understanding the Upload Files API
The Upload Files API allows you to upload documents that get processed and made available for search and AI interaction. The primary endpoint is:
POST /ingestion/v2/files
Key Features
Direct File Upload: Send files via multipart/form-data
Multi-File Support: Upload multiple files in a single request
Automatic Processing: Files are parsed, indexed, and made searchable
Format Detection: Automatic detection and processing of various file types
Duplicate Detection: Identifies and reports duplicate content
Required Fields
The Upload Files API requires two form fields:
publisher_id: Your publisher ID from Gloo AI Studio
files: The file(s) to upload (use the field name files, not file)
Supported File Types
PDF documents
Microsoft Word (.doc, .docx)
Plain text (.txt)
Markdown (.md)
And more
Step 1: Basic Single File Upload
Let’s start with uploading a single file. This demonstrates the core API call with proper authentication.
Shell
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"
Expected Response
{
  "success": true,
  "message": "File processing started in background.",
  "ingesting": [
    "c999008e-de60-495c-8c9f-6a4b59cdb04b"
  ],
  "duplicates": []
}
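The same request in Python, as a sketch using the `requests` library (the token, publisher ID, and file path are placeholders you supply). The `ingested_ids` helper just reads the `ingesting` array from the response shown above:

```python
import os

import requests

UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"


def upload_single(token, publisher_id, path):
    """Upload one file; note the multipart field must be named 'files'."""
    with open(path, "rb") as fh:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {token}"},
            data={"publisher_id": publisher_id},
            files={"files": (os.path.basename(path), fh)},
        )
    response.raise_for_status()
    return response.json()


def ingested_ids(result):
    """Return the item IDs Gloo assigned to files now being processed."""
    return result.get("ingesting", [])
```

Keep the returned item IDs: you'll need them to attach metadata in Step 4.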
Step 2: Multi-File Upload
You can upload multiple files in a single request for better efficiency.
You can mix file types freely; the endpoint automatically detects and processes each format.
Shell
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.pdf" \
  -F "files=@/path/to/document3.docx"
Expected Response
{
  "success": true,
  "message": "File processing started in background.",
  "ingesting": [
    "c999008e-de60-495c-8c9f-6a4b59cdb04b",
    "b10e85d8-243d-46e2-9504-d93874a9ebcb",
    "b45058a8-2f8c-4a88-8aba-7adb4afcd38d"
  ],
  "duplicates": []
}
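In Python with `requests`, a repeated form field like `files` needs a list of tuples rather than a dict, since a dict can't repeat keys. A sketch (file paths are placeholders):

```python
import os

import requests

UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"


def build_file_parts(paths):
    """Repeat the 'files' field once per file, as the endpoint expects."""
    return [("files", (os.path.basename(p), open(p, "rb"))) for p in paths]


def upload_many(token, publisher_id, paths):
    """Upload several files in one multipart request."""
    parts = build_file_parts(paths)
    try:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {token}"},
            data={"publisher_id": publisher_id},
            files=parts,
        )
        response.raise_for_status()
        return response.json()
    finally:
        for _, (_, fh) in parts:
            fh.close()
```

Each entry in the returned `ingesting` array corresponds to one uploaded file.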
Step 3: Using Producer ID
The producer_id query parameter lets you associate your internal ID with an uploaded file. This is useful for tracking and updating content from your system.
If the producer_id field is supplied with a multi-file upload, the ID will be ignored. Producer IDs must have a one-to-one relationship with a file.
Shell
curl -X POST "https://platform.ai.gloo.com/ingestion/v2/files?producer_id=my-internal-id-12345" \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"
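A sketch of the same call in Python. The `upload_params` guard reflects the one-to-one rule above; it's a client-side convention in this example (the API would simply ignore `producer_id` on multi-file uploads), and the ID values are placeholders:

```python
import os

import requests

UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"


def upload_params(producer_id, file_count):
    """Only attach producer_id for single-file uploads; it is ignored otherwise."""
    if producer_id and file_count == 1:
        return {"producer_id": producer_id}
    return {}


def upload_with_producer_id(token, publisher_id, path, producer_id):
    """Upload one file, tagging it with our internal ID via the query string."""
    with open(path, "rb") as fh:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": f"Bearer {token}"},
            params=upload_params(producer_id, 1),
            data={"publisher_id": publisher_id},
            files={"files": (os.path.basename(path), fh)},
        )
    response.raise_for_status()
    return response.json()
```

Later, you can reference the same content by `producer_id` instead of Gloo's `item_id`.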
Step 4: Adding Metadata to Uploaded Content
After uploading files, you can add or update metadata using the Update Item Metadata endpoint. Identify content by either Gloo’s item_id (returned from the upload) or your producer_id.
Only the fields you include are updated; omitted fields remain unchanged.
Shell
curl -X POST https://platform.ai.gloo.com/engine/v2/item \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "publisher_id": "your-publisher-id",
    "item_id": "c999008e-de60-495c-8c9f-6a4b59cdb04b",
    "item_title": "Document Title",
    "item_subtitle": "A brief subtitle",
    "author": ["John Doe", "Jane Smith"],
    "publication_date": "2025-01-15",
    "item_tags": ["category1", "category2"]
  }'
| Field | Type | Description |
| --- | --- | --- |
| publisher_id | string | Required. Your publisher ID |
| item_id | string | Gloo's item ID (from upload response) |
| producer_id | string | Your internal ID |
| item_title | string | Document title |
| item_subtitle | string | Document subtitle |
| file_name | string | Original filename |
| publication_date | string | Publication date (YYYY-MM-DD) |
| item_image | string | Image URL |
| item_url | string | Source URL |
| item_summary | string | Brief summary |
| author | string[] | List of authors |
| item_tags | string[] | Categorization tags |
If you provide both item_id and producer_id, the service prefers item_id when loading the target item, which lets you update the item's producer_id.
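Because omitted fields are left unchanged, it helps to build the request body from only the fields you actually want to set. A minimal sketch; the field names come from the table above, and dropping `None` values is a client-side convention in this example:

```python
import requests

METADATA_URL = "https://platform.ai.gloo.com/engine/v2/item"


def metadata_payload(publisher_id, item_id=None, producer_id=None, **fields):
    """Build a partial-update body; None values are dropped so those fields stay unchanged."""
    payload = {"publisher_id": publisher_id}
    if item_id:
        payload["item_id"] = item_id
    if producer_id:
        payload["producer_id"] = producer_id
    payload.update({k: v for k, v in fields.items() if v is not None})
    return payload


def update_item(token, payload):
    """Send the partial metadata update."""
    response = requests.post(
        METADATA_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json=payload,
    )
    response.raise_for_status()
    return response.json()
```

For example, `metadata_payload("your-publisher-id", item_id="...", item_title="Document Title")` updates only the title and leaves every other field as it was.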
Step 5: Verifying Your Upload
After uploading content, you can verify it in Gloo AI Studio:
Log in to Gloo AI Studio
Navigate to the Data Engine section from the main sidebar
Click on Your Data
You’ll see your uploaded files with their processing status and metadata.
Complete Example
Here’s a complete example that combines authentication, file upload, and metadata update:
Python
"""
Complete file upload workflow for Gloo AI Data Engine.
"""
import requests
import time
import os
from dotenv import load_dotenv

load_dotenv()

# Configuration
CLIENT_ID = os.getenv("GLOO_CLIENT_ID")
CLIENT_SECRET = os.getenv("GLOO_CLIENT_SECRET")
PUBLISHER_ID = os.getenv("GLOO_PUBLISHER_ID")
TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"
UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"
METADATA_URL = "https://platform.ai.gloo.com/engine/v2/item"

# Token management
access_token_info = {}


def get_access_token():
    """Retrieve a new access token."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    data = {"grant_type": "client_credentials", "scope": "api/access"}
    response = requests.post(TOKEN_URL, headers=headers, data=data,
                             auth=(CLIENT_ID, CLIENT_SECRET))
    response.raise_for_status()
    token_data = response.json()
    token_data['expires_at'] = int(time.time()) + token_data['expires_in']
    return token_data


def ensure_valid_token():
    """Ensure we have a valid access token, refreshing 60 seconds early."""
    global access_token_info
    if not access_token_info or time.time() > (access_token_info.get('expires_at', 0) - 60):
        print("Fetching new access token...")
        access_token_info = get_access_token()
    return access_token_info['access_token']


def upload_file(file_path, producer_id=None):
    """Upload a file to the Data Engine."""
    token = ensure_valid_token()
    headers = {"Authorization": f"Bearer {token}"}
    params = {"producer_id": producer_id} if producer_id else {}
    with open(file_path, 'rb') as f:
        files = {"files": (os.path.basename(file_path), f)}
        data = {"publisher_id": PUBLISHER_ID}
        response = requests.post(UPLOAD_URL, headers=headers, files=files,
                                 data=data, params=params)
    response.raise_for_status()
    return response.json()


def update_metadata(item_id, **metadata):
    """Update metadata for an uploaded item."""
    token = ensure_valid_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    data = {"publisher_id": PUBLISHER_ID, "item_id": item_id, **metadata}
    response = requests.post(METADATA_URL, headers=headers, json=data)
    response.raise_for_status()
    return response.json()


def main():
    """Main workflow: upload a file, then add metadata."""
    file_path = "/path/to/your/document.pdf"

    # Step 1: Upload the file
    print(f"Uploading {file_path}...")
    upload_result = upload_file(file_path, producer_id="my-doc-001")
    print(f"Upload result: {upload_result}")

    if upload_result.get('ingesting'):
        item_id = upload_result['ingesting'][0]

        # Step 2: Add metadata
        print(f"Adding metadata to item {item_id}...")
        metadata_result = update_metadata(
            item_id,
            item_title="My Document Title",
            author=["Author Name"],
            publication_date="2025-01-15",
            item_tags=["documentation", "api"]
        )
        print(f"Metadata result: {metadata_result}")


if __name__ == "__main__":
    main()
Troubleshooting
Error: 401 Unauthorized
Cause: Token expired or invalid credentials.
Solution: Refresh your access token and verify your Client ID and Client Secret.
Error: 403 Forbidden
Cause: Insufficient permissions for the publisher.
Solution: Verify you have access to the specified publisher in Gloo AI Studio.
Error: 413 Request Entity Too Large
Cause: File size exceeds the limit.
Solution: Check the API limits and consider splitting large files.
Error: Duplicate detected
Cause: A file with identical content already exists.
Solution: This is informational; the file won't be processed again. Check the duplicates array in the response.
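Since duplicates are reported rather than raised as errors, it's worth checking for them explicitly after every upload. A small helper, assuming the response shape shown in the earlier examples (`ingesting` and `duplicates` arrays):

```python
def summarize_upload(result):
    """Split an upload response into newly ingesting IDs and skipped duplicates."""
    ingesting = result.get("ingesting", [])
    duplicates = result.get("duplicates", [])
    if duplicates:
        print(f"Skipped {len(duplicates)} duplicate file(s): {duplicates}")
    return ingesting, duplicates
```

Calling this on each response lets your integration log skipped files instead of silently assuming every upload produced a new item.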
Next Steps
Now that you can upload files to the Data Engine, explore:
Search API - Query your uploaded content
Chat Integration - Use uploaded content in conversations
Content Controls - Manage and update your content
Upload Files API - For additional API information
Update Item Metadata API - For additional API information