This recipe shows how to upload files directly to the Gloo AI Data Engine using the Upload Files API. You’ll learn to upload single files, batch multiple files, associate custom IDs, and add metadata to your content. Upload once. Search instantly. Build faster. The Upload Files API lets you send any file directly to Gloo for real-time processing and indexing. Within minutes, your content becomes searchable, context-aware, and ready for AI-driven interactions.

Prerequisites

Before starting, ensure you have:
  • A Gloo AI Studio account with API credentials (Client ID and Client Secret)
  • Your publisher ID from Gloo AI Studio
  • A valid access token
The Upload Files API requires Bearer token authentication. If you haven’t set up authentication yet, follow the Authentication Tutorial to learn how to exchange your credentials for access tokens and manage token expiration.
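As a quick reference, the token exchange looks like this in Python: a minimal sketch of the client-credentials flow, using the same endpoint and scope as the complete example at the end of this recipe.
import os
import requests

# Exchange your Client ID and Secret for a Bearer token (client-credentials flow).
TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"

response = requests.post(
    TOKEN_URL,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={"grant_type": "client_credentials", "scope": "api/access"},
    auth=(os.environ["GLOO_CLIENT_ID"], os.environ["GLOO_CLIENT_SECRET"]),
)
response.raise_for_status()
access_token = response.json()["access_token"]  # send as: Authorization: Bearer <token>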

Understanding the Upload Files API

The Upload Files API allows you to upload documents that get processed and made available for search and AI interaction. The primary endpoint is: POST /ingestion/v2/files

Key Features

  • Direct File Upload: Send files via multipart/form-data
  • Multi-File Support: Upload multiple files in a single request
  • Automatic Processing: Files are parsed, indexed, and made searchable
  • Format Detection: Automatic detection and processing of various file types
  • Duplicate Detection: Identifies and reports duplicate content

Required Fields

The Upload Files API requires two form fields:
  • publisher_id: Your publisher ID from Gloo AI Studio
  • files: The file(s) to upload (use the field name files, not file)
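In Python with requests, these two fields map onto a multipart request like the following minimal sketch. It assumes you already hold an access token and publisher ID; note that the form field is named files even when uploading a single file.
import requests

# "publisher_id" travels as an ordinary form field; the file itself
# must be attached under the field name "files" (not "file").
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://platform.ai.gloo.com/ingestion/v2/files",
        headers={"Authorization": f"Bearer {access_token}"},
        data={"publisher_id": publisher_id},
        files={"files": ("document.pdf", f)},
    )
response.raise_for_status()
print(response.json())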

Supported File Types

  • PDF documents
  • Microsoft Word (.doc, .docx)
  • Plain text (.txt)
  • Markdown (.md)
  • And more

Step 1: Basic Single File Upload

Let’s start with uploading a single file. This demonstrates the core API call with proper authentication.
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"

Expected Response

{
    "success": true,
    "message": "File processing started in background.",
    "ingesting": [
        "c999008e-de60-495c-8c9f-6a4b59cdb04b"
    ],
    "duplicates": []
}

Step 2: Multi-File Upload

Uploading multiple files in a single request is more efficient than sending them one at a time.
You can mix file types freely; the endpoint automatically detects and processes each format.
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.pdf" \
  -F "files=@/path/to/document3.docx"

Expected Response

{
    "success": true,
    "message": "File processing started in background.",
    "ingesting": [
        "c999008e-de60-495c-8c9f-6a4b59cdb04b",
        "b10e85d8-243d-46e2-9504-d93874a9ebcb",
        "b45058a8-2f8c-4a88-8aba-7adb4afcd38d"
    ],
    "duplicates": []
}
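In Python, the same multi-file request repeats the files field as a list of tuples, one per file. Here is a sketch (the complete example at the end of this recipe handles only a single file):
import requests

# One ("files", ...) tuple per file, mirroring the repeated -F flags above.
paths = ["document1.pdf", "document2.pdf", "document3.docx"]
multipart = [("files", (p, open(p, "rb"))) for p in paths]
try:
    response = requests.post(
        "https://platform.ai.gloo.com/ingestion/v2/files",
        headers={"Authorization": f"Bearer {access_token}"},
        data={"publisher_id": publisher_id},
        files=multipart,
    )
    response.raise_for_status()
    print(response.json()["ingesting"])  # one item ID per newly ingested file
finally:
    for _, (_, fh) in multipart:
        fh.close()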

Step 3: Using Producer ID

The producer_id query parameter lets you associate your internal ID with an uploaded file. This is useful for tracking and updating content from your system.
If producer_id is supplied with a multi-file upload, it is ignored: each producer ID must map to exactly one file.
curl -X POST "https://platform.ai.gloo.com/ingestion/v2/files?producer_id=my-internal-id-12345" \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"

Step 4: Adding Metadata to Uploaded Content

After uploading files, you can add or update metadata using the Update Item Metadata endpoint. Identify content by either Gloo’s item_id (returned from the upload) or your producer_id.
Only the fields you include are updated; omitted fields remain unchanged.
curl -X POST https://platform.ai.gloo.com/engine/v2/item \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "publisher_id": "your-publisher-id",
    "item_id": "c999008e-de60-495c-8c9f-6a4b59cdb04b",
    "item_title": "Document Title",
    "item_subtitle": "A brief subtitle",
    "author": ["John Doe", "Jane Smith"],
    "publication_date": "2025-01-15",
    "item_tags": ["category1", "category2"]
  }'

Available Metadata Fields

Field              Type       Description
publisher_id       string     Required. Your publisher ID
item_id            string     Gloo’s item ID (from upload response)
producer_id        string     Your internal ID
item_title         string     Document title
item_subtitle      string     Document subtitle
file_name          string     Original filename
publication_date   string     Publication date (YYYY-MM-DD)
item_image         string     Image URL
item_url           string     Source URL
item_summary       string     Brief summary
author             string[]   List of authors
item_tags          string[]   Categorization tags
If you provide both item_id and producer_id, the service prefers item_id when loading the target item; this allows the producer_id itself to be updated.
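As a sketch, reassigning your internal ID works by addressing the item via item_id and sending the new producer_id (the new ID value here is hypothetical):
import requests

# Load the item by item_id so that producer_id itself can be (re)assigned.
response = requests.post(
    "https://platform.ai.gloo.com/engine/v2/item",
    headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    },
    json={
        "publisher_id": publisher_id,
        "item_id": "c999008e-de60-495c-8c9f-6a4b59cdb04b",
        "producer_id": "my-new-internal-id",  # hypothetical new ID
    },
)
response.raise_for_status()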

Step 5: Verifying Your Upload

After uploading content, you can verify it in Gloo AI Studio:
  1. Log in to Gloo AI Studio
  2. Navigate to the Data Engine section from the main sidebar
  3. Click on Your Data
You’ll see your uploaded files with their processing status and metadata.

Complete Example

Here’s a complete example that combines authentication, file upload, and metadata update:
"""
Complete file upload workflow for Gloo AI Data Engine.
"""
import requests
import time
import os
from dotenv import load_dotenv

load_dotenv()

# Configuration
CLIENT_ID = os.getenv("GLOO_CLIENT_ID")
CLIENT_SECRET = os.getenv("GLOO_CLIENT_SECRET")
PUBLISHER_ID = os.getenv("GLOO_PUBLISHER_ID")

TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"
UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"
METADATA_URL = "https://platform.ai.gloo.com/engine/v2/item"

# Token management
access_token_info = {}

def get_access_token():
    """Retrieve a new access token."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    data = {"grant_type": "client_credentials", "scope": "api/access"}
    response = requests.post(TOKEN_URL, headers=headers, data=data,
                             auth=(CLIENT_ID, CLIENT_SECRET))
    response.raise_for_status()
    token_data = response.json()
    token_data['expires_at'] = int(time.time()) + token_data['expires_in']
    return token_data

def ensure_valid_token():
    """Ensure we have a valid access token."""
    global access_token_info
    if not access_token_info or time.time() > (access_token_info.get('expires_at', 0) - 60):
        print("Fetching new access token...")
        access_token_info = get_access_token()
    return access_token_info['access_token']

def upload_file(file_path, producer_id=None):
    """Upload a file to the Data Engine."""
    token = ensure_valid_token()
    headers = {"Authorization": f"Bearer {token}"}
    params = {"producer_id": producer_id} if producer_id else {}

    with open(file_path, 'rb') as f:
        files = {"files": (os.path.basename(file_path), f)}
        data = {"publisher_id": PUBLISHER_ID}
        response = requests.post(UPLOAD_URL, headers=headers, files=files, data=data, params=params)

    response.raise_for_status()
    return response.json()

def update_metadata(item_id, **metadata):
    """Update metadata for an uploaded item."""
    token = ensure_valid_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    data = {"publisher_id": PUBLISHER_ID, "item_id": item_id, **metadata}
    response = requests.post(METADATA_URL, headers=headers, json=data)
    response.raise_for_status()
    return response.json()

def main():
    """Main workflow: upload file and add metadata."""
    file_path = "/path/to/your/document.pdf"

    # Step 1: Upload the file
    print(f"Uploading {file_path}...")
    upload_result = upload_file(file_path, producer_id="my-doc-001")
    print(f"Upload result: {upload_result}")

    if upload_result.get('ingesting'):
        item_id = upload_result['ingesting'][0]

        # Step 2: Add metadata
        print(f"Adding metadata to item {item_id}...")
        metadata_result = update_metadata(
            item_id,
            item_title="My Document Title",
            author=["Author Name"],
            publication_date="2025-01-15",
            item_tags=["documentation", "api"]
        )
        print(f"Metadata result: {metadata_result}")

if __name__ == "__main__":
    main()

Troubleshooting

Error: 401 Unauthorized

Cause: Token expired or invalid credentials. Solution: Refresh your access token and verify your Client ID and Secret.
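A common pattern is to retry the request once with a fresh token. Here is a sketch that reuses UPLOAD_URL, PUBLISHER_ID, and get_access_token from the complete example above, and assumes file_path and an initial token are already defined:
# Read the file into memory so the multipart body can be re-sent on retry.
with open(file_path, "rb") as f:
    payload = f.read()

for attempt in range(2):
    response = requests.post(
        UPLOAD_URL,
        headers={"Authorization": f"Bearer {token}"},
        data={"publisher_id": PUBLISHER_ID},
        files={"files": (os.path.basename(file_path), payload)},
    )
    if response.status_code != 401:
        break
    token = get_access_token()["access_token"]  # refresh and retry once
response.raise_for_status()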

Error: 403 Forbidden

Cause: Insufficient permissions for the publisher. Solution: Verify you have access to the specified publisher in Gloo AI Studio.

Error: 413 Request Entity Too Large

Cause: File size exceeds the limit. Solution: Check the API limits and consider splitting large files.

Error: Duplicate detected

Cause: A file with identical content already exists. Solution: This is informational; the file won’t be processed again. Check the duplicates array in the response.
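In code, you can branch on the two arrays in the upload response, as in this small sketch over the response shape shown in Step 1:
result = response.json()
for item_id in result.get("ingesting", []):
    print(f"Processing started: {item_id}")
if result.get("duplicates"):
    # Duplicates are informational; those files are not re-processed.
    print(f"Duplicates reported: {result['duplicates']}")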

Next Steps

Now that you can upload files to the Data Engine, explore:
  1. Search API - Query your uploaded content
  2. Chat Integration - Use uploaded content in conversations
  3. Content Controls - Manage and update your content
  4. Upload Files API - For additional API information
  5. Update Item Metadata API - For additional API information