This recipe shows how to upload files directly to the Gloo AI Data Engine using the Upload Files API. You’ll learn to upload single files, batch multiple files, associate custom IDs, and add metadata to your content. Upload once. Search instantly. Build faster. The Upload Files API lets you send any file directly to Gloo for real-time processing and indexing. Within minutes, your content becomes searchable, context-aware, and ready for AI-driven interactions.

Prerequisites

Before starting, ensure you have:
  • A Gloo AI Studio account with API credentials (Client ID and Client Secret)
  • Your publisher ID from Gloo AI Studio
  • A valid access token
The Upload Files API requires Bearer token authentication. If you haven’t set up authentication yet, follow the Authentication Tutorial to learn how to exchange your credentials for access tokens and manage token expiration.
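As a quick reference, the token exchange looks like this in Python: a minimal sketch of the client-credentials flow, using the same endpoint and scope as the complete example at the end of this recipe.
import os
import requests

# Exchange your Client ID and Secret for a Bearer token (client-credentials flow).
TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"

response = requests.post(
    TOKEN_URL,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={"grant_type": "client_credentials", "scope": "api/access"},
    auth=(os.environ["GLOO_CLIENT_ID"], os.environ["GLOO_CLIENT_SECRET"]),
)
response.raise_for_status()
access_token = response.json()["access_token"]  # send as: Authorization: Bearer <token>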

Understanding the Upload Files API

The Upload Files API allows you to upload documents that get processed and made available for search and AI interaction. The primary endpoint is: POST /ingestion/v2/files

Key Features

  • Direct File Upload: Send files via multipart/form-data
  • Multi-File Support: Upload multiple files in a single request
  • Automatic Processing: Files are parsed, indexed, and made searchable
  • Format Detection: Automatic detection and processing of various file types
  • Duplicate Detection: Identifies and reports duplicate content

Required Fields

The Upload Files API requires two form fields:
  • publisher_id: Your publisher ID from Gloo AI Studio
  • files: The file(s) to upload (use the field name files, not file)
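In Python with requests, these two fields map onto a multipart request like the following minimal sketch. It assumes you already hold an access token and publisher ID; note that the form field is named files even when uploading a single file.
import requests

# "publisher_id" travels as an ordinary form field; the file itself
# must be attached under the field name "files" (not "file").
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://platform.ai.gloo.com/ingestion/v2/files",
        headers={"Authorization": f"Bearer {access_token}"},
        data={"publisher_id": publisher_id},
        files={"files": ("document.pdf", f)},
    )
response.raise_for_status()
print(response.json())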

Supported File Types

  • PDF documents
  • Microsoft Word (.doc, .docx)
  • Plain text (.txt)
  • Markdown (.md)
  • And more

Step 1: Basic Single File Upload

Let’s start with uploading a single file. This demonstrates the core API call with proper authentication.
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"

Expected Response

{
    "success": true,
    "message": "File processing started in background.",
    "ingesting": [
        "c999008e-de60-495c-8c9f-6a4b59cdb04b"
    ],
    "duplicates": []
}

Step 2: Multi-File Upload

Uploading multiple files in a single request is more efficient than sending them one at a time.
You can mix file types freely; the endpoint automatically detects and processes each format.
curl -X POST https://platform.ai.gloo.com/ingestion/v2/files \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.pdf" \
  -F "files=@/path/to/document3.docx"

Expected Response

{
    "success": true,
    "message": "File processing started in background.",
    "ingesting": [
        "c999008e-de60-495c-8c9f-6a4b59cdb04b",
        "b10e85d8-243d-46e2-9504-d93874a9ebcb",
        "b45058a8-2f8c-4a88-8aba-7adb4afcd38d"
    ],
    "duplicates": []
}
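In Python, the same multi-file request repeats the files field as a list of tuples, one per file. Here is a sketch (the complete example at the end of this recipe handles only a single file):
import requests

# One ("files", ...) tuple per file, mirroring the repeated -F flags above.
paths = ["document1.pdf", "document2.pdf", "document3.docx"]
multipart = [("files", (p, open(p, "rb"))) for p in paths]
try:
    response = requests.post(
        "https://platform.ai.gloo.com/ingestion/v2/files",
        headers={"Authorization": f"Bearer {access_token}"},
        data={"publisher_id": publisher_id},
        files=multipart,
    )
    response.raise_for_status()
    print(response.json()["ingesting"])  # one item ID per newly ingested file
finally:
    for _, (_, fh) in multipart:
        fh.close()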

Step 3: Using Producer ID

The producer_id query parameter lets you associate your internal ID with an uploaded file. This is useful for tracking and updating content from your system.
If producer_id is supplied with a multi-file upload, it is ignored: each producer ID must map to exactly one file.
curl -X POST "https://platform.ai.gloo.com/ingestion/v2/files?producer_id=my-internal-id-12345" \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -F "publisher_id=$GLOO_PUBLISHER_ID" \
  -F "files=@/path/to/your/document.pdf"

Step 4: Adding Metadata to Uploaded Content

After uploading files, you can add or update metadata using the Update Item Metadata endpoint. Identify content by either Gloo’s item_id (returned from the upload) or your producer_id.
Only the fields you include are updated; omitted fields remain unchanged.
curl -X POST https://platform.ai.gloo.com/engine/v2/item \
  -H "Authorization: Bearer $GLOO_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "publisher_id": "your-publisher-id",
    "item_id": "c999008e-de60-495c-8c9f-6a4b59cdb04b",
    "item_title": "Document Title",
    "item_subtitle": "A brief subtitle",
    "author": ["John Doe", "Jane Smith"],
    "publication_date": "2025-01-15",
    "item_tags": ["category1", "category2"]
  }'

Available Metadata Fields

Field              Type       Description
publisher_id       string     Required. Your publisher ID
item_id            string     Gloo’s item ID (from upload response)
producer_id        string     Your internal ID
item_title         string     Document title
item_subtitle      string     Document subtitle
file_name          string     Original filename
publication_date   string     Publication date (YYYY-MM-DD)
item_image         string     Image URL
item_url           string     Source URL
item_summary       string     Brief summary
author             string[]   List of authors
item_tags          string[]   Categorization tags
If you provide both item_id and producer_id, the service prefers item_id when loading the target item; this allows the producer_id itself to be updated.
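As a sketch, reassigning your internal ID works by addressing the item via item_id and sending the new producer_id (the new ID value here is hypothetical):
import requests

# Load the item by item_id so that producer_id itself can be (re)assigned.
response = requests.post(
    "https://platform.ai.gloo.com/engine/v2/item",
    headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    },
    json={
        "publisher_id": publisher_id,
        "item_id": "c999008e-de60-495c-8c9f-6a4b59cdb04b",
        "producer_id": "my-new-internal-id",  # hypothetical new ID
    },
)
response.raise_for_status()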

Step 5: Verifying Your Upload

After uploading content, you can verify it in Gloo AI Studio:
  1. Log in to Gloo AI Studio
  2. Navigate to the Data Engine section from the main sidebar
  3. Click on Your Data
You’ll see your uploaded files with their processing status and metadata.

Complete Example

Here’s a complete example that combines authentication, file upload, and metadata update:
"""
Complete file upload workflow for Gloo AI Data Engine.
"""
import requests
import time
import os
from dotenv import load_dotenv

load_dotenv()

# Configuration
CLIENT_ID = os.getenv("GLOO_CLIENT_ID")
CLIENT_SECRET = os.getenv("GLOO_CLIENT_SECRET")
PUBLISHER_ID = os.getenv("GLOO_PUBLISHER_ID")

TOKEN_URL = "https://platform.ai.gloo.com/oauth2/token"
UPLOAD_URL = "https://platform.ai.gloo.com/ingestion/v2/files"
METADATA_URL = "https://platform.ai.gloo.com/engine/v2/item"

# Token management
access_token_info = {}

def get_access_token():
    """Retrieve a new access token."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    data = {"grant_type": "client_credentials", "scope": "api/access"}
    response = requests.post(TOKEN_URL, headers=headers, data=data,
                             auth=(CLIENT_ID, CLIENT_SECRET))
    response.raise_for_status()
    token_data = response.json()
    token_data['expires_at'] = int(time.time()) + token_data['expires_in']
    return token_data

def ensure_valid_token():
    """Ensure we have a valid access token."""
    global access_token_info
    if not access_token_info or time.time() > (access_token_info.get('expires_at', 0) - 60):
        print("Fetching new access token...")
        access_token_info = get_access_token()
    return access_token_info['access_token']

def upload_file(file_path, producer_id=None):
    """Upload a file to the Data Engine."""
    token = ensure_valid_token()
    headers = {"Authorization": f"Bearer {token}"}
    params = {"producer_id": producer_id} if producer_id else {}

    with open(file_path, 'rb') as f:
        files = {"files": (os.path.basename(file_path), f)}
        data = {"publisher_id": PUBLISHER_ID}
        response = requests.post(UPLOAD_URL, headers=headers, files=files, data=data, params=params)

    response.raise_for_status()
    return response.json()

def update_metadata(item_id, **metadata):
    """Update metadata for an uploaded item."""
    token = ensure_valid_token()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    data = {"publisher_id": PUBLISHER_ID, "item_id": item_id, **metadata}
    response = requests.post(METADATA_URL, headers=headers, json=data)
    response.raise_for_status()
    return response.json()

def main():
    """Main workflow: upload file and add metadata."""
    file_path = "/path/to/your/document.pdf"

    # Step 1: Upload the file
    print(f"Uploading {file_path}...")
    upload_result = upload_file(file_path, producer_id="my-doc-001")
    print(f"Upload result: {upload_result}")

    if upload_result.get('ingesting'):
        item_id = upload_result['ingesting'][0]

        # Step 2: Add metadata
        print(f"Adding metadata to item {item_id}...")
        metadata_result = update_metadata(
            item_id,
            item_title="My Document Title",
            author=["Author Name"],
            publication_date="2025-01-15",
            item_tags=["documentation", "api"]
        )
        print(f"Metadata result: {metadata_result}")

if __name__ == "__main__":
    main()

Troubleshooting

Error: 401 Unauthorized

Cause: Token expired or invalid credentials. Solution: Refresh your access token and verify your Client ID and Secret.
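A common pattern is to retry the request once with a fresh token. Here is a sketch that reuses UPLOAD_URL, PUBLISHER_ID, and get_access_token from the complete example above, and assumes file_path and an initial token are already defined:
# Read the file into memory so the multipart body can be re-sent on retry.
with open(file_path, "rb") as f:
    payload = f.read()

for attempt in range(2):
    response = requests.post(
        UPLOAD_URL,
        headers={"Authorization": f"Bearer {token}"},
        data={"publisher_id": PUBLISHER_ID},
        files={"files": (os.path.basename(file_path), payload)},
    )
    if response.status_code != 401:
        break
    token = get_access_token()["access_token"]  # refresh and retry once
response.raise_for_status()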

Error: 403 Forbidden

Cause: Insufficient permissions for the publisher. Solution: Verify you have access to the specified publisher in Gloo AI Studio.

Error: 413 Request Entity Too Large

Cause: File size exceeds the limit. Solution: Check the API limits and consider splitting large files.

Error: Duplicate detected

Cause: A file with identical content already exists. Solution: This is informational; the file won’t be processed again. Check the duplicates array in the response.
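In code, you can branch on the two arrays in the upload response, as in this small sketch over the response shape shown in Step 1:
result = response.json()
for item_id in result.get("ingesting", []):
    print(f"Processing started: {item_id}")
if result.get("duplicates"):
    # Duplicates are informational; those files are not re-processed.
    print(f"Duplicates reported: {result['duplicates']}")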

Next Steps

Now that you can upload files to the Data Engine, explore:
  1. Search API - Query your uploaded content
  2. Chat Integration - Use uploaded content in conversations
  3. Content Controls - Manage and update your content
  4. Upload Files API - For additional API information
  5. Update Item Metadata API - For additional API information