The Observe section is the operational heart of Burgundy. It combines real-time event monitoring with an operations center for triage, execution traces for debugging, AI-powered analysis for diagnostics, and operator notes for institutional context.

Operations Center

The Operations Center (/ops) surfaces items requiring operator action, prioritized by severity. Failed runs, budget breaches, gateway outages, and escalated approvals all appear here.
  • Triage — Items are sorted by severity (critical first, then warning, then info)
  • Acknowledge — Click to suppress an item temporarily; it reappears if the condition persists
  • Add notes — Attach context for other operators investigating the same issue
  • Navigate — Each item links directly to the source resource (run, agent, gateway)
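The severity ordering used for triage can be sketched as a simple comparator. This is an illustrative sketch only — `AttentionItem`, `Severity`, and `triageOrder` are hypothetical names, not the Burgundy API — assuming newer items rank first within a severity tier:

```typescript
// Illustrative only: Burgundy's internal types are not public.
type Severity = "critical" | "warning" | "info";

interface AttentionItem {
  id: string;
  severity: Severity;
  createdAt: number; // epoch ms
}

const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  warning: 1,
  info: 2,
};

// Critical first, then warning, then info; newest first within a tier.
function triageOrder(items: AttentionItem[]): AttentionItem[] {
  return [...items].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      b.createdAt - a.createdAt,
  );
}
```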

Attention Item Types

  • Critical — Gateway offline, infrastructure failures; these affect all agents on the gateway
  • Warning — Run failures, budget pauses, gateway degradation, policy violations
  • Info — Aging approvals, unclassified failures

Correlated Changes

A 6-hour lookback surfaces deployments, config changes, and policy updates that may explain the issue.
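The lookback amounts to a window filter over recent changes. A minimal sketch, assuming hypothetical names (`ChangeEvent`, `correlatedChanges`) and epoch-millisecond timestamps:

```typescript
// Illustrative sketch of the correlated-changes lookback; not the platform API.
interface ChangeEvent {
  kind: "deployment" | "config" | "policy";
  at: number; // epoch ms
}

const LOOKBACK_MS = 6 * 60 * 60 * 1000; // 6 hours

// Keep only changes that landed within the lookback window before the incident.
function correlatedChanges(
  changes: ChangeEvent[],
  incidentAt: number,
): ChangeEvent[] {
  return changes.filter(
    (c) => c.at <= incidentAt && incidentAt - c.at <= LOOKBACK_MS,
  );
}
```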

Event Timeline

The event timeline (/observe) provides a unified view of all platform events. It merges Convex platform events with Bridge SSE events into a single filterable stream.
  • Filter by category — executions, governance decisions, lifecycle transitions, security events
  • Filter by time range — 5m, 15m, 1h, 24h, or all
  • Filter by resource — scope to a specific run, agent, or gateway
  • Filter by actor — see all actions by a specific user or agent
  • Search — free-text search across event payloads
Events link directly to their source resource. Click any event to see the full detail payload.
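The filters above compose into a single predicate over the merged stream. A sketch under assumed names — the real Convex and Bridge payload shapes will differ:

```typescript
// Hypothetical event shape; the actual Convex/Bridge payloads will differ.
interface PlatformEvent {
  category: "execution" | "governance" | "lifecycle" | "security";
  at: number;        // epoch ms
  resourceId: string;
  actorId: string;
  payload: string;   // serialized detail payload
}

interface TimelineFilter {
  category?: PlatformEvent["category"];
  sinceMs?: number;   // window back from now: 5m, 15m, 1h, 24h; omit for "all"
  resourceId?: string;
  actorId?: string;
  search?: string;    // free-text match against the payload
}

// An event passes only if it satisfies every filter that is set.
function matches(e: PlatformEvent, f: TimelineFilter, now: number): boolean {
  if (f.category && e.category !== f.category) return false;
  if (f.sinceMs !== undefined && now - e.at > f.sinceMs) return false;
  if (f.resourceId && e.resourceId !== f.resourceId) return false;
  if (f.actorId && e.actorId !== f.actorId) return false;
  if (f.search && !e.payload.toLowerCase().includes(f.search.toLowerCase()))
    return false;
  return true;
}
```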

Traces

Step-level execution traces for debugging workflow runs. For each step, the trace shows:
  • Timing breakdown — Queued, started, and completed timestamps with duration
  • Conversation log — The agent’s full message transcript for that step
  • Tool calls — Every tool invocation with arguments and results
  • Outputs — The step’s produced outputs and artifacts
  • Failure detail — Error message and stack trace for failed steps
Traces are essential for diagnosing why a step failed, took too long, or produced unexpected results.
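The timing breakdown follows directly from the three timestamps. A minimal sketch with illustrative names (`StepTrace`, `timingBreakdown` are not the Burgundy API):

```typescript
// Sketch of deriving the timing breakdown from step timestamps.
interface StepTrace {
  queuedAt: number;    // epoch ms
  startedAt: number;
  completedAt: number;
}

interface TimingBreakdown {
  queueMs: number; // time spent waiting before execution
  runMs: number;   // time spent actually executing
  totalMs: number; // queued-to-completed duration
}

function timingBreakdown(step: StepTrace): TimingBreakdown {
  return {
    queueMs: step.startedAt - step.queuedAt,
    runMs: step.completedAt - step.startedAt,
    totalMs: step.completedAt - step.queuedAt,
  };
}
```

A large `queueMs` relative to `runMs` points at scheduling pressure rather than a slow step.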

Copilot Analysis

AI-powered run analysis for failed or anomalous runs. When a run fails or exhibits unusual behavior, trigger Copilot analysis from the run detail page. Copilot examines the execution trace — step sequence, tool calls, errors, timing — and suggests root causes. Analysis runs in a dedicated thread and produces a structured report with findings and suggested next steps. Useful when the failure isn’t obvious from the trace alone.

Operator Notes

Attach notes to any resource — a run, an agent, a factory. Notes are shared across the team for institutional context and persist across sessions. Use notes for:
  • Post-mortem annotations on failed runs
  • Configuration rationale on agent settings
  • Handoff context when transferring operational responsibility
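Conceptually, notes are a team-shared map from resource id to a list of annotations. A hedged sketch — `NoteStore` and its methods are hypothetical, not the Burgundy API:

```typescript
// Illustrative note store keyed by resource id; not the Burgundy API.
interface OperatorNote {
  author: string;
  body: string;
  createdAt: number; // epoch ms
}

// Notes attach to any resource id (run, agent, factory) and are
// visible to the whole team.
class NoteStore {
  private notes = new Map<string, OperatorNote[]>();

  add(resourceId: string, note: OperatorNote): void {
    const existing = this.notes.get(resourceId) ?? [];
    this.notes.set(resourceId, [...existing, note]);
  }

  list(resourceId: string): OperatorNote[] {
    return this.notes.get(resourceId) ?? [];
  }
}
```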