Forge produces observability from four independent signal sources. Each source captures a different layer of platform activity, from ephemeral gateway telemetry to durable audit records. These signals are normalized into a unified event envelope, then rendered across dedicated operator surfaces. The result is a continuous, multi-layered view of everything happening across your agent fleet.

No single signal tells the full story. The observe layer merges live and historical events into a unified timeline. The analytics layer tracks cost and token economics. The audit layer provides the compliance-grade persistent record. Together, they give operators full visibility from the moment a step is dispatched to the moment its cost is attributed. See the Burgundy Dashboard Guide for the operator experience.

🏗️ Event Architecture

UNIFIED EVENT PIPELINE

| Source | Signals | Lifetime |
| --- | --- | --- |
| Gateway Events | Runtime telemetry via SSE. Lifecycle, tools, activity. | ephemeral |
| Bridge Events | Transport layer. 38 event types. Dispatch, CRUD, security. | SSE + selective persist |
| Platform Events | Business logic in Convex. Workflows, governance, approvals. | 365-day TTL |
| Convex Runtime | System-level log stream. Functions, OCC, console. | ephemeral |

↓

Normalization Layer (event-normalizer.ts)

All events are converted to the NormalizedEvent envelope: id, source, category, timestamp, payload. Three normalizers (gateway, bridge, runtime). 16 semantic categories.

↓ operator surfaces

| Surface | Purpose | Data |
| --- | --- | --- |
| /observe | Unified live timeline | SSE + audit events merged |
| /analytics | Cost and token economics | Aggregates + time-series |
| /audit | Persistent compliance record | 365-day append-only table |

Bridge Audit Relay: 22 event types selectively persisted to the audit table.

📡 Four Event Sources

Every observable signal in the platform originates from one of four sources. Each has different transport characteristics, lifetime guarantees, and use cases.

| Source | Origin | Transport | Lifetime | Primary Surface |
| --- | --- | --- | --- | --- |
| Gateway Events | Gateway WebSocket relay | SSE (ephemeral) | In-memory buffer (500 events) | /observe |
| Bridge Events | Bridge server | SSE (ephemeral) | In-memory buffer + selective persistence | /observe, /audit |
| Platform Events | Convex mutations | Append-only table | 365-day TTL | /audit |
| Convex Runtime | Convex log stream | SSE (ephemeral) | In-memory buffer | /observe |

Ephemeral events (gateway, bridge, Convex runtime) live only in the browser's in-memory buffer. Selected bridge events are persisted to the audit table by the Bridge Audit Relay, a render-nothing React component that subscribes to the SSE wildcard and writes 22 audit-worthy event types to Convex.

📺 Live Monitoring

The observe layer merges two event streams into a single chronological timeline:
  • SSE events from the useUnifiedEvents hook: gateway telemetry, bridge activity, and Convex runtime logs, all normalized into the NormalizedEvent envelope
  • Convex audit events from auditEvents.byTimeRange: persistent platform events converted into the same envelope via auditToNormalized()
Both streams are deduplicated by ID and sorted ascending (newest at bottom). The page auto-scrolls to the bottom with pin-to-bottom behavior that disengages on manual scroll-up.
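
The merge step can be sketched as below. This is a minimal illustration, not the platform's implementation: `TimelineEvent` is a trimmed-down stand-in for NormalizedEvent, `mergeTimeline` is a hypothetical helper name, and keeping the first copy of a duplicate ID is an assumption.

```typescript
// Trimmed-down stand-in for NormalizedEvent; only the fields this sketch needs.
interface TimelineEvent {
  id: string;
  source: "gateway" | "bridge" | "convex" | "convex_runtime";
  ts: number; // Unix ms timestamp
}

// Hypothetical helper: merge live SSE events with persisted audit events,
// drop duplicate IDs, and sort oldest-first so the newest lands at the bottom.
function mergeTimeline(
  sseEvents: TimelineEvent[],
  auditEvents: TimelineEvent[],
): TimelineEvent[] {
  const byId = new Map<string, TimelineEvent>();
  for (const event of sseEvents.concat(auditEvents)) {
    // Assumption: the first occurrence of an ID wins.
    if (!byId.has(event.id)) byId.set(event.id, event);
  }
  return Array.from(byId.values()).sort((a, b) => a.ts - b.ts);
}

const merged = mergeTimeline(
  [
    { id: "e2", source: "bridge", ts: 2000 },
    { id: "e1", source: "gateway", ts: 1000 },
  ],
  [
    { id: "e2", source: "convex", ts: 2000 }, // same ID as the live bridge event
    { id: "e3", source: "convex", ts: 1500 },
  ],
);
console.log(merged.map((e) => e.id)); // [ 'e1', 'e3', 'e2' ]
```

Keying on the ID in a Map keeps deduplication linear in the number of events, so only the final sort is O(n log n).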

Filtering

The observe layer provides four facets for narrowing the view:

| Facet | Options |
| --- | --- |
| Source | All, Gateway, Bridge, Convex Audit, Convex Backend |
| Time range | 5m, 15m, 1h, 24h, All |
| Category | 16 toggle-able categories (convex_function defaults to off) |
| Text search | Matches against event name, display label, agent ID, resource ID |

Category filters are applied client-side against the full set of events returned by the time-range query, so switching between categories is instant.
Operators can pause and resume the SSE stream during incident investigation, which prevents new events from pushing the event under examination off-screen.
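
As an illustration of how the four facets could combine, here is a hedged sketch of a single client-side predicate. The `Filters` shape, the `matchesFilters` name, and the case-insensitive substring match are assumptions; in the platform the time range is applied by the query, and only the category filter is described as client-side.

```typescript
// Subset of NormalizedEvent fields used by the facets.
interface FilterableEvent {
  source: string;
  category: string;
  event: string;
  displayLabel: string;
  ts: number; // Unix ms timestamp
  agentId?: string;
  resourceId?: string;
}

// Hypothetical filter state; field names are illustrative only.
interface Filters {
  source: string; // "all" or a specific source
  windowMs: number | null; // null means "All"
  enabledCategories: Set<string>;
  search: string;
}

function matchesFilters(e: FilterableEvent, f: Filters, nowMs: number): boolean {
  if (f.source !== "all" && e.source !== f.source) return false;
  if (f.windowMs !== null && e.ts < nowMs - f.windowMs) return false;
  if (!f.enabledCategories.has(e.category)) return false;

  const needle = f.search.trim().toLowerCase();
  if (needle === "") return true;
  // Text search covers event name, display label, agent ID, and resource ID.
  return [e.event, e.displayLabel, e.agentId, e.resourceId].some(
    (field) => field !== undefined && field.toLowerCase().includes(needle),
  );
}
```

A caller would run `events.filter((e) => matchesFilters(e, filters, Date.now()))` on the already-loaded list, which is why toggling a category needs no round trip.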

📊 Activity Stream

The /logs page provides a terminal-style view of gateway log output, displaying BridgeLogPayload events with level, message, and agent context. This is the raw diagnostic feed from the gateway runtime.

For a higher-level operational view, the InterventionCenter sidebar component surfaces items requiring immediate operator attention:
  • Pending approvals with wait-time display
  • Recent failures filtered by workflow_failed, step_failed, safety_gate_blocked, deployment_failed
The InterventionCenter queries auditEvents.recent and the pending approvals table to provide at-a-glance health awareness without navigating away from the current page.
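
The failure filter and wait-time display described above could look roughly like this. The row shape, the helper names, and the minute-granularity rounding are assumptions made for illustration; only the four event names come from the list above.

```typescript
// Event names the InterventionCenter treats as recent failures.
const FAILURE_EVENTS = new Set<string>([
  "workflow_failed",
  "step_failed",
  "safety_gate_blocked",
  "deployment_failed",
]);

// Hypothetical row shape for results of auditEvents.recent.
interface AuditRow {
  event: string;
  ts: number; // Unix ms timestamp
}

// Keep only failure events, newest first, capped at `limit`.
function recentFailures(rows: AuditRow[], limit: number): AuditRow[] {
  return rows
    .filter((row) => FAILURE_EVENTS.has(row.event))
    .sort((a, b) => b.ts - a.ts)
    .slice(0, limit);
}

// Whole minutes an approval has been pending (assumed display granularity).
function waitMinutes(requestedAtMs: number, nowMs: number): number {
  return Math.max(0, Math.floor((nowMs - requestedAtMs) / 60000));
}
```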

🔧 Event Normalization

All events flowing through the platform are converted into a single NormalizedEvent envelope:
interface NormalizedEvent {
  id: string;                 // Unique, usable as React key
  source: "gateway" | "bridge" | "convex" | "convex_runtime";
  category: EventCategory;    // 16 semantic categories
  event: string;              // Original event name
  displayLabel: string;       // Human-readable label
  ts: number;                 // Unix ms timestamp
  agentId?: string;
  resourceId?: string;
  resourceType?: string;
  payload: Record<string, unknown>;
}
Three normalizer functions handle the different source types:

| Normalizer | Source | Key Behavior |
| --- | --- | --- |
| normalizeGatewayEvent | Gateway WebSocket | Handles double-nested payload.data.data structure. Refines the "agent" event type into lifecycle, tool, assistant, thinking, error by inspecting the stream sub-field. |
| normalizeBridgeEvent | Bridge SSE | Unwraps the bridge EventBus wrapper pattern. Prefixes event names with bridge: for category classification. |
| normalizeConvexRuntimeEvent | Convex log stream | Sets source to "convex_runtime". Generates display labels highlighting anomalies: function failures, OCC conflicts, slow execution, console errors. |

The 16 event categories are: lifecycle, tool, chat, assistant, thinking, error, cron, system, agent_crud, skill, security, session, step, process, convex_console, convex_function.
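
To make the gateway normalizer's two behaviors concrete, here is a rough sketch of the unwrapping and refinement steps. It is not the code in event-normalizer.ts: the `RawGatewayEvent` shape and both helper names are hypothetical, and it assumes the stream sub-field's value is identical to the category name.

```typescript
// Categories an "agent" gateway event can be refined into (from the table above).
type AgentCategory = "lifecycle" | "tool" | "assistant" | "thinking" | "error";

const AGENT_CATEGORIES: readonly AgentCategory[] = [
  "lifecycle",
  "tool",
  "assistant",
  "thinking",
  "error",
];

// Hypothetical raw shape: the useful fields sit two levels down, at payload.data.data.
interface RawGatewayEvent {
  event: string;
  payload?: { data?: { data?: Record<string, unknown> } };
}

// Unwrap the double-nested payload, falling back to an empty object.
function unwrapGatewayPayload(raw: RawGatewayEvent): Record<string, unknown> {
  return raw.payload?.data?.data ?? {};
}

// Refine an "agent" event by its stream sub-field.
// Assumption: the stream value equals the category name. Anything else yields
// undefined, leaving the fallback category to the caller.
function refineAgentCategory(raw: RawGatewayEvent): AgentCategory | undefined {
  if (raw.event !== "agent") return undefined;
  const stream = unwrapGatewayPayload(raw)["stream"];
  return AGENT_CATEGORIES.find((category) => category === stream);
}
```

For example, `refineAgentCategory({ event: "agent", payload: { data: { data: { stream: "tool" } } } })` yields `"tool"` under these assumptions.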

The Bridge Audit Relay

The Bridge Audit Relay is the critical seam between ephemeral and persistent observability. It is a render-nothing React component that subscribes to the bridge SSE wildcard ("*") and selectively persists events to the Convex auditEvents table.

What gets persisted: 22 event types, organized in two layers plus a Convex runtime group:
  • Layer 1, Platform governance (13 events): Agent CRUD (5), skill changes (3), security posture, gateway health, config reload, session compaction lifecycle (2)
  • Layer 2, Agent execution (7 events): agent.lifecycle.start/end/error, agent.tool.start/end/result, agent.activity
  • Convex runtime (2 events): convex.console (ERROR only), convex.function (failures, OCC conflicts, and system limit warnings only)
The Bridge Audit Relay is only mounted when the auth shell phase is authenticated_ready (admin). Non-admin users never trigger audit writes. If no admin browser tab is open, ephemeral events that would normally be persisted are lost.
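
The selective-persistence decision can be pictured as an allowlist check. The sketch below is illustrative only: the Layer 1 event names are not listed in this guide, so they are passed in as a parameter rather than guessed, and the `level` and `anomaly` payload fields are hypothetical stand-ins for however the relay detects console errors and function anomalies.

```typescript
// Layer 2 event names, taken from the list above.
const AGENT_EXECUTION_EVENTS = new Set<string>([
  "agent.lifecycle.start",
  "agent.lifecycle.end",
  "agent.lifecycle.error",
  "agent.tool.start",
  "agent.tool.end",
  "agent.tool.result",
  "agent.activity",
]);

// Hypothetical values of a hypothetical `anomaly` payload field.
const PERSISTED_FUNCTION_ANOMALIES = new Set<unknown>([
  "failure",
  "occ_conflict",
  "limit_warning",
]);

interface RelayEvent {
  event: string;
  payload: Record<string, unknown>;
}

// Decide whether an SSE event should be written to the audit table.
// `governanceEvents` holds the 13 Layer 1 names, supplied by the caller.
function shouldPersist(e: RelayEvent, governanceEvents: ReadonlySet<string>): boolean {
  if (governanceEvents.has(e.event) || AGENT_EXECUTION_EVENTS.has(e.event)) return true;
  // Console output is persisted only at ERROR level.
  if (e.event === "convex.console") return e.payload["level"] === "ERROR";
  // Function events are persisted only when they report an anomaly.
  if (e.event === "convex.function") return PERSISTED_FUNCTION_ANOMALIES.has(e.payload["anomaly"]);
  return false;
}
```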

💰 Cost Tracking

Token cost flows through a four-stage pipeline:
  1. Capture: Every LLM interaction creates a tokenEvents record with input, output, thinking, cache-read, and cache-write token counts. Cost is computed at write time using live pricing from the Convex Model Registry, stored as costAtCapture with a pricingSnapshot for auditability.
  2. Hourly Aggregation: A cron runs every hour, scanning the last 2-hour window of tokenEvents and grouping them into costAggregates documents keyed by time bucket, agent, and model dimensions.
  3. Daily Aggregation: A second cron runs every 24 hours, scanning the last 2 days of raw events into daily buckets.
  4. Weekly Compaction: A Sunday 04:00 UTC cron collapses old fine-grained buckets: hourly aggregates older than 30 days roll into daily, daily aggregates older than 90 days roll into weekly.
The tokenEvents.summary query combines pre-computed aggregates for full-hour windows with raw event scans for the partial hours at range boundaries. This keeps arbitrary time-range queries fast even with millions of events.
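
The hourly aggregation stage can be sketched as a pure grouping function. This is an assumption-laden illustration rather than the cron's actual code: the `CostAggregate` field names and the `aggregateHourly` helper are hypothetical, and aligning the 2-hour window to an hour boundary is a choice made here so that every bucket it touches is recomputed from a complete set of events.

```typescript
// Minimal tokenEvents fields needed for aggregation.
interface TokenEvent {
  ts: number; // Unix ms timestamp
  agentId: string;
  model: string;
  costAtCapture: number;
}

// Hypothetical costAggregates document shape.
interface CostAggregate {
  bucketStart: number; // start of the hour, Unix ms
  agentId: string;
  model: string;
  totalCost: number;
  eventCount: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Group the last two hours of events by (hour bucket, agent, model).
function aggregateHourly(events: TokenEvent[], nowMs: number): CostAggregate[] {
  // Assumption: snap the window start to an hour boundary to avoid partial buckets.
  const windowStart = Math.floor((nowMs - 2 * HOUR_MS) / HOUR_MS) * HOUR_MS;
  const buckets = new Map<string, CostAggregate>();

  for (const e of events) {
    if (e.ts < windowStart || e.ts > nowMs) continue;
    const bucketStart = Math.floor(e.ts / HOUR_MS) * HOUR_MS;
    const key = `${bucketStart}|${e.agentId}|${e.model}`;
    const agg = buckets.get(key) ?? {
      bucketStart,
      agentId: e.agentId,
      model: e.model,
      totalCost: 0,
      eventCount: 0,
    };
    agg.totalCost += e.costAtCapture;
    agg.eventCount += 1;
    buckets.set(key, agg);
  }
  return Array.from(buckets.values());
}
```

Because consecutive runs overlap, a real job would need to overwrite the aggregate for a given key rather than add to it, otherwise events in the overlap would be counted twice.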

💚 Gateway Health

A Convex cron runs every 60 seconds, pinging GET /api/health on every registered gateway with a 10-second timeout. All gateways are pinged in parallel via Promise.allSettled.

| Response | Status |
| --- | --- |
| HTTP 200 with status: "degraded" in body | degraded |
| HTTP 200 (any other body) | healthy |
| Any error or timeout | offline |

The lastHealthCheck timestamp is written on every check for recency tracking. When the status actually changes, a transition is recorded in gatewayHealthHistory for the 90-day audit trail.
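
The check-and-classify logic can be sketched as follows. This is not the platform's cron code: the `HealthResponse` shape, the helper names, and the injected `ping` function (which stands in for the real GET /api/health call) are assumptions, as is treating a non-200 response as an error.

```typescript
type GatewayStatus = "healthy" | "degraded" | "offline";

// Hypothetical result of one GET /api/health call.
interface HealthResponse {
  httpStatus: number;
  body: unknown;
}

const PING_TIMEOUT_MS = 10000;

// Map one settled ping to a status, following the table above.
function classifyHealth(result: PromiseSettledResult<HealthResponse>): GatewayStatus {
  if (result.status === "rejected") return "offline"; // error or timeout
  const { httpStatus, body } = result.value;
  if (httpStatus !== 200) return "offline"; // assumption: non-200 counts as an error
  const isDegraded =
    typeof body === "object" &&
    body !== null &&
    (body as { status?: unknown }).status === "degraded";
  return isDegraded ? "degraded" : "healthy";
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Ping every gateway in parallel; one slow or failing gateway never blocks the rest.
async function checkGateways(
  urls: string[],
  ping: (url: string) => Promise<HealthResponse>,
): Promise<Map<string, GatewayStatus>> {
  const settled = await Promise.allSettled(
    urls.map((url) => withTimeout(ping(url), PING_TIMEOUT_MS)),
  );
  const statuses = new Map<string, GatewayStatus>();
  settled.forEach((result, i) => {
    const url = urls[i];
    if (url !== undefined) statuses.set(url, classifyHealth(result));
  });
  return statuses;
}
```

Promise.allSettled never rejects, so a single unreachable gateway resolves to "offline" without aborting the checks for the others. A caller would then compare each new status with the stored one and write a history row only when they differ.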

🗂️ Data Retention

| Data Type | Retention | Managed By |
| --- | --- | --- |
| Audit events | 365 days | Daily cron at 03:15 UTC |
| Token events (raw) | 90 days | Daily cron at 03:00 UTC |
| Hourly cost aggregates | 30 days (then rolled into daily) | Weekly compaction cron |
| Daily cost aggregates | 90 days (then rolled into weekly) | Weekly compaction cron |
| Gateway health history | 90 days | Retention cron |
| SSE events (ephemeral) | In-memory only | Browser session |
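
As a rough illustration of how a retention cron might decide what to delete, the sketch below encodes the day-based retention windows from the table for the three persisted tables named in this guide. The `isExpired` helper and the exact cutoff semantics (strictly older than the window) are assumptions.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Retention windows in days, taken from the table above.
const RETENTION_DAYS = {
  auditEvents: 365,
  tokenEvents: 90,
  gatewayHealthHistory: 90,
} as const;

type RetainedTable = keyof typeof RETENTION_DAYS;

// True when a record is strictly older than its table's retention window.
function isExpired(table: RetainedTable, recordTs: number, nowMs: number): boolean {
  const cutoff = nowMs - RETENTION_DAYS[table] * DAY_MS;
  return recordTs < cutoff;
}
```

For example, with `nowMs` set to day 400, a record written on day 100 is 300 days old: it is expired for tokenEvents but still retained for auditEvents.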

For trace-level execution visibility, see Traces. For cost analytics and budgets, see Analytics & Cost. For the compliance audit log, see Audit Trail.