CLI

The agentdbg CLI lists runs, starts the local viewer, exports runs to JSON, captures baselines, asserts policy checks against runs, and diffs runs. Run data is stored under ~/.agentdbg/ by default (override with AGENTDBG_DATA_DIR). For all configuration options and their precedence, see the configuration reference.


agentdbg list

Lists recent runs (by started_at descending).

Usage:

agentdbg list [--limit N] [--json]

Options:

| Option | Default | Description |
| --- | --- | --- |
| --limit, -n | 20 | Maximum number of runs to list |
| --json | - | Output machine-readable JSON |

Examples:

agentdbg list
agentdbg list --limit 5
agentdbg list --json

Exit codes: 0 success; 10 internal error.

Text columns: run_id (short), run_name, started_at, duration_ms, llm_calls, tool_calls, status.


agentdbg view

Starts the local viewer server and optionally opens the browser. Default bind: 127.0.0.1:8712.

Usage:

agentdbg view [RUN_ID] [--host HOST] [--port PORT] [--no-browser] [--json]

Arguments / options:

| Argument/Option | Default | Description |
| --- | --- | --- |
| RUN_ID | (latest) | Run to view; can be a short prefix (e.g. first 8 chars of UUID) |
| --host, -H | 127.0.0.1 | Bind host |
| --port, -p | 8712 | Bind port |
| --no-browser | - | Do not open the browser; only start the server |
| --json | - | Print run_id, url, status as JSON, then start server |

Examples:

agentdbg view
agentdbg view a1b2c3d4
agentdbg view --port 9000 --no-browser
agentdbg view --json

Exit codes: 0 success; 2 run not found (or no runs); 10 internal error.

With --json, output shape: {"spec_version":"0.1","run_id":"...","url":"http://127.0.0.1:8712/?run_id=...","status":"serving"}.
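A script that launches the viewer may want to read that JSON line before doing anything else. The following is a minimal sketch, assuming the object is printed as a single line on stdout; parse_view_status is a hypothetical helper name and not part of agentdbg.

```python
import json

# Keys taken from the documented --json output shape.
REQUIRED_KEYS = ("spec_version", "run_id", "url", "status")


def parse_view_status(line: str) -> dict:
    """Parse one JSON line in the shape printed by `agentdbg view --json`."""
    payload = json.loads(line)
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"missing keys in view status: {missing}")
    return payload


# Sample line built from the documented shape (the run_id value is illustrative).
sample = (
    '{"spec_version":"0.1","run_id":"a1b2c3d4",'
    '"url":"http://127.0.0.1:8712/?run_id=a1b2c3d4","status":"serving"}'
)
info = parse_view_status(sample)
print(info["url"])
```

Checking for the four documented keys up front lets a wrapper script fail early if the output shape differs from what it expects.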


agentdbg export

Exports one run to a single JSON file (run metadata + events array).

Usage:

agentdbg export RUN_ID --out FILE

Arguments / options:

| Argument/Option | Description |
| --- | --- |
| RUN_ID | Run to export; can be a short prefix (e.g. first 8 chars of UUID) |
| --out, -o | Output file path (JSON) |

Examples:

agentdbg export a1b2c3d4-1234-5678-90ab-cdef12345678 --out run.json
agentdbg export a1b2c3d4 -o ./exports/run.json

Exit codes: 0 success; 2 run not found; 10 internal error.

Output file contains: spec_version, run (run metadata), events (array of event objects).
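To show how an exported file might be consumed, here is a small sketch that loads it and reports the event count. It relies only on the three top-level keys listed above; the fields inside run and inside each event are not described on this page, so the sketch does not touch them. summarize_export is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_export(path):
    """Load a file written by `agentdbg export` and return a tiny summary."""
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    # Only the documented top-level keys are assumed to exist.
    for key in ("spec_version", "run", "events"):
        if key not in doc:
            raise KeyError(f"export file is missing top-level key: {key}")
    return {
        "spec_version": doc["spec_version"],
        "event_count": len(doc["events"]),
    }
```

For example, summarize_export("run.json") could be called on the file produced by the first export example.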


agentdbg baseline

Captures a baseline snapshot from a completed run. The snapshot records structural metrics (event counts, tool path, token usage, duration, etc.) that agentdbg assert can later compare against. See Regression testing for the full workflow.

Usage:

agentdbg baseline RUN_ID [--out PATH]

Arguments / options:

| Argument/Option | Default | Description |
| --- | --- | --- |
| RUN_ID | (required) | Run ID or prefix to snapshot |
| --out, -o | .agentdbg/baselines/<run_name>.json | Output path for the baseline JSON file |

Examples:

agentdbg baseline a1b2c3d4
agentdbg baseline a1b2c3d4 --out baselines/support_agent_v1.json

Exit codes: 0 success; 2 run not found; 10 internal error.

The output file is a JSON object containing schema_version, source_run_id, summary metrics, tool_path, tool_call_counts, llm_models_used, event_type_sequence, and final_status. Check it into version control to share the baseline with your team.


agentdbg assert

Asserts that a completed run meets behavioral policy checks. Returns exit code 0 when all checks pass and 1 when any check fails, making it suitable for CI gates.

Usage:

agentdbg assert RUN_ID [options]

Arguments / options:

| Argument/Option | Default | Description |
| --- | --- | --- |
| RUN_ID | (required) | Run ID or prefix to check |
| --baseline, -b | - | Baseline JSON file to compare against |
| --policy | .agentdbg/policy.yaml (auto-detected) | Policy YAML file with assertion thresholds |
| --max-steps | - | Max total events allowed |
| --step-tolerance | 0.5 | Fractional tolerance for step count |
| --max-tool-calls | - | Max tool calls allowed |
| --tool-call-tolerance | 0.5 | Fractional tolerance for tool calls |
| --no-new-tools | false | Fail if run uses tools not in baseline |
| --no-loops | false | Fail if any LOOP_WARNING present |
| --no-guardrails | false | Fail if any guardrail was triggered |
| --max-cost-tokens | - | Max total tokens allowed |
| --cost-tolerance | 0.5 | Fractional tolerance for token cost |
| --max-duration-ms | - | Max run duration in ms |
| --duration-tolerance | 0.5 | Fractional tolerance for duration |
| --expect-status | - | Expected run status (ok or error) |
| --format, -f | text | Output format: text, json, or markdown |

Precedence: CLI flags override the policy file, which overrides defaults. See the Policy YAML reference for the full override rules and threshold semantics.
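The layering described above can be pictured as a three-level lookup. This is only an illustrative sketch of the general pattern, not agentdbg's implementation: the key names are hypothetical, the default values are copied from the options table, and the Policy YAML reference remains the authority on the actual override rules.

```python
from collections import ChainMap

# Default tolerances from the options table; the key names are hypothetical.
DEFAULTS = {
    "step_tolerance": 0.5,
    "tool_call_tolerance": 0.5,
    "cost_tolerance": 0.5,
    "duration_tolerance": 0.5,
}


def resolve_settings(cli_flags, policy):
    """Merge settings so CLI flags win over the policy file, which wins over defaults."""
    # Drop unset (None) values so they do not shadow a lower layer.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    pol = {k: v for k, v in policy.items() if v is not None}
    # ChainMap looks keys up left to right, so the first mapping has priority.
    return dict(ChainMap(cli, pol, DEFAULTS))


settings = resolve_settings(
    cli_flags={"step_tolerance": 0.2, "max_steps": None},
    policy={"step_tolerance": 0.3, "max_steps": 80},
)
print(settings["step_tolerance"], settings["max_steps"], settings["cost_tolerance"])
```

Here step_tolerance comes from the CLI layer, max_steps from the policy layer (the CLI value is unset), and cost_tolerance from the defaults.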

Examples:

# Assert against a baseline with default tolerances
agentdbg assert a1b2c3d4 --baseline .agentdbg/baselines/my_agent.json

# Assert with standalone thresholds (no baseline)
agentdbg assert a1b2c3d4 --max-steps 80 --max-tool-calls 30 --no-loops

# Assert using a policy file
agentdbg assert a1b2c3d4 --baseline baseline.json --policy ci-policy.yaml

# Markdown output for GitHub step summaries
agentdbg assert a1b2c3d4 --baseline baseline.json --format markdown

Exit codes: 0 all checks passed; 1 one or more checks failed; 2 run or baseline not found; 10 internal error.
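In a CI script it can help to separate "the checks failed" (exit code 1) from "the command could not run the checks" (2 or 10). The sketch below maps the documented exit codes to messages and wraps the command with subprocess. The function names are hypothetical, and run_assert assumes agentdbg is on PATH; it uses only flags listed in the table above.

```python
import subprocess

# Exit codes as documented for `agentdbg assert`.
ASSERT_EXIT_CODES = {
    0: "all checks passed",
    1: "one or more checks failed",
    2: "run or baseline not found",
    10: "internal error",
}


def describe_assert_exit(code):
    """Return a human-readable meaning for an `agentdbg assert` exit code."""
    return ASSERT_EXIT_CODES.get(code, f"unexpected exit code {code}")


def is_check_failure(code):
    """True only when the run was checked and failed a check (exit code 1)."""
    return code == 1


def run_assert(run_id, baseline):
    """Run `agentdbg assert` against a baseline; requires agentdbg on PATH."""
    proc = subprocess.run(
        ["agentdbg", "assert", run_id, "--baseline", baseline, "--format", "json"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout
```

A CI step could call run_assert, print describe_assert_exit for the returned code, and treat codes 2 and 10 as setup problems rather than regressions.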


agentdbg diff

Compares two runs, or a run against a baseline, showing structural differences in summary metrics, tool path, and event type distribution. Useful for understanding what changed when agentdbg assert reports a failure. See Regression testing for the workflow.

Usage:

agentdbg diff RUN_A [RUN_B] [--baseline FILE] [--format FORMAT]

Exactly one of RUN_B or --baseline must be provided.

Arguments / options:

| Argument/Option | Description |
| --- | --- |
| RUN_A | First run ID or prefix |
| RUN_B | Second run ID or prefix (mutually exclusive with --baseline) |
| --baseline, -b | Baseline JSON file to compare against (mutually exclusive with RUN_B) |
| --format, -f | Output format: text (default) |

Examples:

# Compare two runs
agentdbg diff a1b2c3d4 e5f6a7b8

# Compare a run against a baseline
agentdbg diff a1b2c3d4 --baseline .agentdbg/baselines/my_agent.json

Exit codes: 0 success; 2 run or baseline not found; 10 internal error.

Text output sections:

  • Summary — metric-by-metric comparison with percentage change (e.g. tool_calls: 10 -> 14 (+40%))
  • Tool path changes — new (+) and removed (-) tools
  • Event type distribution — per-event-type counts with percentage change
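The summary line format in the example above is consistent with a relative change computed against the first value. The sketch below reproduces that one example. The integer rounding and the handling of a zero starting value are assumptions, and format_change is a hypothetical helper, not agentdbg's code.

```python
def format_change(name, old, new):
    """Format a metric comparison like the Summary lines of `agentdbg diff`."""
    if old == 0:
        # Assumption: relative change is undefined when the first value is zero.
        pct = "n/a"
    else:
        # Relative change against the first value, rounded to a whole percent.
        pct = f"{(new - old) / old * 100:+.0f}%"
    return f"{name}: {old} -> {new} ({pct})"


print(format_change("tool_calls", 10, 14))  # tool_calls: 10 -> 14 (+40%)
```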