CLI
The maida CLI runs the bundled demo, scaffolds a project, lists trace-backed runs, starts the local viewer, exports runs to JSON, captures baselines, diffs runs, and gates runs against baselines. Data is stored under ~/.maida/ by default (override the location with MAIDA_DATA_DIR). Current runs are identified by OTel trace IDs; the CLI keeps the user-facing argument name RUN_ID for compatibility and accepts short trace ID prefixes. For all configuration options and their precedence, see the configuration reference.
Commands that take a run ID (assert, baseline, export, diff) default to the latest run when the ID is omitted. The selected run is announced on stderr so stdout stays machine-readable.
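Because the selected-run notice goes to stderr, a script can parse stdout directly. The following is a minimal Python sketch of that pattern, not part of maida itself: the helper name run_json is hypothetical, and it assumes the command writes only JSON to stdout.

```python
import json
import subprocess
import sys


def run_json(cmd):
    """Run a command and return (exit_code, parsed_stdout).

    Hypothetical helper: stdout and stderr are captured separately, so
    notices written to stderr never reach the JSON parser.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.stderr:
        # Relay notices (such as the selected-run announcement) to our stderr.
        sys.stderr.write(proc.stderr)
    # Parse stdout only when the command actually printed something.
    data = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, data
```

For example, run_json(["maida", "assert", "--baseline", "baseline.json", "--format", "json"]) would return the exit code together with the parsed report, assuming the JSON report is the only thing written to stdout.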
maida demo
Runs a bundled simulated customer-support agent and records a trace. No network, no API keys; all LLM/tool data is canned and nothing leaves your machine.
Usage:
maida demo [--regression]
Options:
| Option | Description |
|---|---|
| --regression | Full story: baseline a known-good run, run a "refactored" agent that loops, calls a new tool, and burns more tokens, then show the failing gate report and a PR-comment preview. Writes the baseline to .maida/baselines/demo-support-agent.json. |
Exit codes: 0 success (including when the demo gate intentionally fails); 10 internal error.
maida init
Scaffolds Maida configuration in the current directory. Never overwrites existing files unless --force is given; safe to re-run.
Usage:
maida init [--github] [--force]
Options:
| Option | Description |
|---|---|
| --github | Also write .github/workflows/maida.yml using the maida-assert action |
| --force | Overwrite existing files |
Files written:
- .maida/policy.yaml: commented starter policy (no_loops, no_guardrails, no_new_tools, expect_status: ok, 50% tolerances)
- .github/workflows/maida.yml (with --github): PR check that runs your traced agent and posts the regression report as a sticky comment
Exit codes: 0 success; 10 internal error.
maida list
Lists recent runs (by started_at descending).
Usage:
maida list [--limit N] [--json]
Options:
| Option | Default | Description |
|---|---|---|
| --limit, -n | 20 | Maximum number of runs to list |
| --json | - | Output machine-readable JSON |
Examples:
maida list
maida list --limit 5
maida list --json
Exit codes: 0 success; 10 internal error.
Text columns: run_id (short trace ID), run_name, started_at, duration_ms, llm_calls, tool_calls, status.
maida view
Starts the local viewer server and optionally opens the browser. Default bind: 127.0.0.1:8712.
Usage:
maida view [RUN_ID] [--host HOST] [--port PORT] [--no-browser] [--json]
Arguments / options:
| Argument/Option | Default | Description |
|---|---|---|
| RUN_ID | (latest) | Run to view; can be a short trace ID prefix (e.g. first 8 hex chars) |
| --host, -H | 127.0.0.1 | Bind host |
| --port, -p | 8712 | Bind port |
| --no-browser | - | Do not open the browser; only start the server |
| --json | - | Print run_id, url, status as JSON, then start server |
Examples:
maida view
maida view a1b2c3d4
maida view --port 9000 --no-browser
maida view --json
Exit codes: 0 success; 2 run not found (or no runs); 10 internal error.
With --json, output shape:
{
  "spec_version": "0.2",
  "run_id": "...",
  "url": "http://127.0.0.1:8712/?run_id=...",
  "status": "serving"
}
The run_id value is the resolved OTel trace ID.
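Since the server keeps running after the JSON is printed, a wrapper script cannot wait for the process to exit before parsing. One way to handle this, sketched below in Python, is to read stdout line by line until a complete JSON value has arrived. The helper name read_first_json is hypothetical, and the sketch makes no assumption about whether the object is printed on one line or several.

```python
import io
import json


def read_first_json(stream):
    """Read lines from a text stream until they form one complete JSON value."""
    decoder = json.JSONDecoder()
    buffer = ""
    for line in stream:
        buffer += line
        try:
            value, _ = decoder.raw_decode(buffer.lstrip())
        except json.JSONDecodeError:
            continue  # Value not complete yet; keep reading.
        return value
    raise ValueError("stream ended before a complete JSON value was read")


# Stand-in for the process's stdout, using the documented output shape.
sample = io.StringIO(
    '{\n"spec_version": "0.2",\n"run_id": "...",\n'
    '"url": "http://127.0.0.1:8712/?run_id=...",\n"status": "serving"\n}\n'
)
status = read_first_json(sample)
print(status["url"])
```

In a real script the stream would be the stdout pipe of a subprocess.Popen call running maida view --no-browser --json with text mode enabled.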
maida export
Exports one run to a single JSON file (run metadata + projected events array).
Usage:
maida export [RUN_ID] --out FILE
Arguments / options:
| Argument/Option | Description |
|---|---|
| RUN_ID | Run to export; can be a short trace ID prefix. Defaults to the latest run when omitted |
| --out, -o | Output file path (JSON) |
Examples:
maida export --out trace-export.json # latest run
maida export a1b2c3d4 -o ./exports/trace-export.json
Exit codes: 0 success; 2 run not found; 10 internal error.
Output file contains: spec_version, run (run metadata), and events (array of event objects projected from spans).
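As an illustration of consuming the export, the Python sketch below loads the file and checks the three documented top-level keys. The function names are hypothetical, and nothing is assumed about the fields inside run or inside individual event objects.

```python
import json
from pathlib import Path

# Top-level keys documented for the export file.
EXPORT_KEYS = {"spec_version", "run", "events"}


def load_export(path):
    """Load a file written by `maida export` and check its top-level keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = EXPORT_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    return data


def summarize_export(data):
    """Return the spec version and the number of projected events."""
    return {
        "spec_version": data["spec_version"],
        "event_count": len(data["events"]),
    }
```

Calling summarize_export(load_export("trace-export.json")) gives a quick sanity check before feeding the export to other tooling.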
maida baseline
Captures a baseline snapshot from a completed run. The snapshot records structural metrics (event counts, tool path, token usage, duration, etc.) that maida assert can later compare against. See Regression testing for the full workflow.
Usage:
maida baseline [RUN_ID] [--out PATH]
Arguments / options:
| Argument/Option | Default | Description |
|---|---|---|
| RUN_ID | (latest run) | OTel trace ID or prefix to snapshot |
| --out, -o | .maida/baselines/<run_name>.json | Output path for the baseline JSON file |
Examples:
maida baseline # snapshot the latest run
maida baseline a1b2c3d4 --out baselines/support_agent_v1.json
Exit codes: 0 success; 2 run not found; 10 internal error.
The output file is a JSON object containing schema_version, source_run_id (the resolved OTel trace ID), summary metrics, tool_path, tool_call_counts, llm_models_used, event_type_sequence, and final_status. Check it into version control to share the baseline with your team.
maida assert
Asserts that a completed run meets behavioral policy checks. Returns exit code 0 when all checks pass and 1 when any check fails, making it suitable for CI gates.
Usage:
maida assert [RUN_ID] [options]
Arguments / options:
| Argument/Option | Default | Description |
|---|---|---|
| RUN_ID | (latest run) | OTel trace ID or prefix to check |
| --baseline, -b | - | Baseline JSON file to compare against |
| --policy | .maida/policy.yaml (auto-detected) | Policy YAML file with assertion thresholds |
| --max-steps | - | Max total events allowed |
| --step-tolerance | 0.5 | Fractional tolerance for step count |
| --max-tool-calls | - | Max tool calls allowed |
| --tool-call-tolerance | 0.5 | Fractional tolerance for tool calls |
| --no-new-tools | false | Fail if run uses tools not in baseline |
| --no-loops | false | Fail if any LOOP_WARNING present |
| --no-guardrails | false | Fail if any guardrail was triggered |
| --max-cost-tokens | - | Max total tokens allowed |
| --cost-tolerance | 0.5 | Fractional tolerance for token cost |
| --max-duration-ms | - | Max run duration in ms |
| --duration-tolerance | 0.5 | Fractional tolerance for duration |
| --expect-status | - | Expected run status (ok or error) |
| --format, -f | text | Output format: text, json, or markdown |
Precedence: CLI flags override the policy file, which overrides defaults. See the Policy YAML reference for the full override rules and threshold semantics.
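The layering can be pictured with a short Python sketch. It illustrates the stated precedence only and is not maida's implementation: the function name, the underscore key names for the tolerances, and the convention that None means "not set at this layer" are all assumptions, and the Policy YAML reference remains the authority on the actual override rules.

```python
# Defaults taken from the options table; key names are illustrative.
DEFAULTS = {
    "step_tolerance": 0.5,
    "tool_call_tolerance": 0.5,
    "cost_tolerance": 0.5,
    "duration_tolerance": 0.5,
    "no_new_tools": False,
    "no_loops": False,
    "no_guardrails": False,
}


def resolve_settings(policy, cli_flags):
    """Merge settings so CLI flags beat the policy file, which beats defaults.

    A value of None means "not set at this layer" and is skipped.
    """
    resolved = dict(DEFAULTS)
    # Later layers win, so apply the policy first and the CLI flags last.
    for layer in (policy, cli_flags):
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


policy = {"step_tolerance": 0.2, "no_loops": True}
cli_flags = {"step_tolerance": 0.1, "no_loops": None}
settings = resolve_settings(policy, cli_flags)
print(settings["step_tolerance"], settings["no_loops"], settings["cost_tolerance"])
```

Here step_tolerance comes from the CLI layer, no_loops from the policy layer, and cost_tolerance falls through to the default.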
Examples:
# Assert the latest run against a baseline with default tolerances
maida assert --baseline .maida/baselines/my_agent.json
# Assert a specific run with standalone thresholds (no baseline)
maida assert a1b2c3d4 --max-steps 80 --max-tool-calls 30 --no-loops
# Assert using a policy file
maida assert --baseline baseline.json --policy ci-policy.yaml
# Markdown output for GitHub PR comments / step summaries
maida assert --baseline baseline.json --format markdown
Exit codes: 0 all checks passed; 1 one or more checks failed; 2 run or baseline not found; 10 internal error.
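In a CI wrapper it helps to tell a failed check (exit code 1) apart from a missing run or baseline (exit code 2). The Python sketch below maps the documented exit codes to messages. The names describe_exit and gate are hypothetical, and gate assumes maida is on PATH; subprocess.run raises FileNotFoundError otherwise.

```python
import subprocess

# Exit codes documented for `maida assert`.
EXIT_MEANINGS = {
    0: "all checks passed",
    1: "one or more checks failed",
    2: "run or baseline not found",
    10: "internal error",
}


def describe_exit(code):
    """Translate a `maida assert` exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def gate(args):
    """Run `maida assert` with extra arguments; return (passed, message)."""
    proc = subprocess.run(["maida", "assert", *args])
    return proc.returncode == 0, describe_exit(proc.returncode)
```

For example, gate(["--baseline", "baseline.json"]) returns a pass/fail flag plus a message a pipeline can log, so a setup problem is not mistaken for a regression.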
When a baseline is provided, the markdown report leads with a pass/fail verdict, lists failed checks first with expected vs actual values, collapses passing checks, and embeds a "What changed vs baseline" section (metric deltas, new/removed tools, model changes) plus a local-repro snippet. The text report appends the structural diff only when a check fails.
maida diff
Compares two runs, or a run against a baseline, showing structural differences in summary metrics, tool path, and event type distribution. Useful for understanding what changed when maida assert reports a failure. See Regression testing for the workflow.
Usage:
maida diff [RUN_A] [RUN_B] [--baseline FILE] [--format FORMAT]
Exactly one of RUN_B or --baseline must be provided.
Arguments / options:
| Argument/Option | Description |
|---|---|
| RUN_A | First OTel trace ID or prefix. Defaults to the latest run when omitted |
| RUN_B | Second OTel trace ID or prefix (mutually exclusive with --baseline) |
| --baseline, -b | Baseline JSON file to compare against (mutually exclusive with RUN_B) |
| --format, -f | Output format: text (default) |
Examples:
# Compare two runs
maida diff a1b2c3d4 e5f6a7b8
# Compare the latest run against a baseline
maida diff --baseline .maida/baselines/my_agent.json
Exit codes: 0 success; 2 run or baseline not found; 10 internal error.
Text output sections:
- Summary: metric-by-metric comparison with percentage change (e.g. tool_calls: 10 -> 14 (+40%))
- Tool path changes: new (+) and removed (-) tools
- Event type distribution: per-event-type counts with percentage change
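The percentage change in the Summary section is the relative difference between the two values. The Python sketch below reproduces the documented example line. The function name is hypothetical, and both the whole-percent rounding and the handling of a zero starting value are assumptions rather than documented maida behavior.

```python
def format_change(name, before, after):
    """Format a metric comparison in the style of the Summary example."""
    if before == 0:
        # Percentage change is undefined against a zero starting value.
        return f"{name}: {before} -> {after} (n/a)"
    pct = (after - before) / before * 100
    # The "+" flag prints an explicit sign for increases as well as decreases.
    return f"{name}: {before} -> {after} ({pct:+.0f}%)"


print(format_change("tool_calls", 10, 14))  # tool_calls: 10 -> 14 (+40%)
```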