Policy YAML reference
A policy file (.agentdbg/policy.yaml) lets teams check assertion thresholds into version control so that agentdbg assert applies consistent checks without long CLI flags.
Resolution order
agentdbg assert resolves the policy in this order:
--policy PATHflag on the CLI (explicit path).agentdbg/policy.yamlin the current working directory (auto-detected)- Empty policy (all checks disabled)
CLI flags (--max-steps, --no-loops, etc.) are then merged on top. CLI values always win over the file — see Override rules below.
File structure
The file is standard YAML. The policy loader reads a single top-level assert: mapping; all other top-level keys are ignored. Unknown keys inside assert: are also ignored.
# .agentdbg/policy.yaml
assert:
# ... assertion fields go here ...
Requires PyYAML (pip install pyyaml or included in agentdbg[yaml]). If PyYAML is not installed, agentdbg assert --policy ... raises a clear RuntimeError.
Assertion fields
All fields are optional. A check is disabled unless at least one relevant value is set (via baseline, policy file, or CLI flag).
Numeric thresholds
| YAML key | Type | Default | CLI flag | Description |
|---|---|---|---|---|
max_steps |
int or null |
null |
--max-steps |
Hard cap on total event count |
step_tolerance |
float |
0.5 |
--step-tolerance |
Fractional tolerance for step count when comparing against a baseline |
max_tool_calls |
int or null |
null |
--max-tool-calls |
Hard cap on tool call count |
tool_call_tolerance |
float |
0.5 |
--tool-call-tolerance |
Fractional tolerance for tool calls |
max_cost_tokens |
int or null |
null |
--max-cost-tokens |
Hard cap on total token count |
cost_tolerance |
float |
0.5 |
--cost-tolerance |
Fractional tolerance for token cost |
max_duration_ms |
int or null |
null |
--max-duration-ms |
Hard cap on run duration in milliseconds |
duration_tolerance |
float |
0.5 |
--duration-tolerance |
Fractional tolerance for duration |
Boolean checks
| YAML key | Type | Default | CLI flag | Description |
|---|---|---|---|---|
no_new_tools |
bool |
false |
--no-new-tools |
Fail if the run uses tools not present in the baseline |
no_loops |
bool |
false |
--no-loops |
Fail if any LOOP_WARNING event was emitted |
no_guardrails |
bool |
false |
--no-guardrails |
Fail if any guardrail event was triggered |
Status check
| YAML key | Type | Default | CLI flag | Description |
|---|---|---|---|---|
expect_status |
string or null |
null |
--expect-status |
Expected run status: "ok" or "error" |
How thresholds work
Tolerances are fractional, not percentage: 0.5 means 50%, 0.2 means 20%.
For each numeric metric (steps, tool calls, tokens, duration), the check depends on what is available:
| Baseline provided? | max_* set? |
Effective limit | Meaning |
|---|---|---|---|
| Yes | Yes | min(baseline * (1 + tolerance), max_*) |
Baseline-relative with a hard cap |
| Yes | No | baseline * (1 + tolerance) |
Baseline-relative only |
| No | Yes | max_* |
Hard cap only |
| No | No | (check disabled) | Nothing to compare against |
The check passes when actual <= limit.
Example: A baseline recorded 40 tool calls. With tool_call_tolerance: 0.25 and max_tool_calls: 60:
- Baseline limit:
40 * 1.25 = 50 - Hard cap:
60 - Effective limit:
min(50, 60) = 50 - A run with 48 tool calls passes; a run with 52 fails.
Override rules
When agentdbg assert loads a policy file and also receives CLI flags, merge_policy applies these rules:
- A CLI value of
None(flag not provided) keeps the file value. - A CLI boolean value of
False(flag not provided) keeps the file value. Only an explicit--no-loops(which sendsTrue) overrides. - Any other non-
NoneCLI value replaces the file value.
This means you can set baseline thresholds in the committed policy file and tighten or loosen individual checks on a per-invocation basis:
# File sets no_loops: true, step_tolerance: 0.3
# CLI overrides max_steps for this specific run
agentdbg assert abc123 --max-steps 100
Full examples
Strict CI policy
# .agentdbg/policy.yaml — checked into the repo
assert:
max_steps: 80
step_tolerance: 0.2
max_tool_calls: 30
tool_call_tolerance: 0.2
max_cost_tokens: 10000
cost_tolerance: 0.1
max_duration_ms: 30000
duration_tolerance: 0.2
no_new_tools: true
no_loops: true
no_guardrails: true
expect_status: ok
Lenient local policy
# .agentdbg/policy.yaml — for local iteration
assert:
step_tolerance: 0.5
tool_call_tolerance: 0.5
no_loops: true
expect_status: ok
Related docs
- Regression testing — end-to-end workflow
- CLI:
agentdbg assert— command reference - Configuration — env vars, YAML config, guardrails