Trace

The trace command provides headless trace inspection and analysis, with no server or dashboard needed. All data is read from local JSONL result files.

The list subcommand enumerates the evaluation result files in .agentv/results/.

agentv trace list [--limit N] [--format json|table]

Shows filename, test count, pass rate, average score, file size, and timestamp for each result file.

The show subcommand displays evaluation results with trace details.

agentv trace show <result-file> [--test-id <id>] [--tree] [--format json|table]

Option        Description
--test-id     Filter to a specific test ID
--tree        Show hierarchical trace tree (requires results with output messages)
--format, -f  Output format: table (default), json

The --tree flag renders tool call traces as a hierarchical tree:

research-question, 15.1s, 10,167 tok, $0.105
├─ tools, 2.4s
│ ├─ WebSearch, 2.1s
│ └─ WebSearch, 1.8s
├─ tavily_search, 3.5s
└─ write_report, 450ms
Scores: response_quality 75% | routing_accuracy 100%

Falls back to a flat summary when output messages are not present in the result file.
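
As a rough illustration of the tree layout above, the following Python sketch renders a nested structure with the same box-drawing connectors. The Span class and render_tree function are hypothetical names for this example, not part of agentv, and the real trace schema may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    # Hypothetical node type for illustration; not agentv's actual schema.
    label: str
    children: List["Span"] = field(default_factory=list)


def render_tree(root: Span) -> List[str]:
    """Return the tree as a list of text lines, root first."""
    lines = [root.label]

    def walk(nodes: List[Span], prefix: str) -> None:
        for index, node in enumerate(nodes):
            is_last = index == len(nodes) - 1
            connector = "└─ " if is_last else "├─ "
            lines.append(f"{prefix}{connector}{node.label}")
            # Children of a non-last node keep a vertical bar in the gutter.
            walk(node.children, prefix + ("   " if is_last else "│  "))

    walk(root.children, "")
    return lines


example = Span("research-question, 15.1s", [
    Span("tools, 2.4s", [Span("WebSearch, 2.1s"), Span("WebSearch, 1.8s")]),
    Span("tavily_search, 3.5s"),
    Span("write_report, 450ms"),
])
print("\n".join(render_tree(example)))
```

The last child at each level gets the └─ connector, and its descendants are indented with spaces instead of a vertical bar, which is what keeps the gutter lines from running past the end of a branch.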

The stats subcommand computes summary statistics (mean and percentiles) across evaluation results.

agentv trace stats <result-file> [--group-by target|dataset|test-id] [--format json|table]

Option          Description
--group-by, -g  Group statistics by: target, dataset, or test-id
--format, -f    Output format: table (default), json

Output shows mean, P50, P90, P95, and P99 for score, latency, cost, tokens, tool calls, and LLM calls.

Metric        Mean     P50      P90      P95      P99
────────────  ───────  ───────  ───────  ───────  ───────
score         0.83     0.90     1.00     1.00     1.00
latency_s     11.7     9.5      22.8     25.4     27.5
cost_usd      $0.077   $0.065   $0.150   $0.165   $0.177
tokens_total  7,463    7,000    13,367   14,433   15,287

Metrics with no data are omitted automatically.
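
The page does not state which percentile method agentv uses. As a minimal sketch, assuming the nearest-rank method, the summary for a single metric could be computed as follows. The names percentile and summarize are illustrative, not agentv functions.

```python
from typing import Dict, List, Optional


def percentile(ordered: List[float], p: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list.

    Assumption: nearest-rank is used here for simplicity; agentv may
    interpolate differently.
    """
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(values: List[float]) -> Optional[Dict[str, float]]:
    """Return mean and P50/P90/P95/P99, or None when there is no data."""
    if not values:
        # Mirrors the documented behaviour of omitting empty metrics.
        return None
    ordered = sorted(values)
    summary = {"mean": sum(ordered) / len(ordered)}
    for p in (50, 90, 95, 99):
        summary[f"p{p}"] = percentile(ordered, p)
    return summary


print(summarize([0.5, 0.9, 1.0, 1.0]))
print(summarize([]))
```

With small samples, the upper percentiles collapse onto the maximum value, which is consistent with P90, P95, and P99 all showing 1.00 for score in the table above.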

All commands support --format json for piping to jq:

# Find tests costing more than $0.10
agentv trace show results.jsonl --format json \
| jq '[.[] | select(.trace.cost_usd > 0.10) | {test_id, score, cost: .trace.cost_usd}]'
# Compare providers
agentv trace stats results.jsonl --group-by target --format json \
| jq '.groups[] | {label, score_mean: .metrics.score.mean}'
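
For post-processing in a script rather than jq, the first filter above can be mirrored in Python. This sketch assumes the JSON output is an array of records carrying the test_id, score, and trace.cost_usd fields used in the jq example. The helper name expensive_tests and the sample records are hypothetical.

```python
import json
from typing import Any, Dict, List


def expensive_tests(records: List[Dict[str, Any]], threshold: float = 0.10) -> List[Dict[str, Any]]:
    """Keep records whose trace cost exceeds the threshold."""
    selected = []
    for record in records:
        cost = (record.get("trace") or {}).get("cost_usd")
        # Records without a cost are skipped, as jq's select() would do.
        if cost is not None and cost > threshold:
            selected.append({
                "test_id": record.get("test_id"),
                "score": record.get("score"),
                "cost": cost,
            })
    return selected


# Made-up sample data in the assumed shape, for illustration only.
sample = json.loads("""
[
  {"test_id": "a", "score": 0.9, "trace": {"cost_usd": 0.15}},
  {"test_id": "b", "score": 1.0, "trace": {"cost_usd": 0.05}},
  {"test_id": "c", "score": 0.5}
]
""")
print(expensive_tests(sample))
```

In practice the records could come from json.loads applied to the output of agentv trace show results.jsonl --format json.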

See examples/features/trace-analysis/ for a complete showcase with sample data.