
Custom Assertions

Custom assertions let you add evaluation logic that goes beyond built-in types. Define a TypeScript function, drop it in .agentv/assertions/, and reference it by name in your YAML eval files.

AgentV provides two SDK functions for custom evaluation logic:

| Function | Best For | Discovery |
| --- | --- | --- |
| `defineAssertion()` | Pass/fail checks, reusable assertion types | Convention-based (`.agentv/assertions/`) |
| `defineCodeJudge()` | Full scoring control with explicit hits/misses | Referenced via `type: code_judge` + `command:` |

Use defineAssertion() when you want a named assertion type that can be referenced across eval files without specifying a command path. It uses a simplified result contract focused on pass and optional score.

Use defineCodeJudge() when you need full control over scoring with explicit hits/misses arrays, or when the evaluator is a one-off judge tied to a specific eval. See Code Judges for details.

Both functions handle stdin/stdout JSON parsing, snake_case-to-camelCase conversion, Zod validation, and error handling automatically.

```sh
npm install @agentv/eval
```

Place assertion files in .agentv/assertions/ anywhere in your project tree. AgentV walks up from the eval file’s directory to find the nearest .agentv/assertions/ folder.

The filename (without extension) becomes the assertion type name:

```
.agentv/assertions/word-count.ts   -->  type: word-count
.agentv/assertions/sentiment.ts    -->  type: sentiment
.agentv/assertions/has-citation.ts -->  type: has-citation
```

Supported file extensions: .ts, .js, .mts, .mjs.

Custom assertion types cannot override built-in types (contains, equals, is_json, etc.). If a filename matches a built-in, it is silently skipped.

Reference the assertion by type name directly — no command: path needed:

```yaml
assert:
  - type: word-count
  - type: contains
    value: "Hello"
```

The simplest pattern returns pass (boolean) and reasoning (string):

.agentv/assertions/word-count.ts
```ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ answer }) => {
  const wordCount = answer.trim().split(/\s+/).length;
  return {
    pass: wordCount >= 3,
    reasoning: `Output has ${wordCount} words`,
  };
});
```

When only pass is provided, the score defaults to 1 (pass) or 0 (fail).

Return a score (0 to 1) for granular evaluation instead of binary pass/fail:

.agentv/assertions/efficiency.ts
```ts
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ answer, trace }) => {
  const hasContent = answer.length > 0 ? 0.5 : 0;
  const isEfficient = (trace?.eventCount ?? 0) <= 5 ? 0.5 : 0;
  return {
    score: hasContent + isEfficient,
    hits: [
      ...(hasContent ? ['Has content'] : []),
      ...(isEfficient ? ['Efficient'] : []),
    ],
  };
});
```

If pass is omitted but score is provided, pass is derived as score >= 0.5. Scores are clamped to the [0, 1] range.
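These defaults can be sketched as a small normalization function. This is an illustrative reimplementation of the documented behavior, not the actual SDK code:

```ts
interface RawResult {
  pass?: boolean;
  score?: number;
}

interface NormalizedResult {
  pass: boolean;
  score: number;
}

// Mirror the documented defaults: score follows pass when omitted,
// pass is derived from score >= 0.5, and scores clamp to [0, 1].
function normalize(result: RawResult): NormalizedResult {
  if (result.score === undefined) {
    const pass = result.pass ?? false;
    return { pass, score: pass ? 1 : 0 };
  }
  const score = Math.min(1, Math.max(0, result.score));
  return { pass: result.pass ?? score >= 0.5, score };
}
```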

The handler must return an AssertionScore object:

| Field | Type | Description |
| --- | --- | --- |
| `pass` | `boolean` | Explicit pass/fail. If omitted, derived from `score` (`>= 0.5` = pass). |
| `score` | `number` | Numeric score between 0 and 1. Defaults to 1 if `pass` is true, 0 if false. |
| `hits` | `string[]` | Aspects that passed. |
| `misses` | `string[]` | Aspects that failed. |
| `reasoning` | `string` | Human-readable explanation. |
| `details` | `Record<string, unknown>` | Optional structured data for domain-specific metrics. |
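To exercise `hits`, `misses`, and `details` together, the scoring logic can be kept as a plain function so it is easy to unit-test. The types and the citation check below are sketched for illustration; in a real assertion file you would wrap the function with `defineAssertion()`:

```ts
// Local sketch of the result shape documented above.
interface AssertionScore {
  pass?: boolean;
  score?: number;
  hits?: string[];
  misses?: string[];
  reasoning?: string;
  details?: Record<string, unknown>;
}

// Check that the answer cites at least one [n]-style source
// and stays under a length budget (both criteria illustrative).
function checkCitedAndConcise(answer: string): AssertionScore {
  const hasCitation = /\[\d+\]/.test(answer);
  const withinBudget = answer.length <= 2000;
  const score = (hasCitation ? 0.5 : 0) + (withinBudget ? 0.5 : 0);
  return {
    score,
    hits: [
      ...(hasCitation ? ['Cites a source'] : []),
      ...(withinBudget ? ['Within length budget'] : []),
    ],
    misses: [
      ...(hasCitation ? [] : ['No [n]-style citation found']),
      ...(withinBudget ? [] : ['Answer exceeds 2000 characters']),
    ],
    reasoning: `citation=${hasCitation}, length=${answer.length} chars`,
    details: { answerLength: answer.length },
  };
}
```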

The handler receives an AssertionContext with the same fields as a code judge:

| Field | Type | Description |
| --- | --- | --- |
| `question` | `string` | The input question/prompt |
| `criteria` | `string` | Evaluation criteria from the test case |
| `answer` | `string` | The agent's text response |
| `referenceAnswer` | `string` | Expected/reference answer |
| `trace` | `TraceSummary` | Execution metrics (tool calls, tokens, duration, cost) |
| `input` | `Message[]` | Full resolved input messages |
| `expectedOutput` | `Message[]` | Expected output messages |
| `output` | `Message[]` | Actual agent output messages |
| `sidecar` | `Record<string, unknown>` | Custom metadata passed through |
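A handler can combine `trace` metrics with `sidecar` metadata, for example to enforce a per-test tool-call budget. The `maxToolCalls` sidecar key and the local type sketches below are made-up for illustration; only `trace.eventCount` appears in the SDK examples above:

```ts
// Minimal local sketches of the context fields used here.
interface TraceSummary {
  eventCount?: number;
}

interface Context {
  answer: string;
  trace?: TraceSummary;
  sidecar?: Record<string, unknown>;
}

// Fail if the agent used more trace events than the budget passed
// through sidecar metadata (hypothetical key; default budget: 10).
function checkToolBudget(ctx: Context) {
  const budget =
    typeof ctx.sidecar?.maxToolCalls === 'number' ? ctx.sidecar.maxToolCalls : 10;
  const used = ctx.trace?.eventCount ?? 0;
  const pass = used <= budget;
  return {
    pass,
    reasoning: pass
      ? `Used ${used} events (budget ${budget})`
      : `Used ${used} events, over budget of ${budget}`,
  };
}
```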

Test assertions locally by piping JSON to stdin:

```sh
echo '{"question":"Say hello","criteria":"Multi-word greeting","answer":"Hello there, nice to meet you!","reference_answer":"","sidecar":{}}' \
  | bun run .agentv/assertions/word-count.ts
```

Expected output:

```json
{
  "score": 1,
  "hits": [],
  "misses": [],
  "reasoning": "Output has 6 words"
}
```

For test-driven development, write Vitest tests against your assertion logic directly:

.agentv/assertions/__tests__/word-count.test.ts
```ts
import { expect, test } from 'vitest';

// Extract the core logic into a testable function
function checkWordCount(answer: string) {
  const wordCount = answer.trim().split(/\s+/).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return { pass, wordCount };
}

test('passes with enough words', () => {
  const result = checkWordCount('Hello there friend');
  expect(result.pass).toBe(true);
});

test('fails with too few words', () => {
  const result = checkWordCount('Hi');
  expect(result.pass).toBe(false);
});
```

This example shows the complete flow from assertion definition to YAML eval file.

```
my-project/
├── .agentv/
│   └── assertions/
│       └── word-count.ts
├── evals/
│   └── dataset.eval.yaml
└── package.json
```
.agentv/assertions/word-count.ts
```ts
#!/usr/bin/env bun
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ answer }) => {
  const wordCount = answer.trim().split(/\s+/).length;
  const minWords = 3;
  const pass = wordCount >= minWords;
  return {
    pass,
    score: pass ? 1.0 : Math.min(wordCount / minWords, 0.9),
    reasoning: pass
      ? `Output has ${wordCount} words (>= ${minWords} required)`
      : `Output has only ${wordCount} words (need >= ${minWords})`,
  };
});
```
evals/dataset.eval.yaml
```yaml
name: custom-assertion-demo
description: Demonstrates custom assertions with convention discovery

execution:
  target: default

tests:
  - id: greeting-response
    criteria: Agent gives a multi-word greeting
    input: "Say hello and introduce yourself"
    expected_output: "Hello! I'm an AI assistant here to help you."
    assert:
      - type: contains
        value: "Hello"
      - type: word-count
  - id: short-answer
    criteria: Agent gives a short but valid response
    input: "What is 2+2?"
    expected_output: "The answer is 4."
    assert:
      - type: contains
        value: "4"
      - type: word-count
```
```sh
npm install @agentv/eval
agentv eval evals/dataset.eval.yaml
```

Each test produces scores from both the built-in contains assertion and your custom word-count assertion. Results appear in the output JSONL with each evaluator’s score in the scores[] array.