evalstudio eval
Manage evals. Evals can contain multiple scenarios to create comprehensive test collections. Personas are associated with scenarios, not with evals directly. When running an eval, runs are created for each scenario/persona combination.
Note: The CLI currently supports specifying a single scenario via --scenario. For multi-scenario evals, use the REST API or web UI.
Usage
evalstudio eval <command> [options]
Commands
create
Create a new eval.
evalstudio eval create [options]
| Option | Description |
|---|---|
-n, --name <name> | Eval name (required) |
-c, --connector <connector> | Connector ID or name (required) |
--scenario <scenario> | Scenario ID or name (required) |
--json | Output as JSON |
Note: LLM provider for evaluation is configured at the project level in evalstudio.config.json.
Example:
evalstudio eval create \
-n "Booking Test Suite" \
-c "My Agent Connector" \
--scenario "Booking Cancellation"
Output:
Eval created successfully
ID: 987fcdeb-51a2-3bc4-d567-890123456789
Name: Booking Test Suite
Connector: My Agent Connector
Scenario: Booking Cancellation
Success: Agent confirms cancellation and explains refund policy
Failure: Agent fails to process cancellation
Max Msgs: 10
Created: 2026-01-28T10:00:00.000Z
list
List evals.
evalstudio eval list [options]
| Option | Description |
|---|---|
--json | Output as JSON |
Example:
evalstudio eval list
Output:
Evals:
------
Booking Test Suite (987fcdeb-51a2-3bc4-d567-890123456789)
Full Agent Suite (abc12345-6789-def0-1234-567890abcdef)
Scenarios: 3
Note: When an eval has multiple scenarios, the scenario count is displayed.
show
Show eval details.
evalstudio eval show <id> [options]
| Option | Description |
|---|---|
--expand | Include scenario details |
--json | Output as JSON |
Example:
evalstudio eval show 987fcdeb-51a2-3bc4-d567-890123456789 --expand
Output:
Eval: Booking Test Suite
------
ID: 987fcdeb-51a2-3bc4-d567-890123456789
Name: Booking Test Suite
Success: Agent confirms cancellation and explains refund policy
Failure: Agent fails to process cancellation
Max Msgs: 10
Scenarios:
- Booking Cancellation
Customer needs to cancel an appointment
- Booking Modification
Customer needs to change date
Created: 2026-01-28T10:00:00.000Z
Updated: 2026-01-28T10:00:00.000Z
update
Update an eval.
evalstudio eval update <id> [options]
| Option | Description |
|---|---|
-n, --name <name> | New eval name |
--scenario <scenario> | New scenario ID or name (replaces all existing scenarios) |
--connector <connector> | New connector ID or name |
--json | Output as JSON |
Example:
evalstudio eval update 987fcdeb-51a2-3bc4-d567-890123456789 \
-n "Updated Test Suite" \
--scenario "New Scenario"
Note: The --scenario option replaces all existing scenarios with the specified one. For managing multiple scenarios, use the REST API or web UI.
delete
Delete an eval.
evalstudio eval delete <id> [options]
| Option | Description |
|---|---|
--json | Output as JSON |
Example:
evalstudio eval delete 987fcdeb-51a2-3bc4-d567-890123456789
Output:
Eval "Booking Cancellation" deleted successfully
Display Name
Evals are displayed using their name. If no name is set, the eval ID is used.
Scenarios and Run Creation
Evals can contain multiple scenarios. When running an eval:
- The system iterates through each scenario
- For each scenario, it uses the personas associated with that scenario
- A run is created for each scenario/persona combination
For example, if an eval has 2 scenarios, and each scenario has 3 personas, creating runs produces 6 runs (2 x 3).
Personas are associated with scenarios, not with evals directly. This allows different scenarios to test different persona types.
JSON Output
All commands support the --json flag for machine-readable output, useful for scripts and CI/CD pipelines.
evalstudio eval list --json
Output:
[
{
"id": "987fcdeb-51a2-3bc4-d567-890123456789",
"name": "Booking Test Suite",
"connectorId": "connector-uuid",
"scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
"createdAt": "2026-01-28T10:00:00.000Z",
"updatedAt": "2026-01-28T10:00:00.000Z"
}
]