Evals API
REST API endpoints for managing evals. An eval groups one or more scenarios into a test collection. Personas are associated with scenarios, not with evals directly, so running an eval creates one run for each scenario/persona combination.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/projects/:projectId/evals | List all evals |
| GET | /api/projects/:projectId/evals/:id | Get eval by ID |
| POST | /api/projects/:projectId/evals | Create a new eval |
| PUT | /api/projects/:projectId/evals/:id | Update an eval |
| DELETE | /api/projects/:projectId/evals/:id | Delete an eval |
List Evals
GET /api/projects/:projectId/evals
Response
[
{
"id": "987fcdeb-51a2-3bc4-d567-890123456789",
"name": "Booking Cancellation Test",
"connectorId": "connector-uuid",
"scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
"createdAt": "2026-01-28T10:00:00.000Z",
"updatedAt": "2026-01-28T10:00:00.000Z"
}
]
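A minimal Python sketch of calling this endpoint with the standard library. The base URL, the absence of authentication headers, and the helper names `build_list_evals_request` and `list_evals` are assumptions for illustration; they are not part of the documented API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API is reachable at this base URL; replace it with your own host.
BASE_URL = "http://localhost:3000"


def build_list_evals_request(project_id, base_url=BASE_URL):
    """Build (but do not send) the GET request that lists a project's evals."""
    safe_project = urllib.parse.quote(project_id, safe="")
    url = f"{base_url}/api/projects/{safe_project}/evals"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def list_evals(project_id, base_url=BASE_URL):
    """Send the request and decode the JSON array of evals."""
    request = build_list_evals_request(project_id, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `list_evals("my-project-id")` against a running server should return a list of dictionaries shaped like the response above.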
Get Eval
GET /api/projects/:projectId/evals/:id
GET /api/projects/:projectId/evals/:id?expand=true
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| expand | boolean | Include scenario details |
Response (with expand=true)
{
"id": "987fcdeb-51a2-3bc4-d567-890123456789",
"name": "Booking Cancellation Test",
"connectorId": "connector-uuid",
"scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
"scenarios": [
{
"id": "scenario-uuid-1",
"name": "Booking Cancellation",
"instructions": "Customer needs to cancel",
"maxMessages": 10,
"successCriteria": "Agent confirms cancellation",
"failureCriteria": "Agent fails to process"
},
{
"id": "scenario-uuid-2",
"name": "Booking Modification",
"instructions": "Customer needs to change date",
"maxMessages": 10
}
],
"connector": {
"id": "connector-uuid",
"name": "My Agent",
"type": "langgraph",
"baseUrl": "https://api.example.com"
},
"createdAt": "2026-01-28T10:00:00.000Z",
"updatedAt": "2026-01-28T10:00:00.000Z"
}
Error Response
{
"error": "Eval not found"
}
Status Code: 404
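The sketch below shows one way to fetch a single eval, optionally with `expand=true`, and to treat the 404 response as "no such eval". The base URL and the helper names `eval_url` and `get_eval` are hypothetical, and returning `None` on 404 is a design choice of this example rather than documented behavior.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the API is reachable at this base URL; replace it with your own host.
BASE_URL = "http://localhost:3000"


def eval_url(project_id, eval_id, expand=False, base_url=BASE_URL):
    """Return the URL for one eval, adding ?expand=true when requested."""
    url = f"{base_url}/api/projects/{project_id}/evals/{eval_id}"
    if expand:
        url += "?" + urllib.parse.urlencode({"expand": "true"})
    return url


def get_eval(project_id, eval_id, expand=False, base_url=BASE_URL):
    """Return the eval as a dict, or None when the API answers 404."""
    try:
        with urllib.request.urlopen(
            eval_url(project_id, eval_id, expand, base_url)
        ) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # any other HTTP error is left for the caller to handle
```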
Create Eval
POST /api/projects/:projectId/evals
Content-Type: application/json
Request Body
{
"name": "Booking Cancellation Test",
"connectorId": "connector-uuid",
"scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"]
}
Note: The LLM provider used for evaluation is configured at the project level, in the project's llmSettings.
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the eval |
| connectorId | string | Yes | Connector for running this eval |
| scenarioIds | string[] | Yes | Array of scenario IDs (at least one required) |
Response
Status Code: 201 Created
{
"id": "987fcdeb-51a2-3bc4-d567-890123456789",
"name": "Booking Cancellation Test",
"connectorId": "connector-uuid",
"scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
"createdAt": "2026-01-28T10:00:00.000Z",
"updatedAt": "2026-01-28T10:00:00.000Z"
}
Error Responses
| Status | Description |
|---|---|
| 400 | Name is required / Connector ID is required / At least one Scenario ID is required |
| 404 | Scenario/Connector not found |
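As an illustration, a client can check the three required fields before sending the request, mirroring the 400 messages in the table above. This is a client-side sketch only: the base URL and the helper names `validate_create_payload` and `build_create_eval_request` are assumptions, and the server remains the authority on validation.

```python
import json
import urllib.request

# Assumption: the API is reachable at this base URL; replace it with your own host.
BASE_URL = "http://localhost:3000"


def validate_create_payload(payload):
    """Return a list of problems that would likely trigger a 400 response."""
    errors = []
    if not payload.get("name"):
        errors.append("Name is required")
    if not payload.get("connectorId"):
        errors.append("Connector ID is required")
    if not payload.get("scenarioIds"):  # missing or empty list
        errors.append("At least one Scenario ID is required")
    return errors


def build_create_eval_request(project_id, payload, base_url=BASE_URL):
    """Build (but do not send) the POST request that creates an eval."""
    errors = validate_create_payload(payload)
    if errors:
        raise ValueError("; ".join(errors))
    return urllib.request.Request(
        f"{base_url}/api/projects/{project_id}/evals",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the resulting request to `urllib.request.urlopen` should yield a 201 response when the referenced connector and scenarios exist.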
Update Eval
PUT /api/projects/:projectId/evals/:id
Content-Type: application/json
Request Body
{
"name": "Updated Eval Name",
"scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"]
}
All fields are optional. Only provided fields will be updated.
Response
{
"id": "987fcdeb-51a2-3bc4-d567-890123456789",
"name": "Updated Eval Name",
"connectorId": "connector-uuid",
"scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"],
"createdAt": "2026-01-28T10:00:00.000Z",
"updatedAt": "2026-01-28T10:30:00.000Z"
}
Error Responses
| Status | Description |
|---|---|
| 400 | scenarioIds cannot be empty |
| 404 | Eval/Scenario/Connector not found |
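Because only the provided fields are updated, a client can build the request body from whichever values it wants to change. The sketch below assumes `connectorId` is updatable alongside `name` and `scenarioIds` (the 404 row suggests this, but the example body does not show it). The base URL and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the API is reachable at this base URL; replace it with your own host.
BASE_URL = "http://localhost:3000"


def build_update_body(name=None, connector_id=None, scenario_ids=None):
    """Include only the fields the caller actually wants to change."""
    body = {}
    if name is not None:
        body["name"] = name
    if connector_id is not None:  # assumption: connectorId can be updated
        body["connectorId"] = connector_id
    if scenario_ids is not None:
        if len(scenario_ids) == 0:
            # Mirrors the documented 400 error instead of sending a bad request.
            raise ValueError("scenarioIds cannot be empty")
        body["scenarioIds"] = list(scenario_ids)
    return body


def build_update_request(project_id, eval_id, body, base_url=BASE_URL):
    """Build (but do not send) the PUT request that updates an eval."""
    return urllib.request.Request(
        f"{base_url}/api/projects/{project_id}/evals/{eval_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

For example, `build_update_body(name="Updated Eval Name")` produces a body that changes the name and leaves the scenario list untouched.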
Delete Eval
DELETE /api/projects/:projectId/evals/:id
Response
Status Code: 204 No Content
Error Response
{
"error": "Eval not found"
}
Status Code: 404
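A short sketch of deleting an eval and distinguishing the two documented outcomes. The base URL and the helper names `build_delete_request` and `delete_eval` are assumptions; mapping 204 to `True` and 404 to `False` is a convention of this example only.

```python
import urllib.error
import urllib.request

# Assumption: the API is reachable at this base URL; replace it with your own host.
BASE_URL = "http://localhost:3000"


def build_delete_request(project_id, eval_id, base_url=BASE_URL):
    """Build (but do not send) the DELETE request for one eval."""
    url = f"{base_url}/api/projects/{project_id}/evals/{eval_id}"
    return urllib.request.Request(url, method="DELETE")


def delete_eval(project_id, eval_id, base_url=BASE_URL):
    """Return True on 204 No Content, False when the eval does not exist (404)."""
    try:
        with urllib.request.urlopen(
            build_delete_request(project_id, eval_id, base_url)
        ) as response:
            return response.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is left for the caller to handle
```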
Scenarios and Run Creation
Evals can contain multiple scenarios, allowing you to create comprehensive test collections. When running an eval:
- The system iterates through each scenario in scenarioIds
- For each scenario, it uses the personas associated with that scenario
- A run is created for each scenario/persona combination
For example, if an eval has 2 scenarios and each scenario has 3 personas, running the eval creates 6 runs (2 × 3).
Personas are associated with scenarios, not with evals directly. This allows different scenarios to test different persona types.
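The expansion described above can be sketched as a nested loop. This is an illustration of the counting logic, not the service's actual implementation: the `plan_runs` function and the `personas_by_scenario` mapping are hypothetical stand-ins for however the system stores the scenario-to-persona association.

```python
def plan_runs(scenario_ids, personas_by_scenario):
    """Return one (scenario_id, persona_id) pair per run that would be created."""
    runs = []
    for scenario_id in scenario_ids:
        # Personas come from the scenario, not from the eval itself.
        for persona_id in personas_by_scenario.get(scenario_id, []):
            runs.append((scenario_id, persona_id))
    return runs


# Hypothetical data: 2 scenarios with 3 personas each.
personas_by_scenario = {
    "scenario-uuid-1": ["persona-a", "persona-b", "persona-c"],
    "scenario-uuid-2": ["persona-d", "persona-e", "persona-f"],
}
runs = plan_runs(["scenario-uuid-1", "scenario-uuid-2"], personas_by_scenario)
print(len(runs))  # 6
```

Because each scenario carries its own persona list, scenarios with different numbers of personas simply contribute different numbers of runs; the total is the sum over scenarios rather than a fixed product.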