Evals API

REST API endpoints for managing evals. Evals can contain multiple scenarios to create comprehensive test collections. Personas are associated with scenarios, not with evals directly. When running an eval, runs are created for each scenario/persona combination.

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/projects/:projectId/evals | List all evals |
| GET | /api/projects/:projectId/evals/:id | Get eval by ID |
| POST | /api/projects/:projectId/evals | Create a new eval |
| PUT | /api/projects/:projectId/evals/:id | Update an eval |
| DELETE | /api/projects/:projectId/evals/:id | Delete an eval |

List Evals

GET /api/projects/:projectId/evals

Response

[
  {
    "id": "987fcdeb-51a2-3bc4-d567-890123456789",
    "name": "Booking Cancellation Test",
    "connectorId": "connector-uuid",
    "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
    "createdAt": "2026-01-28T10:00:00.000Z",
    "updatedAt": "2026-01-28T10:00:00.000Z"
  }
]
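
As a rough illustration of calling this endpoint, the Python sketch below builds the request and parses the array shown above. The function names and the base URL are hypothetical, and authentication is omitted because this page does not describe it.

```python
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import quote


def build_list_evals_request(base_url: str, project_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the List Evals endpoint."""
    # base_url is an assumption; substitute the host of your own deployment.
    url = f"{base_url.rstrip('/')}/api/projects/{quote(project_id, safe='')}/evals"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def parse_eval_list(body: str) -> List[Dict[str, Any]]:
    """Parse the response body, which is documented as a JSON array of evals."""
    evals = json.loads(body)
    if not isinstance(evals, list):
        raise ValueError("expected a JSON array of evals")
    return evals
```

To send the request, pass it to `urllib.request.urlopen` and feed the decoded body to `parse_eval_list`.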

Get Eval

GET /api/projects/:projectId/evals/:id
GET /api/projects/:projectId/evals/:id?expand=true

Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| expand | boolean | Include scenario details |

Response (with expand=true)

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
  "scenarios": [
    {
      "id": "scenario-uuid-1",
      "name": "Booking Cancellation",
      "instructions": "Customer needs to cancel",
      "maxMessages": 10,
      "successCriteria": "Agent confirms cancellation",
      "failureCriteria": "Agent fails to process"
    },
    {
      "id": "scenario-uuid-2",
      "name": "Booking Modification",
      "instructions": "Customer needs to change date",
      "maxMessages": 10
    }
  ],
  "connector": {
    "id": "connector-uuid",
    "name": "My Agent",
    "type": "langgraph",
    "baseUrl": "https://api.example.com"
  },
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:00:00.000Z"
}
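
The sketch below shows one way a client might build the Get Eval URL with or without `expand=true` and read scenario names from the expanded response. Both helper names are hypothetical. It assumes the `scenarios` array is simply absent when `expand` is not set, which the sample responses on this page suggest but do not state.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode


def eval_detail_url(base_url: str, project_id: str, eval_id: str, expand: bool = False) -> str:
    """Build the Get Eval URL, appending ?expand=true when requested."""
    path = f"/api/projects/{quote(project_id, safe='')}/evals/{quote(eval_id, safe='')}"
    query = "?" + urlencode({"expand": "true"}) if expand else ""
    return base_url.rstrip("/") + path + query


def scenario_names(eval_obj: Dict[str, Any]) -> List[str]:
    """Return scenario names from an expanded eval, or [] if 'scenarios' is missing."""
    return [scenario["name"] for scenario in eval_obj.get("scenarios", [])]
```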

Error Response

{
  "error": "Eval not found"
}

Status Code: 404
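
A client will usually want to turn the 404 body into a typed error. The following is a minimal sketch: `EvalNotFoundError` and `parse_get_eval_response` are hypothetical names, and treating 200 as the success status is an assumption, since this page does not state the success code for Get Eval.

```python
import json
from typing import Any, Dict


class EvalNotFoundError(Exception):
    """Raised when the API answers 404 for an eval lookup."""


def parse_get_eval_response(status: int, body: str) -> Dict[str, Any]:
    """Return the eval object, or raise if the eval was not found."""
    if status == 404:
        # The documented 404 body is {"error": "Eval not found"}.
        try:
            message = json.loads(body).get("error", "Eval not found")
        except (ValueError, AttributeError):
            message = "Eval not found"
        raise EvalNotFoundError(message)
    if status != 200:  # assumption: success is reported as 200
        raise RuntimeError(f"unexpected status {status}")
    return json.loads(body)
```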

Create Eval

POST /api/projects/:projectId/evals
Content-Type: application/json

Request Body

{
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"]
}

Note: The LLM provider used for evaluation is configured at the project level, in the project's llmSettings.

Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Display name for the eval |
| connectorId | string | Yes | Connector for running this eval |
| scenarioIds | string[] | Yes | Array of scenario IDs (at least one required) |

Response

Status Code: 201 Created

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:00:00.000Z"
}

Error Responses

| Status | Description |
|--------|-------------|
| 400 | Name is required / Connector ID is required / At least one Scenario ID is required |
| 404 | Scenario/Connector not found |
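
The 400 conditions above can be checked on the client before sending the request. The sketch below mirrors the documented messages. `validate_create_payload` is a hypothetical helper, and the exact rules the server applies (for example, whether a whitespace-only name is rejected) are assumptions.

```python
from typing import Any, Dict, List


def validate_create_payload(payload: Dict[str, Any]) -> List[str]:
    """Return the validation messages that would apply; an empty list means valid."""
    errors: List[str] = []
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("Name is required")
    connector_id = payload.get("connectorId")
    if not isinstance(connector_id, str) or not connector_id:
        errors.append("Connector ID is required")
    scenario_ids = payload.get("scenarioIds")
    if not isinstance(scenario_ids, list) or len(scenario_ids) == 0:
        errors.append("At least one Scenario ID is required")
    return errors
```

A local check like this cannot detect the 404 cases, since only the server knows whether the referenced scenarios and connector exist.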

Update Eval

PUT /api/projects/:projectId/evals/:id
Content-Type: application/json

Request Body

{
  "name": "Updated Eval Name",
  "scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"]
}

All fields are optional. Only provided fields will be updated.

Response

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Updated Eval Name",
  "connectorId": "connector-uuid",
  "scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"],
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:30:00.000Z"
}

Error Responses

| Status | Description |
|--------|-------------|
| 400 | scenarioIds cannot be empty |
| 404 | Eval/Scenario/Connector not found |
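
Because only provided fields are updated, a client can build the request body from whichever values it wants to change. The sketch below is one way to do that. `build_update_payload` is a hypothetical helper, and accepting `connectorId` as an updatable field is an assumption inferred from the "Connector not found" error above.

```python
from typing import Any, Dict, List, Optional


def build_update_payload(
    name: Optional[str] = None,
    connector_id: Optional[str] = None,
    scenario_ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Include only the fields the caller supplied; None means 'leave unchanged'."""
    payload: Dict[str, Any] = {}
    if name is not None:
        payload["name"] = name
    if connector_id is not None:
        payload["connectorId"] = connector_id
    if scenario_ids is not None:
        if len(scenario_ids) == 0:
            # Mirrors the documented 400 error for an empty list.
            raise ValueError("scenarioIds cannot be empty")
        payload["scenarioIds"] = list(scenario_ids)
    return payload
```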

Delete Eval

DELETE /api/projects/:projectId/evals/:id

Response

Status Code: 204 No Content

Error Response

{
  "error": "Eval not found"
}

Status Code: 404
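
A successful delete returns 204 with no body, so a client should not try to parse JSON in that case. The sketch below illustrates this with two hypothetical helpers. The base URL is an assumption, and authentication is omitted.

```python
import urllib.request
from urllib.parse import quote


def build_delete_eval_request(base_url: str, project_id: str, eval_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single eval."""
    url = (
        f"{base_url.rstrip('/')}/api/projects/"
        f"{quote(project_id, safe='')}/evals/{quote(eval_id, safe='')}"
    )
    return urllib.request.Request(url, method="DELETE")


def was_deleted(status: int) -> bool:
    """True for 204 (deleted), False for 404 (eval not found); raise on anything else."""
    if status == 204:
        return True  # no response body to parse
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status}")
```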

Scenarios and Run Creation

Evals can contain multiple scenarios, allowing you to create comprehensive test collections. When running an eval:

  1. The system iterates through each scenario in scenarioIds
  2. For each scenario, it uses the personas associated with that scenario
  3. A run is created for each scenario/persona combination

For example, if an eval has 2 scenarios and each scenario has 3 personas, running the eval creates 6 runs (2 × 3).
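
The expansion into scenario/persona pairs can be sketched as a nested loop. This illustrates the counting rule only and is not the service's actual implementation. `plan_runs` and the persona IDs are hypothetical. In this sketch a scenario with no personas contributes no runs, but how the API handles that case is not documented here.

```python
from typing import Dict, List, Tuple


def plan_runs(
    scenario_ids: List[str],
    personas_by_scenario: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return one (scenario_id, persona_id) pair per run that would be created."""
    runs: List[Tuple[str, str]] = []
    for scenario_id in scenario_ids:
        # Personas belong to the scenario, not to the eval.
        for persona_id in personas_by_scenario.get(scenario_id, []):
            runs.append((scenario_id, persona_id))
    return runs


runs = plan_runs(
    ["scenario-uuid-1", "scenario-uuid-2"],
    {
        "scenario-uuid-1": ["persona-a", "persona-b", "persona-c"],
        "scenario-uuid-2": ["persona-d", "persona-e", "persona-f"],
    },
)
print(len(runs))  # 6
```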

Personas are associated with scenarios, not with evals directly. This allows different scenarios to test different persona types.