Evals API

REST API endpoints for managing evals. An eval groups one or more scenarios into a test collection and runs them against a connector. Personas are associated with scenarios, not with evals directly. Running an eval creates one run for each scenario/persona combination.

Endpoints

Method	Endpoint	Description
GET	`/api/projects/:projectId/evals`	List all evals
GET	`/api/projects/:projectId/evals/:id`	Get eval by ID
POST	`/api/projects/:projectId/evals`	Create a new eval
PUT	`/api/projects/:projectId/evals/:id`	Update an eval
DELETE	`/api/projects/:projectId/evals/:id`	Delete an eval

List Evals

GET /api/projects/:projectId/evals

Response

[
  {
    "id": "987fcdeb-51a2-3bc4-d567-890123456789",
    "name": "Booking Cancellation Test",
    "connectorId": "connector-uuid",
    "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
    "createdAt": "2026-01-28T10:00:00.000Z",
    "updatedAt": "2026-01-28T10:00:00.000Z"
  }
]

Get Eval

GET /api/projects/:projectId/evals/:id
GET /api/projects/:projectId/evals/:id?expand=true

Query Parameters

Parameter	Type	Description
`expand`	boolean	Include the full scenario and connector objects in the response
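
As a rough illustration of how the `expand` flag fits into the request URL, the sketch below builds the Get Eval URL with Python's standard library. The `eval_url` helper and the `base_url` value are hypothetical; only the path and the `expand` parameter come from this page.

```python
from urllib.parse import quote, urlencode


def eval_url(base_url: str, project_id: str, eval_id: str, expand: bool = False) -> str:
    """Build the Get Eval URL (hypothetical helper; base_url is a placeholder)."""
    # Percent-encode the IDs so unusual characters cannot break the path.
    path = f"/api/projects/{quote(project_id, safe='')}/evals/{quote(eval_id, safe='')}"
    # Only append the query string when expansion is requested.
    query = "?" + urlencode({"expand": "true"}) if expand else ""
    return base_url.rstrip("/") + path + query


print(eval_url("https://example.test", "my-project", "my-eval", expand=True))
```

Omitting `expand` produces the plain `/evals/:id` URL.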

Response (with expand=true)

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
  "scenarios": [
    {
      "id": "scenario-uuid-1",
      "name": "Booking Cancellation",
      "instructions": "Customer needs to cancel",
      "maxMessages": 10,
      "successCriteria": "Agent confirms cancellation",
      "failureCriteria": "Agent fails to process"
    },
    {
      "id": "scenario-uuid-2",
      "name": "Booking Modification",
      "instructions": "Customer needs to change date",
      "maxMessages": 10
    }
  ],
  "connector": {
    "id": "connector-uuid",
    "name": "My Agent",
    "type": "langgraph",
    "baseUrl": "https://api.example.com"
  },
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:00:00.000Z"
}

Error Response

{
  "error": "Eval not found"
}

Status Code: 404

Create Eval

POST /api/projects/:projectId/evals
Content-Type: application/json

Request Body

{
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"]
}

Note: The LLM provider used for evaluation is not part of the eval. It is configured at the project level, via the project's llmSettings.

Fields

Field	Type	Required	Description
`name`	string	Yes	Display name for the eval
`connectorId`	string	Yes	Connector for running this eval
`scenarioIds`	string[]	Yes	Array of scenario IDs (at least one required)

Response

Status Code: 201 Created

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Booking Cancellation Test",
  "connectorId": "connector-uuid",
  "scenarioIds": ["scenario-uuid-1", "scenario-uuid-2"],
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:00:00.000Z"
}
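
The create call can be sketched with `urllib.request` from the standard library. This is a minimal example under stated assumptions: `build_create_request` and the base URL are hypothetical, and any authentication headers your deployment needs are not covered on this page, so they are left out. The request is only constructed here, not sent.

```python
import json
from typing import List
from urllib.request import Request


def build_create_request(base_url: str, project_id: str, name: str,
                         connector_id: str, scenario_ids: List[str]) -> Request:
    """Construct (but do not send) a Create Eval request. Hypothetical helper."""
    body = {
        "name": name,
        "connectorId": connector_id,
        "scenarioIds": scenario_ids,
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/projects/{project_id}/evals",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request(
    "https://example.test", "my-project",
    "Booking Cancellation Test", "connector-uuid",
    ["scenario-uuid-1", "scenario-uuid-2"],
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; on success the status should be 201 as documented above.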

Error Responses

Status	Description
400	Name is required / Connector ID is required / At least one Scenario ID is required
404	Scenario/Connector not found
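
To catch the 400 cases before sending a request, a client could mirror the three required-field checks locally. The sketch below is an assumption about how such a check might look, not the server's actual validation; `validate_create_eval` is a hypothetical name, and the messages are copied from the table above.

```python
from typing import Any, Dict, List


def validate_create_eval(body: Dict[str, Any]) -> List[str]:
    """Return the 400-style messages a Create Eval body would likely trigger."""
    errors = []
    if not body.get("name"):
        errors.append("Name is required")
    if not body.get("connectorId"):
        errors.append("Connector ID is required")
    # A missing key and an empty list are both treated as "no scenarios".
    if not body.get("scenarioIds"):
        errors.append("At least one Scenario ID is required")
    return errors


print(validate_create_eval({"name": "Booking Cancellation Test", "scenarioIds": []}))
```

A 404 (unknown scenario or connector) can only be detected by the server, so this check does not replace handling the error response.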

Update Eval

PUT /api/projects/:projectId/evals/:id
Content-Type: application/json

Request Body

{
  "name": "Updated Eval Name",
  "scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"]
}

All fields are optional. Only provided fields will be updated.

Response

{
  "id": "987fcdeb-51a2-3bc4-d567-890123456789",
  "name": "Updated Eval Name",
  "connectorId": "connector-uuid",
  "scenarioIds": ["new-scenario-uuid-1", "new-scenario-uuid-2"],
  "createdAt": "2026-01-28T10:00:00.000Z",
  "updatedAt": "2026-01-28T10:30:00.000Z"
}

Error Responses

Status	Description
400	scenarioIds cannot be empty
404	Eval/Scenario/Connector not found
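
Because only the provided fields are updated, a client should leave out anything it does not intend to change rather than sending null or empty values. A minimal sketch, with a hypothetical `build_update_body` helper; treating `connectorId` as updatable is an assumption inferred from the "Connector not found" error above.

```python
import json
from typing import List, Optional


def build_update_body(name: Optional[str] = None,
                      connector_id: Optional[str] = None,
                      scenario_ids: Optional[List[str]] = None) -> str:
    """Serialize a partial Update Eval body containing only the given fields."""
    body = {}
    if name is not None:
        body["name"] = name
    if connector_id is not None:  # assumption: connectorId can be updated
        body["connectorId"] = connector_id
    if scenario_ids is not None:
        # The API rejects an empty list with a 400, so fail early.
        if not scenario_ids:
            raise ValueError("scenarioIds cannot be empty")
        body["scenarioIds"] = list(scenario_ids)
    return json.dumps(body)


print(build_update_body(name="Updated Eval Name"))
```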

Delete Eval

DELETE /api/projects/:projectId/evals/:id

Response

Status Code: 204 No Content

Error Response

{
  "error": "Eval not found"
}

Status Code: 404
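
Since a successful delete returns no body, client code has to branch on the status code rather than parse JSON. The following sketch shows one way to do that; `interpret_delete` is a hypothetical helper, and raising on any other status is a design choice of this example, not documented behavior.

```python
import json


def interpret_delete(status: int, body: bytes = b"") -> str:
    """Map a Delete Eval response to a short outcome string."""
    if status == 204:
        # 204 No Content: nothing to parse.
        return "deleted"
    if status == 404:
        # The 404 body carries an "error" message, as shown above.
        message = json.loads(body)["error"] if body else "Eval not found"
        return f"not_found: {message}"
    raise RuntimeError(f"Unexpected status {status}")


print(interpret_delete(204))
print(interpret_delete(404, b'{"error": "Eval not found"}'))
```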

Scenarios and Run Creation

Evals can contain multiple scenarios, allowing you to create comprehensive test collections. When running an eval:

1. The system iterates through each scenario in scenarioIds.
2. For each scenario, it uses the personas associated with that scenario.
3. A run is created for each scenario/persona combination.

For example, if an eval has 2 scenarios and each scenario has 3 personas, running the eval produces 6 runs (2 × 3).
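
The expansion described above can be sketched as a nested loop over scenarios and their personas. This illustrates the counting only and is not the service's implementation; `plan_runs`, the `personas_by_scenario` mapping, and the persona IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_runs(scenario_ids: List[str],
              personas_by_scenario: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List the (scenario, persona) pairs that would each become one run."""
    return [
        (scenario_id, persona_id)
        for scenario_id in scenario_ids
        # A scenario with no entry in the mapping contributes no runs here.
        for persona_id in personas_by_scenario.get(scenario_id, [])
    ]


runs = plan_runs(
    ["scenario-uuid-1", "scenario-uuid-2"],
    {
        "scenario-uuid-1": ["persona-a", "persona-b", "persona-c"],
        "scenario-uuid-2": ["persona-d", "persona-e", "persona-f"],
    },
)
print(len(runs))  # 2 scenarios x 3 personas each = 6
```

When scenarios have different numbers of personas, the total is the sum of the per-scenario persona counts rather than a simple product.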

Personas are associated with scenarios, not with evals directly. This allows different scenarios to test different persona types.
