Introduction

EvalStudio is a flexible evaluation platform for testing chatbots, AI agents, and REST APIs. Run multi-turn conversation tests or structured JSON evaluations, assess responses with LLM-as-judge, and integrate into your CI/CD pipeline.

Key Features

  • Multi-turn conversation testing - Define personas, scenarios, and seed messages to simulate realistic interactions
  • Multiple interfaces - CLI for developers/CI, Web UI for teams, REST API for automation
  • Connectors - Test LangGraph agents via configurable endpoints
  • LLM-as-judge evaluation - Evaluate agent responses against success and failure criteria using an LLM
  • Concurrent execution - Run evaluations in parallel with configurable concurrency
  • Git-friendly - Tests are stored as JSON files, so they can be diffed, reviewed, and versioned like any other source file
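
To make the "configurable concurrency" idea concrete, here is a minimal sketch of a bounded-concurrency runner. It is a generic pattern, not EvalStudio's implementation: `EvalCase`, `EvalResult`, `runWithConcurrency`, and `fakeEvaluate` are hypothetical names invented for this illustration, and nothing here is imported from the `@evalstudio` packages.

```typescript
// Hypothetical shapes for illustration only; not EvalStudio's actual schema or API.
interface EvalCase {
  id: string;
  seedMessage: string;
}

interface EvalResult {
  id: string;
  passed: boolean;
}

// Run `worker` over `items` with at most `concurrency` tasks in flight.
// Results are returned in the same order as the input.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => lane(),
  );
  await Promise.all(lanes);
  return results;
}

// Stand-in for a real evaluation call (agent request plus judge verdict).
async function fakeEvaluate(testCase: EvalCase): Promise<EvalResult> {
  await Promise.resolve(); // yields once to simulate asynchronous work
  return { id: testCase.id, passed: testCase.seedMessage.length > 0 };
}

const cases: EvalCase[] = [
  { id: "greeting", seedMessage: "Hi, I need help with my order." },
  { id: "refund", seedMessage: "I want a refund." },
  { id: "empty", seedMessage: "" },
];

// At most two evaluations run at the same time.
runWithConcurrency(cases, 2, fakeEvaluate).then((results) => {
  for (const r of results) {
    console.log(`${r.id}: ${r.passed ? "pass" : "fail"}`);
  }
});
```

A fixed set of "lanes" pulling from a shared index keeps the number of in-flight evaluations bounded without an external queue library, and writing each result to its input index preserves ordering regardless of which evaluation finishes first.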

Packages

Package                Description
@evalstudio/core       Core evaluation engine (zero dependencies)
@evalstudio/cli        CLI; bundles the API and Web UI via evalstudio serve
@evalstudio/api        Fastify REST API server (embedded in CLI)
@evalstudio/web        React Web UI (embedded in CLI)
@evalstudio/postgres   Optional PostgreSQL storage backend
@evalstudio/docs       Documentation site (you're here!)