Overview
Test runs let you validate your LLM functions against a set of test cases. This helps ensure that:
- Functions produce expected outputs
- Changes don’t break existing behavior
- Quality is maintained across versions
Creating a Test Run
- Navigate to a function
- Click the Test Runs tab
- Click New Test Run
- Configure the test run settings
- Click Run
Input Sets
Input sets define the test cases for your function.
Creating an Input Set
- Go to a function’s Test Runs tab
- Click Input Sets
- Click Create Input Set
- Add test cases as JSON
Input Set Format
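Test cases are added as JSON, and JSON imports are described as an array of input objects, so an input set might be shaped like the sketch below. The field names (`ticketText`, `maxWords`) are hypothetical and stand in for whatever inputs your own function declares.

```javascript
// Hypothetical input set for a function that summarizes support tickets.
// The field names (ticketText, maxWords) are illustrative assumptions;
// replace them with the inputs your own function expects.
const inputSet = [
  { ticketText: "My invoice shows the wrong billing address.", maxWords: 30 },
  { ticketText: "", maxWords: 30 }, // edge case: empty input
  { ticketText: "App crashes on login since the last update!!!", maxWords: 10 },
];

// Serialize to the JSON text you would paste in or save as a file.
const inputSetJson = JSON.stringify(inputSet, null, 2);
console.log(inputSetJson);
```

Each object is one test case. Mixing typical inputs with edge cases, such as the empty string here, keeps the set representative.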
Importing Input Sets
Import input sets from:
- CSV files: Each row becomes a test case
- JSON files: Array of input objects
- Existing traces: Use real inputs from production
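To illustrate the "each row becomes a test case" mapping, here is a rough sketch of how CSV rows could be turned into input objects before import. It assumes a header row and plain comma-separated values with no quoted fields. The function name `csvToTestCases` and the column names are hypothetical, and the importer's actual parsing rules may differ.

```javascript
// Convert simple CSV text into an array of input objects, one per row.
// Assumes a header row and no quoted fields or embedded commas.
function csvToTestCases(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",").map((c) => c.trim());
    const testCase = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      testCase[header] = cells[i] ?? "";
    });
    return testCase;
  });
}

const csv = "question,language\nWhat is a test run?,en\nHow do I import a CSV?,en\n";
const testCases = csvToTestCases(csv);
console.log(testCases);
```

All values come out as strings in this sketch. If your function expects numbers or booleans, convert them after parsing.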
Success Criteria
Define what constitutes a successful test.
Built-in Criteria
| Criteria | Description |
|---|---|
| No Errors | The function completes without errors |
| Valid Output | The output matches the expected schema |
| Contains Field | A specific field is present in the output |
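
As a way to reason about what each built-in criterion checks, the sketch below re-implements them as plain functions. This is an illustration only, not the platform's implementation: the `result` shape (`output`, `error`) is hypothetical, and the "schema" is a simplified field-to-type map rather than a full JSON Schema.

```javascript
// Illustrative versions of the three built-in checks.
// `result` is a hypothetical shape: { output: object | null, error: string | null }.

// No Errors: the function completed without an error.
function noErrors(result) {
  return result.error == null;
}

// Valid Output: every field in the simplified schema has the expected type.
// `schema` maps field name -> expected `typeof` string (an assumption).
function validOutput(result, schema) {
  if (result.output == null || typeof result.output !== "object") return false;
  return Object.entries(schema).every(
    ([field, type]) => typeof result.output[field] === type
  );
}

// Contains Field: a specific field is present in the output.
function containsField(result, field) {
  return (
    result.output != null &&
    Object.prototype.hasOwnProperty.call(result.output, field)
  );
}

const result = {
  output: { summary: "Billing address is wrong.", wordCount: 4 },
  error: null,
};
console.log(noErrors(result));
console.log(validOutput(result, { summary: "string", wordCount: "number" }));
console.log(containsField(result, "sentiment"));
```

With this sample result, the first two checks pass and the third fails because `sentiment` is absent.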
Custom Criteria
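A custom criterion can encode a rule the built-in checks cannot express, such as a length limit. The sketch below assumes a criterion is a function that receives the test input and the function output and returns `true` on success. That signature, the name `summaryWithinLimit`, and the `summary` and `maxWords` fields are all hypothetical, so check the editor for the interface the platform actually expects.

```javascript
// Hypothetical custom criterion: the ({ input, output }) => boolean signature
// is an assumption for illustration, not a documented interface.
// Passes when the summary is non-empty and within the requested word limit.
function summaryWithinLimit({ input, output }) {
  if (typeof output?.summary !== "string") return false;
  const words = output.summary.trim().split(/\s+/).filter(Boolean);
  return words.length > 0 && words.length <= input.maxWords;
}

const verdict = summaryWithinLimit({
  input: { maxWords: 5 },
  output: { summary: "Billing address is wrong." },
});
console.log(verdict);
```

Keeping each criterion to a single measurable rule makes a failure easy to interpret in the results.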
Write custom success criteria using JavaScript.
Running Tests
Manual Test Runs
- Select an input set
- Configure success criteria
- Click Run
- Wait for results
Comparing Versions
Compare outputs across function versions:
- Create a test run
- Select multiple versions to test
- Review side-by-side results
- Identify regressions or improvements
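The last step, identifying regressions or improvements, amounts to comparing each test case's status between two versions. A minimal sketch, assuming you have per-case statuses for each version in a hypothetical `{ caseId: status }` shape (the IDs and the function name `compareVersions` are illustrative):

```javascript
// Compare per-test-case statuses from two versions of the same function.
// The { caseId: status } shape is a hypothetical format for illustration.
function compareVersions(baseline, candidate) {
  const regressions = [];
  const improvements = [];
  for (const caseId of Object.keys(baseline)) {
    if (!(caseId in candidate)) continue; // only compare shared test cases
    const before = baseline[caseId];
    const after = candidate[caseId];
    if (before === "Passed" && after !== "Passed") regressions.push(caseId);
    if (before !== "Passed" && after === "Passed") improvements.push(caseId);
  }
  return { regressions, improvements };
}

const v1 = { "case-1": "Passed", "case-2": "Failed", "case-3": "Passed" };
const v2 = { "case-1": "Passed", "case-2": "Passed", "case-3": "Error" };
const diff = compareVersions(v1, v2);
console.log(diff);
```

Here `case-3` is a regression (it passed before and now errors) and `case-2` is an improvement.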
Test Results
Result Summary
| Status | Description |
|---|---|
| Passed | All success criteria met |
| Failed | One or more criteria not met |
| Error | Function execution failed |
| Pending | Test is still running |
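
The four statuses can be read as a simple decision order: still running, then execution failure, then criteria outcome. The sketch below expresses that order. The record shape (`finished`, `error`, `criteriaResults`) is hypothetical, and the precedence shown is an assumption drawn from the descriptions in the table.

```javascript
// Derive a status from one test case record.
// The record shape (finished, error, criteriaResults) is a hypothetical
// illustration, not the platform's data model.
function resolveStatus(testCase) {
  if (!testCase.finished) return "Pending"; // test is still running
  if (testCase.error != null) return "Error"; // function execution failed
  const allMet = testCase.criteriaResults.every((c) => c.passed);
  return allMet ? "Passed" : "Failed";
}

const example = {
  finished: true,
  error: null,
  criteriaResults: [
    { name: "No Errors", passed: true },
    { name: "Contains Field", passed: false },
  ],
};
console.log(resolveStatus(example)); // "Failed"
```

Note that in this sketch a finished test case with no criteria counts as Passed, because there is nothing for it to fail.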
Result Details
For each test case, view:
- Input: The test input
- Output: The function output
- Expected: Expected output (if defined)
- Criteria Results: Which criteria passed/failed
- Execution Time: How long the test took
Aggregate Metrics
- Pass Rate: Percentage of passing tests
- Average Latency: Mean execution time
- Token Usage: Total tokens consumed
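If you want to reproduce these metrics from exported results, the arithmetic can be sketched as follows. The result fields (`status`, `latencyMs`, `tokens`) are hypothetical names, and counting every test case in the pass-rate denominator (including errored ones) is an assumption about how the rate is defined.

```javascript
// Compute pass rate, mean latency, and total tokens for a list of results.
// Field names (status, latencyMs, tokens) are hypothetical.
function aggregate(results) {
  const total = results.length;
  if (total === 0) {
    return { passRate: 0, averageLatencyMs: 0, totalTokens: 0 };
  }
  const passed = results.filter((r) => r.status === "Passed").length;
  const latencySum = results.reduce((sum, r) => sum + r.latencyMs, 0);
  const totalTokens = results.reduce((sum, r) => sum + r.tokens, 0);
  return {
    passRate: (passed / total) * 100, // percentage of passing tests
    averageLatencyMs: latencySum / total, // mean execution time
    totalTokens, // total tokens consumed
  };
}

const metrics = aggregate([
  { status: "Passed", latencyMs: 400, tokens: 120 },
  { status: "Failed", latencyMs: 600, tokens: 150 },
  { status: "Passed", latencyMs: 500, tokens: 130 },
  { status: "Error", latencyMs: 900, tokens: 0 },
]);
console.log(metrics);
```

For these four results the pass rate is 50%, the average latency is 600 ms, and the total token usage is 400.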
Scheduled Tests
Run tests automatically on a schedule:
- Go to Test Runs > Schedules
- Click Create Schedule
- Configure:
  - Function: Which function to test
  - Input Set: Which test cases to use
  - Frequency: How often to run (hourly, daily, weekly)
  - Notifications: Alert on failures
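To make the four settings concrete, the sketch below models a schedule as a plain object and computes when the next run would fall. The property names, the example function and input set names, and the `nextRunAt` helper are all hypothetical. Schedules are configured in the UI, and the fixed-interval arithmetic here is a simplification that ignores time zones and daylight saving time.

```javascript
// Interval length for each supported frequency, in milliseconds.
const FREQUENCY_MS = {
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
};

// Return the time of the next run, given the previous run time.
// Hypothetical helper: the platform computes this for you.
function nextRunAt(schedule, lastRunAt) {
  const interval = FREQUENCY_MS[schedule.frequency];
  if (interval === undefined) {
    throw new Error(`Unknown frequency: ${schedule.frequency}`);
  }
  return new Date(lastRunAt.getTime() + interval);
}

// Hypothetical schedule mirroring the four settings listed above.
const schedule = {
  functionName: "summarize-ticket", // Function
  inputSetName: "regression-suite", // Input Set
  frequency: "daily", // Frequency: hourly, daily, or weekly
  notifyOnFailure: true, // Notifications
};

const next = nextRunAt(schedule, new Date("2024-01-01T00:00:00Z"));
console.log(next.toISOString()); // "2024-01-02T00:00:00.000Z"
```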
Test Run History
View past test runs:
- Go to a function’s Test Runs tab
- Browse the test run history
- Click on a run to view details
- Compare runs over time
Best Practices
- Diverse Test Cases: Include edge cases and typical inputs
- Version Testing: Test before publishing new versions
- Regular Runs: Schedule tests to catch regressions
- Clear Criteria: Define specific, measurable success criteria
- Review Failures: Investigate and fix failing tests promptly