Overview

Test runs allow you to validate your LLM functions against a set of test cases. This helps ensure:
  • Functions produce expected outputs
  • Changes don’t break existing behavior
  • Quality is maintained across versions

Creating a Test Run

  1. Navigate to a function
  2. Click the Test Runs tab
  3. Click New Test Run
  4. Configure the test run settings
  5. Click Run

Input Sets

Input sets define the test cases for your function:

Creating an Input Set

  1. Go to a function’s Test Runs tab
  2. Click Input Sets
  3. Click Create Input Set
  4. Add test cases as JSON

Input Set Format

[
  {
    "text": "My name is John Doe"
  },
  {
    "text": "Jane Smith is a software engineer"
  },
  {
    "text": "Dr. Robert Johnson, PhD"
  }
]

Importing Input Sets

Import input sets from:
  • CSV files: Each row becomes a test case (see the example below)
  • JSON files: Array of input objects
  • Existing traces: Use real inputs from production
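
For example, a CSV file like the one below would import as three test cases matching the JSON format shown above (a minimal sketch; the header row is assumed to map to the input field name):

text
"My name is John Doe"
"Jane Smith is a software engineer"
"Dr. Robert Johnson, PhD"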

Success Criteria

Define what constitutes a successful test:

Built-in Criteria

  • No Errors: The function completes without errors
  • Valid Output: The output matches the expected schema
  • Contains Field: A specific field is present in the output

Custom Criteria

Write custom success criteria using JavaScript:
// Check that firstName is not empty
output.firstName && output.firstName.length > 0

// Check that confidence is above threshold
output.confidence > 0.8

// Check that output matches expected pattern
/^[A-Z][a-z]+$/.test(output.firstName)
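
Checks can also be combined into a single expression with standard JavaScript operators (a sketch, assuming the expression is evaluated as a boolean against the function's output):

// Require both a non-empty firstName and a confidence above threshold
output.firstName && output.firstName.length > 0 && output.confidence > 0.8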

Running Tests

Manual Test Runs

  1. Select an input set
  2. Configure success criteria
  3. Click Run
  4. Wait for results

Comparing Versions

Compare outputs across function versions:
  1. Create a test run
  2. Select multiple versions to test
  3. Review side-by-side results
  4. Identify regressions or improvements

Test Results

Result Summary

  • Passed: All success criteria met
  • Failed: One or more criteria not met
  • Error: Function execution failed
  • Pending: Test is still running

Result Details

For each test case, view:
  • Input: The test input
  • Output: The function output
  • Expected: Expected output (if defined)
  • Criteria Results: Which criteria passed/failed
  • Execution Time: How long the test took

Aggregate Metrics

Each test run reports aggregate metrics across all test cases (computed as in the sketch after this list):
  • Pass Rate: Percentage of passing tests
  • Average Latency: Mean execution time
  • Token Usage: Total tokens consumed
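
These aggregates are simple arithmetic over the per-case results. A hypothetical sketch in JavaScript (the results array and its field names are illustrative, not an actual API):

// Illustrative per-case results; the shape is assumed, not an actual API
const results = [
  { status: "Passed", latencyMs: 420, tokens: 150 },
  { status: "Passed", latencyMs: 380, tokens: 140 },
  { status: "Failed", latencyMs: 510, tokens: 165 },
];

// Pass Rate: fraction of cases where every criterion was met
const passRate = results.filter(r => r.status === "Passed").length / results.length;

// Average Latency: mean execution time across cases
const avgLatencyMs = results.reduce((sum, r) => sum + r.latencyMs, 0) / results.length;

// Token Usage: total tokens consumed by the run
const tokenUsage = results.reduce((sum, r) => sum + r.tokens, 0);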

Scheduled Tests

Run tests automatically on a schedule:
  1. Go to Test Runs > Schedules
  2. Click Create Schedule
  3. Configure the schedule (see the sketch after this list):
    • Function: Which function to test
    • Input Set: Which test cases to use
    • Frequency: How often to run (hourly, daily, weekly)
    • Notifications: Alert on failures
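
Conceptually, a schedule pairs a function and an input set with a frequency and alerting rules. The object below is a hypothetical illustration of that shape, not an actual file format or API:

// Hypothetical schedule shape; all names and values are illustrative
const schedule = {
  functionName: "extract-name",      // which function to test (assumed name)
  inputSet: "name-extraction",       // which test cases to use (assumed name)
  frequency: "daily",                // hourly, daily, or weekly
  notifications: { onFailure: true } // alert on failures
};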

Test Run History

View past test runs:
  1. Go to a function’s Test Runs tab
  2. Browse the test run history
  3. Click on a run to view details
  4. Compare runs over time

Best Practices

  • Diverse Test Cases: Include edge cases and typical inputs
  • Version Testing: Test before publishing new versions
  • Regular Runs: Schedule tests to catch regressions
  • Clear Criteria: Define specific, measurable success criteria
  • Review Failures: Investigate and fix failing tests promptly