The Simforge Python SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.

Installation

# pip
pip install simforge-py

# Poetry
poetry add simforge-py

# uv
uv add simforge-py

Quick Start

import os
from simforge import Simforge

simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"])
Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):
Modify existing Python code to add Simforge tracing.
Do NOT browse or web search. Use ONLY the API described below.

Simforge Python SDK (authoritative excerpt):
- Install: `pip install simforge-py` or `poetry add simforge-py` or `uv add simforge-py`
- Init:
  import os
  from simforge import Simforge
  simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"])
- Instrumentation (ONLY allowed form - use get_function):
  # Declare trace function key once
  my_service = simforge.get_function("<trace_function_key>")

  # Decorate methods with span
  @my_service.span()
  def method_name(): ...

  # Or with options:
  @my_service.span(name="DisplayName", type="function")
  def method_name(): ...

  # Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- Decorator form ONLY; must be placed immediately ABOVE the `def` it instruments.
- DO NOT use context managers or manual span creation.
- DO NOT extract helper methods.

STRICT RULE:
You MUST ask me which method to instrument before making ANY code changes.
Do NOT choose a method yourself.

Task:
1) Ensure simforge-py is installed and initialization exists.
2) Ask me which EXISTING method should be instrumented with Simforge.
3) After I confirm the method name:
   - Create a function wrapper with `simforge.get_function("<trace_function_key>")`
   - Add `@my_service.span()` directly ABOVE that method's `def`
   - Ensure the simforge client is initialized and accessible
4) Do not change method signature, behavior, or return value. Minimal diff.

Output:
- First: your question asking which method to instrument
- After confirmation: minimal diffs for dependencies, initialization, and the method change

Basic Configuration

Simforge(api_key: str)

# Disable tracing with enabled=False (default: True). Functions still execute, but no spans are sent.
Simforge(api_key: str, enabled: bool = True)
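A common pattern is to switch tracing on or off per environment. The sketch below assumes a hypothetical `SIMFORGE_TRACING_ENABLED` environment variable (the SDK does not read it on its own) and only builds the keyword arguments you would pass to `Simforge(...)`, so it runs without the SDK installed.

```python
import os
from typing import Any, Dict, Mapping


def simforge_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for Simforge(...) from environment variables.

    SIMFORGE_TRACING_ENABLED is a hypothetical variable name chosen for this
    sketch; it is not part of the SDK.
    """
    flag = env.get("SIMFORGE_TRACING_ENABLED", "true").strip().lower()
    return {
        # Raises KeyError if the API key is missing, so misconfiguration fails fast.
        "api_key": env["SIMFORGE_API_KEY"],
        # Treat "0", "false", "no" and "off" as "tracing disabled".
        "enabled": flag not in {"0", "false", "no", "off"},
    }


# Usage (requires simforge-py): simforge = Simforge(**simforge_kwargs())
```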

Tracing

Declare the trace function key once and link multiple spans together:
order_service = simforge.get_function("order-processing")

@order_service.span()
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}

@order_service.span()
def validate_order(order_id: str) -> dict:
    return {"valid": True}

Using @simforge.span() Directly

For a single span without linking to a function group:
@simforge.span("one-off-operation")
def standalone_task() -> str:
    return "done"

Automatic Nesting

Spans nest automatically based on the call stack:
@simforge.span("outer", type="agent")
def outer():
    inner()  # Becomes a child of "outer"

@simforge.span("inner", type="function")
def inner():
    pass

Span Options

Parameters:
  • trace_function_key (required): String identifier for grouping spans
  • name (optional): Display name. Defaults to the function name, falling back to the trace function key
  • type (optional): Span type. Defaults to "custom"
  • metadata (optional): Dictionary (dict[str, Any]) attached to the span for custom context (e.g. user ID, region, request ID)
Span Types:
SpanType = Literal[
    "llm",        # LLM calls
    "agent",      # Agent workflows
    "function",   # Function calls
    "guardrail",  # Safety checks
    "handoff",    # Human handoffs
    "custom"      # Default
]
Examples:
# Function name is automatically captured as span name
@simforge.span("order-processing")
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
# Span name: "process_order"

# Override with name option
@simforge.span("order-processing", name="OrderProcessor")
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
# Span name: "OrderProcessor"

# Set span type
@simforge.span("safety-check", type="guardrail")
def check_content(content: str) -> dict:
    return {"safe": True}

# With metadata (definition-time)
@simforge.span("order-processing", type="function", metadata={"user_id": "u-123", "region": "us-east"})
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
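The defaulting rules above can be modeled in a few lines. `resolve_span_options` is a hypothetical helper, not part of the SDK: it validates `type` against the `SpanType` literal with `typing.get_args` and picks the display name in the documented order (explicit `name`, then the function's `__name__`, then the trace function key). The parameter is called `type` only to mirror the decorator's keyword.

```python
from typing import Any, Callable, Dict, Literal, Optional, get_args

SpanType = Literal["llm", "agent", "function", "guardrail", "handoff", "custom"]


def resolve_span_options(
    trace_function_key: str,
    fn: Optional[Callable[..., Any]] = None,
    name: Optional[str] = None,
    type: str = "custom",  # mirrors the decorator keyword; shadows the builtin here
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented span-option defaults."""
    if type not in get_args(SpanType):
        raise ValueError(f"unknown span type: {type!r}")
    # Explicit name -> function name -> trace function key.
    resolved = name or getattr(fn, "__name__", None) or trace_function_key
    return {"name": resolved, "type": type, "metadata": dict(metadata or {})}
```

For example, `resolve_span_options("order-processing", process_order)["name"]` would give `"process_order"`, matching the first example above.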

Runtime Metadata

Use get_current_span() to get a handle to the active span, then call .set_metadata() to attach metadata from inside a traced function. This is useful when metadata depends on runtime values such as request IDs, computed scores, or other dynamic context:
from simforge import get_current_span

@simforge.span("order-processing", type="function")
def process_order(order_id: str) -> dict:
    user_id = get_current_user()  # your application's own helper, not part of Simforge
    get_current_span().set_metadata({"user_id": user_id, "order_id": order_id})
    return {"order_id": order_id, "status": "completed"}
Runtime metadata merges with definition-time metadata. On conflict, runtime values win:
# Definition-time: {"user_id": "u-123", "region": "us-east"}
# Runtime call: get_current_span().set_metadata({"region": "eu-west", "request_id": "req-789"})
# Result: {"user_id": "u-123", "region": "eu-west", "request_id": "req-789"}
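This merge rule behaves like a plain dictionary merge in which runtime keys overwrite definition-time keys. The sketch below reproduces the example with a hypothetical `merge_metadata` helper; it assumes a shallow merge (a nested dictionary would be replaced, not merged), which the description above does not specify.

```python
from typing import Any, Dict


def merge_metadata(definition_time: Dict[str, Any], runtime: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: later keys win, so runtime values override definition-time ones."""
    return {**definition_time, **runtime}


merged = merge_metadata(
    {"user_id": "u-123", "region": "us-east"},
    {"region": "eu-west", "request_id": "req-789"},
)
print(merged)  # {'user_id': 'u-123', 'region': 'eu-west', 'request_id': 'req-789'}
```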

Error Handling

Errors are captured in the span and re-raised:
@simforge.span("risky-service")
def risky():
    raise ValueError("error")

try:
    risky()
except ValueError:
    pass
# The span records the error and timing
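The capture-and-re-raise behavior can be pictured with a toy decorator. This is a hypothetical sketch, not the SDK's code: `toy_span` stores the error's `repr` and the elapsed time in an in-memory `spans` list, then re-raises with a bare `raise` so the caller receives the original exception and traceback.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for the span data a real tracer would send to its backend.
spans: List[Dict[str, Any]] = []


def toy_span(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {"name": name, "error": None}
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["error"] = repr(exc)  # capture the error on the span...
                raise  # ...then re-raise it unchanged for the caller
            finally:
                # Runs on success and failure, so timing is always recorded.
                record["duration_s"] = time.perf_counter() - start
                spans.append(record)
        return wrapper
    return decorator


@toy_span("risky-service")
def risky() -> None:
    raise ValueError("error")


try:
    risky()
except ValueError:
    pass  # the caller still receives the original exception

print(spans[0]["error"])  # ValueError('error')
```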

Flushing Traces

from simforge import flush_traces

flush_traces(timeout=30.0)  # Default: 30s
Traces also flush automatically on process exit via an atexit hook.
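A buffered exporter plus an `atexit` hook is a common way to get this behavior. The class below is a hypothetical stand-in, not the SDK's code: spans are queued in memory, `flush(timeout)` drains the queue until it is empty or the deadline passes, and `atexit.register` schedules a final flush at interpreter shutdown. The `send` callable is an assumption standing in for the network upload.

```python
import atexit
import time
from typing import Any, Callable, Dict, List


class ToyExporter:
    def __init__(self, send: Callable[[Dict[str, Any]], None]) -> None:
        self._send = send
        self._queue: List[Dict[str, Any]] = []
        atexit.register(self.flush)  # final flush when the interpreter exits normally

    def record(self, span: Dict[str, Any]) -> None:
        self._queue.append(span)

    def flush(self, timeout: float = 30.0) -> int:
        """Send queued spans until the queue is empty or `timeout` seconds pass."""
        deadline = time.monotonic() + timeout
        sent = 0
        while self._queue and time.monotonic() < deadline:
            self._send(self._queue.pop(0))
            sent += 1
        return sent


uploaded: List[Dict[str, Any]] = []
exporter = ToyExporter(uploaded.append)
exporter.record({"name": "process_order"})
exporter.record({"name": "validate_order"})
print(exporter.flush(timeout=5.0))  # 2
```

Python does not run `atexit` handlers when a process is killed by an unhandled signal or exits through `os._exit()`, so an explicit `flush_traces()` call is the safer choice in short-lived workers and scripts that may terminate that way.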

OpenAI Agents SDK

Attach a trace processor to capture agent runs:
pip install "simforge-py[openai-tracing]"  # quotes stop shells like zsh from expanding the brackets
from agents import set_trace_processors

processor = simforge.get_openai_tracing_processor()
set_trace_processors([processor])

Replay

Replay historical traces through a function and create a test run with comparison data. This is useful for testing changes to your functions against real production inputs.
@simforge.span("my-function-key")
def my_function(text: str) -> dict:
    return {"processed": text.upper()}

result = simforge.replay(my_function, limit=5)

# Or replay specific traces by ID
result = simforge.replay(my_function, trace_ids=["trace-abc", "trace-def"])

print(f"Test Run: {result['test_run_url']}")
for item in result["items"]:
    print(f"  Input: {item['input']}")
    print(f"  Result: {item['result']}")
    print(f"  Original: {item['original_output']}")
Parameters:
  • fn (required): The function to replay (must be decorated with @span)
  • limit (optional): Maximum number of traces to replay. Default: 5
  • trace_ids (optional): List of trace IDs to filter which traces are replayed
Returns:
{
    "items": [
        {
            "input": [...],           # The inputs passed to fn
            "result": ...,            # What fn returned
            "original_output": ...,   # What the original trace produced
            "error": None | str       # Error message if fn raised
        }
    ],
    "test_run_id": "...",
    "test_run_url": "..."
}
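Because each item pairs the new `result` with the `original_output`, a quick regression summary is a short loop over the returned dictionary. `summarize_replay` is a hypothetical helper; it assumes exact equality is a meaningful comparison for your outputs, which holds for deterministic functions but is usually too strict for free-form LLM text.

```python
from typing import Any, Dict


def summarize_replay(result: Dict[str, Any]) -> Dict[str, int]:
    """Count how replayed outputs compare with the original trace outputs."""
    summary = {"unchanged": 0, "changed": 0, "errors": 0}
    for item in result["items"]:
        if item.get("error") is not None:
            summary["errors"] += 1  # fn raised for this input
        elif item["result"] == item["original_output"]:
            summary["unchanged"] += 1
        else:
            summary["changed"] += 1
    return summary
```

With the SDK installed, you could call it as `summarize_replay(simforge.replay(my_function, limit=5))`.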
Notes:
  • The function must be decorated with @span; the trace function key is read from the decorator
  • The function can be sync or async (async functions are detected and run automatically)
  • If the function raises an error for one input, replay continues with the remaining inputs
  • Each replay creates a test run visible in the Simforge dashboard
  • Works through nested decorators (e.g. @retry, @cache): replay walks the __wrapped__ chain to find @span
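The `__wrapped__` chain is a standard-library convention: `functools.wraps` sets it and `inspect.unwrap` follows it. The sketch below is a hypothetical model of the lookup and async handling described in these notes, not the SDK's code. `SpanWrapper` and `span` stand in for Simforge's decorator, `retry` is a trivial outer decorator, and `call_maybe_async` shows one way to run a function that may be `async def`. An outer decorator that neither uses `functools.wraps` nor sets `__wrapped__` itself breaks the chain, so the lookup cannot see past it.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable


class SpanWrapper:
    """Hypothetical stand-in for the callable that @span returns."""

    def __init__(self, fn: Callable[..., Any], trace_function_key: str) -> None:
        functools.update_wrapper(self, fn)  # also sets self.__wrapped__ = fn
        self.trace_function_key = trace_function_key

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.__wrapped__(*args, **kwargs)


def span(trace_function_key: str) -> Callable[[Callable[..., Any]], SpanWrapper]:
    return lambda fn: SpanWrapper(fn, trace_function_key)


def retry(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Toy outer decorator; functools.wraps records fn as wrapper.__wrapped__."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return fn(*args, **kwargs)
    return wrapper


def find_span(fn: Callable[..., Any]) -> SpanWrapper:
    # Follow __wrapped__ links, stopping at the first span wrapper.
    target = inspect.unwrap(fn, stop=lambda f: isinstance(f, SpanWrapper))
    if not isinstance(target, SpanWrapper):
        raise ValueError("function is not decorated with a span")
    return target


def call_maybe_async(fn: Callable[..., Any], *args: Any) -> Any:
    result = fn(*args)
    if inspect.iscoroutine(result):  # calling an async def function returns a coroutine
        result = asyncio.run(result)
    return result


@retry
@span("my-function-key")
def my_function(text: str) -> dict:
    return {"processed": text.upper()}


@span("async-key")
async def my_async_function(text: str) -> str:
    return text.lower()


print(find_span(my_function).trace_function_key)  # my-function-key
print(call_maybe_async(my_function, "hi"))  # {'processed': 'HI'}
print(call_maybe_async(my_async_function, "HI"))  # hi
```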

Native Functions

Simforge’s native functions make prompt tuning more effective because the auto-tuning engine has full access to the prompts. Other agent SDKs nest user instructions inside system prompts, which makes those prompts inaccessible to tracing.
result = simforge.call("ExtractName", text="My name is John Doe")
# Returns a Pydantic model or a primitive type

Advanced Configuration

Simforge(
    api_key: str,                    # Required
    service_url: str | None = None,  # Default: https://simforge.goharvest.ai
    env_vars: dict[str, str] | None = None,  # For local function execution
    execute_locally: bool = True,    # Execute functions locally or on server
    enabled: bool = True             # Enable/disable tracing
)
  • env_vars: Pass LLM provider API keys for local execution (e.g., {"OPENAI_API_KEY": "..."})
  • execute_locally: When True (default), fetches prompts from Simforge and executes locally. When False, executes on Simforge servers.
  • enabled: When False, all tracing is disabled. Decorated functions still execute normally but no spans are sent.
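Rather than hard-coding provider keys, you can build `env_vars` from the process environment. `provider_env_vars` is a hypothetical helper, and `ANTHROPIC_API_KEY` is only an example name; pass whichever provider keys your functions actually need.

```python
import os
from typing import Dict, Iterable, Mapping


def provider_env_vars(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Copy only the listed provider keys that are set and non-empty."""
    return {name: env[name] for name in names if env.get(name)}


# Usage (requires simforge-py):
# simforge = Simforge(
#     api_key=os.environ["SIMFORGE_API_KEY"],
#     env_vars=provider_env_vars(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]),
# )
```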

Sim Runners

Coming soon.