The Simforge Python SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.

Installation

# pip
pip install simforge-py

# Poetry
poetry add simforge-py

# uv
uv add simforge-py

Quick Start

import os
from simforge import Simforge

simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"])
Need an API key? Get one from the Simforge dashboard or see the API Keys guide for detailed setup instructions.
Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):
Modify existing Python code to add Simforge tracing.
Do NOT browse or web search. Use ONLY the API described below.

Simforge Python SDK (authoritative excerpt):
- Install: `pip install simforge-py` or `poetry add simforge-py` or `uv add simforge-py`
- Init:
  import os
  from simforge import Simforge
  simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"])
- Instrumentation (ONLY allowed form - use get_function):
  # Declare trace function key once
  my_service = simforge.get_function("<trace_function_key>")

  # Decorate methods with span
  @my_service.span()
  def method_name(): ...

  # Or with options:
  @my_service.span(name="DisplayName", type="function")
  def method_name(): ...

  # Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- Decorator form ONLY; must be placed immediately ABOVE the `def` it instruments.
- DO NOT use context managers or manual span creation.
- DO NOT extract helper methods.

Task:
1) Ensure simforge-py is installed and initialization exists.
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting — what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Create a function wrapper with `simforge.get_function("<trace_function_key>")`
   - Add `@my_service.span()` directly ABOVE each method's `def`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Ensure the simforge client is initialized and accessible
5) Do not change method signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for dependencies, initialization, and the method changes

Basic Configuration

Simforge(api_key: str)

# Disable tracing (decorated functions still execute, but no spans are sent)
Simforge(api_key=os.environ["SIMFORGE_API_KEY"], enabled=False)
Missing API key doesn’t crash. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All decorated functions still execute normally — no spans are sent, no errors are thrown. You don’t need any conditional logic around the API key.
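This fail-safe behavior can be modeled in a few lines of plain Python. The sketch below is illustrative only, not the SDK's implementation: when the key is blank, the decorator degrades to a pass-through and nothing is sent.

```python
import functools

def make_span_decorator(api_key: str):
    """Illustrative sketch: a blank key disables tracing but never breaks calls."""
    enabled = bool(api_key and api_key.strip())

    def span(name: str):
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                if enabled:
                    print(f"span start: {name}")  # stand-in for sending a span
                return fn(*args, **kwargs)  # the function always runs
            return wrapper
        return decorate
    return span

span = make_span_decorator("   ")  # whitespace-only key: tracing disabled

@span("demo")
def add(a: int, b: int) -> int:
    return a + b

print(add(2, 3))  # prints 5; no span is emitted
```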

Tracing

Declare the trace function key once and link multiple spans together:
order_service = simforge.get_function("order-processing")

@order_service.span()
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}

@order_service.span()
def validate_order(order_id: str) -> dict:
    return {"valid": True}

Multi-File Projects

For projects with instrumented functions spread across multiple files, create a dedicated file that initializes Simforge and exports the function. Import it wherever you need to instrument.
# lib/simforge_client.py — single source of truth
import os
from simforge import Simforge
simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"])
order_service = simforge.get_function("order-processing")
# services/process_order.py
from lib.simforge_client import order_service

@order_service.span()
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
# services/validate_order.py
from lib.simforge_client import order_service

@order_service.span()
def validate_order(order_id: str) -> dict:
    return {"valid": True}
Spans from different files are automatically linked as parent-child when one decorated function calls another.

Using @simforge.span() Directly

For a single span without linking to a function group:
@simforge.span("one-off-operation")
def standalone_task() -> str:
    return "done"

Automatic Nesting

Spans nest automatically based on call stack:
@simforge.span("outer", type="agent")
def outer():
    inner()  # Becomes a child of "outer"

@simforge.span("inner", type="function")
def inner():
    pass

Span Options

Parameters:
  • trace_function_key (required): String identifier for grouping spans
  • name (optional): Display name. Defaults to function name, then trace function key
  • type (optional): Span type. Defaults to "custom"
Span Types:
SpanType = Literal[
    "llm",        # LLM calls
    "agent",      # Agent workflows
    "function",   # Function calls
    "guardrail",  # Safety checks
    "handoff",    # Human handoffs
    "custom"      # Default
]
Examples:
# Function name is automatically captured as span name
@simforge.span("order-processing")
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
# Span name: "process_order"

# Override with name option
@simforge.span("order-processing", name="OrderProcessor")
def process_order(order_id: str) -> dict:
    return {"order_id": order_id}
# Span name: "OrderProcessor"

# Set span type
@simforge.span("safety-check", type="guardrail")
def check_content(content: str) -> dict:
    return {"safe": True}

Span Context

Use get_current_span() to get a handle to the active span, then call .add_context() to attach contextual key-value pairs from inside a traced function — useful for runtime values like request IDs, computed scores, or dynamic context:
from simforge import get_current_span

@simforge.span("order-processing", type="function")
def process_order(order_id: str) -> dict:
    user_id = get_current_user()
    get_current_span().add_context({"user_id": user_id, "order_id": order_id})
    return {"order_id": order_id, "status": "completed"}
Each add_context call pushes the entire dictionary as one entry. Multiple calls accumulate entries:
get_current_span().add_context({"user_id": "u-123"})
get_current_span().add_context({"request_id": "req-789"})
# Result: contexts: [{"user_id": "u-123"}, {"request_id": "req-789"}]
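The accumulation semantics can be modeled with a few lines of plain Python (a sketch of the behavior described above, not SDK code): each call appends the whole dict as one entry, and nothing is merged.

```python
# Illustrative model of add_context accumulation, not the SDK's Span class.
class Span:
    def __init__(self):
        self.contexts = []

    def add_context(self, entry: dict):
        self.contexts.append(entry)  # one entry per call; entries never merge

span = Span()
span.add_context({"user_id": "u-123"})
span.add_context({"request_id": "req-789"})
print(span.contexts)  # [{'user_id': 'u-123'}, {'request_id': 'req-789'}]
```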

Span Prompt

Use get_current_span() to set the prompt string on the current span. This is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:
from simforge import get_current_span

@simforge.span("classification", type="llm")
def classify_text(text: str) -> str:
    prompt = f"Classify the following text: {text}"
    get_current_span().set_prompt(prompt)
    result = llm.complete(prompt)
    return result
The last set_prompt call wins — it overwrites any previously set prompt on the span. Calling set_prompt outside a span context is a no-op (it never crashes).

BAML Auto-Instrumentation

If you use BAML for your LLM calls, wrap_baml automatically captures the rendered prompt and LLM metadata (model, provider, token counts, duration) on the current span — no manual set_prompt or add_context calls needed.
pip install baml-py
from baml_client import b

# Pass your BAML client to the constructor
simforge = Simforge(api_key=os.environ["SIMFORGE_API_KEY"], baml_client=b)

# Wrap a BAML method inside a span — prompt and metadata are captured automatically
@simforge.span("classify", type="llm")
async def classify(text: str):
    return await simforge.wrap_baml(b.ClassifyText)(text=text)

result = await classify("Hello world")
wrap_baml works by creating a BAML Collector, running the method through a tracked client, then extracting:
  • Prompt: set_prompt() with the rendered messages (system + user)
  • Context: add_context() with { model, provider, inputTokens, outputTokens, durationMs }
If baml-py is not installed, the method is called directly without instrumentation.

Trace Context

Use get_current_trace() to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:
from simforge import get_current_trace

@simforge.span("order-processing", type="function")
def process_order(order_id: str) -> dict:
    trace = get_current_trace()

    # Set session ID (stored as database column, filterable in dashboard)
    trace.set_session_id("session-123")

    # Set trace metadata (stored in raw trace data)
    trace.set_metadata({"region": "us-west-2", "environment": "production"})

    # Add context entries (stored as key-value pairs, accumulates across calls)
    trace.add_context({"workflow": "checkout-flow", "batch_id": "batch-2024-01"})

    return {"order_id": order_id, "status": "completed"}
  • set_session_id(id) — Groups traces by user session. Stored as a database column for efficient filtering.
  • set_metadata(dict) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
  • add_context(dict) — Key-value context entries. Accumulates across multiple calls.
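The difference between merging metadata and accumulating context can be sketched in plain Python (a model of the semantics listed above, not SDK code).

```python
# Illustrative model: set_metadata merges keys, add_context appends entries.
class Trace:
    def __init__(self):
        self.metadata = {}
        self.contexts = []

    def set_metadata(self, data: dict):
        self.metadata.update(data)   # merges with existing metadata keys

    def add_context(self, entry: dict):
        self.contexts.append(entry)  # accumulates one entry per call

trace = Trace()
trace.set_metadata({"region": "us-west-2"})
trace.set_metadata({"environment": "production"})
trace.add_context({"workflow": "checkout-flow"})
trace.add_context({"batch_id": "batch-2024-01"})
print(trace.metadata)  # {'region': 'us-west-2', 'environment': 'production'}
print(trace.contexts)  # [{'workflow': 'checkout-flow'}, {'batch_id': 'batch-2024-01'}]
```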

Error Handling

Errors are captured in the span and re-raised:
@simforge.span("risky-service")
def risky():
    raise ValueError("error")

try:
    risky()
except ValueError:
    pass
# Span records error and timing
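The capture-and-re-raise pattern can be sketched as a plain-Python decorator (illustrative, not the SDK's implementation): the wrapper records the exception and elapsed time, then re-raises so callers still see the original error.

```python
import functools
import time

records = []  # stand-in for spans sent to the backend

def span(name):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                records.append({"span": name,
                                "error": str(exc),
                                "duration_s": time.monotonic() - start})
                raise  # the caller still sees the original exception
        return wrapper
    return decorate

@span("risky-service")
def risky():
    raise ValueError("error")

try:
    risky()
except ValueError:
    pass

print(records[0]["span"], records[0]["error"])  # risky-service error
```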

Flushing Traces

from simforge import flush_traces

flush_traces(timeout=30.0)  # Default: 30s
Traces also flush automatically on process exit via an atexit hook.

OpenAI Agents SDK

Attach a trace processor to capture agent runs:
pip install simforge-py[openai-tracing]
from agents import set_trace_processors

processor = simforge.get_openai_tracing_processor()
set_trace_processors([processor])

Replay

Replay historical traces through a function and create a test run with comparison data. This is useful for testing changes to your functions against real production inputs.
@simforge.span("my-function-key")
def my_function(text: str) -> dict:
    return {"processed": text.upper()}

result = simforge.replay(my_function, limit=5)

# Or replay specific traces by ID
result = simforge.replay(my_function, trace_ids=["trace-abc", "trace-def"])

print(f"Test Run: {result['test_run_url']}")
for item in result["items"]:
    print(f"  Input: {item['input']}")
    print(f"  Result: {item['result']}")
    print(f"  Original: {item['original_output']}")
Parameters:
  • fn (required): The function to replay (must be decorated with @span)
  • limit (optional): Maximum number of traces to replay. Default: 5
  • trace_ids (optional): List of trace IDs to filter which traces are replayed
Returns:
{
    "items": [
        {
            "input": [...],           # The inputs passed to fn
            "result": ...,            # What fn returned
            "original_output": ...,   # What the original trace produced
            "error": None | str       # Error message if fn raised
        }
    ],
    "test_run_id": "...",
    "test_run_url": "..."
}
Notes:
  • The function must be decorated with @span — the trace function key is read from the decorator
  • The function can be sync or async (async functions are detected and run automatically)
  • If the function raises an error for one input, replay continues with the remaining inputs
  • Each replay creates a test run visible in the Simforge dashboard
  • Works through nested decorators (e.g. @retry, @cache) — walks the __wrapped__ chain to find @span
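The `__wrapped__` walk mentioned in the last note can be sketched in plain Python. This is illustrative only; the `_span_key` attribute is a hypothetical marker, not the SDK's real internal.

```python
import functools

def span(key):
    # Hypothetical marker attribute so the key can be recovered later.
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper._span_key = key
        return wrapper
    return decorate

def retry(fn):  # stand-in for an unrelated decorator layered on top
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    wrapper.__wrapped__ = fn  # preserves the chain without copying attributes
    return wrapper

def find_span_key(fn):
    # Walk the __wrapped__ chain until a marked function is found.
    while fn is not None:
        if hasattr(fn, "_span_key"):
            return fn._span_key
        fn = getattr(fn, "__wrapped__", None)
    return None

@retry
@span("my-function-key")
def my_function(text: str) -> str:
    return text.upper()

print(find_span_key(my_function))  # my-function-key
```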

Native Functions

Simforge’s native functions improve prompt-tuning efficacy: the auto-tuning engine has full access to your prompts, unlike agent SDKs that nest user instructions inside system prompts, where they are inaccessible to tracing.
result = simforge.call("ExtractName", text="My name is John Doe")
# Returns Pydantic model or primitive type

Advanced Configuration

Simforge(
    api_key: str,                    # Required
    service_url: str | None = None,  # Default: https://simforge.goharvest.ai
    env_vars: dict[str, str] | None = None,  # For local function execution
    enabled: bool = True,            # Enable/disable tracing
    baml_client: Any = None          # Generated BAML client (for wrap_baml)
)
  • env_vars: Pass LLM provider API keys for local execution (e.g., {"OPENAI_API_KEY": "..."})
  • enabled: When False, all tracing is disabled. Decorated functions still execute normally but no spans are sent.
  • baml_client: The generated BAML client instance (e.g., b from baml_client). Used by wrap_baml() when no explicit client is passed at the call site.

Sim Runners

Coming soon.