
Getting Started

Install Overmind, register your agent, and run your first optimization round in about 10 minutes.

This guide takes you from zero to an optimized agent. The whole flow is local: you run a CLI against your Python project and artifacts land under .overmind/.

Requirements

  • Python 3.10 or higher
  • uv (a modern Python package manager)
  • API keys for at least one LLM provider (e.g., OpenAI or Anthropic)

1. Install

Terminal window
pip install overmind

For local development:

Terminal window
git clone https://github.com/overmind-core/overmind
cd overmind
uv tool install -e .

Verify:

Terminal window
overmind --help

Prefer uv run? All commands below also work as uv run overmind <command> after uv sync.


2. Initialize the project

From your agent’s project root:

Terminal window
cd your-agent-project/
overmind init

This creates .overmind/ and walks you through configuring API keys and default models. Settings are written to .overmind/.env. Safe to re-run.


3. Register your agent

Point Overmind at the Python function it should call for each test case.

Terminal window
overmind agent register my-agent agents.my_agent:run

The module path (agents.my_agent) is resolved relative to your project root; run is the function name.
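
For example, with a layout like the following (file names are illustrative), agents.my_agent:run resolves to the run function in agents/my_agent.py:

your-agent-project/
├── agents/
│   ├── __init__.py
│   └── my_agent.py    # defines run(input_data)
└── .overmind/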

Your function receives an input dict and must return a dict:

def run(input_data: dict) -> dict:
    # your agent logic
    result = ...
    return {"response": result}

Framework-based agents — Google ADK, LangChain, CrewAI, etc. — often don’t expose a plain callable. Overmind detects this during registration and offers to auto-generate an entrypoint wrapper for you. It also collects any additional API keys your agent needs at that point.
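
For intuition, a generated wrapper might look roughly like the sketch below. This is illustrative, not the exact code Overmind emits; build_agent stands in for whatever constructor your framework uses, and the dict keys are examples.

# agents/my_agent_entrypoint.py (illustrative sketch, not generated code)
from agents.my_agent import build_agent  # hypothetical factory for your framework agent

_agent = build_agent()

def run(input_data: dict) -> dict:
    # Translate Overmind's input dict into the framework's call signature,
    # then normalize the framework result back into a plain dict.
    result = _agent.invoke(input_data["inquiry"])  # key name is an example
    return {"response": str(result)}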

Other registry commands:

Command | Description
overmind agent list | List all registered agents
overmind agent show <name> | Show registration details and pipeline status
overmind agent update <name> <mod:fn> | Update the entrypoint after renaming
overmind agent remove <name> | Remove from registry
overmind agent validate <name> --data <path> | Run the first test case end-to-end

4. Validate the entrypoint (optional but recommended)

Terminal window
overmind agent validate my-agent --data tests/sample.json

Runs the first case from your dataset through the agent so you catch import/wrapper/API-key issues before investing time in full setup.


5. Run setup

Terminal window
overmind setup my-agent

This is an interactive flow that prepares everything the optimizer needs:

Phase | What happens
Agent analysis | An LLM reads your code to detect input/output schema, tools, and decision logic.
Policy generation | Without --policy, a policy is inferred from the code. With --policy <path>, your document is analyzed against the code and refinements are suggested. Either way you can refine conversationally before approving.
Dataset | Overmind uses your existing test data if found, or generates diverse synthetic cases based on the policy and agent description.
Evaluation criteria | Scoring rules are proposed for each output field, with policy-aware stricter checks where relevant.

Variants:

Terminal window
# Bring an existing policy document
overmind setup my-agent --policy docs/my_policy.md
# Non-interactive (for CI / scripts) — requires ANALYZER_MODEL and
# SYNTHETIC_DATAGEN_MODEL in .overmind/.env
overmind setup my-agent --fast

Setup produces two artifacts in .overmind/agents/<name>/setup_spec/:

  • eval_spec.json — machine-readable evaluation spec (used at runtime)
  • policies.md — human-readable policy document you maintain

Both are editable after generation.
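
Since both artifacts are plain files, you can also inspect them programmatically. A minimal sketch (the spec's schema varies by agent, so this just lists the top-level keys):

import json
from pathlib import Path

spec = json.loads(Path(".overmind/agents/my-agent/setup_spec/eval_spec.json").read_text())

# Schema depends on your agent; start by seeing what setup generated.
print(sorted(spec.keys()))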


6. Optimize

Terminal window
overmind optimize my-agent

This kicks off the iterative optimization loop. Each iteration:

  1. Runs your agent on every training case and collects traces + outputs
  2. Scores outputs against the eval spec (0–100 across multiple dimensions)
  3. Diagnoses failure patterns using the analyzer model
  4. Generates N candidate fixes (best-of-N), each biased toward a different area — tool descriptions, core logic, input handling, system prompt
  5. Validates candidates (syntax, interface, smoke test on a subset)
  6. Evaluates surviving candidates on the full training set
  7. Accepts or reverts — the best candidate is kept only if it improves the global best without regressing too many individual cases
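
In pseudocode, the final accept-or-revert rule behaves roughly like this (a sketch of the behavior described above, not Overmind's implementation; the threshold name and default are illustrative):

def should_accept(candidate: list[float], best: list[float], max_regressions: int = 2) -> bool:
    # Candidate must win on average...
    improves_mean = sum(candidate) / len(candidate) > sum(best) / len(best)
    # ...without making too many individual cases worse.
    regressed = sum(c < b for c, b in zip(candidate, best))
    return improves_mean and regressed <= max_regressions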

Interactive config lets you tune analyzer model, LLM-as-Judge, iterations, candidates per iteration, parallel workers, train/holdout split, regression thresholds, and early stopping. For CI/scripted use:

Terminal window
overmind optimize my-agent --fast

See the full reference in the Overmind guide.


Results

Artifacts land in .overmind/agents/<name>/:

Path | Description
setup_spec/policies.md | Agent policy (human-editable)
setup_spec/eval_spec.json | Evaluation spec with embedded policy
setup_spec/dataset.json | Test dataset used for optimization
experiments/best_agent.py | Highest-scoring single-file agent
experiments/best_agent/ | All optimized files (multi-file agents)
experiments/results.tsv | Score history per iteration
experiments/traces/ | Per-run JSON traces
experiments/report.md | Summary with scores, improvements, and diffs

You can edit policies.md or eval_spec.json and re-run overmind optimize to continue improving from where you left off.
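
Because experiments/results.tsv is plain TSV, the score history is easy to pull into your own tooling. A minimal sketch (the column layout depends on your run, so this just prints each row):

import csv

with open(".overmind/agents/my-agent/experiments/results.tsv") as f:
    for row in csv.reader(f, delimiter="\t"):
        print(row)  # one row per iteration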


Test data format

Data files are JSON arrays where each element has an input and expected_output:

[
  {
    "input": { "company_name": "Acme Corp", "inquiry": "Need enterprise pricing" },
    "expected_output": { "category": "hot", "lead_score": 85 }
  }
]

Place them under data/ in your agent directory and Overmind will detect them during setup. A test set of 10–50 diverse cases is usually enough. Without data, Overmind generates realistic synthetic cases from the policy and agent description.
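
If you already have labeled examples somewhere else, converting them into this shape takes a few lines (the output path and field values below are illustrative):

import json

cases = [
    {
        "input": {"company_name": "Acme Corp", "inquiry": "Need enterprise pricing"},
        "expected_output": {"category": "hot", "lead_score": 85},
    },
]

with open("data/cases.json", "w") as f:
    json.dump(cases, f, indent=2)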


Tracing (optional)

If you also want Overmind traces from a deployed application — independent of the optimizer — install the Python or JS/TS tracing SDK and call init() once at startup:

Terminal window
pip install overmind

Then, in your application code:

import overmind

overmind.init(service_name="my-service", environment="production")

See the Python SDK reference.

Tracing and the optimizer are independent — you can use either or both.