Paste a URL, upload a CSV golden dataset, and get pass/fail results in seconds. No code. No infrastructure. No AWS knowledge required.
87.0%
score
87
PASS
13
FAIL
| # | INPUT | EXPECTED | ACTUAL | RESULT |
|---|---|---|---|---|
| 001 | Capital of France? | Paris | Paris | PASS |
| 002 | Who wrote Hamlet? | Shakespeare | William Shakespeare | PASS |
| 003 | What is 2 + 2? | 4 | The answer is five. | FAIL |
| 004 | Boiling point H₂O? | 100°C | 100 degrees Celsius | PASS |
The problem
Braintrust, Langfuse, DeepEval — powerful tools, but you need to write test execution code, host your own runner, and manage infrastructure. Lynx is different. Just paste your endpoint and we do the rest.
Other tools
Lynx
How it works
Paste your endpoint URL and an example request JSON. Lynx detects your schema and lets you pick which field is the user query.
POST https://your-agent.com/chat → {"message": "..."}A simple CSV with two columns. One row per test case. That's all Lynx needs to run your entire evaluation suite.
input, expected_outputLynx fires requests to your endpoint, extracts the response, and uses an LLM judge to score each output semantically — not just exact match.
87 / 100 passed · 87.0% · 13 failedFeatures
Built for developers shipping AI products in production.
Semantic scoring using Claude or GPT-4o mini. Partial matches, rephrased answers, and equivalents all score correctly.
Full SSE streaming support for OpenAI, Anthropic, NDJSON, and custom event formats. Tested just like regular endpoints.
Add custom request headers like Authorization or x-api-key. Encrypted before storage, never logged.
Every run is stored with row-by-row results and judge reasoning. Track score trends over time.
Watch results load in real time row by row. See exactly where your agent is failing as it happens.
Use your own Anthropic or OpenAI key for judging, or use Lynx's. Keys are encrypted with Fernet at rest.
Pricing
Start for free. Scale as you grow.
Team
Pay-per-run also available at $0.10 / run for occasional use.
No credit card required. Up and running in minutes.