Now in beta · Zero setup required

Test your LLM endpoints.
Zero setup.

Paste a URL, upload a CSV golden dataset, and get pass/fail results in seconds. No code. No infrastructure. No AWS knowledge required.

Get started free →See how it works

lynx — run #42 — completed

87.0%

score

87/100 passed

PASS

FAIL

#	INPUT	EXPECTED	ACTUAL	RESULT
001	Capital of France?	Paris	Paris	PASS
002	Who wrote Hamlet?	Shakespeare	William Shakespeare	PASS
003	What is 2 + 2?	4	The answer is five.	FAIL
004	Boiling point H₂O?	100°C	100 degrees Celsius	PASS

13 more rows not shown · judge: claude-haiku

The problem

Every eval tool makes you write code.
We handle 100% of execution.

Braintrust, Langfuse, DeepEval — powerful tools, but you need to write test execution code, host your own runner, and manage infrastructure. Lynx is different. Just paste your endpoint and we do the rest.

Other tools

✕ Write test execution code

✕ Host your own runner

✕ Manage cloud infrastructure

✕ Pay for your own LLM calls

Lynx

✓ Paste your endpoint URL

✓ Upload a CSV dataset

✓ Click Run Tests

✓ See pass/fail results

How it works

Up and running in minutes

Configure your endpoint

Paste your endpoint URL and an example request JSON. Lynx detects your schema and lets you pick which field is the user query.

POST https://your-agent.com/chat → {"message": "..."}

Upload your golden dataset

A simple CSV with two columns. One row per test case. That's all Lynx needs to run your entire evaluation suite.

input, expected_output

See results instantly

Lynx fires requests to your endpoint, extracts the response, and uses an LLM judge to score each output semantically — not just exact match.

87 / 100 passed · 87.0% · 13 failed

Features

Everything you need to test LLM agents

Built for developers shipping AI products in production.

⚖️

LLM Judge

Semantic scoring using Claude or GPT-4o mini. Partial matches, rephrased answers, and equivalents all score correctly.

⚡

Streaming Support

Full SSE streaming support for OpenAI, Anthropic, NDJSON, and custom event formats. Tested just like regular endpoints.

🔑

Auth Headers

Add custom request headers like Authorization or x-api-key. Encrypted before storage, never logged.

📊

Run History

Every run is stored with row-by-row results and judge reasoning. Track score trends over time.

⏱️

Live Progress

Watch results load in real time row by row. See exactly where your agent is failing as it happens.

🔒

Your Keys, Your Control

Use your own Anthropic or OpenAI key for judging, or use Lynx's. Keys are encrypted with Fernet at rest.

Pricing

Simple, usage-based pricing

Start for free. Scale as you grow.

Free

$0forever

✓ 3 test runs / month
✓ 50 rows per run
✓ Community support

Start free

Pro

$29/month

✓ Unlimited runs
✓ 500 rows per run
✓ Email support
✓ Run history

Get Pro

Team

$99/month

✓ Everything in Pro
✓ Multiple endpoints
✓ GitHub integration
✓ Priority support

Pay-per-run also available at $0.10 / run for occasional use.

Start testing your
LLM endpoint today.

No credit card required. Up and running in minutes.

Create free account →

Test your LLM endpoints.Zero setup.

Every eval tool makes you write code.We handle 100% of execution.