Now in beta · Zero setup required

Test your LLM endpoints.
Zero setup.

Paste a URL, upload a CSV golden dataset, and get pass/fail results in seconds. No code. No infrastructure. No AWS knowledge required.

lynx — run #42 — completed

Score: 87.0% · 87/100 passed · 87 PASS · 13 FAIL

#     INPUT                 EXPECTED      ACTUAL                 RESULT
001   Capital of France?    Paris         Paris                  PASS
002   Who wrote Hamlet?     Shakespeare   William Shakespeare    PASS
003   What is 2 + 2?        4             The answer is five.    FAIL
004   Boiling point H₂O?    100°C         100 degrees Celsius    PASS
96 more rows not shown · judge: claude-haiku

The problem

Every eval tool makes you write code.
We handle 100% of execution.

Braintrust, Langfuse, and DeepEval are powerful tools, but they require you to write test-execution code, host your own runner, and manage infrastructure. Lynx is different: just paste your endpoint and we do the rest.

Other tools

Write test execution code
Host your own runner
Manage cloud infrastructure
Pay for your own LLM calls

Lynx

Paste your endpoint URL
Upload a CSV dataset
Click Run Tests
See pass/fail results

How it works

Up and running in minutes

01

Configure your endpoint

Paste your endpoint URL and an example request JSON. Lynx detects your schema and lets you pick which field is the user query.

POST https://your-agent.com/chat → {"message": "..."}
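As an illustration of the schema-detection step, here is a minimal sketch in Python. The helper name `candidate_query_fields` and the heuristic (collect every string-valued field) are assumptions for illustration, not Lynx's actual implementation:

```python
import json

def candidate_query_fields(example_json: str) -> list[str]:
    """Return dotted paths of string-valued fields in an example request body.

    Lynx's real detection heuristics are not public; this sketch simply
    walks the JSON and collects the string leaves a user could pick from.
    """
    def walk(obj, path=""):
        if isinstance(obj, dict):
            for key, value in obj.items():
                yield from walk(value, f"{path}.{key}" if path else key)
        elif isinstance(obj, str):
            yield path
    return list(walk(json.loads(example_json)))

print(candidate_query_fields('{"message": "...", "session": {"id": "abc"}}'))
# → ['message', 'session.id']
```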
02

Upload your golden dataset

A simple CSV with two columns. One row per test case. That's all Lynx needs to run your entire evaluation suite.

input, expected_output
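For example, a three-row golden dataset could look like this (the rows here are illustrative, not shipped with Lynx):

```csv
input,expected_output
Capital of France?,Paris
Who wrote Hamlet?,Shakespeare
What is 2 + 2?,4
```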
03

See results instantly

Lynx fires requests to your endpoint, extracts the response, and uses an LLM judge to score each output semantically — not just exact match.

87 / 100 passed · 87.0% · 13 failed
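The scoring arithmetic above can be sketched in a few lines of Python. The prompt shape and the helper names (`judge_prompt`, `score`) are assumptions for illustration; Lynx's actual judge prompt is not public:

```python
def judge_prompt(user_input: str, expected: str, actual: str) -> str:
    # Hypothetical prompt sent to the judge model (Claude or GPT-4o mini).
    return (
        "You are grading an LLM response.\n"
        f"Question: {user_input}\n"
        f"Expected: {expected}\n"
        f"Actual: {actual}\n"
        "Reply PASS if the actual response is semantically equivalent to "
        "the expected one, otherwise FAIL."
    )

def score(verdicts: list[str]) -> tuple[int, int, float]:
    # verdicts: one "PASS"/"FAIL" string per row, as returned by the judge.
    passed = verdicts.count("PASS")
    return passed, len(verdicts), 100.0 * passed / len(verdicts)

print(score(["PASS"] * 87 + ["FAIL"] * 13))
# → (87, 100, 87.0)
```

Note that a semantic judge is what lets "William Shakespeare" pass against an expected "Shakespeare" while "The answer is five." fails against "4".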

Features

Everything you need to test LLM agents

Built for developers shipping AI products in production.

⚖️

LLM Judge

Semantic scoring using Claude or GPT-4o mini. Partial matches, rephrased answers, and equivalents all score correctly.

⚡

Streaming Support

Full SSE streaming support for OpenAI, Anthropic, NDJSON, and custom event formats. Tested just like regular endpoints.
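To give a feel for what parsing an OpenAI-style SSE stream involves, here is a minimal sketch. The function name and the chunk shape are assumptions; real provider streams vary, which is why format-specific support matters:

```python
import json

def extract_sse_text(stream: str) -> str:
    """Collect assistant text from an OpenAI-style SSE stream (sketch only).

    Assumes each event is a `data: {...}` line carrying a chat-completion
    chunk, terminated by a `data: [DONE]` sentinel.
    """
    parts = []
    for line in stream.splitlines():
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alives, other fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

sample = (
    'data: {"choices":[{"delta":{"content":"Par"}}]}\n'
    'data: {"choices":[{"delta":{"content":"is"}}]}\n'
    "data: [DONE]\n"
)
print(extract_sse_text(sample))
# → Paris
```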

🔑

Auth Headers

Add custom request headers like Authorization or x-api-key. Encrypted before storage, never logged.

📊

Run History

Every run is stored with row-by-row results and judge reasoning. Track score trends over time.

⏱️

Live Progress

Watch results load in real time row by row. See exactly where your agent is failing as it happens.

🔒

Your Keys, Your Control

Use your own Anthropic or OpenAI key for judging, or use Lynx's. Keys are encrypted with Fernet at rest.
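Fernet is the symmetric authenticated-encryption scheme from Python's `cryptography` package. A minimal sketch of the encrypt-at-rest roundtrip it provides (the dummy key value here and all key-management details are assumptions, not Lynx's setup):

```python
from cryptography.fernet import Fernet

# A master key would normally come from a secrets manager;
# here we just generate one for the demonstration.
master_key = Fernet.generate_key()
f = Fernet(master_key)

token = f.encrypt(b"sk-dummy-api-key")        # ciphertext stored at rest
assert f.decrypt(token) == b"sk-dummy-api-key"  # recoverable only with the key
print("roundtrip ok")
```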

Pricing

Simple, usage-based pricing

Start for free. Scale as you grow.

Free

$0 forever
  • 3 test runs / month
  • 50 rows per run
  • Community support
Start free

Pro

$29/month
  • Unlimited runs
  • 500 rows per run
  • Email support
  • Run history
Get Pro

Team

$99/month
  • Everything in Pro
  • Multiple endpoints
  • GitHub integration
  • Priority support
Contact us

Pay-per-run also available at $0.10 / run for occasional use.

Start testing your
LLM endpoint today.

No credit card required. Up and running in minutes.

Create free account →