The customer
A Series A B2B SaaS burning roughly $48K/month in LLM API spend across two AI features (a chat copilot and a summarisation pipeline). Both shipped fast in 2024 and were never tuned afterwards.
The task they submitted
“Audit our LLM usage end-to-end. Find spend we can cut without losing quality. Build the eval suite that proves it.”
Our approach
Day 1: Traffic profiling. 80% of spend was concentrated on a single endpoint that didn't need GPT-4-class reasoning.
Day 2: Built a 420-question eval suite from 90 days of real customer logs, with automated grading.
Day 3: Switched the bulk path to a smaller open model behind a vLLM gateway, tuned the prompts, and added prompt caching to the remaining frontier-model calls.
Day 4: Rolled out behind a 10% canary, validated against the eval suite, then went to 100%.
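The Day 4 canary can be sketched with a few lines of routing logic. This is an illustrative sketch, not the customer's actual gateway code: `canary_bucket` and the model names are hypothetical, and the key idea is that hashing a stable request or user ID gives each request a deterministic bucket, so the same caller stays on the same model for the whole rollout and raising the percentage never reshuffles earlier assignments.

```python
import hashlib

def canary_bucket(request_id: str, rollout_pct: int) -> str:
    """Deterministically route a request to the canary or control model.

    Hashing the request ID yields a stable bucket in 0-99, so a given
    request always lands on the same path at a given rollout percentage.
    Model names here are placeholders, not the customer's real config.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "small-model" if bucket < rollout_pct else "frontier-model"

# At a 10% rollout, roughly one request in ten hits the smaller model;
# moving rollout_pct to 100 shifts all traffic without re-bucketing.
routes = [canary_bucket(f"req-{i}", 10) for i in range(1000)]
share = routes.count("small-model") / len(routes)
```

Gating the percentage increase on eval-suite results, as in the rollout above, is what turns a cost cut into a cost cut with a quality guarantee.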
The outcome
$48K → $19K monthly run-rate. Zero regressions on the eval suite. The customer used the savings to fund three more Boundev tasks over the following quarter.
“$48K to $19K a month, no quality regression. Paid for two years of Growth in the first month.”
