AWS Bedrock vs OpenAI API: the real cost for production
The per-token price you see on a comparison page is the smallest part of what you will actually pay. For a production SaaS feature, the real bill is set by your traffic shape, your latency targets, and how much idle capacity you are willing to reserve. This is a practitioner breakdown of where AWS Bedrock beats the OpenAI API on cost, where it does not, and the line items that never appear in either price table.
Short answer: at low-to-medium volume with bursty traffic, the OpenAI API is usually cheaper and simpler. At sustained high volume with predictable load, Bedrock provisioned throughput often wins. The model family you pick matters more than the platform.
The headline per-token numbers
As of mid-2026, a Claude Sonnet-class model on Bedrock runs roughly $3.00 per million input tokens and $15.00 per million output tokens. A GPT-4o-class model on the OpenAI API sits near $2.50 input and $10.00 output per million tokens, with the newer flagship tier closer to $1.25 input and $5.00 output. The small, cheap tiers are where the real savings live: GPT-4o mini is about $0.15 input and $0.60 output per million tokens, and Amazon Nova Pro on Bedrock costs roughly 68 percent less than GPT-4o on many tasks.
Two things follow from those numbers. First, output tokens dominate the bill on chat and generation workloads, so the output price is the one to optimize. Second, the gap between a frontier model and a small model is 10x to 50x, which is far wider than the gap between platforms. If you are arguing about Bedrock versus OpenAI before you have right-sized the model, you are optimizing the wrong variable. We walk through that exercise in our writeup on cutting an LLM bill from $48k to $19k a month.
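To make those two points concrete, here is a minimal sketch that prices a single request from the per-million-token figures quoted above. The model keys and the 1,000-in, 1,000-out token shape are illustrative assumptions, not measurements from a real workload.

```python
# Prices in USD per million tokens, copied from the figures quoted in this article.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES = {
    "claude-sonnet-class": {"input": 3.00, "output": 15.00},
    "gpt-4o-class": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Return the input, output, and total cost in USD for one request."""
    price = PRICES[model]
    input_cost = input_tokens / 1_000_000 * price["input"]
    output_cost = output_tokens / 1_000_000 * price["output"]
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


if __name__ == "__main__":
    # Hypothetical request shape: 1,000 input tokens and 1,000 output tokens.
    for model in PRICES:
        cost = request_cost(model, 1_000, 1_000)
        share = cost["output"] / cost["total"]
        print(f"{model}: ${cost['total']:.5f} per request, output share {share:.0%}")
```

With that assumed shape, output accounts for about 80 percent of the GPT-4o-class request, and the same request on GPT-4o mini costs roughly one seventeenth as much. Swap in your own token counts, since input-heavy workloads such as retrieval shift the balance.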
Where Bedrock actually wins
Bedrock's advantage is not the list price. It is the commitment-based pricing and the AWS-native plumbing.
Provisioned throughput at sustained load
If your traffic is predictable and high, Bedrock provisioned throughput reserves model capacity for a flat hourly rate and can land 30 to 40 percent below on-demand per-token pricing. The catch is that you pay for the reserved capacity whether or not requests arrive, so it only pays off above a steady utilization floor. Spiky or low traffic burns money on idle reservation.
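The utilization floor can be estimated with simple arithmetic. The sketch below assumes the quoted 30 to 40 percent saving is measured at full utilization of the reserved capacity, which is an assumption for illustration and not a published Bedrock formula. Under that assumption, the reservation costs a fixed (1 - discount) share of what on-demand would cost at full load, so it breaks even when utilization equals 1 - discount.

```python
def breakeven_utilization(discount_at_full_load: float) -> float:
    """Utilization above which a flat-rate reservation beats on-demand.

    Assumes the reservation is `discount_at_full_load` cheaper than
    on-demand when the reserved capacity is 100 percent used.
    """
    if not 0 <= discount_at_full_load < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount_at_full_load


def reserved_to_on_demand_ratio(discount_at_full_load: float, utilization: float) -> float:
    """Reserved cost divided by on-demand cost for the same traffic.

    A value below 1.0 means the reservation is cheaper.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return (1.0 - discount_at_full_load) / utilization


if __name__ == "__main__":
    for discount in (0.30, 0.40):
        floor = breakeven_utilization(discount)
        print(f"discount {discount:.0%}: break-even at {floor:.0%} utilization")
```

Under these assumptions the floor sits between 60 and 70 percent sustained utilization. At 50 percent utilization with a 35 percent full-load discount, the reservation costs about 1.3 times the on-demand bill.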
Batch inference for non-real-time work
Both platforms discount asynchronous work by about 50 percent, through Bedrock batch inference and the OpenAI Batch API respectively. For anything that does not need a live response, such as nightly summarization, backfills, evals, or document enrichment, batch is the single largest lever on either platform. Route every non-interactive job through it.
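A quick way to size this lever is to estimate what share of your spend could tolerate asynchronous delivery. The sketch below applies the roughly 50 percent discount mentioned above to that share. The $10,000 bill and the 40 percent batchable share are hypothetical inputs.

```python
def monthly_cost_with_batch(
    on_demand_cost: float,
    batchable_share: float,
    batch_discount: float = 0.50,
) -> float:
    """Estimate the monthly bill after moving batchable work to batch pricing.

    on_demand_cost:  current monthly spend in USD, all at on-demand rates
    batchable_share: fraction of that spend that does not need a live response
    batch_discount:  discount on batched work (about 0.50 per the text above)
    """
    if not 0 <= batchable_share <= 1:
        raise ValueError("batchable_share must be in [0, 1]")
    interactive = on_demand_cost * (1 - batchable_share)
    batched = on_demand_cost * batchable_share * (1 - batch_discount)
    return interactive + batched


if __name__ == "__main__":
    # Hypothetical: $10,000 a month, 40 percent of it batchable.
    before = 10_000.0
    after = monthly_cost_with_batch(before, batchable_share=0.40)
    print(f"before ${before:,.0f}, after ${after:,.0f}")
```

In this hypothetical, moving 40 percent of spend to batch cuts the bill by a fifth without touching the model or the prompts.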
Data residency and IAM
If your stack already lives in AWS, Bedrock inherits your VPC, IAM, and audit trail with no new vendor contract. For a regulated SaaS buyer, that governance fit can outweigh a few cents per million tokens.
Where the OpenAI API stays cheaper
At low-to-medium volume the OpenAI API tends to win on total cost because you pay only for what you use and there is no reserved capacity to keep warm. The pay-per-token model fits bursty B2B SaaS traffic, where a single enterprise customer can 10x your daily volume for one afternoon and then go quiet.
Prompt caching also moves the needle here. Reusing a cached system prompt and context across calls cuts input cost substantially on repetitive workloads, which is exactly the pattern most copilots produce. We cover the mechanics in how prompt caching cuts LLM cost. The point is that a well-cached OpenAI deployment can be cheaper than an under-utilized Bedrock reservation.
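As a rough sketch of the caching arithmetic, the function below blends full-price and cached input tokens. The cached-token price multiplier is a placeholder assumption, since the actual discount varies by provider and model, and the sketch ignores any extra charge a provider may apply for writing to the cache.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_fraction: float,
    price_per_million: float,
    cached_price_multiplier: float,
) -> float:
    """Input cost in USD when part of the prompt is served from cache.

    cached_fraction:         share of input tokens that hit the cache (0 to 1)
    cached_price_multiplier: cached-token price as a fraction of the normal
                             input price (an assumption; check the provider)
    """
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be in [0, 1]")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    fresh_cost = fresh_tokens * price_per_million
    cached_cost = cached_tokens * price_per_million * cached_price_multiplier
    return (fresh_cost + cached_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical copilot: 1M input tokens, 80 percent of them a repeated prefix,
    # priced at $2.50 per million, with cached tokens assumed to cost half.
    uncached = input_cost_with_cache(1_000_000, 0.0, 2.50, 0.5)
    cached = input_cost_with_cache(1_000_000, 0.8, 2.50, 0.5)
    print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

With those assumed inputs the input bill drops from $2.50 to $1.50 per million tokens. Output tokens are not affected by caching, so the total saving depends on how input-heavy the workload is.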
The costs that never show up in the price table
Platform choice changes your engineering cost, not just your inference cost.
Switching providers means re-tuning prompts, because a prompt optimized for GPT-4o rarely behaves identically on Claude or Nova. It means rebuilding your eval set against the new model. It means new SDKs, new error handling, new rate-limit behavior. Teams routinely underestimate this and treat the two APIs as drop-in replacements. They are not.
The durable fix is to stop hard-coding a single provider. Put a routing layer in front of inference so cheap requests go to a small model and only hard requests escalate to a frontier model, regardless of platform. That architecture is the highest-leverage cost decision most teams skip, and we describe it in using model routing to cut AI costs. With routing in place, the Bedrock-versus-OpenAI question becomes a per-route decision instead of a company-wide bet.
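A routing layer can start very small. The sketch below is one hypothetical shape for it: the provider labels, model names, length threshold, and the needs_reasoning flag are all placeholders invented for illustration, and a production router would typically use a classifier or eval-backed rules instead of prompt length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Where a request is sent. Both fields are placeholder labels."""
    provider: str
    model: str


# Hypothetical routes; replace with the providers and models you actually run.
SMALL = Route(provider="openai", model="small-model")
FRONTIER = Route(provider="bedrock", model="frontier-model")


def choose_route(
    prompt: str,
    needs_reasoning: bool = False,
    max_easy_chars: int = 2_000,
) -> Route:
    """Send easy requests to the small model and escalate hard ones.

    "Hard" here is a crude stand-in: the caller flags the request as
    needing reasoning, or the prompt is longer than max_easy_chars.
    """
    if needs_reasoning or len(prompt) > max_easy_chars:
        return FRONTIER
    return SMALL


if __name__ == "__main__":
    print(choose_route("Summarize this ticket in one line."))
    print(choose_route("Draft a migration plan.", needs_reasoning=True))
```

Because each route carries its own provider, the platform choice lives in one table rather than in every call site, which is what makes the per-route comparison practical.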
How to decide
Use a simple rule. If your monthly volume is low, bursty, or still finding product-market fit, start on the OpenAI API, lean on prompt caching, and revisit at scale. If your volume is high, steady, and you already run on AWS, model a Bedrock provisioned-throughput reservation against your real utilization and compare it to on-demand. In both cases, send every non-real-time job through batch, and right-size the model per route before you argue about platforms.
If you would rather not run that modeling exercise in-house, building the right inference architecture is the kind of scoped task our team ships. See what we build and our subscription pricing.
Frequently asked questions
Is Bedrock always cheaper than OpenAI?
No. Bedrock wins on cost mainly at high, steady volume through provisioned throughput. At low or bursty volume the OpenAI API's pure pay-per-token model is usually cheaper because there is no reserved capacity sitting idle.
What is the single biggest cost lever on either platform?
Right-sizing the model per task, then routing batchable work through the 50-percent batch discount. Both beat any platform-level price difference. A frontier model is 10x to 50x the cost of a small one, so moving easy requests to a cheaper tier saves far more than switching vendors.
Can I run OpenAI models through Bedrock?
Some OpenAI-family models are now offered on Bedrock, which lets AWS-centric teams keep one billing and governance surface. Pricing and availability change often, so confirm current rates on each platform's live pricing page before you commit.
Does switching providers really cost engineering time?
Yes. Expect to re-tune prompts, rebuild your eval set, and adjust SDK and error handling. A prompt tuned for one model rarely performs identically on another, so budget real engineering hours rather than treating the swap as a config change.