Cost playbook
How to Add Claude to Your MVP Without Burning Your Runway
Short answer
Claude is cheap if you architect for cost from day one. Use Haiku for high-volume simple tasks, Sonnet for the workflow's hardest step, Opus only for genuinely difficult problems. Enable prompt caching - it cuts cost 70-90% on most MVP usage. The first-month bill for a 500-user MVP with AI features is typically under $100, not $1,000.
Published April 29, 2026 · Last updated April 29, 2026
The cost panic, defused
Founders open the Anthropic pricing page, see "$15 per million input tokens" for Opus, multiply by their imagined usage, and conclude AI is too expensive. That math fails twice: nobody actually uses Opus for everything, and "per million tokens" is far more usage than an early-stage MVP actually generates.
Real numbers from real MVPs I've shipped: a 500-user app where every user runs 5 AI tasks per day costs roughly $30-$80/month total in API fees. Not per user. Total. The variance comes from prompt length and which model you use - choices you control.
If you're seeing 10x that number in your modeling, you're either modeling Opus for everything or skipping caching. Both are fixable.
Pattern 1: Pick the cheapest model that works
Anthropic's lineup is roughly: Haiku (fastest, cheapest), Sonnet (balanced, the default for most production), Opus (most capable, most expensive).
Use Haiku for: Classification, formatting, simple extraction, summarizing short text, routing user input to the right downstream prompt, anything where the task is bounded. Haiku is dramatically cheaper than Sonnet and faster - often the right pick for >50% of an MVP's AI calls.
Use Sonnet for: The user-facing core of the workflow. The thing the user is paying for. Long-context reasoning, multi-step instructions, code, structured output that has to be right.
Use Opus for: Genuinely hard reasoning. Multi-step planning. Complex code refactors. Tasks where Sonnet struggles. For most MVPs this is <5% of calls. Don't default to Opus.
Founders who use Sonnet for everything pay 5-15x what they need to. Founders who use Opus for everything are setting fire to their runway.
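A minimal routing sketch using the Python SDK. The "-latest" model aliases are placeholders (check Anthropic's docs for current model names), and the SIMPLE/COMPLEX triage prompt is an illustrative assumption, not a prescription:

```python
# Sketch: a cheap Haiku call classifies the request, so only the hard
# path pays Sonnet prices. Model IDs are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

HAIKU = "claude-3-5-haiku-latest"    # placeholder model ID
SONNET = "claude-3-5-sonnet-latest"  # placeholder model ID

def route(user_input: str) -> str:
    # Step 1: Haiku decides whether this is a bounded task or real reasoning.
    triage = client.messages.create(
        model=HAIKU,
        max_tokens=5,
        system="Reply with exactly one word: SIMPLE or COMPLEX.",
        messages=[{"role": "user", "content": user_input}],
    )
    label = triage.content[0].text.strip().upper()

    # Step 2: only COMPLEX requests get the expensive model.
    answer = client.messages.create(
        model=SONNET if label == "COMPLEX" else HAIKU,
        max_tokens=1024,
        messages=[{"role": "user", "content": user_input}],
    )
    return answer.content[0].text
```

The triage call costs a handful of Haiku tokens and keeps Sonnet spend limited to the requests that need it.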
Pattern 2: Enable prompt caching (the 70-90% cost cut)
Most AI calls in an MVP have a static front-half (system prompt, instructions, examples, document context) and a dynamic tail (the user's question). Caching the static front-half means you pay for it once instead of every call.
Anthropic's prompt caching is a one-line addition to the API request. The cached portion costs roughly 10% of the normal input price on cache hits.
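In the Python SDK, that one line is a `cache_control` marker on the static system block. A minimal sketch (the model ID is a placeholder):

```python
import anthropic

client = anthropic.Anthropic()

LONG_INSTRUCTIONS = "..."  # your ~5,000-token static prompt, examples, context

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model ID
    max_tokens=600,
    system=[
        {
            "type": "text",
            "text": LONG_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"},  # the "one line"
        }
    ],
    # The dynamic tail stays outside the cached prefix.
    messages=[{"role": "user", "content": "What does clause 4 mean?"}],
)

# response.usage exposes cache_creation_input_tokens and
# cache_read_input_tokens, so you can verify hits in production.
```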
Real-world impact: An app with a 5,000-token system prompt and 200-token user questions, called 1,000 times a day, costs ~$15/day without caching and ~$2/day with caching. That's the difference between a $450/month AI bill and a $60/month bill.
Founders who skip caching and complain about API costs are leaving the cheapest optimization in the world untouched. Implement this before you implement anything else.
Pattern 3: Don't make the model do things code can do
Bad: Asking Claude to extract a date from a string, or split text on commas, or count words. These are 3 lines of code. Burning tokens on them is wasteful.
Good: Use code for the deterministic plumbing (parsing, validation, retries, splitting work into chunks). Use Claude only for the part that requires actual reasoning or natural-language understanding.
A common MVP mistake is wrapping everything in a single giant Claude prompt because it feels easier. The result is slow, expensive, and unreliable. Split the workflow: code does the easy steps, Claude does the hard ones.
If you find yourself prompt-engineering around something a regex would do, stop and write the regex.
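A sketch of the split, with an illustrative regex handling the deterministic step and Claude reserved for the part that needs language understanding:

```python
import re
import anthropic

ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def extract_date(text: str) -> str | None:
    # Three lines of code, zero tokens.
    match = ISO_DATE.search(text)
    return match.group(1) if match else None

def summarize(client: anthropic.Anthropic, text: str) -> str:
    # The step that genuinely needs a model.
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder model ID
        max_tokens=300,
        messages=[{"role": "user",
                   "content": f"Summarize in two sentences:\n\n{text}"}],
    )
    return response.content[0].text
```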
Pattern 4: Stream when the user is waiting; batch when they're not
Streaming (sending tokens to the user as they're generated) makes UX feel 5x faster even when total time is identical. Use streaming for any chat or generation the user is staring at.
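With the Python SDK, streaming is a context manager. A minimal sketch; in a real app you'd forward chunks over SSE or a WebSocket instead of printing them:

```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-3-5-sonnet-latest",  # placeholder model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Draft a launch announcement."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # tokens reach the user as generated
```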
Batching (sending many requests at once, processing async) is for tasks where the user doesn't have to wait - overnight processing, bulk operations, generating reports. Anthropic's Batch API is significantly cheaper than synchronous calls for this.
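A minimal batch-submission sketch (the model ID is a placeholder; `custom_id` ties each result back to your own records):

```python
import anthropic

client = anthropic.Anthropic()

documents = ["first doc text ...", "second doc text ..."]  # the user's uploads

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-haiku-latest",  # placeholder model ID
                "max_tokens": 600,
                "messages": [{"role": "user",
                              "content": f"Summarize:\n\n{doc}"}],
            },
        }
        for i, doc in enumerate(documents)
    ]
)

# Poll with client.messages.batches.retrieve(batch.id) and fetch output
# via client.messages.batches.results(batch.id) once processing ends.
```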
Mixing them: a user uploads 50 documents to summarize. Accept the upload, return immediately, process via the Batch API in the background, and email when done. The user gets results in 30 minutes instead of staring at a spinner for 5 minutes - and the batch discount roughly halves your cost.
Pattern 5: Cap usage per user, then charge for more
Free tier with hard limit. Paid tier with higher limit. Pay-as-you-go for power users. This is standard SaaS pricing applied to AI.
Why this matters: AI cost scales with usage. SaaS revenue scales with users. If your free tier lets a single user generate $50/month in API calls, the math doesn't work no matter how good the product is.
Practical setup: Track tokens per user per month. Cap free users at a level where the API cost is <20% of what a converting user would pay. When they hit the cap, show a paywall that states exactly what they used and what upgrading unlocks.
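A sketch of the cap, using an in-memory counter for illustration (swap in Redis or your database in production; the 200k-token threshold is an assumed example, not a recommendation):

```python
import datetime
from collections import defaultdict

FREE_TIER_MONTHLY_TOKENS = 200_000  # set so free-tier cost < 20% of plan price

_usage = defaultdict(int)  # replace with Redis/Postgres in production

class UsageCapExceeded(Exception):
    pass

def record_and_check(user_id: str, input_tokens: int, output_tokens: int) -> None:
    """Call after each API response with the token counts from response.usage."""
    month = datetime.date.today().strftime("%Y-%m")
    key = (user_id, month)
    _usage[key] += input_tokens + output_tokens
    if _usage[key] > FREE_TIER_MONTHLY_TOKENS:
        # Surface exactly what they used and what upgrading unlocks.
        raise UsageCapExceeded(
            f"{_usage[key]:,} tokens used this month "
            f"(free cap: {FREE_TIER_MONTHLY_TOKENS:,}). Upgrade to continue."
        )
```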
Founders who skip the cap end up either bleeding money or shutting off the AI feature in panic. Either is worse than charging from day one.
A realistic first-month bill
Hypothetical MVP: doc summarizer SaaS, 500 signups in month 1, ~30% activation (~150 users who try it), of whom ~50 stay active, running 4 summaries/day on average. Each summary is a 4,000-token doc + 600-token output.
Without optimization: Sonnet for everything, no caching, full doc re-sent each call. Roughly $180-$220/month.
With optimization: Haiku for the routing/triage step, Sonnet for the actual summary, prompt cache enabled on the system prompt and instructions, code-side dedup of repeated docs. Roughly $35-$60/month.
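Here's the back-of-envelope arithmetic behind those two numbers. Every input is an assumption - Sonnet at $3/$15 per million input/output tokens (check the current pricing page), a ~3,000-token system prompt, a ~60% repeat-doc rate, negligible Haiku triage cost omitted - so plug in your own:

```python
SONNET_IN, SONNET_OUT = 3.00, 15.00  # $/MTok, assumed; verify current pricing
CACHE_READ = SONNET_IN * 0.10        # cache hits bill ~10% of input price

calls = 50 * 4 * 30                  # 50 actives x 4 summaries/day x 30 days
doc_toks, out_toks, sys_toks = 4_000, 600, 3_000

# Unoptimized: system prompt and doc re-sent in full on every call.
naive = calls * ((doc_toks + sys_toks) * SONNET_IN
                 + out_toks * SONNET_OUT) / 1e6

# Optimized: cached system prompt, code-side dedup skips repeat docs entirely.
unique_calls = calls * (1 - 0.6)     # assumed 60% dedup hit rate
optimized = unique_calls * (doc_toks * SONNET_IN + sys_toks * CACHE_READ
                            + out_toks * SONNET_OUT) / 1e6

print(f"naive ~${naive:.0f}/mo, optimized ~${optimized:.0f}/mo")  # ~$180 vs ~$53
```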
What changed: Same product, same users, same output quality. Different cost structure. The optimizations took half a day to ship.
If your MVP cost projections show a $1,000+/month AI bill on a few hundred users, the issue is almost always architectural, not pricing. Audit the patterns above before assuming AI is too expensive.
When to upgrade past Haiku/Sonnet
Stay on Haiku as long as it works. Test on 20 real prompts; if outputs are acceptable, you're done. Don't upgrade out of nervousness.
Move to Sonnet when: Haiku misses the nuance on your hardest 10% of cases. Run a side-by-side eval (see the sketch below). If Sonnet wins meaningfully on those edge cases, upgrade those calls (only those calls) to Sonnet.
Move to Opus when: Sonnet gets the right answer 70% of the time and you need 95%. This usually means the task is multi-step reasoning, complex planning, or unusually hard code. Use Opus for the specific call, not the whole workflow.
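A bare-bones side-by-side harness for those evals (model IDs are placeholders; judging is deliberately manual at MVP scale):

```python
import anthropic

client = anthropic.Anthropic()
MODELS = ["claude-3-5-haiku-latest", "claude-3-5-sonnet-latest"]  # placeholders

def run_eval(prompts: list[str]) -> None:
    # Print each model's answer side by side for manual review.
    for prompt in prompts:
        print(f"\n=== {prompt[:60]} ===")
        for model in MODELS:
            response = client.messages.create(
                model=model,
                max_tokens=512,
                messages=[{"role": "user", "content": prompt}],
            )
            print(f"\n[{model}]\n{response.content[0].text}")

# Feed it ~20 prompts sampled from production logs, including your hardest
# 10%; upgrade only the calls where the bigger model clearly wins.
```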
Don't fall for "latest model" theater. The latest model isn't automatically the right one. Stick with what works for your task; only upgrade when you have eval data showing it's worth the cost.