Prompt caching is a powerful cost optimization technique that allows LLM providers to store and reuse the computed representations of prompt prefixes. When subsequent API requests share the same prefix, the provider can skip reprocessing those tokens, passing the savings on to you through significantly reduced input token pricing.
How Prompt Caching Works
When you send a prompt to an LLM API, the model processes each input token through its transformer layers to build internal representations (key-value caches). This computation is a major part of what you're paying for with input tokens.
Prompt caching works by:
1. Computing the model's internal state (the key-value cache) for a prompt prefix once and storing it
2. Matching the start of each incoming request against stored prefixes; matching is exact, so a single changed token invalidates everything after it
3. Reusing the stored state for the matched tokens instead of recomputing it
4. Billing the matched tokens at a discounted rate
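The matching step can be sketched as a toy prefix store. This is purely illustrative: real providers cache transformer key-value states and checkpoint at coarse token boundaries, not string hashes.

```python
import hashlib

class PrefixCache:
    """Toy model of provider-side prefix caching (illustrative only)."""

    def __init__(self):
        self.store = set()  # hashes of prefixes "computed" before

    def _key(self, tokens):
        return hashlib.sha256("\x1f".join(tokens).encode()).hexdigest()

    def process(self, tokens):
        """Return (cached_tokens, fresh_tokens) for this request."""
        # Find the longest previously seen prefix of this prompt.
        cached = 0
        for cut in range(len(tokens), 0, -1):
            if self._key(tokens[:cut]) in self.store:
                cached = cut
                break
        # "Compute" and remember every prefix of the full prompt so a
        # future request can match partway through it.
        for cut in range(1, len(tokens) + 1):
            self.store.add(self._key(tokens[:cut]))
        return cached, len(tokens) - cached
```

The first request pays for every token; an identical second request is fully cached, and a request that merely appends to the prompt pays only for the new tokens.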
Provider-Specific Implementations
Anthropic Prompt Caching
Anthropic offers the most aggressive caching discount at 90% off cached input tokens:
- Cache reads are billed at 10% of the base input price
- Cache writes carry a one-time 25% surcharge over the base input price
- Cached prefixes have a 5-minute TTL that refreshes each time the prefix is read
- Prompts below a model-dependent minimum (1,024 tokens for most models) are not cached
You explicitly control caching with cache_control breakpoints in your messages.
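A minimal request body showing where a breakpoint goes. The shape follows Anthropic's Messages API; the model name and prompt text here are placeholders.

```python
# Everything up to and including the block tagged with cache_control is
# eligible for caching; content after it is processed fresh each call.
request_body = {
    "model": "claude-3-5-sonnet-latest",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<several thousand tokens of agent instructions>",
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    "messages": [
        {"role": "user", "content": "How do I reset my password?"}
    ],
}
```

The first request with this body writes the cache (at the write surcharge); later requests sharing the same system block read it at the discounted rate.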
OpenAI Automatic Caching
OpenAI implements automatic prompt caching with a 50% discount:
- Applies to prompts of 1,024 tokens or longer
- The longest previously seen prefix is matched automatically, in 128-token increments
- The usage object in the response reports how many prompt tokens hit the cache
- Entries typically expire after a few minutes of inactivity
OpenAI's caching is automatic — no code changes needed. The system detects repeated prefixes and caches them transparently.
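You can still verify that caching is happening: the response's usage object reports how many prompt tokens were served from cache. A small helper, assuming OpenAI's chat completions usage shape:

```python
def cached_fraction(usage: dict) -> float:
    """Fraction of prompt tokens billed at the cached rate."""
    prompt = usage.get("prompt_tokens", 0)
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

# Example usage object as it might appear in a response:
usage = {"prompt_tokens": 2000,
         "prompt_tokens_details": {"cached_tokens": 1536}}
```

A fraction near zero on repeated calls is a sign the prompt prefix is changing between requests.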
Google Gemini Context Caching
Google offers explicit context caching for Gemini models with different pricing mechanics — a per-hour storage fee plus discounted input pricing.
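Because of the storage fee, explicit caching only pays off above a certain query rate. A sketch of the break-even comparison; all prices below are hypothetical placeholders, not current Gemini list prices.

```python
def caching_breakeven(tokens, base_per_mtok, cached_per_mtok,
                      storage_per_mtok_hour, calls_per_hour):
    """Hourly cost of resending a context vs. storing it in an explicit
    cache (Gemini-style pricing: storage fee plus discounted reads).
    All prices are per million tokens."""
    mtok = tokens / 1_000_000
    without_cache = mtok * base_per_mtok * calls_per_hour
    with_cache = (mtok * storage_per_mtok_hour
                  + mtok * cached_per_mtok * calls_per_hour)
    return without_cache, with_cache

# A 100k-token document, $1.25/Mtok input, $0.3125/Mtok cached reads,
# $1.00 per Mtok-hour storage, 20 queries/hour (illustrative numbers):
without_cache, with_cache = caching_breakeven(100_000, 1.25, 0.3125, 1.00, 20)
```

At high query rates the storage fee is amortized quickly; for a context queried once an hour, resending it may be cheaper.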
When to Use Prompt Caching
Prompt caching is most effective when:
1. System Prompts Are Long
If your system prompt is 1,000+ tokens (common for agents with detailed instructions), caching it means those tokens are billed at the discounted cache-read rate on every subsequent call within the TTL window.
2. Few-Shot Examples Are Consistent
Applications that include the same few-shot examples in every prompt can cache the example section.
3. RAG with Repeated Context
If multiple user queries reference the same documents, the document context can be cached.
4. Agent Loops
AI agents make multiple LLM calls per task, often with the same system prompt and growing context. Caching the static portions saves significantly.
5. Multi-Turn Conversations
Each new message in a conversation resends the full history. The prior messages (which don't change) can be cached.
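One way to exploit this, sketched with Anthropic-style cache_control (the field shapes are illustrative), is to move the cache breakpoint to the newest message each turn, so the entire history so far becomes the cached prefix:

```python
def build_turn(system_text, history, new_user_message):
    """Build a request whose cache breakpoint sits on the newest message.

    A fuller implementation would also strip breakpoints from older
    messages; omitted here for brevity.
    """
    messages = list(history) + [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": new_user_message,
            # Cache everything up to and including this message.
            "cache_control": {"type": "ephemeral"},
        }],
    }]
    return {"system": [{"type": "text", "text": system_text}],
            "messages": messages}
```

On turn N, the system prompt plus the first N-1 exchanges are read from cache; only the newest exchange is processed at full price.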
Calculating Prompt Caching Savings
Scenario: An AI agent with a 3,000-token system prompt makes 20 LLM calls per task using Claude 3.5 Sonnet.
Without caching:
- 3,000 system-prompt tokens × 20 calls = 60,000 input tokens per task
- At Claude 3.5 Sonnet's $3.00 per million input tokens: $0.18 per task

With caching:
- Call 1 writes the cache: 3,000 tokens at $3.75 per million (the 25% write surcharge) ≈ $0.011
- Calls 2-20 read the cache: 57,000 tokens at $0.30 per million ≈ $0.017
- Total: roughly $0.028 per task
At 10,000 tasks per month, that's $1,800 vs $280 — saving $1,520 monthly just on system prompt tokens.
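The arithmetic above is easy to reproduce. The rates used are Claude 3.5 Sonnet's published input pricing, with cache writes at 1.25x base and cache reads at 0.1x base:

```python
BASE, WRITE, READ = 3.00, 3.75, 0.30   # $ per million input tokens
TOKENS, CALLS, TASKS = 3_000, 20, 10_000

# Every call pays full price for the system prompt.
without_cache = TOKENS * CALLS * TASKS * BASE / 1_000_000

# One cache write per task, then 19 discounted reads.
with_cache = (TOKENS * WRITE
              + TOKENS * (CALLS - 1) * READ) * TASKS / 1_000_000
# with_cache comes out to $283.50/month, the ~$280 figure quoted above.
```

The same template lets you plug in your own prompt size, call count, and model pricing before committing to a caching strategy.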
Best Practices for Prompt Caching
Structure Prompts for Cacheability
Place static content (system instructions, few-shot examples, reference documents) at the beginning of prompts and dynamic content (user input, retrieved data, timestamps) at the end, since cache matching is prefix-based.
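A sketch of the assembly order; the section contents are placeholders:

```python
def build_prompt(system_instructions, few_shot_examples, user_query):
    # Static sections first: identical on every call, so they form a
    # cacheable prefix. Dynamic content goes last so it never breaks
    # the prefix match. (Anti-pattern: a timestamp or user ID at the
    # top of the prompt invalidates the cache on every request.)
    return "\n\n".join([
        system_instructions,  # static
        few_shot_examples,    # static
        user_query,           # dynamic
    ])
```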
Monitor Cache Hit Rates
Track what percentage of input tokens are hitting cache. Low hit rates indicate:
- Dynamic content (timestamps, user IDs, request metadata) placed before static content, breaking the prefix match
- Calls spaced further apart than the cache TTL
- Prompt templates that vary slightly between requests (whitespace, reordered sections)
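A minimal aggregator for this, assuming OpenAI-style usage objects with a prompt_tokens_details.cached_tokens field; adapt the field names to your provider:

```python
class CacheMonitor:
    """Accumulate cache statistics across many API calls."""

    def __init__(self):
        self.prompt_tokens = 0
        self.cached_tokens = 0

    def record(self, usage: dict) -> None:
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        details = usage.get("prompt_tokens_details") or {}
        self.cached_tokens += details.get("cached_tokens", 0)

    @property
    def hit_rate(self) -> float:
        """Overall fraction of input tokens served from cache."""
        if not self.prompt_tokens:
            return 0.0
        return self.cached_tokens / self.prompt_tokens
```

Feed every response's usage object into record() and alert when hit_rate drops below whatever baseline your prompt structure should achieve.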
Combine with Other Optimizations
Prompt caching stacks with other cost optimizations:
- Model routing: caching discounts apply on top of a cheaper model's lower base rate
- Prompt compression: shorter prompts cut both cached and uncached token counts
- Batch APIs: where the provider allows it, batch and cache discounts combine
- Response caching: skip the LLM call entirely when an identical request repeats