Cost per token is the atomic unit of LLM API pricing. Every interaction with a language model — whether it's a simple question or a complex agent workflow — ultimately reduces to a count of tokens multiplied by their per-token price. Understanding cost per token is essential for budgeting, optimization, and comparing providers.
How Cost Per Token Is Calculated
LLM providers express pricing in dollars per million tokens (or occasionally per 1,000 tokens). The formula for any API call is:
Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
For example, with GPT-4o ($2.50/1M input, $10.00/1M output), a request with 1,000 input tokens and 500 output tokens costs (1,000 × $2.50/1M) + (500 × $10.00/1M) = $0.0025 + $0.0050 = $0.0075.
That seems tiny for a single request, but at 1 million requests per day it adds up to $7,500 daily.
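The formula above can be sketched in a few lines of Python. The prices are GPT-4o's list prices from the example; the token counts are illustrative assumptions.

```python
# GPT-4o list prices from the example above, in dollars per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total Cost = (Input Tokens x Input Price) + (Output Tokens x Output Price)."""
    return (input_tokens * INPUT_PRICE_PER_M / 1_000_000
            + output_tokens * OUTPUT_PRICE_PER_M / 1_000_000)

# A hypothetical request: 1,000 input tokens, 500 output tokens.
per_request = request_cost(1_000, 500)   # $0.0075
daily = per_request * 1_000_000          # $7,500 at 1M requests/day
```

Scaling a sub-cent per-request cost by daily request volume is how seemingly negligible prices turn into real budgets.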
Input vs Output Token Pricing
Almost all providers charge different rates for input and output tokens. GPT-4o, for instance, charges $2.50 per million input tokens but $10.00 per million output tokens.
Why Output Tokens Cost More
Output tokens are generated auto-regressively — one at a time, each requiring a full forward pass through the model. This sequential process is computationally expensive. Input tokens, by contrast, are processed in parallel through the transformer's attention mechanism, making them cheaper per token.
The Price Ratio
The input-to-output price ratio varies by provider. GPT-4o's 1:4 ratio ($2.50 in, $10.00 out) is typical; most frontier models charge roughly three to five times more per output token than per input token.
This ratio matters for optimization: applications that generate long outputs are disproportionately affected by output token pricing.
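A quick sketch makes the asymmetry concrete: two workloads with the same total token count but opposite input/output splits produce very different bills. Prices are GPT-4o's; the token counts are assumptions for illustration.

```python
def cost(input_tokens: int, output_tokens: int,
         in_price: float = 2.50, out_price: float = 10.00) -> float:
    """Cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Same 4,200 total tokens, opposite shapes:
summarization = cost(input_tokens=4_000, output_tokens=200)  # input-heavy:  $0.0120
generation = cost(input_tokens=200, output_tokens=4_000)     # output-heavy: $0.0405
```

The output-heavy workload costs over 3× more despite identical total token usage, which is why long-generation applications feel the output rate so acutely.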
Cost Per Token Across Models
The cost-per-token landscape spans several orders of magnitude:
Frontier Models ($2-15 per million output tokens)
GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — these are the most capable models with the highest per-token costs. Best for complex reasoning, nuanced generation, and tasks where quality directly impacts business outcomes.
Mid-Tier Models ($0.50-4 per million output tokens)
GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash — smaller, faster models that handle 80% of tasks at 5-20% of the cost. Ideal for classification, extraction, simple Q&A, and high-volume use cases.
Budget Models ($0.05-0.50 per million output tokens)
Llama 3.1 8B, Mistral 7B, and other small open-source models served via API providers. Best for simple tasks like sentiment analysis, entity extraction, and template filling.
Reasoning Models ($10-60 per million output tokens)
OpenAI o1, o3 and similar reasoning-focused models charge premium prices for chain-of-thought reasoning capabilities. These models also generate many "thinking" tokens that add to output costs.
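The hidden-token effect can be sketched numerically. The $60/1M rate is the top of the range quoted above; the token counts and the assumption that thinking tokens bill at the output rate are illustrative.

```python
# Hypothetical reasoning-model request: thinking tokens are billed as
# output even though they are not shown to the user.
OUT_PRICE = 60.00 / 1_000_000   # $ per output token (top of the quoted range)

visible_answer_tokens = 500
thinking_tokens = 5_000          # can far exceed the visible answer

visible_only = visible_answer_tokens * OUT_PRICE                 # $0.03
billed = (visible_answer_tokens + thinking_tokens) * OUT_PRICE   # $0.33
```

Under these assumptions the billed cost is 11× what the visible answer alone would suggest, so estimating reasoning-model costs from answer length alone badly undershoots.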
Factors That Affect Effective Cost Per Token
Your actual cost per token may differ from list prices due to:
Volume Discounts
Enterprise agreements often include volume-based discounts. Spending $100K+/month typically unlocks 10-30% savings.
Cached Token Discounts
Many providers offer 50-90% discounts on cached input tokens — repeated prompt prefixes that don't need reprocessing.
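A sketch of the effective input cost with caching, assuming a 90% cached-token discount (the top of the range above), GPT-4o's input rate, and an illustrative 10K-token reused system prompt:

```python
IN_PRICE = 2.50 / 1_000_000   # $ per input token (GPT-4o list rate)
CACHE_DISCOUNT = 0.90         # assumed: cached tokens billed at 10% of list

def input_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Input cost when a prompt prefix is served from cache."""
    return (cached_tokens * IN_PRICE * (1 - CACHE_DISCOUNT)
            + fresh_tokens * IN_PRICE)

# 10K-token reused system prompt plus 500 new tokens per request:
with_cache = input_cost(10_000, 500)     # $0.00375
without_cache = input_cost(0, 10_500)    # $0.02625
```

For prompt-heavy workloads with a large shared prefix, caching can cut the input bill by a factor of several.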
Batch API Pricing
OpenAI and others offer 50% discounts for non-real-time batch processing, effectively halving your cost per token for async workloads.
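The batch discount is a flat multiplier on both rates. A minimal sketch, using the 50% figure from the text and GPT-4o's prices:

```python
BATCH_DISCOUNT = 0.50   # non-real-time batch processing discount

def batch_cost(input_tokens: int, output_tokens: int,
               in_price: float = 2.50, out_price: float = 10.00) -> float:
    """Batched cost in dollars; prices quoted per 1M tokens."""
    realtime = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return realtime * (1 - BATCH_DISCOUNT)

# 1M input + 1M output tokens: $12.50 realtime becomes $6.25 batched.
batched = batch_cost(1_000_000, 1_000_000)
```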
Fine-Tuned Model Pricing
Fine-tuned models sometimes have different (often higher) per-token prices but can achieve better results with shorter prompts, potentially lowering total costs.
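The trade-off can be framed as a break-even comparison. All numbers below are hypothetical: a base model that needs a long few-shot prompt versus a fine-tuned model at assumed 1.5× token rates with a short prompt.

```python
def cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Cost in dollars; prices quoted per 1M tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Base model: 3,000-token few-shot prompt at GPT-4o rates.
base = cost(in_tok=3_000, out_tok=300, in_price=2.50, out_price=10.00)       # $0.0105

# Fine-tuned model: 300-token prompt at assumed 1.5x rates.
fine_tuned = cost(in_tok=300, out_tok=300, in_price=3.75, out_price=15.00)   # $0.005625
```

Under these assumptions the fine-tuned model wins despite higher per-token prices, because the prompt shrinks by 10×; the break-even point depends entirely on how much prompt the fine-tune lets you drop.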
Optimizing Cost Per Token
Beyond choosing cheaper models, you can reduce your effective cost per token through: