Glossary

Cost Per Token

The unit price charged by LLM providers for each token processed, typically measured in dollars per million tokens for both input and output.

Cost per token is the atomic unit of LLM API pricing. Every interaction with a language model — whether it's a simple question or a complex agent workflow — ultimately reduces to a count of tokens multiplied by their per-token price. Understanding cost per token is essential for budgeting, optimization, and comparing providers.

How Cost Per Token Is Calculated

LLM providers express pricing in dollars per million tokens (or occasionally per 1,000 tokens). The formula for any API call is:

Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

For example, with GPT-4o ($2.50/1M input, $10.00/1M output):

A request with 1,000 input tokens and 500 output tokens costs:

  • Input: 1,000 × $0.0000025 = $0.0025
  • Output: 500 × $0.000010 = $0.005
  • Total: $0.0075

This seems tiny for a single request, but at 1 million requests per day, that's $7,500 daily.
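The formula above is easy to sanity-check in a few lines of Python, with the GPT-4o list prices from the example hard-coded:

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of a single API call in dollars, given $/1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o list prices: $2.50/1M input, $10.00/1M output
cost = request_cost(1_000, 500, 2.50, 10.00)
print(f"${cost:.4f}")  # → $0.0075
print(f"${cost * 1_000_000:,.0f}/day at 1M requests/day")  # → $7,500/day
```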

Input vs Output Token Pricing

Almost all providers charge different rates for input and output tokens.

Why Output Tokens Cost More

Output tokens are generated auto-regressively — one at a time, each requiring a full forward pass through the model. This sequential process is computationally expensive. Input tokens, by contrast, are processed in parallel through the transformer's attention mechanism, making them cheaper per token.

The Price Ratio

The input-to-output price ratio varies by provider:

  • OpenAI GPT-4o: 1:4 ($2.50 vs $10.00 per 1M)
  • Anthropic Claude Sonnet: 1:5 ($3.00 vs $15.00 per 1M)
  • Google Gemini Pro: 1:4 ($1.25 vs $5.00 per 1M)
  • Open-source models: Often 1:1 (same price for input and output)

This ratio matters for optimization: applications that generate long outputs are disproportionately affected by output token pricing.
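To see how the ratio bites, here is a small sketch comparing an input-heavy (RAG-style) request shape against an output-heavy (generation-style) one, using the list prices quoted above; the request shapes themselves are illustrative assumptions:

```python
# $/1M tokens (input, output), list prices as quoted above
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "gemini-pro": (1.25, 5.00),
}

def cost_per_1k_requests(model, input_tokens, output_tokens):
    """Dollar cost of 1,000 identical requests."""
    p_in, p_out = PRICES[model]
    return 1_000 * (input_tokens * p_in + output_tokens * p_out) / 1_000_000

for model in PRICES:
    rag = cost_per_1k_requests(model, 4_000, 300)   # long context, short answer
    gen = cost_per_1k_requests(model, 500, 2_000)   # short prompt, long output
    print(f"{model}: RAG ${rag:.2f} vs generation ${gen:.2f} per 1K requests")
```

Note that the generation-style shape uses fewer total tokens (2,500 vs 4,300) yet costs more on every provider here — that is the output-price premium at work.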

Cost Per Token Across Models

The cost-per-token landscape spans several orders of magnitude:

Frontier Models ($2-15 per million output tokens)

GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — these are the most capable models with the highest per-token costs. Best for complex reasoning, nuanced generation, and tasks where quality directly impacts business outcomes.

Mid-Tier Models ($0.50-4 per million output tokens)

GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash — smaller, faster models that handle 80% of tasks at 5-20% of the cost. Ideal for classification, extraction, simple Q&A, and high-volume use cases.

Budget Models ($0.05-0.50 per million output tokens)

Llama 3.1 8B, Mistral 7B, and other small open-source models served via API providers. Best for simple tasks like sentiment analysis, entity extraction, and template filling.

Reasoning Models ($10-60 per million output tokens)

OpenAI o1, o3, and similar reasoning-focused models charge premium prices for chain-of-thought reasoning capabilities. These models also generate many "thinking" tokens that add to output costs.
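Because hidden "thinking" tokens are typically billed at the output rate, the effective cost of a reasoning-model call can be sketched like this (the prices and token counts are illustrative assumptions, not a quote for any specific model):

```python
def reasoning_cost(input_tokens, visible_output_tokens, thinking_tokens,
                   input_price_per_m, output_price_per_m):
    """Cost in dollars, assuming thinking tokens bill at the output rate."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# 500 visible output tokens, but 5,000 hidden reasoning tokens
cost = reasoning_cost(1_000, 500, 5_000, 15.00, 60.00)
print(f"${cost:.3f}")  # → $0.345
```

Billing ten times as many output tokens as the visible answer contains is why per-request costs for these models can surprise.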

Factors That Affect Effective Cost Per Token

Your actual cost per token may differ from list prices due to:

Volume Discounts

Enterprise agreements often include volume-based discounts. Spending $100K+/month typically unlocks 10-30% savings.

Cached Token Discounts

Many providers offer 50-90% discounts on cached input tokens — repeated prompt prefixes that don't need reprocessing.
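A quick sketch of how a cache discount changes the input bill, assuming a shared system-prompt prefix and a discount in the advertised 50-90% range (the exact mechanics and rates vary by provider):

```python
def cached_input_cost(prefix_tokens, fresh_tokens, input_price_per_m,
                      cache_discount=0.90):
    """Input cost in dollars when the prompt prefix hits the cache.

    cache_discount is the fraction knocked off cached tokens;
    providers advertise roughly 50-90%.
    """
    cached = prefix_tokens * input_price_per_m * (1 - cache_discount)
    fresh = fresh_tokens * input_price_per_m
    return (cached + fresh) / 1_000_000

# 6,000-token system prompt reused across calls, 500 tokens of new user input
cold = cached_input_cost(6_000, 500, 2.50, cache_discount=0.0)   # cache miss
warm = cached_input_cost(6_000, 500, 2.50, cache_discount=0.90)  # cache hit
print(f"miss ${cold:.5f}, hit ${warm:.5f}")
```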

Batch API Pricing

OpenAI and others offer 50% discounts for non-real-time batch processing, effectively halving your cost per token for async workloads.

Fine-Tuned Model Pricing

Fine-tuned models sometimes have different (often higher) per-token prices but can achieve better results with shorter prompts, potentially lowering total costs.
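As a rough model (real contracts and provider terms vary), the discounts above can be stacked as multiplicative factors on the list price:

```python
def effective_output_price(list_price_per_m, volume_discount=0.0,
                           batch_discount=0.0):
    """Effective $/1M output price after stacking discounts.

    Assumes discounts compose multiplicatively, which is a
    simplification; check your actual agreement.
    """
    return list_price_per_m * (1 - volume_discount) * (1 - batch_discount)

# $10.00/1M list price, 20% volume discount, 50% batch API discount
price = effective_output_price(10.00, volume_discount=0.20, batch_discount=0.50)
print(f"${price:.2f}/1M output tokens")  # → $4.00/1M output tokens
```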

Optimizing Cost Per Token

Beyond choosing cheaper models, you can reduce your effective cost per token through:

  • Prompt compression: Techniques like LLMLingua that compress prompts while preserving meaning
  • Response length control: Using max_tokens and stop sequences to prevent unnecessarily long outputs
  • Caching: Storing and reusing responses for identical or similar queries
  • Model routing: Dynamically selecting the cheapest model that can handle each specific request
  • Token budgets: Setting per-request and per-user limits to prevent runaway costs
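Model routing, for instance, can be as simple as a threshold rule on a complexity score. The tier names, thresholds, and score source here are all hypothetical placeholders:

```python
def route(task_complexity: float) -> str:
    """Pick the cheapest tier judged able to handle the request.

    task_complexity is a 0-1 score, e.g. from a lightweight classifier;
    the thresholds are illustrative, not tuned values.
    """
    if task_complexity < 0.3:
        return "budget"    # e.g. a small open-source model
    if task_complexity < 0.7:
        return "mid"       # e.g. a mini/flash-class model
    return "frontier"
```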

🦞 How ClawHQ Helps

ClawHQ tracks your effective cost per token across every model, provider, and use case. See how caching, batching, and model routing affect your actual per-token costs versus list prices. ClawHQ's analytics help you identify which requests could be handled by cheaper models and quantify the savings from optimization strategies. Monitor cost-per-token trends over time as providers update pricing.

Frequently Asked Questions

What is the average cost per token for LLM APIs?

Cost per token varies by model tier. Frontier models like GPT-4o cost $2.50-15 per million tokens, mid-tier models like GPT-4o-mini cost $0.15-4, and budget open-source models cost $0.05-0.50 per million tokens (ranges span input through output prices). Output tokens typically cost 3-5x more than input tokens.

How many tokens does a typical API request use?

A typical chatbot exchange uses 200-500 input tokens and 100-300 output tokens. RAG applications use 2,000-8,000 input tokens. AI agents can consume 10,000-100,000+ tokens per task across multiple API calls.

How can I lower my cost per token?

Strategies include: using mid-tier models for simple tasks (5-20x savings), enabling prompt caching (50-90% input savings), using batch APIs for async work (50% discount), negotiating volume discounts, and compressing prompts to reduce token counts.

Does cost per token differ between input and output?

Yes, output tokens typically cost 3-5x more than input tokens because they are generated sequentially, requiring more computation. For example, GPT-4o charges $2.50 per million input tokens but $10.00 per million output tokens.

