Glossary

Cost Per Token

The unit price charged by LLM providers for each token processed, typically measured in dollars per million tokens for both input and output.

Cost per token is the atomic unit of LLM API pricing. Every interaction with a language model — whether it's a simple question or a complex agent workflow — ultimately reduces to a count of tokens multiplied by their per-token price. Understanding cost per token is essential for budgeting, optimization, and comparing providers.

How Cost Per Token Is Calculated

LLM providers express pricing in dollars per million tokens (or occasionally per 1,000 tokens). The formula for any API call is:

Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

For example, with GPT-4o ($2.50/1M input, $10.00/1M output):

A request with 1,000 input tokens and 500 output tokens costs:

  • Input: 1,000 × $0.0000025 = $0.0025
  • Output: 500 × $0.000010 = $0.005
  • Total: $0.0075

This seems tiny for a single request, but at 1 million requests per day, that's $7,500 daily.
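The formula above is easy to sanity-check in a few lines of Python, with the GPT-4o list prices from the example hard-coded:

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of a single API call in dollars, given $/1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o list prices: $2.50/1M input, $10.00/1M output
cost = request_cost(1_000, 500, 2.50, 10.00)
print(f"${cost:.4f}")  # → $0.0075
print(f"${cost * 1_000_000:,.0f}/day at 1M requests/day")  # → $7,500/day
```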

Input vs Output Token Pricing

Almost all providers charge different rates for input and output tokens.

Why Output Tokens Cost More

Output tokens are generated auto-regressively — one at a time, each requiring a full forward pass through the model. This sequential process is computationally expensive. Input tokens, by contrast, are processed in parallel through the transformer's attention mechanism, making them cheaper per token.

The Price Ratio

The input-to-output price ratio varies by provider:

  • OpenAI GPT-4o: 1:4 ($2.50 vs $10.00 per 1M)
  • Anthropic Claude Sonnet: 1:5 ($3.00 vs $15.00 per 1M)
  • Google Gemini Pro: 1:4 ($1.25 vs $5.00 per 1M)
  • Open-source models: Often 1:1 (same price for input and output)

This ratio matters for optimization: applications that generate long outputs are disproportionately affected by output token pricing.
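To see how the ratio bites, here is a small sketch comparing an input-heavy (RAG-style) request shape against an output-heavy (generation-style) one, using the list prices quoted above; the request shapes themselves are illustrative assumptions:

```python
# $/1M tokens (input, output), list prices as quoted above
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "gemini-pro": (1.25, 5.00),
}

def cost_per_1k_requests(model, input_tokens, output_tokens):
    """Dollar cost of 1,000 identical requests."""
    p_in, p_out = PRICES[model]
    return 1_000 * (input_tokens * p_in + output_tokens * p_out) / 1_000_000

for model in PRICES:
    rag = cost_per_1k_requests(model, 4_000, 300)   # long context, short answer
    gen = cost_per_1k_requests(model, 500, 2_000)   # short prompt, long output
    print(f"{model}: RAG ${rag:.2f} vs generation ${gen:.2f} per 1K requests")
```

Note that the generation-style shape uses fewer total tokens (2,500 vs 4,300) yet costs more on every provider here — that is the output-price premium at work.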

Cost Per Token Across Models

The cost-per-token landscape spans several orders of magnitude:

Frontier Models ($2-15 per million output tokens)

GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — these are the most capable models with the highest per-token costs. Best for complex reasoning, nuanced generation, and tasks where quality directly impacts business outcomes.

Mid-Tier Models ($0.50-4 per million output tokens)

GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash — smaller, faster models that handle 80% of tasks at 5-20% of the cost. Ideal for classification, extraction, simple Q&A, and high-volume use cases.

Budget Models ($0.05-0.50 per million output tokens)

Llama 3.1 8B, Mistral 7B, and other small open-source models served via API providers. Best for simple tasks like sentiment analysis, entity extraction, and template filling.

Reasoning Models ($10-60 per million output tokens)

OpenAI o1, o3, and similar reasoning-focused models charge premium prices for chain-of-thought reasoning capabilities. These models also generate many "thinking" tokens that add to output costs.
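Because hidden "thinking" tokens are typically billed at the output rate, the effective cost of a reasoning-model call can be sketched like this (the prices and token counts are illustrative assumptions, not a quote for any specific model):

```python
def reasoning_cost(input_tokens, visible_output_tokens, thinking_tokens,
                   input_price_per_m, output_price_per_m):
    """Cost in dollars, assuming thinking tokens bill at the output rate."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# 500 visible output tokens, but 5,000 hidden reasoning tokens
cost = reasoning_cost(1_000, 500, 5_000, 15.00, 60.00)
print(f"${cost:.3f}")  # → $0.345
```

Billing ten times as many output tokens as the visible answer contains is why per-request costs for these models can surprise.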

Factors That Affect Effective Cost Per Token

Your actual cost per token may differ from list prices due to:

Volume Discounts

Enterprise agreements often include volume-based discounts. Spending $100K+/month typically unlocks 10-30% savings.

Cached Token Discounts

Many providers offer 50-90% discounts on cached input tokens — repeated prompt prefixes that don't need reprocessing.
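A quick sketch of how a cache discount changes the input bill, assuming a shared system-prompt prefix and a discount in the advertised 50-90% range (the exact mechanics and rates vary by provider):

```python
def cached_input_cost(prefix_tokens, fresh_tokens, input_price_per_m,
                      cache_discount=0.90):
    """Input cost in dollars when the prompt prefix hits the cache.

    cache_discount is the fraction knocked off cached tokens;
    providers advertise roughly 50-90%.
    """
    cached = prefix_tokens * input_price_per_m * (1 - cache_discount)
    fresh = fresh_tokens * input_price_per_m
    return (cached + fresh) / 1_000_000

# 6,000-token system prompt reused across calls, 500 tokens of new user input
cold = cached_input_cost(6_000, 500, 2.50, cache_discount=0.0)   # cache miss
warm = cached_input_cost(6_000, 500, 2.50, cache_discount=0.90)  # cache hit
print(f"miss ${cold:.5f}, hit ${warm:.5f}")
```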

Batch API Pricing

OpenAI and others offer 50% discounts for non-real-time batch processing, effectively halving your cost per token for async workloads.

Fine-Tuned Model Pricing

Fine-tuned models sometimes have different (often higher) per-token prices but can achieve better results with shorter prompts, potentially lowering total costs.
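As a rough model (real contracts and provider terms vary), the discounts above can be stacked as multiplicative factors on the list price:

```python
def effective_output_price(list_price_per_m, volume_discount=0.0,
                           batch_discount=0.0):
    """Effective $/1M output price after stacking discounts.

    Assumes discounts compose multiplicatively, which is a
    simplification; check your actual agreement.
    """
    return list_price_per_m * (1 - volume_discount) * (1 - batch_discount)

# $10.00/1M list price, 20% volume discount, 50% batch API discount
price = effective_output_price(10.00, volume_discount=0.20, batch_discount=0.50)
print(f"${price:.2f}/1M output tokens")  # → $4.00/1M output tokens
```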

Optimizing Cost Per Token

Beyond choosing cheaper models, you can reduce your effective cost per token through:

  • Prompt compression: Techniques like LLMLingua that compress prompts while preserving meaning
  • Response length control: Using max_tokens and stop sequences to prevent unnecessarily long outputs
  • Caching: Storing and reusing responses for identical or similar queries
  • Model routing: Dynamically selecting the cheapest model that can handle each specific request
  • Token budgets: Setting per-request and per-user limits to prevent runaway costs
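Model routing, for instance, can be as simple as a threshold rule on a complexity score. The tier names, thresholds, and score source here are all hypothetical placeholders:

```python
def route(task_complexity: float) -> str:
    """Pick the cheapest tier judged able to handle the request.

    task_complexity is a 0-1 score, e.g. from a lightweight classifier;
    the thresholds are illustrative, not tuned values.
    """
    if task_complexity < 0.3:
        return "budget"    # e.g. a small open-source model
    if task_complexity < 0.7:
        return "mid"       # e.g. a mini/flash-class model
    return "frontier"
```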

🦞 How ClawHQ Helps

ClawHQ tracks your effective cost per token across every model, provider, and use case. See how caching, batching, and model routing affect your actual per-token costs versus list prices. ClawHQ's analytics help you identify which requests could be handled by cheaper models and quantify the savings from optimization strategies. Monitor cost-per-token trends over time as providers update pricing.

Frequently Asked Questions

What is the average cost per token for LLM APIs?

Cost per token varies by model tier. Frontier models like GPT-4o cost $2.50-15 per million tokens, mid-tier models like GPT-4o-mini cost $0.15-4, and budget open-source models cost $0.05-0.50 per million tokens (ranges span input through output prices). Output tokens typically cost 3-5x more than input tokens.

How many tokens does a typical API request use?

A typical chatbot exchange uses 200-500 input tokens and 100-300 output tokens. RAG applications use 2,000-8,000 input tokens. AI agents can consume 10,000-100,000+ tokens per task across multiple API calls.

How can I lower my cost per token?

Strategies include: using mid-tier models for simple tasks (5-20x savings), enabling prompt caching (50-90% input savings), using batch APIs for async work (50% discount), negotiating volume discounts, and compressing prompts to reduce token counts.

Does cost per token differ between input and output?

Yes, output tokens typically cost 3-5x more than input tokens because they are generated sequentially, requiring more computation. For example, GPT-4o charges $2.50 per million input tokens but $10.00 per million output tokens.

