Glossary

Token Pricing

The pricing model used by LLM providers where costs are calculated based on the number of tokens (text fragments) processed in API requests and responses.

Token pricing is the foundational billing model for virtually every large language model API on the market today. Understanding how token pricing works is essential for anyone building AI-powered applications, as it directly determines your operational costs and influences architectural decisions.

What Are Tokens?

Tokens are the basic units that language models use to process text. A token roughly corresponds to 3-4 characters in English, or about 75% of a word. For example, the word "hamburger" might be split into "ham," "bur," and "ger" — three tokens. Common words like "the" or "and" are typically single tokens.

Different model providers use different tokenization schemes:

  • OpenAI uses tiktoken (cl100k_base for GPT-4, o200k_base for GPT-4o)
  • Anthropic uses their own tokenizer for Claude models
  • Google uses SentencePiece for Gemini models

These differences mean the same text can result in slightly different token counts across providers.
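Without pulling in a provider's tokenizer, the rough 3-4-characters-per-token rule above can serve as a quick estimate. The sketch below uses that heuristic; exact counts require the provider's own tokenizer (e.g. tiktoken for OpenAI models), and the 4-characters divisor is an assumption that holds only roughly for English text.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.

    This is a heuristic only. For billing-accurate counts, use the
    provider's tokenizer (e.g. tiktoken for OpenAI models).
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 11
```

Because tokenizers differ across providers, treat any heuristic count as an estimate, not a billing figure.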

How Token Pricing Works

LLM APIs charge separately for:

Input Tokens (Prompt Tokens)

These are the tokens in your request — the system prompt, user message, conversation history, function definitions, and any context you provide. Input tokens are typically cheaper than output tokens because the model processes them in parallel.

Output Tokens (Completion Tokens)

These are the tokens the model generates in response. Output tokens cost more because they're generated sequentially, requiring more compute per token. The ratio between input and output pricing varies by provider but is typically 1:3 to 1:5.

Cached Tokens

Some providers offer discounted pricing for cached input tokens — portions of your prompt that are identical to recent requests. This can reduce input costs by 50-90%.
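Putting the three token types together, a single request's cost can be estimated as in the sketch below. The 50% cache discount, the token counts, and the per-million prices in the usage example are illustrative assumptions; actual discounts and rates vary by provider.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cached_tokens: int = 0, cache_discount: float = 0.5) -> float:
    """Estimate the dollar cost of one API request.

    Prices are per 1M tokens. cache_discount is the fraction taken off the
    input price for cached tokens (0.5 = 50% off) — an assumption here,
    since actual cache discounts vary by provider.
    """
    uncached = input_tokens - cached_tokens
    cost = uncached * input_price / 1_000_000
    cost += cached_tokens * input_price * (1 - cache_discount) / 1_000_000
    cost += output_tokens * output_price / 1_000_000
    return cost

# A request with 10,000 input tokens (2,000 of them cached) and 500 output
# tokens at GPT-4o's listed rates ($2.50 in / $10.00 out per 1M tokens):
print(round(request_cost(10_000, 500, 2.50, 10.00, cached_tokens=2_000), 6))
# → 0.0275
```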

Current Token Pricing Landscape (2025-2026)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Llama 3.1 70B (via API) | $0.50-0.90 | $0.50-0.90 |
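The table makes the spread between models concrete. As a sketch, pricing one hypothetical daily workload (50M input / 5M output tokens — illustrative numbers) against each model's listed rates:

```python
# Per-1M-token prices from the table above: (input, output).
PRICES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku":  (0.80, 4.00),
    "gemini-1.5-pro":    (1.25, 5.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the table's listed per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# The same hypothetical daily workload priced per model:
for model in PRICES:
    print(f"{model:18s} ${workload_cost(model, 50_000_000, 5_000_000):8.2f}")
```

Running this shows a roughly 20x gap between the cheapest and most expensive tiers for the same workload, which is the economic argument behind model tiering discussed below.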

Why Token Pricing Matters for AI Applications

Token pricing has profound implications for application design:

  • Context window costs: Sending a full 128K context window of input tokens with GPT-4o costs about $0.32 per request — just for the input. If your application makes thousands of such requests daily, costs add up fast.
  • System prompt overhead: Long system prompts are re-sent with every API call. A 2,000-token system prompt costs you those tokens on every single request.
  • Conversation history: Chat applications that send full conversation history grow linearly in cost per message. Message 50 in a conversation costs roughly 50x more than message 1.
  • Agent loops: AI agents that iterate multiple times multiply token costs with each step, as they typically re-send prior context.
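The conversation-history effect can be sketched numerically. Assuming an illustrative 2,000-token system prompt and ~200 tokens per message (both hypothetical figures), with the full history re-sent on every turn:

```python
def chat_input_tokens(messages: int, tokens_per_message: int = 200,
                      system_prompt_tokens: int = 2_000) -> list[int]:
    """Input tokens sent for each message when full history is re-sent.

    tokens_per_message and system_prompt_tokens are illustrative assumptions,
    not measurements from any particular application.
    """
    return [system_prompt_tokens + n * tokens_per_message
            for n in range(1, messages + 1)]

per_msg = chat_input_tokens(50)
print(per_msg[0], per_msg[-1], sum(per_msg))  # → 2200 12000 355000
```

Per-message input grows linearly, so the *cumulative* input tokens over a conversation grow quadratically — a 50-message chat here sends 355,000 input tokens in total, most of them repeats.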
Optimizing for Token Pricing

Smart teams optimize their token usage through several strategies:

  • Prompt engineering: Shorter, more efficient prompts that achieve the same results
  • Context management: Summarizing or truncating conversation history instead of sending everything
  • Prompt caching: Leveraging provider caching for repeated prompt prefixes
  • Model tiering: Using cheaper models for simple tasks and expensive models only when needed
  • Token budgets: Setting hard limits on token consumption per request or per user
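As a sketch of the context-management strategy, the hypothetical helper below keeps only the most recent messages that fit within a token budget, using the rough 4-characters-per-token estimate; a real implementation would count with the provider's tokenizer and might summarize dropped messages rather than discard them.

```python
def truncate_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined estimated token count
    fits within `budget` (chronological order preserved).

    Uses the ~4-chars-per-token heuristic; swap in the provider's
    tokenizer for billing-accurate counts.
    """
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        tokens = max(1, round(len(msg) / 4))
        if used + tokens > budget:
            break
        kept.append(msg)
        used += tokens
    return list(reversed(kept))             # restore chronological order
```

Trimming from the oldest end is the simplest policy; it trades recall of early context for a hard cap on per-request input cost.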
The Future of Token Pricing

Token prices have been dropping rapidly — roughly 10x every 18 months. However, usage is growing even faster as AI agents consume far more tokens than simple chatbots. The net effect is that total AI spending continues to rise even as per-token costs fall.

🦞 How ClawHQ Helps

ClawHQ tracks every token across every model and provider in real-time. See exactly how many input and output tokens each agent, feature, or team consumes. ClawHQ automatically calculates costs using the latest pricing from OpenAI, Anthropic, Google, and other providers — so you always know your true per-token spend. Identify the most expensive prompts, optimize token usage, and set budgets to control costs.

Frequently Asked Questions

What is a token in AI pricing?

A token is a fragment of text that language models use as their basic processing unit. In English, one token is roughly 3-4 characters or about 75% of a word. LLM providers charge based on the number of tokens in your input (prompt) and output (completion).

Why are output tokens more expensive than input tokens?

Output tokens cost more because they are generated sequentially — each new token requires a full forward pass through the model. Input tokens are processed in parallel, making them computationally cheaper. The typical price ratio is 1:3 to 1:5 (input:output).

How can I estimate token costs before making API calls?

Use tokenizer libraries (like tiktoken for OpenAI) to count tokens in your prompts before sending them. ClawHQ provides real-time token tracking and cost estimation across all providers, helping you predict and control spending.

Are token prices going down over time?

Yes, token prices have been dropping roughly 10x every 18 months. However, total AI spending often increases because applications are consuming far more tokens — especially AI agents that make multiple LLM calls per task.


Take Control of Your AI Costs

Take control of your AI agent fleet. Monitor, manage, and optimize — all from one command center.

Start Free Trial →