The Costs You Don't See
You check your OpenAI dashboard: $1,200 this month. Seems reasonable for 10 agents. But hidden inside that number are hundreds of dollars in waste โ costs that are completely avoidable if you know where to look.
These hidden costs are invisible in provider billing dashboards. You need per-agent, per-task cost tracking to find them. Here are the worst offenders.
Hidden Cost #1: Retry Loops ($$$)
When a task fails and retries, you pay for every attempt. A task that costs $0.05 normally can cost $0.25 if it retries 5 times. At scale, retry costs can be 20-30% of your total bill.
Common causes:
- Rate limit errors that trigger immediate retries (backoff not configured)
- Output parsing failures that trigger full task reruns
- Timeout errors from overloaded models
Fix: Implement exponential backoff. Fall back to cheaper models on retry. Set max retry limits. Track retry rates in ClawHQ.
Hidden Cost #2: Context Window Bloat ($$$)
Every message in a conversation accumulates in the context window. A 10-turn conversation with 2,000 tokens per turn means your 10th message includes 20,000 tokens of context โ and you pay for all of it as input tokens.
Fix: Implement context summarization. Trim old messages. Use shorter system prompts. Set maximum conversation lengths.
Hidden Cost #3: Wrong Model Selection ($$$$)
The biggest hidden cost, and the easiest to fix. Using Claude Opus ($15/1M input) for tasks that Claude Haiku ($0.25/1M input) handles perfectly is 60x more expensive โ for the same result.
Fix: Use ClawHQ's model optimization tab to identify which tasks use which models and where cheaper alternatives work.
Hidden Cost #4: Verbose System Prompts ($$)
A 3,000-token system prompt costs $0.045 on GPT-4 for every single API call. If the agent makes 500 calls/day, that's $22.50/day just for the system prompt. Trimming to 1,000 tokens saves $15/day = $450/month.
Fix: Audit system prompts. Remove redundant instructions. Use variables instead of repeated text.
Hidden Cost #5: Unbounded Output ($$)
Without max_tokens set, models sometimes generate lengthy responses when a short one would do. An agent asked to classify a ticket might write a 500-word explanation instead of returning "billing_issue".
Fix: Set max_tokens. Use structured output (JSON). Be explicit about response format in prompts.
Hidden Cost #6: Duplicate Processing ($)
The same email gets processed twice. The same document gets summarized three times. Without deduplication, you're paying for redundant work.
Fix: Implement task deduplication. Enable response caching. Track task IDs to prevent reprocessing.
Finding Your Hidden Costs
Use ClawHQ to hunt for waste:
- Sort tasks by cost: Find the most expensive individual tasks โ they often reveal retry or bloat issues
- Compare agents: If two agents do similar work but one costs 3x more, investigate
- Check model distribution: Are expensive models handling simple tasks?
- Track cost per task over time: Rising per-task costs indicate context bloat or regression



