Tutorials › Generative AI Engineering › LLM Cost Optimisation

LLM Cost Optimisation

5 min read Quiz at the end

Cut LLM costs with model routing, prompt caching, output limits, semantic caching, and batch processing.

LLM Cost Optimisation

# API costs (approximate 2024)
# Claude Haiku:    $0.25/$1.25 per million tokens (in/out)
# Claude Sonnet:   $3/$15 per million tokens
# Claude Opus:     $15/$75 per million tokens
# GPT-4o mini:     $0.15/$0.60 per million tokens
# GPT-4o:          $5/$15 per million tokens

# Strategy 1: Model routing
def choose_model(task_complexity: str) -> str:
    if task_complexity == 'simple':
        return 'claude-haiku-4-5'      # 60x cheaper than Opus
    elif task_complexity == 'medium':
        return 'claude-sonnet-4-5'
    return 'claude-opus-4-5'

# Strategy 2: Prompt caching (Anthropic)
# Mark static system prompts with cache_control
# Up to 90% discount on cached token reads

# Strategy 3: Output length control
# 'Respond in 100 words or fewer.'
# max_tokens=150  # hard cap

# Strategy 4: Semantic cache
# Cache responses for similar questions
# Threshold: cosine_sim >= 0.95 = cache hit

# Strategy 5: Batch API
# OpenAI: 50% discount for async batch jobs
# Anthropic: batch API for non-real-time tasks

# Strategy 6: Compress prompts
# Remove redundant instructions
# Use abbreviations in few-shot examples

← LLM Evaluation Next: Guardrails and Safety →

Topic Quiz · 1 questions

Test your understanding before moving on

1. What is the cheapest cost optimisation strategy?

💡 Model routing — using cheap models (Haiku, GPT-4o-mini) for simple tasks — gives the biggest cost savings.

Quick Access

LLM Cost Optimisation

LLM Cost Optimisation

Test your understanding before moving on