
How LLMs Work

Understand tokens, context windows, temperature, and sampling to prompt with intent and precision.

How LLMs Work — What Prompt Engineers Must Know

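  • Tokens: text is split into sub-word pieces, roughly 4 characters each in English. The context window is the maximum number of tokens the model can see at once, counting both the prompt and the generated output.
  • Temperature: controls randomness. 0 = always pick the most likely token (effectively deterministic); 1 = sample from the model's unmodified distribution (more varied, "creative" output).
  • Top-p: nucleus sampling. Only the smallest set of most likely tokens whose combined probability reaches p (a fraction such as 0.9, not a percentage) is considered.
  • System prompt: persistent instructions placed before the conversation.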
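
To make temperature and top-p concrete, here is a minimal sketch of how the two settings could reshape a next-token distribution. It is a toy illustration, not any provider's actual implementation: the function names, the four-word vocabulary, and the logit values are all made up for this example.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top token).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalise over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after applying temperature, then top-p."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

# Hypothetical vocabulary and scores, purely for illustration.
vocab = ["Paris", "London", "Rome", "banana"]
logits = [4.0, 2.0, 1.0, -2.0]

for temp in (0.0, 0.7, 1.5):
    picks = [vocab[sample_token(logits, temperature=temp, top_p=0.95)] for _ in range(5)]
    print(f"temp={temp}: {picks}")
```

With these scores, temperature 0 always returns the top token, while higher temperatures let lower-ranked tokens through more often. Lowering top_p trims the unlikely tail (here, "banana") before sampling.
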
# Temperature guide
temp=0.0  -- factual, consistent (math, code, extraction)
temp=0.3  -- balanced (summarisation, Q&A)
temp=0.7  -- creative (writing, brainstorming)
temp=1.0+ -- highly creative / experimental

# Context windows (approximate)
Claude 3.5 Sonnet: 200K tokens (~150K words)
GPT-4o:            128K tokens
Gemini 1.5 Pro:    1M tokens
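
The ~4 characters per token rule and the window sizes above can be combined into a quick budget check before sending a prompt. The sketch below is only a rough estimate: real tokenizers differ by model and language, and the helper names and the `reserved_for_output` default are assumptions chosen for illustration.

```python
CHARS_PER_TOKEN = 4  # rough heuristic from this guide; real tokenizers vary

# Approximate context windows, in tokens, as listed above.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

def estimate_tokens(text):
    """Approximate the token count using ~4 characters per token (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(text, model, reserved_for_output=1_000):
    """Check whether the prompt likely fits, leaving room for the reply.

    reserved_for_output is a hypothetical allowance for generated tokens,
    since the prompt and the output share the same window.
    """
    budget = CONTEXT_WINDOWS[model] - reserved_for_output
    return estimate_tokens(text) <= budget

document = "x" * 600_000  # stand-in for a long document, about 150K estimated tokens
for model in CONTEXT_WINDOWS:
    print(model, fits_in_context(document, model))
```

For exact counts, use the tokenizer or token-counting endpoint of the model you are calling; this heuristic is only good enough to catch prompts that are clearly too large.
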
Topic Quiz · 2 questions

Test your understanding before moving on

1. What does temperature=0 produce?
💡 Temperature 0 makes the model effectively deterministic: it always chooses the highest-probability token.
2. What is a context window?
💡 The context window limits how many tokens (prompt plus generated output) the model can consider in a single API call.