How LLMs Work
5 min read · Quiz at the end
Understand tokens, context windows, temperature, and sampling to prompt with intent and precision.
How LLMs Work — What Prompt Engineers Must Know
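- Tokens: text is split into sub-word pieces (roughly 4 characters each for English). The context window is the maximum number of tokens the model can see at once, typically prompt and response combined.
- Temperature: controls randomness. 0 = near-deterministic (always picks the most likely token), 1 = more varied and creative.
- Top-p: nucleus sampling. The model samples only from the most likely tokens whose cumulative probability reaches p (a fraction such as 0.9, not a percentage).
- System prompt: persistent instructions placed before the conversation.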
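To make the top-p definition concrete, here is a minimal sketch of nucleus filtering on a toy distribution. The function name `top_p_filter` and the probabilities are made up for illustration; real inference stacks operate on full vocabularies and may order the sampling steps differently.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p, then renormalise so the kept values sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Hypothetical next-token probabilities (illustrative numbers only).
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

# With p=0.9, "rock" falls outside the nucleus and can never be sampled.
nucleus = top_p_filter(probs, p=0.9)
print(sorted(nucleus))

# Sample one token from the filtered distribution.
rng = random.Random(0)
next_token = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(next_token)
```

Lowering p shrinks the candidate set, which is why a small top-p tends to give safer, more predictable output.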
# Temperature guide
temp=0.0 -- factual, consistent (math, code, extraction)
temp=0.3 -- balanced (summarisation, Q&A)
temp=0.7 -- creative (writing, brainstorming)
temp=1.0+ -- highly creative / experimental
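The guide above can be illustrated with a small sketch of how temperature reshapes a probability distribution: scores are divided by the temperature before the softmax. The logits are invented, and treating temperature 0 as "pick the top token" is an assumption made here for illustration; a given API may handle 0 differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (all mass on the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.0, 0.3, 0.7, 1.0):
    dist = softmax_with_temperature(logits, t)
    print(f"temp={t}: {[round(p, 3) for p in dist]}")
```

Lower temperatures concentrate probability on the top token; higher ones flatten the distribution so less likely tokens get picked more often.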
# Context windows (approximate)
Claude 3.5 Sonnet: 200K tokens (~150K words)
GPT-4o: 128K tokens
Gemini 1.5 Pro: 1M tokens
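As a rough sketch of how these limits are used in practice, the ~4 characters per token rule of thumb can give a quick estimate of whether a prompt fits. The helper names and the `reserved_for_output` budget are hypothetical; for real work, count tokens with the provider's own tokenizer, since the heuristic is only approximate and varies by language.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text, not an exact figure

# Approximate limits taken from the list above.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}

def estimate_tokens(text):
    """Estimate the token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits(text, model, reserved_for_output=1_000):
    """Check whether the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOWS[model]

document = "a" * 600_000  # stand-in for a long document (about 150K estimated tokens)
for model in CONTEXT_WINDOWS:
    print(model, fits(document, model))
```

Reserving part of the window for the response matters because the prompt and the generated output typically share the same token budget.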
Topic Quiz · 2 questions
Test your understanding before moving on
1. What does temperature=0 produce?
💡 Temperature 0 makes the model effectively deterministic: it always chooses the highest-probability token, though outputs can still vary slightly between runs in practice.
2. What is a context window?
💡 The context window is the maximum number of tokens the model can consider in a single API call.