
How LLMs Work


Understand tokens, context windows, temperature, and sampling to prompt with intent and precision.

How LLMs Work — What Prompt Engineers Must Know

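Tokens: text is split into sub-word pieces, roughly 4 characters each in English. The context window is the maximum number of tokens (prompt plus output) the model can see at once.
Temperature: controls sampling randomness. 0 = always pick the most likely token (near-deterministic); around 1 = more varied, creative output.
Top-p: nucleus sampling. The model samples only from the smallest set of most likely tokens whose combined probability reaches p (e.g. p = 0.9).
System prompt: persistent instructions placed before the conversation that apply to every turn.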
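
To make the top-p definition above concrete, here is a minimal sketch of nucleus filtering over a toy next-token distribution. The function name `top_p_filter` and the probabilities are made up for illustration; real inference stacks apply the same idea to the model's full vocabulary and may differ in details such as tie handling.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalise so the kept set sums to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete; drop the remaining tail
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token distribution (illustrative numbers only)
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
nucleus = top_p_filter(probs, 0.9)
print(sorted(nucleus))  # the unlikely "rock" token is excluded
```

Lowering p shrinks the candidate set, which is why a small top-p tends to make output more conservative even at a higher temperature.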

# Temperature guide
temp=0.0  -- factual, consistent (math, code, extraction)
temp=0.3  -- balanced (summarisation, Q&A)
temp=0.7  -- creative (writing, brainstorming)
temp=1.0+ -- highly creative / experimental
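
The settings in the guide above can be understood through the usual formulation of temperature: logits are divided by the temperature before the softmax. The sketch below assumes that formulation and uses invented logits; treating temperature 0 as a greedy pick is a convention adopted here, since dividing by zero is undefined.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 puts all mass on the top-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.3, 0.7, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, probability mass moves from the top token toward the alternatives, which is what makes sampled output more varied.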

# Context windows (approximate)
Claude 3.5 Sonnet: 200K tokens (~150K words)
GPT-4o:            128K tokens
Gemini 1.5 Pro:    1M tokens
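
As a rough planning aid, the window sizes above can be combined with the ~4 characters per token rule of thumb to check whether a prompt is likely to fit. This is only an estimate: the helper names and the `reserved_for_output` default are assumptions for illustration, and an exact count requires the model's own tokenizer.

```python
import math

# Approximate context windows from the list above, in tokens
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using the ~4 characters per token heuristic."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text, model, reserved_for_output=1_000):
    """Check whether the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOWS[model]


document = "a" * 600_000  # stand-in for a long document of about 150K tokens
for model in CONTEXT_WINDOWS:
    print(model, fits_in_context(document, model))
```

Reserving room for the reply matters because the prompt and the generated output share the same window.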


Topic Quiz · 2 questions

Test your understanding before moving on

1. What does temperature=0 produce?

💡 Temperature 0 makes decoding greedy: the model always picks the highest-probability token, so the output is deterministic (or very nearly so in practice).

2. What is a context window?

💡 The context window is the maximum number of tokens, prompt plus generated output, that the model can consider in a single API call.
