# What is Apache Kafka?
Kafka is a distributed event streaming platform for building real-time data pipelines. Producers publish messages to topics, and any number of consumer groups can read the same topic independently, each tracking its own position (offset) in the log.
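Because each consumer group keeps its own offset, one group reading a message does not remove it or hide it from another group. The toy model below is plain Python, not the Kafka client API: `Topic`, `publish`, and `poll` are hypothetical names, the topic has a single partition, and it assumes a group seen for the first time starts at offset 0 (a real consumer's starting point is governed by its `auto.offset.reset` setting).

```python
class Topic:
    """Toy single-partition topic: an append-only list plus one offset per group."""

    def __init__(self):
        self.log = []        # messages in publish order; never modified in place
        self.committed = {}  # consumer group id -> next offset that group will read

    def publish(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the new message

    def poll(self, group, max_records=10):
        # Assumption: a new group starts at the beginning of the log.
        start = self.committed.get(group, 0)
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)  # "commit" the new position
        return batch


orders = Topic()
orders.publish("order-1")
orders.publish("order-2")
print(orders.poll("billing"))   # ['order-1', 'order-2']
print(orders.poll("shipping"))  # ['order-1', 'order-2'] (own offset, unaffected by billing)
orders.publish("order-3")
print(orders.poll("billing"))   # ['order-3']
```

Storing only an integer per group is what makes adding another subscriber cheap: the messages themselves are shared, and each group's state is a single offset.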
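- High throughput: a well-provisioned cluster can handle millions of events per second
- Durable: messages are persisted to disk and can be replicated across brokers
- Replayable: consumers can re-read older messages, as long as they are still within the topic's retention period
- Horizontally scalable: add brokers and partitions to increase capacity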
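Replay works because reading a message does not delete it; a consumer simply moves its offset backwards. The sketch below is a toy model under that assumption, with hypothetical `Partition` and `Consumer` classes; it ignores retention, so every message stays available. In the real Java client the equivalent operations are `KafkaConsumer.seek` and `seekToBeginning`.

```python
class Partition:
    """Toy partition: an append-only log addressed by integer offsets."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new message

    def read(self, offset, max_records):
        return self._log[offset:offset + max_records]


class Consumer:
    def __init__(self, partition):
        self.partition = partition
        self.position = 0  # next offset this consumer will read

    def poll(self, max_records=10):
        batch = self.partition.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Replay: messages were never removed, so moving the position
        # backwards makes the consumer receive them again.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.position = offset


p = Partition()
for event in ["signup", "login", "purchase"]:
    p.append(event)

c = Consumer(p)
print(c.poll())  # ['signup', 'login', 'purchase']
print(c.poll())  # []
c.seek(0)
print(c.poll())  # ['signup', 'login', 'purchase']
```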
# Key concepts
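- Topic: a named stream of messages, split into one or more partitions
- Partition: an ordered, immutable, append-only log within a topic; each message gets a sequential offset, and ordering is guaranteed only within a partition
- Producer: a client that writes messages to topics
- Consumer: a client that reads messages from topics; consumers in the same group divide a topic's partitions among themselves
- Broker: a Kafka server node that stores partitions and serves client requests
- Cluster: a group of brokers working together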
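Producers usually attach a key to each message, and for keyed messages the partition is derived from a hash of the key, so all messages with the same key land in the same partition and keep their relative order (as long as the partition count does not change). This is a minimal sketch assuming a 3-partition topic; `choose_partition` and `produce` are hypothetical helpers, and `zlib.crc32` stands in for the murmur2 hash that the Java client's default partitioner uses.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the example topic


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash (Kafka's Java client uses murmur2 for keyed records).
    # crc32 is in the stdlib and, unlike Python's built-in hash() on str,
    # gives the same result on every run.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(topic, key, value):
    """Append (key, value) to the key's partition; return (partition, offset)."""
    partition = choose_partition(key, len(topic))
    topic[partition].append((key, value))
    return partition, len(topic[partition]) - 1


topic = [[] for _ in range(NUM_PARTITIONS)]  # one list per partition
for i in range(3):
    produce(topic, "user-1", f"click-{i}")
    produce(topic, "user-2", f"click-{i}")

# All of user-1's events sit in one partition, in the order they were produced.
p = choose_partition("user-1")
print([v for k, v in topic[p] if k == "user-1"])  # ['click-0', 'click-1', 'click-2']
```

Hashing on the key trades even load for per-key ordering: a very hot key concentrates its traffic on one partition.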