End-to-end real-time pipeline: app -> Kafka -> Flink enrichment -> ClickHouse OLAP -> Grafana dashboard.
Real-Time Data Pipeline Architecture
# End-to-end real-time pipeline
# Source -> Kafka -> Flink -> ClickHouse -> Grafana
# 1. Producer: app sends events
from confluent_kafka import Producer
producer = Producer({"bootstrap.servers":"kafka:9092"})
def on_order_completed(order: dict):
producer.produce("orders",
key=str(order["order_id"]),
value=json.dumps(order).encode())
producer.poll(0)
# 2. Stream processing: Flink enriches and aggregates
# (SQL in Flink)
Flink SQL:
INSERT INTO clickhouse_daily_revenue
SELECT
DATE_FORMAT(event_time, "yyyy-MM-dd") AS date,
product_id,
SUM(amount) AS revenue,
COUNT(*) AS order_count
FROM kafka_orders
GROUP BY TUMBLE(event_time, INTERVAL "1" MINUTE), DATE_FORMAT(event_time, "yyyy-MM-dd"), product_id;
# 3. ClickHouse: sub-second queries
# SELECT date, SUM(revenue) FROM daily_revenue
# WHERE date = today() GROUP BY date
# Returns in < 100ms on billions of rows
# 4. Grafana dashboard reads ClickHouse
# Auto-refreshes every 10 seconds
# Latency: event occurs -> visible in dashboard in ~5 seconds