Data Pipeline Design
ETL, ELT, streaming, and Lambda architecture — choose based on latency requirements and data volume.
Data Pipeline Architecture Patterns
| Pattern | Description | Use Case |
|---|---|---|
| ETL | Extract Transform Load -- transform before loading | Data warehouse loading |
| ELT | Extract Load Transform -- transform after loading | Cloud DW (BigQuery, Snowflake) |
| Streaming | Continuous real-time processing | Fraud detection, IoT |
| Micro-batch | Small batches every 1-5 minutes | Near-real-time analytics |
| Lambda | Batch + streaming layers | Historical + real-time combined |
| Kappa | Streaming only, no separate batch layer | Simplified real-time architecture |
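The table maps latency needs to patterns. The sketch below expresses that mapping as a function. It is a hypothetical rule of thumb that considers freshness only, not data volume. The 1 to 5 minute micro-batch interval comes from the table. The 60-second cutoff is an assumption: a requirement tighter than the shortest micro-batch interval can only be met by streaming. The function name and return labels are illustrative.

```python
def suggest_pattern(max_staleness_s: float) -> str:
    """Hypothetical helper: map a freshness requirement to a pattern from the table.

    Only latency is considered; data volume and the Lambda/Kappa choice of how
    to combine historical and real-time results are out of scope here.
    """
    if max_staleness_s < 60:
        # Tighter than the shortest micro-batch interval (1 minute): stream it.
        return "Streaming"
    if max_staleness_s <= 5 * 60:
        # Batches every 1-5 minutes can keep results this fresh.
        return "Micro-batch"
    # Hourly or daily freshness: a scheduled batch load is enough.
    return "Batch (ETL/ELT)"


print(suggest_pattern(0.5))   # Streaming
print(suggest_pattern(120))   # Micro-batch
print(suggest_pattern(3600))  # Batch (ETL/ELT)
```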
# ETL vs ELT decision
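# ETL: transform on separate compute before loading; typical of legacy warehouses where in-warehouse compute is scarce or costly
# ELT: load raw data first, then transform inside the warehouse (the modern approach with cloud warehouses)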
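The ETL/ELT contrast can be sketched with Python's built-in `sqlite3` as a stand-in warehouse, which is an assumption made only for illustration. Both functions compute the same per-customer totals. ETL aggregates in application code and loads only the result. ELT loads the raw rows untouched and aggregates with SQL inside the database. The table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw order records as they might arrive from a source system.
RAW_ORDERS = [
    ("o1", "alice", "20.50"),
    ("o2", "bob", "5.25"),
    ("o3", "alice", "7.25"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the finished table."""
    totals: dict[str, float] = {}
    for _order_id, customer, amount in RAW_ORDERS:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE customer_totals_etl (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO customer_totals_etl VALUES (?, ?)", totals.items())


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows as-is, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE customer_totals_elt AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        GROUP BY customer
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
# Both print [('alice', 27.75), ('bob', 5.25)]
print(conn.execute("SELECT * FROM customer_totals_etl ORDER BY customer").fetchall())
print(conn.execute("SELECT * FROM customer_totals_elt ORDER BY customer").fetchall())
```

The ELT variant keeps `raw_orders` in the database, so a new transformation can be rerun later without re-extracting from the source.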
# ELT pipeline
1. Extract raw data from sources -> S3 (data lake)
2. Load raw data into Snowflake staging tables
3. Transform with dbt (SQL models)
4. Expose mart tables to BI tools
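Step 3 relies on dbt building SQL models in dependency order. The sketch below imitates that idea in miniature. Each model is a `SELECT` that may reference another model through a `ref('name')` placeholder. The models are built as tables in topological order using the standard-library `graphlib.TopologicalSorter`. This is not dbt's API. The `MODELS` dict, `run_models` function, and table names are hypothetical, and `sqlite3` stands in for Snowflake.

```python
import re
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical SQL models in the spirit of dbt: a staging model cleans raw
# data, and a mart model builds on it through ref('stg_orders').
MODELS = {
    "stg_orders": "SELECT order_id, customer, CAST(amount AS REAL) AS amount FROM raw_orders",
    "mart_customer_revenue": (
        "SELECT customer, SUM(amount) AS revenue, COUNT(*) AS orders "
        "FROM ref('stg_orders') GROUP BY customer"
    ),
}

REF = re.compile(r"ref\('(\w+)'\)")


def run_models(conn: sqlite3.Connection, models: dict[str, str]) -> list[str]:
    """Build every model as a table, parents before children; return the build order."""
    # Map each model to the set of models it references (its predecessors).
    graph = {name: set(REF.findall(sql)) for name, sql in models.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        sql = REF.sub(lambda m: m.group(1), models[name])  # resolve ref() to a table name
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {sql}")
    return order


conn = sqlite3.connect(":memory:")
# Steps 1-2: raw rows (standing in for files landed in S3) loaded into staging as-is.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("o1", "alice", "20.50"), ("o2", "bob", "5.25"), ("o3", "alice", "7.25")],
)
# Step 3: transform.
order = run_models(conn, MODELS)
print(order)  # ['stg_orders', 'mart_customer_revenue']
# Step 4: a BI tool would query the mart table.
print(conn.execute("SELECT * FROM mart_customer_revenue ORDER BY customer").fetchall())
```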
# Streaming pipeline
Kafka -> Flink/Spark Streaming -> Kafka/ClickHouse
(continuous processing; end-to-end latency typically under 1 second)
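A stream processor such as Flink handles each event as it arrives and emits aggregates when a time window closes. The sketch below shows a tumbling (fixed, non-overlapping) window count per key, written as a Python generator so results are produced incrementally. A simplifying assumption is that events arrive in timestamp order. Real engines use watermarks to handle out-of-order and late events. The function name and event shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

Event = tuple[float, str]                     # (event time in seconds, key such as a card ID)
WindowResult = tuple[float, dict[str, int]]   # (window start, count per key)


def tumbling_counts(events: Iterable[Event], window_s: float) -> Iterator[WindowResult]:
    """Count events per key in fixed, non-overlapping windows of window_s seconds.

    Assumes events arrive in timestamp order (no watermark handling).
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - (ts % window_s)  # start of the window this event falls in
        if current_start is not None and start != current_start:
            # The event belongs to a later window, so the previous one is complete.
            yield current_start, dict(counts)
            counts.clear()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window at end of input


events = [(1, "a"), (3, "b"), (4, "a"), (12, "a"), (25, "b")]
for window_start, per_key in tumbling_counts(events, 10):
    print(window_start, per_key)
# 0 {'a': 2, 'b': 1}
# 10 {'a': 1}
# 20 {'b': 1}
```

A fraud rule could consume these per-window counts, for example flagging a card with unusually many transactions in one window.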
# Lambda architecture
Batch layer: Spark -> HDFS -> BigQuery (historical accuracy)
Speed layer: Kafka -> Flink -> Redis (low-latency view of recent data)
Serving layer: merges both views for queries
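The serving layer's merge can be sketched in a few lines. In this illustration, a batch view is recomputed over all events up to a cutoff. A speed view counts only events newer than that cutoff. A query adds the two. When the next batch run advances the cutoff, the speed layer drops the events it now covers so nothing is counted twice. The page-view events, class, and function names are hypothetical. In-memory `Counter`s stand in for the BigQuery and Redis views.

```python
from collections import Counter

Event = tuple[int, str]  # hypothetical page-view event: (event time in seconds, page)


def batch_view(events: list[Event], cutoff: int) -> Counter:
    """Batch layer: full recomputation over every event up to the cutoff."""
    return Counter(page for ts, page in events if ts <= cutoff)


class SpeedView:
    """Speed layer: incremental counts for events newer than the last batch run."""

    def __init__(self) -> None:
        self.events: list[Event] = []
        self.counts: Counter = Counter()

    def ingest(self, event: Event) -> None:
        self.events.append(event)
        self.counts[event[1]] += 1

    def expire(self, cutoff: int) -> None:
        # The batch layer now covers events up to `cutoff`; drop them here.
        self.events = [e for e in self.events if e[0] > cutoff]
        self.counts = Counter(page for _, page in self.events)


def serve(batch: Counter, speed: SpeedView, page: str) -> int:
    """Serving layer: answer a query by merging both views."""
    return batch[page] + speed.counts[page]


history = [(1, "/home"), (2, "/cart"), (3, "/home")]
batch = batch_view(history, cutoff=3)
speed = SpeedView()
for ev in [(4, "/home"), (5, "/cart")]:  # events that arrive after the batch run
    speed.ingest(ev)
print(serve(batch, speed, "/home"))  # 3 (2 from batch + 1 from speed)

# Next batch run covers everything up to t=5; the speed layer forgets those events.
all_events = history + speed.events
batch = batch_view(all_events, cutoff=5)
speed.expire(5)
print(serve(batch, speed, "/home"))  # 3 (all from batch now)
```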
Topic Quiz · 1 question
1. What is the Lambda architecture?
💡 Lambda architecture uses separate batch (Spark) and speed (Kafka/Flink) layers merged at query time.