
Data Engineering Interview Prep


Key data engineering interview topics: ETL vs ELT, star schema, SCD, partitioning, Spark, Iceberg, idempotency, and CDC.

Data Engineering Interview Topics

ETL vs ELT -- ETL transforms data before loading it into the target (the older pattern); ELT loads raw data first and transforms it inside the warehouse (the usual pattern with cloud data warehouses)
Star schema -- fact tables (events, many FKs) and dimension tables (lookups); optimised for analytics queries
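SCD Type 2 -- a slowly changing dimension that inserts a new row for each change, with valid_from/valid_to columns, so the full history is kept
Partitioning -- partition large tables by a column that queries filter on, typically date; the engine can then skip irrelevant partitions, which reduces scan cost and improves query performance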
dbt ref() -- references another model; builds dependency graph for correct execution order
Spark DAG -- lazy transformations build a Directed Acyclic Graph; actions trigger execution
Broadcast join -- send small table to all executors; eliminates shuffle for large-small joins
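Iceberg vs Delta -- both are open table formats that add ACID transactions to a data lake; Iceberg is an Apache project designed to be engine-neutral; Delta Lake is also open source but originated at Databricks and is most tightly integrated there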
Idempotency -- rerunning the pipeline on the same input produces the same result, with no duplicates; use MERGE or truncate+insert
CDC -- Change Data Capture reads the database transaction log (e.g. the MySQL binlog or PostgreSQL WAL, often via Debezium) to stream row-level changes
Great Expectations -- define data quality rules as code; fail pipeline on violation
Data Mesh -- domain teams own their data products; central platform provides tooling
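
To make the idempotency point concrete, here is a minimal sketch of the truncate+insert pattern scoped to a single date partition. It uses Python's built-in sqlite3 module as a stand-in for a warehouse; the events table, its columns, and the load_partition helper are hypothetical names chosen for illustration.

```python
import sqlite3


def load_partition(conn, event_date, rows):
    """Replace one date partition atomically so reruns leave the same table state."""
    with conn:  # commits on success, rolls back if any statement raises
        # Clear whatever an earlier (possibly partial) run wrote for this date.
        conn.execute("DELETE FROM events WHERE event_date = ?", (event_date,))
        conn.executemany(
            "INSERT INTO events (event_date, user_id, amount) VALUES (?, ?, ?)",
            [(event_date, user_id, amount) for user_id, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL)")

batch = [(1, 9.99), (2, 24.50)]
load_partition(conn, "2024-01-01", batch)
load_partition(conn, "2024-01-01", batch)  # rerun, e.g. after a scheduler retry

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)  # 2, not 4: the rerun did not duplicate rows
```

Because the delete and the insert share one transaction, a failure midway leaves the previous partition contents in place. A plain append without the delete would double the rows on every retry.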

Topic Quiz · 1 question

1. What is the main difference between dbt ref() and source() functions?

💡 Use source() for raw tables loaded by ingestion outside dbt, and ref() for models that dbt itself builds; they serve different layers, and ref() is what adds a model to the dependency graph.
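
The distinction can be illustrated with a toy model of how a build order falls out of ref() calls. This is a rough sketch in plain Python, not dbt's actual implementation: the models dictionary, the model and table names, and the build_order helper are all hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model lists the models it ref()s
# and the raw tables it reads through source().
models = {
    "stg_orders": {"refs": [], "sources": ["raw.orders"]},
    "stg_customers": {"refs": [], "sources": ["raw.customers"]},
    "fct_orders": {"refs": ["stg_orders", "stg_customers"], "sources": []},
}


def build_order(models):
    """Return model names ordered so that every ref() target comes first."""
    # Only ref() edges go into the graph: source tables already exist
    # in the warehouse, so there is nothing to build for them.
    graph = {name: set(spec["refs"]) for name, spec in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(models)
print(order)
```

In this sketch fct_orders always lands after both staging models, while the raw tables never appear in the build order because they are produced by ingestion rather than by the project.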