Data Engineering Interview Prep

Core data engineering interview topics: ETL vs ELT, star schemas, slowly changing dimensions (SCD), partitioning, Spark, Iceberg, idempotency, and change data capture (CDC).

Data Engineering Interview Topics

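  • ETL vs ELT -- ETL transforms data before loading it into the target (the traditional approach); ELT loads raw data first and transforms it inside the warehouse, usually with SQL (the common pattern on modern cloud data warehouses)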
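  • Star schema -- a central fact table of events (numeric measures plus many foreign keys) joined to dimension tables of descriptive attributes such as date, customer, or product; optimised for analytical queries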
  • SCD Type 2 -- a slowly changing dimension that records each change as a new row with valid_from/valid_to dates (often plus an is_current flag), so the full history is preserved
  • Partitioning -- split large tables into partitions, usually by date; queries that filter on the partition column skip (prune) irrelevant partitions, which cuts scan cost and speeds up queries
  • dbt ref() -- references another dbt model; dbt uses these references to build a dependency graph (DAG) and runs models in the correct order
  • Spark DAG -- transformations (e.g. filter, select, join) are lazy and only build a Directed Acyclic Graph; actions (e.g. count, collect, write) trigger execution
  • Broadcast join -- Spark copies the small table to every executor so each partition of the large table is joined locally; this avoids shuffling the large table
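  • Iceberg vs Delta -- both are open table formats that add ACID transactions to data lakes; Iceberg is an Apache project with vendor-neutral, multi-engine support; Delta Lake originated at Databricks and is most tightly integrated with its platform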
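  • Idempotency -- running a pipeline multiple times (e.g. after a retry) gives the same result as running it once; use MERGE (upsert) on a key, or delete/truncate + insert for the affected partition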
  • CDC -- Change Data Capture streams row-level inserts, updates, and deletes by reading the database's transaction log (e.g. the MySQL binlog), typically with a tool such as Debezium
  • Great Expectations -- a Python library for defining data quality rules ("expectations") as code; a pipeline can be set to fail when validation does not pass
  • Data Mesh -- domain teams own their data and publish it as data products; a central self-serve platform provides the shared tooling
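A minimal sketch of the ELT pattern from the ETL vs ELT bullet above, assuming Python's built-in sqlite3 stands in for a cloud warehouse. Raw rows are loaded untouched into a staging table, and the cleaning happens afterwards as SQL inside the database. The table names, columns, and the rough numeric filter are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse in this sketch.
conn = sqlite3.connect(":memory:")

# Extract + Load: copy source rows as-is into a raw staging table; no cleaning yet.
raw_orders = [
    ("1", " alice ", "19.99"),
    ("2", "BOB", "5.00"),
    ("3", "alice", "n/a"),  # a bad value is loaded too; the transform step filters it out
]
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: runs later, inside the warehouse, as plain SQL over the raw table.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'   -- rough numeric check, good enough for a demo
""")

totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall())
print(totals)  # → {'alice': 19.99, 'bob': 5.0}
```

Because the raw table is kept, the transform can be changed and re-run later without re-extracting from the source, which is one of the usual arguments for ELT.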
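A small star schema for the Star schema bullet above, again assuming sqlite3 as a stand-in warehouse: one fact table of sales events with foreign keys into two dimension tables, queried with the usual join-then-group-by pattern. All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Fact table: one row per sale event, foreign keys into the dimensions plus measures.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date VALUES (20240105, 2024, 1), (20240210, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES
        (20240105, 1, 2, 2000.0),
        (20240105, 2, 1,  300.0),
        (20240210, 1, 1, 1000.0);
""")

# Typical analytics query: join the fact to its dimensions, group by dimension attributes.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)  # → [(1, 'Electronics', 2000.0), (1, 'Furniture', 300.0), (2, 'Electronics', 1000.0)]
```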
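One way to implement the SCD Type 2 update rule in plain Python, as a sketch: when a tracked attribute changes, the current row is closed by setting valid_to and a new row is appended. Here valid_to = None marks the current version and ranges are treated as half-open [valid_from, valid_to); both are assumptions, since many warehouses use a far-future date such as 9999-12-31 or an is_current flag instead. CustomerRow, apply_scd2, and city_as_of are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerRow:  # hypothetical dimension row for illustration
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means "current version" (open-ended)

def apply_scd2(dim: list[CustomerRow], customer_id: int, city: str, as_of: date) -> None:
    """Record a new attribute value for customer_id while keeping history (SCD Type 2)."""
    current = next((r for r in dim if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None:
        if current.city == city:
            return  # no change, so no new row (re-running the same change is a no-op)
        current.valid_to = as_of  # close the old version instead of overwriting it
    dim.append(CustomerRow(customer_id, city, valid_from=as_of, valid_to=None))

def city_as_of(dim: list[CustomerRow], customer_id: int, when: date) -> Optional[str]:
    """Point-in-time lookup: the version whose [valid_from, valid_to) range contains `when`."""
    for r in dim:
        if (r.customer_id == customer_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.city
    return None

dim: list[CustomerRow] = []
apply_scd2(dim, 42, "Berlin", date(2023, 1, 1))
apply_scd2(dim, 42, "Paris", date(2024, 6, 1))  # customer moves: old row closed, new row added
apply_scd2(dim, 42, "Paris", date(2024, 7, 1))  # same value again: no new row

print(len(dim))                                 # → 2
print(city_as_of(dim, 42, date(2023, 12, 31)))  # → Berlin
print(city_as_of(dim, 42, date(2024, 6, 1)))    # → Paris
```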
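A toy simulation of partition pruning for the Partitioning bullet above, assuming rows are grouped by event date the way date-partitioned tables are laid out (for example, one dt=YYYY-MM-DD directory per day). A query with a date filter reads only the matching partitions. Real engines prune using directory names or table metadata, not code like this; the data is made up.

```python
from collections import defaultdict
from datetime import date

# Rows grouped by the partition column (event date), like one directory per day in a data lake.
partitions: dict[date, list[dict]] = defaultdict(list)

def write(event: dict) -> None:
    partitions[event["event_date"]].append(event)

def query(start: date, end: date) -> tuple[list[dict], int]:
    """Return events with start <= event_date <= end, plus how many rows were scanned.

    Partitions outside the range are skipped entirely, without reading their rows.
    """
    scanned, result = 0, []
    for part_date, rows in partitions.items():
        if not (start <= part_date <= end):
            continue  # pruned: this partition is never read
        scanned += len(rows)
        result.extend(rows)
    return result, scanned

for day in range(1, 31):  # 30 daily partitions with 100 events each (3000 rows total)
    for i in range(100):
        write({"event_date": date(2024, 4, day), "id": i})

rows, scanned = query(date(2024, 4, 10), date(2024, 4, 12))
print(len(rows), scanned)  # → 300 300
```

Only 3 of the 30 partitions are read, so the query scans 300 rows instead of 3000.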
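For the dbt ref() bullet above: dbt builds its DAG itself when it parses a project, and the sketch below only mimics the idea. It assumes a hypothetical set of models, extracts ref('...') calls with a simplified regex (not dbt's parser), and uses Python's graphlib (3.9+) to derive an execution order in which every model runs after the models it references. The source() calls are ignored because they point at raw tables rather than models.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model SQL, referencing each other with {{ ref('...') }} as dbt models do.
models = {
    "stg_orders":    "select * from {{ source('shop', 'orders') }}",
    "stg_customers": "select * from {{ source('shop', 'customers') }}",
    "fct_orders": """
        select o.*, c.region
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_customers') }} c using (customer_id)
    """,
    "rpt_revenue":   "select region, sum(amount) from {{ ref('fct_orders') }} group by 1",
}

# Simplified parser (an assumption, not dbt's own): find ref('model_name') calls in the SQL.
REF_PATTERN = re.compile(r"ref\(\s*['\"](\w+)['\"]\s*\)")

# Map each model to the set of models it depends on (its predecessors in the DAG).
graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

# Topological order: every model appears after all of its dependencies.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The relative order of independent models (here the two staging models) is not fixed, which is why dbt can run them in parallel.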
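A toy Python class that imitates the lazy evaluation described in the Spark DAG bullet above; it is not Spark's API. map and filter only append steps to a recorded plan, and nothing runs until the collect action walks that plan. In PySpark the same split holds: df.filter(...) returns immediately, while an action such as df.count() launches a job. Spark also optimises the plan before running it, which this sketch does not attempt.

```python
from typing import Any, Callable, Iterable

class LazyDataset:
    """Toy model of lazy evaluation: transformations record steps, actions execute them."""

    def __init__(self, source: Iterable[Any], steps: tuple = ()):
        self._source = source
        self._steps = steps  # the recorded plan (a linear chain here; a DAG in Spark)

    # Transformations: return a new dataset with one more step; nothing is computed.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Action: walks the recorded plan and actually processes the data.
    def collect(self) -> list:
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

calls = []

def square(x: int) -> int:
    calls.append(x)  # records when the function actually runs
    return x * x

ds = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(calls)         # → []  (transformations only built the plan)
print(ds.collect())  # → [0, 4, 16]
print(calls)         # → [0, 1, 2, 3, 4]
```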
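The mechanics behind the Broadcast join bullet above, simulated in plain Python: a hash table is built once from the small side, and each partition of the large side is probed against it where it already sits, so the large table never has to be redistributed. In Spark this happens automatically for tables below spark.sql.autoBroadcastJoinThreshold, or explicitly with pyspark.sql.functions.broadcast(small_df); the data and function names below are made up.

```python
from collections import defaultdict

# Small dimension table: the side Spark would broadcast (copy) to every executor.
countries = [("DE", "Germany"), ("FR", "France")]

# Large fact table, pre-split into partitions as if spread across executors.
orders_partitions = [
    [(1, "DE", 30.0), (2, "FR", 12.5)],
    [(3, "DE", 8.0), (4, "US", 99.0)],  # "US" has no match in the small table
]

def broadcast_hash_join(small, large_partitions):
    # Build a hash table from the small side once; Spark ships this copy to each executor.
    lookup = defaultdict(list)
    for code, name in small:
        lookup[code].append(name)

    # Each large-side partition is joined locally: no shuffle of the large table.
    joined = []
    for partition in large_partitions:
        for order_id, code, amount in partition:
            for name in lookup.get(code, []):  # inner join: unmatched rows are dropped
                joined.append((order_id, name, amount))
    return joined

print(broadcast_hash_join(countries, orders_partitions))
# → [(1, 'Germany', 30.0), (2, 'France', 12.5), (3, 'Germany', 8.0)]
```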
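Both idempotent loading strategies from the Idempotency bullet above, sketched with sqlite3 as an assumed stand-in warehouse. SQLite has no MERGE statement, so its INSERT ... ON CONFLICT DO UPDATE upsert (SQLite 3.24+) plays that role; the second function overwrites one date partition with delete + insert inside a single transaction. Running either load twice leaves the table unchanged. Table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_revenue (dt TEXT, store TEXT, revenue REAL, PRIMARY KEY (dt, store))"
)

def load_upsert(rows):
    """Upsert keyed on (dt, store): re-running with the same rows changes nothing."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            """INSERT INTO daily_revenue (dt, store, revenue) VALUES (?, ?, ?)
               ON CONFLICT (dt, store) DO UPDATE SET revenue = excluded.revenue""",
            rows,
        )

def load_overwrite_partition(dt, rows):
    """Delete-then-insert for one date partition, inside a single transaction."""
    with conn:
        conn.execute("DELETE FROM daily_revenue WHERE dt = ?", (dt,))
        conn.executemany("INSERT INTO daily_revenue VALUES (?, ?, ?)", rows)

batch = [("2024-05-01", "berlin", 100.0), ("2024-05-01", "paris", 80.0)]
load_upsert(batch)
load_upsert(batch)  # retry or rerun: no duplicate rows
load_overwrite_partition("2024-05-01", batch)
load_overwrite_partition("2024-05-01", batch)

print(conn.execute("SELECT COUNT(*), SUM(revenue) FROM daily_revenue").fetchone())  # → (2, 180.0)
```

A plain INSERT in either function would instead double the row count on every rerun.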
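A sketch of the consumer side of the CDC bullet above: applying row-level change events, in log order, to a replica keyed by primary key. The events are simplified to the before, after, and op fields of Debezium's change event payload (op is c for create, u for update, d for delete, r for a snapshot read); real events carry more metadata, and the rows here are invented.

```python
# Simplified Debezium-style change events (payload fields before/after/op only;
# real events also carry source metadata and timestamps).
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
                "after":  {"id": 1, "email": "a@new.example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

def apply_change(replica: dict, event: dict) -> None:
    """Apply one row-level change to a replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the new row image
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":            # delete: only the old row image ("before") is present
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op {op!r}")

replica: dict = {}
for e in events:  # events must be applied in log order
    apply_change(replica, e)
print(replica)  # → {1: {'id': 1, 'email': 'a@new.example.com'}}
```

Upserting on create/update and ignoring deletes of missing keys also makes replaying an already-applied event harmless.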
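The rules-as-code idea from the Great Expectations bullet above, shown with a hypothetical plain-Python checker rather than the library's actual API. Each rule is a named check over a batch of rows, and any failure raises an exception that stops the pipeline. In Great Expectations itself, similar checks are expressed with built-in expectations such as expect_column_values_to_not_be_null, expect_column_values_to_be_unique, and expect_column_values_to_be_between.

```python
# Hypothetical, minimal rules-as-code checker in plain Python; NOT the Great Expectations API.
from typing import Callable

Rule = tuple[str, Callable[[list[dict]], bool]]

rules: list[Rule] = [
    ("order_id is never null",
     lambda rows: all(r.get("order_id") is not None for r in rows)),
    ("order_id is unique",
     lambda rows: len({r.get("order_id") for r in rows}) == len(rows)),
    ("amount is between 0 and 10000",
     lambda rows: all(0 <= r["amount"] <= 10_000 for r in rows)),
]

class DataQualityError(Exception):
    pass

def validate(rows: list[dict]) -> None:
    failed = [name for name, check in rules if not check(rows)]
    if failed:
        # Raising stops the pipeline so bad data never reaches downstream tables.
        raise DataQualityError(f"failed checks: {failed}")

validate([{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 5.5}])  # passes silently
try:
    validate([{"order_id": 1, "amount": 20.0}, {"order_id": 1, "amount": -3.0}])
except DataQualityError as err:
    print(err)  # → failed checks: ['order_id is unique', 'amount is between 0 and 10000']
```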
Topic Quiz · 1 question

Test your understanding before moving on

1. What is the main difference between dbt ref() and source() functions?
💡 Use source() for raw tables loaded by ingestion tools (declared in a sources YAML file) and ref() for other dbt models; both add edges to the dependency graph, but they serve different layers.