
Data Engineering Overview


Data engineering builds reliable data pipelines that cover ingestion, storage, transformation, and delivery.

What is Data Engineering?

Data engineering is the discipline of designing, building, and maintaining the infrastructure and pipelines that collect, store, transform, and deliver data reliably at scale.
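
As a rough illustration of those stages, here is a minimal Python sketch that collects records from a source, transforms them, stores them in a table, and delivers an aggregate. It is a toy built on assumptions, not a production pattern: the sample data and function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a source-system export.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,,de
3,5.00,US
"""


def collect(text: str) -> list[dict]:
    """Collect: read raw records from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows missing an amount, cast types, normalise country codes."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records rather than loading bad data
        clean.append((int(row["order_id"]), float(row["amount"]), row["country"].upper()))
    return clean


def store(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Store: write cleaned rows to a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def deliver(conn: sqlite3.Connection) -> list[tuple]:
    """Deliver: expose an aggregate that an analyst or dashboard would consume."""
    return conn.execute(
        "SELECT country, ROUND(SUM(amount), 2) FROM orders GROUP BY country ORDER BY country"
    ).fetchall()


conn = sqlite3.connect(":memory:")
store(conn, transform(collect(RAW_CSV)))
print(deliver(conn))
```

In a real pipeline each function would typically be a separate, independently retryable task, and the source and target would be external systems rather than in-memory objects.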

Role	Responsibility	Tools
Data Engineer	Build pipelines, manage data infrastructure	Spark, Airflow, dbt, Kafka
Data Analyst	Query and visualise data	SQL, Tableau, Looker
Data Scientist	Build ML models	Python, scikit-learn, PyTorch
MLOps Engineer	Deploy and monitor models	MLflow, Kubeflow, SageMaker

# Data Engineering Stack
Ingestion:     Kafka, Fivetran, Airbyte, Debezium
Storage:       S3, GCS, ADLS, HDFS
Processing:    Apache Spark, Flink, dbt
Orchestration: Apache Airflow, Prefect, Dagster
Warehouse:     Snowflake, BigQuery, Redshift, ClickHouse
Catalog:       Apache Atlas, DataHub, Amundsen
Quality:       Great Expectations, Soda, Monte Carlo
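
Orchestrators such as Airflow, Prefect, and Dagster differ in their APIs, but they share a core idea: a pipeline is a directed acyclic graph (DAG) of tasks, and a task starts only after its upstream tasks finish. The sketch below illustrates that idea with Python's standard-library `graphlib`, not with any orchestrator's actual API; the task names are hypothetical and each loosely maps to one layer of the stack above.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline. Keys are task names; values are the upstream
# tasks that must finish before that task can start.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "transform": {"ingest_orders", "ingest_customers"},
    "quality_check": {"transform"},
    "publish_to_warehouse": {"quality_check"},
}


def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all its upstream tasks done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        # A real orchestrator would mark each task done only after it succeeds.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(PIPELINE), start=1):
    print(number, wave)
```

Tasks in the same wave do not depend on each other, so a scheduler could run them in parallel (here, the two ingestion tasks). Real orchestrators add scheduling, retries, and logging on top of this ordering.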


Topic Quiz · 1 question

Test your understanding before moving on

1. What is the key difference between ETL and ELT?

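💡 ETL (extract, transform, load) transforms data before loading it into the target system. ELT (extract, load, transform) loads raw data into a cloud data warehouse (e.g. Snowflake, BigQuery) first and transforms it there using SQL/dbt.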
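
To make the ETL/ELT difference concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for a cloud warehouse (an assumption; in practice this would be Snowflake, BigQuery, or similar, often with the SQL step managed by dbt). The sample rows and table names are hypothetical. Both paths end with the same clean table; what differs is where the transformation runs and whether the raw data is kept.

```python
import sqlite3

# Hypothetical raw source rows: (id, amount in cents), both as strings.
RAW_ROWS = [("1", "1999"), ("2", "500"), ("3", "n/a")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the cleaned rows."""
    clean = [(int(i), int(cents) / 100) for i, cents in RAW_ROWS if cents.isdigit()]
    conn.execute("CREATE TABLE orders_etl (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", clean)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows unchanged, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, amount_cents TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", RAW_ROWS)
    # Roughly the same cleaning rule as run_etl, expressed in SQL
    # (the kind of step a dbt model would typically own).
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER) AS id,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount
        FROM raw_orders
        WHERE amount_cents <> '' AND amount_cents NOT GLOB '*[^0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
print(conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall())
print(conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall())
```

Because the ELT path keeps `raw_orders`, the transformation can be changed and re-run later without going back to the source system, which is one reason ELT pairs well with cloud warehouses.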
