Delta Lake
Delta Lake adds ACID transactions, time travel, MERGE upserts, and schema evolution to Spark data lakes.
Delta Lake — ACID on Data Lakes
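from delta import DeltaTable
from pyspark.sql import SparkSession
# Parentheses let the builder chain span several lines.
# delta-core 2.4.0 targets Spark 3.4.x.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)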
# Write Delta table (df is any existing DataFrame)
df.write.format("delta").mode("overwrite").save("s3://datalake/delta/orders")
# Read
df = spark.read.format("delta").load("s3://datalake/delta/orders")
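# MERGE (upsert): runs as a single ACID transaction
# source_df is a DataFrame holding new or changed orders
delta_table = DeltaTable.forPath(spark, "s3://datalake/delta/orders")
(
    delta_table.alias("target")
    .merge(
        source_df.alias("source"),
        "target.order_id = source.order_id",
    )
    .whenMatchedUpdateAll()      # matched rows: overwrite every column
    .whenNotMatchedInsertAll()   # unmatched source rows: insert
    .execute()
)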
# Time travel: read an earlier snapshot by version number or by timestamp
df_v5 = (
    spark.read.format("delta")
    .option("versionAsOf", 5).load("s3://datalake/delta/orders")
)
df_before = (
    spark.read.format("delta")
    .option("timestampAsOf", "2025-01-01").load("s3://datalake/delta/orders")
)
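# Optimise (compact small files into larger ones)
delta_table.optimize().executeCompaction()
# Vacuum: delete data files the table no longer references once they are older
# than 168 hours (7 days). Time travel to versions that need those files then fails.
delta_table.vacuum(retentionHours=168)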
# Schema evolution: mergeSchema adds columns that exist in df but not yet in the table
df.write.format("delta").option("mergeSchema", "true").mode("append").save("s3://datalake/delta/orders")
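Time travel and vacuum interact, and a toy model can make that easier to picture. The sketch below is plain Python and is not Delta Lake's implementation: `ToyLog`, `Commit`, and the file names are hypothetical. It keeps only one idea, that each commit records which data files were added and removed, so reading version N means replaying commits 0 to N. As a further simplification, its `vacuum` ignores the retention period and deletes every file the latest version no longer references.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One log entry: data files added and removed by a single transaction."""
    added: set
    removed: set = field(default_factory=set)


class ToyLog:
    """Hypothetical stand-in for a table's commit log (not the Delta API)."""

    def __init__(self):
        self.commits = []     # position in this list == table version
        self.storage = set()  # files physically present

    def commit(self, added, removed=()):
        self.commits.append(Commit(set(added), set(removed)))
        self.storage |= set(added)  # removed files stay on disk until vacuum
        return len(self.commits) - 1

    def files_at(self, version):
        """Replay commits 0..version to find the files in that snapshot."""
        live = set()
        for c in self.commits[: version + 1]:
            live = (live - c.removed) | c.added
        return live

    def read(self, version=None):
        if version is None:
            version = len(self.commits) - 1
        needed = self.files_at(version)
        missing = needed - self.storage
        if missing:
            raise FileNotFoundError(
                f"version {version} needs deleted files: {sorted(missing)}"
            )
        return needed

    def vacuum(self):
        """Delete files the latest version no longer references.

        Simplification: no retention period is modelled here.
        """
        self.storage &= self.files_at(len(self.commits) - 1)


log = ToyLog()
v0 = log.commit(added={"part-0", "part-1"})
v1 = log.commit(added={"part-2"}, removed={"part-0"})  # rewrite of part-0

old_snapshot = log.read(version=v0)  # works: part-0 is still on disk
latest = log.read()

log.vacuum()
try:
    log.read(version=v0)
    time_travel_after_vacuum = True
except FileNotFoundError:
    time_travel_after_vacuum = False
```

In this model the old snapshot is readable until `vacuum` runs, which is the trade-off behind choosing a retention window: a longer window keeps more history reachable at the cost of storage.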
Topic Quiz · 1 question
Test your understanding before moving on
1. What does Delta Lake MERGE INTO do?
💡 MERGE INTO implements upsert — matched rows are updated, unmatched rows are inserted, all atomically.
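The upsert rule in that answer can be sketched without Spark. The function below is a hypothetical pure-Python illustration, not Delta's MERGE: `toy_merge` and the sample rows are made up for this example. It builds the merged result in a fresh structure and returns it, so the caller sees either the complete new table or, if an exception is raised, the untouched original. Unlike this sketch, a real MERGE rejects a source in which several rows match the same target row.

```python
def toy_merge(target, source, key):
    """Upsert `source` rows into `target` rows, matching on `key`."""
    # Copy every target row so the input list is never modified.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)  # matched: update all columns
        else:
            merged[row[key]] = dict(row)  # not matched: insert
    return list(merged.values())


target = [
    {"order_id": 1, "status": "new"},
    {"order_id": 2, "status": "new"},
]
source = [
    {"order_id": 2, "status": "shipped"},  # matches an existing order
    {"order_id": 3, "status": "new"},      # no match, so it is inserted
]

result = toy_merge(target, source, key="order_id")
```

Order 1 is left alone, order 2 picks up the new status, and order 3 is appended, mirroring `whenMatchedUpdateAll()` followed by `whenNotMatchedInsertAll()`.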