Delta Lake

Delta Lake adds ACID transactions, time travel, MERGE upserts, and schema evolution to Spark data lakes.

Delta Lake — ACID on Data Lakes

from delta import DeltaTable
from pyspark.sql import SparkSession

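spark = (
    SparkSession.builder
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    # Delta also needs its catalog implementation registered
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)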

# Write Delta table (df is an existing DataFrame of orders)
df.write.format("delta").mode("overwrite").save("s3://datalake/delta/orders")

# Read
df = spark.read.format("delta").load("s3://datalake/delta/orders")

# MERGE (upsert) -- ACID guaranteed
delta_table = DeltaTable.forPath(spark, "s3://datalake/delta/orders")

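# source_df holds the new and changed orders to upsert
(
    delta_table.alias("target")
    .merge(source_df.alias("source"), "target.order_id = source.order_id")
    .whenMatchedUpdateAll()      # matched rows: overwrite every column from source
    .whenNotMatchedInsertAll()   # unmatched source rows: insert as new rows
    .execute()
)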

# Time travel: read an earlier snapshot by version number or by timestamp
df_v5 = (spark.read.format("delta")
         .option("versionAsOf", 5).load("s3://datalake/delta/orders"))
df_before = (spark.read.format("delta")
             .option("timestampAsOf", "2025-01-01").load("s3://datalake/delta/orders"))

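# Optimise (compact small files into larger ones)
delta_table.optimize().executeCompaction()
# Delete unreferenced data files older than 168 hours (7 days, the default retention);
# versions that relied on those files can no longer be read with time travel
delta_table.vacuum(retentionHours=168)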

# Schema evolution: mergeSchema adds columns that are new in df to the table schema
df.write.format("delta").option("mergeSchema", "true").mode("append").save("s3://datalake/delta/orders")
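
Time travel works because Delta Lake keeps an ordered transaction log of which data files each commit added and removed, and reading version N means replaying the log up to N. The following is a toy model of that idea in plain Python, not Delta's implementation: `ToyDeltaLog`, `Commit`, and the file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One entry in the toy transaction log (hypothetical, for illustration)."""
    add: set = field(default_factory=set)     # data files added by this commit
    remove: set = field(default_factory=set)  # data files logically removed


class ToyDeltaLog:
    """In-memory stand-in for a transaction log; not the real Delta format."""

    def __init__(self):
        self.commits = []

    def commit(self, add=(), remove=()):
        """Append a commit and return its 0-based version number."""
        self.commits.append(Commit(set(add), set(remove)))
        return len(self.commits) - 1

    def files_as_of(self, version):
        """Replay commits 0..version to find the files live at that version."""
        live = set()
        for c in self.commits[: version + 1]:
            live -= c.remove
            live |= c.add
        return live


log = ToyDeltaLog()
v0 = log.commit(add={"part-0.parquet", "part-1.parquet"})  # initial write
v1 = log.commit(add={"part-2.parquet"})                    # append
v2 = log.commit(                                           # compaction
    add={"part-3.parquet"},
    remove={"part-0.parquet", "part-1.parquet", "part-2.parquet"},
)

print(sorted(log.files_as_of(v1)))  # files a reader of version 1 would scan
print(sorted(log.files_as_of(v2)))  # files a reader of the latest version would scan
```

In this model the compaction commit only marks the small files as removed; they must still exist on storage for version 1 to be readable. That is why running vacuum limits how far back time travel can go.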
Topic Quiz · 1 question

Test your understanding before moving on

1. What does Delta Lake MERGE INTO do?
💡 MERGE INTO implements upsert — matched rows are updated, unmatched rows are inserted, all atomically.
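
The upsert rule in the answer above can be sketched in plain Python. This is a simplified model of the semantics only, not how Delta executes a merge: `merge_upsert` is a hypothetical helper, rows are plain dicts, and it assumes each key appears at most once in the source.

```python
def merge_upsert(target, source, key="order_id"):
    """Return a new list of rows: target with source rows upserted by key.

    Mirrors whenMatchedUpdateAll + whenNotMatchedInsertAll for rows that
    share one schema. Assumes keys are unique within source.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: replace every column. Unmatched key: insert the row.
        merged[row[key]] = dict(row)
    # A new list is returned, so the caller sees either the old rows or the
    # fully merged rows, never a half-applied mix.
    return list(merged.values())


target = [
    {"order_id": 1, "status": "new"},
    {"order_id": 2, "status": "new"},
]
source = [
    {"order_id": 2, "status": "shipped"},  # matches an existing order
    {"order_id": 3, "status": "new"},      # no match, will be inserted
]

result = merge_upsert(target, source)
for row in result:
    print(row)
```

Order 1 is untouched, order 2 takes the source values, and order 3 is appended. The input list `target` is left unchanged, which stands in for readers continuing to see the previous table version until the merge commits.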