
High Availability and Disaster Recovery


Design HA architectures and DR strategies. Understand RTO, RPO, and the four DR approaches (backup/restore, pilot light, warm standby, multi-site).


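High Availability (HA) means your application keeps running when individual components fail, such as an instance or an entire Availability Zone. Disaster Recovery (DR) means you can restore your application after a major failure, such as a Region-wide outage, data corruption, or accidental deletion. Every Solutions Architect must design for both.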

Teacher Note: HA is like having a backup power generator in your building — when the main power fails, the generator kicks in automatically within seconds and nobody notices. DR is like having a second building in another city — if the main building burns down, you move operations to the backup site.

Recovery Objectives — Define Your Tolerance

Metric	Definition	Example
RTO (Recovery Time Objective)	Maximum acceptable downtime	We can tolerate 1 hour of downtime
RPO (Recovery Point Objective)	Maximum acceptable data loss	We can lose at most 15 minutes of transactions
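To make the two objectives concrete, here is a minimal Python sketch that checks a recovery plan against an RTO and an RPO. It assumes the worst-case data loss equals the interval between recovery points (backups, snapshots, or replication checkpoints) and that you already have an estimate of how long recovery takes. The function names and plan figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(recovery_point_interval: timedelta) -> timedelta:
    # If failure strikes just before the next backup/replication point,
    # everything written since the previous point is lost.
    return recovery_point_interval


def meets_objectives(rto: timedelta, rpo: timedelta,
                     estimated_recovery_time: timedelta,
                     recovery_point_interval: timedelta) -> bool:
    # RTO bounds downtime; RPO bounds data loss. A plan must satisfy both.
    downtime_ok = estimated_recovery_time <= rto
    data_loss_ok = worst_case_data_loss(recovery_point_interval) <= rpo
    return downtime_ok and data_loss_ok


if __name__ == "__main__":
    rto = timedelta(hours=1)      # "we can tolerate 1 hour of downtime"
    rpo = timedelta(minutes=15)   # "we can lose at most 15 minutes"
    # Hypothetical plan A: snapshot every 15 minutes, restore takes ~40 minutes.
    print(meets_objectives(rto, rpo, timedelta(minutes=40), timedelta(minutes=15)))  # True
    # Hypothetical plan B: same restore time, but hourly snapshots.
    print(meets_objectives(rto, rpo, timedelta(minutes=40), timedelta(hours=1)))  # False
```

Plan B fails only on RPO: hourly snapshots can lose up to an hour of writes, four times the 15-minute tolerance.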

Four DR Strategies — Cheapest to Most Expensive

Strategy	RTO	RPO	Cost	How It Works
Backup and Restore	Hours to days	Hours	Lowest	Regular snapshots to S3, restore when needed
Pilot Light	10-30 minutes	Minutes	Low	Data continuously replicated and core services (such as the database) always running; application servers started and scaled up on failure
Warm Standby	Minutes	Seconds	Medium	Scaled-down but fully running copy in another region
Multi-Site Active-Active	Near zero	Near zero	Highest	Full capacity running in multiple regions simultaneously

High Availability Patterns

Multi-AZ RDS: synchronous standby in a second AZ with automatic failover within the Region, typically in 60-120 seconds (RPO = 0, RTO of about 2 minutes)
ALB + ASG across 3 AZs: instances in us-east-1a, 1b, and 1c, so the application survives the loss of any single AZ
S3 with Versioning: recover any previous version of an object after an accidental overwrite or delete
Route 53 Failover: automatically route traffic to the DR site when the primary's health check fails
Aurora Global Database: up to 5 secondary Regions, replication lag typically under 1 second, and a secondary can be promoted in under 1 minute
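The reason for spreading an ASG across three AZs can be put in numbers. The Python sketch below assumes AZ failures are independent and that each AZ's slice of the application has the same (hypothetical) availability; it ignores shared dependencies such as the load balancer, DNS, or a single database, so treat its results as an upper bound. The third function sizes each AZ so the remaining AZs can still carry the full load after one AZ is lost. All function names are illustrative.

```python
import math


def parallel_availability(per_az_availability: float, az_count: int) -> float:
    # The tier is down only if every AZ is down at the same time,
    # assuming independent AZ failures (an idealization).
    return 1 - (1 - per_az_availability) ** az_count


def downtime_minutes_per_year(availability: float) -> float:
    # 365 days * 24 hours * 60 minutes = 525,600 minutes per year.
    return (1 - availability) * 365 * 24 * 60


def instances_per_az(required_instances: int, az_count: int) -> int:
    # To survive the loss of one AZ, the other (az_count - 1) AZs
    # must still provide the required capacity on their own.
    if az_count < 2:
        raise ValueError("need at least two AZs to survive an AZ failure")
    return math.ceil(required_instances / (az_count - 1))


if __name__ == "__main__":
    for azs in (1, 2, 3):
        a = parallel_availability(0.99, azs)  # hypothetical 99% per AZ
        print(f"{azs} AZ(s): availability {a:.6f}, "
              f"~{downtime_minutes_per_year(a):.1f} min/year down")
    # Need 6 instances serving traffic across 3 AZs -> 3 per AZ (9 total).
    print(instances_per_az(6, 3))
```

In this model, going from one AZ to three cuts downtime from days to under a minute per year, but only if each AZ is sized to absorb the loss of another; that is why the capacity calculation matters as much as the AZ count.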

Exam Tip: The exam loves asking you to choose the RIGHT DR strategy for a given RTO/RPO. Pick the cheapest strategy that still meets both requirements: 'can tolerate 4 hours of downtime' = backup and restore; 'only seconds of data loss' = warm standby or multi-site active-active. If the requirement is literally zero data loss, look for synchronous replication such as Multi-AZ. Read the requirements carefully!
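The matching rule in the tip can be written as a filter-then-minimize step: drop every strategy that cannot meet the RTO and RPO, then take the cheapest survivor. The Python sketch below is illustrative only. The per-strategy figures are hypothetical planning numbers loosely read from the DR strategy table (for example, Backup and Restore assumes hourly snapshots and a restore rehearsed to finish within 4 hours); real values depend on your data volume, automation, and testing.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class DRStrategy:
    name: str
    achievable_rto: timedelta  # hypothetical planning figure
    achievable_rpo: timedelta  # hypothetical planning figure
    cost_rank: int             # 1 = cheapest, 4 = most expensive


# Hypothetical numbers, ordered cheapest to most expensive as in the table.
STRATEGIES = (
    DRStrategy("Backup and Restore", timedelta(hours=4), timedelta(hours=1), 1),
    DRStrategy("Pilot Light", timedelta(minutes=30), timedelta(minutes=10), 2),
    DRStrategy("Warm Standby", timedelta(minutes=10), timedelta(seconds=30), 3),
    # "Near zero" is modeled as 1 second: cross-Region replication is
    # typically asynchronous, so this sketch never promises an RPO of zero.
    DRStrategy("Multi-Site Active-Active", timedelta(seconds=1), timedelta(seconds=1), 4),
)


def cheapest_strategy(required_rto: timedelta,
                      required_rpo: timedelta) -> Optional[DRStrategy]:
    # Keep only strategies that meet BOTH objectives, then pick the cheapest.
    candidates = [s for s in STRATEGIES
                  if s.achievable_rto <= required_rto
                  and s.achievable_rpo <= required_rpo]
    return min(candidates, key=lambda s: s.cost_rank, default=None)


if __name__ == "__main__":
    print(cheapest_strategy(timedelta(hours=4), timedelta(hours=1)).name)       # Backup and Restore
    print(cheapest_strategy(timedelta(minutes=15), timedelta(minutes=1)).name)  # Warm Standby
    print(cheapest_strategy(timedelta(minutes=2), timedelta(0)))                # None
```

The last call returns None on purpose: a strict zero-data-loss requirement points to synchronous replication, such as Multi-AZ RDS within a Region, rather than to any cross-Region strategy in this table.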


Topic Quiz · 2 questions

Test your understanding before moving on

1. A company's RPO is 1 hour and RTO is 4 hours. Which is the MOST cost-effective DR strategy?

💡 Backup and Restore has an RTO measured in hours and an RPO set by how often you back up. It is the cheapest strategy, and with hourly snapshots it meets both the 4-hour RTO and the 1-hour RPO.

2. A banking application cannot afford to lose ANY data and must recover within 2 minutes of any failure. Which approach meets this requirement?

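💡 Multi-AZ RDS uses synchronous replication (RPO = 0, no data loss) and automatic failover in 60-120 seconds (RTO within 2 minutes). It protects against instance and AZ failures within one Region, not against a full Region outage.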
