For an updated, ready-to-use CloudFormation template of this code, see the newer
post: Complete code: cross-region RDS recovery.
Amazon RDS is a great database-as-a-service that takes care of almost all database-related maintenance tasks for you:
everything from automated backups and patching to replication and failover to another availability zone.
Unfortunately, all of this fails if the region where your RDS instance is hosted fails. Region-wide failures are very
rare, but they do happen! RDS does not support cross-region replication at the
moment, so you cannot simply create a replica of your database in another region (unless you host the database on an EC2
instance and set up the replication yourself). The second-best option, to make sure you can restore your service quickly
in another region, is to always have a copy of your latest database backup in that region. In the case of RDS, that means
copying automated snapshots. AWS has no option to do this automatically, but it can easily be scripted with AWS
Lambda functions.
RDS can create an automated snapshot of your database every day. All we need to do is make sure to copy that snapshot
once it’s ready and remove any old snapshots from the “fail-over region” to save storage cost.
The following quick-and-dirty Lambda function (in Python) accomplishes just that:
```python
import operator

import boto3

ACCOUNT = 'xxxxx'  # your AWS account ID


def copy_latest_snapshot():
    client = boto3.client('rds', 'eu-west-1')
    frankfurt_client = boto3.client('rds', 'eu-central-1')

    response = client.describe_db_snapshots(
        SnapshotType='automated',
        IncludeShared=False,
        IncludePublic=False
    )

    if len(response['DBSnapshots']) == 0:
        raise Exception("No automated snapshots found")

    # Map each instance to {snapshot identifier: creation time}
    snapshots_per_project = {}
    for snapshot in response['DBSnapshots']:
        # Skip snapshots that are still being created
        if snapshot['Status'] != 'available':
            continue

        if snapshot['DBInstanceIdentifier'] not in snapshots_per_project:
            snapshots_per_project[snapshot['DBInstanceIdentifier']] = {}

        snapshots_per_project[snapshot['DBInstanceIdentifier']][snapshot['DBSnapshotIdentifier']] = snapshot[
            'SnapshotCreateTime']

    for project in snapshots_per_project:
        # Sort by creation time, newest first
        sorted_list = sorted(snapshots_per_project[project].items(), key=operator.itemgetter(1), reverse=True)
        copy_name = project + "-" + sorted_list[0][1].strftime("%Y-%m-%d")

        print("Checking if " + copy_name + " is copied")

        try:
            frankfurt_client.describe_db_snapshots(
                DBSnapshotIdentifier=copy_name
            )
        except frankfurt_client.exceptions.DBSnapshotNotFoundFault:
            # No copy with that name in Frankfurt yet, so create one.
            # Any other error (permissions, throttling) is left to propagate.
            response = frankfurt_client.copy_db_snapshot(
                SourceDBSnapshotIdentifier='arn:aws:rds:eu-west-1:' + ACCOUNT + ':snapshot:' + sorted_list[0][0],
                TargetDBSnapshotIdentifier=copy_name,
                CopyTags=True
            )

            if response['DBSnapshot']['Status'] != "pending" and response['DBSnapshot']['Status'] != "available":
                raise Exception("Copy operation for " + copy_name + " failed!")
            print("Copied " + copy_name)
            continue

        print("Already copied")


def remove_old_snapshots():
    frankfurt_client = boto3.client('rds', 'eu-central-1')

    # Copies of automated snapshots are stored as manual snapshots
    response = frankfurt_client.describe_db_snapshots(
        SnapshotType='manual'
    )

    if len(response['DBSnapshots']) == 0:
        raise Exception("No manual snapshots in Frankfurt found")

    # Map each instance to {snapshot identifier: creation time}
    snapshots_per_project = {}
    for snapshot in response['DBSnapshots']:
        if snapshot['Status'] != 'available':
            continue

        if snapshot['DBInstanceIdentifier'] not in snapshots_per_project:
            snapshots_per_project[snapshot['DBInstanceIdentifier']] = {}

        snapshots_per_project[snapshot['DBInstanceIdentifier']][snapshot['DBSnapshotIdentifier']] = snapshot[
            'SnapshotCreateTime']

    for project in snapshots_per_project:
        if len(snapshots_per_project[project]) > 1:
            sorted_list = sorted(snapshots_per_project[project].items(), key=operator.itemgetter(1), reverse=True)
            # Keep the newest snapshot, remove everything older
            to_remove = [i[0] for i in sorted_list[1:]]

            for snapshot in to_remove:
                print("Removing " + snapshot)
                frankfurt_client.delete_db_snapshot(
                    DBSnapshotIdentifier=snapshot
                )


def lambda_handler(event, context):
    copy_latest_snapshot()
    remove_old_snapshots()


if __name__ == '__main__':
    lambda_handler(None, None)
```
For the given account (update the ACCOUNT variable at the top of the code), it goes through each of your RDS instances
and copies the latest automated snapshot from Ireland (eu-west-1) to Frankfurt (eu-central-1). The copies are stored as
manual snapshots, so it then goes through all manual snapshots in Frankfurt and keeps only the latest one for each
instance. Be aware that this also removes older manual snapshots in Frankfurt that were not created by this script.
Region values can be changed within the script to match any requirements.
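The "keep only the latest snapshot per instance" logic can also be pulled out of the AWS calls, which makes it easy to check locally before letting it delete anything. Here is a minimal sketch of that idea, assuming records shaped like the entries of `describe_db_snapshots()['DBSnapshots']`. The helper names (`group_by_instance`, `split_latest`, `snapshot_arn`) and the sample data are made up for illustration and are not part of the script above:

```python
from collections import defaultdict
from datetime import datetime


def group_by_instance(snapshots):
    """Group available snapshots by the DB instance they belong to."""
    grouped = defaultdict(list)
    for snap in snapshots:
        if snap['Status'] == 'available':
            grouped[snap['DBInstanceIdentifier']].append(snap)
    return dict(grouped)


def split_latest(snapshots):
    """Return (identifier of the newest snapshot, identifiers of all older ones)."""
    ordered = sorted(snapshots, key=lambda s: s['SnapshotCreateTime'], reverse=True)
    ids = [s['DBSnapshotIdentifier'] for s in ordered]
    return ids[0], ids[1:]


def snapshot_arn(region, account, snapshot_id):
    """Build the snapshot ARN the same way the script does, with the region as a parameter."""
    return 'arn:aws:rds:' + region + ':' + account + ':snapshot:' + snapshot_id


# Hypothetical sample data, for illustration only.
sample = [
    {'DBInstanceIdentifier': 'shop', 'DBSnapshotIdentifier': 'shop-2017-01-01',
     'SnapshotCreateTime': datetime(2017, 1, 1), 'Status': 'available'},
    {'DBInstanceIdentifier': 'shop', 'DBSnapshotIdentifier': 'shop-2017-01-03',
     'SnapshotCreateTime': datetime(2017, 1, 3), 'Status': 'available'},
    {'DBInstanceIdentifier': 'shop', 'DBSnapshotIdentifier': 'shop-2017-01-02',
     'SnapshotCreateTime': datetime(2017, 1, 2), 'Status': 'available'},
    {'DBInstanceIdentifier': 'shop', 'DBSnapshotIdentifier': 'shop-2017-01-04',
     'SnapshotCreateTime': datetime(2017, 1, 4), 'Status': 'creating'},
    {'DBInstanceIdentifier': 'blog', 'DBSnapshotIdentifier': 'blog-2017-01-03',
     'SnapshotCreateTime': datetime(2017, 1, 3), 'Status': 'available'},
]

for instance, snaps in group_by_instance(sample).items():
    latest, stale = split_latest(snaps)
    print(instance, "keep:", latest, "remove:", stale)
```

Because the helpers only work on plain dictionaries, you can feed them a saved API response and see what would be kept or removed without touching any real snapshot.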
This Lambda can be scheduled in two ways:
- via CloudWatch Events Schedule, to simply run every day,
- via RDS events (through SNS), to run whenever an RDS backup is finished (some improvements to the code could be
useful).
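For the SNS option, the handler would first need to check what kind of RDS event it received. The following is only a rough sketch of that check. It assumes the notification body is a JSON document with "Event ID" and "Source ID" keys, and that `RDS-EVENT-0002` is the ID for a finished instance backup; verify both against the RDS event documentation and a real notification before relying on it. The function name `backup_finished_sources` is hypothetical:

```python
import json

# Assumed event ID for "finished DB instance backup"; verify it against the
# RDS event documentation for the subscription you set up.
BACKUP_FINISHED_EVENT = 'RDS-EVENT-0002'


def backup_finished_sources(event):
    """Return the source IDs of all finished-backup events in an SNS invocation."""
    sources = []
    for record in event.get('Records', []):
        # SNS delivers the RDS notification as a JSON string in Sns.Message
        message = json.loads(record['Sns']['Message'])
        # Assumption: "Event ID" holds a documentation URL ending in the event ID
        if message.get('Event ID', '').endswith(BACKUP_FINISHED_EVENT):
            sources.append(message.get('Source ID', 'unknown'))
    return sources


def lambda_handler(event, context):
    sources = backup_finished_sources(event)
    if not sources:
        print("No finished-backup event in this notification, nothing to do")
        return
    print("Backup finished for: " + ", ".join(sources))
    # copy_latest_snapshot() and remove_old_snapshots() from the script above
    # would be called here.
```

Filtering like this keeps the function from running a copy for every unrelated RDS notification that happens to arrive on the same topic.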
You can create this function manually (it does not require any additional libraries, so it can be copied & pasted into
AWS Lambda) or use CloudFormation (please do!). For reference, check out the GitHub repository where you can find other
useful Lambdas and CloudFormation templates for their
creation: https://github.com/pbudzon/aws-maintenance.