Complete code: cross-region RDS recovery

December 28, 2017 by Paulina Budzoń

After publishing my previous post on this topic (Copying RDS snapshot to another region for cross-region recovery), I noticed that a lot of people were interested in using the code I provided as an example. Many were not sure how to make use of it, and after a couple of pull requests it became obvious that complete, fully working code and a CloudFormation template would be a good idea. So, yesterday, I pushed an update to the aws-maintenance repository with fully working code, which you can easily customize via CloudFormation parameters to match your needs.

The main changes include:

  1. The code was moved to a separate CloudFormation template, which creates all (and only) the resources needed for the solution to work. You can now simply upload the infrastructure/templates/rds-cross-region-backup.json template to CloudFormation, fill out the parameters and have automated cross-region snapshot copying happen all on its own!
  2. The code was updated to be triggered specifically by SNS messages sent by RDS “backup” events. These notifications are sent by RDS whenever the daily automated snapshot is started and finished; the code reacts to the “backup finished” event and copies the latest snapshot to the provided target region.
  3. Because of the above change, each run of the code now acts on a single RDS instance: it copies the new snapshot for that specific instance and removes old snapshots of that same instance from the target region.
  4. Added more comments to the code, to hopefully make it easier for everyone to understand :)
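To make the trigger in points 2 and 3 more concrete, here is a minimal sketch (not the repository's actual code) of how a Lambda handler could pick out the “backup finished” notifications. It assumes the SNS message body is the JSON document RDS publishes for event notifications, with keys such as "Source ID" and "Event ID", and it assumes the “Finished DB instance backup” event has the code RDS-EVENT-0002. Check both against the RDS event documentation before relying on them.

```python
import json

# Assumption: RDS-EVENT-0002 is the "Finished DB instance backup" event code.
BACKUP_FINISHED_EVENT = "RDS-EVENT-0002"


def finished_backup_instances(event):
    """Return the IDs of DB instances whose automated backup just finished.

    `event` is the payload Lambda receives from an SNS subscription: each
    record carries the RDS notification as a JSON string in Sns.Message.
    """
    instances = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # "Event ID" is assumed to be a documentation URL ending in
        # "#RDS-EVENT-xxxx"; splitting on "#" also accepts a bare code.
        event_code = message.get("Event ID", "").split("#")[-1]
        if event_code == BACKUP_FINISHED_EVENT:
            instances.append(message["Source ID"])
    return instances
```

Each returned instance ID is then handled on its own, which is why the code only ever copies and cleans up snapshots for the instance that produced the event.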

Some notes on what to expect:

  • The code intends to keep only one (the latest) copy of the snapshot in the target region. However, you will likely notice two snapshots being present. This is because the code triggers the copy to the target region but does not wait for it to finish before attempting to remove old snapshots (simply because copying can take a long time, especially for large databases). Therefore, when the list of snapshots for the database is taken from the target region, the most recent snapshot is likely still in progress, and the code only takes “available” (ready) snapshots into account. This is actually useful: copying the snapshot can potentially fail to finish correctly, and if the previous copy had already been deleted, you would be left without any usable snapshot for that database in the target region.
  • The snapshot in the target region will be named {source_rds_instance_name}-{source_region}-{original_snapshot_name}. The code itself relies on this name to check whether the snapshot has already been copied. The Lambda can be executed multiple times, but it will not trigger the copy if a snapshot with this name already exists.
  • Only automated snapshots are taken into account when making a copy. This means that if you create a manual snapshot, it will not be copied, but it will still trigger the Lambda (which may clean up the second snapshot from the target region, see the first note above). None of this is a real issue :)
  • (Obviously, I hope) if you disable automated snapshots on your RDS instance, this code will do nothing.
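The naming, filtering and clean-up rules from the notes above can be sketched as three small functions working on the snapshot dictionaries that boto3's `describe_db_snapshots` returns (keys such as `DBSnapshotIdentifier`, `DBInstanceIdentifier`, `SnapshotType`, `Status` and `SnapshotCreateTime`). This is an illustration under stated assumptions, not the code from the repository. In particular, it assumes that “original snapshot name” means the automated snapshot ID with its `rds:` prefix dropped (snapshot identifiers may only contain letters, digits and hyphens), and that copies in the target region keep the source `DBInstanceIdentifier`.

```python
def target_snapshot_name(instance_id, source_region, source_snapshot_id):
    """Build the name of the copy in the target region.

    Assumption: the "rds:" prefix of automated snapshot IDs is dropped,
    since snapshot identifiers cannot contain a colon.
    """
    original = source_snapshot_id.split(":", 1)[-1]
    return f"{instance_id}-{source_region}-{original}"


def latest_automated_snapshot(snapshots):
    """Pick the newest finished automated snapshot; manual ones are ignored."""
    candidates = [
        s for s in snapshots
        if s["SnapshotType"] == "automated" and s["Status"] == "available"
    ]
    return max(candidates, key=lambda s: s["SnapshotCreateTime"], default=None)


def snapshots_to_delete(target_snapshots, instance_id, source_region):
    """Return IDs of old copies in the target region, keeping the newest.

    Only "available" copies are considered, so a copy that is still in
    progress never causes the last usable snapshot to be removed.
    """
    prefix = f"{instance_id}-{source_region}-"
    ready = sorted(
        (
            s for s in target_snapshots
            if s["DBInstanceIdentifier"] == instance_id
            and s["DBSnapshotIdentifier"].startswith(prefix)
            and s["Status"] == "available"
        ),
        key=lambda s: s["SnapshotCreateTime"],
    )
    # Everything except the most recent ready copy can go.
    return [s["DBSnapshotIdentifier"] for s in ready[:-1]]
```

In a real handler, the copy would only be requested when `target_snapshot_name(...)` is not already among the target region's snapshot IDs, which is what makes repeated Lambda runs harmless. The copy itself could then go through boto3's `copy_db_snapshot` (with `SourceRegion` set and the source snapshot given as an ARN), and each ID from `snapshots_to_delete` through `delete_db_snapshot`.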

If you’re interested in seeing the exact changes made, check out this commit.

If you just want to use the solution, download the CloudFormation template and create a CloudFormation stack from it. When creating the stack, you will be asked for two required parameters: the source and target regions. The source region must be the one where your RDS instance is running; the target region is the one where you want to store the snapshot copies. Use the region IDs, not the full (human-friendly) names, for example “eu-west-1” for Ireland. For a full list of regions where RDS is available (with their names and IDs), check out the AWS documentation.
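If you prefer to script the stack creation instead of using the console, the parameters could be prepared along these lines. This is a hedged sketch: the `ParameterKey` names `SourceRegion`, `TargetRegion` and `DatabasesToUse` are hypothetical placeholders, so read the real names from the template's `Parameters` section. The region-ID pattern and the check that the two regions differ are my own sanity checks, not something the template is known to enforce.

```python
import re

# Region IDs look like "eu-west-1" or "us-gov-west-1", not "Ireland".
REGION_ID = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d+$")


def stack_parameters(source_region, target_region, instances=""):
    """Build the Parameters list for a CloudFormation CreateStack call.

    The ParameterKey names below are hypothetical; replace them with the
    names used in rds-cross-region-backup.json.
    """
    for region in (source_region, target_region):
        if not REGION_ID.match(region):
            raise ValueError(f"{region!r} is not a region ID such as 'eu-west-1'")
    if source_region == target_region:
        raise ValueError("source and target regions must differ")
    return [
        {"ParameterKey": "SourceRegion", "ParameterValue": source_region},
        {"ParameterKey": "TargetRegion", "ParameterValue": target_region},
        {"ParameterKey": "DatabasesToUse", "ParameterValue": instances},
    ]
```

The resulting list would be passed as `Parameters` to boto3's `cloudformation` client `create_stack`, together with `StackName` and `TemplateBody`. Because the template creates the Lambda's IAM role, `Capabilities=["CAPABILITY_IAM"]` is probably required as well. The stack should presumably be created in the source region, next to the RDS instance whose events it subscribes to.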

By default, the Lambda will be subscribed to notifications from all RDS instances within the source region. If you want to limit this to specific instances, provide a comma-delimited list of instance names in the third CloudFormation parameter (for example, “database-1,database-2” to limit the code to only those two instances). This is helpful if all your test and production environments are in the same AWS account (though they really shouldn't be…) and you only want to use the code for production databases.
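The post does not say whether the instance list is applied to the RDS event subscription itself (which accepts a list of source IDs) or checked inside the Lambda. As an illustration of the second option only, this is roughly how a comma-delimited value such as “database-1,database-2” could be interpreted, with an empty value meaning “all instances”. The function name is hypothetical.

```python
def instance_selected(instance_id, allowed=""):
    """Decide whether a backup event for `instance_id` should be handled.

    `allowed` is the comma-delimited parameter value, for example
    "database-1,database-2". An empty value means every instance.
    """
    # Ignore stray whitespace and empty entries such as a trailing comma.
    names = {name.strip() for name in allowed.split(",") if name.strip()}
    return not names or instance_id in names
```

Filtering in the event subscription would avoid invoking the Lambda at all for other instances, while an in-code check like this one keeps the subscription simple; either way, unlisted instances are left alone.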

Posted in: CloudFormation