Disaster Recovery in AWS with cross-region replication and DataSync

by Mike Sweetman
16 June 2020

Fault tolerance and the survivability of computer systems have been studied for a very long time and remain among the most important topics in computing. The first computer systems of the 1950s and 1960s were built on relays and vacuum tubes, and their fault tolerance was very low because of failures in those components: the mean time between failures was often only a few hours, after which the system had to be repaired or restarted. Engineers and programmers at the dawn of the computing era therefore fought for every improvement in reliability. Under the operating conditions of today's cloud systems, the requirements for fault tolerance and reliability are just as relevant. New concepts have appeared, such as the cloud region and the time a system takes to return to a healthy state without losing data or transactions, and the mean time between failures (MTBF) of modern cloud solutions can now be measured in years. Below we consider the main tasks and issues of Disaster Recovery with cross-region replication in Amazon AWS.

What is Disaster Recovery

Disaster recovery includes a set of policies, tools, and procedures that allow you to recover and maintain vital technology infrastructure and computer systems after a natural or human-induced disaster. Disaster recovery now focuses on cloud systems and data centers that support critical business functions. This implies maintaining all the essential aspects of the functioning of the business despite significant destructive events. Disaster recovery can therefore be seen as part of business continuity.

As described here: “IT Service Continuity (ITSC) is a subset of business continuity planning (BCP) and encompasses IT disaster recovery planning and wider IT resilience planning. 

The ITSC Plan reflects Recovery Point Objective (RPO - recent transactions) and Recovery Time Objective (RTO - time intervals).”

What is Region in Cloud Systems

Availability Zones (AZs) are isolated data center locations within a region from which public cloud services are created and run. Regions are the geographic locations where the data centers of public cloud service providers are located. Companies choose one or more Availability Zones around the world for their services, depending on the needs of the business.

Amazon Web Services (AWS) operates in the United States, South America, Europe, and Asia Pacific. Each region contains two to five Availability Zones that are geographically separated from each other. Regions are connected to each other through dedicated high-speed network links.

Customers choose Availability Zones for a variety of reasons, including compliance and proximity to end customers. Cloud administrators and DevOps engineers can also replicate services across multiple Availability Zones to reduce latency or protect resources, and can move resources to another Availability Zone in the event of a failure.

What is Replication

Replication in computing relies on redundant resources, such as software or hardware components, to increase reliability, fault tolerance, or availability.

When a single primary replica is designated to handle all requests, the system uses a primary-backup (master-slave) scheme. If any replica can process a request and propagate the new state, the system uses a multi-primary (multi-master) scheme.

Backup differs from replication in that the saved copy of the data remains unchanged for a long period of time. Replicas, on the other hand, are often updated and quickly lose any historical state. Replication is one of the oldest and most important topics in the general field of distributed systems.

Disaster Recovery in Amazon AWS

Sustainability of a business is determined by the efficiency and continuity of the data flow in the organization. Even a short interruption in the workflow can lead to thousands of lost transactions, a significant decline in production and a loss of customer confidence.

The causes of interruptions can be different, from natural disasters to equipment breakdowns or human errors. A well-prepared cloud-based disaster recovery strategy will help you continue operating while your physical infrastructure is unavailable.

AWS supports a variety of disaster recovery architectures, from those designed for companies with light workloads to large enterprise solutions that enable fast failover at scale. AWS provides a suite of cloud services for rapid disaster recovery of IT infrastructure and data.

CloudEndure Disaster Recovery is one of the AWS services that enables you to quickly and conveniently move your disaster recovery strategy from your existing physical or virtual data centers, private clouds, or other public clouds to the AWS cloud. If you have already migrated to AWS, you can further protect your mission-critical workloads with cross-region disaster recovery. If you haven't migrated to AWS yet, you can get started with free CloudEndure Migration licenses.

Cross-region replication in AWS

There are several replication methods in the AWS cloud system.

One type of replication enables automatic, asynchronous copying of objects between Amazon S3 buckets. You can copy objects between different AWS Regions or within the same Region. To enable object replication, you add a replication configuration to the source S3 bucket. Two types of replication are available:

  • Cross-Region replication (CRR) is used to copy objects across Amazon S3 buckets in different AWS Regions.
  • Same-Region replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS Region.
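To make the CRR case concrete, here is a minimal sketch of a replication configuration in the shape expected by the S3 `PutBucketReplication` API. The bucket names, account ID, and IAM role ARN below are hypothetical placeholders; versioning must already be enabled on both buckets.

```python
import json

# Hypothetical names -- replace with your own buckets and role.
SOURCE_BUCKET = "my-app-data-us-east-1"
DEST_BUCKET_ARN = "arn:aws:s3:::my-app-data-eu-west-1"
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-replication-role"

# Replication configuration payload for the S3 PutBucketReplication API.
replication_config = {
    "Role": REPLICATION_ROLE_ARN,
    "Rules": [
        {
            "ID": "crr-all-objects",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = replicate every object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": DEST_BUCKET_ARN,
                "StorageClass": "STANDARD_IA",  # replicas may use a cheaper class
            },
        }
    ],
}

print(json.dumps(replication_config, indent=2))
```

With boto3 you would apply it as `s3.put_bucket_replication(Bucket=SOURCE_BUCKET, ReplicationConfiguration=replication_config)`; for SRR the only difference is that the destination bucket ARN points to a bucket in the same Region.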

Replication can be used to:

  • Replicate objects while retaining metadata — you can use replication to make copies of your objects that retain all metadata, such as the original object creation time and version IDs.
  • Replicate objects into different storage classes — you can use replication to directly put objects into S3 Glacier, S3 Glacier Deep Archive, or another storage class in the destination bucket.
  • Maintain object copies under different ownership — Regardless of who owns the source object, you can tell Amazon S3 to change replica ownership to the AWS account that owns the destination bucket.

Cross-Region replication can help you do the following: meet compliance requirements (Amazon S3 stores your data across multiple geographically distant locations) and minimize latency (if customers are in two geographic locations).

Cross-Region replication

As for EC2 cross-region replication, you can solve this problem using AWS Route 53 routing policies. Amazon Route 53 routes user requests to AWS resources such as Amazon EC2 instances, Elastic Load Balancing load balancers, or Amazon S3 buckets. Route 53 offers several routing policies; for this case you can use one of the following two:

  • Geolocation routing policy - use this if you want to route traffic based on the location of your users.
  • Latency routing policy - use this when you have resources in several AWS Regions and you want to route traffic to the resource that provides the lowest latency.
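The latency-based policy works by creating one record per region with the same name but different `SetIdentifier` and `Region` values. A minimal sketch of the `ChangeResourceRecordSets` payload follows; the domain name, hosted zone, and IP addresses are hypothetical.

```python
# ChangeBatch payload for Route 53 latency-based routing across two regions.
# Route 53 answers queries for app.example.com with the record whose Region
# gives the requesting user the lowest latency.
change_batch = {
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "SetIdentifier": "us-east-1",      # must be unique per record
                "Region": "us-east-1",             # latency-based routing key
                "TTL": 60,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",
                "SetIdentifier": "eu-west-1",
                "Region": "eu-west-1",
                "TTL": 60,
                "ResourceRecords": [{"Value": "198.51.100.20"}],
            },
        },
    ]
}
```

With boto3 this would be submitted as `route53.change_resource_record_sets(HostedZoneId="Z0HYPOTHETICAL", ChangeBatch=change_batch)`; health checks can be attached to each record so Route 53 stops routing to a failed region.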

For EC2, you can run instances in several regions and redirect users with Route 53 if one of the regions stops responding for some reason. To copy an EC2 instance to another region, you can use the following scenario:

  • Take a snapshot of your EBS volume.
  • Copy the EBS snapshot to the desired region.
  • Copy the AMI to the desired region as well, if necessary.
  • Launch a new EC2 instance from the copied snapshot in the desired region.
  • Redirect users to the new region.
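The snapshot and AMI copy steps above can be sketched with boto3's `copy_snapshot` and `copy_image` calls. The functions take the destination-region EC2 client as a parameter, and the stub client at the bottom is a stand-in so the sketch runs without AWS credentials; all identifiers are hypothetical.

```python
def copy_snapshot_to_region(ec2_dest, snapshot_id, source_region):
    """Copy an EBS snapshot into the destination region.

    ec2_dest should be an EC2 client bound to the destination region,
    e.g. boto3.client("ec2", region_name="eu-west-1").
    """
    resp = ec2_dest.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return resp["SnapshotId"]


def copy_ami_to_region(ec2_dest, image_id, source_region, name):
    """Copy the AMI so the instance can be launched in the new region."""
    resp = ec2_dest.copy_image(
        SourceImageId=image_id,
        SourceRegion=source_region,
        Name=name,
    )
    return resp["ImageId"]


# Stand-in client so this sketch runs without AWS credentials; with boto3
# you would pass boto3.client("ec2", region_name="eu-west-1") instead.
class _StubEC2:
    def copy_snapshot(self, **kwargs):
        return {"SnapshotId": "snap-0copy"}

    def copy_image(self, **kwargs):
        return {"ImageId": "ami-0copy"}


new_snapshot = copy_snapshot_to_region(_StubEC2(), "snap-0123456789", "us-east-1")
new_ami = copy_ami_to_region(_StubEC2(), "ami-0123456789", "us-east-1", "dr-web-ami")
```

From the copied AMI and snapshot you then launch the replacement instance with `run_instances` in the destination region and update the Route 53 records.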

Replicating Amazon Aurora MySQL DB clusters across AWS Regions is another option for building a cost-effective, reliable infrastructure. Each source DB cluster can have up to five cross-Region DB clusters that are read replicas, and each cluster (the source and each cross-Region read replica) can have up to 15 Aurora Replicas. This can be a very good solution considering data transfer prices.
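A cross-Region Aurora read replica is created with the RDS `CreateDBCluster` API by pointing `ReplicationSourceIdentifier` at the source cluster. Below is a sketch of the parameters; all identifiers and the account ID are hypothetical.

```python
# Parameters for creating an Aurora MySQL cross-Region read replica cluster
# via the RDS CreateDBCluster API (run against the *destination* region).
replica_cluster_params = {
    "DBClusterIdentifier": "app-aurora-replica-euw1",
    "Engine": "aurora-mysql",
    # ARN of the source cluster in the primary region:
    "ReplicationSourceIdentifier": (
        "arn:aws:rds:us-east-1:123456789012:cluster:app-aurora-primary"
    ),
    # Lets boto3 pre-sign the cross-region request automatically:
    "SourceRegion": "us-east-1",
    "KmsKeyId": "alias/aws/rds",  # needed when the source cluster is encrypted
}
```

With boto3 this would be `boto3.client("rds", region_name="eu-west-1").create_db_cluster(**replica_cluster_params)`, followed by `create_db_instance` calls to add Aurora Replicas to the new cluster; in a disaster the replica cluster can be promoted to a standalone cluster.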

As our team noted on one project, the DataSync service is advertised as "free," but you need to run it on fairly expensive EC2 instances. So instead we used a much smaller instance and rsync: the cost was only $3-4 per month instead of $175-$200. The benefits AWS lists are real, but DataSync is intended for much, much higher data volumes and transfer frequencies.
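The rsync alternative mentioned above is essentially a periodic mirror job. The sketch below drives rsync from Python against local temporary directories so it can run anywhere; in production the destination would be an SSH target such as `sync@replica.example.com:/var/data/` (a hypothetical host) with `-e ssh`, typically scheduled from cron.

```python
import pathlib
import shutil
import subprocess
import tempfile

# Local demo directories; in production src is your data directory and
# dst an SSH target like "sync@replica.example.com:/var/data/".
src = pathlib.Path(tempfile.mkdtemp())
dst = pathlib.Path(tempfile.mkdtemp())
(src / "orders.csv").write_text("id,total\n1,9.99\n")

# -a preserves permissions and timestamps, -z compresses in transit,
# --delete mirrors deletions so dst stays an exact replica of src.
cmd = ["rsync", "-az", "--delete", f"{src}/", f"{dst}/"]
if shutil.which("rsync"):
    subprocess.run(cmd, check=True)
else:
    # Portable stand-in for this demo only, when rsync is not installed.
    shutil.copytree(src, dst, dirs_exist_ok=True)

print((dst / "orders.csv").read_text())
```

Unlike DataSync, this gives no built-in validation, bandwidth throttling, or monitoring, which is exactly the trade-off that made it the cheaper choice at low data volumes.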

AWS DataSync enables you to move large amounts of data quickly and easily across your network between on-premises storage and Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. DataSync automates many of these tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.

The DataSync software agent connects to your Network File System (NFS) or Server Message Block (SMB) storage. DataSync can transfer hundreds of terabytes and millions of files at speeds up to 10 times faster than open-source tools, over the Internet or through AWS Direct Connect links. It can be used to transfer active datasets or archives to AWS, move data to the cloud for processing, and replicate data to AWS for business continuity. To start working with DataSync, you deploy the DataSync agent, connect it to your file system, and select AWS storage resources.
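The agent-then-task workflow just described maps to two DataSync API calls: `CreateLocationNfs` to register the on-premises source, and `CreateTask` to define the transfer. The payloads below are a sketch; the hostname and all ARNs are hypothetical placeholders.

```python
# Parameters for DataSync CreateLocationNfs: register the on-premises NFS
# share that the deployed agent can reach.
nfs_location_params = {
    "ServerHostname": "files.corp.example.com",
    "Subdirectory": "/exports/data",
    "OnPremConfig": {
        "AgentArns": [
            "arn:aws:datasync:us-east-1:123456789012:agent/agent-0b0addbeef000001"
        ]
    },
}

# Parameters for DataSync CreateTask: move data from the NFS location to a
# previously created S3 destination location.
task_params = {
    "SourceLocationArn": (
        "arn:aws:datasync:us-east-1:123456789012:location/loc-0f0123456789aaaaa"
    ),
    "DestinationLocationArn": (
        "arn:aws:datasync:us-east-1:123456789012:location/loc-0f0123456789bbbbb"
    ),
    "Name": "nightly-dr-sync",
    # Built-in data validation after the transfer completes:
    "Options": {"VerifyMode": "POINT_IN_TIME_CONSISTENT"},
}
```

With boto3 these would be passed to `datasync.create_location_nfs(**nfs_location_params)` and `datasync.create_task(**task_params)`, and each run is then triggered with `start_task_execution`.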

DataSync in Amazon AWS

The following benefits of DataSync are valuable:

  • AWS DataSync makes it easy to move data across the network between on-premises storage and AWS. DataSync automates both the management of data transfer processes and the infrastructure necessary for high-performance and secure encrypted data transfer.
  • Fast data transfer over the network in AWS because DataSync uses a specially designed network protocol and parallel multi-threaded architecture to speed data transfer.
  • Reduced operational costs by moving data at the lowest cost with a fixed price per gigabyte of DataSync. 

Image by Jerry Hargrove, published on AWSGeek.

AWS DataSync

Conclusion

In conclusion, replication and recovery after a failure remain, as before, a very important part of any computer system. In cloud computing this is especially true, since we are dealing with remote resources. A correctly configured failure recovery scenario, well-built restoration from backups, and regular testing of the system will save hundreds of hours of work for your system administrators and DevOps engineers, and will save companies and users hundreds of thousands of dollars in project budgets.

Our Luneba experts will build Disaster Recovery in AWS with cross-region replication and DataSync for you from scratch, or help you improve an existing setup. With the mass shift to working from home, even in industries where we did not expect it, there is a need to set up a workplace anywhere with an Internet connection, so demand for cloud services keeps growing. The reliability of cloud systems depends directly on proper data synchronization and a properly configured and tested failure recovery mechanism. Luneba will configure the necessary cloud services as quickly as possible and provide support for the developed solutions.
