There are application workloads running on SQL Database Server in On-Prem with two nodes across two different sites within the same region (primary and secondary sites) with VM-to-VM synchronous replication (Aka – Replicated VM) to provide HA with no single point of failure.
These application workloads may get planned for EC2Rehost native Amazon cloud migration but does not support or compatible with Amazon RDS (especially COTS applications). Also, the SQL Server cannot be hosted on EC2 between two AZs since EC2 does not support VM-to-VM replication.
Still, single node of SQL Server on EC2 is feasible but will be good for Non-Prod environments, ideally.
Hence, how do we deploy highly available SQL Server solution on Amazon EC2 for business-critical application workloads in Prod?
What and So What?
In this document, we try to compare and contrast the high available solution for Microsoft SQL Server on Amazon EC2 and help to understand the nature of these trade-offs, and the cost and complexities of implementing HA for SQL Server workloads on AWS.
Two architectural solutions can be provided based on two technology methodologies:
Option 1 – SQL Failover Cluster Instance (FCI) with Multi-Zone Amazon FSx for Windows Server
Option 2 - SQL Server Always On (AG)
SQL Failover Cluster Instance (FCI) with Multi-AZ Amazon FSx for Windows Server
There are 2 nodes spanning across 2 AZs.
In this scenario, SQL Server FCI relies on shared storage being accessible from all the nodes participating in FCI.
Amazon FSx (Multi AZ) for Windows File Server to simplify deploying and using shared storage to host databases.
FSx for Windows File server provided fully managed, highly reliable and scalable for file storage that is accessible over SMB protocol.
With Amazon FSx, shared file storage can be managed easily that automatically replicates the storage synchronously across two Availability Zones.
Supported in SQL Server Standard Edition and Enterprise edition.
Protection at the SQL Server instance level for all the users and system databases.
With Amazon FSx as shared storage, we can use fully managed and fully replicated Multi-AZ storage.
SQL Failover Cluster Instance (FCI) with Multi-AZ Amazon FSx for Windows Server
There are 2 SQL standalone instances with 2 different Availability Zones (AZs)
In this scenario, AZ1 act as primary site and AZ2 act as secondary site.
Windows failover cluster feature to be enabled on both the nodes and should be part of the same cluster (WSFC).
Since, there are 2 nodes spanning across 2 AZs, it is configured with synchronous commit and Automatic failover while it guarantees no data loss. But in this mode, network latency is unacceptable.
We can avoid this by using an asynchronous commit and does not wait for the acknowledgement before committing the transaction. That is why the recommended approach for HA/DR is using asynchronous commit mode.
The SQL Server Always On availability groups feature available in SQL Server 2012 and later versions.
A single endpoint for applications to connect through AG listener which simplifies failover.
Secondary replicas can be used for reading scaling.
SQL Server Always On (AG)
Option 1 vs Option 2 – Comparative Study
SQL Server Compatibility
Option 1 (FCI with FSx) - Supported in SQL Server Standard and Enterprise edition.
Option 2 (Always ON) - Supported in SQL Server Enterprise edition. Limited support in Standard edition in single AG and single database.
Storage Compatibility
Option 1 (FCI with FSx) - Amazon FSx is fully managed shared file storage that automatically replicates the storage synchronously across two Availability Zones. (No more copies of same database required).
Option 2 (Always ON) - Two or more copies of the same databases are synchronized across multiple AZs.
Features Compatibility
Option 1 (FCI with FSx) - Read and write traffic (both) will be routed to single database. Supports Distribution Transaction Coordinator (DTC), especially for legacy On-Premises application. New user logins or any changes in the existing user logins (SQL Authentication/Windows Authentication modes) are done in shared storage and no manual intervention is required.
Option 2 (Always ON) - Secondary replicas can be used for reading scaling in Enterprise edition. This feature is not available in Standard edition. Does not support Distribution Transaction Coordinator (DTC). New user logins or any changes in the existing user logins (SQL Authentication/Windows Authentication modes) in primary replica needs to be manually updated in secondary.
Availability (RPO, RTO)
Option 1 (FCI with FSx) - A SQL Server failover cluster requires all the databases be placed on shared storage so RPO is theoretically zero. RTO typically, an automatic failover and the duration depends on the amount of data modified by the active transactions at the time of the failover.
Option 2 (Always ON) - In asynchronous commit mode, we have to force the failover, thereby accepting the risk of minimum data loss.
Relative Cost
Option 1 (FCI with FSx) - The SQL failover cluster solution requires SQL Server Standard or Enterprise edition to run. Standard edition saves significant cost vs Enterprise edition based on the licensing).SQL server failover also requires a shared storage for system and user databases which also increases complexity and TCO.
Option 2 (Always ON) - Fully featured AG requires SQL Enterprise edition (increases the licensing cost significantly based on the licensing) and increases TCO and complexity.