Data latency in Always-On Availability Groups in Synchronous VS Asynchronous commit

One topic that has been discussed within the team is if we should use the asynchronous read replicas that we have configured to the load balancing mechanism of an availability group or not.

In order to answer this, we need to really understand how always-on Availability groups work. So, let’s dive deep into the subject.

Synchronous-Commit

This mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous-commit mode, transactions wait to send the transaction confirmation to the client until the secondary replica has hardened the log to disk. The secondary database remains SYNCHRONIZED as long as data synchronization continues. This guarantees that every transaction that is committed on a given primary database has also been committed on the corresponding secondary database.

Asynchronous-Commit

Under this mode, the secondary replica never becomes synchronized with the primary replica. Though a given secondary database might catch up to the corresponding primary database, any secondary database could lag behind at any point. Asynchronous-commit mode can be useful in a disaster-recovery scenario in which the primary replica and the secondary replica are separated by a significant distance and where you do not want small errors to impact the primary replica or in situations where performance is more important than synchronized data protection.

In summary with asynchronous commit mode, the application receives confirmation that the transaction committed after the last log record is flushed to the primary replica’s log file. This improves performance because the application does not have to wait for the log records to be transmitted but it opens up the AG to the potential of data loss.

How Synchronization Works in SQL Server on AG

  1. On receiving a transaction from a client, the primary replica writes the log for the transaction to the transaction log and concurrently sends the log record to the secondary replicas.
  2. Once a log record is written to the transaction log of the primary database, the transaction can be undone only if there is a failover at this point to a secondary that did not receive the log. The primary replica waits for confirmation from the synchronous-commit secondary replica.
  3. The secondary replica hardens the log and returns an acknowledgement to the primary replica.
  4. On receiving the confirmation from the secondary replica, the primary replica finishes the commit processing and sends a confirmation message to the client.
    • In summary we can say with synchronous commit mode, the application does not receive confirmation that the transaction committed until after the log records are hardened on all synchronous secondary replicas. This is how AGs can guarantee zero data loss. Any transactions which were not hardened before the primary failed would be rolled back and an appropriate error would be raised to the client.
    • On the other hand, in asynchronous commit mode, the application receives confirmation that the transaction committed after the last log record is flushed to the primary replica’s log file. This improves performance because the application does not have to wait for the log records to be transmitted but it opens up the AG to the potential of data loss.
    • IMPORTANT NOTE
    • If the primary replica fails before the secondary replicas harden the log records, then the application will believe a transaction was committed but a failover would result in the loss of that data.
    • On the other hand, in asynchronous commit mode, the application receives confirmation that the transaction committed after the last log record is flushed to the primary replica’s log file. This improves performance because the application does not have to wait for the log records to be transmitted but it opens up the AG to the potential of data loss.

Demonstration

Thus far, we have explained in theory at a basic level, how the system works with the write mechanism. How about reads in secondary replicas? Are they 100% reliable? Ok let’s create a demo for this case:

First let’s create a simple table to use in our demo:

In this demo we have one primary (SQL02) and two secondary replicas one with Synchronous (SQL01) commit other with Asynchronous commit (SQL03)

Let’s create a table on the primary node

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DBA_Always_On_Read_Latency]') AND type in (N'U'))
DROP TABLE [dbo].[DBA_Always_On_Read_Latency]
GO
CREATE TABLE [dbo].[DBA_Always_On_Read_Latency](
	[Col_1] [int] IDENTITY(1,1) NOT NULL,
	[Col_2] [int] NULL,
	[Col_3] [nvarchar](200) NULL,
PRIMARY KEY CLUSTERED 
(
	[Col_1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Now after creating the demo table, we have created a linked server on the primary node to the two secondary nodes

Image-1

Now we have to build a load script to insert data in the table in the SQL02 (Primary)

DECLARE @cnt INT;
SET @cnt = 1;

WHILE (@cnt <= 50000)
BEGIN
INSERT INTO DBA_Always_On_Read_Latency( Col_2, Col_3) VALUES (@cnt, REPLICATE('a', 60))
SET @cnt= @cnt + 1;
WAITFOR DELAY '00:00:00.005'
END

As a last step we need to build a script that will allow us to get the data from the 3 servers, make sure that this script runs in another query window and the property ApplicationIntent=ReadOnly is enabled.

--Primary
SELECT MAX(Col_1), 'SQL02'
FROM [TestDB_PM].[dbo.DBA_Always_On_Read_Latency]

--Synchronous replica
SELECT MAX(Col_1), 'SQL01'
FROM [SQL01].[TestDB_PM].[dbo].[DBA_Always_On_Read_Latency]

--Asynchronous replica
SELECT MAX(Col_1), 'SQL03'
FROM [SQL03].[TestDB_PM].[dbo].[DBA_Always_On_Read_Latency]

Now execute the load script on the primary and in another window execute the script above, the results should be something like this ones:

Image-2

Now that we know how writes work, let’s understand how reads work. For reads it is a little different; as we can see in the example above where querying the primary has returned 3148 rows, the sync replica 3104 and 3116 in the async. We can confirm that reads do not have the same behavior as writes.

In summary, the primary replica sends the log records of changes on primary database to the secondary replicas. On each secondary database, a dedicated redo mechanism applies the changes to the log records. On a read-access to the secondary database, any given data change does not appear in the query results until the log record that contains the change has been applied to the secondary database and the transaction has been committed on primary database.

I’ve found an interesting paragraph in Microsoft documentation where they say the following statement:

‘Implementing read-only access to secondary replicas is useful if your read-only workloads can tolerate some data latency. In situations where data latency is unacceptable, consider running read-only workloads against the primary replica.

That’s it you can find all the info in this Microsoft document

https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/availability-modes-always-on-availability-groups?view=sql-server-ver16#SyncCommitAvMode