SQL Server IaaS and Retry Logic

Recently I had an interesting discussion with a customer and a question came up: should we still worry about Retry Handling in our application code when our SQL Server runs in virtual machines in an Infrastructure as a Service (IaaS) implementation?

More about the question

First, let’s review this question in more detail.  Let’s assume you currently run an application on virtual machines (or even physical machines) that are hosted at your favorite hosting provider, and you are interested in moving this application in the Microsoft cloud (Microsoft Azure). Your decision is to keep as much of your current architecture in place, and move your application “as-is” in virtual machines in Azure; depending on who you talked to, you probably heard that moving into IaaS (in Azure or not) is a simple fork lift. Except that your application depends on SQL Server, and now you have a choice to make: will your database server run on a virtual machine (IaaS), or using the platform as a service SQL Database (PaaS)?  For the remainder of this discussion I will assume that you “can” go either way; that’s not always true because some applications use features that are only available in SQL Server IaaS (although the gap is now relatively small between SQL Server and SQL Database).

What could go wrong

SQL Database (PaaS) is an environment that is highly load balanced and can potentially fail over to other server nodes automatically and frequently (more frequently than with your current hosting provider). As a result, your application could experience more frequent disconnections. Most often than not, applications are not designed to automatically retry their database requests when such disconnections occur. That’s because most of the time, when a disconnection happens it is usually a bad thing, and there are bigger problems to solve (such as a hard drive failure). However, in the cloud (and specifically with PaaS databases), disconnections happen for a variety of reasons, and are not necessarily an issue; they just happen quickly. As a result, implementing retry logic in your application code makes you application significantly more robust in the cloud, and more resilient to transient connection issues (for more information about this, look up the ).

However, applications that use SQL Server in VMs (IaaS) in Microsoft Azure may also experience random disconnections. Although there are no published metrics that compare the resiliency of the availability of VMs compared to PaaS implementations, VMs are bound to restart at some point (due to host O/S upgrades for example), or rack failures, causing downtime of your SQL Server instance (or a failover event if you run in a cluster). While VMs in Microsoft Azure that run in a load balanced mode can have a service uptime that exceeds 99.95%, VMs running SQL Server are never load-balanced; they can be clustered at best (but even in clustered situations, there are no uptime guarantees since the VMs are not load balanced). VMs also depend on an underlying storage that is prone to “throttling” (read this blog post about Azure Storage Throttling for more information), which could also induce temporary slowdowns, or timeouts. So for a variety of reasons, an application that runs SQL Server in VMs can experience sporadic, and temporary disconnections that could warrant a retry at the application layer.

Retry Handling

As a result, regardless of your implementation decision (SQL Server IaaS, or SQL Database PaaS), it is prudent (if not highly recommended) to modify your application code to include some form of retry logic; adding retry logic will create the perception that your application slowed down by hiding the actual connection failure. There are a few implementation models, but the most popular for the Microsoft Azure platform is the Transient Fault Handling Application Block (mostly used for ADO.NET code). This application block will help you implement two kinds of retries: connection and transaction retries. Connection retries are performed if your code is unable to connect to the database for a short period of time, and transaction retries will attempt to resubmit a database request in case the previous request failed for transient reasons. The framework is extensible and gives you flexibility to decide whether you want to retry in a linear manner, or through a form of exponential retry.

Note that the Entity Framework version 6 and higher include automatic retry policies; see this article for more information.

About Herve Roggero

Herve Roggero, Microsoft Azure MVP, @hroggero, is the founder of Blue Syntax Consulting (http://www.bluesyntaxconsulting.com). Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" and “PRO SQL Server 2012 Practices” from Apress, a PluralSight author, and runs the Azure Florida Association.

Print | posted @ Wednesday, May 27, 2015 6:01 PM

Comments on this entry:

Comments are closed.

Comments have been closed on this topic.