Data infrastructure powers the proper performance of your software. However, unexpected issues can occur that make your system unavailable. That’s where the notion of downtime, a period when the system is not available, comes in.
Downtime may have different causes and outcomes, but one thing is certain: downtime can be extremely expensive. One hour of system downtime can cost more than $300,000. And, apart from direct losses, it is extremely damaging to your reputation, especially when downtime hits a large public company with millions of users.
Imagine a lengthy downtime of a major payment system. You cannot pay with your card for gasoline at a gas station during the interruption of payment services. Or you’re standing at the supermarket checkout, frantically looking for other payment options. In those cases, do you think of switching to another payment system? Chances are high that the answer is yes. The payment system company, in turn, carries a great risk of significant customer outflow.
These are just two small examples showing how damaging downtime can be to digitized businesses. They emphasize why investment in infrastructure and quality service is so important.
Common downtime causes
First, let’s discuss the most common causes of disruptions that lead to downtime, and their solutions.
Hardware disruptions
Nowadays, the easiest way to solve hardware disruption is to move to the cloud. Cloud providers manage the hardware and compensate for losses caused by their failures. This compensation is rarely enough to cover the full financial and reputational damage, but cloud providers have a strong personal interest in running data infrastructures safely and reliably. Therefore, they design cloud architectures without bottlenecks and with the best decoupling practices. As a result, the failure of one service doesn’t cause a “snowball effect” that disrupts all other services.
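To illustrate the decoupling idea, here is a minimal sketch of a circuit breaker in Python, one common pattern for keeping a failing service from dragging down its callers. The thresholds and error handling are simplified assumptions, not a production implementation.

```python
import time

class CircuitBreaker:
    """Stops calling a failing dependency so errors don't cascade."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures  # failures before the circuit opens
        self.reset_after = reset_after    # seconds before retrying the dependency
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        # While the circuit is open, fail fast instead of piling up requests.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency presumed down")
            self.opened_at = None  # cool-down elapsed, allow one retry
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success resets the failure counter
        return result
```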
Some companies don’t rely on cloud technologies. There are several reasons for this. For example, there may be regulatory or compliance restrictions imposed by state, financial, or military organizations, especially when it comes to storing sensitive data. In this case, the organization must duplicate all the hardware in its own data center or in a remote one, following the best geo-redundancy practices.
Disruptions during the transition to a new version
Such problems can be solved with the Blue-Green deployment practice. In this case, specialists run two identical production environments. They are kept synchronized and take turns playing the role of production. If the environment running the new version fails during the transition, the other one takes its place, avoiding downtime.
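As a rough illustration, on AWS a Blue-Green switch can be as simple as repointing a load balancer listener from one target group to the other. The sketch below uses boto3 with hypothetical ARNs for the listener and the two environments; it is an assumption-laden example, not the customer’s actual setup.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Hypothetical ARNs: the shared listener and the two identical environments.
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/demo/..."
BLUE_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/blue/..."
GREEN_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/green/..."

def switch_traffic(target_group_arn: str) -> None:
    """Point all listener traffic at the given environment."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
    )

# Promote green after it passes health checks; if the new version
# misbehaves, switching back to blue is a single call.
switch_traffic(GREEN_TG_ARN)
```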
Human error
Sometimes, your employees may lack the technical understanding or the capability to monitor various parts of the system, such as essential scripts and databases. Instances of human error that lead to system disruptions and downtime are quite common in database management. Actions that will help you solve the problem include:
- Employee training that builds solid database management skills, backed by SOPs as instructions for efficient database management;
- Use of reliable database monitoring tools (see the sketch after this list);
- Cloud migration, so that some database management tasks are handled by the cloud provider.
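As a minimal example of the monitoring point above, the sketch below pings a SQL Server instance with a trivial query and alerts on failure. It assumes the pyodbc package and a hypothetical connection string; a production tool would add retries, escalation, and metrics.

```python
import logging

import pyodbc  # assumes an ODBC driver for SQL Server is installed

# Hypothetical connection string; replace with your server and credentials.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=db.example.internal;DATABASE=master;"
    "UID=monitor;PWD=secret"
)

def database_is_alive() -> bool:
    """Return True if the server accepts connections and answers a query."""
    try:
        conn = pyodbc.connect(CONN_STR, timeout=5)  # 5-second login timeout
        try:
            conn.cursor().execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except pyodbc.Error as exc:
        logging.error("Database health check failed: %s", exc)
        return False

if not database_is_alive():
    # Hook your paging or alerting system in here.
    print("ALERT: database unreachable")
```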
Enhancing the SQL Server AlwaysOn feature
In this article, we will focus on the specific case where the ABCloudz team helped the customer solve the problem of database hardware disruptions by migrating their data assets to the cloud.
Customer challenge
The customer was a media giant with immense data assets stored within a large data infrastructure that involved numerous SQL Server and Oracle database servers. They decided to solve the downtime problem by applying the SQL Server AlwaysOn feature, which provides database high availability through efficient database replication. This solution increased the availability of the customer’s service.
At the time, AlwaysOn was a recent technology in the market, so the customer did not have enough qualified specialists to manage the feature. Some of the customer’s database specialists had to learn the AlwaysOn option on the fly. Due to this push to learn the feature quickly, human-related disruptions began affecting the system databases more frequently. This increased the risk of lengthy downtime, even compared to downtime occurrences before the AlwaysOn feature was implemented.
The customer needed a fix quickly and decided to pursue a two-step solution. The first step would be the transfer of the entire database infrastructure to the AWS cloud. The second would involve continuous staff training aimed at improving database monitoring and management in the future.
ABCloudz handles the challenge
With the decision made to transfer the entire database infrastructure, the ABCloudz team swung into action. Our team was required to recreate the customer’s database architecture in the cloud. We also had to migrate their on-premises Oracle and SQL Server databases to cloud-based PostgreSQL and SQL Server databases, respectively.
The image below illustrates the general approach for migrating from on-premises to the cloud that we applied to help the customer handle their system downtime issues.
We expanded the customer’s data infrastructure in the AWS cloud. Their two-node SQL Server AlwaysOn cluster was migrated to a cloud cluster with three nodes: the primary database, the secondary database for high availability, and a passive secondary node for emergency data recovery in case of primary database disruptions. This cluster architecture follows the best SQL Server recommendations and helps reach 99.99% availability for critical databases.
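For ongoing verification of such a cluster, replica health can be read straight from SQL Server’s standard Always On dynamic management views. A minimal sketch in the same style as the earlier monitoring example, again with a hypothetical connection string:

```python
import pyodbc

# Hypothetical connection string to the primary replica.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=primary.example.internal;DATABASE=master;"
    "UID=monitor;PWD=secret"
)

# sys.availability_replicas and sys.dm_hadr_availability_replica_states
# are built-in SQL Server DMVs for Always On availability groups.
QUERY = """
SELECT ar.replica_server_name,
       rs.role_desc,                   -- PRIMARY or SECONDARY
       rs.synchronization_health_desc  -- HEALTHY / PARTIALLY_HEALTHY / NOT_HEALTHY
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_availability_replica_states AS rs
  ON ar.replica_id = rs.replica_id
"""

conn = pyodbc.connect(CONN_STR, timeout=5)
try:
    for name, role, health in conn.cursor().execute(QUERY):
        print(f"{name}: {role}, {health}")
        if health != "HEALTHY":
            print(f"ALERT: replica {name} is {health}")
finally:
    conn.close()
```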
As for the Oracle database, we created a copy in the cloud and migrated this cloud-based copy to PostgreSQL. We also replicated the data to the secondary disaster recovery node of the cloud-based SQL Server AlwaysOn cluster. The resulting infrastructure ensured data recovery for both the Oracle and SQL Server databases and almost eliminated downtime for both. See the scheme for this migration in the image below.
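The article doesn’t name the tooling, but a common way to run such an Oracle-to-PostgreSQL migration on AWS is the Database Migration Service (DMS), with the schema converted beforehand (for example, by the AWS Schema Conversion Tool). A minimal boto3 sketch, assuming a replication task was already created and using a hypothetical task ARN:

```python
import boto3

dms = boto3.client("dms")

# Hypothetical ARN of a pre-created Oracle -> PostgreSQL replication task.
TASK_ARN = "arn:aws:dms:us-east-1:123456789012:task/EXAMPLE"

# Kick off a full load plus ongoing change data capture (CDC), so the
# source database keeps serving traffic while the copy catches up.
dms.start_replication_task(
    ReplicationTaskArn=TASK_ARN,
    StartReplicationTaskType="start-replication",
)

# Poll the task status to track progress of the load.
status = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [TASK_ARN]}]
)["ReplicationTasks"][0]["Status"]
print("DMS task status:", status)
```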
Downtime problem solved for the customer
We relied on our database migration expertise and Microsoft SQL Server best practices to provide the customer with an excellent cloud architecture without downtime issues. Our specialists ensured high availability and even geo-redundancy of the customer’s servers.
As a result of our effort, the customer:
- Almost completely eliminated the downtime threat by reaching a 99.99% database availability rate
- Received three database nodes in the cloud under a single license (instead of paying for three separate licenses)
- Migrated part of their data assets from Oracle to SQL Server and PostgreSQL, which enabled them to cut the rather high costs of Oracle support and monitoring
- Realized the benefits of continuous support and maintenance as integral parts of the AWS service offering.
After the migration, the customer deployed best training practices to enhance their employees’ database monitoring expertise. This enabled them to ensure the continuous monitoring of SQL Servers long term.
In conclusion
To sum up, downtime costs you money and reputation. Solutions like SQL Server AlwaysOn alone won’t guarantee 100% downtime reduction, because the right architecture and database expertise are also required. That’s where ABCloudz can help. We handle the most challenging database migrations and create reliable data infrastructures in the cloud with meticulous consideration of best practices. Contact us and see how we can help you with the ultimate solution to your downtime problems.