Some believe that backups are a routine that can be set up and then forgotten. Such people also believe that ransomware attacks, downtime caused by hardware failures, and human mistakes that lead to data loss are things that happen to people in the news or to topic starters on Reddit; in other words, to someone else.
We all know that these kinds of beliefs are a recipe for disaster: lost data, lost business opportunities, lost productivity and, in the end, lost money. Backups can break down at many stages, the recovery may take longer than expected, or your infrastructure may simply not be ready for the recovery operation.
To avoid all that, you need a comprehensive plan for testing your backups. Our guide will help you build a robust action plan to make sure that your backed-up data is recoverable.
Create a List of Your Backups
First and foremost, you should make sure that all backup routines are documented. Why do you need to create a list if you already have a system administrator who knows all the routines that are in place (or if you know them yourself)? There are two main reasons:
- The person with direct knowledge of your backups may be absent or may leave your company, thus leaving you without a full understanding of your backup infrastructure. You, or your new system administrator, will spend ages just trying to understand how things work. Needless to say, in the event of any disaster, the lack of such knowledge will lead to downtime.
- Even the best professionals may mix up or forget specific technical details once things have gotten out of control and the situation is stressful.
So, you should create a list that includes all backups that are run, their types, retention settings, and the hardware that you need for the backup or recovery processes. Don’t forget to include your recovery time and recovery point calculations. These will help you to test your backups later and evaluate whether your backup plans are sufficient.
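One way to keep such a list is as structured data rather than free-form notes, so that gaps are easy to spot. The sketch below is only an illustration: the `BackupPlan` fields, the `missing_details` helper, and the sample entry are all hypothetical and should be adapted to whatever your backup software actually reports.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class BackupPlan:
    """One documented backup routine (all field names are illustrative)."""
    name: str
    source: str             # what is backed up
    backup_type: str        # e.g. "full", "incremental", "image"
    schedule: str           # human-readable schedule
    retention_days: int     # how long copies are kept
    storage: str            # where the copies live
    rto: timedelta          # target time to restore
    rpo: timedelta          # maximum tolerable data loss, as a time window
    hardware: list[str] = field(default_factory=list)


def missing_details(plan: BackupPlan) -> list[str]:
    """Return the names of the text fields that were left blank."""
    text_fields = ("name", "source", "backup_type", "schedule", "storage")
    return [f for f in text_fields if not getattr(plan, f).strip()]


# Hypothetical example entry; replace with your real backup routines.
inventory = [
    BackupPlan(
        name="file-server-nightly",
        source="File server, shared folders",
        backup_type="incremental",
        schedule="daily at 01:00",
        retention_days=30,
        storage="On-site NAS",
        rto=timedelta(hours=4),
        rpo=timedelta(hours=24),
        hardware=["NAS", "spare restore server"],
    ),
]

for plan in inventory:
    gaps = missing_details(plan)
    if gaps:
        print(f"{plan.name}: missing {', '.join(gaps)}")
```

Keeping the recovery time and recovery point targets next to each plan means the later tests can be compared against them directly.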
Backup and Recovery Tests
Backup and recovery are two different processes, and both should be tested so that nothing goes wrong when you need to get your data back from storage.
When testing backups, you should create a map of every piece of infrastructure and data that you need to back up. Here are the basic checks that you should perform regularly:
- Check your backup infrastructure. If you have a local backup infrastructure, check the health of your drives using their SMART (Self-Monitoring, Analysis and Reporting Technology) data, and check your NAS (Network Attached Storage) devices. If you back up to the cloud, check that all files are consistent in the storage.
- Check the consistency of your data. Some backup solutions have a feature to check the consistency of your data on the machine and in the storage, in order to ensure data integrity.
- Check that all parts of your infrastructure are covered. You have previously listed all the backup plans you run, but what if you missed something vital? Audit your infrastructure and make sure that everything critical for the company is being backed up.
- Check security settings. Have you enabled data encryption in transit and at rest? Do you need to encrypt filenames? Lastly, who has access to your backup storage? As a rule of thumb, you should apply the principle of least privilege to your access policies.
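If your backup solution has no built-in consistency check, the idea can be approximated by comparing checksums. The following is a minimal sketch that assumes the backup is a plain file copy reachable as a directory; `sha256_of` and `compare_trees` are hypothetical helper names, and backups stored in a proprietary, compressed, or encrypted format would need the vendor's own verification tool instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not fill up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, backup: Path) -> dict[str, list[str]]:
    """Compare every file under `source` with its counterpart under `backup`."""
    report: dict[str, list[str]] = {"missing": [], "mismatched": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = backup / relative
        if not dst_file.is_file():
            # The file exists on the machine but not in the backup copy.
            report["missing"].append(relative.as_posix())
        elif sha256_of(src_file) != sha256_of(dst_file):
            # Both copies exist but their contents differ.
            report["mismatched"].append(relative.as_posix())
    return report
```

Note that a file modified after the last backup run will also show up as mismatched, so a check like this is most meaningful right after a backup completes or against data that does not change.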
Recovery needs its own tests. Here are the most common rules that you should adhere to in order to be sure that you can recover anything, at any time:
- Test in accordance with recovery time and recovery point estimations. These tell you how fast you should recover and how much data you can afford to lose in the event of downtime. The parameters are defined as follows:
  - RTO (Recovery Time Objective) is the maximum acceptable time to recover your IT infrastructure and services following a disaster in order to ensure business continuity.
  - RPO (Recovery Point Objective) is the maximum amount of data that the business can afford to lose during a disaster, usually expressed as a period of time.
- Define the scope of your tests. You should break down your recovery testing from the simplest to the most demanding, and make sure that you regularly test each level: single file recovery; single machine or server recovery (including and excluding the infrastructure); recovery of the interconnected parts of your network and infrastructure; and, lastly, disaster recovery in various scenarios.
- Define the schedule for your tests. You should schedule your tests for two reasons. First, you should test regularly, so that you keep up with all the changes in your infrastructure. Secondly, you should make sure that your tests won’t affect your business operations, which means scheduling them outside of business hours.
- Document everything. Every single part of your tests should be documented, including the schedule, the scope, the exact tests and their results, your RTO and RPO estimation, people authorized to perform the tests, and other team members that you might need to notify regarding the tests.
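To make the RTO and RPO comparison concrete, a documented test result can be checked against its targets with a few lines of code. This is a sketch under stated assumptions: the `RecoveryTestResult` record, its fields, the `evaluate` helper, and the timestamps are all made up for illustration, and the data loss window is simply taken as the gap between the restore point and the simulated moment of failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTestResult:
    """One documented recovery test (field names are illustrative)."""
    scope: str                 # e.g. "single file", "single server"
    started: datetime          # when the restore began
    finished: datetime         # when the service was usable again
    last_backup: datetime      # timestamp of the restore point used
    incident_time: datetime    # simulated moment of failure

    @property
    def recovery_time(self) -> timedelta:
        return self.finished - self.started

    @property
    def data_loss_window(self) -> timedelta:
        return self.incident_time - self.last_backup


def evaluate(result: RecoveryTestResult,
             rto: timedelta, rpo: timedelta) -> dict[str, bool]:
    """Report whether the measured values stay within the objectives."""
    return {
        "rto_met": result.recovery_time <= rto,
        "rpo_met": result.data_loss_window <= rpo,
    }


# Hypothetical test run: the restore took 1.5 hours, and the restore point
# was 2 hours older than the simulated failure.
result = RecoveryTestResult(
    scope="single server",
    started=datetime(2024, 1, 10, 2, 0),
    finished=datetime(2024, 1, 10, 3, 30),
    last_backup=datetime(2024, 1, 9, 23, 0),
    incident_time=datetime(2024, 1, 10, 1, 0),
)
outcome = evaluate(result, rto=timedelta(hours=4), rpo=timedelta(hours=1))
print(outcome)  # {'rto_met': True, 'rpo_met': False}
```

In this made-up run the restore is fast enough, but the backup interval is too long for a one-hour RPO, which is exactly the kind of finding that belongs in the test documentation.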
Backup and recovery tests are not merely dull, routine exercises. Although they may not sound like the most enjoyable activities for an IT professional, they are designed to make sure that you can bring back every piece of your infrastructure or data center in the event of any disaster, human error, or failure. And if you take a look at your complex, interconnected, partly on-prem and partly cloud-based infrastructure and network, you will immediately see how fragile all this complexity is.
Create a great testing environment and make sure that you have covered everything that is vital to your business, thus ensuring you a good night’s sleep.