Every year, I hear dozens of horror stories from customers about server and network outages and the resulting losses of data and productivity. For a brief moment, some network users may find an outage a bit charming, as older colleagues lean back and reflect, “This is the way it was back in the seventies – no Internet, no e-mail, not even a fax machine. Just typewriters, phones, and Uncle Sam’s mail.”
Such nostalgia is invariably short-lived, though. Today, we and (more importantly) our customers expect immediacy of access to information, applications and one another. Even small enterprises are increasingly online, mobile and Web 2.0-driven, to the point where information technology (IT) is no longer just a business tool. It is business – the heart and the circulatory system through which most transactions flow. If your IT systems fail, your daily operations follow – and if the outage lasts too long, your business may fail.
So, in tough economic times as well as in prosperity, SMBs should ask themselves how they can create a high-availability infrastructure that responds robustly to new-age business challenges and disruptions. Server clustering and data mirroring can play an important role in implementing high availability. They can also serve as a cornerstone of an effective business continuity (BC) and disaster recovery (DR) strategy and – good news – they can be very affordable.
Clustering and Mirroring for High Availability
Server clustering can be driven by any of several objectives: creating scalability, load balancing, and of course, increasing system availability. Clustering for high availability allows automated failover between servers in the cluster, providing close monitoring of applications and all their components, including operating systems, server hardware, networking and storage. The clustering software determines when to perform a failover by continually checking each application’s “heartbeat” signal, and if one system has a problem, the application on another server in the cluster takes over. To the outside world, the cluster appears to be a single system, but intelligent redundancy within it creates high availability.
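The heartbeat check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular clustering product works: the node names, the five-second timeout, and the functions `is_alive` and `choose_active` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed value: seconds of silence before a node is presumed to have failed.
HEARTBEAT_TIMEOUT = 5.0


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time of the most recent heartbeat, in seconds


def is_alive(node: Node, now: float) -> bool:
    """A node counts as healthy if its last heartbeat is recent enough."""
    return now - node.last_heartbeat <= HEARTBEAT_TIMEOUT


def choose_active(nodes: list[Node], active: str, now: float) -> str:
    """Keep the active node if it is healthy; otherwise fail over to the first healthy node."""
    by_name = {node.name: node for node in nodes}
    if is_alive(by_name[active], now):
        return active
    for node in nodes:
        if is_alive(node, now):
            return node.name
    raise RuntimeError("no healthy node left in the cluster")


# Hypothetical two-node cluster.
nodes = [Node("server-a", last_heartbeat=100.0), Node("server-b", last_heartbeat=102.0)]
print(choose_active(nodes, active="server-a", now=103.0))  # server-a is within the timeout

# server-b keeps sending heartbeats, server-a goes quiet.
nodes[1].last_heartbeat = 109.0
print(choose_active(nodes, active="server-a", now=110.0))  # server-b takes over
```

Real clustering software also has to move storage, network addresses and application state during a failover. The sketch shows only the decision of when to switch and where to.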
Application availability is only half of the IT requirement. The data that applications create and use must be equally available in order for business to continue. Disk mirroring is the recording of redundant data on two partitions of the same disk or, for true fault-tolerant operation, on two separate disks (only separate disks protect against the failure of a drive).
Mirroring is a central component in the highest level of data protection and disaster recovery, and it differs from ordinary backups, which simply replicate a complete volume at specific points in time, often for use in testing. Mirroring creates dynamic, real-time copies of data volumes, which reduces the amount of data at risk of loss. Mirroring can be implemented with Redundant Array of Independent Disks (RAID) Level 1. RAID can be provided through the motherboard or a controller card, or built into a dedicated disk array.
Benefits and Challenges
Server clustering provides three key benefits:
- High Availability: Designed to avoid a single point of failure.
- Scalability: Computing power can be increased by adding more processors or computers.
- Manageability: Appears as a single-system image with a single point of control.
While clustering provides significant benefits, IT managers must also be cognizant of related challenges. First, a clustered environment can be complicated to manage – especially if your staff is new to this technology. If IT is unable to perform basic checks, such as confirming whether a patch has been applied correctly to all nodes in the cluster, the result could be serious outages. Second, if the SMB is using Service-Oriented Architecture (SOA), where applications work in tandem, it will require solutions that understand the dependencies between them.
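The patch check mentioned above is the kind of routine task worth automating. The sketch below is illustrative only: the `inventory` data, the patch identifiers, and the function `nodes_missing_patch` are hypothetical, and in practice the inventory would come from whatever management tool reports installed patches for each node.

```python
def nodes_missing_patch(installed: dict[str, set[str]], patch_id: str) -> list[str]:
    """Return the names of cluster nodes that do not report the given patch."""
    return sorted(node for node, patches in installed.items() if patch_id not in patches)


# Hypothetical inventory: node name -> patches reported as installed.
inventory = {
    "server-a": {"patch-101", "patch-102"},
    "server-b": {"patch-101"},
}

# Lists every node that still needs patch-102.
print(nodes_missing_patch(inventory, "patch-102"))
```

An empty list means every node reports the patch. Anything else is a prompt to investigate before the mismatch causes a problem during failover.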
The benefits of data mirroring:
- Protects Against Data Loss: Added redundancy offers backup in case of hardware failure.
- Disaster Protection: Offers quick recovery from site- and region-wide incidents when the mirror is kept at a separate location.
- Individual Disk Access: Each disk or set of disks in the mirror can be accessed separately for reading purposes.
Although mirroring is essential to ensuring high availability of data, it’s not a complete data protection solution by itself. Mirroring is ineffective if the data itself is corrupted, because the damage is copied to the mirror as faithfully as any other change. For example, a virus might corrupt or erase data, or a user might accidentally delete data. This is why data protection in the form of regular backups is also necessary for file-level protection.
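A toy model makes the difference between a mirror and a backup concrete. The class `MirroredVolume` below is an assumption made for illustration, with in-memory dictionaries standing in for disks. It is not how a RAID controller is implemented, but it shows the behavior that matters: every change lands on both copies, while a backup is frozen at a point in time.

```python
import copy


class MirroredVolume:
    """Toy mirror: every change is applied to two copies at once."""

    def __init__(self):
        self.disks = [{}, {}]  # two in-memory "disks" holding the same data

    def write(self, name, data):
        for disk in self.disks:
            disk[name] = data

    def delete(self, name):
        # A deletion is mirrored just as faithfully as a write.
        for disk in self.disks:
            disk.pop(name, None)

    def read(self, name, failed_disk=None):
        # Serve the read from the first disk that has not failed.
        for index, disk in enumerate(self.disks):
            if index != failed_disk:
                return disk.get(name)
        return None

    def snapshot(self):
        # Point-in-time backup: an independent copy that later changes do not touch.
        return copy.deepcopy(self.disks[0])


volume = MirroredVolume()
volume.write("invoice.txt", "Q3 totals")
backup = volume.snapshot()

# Hardware failure: the file is still readable from the surviving disk.
print(volume.read("invoice.txt", failed_disk=0))

# Accidental deletion: both mirrored copies lose the file.
volume.delete("invoice.txt")
print(volume.read("invoice.txt"))

# Only the earlier backup still holds the data.
print(backup["invoice.txt"])
```

The mirror covers the failed disk and the backup covers the human mistake, which is why the two are complements rather than alternatives.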
Advice for IT
When SMBs decide to implement clustering and mirroring as part of a healthy high availability solution and BC/DR plan, they should manage the implementation carefully to maximize the benefits. Consider the following:
- It’s all about the bucks: Systems that provide data protection and recovery in an hour, day or week are less expensive than ones that deliver business-critical service, which should experience close to zero downtime. You and your business’s key managers need to look at all of the business functions and processes that are dependent on IT. Then ask, ‘What is the financial impact on each of these services if IT goes down?’
- Always start with the application: A critical first step is determining which applications require 24×7 availability. To help with this task, SMBs can build a dependency tree for each application that must be available. Make a list of everything the application needs in order to work (e.g., switch, server, desktop, etc.).
- RPO, RTO: Determine your business’s recovery point objective (RPO) and recovery time objective (RTO). The RPO, in effect, is the amount of data loss your business can sustain, while the RTO is the amount of time you can afford your systems to be down – the maximum tolerable outage. If a disaster occurs, how much time can your business afford to lose? An hour? A day? A week? This depends on the nature of your particular business and your owners’ or managers’ appetite for business risk, so it’s important that IT alone does not decide what the RPO and RTO are.
- Five nines: Many SMBs strive to achieve five-nines reliability, which means systems are available 99.999 percent of the time. Not all businesses need or can achieve five nines – perhaps four- or three-nines availability is adequate in some cases. The decimal point differences may seem like hair splitting, but they reflect significant differences in the duration or frequency of outages, which your employees and customers will find maddening. Think about it this way – for a business that operates only 40 hours per week (and most operate more hours than that), or about 2,080 hours per year, a system that is 99.999 percent available is unavailable for a little over one minute per year. One that is 99.99 percent available is unavailable for about 12 minutes per year. One that is 99.9 percent available is unavailable for about two hours per year – and of course, management doesn’t get to decide which two. How much does two hours of downtime matter to your business, especially if you can’t pick and choose which two hours you lose? That question demonstrates the Russian roulette of ignoring system availability in your business plan.
- To outsource or not, that is the question: What is the level of service you’ll need? Is there an in-house IT expert who has the bandwidth to manage server clustering and disk mirroring? If not, consider bringing in your solutions provider to do it for you, or even consider hosted services to support your business-critical infrastructure.
- Don’t forget BC/DR: As clustering and mirroring are part of a healthy BC/DR plan, you should test your systems regularly. The frequency with which an organization can test depends on the DR budget, but as a benchmark, SMBs should test no less than twice annually. If it is impossible to test the entire system, periodically test the most critical applications and systems.
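The downtime arithmetic behind the "nines" is easy to reproduce for your own operating hours. The short Python sketch below assumes a 52-week year and the 40-hour week used in the example above. The function name `downtime_minutes_per_year` is made up for illustration.

```python
def downtime_minutes_per_year(availability_percent: float, hours_per_week: float = 40.0) -> float:
    """Expected unavailable minutes per year during operating hours."""
    operating_hours = hours_per_week * 52  # assumes a 52-week year
    unavailable_fraction = 1 - availability_percent / 100
    return operating_hours * unavailable_fraction * 60


for percent in (99.999, 99.99, 99.9):
    minutes = downtime_minutes_per_year(percent)
    print(f"{percent}% available -> {minutes:.1f} minutes of downtime per year")
```

For a 40-hour week this works out to roughly 1.2 minutes, 12.5 minutes and just over two hours. Passing `hours_per_week=168` gives the figures for a round-the-clock operation, where each level of availability allows about four times as much downtime in absolute terms.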
According to Gartner, improving availability will help to reduce direct loss of revenue and loss of future revenue, revenue loss through failure to meet contractual obligations, productivity loss or overtime costs, and damaged reputation. Remember, your system is your business, and your business is your system.