Best Practice: Always 3 Nodes Minimum in a Cluster
Apr 09, 2019
There are three reasons to cluster servers: high availability, load balancing, and high-performance computing (HPC). The most common use case for clustering is high availability. For a long time, efforts have been made to create clusters from just two servers, and though it may seem counterintuitive, a two-node cluster is more complex to set up correctly and manage than a three-node cluster. Those who have experienced the complexities and gotchas of two-node clusters know that a minimum of three servers is the best way to create a cluster that is both reliable and easy to manage.
Performance and Cost-Effectiveness
Consider the resources required to stand up a two-node cluster. You may decide to split the workload between the two nodes of the cluster (called active/active clustering), but since either node could fail, the other node has to be able to take on all of the cluster workloads (VMs and data) at any time. That means each node needs enough resources to run all of the combined workloads plus some overhead for maintaining healthy operation and allowing additional growth. This means all of the CPU, RAM, and storage required to run everything.
In a two-node cluster, your actual compute resource usage must always stay below 50% of the available cluster resources (realistically below 45%, so each node keeps at least 10% headroom). Compare that with a three-node cluster, where you can use up to 67% of resources (or more in some cases) and still absorb a full node failure. If a node fails in a three-node cluster, only one-third of the cluster resources are gone, and when the workloads from the lost node fail over, the two remaining nodes can split them and share the resource burden far more easily than a single remaining node could.
When a node fails in a three-node cluster, you are left with two nodes, just like a two-node cluster. However, the likelihood of a second node failing before you restore the first is so small that you don't have to account for it in resource allocation; you only ever need to plan for one of the three nodes failing at a time. Spreading resources across three nodes rather than two means no single node is ever expected to run the entire cluster workload, so the server specs don't need to be as high to maintain acceptable or equivalent performance, saving you on the cost of the servers purchased.
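The capacity arithmetic above can be sketched in a few lines of Python. This is an illustrative helper, not taken from any particular cluster product:

```python
# Maximum fraction of total cluster resources you can safely use
# while still absorbing a single node failure: the surviving
# (nodes - 1) nodes must carry everything, so the ceiling is
# (nodes - 1) / nodes of the cluster total.

def max_safe_utilization(nodes: int) -> float:
    if nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    return (nodes - 1) / nodes

print(round(max_safe_utilization(2), 2))  # 0.5  -> stay under 50%
print(round(max_safe_utilization(3), 2))  # 0.67 -> roughly 67%
```

The same formula shows why larger clusters keep getting more efficient: a four-node cluster can run at up to 75% utilization and still survive a node loss.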
One other performance factor to consider is the cost of rebuilding a failed node, which is particularly acute in a two-node cluster. When one of the two nodes fails, the surviving node not only has to run all of the cluster workloads until its partner is restored, but it must also rebuild that partner when it comes back online, resending data until it is a fully redundant member again. This puts an additional performance load on the one active node, which is still running every cluster workload until the other node is fully recovered. In a three-node cluster, the repair process is less dramatic: the remaining two nodes continue acting as a cluster even while the third node is down, the storage can still maintain redundancy, and when the third node is restored, far less intense rebuilding is needed to rebalance data and workloads across the cluster.
Clusters can also provide the ability to apply updates to individual nodes without taking cluster workloads offline by first moving those workloads to another node of the cluster. This is called a rolling update since workloads are rolling from node to node as updates are taking place. In a two-node cluster this process is always limited to moving all workloads to a single node when updating.
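A rolling update can be sketched as a simple loop. The `drain`, `update`, and `restore` helpers below are hypothetical placeholders for whatever live-migration and patching mechanism a given cluster provides:

```python
# Minimal rolling-update sketch: empty one node at a time, patch it,
# then move its workloads back before touching the next node.

def rolling_update(nodes, drain, update, restore):
    for node in nodes:
        moved = drain(node)       # live-migrate workloads to peers
        update(node)              # patch/reboot the now-empty node
        restore(node, moved)      # bring workloads back home

# Tiny simulation showing the order of operations on three nodes.
events = []
rolling_update(
    ["node1", "node2", "node3"],
    drain=lambda n: events.append(f"drain {n}") or [f"{n}-vms"],
    update=lambda n: events.append(f"update {n}"),
    restore=lambda n, w: events.append(f"restore {w[0]} to {n}"),
)
print(events[:3])  # ['drain node1', 'update node1', 'restore node1-vms to node1']
```

The loop makes the two-node limitation obvious: with only two nodes, "the peers" is always a single machine that must absorb everything during every update.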
In a true failure/failover situation, which is less common than updates, you are more likely to live with downgraded performance as forcing a single node to run every workload may stress its resource limitations. Are you also willing to live with degraded performance each time you want to update the cluster with new software/firmware? Probably not. So even beyond the need to spec out each node big enough to run every workload in a failover scenario, you have to consider how big the specs need to be to comfortably run all the workloads every time you need to apply updates. The bigger the spec, the higher the cost.
In a three-node cluster, because you have two other nodes to split the workloads during failover or updates, you can provide reasonable performance when a node is offline for maintenance at a lower spec and a lower cost.
Based on what has already been stated on the cost-effectiveness, you may be asking, “What if I use a shared storage appliance like a SAN or NAS?” Well, there was once a time where this was the only way you could create a high availability cluster, but times have changed. If you use a shared storage appliance, you are just moving the costs onto a separate piece of hardware. Is your storage appliance also clustered for high availability or is it now a single point of failure for the cluster?
The storage should also be clustered for high availability, and then you face the same two-node cluster considerations. The difference with storage is that you always want a replication factor of at least 2, which means your usable storage will always be less than 50% of raw storage no matter how many nodes you have. You don't gain better storage efficiency with three nodes vs. two, but you do gain better high availability.
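The replication math is easy to sketch (illustrative numbers, not a vendor formula):

```python
# With mirror-style replication, every block is written rf times,
# so usable capacity is raw capacity divided by rf -- under 50%
# of raw whenever rf >= 2, regardless of how many nodes you have.

def usable_capacity_tb(raw_tb_per_node: float, nodes: int, rf: int = 2) -> float:
    return raw_tb_per_node * nodes / rf

print(usable_capacity_tb(10, 2))  # 10.0 TB usable from 20 TB raw
print(usable_capacity_tb(10, 3))  # 15.0 TB usable from 30 TB raw
```

Adding a third node raises total usable capacity but not the usable-to-raw ratio, which stays pinned at 1/rf.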
Shared storage appliances for clustering are an antiquated idea in the modern world of software-defined storage. By using software-defined storage and storage resources on the servers you are clustering, it is much easier to spread the storage across three or more nodes in a cluster than trying to cluster multiple storage appliances separately. It is much more cost-effective as well.
Failover Negotiation and Split-Brain
In a two-node cluster, it is more difficult for the cluster logic to determine what to do if there are communication (network) issues rather than a node failure. If the cluster nodes lose communication with each other, how does a node know whether or not to failover the workloads from the node it cannot communicate with? Typically, this is handled through a cluster witness of some kind. A cluster witness (or multiple witnesses in some cases) is a third point of contact that, in theory, is still contactable by one or both of the nodes and it can arbitrate the cluster status. The witness must live outside the cluster so it becomes one more object to manage in your network in addition to the cluster.
As noted, this works "in theory," but reality is more complicated. Unlike a true third node, a cluster witness is not a fully active member of the cluster, and its assessment of the state of the cluster can itself be hampered by communication issues. A bad witness implementation can put the cluster into the dreaded split-brain scenario, where both nodes begin running all workloads; once this happens, it is a nightmare to recover from. Correctly implementing a good witness/arbitration system for a two-node cluster is complex, and this article by Andrew Beekhof on clusterlabs.org explains these complexities in more detail if you want to dive deeper.
Having a minimum of three nodes ensures that a cluster always has a quorum of nodes to maintain a healthy active cluster. With two nodes, a quorum doesn't exist. Without it, it is impossible to reliably determine a course of action that both maximizes availability and prevents data corruption. Nothing is infallible, of course, and even a three-node cluster can be taken offline by network issues and loss of quorum. If that were to happen, however, there are likely bigger problems occurring than just the cluster going offline, and the probability of getting into a split-brain scenario with a three-node cluster is practically zero.
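The quorum rule itself is simple majority voting. A minimal sketch, not any particular cluster stack's implementation:

```python
# A partition may stay active only if it can see a strict majority
# of the cluster's nodes. In a two-node cluster a lone node sees
# 1 of 2 -- no majority -- which is why a witness or a third node
# is needed to break the tie.

def has_quorum(visible: int, total: int) -> bool:
    return visible > total // 2

print(has_quorum(1, 2))  # False: neither half of a 2-node split wins
print(has_quorum(2, 3))  # True: two of three nodes keep running
print(has_quorum(1, 3))  # False: the isolated node stops cleanly
```

With three nodes, any network split leaves exactly one side holding a majority, so the cluster can always pick a single survivor without risking both sides running the same workloads.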
It is really time to leave the concept of a two-node cluster behind and embrace the best practice of a three-node minimum. I have been in the business of creating high availability solutions for over 20 years and have seen the complexities and gotchas of two-node clusters up close and intimately over that time. Do yourself a favor and always use a minimum of three nodes, and you will sleep better at night knowing your workloads and data are running on a reliable and easy-to-manage cluster.