Although server farm virtualization is far from a new concept, the introduction of a Virtual Machine Monitor for Intel Architecture servers, first by VMware Inc. and subsequently by other vendors, has made it a very hot topic in the last few years. Many firms have only just started to introduce virtualization, and some have not started at all, but many others are already coping with a new set of issues specific to virtualization. In this article we explore two related issues that some people believe will play a significant role in the near future: VM stall and VM sprawl.
IT departments have often decided to start the virtualization process for their server farm by focusing on servers belonging to one of the following categories:
- Development servers.
- Test servers.
- Servers with low utilization rate.
- Old servers no longer supported by the HW vendor but whose application stack is still required.
- Servers running unsupported operating systems (e.g., Windows NT 4.0).
This choice rested on several sound reasons, such as the following:
Mitigation of the risk of unsatisfactory performance. Excessive latency due to the overhead of the virtualization layer may produce unexpected performance problems whose business impact may be acceptable for development and test systems but unacceptable for production systems. Moreover, generally speaking, the lower the current utilization, the lower the risk of performance problems after virtualization.
High HW maintenance savings. The financial savings due to the reduced number of physical servers grow with the consolidation ratio, which makes servers with low utilization ideal candidates for virtualization.
Mitigation of the support risk. Initially, several SW vendors were reluctant to support their SW in a virtualized setting (and some still are). Several IT departments decided to ignore this reluctance for development and test servers, but they definitely could not do the same for production systems.
Over time, Virtual Machine Monitor scalability has improved and several of these risks now play a minor role. As a consequence, it is quite natural to wonder whether the scope of the virtualized server farm can be expanded further. Some IT departments are reluctant to go down this path, a fact that has been described elsewhere as VM stall and classified as an important issue that needs to be tackled. We must first ask, however, whether this really is an issue. Firms do not introduce, or at least should not introduce, virtualization for its own sake, but only because it delivers significant benefits with minimal risk. Otherwise, we are no longer in the realm of serious business investment.
Let us consider, for instance, the decision of whether or not to virtualize production servers. Would the benefits offset the risks? Clearly the answer depends on many factors that need to be carefully examined. If you virtualize 100 lightly utilized uniprocessor servers, you should expect significant HW maintenance savings; if you virtualize two highly utilized 8-core servers, you should not, and you should also expect to run the risk of excessive latency, particularly if the workload is I/O intensive.
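The contrast between the two cases can be made concrete with a rough back-of-the-envelope estimate. The sketch below is only an illustration under stated assumptions: the host size (16 cores), the target host utilization (60%), and the average utilization figures (10% and 80%) are hypothetical values chosen for the example, and the model looks at CPU demand only, ignoring memory, I/O, and the overhead of the virtualization layer.

```python
import math


def hosts_needed(n_servers, cores_per_server, avg_utilization,
                 host_cores=16, target_utilization=0.6):
    """Estimate how many virtualization hosts the CPU demand requires.

    host_cores and target_utilization are assumed values, not figures
    from any vendor sizing guide.
    """
    # Aggregate demand, expressed in "busy cores".
    demand = n_servers * cores_per_server * avg_utilization
    # Cores per host we are willing to keep busy.
    usable = host_cores * target_utilization
    return max(1, math.ceil(demand / usable))


def consolidation_ratio(n_servers, hosts):
    """Physical servers replaced per virtualization host."""
    return n_servers / hosts


# Case 1: 100 lightly utilized uniprocessor servers (assumed 10% busy).
small = hosts_needed(100, 1, 0.10)
# Case 2: two highly utilized 8-core servers (assumed 80% busy).
large = hosts_needed(2, 8, 0.80)

print(small, consolidation_ratio(100, small))
print(large, consolidation_ratio(2, large))
```

Under these assumptions the first case fits on two hosts, while the second still needs two hosts, so no physical server is saved and only the non-consolidation benefits remain.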
We all know that virtualization has benefits that can make it attractive to virtualize a server even when no consolidation benefit accrues. If you put a single VM on top of a Virtual Machine Monitor you gain no consolidation benefit, but you can still move your VM to another server to perform HW maintenance without having to rely on complex technologies such as High Availability clustering.
In conclusion, an accurate analysis of business benefits, costs and risks is increasingly important in assessing whether further expansion of the virtualized server farm is worthwhile.
While some people are worried that server farm virtualization is not progressing at a faster pace, others are worried for the opposite reason: virtualization makes it much easier to deploy new operating system instances (a phenomenon described by some as VM sprawl). In the past, instantiating a new OS instance first required going through the long process of procuring a new HW box on which to install it. With Virtual Machine Monitors everything is far faster: with just a few clicks you can have your own OS instance up and running.
Well, that may look wonderful, but it actually paves the way to a whole new set of issues: the new OS instance has to be maintained, SW licenses have to be paid for, and so on. Was that OS instance really required? And if so, who will take care of discarding it when it is no longer required? In short, server farm virtualization has freed us from the issue of server sprawl but has replaced it with the new issue of OS instance sprawl.
This phenomenon was expected, and SW vendors already offer products that help bring discipline to the VM instantiation process so that the risk of VM sprawl is mitigated. Preventing VM sprawl by creating a strong control process unfortunately has the side effect of reducing the flexibility introduced with virtualization. An alternative approach is to adopt a weaker control process and regularly use tools to check whether VMs are actually being used. We have decided to follow this strategy by adding new functions to our virtual server monitoring tool (WASFO Data Collector). If a VM has been little utilized for a long time, or even worse has been powered off for a long time, it may well be that nobody actually needs it any more. We can automatically collect information such as VM uptime and utilization, which provides good hints as to whether the VM should be discarded.
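The weaker control process described above amounts to a periodic scan of collected VM statistics. The following is a minimal sketch of such a check, not a description of how WASFO Data Collector works: the `VmStats` record, its field names, and the thresholds (90 days, 2% average CPU) are all hypothetical and would need to be adapted to whatever data your monitoring tool actually exports.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    """Hypothetical per-VM summary built from collected monitoring data."""
    name: str
    powered_on: bool
    days_in_state: int   # days since the VM was last powered on or off
    avg_cpu_pct: float   # average CPU utilization over the period (ignored when off)


def sprawl_candidates(vms, max_idle_cpu_pct=2.0, min_days=90):
    """Return (name, reason) pairs for VMs that may no longer be needed.

    The thresholds are assumptions for illustration; a flagged VM is only
    a hint and should be confirmed with its owner before being discarded.
    """
    candidates = []
    for vm in vms:
        if vm.days_in_state < min_days:
            continue  # not enough history to draw a conclusion
        if not vm.powered_on:
            candidates.append((vm.name, "powered off for a long time"))
        elif vm.avg_cpu_pct <= max_idle_cpu_pct:
            candidates.append((vm.name, "little utilized for a long time"))
    return candidates


inventory = [
    VmStats("build-01", True, 200, 35.0),
    VmStats("demo-old", False, 180, 0.0),
    VmStats("test-tmp", True, 120, 0.8),
    VmStats("new-vm", True, 5, 0.5),
]
for name, reason in sprawl_candidates(inventory):
    print(f"{name}: {reason}")
```

Reporting a reason alongside each name keeps the final decision with a person, which preserves the flexibility that a strict approval process would take away.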
VM stall and VM sprawl are phenomena that, although they have opposite consequences, can coexist in a datacenter. A firm may experience VM sprawl in its currently virtualized server farm (i.e., the creation of useless VMs) and at the same time be reluctant to expand the scope of that server farm by virtualizing other servers.