
What’s Lurking in Your Virtualized Data Center

By Krishna Raj Raja, CloudPhysics

By Krishna Raj Raja, member of founding team, CloudPhysics

Datacenters are fraught with operational hazards, and because virtual infrastructure is extremely complex and dynamic, many vulnerabilities go undetected and can be onerous to find. On the other hand, known hazards are often ignored because administrators underestimate or don’t understand their scope, severity and risk.

For example, we recently spoke with a well-known company that told us they brought down a large portion of their datacenter during an upgrade because of incompatibilities between their PCIe cards and ESX. A seemingly trivial issue resulted in hours of costly disruption to their business.

In the ‘spirit’ of Halloween, we created an infographic about what may be lurking in your datacenter. Before I get into that, I also want to point out some tricks and a treat from CloudPhysics to help you get rid of your spooks:

  • Tricks: I’ve written a Halloween Cookbook for CloudPhysics users that compiles tips and tricks for using our analytics to roust hidden goblins from your environment.
  • Treat: Halloween Special: Let us find your spooks for you! Our data scientists will produce an Insights Report highlighting where hazards lurk in your datacenter. It’s free and only takes about 15 minutes of your time. Our analytics will do the rest. You can request this free report.

Here’s the infographic, followed by an explanation of what we’ve found. The source for these stats is the CloudPhysics global data set, which has more than 50 trillion points of machine metadata from datacenters of all shapes and sizes around the world.

[Infographic by CloudPhysics]

Availability: Application Downtime
Applications are the lifeblood of an organization: when an application slows down or goes down, productivity – and often profitability – is severely compromised. Most organizations with a virtualized datacenter rely on hypervisor HA features to keep applications running. That makes it all the more surprising to find that 41% of VMware HA clusters do not have admission control enabled. Without admission control, HA cannot guarantee that all the VMs in the cluster will be successfully powered on in the event of a host failure.
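To illustrate why admission control matters, here is a minimal sketch of the underlying capacity question: do the VMs still fit if the largest host fails? This is a deliberately simplified model and not VMware's actual admission control algorithm. The function name, the memory-only accounting, and the sample numbers are all assumptions made for illustration, and the check ignores per-host placement.

```python
def survives_host_failure(host_capacity_gb, vm_reserved_gb, failures_to_tolerate=1):
    """Return True if total VM reservations fit on the hosts left after failures.

    Simplified, hypothetical model: memory only, worst case (largest hosts fail),
    and no per-host placement or fragmentation is considered.
    """
    keep = max(0, len(host_capacity_gb) - failures_to_tolerate)
    # Sorting ascending and keeping the first `keep` hosts drops the largest ones.
    surviving = sorted(host_capacity_gb)[:keep]
    return sum(vm_reserved_gb) <= sum(surviving)


# Hypothetical three-host cluster and five VMs (values in GB of memory).
hosts = [256, 256, 128]
vms = [64, 64, 96, 96, 48]
print(survives_host_failure(hosts, vms))
```

A cluster that fails this check can still run normally today, which is exactly why the gap tends to go unnoticed until a host actually fails.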

Similarly, running out of disk space can wreak havoc and cause application downtime. Our analysis shows that every week 43% of organizations risk application downtime due to a ‘disk full’ condition in the guest.
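As a rough sketch of how a ‘disk full’ risk can be flagged from inside a guest, the snippet below uses Python's standard `shutil.disk_usage` call. The 90% threshold and the function names are assumptions chosen for illustration; they are not the criteria behind the 43% figure above.

```python
import shutil


def disk_fill_ratio(path="/"):
    """Return the fraction of the filesystem at `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # Avoid dividing by zero on unusual filesystems.
    return usage.used / usage.total


def is_at_risk(fill_ratio, threshold=0.90):
    """Flag a volume whose fill ratio meets or exceeds the (assumed) threshold."""
    return fill_ratio >= threshold


ratio = disk_fill_ratio(".")
print(f"{ratio:.0%} used, at risk: {is_at_risk(ratio)}")
```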

Software bugs are another source of downtime. Vendors often discover and document these issues in knowledge base articles. Every month VMware and other vendors release or update about 200 knowledge base articles, and roughly 10 of these cover critical data loss or server outage issues. The sheer number of knowledge base articles can overwhelm IT teams that already have a backlog of critical tasks and projects. As a result, scores of issues are simply ignored for lack of time and become operational time bombs waiting to go off.

Another big problem for virtual datacenters is contention for storage resources, which causes application latency and unresponsiveness. Bully VMs – those that consume more than their fair share of shared resources – are hard to detect and hazardous. On average, each bully victimizes 5 other VMs, starving them of the resources they need to run properly. Contention is one of the toughest problems to pinpoint and troubleshoot because of the way virtualization scrambles storage I/O.
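One simple way to think about "more than their fair share" is to compare each VM's I/O rate with an equal split of the datastore's total. The sketch below does just that. The function name, the input shape, and the 3x multiplier are assumptions for illustration and are not how CloudPhysics analytics identify bullies.

```python
def find_bullies(iops_by_vm, factor=3.0):
    """Return the names of VMs whose IOPS exceed `factor` times the equal share.

    `iops_by_vm` is a hypothetical mapping of VM name to observed IOPS
    on one shared datastore.
    """
    total = sum(iops_by_vm.values())
    if total == 0:
        return []
    fair_share = total / len(iops_by_vm)
    return sorted(vm for vm, iops in iops_by_vm.items() if iops > factor * fair_share)


# Hypothetical datastore with six VMs.
observed = {"db01": 9000, "web01": 300, "web02": 250,
            "app01": 200, "app02": 150, "log01": 100}
print(find_bullies(observed))
```

A static snapshot like this only finds heavy consumers. Showing that they actually hurt their neighbors would also require correlating the spike with latency on the other VMs.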

Utilization: Dead Space and Zombie VMs
Storage is the biggest cost in the datacenter, and storage growth threatens to take on a life of its own. Today, lots of folks take comfort in using thin provisioning (either at the virtualization layer or at the storage array), thinking that it will reduce storage usage. Yet we found that 26% of the disk space used by VMs is dead space. What exactly is dead space? It is space that was previously allocated but whose contents have since been deleted, so it is no longer used. Dead space exists at the virtualization layer as well as in the guest OSs. Newer versions of ESX and some newer storage arrays can manage dead space at the virtualization layer, but the dead space in the guest OS is invisible both to the virtualization layer and to the storage array. Why should you care? Because you can easily reclaim this space.
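The arithmetic behind a dead-space estimate can be sketched as the gap between what a thin disk has allocated on the datastore and what the guest filesystem reports as in use. This is an approximation under stated assumptions: the function and parameter names are hypothetical, and filesystem metadata overhead is ignored.

```python
def dead_space_gb(allocated_gb, guest_used_gb):
    """Estimate dead space: blocks allocated to the disk but no longer used in the guest."""
    return max(0.0, allocated_gb - guest_used_gb)


def dead_space_fraction(allocated_gb, guest_used_gb):
    """Return dead space as a fraction of the allocated space."""
    if allocated_gb <= 0:
        return 0.0
    return dead_space_gb(allocated_gb, guest_used_gb) / allocated_gb


# Hypothetical VM: 100 GB allocated on the datastore, 74 GB in use in the guest.
print(dead_space_gb(100, 74), dead_space_fraction(100, 74))
```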

Zombie VMs are another source of wasted space: 16% of VMs are powered off or suspended and never used again. They are the living dead in your datacenter. Why living dead? Because these VMs are not active and do not consume CPU or memory resources, yet they still occupy valuable disk space on expensive storage arrays.
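A zombie hunt can be sketched as a filter over a VM inventory: anything powered off or suspended that has not been powered on for a long time is a candidate. The record fields and the 90-day idle window below are assumptions for illustration. The power state strings follow the vSphere naming (`poweredOff`, `suspended`), but the inventory itself is hypothetical.

```python
from datetime import date, timedelta


def find_zombies(vms, today, idle_days=90):
    """Return names of inactive VMs not powered on within the last `idle_days` days."""
    cutoff = today - timedelta(days=idle_days)
    return [
        vm["name"]
        for vm in vms
        if vm["power_state"] in ("poweredOff", "suspended")
        and vm["last_powered_on"] < cutoff
    ]


# Hypothetical inventory records.
inventory = [
    {"name": "test-old", "power_state": "poweredOff", "last_powered_on": date(2013, 11, 2)},
    {"name": "web01", "power_state": "poweredOn", "last_powered_on": date(2014, 10, 1)},
    {"name": "build07", "power_state": "suspended", "last_powered_on": date(2014, 10, 20)},
]
print(find_zombies(inventory, today=date(2014, 10, 31)))
```

Candidates from a filter like this still deserve a quick check with their owners before deletion, since some powered-off VMs are kept deliberately as templates or archives.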

Vulnerabilities: Heartbleed and Shellshock
Organizations are constantly at risk from security vulnerabilities such as the well-known SSL Heartbleed and the more recent Shellshock security bugs. Major vendors such as VMware quickly released patches for both issues, but patch adoption is surprisingly slow. Take, for instance, the SSL Heartbleed issue, which VMware patched in April. We examined our global dataset in July and discovered that 50% of vulnerable ESX hosts were unpatched. After communicating this finding to our users and releasing a method for them to determine their vulnerability, we ran the same analysis this week and found that 22% of ESX hosts remain unpatched. While that is a substantial improvement, many hosts remain vulnerable.

Shellshock is more recent. This issue affects all Linux VMs, including virtual appliances, as well as older ESX hosts (version 4.1 and below). In our global dataset, we found that 27% of all VMs run Linux and are therefore exposed to Shellshock. We also found that 7% of ESX hosts are still running ESX 4.1 classic and below, which means they are also exposed.
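The coarse exposure rule described above (Linux guests, plus ESX hosts at version 4.1 and below) can be sketched as two small checks. The function names and input formats are assumptions for illustration. Note that this only mirrors the rule of thumb in this article; whether a given Linux system is actually vulnerable depends on the version of bash installed on it.

```python
def host_is_exposed(esx_version):
    """Return True for ESX versions 4.1 and below, e.g. '4.1.0' or '4.0'."""
    # Pad with "0" so a bare major version such as "4" still parses.
    parts = (esx_version.split(".") + ["0"])[:2]
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) <= (4, 1)


def vm_is_exposed(guest_family):
    """Return True for Linux guests, following the article's coarse rule."""
    return guest_family.strip().lower() == "linux"


# Hypothetical inventory values.
print(host_is_exposed("4.1.0"), host_is_exposed("5.5.0"))
print(vm_is_exposed("Linux"), vm_is_exposed("Windows"))
```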

End of Life: Unsupported Software
Running unsupported software is inherently risky. One commonly used OS, Windows 2003, heads to the graveyard when it reaches end of support next year. Our analysis found that Windows 2003 accounts for 25% of the total Windows VMs running in the datacenter. Further, 5.4% of the Windows VMs run Windows XP, which already reached end of support this year. In addition, over 6.2% of ESX hosts are running ESX version 4.1 and below, versions that have already reached the end of VMware support.
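A check for unsupported guests can be sketched as a lookup against a table of end-of-support dates. The function name and the sample inventory are assumptions for illustration. The two dates are Microsoft's published end of extended support dates for Windows XP and Windows Server 2003; verify them against the vendor's lifecycle pages before relying on them.

```python
from datetime import date

# End of extended support dates as published by Microsoft (verify before use).
END_OF_SUPPORT = {
    "Windows XP": date(2014, 4, 8),
    "Windows Server 2003": date(2015, 7, 14),
}


def unsupported_guests(guest_os_by_vm, today):
    """Return names of VMs whose guest OS is past its end-of-support date."""
    return sorted(
        vm
        for vm, guest_os in guest_os_by_vm.items()
        if guest_os in END_OF_SUPPORT and END_OF_SUPPORT[guest_os] <= today
    )


# Hypothetical inventory of VM name to guest OS.
inventory = {"kiosk01": "Windows XP", "files02": "Windows Server 2003", "web01": "Linux"}
print(unsupported_guests(inventory, today=date(2014, 10, 31)))
```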

Summary
These are just a few of the issues that could be haunting your datacenter, and the thought of trying to find all these creepy crawlies may seem downright frightening.

But don’t be scared, be prepared. Call in the experts at CloudPhysics to show you how our data-driven insights can help you quickly and easily find and exorcise operational hazards in your virtual datacenter.
