In 2005, twenty servers running a critical application at the busiest hospital in Illinois were consolidated into one physical server. Instead of delivering the benefits of consolidation, the project ended in disaster. (The hospital's name will go unmentioned, but you’ll find it out if you read on.)
Hospital management anticipated the usual benefits that virtualization brings:
- Easier administration. Caring for one server is easier than caring for 20.
- Greater confidence in the IT infrastructure. The storage that accompanies virtualization is likely to be more reliable than the distributed storage of standalone servers. This reliability is a product of newer technology and a more efficient design.
- Peace of mind. Virtualized storage fits well with the business continuity features of a virtual environment. VMware’s VMotion, for instance, lets an administrator migrate running virtual machines to backup servers in real time.
How did this happen?
After the virtual environment was created, the IT staff added standard security controls to each new virtual server. That was fine; it is standard procedure. However, some of those virtual servers lay dormant. In fact, it appears that nearly a dozen servers were created for “testing” purposes and were never removed after they had served their purpose. (I actually think that most of them were created for the novelty of it. How else do you account for servers named “Tyrone” or “Michael Jordan”?) During the months that these servers lay dormant, Microsoft and the application vendor issued patches. When the dormant servers were reactivated, they were not updated with those patches. The servers thus turned into potholes or, worse, security vulnerabilities waiting to be exploited. It didn’t take long for that to happen, and the hospital lost data.
We were brought in to sort out the mess.
LESSONS LEARNED
What did we take away from this incident?
First, virtual servers must be managed individually, from their creation to their removal.
Second, management of these servers consists of staying abreast of patches, installing them as needed and meticulously documenting the patches that were installed. These steps have to be done for the virtual environment, the guest operating system and the application. They are especially crucial because of staff turnover.
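The patch bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used at the hospital: the `VirtualServer` record, the `audit` function, the 30-day dormancy threshold and the patch identifiers such as `PATCH-001` are all hypothetical, and a real inventory would come from the virtualization platform and the vendors' patch bulletins.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class VirtualServer:
    """Hypothetical inventory record for one virtual server."""
    name: str
    last_active: date
    installed_patches: set[str] = field(default_factory=set)


def audit(servers, required_patches, today, dormant_after_days=30):
    """Return the servers that need attention.

    A server is reported if it is missing any required patch or if it
    has been inactive for more than `dormant_after_days` days.
    """
    report = {}
    for server in servers:
        missing = sorted(set(required_patches) - server.installed_patches)
        dormant = (today - server.last_active).days > dormant_after_days
        if missing or dormant:
            report[server.name] = {"missing": missing, "dormant": dormant}
    return report


if __name__ == "__main__":
    # Made-up inventory: one forgotten test server, one maintained server.
    inventory = [
        VirtualServer("Tyrone", date(2005, 1, 10), {"PATCH-001"}),
        VirtualServer("app01", date(2005, 6, 28), {"PATCH-001", "PATCH-002"}),
    ]
    required = {"PATCH-001", "PATCH-002"}
    for name, finding in audit(inventory, required, date(2005, 7, 1)).items():
        print(name, finding)
```

Running a check like this before a dormant server is reactivated, and keeping its output as part of the patch documentation, gives a new staff member a record of what was installed where.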
Finally, management of the virtualized data center should be in capable hands. The integrator may have configured the virtual environment properly when it was created. However, we all know that things change over time, and someone has to take ownership of staying abreast of these changes. In the hospital’s situation, the virtual environment unraveled in steps. Picture the sequence: (1) a new appliance was installed, (2) a new server was created, (3) a new application was implemented, and (4) Microsoft issued more security patches. All of these events most likely took place. Failing to update the relevant pieces, or updating them incorrectly, would have caused problems. Note that there are two hurdles: (1) identifying the pieces that need to be updated and (2) doing the updates correctly. In the end, we discovered two network links that were dead ends. We think these links had prevented two or more virtual servers from communicating.
While that was a technical AHA!, the bigger picture shows the consequences of a thoughtless decision. The hospital had stopped paying maintenance fees to the integrator and attempted to maintain the environment on its own. This was unwise, since the IT staff lacked trained personnel. The VLAN’s configuration developed potholes and compromised security. This is how a combination of thoughtless decision-making and sloppy housekeeping nearly hijacked a hospital’s JCAHO accreditation and risked punitive action from CMS. (This incident was a major reason for that risk. During that period, the hospital was cited for numerous violations.)