Persistent vs. Non-Persistent Workloads: the Admin's Conundrum
Exploring options in the bare-metal world.
Virtualization isn't the answer to every problem in IT. There are plenty of workloads where neither containers nor hypervisors are the answer. The old problems remain: provisioning bare-metal workloads easily, utilizing them efficiently and making them highly available. So what options exist for the modern sysadmin?
Solutions to this problem can be largely divided into two groups: persistent and non-persistent workloads. For all intents and purposes, these break down into OpEx and CapEx problems, respectively.
The simplest method of making a workload non-persistent(ish) is to disconnect the storage from the compute. Delivering Fibre Channel or iSCSI LUNs from centralized storage has been standard practice for ages. If the server dies, who cares? Attach the LUNs to another server and off you go.
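As a rough sketch of what that LUN re-attachment looks like in practice, here is the Open-iSCSI workflow on a replacement Linux server. The portal address, target IQN, device name and mount point are all illustrative placeholders, not real infrastructure:

```shell
# Discover targets exported by the central storage array
# (portal IP is a hypothetical example).
iscsiadm -m discovery -t sendtargets -p 192.168.10.20

# Log in to the target that holds the workload's LUN
# (IQN is a hypothetical example).
iscsiadm -m node -T iqn.2001-04.com.example:storage.lun1 \
         -p 192.168.10.20 --login

# The LUN now appears as a local block device (e.g. /dev/sdb).
# Mount it and the workload's data is back, on new hardware.
mount /dev/sdb1 /srv/workload
```

The point is that nothing about the workload's data is tied to the dead server; the replacement simply logs in to the same target and carries on.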
Blade chassis take this a step further by allowing datacenter administrators to automate the process, even assigning some blades in each chassis to be hot spares, ready to take over in the event one of the primary blades dies.
This is all well and good for a while, but eventually hardware ages and is replaced. Frequently there's a transition period where multiple generations are in use at the same time. Bare-metal operating systems don't tend to cope well when you run them on radically different hardware.
Any operating system can be made non-persistent. People who do Virtual Desktop Infrastructure (VDI) for a living can explain in detail the number of possible ways this can occur, and the many, many hurdles to making it happen.
In a perfect world, the operating system would live on one disk, the application on another, and the settings and data on a third. Changes could be made to each without affecting the others.
In this manner, application updates and operating system updates could be performed independently of one another by using a "golden master" approach: the operating system and application disks would be read-only copies of a centrally curated master. The master is tested and updated; workloads are simply restarted in order to benefit.
Another benefit of this is that data and settings can simply be moved from one operating system/application combo to another, allowing the bits that matter – data and settings – to travel between entirely different hardware configurations.
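The three-disk split described above could be sketched as an /etc/fstab fragment; the volume labels and mount points here are hypothetical assumptions, not a prescribed layout:

```
# Illustrative /etc/fstab for an OS/application/data split.
# The OS and application volumes are read-only clones of the
# golden masters; only the data volume is writable.
LABEL=os-golden    /          ext4  ro,defaults          0 1
LABEL=app-golden   /opt/app   ext4  ro,defaults          0 2
LABEL=data         /srv/data  ext4  rw,defaults,noatime  0 2
```

Swapping in a new OS or application generation then amounts to pointing the first two mounts at new read-only clones while the data volume travels along untouched.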
Unfortunately, this rarely works, in large part because developers are lazy. Containers are one attempt to solve this problem, but they come with their own problems, again because developers are lazy. Building truly portable applications in this manner takes care and attention, especially in the Windows world.
Linux applications are frequently quite different. I have built render farms with thousands of diskless nodes. These nodes pull an operating system image from a PXE server on boot, load the render application and its configuration from the central server, then set about rendering whatever's in the queue, spitting the finished data into a folder.
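A diskless node setup like that render farm is typically driven by a PXELINUX configuration on the boot server. The following is a minimal sketch; the kernel and initrd names, NFS server address and export path are all assumptions for illustration:

```
# Illustrative pxelinux.cfg/default for diskless render nodes.
# Each node DHCPs, pulls this config, then boots a read-only
# OS image over NFS; nothing persists on the node itself.
DEFAULT render
LABEL render
  KERNEL vmlinuz-render
  APPEND initrd=initrd-render.img root=/dev/nfs \
         nfsroot=192.168.1.10:/exports/render-image ro ip=dhcp
```

Because the root filesystem is mounted read-only from the central export, updating the image on the server and rebooting the nodes is the entire upgrade procedure.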
Today's VDI software will help do this for Windows in a virtualized environment, but it isn't strictly necessary. The infamous Hiren's Boot CD has been making entirely non-persistent mini-Windows XP (and, in one incarnation, mini-Windows 7) versions bootable from a read-only DVD for years, and a big enough Windows nerd can make such things happen.
Proper non-persistent workloads are a huge investment of time and effort up front. Administrators must learn the intricacies of the operating systems and the applications involved in order to pull it off. This can take months or even years.
A persistent workload is one that simply must be installed on a server or where the availability requirements are such that the length of time a computer takes to reboot is an unacceptable amount of downtime.
Workloads that need to actually be installed on a server usually fall into this category because the software packages involved are picky, or the vendors are strict about support. For whatever reason, they can't be pulled apart and made non-persistent.
Truly persistent workloads usually require fault-tolerant hardware. In this world you're usually talking about Stratus or HPE's NonStop.
Fault-tolerant hardware is expensive. While that might seem to argue that running persistent workloads is CapEx-heavy, it isn't. The hardware is the cheap part.
In a non-persistent world changes are easy. Clone out the golden master, tinker until satisfied, replace the golden master with the clone. Testing is cheap and easy, and the consequences of screwing up aren't that great.
In a persistent world you can't screw up. Even if you've done what you're supposed to and bought two of everything so that you can test your changes on an identical system before rolling out to production, the entire affair is hugely procedure-heavy.
Think of it like pushing out changes to a Mars rover. Somewhere at NASA there is another Mars rover, and all the changes are tested on it, with each and every step documented in sequence. Once everyone is satisfied, they reset the test rover to the same condition as the operational rover and verify that the procedure works exactly as documented. Some number of rehearsals later, the changes are rolled out to the real rover.
Properly persistent workloads are the same. Every change is a nail-biting affair. The cost of change management for persistent workloads is thus typically much higher than the cost of the hardware itself.
Knowledge is Power
Virtualization administrators may not be called upon to worry about these workloads themselves. Regardless, it behooves us to investigate the bare-metal world if only so that we can justify the seemingly outrageous costs of the software that enables us to do our jobs.
Good virtualization management tools are expensive. But the alternatives are far more so.
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.