The Time Is Now to Embrace Parallel Processing -- Virtualization Review

DataCore shows what can happen.

By Dan Kusnetzky
06/16/2016

One of my clients, DataCore Software, sent along a press release touting the performance improvements provided by the new release of its DataCore SANsymphony software-defined storage (SDS) platform and DataCore Hyper-converged Virtual SAN.

The real key to DataCore's product is a deep understanding of how today's systems are designed and how they're being used. Most of today's operating systems, having been designed in an earlier era -- sometimes as much as five decades ago -- don't make full and optimal use of today's systems. This is true from smartphones to tablets to even the largest servers in enterprise datacenters. I thought it might be wise to explore what DataCore's doing now and how others could learn from its technology and successes.

How We Got Here
We live in a multi-processor (or multi-core) world now. Even handheld devices are configured with multiple cores. But much of the operating system software executing on those devices doesn't make optimal use of them and performance suffers. Why is that? Most of today's OSes were designed decades ago when machine resources were scarce and expensive, and staff resources were a much smaller part of the overall cost-to-use equation.

Important system functions were designed for simplicity, even if that meant reducing the overall performance of the system. Storage and network I/O were typically serialized. This means that all I/O requests were put in a queue, and a single process popped them off one at a time and executed each one. This approach worked just fine when systems sported a single processor.
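To make the queuing idea concrete, here is a minimal Python sketch of one shared request queue drained first by a single worker and then by four. It is an illustration only, not DataCore's implementation: the names simulated_io and drain, the 20-millisecond latency and the worker counts are all assumptions, and a short sleep stands in for a real storage or network request.

```python
import queue
import threading
import time

def simulated_io(request_id, latency=0.02):
    """Hypothetical stand-in for an I/O request: sleeps instead of touching a device."""
    time.sleep(latency)
    return request_id

def drain(requests, workers):
    """Process every request with `workers` threads pulling from one shared queue."""
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                request = pending.get_nowait()
            except queue.Empty:
                return  # queue is empty, so this worker is finished
            result = simulated_io(request)
            with lock:
                done.append(result)

    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, time.perf_counter() - start

requests = list(range(16))
serial_done, serial_time = drain(requests, workers=1)      # the serialized design
parallel_done, parallel_time = drain(requests, workers=4)  # several workers share the queue
print(f"1 worker:  {serial_time:.3f}s")
print(f"4 workers: {parallel_time:.3f}s")
```

With one worker, every request waits for the one ahead of it, so total time grows with the length of the queue. With several workers, the same requests overlap, which is the effect the rest of this article is about.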

An Outdated Approach
This approach, however, hasn't stood the test of time. It's a poor fit when even a smartphone contains a multi-core processor for general computing, a digital signal processor (a special-purpose computer for processing wireless signals), a processor to manage WiFi communications, another one to manage GPS communications and yet another to manage Bluetooth communications and auxiliary devices.

If we examine a large-scale system found in an enterprise datacenter, we might find a configuration containing 64 or 128 processors, each of which has 4 to 8 cores. We're also likely to find a herd of other special-purpose systems supporting the main system. This might include a stack of network servers, storage servers and security servers; and that's just the beginning. Most organizations have neither the time nor the expertise to dig deeply into how poorly server resources are used. For the most part, they've chosen the most expedient path: purchasing systems based on sizing guidelines from previously observed application performance.

For example, if a given configuration could reliably support 10 virtual machines (VMs), the organization would purchase a machine half again as big if it needed to support 15 VMs. While expedient, this approach means that most organizations are paying too much. Given the right software, the systems they have could be doing much more work than they currently deliver.
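The sizing habit described above is simple proportional arithmetic, sketched below in Python. The function names are hypothetical, and the efficiency_gain value is an assumed figure chosen only to mirror the 10-to-15 VM example, not a measured result.

```python
import math

def capacity_multiplier(baseline_vms, target_vms):
    """How much bigger the machine must be under straight-line sizing."""
    return target_vms / baseline_vms

def vms_supported(baseline_vms, efficiency_gain):
    """VMs the existing machine could host if software raised its effective
    throughput by `efficiency_gain` (a hypothetical factor)."""
    return math.floor(baseline_vms * efficiency_gain)

# Straight-line sizing: 15 VMs on hardware proven for 10 means buying 1.5x the machine.
print(capacity_multiplier(10, 15))

# Alternative: the same machine hosts 15 VMs if software recovers a 1.5x gain.
print(vms_supported(10, 1.5))
```

The two calls describe the same target from opposite directions: either the hardware grows by half, or the software gets that much more out of the hardware already in place.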

Queuing Up
Software engineers have put some focus on addressing this issue by deploying virtual processing software, such as VMs or containers, to trick systems into doing more work. For the most part, however, their focus has been finding ways to make the most of systems' computational power. When one virtual system or container stalls while waiting for I/O to complete, another virtual system or container is put to work.

This may have addressed one part of the issue we see today. It doesn't, however, address the biggest bottleneck holding back full utilization of today's systems: the way I/O requests are handled.

The long-standing practice of serializing I/O leaves applications standing in line, waiting their turn to read data from and write data to storage. This means less work is accomplished and further consolidation of workloads on these systems is limited, unless systems attain greater I/O capacity.

Harnessing the CPU
What DataCore has done with its storage technology is find a clever way to put available processing power to work accelerating storage. What most organizations don't realize is that most systems process inputs and outputs serially, and that this becomes the limiting factor on how many VMs can compute simultaneously.

What DataCore has learned is that harnessing available CPU cores to work on I/O offers organizations a number of productivity and cost-saving benefits:

More applications or VMs can be supported on the currently installed systems. The organizations won't need to purchase unnecessary hardware or replace systems prematurely.
When an organization can purchase fewer systems or smaller system configurations and use them for a longer period of time, it can save on system acquisitions, purchasing processes, power and cooling, and datacenter floor space, as well as on planning and implementing datacenter transitions.

To prove the point, DataCore ran the SPC-1 benchmark on a Lenovo system and had the results audited. Here's a bit of what the company said about the results:

DataCore achieved audited results that included record-setting numbers for the fastest response time at 100 percent load: 0.320 Milliseconds; best price performance: $0.08 per SPC-1 IOPS; and highest IOPS per rack unit: 459,290.87 SPC-1 IOPS in a 2U enclosure (approximately 3.5 inches or 89 millimeters) -- a small fraction of a standard 42U rack. All of these numbers rank three to ten times better than popular storage products from leading manufacturers, including all-flash arrays and many multimillion-dollar systems. The PSP5 release further elevates DataCore's revolutionary Parallel I/O performance advantage.

I'm pointing to the amazing performance DataCore and its partner, Lenovo, have achieved by engaging multiple processors in parallel to accelerate I/O only to suggest that the same thought process could be of significant benefit elsewhere.

Dan's Take: Unleash the Power
We now live in a world in which even the smallest computing devices typically are multi-processor (or multi-core) systems. Our OSes -- including Android, iOS, Linux, UNIX and even Windows -- are all still living somewhat in a single-processor world.

It is true that clever software engineers have found ways to graft multi-processing onto those OSes. What is equally true is that the developers of those OSes have focused more on adding new features, upgrading UIs and supporting more applications than on going through their operating systems with a fine-tooth comb, trying to unleash the power of all of the cores or processors our devices contain.

DataCore's achievements, while notable, are just pointers to what might be done in other areas. Since IT is now all about doing more with less, isn't it time we sought out other bottlenecks built into our collective thinking about how systems should work?

About the Author

Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. He has been a business unit manager at a hardware company and head of corporate marketing and strategy at a software company.