The Cranky Admin

What 3D NAND Storage Means for the Industry

For one, it means a big help for small and medium businesses.

This year at Flash Memory Summit, several SSD manufacturers announced new products based on 3D NAND. 3D NAND allows flash manufacturers to build far more capacious SSDs; some of the announcements included individual drives of up to 128TB. These will have a transformative effect on on-premises IT.

Many analysts will want to talk about how high capacity flash spells doom for traditional hard drive manufacturers. While that's likely to be the case in the long run, I think we're some ways out from that yet. There are two limiting factors on the death of the traditional hard disk: the number of SSDs that can be manufactured in a year, and the price of those SSDs.

It's all well and good to predict that 128TB SSDs will spell the end of hard drive manufacturers struggling to put out 12TB drives for mass consumption, but that prediction simply isn't going to come true before the next refresh cycle, or even the one after that.

For all that 3D NAND seems like a panacea for our storage ills, it has some very serious problems. 3D NAND was the solution to the fact that we had hit the limits of physics with traditional planar NAND. Somewhere between 20nm and 5nm it becomes impossible to consistently manufacture reliable flash cells, and we run into walls on both capacity and price.

3D NAND tried to solve this by stacking layers of flash cells one on top of the other, avoiding the limits of physics with clever engineering. Unfortunately, 3D NAND is, at best, a stopgap. It will run into the practical limits of engineering in a right hurry. As with all other types of semiconductors, we're about to hit the wall hard on how much smaller we can make things. Moore's law has been broken.

So if 3D SSDs aren't magical storage dust that can be sprinkled on our datacenters to solve all our storage ills, why will they be transformative? The answer lies not in the extremes of usage, but in the mass market of the mundane.

HCI Practicalities
An important concept that seems to be taking off in the IT world is the idea of simplicity. The short version of this particular lecture goes like this: public cloud has shown companies that IT can be done without paying nerds to babysit infrastructure all the time. The more complex your infrastructure is, the more time you have to spend on it. Simpler infrastructures let IT teams focus on revenue-generating activities instead of keeping the lights on, and IT teams are embracing them because they want to remain employed.

One of these simpler infrastructures is hyper-converged infrastructure (HCI). HCI gets rid of the need for a dedicated Storage Area Network (SAN) and dedicated storage hardware. Consequently, it also gets rid of the need for dedicated storage administrators. Advanced software algorithms and automation handle the storage equation.

With HCI, servers are bought with storage installed. Multiple servers are clustered together and their internal storage is shared with the rest of the cluster. A virtualized or containerized workload's storage lives on the node where it's executing, as well as on at least one other node in the cluster. Additional capacity is added by adding more nodes to the cluster.
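
To make that placement idea concrete, here's a toy sketch in Python. It isn't any particular vendor's algorithm; the node names, replication factor and selection order are assumptions made purely for illustration.

# Toy sketch of HCI-style replica placement: keep one copy of a workload's
# data on the node running it, plus at least one copy elsewhere in the cluster.
def place_replicas(local_node, cluster_nodes, replication_factor=2):
    """Return the nodes that should hold a copy of the workload's data."""
    if replication_factor > len(cluster_nodes):
        raise ValueError("not enough nodes to satisfy the replication factor")
    replicas = [local_node]  # first copy stays local, so reads stay on-node
    for node in cluster_nodes:
        if len(replicas) == replication_factor:
            break
        if node != local_node:
            replicas.append(node)  # remaining copies land on other nodes
    return replicas

cluster = ["node-a", "node-b", "node-c"]
print(place_replicas("node-b", cluster))  # ['node-b', 'node-a']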

Up until now, the problem with HCI has been a serious imbalance between the amount of storage organizations consume and the amount of compute they use. Speaking to systems administrators in the field, I find that a lot of the antipathy towards HCI comes down to the disconnect between how organizations want to use HCI and what vendors want to sell.

Some of this is down to the HCI vendors; most of them want to target enterprises and midsized organizations that will run a fairly small number of workloads per node, and will largely be running a homogeneous group of workloads per cluster. Vendors want to sell expensive all-flash HCI solutions to an organization that will run a couple dozen SQL instances on a cluster, or several hundred VDI instances. They emphatically don't want to be in the business of selling clusters for mixed use where organizations try to cram every kind of workload under the sun onto a single cluster.

Organizations purchasing HCI, on the other hand, want to do exactly that. For them, the whole point of going the HCI route is buying "fire and forget" infrastructure that is dirt simple and where they don't have to think about what runs where.

Remember that sysadmins are competing against the ease of use of the public cloud while trying to justify their jobs. Going back to management and saying "we need dedicated hardware for each class of workload" means that the cost advantage on-premises solutions have over public cloud solutions starts to quickly go away.

This is where 3D NAND comes in.

Death of the NAS
3D NAND is "deep and cheap" flash. It doesn't have high write life, but you can put a lot of bits onto it. It is also significantly faster than traditional hard drives. In practice, this means that vendors can build out HCI solutions with enough raw capacity that virtualizing a company's file servers becomes practical.

Unstructured data, as found on file servers, has a very specific usage pattern. It is typically written once, updated a handful of times and then never looked at again. An entire industry has evolved to figure out how to identify "cold" data and then ship it up to public cloud storage, tape storage or some other form of "inexpensive" storage.
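
To make "cold" concrete, here's a minimal sketch of the sort of check those tiering tools perform, in Python. The 90-day threshold, the reliance on last-access time and the /srv/fileshare path are all assumptions for illustration, not how any particular product works.

# Minimal sketch: flag files that haven't been accessed in a while as
# candidates for cheaper storage. Threshold and use of atime are illustrative.
import os
import time

def find_cold_files(root, max_idle_days=90):
    cutoff = time.time() - max_idle_days * 86400
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    cold.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return cold

for path in find_cold_files("/srv/fileshare"):
    print(path)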

Layer upon layer of obfuscation has been built into software to make it seem like that archived data hasn't moved, and is always available. Dealing with unstructured data can get complicated, fast. But what if it didn't have to be?

Go back in time to when Intel tried to reinvent the modern CPU with the introduction of the Itanium. This entirely new chip was to be faster, stronger, smarter and better than the old 32-bit x86 chips. Unfortunately, it would require everyone to rewrite all of their software in order to use it. Adoption of the Itanium was slow, and in short order AMD decided that the proper solution to the 32-bit problem was simply to make a 64-bit version of the x86 instruction set. AMD's approach built backwards compatibility in and didn't require any major changes from anyone. The result was a major victory for AMD, causing Intel to embrace the new x86-64 chips and abandon its dreamed-of Itanium monopoly.

More than 15 years later, the old 32-bit software still hasn't entirely gone away. We're mostly 64-bit today, but since running the 32-bit code isn't hurting anyone, the transition period has stretched out into a multi-decade affair. The same will be true for how we deal with unstructured data.

If HCI can become storage-heavy without compromising performance, it will kill off the NAS in addition to killing off the SAN. The overwhelming majority of organizations will be able to serve all of their non-endpoint compute needs from 3D NAND-equipped HCI.

Edge cases will still exist that need big object storage solutions, but for the most part, we'll be able to cram every workload under the sun and all of our data -- structured and unstructured -- into generic HCI clusters. No dedicated hardware. No clustering of like workloads. No babying.

Death of the Dual-Proc
One of the casualties of all of this will be the dual-processor (2P) server. For a long time, 2P servers were the industry standard because two CPU sockets were necessary to get enough cores to handle the tough workloads. After core counts crept high enough that a single socket (1P) could cover most workloads, the 2P market was kept alive because Intel artificially hobbled the 1P market by limiting the amount of RAM that a single CPU could address.

Enter AMD again. Now that it has a CPU design that can compete with Intel on raw speed, it has identified once more a gap in the market that Intel has deliberately left open. Intel has traditionally limited its single-socket CPUs to 32GB RAM, with the Xeon-Ds being allowed up to 128GB of RAM but without regular refreshes.

Getting more RAM means using a CPU designed for dual-socket servers, and this drives up the price of the server significantly. Of course, that's always been the idea behind limiting the RAM capabilities of single-socket chips: it's designed to make you buy the more expensive chips, and preferably more of them. AMD's new Epyc CPUs support up to 2TB of RAM per CPU. Now, these are admittedly CPUs also designed for dual-socket servers; however, this is more than double the 768GB per CPU that their direct Intel competitors can handle. This is important not because of the absolute amount of RAM that can be attached to a chip, but because of where "sweet spot" pricing is.

Single-CPU motherboards for AMD Epyc CPUs with 16 DIMM slots are a thing. 16GB DIMMs are the current sweet spot pricing-wise, meaning that AMD is enabling single-socket servers with 256GB RAM.

More importantly, an entry-level AMD Epyc-based 1P server can have 8 physical cores (16 threads), 256GB RAM, 2x "deep and cheap" 8TB 3D NAND SSDs, and 1x performance 1TB SSD with decent write life. Factoring in the resources needed for high availability and redundancy, three nodes of these servers will give you a usable 16 cores (32 threads), 512GB RAM, 24TB of bulk storage and 2TB of performance storage.
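
The back-of-envelope math works out roughly like this; a sketch in Python, assuming one node's worth of compute and RAM is held in reserve for failover and that bulk storage is written twice. Real usable figures depend on the HCI vendor's redundancy scheme and overheads.

# Rough usable-resource math for a three-node entry-level cluster.
# Redundancy assumptions are illustrative, not any vendor's actual policy.
NODES = 3
CORES_PER_NODE = 8          # 16 threads per node
RAM_GB_PER_NODE = 256
BULK_TB_PER_NODE = 2 * 8    # 2x 8TB "deep and cheap" 3D NAND SSDs
PERF_TB_PER_NODE = 1        # 1x 1TB performance SSD

usable_cores = CORES_PER_NODE * (NODES - 1)      # 16 cores, one node in reserve
usable_ram_gb = RAM_GB_PER_NODE * (NODES - 1)    # 512GB
usable_bulk_tb = BULK_TB_PER_NODE * NODES // 2   # 24TB once everything is written twice
usable_perf_tb = PERF_TB_PER_NODE * (NODES - 1)  # ~2TB with one node's worth in reserve

print(usable_cores, usable_ram_gb, usable_bulk_tb, usable_perf_tb)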

This may not seem like much to folks who buy 2P servers today with 1.5TB of RAM apiece and 300TB of disk storage, but this is a huge amount of compute and storage to a small business. And it's a completely bare-bones, entry-level offering.

More to the point, it's an entry-level HCI cluster that is screamingly fast and can replace your existing SAN, NAS and compute servers. You can collapse an entire rack of traditional compute down into something that can reasonably be fit into a 1U multi-node chassis.

And by this time next year there's a good chance we'll be able to buy one of these clusters from an HCI vendor for $20,000.

Transformation
This is the transformation that 3D NAND will bring to the datacenter. Alongside the re-emergence of competition in the x86 CPU space, we're going to see 4TB and 8TB capacity SSDs become mainstream. This is going to drive HCI adoption among mainstream customers; not because of the extremes of performance, capacity or capability, but because of what's possible with the mundane "sweet spot" pricing.

The power of 3D SSDs to change things for you and me isn't found in the 128TB monstrosities that will cost as much as a nice car. It's not in proposed 2U units with up to 6PB of storage. 3D SSDs will change things because they let everyday organizations like ours keep doing what we're doing without any major changes, even as our IT needs grow. No fancy unstructured data cloud archiving systems required. No need to obtain or maintain layers of complicated SAN or NAS storage. We can run it all in virtual machines, from high-demand databases to mostly idle Active Directory servers to great big file servers.

The hyperscalers and Fortune 500s can gobble up all the hard drives and fight over who gets what percentage of the ultra-high-capacity SSDs manufactured every year. We no longer need either. We'll take the sweet spot drives and build clusters that are responsive, capacious and inexpensive.

With 3D NAND, we can put off that public cloud thing for another refresh cycle. After that, we'll see.

About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.
