A First Look at Azure Ultra SSD
Paul Schnackenburg examines the current storage options for Azure IaaS VMs, as well as the new offering, Ultra SSD, released in preview at Ignite 2018.
Storage in public clouds has always been an interesting problem. The importance of IOPS and throughput is highlighted by the fact that every Infrastructure-as-a-Service (IaaS) VM size in Azure has specific limits on those figures, both per disk and overall for the particular VM size. Sizing a VM for memory and vCPU is often an easier task than knowing what kind of disk performance the application it's running requires.
Today, Azure offers three types of disk storage that you can use with your IaaS VMs (I looked at this back in August 2018): Standard HDD, Standard SSD and Premium SSD. The two Standard offerings provide the same maximum IOPS (500 IOPS per disk) and throughput (60 MB/s per disk), but the SSD-based storage delivers that performance more consistently and with lower latency.
Standard HDD and SSD are sufficient for many basic workloads that aren't disk bound. For high-performance database workloads and other applications that require good storage performance, you should use Premium SSD. This type of storage limits the throughput and IOPS available per disk, which creates management overhead: to get more performance you have to attach multiple disks to a VM and then stripe them together, using Storage Spaces in Windows or software RAID in Linux. All three of these storage types are offered from the same storage fabric that underlies all storage in Azure, which provides API access and many other benefits, at the expense of higher latency due to the additional layers of capabilities. Also be aware that different VM sizes have varying disk performance limits -- you don't want to provision more Premium (or Ultra) SSD performance than the VM can use, as you'll be wasting money. Different Premium SSD sizes also provide different performance, which is coupled with capacity -- you can see the different limits here.
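To make the sizing point concrete, here is a minimal Python sketch of how striping and VM limits interact. It assumes ideal striping (per-disk figures simply add up) and uses made-up numbers for the per-disk and per-VM limits; the names `Disk` and `effective_limits` are illustrative, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    iops: int   # per-disk IOPS limit
    mbps: int   # per-disk throughput limit in MB/s


def effective_limits(disks, vm_iops_cap, vm_mbps_cap):
    """Return the (IOPS, MB/s) a striped set can deliver on a given VM size.

    Assumes ideal striping: per-disk limits add up, and the VM-level
    cap then clips the total.
    """
    total_iops = sum(d.iops for d in disks)
    total_mbps = sum(d.mbps for d in disks)
    return min(total_iops, vm_iops_cap), min(total_mbps, vm_mbps_cap)


# Hypothetical figures, chosen only to illustrate the clipping effect.
four_disks = [Disk(iops=5000, mbps=200)] * 4
print(effective_limits(four_disks, vm_iops_cap=12800, vm_mbps_cap=192))
```

In this example the four disks could deliver 20,000 IOPS between them, but the VM cap clips that to 12,800, so the extra provisioned performance is paid for and never used.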
A word about caching: Azure offers three options for disk caching. The first is none, which is also the only option for some VM sizes such as the B series. The second is read (recommended), which uses resources on the host to cache common reads, minimizing the traffic that has to go over the network to the storage cluster. The third is read/write, which gives you better performance but carries a risk of data loss if writes have been stored in the cache on the host but have not yet been written to the storage. Only use read/write if the vendor of the workload in the VM recommends it.
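The data-loss risk of read/write caching is easier to see in a toy model. The Python sketch below is not how Azure implements its host cache; `CachedDisk` and its methods are hypothetical names, and the model only captures the one property that matters here: in read/write mode a write is acknowledged before it reaches durable storage.

```python
class CachedDisk:
    """Toy model of a volatile host cache in front of durable remote storage."""

    def __init__(self, mode):
        if mode not in ("none", "read", "readwrite"):
            raise ValueError(f"unknown cache mode: {mode}")
        self.mode = mode
        self.storage = {}   # durable remote storage
        self.cache = {}     # volatile cache on the host
        self.dirty = set()  # blocks written to the cache but not yet to storage

    def write(self, block, data):
        if self.mode == "readwrite":
            # Acknowledged from the cache; storage is updated later by flush().
            self.cache[block] = data
            self.dirty.add(block)
        else:
            # Written straight through to storage; drop any stale cached copy.
            self.storage[block] = data
            self.cache.pop(block, None)

    def read(self, block):
        if self.mode != "none" and block in self.cache:
            return self.cache[block]
        data = self.storage.get(block)
        if self.mode != "none" and data is not None:
            self.cache[block] = data
        return data

    def flush(self):
        for block in self.dirty:
            self.storage[block] = self.cache[block]
        self.dirty.clear()

    def host_failure(self):
        # Everything held only on the host is lost.
        self.cache.clear()
        self.dirty.clear()


safe = CachedDisk("read")
safe.write(0, "invoice")
safe.host_failure()

risky = CachedDisk("readwrite")
risky.write(0, "invoice")
risky.host_failure()

print(safe.read(0), risky.read(0))
```

After the simulated host failure, the read-cached disk still returns the written block, while the read/write-cached disk has lost it because it was never flushed.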
Ultra SSD, the new option, is specifically designed for applications such as MongoDB, PostgreSQL and RabbitMQ that benefit from low queue depths, and for high-IOPS workloads such as SQL Server, SAP HANA and Oracle DB. Design goals were to keep latency under a millisecond and provide both high IOPS and throughput while decoupling capacity from performance as much as possible. The current Premium SSD performs better the larger the disk, and because you pay for provisioned disk size (not used space like you do with Standard SSD/HDD), you may end up with wasted capacity. Because Ultra SSD is currently in preview, there's some PowerShell magic to do before you can use it, and your subscription must be enabled for it (see Figure 1).
Maximum IOPS from a single Ultra SSD disk is 160,000 and maximum throughput is 2,000 MB/s (!). Note that a new VM size (Experimental_E64-40sv3) is required to take advantage of these values; "normal" Azure VMs top out at 80,000 IOPS. Sizes range from 4 GiB to 64 TiB -- Figure 2 shows the relationship between size and performance (and, yes, there still is one, although it's not as linear as it is for Premium SSD).
Many enterprises have been reluctant to move their highest-performing tier 1 mission-critical workloads, which run on all-flash Fibre Channel storage arrays, to the cloud due to a lack of storage performance. Ultra SSD (when it's generally available) should remedy that concern (see Figure 3).
Ultra SSD is built on a separate storage fabric, and the underlying physical storage is NVMe-based (as, in fact, is Premium SSD). Microsoft open sources its Azure server design, which is called Project Olympus.
Microsoft has designed its own "ruler," called "Enterprise and Datacenter SSD Form Factor (EDSFF) long." Each ruler houses 16TB of M.2 form factor NVMe storage, and the rulers are in turn housed in a 1U enclosure that can hold up to 16 of them and allows hot swapping from the front. The enclosures are connected to the Hyper-V hosts using PCIe switches.
An interesting aspect of Ultra SSD is that you provision exactly the throughput and IOPS values you need. This gives you a predictable bill at the end of the month, as Ultra SSD is charged on provisioned performance and capacity, not transactions (see Figure 4). Even more useful is that you can change these values while the VM is running. For instance, if you have a report server that needs very high disk performance one day a month when it's crunching a lot of numbers for the reports, you can set up automation to dial the performance right down once that peak is finished and keep it there for the rest of the month, saving dollars. Note that increasing storage capacity (not IOPS/throughput) does require the VM to be shut down.
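The report-server scenario can be sketched as a small cost calculation. The Python below assumes a simple additive model (capacity, IOPS and throughput each billed per hour of provisioning); the rates are placeholders invented for illustration and are not Azure prices, and `Rates`, `hourly_cost` and `monthly_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    per_gib_hour: float    # placeholder rate, not an Azure price
    per_iops_hour: float   # placeholder rate, not an Azure price
    per_mbps_hour: float   # placeholder rate, not an Azure price


def hourly_cost(rates, size_gib, iops, mbps):
    """Cost of one hour at the given provisioned capacity and performance."""
    return (size_gib * rates.per_gib_hour
            + iops * rates.per_iops_hour
            + mbps * rates.per_mbps_hour)


def monthly_cost(rates, size_gib, schedule):
    """Sum the cost over a schedule of (hours, iops, mbps) periods."""
    return sum(hours * hourly_cost(rates, size_gib, iops, mbps)
               for hours, iops, mbps in schedule)


rates = Rates(per_gib_hour=0.0002, per_iops_hour=0.00007, per_mbps_hour=0.0005)

# One 24-hour reporting peak, then dialed down for the rest of a 730-hour month.
dialed = monthly_cost(rates, 1024, [(24, 80_000, 1_000), (706, 5_000, 100)])
always_peak = monthly_cost(rates, 1024, [(730, 80_000, 1_000)])

print(f"dialed down: {dialed:.2f}  always at peak: {always_peak:.2f}")
```

Whatever the real rates turn out to be, the structure of the saving is the same: the peak settings are paid for only during the hours they are provisioned.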
Ultra SSD is currently in limited private preview. It is only available in the East US 2 region (and only in Availability Zone 3), is only supported on ES/DS v3 VMs, can only be provisioned as a data disk (Microsoft recommends using Premium SSD for the OS disk) and only works as a managed disk. Currently it doesn't support disk snapshots, Availability Sets, VM Scale Sets or Azure Disk Encryption and, most importantly, Azure Backup and Azure Site Recovery are not yet supported. General Availability is planned for the first quarter of 2019, with support for snapshots (and thus backups) planned for the second quarter of 2019, and Availability Sets and support in most regions coming in the third quarter of 2019. Ultra SSD will only be offered as Locally Redundant Storage (LRS) with three copies in a single datacenter; high availability will have to be built using tools such as Azure Site Recovery (ASR).
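If you are planning a trial, the preview restrictions listed above can be encoded as a quick pre-flight check. This Python sketch is purely illustrative: `DiskRequest` and `preview_blockers` are hypothetical names, the series labels "ESv3" and "DSv3" are shorthand for the ES/DS v3 families, and the rules reflect only the limitations described in this article at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskRequest:
    region: str            # Azure region name, e.g. "eastus2"
    zone: int              # Availability Zone number
    vm_series: str         # shorthand label, e.g. "ESv3" or "DSv3"
    is_os_disk: bool
    is_managed: bool
    needs_snapshots: bool = False


def preview_blockers(req):
    """Return the reasons a request falls outside the preview limits."""
    blockers = []
    if req.region != "eastus2" or req.zone != 3:
        blockers.append("only East US 2, Availability Zone 3 is available")
    if req.vm_series not in {"ESv3", "DSv3"}:
        blockers.append("only ES/DS v3 VMs are supported")
    if req.is_os_disk:
        blockers.append("Ultra SSD can only be a data disk")
    if not req.is_managed:
        blockers.append("Ultra SSD only works as a managed disk")
    if req.needs_snapshots:
        blockers.append("snapshots (and thus backups) are not yet supported")
    return blockers


ok = DiskRequest("eastus2", 3, "ESv3", is_os_disk=False, is_managed=True)
bad = DiskRequest("westeurope", 1, "B", is_os_disk=True, is_managed=True)
print(preview_blockers(ok))
print(len(preview_blockers(bad)))
```

An empty list means the request fits within the preview limits as described here; anything else is a list of reasons to fall back to Premium SSD for now.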
Ultra SSD is a very interesting improvement to IaaS VM storage in Azure and while it's early days (given the limitations, particularly around backup), I find the concept of being able to change the performance characteristics of the underlying storage "on the fly" very attractive for the right use cases. And 160,000 IOPS and 2,000 MB/s (from a single disk) with sub-millisecond latency should open up the ability to truly run any workload in the public cloud.
Paul Schnackenburg, MCSE, MCT, MCTS and MCITP, started in IT in the days of DOS and 286 computers. He runs IT consultancy Expert IT Solutions, which is focused on Windows, Hyper-V and Exchange Server solutions.