The Cranky Admin
        
How To Survive the Next Amazonpocalypse
There are numerous cloud options out there. Learn what they are.
        
        
        
In case you somehow slept through Feb. 28, 2017, Amazon's public cloud had a major outage that succeeded in "breaking" large portions of the Internet. Breathless reporting from all points has concentrated on pointing fingers of blame or tallying the list of those affected. Here at Virtualization Review, however, we focus on more practical concerns affecting everyday IT practitioners. Practical concerns like "I told you so."
If you're reading this blog on Virtualization Review, there's a pretty good chance you know your hypervisors from your bare metal and your firewalls from your reverse proxies. I don't need to tell any of you why Amazon's outage happened, or even that it was inevitable. You all knew that already. You are the ones saying "I told you so."
Our bosses turn to us to keep IT running. We in turn rely on hardware and software vendors, service providers and public cloud providers. Our job is to find the right mix to meet the needs we're confronted with. Sometimes we don't call it right, and that's on us. All too often, however, someone higher up the food chain overrules us, and that's on them.
You Can't Always Get What You Want
The problem with "I told you so" is that it doesn't really help you get what you want. Rubbing some empty suit's nose in the fact that they decided to put all their eggs in a basket run by a company that built a pan-global empire by grinding its suppliers down to the lowest possible price will just make said suit defensive. Nobody likes being confronted with their mistakes.
There are alternative approaches to making the infrastructure under our care more resilient, even if they're not nearly as satisfying. The first thing we should all be doing is gathering banners for the next battle. There are excellent post-Amazonpocalypse analyses out there, such as one by the inestimable Dan Kusnetzky, who has weighed in on the disaster planning aspects. Curating this sort of work from across the Web will help us make our case the next time some pointy-haired boss Dilberts us with a "why isn't this in the cloud yet?"
Of course, the world isn't so simple that the answer to Amazon's outage is some knee-jerk "the cloud is bad." Yes, an outage occurred, but all the reasons the cloud was worth considering in the first place still apply. We do, however, have to change which questions we ask, and whom we ask.
Assessing the Options
If you want to build a resilient and highly available IT solution, there are three main routes to victory. You can: 1) control all aspects of the solution yourself, which typically means controlling at least two physical sites; 2) go the hybrid route, where one site is owned and operated by someone else; or 3) fully outsource everything to, for example, a public cloud provider.
Controlling the whole thing yourself is considered "old school" today, and that view is actually rational. Unless your organization is large enough to justify multiple datacenters in different geographical locations, using a colocation facility to house the gear for your second site, or using a public cloud provider as a disaster recovery location, is the most economical and pragmatic approach. Both options are plentiful, and you can mix and match them to meet your needs.
These hybrid solutions are the new normal. Keep some workloads (and their associated data) onsite, run the rest elsewhere, and keep backups spread across the multiple points of presence. If something goes splork, you can flip the switch, light up the backups and continue on your merry way.
Of course, public cloud providers claim to offer all of this -- and more -- within their own infrastructure. The big four have multiple regions, numerous Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) options, backups, snapshots and so on. As Amazon so ably demonstrated, however, if the control plane goes out underneath it all, things can break so badly that even Amazon can't get into its own dashboard to post outage updates.
This brings us to the world of cloud brokers, which let you spread workloads across more than one provider. The chances of one public cloud provider having an outage aren't very high. The chances of multiple public cloud providers having outages at the same time could politely be called statistically unlikely. The chances that major public cloud providers and regional service provider offerings all go down together rest in the "well, the world is pretty much taking that day off anyway, so why worry about it" realm.
System administrators have options for resiliency and high availability, both on-site and off. Checking that these options have actually been taken advantage of -- and testing to ensure they continue to be operational -- is where the real challenge lies.
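To put rough numbers on that intuition, here's a minimal Python sketch. The availability figures are made up for illustration, not any provider's published SLA, and the math assumes providers fail independently of one another. A shared dependency (the same DNS provider, the same underlying region, a single backend behind a "multi-cloud" front) quietly breaks that assumption, so treat the output as a best case.

```python
from math import prod

# Hypothetical availability figures (fraction of time up). These are
# illustrative assumptions, not any provider's published SLA.
PROVIDERS = {
    "cloud_a": 0.999,
    "cloud_b": 0.999,
    "regional_sp": 0.995,
}

MINUTES_PER_YEAR = 365 * 24 * 60


def p_all_down(availabilities):
    """Probability that every provider is down at the same moment.

    ASSUMES failures are statistically independent, so the individual
    unavailabilities (1 - availability) simply multiply.
    """
    return prod(1.0 - a for a in availabilities)


def combined_availability(availabilities):
    """Availability of a setup that stays up as long as ANY provider is up."""
    return 1.0 - p_all_down(availabilities)


if __name__ == "__main__":
    scenarios = [
        ("one cloud", [PROVIDERS["cloud_a"]]),
        ("two clouds", [PROVIDERS["cloud_a"], PROVIDERS["cloud_b"]]),
        ("two clouds + regional SP", list(PROVIDERS.values())),
    ]
    for label, avail in scenarios:
        p = p_all_down(avail)
        print(f"{label}: P(all down) = {p:.1e}, "
              f"about {p * MINUTES_PER_YEAR:.3f} minutes/year")
```

With these made-up inputs, a single provider at 99.9 percent is down roughly 8.8 hours a year, while two independent providers being down at the same time works out to about half a minute a year.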
Vetting Virtual Vendor Veracity
With hybrid solutions becoming the norm, some part of our infrastructure moves beyond our control. This could be as simple as trusting a colocation facility to house some of our servers, or it could be a full-blown hybrid cloud solution, such as Microsoft's Azure Stack.
Before engaging these solutions, we need to verify that they do what they say on the tin. We need to ensure that the portions of the solution we don't control are fit for purpose, and we need to establish some means of regularly auditing everything to ensure that corners aren't cut when we aren't looking.
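One small, concrete piece of that regular auditing is checking that backups are actually landing at every site. The Python sketch below assumes you can already pull a "newest successful backup" timestamp per site from whatever your backup tooling reports; the site names and the 24-hour threshold are hypothetical placeholders, not recommendations.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical freshness threshold: the newest backup at each site should be
# no older than this. Set it from your own recovery point objective.
MAX_BACKUP_AGE = timedelta(hours=24)


def audit_backups(
    last_backup: dict[str, datetime | None],
    now: datetime | None = None,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> list[str]:
    """Return the sites whose newest backup is missing or older than max_age.

    last_backup maps a site name to the timezone-aware timestamp of its most
    recent successful backup, or None if no backup has ever been recorded.
    How those timestamps are collected is out of scope for this sketch.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stale = []
    for site in sorted(last_backup):
        ts = last_backup[site]
        if ts is None or now - ts > max_age:
            stale.append(site)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical sites and timestamps for illustration only.
    sites = {
        "onsite": now - timedelta(hours=2),
        "colo": now - timedelta(days=3),
        "cloud_dr": None,
    }
    for site in audit_backups(sites, now=now):
        print(f"ALERT: backups at {site} are stale or missing")
```

Run something like this on a schedule and alert on its output. The point is that it checks the result (a recent backup exists at each site) rather than trusting that a job was configured once and left alone.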
Hybrid solutions require that we regularly audit not only the offsite elements but also the software responsible for making sure that A connects to B and data gets where it's supposed to go. A patch can break things just as easily as a remote service provider can suffer a "backhoe vs. fiber optic" incident. Trust needs to be earned, not given away freely.
Where things get a lot more difficult for us is with Software-as-a-Service (SaaS) providers. Unlike IaaS or PaaS, where we can engage cloud brokers on our own to mitigate the risks of a public cloud provider outage, SaaS providers merely deliver an application over the Internet. We don't get the luxury of using a cloud broker with them.
Applications, Not Infrastructure
Here is where it's most important to tread carefully. Business processes are built around applications, not infrastructure. Applications are vital, hard to migrate away from and provide lock-in that can go to the very core of an organization.
Before engaging with a SaaS provider, it's our job as system administrators to carefully vet them. It's our responsibility to make sure that they have taken appropriate steps to make the infrastructure they employ resilient and highly available. It's our job to regularly audit this and to raise the alarm if something changes.
The sad reality is that we won't always be listened to. Vendors that use public cloud infrastructure as their backend sell lies about uptime and push misleading marketing about reliability. That's life.
What we can do is take opportunities like this recent AWS outage to gather banners and make sure we're prepared for the next argument, should the need arise. We can and should develop objective metrics that we want SaaS providers to meet before we engage with them, along with a means of scoring those providers, so we're ready when our counsel is sought before the organization relies on one.
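What those objective metrics look like will vary by organization. As one possible shape, here's a minimal weighted-scoring sketch in Python. Every criterion name, weight and "required" flag below is a hypothetical example, not an industry standard; the idea is simply to write the questions down before the sales pitch starts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    required: bool = False  # hard gate: anything short of full marks disqualifies


# Hypothetical criteria and weights; adapt them to your own risk profile.
CRITERIA = [
    Criterion("runs_in_multiple_regions", 3.0, required=True),
    Criterion("runs_on_multiple_cloud_providers", 2.0),
    Criterion("documented_failover_test_in_last_12_months", 2.0),
    Criterion("bulk_data_export_available", 3.0, required=True),
    Criterion("status_page_hosted_off_primary_infrastructure", 1.0),
]


def score_provider(answers: dict[str, float]) -> tuple[bool, float]:
    """Score one SaaS provider against CRITERIA.

    answers maps a criterion name to a rating between 0.0 and 1.0
    (missing criteria count as 0.0). Returns (eligible, score), where score
    is the weighted average rating in [0, 1] and eligible is False if any
    required criterion was not fully met.
    """
    total_weight = sum(c.weight for c in CRITERIA)
    eligible = True
    earned = 0.0
    for c in CRITERIA:
        rating = min(max(answers.get(c.name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        if c.required and rating < 1.0:
            eligible = False
        earned += c.weight * rating
    return eligible, earned / total_weight


if __name__ == "__main__":
    # Hypothetical answers gathered from a vendor questionnaire.
    vendor = {
        "runs_in_multiple_regions": 1.0,
        "runs_on_multiple_cloud_providers": 0.0,
        "documented_failover_test_in_last_12_months": 0.5,
        "bulk_data_export_available": 1.0,
        "status_page_hosted_off_primary_infrastructure": 0.0,
    }
    eligible, score = score_provider(vendor)
    print(f"eligible={eligible}, score={score:.2f}")
```

Marking a few criteria as hard requirements keeps a vendor from scoring well on nice-to-haves while failing the one thing that actually matters, such as being able to get your data back out.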
The Alternative View
Walk into the meeting and tell them that you never want to say "I told you so." Offer alternatives. And if you're large enough, negotiate with the SaaS vendor. It may just be that they themselves weren't aware of how they could improve their offering.
In a world of public cloud solutions, this sober second thought is what we get paid for.
        
About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.