The Importance of Trust When Choosing Your Cloud Provider
Careful selection of your cloud partners is essential. Start small, and build experience and trust over time.
Using cloud computing is not a license to disengage your brain. Workloads in the cloud still need to be configured, managed and secured. They still need backups. Proper architecture is as important to cloud deployments as it is to on-premises ones, and the cloud comes with the added problem of needing to trust your cloud provider.
The cloud removes a lot of the scut work from IT. Administrators don't have to worry about swapping out dead servers or designing storage solutions. Organizations don't have to build their own datacenters, with all the attendant costs and considerations that go with that. The cloud tends to have huge Internet capacity: the kind that most organizations can only dream of for their on-premises deployments. The cloud offers real-world value.
But the cloud isn't magic. It's (mostly) run by humans, and humans are fallible. While the cloud does employ a great deal of automation, that automation is itself designed and implemented by people. From top to bottom, mistakes get made. Cloud outages occur. Data is lost.
As a result, organizations looking to engage a cloud provider for any reason need to understand that they still bear responsibility for their own IT design. You need to plan for cloud outages, data loss and disasters just as you would for on-premises infrastructure. The advantage of using the cloud is that while you still have to plan for these things, you don't have to build all the relevant bits yourself.
Recent Cloud Mishaps
Exploring some of the prominent whoopsies of recent years can help put the need for continued responsibility in context. Cloud provider mistakes have come in many flavors; here's merely a sampling of known events:
In mid-2017, Cisco lost customer data in its Meraki cloud. The lost data included customized interactive voice response greetings and undisclosed "enterprise apps." Although it was considered a relatively minor data loss event, the incident is notable because cloud management is Meraki's primary selling point: with no on-premises management tools to contend with, getting things set up takes less effort.
The Meraki outage is one of the more recent examples of the importance of trust in cloud computing. Trust, once lost, is nearly impossible to regain. As vendors ask organizations to move more and more into the cloud -- including the management planes of on-premises devices -- having a cloud provider that you can trust becomes more important than ever.
Earlier in 2017, Amazon's S3 storage service had a well-publicized outage caused by human error on Amazon's side. It reinforced the need for disaster planning, even when you use cloud computing.
Shortly thereafter, Digital Ocean saw an outage. In this case, someone on Digital Ocean's side used production credentials in a script meant for testing. Fortunately, no data was lost. The same can't be said for a Netgear outage that occurred around the same time; this cloud outage also managed to delete data on customers' local, on-premises NASes. Whoops.
A year earlier, Google managed to take down its own cloud by applying a patch to the wrong routers. The incident was compounded by a monitoring solution on Google's end that couldn't identify the root cause, ultimately dragging the outage out. Here again, end customers were fortunate: data loss does not appear to have resulted.
Understanding your cloud provider is crucial. Consider the case of PC World KnowHow cloud backup users. In 2016, this cloud backup service did not keep versioned copies of backups, so ransomware easily destroyed both the production and backup copies of users' data. Others, such as Apple's iCloud, kept files that were supposedly deleted, revisiting this approach only when called out in the media.
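To make the versioning point concrete, here is a minimal Python sketch. It is not modeled on any real backup product. `MirrorBackup`, `VersionedBackup` and `simulate_ransomware` are hypothetical names, and the XOR step merely stands in for ransomware encryption. The sketch shows why a backup that only mirrors the latest state faithfully copies the damage, while one that retains older versions still holds a clean copy.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorBackup:
    """Hypothetical non-versioned backup: each run overwrites the previous copy."""
    copies: dict = field(default_factory=dict)

    def backup(self, name: str, data: bytes) -> None:
        self.copies[name] = data  # the previous copy is gone for good

    def restore(self, name: str) -> bytes:
        return self.copies[name]


@dataclass
class VersionedBackup:
    """Hypothetical versioned backup: each run appends a new version."""
    versions: dict = field(default_factory=dict)

    def backup(self, name: str, data: bytes) -> None:
        self.versions.setdefault(name, []).append(data)

    def restore(self, name: str, version: int = -1) -> bytes:
        # version=-1 is the newest copy; 0 is the oldest retained copy
        return self.versions[name][version]


def simulate_ransomware(store) -> None:
    """Back up a clean file, then back up its 'encrypted' replacement."""
    store.backup("report.docx", b"quarterly numbers")
    # XOR with a constant is only a stand-in for real ransomware encryption
    encrypted = bytes(b ^ 0x5A for b in b"quarterly numbers")
    store.backup("report.docx", encrypted)


if __name__ == "__main__":
    mirror, versioned = MirrorBackup(), VersionedBackup()
    simulate_ransomware(mirror)
    simulate_ransomware(versioned)
    # False: the clean copy was overwritten by the encrypted one
    print(mirror.restore("report.docx") == b"quarterly numbers")
    # True: the older, clean version survives
    print(versioned.restore("report.docx", 0) == b"quarterly numbers")
```

In both stores the backup job "succeeded" every time. Only retention of older versions gives you something clean to restore.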
The canonical example of not understanding how the cloud works, however, is Code Spaces. Code Spaces was forced to go out of business because it didn't understand how Amazon's cloud worked. A malicious actor compromised Code Spaces' Amazon EC2 user credentials and simply deleted all of the company's production and backup data. Its mistake is now a textbook lesson in why you must engage your brain before using the cloud.
The Cloud Offers Value
Not all is doom and gloom about the cloud. The cloud does offer very real value. Everything, however, hinges on trust. Well ... trust, and a rational assessment of your own IT capabilities.
Consider GitLab. GitLab walked away from the cloud and ended up with a catastrophe: five out of five of its data protection tools failed, resulting in the loss of a production database and a very embarrassing public debacle. Running away from the cloud in fear doesn't necessarily solve everything.
The point of the cloud is to relieve systems administrators from occupying themselves with "keeping the lights on," and allow them to put their knowledge and expertise to use at higher levels. Systems administrators freed from mundanity can focus on automation, on architecture and on meeting business challenges. This is the real promise of the cloud. When done right, the cloud delivers on this promise.
Trust comes into play in determining how much effort organizations have to put into various layers of resiliency. In a perfect world, cloud providers would be completely trustworthy and no one would ever have to worry about things like backup verification, cloud-to-cloud backup or building disaster recovery solutions for workloads that run in the cloud.
Extracting Value Takes Effort
Some individual cloud providers may be more trustworthy than the cloud industry as a whole. This is because some providers, especially regional service providers, are willing to work with organizations one-on-one to ensure that all of an organization's needs are met.
Key to this is transparency. The cloud provider needs to be willing and able to explain how it will handle any failure scenario, and it needs to be open about which scenarios it cannot protect against, so that the organization knows where it must engineer its own contingencies. In turn, the organization is responsible for understanding its own IT architecture well enough to ask relevant questions and to adequately understand the answers.
There are technologies and approaches to IT that organizations can use to minimize their exposure to IT risk, both on-premises and in the cloud. Composable workloads, overlay networking, microsegmentation, automated incident response and more are all relevant IT concepts. Similarly, modernizing applications toward the goal of being "cloud native" allows organizations that develop their own applications to use the cloud efficiently, securely, resiliently and cost-effectively.
This is a journey, however, and one that can take years, if not decades. For organizations seeking to take advantage of all that the cloud has to offer, the best advice anyone can give is to be prepared to do the work.
Research, research and more research is required. Careful and considered selection of your cloud partners is essential. Start small, and build experience and trust over time.
Above all, however: Test everything, back up everything and then test it all over again. If your data doesn't exist in at least two places, then it simply does not exist. And if you cannot restore that data from its backups, then once more it does not exist. On-premises or in the cloud, risk management is key to IT, and risk management begins with adequate data protection.
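As a minimal sketch of that rule, assume your backups are plain files on paths you can read. The hypothetical `data_exists` function below counts a file as existing only if at least two places hold it. Each backup copy must also match the original's SHA-256 checksum after being "restored" to a scratch directory. A real restore test would run your actual backup tool's restore procedure instead of `shutil.copyfile`, which here is only a stand-in.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_exists(original: Path, backup_copies: list[Path], min_copies: int = 2) -> bool:
    """Hypothetical check: is the data in at least `min_copies` places, restorably?

    The original counts as one place. Each backup copy counts only if it can be
    restored (here: copied to a scratch directory) with an identical checksum.
    """
    expected = sha256_of(original)
    good = 1  # the original itself
    with tempfile.TemporaryDirectory() as scratch:
        for i, copy in enumerate(backup_copies):
            if not copy.is_file():
                continue  # a missing backup does not count
            restored = Path(scratch) / f"restore-{i}"
            shutil.copyfile(copy, restored)  # stand-in for a real restore job
            if sha256_of(restored) == expected:
                good += 1
    return good >= min_copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        original = root / "db.dump"
        original.write_bytes(b"customer records")
        good_copy = root / "offsite.dump"
        shutil.copyfile(original, good_copy)
        bad_copy = root / "corrupt.dump"
        bad_copy.write_bytes(b"customer rec")  # a truncated backup
        print(data_exists(original, [bad_copy]))             # False
        print(data_exists(original, [good_copy, bad_copy]))  # True
```

Comparing checksums after a restore, rather than merely checking that a backup file is present, is what catches truncated or corrupted backups.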
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.