The Schwartz Cloud Report

Behind Microsoft's Latest Windows Azure Outage

Microsoft's Windows Azure cloud storage service went down worldwide late Friday afternoon, just as I was getting ready to call it a week. An expired SSL certificate was the cause of the outage, Microsoft eventually confirmed.

The Windows Azure outage -- which lasted into Saturday -- is ironic, given last week's study that indicated Windows Azure storage offered the fastest response times out of five large cloud networks, beating those operated by Amazon Web Services, Google, HP and Rackspace. Good thing for Microsoft that Nasuni, the vendor that ran the study, wasn't testing Windows Azure this weekend.

Once the service was back up Saturday, I posted an update noting that Microsoft had fixed the problem and users could once again access their data. The company said the service was 99 percent available early Saturday and completely restored by 8 p.m. PST. But the damage was already done -- and many customers and partners were furious.

In comments posted on a Windows Azure forum, Sepia Labs' Brian Reischl, who first pointed to the SSL certificate as the likely culprit, seemed to feel users should cut Microsoft some slack. Reischl said letting an SSL certificate fall through the cracks is a mistake anyone could make. "I know I have. It's easy to forget, right?" he posted. "It's an amateur mistake, but it happens. You end up with some egg on your face, add a calendar reminder for next year, and move on."

But one has to wonder how Microsoft, which has staked its future on the cloud and has spent billions to build Windows Azure into one of the largest global cloud services, could have failed to put in safeguards against the domino effect that occurred when that cert expired -- or, for that matter, a mechanism to flag any certificate that is about to expire. Putting it in admins' Outlook calendars would be a good start.

Of course, there are more sophisticated tools to make sure SSL certificates don't expire. Among them is the certificate monitoring and expiration management component of SolarWinds' Server & Application Monitor, a favorite among readers of our sister publication, Redmond. Another option not so coincidentally hit my inbox this week: Matt Watson, founder of Stackify, spent a few hours over the weekend developing a free tool called CertAlert.me, which allows site owners to scan the Web sites they own and track SSL and domain name expirations.
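
For readers curious what the simplest version of such a check might look like, here is a minimal Python sketch using only the standard library. It is not how SolarWinds or CertAlert.me work; the function names, the 30-day warning threshold and the sample date are assumptions made up for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' string from the server's certificate.

    Note: the default context verifies the certificate, so the handshake
    itself raises an error if the certificate has already expired.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until expiry (negative once expired)."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when fewer than `warn_days` days remain (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days


if __name__ == "__main__":
    # Hypothetical expiry string in the format getpeercert() returns.
    sample = "Jun 15 12:00:00 2030 GMT"
    today = datetime.now(timezone.utc)
    print(days_until_expiry(sample, today), needs_renewal(sample, today))
```

Run on a schedule against each host you own, with `fetch_not_after("example.com")` supplying the date string, a check like this would at least raise a flag weeks before a certificate lapses.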

"It happens a lot," Watson told me in a brief telephone conversation regarding outages like the one that struck Friday, which affected Stackify. "All you can do is sit on your hands and pray," he said, adding that years ago he had to deal with an expired SSL certificate. "You buy them and you forget about them and the next thing you know, your site's gone. It's one of those things that get overlooked."

Asked what the business opportunity is in offering this free service, Watson said he saw it as a way to bring exposure to his startup's namesake offering, a Windows Azure-based server monitoring platform aimed at easing access for developers while ensuring they don't have direct access to production systems.

Indeed, you can bet Microsoft is going to ensure it doesn't happen again. "Our teams are also working hard on a full root cause analysis (RCA), including steps to help prevent any future reoccurrence," said Steven Martin, Microsoft's general manager of Windows Azure business and operations, in a blog post apologizing for the disruption. Given the scope of the outage, Microsoft will offer credits in conformance with its SLAs, Martin said.

This is not the first outage Microsoft has had to explain and probably won't be the last. And we all know the number of well-publicized outages Amazon Web Services has encountered in recent years.

If you're a Windows Azure customer, did last week's slip-up erode your confidence in storing your data in Microsoft's cloud? Drop me a line at jschwartz@1105media.com or leave a comment below.

Posted by Jeffrey Schwartz on 02/26/2013 at 11:05 AM



Reader Comments:

Wed, Feb 27, 2013 Jon Australia

Storing data in the "cloud" is a bit like skydiving with a virtual parachute. You can have my share of the cloud; I'll stick with RAID and FTTN, thanks.

Tue, Feb 26, 2013

Is it me, or are the so-acclaimed cloud service providers getting bad publicity from incidents like this? An SSL certificate expired? It happens to the best, I think.

Tue, Feb 26, 2013 Tim Wessels Somewhere in the cloud

Well, this is two years in a row to the month when a certificate-related snafu brought down Windows Azure. Last year it was a bug that didn't account for leap year. This year a certificate expired. Microsoft spent $15B to build out six cloud data centers, but how good was the design for these warehouse-scale computing environments? They are probably very well designed to deal with any number of failures, but a certificate failure should not have been able to take down the Windows Azure service. Fixing this is not rocket science. GoDaddy starts pinging me months before my domain name registrations expire and renews them automatically if I don't. Why couldn't Microsoft do the same with certificates? This needs fixing or Microsoft risks losing a lot of credibility in the cloud marketplace.

Tue, Feb 26, 2013 Marty Austin, Texas

MS should stick with what they do best, writing software. Oh Wait!
