In-Depth
A Look at Microsoft's Secure Future Initiative
Back in September 2023 I covered Microsoft's security woes in the wake of the Storm-0558 revelations ("Microsoft, Do Better"), looking at what Microsoft had revealed at the time and at my expectations for how the incident would prompt the company to change its approach to security.
That response is now here, in the form of the Secure Future Initiative (SFI) along with a whitepaper, blog post and Ignite session.
In this article I'll cover what SFI is, how it compares with Bill Gates's Trustworthy Computing Initiative (TwC) memo from 22 years ago, what Microsoft is doing technically behind the scenes, what's missing from the SFI and why I think they missed the mark on what really should have been the "Secure Culture Initiative."
A Little History
Back in the early 2000s Microsoft's products were hit with several high-profile worms and other attacks: the ILOVEYOU email worm, Code Red (which attacked Internet Information Services, IIS) and Nimda (which spread through email, network shares, web sites and IIS), followed by SQL Slammer in early 2003 and Blaster in August 2003. I was already an experienced IT tech then and I remember both the attacks and the memo quite clearly.
This led Bill Gates, then chairman and chief software architect, to write the TwC memo (more history here), which gave rise to the Security Development Lifecycle (SDL) and, over the years, wide adoption of exploit mitigations such as Data Execution Prevention (DEP), Address Space Layout Randomization (ASLR) and, more recently, Control-flow Enforcement Technology (CET) to defend against Return-Oriented Programming (ROP) and other techniques.
The memo was relatively short, to the point and smelled of a deep realization that something had to fundamentally change. I think Microsoft saw the flaws being exposed in its approach as an existential threat and responded accordingly. In fact, for many years afterwards the Microsoft Security Response Center (MSRC) prioritized vulnerabilities that could be turned into worms.
This new SFI has been compared to the TwC, and I think there are quite a few noteworthy differences.
The TL;DR of Secure Future Initiative
Penned by Brad Smith (vice chair and president), it starts by outlining the changing threat landscape: more advanced attacks, faster attacks and more nation-state-sponsored groups, all of which is true, but which was also true five years ago (and will be true in another five). It then goes on to cite AI as the first of the three pillars the SFI is built on. Yes, AI is cool, and yes, Microsoft (with its OpenAI partnership) is a leader in this space, but I don't think it's mainstream, tested and reliable enough yet to be one of the load-bearing pillars.
The second pillar is engineering: using AI to help develop more secure software (outlined in a separate email by Charlie Bell, executive vice president of Microsoft Security).
The final pillar is "stronger application of international norms in cyberspace," the most confounding part of SFI (and a particular favorite of Brad Smith). First, attackers (including nation-state ones) in Iran, China, North Korea and Russia don't care a hoot about norms. They ransom hospitals because they know hospitals are more likely to pay -- tough luck if someone dies. In wartime, anything goes, as Russia has demonstrated with its relentless cyberattacks in Ukraine. And they'll hack any company, government organization or military installation to achieve their goals, including intellectual property theft. Second, the U.S. (and other Western nations) also spy on and hack into the systems of other nations -- they're just stealthier about it. The final clanger is "states should recognize cloud services as critical infrastructure, with protection against attack under international law." Wouldn't that be nice: "We're hosting in the cloud so we're off limits." I don't think that's remotely realistic in today's world.
My main gripes with the SFI, and the way it's presented, are twofold. First, Satya Nadella is nowhere to be seen. This is not the CEO having a moment of deep realization that the current path is dangerous and the ship needs to steer clear of the icebergs ahead; it's security leaders talking about incremental security improvements.
Second, it's mostly marketing word salad (more on that below). It talks in generalities about improvements, but not enough about concrete, measurable and transparent details that we can see and use to start rebuilding our trust in Microsoft's security culture.
And that's why I think this should have been called the "Secure Culture Initiative," which would have put the spotlight squarely back where it belongs: on Microsoft, which needs to change its internal approach to, and prioritization of, security. In an ever-escalating "speed of feature release" competition with AWS in the IaaS and PaaS cloud spaces, and likewise in SaaS with Google (a framing I think is mistaken; Google Workspace is no longer a serious competitor to Microsoft 365 except in very small businesses) and to a lesser extent Salesforce and others, will program managers at Microsoft be incentivized to say "No, we can't release this feature now, even though the competition is, because we need another two months to iron out the security issues"? Because that's what it comes down to: you can't ship software as fast when you're also eliminating security bugs (even with the help of AI) as you can when you take a more relaxed attitude towards secure design and coding.
The Technical Details
One thing that hadn't been revealed when I wrote the original article was a detailed explanation of how Microsoft was going to ensure Storm-0558 attacks would never be possible again. This we've now got in the aforementioned blog post and session from Ignite, and it's good news. I just wish it had been more prominently featured in the SFI itself.
They've built a system where no human action is ever involved in the creation and dissemination of signing keys, so a malicious insider can't create or copy keys, and a coerced engineer (say, with their family held hostage) can't be forced to either. The keys will be stored in Hardware Security Modules (HSMs), a particular challenge given that Entra ID (formerly Azure AD) is a distributed system, something HSMs aren't built for.
The actual processing of keys is done using Confidential Computing in Azure, something I've covered here in more detail. In essence, we commonly apply techniques to protect data in transit (TLS and other forms of encryption) and data at rest (full disk encryption, database encryption and so on), but Confidential Computing protects data while it's being processed, by encrypting areas of memory and using special enclaves in the CPU to do the work. This stops accidental or intentional leakage of keys from memory or from copies of crash dumps.
They're also shortening the lifetime of these keys to 30 days, limiting an attacker's window to exploit a stolen key should all of the above protections fail. Why not shorter? According to Alex Weinert, director of identity security, it's a balance between security and performance/scalability.
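To make the shape of this concrete, here's a minimal sketch (not Microsoft's internal pipeline) of what fully automated, short-lived, HSM-backed key creation can look like, assuming an Azure Key Vault Premium or Managed HSM instance and the azure-identity and azure-keyvault-keys Python packages; the vault URL and key name are hypothetical:

```python
# A minimal sketch (not Microsoft's actual pipeline): create a short-lived,
# HSM-backed signing key entirely from code, with no human ever handling the
# key material. Assumes Azure Key Vault Premium / Managed HSM and the
# azure-identity + azure-keyvault-keys packages; the vault URL is hypothetical.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

VAULT_URL = "https://example-signing-vault.vault.azure.net/"  # hypothetical

client = KeyClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())

# hardware_protected=True asks for an HSM-backed key: the private key is
# generated inside the HSM and can't be exported by the caller.
key = client.create_rsa_key(
    "token-signing-key",  # hypothetical key name
    size=3072,
    hardware_protected=True,
    expires_on=datetime.now(timezone.utc) + timedelta(days=30),  # short lifetime
)

print(key.id, key.properties.expires_on)
```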
Finally, they've implemented deep monitoring and telemetry to spot any anomalies in the system such as a key being used beyond its expiry date.
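As a toy illustration of that last point (invented data shapes, nothing like Microsoft's real telemetry), the core check is simple: correlate signing events against known key lifetimes and alert on anything used past its expiry.

```python
# Toy anomaly check: flag signing operations that use an unknown key or a key
# past its expiry date. Event and key shapes are invented for illustration.
from datetime import datetime, timezone

def find_key_anomalies(signing_events, key_expiry):
    """signing_events: iterable of (key_id, used_at) tuples.
    key_expiry: dict mapping key_id -> expiry datetime."""
    anomalies = []
    for key_id, used_at in signing_events:
        expiry = key_expiry.get(key_id)
        if expiry is None or used_at > expiry:
            anomalies.append((key_id, used_at))  # unknown key, or expired key in use
    return anomalies

# Example: key-a is still valid, key-b expired a year ago.
now = datetime.now(timezone.utc)
events = [("key-a", now), ("key-b", now)]
expiries = {"key-a": now.replace(year=now.year + 1), "key-b": now.replace(year=now.year - 1)}
print(find_key_anomalies(events, expiries))  # -> [("key-b", ...)]
```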
Which is all well and good, and certainly reassures me that this particular attack will never be possible again (just like the dual, independent software build pipelines at SolarWinds do), but my question is: What other skeletons are there in the closet? The multiple failures and human errors that resulted in the Storm-0558 attack succeeding in the first place happened because of a lax attitude towards security, and I wonder if there are any other areas of the identity platform that are at risk.
And the fact that this wasn't caught by Microsoft's own hackers -- their red team -- indicates to me that the fundamentals of the platform, such as the issuance of root keys, weren't in scope for their attacks. And that's weird because that's where I would look first.
Musical Chairs
The fallout of this attack includes some personnel changes. Bret Arsenault, CISO for the last 14 years, is moving to a newly created role, chief security adviser. The new CISO is Igor Tsyganskiy, formerly of Bridgewater Associates, who started this week.
My question for whoever is the "boss of security" is: Will they have the power and the mandate to enforce the ideas in SFI across the entire, huge organization that is Microsoft? Because this is a culture thing, just as the Challenger disaster happened because the "low-level" engineers who tried to sound the alarm on O-rings and cold temperatures were overridden by senior leaders. When the choice between shipping a new feature and fixing a security issue comes up, that security culture has to already be in place for the security voice to be heard and prioritized.
The Good
Just like in my last article, I don't want to imply that there isn't good security work being done at Microsoft:
- I appreciate the improvements in Windows 11, such as using Rust (a memory-safe programming language), starting with font parsing and parts of the Win32k kernel code, and requiring Trusted Platform Module (TPM) hardware support, which, along with Secure Boot, underpins associated services such as passkey and passwordless support. Read more here.
- Getting rid of third-party printer drivers with the new Windows Protected Print Mode (WPP) will mitigate a whole class of attacks.
- For very small organizations, Microsoft first enforced Security Defaults for new tenants (2019), then applied them to existing tenants that hadn't touched their security settings (2022). These non-configurable policies require MFA for all users and administrators and block all legacy authentication protocols (those that don't support MFA).
- Rolling out now and targeting larger tenants that use Conditional Access but haven't got some basic policies in place are three Microsoft-managed CA policies: require MFA for admin portals; require MFA for users configured with per-user MFA; and require MFA for high-risk sign-ins (if you have Entra ID P2). These will show up in your tenant in report-only mode (not enforced) for 90 days, and then Microsoft will turn them on if you haven't made any changes. It's an interesting approach and I like it; the sketch after this list shows how to check both of these settings in your own tenant.
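As an aside, here's a hedged sketch of how you might check both of the above in your own tenant via Microsoft Graph: read the Security Defaults policy and list any Conditional Access policies still sitting in report-only mode. It assumes Python with the requests library and an already-acquired access token carrying the Policy.Read.All permission; token acquisition is omitted.

```python
# Sketch: query Microsoft Graph for the Security Defaults toggle and for
# Conditional Access policies that are still in report-only mode.
# Assumes an already-acquired access token with Policy.Read.All.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
HEADERS = {"Authorization": "Bearer <access-token>"}  # token acquisition omitted

# Security Defaults: a single tenant-wide on/off policy.
sd = requests.get(
    f"{GRAPH}/policies/identitySecurityDefaultsEnforcementPolicy", headers=HEADERS
).json()
print("Security Defaults enabled:", sd.get("isEnabled"))

# Conditional Access: report-only policies have the state
# 'enabledForReportingButNotEnforced' and aren't being enforced yet.
ca = requests.get(f"{GRAPH}/identity/conditionalAccess/policies", headers=HEADERS).json()
for policy in ca.get("value", []):
    if policy.get("state") == "enabledForReportingButNotEnforced":
        print("Report-only policy:", policy.get("displayName"))
```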
So, there's plenty of good work being done, but I'm still concerned about the culture of "security first" that's required.
Conclusion
There will most likely be one more chapter in this saga. First, we had the attack and Microsoft's initial blog posts with the technical details, followed by more details on how it happened. Now we've had the technical response, along with the marketing department's words. But the third chapter will come when the Cyber Safety Review Board (CSRB) releases its next report, sometime in 2024, focused on the Storm-0558 attack and on cloud service providers' security in general. Incidentally, its earlier report on the Lapsus$ crew, published on the Cybersecurity and Infrastructure Security Agency (CISA) web site, is a fascinating read and shows how teenage hackers used social engineering tactics (like those employed in the recent Caesars and MGM hacks) to great advantage.
In summary, I feel reassured by the technical work Microsoft has put into Entra ID to make this particular attack impossible in the future, which is good. However, the broader shift to a more security-focused company culture is still uncertain. Time will tell, but as always in capitalism, when the choice comes down to doing the right thing or serving shareholder value, the latter wins.