Information Governance: The Microsoft Way -- Virtualization Review

Paul Schnackenburg says the industry is adopting these technologies as part of Zero Trust and defense in depth, so everyone should have a grasp of the basics.

By Paul Schnackenburg
03/28/2023

In several articles here on Virtualization and Cloud Review I've looked at various security services from Microsoft aimed at protecting your organization against cyber threats. I've looked at Defender Threat Intelligence and External Attack Surface Management, Entra Permissions Management, Defender for Business (complete EDR for SMBs), Defender for Identity, Microsoft Sentinel, and how they all integrate together.

But I haven't written about the other side of the coin: your documents and data, and how to protect and govern them and track their lifecycles. All of this falls under Information Governance, and Microsoft has a complete portfolio. In this article I'll look at Information Protection, Data Loss Prevention, retention policies and records management.

These terms might not come naturally to IT pros and could be thought of as concerning only the legal department or maybe IT security, but as far as I can see, the industry is adopting these technologies as part of Zero Trust and defense in depth, and everyone should have a grasp of the basics.

Data Loss Prevention
Once upon a time, Microsoft offered Windows Information Protection (WIP), which was fairly limited: it only blocked copying text from a "work" document (opened from SharePoint or OneDrive for Business) into "non-work" locations (personal Google Drive, Dropbox and so on). Exchange Server also had some capabilities for blocking the sharing of sensitive data.

Both have now been superseded by a comprehensive and much more capable cloud-based Data Loss Prevention (DLP) service covering SharePoint, Exchange, Teams, OneDrive for Business and Power BI, along with endpoints (Windows and macOS).

The point of DLP is to either warn or block users when they are sharing sensitive data inappropriately or against company policy in various ways. This could be sending credit card numbers in an email to an external recipient, storing a Social Security number in a publicly shared SharePoint document library, or saving medical information to an unencrypted USB stick. As you can imagine, there are many ways sensitive data can leak, either inadvertently or with intent.

DLP is managed through the Purview portal (more on this later) at https://compliance.microsoft.com/, where you can create policies based on templates in the Financial, Medical and Privacy categories, with customizations for different geographical regions. DLP policies used to be tenant-wide only, but you can now use administrative units in Azure AD so that, for example, German administrators can see and manage DLP alerts only for a scoped set of users.

Enhanced U.S. HIPAA DLP Policy Template
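
To make the scoping idea concrete, here is a minimal Python sketch of how administrative-unit scoping narrows the alert list an admin sees. The unit names, user addresses and the `DlpAlert` / `visible_alerts` names are hypothetical; in the real service the scoping is configured in Azure AD and enforced by the portal, not by code you write.

```python
from dataclasses import dataclass

# Hypothetical administrative units: unit name -> users in that unit.
ADMIN_UNITS = {
    "Germany": {"anna@contoso.com", "jonas@contoso.com"},
    "US": {"mike@contoso.com"},
}


@dataclass(frozen=True)
class DlpAlert:
    alert_id: str
    user: str  # the user whose activity triggered the DLP rule


def visible_alerts(alerts, admin_units):
    """Return only the alerts raised for users inside the admin's assigned units."""
    in_scope = set()
    for unit in admin_units:
        in_scope |= ADMIN_UNITS.get(unit, set())
    return [alert for alert in alerts if alert.user in in_scope]


alerts = [
    DlpAlert("a1", "anna@contoso.com"),
    DlpAlert("a2", "mike@contoso.com"),
    DlpAlert("a3", "jonas@contoso.com"),
]
german_view = visible_alerts(alerts, ["Germany"])
print([a.alert_id for a in german_view])  # ['a1', 'a3']
```

An admin scoped to both units would see all three alerts; an admin with no units sees none.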

A single policy can cover all of these locations, including on-premises repositories (SharePoint Server and file shares).

DLP Policy Locations

The template includes several Sensitive Information Types (SITs), which DLP looks for in your content (emails, documents, PDFs, Teams messages and so on), and you can customize what happens when a rule is triggered. You can restrict access to the content, encrypt it, and notify the user, administrators and the owners of the content. You can optionally allow users to override the policy by giving a business justification, which is audited and logged. You can also bypass the templates and create custom DLP policies for specific business requirements.

DLP Policy Rule Actions and Notifications
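
The trigger-then-act logic of a DLP rule can be sketched in Python as below. This is a simplified model under stated assumptions, not Purview's rule engine: the `DlpRule` fields, the two-action set ("notify" or "block") and the audit strings are all hypothetical, and real rules support many more conditions and actions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DlpRule:
    name: str
    sits: frozenset        # SIT names this rule looks for
    min_count: int         # matches needed before the rule fires
    external_only: bool    # only fire when sharing outside the organization
    action: str            # "notify" or "block" (simplified action set)
    allow_override: bool   # may users override with a business justification?


@dataclass
class Decision:
    allowed: bool
    notify_user: bool
    audit_log: list = field(default_factory=list)


def evaluate(rule, sit_counts, external, justification: Optional[str] = None):
    """sit_counts maps SIT name -> number of matches found in the content."""
    matches = sum(n for sit, n in sit_counts.items() if sit in rule.sits)
    if matches < rule.min_count or (rule.external_only and not external):
        return Decision(allowed=True, notify_user=False)
    if rule.action == "notify":
        return Decision(True, True, [f"{rule.name}: policy tip shown"])
    if rule.allow_override and justification:
        # The override is permitted, but audited together with the justification
        return Decision(True, True, [f"{rule.name}: override ({justification})"])
    return Decision(False, True, [f"{rule.name}: blocked"])


rule = DlpRule("Card numbers to external recipients",
               frozenset({"Credit Card Number"}), 1, True, "block", True)
blocked = evaluate(rule, {"Credit Card Number": 2}, external=True)
overridden = evaluate(rule, {"Credit Card Number": 2}, external=True,
                      justification="Customer asked for a refund")
print(blocked.allowed, overridden.allowed)  # False True
```

With an override-capable blocking rule, the same email is blocked without a justification and allowed, but logged, with one.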

I recommend using the option to "test out" the policy first. No actual blocking occurs, so you can check the logs and see what the impact will be when you eventually enforce the policy, then tweak settings before you disrupt a critical business process by mistake.
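
The value of testing first is that you see the impact before anything is blocked. A minimal Python sketch of that idea, with a hypothetical `run_policy` function and mode names, counts how often each rule would fire in simulation and only blocks when enforcing:

```python
from collections import Counter


def run_policy(violations, mode="simulate"):
    """Apply a DLP policy to a batch of (user, rule_name) violations.

    In "simulate" mode nothing is blocked; we only count how often each rule
    would have fired, which is the impact you review before enforcing.
    """
    if mode not in ("simulate", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    impact = Counter(rule for _, rule in violations)
    blocked = [user for user, _ in violations] if mode == "enforce" else []
    return blocked, impact


violations = [("anna", "Card numbers"), ("mike", "Card numbers"), ("jonas", "Health records")]
blocked, impact = run_policy(violations)
print(blocked, impact.most_common(1))  # [] [('Card numbers', 2)]
```

The impact counter is identical in both modes; only the blocking differs, which is what makes the simulation a faithful preview.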

Specifically for endpoints (Mac and Windows) there are many settings, such as the option to exclude particular file paths or network shares, define restricted apps (where you can control sharing of sensitive data more tightly) and unallowed browsers, group printers and removable USB devices to control where printing and saving of sensitive data is allowed or blocked and even define different settings when a device is connected via VPN.

Endpoint DLP Settings
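
A few of these endpoint settings can be modeled as simple checks. In this Python sketch the `EndpointSettings` fields, the device IDs and the rule that only devices listed in a USB group may receive sensitive files are assumptions for illustration; the real settings are configured in the Purview portal and offer more actions (such as audit and warn) than the allow/deny used here.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class EndpointSettings:
    excluded_paths: list = field(default_factory=list)  # not monitored by DLP
    restricted_apps: set = field(default_factory=set)   # may not open sensitive files
    usb_groups: dict = field(default_factory=dict)      # group name -> allowed device IDs


def copy_to_usb_allowed(settings, file_path, device_id, sensitive):
    path = PureWindowsPath(file_path)
    # Files under an excluded path are not evaluated at all (is_relative_to: Python 3.9+)
    if any(path.is_relative_to(p) for p in settings.excluded_paths):
        return True
    if not sensitive:
        return True
    # Sensitive files may only go to devices in one of the allowed USB groups
    return any(device_id in ids for ids in settings.usb_groups.values())


def open_allowed(settings, app, sensitive):
    return not (sensitive and app.lower() in settings.restricted_apps)


settings = EndpointSettings(
    excluded_paths=[r"C:\Temp"],
    restricted_apps={"notepad.exe"},
    usb_groups={"Encrypted USB": {"USB-001"}},
)
print(copy_to_usb_allowed(settings, r"C:\Users\anna\plan.docx", "USB-999", True))  # False
```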

You can see file actions taken by your users in Activity explorer, and any triggered DLP rules in Alerts. Microsoft recommends managing DLP alerts in the Microsoft 365 Defender Unified Alerts list instead, as it provides excellent integration with the Defender suite. For example, DLP alerts may be raised when an attacker has compromised an endpoint, and these can be seen alongside related alerts from Defender for Endpoint or Defender for Office. You can also tag alerts to manage them as a group and perform bulk actions, as well as take actions such as disabling a user account or isolating a device on the network if you suspect foul play.

DLP Alerts in Microsoft 365 Defender Incident Queue

Basic DLP functionality is available in Office 365 E3, but advanced features, including Endpoint DLP, require Microsoft 365 E5 licensing. A very powerful recent addition is Adaptive Protection, which gathers contextual signals (the user has resigned, has been flagged by insider risk management and so on) and applies a stricter DLP policy to a high-risk user than to a low-risk user.
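
Adaptive Protection's core idea, stricter handling for riskier users, can be sketched in Python as a lookup from risk level to DLP action. The three level names are an assumption loosely modeled on insider risk management, and the way signals are combined and the action names below are hypothetical simplifications, not how Microsoft scores risk.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    MINOR = 1
    MODERATE = 2
    ELEVATED = 3


# Hypothetical mapping: stricter DLP handling as the user's risk level rises.
DLP_ACTION_BY_RISK = {
    RiskLevel.MINOR: "audit",
    RiskLevel.MODERATE: "warn_with_override",
    RiskLevel.ELEVATED: "block",
}


def risk_level(resigned: bool, insider_risk_alerts: int) -> RiskLevel:
    # Each contextual signal present raises the level by one step
    signals = int(resigned) + int(insider_risk_alerts > 0)
    return RiskLevel(1 + signals)


def dlp_action(resigned: bool, insider_risk_alerts: int) -> str:
    return DLP_ACTION_BY_RISK[risk_level(resigned, insider_risk_alerts)]


print(dlp_action(False, 0), dlp_action(True, 2))  # audit block
```

The same sharing attempt is therefore only audited for a low-risk user but blocked for a user who has resigned and triggered insider risk alerts.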

I think Microsoft feels its DLP solution is ahead of the competition; it even provides a tool to migrate from Symantec DLP.

Information Protection
OK, blocking the sharing of sensitive data at the time the user hits "send" makes sense, but what about all the data at rest? And what about proprietary company information that might not trigger a rule based on a credit card number or SSN, but definitely needs to be available only to authorized users? Traditionally we've accomplished this with permissions on file shares and SharePoint libraries. But those permissions only apply while the document or CAD drawing is stored there; once I download it and attach it to an email or store it on a USB stick, they no longer restrict access.

Hence, we need to be able to identify different types of sensitive data and label our documents, and if they contain information we want to restrict, we need encryption and permissions to be built into the document itself. I could write several long articles on this technology, but in summary, Microsoft has been at this for quite a few years now, starting out with on-premises tech, moving to Azure Information Protection and now to Purview / Unified Information Protection being built into the Office clients (no agent installation required).

Currently there are 308 built-in SITs, and you can create your own to find other sensitive data (for example, if every project has a number based on a particular scheme). Where more complex data needs to be identified, you can use one of the 58 built-in trainable classifiers (created through machine learning) that find Business plans, Customer complaints, Discrimination, Financial audit reports, Network design files and Resumes, for example. These are available in 12 different languages. If there isn't a built-in ML classifier that works for your scenario, you can create your own.

Trainable Classifiers in Data classification
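
Pattern-based SITs typically combine a regular expression with extra validation so that random digit runs don't match. A minimal Python sketch for a credit card number uses the well-known Luhn checksum, plus a custom SIT for a hypothetical project-number scheme (`PRJ-YYYY-NNNN`). This is not Microsoft's SIT definition, which also weighs supporting keywords and confidence levels.

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Custom SIT for a hypothetical project-number scheme, e.g. PRJ-2023-0042.
PROJECT_NUMBER = re.compile(r"\bPRJ-\d{4}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sits(text: str) -> dict:
    cards = [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
    return {
        "Credit Card Number": cards,
        "Project Number": PROJECT_NUMBER.findall(text),
    }


sample = "Card 4111 1111 1111 1111, not 4111111111111112, for PRJ-2023-0042."
print(find_sits(sample))
# {'Credit Card Number': ['4111 1111 1111 1111'], 'Project Number': ['PRJ-2023-0042']}
```

The second number has the right shape but fails the checksum, which is exactly the kind of false positive validation is meant to filter out.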

You can also use Exact Data Match (EDM). If you have a database of employee IDs, for example, you don't want to match any random string of digits that happens to fit the ID pattern; instead, you want to match against the list of actual staff IDs.
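
Exact Data Match can be illustrated by hashing the real ID list and only accepting pattern matches whose hash is in that set. The six-digit ID format, the salt and the function names are assumptions; the Python sketch uses salted SHA-256 from the standard library to reflect that EDM works from a hashed copy of the source data rather than the plain values.

```python
import hashlib
import re

# Hypothetical ID format: six digits. Many unrelated strings fit this pattern.
EMPLOYEE_ID_PATTERN = re.compile(r"\b\d{6}\b")


def hash_value(value: str, salt: bytes) -> str:
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def build_edm_table(employee_ids, salt: bytes) -> set:
    """Hash the real staff IDs once; only the hashes are kept for matching."""
    return {hash_value(e, salt) for e in employee_ids}


def edm_matches(text: str, table: set, salt: bytes) -> list:
    """Return pattern matches that are also actual staff IDs."""
    return [m.group() for m in EMPLOYEE_ID_PATTERN.finditer(text)
            if hash_value(m.group(), salt) in table]


salt = b"example-salt"  # hypothetical; use a random secret salt in practice
table = build_edm_table(["104233", "204871"], salt)
text = "Staff 104233 approved order 999999."
print(EMPLOYEE_ID_PATTERN.findall(text))  # ['104233', '999999']
print(edm_matches(text, table, salt))     # ['104233']
```

The pattern alone flags the order number too; the exact-match step keeps only the real employee ID.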

Based on these different ways of identifying data, you create labels, which are then grouped into policies that publish the labels for use across the different data repositories. You can apply labels to documents, but also to groups and sites. A site label doesn't actually label the content in the site (not yet, anyway), but it controls the sharing options available: label a site as sensitive and you stop external sharing of its documents, for example.

Labels can also apply watermarks, headers and footers to documents and, most importantly, encrypt documents and apply permissions. You can optionally allow end users to define who should have access when they apply a label manually. These protections follow the document, whether it's emailed to someone using a consumer email service or stored on a USB stick.

And unlike DLP, which applies at the time of sharing, Information Protection is built on the assumption that you're scanning existing documents and labeling them based on the label policies you've defined, as well as training users to label documents as part of their business processes.
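
Auto-labeling of existing content boils down to choosing the most sensitive label whose conditions match. In this Python sketch the label names, their priorities and trigger SITs are hypothetical, and we assume that a manually applied label is left alone and that an existing label is never downgraded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    name: str
    priority: int            # higher = more sensitive
    trigger_sits: frozenset  # SITs that cause auto-labeling with this label


# Hypothetical label taxonomy.
LABELS = [
    Label("Confidential", 1, frozenset({"Project Number"})),
    Label("Highly Confidential", 2, frozenset({"Credit Card Number"})),
]


def auto_label(found_sits: set, current: Optional[Label] = None,
               manual: bool = False) -> Optional[Label]:
    """Pick the label to apply to an existing document found by a scan."""
    if manual and current is not None:
        return current  # assumption: never override a user's manual choice
    candidates = [label for label in LABELS if label.trigger_sits & found_sits]
    if not candidates:
        return current
    best = max(candidates, key=lambda label: label.priority)
    if current is not None and current.priority >= best.priority:
        return current  # assumption: never downgrade an existing label
    return best


print(auto_label({"Project Number", "Credit Card Number"}).name)  # Highly Confidential
```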

You should always create a group of highly trusted admins that are given super user permissions so that they can remove or change the protection on documents if a staff member has left or applied permissions that are too strict, for example.
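
The idea that permissions travel with the file, plus the super user escape hatch, can be modeled in Python as below. The rights names, the `SUPER_USERS` set and the functions are hypothetical, and there is no real encryption here; the point is that the access check reads only the rights embedded in the document, never its location.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

SUPER_USERS = {"rms-admins@contoso.com"}  # hypothetical super user group


@dataclass
class ProtectedDocument:
    name: str
    location: str                          # where the file happens to be now
    rights: Optional[Dict[str, Set[str]]]  # user -> rights; None = unprotected


def can(user: str, right: str, doc: ProtectedDocument) -> bool:
    """Access depends only on rights embedded in the document, not its location."""
    if doc.rights is None:
        return True
    if user in SUPER_USERS:
        return True
    return right in doc.rights.get(user, set())


def remove_protection(user: str, doc: ProtectedDocument) -> None:
    if user not in SUPER_USERS:
        raise PermissionError(f"{user} is not a super user")
    doc.rights = None


doc = ProtectedDocument("plan.docx", "SharePoint", {"anna@contoso.com": {"view"}})
doc.location = "USB stick"  # copying elsewhere does not change the rights
print(can("anna@contoso.com", "view", doc), can("anna@contoso.com", "edit", doc))  # True False
```

If the document's owner has left, a member of the super user group can still remove the protection; ordinary users cannot.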

Data in Other Locations
The Information Protection options above don't just apply to documents; you can also apply labels and policies to schematized data assets using Microsoft Purview Data Map, which can find the specified data in SQL Server, Azure SQL, Azure Synapse, Cosmos DB, AWS RDS and other data lake storage options.
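
Data Map scans run in Azure against the sources you register, so you don't write this yourself, but the column-classification idea can be sketched with Python's built-in `sqlite3` module standing in for a SQL source. The sample size, the 80 percent threshold and the SSN-only pattern are assumptions for illustration, not how Data Map actually samples or scores data.

```python
import re
import sqlite3

SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def classify_columns(conn, table, sample_size=100, threshold=0.8):
    """Label columns where most sampled values look like U.S. SSNs.

    `table` is interpolated into the SQL, so it must come from a trusted list.
    """
    cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (sample_size,))
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    found = {}
    for idx, col in enumerate(columns):
        values = [str(row[idx]) for row in rows if row[idx] is not None]
        hits = sum(1 for v in values if SSN.match(v))
        if values and hits / len(values) >= threshold:
            found[col] = "U.S. Social Security Number"
    return found


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, ssn TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Anna", "123-45-6789"), ("Mike", "987-65-4321")])
print(classify_columns(conn, "staff"))  # {'ssn': 'U.S. Social Security Number'}
```

Sampling rather than reading every row is what keeps such scans affordable, which ties back to the compute cost mentioned next.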

This service is housed in Azure and uses compute to run scans against the data repositories you define, so be aware that there can be quite a cost associated with this service.

A little bit of clarification around naming is in order here, given Microsoft's ridiculous propensity to rename products every year or so. Originally there was a service called Azure Purview, which is now the Microsoft Purview Data Map mentioned above. The Purview name is now an umbrella term for all of the information governance services, so it's Purview DLP, Purview Information Protection, and they're all accessed in the Microsoft Purview compliance portal.

Data Lifecycle Management
So, we've applied policies to data sharing, and we've governed information at large by finding, labeling and protecting data across documents and databases. The last piece of the puzzle is controlling the lifecycle of documents by applying either retention policies or retention labels, to make sure that data we must retain for X years due to regulations is kept. For email, this means that if a user deletes an email and then empties their Deleted Items folder, the email is still recoverable (through eDiscovery searches) for the specified number of years. The same applies to documents in SharePoint and OneDrive for Business.

Retention Label with Events to Start From
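
The retention arithmetic itself is simple date math. This Python sketch assumes a period in whole years, treats February 29 as February 28 in non-leap target years, and models an event-based label (like the one in the screenshot) as one whose clock only starts once the event date is known; all function names are hypothetical.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def is_retained(created: date, years: int, today: date,
                event_based: bool = False, event_date: Optional[date] = None) -> bool:
    """True while a deleted item must still be kept (recoverable via eDiscovery)."""
    if event_based:
        if event_date is None:
            return True  # assumption: the clock hasn't started, so keep the item
        start = event_date
    else:
        start = created
    return today < add_years(start, years)


print(is_retained(date(2023, 3, 28), 7, date(2029, 1, 1)))  # True
print(add_years(date(2024, 2, 29), 1))                       # 2025-02-28
```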

All documents are equal, but some documents are more equal than others (with apologies to George Orwell) and must be treated in a special way, not just retained in a hidden location for a number of years. These are signed contracts, non-disclosure agreements and similar documents that can be declared as records. You have several different levels here, up to the point where a document declared as a regulatory record can't be moved, deleted or have its metadata changed for the duration of the retention period. Once its time is up, a designated group of users can perform a disposition review to determine whether the documents should be deleted, have a new label applied or just be left where they are with no retention defined.
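
The difference between a retained item and a regulatory record, and the three disposition outcomes, can be sketched in Python as follows. Only the restrictions named above are modeled; intermediate record levels are left out, and the enum, operation and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class RecordKind(Enum):
    RETAINED = "retained item"        # kept (possibly hidden) for the period
    REGULATORY = "regulatory record"  # locked in place for the period


class Disposition(Enum):
    DELETE = "delete"
    RELABEL = "apply a new label"
    KEEP = "leave in place with no retention"


# Operations blocked for regulatory records while retention is active.
LOCKED_OPERATIONS = {"move", "delete", "edit_metadata"}


def operation_allowed(kind: RecordKind, operation: str, retention_active: bool) -> bool:
    # For a plain retained item the user action succeeds; a copy is
    # preserved behind the scenes (not modeled here).
    if kind is RecordKind.REGULATORY and retention_active:
        return operation not in LOCKED_OPERATIONS
    return True


def disposition_review(reviewer: str, reviewers: set, decision: Disposition,
                       new_label: Optional[str] = None) -> str:
    """Apply a designated reviewer's decision once the retention period has ended."""
    if reviewer not in reviewers:
        raise PermissionError(f"{reviewer} is not a disposition reviewer")
    if decision is Disposition.RELABEL:
        if not new_label:
            raise ValueError("a new label is required")
        return f"relabeled as {new_label}"
    return decision.value


print(operation_allowed(RecordKind.REGULATORY, "delete", retention_active=True))  # False
```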

Conclusion
As you can see, Microsoft has a complete approach to information governance across Microsoft 365 and Azure. It was recently recognized as a leader in The Forrester Wave: Data Security Platforms, and its services cover the 14 key controls of the EDM Council's Cloud Data Management Capabilities (CDMC) framework.

There's a lot to implementing these services in a business, particularly as the tech is generally the easy bit. Getting people and processes right is much harder and takes education and perseverance, but I think this is the next area that'll need business focus. Just treating all data the same and not knowing where it's stored, who's got access to it and what's sensitive and what's not is a recipe for disaster.

Good luck on your Information Governance journey.