Putting Fabric in your Data Fabric

Microsoft just held its developer-focused Build conference, this time a hybrid event with both online and warm body attendees.

Satya's keynote was AI, all the time, everywhere, and I have some thoughts on Copilot coming to Windows and basically every other Microsoft app you can think of, which I'll share below.

But first, I'll focus on a much more tangible release, the public preview of Microsoft Fabric. There were 11 sessions focused on this new(ish) data analytics Software-as-a-Service (SaaS) service, and I'll explain where it fits in and why I think it's an important evolution.

Fabric Home Page Guided Experiences (source: Microsoft).

Cloud Building Blocks Are Changing Software Forever
One interesting trend I've been noticing over the last few years is how cloud providers' building blocks can be combined to become more than the sum of their parts.

Take Microsoft Teams, for example. Building on top of Microsoft 365 Groups (a building block that never went anywhere by itself), SharePoint and OneDrive for Business storage, plus Exchange mailboxes, this service has seen meteoric growth in just a few years. By themselves, these individual services in Office 365 performed their decades-old functions just fine, but when mixed together in just the right way (to compete with Slack) something truly useful emerged.

Or Microsoft Sentinel, one of my favorite services that I use for all my clients for deep security visibility across their digital estates. To store all the security log data from cloud and on-premises sources, they didn't invent a new database and instead used Log Analytics, which was already optimized for huge log data sets and offered fast search capabilities through Kusto Query Language (KQL). To provide the Security Orchestration, Automation and Response (SOAR) piece, they again built on a battle-tested Azure service, Logic Apps, and for advanced hunting scenarios you can use Azure Notebooks (hosted Jupyter Notebooks).

Microsoft Fabric is using the same playbook, combining seven existing Platform-as-a-Service (PaaS) services into a single SaaS service.

OneLake Data Hub in Fabric (source: Microsoft).

Data Everywhere -- for Everything
At the risk of buzzword bingo overload, an organization's competitive advantage comes down to how well it leverages its data. The confluence of cheap cloud data storage, ubiquitous sensors and cost-effective compute has led many businesses to realize the value of their data. But it nearly always happens in silos. The marketing department has one place where it gathers all its leads and campaign data, whereas manufacturing stores its factory-generated production data in another system, followed by distribution in a third system. And there are many systems to choose from, both open source and proprietary, across cloud and on-premises platforms. Of course, if all data were stored in the same place, in the same system, its value could truly be unlocked for the business as a whole, but being a Chief Data Officer is often more like being a Chief Integration Officer.

Microsoft Fabric in a Nutshell
Microsoft has one surprisingly popular tool for data visualization, Power BI. It's incredibly easy to use, even for a non-developer like me: anyone can download the desktop app and start creating reports and visualizations. That power, coupled with a very easy way to connect to or import data sources, has made it the go-to tool for many business users.

Meanwhile, Microsoft has several popular services in Azure for building data warehouses, such as Data Lake generation 2 and the Synapse family (formerly SQL Data Warehouse). They all require a fair bit of expertise to configure and fine-tune, however. Then there's Data Factory, which does Extract, Transform and Load (ETL) on data, thus ingesting it from disparate data stores.

These services are now being reborn in their SaaS form in Fabric. The existing services will continue to function but will not receive (most) new features, as these will be surfaced in Fabric instead.

Components of Microsoft Fabric (source: Microsoft).

So, if Fabric brings together existing services, let's look at what's new. First, there's a new data storage location called OneLake (like OneDrive, get it? There's even a driver so that it can show up on your local drive, just like OneDrive does). It's built on Data Lake but stores all data in the open Delta format (everyone at Build kept calling them Delta-Parquet files, but they're really just Delta files: Parquet data plus a transaction log that tracks changes to the data).
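To make the "tracking changes" part concrete, here is a minimal sketch of the idea behind a Delta transaction log. It is a toy model, not the Delta Lake library: the file names and the `live_files` helper are made up for illustration, and real log entries carry many more fields than the `add` and `remove` actions shown here.

```python
import json

# Toy commit log: each inner list stands in for one commit file under
# _delta_log/. File names are hypothetical; real entries have more fields.
commits = [
    ['{"add": {"path": "part-000.parquet"}}',
     '{"add": {"path": "part-001.parquet"}}'],
    # An update rewrites a file: the old one is removed, a new one is added.
    ['{"remove": {"path": "part-000.parquet"}}',
     '{"add": {"path": "part-002.parquet"}}'],
]


def live_files(commits, version=None):
    """Replay commits 0..version and return the data files in that snapshot."""
    files = set()
    last = len(commits) - 1 if version is None else version
    for commit in commits[: last + 1]:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(commits))             # the current snapshot of the table
print(live_files(commits, version=0))  # the table as of the first commit
```

Because the data files themselves are never edited in place, replaying the log up to an earlier commit gives you the table as it looked at that point.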

If the data you need isn't in OneLake already, you can either import it or use shortcuts that can point to other OneLake locations, existing Data Lake gen 2 or Amazon S3 storage accounts (with more sources to come). The One bit is important. You only have one OneLake across an entire Azure AD tenant, just like you only have one Fabric instance. This is to alleviate the integration issues mentioned above. Just like in Power BI, you can then divide your OneLake into multiple workspaces, with different security and access controls.
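Conceptually, a shortcut behaves like a symbolic link: a path inside OneLake that resolves to data living somewhere else. The sketch below is purely illustrative and is not how Fabric implements it. The shortcut table, the bucket and account names, the `resolve` function and the `onelake://` scheme are all assumptions I made up for the example.

```python
# Hypothetical shortcut table: logical OneLake path prefix -> external location.
SHORTCUTS = {
    "sales/lakehouse/Files/s3_orders": "s3://example-bucket/orders",
    "sales/lakehouse/Files/legacy":
        "abfss://raw@exampleaccount.dfs.core.windows.net/legacy",
}


def resolve(path: str) -> str:
    """Return where the data behind a logical path physically lives."""
    for prefix, target in SHORTCUTS.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Swap the shortcut prefix for the external location.
            return target + path[len(prefix):]
    # Not under a shortcut: the data is stored in OneLake itself.
    # The "onelake://" scheme is invented for this sketch.
    return "onelake://" + path


print(resolve("sales/lakehouse/Files/s3_orders/2023/05.parquet"))
print(resolve("sales/lakehouse/Tables/customers"))
```

The point is that users and tools only ever see the logical path, so data can stay where it is without being copied into OneLake first.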

Also, you don't have to fine-tune the size of the underlying infrastructure (Spark clusters). As this is SaaS, it's all taken care of for you. Note that pricing has not been announced, so at this point it's all a bit hand-wavy, but I trust it'll become clearer over time.

In OneLake a database whose tables are managed by Spark is a Lakehouse, whereas a database whose tables are managed by the SQL transaction system is a Warehouse (unstructured vs. structured data). A Warehouse has a SQL connection endpoint, so you can use any other third-party tool that can connect to it for visualization, or other data processing (provided you have the right credentials).

One new component in Fabric is Data Activator, currently in private preview. In demos it's used to take action on data as it's being streamed into Fabric -- take an action when the temperature of a delivery truck exceeds a particular value for example. It's a "no code" environment where you connect triggers ("patterns of data") with actions to achieve the appropriate result.
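Data Activator itself is no-code, and I haven't had access to the private preview, but the trigger-and-action pattern it describes can be sketched in a few lines. Everything below is hypothetical: the `Trigger` class, the `process` function, the 8-degree threshold and the sample readings are illustrative assumptions, not anything Fabric exposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Trigger:
    """A hypothetical rule: when `condition` matches a reading, run `action`."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


def process(stream: Iterable[dict], triggers: list[Trigger]) -> None:
    """Check every incoming reading against every trigger."""
    for reading in stream:
        for trigger in triggers:
            if trigger.condition(reading):
                trigger.action(reading)


alerts: list[str] = []

too_warm = Trigger(
    name="truck-too-warm",
    condition=lambda r: r["temp_c"] > 8.0,  # the threshold is an assumed value
    action=lambda r: alerts.append(f"Truck {r['truck']} at {r['temp_c']} C"),
)

# Sample readings standing in for data streamed from delivery trucks.
readings = [
    {"truck": "A1", "temp_c": 4.5},
    {"truck": "B2", "temp_c": 9.1},
    {"truck": "A1", "temp_c": 5.0},
]

process(readings, [too_warm])
print(alerts)
```

In a real deployment the action would be something like sending a Teams message or starting a workflow rather than appending to a list.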

Here are some other helpful bits of Fabric terminology.

Another interesting aspect is the enablement of Continuous Integration / Continuous Deployment (CI/CD) throughout Fabric. In Power BI this is done by changing the underlying file format. Today a dashboard is stored in the pbix format, but what's coming is storing Power BI in a folder structure compatible with GitHub (and any other source code repository) -- and this is also coming to the other workloads in Fabric.

Data scientists can work together with others in the business, and Fabric will handle their ML models and Notebooks and other artifacts as objects.

Not in the diagram above (the seventh service) is Microsoft Purview (formerly Azure Purview). This data security service works with Fabric to provide sensitive information scanning, lineage (where did this data come from originally and how was it transformed to end up here), endorsement (this data is accurate) and Data Loss Prevention (DLP) to control sharing of sensitive data. The inclusion of ML artifacts in Purview is also in public preview.

Notebooks open in seconds, something that could take minutes in Synapse, and you can now edit them concurrently with other data engineers.

A nice touch is the Admin monitoring reports (inherited from Synapse), which give you insight into how Power BI and Fabric are used in your tenant, which features are adopted and so on.

Admin Monitoring in Fabric (source: Microsoft).

As the gloss wears off from the "big" keynotes, I did notice that there are quite a few bits missing in Fabric today. Several are in private preview, and some are "on the roadmap." I still think Fabric is going to be a big hit in businesses, because it makes many things that are complex and time-consuming much easier, but it's going to take a few more months before it's ready for prime time.

One big thing missing today is OneSecurity. The individual services still have their access controls, but the single, unifying security control across OneLake that then cascades into all the different services isn't there yet. Another missing piece is external sharing. You can share data in OneLake internally within your tenant, but not externally, yet.

And, of course, Copilot is coming to Fabric, in all the different components. It's currently in private preview, but demos showed it creating code for use in Power BI (start your prompt with "code" and it knows you'd like the answer in the form of code) and several other Fabric services.

A business user with no experience in Power BI can say "create a report on last quarter's sales, organized by geography, with the ability to drill down to individual accounts." At least in pre-recorded demos, which brings me to Copilot everywhere.

Generative AI Everywhere
Satya's entire keynote was all about Copilot and AI. Copilot is coming to Windows, Microsoft 365, Dynamics 365 and Fabric. (How about Copilot in Halo? Wouldn't that be full circle: Cortana, an in-game AI, came and visited our world for a few years as a personal assistant, faded away and is now replaced by Copilot, from our world.) The demos do look slick, and my imagination sees incredible benefits for my clients and for myself (I can't wait to test Security Copilot in M365 Defender and Sentinel). But this isn't my first rodeo and I feel a few words of caution are in order here.

First, we're firmly in the upcurve of the Gartner Hype Cycle for generative AI and Large Language Models, heading towards the Peak of Inflated Expectations. There will be setbacks and disillusionment as people start learning what a ChatGPT-style companion in Word, Teams, Outlook and Windows can do, and what it can't do. Second, apart from short video demos there's not really anything we mere mortals can test yet. Third, there have been no announcements of costs or plans yet. I can't imagine that Copilot in M365 will just be included, not even in E5 (which used to be "everything" and is now "nearly everything -- you just have to add these five add-ons over here"). And fourth, this will require people to change how they do their work, and to relearn how to perform certain tasks and procedures, something that comes naturally to us geeks, but not so much to ordinary business people.

It's kind of cool that it's Microsoft riding this dragon towards the crest, though: not some smaller, agile startup but good old Microsoft (along with OpenAI, which, truth be told, is behind a lot of this).

You can sign up for a 60-day Fabric trial here, and administrators can stop users in their tenant from signing up for trials here.

One of the reasons for Power BI's popularity is how easy it is to get started with, as a SaaS service. The rest of the Synapse stack always felt more like specialist tools that you had to create and configure in Azure, whereas now they sit right next to Power BI in the portal.

Compared to other popular analytics platforms, I think Fabric is much wider in scope and appeal than Databricks or Snowflake. It'll be interesting to look at Fabric again in 12 months and see if it took the world by storm.

