Microsoft Fabric: Weaving Data Engineering, Analytics and Visualization
Today, data analytics is a key part of every business: it helps optimize organizational performance, run operations more efficiently, maximize profit, and give stakeholders a basis for more strategic decisions. A typical data analytics platform serves several personas, such as Data Scientists, Data Engineers, Data Developers, and BI Developers. Each of these personas currently uses a different set of tools, which often leads to data and technology silos within an organization.
Imagine if all these people could use a single platform for their day-to-day work. That would encourage closer collaboration among the workgroups that deliver the required outputs together.
Microsoft Fabric is a SaaS platform that brings together a set of familiar, simple tools to help organizations shape a robust data analytics platform.
Most modern data analytics life cycles go through the following stages.
Enterprises usually rely on popular cloud technologies for their data platforms. While each of these cloud technologies provides robust tools for data analytics, they come with their own set of challenges.
Some of the common problems include data security and privacy; data governance and compliance; integrating data in different formats and eliminating data silos; scalability and performance; vendor lock-in; maintaining a workforce with multiple skill sets to support the data platforms; and maintaining complex environments.
Microsoft Fabric tries to address some of these challenges by offering a unified experience: it combines the individual analytics tools and services into a single SaaS platform.
OneLake allows enterprises to virtualize data lake storage in ADLS Gen2, AWS S3, and Google Storage.
Here is what we think of Microsoft Fabric.
Data 360 View and Unified Analytics Platform:
According to Microsoft, OneLake is “The OneDrive for Data”. It is a single, unified, logical storage location for an organization. It is highly scalable, can store large volumes of data, and can handle data of different types and sizes. OneLake provides access to the stored data through standard APIs and open formats (Delta/Parquet). Data held in other cloud storage (ADLS Gen2, AWS S3, and GCP Storage) can be virtualized in OneLake using “Shortcuts”, so it can be consumed as if it were stored locally in OneLake. This reduces the overhead of copying data from other platforms into OneLake.
Because OneLake is centralized for a tenant, securing and governing the data is easier to handle. Microsoft Purview and Fabric work together seamlessly so you can store, analyze, and govern your data without piecing together services from multiple vendors.
Fabric combines popular data analytics tools such as Azure Synapse, Kusto DB, Azure Data Factory, Power BI, and MLflow (in Synapse Data Science). This helps the organization reduce the overhead and complexity involved in data integration.
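To make the "standard APIs" point concrete, here is a minimal Python sketch of how a OneLake path could be addressed through its ADLS-compatible endpoint. It assumes the URI pattern abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path> as we understand it from Microsoft's documentation; the OneLakeItem helper, the workspace, lakehouse, and table names are all hypothetical and exist only for illustration. Please verify the exact format against the current OneLake documentation before relying on it.

```python
from dataclasses import dataclass

# Assumed OneLake ADLS-compatible (DFS) endpoint host.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


@dataclass(frozen=True)
class OneLakeItem:
    """Hypothetical helper that names one Fabric item inside a workspace."""
    workspace: str                 # hypothetical workspace name
    item: str                      # hypothetical item (lakehouse) name
    item_type: str = "Lakehouse"   # item type suffix used in the path

    def uri(self, relative_path: str) -> str:
        """Build an abfss:// URI for a path inside this item."""
        path = relative_path.strip("/")
        return (
            f"abfss://{self.workspace}@{ONELAKE_HOST}/"
            f"{self.item}.{self.item_type}/{path}"
        )


sales = OneLakeItem(workspace="SalesWorkspace", item="SalesLakehouse")

# A table stored natively in OneLake.
native_table = sales.uri("Tables/orders")

# A hypothetical shortcut pointing at data that lives in AWS S3.
# It is addressed through the same OneLake path scheme as native data.
shortcut_table = sales.uri("Tables/s3_orders")

print(native_table)
print(shortcut_table)
```

The point of the sketch is that a consumer does not need a different addressing scheme for shortcut data: both paths sit under the same workspace and item, which is what lets the data be read as if it were local to OneLake.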
Citizen Data integrator and Pro Developer Data Integrator:
Most large enterprises have two different personas that work with data: the business users (Citizen Data Integrators, or Citizen DIs), who need to perform data analysis, and the pro developers (Pro DIs), who support them. Generally, the Citizen DIs depend on the Pro DIs to make the data available on time. With Microsoft Fabric, both groups have more opportunity to collaborate closely. The Citizen DIs can carry on with familiar tools such as Power BI and Power Query for their analysis, while the Pro DIs can continue to work on complex, code-intensive tasks on the data platform (for example, coding in Spark and Python).
Alignment with Next gen Data platform:
Microsoft Fabric brings together some of the finest data analytics tools, which are aligned with industry trends and continue to adapt to new trends and patterns.
Data Factory: Provides 150+ connectors to cloud and on-premises data sources, with a simple UI-based, drag-and-drop developer experience for data transformation and orchestration.
Synapse Data Engineering: Provides the ability to analyze data using familiar languages and frameworks such as Apache Spark or T-SQL to define and execute data exploration. Fabric supports two workload patterns:
Lakehouse – in this scenario, data is loaded, transformed, and curated by leveraging Spark notebooks
Warehouse – in this scenario, data is loaded, transformed, and curated by leveraging the T-SQL language
Kusto DB and Real time analytics:
Microsoft Fabric enables organizations to integrate Real-Time Analytics into their data platform: Eventstream captures, transforms, and routes the data to OneLake; Kusto DB stores and manages the data from OneLake; and KQL is used to query the data and surface it in a visualization tool such as Power BI.
Synapse Data Science:
Microsoft Fabric provides tools to explore, transform, and prepare data at scale for analytics and machine learning. With Spark, users can leverage PySpark/Python, Scala, and SparkR/sparklyr for data pre-processing at scale.
Data Mesh and Fabric:
Microsoft Fabric helps organizations logically group personas by business area. The data relevant to a business area can then be owned and managed by that business area. Fabric organizes the users of a business area into a logical management level known as “Domains”. Domains let organizations give each business area granular control and governance over its data. They provide a management boundary between tenant and workspace, enabling domain admins to have more granular control over multiple workspaces.
With “Domains”, organizations can implement a data mesh architecture, allowing each business area to manage its own data.
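The tenant, domain, and workspace layering described above can be pictured with a small Python model. This is only a conceptual sketch, not Fabric's actual API or admin model: the Tenant, Domain, and Workspace classes, the workspaces_managed_by method, and all names in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str


@dataclass
class Domain:
    """A business area that groups workspaces under its own admins."""
    name: str
    admins: set[str] = field(default_factory=set)
    workspaces: list[Workspace] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    domains: list[Domain] = field(default_factory=list)

    def workspaces_managed_by(self, user: str) -> list[str]:
        """Return the workspaces in every domain where the user is an admin."""
        return [
            ws.name
            for domain in self.domains
            if user in domain.admins
            for ws in domain.workspaces
        ]


# Hypothetical tenant with two business areas.
tenant = Tenant(
    name="contoso",
    domains=[
        Domain(
            name="Sales",
            admins={"asha"},
            workspaces=[Workspace("sales-ingest"), Workspace("sales-reports")],
        ),
        Domain(
            name="Finance",
            admins={"ravi"},
            workspaces=[Workspace("finance-ledger")],
        ),
    ],
)

print(tenant.workspaces_managed_by("asha"))
```

In this model a domain admin's reach stops at the domain boundary: the Sales admin sees both Sales workspaces but nothing in Finance, which mirrors the idea of each business area managing its own data in a data mesh.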
Data Governance:
Microsoft Purview works with Microsoft Fabric so users can discover and manage Microsoft Fabric items in Microsoft Purview applications. From data source down to the Power BI report, Microsoft Purview and Fabric work together seamlessly so you can store, analyze, and govern your data without piecing together services from multiple vendors.
While all these features look exciting, here is what we think.
Bringing some of the best data engineering tools, such as Azure Data Factory, Synapse, and Data Lake, together with the leading visualization tool Power BI on a single platform would certainly benefit greenfield implementations of data analytics and reporting platforms.
Users who are used to the collaborative experience of Office 365 and Power BI in their day-to-day work will find that Microsoft Fabric continues the same experience. The new data engineering experience should also help them build more powerful analytics solutions.
Bringing data governance through Purview is certainly a nice feature to have in Fabric. It would be interesting to see whether Microsoft considers opening the Fabric platform to other, non-Microsoft data governance services available in the market today.
Fabric comes with support for integrating CI/CD pipelines for the data engineering tools, but it still lacks support for managing deployment pipelines for Power BI reports using Azure DevOps. Hopefully Microsoft will make this possible in the future.
Data mesh using Domains is an interesting feature of Microsoft Fabric. It would help organizations implement a data mesh architecture on their data platform with less effort: they define the domains, and each domain group can then have its own data engineering and consumption pattern. Also, since Fabric uses OneLake for storage, which can be governed by Microsoft Purview, data governance would also be easier to manage.
Microsoft Fabric brings some of the finest tools onto a single SaaS platform, and it looks very promising. We certainly need Microsoft to come up with clear use-case scenarios where Fabric is the best fit. There might be a few customers who already have a data platform on Azure, in which case Fabric might be redundant. Microsoft should provide a method or a tool to migrate such platforms to Fabric.
Stay tuned for detailed articles on the topics we have discussed in this blog.