Leveraging Fabric SQL Databases for Metadata-driven Processes
Why I am never going back to storing platform configuration data in files.
Modern data platforms are more powerful than ever. As a data engineer, it can be difficult to keep up with the latest features from platforms like Fabric: mirroring, real-time analytics, Copilot improvements, SQL databases. Microsoft has done a great job of providing users with out-of-the-box solutions. However, every data platform I have worked on has required custom processes to be built to support custom use cases. A common practice for orchestrating these custom processes is to make them metadata-driven. Metadata-driven processes reduce overall development and maintenance work by creating a few parameterized artifacts (notebooks, pipelines, and so on) and calling them many times, with configuration data supplying the parameters for each call.
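To make that concrete, here is a minimal sketch of the kind of configuration table that could drive a parameterized ingestion process. The table and column names are my own illustration, not Maestro's actual schema:

    CREATE TABLE dbo.PipelineActivity
    (
        ActivityId  INT IDENTITY(1,1) PRIMARY KEY,
        ProcessName NVARCHAR(100) NOT NULL, -- which parameterized artifact handles this row
        SourcePath  NVARCHAR(400) NOT NULL, -- e.g., a OneLake folder to pick files up from
        TargetTable NVARCHAR(200) NOT NULL, -- destination delta table in the Lakehouse
        IsEnabled   BIT NOT NULL DEFAULT 1  -- switch an activity off without deleting it
    );

One parameterized pipeline can then loop over the rows of a table like this instead of a developer building one artifact per source.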
So where do we usually keep the configuration data that drives our metadata-driven activities? I find that many teams put it into CSV files, JSON files, or SQL tables. There has never really been a dedicated tool that makes this data easy to store while still letting processing tools access it. What concerns me is that most teams fail to follow a proper SDLC when building their system for handling this configuration data. That has always bugged me: this data is extremely valuable, and it should be treated the same as the rest of the data solution.
At Neudesic, we have built a dedicated tool for handling this configuration data: Neudesic Maestro. Maestro enables data engineers to easily define processes, the parameters those processes use, and the individual executions that need to take place. The system is modular, meaning all process data is stored in shared tables. That foundation lets Maestro offer mature features such as expression building, auditing via temporal tables, AI integration, and more. The system sits on top of a SQL Server database for its primary storage.
I was very excited when Fabric SQL databases were announced in late 2024. They give data engineering teams an accessible way to get solid data security, reduced infrastructure setup and configuration, and a strong tool for handling this important data. When I first heard about Fabric SQL databases, I wondered what kind of SQL Server database would be available: would it be a full version, or some other flavor that supports only a subset of features? Maestro would be a perfect fit for Fabric SQL, but only if it were a complete version of SQL Server.
Let’s walk through the deployment together.
The first thing I had to do was create a SQL database in Fabric. This was very easy: in your workspace, click New Item and then search for SQL Database. Provide a name for the database and Fabric will provision it. Note that not everyone has access to this feature yet; talk to your Fabric administrator if you do not see the option to create a Fabric SQL database.
Once the database is provisioned, you will see the screen below. You can interact with the database directly within Fabric: your team can create objects and share queries with one another, all within the Fabric experience. This makes Fabric SQL databases more accessible for users who are less familiar with tools such as SSMS.
I prefer to use other tools for interacting with databases, such as SSMS. Fabric makes it easy to connect to the database from other tools as well. Click on the gear in the top left corner to see the settings information for the database. Click on Connection strings and you will find the server name and database name in the connection string, as shown below. Note that the database name will have a GUID attached to it to ensure it is unique. You will need to include that in your connection string.
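For reference, the connection string follows the same general shape as any other SQL Server connection string. The values below are placeholders (note the GUID suffix on the database name); copy the actual server and database names from the Connection strings pane rather than constructing them by hand:

    Server=<your-server>.database.fabric.microsoft.com;
    Database=Maestro-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx;
    Authentication=Active Directory Interactive;
    Encrypt=True;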
Now that I have my database, how should I go about creating objects? As mentioned above, I treat my configuration data as part of the overall solution. This means my database objects need to be created via proper CI/CD techniques. The Maestro database is packaged as a DACPAC and released via PowerShell and sqlpackage.exe. I took the same script I use to deploy to Azure SQL databases and plugged in my Fabric SQL database server and database name. I was very happy to find that it worked with few changes; the only change I needed to make was to deploy the DACPAC with the AllowIncompatiblePlatform setting set to True. Below is a screenshot of the deployment: it's exactly what I would expect with any other SQL Server database.
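If you want to try the same approach, a minimal publish command looks something like the sketch below. It assumes sqlpackage.exe is on your PATH and the DACPAC has already been built; the file path, server, and database names are placeholders:

    # Publish the DACPAC to the Fabric SQL database (interactive Entra ID sign-in)
    sqlpackage.exe /Action:Publish `
        /SourceFile:".\Maestro.dacpac" `
        /TargetServerName:"<your-server>.database.fabric.microsoft.com" `
        /TargetDatabaseName:"Maestro-<guid>" `
        /UniversalAuthentication:True `
        /p:AllowIncompatiblePlatform=True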
As mentioned above, the Maestro solution leverages many different types of database objects: user-defined types, triggers, temporal tables, and more. All of them worked as expected. I was very pleased to find that Fabric SQL Databases are the full and complete version of SQL Server; I don't have to make compromises to use it.
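As an example of one of those object types, here is a minimal sketch of a system-versioned (temporal) table of the kind Maestro uses for auditing. The names are illustrative, not Maestro's actual schema:

    CREATE TABLE dbo.ActivityParameter
    (
        ParameterId    INT IDENTITY(1,1) PRIMARY KEY,
        ActivityId     INT NOT NULL,
        ParameterName  NVARCHAR(100) NOT NULL,
        ParameterValue NVARCHAR(MAX) NULL,
        -- System-versioning columns: SQL Server maintains these automatically
        ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ActivityParameterHistory));

Every update or delete against a table like this is preserved in its history table, which is what makes temporal tables useful for an audit trail.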
Now that I have a database in Fabric, how do I interact with it in pipelines, notebooks, and other items? First, we need to add data to our Maestro database. Maestro allows users to add or maintain data in three key ways; the one shown below is the Maestro app.
In the image below, I have added 10 activities that I want to execute with Fabric. My process will grab files from OneLake and ingest them as delta tables in my Lakehouse. Each activity has its own parameters that my process will use to route data to the correct place. The app makes it easy for me to see each activity and its properties, and it allows me to change those properties as needed. Best of all, the system is dynamic, so I can add, remove, or rename properties, all within the app.
Now for the most important part: how can Fabric read my configuration data and use it in pipelines, notebooks, etc.? Let's look at a sample pipeline. I recommend using stored procedures as a common interface when retrieving configuration data. To call a SQL Server stored procedure in Fabric pipelines, first create a connection; you can do this by navigating to the Manage connections and gateways link in settings. Once you have created the connection, create a new pipeline. In this case, we will use the Lookup activity. Choose your newly created connection, point it to your procedure, and view the output data. In the picture below, my stored procedure returns all 10 data activities that I want to execute. In the future, my orchestration pipeline will retrieve these records and loop through them until each activity has been executed.
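To illustrate what that common interface might look like, here is a simple sketch of a procedure the Lookup activity could call. It reads from the illustrative configuration table defined earlier; the names are my own assumptions, not Maestro's actual API:

    CREATE PROCEDURE dbo.GetPendingActivities
        @ProcessName NVARCHAR(100)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Return one row per activity for the orchestration pipeline to loop over
        SELECT ActivityId, SourcePath, TargetTable
        FROM dbo.PipelineActivity
        WHERE ProcessName = @ProcessName
          AND IsEnabled = 1;
    END;

Keeping the procedure as the only thing the pipeline knows about means the underlying tables can evolve without breaking the orchestration.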
Fabric SQL Databases are a fantastic addition to the Fabric platform. They make it very easy to store auxiliary or configuration data, and they offer the complete toolset we expect from SQL Server. Setup was easy; Fabric has a great track record for simplifying resource creation and configuration. I was able to deploy my database through automated tools and can even connect a web app to it. Fabric SQL Databases are fantastic, which is why I will never store my configuration data in files again.
Interested in learning more about Neudesic Maestro?
Contact us today to schedule a demo and start your journey toward mastering your data.