Data Modeling
What is Data Modeling?
Data modeling is the process of creating visual representations of the connections between data structures, with information about the individual attributes contained within those data structures.
When talking about data modeling in general, the term you will hear most often is the star schema. This is a widely adopted approach to designing data warehouses and relational databases and is the recommended approach to take in Power BI as well.
There are two main benefits of using a star schema:
Usability: A star schema makes your data model cleaner and more organized, and your report will be easier to use.
Performance: Star schemas allow you to easily scale your report to very large volumes of data. DAX measures also calculate faster and the Power BI report refreshes faster overall.
How does a star schema work?
A star schema is made up of a central fact table with multiple dimension tables branching off of this fact table, much like the appearance of a star. Ideally, it is good practice to have only one fact table in a data model. However, it is possible to include multiple fact tables in Power BI.
A fact table is made up of values that can be summarized and aggregated, as well as one or more keys that link to the dimension tables. The purpose of the key is to make connections between tables. If you have complex data models or large volumes of data, it is a good idea for the keys to be numeric values rather than text strings, as this can improve the performance of the model.
Fact tables should be very narrow and include as few columns as possible. Any columns that describe the features of a dataset, such as Product Name or Product Category, should be separated out of the fact table (if not already separated in the data warehouse) into their respective dimension tables. This process is called the normalization of the data, and the goal is to avoid repetitiveness.
Dimension tables describe the features of the dataset, where features are logically grouped together in separate tables to avoid bloating a single table or making it difficult to create relationships. The dimension tables should contain unique values for the features they are describing, and their columns will be responsible for filtering or grouping the data in the Power BI report.
An extension of the star schema is the snowflake schema. In this approach, the dimension tables are further divided if a feature has any further categories or subcategories. In our case, we have a Products table that contains the Product ID, Product Name, and Product Category. In a snowflake schema, this Products table will only contain the Product ID and the Product Name, while the Product Category would be further divided out into it’s own table with a Product Category ID and Product Category.
However, using a snowflake schema can unnecessarily add to the complexity of your data model; including these extra tables could decrease performance and make creating relationships difficult and confusing.
How to Create Relationships in Power BI
There are two ways you can create a relationship in Power BI:
Select a field from one table and drag it onto the field in the second table with which you want the relationship to form.
Recommended by LinkedIn
Select Manage Relationships from the ribbon and select “New” to add a relationship using the same window that we will be discussing next (except that it will start as blank).
By default, Power BI will try to infer a relationship between tables; it doesn't always get this right, so you may wish to turn this feature off in the settings or delete any relationships that are created automatically. To edit the relationship, right-click the connecting line between them and select “Properties”.
This window has 2 interesting options to choose from when defining a relationship: cardinality and cross filter direction. The choices for each of these options can have a big impact on the resulting report, so choose carefully. Let's break down each of these options.
Cardinality
There are 4 cardinality choices: many-to-one, one-to-one, one-to-many, or many-to-many.
When creating relationships, it is recommended that the joining field contains unique values in at least 1 of the tables. This allows you to use the one-to-many or many-to-one options in your data model.
In our data, we have a relationship between the Financials table and the Products table using the Product ID field. The Products table has unique values for the Product field (each product only appears once in the table). However, the Financials table can have each product showing up several times by date, country, segment, etc.
The many-to-many option is the most problematic, and while it can be very useful in complex models, it should only be selected if you fully understand the consequences of this relationship.
To illustrate where many-to-many relationships can cause problems, we create a yearly Sales Target table by Product Category in the data model. We need to be able to report sales numbers and these targets side-by-side in the same visual. This can be achieved by creating a many-to-many relationship between the Sales Target table and the Product table.
Many-to-many relationship in Power BI Screenshot
In the following table visual, we can see our sales values and the targets side-by-side for each product category.
However, if we tried to view this table at the Product level, the values for the target would be completely incorrect. This is because of differing levels of granularity in the data; since there are only sales targets at the product category level, Power BI cannot reasonably display these targets at the product level, and thus the targets are simply duplicated for each product in the product category. If you did not know that this could be a consequence of using a many-to-many relationship then this could be highly problematic for business reporting.
How to Optimize your Power BI Model
The most important factor in optimizing your Power BI model is to use good data modeling practices such as those laid out in this tutorial. More often than not, a good data model can single-handedly allow a Power BI report to scale into the millions and billions of data rows while still maintaining a decent level of performance.
However, there are a few other things you can do to optimize your Power BI model, in addition to following good data modeling practices, such as: