The Data Warehouse Matrix, Data Marts and Conformed Dimensions

The Data Warehouse Matrix is an essential design, scope, planning and communication tool for a Data Warehouse. It should be the first thing a Data Warehouse Architect does to define a Data Warehouse project.

A Data Warehouse matrix define the relationship between all planned Facts and Dimensions. I.e. the Matrix. Which bring us to the second topic of this lesson… “Conformed Dimensions“. Wherever possible, Facts should share Dimensions. These shared Dimensions are known as Conformed Dimensions. Conformed Dimensions are powerful. They allow cross business process/event analysis. If two facts share a Dimension, and a member is selected in that Dimension, then it filters both Facts. E.g Filter the Calendar Dimension to May 2019, then select measures from multiple associated Facts for May 2019.

The image below is the Data Warehouse Matrix for our very simple Example Data Warehouse.

The Data Warehouse matrix contains the names of your Facts on rows (blue cells) and the names of your Dimensions in the columns (green cells).

The relationships between the Facts and Dimensions is the value in the grey cell at the intersect of the Facts and Dimensions. A “1” in the cell at the intersection of a Fact row and Dimension column indicates that the Fact is associated to that dimension (and therefore the Fact can be analysed by attributes of the Dimension). By completing a matrix, its possible to see the overall high level design of your Data Warehouse on a single sheet.

The Data Warehouse matrix allows you to see what cross business process analysis is available. If Facts share a Dimension, measures of those facts can be analysed together on the same pivot table when slicing by that Dimension. The matrix can also highlight issues with your design. For example, in the matrix above, it would be useful to breakdown the Time Records by Client. The source system has to support this however.  It does in this case, because Client can be derived through the relationship from Time Record-> Task-> Service-> Client. In the ETL, you would need to navigate that relationship to set a direct relationship from the Time Record Fact to the Client Dimension.

The Data Warehouse Matrix can be used as an estimation tool. By assigning a complexity (Simple/Medium/Complex/Highly Complex) to a Fact/Dimension, and giving each complexity an effort estimate, it’s possible to predict how long the Data Warehouse build will take. In-fact Dimodelo Solutions has a free Data Warehouse Matrix Template that does just that.

In addition you can use the matrix to help plan your project. Generally you won’t or can’t deliver the entire Data Warehouse at one time. Having a matrix allows you to deliver parts of the Data Warehouse while still maintaining an overall plan that integrates new Dimensions and Facts, as they are created. Initially you will usually make decisions about which Facts are going to be the most valuable, and deliver those first. But you will have to temper that decision based on complexity, which is partially determined by how many Dimensions a Fact table is related too. After the first iteration, its possible to update the matrix to communicate progress, and to identify quick wins (Facts where the majority of Dimensions are already delivered).

A matrix can very effectively communicate the scope and cost and progress of the project to Senior IT and business management. We recommend maintaining versions of the matrix for reporting purposes.

The Matrix can be further segregated into groups of related Facts to define Data Marts. The definition of a Data Mart is somewhat confused, with no definitive definition in use in the industry. For the sake of lesson, a Data Mart is a subset of related Facts and Dimensions in the Data Warehouse.

Conformed Dimensions

As you have seen, Conformed Dimensions are Dimensions that are shared by the Fact tables. Those Fact tables don’t necessarily have to exist in the same Data Warehouse database. Data Marts (Groups of related Facts and Dimensions) could existing in different databases, that share Conformed Dimensions. Those data marts might even be managed by different teams. As long as there is an agreement on the surrogate and business keys of Conformed Dimensions, it is possible for Data Marts to share Conformed Dimensions.

One design approach is to keep Conformed Dimensions in a central database. The database must be accessible by all Data Mart databases that use them. As a Fact row is written, it is only necessary for the Fact to lookup the Conformed Dimension by Business Key, and save the Surrogate Key to the Fact. The Facts and Conformed Dimensions from separate databases are combined back together in the Semantic layer (i.e Cube) for analysis purposes.

Another approach is to duplicate Conformed Dimensions across Data Marts, but it is important that they are synchronized with a master. Conformed dimensions copies must have consistent keys, consistent attribute column names, and consistent attribute values.

Sometimes Dimensions are needed at a rolled-up level of granularity. Perhaps the rolled-up Dimension is required because the Fact contains aggregated measures that are associated with aggregated Dimensions. The Roll-up Dimensions would still conform, if it is a strict subset of the detailed Dimension. Attributes that are common to both the detailed and rolled-up Dimensions, should be labeled, and valued identically in both Dimensions.

Another case of conformed dimension subsets are two identical Dimensions, at the same level of detail, but one has a subset of rows.

Some Conformed Dimensions cross over into the Master Data concept of “Golden Records”. Managing Conformed Dimensions of this nature require specialized Master data techniques that fall outside of the Data Warehouse discipline. We suggest you investigate the Master Data Management tools for these kinds of Dimensions. A Conformed Dimension in this case would simply source its data from the “Golden Record” interface.