How to prioritize Data Warehouse development

Attempting to deliver your Data Warehouse in a single iteration, “waterfall” style, would be very difficult. Today’s agile development process demands that the tasks be broken down into smaller iterative phases that incrementally deliver value. This means you need to prioritize your Data Warehouse development.

In the case of a Data Warehouse, the business value is provided when a Fact is delivered with at least a subset of its associated Dimensions. Decisions about which Facts and Dimensions to prioritize are complex. Below are some of the steps you can use to arrive at these decisions:

  1. First, identify high-value dimensions. High-value dimensions are those shared by many Facts. In the example below, the high-value dimensions include Product, Staff and Calendar. By concentrating on high-value Dimensions, iterative delivery has a cumulative effect.
  2. Second, identify Fact/Business Processes that utilize the high-value dimensions.
  3. Third, rank these Facts by their business value vs development complexity. High business value, low complexity Facts that utilize high-value dimensions are ideal candidates.

Example Prioritization

Take the Data Warehouse shown below in a Data Warehouse Bus matrix.

To decide which Fact to deliver in the first project iteration, we did the following:

  1. First, identified the high-value Dimensions as the Calendar, Staff and Product Dimensions. At least 3 facts share them.
  2. Identified that the Product and Sales facts use all these high-value dimensions.
  3. Determined the build priority based on complexity. The Production and Sales facts are both considered complex. However, the Sales fact has more dimensions; therefore, the Production fact costs less to deliver. Delivering the Production Fact first provides a cumulative development benefit. This benefit is realized when you deliver the Sales Fact in a subsequent iteration. The Sales Fact only requires the delivery of an additional three dimensions. I.e., By delivering the Production fact, you have already partially delivered the Sales fact.

A Data Warehouse Matrix provides a framework for overall delivery that can be decomposed into rational development sprints that conform to an overall design.

Kimball Data Warehouse Bus Matrix Excel Template

Measuring Business Value and Complexity

It’s essential to engage the business in determining each of the Fact’s business value and priority. For better or worse, the business usually derives business value only from the perceived value of a narrow set of reports the Fact supports. Competing interests between different business functions are inevitable and challenging, to say the least. The cumulative benefit of delivering high-value dimensions can help business users understand that delivering on other users’ priorities also moves the needle closer to their requirements.

Designers measure the complexity of Facts in several ways:

  • The number of dimensions associated with the Fact.
  • The number of source systems and source tables required to load the Fact.
  • The data format of the source system.
  • The data quality of the source system.
  • The accessibility of the source system.
  • The complexity of the transformation logic.
  • The number of measures in the Fact.
  • The type of Fact (accumulating/transaction/periodic).

Designers measure the complexity of Dimensions in the same way with a couple of additions.

  • The number of attributes in the Dimension.
  • The number of Type 2 attributes on the Dimension.

To help with this prioritization, I use an enhancement to the Data Warehouse Matrix.

In the image below, I have added a measure of complexity to each of the Dimensions and Facts.

This simple enhancement helps in the prioritization process. It also feeds into the estimation process discussed in a later post.

Shown below is a 4-level complexity scale that usually suffices:

Balancing value, complexity and delivery

There is a complex interplay between business value, complexity, and cumulative delivery. Designers need to consider all these factors when planning iterations. Due to the cumulative nature of the Data Warehouse delivery, the complexity of delivering any given fact may change over time. For example, in a given iteration, you may have staged the required source data or delivered ¾ of the related Dimensions, which makes some of the remaining Facts less complex. Also, the business value of any Fact may increase relative to the remaining facts. Designers can take advantage of the cumulative delivery and plan a series of iterations to provide the most value quickly. Essentially, if you deliver “Fact A”, delivering “Fact B” becomes more straightforward.

One of the phenomena experienced when delivering a Data Warehouse is that early Facts and Dimensions appear to take significant effort and are more costly without providing a lot of value. Development tends to accelerate as more of the Data Warehouse is built. It can be a struggle getting over this low-value, high-cost “hump”, with the business becoming despondent and taking shortcuts. Understanding and communicating progress, value and complexity via the Data Warehouse Matrix can help mitigate this issue.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.