Fact Patterns

Fact Standard Pattern

The Standard Fact table pattern is designed to be a high-speed load pattern for all styles of fact tables (‘transaction’, ‘periodic snapshot’, ‘bridge’, and ‘accumulating snapshot’).

Key Features

  • Internal Change Data Capture.
  • Full Extract. (unless a Date Range extract pattern is used)
  • Incremental Load.
  • Non Ledger (updates overwrite original).
  • Soft Deletes. (Rows set to not current)
  • Active Partition Period capable.
  • Handles late arriving dimensions.

The pattern is ‘full extract, incremental load’. This means that all rows available in the source system are extracted, but only changes are loaded into the fact table (insert, update, delete).

The pattern is not a ledger-based pattern. This means that if the source of a fact changes, the fact row is overwritten, and the history of the change to the fact (though not its dimension associations) is lost. This is acceptable in many cases; however, if fact history is required, you should use the Standard Transaction Ledger pattern.

The approach implements its own change data capture by using checksums to compare incoming source records against their equivalent record in the fact table. Where the fact table record doesn’t exist or there is a difference in checksums, further processing on that row occurs. Where the checksums are the same, the record is immediately discarded. This approach reduces the amount of work the load needs to do, as unchanged records are discarded as soon as possible.
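The checksum comparison described above can be sketched as follows. This is an illustrative Python sketch only, not the pattern’s actual implementation (which would use set-based SQL); the column names, the use of MD5, and the helper names are assumptions.

```python
import hashlib

def row_checksum(row: dict) -> str:
    """Concatenate the row's values in a fixed column order and hash them.
    MD5 is an assumption here; any stable hash serves the purpose."""
    payload = "|".join(str(row[col]) for col in sorted(row))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def detect_changes(source_rows: dict, fact_checksums: dict):
    """Yield only rows that are new or changed, discarding unchanged rows
    as early as possible. fact_checksums maps a business key to the
    checksum stored on the existing fact row."""
    for key, row in source_rows.items():
        checksum = row_checksum(row)
        if fact_checksums.get(key) != checksum:
            # Fact row missing, or checksums differ: needs further processing.
            yield key, row, checksum
        # Identical checksum: unchanged row, immediately discarded.
```

A row whose checksum matches the stored value never reaches the later insert/update steps, which is where the performance benefit comes from.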

Because the pattern implements its own change data capture, and all inserts, updates and deletes are executed in an atomic way, it is simple to recover after failure. Simply restarting the Batch will see it recover and self-heal its target Fact table.

In this pattern the majority of updates are either bulk inserts or bulk updates. Bulk inserts and bulk updates are a very efficient way of updating the Fact table, thereby maintaining high performance.

For a cube to process this type of fact table, it is necessary to eliminate soft deleted rows in the Data Source View by only selecting ‘current’ records.
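The ‘current records only’ filter for the Data Source View amounts to the following. A minimal sketch, assuming the fact rows are represented as dicts; in practice this would be a `WHERE Row_Is_Current = 1` predicate in the view definition.

```python
def current_rows_for_dsv(fact_rows: list) -> list:
    """Keep only current rows so the cube never sees soft-deleted facts
    (the equivalent of WHERE Row_Is_Current = 1 in the Data Source View)."""
    return [row for row in fact_rows if row["Row_Is_Current"] == 1]
```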

Active Partition

One of the benefits of Data Warehousing is to remove the burden of keeping historical data from operational systems. However, if an operational system does not keep all history, this presents an issue for this type of pattern, which implements internal change data capture: how does the pattern differentiate between a legitimate delete in the source and records that are simply no longer being kept? To address this issue, the pattern introduces the concept of an active partition period. The active partition period is a time period (starting with the current batch effective date and working back a number of days) in which data in the source is still considered to be active, and can be changed or deleted. Only changes and deletes in this time period are acted upon. Data that appears to be deleted in the source system prior to this period will not be touched, thus maintaining the history. Deletes are implemented as ‘soft deletes’: rows simply have their current flag set to ‘not current’. We recommend that you select as large a period as you can to ensure all changes are captured; the patterns perform fast enough to handle bulk data. This pattern can be used with the Date Range Extract Pattern to also limit the data extracted from the Source System for greater efficiency.
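The active-partition decision described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function and parameter names are invented for the example, and the real pattern applies this logic in set-based SQL using the tagged date column.

```python
from datetime import date, timedelta

def in_active_partition(row_date: date, batch_effective_date: date,
                        period_days: int) -> bool:
    """True if the row's tagged date column falls inside the active
    partition, i.e. within period_days before the batch effective date.
    period_days corresponds to the DIL_Active_Partition_Period_Days tag."""
    cutoff = batch_effective_date - timedelta(days=period_days)
    return row_date >= cutoff

def should_soft_delete(fact_date: date, batch_effective_date: date,
                       still_in_source: bool, period_days: int = 365) -> bool:
    """Only rows inside the active partition that have vanished from the
    source are soft-deleted (Row_Is_Current set to 0); rows dated before
    the active partition are left untouched, preserving history."""
    return (not still_in_source) and in_active_partition(
        fact_date, batch_effective_date, period_days)
```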

The pattern will work with or without an active partition. The active partition is only necessary if the source system is not keeping all historical data. To enable an active partition, add the custom metadata tag ‘DIL_Active_Partition_Period_Days’ to the tag table on the advanced tab of the Fact transformation dialog. Set the value of the tag to the number of days in the active partition, e.g. 365. The date column of the Fact that is used to determine whether a row exists within the active partition must also be tagged. To tag the date column, add the tag ‘DIL_Date_Attribute’ to the Fact Editor>ETL Dialog>Transforms>…>Tag Table. Set both the Tag Name and Value to ‘DIL_Date_Attribute’.

This pattern can be used in conjunction with the Date Range Extract Pattern to improve performance. Ensure that the ‘Extract_Period_Days’ tag in the Staging table is set to a value greater than the Active Partition Period Days.

Standard Management Columns

There are several standard management columns in every fact table which are used in the ETL process to manage the data.

Row_Is_Current: Indicates that the source row for the fact is still appearing in the source data. Values for this field are 1 or 0, where 1 is current.

Batch_Execution_Id: The id of the batch execution which inserted/updated the fact row.

Checksum: Used in change data capture to identify changes to fact rows.

Transaction Fact Ledger Pattern

The Transaction Fact Ledger pattern is designed to be a high-speed load pattern for ‘transaction’ style fact tables. Transaction fact tables typically have a one-to-one relationship between fact row and source row, representing the finest grain of the source. Generally these facts have a low volume of change once created, and their dimensionality is static at creation.

Key Features

  • Internal Change Data Capture.
  • Full Extract. (unless a Date Range extract pattern is used)
  • Incremental Load.
  • Ledger based (updates change current row to not current, and a new row is written).
  • Soft Deletes. (Rows set to not current)
  • Active Partition Period capable.
  • Handles late arriving dimensions.

The pattern is ‘full extract, incremental load’. This means that all rows available in the source system are extracted, but only changes are loaded into the fact table (insert, update, delete).

The pattern is a ledger-based pattern. This means that if the source of a fact row changes, the corresponding fact row is tagged as ‘not current’ and a new row is written with the new values; thus the history of the change is kept.
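The ledger-style change described above can be sketched as follows. An illustrative Python sketch only: the `key` column and function name are assumptions, the fact table is stood in for by a list of dicts, and the real pattern performs this with bulk set-based SQL.

```python
def apply_ledger_change(fact_rows: list, key, new_values: dict,
                        batch_execution_id: int) -> None:
    """Ledger-style update: retire the current row for `key` by setting
    Row_Is_Current to 0, then append a new current row carrying the new
    values, so the history of the change is preserved."""
    for row in fact_rows:
        if row["key"] == key and row["Row_Is_Current"] == 1:
            row["Row_Is_Current"] = 0  # old version kept, flagged not current
    fact_rows.append(dict(new_values, key=key, Row_Is_Current=1,
                          Batch_Execution_Id=batch_execution_id))
```

Contrast this with the Standard Fact pattern, where the existing row would simply be overwritten in place.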

The approach implements its own change data capture by using checksums to compare incoming source records against their equivalent record in the fact table. Where the fact table record doesn’t exist or there is a difference in checksums, further processing on that row occurs. Where the checksums are the same, the record is immediately discarded. This approach reduces the amount of work the load needs to do, as unchanged records are discarded as soon as possible.

Because the pattern implements its own change data capture, and all inserts, updates and deletes are executed in an atomic way, it is simple to recover after failure. Simply restarting the Batch will see it recover and self-heal its target Fact table.

In this pattern the majority of updates are either bulk inserts or bulk updates. Bulk inserts and bulk updates are a very efficient way of updating the Fact table, thereby maintaining high performance.

For a cube to process this type of fact table, it is necessary to eliminate soft deleted rows in the Data Source View by only selecting ‘current’ records.

Active Partition

One of the benefits of Data Warehousing is to remove the burden of keeping historical data from operational systems. However, if an operational system does not keep all history, this presents an issue for this type of pattern, which implements internal change data capture: how does the pattern differentiate between a legitimate delete in the source and records that are simply no longer being kept? To address this issue, the pattern introduces the concept of an active partition period. The active partition period is a time period (starting with the current batch effective date and working back a number of days) in which data in the source is still considered to be active, and can be changed or deleted. Only changes and deletes in this time period are acted upon. Data that appears to be deleted in the source system prior to this period will not be touched, thus maintaining the history. Deletes are implemented as ‘soft deletes’: rows simply have their current flag set to ‘not current’. We recommend that you select as large a period as you can to ensure all changes are captured; the patterns perform fast enough to handle bulk data. This pattern can be used with the Date Range Extract Pattern to also limit the data extracted from the Source System for greater efficiency.

The pattern will work with or without an active partition. The active partition is only necessary if the source system is not keeping all historical data. To enable an active partition, add the custom metadata tag ‘DIL_Active_Partition_Period_Days’ to the tag table on the advanced tab of the Fact transformation dialog. Set the value of the tag to the number of days in the active partition, e.g. 365. The date column of the Fact that is used to determine whether a row exists within the active partition must also be tagged. To tag the date column, add the tag ‘DIL_Date_Attribute’ to the Fact Editor>ETL Dialog>Transforms>…>Tag Table. Set both the Tag Name and Value to ‘DIL_Date_Attribute’.

This pattern can be used in conjunction with the Date Range Extract Pattern to improve performance. Ensure that the Extract_Period_Days tag in the Staging table is set to a value greater than the Active Partition Period Days.

Standard Management Columns

There are several standard management columns in every fact table which are used in the ETL process to manage the data.

Row_Is_Current: Indicates that the source row for the fact is still appearing in the source data. Values for this field are 1 or 0, where 1 is current.

Batch_Execution_Id: The id of the batch execution which inserted/updated the fact row.

Checksum: Used in change data capture to identify changes to fact rows.