Staging

You must define a Staging table and its Source table/query before the source data can be incorporated into the Data Warehouse. Dimodelo Data Warehouse Studio (DA) allows you to import the columns metadata of a source table to quickly define a Staging table.

When defining a Staging table extract, you have the choice to either:

  1. Map Staging columns to a Source table column.
  2. Map Staging Columns to the columns of a query against the Source System.

You can also create a Derived Staging table, by uses other Staging tables as the Source. For more information about Derived Staging tables see the Derived Staging table topic.

Dimodelo Data Warehouse Studio implements 4 Extract patterns.

  1. Full Extract. The Staging table is truncated prior to the Extract being executed. The data from the source table/query is extracted in full.
  2. Incremental Pattern. Extracts only changed data from your source databases. E.g. you can use this pattern to extract only transactions from a large transaction table that have been inserted or updated since the last time the extract was executed. For the pattern to work you need to identify a column or multiple columns that the source table uses to track change, liked a modified date or sequence number.
  3. Date Range Pattern. The date range pattern is used to extract a subset of the source data based on a date range. The end of the date range is the current batch execution date. The start of the date range is a number of days specified prior to the current batch effective date.
  4. Multi Wild Card File Pattern. This pattern is specific to files. The developer defines a file name pattern (with wild cards), and DA will extract the data from all files, that match that name pattern, into the staging table. DA will also manage the files, archiving them in an archive folder.

To create a Staging table, you must first define a Connection Manager and Source System of the Source table(s) in the DA project. A Staging table can only stage data from a single Source System, but can stage data from multiple tables in that Source system, although this is not a recommended approach (according to Kimball et al) in most cases. If you do need to combine data from more than one Source System in the Staging database, to perhaps aggregate or allocate, then you can you a Derived Staging Table.

This lesson includes the following topics:

  1. Stage from Database Full (video)
  2. Stage DB Incremental (article)
  3. Stage period (article)
  4. Stage a File (video)
  5. Derived Staging table (video)
  6. Persistent Staging (video)