Creating a new Dimension

The following video demonstrating creating a new Employee dimension. This video was produced using an older version of Dimodelo Data Warehouse Studio, but is still valid.

Creating a Dimension

To create a dimension, right click the ‘Dimensions’ folder (or one of its sub folders) and select Create New.

There are two ways to create a dimension. By far the quickest way is to import the Meta Data from a Source System, and then edit as required. If the Source Meta data is not available, then you can manually enter the information.

Dimension Tab

  • Dimension Name. The Dimension Name is the name of your table in your data warehouse relational database. Spaces in the name will be converted to _ automatically when you save. The prefix for the Dimension table name can be set in the config file.
  • Dimension Friendly Name. This is the friendly name the user sees in the Solution explorer. In the future, if DA was to generate a cube/tabular model, it would be the name of the Dimension in the Cube. Spaces are not replaced in this name.
  • Dimension Description. A description of the contents of the dimension. optional. The description is output in the generated Data Dictionary documentation.
  • Reference. This is a free form text you can use to reference the dimension design back to another design document, perhaps a business requirements document.

Dimension Attributes Tab

You can add, delete and modify dimension attributes through the dimension’s Attribute tab.

Import

Use the Import Schema button on the Dimension Attributes tab to import the Meta Data for the Dimension from a Source System table.

To Import the Meta Data.

  • On the Dimension Attributes tab, click the Import Schema
  • Select the Source Connection where the source table resides.
  • Select the Import Table and press OK.

The dimension attributes will be populate with the column names and data types from the source table. Some connection types don’t return column data types. In this case the data types are defaulted to varchar(50).

To add an Attribute

Click in the empty row at the end of the list.

  • Attribute Name. Enter the attribute name. Spaces are replaced with _ when the dimension is saved. The Name must be unique within the dimension. Dimodelo Data Warehouse Studio will validate the name. If a name is specified that violates SQL Server naming restrictions, a red ! will be displayed on the row.
  • Data Type. Select a data type from the dropdown list, or just type the data type. The data type will be validated to ensure it is a valid SQL Server data type when the data type field loses focus. If the data type is invalid, a dialog box will display, and the field cleared.
  • Column Type. Choose the column type for the attribute. If the attribute is part of the Natural/Business key of the dimension, then select ‘Business Key’. The business key can be a composite key. That is, more than one attribute can have a type of Business Key. Normally the business key is what your ETL process will use to look up the dimension to identify if source rows already exist in the dimension. It is the primary key of the source table.
  • SCD Type. SCD Type is an acronym for Slowly Changing Dimension Type. The list of types correspond to the SCD types set out by Ralph Kimball. When the Attribute type is ‘Attribute’ this field is enabled and is required. Select the appropriate SCD type for the attribute. In brief Type 2 means keep a new version of this row if this attribute changes where Type 1 means overwrite the existing row. If you don’t understand SCD types then read our ‘Data Warehouse and Business Intelligence Concepts – Guide’.
  • Description. Add a description of the attribute. Generally the description should make sense to an end user. Descriptions are used in the generated Data Dictionary.
  • The parent indicator is used to indicate that the attribute is the natural key of the corresponding parent row in a parent-child hierarchy. That is, each row keeps its parent identifier in this attribute column. Parent identifiers can be composite keys, so more than 1 attribute can be marked as a Parent. The presence of a Parent indicator on any attribute will cause a Parent Surrogate Key to be generated in the DDL using the supplied DDL templates. Only a single Parent surrogate key is produced regardless of how many attributes are marked as Parent. Parent logic is added to the ETL Transformation if Parent is ticked.
  • Reference. This is a free form text you can use to reference the attribute back to another design document, or perhaps a report the attribute appears on.

To Delete an Attribute

Select the attribute row and click the delete key, or click the Delete Attribute(s) button. You can select multiple rows.

To Change an Attribute

Simply find the attribute in the list, click in the field you want to change, and change as desired. Attribute name changes are generated in the DDL as non-destructive rename script. Changes are only made permanent after the document is saved.

Advanced Tab

  • Custom Surrogate Key Name. This field allows you to specify a custom name for the surrogate key of this dimension. This is useful for in a conforming dimension scenario. See the Conformed Dimensions topic for more details. The standard DDL generation templates use the indicator and name to generate the correct surrogate key column name in the DDL.
  • Custom Surrogate Type. This field allows you to specify a custom data type for the surrogate key of this dimension. This is useful for in a conforming dimension scenario. See the Conformed Dimensions topic for more details. The standard DDL generation templates use the indicator and data type to generate the correct surrogate key column data type in the DDL.
  • Surrogate Key is Auto Generated. Un-tick this option if the ETL manages surrogate keys, instead of the database auto generating surrogate keys. The standard DDL templates use the indicator to set IDENTITY on dimension keys. This should be used in conjunction with the Auto Generate Key toggle on the advanced tab of the ETL dialog.
  • Generate DDL. Un-tick this option if you don’t want Dimodelo Data Warehouse Studio to Create this table.
  • Conformed. Meta data indicating the Dimension is a conformed Dimension from another Warehouse. This can be used in the Generation step to generate Conforming code. Currently this property is not used by the standard ETL generation templates.
  • Dimension Tags. Custom Meta Data for the Dimension. Tag the dimension with a tag Name and Tag Value. Some Dimension patterns require custom metadata. See the Patterns guide for more information about the custom meta data required by each pattern.

Custom Management Columns

What are Management Columns? Management columns are columns that are added to your Fact and Dimension tables to help manage the data, during ETL. Management Columns include things like:

  • Batch Identifier.
  • Row Status (Current – Not Current).
  • Row Effective Dates.
  • Checksum (if you use checksums in your ETL)

Dimodelo Data Warehouse Studio allows you to specify Custom management columns on an individual Fact/Dimension basis. A set of Standard Management Columns is specified for all Fact/Dimension tables. See section Standard Management Columns for an explanation of Standard Management Columns.

Custom Management columns can be used for Fact and Dimensions that have custom ETL that requires additional management columns for its operation.

Custom Management Columns can be added, changed and deleted through the Management Columns tab> Custom Management Columns table in the Fact and Dimension editors.

Defining Dimension Transforms

To define a Transform & Load process open the Dimension document of the Dimension table, and select the Add button on the ETL tab. The Transform & Load dialog appears.

Overview Tab

  • Transformation Name. The Logical Data Map name will default to the name of the Dimension table. The Name is used to uniquely identify the Transform and is used by the standard generation templates to name the Transform SSIS package/stored procedure. The name must be unique across all Dimensions and Facts.
  • ETL Pattern. Select the code pattern that will be used as the basis for the generated code for this Transform. Additional Patterns can be defined in the Reference Data/Patterns.rf document with a category of ‘Transform’. The standard Transform patterns include:
    • Type 1 and 2 Dimension. This pattern caters for both Type 1 and Type 2 Slowly changing dimension attributes in the same dimension.
  • Transform Overview. A description of the Transform for documentation purposes.

Staging Sources

Defines the Staging tables and columns that are the source of the transform.

The staging source can be a single primary staging table or the combination of a primary staging table and multiple secondary staging tables. The primary Staging table should be the staging table that has the same grain as your destination Dimension table.

Primary Staging Table

  • Primary Staging Table. Select the Staging table that will be the primary source for your transform. The primary source will have the same grain as the destination Dimension.
  • Staging Table Column List. This list is automatically populated with all the fields from the Primary Staging table. Delete any columns that are not required.
  • Edit Button. Click this button to specify further details for the Primary Staging Table. The Source Staging Table dialog appears.
    • Select the Primary Staging Table columns that are required for the transform.
    • Filter. In some cases, you may want to filter the data returned for the primary Staging table in the Transform process. Define the Filter, using an Expression. When referencing Staging columns in a T-SQL Expression, use the following convention Source_System_Abbrev_ Table_Name.Column_Name. See the Expression Language section for details of how to refer to Staging and Warehouse tables and columns. The filter is applied in the Where clause of the generated Source Query.
    • Tag Table. Define custom meta data for the Primary Staging table.

Secondary Staging Tables

If data is required from another Staging table in the transform, define the table in the Secondary Staging Tables table, and how it joins to the primary (or other secondary).

  • Add Join Button. Click this button to specify details for the Secondary Staging Table. The Source Staging Table dialog appears.
    • Staging Table Name. Select the Secondary Staging table from the drop down.
    • Join On. Write an expression describing how the Secondary table joins to either the Primary Staging table, or other Secondary Staging tables. When referencing Staging columns in a T-SQL Expression, use the following convention Schema.Source_System_Abbrev_ Table_Name.Column_Name. Where the Schema for a staging table is ‘stg’ and for a persistent staging table is ‘psg’. The Join on Expression is used in a LEFT JOIN ON clause of the generated Source Query. The Generated Source query is generated as a LEFT JOIN to the Secondary Staging table. If you require a JOIN, then use a ‘Secondary Staging table key IS NOT NULL’ filter, to eliminate rows where there is no matching Secondary Staging table record.
    • Filter. In some cases, you may want to filter the data returned for the Secondary Staging table in the Transform process. Define the Filter, using an Expression. The filter is added to the Where clause of the generated Source query. When referencing Staging columns in a T-SQL Expression, use the following convention Source_System_Abbrev_ Table_Name.Column_Name. See the Expression Language section for details of how to refer to Staging and Warehouse tables and columns.
    • Tag Table. Define custom meta data for the Primary Staging table.

Match On

The Match On tab defines the columns of the Staging source tables that match to the business key(s) of the target Fact/Dimension for the purposes of identifying new and deleted source records. In the standard generated code, where a match is found, further comparison is made based on checksums, to determine if a change has been made to the target row.

There are 2 ways to define the match:

  1. Use the Match button below the bottom right of the table to automatically match Staging to Target Business key based on Name.
  2. Select the Match manually.
  • Staging Table. Select the Staging table that contains the Staging column.
  • Staging Column. Select the matching Staging Column.
  • … Button. Click the … button if it is necessary to define an expression on columns of the Staging source to match to the target business key. The expression language is T-SQL. When referencing Staging columns in a T-SQL Expression, use the following convention Schema.Source_System_Abbrev_ Table_Name.Column_Name. Where the Schema for a staging table is ‘stg’ and for a persistent staging table is ‘psg’. The expression should be written as if it were in the SELECT list of a T-SQL Query against the Staging database. E.g. Select expression FROM Staging table.
  • Filter By. In Rare cases, perhaps when you have defined multiple transforms on a Dimension, you may need to filter the Dimension table, to match the incoming Staging rows for that transformation. You can define the filter, by specifying a Where clause on the Dimension table. The Filter is a T-SQL expression written as if it were part of a Where clause e.g. SELECT * FROM Dimension WHERE Expression. Reference Dimension table columns by Name.

Transformation Tab

The transformation tab allows you to map Staging table columns to Attributes during the transformation.

There are 2 ways to define the mapping:

  1. Use the Auto Match link below the bottom right of the table to automatically match Staging to Target Attributes based on Name.
  2. Select the Mapping manually.
  • Double click a row or click Edit Mapping button to change the mapping for the column, define an expression mapping, and add a description to the mapping for documentation purposes. Click the button and the Transform Mapping dialog appears.
    • Staging Column Mapping. Select the Staging table and Staging Column mapping for the Target Attribute from the drop downs.
    • Write an expression for the mapping. Note the expression can only reference Staging columns. The expression language is T-SQL. When referencing Staging columns in a T-SQL Expression, use the following convention Schema.Source_System_Abbrev_ Table_Name.Column_Name. Where the Schema for a staging table is ‘stg’ and for a persistent staging table is ‘psg’. The expression should be written as if it were in the SELECT list of a T-SQL Query against the Staging database. E.g. SELECT expression FROM Staging_Table.
    • Transform Mapping Description. A description of the mapping for documentation purposes. Non Mandatory.
    • Transform Mapping Tags. Define custom meta Data for the mapping.

Advanced Tab

  • Transform Tags. Define custom Design/Meta Data for the Transform.
    Auto Generate Key. If the surrogate key for the Dimension is a ‘smart’ key (like 20100616 for a Calendar dimension), select false, and then specify the source Staging column that contains the ‘smart’ key. See the ‘Defining a Dimension with Smart Keys’ topic for more information.