Date Range Extract Staging table

The date range pattern is used to extract a subset of the source data based on a date range. The end of the date range is the current batch execution date. The start of the date range is a number of days prior to the current batch effective date. Each time the ETL runs, the current batch effective date, for that batch execution, is set to midnight of the prior day. The date range ‘window’ moves forward a day as the current batch effective date changes.

Create a new “Task” Persistent Staging table

  1. Expand the Staging folder, right-click Persistent Staging and click Add New.
  2. Select “Service” in the Source System field.
  3. In the Staging Table Name field type “Task“.
  4. Enter the “This is a Task persistent staging table” Description in the Description field. This is optional.

Staging Columns

The quickest way to define the columns of the Staging table is to import the column definitions from a Source table. If there isn’t a matching Source table, then you can manually define the columns. To import a Source table do the following:

  1. Click on the Staging Columns tab.
  2. Click the Import Schema button.
  3. Select “Task” in the Connection dropdown. The Import Table field is populated with tables and views from the source.
  4. Select the “dbo.Task” in the Import Table field.
  5. Click Ok. The columns and data types of the source table are imported into the Staging table design.

Advanced Tab

On the Advanced tab, you will be presented with additional properties for the staging table. These properties are based on the target technology for the data warehouse. In this case, the target technology is Azure Data Warehouse, so you are presented with a DISTRIBUTION option with a default.

  1. Just keep the default options on the advanced tab.

Create a Date Range Extract

  1. To define an Extract select the Add button on the Extract tab. The Extract dialog appears.

Overview Tab

  1. Extract Name. Keep the default Extract Name.
  2. Extract Pattern. Select “Date Range” in the Extract Pattern field. The selected pattern is used to determine the type of code that will be generated for this Extract.
  3. Extract Overview. Optional. Enter a description of the extract for documentation purposes.

Source Tab

The Source tab defines the source table or query of the Extract.

  1. Connection. Select the “Service” connection in the Connection dropdown.
  2. Source Table View. Select the “dbo.Task” in the ‘Table or View‘ dropdown. You can also start typing dbo.Tas… and the matching values will display for selection.

Advanced Tab

The number of days of data to extract is specified by a custom metadata tag called ‘DIL_Extract_Period_Days’ added to the tags table on the advanced tab of the Extract dialog.

  1. Add a new Tag in the Extract Tags table. Set the Tag to “DIL_Extract_Period_Days“.
  2. Set the Value of this tag to “21” to represent 21 days of data to extract. That is, the start date of the date range is 21 days prior to the current batch effective.

Your result should look like this:

DIL_Extract_Period_Days

You must also tag one source column as the Date attribute that contains the date that is used to filter the data in the source based on the date range. To identify this source column, edit it through the Mapping tab of the Extract dialog and add a tag named ‘DIL_Date_Attribute’.

  1. Return to the Mapping tab.
  2. Click the “ScheduledDate” row, and click the Edit Mapping button.The Extract Mapping dialog appears. The staging column chosen must be of a date type that can be CAST to a DateTime.
  3. In the Extract mapping tags table add a Tag NameDIL_Date_Attribute“.
  4. Set the Tag Value to “DIL_Date_Attribute” also.

Your result should look like this.

Date Range Date Attribute

How does the Date Range Pattern work?

When Dimodelo Data Warehouse Studio generates the source query in the Shift data flow, it generates it like this:

SELECT [TaskId] ,[TaskName] ,[TaskDescription] ,[TaskEstimateMinutes] ,[TaskTypeCode] ,[ServiceId] ,[ScheduledDate] ,NULL AS [Reject_Ind] FROM
[dbo].[Task] with (noLock) WHERE CONVERT(datetime,[ScheduledDate],112) >= ?

Where ScheduledDate is the selected DIL_Date_Attribute.

Then the package calculates the first date of the range as:

DATEADD(“dd”, -21, @[User::Current_Batch_Effective_Date] )

Where 21 is the DIL_Extract_Period_Days. So for example, if the current effective date was  2020-01-21, then the resulting date would be 2019-12-31.

Dimodelo Data Warehouse Studio then substitutes the resulting date for the question mark in the query.

All data where the scheduledDate >= 2019-12-31 is extracted.