New managed Incremental Extract Pattern
How to use this pattern
For a Staging table, in the ETL dialog, select the ‘Incremental Extract’ pattern for the Extract. See below (click to enlarge ). [/et_pb_text][et_pb_image admin_label=”Incremental Extract Pattern” src=”https://www.dimodelo.com/wp-content/uploads/Incremental-Extract-Pattern.jpg” alt=”Incremental Extract Pattern” show_in_lightbox=”on” url_new_window=”off” animation=”off” sticky=”off” align=”center” force_fullwidth=”off” always_center_on_mobile=”on” use_border_color=”off” border_color=”#ffffff” border_style=”solid” title_text=”Incremental Extract Pattern”] [/et_pb_image][et_pb_text admin_label=”Text” background_layout=”light” text_orientation=”left” text_font_size=”14″ use_border_color=”off” border_color=”#ffffff” border_style=”solid”] It is possible to define the source of the extract as either a table/view or as a query. To define the incremental extract for a table/view, first select the table/view in the ‘Table or View’ drop down. See below (click to enlarge ). [/et_pb_text][et_pb_image admin_label=”Incremental Extract Table Method” src=”https://www.dimodelo.com/wp-content/uploads/Incremental-Extract-Table-Method.jpg” show_in_lightbox=”on” url_new_window=”off” animation=”off” sticky=”off” align=”center” force_fullwidth=”off” always_center_on_mobile=”on” use_border_color=”off” border_color=”#ffffff” border_style=”solid” alt=”Incremental Extract Table Method” title_text=”Incremental Extract Table Method”] [/et_pb_image][et_pb_text admin_label=”Text” background_layout=”light” text_orientation=”left” text_font_size=”14″ use_border_color=”off” border_color=”#ffffff” border_style=”solid”] Then in ‘Change Data Identifier’ table, select the column(s) that are the ‘Change Data Identifier’ columns. You can select any column in the table/view. It’s also necessary to supply an ‘Initial Load Value’. The ‘Initial Load Value’ is used in the Extract the first time it runs. The idea is to set this to a very low value, so that the initial run of Extract does a Full Extract, selecting all rows in the source. You can tailor this to your needs. For example you may only want rows for the past 3 years. If you have more complex Extract requirements, it may be necessary to write a query. In this case it is still possible to do an incremental Extract. If you have more complex Extract requirements, it may be necessary to write a query. In this case it is still possible to do an incremental Extract. See below (click to enlarge ). [/et_pb_text][et_pb_image admin_label=”Incremental Extract Query Method” src=”https://www.dimodelo.com/wp-content/uploads/Incremental-Extract-Query-Method.jpg” alt=”Incremental Extract Query Method” title_text=”Incremental Extract Query Method” show_in_lightbox=”on” url_new_window=”off” animation=”off” sticky=”off” align=”center” force_fullwidth=”off” always_center_on_mobile=”on” use_border_color=”off” border_color=”#ffffff” border_style=”solid”] [/et_pb_image][et_pb_text admin_label=”Text” background_layout=”light” text_orientation=”left” text_font_size=”14″ use_border_color=”off” border_color=”#ffffff” border_style=”solid”]
In the query you must refer to your ‘Change Data Identifier’ columns as @ChangeDataIdentifierColumnName where @ChangeDataIdentifierColumnName is any name you choose. The query should be something like:
SELECT …
FROM …
WHERE ChangeDataIdentifierColumnName > @ChangeDataIdentifierColumnName
However you can write much more sophisticated queries if necessary. Then in ‘Change Data Identifier’ table, select the column(s) that are defined as @ChangeDataIdentifierColumnName in the query and supply an ‘Initial Load Value’ value. The ‘Initial Load Value’ is used in the Extract the first time it runs. The idea is to set this to a very low value, so that the initial run of Extract does a Full Extract, selecting all rows in the source. You can tailor this to your needs. For example you may only want rows for the past 3 years.
How the generated code works
As stated previously, each time the Extract is executed Dimodelo saves in the control Parameters table the maximum value that was extracted for each of the ‘Change Data Identifier’ column(s). When you define your source as a table/view, Dimodelo generates a query with the appropriate @ChangeDataIdentifierColumnName placeholders for each ‘Change Data Identifier’. The query would be something like
SELECT *
FROM Table
WHERE ChangeDataIdenitifierColumnName > @ChangeDataIdentifierColumnName
This query is used as the query for the source component of the SSIS package data flow. When you define your source as a query, Dimodelo just sets the query in the source component of the SSIS package data flow to the provided query. Before SSIS package data flow is executed, the package retrieves the maximum value of each ‘Change Data Identifier’ column(s) that was stored in the Control Parameters table by the previous invocation of the Extract. The maximum value is substituted for the @ChangeDataIdentifierColumnName in the source query prior to the data flow being executed. It is also possible to include Dimodelo’s standard batch control properties (like batch_effective_date, batch_execution_id) as parameters in the query. Just include them as @controlpropertyname. Read the Managing and Monitoring ETL batches Guide for more information about batch control properties. This works for both ODBC sources (which can take a query) and OLE DB sources. This doesn’t work for excel and file sources, by their very nature usually represent a batch of records, or are small in size. If you need to do this with a file source, then connect to it as an ODBC source.
[/et_pb_text][/et_pb_column][/et_pb_row][/et_pb_section]