How to use this pattern
For a Staging table, in the ETL dialog, select the ‘Incremental Extract’ pattern for the Extract. See below (click to enlarge ).
In the query you must refer to your ‘Change Data Identifier’ columns as @ChangeDataIdentifierColumnName where @ChangeDataIdentifierColumnName is any name you choose. The query should be something like:
WHERE ChangeDataIdentifierColumnName > @ChangeDataIdentifierColumnName
However you can write much more sophisticated queries if necessary. Then in ‘Change Data Identifier’ table, select the column(s) that are defined as @ChangeDataIdentifierColumnName in the query and supply an ‘Initial Load Value’ value. The ‘Initial Load Value’ is used in the Extract the first time it runs. The idea is to set this to a very low value, so that the initial run of Extract does a Full Extract, selecting all rows in the source. You can tailor this to your needs. For example you may only want rows for the past 3 years.
How the generated code works
As stated previously, each time the Extract is executed Dimodelo saves in the control Parameters table the maximum value that was extracted for each of the ‘Change Data Identifier’ column(s). When you define your source as a table/view, Dimodelo generates a query with the appropriate @ChangeDataIdentifierColumnName placeholders for each ‘Change Data Identifier’. The query would be something like
WHERE ChangeDataIdenitifierColumnName > @ChangeDataIdentifierColumnName
This query is used as the query for the source component of the SSIS package data flow. When you define your source as a query, Dimodelo just sets the query in the source component of the SSIS package data flow to the provided query. Before SSIS package data flow is executed, the package retrieves the maximum value of each ‘Change Data Identifier’ column(s) that was stored in the Control Parameters table by the previous invocation of the Extract. The maximum value is substituted for the @ChangeDataIdentifierColumnName in the source query prior to the data flow being executed. It is also possible to include Dimodelo’s standard batch control properties (like batch_effective_date, batch_execution_id) as parameters in the query. Just include them as @controlpropertyname. Read the Managing and Monitoring ETL batches Guide for more information about batch control properties. This works for both ODBC sources (which can take a query) and OLE DB sources. This doesn’t work for excel and file sources, by their very nature usually represent a batch of records, or are small in size. If you need to do this with a file source, then connect to it as an ODBC source.