Review Persistent Staging results

Introduction

This topic walks you through how a Persistent staging table is updated when changes are made at the source.

A persistent staging table records the full history of change of a source table or query. The source could a source table, a source query, or another staging, view or materialized view in a Dimodelo Data Warehouse Studio project. In a persistent table, there may be multiple versions of each row in the source. Each version of the row has an effective date and end date marking the date range of when that row version was valid (or in existence).

Technically speaking a persistent table is a bi-temporal table. A bi-temporal table permit queries over two timelines: valid time and transaction time. Valid time is the time when a row is effective. Transaction time denotes the time when the row version was recorded in the database. The persistent table supports transaction time by tagging each row version with an inserted and updated batch execution Id. The batch execution is associated with a start date-time in the batch database. Note that the DA version of the bi-temporal table goes one step further by identifying a last updated transaction date time.

Persistent Staging is comprehensively covered in the Persistent Staging Lesson.

Review the current state

  1. Run the following query against the Service source database.
Select ServiceId,
StartDate,
CompletedDate,
ClientId,
ServiceTypeCode,
ModifiedDate,
Rate 
FROM Service
where ServiceId in (6,998,1100)

  1. Run the following query against the Azure Data Warehouse database. Compare that to the current state of the Service source table. They will match.
Select ServiceId,
StartDate,
CompletedDate,
[Inserted_Batch_Execution_Id] , 
[Updated_Batch_Execution_Id] , 
[Row_Effective_Date] , 
[Row_End_Date] , 
[Row_Is_Latest],
[Row_Is_Deleted]
FROM psg.SV_Service
where ServiceId in (6,998,1100)
Order By ServiceId, Row_effective_Date


Make some changes

  1. Run the following query against the Service database to update row 6, Delete row 998 and Insert row 1100.
-- Update row 6, update completed Date
Update [Service]
SET CompletedDate = getdate()
,ModifiedDate = getdate()
where ServiceId = 6

-- Delete row 998
-- Remove foreign key references first
Update Task 
SET ServiceId = 997  
where ServiceId = 998;

Update TimeRecord
SET ServiceId = 997
,ModifiedDate = getdate()
where ServiceId = 998;

Delete [Service]
Where ServiceId = 998;

-- Insert row 1100
INSERT INTO [dbo].[Service]
           ([ServiceId]
           ,[Description]
           ,[DueDate]
           ,[StartDate]
           ,[CompletedDate]
           ,[ClientId]
           ,[ServiceTypeCode]
           ,[ScheduledDate]
           ,[BilledDate]
           ,[ModifiedDate]
           ,[Rate])
     VALUES
           (1100
           ,'New Record'
           ,getdate()
           ,getdate()
           ,null
           ,187
           ,337
           ,getdate()
           ,null
           ,getdate()
           ,212);
    1. Re-run the Batch, using the same method in the previous topic.

Review the new state

  1. Run the following query against the Azure Data Warehouse database.
Select ServiceId,
StartDate,
CompletedDate,
[Inserted_Batch_Execution_Id] , 
[Updated_Batch_Execution_Id] , 
[Row_Effective_Date] , 
[Row_End_Date] , 
[Row_Is_Latest],
[Row_Is_Deleted]
FROM psg.SV_Service
where ServiceId in (6,998,1100)
Order By ServiceId, Row_effective_Date

 

  • Row 1. This is the original version of the Service with ServiceId = 6.
    • Updated_Batch_Execution_Id has been updated
    • Row_End_Date = Batch 4 Batch_Execution_Date.
    • Row_Is_Latest (i.e. Latest version) = 0 for false.
  • Row 2. This is the new version of the Service with ServiceId = 6.
    • CompletedDate has a value (the change).
    • Inserted_Batch_Execution_Id = latest batch.
    • Row_Effective_Date = Row_End_Date of the row it supersedes making a contiguous date range.
    • Row_Is_Latest (i.e. Latest version) = 1 for true.
  • Row 3. This is the deleted Service record with ServiceId = 998.
    • Updated_Batch_Execution_Id has been updated to the current batch id.
    • Row_End_Date = current date.
    • Row_Is_Latest (i.e. Latest version) = 1 for true. It’s still the latest version even though it’s deleted.
    • Row_Is_Deleted = 1 indicating the row has been deleted from the source.
  • Row 4. This is the new Service with ServiceId = 1100.
    • Inserted_Batch_Execution_Id = latest batch.
    •  Updated_Batch_Execution_Id = 0, it’s never been updated.
    • Row_Effective_Date = low date, important so that the row is not excluded from a temporal join.
    • Row_End_Date = high date, version is the version valid into the future unless superseded.
    • Row_Is_Latest (i.e. Latest version) = 1 for true.