Persistent Staging Case Study – Employee Fact in Human Resources Data Warehouse

This post is the second in a series of detailed cases studies discussing the ETL strategies that can be used when a Persistent staging layer is included in your Data Warehouse. It’s intended as a reference for developers using Dimodelo Data Warehouse Studio, our Data Warehouse Automation tool to quickly build a Data Warehouse. It’s most useful for Dimodelo developers who have some initial experience with Dimodelo, either through the Training, or in a commercial application. A general discussion of Persistent Staging can be found in “the top 5 reasons you need a persistent layer in your data warehouse” post.

This case study demonstrates how a persistent staging layer can support ETL and analysis scenarios that aren’t possible otherwise.

Overview

The HR department needed to analyse Employee Tenure, Appointment Term, Age, Head Count, Annual Leave Balance, Sick Leave Balance, Mix and Churn. On any given day, the measure values change, and the HR department needed up-to-date information on a daily, weekly and monthly basis.

To accommodate the requirement, a daily Employee Period Snapshot fact was specified. This allowed HR to analyse their employee mix, tenure, appointments etc. as at the end of any given day. The measures could be aggregated over, analysed by, and broken down by the attributes of:

  • Calendar Period
  • Employee
  • Position
  • Work Area
  • Business Line
  • Department
  • Function
  • Age Range
  • Tenure Range
  • Appointment Reason
  • Termination Reason
  • Termination Calendar Period (Employees Terminated in Period)
  • Planned Termination Calendar Period
  • Start Calendar Period (Employees Started in Period)

Challenges

There were 2 challenges in providing the required information:

  1. Employee Tenure and Appointment could be back dated in the source system. For example, An Employee’s appointment to a role might be back dated a month. At the time the update was made, the data warehouse would already had a month of data with the employee still in their old role. Similar back dated termination and start date changes could be made. For this reason, the ETL was set up to always recalculate the last 365 days for every employee.
  2. Most measures for any given day and employee combination (over the 365-day update period) were calculated based on the latest version of the employee. But some measures (e.g. Sick leave, Annual leave) are calculated, based on the version of the Employee on the given day. This meant at the staging layer we needed to keep a history of all employee changes, and produce a current view and an “As At” (the given day) view of Employee. These two view were then integrated at the data warehouse level.

The Solution

Staging

Using Dimodelo Data Warehouse Studio we first persisted the Employee in a persistent staging table. Persistent staging tables capture the full update history of the Employee source table. In-fact, all the raw staging tables used by the fact were persistent staging tables.

We then defined 3 materialized views over the Employee persistent staging table.

  1. The Employee Latest Materialized View joins the past 365 days from the Calendar staging table to the latest version of each Employee. To get the latest version of the Employee simply filter the Employee Persistent Staging Table Where Row_Is_Latest = 1.
  2. The Employee As-At Materialized view joins the past 365 days from the Calendar staging table to the version of each Employee that was effective on that day. We use this view as the source of our Sick Leave and Annual Leave measures later in the Fact. There is only one version of each employee effective on any given day. To join on Effective date, we did the following join:
Select …
FROM stg.C_Calendar C
JOIN [psg].[P_EmployeeP] E
-- Where calendar date falls within the effective date range of the Employee row
-- and calendar date between the employee start and termination date
ON C.Full_Date BETWEEN E.StartDate and COALESCE(E.TerminationDate, E.PlannedTerminationDate, '21000101')
AND (C.Full_Date >= E.Row_Effective_Date AND C.Full_Date < E.Row_End_Date)
  1. The Appointment Materialized View joins Calendar (365 days), Employee, Appointment and Work Area, also for the latest version of the Employee. This view reflected the Primary appointment and attributes for that day. Only one appointment is returned for each employee for each day. Where more than one appointment exists for a given day priority is determined by a rule set.

Fact

The Fact table now becomes relatively simple to define.

  • The business key of the Fact is Employee Code and Day Date.
  • The staging sources of the Fact is the three materialized views joined on Employee Code and Day Date. As an aside, we defined these views as materialized views instead of just simple views because of performance. Joining across views, in SQL Server has terrible performance. Materialized views are materialized as tables in the staging layer, which perform much better.
  • The measures that are derived from the latest version of the employee are mapped to the “Employee Latest Materialized View”.
  • The measures that are derived from the version of the Employee on the given day are mapped to the “Employee As-At Materialized View”.
  • The appoint measures are derived from the “Appointment Materialized View”

In addition, in the Fact Transform, we needed to filter the Fact to the same 365-day period, so that the ETL doesn’t mistake Employee/Day combinations outside (before) the 365 days’ as deleted. This is achieved in Dimodelo by putting a filter expression on the Fact in the Match On tab of the Transform dialog.

The ETL code generated by Dimodelo (Stored Procedure) manages the required updates to the Fact internally including any required back dated updates to fact rows.

This approach is only possible because of persistent staging tables. The required history just isn’t available in normal transient staging tables. It demonstrates that persistent staging tables enable new kinds of data management possibilities.

Dimodelo solutions has in-depth experience in HR Data Warehouses along with many other domains. If you would like help with your Data Warehouse implementation please contact us through our contact page. You can accelerate your Data Warehouse development cycle using our Data Warehouse Automation tool. Download the Dimodelo Data Warehouse Studio free trial and community edition.

Links:

Case Study 1 – Payroll Fact table. 

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.