Within the context of enterprise data warehousing, the effective management of historical data is essential for supporting informed business decision-making. Slowly Changing Dimension (SCD) Type 2 is a widely adopted technique for addressing changes in data over time.
A brief overview of Slowly Changing Dimensions Type 2
Slowly Changing Dimensions Type 2 (SCD Type 2) is a common solution for managing historical data. To ensure clarity, I’ll briefly recap SCD Type 2.
A Type 2 of SCD retains the full history of values. When the value of a chosen attribute changes, the current record is closed. A new record is created with the changed data values and this new record becomes the current record.
Existing Dimension data
surrokey depID dep IsActivity
1 1001 IT 1
2 1002 HR 1
3 1003 Sales 1
Dimension changed and new data comes
depId dep
1003 wholesale <--- depID is same, name changed from "Sales" to "wholesale"
1004 Finance <--- new data
Mark existing dimensional records as expired (inactive); create a new record for the current dimensional data; and insert new incoming data as new dimensional records.
Now, the new Dimension will be:
surrokey depID dep IsActivity
1 1001 IT 1 <-- No action required
2 1002 HR 1 <-- No action required
3 1003 Sales 0 <-- mark as inactive
4 1003 wholesale 1 <-- add updated active value
5 1004 Finance 1 <-- insert new data
This solution demonstrates the core concepts of a Slowly Changing Dimension (SCD) Type 2 implementation. While it covers the major steps involved, real-world production environments often have more complex requirements. When designing dimension tables (e.g., the dep table), I strongly recommend adding more descriptive columns to enhance clarity. Specifically, including [Start_active_date] and [End_active_date] columns significantly improves the traceability and understanding of dimension changes over time.
Implementing SCD Type 2
Step 1: Create a Dimension Table- dep
# Create table
create table dep (
surrokey int IDENTITY(1, 1),
depID int,
dep varchar(50),
IsActivity bit);
# Insert data,
surrokey depID dep IsActivity
1 1001 IT 1
2 1002 HR 1
3 1003 Sales 1
Step 2: Create Data Flow
Add the source dataset. dataset should point to file which is located in your source layer.



We have 2 data rows. That means depID =1003, updated value, a new comes depID=1004 need add into dimension table.
Step 3: Add derived column
Add derived column resource and add column name as isactive and provide the value as 1.


Step 4: Sink dimension data
Create a dataset point to SQL Server Database Table dep

Add a Sink use above dataset, SQLServer_dep_table

Configure the sink mappings as shown below


Step 5: Add SQL dataset as another source.

Step 6: Rename column from Database Table dep
Use select resource to rename columns from SQL table.
rename column name:
- depID –> sql_depID
- dep –> sql_dep
- Isactivity –> sql_IsActivity

Step 7: Lookup
Add lookup to join new dimension data that we have import in “srcDep” at “Step 2”
At this step, existing dimension table “Left Join” out the new coming dimension (need update info or new comes dimension values).
- existing dimension data, depID=1003 ,previously “dep” called “Sales” , now it need changing to “wholesales”


Step 8: filter out non-nulls
Add filter, filter out the rows which has non-nulls in the source file columns.
Filter expression : depID column is not null.
!isNull(depid)

This requires filtering the ‘lkpNeedUpdate’ lookup output to include only rows where the depID is not null.


Step 9: Select need columns
Since up stream “filterNonNull” output more columns,

Not all columns are required. The objective is to use the new data (containing depid and dep) to update existing information in the dimension table (specifically sql_depID, sql_dep, and sql_isActivity) and mark the old information as inactive.
Add a “SELECT” to select need the columns that we are going to insert or update in Database dimension table.

Step 10: add a new column and give its value = “0”
Add a deriver, set its value is “0” , means mark it as “inactive“


Step 11: alter row
Add a “Alter Row” to update row information.
configure alter conditions:
Update 1==1

Step 12 Sink updated information
we have updated the existing rows, mark it “0” as “inactive”. it time to save it into database dimension table.
Add a “Sink” point to database dimension table – dep

mapping the columns,
sql_depid ---> depID
sql_dep ---> dep
ActivityStatus ---> IsActivity


Step 13: Adjust Sink order
As there are two sinks, one designated for the source data and the other for the updated data, a specific processing order must be enforced.
Click on a blank area of the canvas, at “Settings” tag, configure them order.
1: sinkUpdated
2: sinkToSQLDBdepTable

Step 14: creata a pipeline
create a pipeline, add this data flow, run it.


SELECT TOP (5000) [surrokey]
,[depID]
,[dep]
,[IsActivity]
FROM [williamSQLDB].[dbo].[dep]
surrokey depID dep IsActivity
1 1001 IT 1
2 1002 HR 1
3 1003 Sales 0
4 1003 Wholesale 1
5 1004 Finance 1

Conclusion
In conclusion, we have explored the powerful combination of Slowly Changing Dimensions Type 2, it has provided you with a comprehensive understanding of how to effectively implement SCD Type 2 in your data warehousing projects, leveraging modern technologies and following industry best practices.
By implementing SCD Type 2 according to Ralph Kimball’s approach, organizations can achieve a comprehensive view of dimensional data, enabling accurate trend analysis, comparison of historical performance, and tracking of changes over time. It empowers businesses to make data-driven decisions based on a complete understanding of the data’s evolution, ensuring data integrity and reliability within the data warehousing environment.
Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca
(remove all space from the email account 😊)