Tuesday, November 16, 2010

Handling Type II Dimension with the MERGE Statement

Microsoft attempted to make our lives easier with the Slowly Changing Dimension task in SSIS.  However, as we all know the task has some major performance issues.  So they introduced the MERGE statement.  When I initially started using the MERGE statement I considered it as a great solution to handle Type I dimensions, but I did not consider it as a solution for Type II.  Recently, I was chatting with Brian Knight and he said that the MERGE statement could handle Type II dimensions.  After spending some time with Books Online I realized that you could SELECT the OUTPUT data from the MERGE Statement.  This set up the perfect mechanism for handling a Type II dimension.  This is how I did it.

Running the examples requires the AdventureWorks2008R2 database.  To get started with the explanation run the following script:

USE tempdb


GO


IF(OBJECT_ID('dbo.Product')) IS NOT NULL


    DROP TABLE dbo.Product


GO


CREATE TABLE dbo.Product


(


    ProductID int identity(1,1),


    ProductNumber varchar(50),


    ProductName varchar(100),


    Color varchar(30),


    ListPrice decimal(12,2)


)


GO


 


INSERT INTO dbo.Product


SELECT 


    ProductNumber,


    Name,


    Color,


    ListPrice


FROM AdventureWorks2008R2.Production.Product


 


GO


 


IF(OBJECT_ID('dbo.DimProduct')) IS NOT NULL


    DROP TABLE dbo.DimProduct


GO


CREATE TABLE dbo.DimProduct


(


    ProductSK int identity(1,1),


    ProductAK varchar(50),


    ProductName varchar(100),


    Color varchar(30),


    ListPrice decimal(12,2),


    EffectiveDate datetime,


    ExpirationDate datetime


)


GO


 


INSERT INTO dbo.DimProduct(ProductAK, ProductName, Color, ListPrice, EffectiveDate)


SELECT 


    ProductNumber,


    ProductName,


    Color,


    ListPrice,


    '1/1/1900'


FROM dbo.Product


GO






The above scripts creates an operational Product table and initially loads the data into the DimProduct table.  Now let's assume that ListPrice is a Type II attribute.  In other words, if the ListPrice changes, the current effective row should be expired and a new row should be inserted into the dimension as the effective row.  This new row will contain the changed data.  To simulate a Type II change run the following script:




UPDATE dbo.Product


SET 


    ListPrice = 5.00


WHERE


    ProductNumber IN ('AR-5381','BA-8327')


GO




The script will change the ListPrice for two products in the operational table.  After the updates have been executed, run the following script:




   1:  INSERT INTO dbo.DimProduct(ProductAK, ListPrice, Color, ProductName, EffectiveDate)


   2:  SELECT ProductNumber, ListPrice, Color, ProductName, EffectiveDate


   3:  FROM 


   4:  (


   5:      MERGE dbo.DimProduct dp


   6:      USING dbo.Product p


   7:          ON dp.ProductAK = p.ProductNumber


   8:      WHEN NOT MATCHED THEN


   9:          INSERT (ProductAK, ProductName, Color, ListPrice, EffectiveDate)


  10:          VALUES (p.ProductNumber, p.ProductName, p.Color, p.ListPrice, '1/1/1900')


  11:      WHEN MATCHED 


  12:          AND ExpirationDate IS NULL 


  13:          AND (dp.ListPrice <> p.ListPrice) THEN


  14:          UPDATE


  15:              SET


  16:                  dp.ExpirationDate = convert(datetime, getdate(), 101)


  17:      OUTPUT $Action MergeAction, p.ProductNumber, p.ListPrice, p.Color, p.ProductName, convert(datetime, getdate(), 101) EffectiveDate


  18:  ) MergeOutput


  19:  WHERE


  20:      MergeAction = 'Update';



Let's start the explanation in the inside of the query.  On line 12 the current effective row is identified.  You may need to change this depending on your Type II dimension design patterns.  Your dimensions may use a specific date as the expiration date and you may also include a ActiveFlag bit column.  Then on line 13 I verify that the column specified as a Type II attribute has actually changed.  Finally, on line 16 the row is expired.  Now you may be thinking how are you going to add the new current row for the corresponding expired rows.  That was the same thing I was thinking.



On line 17 I output any of the Inserts or Updates.  In addition, I added a column that acts as the Effective Date.  The thing that I did not know was that you can SELECT the OUTPUT data from the MERGE statement and INSERT it into a table.  This is accomplished in lines 1 and 2.  Essentially, I derived a table from the OUTPUT of the MERGE statement.  Lastly, I included a WHERE clause to ensure that I only inserted the data that was UPDATED.  That pretty much did it.  To verify the changes run the following query:




SELECT *


FROM dbo.DimProduct


WHERE


    ProductAK IN ('AR-5381','BA-8327')




Talk to you soon,



Patrick LeBlanc, SQL Server MPV, MCTS

No comments:

Post a Comment