Tuesday, November 16, 2010

Handling Type II Dimension with the MERGE Statement

Microsoft attempted to make our lives easier with the Slowly Changing Dimension task in SSIS.  However, as we all know the task has some major performance issues.  So they introduced the MERGE statement.  When I initially started using the MERGE statement I considered it as a great solution to handle Type I dimensions, but I did not consider it as a solution for Type II.  Recently, I was chatting with Brian Knight and he said that the MERGE statement could handle Type II dimensions.  After spending some time with Books Online I realized that you could SELECT the OUTPUT data from the MERGE Statement.  This set up the perfect mechanism for handling a Type II dimension.  This is how I did it.

Running the examples requires the AdventureWorks2008R2 database.  To get started with the explanation run the following script:

USE tempdb


IF(OBJECT_ID('dbo.Product')) IS NOT NULL

    DROP TABLE dbo.Product


CREATE TABLE dbo.Product


    ProductID int identity(1,1),

    ProductNumber varchar(50),

    ProductName varchar(100),

    Color varchar(30),

    ListPrice decimal(12,2)




INSERT INTO dbo.Product






FROM AdventureWorks2008R2.Production.Product




IF(OBJECT_ID('dbo.DimProduct')) IS NOT NULL

    DROP TABLE dbo.DimProduct


CREATE TABLE dbo.DimProduct


    ProductSK int identity(1,1),

    ProductAK varchar(50),

    ProductName varchar(100),

    Color varchar(30),

    ListPrice decimal(12,2),

    EffectiveDate datetime,

    ExpirationDate datetime




INSERT INTO dbo.DimProduct(ProductAK, ProductName, Color, ListPrice, EffectiveDate)







FROM dbo.Product


The above scripts creates an operational Product table and initially loads the data into the DimProduct table.  Now let's assume that ListPrice is a Type II attribute.  In other words, if the ListPrice changes, the current effective row should be expired and a new row should be inserted into the dimension as the effective row.  This new row will contain the changed data.  To simulate a Type II change run the following script:

UPDATE dbo.Product


    ListPrice = 5.00


    ProductNumber IN ('AR-5381','BA-8327')


The script will change the ListPrice for two products in the operational table.  After the updates have been executed, run the following script:

   1:  INSERT INTO dbo.DimProduct(ProductAK, ListPrice, Color, ProductName, EffectiveDate)

   2:  SELECT ProductNumber, ListPrice, Color, ProductName, EffectiveDate

   3:  FROM 

   4:  (

   5:      MERGE dbo.DimProduct dp

   6:      USING dbo.Product p

   7:          ON dp.ProductAK = p.ProductNumber


   9:          INSERT (ProductAK, ProductName, Color, ListPrice, EffectiveDate)

  10:          VALUES (p.ProductNumber, p.ProductName, p.Color, p.ListPrice, '1/1/1900')

  11:      WHEN MATCHED 

  12:          AND ExpirationDate IS NULL 

  13:          AND (dp.ListPrice <> p.ListPrice) THEN

  14:          UPDATE

  15:              SET

  16:                  dp.ExpirationDate = convert(datetime, getdate(), 101)

  17:      OUTPUT $Action MergeAction, p.ProductNumber, p.ListPrice, p.Color, p.ProductName, convert(datetime, getdate(), 101) EffectiveDate

  18:  ) MergeOutput

  19:  WHERE

  20:      MergeAction = 'Update';

Let's start the explanation in the inside of the query.  On line 12 the current effective row is identified.  You may need to change this depending on your Type II dimension design patterns.  Your dimensions may use a specific date as the expiration date and you may also include a ActiveFlag bit column.  Then on line 13 I verify that the column specified as a Type II attribute has actually changed.  Finally, on line 16 the row is expired.  Now you may be thinking how are you going to add the new current row for the corresponding expired rows.  That was the same thing I was thinking.

On line 17 I output any of the Inserts or Updates.  In addition, I added a column that acts as the Effective Date.  The thing that I did not know was that you can SELECT the OUTPUT data from the MERGE statement and INSERT it into a table.  This is accomplished in lines 1 and 2.  Essentially, I derived a table from the OUTPUT of the MERGE statement.  Lastly, I included a WHERE clause to ensure that I only inserted the data that was UPDATED.  That pretty much did it.  To verify the changes run the following query:


FROM dbo.DimProduct


    ProductAK IN ('AR-5381','BA-8327')

Talk to you soon,

Patrick LeBlanc, SQL Server MPV, MCTS

No comments:

Post a Comment