Microsoft attempted to make our lives easier with the Slowly Changing Dimension task in SSIS. However, as we all know the task has some major performance issues. So they introduced the MERGE statement. When I initially started using the MERGE statement I considered it as a great solution to handle Type I dimensions, but I did not consider it as a solution for Type II. Recently, I was chatting with Brian Knight and he said that the MERGE statement could handle Type II dimensions. After spending some time with Books Online I realized that you could SELECT the OUTPUT data from the MERGE Statement. This set up the perfect mechanism for handling a Type II dimension. This is how I did it.
Running the examples requires the AdventureWorks2008R2 database. To get started with the explanation run the following script:
USE tempdbGOIF(OBJECT_ID('dbo.Product')) IS NOT NULL
DROP TABLE dbo.Product
GOCREATE TABLE dbo.Product
(
ProductID int identity(1,1),
ProductNumber varchar(50), ProductName varchar(100), Color varchar(30), ListPrice decimal(12,2))
GOINSERT INTO dbo.ProductSELECT ProductNumber,
Name,
Color,
ListPrice
FROM AdventureWorks2008R2.Production.ProductGOIF(OBJECT_ID('dbo.DimProduct')) IS NOT NULL
DROP TABLE dbo.DimProduct
GOCREATE TABLE dbo.DimProduct
(
ProductSK int identity(1,1),
ProductAK varchar(50), ProductName varchar(100), Color varchar(30), ListPrice decimal(12,2),EffectiveDate datetime,
ExpirationDate datetime
)
GOINSERT INTO dbo.DimProduct(ProductAK, ProductName, Color, ListPrice, EffectiveDate)SELECT ProductNumber,
ProductName,
Color,
ListPrice,
'1/1/1900'FROM dbo.ProductGO
The above scripts creates an operational Product table and initially loads the data into the DimProduct table. Now let's assume that ListPrice is a Type II attribute. In other words, if the ListPrice changes, the current effective row should be expired and a new row should be inserted into the dimension as the effective row. This new row will contain the changed data. To simulate a Type II change run the following script:
UPDATE dbo.ProductSET ListPrice = 5.00
WHEREProductNumber IN ('AR-5381','BA-8327')
GO
The script will change the ListPrice for two products in the operational table. After the updates have been executed, run the following script:
1: INSERT INTO dbo.DimProduct(ProductAK, ListPrice, Color, ProductName, EffectiveDate)
2: SELECT ProductNumber, ListPrice, Color, ProductName, EffectiveDate
3: FROM
4: ( 5: MERGE dbo.DimProduct dp6: USING dbo.Product p
7: ON dp.ProductAK = p.ProductNumber
8: WHEN NOT MATCHED THEN
9: INSERT (ProductAK, ProductName, Color, ListPrice, EffectiveDate)10: VALUES (p.ProductNumber, p.ProductName, p.Color, p.ListPrice, '1/1/1900')
11: WHEN MATCHED
12: AND ExpirationDate IS NULL
13: AND (dp.ListPrice <> p.ListPrice) THEN
14: UPDATE
15: SET
16: dp.ExpirationDate = convert(datetime, getdate(), 101)
17: OUTPUT $Action MergeAction, p.ProductNumber, p.ListPrice, p.Color, p.ProductName, convert(datetime, getdate(), 101) EffectiveDate
18: ) MergeOutput19: WHERE
20: MergeAction = 'Update';
Let's start the explanation in the inside of the query. On line 12 the current effective row is identified. You may need to change this depending on your Type II dimension design patterns. Your dimensions may use a specific date as the expiration date and you may also include a ActiveFlag bit column. Then on line 13 I verify that the column specified as a Type II attribute has actually changed. Finally, on line 16 the row is expired. Now you may be thinking how are you going to add the new current row for the corresponding expired rows. That was the same thing I was thinking.
On line 17 I output any of the Inserts or Updates. In addition, I added a column that acts as the Effective Date. The thing that I did not know was that you can SELECT the OUTPUT data from the MERGE statement and INSERT it into a table. This is accomplished in lines 1 and 2. Essentially, I derived a table from the OUTPUT of the MERGE statement. Lastly, I included a WHERE clause to ensure that I only inserted the data that was UPDATED. That pretty much did it. To verify the changes run the following query:
SELECT *FROM dbo.DimProductWHEREProductAK IN ('AR-5381','BA-8327')
Talk to you soon,
Patrick LeBlanc, SQL Server MPV, MCTS

No comments:
Post a Comment