/
Conversion: Data Staging methodology

Conversion: Data Staging methodology

Although this guide will use Benchmark as an example, the methods can be used to implement any product.

In order to properly convert the data and guarantee consistency and integrity across systems, we need to ensure the relationships are being respected through the entire migration process. To achieve a robust process that is scalable and repeatable across all projects, we will be employing standard staging tables.

 

While there are many approaches that can be applied to the staging tables we will be implementing, I will be breaking down the most common approach that we have already put into place for some of the Projects in our organization. In the future we can continue to evolve it into more complex versions as demand arises.

 

Star-Schema Import: The data will be staged, data in Parent-Children tables will be related to each other using temporary ID’s and imported at the same time, the generated Benchmark ID’s will later be copied back to the Parent staging tables and used to join the children tables.

 

Regardless of the staging approach, we should generally be able to use the same Staging tables. Consistently using the same staging objects allows us to create a library of validation procs and other logic that we can later re-use and scale for other projects. More on that later.

Staging Data

 

Consider the following staging tables:

PTG_ImportStaging_tblParty:

CREATE TABLE [dbo].[PTG_ImportStaging_tblParty]( [ImportPartyID] [int] IDENTITY(1,1) NOT NULL, [OriginalID] [int] NULL, [InstanceID] [int] NULL, [PartyCode] nvarchar NULL, [PrimaryPartyType] nvarchar NULL, PartyID INT)

PTG_ImportStaging_tblPartyName:

CREATE TABLE [dbo].[PTG_ImportStaging_tblPartyName]( [ImportPartyID] [int] NULL, [LastOrBusinessName] nvarchar NULL, [FirstName] nvarchar NULL, [MiddleName] nvarchar NULL, [PrefixName] nvarchar NULL, [SuffixName] nvarchar NULL, [PartyNameType] nvarchar NULL, [SoundexCode] nvarchar NULL ) ON [PRIMARY] GO

 

 

Notice the following:

  1. PTG_ImportStaging_tblParty.ImportPartyID is an identity, but PTG_ImportStaging_tblPartyName.ImportPartyID is not

  2. PTG_ImportStaging_tblPartyName.ImportPartyID will be manually populated as the records are inserted.

  3. PTG_ImportStaging_tblParty.OriginalID is used to store the source system record unique pointer.

  4. PTG_ImportStaging_tblParty.PartyID will be used to store the Benchmark ID that will be generated once the records are inserted.

 

 

Then our data staging step for PTG_ImportStaging_tblParty, looks as follows:

We can say tblParty is a parent table, since tblPartyName, tblPartyAddress and tblPartyPhone use tblParty.PartyID to relate the rows, therefore the latter are childrens.

INSERT INTO PTG_ImportStaging_tblParty ( OriginalID , InstanceID , PartyCode , PrimaryPartyType) SELECT P.SourceTableID as OriginalID ,1 as InstanceID ,'' as PartyCode ,'CLK' as PrimaryPartyType ,'' as PrimaryPartySubType FROM CustomerPartySourceTable PS

 

The data staging step for the Children table PTG_ImportStaging_tblPartyName, looks as follows:

Note

  • We are Joining the source table that contains the Party Names to the staging tblParty table where we have staged the parent data on the Source PartyID and inserting the staged ImportPartyID. This will ensure we’re only loading the names for which parties we have already staged and are therefore not loading parentless rows or extra rows that were not included in the logic used to stage the parent.

INSERT INTO PTG_ImportStaging_tblPartyName ( ImportPartyID, LastOrBusinessName ,FirstName ,MiddleName ,PrefixName ,SuffixName ,PartyNameType ,SoundexCode ) --Names SELECT ISP.ImportPartyID ,PN.caa479400011 as LastOrBusinessName ,PN.caa479400012 as FirstName ,PN.caa479400013 as MiddleName ,NULL as PrefixName ,PN.caa479400014 as SuffixName ,NULL as PartyNameType ,PN.caa47940004 as SoundexCode FROM PTG_ImportStaging_tblParty ISP INNER JOIN CustomerPartyNameSourceTable PN ON ISP.OriginalID = PN.SourcePartyID WHERE ISP.InstanceID = 1

Transforming and Validating Data

 

The use of standardized staging tables allows us to standardize validation tasks across projects. The standardization then enables us to reuse code in order to streamline the process by logic, such as incorporating the Lookup Group Mapper in our Package to check that the data is being staged has been configured properly.

i.e., The rows in PTG_ImportStaging_tblParty.PrimaryPartyType have a properly configured and matching value in the corresponding PrimaryPartyType Lookup Group. The assigned engineers can then easily check the exceptions and work through the reported exceptions with ease.

 

PTG_ImportStaging_tblParty staged data, with the Benchmark pointer column (PartyID) still NULL because no Inserts have been made into Benchmark.

 

Inserting to destination tables

Once the data is ready to be loaded, we can make the use of a MERGE INTO to be able to insert rows as we copy back the generated ID’s to the staging tables through an OUTPUT clause.

 

/*Declare a staging table to store the generated PartyID's for each staged row*/ DECLARE @ImportStagingPartyIds TABLE([PartyID] INT, [ImportPartyID] INT) MERGE INTO tblParty USING PTG_ImportStaging_tblParty AS ISP ON 1 = 0 WHEN NOT MATCHED AND ISP.InstanceID = 1 THEN INSERT ( PartyCode , PrimaryPartyType , Name) VALUES (ISP.PartyCode , ISP.PrimaryPartyType , ISP.Name) /*Put the PartyID and the ImportID back on a Table variable*/ OUTPUT Inserted.PartyID, ISP.ImportPartyID INTO @ImportStagingPartyIds; /*Update the staging table with the created PartyID*/ UPDATE ISP SET ISP.PartyID = NI.PartyID FROM @ImportStagingPartyIds NI INNER JOIN PTG_ImportStaging_tblParty ISP ON NI.ImportPartyID = ISP.ImportPartyID WHERE ISP.InstanceID = @InstanceID

 

In the following screenshot we can see the inserted tblParty records in the top result set. As well as PTG_ImportStaging_tblParty, notice how the above logic updates our staging table PartyID with the created ID’s.

We see the Inserted tblParty record, and the PTG_ImportStaging_tblParty.PartyID has been updated with the created ID by the logic above.

 

The children insert becomes then straight forward:

Similarly to the join in the staging process, to insert we join to the Parent table on the staged row ID (ImportPartyID) to get the Benchmark PartyID into the destination table.

--Insert Party Name records INSERT INTO tblPartyName (PartyID, LastOrBusinessName, FirstName, MiddleName, PrefixName, SuffixName, PartyNameType, SoundexCode, CreateByUserID, CreateDate) SELECT ISP.PartyID , PN.LastOrBusinessName , PN.FirstName , PN.MiddleName , PN.PrefixName , PN.SuffixName , PN.PartyNameType , PN.SoundexCode , @ConversionUser , GETDATE() FROM ImportStaging_tblPartyName PN INNER JOIN ImportStaging_tblParty ISP ON PN.ImportPartyID = ISP.ImportPartyID WHERE ISP.InstanceID = @InstanceID

 

We can use the above method for any Parent-Children relationships such as tblCase and tblCaseCharge, tblCaseEvent, etc. Noting that tblCaseParty will have tblParty and tblCase as parents.

 

Example Package:

\TFS\Conversion\BMBrevard\BMBrevard_Pipeline\BMBrevard_Pipeline\Parties.dtsx

Tables and objects:

\TFS\Conversion\Integration_Pipeline\Import_Staging

 

Related content