Using a mapping table for lookups
Some of the key fields for mapping tables where for example it’s a merge and de-duplicate is an indicator which record is the destination record for example if you are migrating has multiple sources you need to include things like source system / sub system as well as the reference attributes and obviously the target key.
In most cases a single key field can be used to accommodate multiple source attributes – but if you are going for this approach it must be the least restrictive data type that is used – and you will have to be prepared to always cast the data to that data type (e.g. varchar / nvarchar)
Performance wise – it may be worth grouping the source data so that you can type the data in the lookup table better – that would give much better performance than casting all the data.
My personal preference for keys in a system is still the good old fashioned identity column – defined as either INT or BIGINT depending on the expected number of records / record growth although there is also a good case for using GUID’s.
In all cases, in a migration especially with merge, the keys will need to be pre-determined to avoid always having to join to all the source systems. This can be achieved many ways including using identity columns on work tables, row_number() / Dense_rank() functions or even using the output clause from any insert statements.
We will over time be publishing examples on all these options, however, the design of the mapping table will depend on a number of factors in the migration not least whether the old system uses codes for key values or has descriptions. Generally for either method the initial population of the table can be performed via a select distinct columnname from the source system.
Any joins to the lookup table need either to be done as updates or as Left joins to ensure that an absence of a code in a lookup table doesn’t cause the row to be dropped from the original source data.
Reports should be run regularly to verify that there are no null’s resulting from the left join (or update) since these indicate missing entries in the table.