Data Matching

Data Matching – Some hints and Tips

When performing Data Matching, I find that it helps if you create match keys on common elements if you are looking at matching / merging data during data migration.

For example if you are matching post codes you may want to normalise all postcodes into the same format e.g. remove all spaces prior to doing any matching.

For addresses it may be worth creating a string match key from the address but here you have to be caregul since if you strip unwanted characters like ‘-‘ out of house numbers 1-11 becomes the same as 111. In particular this is important if you are using any fuzzy transforms in SSIS

Interestingly – in the past I’ve used Jaro Winkler routines heavily in data matching routines to provide matching responses on customer names and also on address details – this is now one of the routines available in Master Data Services – unfortunately it’s restricted to certain versions of SQL in particular:-

•SQL Server 2012 Business Intelligence (64-bit) x64

•SQL Server 2012 Enterprise (64-bit) x64 – Upgrade from SQL Server 2008 R2 Enterprise only

•SQL Server 2012 Developer (64-bit) x64

•Microsoft SQL Server 2008 R2 Enterprise (64-bit) x64

•Microsoft SQL Server 2008 R2 Developer (64-bit) x64

Alternatively if CLR is enabled there are quite a few resources on the net for the CLR version of the algorithm and there is also a TSQL version of the routine in SQL Server Central which you could use.

I’ve performance tested some of the c# clr versions but haven’t done any testing against the tsql version there – athough it’s worth pointing out that there are optimisations for some of the functions used there at SQL Server Central