Data transfer using Azure Data Factory

Azure Data Factory can be a useful alternative to SSIS in the data migration life cycle; however, there are some areas where the “current” implementation lets it down slightly.

Mapping Data Flows

One major difference if you’re used to SSIS is the requirement to pre-create the output tables – there is no easy “click New” on a data flow destination here.
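
As a minimal sketch of pre-creating a destination table, assuming an Azure SQL target reachable with pyodbc (the connection string, schema and table names below are placeholders):

    # Minimal sketch: pre-create the destination table before the data flow runs.
    # Assumes an Azure SQL destination and that the 'stg' schema already exists;
    # all names and credentials are placeholders.
    import pyodbc

    conn_str = (
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:myserver.database.windows.net,1433;"
        "Database=StagingDb;Uid=loader;Pwd=<password>;Encrypt=yes;"
    )

    create_sql = """
    IF OBJECT_ID('stg.Customer', 'U') IS NULL
    CREATE TABLE stg.Customer (
        CustomerId   INT           NOT NULL,
        CustomerName NVARCHAR(200) NULL,
        LoadedAtUtc  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    """

    with pyodbc.connect(conn_str, autocommit=True) as conn:
        conn.execute(create_sql)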

Templates

Standard templates can be used for copying multiple sources to their relevant destinations. This is easy to set up, but has the downside of being a nightmare to maintain, as you then have to edit the JSON that the template creates.
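
A minimal sketch of that maintenance step, assuming the pipeline has been exported to a local JSON file and using illustrative property paths (the exact structure depends on the template used):

    # Minimal sketch: inspect and patch the template-generated pipeline JSON
    # outside the portal. File name and property paths are illustrative.
    import json

    with open("pipeline_CopyMultipleTables.json") as f:
        pipeline = json.load(f)

    # Walk the copy activities and print the source-to-sink dataset pairs
    for activity in pipeline["properties"]["activities"]:
        if activity["type"] == "Copy":
            src = activity["inputs"][0]["referenceName"]
            dst = activity["outputs"][0]["referenceName"]
            print(f"{activity['name']}: {src} -> {dst}")

    # ... edit the relevant activity here, then write the JSON back ...
    with open("pipeline_CopyMultipleTables.json", "w") as f:
        json.dump(pipeline, f, indent=2)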

Triggers

Again, triggers to run the data flow can be set up – either scheduled, tumbling window or event based, e.g. when a file arrives in blob storage, start the flow. However, changes to the data flow require the trigger to be set up again – it doesn’t automatically pick up changes to the data flow.
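
A minimal sketch of stopping and restarting a trigger after republishing, assuming the azure-identity and azure-mgmt-datafactory Python packages (resource names are placeholders and method names can vary slightly between SDK versions):

    # Minimal sketch: bounce a trigger so it picks up the republished data flow.
    # Subscription, resource group, factory and trigger names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    subscription_id = "<subscription-id>"
    resource_group = "rg-data-migration"
    factory_name = "adf-migration"
    trigger_name = "trg_daily_load"

    client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Stop the trigger, publish the updated data flow, then start it again
    client.triggers.begin_stop(resource_group, factory_name, trigger_name).result()
    # ... publish the updated pipeline/data flow here ...
    client.triggers.begin_start(resource_group, factory_name, trigger_name).result()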

Processing text files using Azure Data Factory

Data loaded from source files

Data Factory makes life easier when processing multiple source files, e.g. where the source system is not accessible and reports have to be used to extract the data, or in one case where all the different offices had their own system and the data ended up being loaded from manual spreadsheets.

Excel format files

These are not supported in Data Factory, but it’s fairly easy to save them as text files, e.g. CSV or tab-delimited files. Watch out for saving Excel as tab delimited if there are Unicode characters or non-standard language sets in the source system – Excel’s tab-delimited export is not Unicode.
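
A minimal conversion sketch, assuming pandas (with openpyxl) is available and using example file and sheet names:

    # Minimal sketch: convert an Excel workbook to UTF-8 CSV before loading it
    # with Data Factory. File and sheet names are examples.
    import pandas as pd

    df = pd.read_excel("office_extract.xlsx", sheet_name="Sheet1")

    # Write as UTF-8 CSV so Unicode characters survive, unlike Excel's own
    # tab-delimited export
    df.to_csv("office_extract.csv", index=False, encoding="utf-8")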

Blob Storage

Push the files up into blob storage, then add shared access signature (SAS) keys and process all the files in one go.
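
A minimal sketch of that upload-and-SAS step, assuming the azure-storage-blob Python package; the account, container and key values are placeholders:

    # Minimal sketch: push local extract files into blob storage and generate a
    # shared access signature for Data Factory to read them.
    from datetime import datetime, timedelta
    from pathlib import Path

    from azure.storage.blob import (
        BlobServiceClient,
        ContainerSasPermissions,
        generate_container_sas,
    )

    account_name = "migrationstorage"
    account_key = "<storage-account-key>"
    container_name = "source-files"

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )
    container = service.get_container_client(container_name)

    # Upload every CSV extract in one go
    for path in Path("extracts").glob("*.csv"):
        with open(path, "rb") as data:
            container.upload_blob(name=path.name, data=data, overwrite=True)

    # Read/list SAS valid for a week, for the Data Factory linked service
    sas_token = generate_container_sas(
        account_name=account_name,
        container_name=container_name,
        account_key=account_key,
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.utcnow() + timedelta(days=7),
    )
    print(sas_token)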

Fetching data from Oracle with SSIS

The Attunity connector in SSIS is probably one of the fastest methods of getting data out of Oracle – it knocks spots off using linked servers. However, make sure you use the VS2017 version of SSDT, otherwise you will waste a lot of time thinking the Oracle configuration is wrong.

BlockSize is important – the default fetches 100 rows at a time, and on a 90 GB table in Oracle changing this to a fetch size of 2,500 can knock a lot of time off the extract.
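
The same fetch-size principle can be illustrated outside SSIS, e.g. with the cx_Oracle Python driver, whose default arraysize is also 100 rows per round trip; the connection details and table name below are placeholders:

    # Minimal sketch of the fetch-size principle using cx_Oracle rather than the
    # Attunity connector: more rows per round trip means a faster bulk extract.
    import cx_Oracle

    conn = cx_Oracle.connect("extract_user", "<password>", "orahost:1521/ORCL")
    cursor = conn.cursor()

    # Default arraysize is 100 rows per fetch; bump it up for a bulk extract
    cursor.arraysize = 2500

    cursor.execute("SELECT * FROM big_schema.big_table")
    for rows in iter(lambda: cursor.fetchmany(), []):
        pass  # write each batch out to the staging area here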

Make sure that the Oracle connectivity works before doing any tests, and make sure you have the 32-bit drivers for development.

And finally, don’t forget to keep a copy of the package in Git or your preferred source control.