Category Archives: Azure data factory

Data transfer using Azure Data Factory

Azure data factory can be a useful alternative to SSIS in the data migration life cycle, however, there are some areas where the “current” implementation lets it down sligthly.

Mapping Data Flows

One major difference if you’re used to SSIS is the requirement to pre-create the output tables – no easy click new on a data flow destination here.

Templates

Standard templates can be used for copying multiple sources to their relevant destinations. This is easy to set up but has the downside of being a nightmare to maintain as you then have to edit the JSON that the template creates

Triggers

Again triggers to run the data flow can be set up – either timed, tumbling window or event based e.g. when a file arrives in blob storage start the flow. However, changes to the data flow require the trigger to be set up again doesn’t automatically pick up changes to the data flow.

Processing text files using Azure Data Factory

Data loaded from source files

Data factory makes life easier to process multiple source files e.g. where the source system is not accessible and reports have to be used to extract the data or in one case where all the different offices had their own system and the data ended up being loaded from manual spreadsheets.

Excel format files

These are not supported in data factory but it’s fairly easy to just save these as text files e.g. csv files or tab delimited files watch out for saving excel as tab delimited if unicode characters or non standard language sets in the source system – tab delimited is not unicode….

Blob Storage

Push the files up into blob storage and then add shared access signature keys and process all the files in one go.