Category Archives: SSIS

SQL Server SSIS Tips

Excel files with mixed data

Ever find that when you import data from excel files with SSIS that you sometimes lose data? Well one of the issues is that SSIS only samples a small number of rows – well it’s fully documented in http://technet.microsoft.com/en-us/library/ms141683.aspx.  In summary – adding IMEX=1 to the source input will resolve a lot of these issues automatically, otherwise you have a few options

  1. Because of the sample size issue you can reorder the source spreadsheet to get a mixture of the correct data types in the first 8 rows (or use the tip in the technet article to change the sample size).  
  2. Create a new column for the offending column in the source data and use =text(cell) in this column – this will force SSIS to import the data as text (doesnt always work but sometimes does)
  3. Convert the spreadsheet to tab delimited (which also gives you the option of doing a straight bulk insert)
  4. Change connection string to add imex=1 this will force the connection to be reset but resolves most issues ** EXCEPT ** when there is text data in a row that is mostly numeric!

The biggest issue is that when SSIS determines the type for the data thats it – you can’t amend it.

SSIS – Modifying column names in flat file

When column names are not in the first row of the source file – it can be a pain to change all the column names in the SSIS package – if you’ve only got a couple of columns of data – it’s ok but when you have 100’s of columns then it’s extremely painful.
One work around for this is to point your connection at the flat file source – then save the package and use view code to get the XML schema. At this point your columns are named column 0 to column n.

The following tip ONLY works if the column names are in the SAME order as your file – or any additional columns in the table are at the beginning / end of the list. Esnure results are set to text not grids!

selectName, Colidfrom syscolumns where id =object_id(‘targettable’) orderby colId asc

Use the alt key to select then column names (including trailing spaces to make sure you get “all” the name) from the pane and copy the selection. Do this for columns 0-9

Then go into the code view in Visual Studio and find Column 0 – Use alt  + shift to select columns 0-9 and replace with the contents from management studio.

Repeat the process for columns 10-99 in Management studio and visual studio then again for columns 100 – 999 (hopefully you won’t have to repeat for lines 1000 = 9999!

One minor quirk – the last column has a slightly different format – so manually correct that – or just perform this exercise excluding the last column – the last column has objecttype in line with the column list!

Finally select the Columns you have just amended in VS – again using alt + shift (or alt + mouse) and do a search replace to remove redundant spaces.

Having just had to create an SSIS package with 400 + columns – it’s a lot easier than doing it manually!

SSIS Unpivoting data

Had a request to load some prices into SQL Server from an excel spreadsheet – unfortunately the data was in a pivoted format without the raw pivot data.

 

ssis1 400

This becomes the pivot description for the pivot value – in the example below I wanted to pivot on a variety of durations and get back the prices so set pivot key value to the duration.So having the spreadsheet as a source I created a table as the output – using the new table from oledb desination – modified the table name and we have a new table of the unpivoted data

Much easier if the spreadsheet contains the original unpivoted data – but this gets you the same results, for this client I also added another data transform task for data from another sheet in the same spreadsheet and finally a transform to merge the data from both sources into their data tables

Have fun with it – it’s easier than loading the data in manually

Loading XML Files

Should be a lot easier than it is, but if you look for this on the web, you will get confused.

 Put simple – if you have an embedded DTD in the file it won’t work within SSIS – so the easiest solution – don’t use SSIS use

SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

 This translates as create a table with a column of data type XML

INSERT INTO T(XmlCol)
SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

If there is an embedded dtd you need to convert (and the column is called bulkcolumn)

INSERT INTO T(XmlCol)
SELECT convert(XML,bulkcolumn,2) FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

Using convert(xml,columnname,2) gives limited dtd support

“Enable limited internal DTD subset processing.

If enabled, the server can use the following information that is provided in an internal DTD subset to perform nonvalidating parse operations.

  • Defaults for attributes are applied.
  • Internal entity references are resolved and expanded.
  • The DTD content model will be checked for syntactical correctness.

The parser will ignore external DTD subsets. It also does not evaluate the XML declaration to see whether the standalone attribute is set yes or no, but instead parses the XML instance as if it is a stand-alone document.

This gives you a single row holding the entire document – not very usable (example is for an xml feed for shopwhale) – we can use cross apply to convert this into xml rows in a new table

— Cross apply to get 1 row per product
SELECT   NewTable.ProductDetails.query(‘.’) AS ProductName
into tTransformed
FROM t CROSS APPLY XmlCol.nodes(‘//product’) AS NewTable(ProductDetails)

We now have all the xml rows in the table and can then extract the components – if you do a select top n from the transformed table you can see the structure – cliick on the xml and it’s loaded in a new pane in studio.

SELECT ProductName.value(‘(//product/offer_id)[1]’, ‘int’) as Offer_id
,ProductName.value(‘data(//product/merchant_category)[1]’,’nvarchar(60)’) as merchant_category
,ProductName.value(‘data(//product/update_date)[1]’,’datetime’) as update_date
,ProductName.value(‘data(//product/regular_price)[1]’,’money’) as price
,ProductName.value(‘data(//product/model_number)[1]’,’nvarchar(38)’) as Model_number
into tTransformedValues
FROM tTransformed

Hey presto we have the rows in a table in a usable format – wrap this into a stored procedure – pass in the name of the xml document and you can end up with a table that you can use!