Loading XML Files

Should be a lot easier than it is, but if you look for this on the web, you will get confused.

 Put simple – if you have an embedded DTD in the file it won’t work within SSIS – so the easiest solution – don’t use SSIS use

SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

 This translates as create a table with a column of data type XML

INSERT INTO T(XmlCol)
SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

If there is an embedded dtd you need to convert (and the column is called bulkcolumn)

INSERT INTO T(XmlCol)
SELECT convert(XML,bulkcolumn,2) FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

Using convert(xml,columnname,2) gives limited dtd support

“Enable limited internal DTD subset processing.

If enabled, the server can use the following information that is provided in an internal DTD subset to perform nonvalidating parse operations.

  • Defaults for attributes are applied.
  • Internal entity references are resolved and expanded.
  • The DTD content model will be checked for syntactical correctness.

The parser will ignore external DTD subsets. It also does not evaluate the XML declaration to see whether the standalone attribute is set yes or no, but instead parses the XML instance as if it is a stand-alone document.

This gives you a single row holding the entire document – not very usable (example is for an xml feed for shopwhale) – we can use cross apply to convert this into xml rows in a new table

— Cross apply to get 1 row per product
SELECT   NewTable.ProductDetails.query(‘.’) AS ProductName
into tTransformed
FROM t CROSS APPLY XmlCol.nodes(‘//product’) AS NewTable(ProductDetails)

We now have all the xml rows in the table and can then extract the components – if you do a select top n from the transformed table you can see the structure – cliick on the xml and it’s loaded in a new pane in studio.

SELECT ProductName.value(‘(//product/offer_id)[1]’, ‘int’) as Offer_id
,ProductName.value(‘data(//product/merchant_category)[1]’,’nvarchar(60)’) as merchant_category
,ProductName.value(‘data(//product/update_date)[1]’,’datetime’) as update_date
,ProductName.value(‘data(//product/regular_price)[1]’,’money’) as price
,ProductName.value(‘data(//product/model_number)[1]’,’nvarchar(38)’) as Model_number
into tTransformedValues
FROM tTransformed

Hey presto we have the rows in a table in a usable format – wrap this into a stored procedure – pass in the name of the xml document and you can end up with a table that you can use!