All posts by The Author

Loading XML Files

Should be a lot easier than it is, but if you look for this on the web, you will get confused.

 Put simple – if you have an embedded DTD in the file it won’t work within SSIS – so the easiest solution – don’t use SSIS use

SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

 This translates as create a table with a column of data type XML

INSERT INTO T(XmlCol)
SELECT * FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

If there is an embedded dtd you need to convert (and the column is called bulkcolumn)

INSERT INTO T(XmlCol)
SELECT convert(XML,bulkcolumn,2) FROM OPENROWSET(
   BULK ‘C:userskevindownloadsshopwhale.xml’,
   SINGLE_BLOB) AS x

Using convert(xml,columnname,2) gives limited dtd support

“Enable limited internal DTD subset processing.

If enabled, the server can use the following information that is provided in an internal DTD subset to perform nonvalidating parse operations.

  • Defaults for attributes are applied.
  • Internal entity references are resolved and expanded.
  • The DTD content model will be checked for syntactical correctness.

The parser will ignore external DTD subsets. It also does not evaluate the XML declaration to see whether the standalone attribute is set yes or no, but instead parses the XML instance as if it is a stand-alone document.

This gives you a single row holding the entire document – not very usable (example is for an xml feed for shopwhale) – we can use cross apply to convert this into xml rows in a new table

— Cross apply to get 1 row per product
SELECT   NewTable.ProductDetails.query(‘.’) AS ProductName
into tTransformed
FROM t CROSS APPLY XmlCol.nodes(‘//product’) AS NewTable(ProductDetails)

We now have all the xml rows in the table and can then extract the components – if you do a select top n from the transformed table you can see the structure – cliick on the xml and it’s loaded in a new pane in studio.

SELECT ProductName.value(‘(//product/offer_id)[1]’, ‘int’) as Offer_id
,ProductName.value(‘data(//product/merchant_category)[1]’,’nvarchar(60)’) as merchant_category
,ProductName.value(‘data(//product/update_date)[1]’,’datetime’) as update_date
,ProductName.value(‘data(//product/regular_price)[1]’,’money’) as price
,ProductName.value(‘data(//product/model_number)[1]’,’nvarchar(38)’) as Model_number
into tTransformedValues
FROM tTransformed

Hey presto we have the rows in a table in a usable format – wrap this into a stored procedure – pass in the name of the xml document and you can end up with a table that you can use!

Standards

The MOST important thing about standards is to have some, keep them and enforce them. Even if they are not the “best” standards in the world at least everything is consistent.

Look for

Naming conventions

  • Tables
  • stored Procedures
  • triggers
  • files – data, log and dump
  • Stored Procedure Conventions
  • Change audit and description
  • Parameters
  • Variables
  • Use of rowcount and @@error

As a simple example
Which is easier to read

create procedure usp_SelClients @cl int -1
as
 select Client_name, HouseNo, Housename, Street, Town, Postcode from clients join addresses on clients.addressid = addresses.addressid where Clientid=@cl

or
Create Procedue usp_SelCleints @ClientId int -1
as
       select
              Client_name,
              House_No,
              House_name,
              Street,
              Town,
              Postcode
       from
              clients
       join
              addresses
       on
              clients.addressid = addresses.addressid
       where
              Client_id=@ClientId

And that’s a small stored procedure .. what happens when you get a complicated select statement with multiple joins and conditions!

Permanent or # tables

There are advantages and disadvantages of using either options, however in my opinion the advantages of permanent work tables outweigh the advantages of # tables for ETL but # tables win for TP systems

  • permanent work tables are available for debugging
  • permanent tables can be conditionally created – whereas temporary tables can’t

however,

  • # tables are automatically cleaned up whereas you have to manually either recreate or truncate work tables
  • # tables always live in tempdb (although you can also create permanent work tables there as well)
  • # tables are unique to the session so never any risk of contention

Extract data using BCP and query

As an alternative to creating a view over multiple tables, there is also the option of using the queryout parameter in BCP to select the data for extracting (same example as used in the view usage)

bcp “select cust.CustomerId, cust.CustomerForename, cust.CustomerSurname, cust.Telephone, addr.HouseNo, addr.Street, addr.Town, addr.county, Addr.postcode From tblCustomer cust Join tblAddress addr ON Cust.AddressId = addr.AddressId” queryout cust.dat -S(local) -T -c

Obviously it’s a lot longer than using a view – but sometimes you don’t have permissions to create views so it’s a usefull alternative

Extracting contents of two tables using BCP and a view

The easiest way of doing this is to create a view over the two tables and then bcp the content of the view out in your required format.

e.g. if you have customer details in a customer table and an address table and you want to extract the contents of the combined table in a tab delimited file

create view vw_Customer_Address as select cust.CustomerId, cust.CustomerForename, cust.CustomerSurname, cust.Telephone,  addr.HouseNo, addr.Street, addr.Town, addr.county, Addr.postcode
From tblCustomer cust
Join tblAddress addr
ON Cust.AddressId = addr.AddressId

 Then just bcp this table out

BCP mydb.dbo.vw_Customer_address out cust.dat -S(local) -T -c

Log output from sql agent jobs

Two ways to log jobs – both equally effective

You can either pipe the output if you are running batch (.bat) files or

In the jobs steps – advanced use the output file option.
If you have multiple job steps
 First step set the overwrite option
 Subsequent steps set the append option

Database dumps / log dumps
Make the database dump overwrite and then for log or differential dumps append so that you have a full job history in a single file

Don’t forget to include the logs directory in the server backup schedule!

One word of warning…Permissions – the account that SQLServerAgent is running under will need permission to create files in the log directory.
All jobs for a server should be placed in the same log directory – and don’t forget to occasionally check the contents of the log files – since that’s how you will find out about problems!

Static Data Tables

If you have static data tables then consider the following option – for low volume static data.

You can set up a DTS package to export the tables to an excel workbook very easily. Just use the export wizard and schedule the DTS package to run at a convenient time.
But…Make sure
 ODBC drivers are configured on the server – 2000 installed by default but not SQL server 7
 The sql server agent account has permissions to write to the folder for the spreadsheet.
 That the option to delete the data first is set.

Don’t forget to set up a DTS import package and….test, retest and test again.

Getting column list into a variable

The easiest way of getting a column list into a variable is to assign the contents of syscolumns (or sys.columns for SQL 2005).

Create a variable to hold the column list and then reassign it in the select statement.
To get the column key use the object id of the object

Select @variable = @variable + coalesce(@variable + ‘,’,”) from sys.columns where object_id = object_id(‘tablename’)

We are using the fact that coalesce returns the first non null value here so that we get a ‘,’ for each of the subsequent field names but just the column name for the first field.
Much easier than using a cursor on sys.columns to get each field.

If you don’t use coalesce for the first item in the array, you will also need to initialise the variable before assigning it since null + anything = null

create table temptable(id int identity, scol1 char(30), scol2 char(30) 

Declare @myvar varchar(max);
select @myvar = coalesce(@myvar + ‘,’,”) from sys.columns where object_id = object_id(‘temptable’) order by column_id

select @myvar

Getting space used from tables – in one command

2000 only getting space used from tables

This isn’t a new tip – you may have seen many references to using sp_MSForEachTable and it’s partner sp_MSForEachDB – just put it into context with getting the data back out in a clean format by using insert .. exec


/* create a table to hold the results of the exec command */
create table #SpaceUsed (name nvarchar(129), rows char(11), reserved varchar(18),data varchar(18), index_Size varchar(18), unused varchar(18))
insert #SpaceUsed exec sp_msforeachTable 'sp_spaceused "?"'
--Run the script - then
select * from #SpaceUsed order by cast(rows as int) desc
--gives you the number of rows
select * from #SpaceUsed order by cast(left(reserved,len(reserved) - 3)as int) desc

–Gives you the data based on space used (including indexes)

Note that in SQL 2008 and subsequent versions – you can simply use the standard reports in the database e.g. right click on the database -> Reports ->Standard Reports->Disk Usage by Tables which can then be exported e.g. in excel by right clicking on the report -> Export -> Excel

I haven’t added that in as a SQL 2008 tip since most people already know about the custom reports – however, worth bearing in mind that compatability must be set to at least 90 for the custom reports to work!

Using BCP Format Files

Format files give a lot of the flexibility of DTS Transformations – you can skip columns, reorder the columns and do most of what you want.

They can also be used in conjunction with Bulk Insert to get the speed efficiency from Bulk Insert.

But if possible – order the table and the source data in the same way and use a sensible delimiter which isn’t in the data – if the data is from a PC based system then try to use tab delimiters – if it’s sourced from a mainframe look at ‘%’ ‘#’ or ‘$’ since they tend not to be extensively used in mainframes.

If the source file is generated by COBOL then it’s likely to be fixed length – so use that instead to create your format files – saves you 1 byte per column – if you’ve got a lot of fields e.g. y/n indicators then you can halve the data by using fixed format. FTP’s will be faster