Archive for the ‘SQL Server 2005’ Category

Cascading Parameter Values in SSRS

Author: Jeremy Smyth

In SQL Server Reporting Services (SSRS), you might want your choice in the first parameter to limit the choices available in the second (and subsequent) parameters.

This works automagically if you order your parameters and datasets correctly

  • First, set up a primary (report) dataset, then a dataset for each parameter dropdown. Code the WHERE clause in the datasets to make the dependencies correct across parameter variables
  • Secondly, order your parameters in the Report | Parameters menu so that the first variable/parameter you want the user to fill in is at the top, and make the second dataset depend on that parameter. Follow this ordering through the parameters; the final parameter(s) should be the one(s) the actual report dataset depends on.
  • Repeat for subsequent parameters

This will work if your WHERE clause in the second and subsequent datasets have variables that SSRS knows are populated from earlier parameters.

As an example, I have three datasets from the venerable pubs database.

pubslist is used to populate the @p parameter, and looks like this:

SELECT pub_id, pub_name FROM publishers

titleslist populates the @t parameter, and looks like this:

SELECT title_id, title FROM titles WHERE pub_id = @p

Finally, reportdataset looks like this:

SELECT title, price, ytd_sales FROM titles WHERE title_id = @t

The order of the parameters in the Report | Report Parameters menu is crucial; because the datasets must be executed in the order shown above, and the @t parameter is in a dataset that relies on the @p parameter being set first, we move @p to the top of the list.

Now, SSRS evaluates the dataset needed to fill the dropdown for the first parameter with labels. It relies on a dataset that doesn’t need a parameter, so can be produced immediately.

Then, having got that parameter value, it can populate the second parameter’s dropdown. That in turn results in the report being produced.

Tables, tables, tables

Author: Jeremy Smyth

So, what’s a temporary table? or an in-memory table? or a pivot table?

An in-memory table is a table in some platforms that’s stored entirely in memory. These don’t really exist in MS SQL, although you could say that it is a table that’s been entirely cached, and so doesn’t result in any physical (hard disk) reads when queried. In earlier versions, the DBCC PINTABLE command allowed the “pinning” of tables in memory, but this was deprecated in SQL Server 2005.

Often, a table-valued variable, @tablename, might be stored in memory (although this is not guaranteed), and declared in a batch or function, with no persistence.

A temporary table is a table that will be automatically dropped when it’s no longer needed, usually when the creating session is terminated. In MS SQL, they begin with a # (or two hashes if they’re global temporary tables, shared between multiple sessions), and are often created with a SELECT INTO #TEMPTABLE … style query.

A pivot table is a special form of query where the values in several rows are summarised, “pivoted” on an axis, and become columns, where the summary data then becomes the rows. Frequently this happens where you’ve rows sorted on dates; these may then be “pivoted” so you end up with a column for January, one for February, one for March, etc.

SQL Server: Database Diagrams

Author: Jeremy Smyth

A little feature that some people don’t know about but would find very very useful: database diagrams in SQL Server. If a picture paints a thousand words, then this feature will help you pick up on the structure of data much more easily.

Within SQL Server, there’s the “Database Diagram” container in each database, in Management Studio.

You can create a new diagram in there, and include all (or some) of your tables, so showing where the relationships are, and what the tables are.

You could also create several different diagrams, each showing a subset of tables and relationships useful to a particular use case.

GROUP BY and single-row values

Author: Jeremy Smyth

So, you’ve constructed a SQL statement with a GROUP BY clause, and you’re getting this message:

Column col is invalid in the select list because it is not
contained in either an aggregate function or the
GROUP BY clause.

Bear in mind that the result of a GROUP BY statement, or of a statement where one or columns uses an aggregate function, has each row containing a summary of other rows.

This means that if you try to include a column in your select clause that isn’t a summary (this includes values by which you’re grouping), then the server is going to have difficulties returning it; remember, it’s going to return one row per group, and any value that can’t be reduced to a single row per group will fail.

This gets interesting even when you’re hoping to return a single value per group, for example with CASE.

If the CASE expression you’re using is dependent on values of individual rows rather than summaries, you’ll also get the above message; you may only reference non-aggregate values (i.e. single-row values) in an aggregate function, or in the WHERE clause, so the solution would involve placing your CASE inside an aggregate function, in this case your SUM.

The same applies for the HAVING clause, as it is effectively selecting rows in the grouped resultset based on values in each group.

Memory is cheap these days; it’s not unusual to find laptops with 4GB or more (and yes, I know this post won’t date well!). This is a bit of a problem with some operating systems, though: 4GB is the maximum memory that many system are capable of handling. Yes, that’s right. Anything over 4GB is wasted on many of today’s operating systems, because of a limit imposed by 32 bit processors on how much memory they may address[1].

This becomes rather a problem on SQL Server installations, where having a lot of available memory is quite beneficial.

It’s worth mentioning at this point that 64-bit systems don’t suffer from the same memory limitations as 32-bit systems, so much of this article is irrelevant if your system is running on a 64-bit edition of Windows. The issue we’re talking about here is that 32 bit systems can ordinarily access only 4GB of RAM in total, which under default conditions means that your SQL Server instance can only access 2GB of useful RAM for its own needs.

However, it is possible to grant access to more memory, but it does depend on a number of conditions.

Physical Address Extension

Since Windows 2000, Microsoft have supported Physical Address Extension (PAE) in a kernel, although the support requires using a non-default kernel. To use PAE, both the hardware and the operating system must support it.

Various 32 bit Windows servers support from 8-128GB, depending on edition. Windows 2000 Advanced and Datacenter Servers, Windows Server 2003 Enterprise and Datacenter Editions, and Windows Server 2008 Enterprise and Datacenter Edition support more than 4GB in 32 bit editions, by using PAE, with Windows 2003 Datacenter hitting the top score of 128GB.

Address Windowing Extensions

Address Windowing Extensions (AWE) is an API in Windows, used by SQL Server to support larger amounts of physical memory. It shouldn’t surprise you by now that AWE relies on PAE, which in turn has the other prerequisites as described above, so in order to enable AWE in SQL Server (using sp_configure), PAE must be supported by the underlying operating system.

The benefits are obvious; SQL Server will have access to more physical memory, which will make its job a lot easier. Your buffer cache will love you, as more and more data can be pulled out of memory rather than causing page hits on your oh-so busy hard disk. The drawbacks are slightly less obvious; memory granted to SQL Server using AWE is non-paged, which means it’s allocated until SQL Server is shut down. This will cause problems is SQL Server is sharing its host with other instances or other AWE-aware services. So keep your maximum memory settings tuned, and enjoy the headroom!


Footnote:
[1] 32 bits is 4 bytes (of 8 bits each). Just like a 3-digit number can store up to 1000 different values — 0-999 — an 8-bit number can store from 0-256. A 6-digit number can store 1000 x 1000, which is 1,000,000 different values, 0 – 999,999. Similarly, a 16-bit number can store 256 x 256, which is 65,536 different values. 32 bits can store 65,536 x 65,536 values, which is a number comfortably over 4 billion. This is where we get the 4GB upper limit for most 32 bit systems; they simply can’t point to the location of memory outside this range.

When you upgrade from one SQL Server type to another (most of my experience has been with 2000 -> 2005), one of the options available is to stick with the existing compatibility level, rather than upgrade.

SQL Server 2000 defaults to “Compatibility Level 80″, with other options including 65, 70. These represent the features and compatibility options for version 8.0 (SQL Server 2000), 6.5 and 7.0 respectively. In other words, it’s possible to coerce a database into thinking it’s running with features consistent with an earlier release of SQL Server.

When upgrading to 2005, it’s possible to retain compatibility level 80, but the recommended route is to switch to 90. But should you?

There are two ways to look at this:

* If you want to change your code (or install some collaborating SQL2005/2008 servers) in the future, your compatibility mode will affect that decision
* If you haven’t run the Upgrade Adviser, you may not be able to upgrade; compatibility level 90 disables certain syntaxes and features that were available in 80.

So, when upgrading, you don’t have to increase your compatibility level, but then you’ve to consider your future with the server too; you might be happy to run on the latest SQL Server instance, but with old-school database settings and syntaxes. On the other hand, collaboration with other servers and applications, or maintenance by future programmers and DBAs, will probably require the switch to a higher compatibility level.

Note: it’s entirely possible to have several databases with different compatibility levels on a single instance.

Tricks with ORDER BY

Author: Jeremy Smyth

The ORDER BY clause in SQL Server is a bit more flexible than many people give it credit for.

In a training course I’m giving at the moment, we’ve just gone over a few of the lesser known uses, along with the widely used ones:

  • The obvious: ordering by a column name
    ORDER BY advance
  • Less obvious: ordering by an expression on a column
    ORDER BY id % 2 -- even IDs first, then odd
  • Ordering by multiple fields
    ORDER BY id % 2 DESC, price        -- odd orders first by even/odd IDs, then by prices within each
  • Ordering by visible column position
    SELECT id, customerid, price * quantity AS "gross revenue" FROM orders ORDER BY 2
  • Ordering by column aliases
    SELECT id, customerid, price * quantity AS "gross revenue" FROM orders ORDER BY "gross revenue"

The last one is particularly interesting, because column aliases can’t be used in other places e.g. in the where clause or from clause.

Constraints are a necessary part of the tug-and-thrust of database design, but once in a while, when designing a data warehouse, someone comes up with the question “Do I need to map those constraints to my DW?” The simple answer is “Yes”, but it’s also a somewhat naive answer that doesn’t take into account the needs of a DW.

For your primary key, consider using a surrogate or alternate key; you’ll need to cater for slowly changing dimensions, e.g. if you’re doing a report over the last 5 years on average sales per married/unmarried salesperson, you’ll want to register the fact that someone was unmarried for 2 years, then married for the last 3. This means your data warehouse will have two dimension-table rows for the same person. Following the OLTP structure for that will be hard, to say the least. Note that it’s not enough to create a composite key on the original primary key and e.g. marital status — the same person may also have other fields that change occasionally, such as address (zipcode), department, job title, or even gender!

As for other constraints such as default value or check constraints, we need to consider how a DW is populated, and what the arrival of data to the DW means. The DW is invariably populated from an OLTP (relational) system, the same system that typically has all the original constraints. Ideally, this means the data is already clean by the time the DW gets it. It also means that the performance hit required by checking these constraints in writing to the database has already been taken. And OLTP system is typically optimised for writes and reads, particularly the large tables that change frequently and are the subject of most of the storage in a DW.

On the other hand, Data Warehouses are massively optimised for reads (assuming you’re populating as a batch, with reasonably high latency), and constraints don’t really factor in read operations. You can typically get around any constraint issues with your DW populating job, if they haven’t already been dealt with in the OLTP system, and deal with nulls etc. at the analysis/reporting tools if necessary. It’s far more important to make sure default values fit with your conceptual data model, and don’t introduce issues at the DW client tools.

Someone asked a question on StackOverflow today about why null values slow down certain queries. This is roughly what I answered.

The main issue with null values and performance is to do with forward lookups.

If you insert a row into a table, with null values, it’s placed in the natural page that it belongs to. Any query looking for that record will find it in the appropriate place. Easy so far….

…but let’s say the page fills up, and now that row is cuddled in amongst the other rows. Still going well…

…until the row is updated, and the null value now contains something. The row’s size has increased beyond the space available to it, so the DB engine has to do something about it.

The fastest thing for the server to do is to move the row off that page into another, and to replace the row’s entry with a forward pointer. Unfortunately, this requires an extra lookup when a query is performed: one to find the natural location of the row, and one to find its current location.

So, the short answer to the question of performance is yes, making those fields non-nullable will help search performance. This is especially true if it often happens that the null fields in records you search on are updated to non-null.

Of course, there are other penalties (notably I/O, although to a tiny extent index depth) associated with larger datasets, and then you have application issues with disallowing nulls in fields that conceptually require them, but hey, that’s another problem!

Ever wanted to make a read-only point-in-time copy of a database, and wondered which technique to use? Microsoft SQL Server 2005 provides a plethora of ways to do this, including database backup/restore, database detach/re-attach, log-shipping, replication, mirroring, and so forth. However, one method available in the Enterprise edition, the Database Snapshot, is new to SQL Server 2005, and is worth taking a closer look.

Why are Database Snapshots Useful?

There are many applications where a point-in-time snapshot is useful. Microsoft suggest the following use cases:

  • Reporting up to a specific time period, ignoring later data
  • Reporting against mirror or standby databases that are otherwise unavailable
  • Insuring against user or administrator error, providing a quick way to revert to an older version of the database
  • Managing test databases, particularly during rapid feature and schema development

Of course, these needs could be served by a database backup or attached copy of a database, but the key benefit of choosing a snapshot over one of the other methods is simple: creating a database snapshot is fast.

Creating and Using Database Snapshots

Creating database snapshots is easy – it’s a CREATE DATABASE statement, specifying only the logical and physical filenames. Remember it’s a read-only snapshot, so we don’t need to add autogrowth or transaction log settings. Here’s the code:

SQL Statement to create a Database Snapshot

SQL Statement to create a Database Snapshot

Snapshot creation is not supported by the Object Explorer interface in Management Studio; you must use a CREATE DATABASE statement as above, with the AS SNAPSHOT OF clause indicating the source database. Also, note that only the Enterprise edition of SQL Server 2005 supports database snapshots.

The snapshot contains a version of the data as it existed at its creation, having rolled back uncommitted transactions. This means that otherwise unavailable databases, such as mirrors and standby servers, can be used to create snapshots.

Having created a snapshot, you can now use it as you would any other read-only database; all objects are exposed in exactly the same way, via Object Explorer, scripts, or reporting tools.

Object Explorer showing the Database Snapshot

Object Explorer showing the Database Snapshot

Reverting a database to the version stored in the snapshot is similarly easy:

SQL Script to restore from snapshot

SQL Script to restore from snapshot

This returns the database to the state it was in when the snapshot was created, minus any uncommitted transactions - remember that a snapshot is transactionally consistent at its creation. Note that restoring from a snapshot renders all other snapshots unusable - they should be deleted and re-created if required.

How do Database Snapshots work?

A Database Snapshot looks like an ordinary read-only database, from the user's point of view; it can be accessed with a USE statement, and can be browsed from within Management Studio. However, it initially occupies almost no disk space, and so can be created almost instantly. This magic is achieved via an NTFS feature, sparse files. A sparse file is a file that may appear to be large, but in fact only occupies a portion of the physical space allocated to it.

Sparse file properties

Sparse file properties

Now, because a database snapshot presents a read-only view of your source database, it need not store a copy of every page. Instead, SQL Server performs a copy-on-write operation; in the source database, the first time a data page changes after the creation of a snapshot, a copy of the original page is placed in the sparse file. The snapshot serves data from the snapshot copies where source data has changed, and the original source pages when they are unchanged.

Best Practices

Sometimes you will choose a copy of a backup over a snapshot, sometimes it'll be a detached copy of the data file. However, for many situations your best bet is a database snapshot, so it's worth keeping some points in mind. In particular:

  • The file size will look considerably larger than the space it consumes on disk, and should be clearly marked as a snapshot for this reason. Use explicit naming conventions to make it clear to administrators.
  • Snapshots are at their best when young and fresh, and don't take up too much space. If you need to keep a snapshot for any length of time, consider using another method to create your read-only copies.
  • As snapshots persist until deleted, you will need to explicitly rotate snapshots, either manually or with a script.
  • Performing index operations such as defragmentation or index rebuilding will modify so many pages that the snapshot will likely contain a complete copy of the source data for that index. The more snapshots there are, the more copies will exist.
  • If the disk containing a snapshot fills up, and a page write fails, the snapshot will become useless, as it will not contain all necessary pages. Make sure the disk can't fill up!

Database snapshots are a worthwhile addition to the arsenal of any SQL Server DBA, and fit well with other techniques, particularly when you may need to quickly revert a database, or if you need to maintain rolling snapshots. Remember the key advantages: high speed and low physical size. But also remember that these advantages diminish as the snapshot ages and grows, and if the number of snapshots increases.

Above all, database snapshots are fast and easy to use; it won't cost you anything to try them out, and you will probably find them very useful indeed. If all you need to do with a point-in-time copy is select from it, or possibly revert to it, then a database snapshot is likely the best choice available