SQL: Why nulls slow down some queries

Someone asked a question on StackOverflow today about why null values slow down certain queries. This is roughly what I answered.

The main issue with null values and performance is to do with forward lookups.

If you insert a row into a table, with null values, it’s placed in the natural page that it belongs to. Any query looking for that record will find it in the appropriate place. Easy so far….

…but let’s say the page fills up, and now that row is cuddled in amongst the other rows. Still going well…

…until the row is updated, and the null value now contains something. The row’s size has increased beyond the space available to it, so the DB engine has to do something about it.

The fastest thing for the server to do is to move the row off that page into another, and to replace the row’s entry with a forward pointer. Unfortunately, this requires an extra lookup when a query is performed: one to find the natural location of the row, and one to find its current location.

So, the short answer to the question of performance is yes, making those fields non-nullable will help search performance. This is especially true if it often happens that the null fields in records you search on are updated to non-null.

Of course, there are other penalties (notably I/O, although to a tiny extent index depth) associated with larger datasets, and then you have application issues with disallowing nulls in fields that conceptually require them, but hey, that’s another problem!


Database Snapshots in SQL Server 2005

Ever wanted to make a read-only point-in-time copy of a database, and wondered which technique to use? Microsoft SQL Server 2005 provides a plethora of ways to do this, including database backup/restore, database detach/re-attach, log-shipping, replication, mirroring, and so forth. However, one method available in the Enterprise edition, the Database Snapshot, is new to SQL Server 2005, and is worth taking a closer look.

Why are Database Snapshots Useful?

There are many applications where a point-in-time snapshot is useful. Microsoft suggest the following use cases:

  • Reporting up to a specific time period, ignoring later data
  • Reporting against mirror or standby databases that are otherwise unavailable
  • Insuring against user or administrator error, providing a quick way to revert to an older version of the database
  • Managing test databases, particularly during rapid feature and schema development

Of course, these needs could be served by a database backup or attached copy of a database, but the key benefit of choosing a snapshot over one of the other methods is simple: creating a database snapshot is fast.

Creating and Using Database Snapshots

Creating database snapshots is easy – it’s a CREATE DATABASE statement, specifying only the logical and physical filenames. Remember it’s a read-only snapshot, so we don’t need to add autogrowth or transaction log settings. Here’s the code:

SQL Statement to create a Database Snapshot
SQL Statement to create a Database Snapshot

Snapshot creation is not supported by the Object Explorer interface in Management Studio; you must use a CREATE DATABASE statement as above, with the AS SNAPSHOT OF clause indicating the source database. Also, note that only the Enterprise edition of SQL Server 2005 supports database snapshots.

The snapshot contains a version of the data as it existed at its creation, having rolled back uncommitted transactions. This means that otherwise unavailable databases, such as mirrors and standby servers, can be used to create snapshots.

Having created a snapshot, you can now use it as you would any other read-only database; all objects are exposed in exactly the same way, via Object Explorer, scripts, or reporting tools.

Object Explorer showing the Database Snapshot
Object Explorer showing the Database Snapshot

Reverting a database to the version stored in the snapshot is similarly easy:




SQL Script to restore from snapshot
SQL Script to restore from snapshot

This returns the database to the state it was in when the snapshot was created, minus any uncommitted transactions – remember that a snapshot is transactionally consistent at its creation. Note that restoring from a snapshot renders all other snapshots unusable – they should be deleted and re-created if required.

How do Database Snapshots work?

A Database Snapshot looks like an ordinary read-only database, from the user’s point of view; it can be accessed with a USE statement, and can be browsed from within Management Studio. However, it initially occupies almost no disk space, and so can be created almost instantly. This magic is achieved via an NTFS feature, sparse files. A sparse file is a file that may appear to be large, but in fact only occupies a portion of the physical space allocated to it.

Sparse file properties
Sparse file properties

Now, because a database snapshot presents a read-only view of your source database, it need not store a copy of every page. Instead, SQL Server performs a copy-on-write operation; in the source database, the first time a data page changes after the creation of a snapshot, a copy of the original page is placed in the sparse file. The snapshot serves data from the snapshot copies where source data has changed, and the original source pages when they are unchanged.

Best Practices

Sometimes you will choose a copy of a backup over a snapshot, sometimes it’ll be a detached copy of the data file. However, for many situations your best bet is a database snapshot, so it’s worth keeping some points in mind. In particular:

  • The file size will look considerably larger than the space it consumes on disk, and should be clearly marked as a snapshot for this reason. Use explicit naming conventions to make it clear to administrators.
  • Snapshots are at their best when young and fresh, and don’t take up too much space. If you need to keep a snapshot for any length of time, consider using another method to create your read-only copies.
  • As snapshots persist until deleted, you will need to explicitly rotate snapshots, either manually or with a script.
  • Performing index operations such as defragmentation or index rebuilding will modify so many pages that the snapshot will likely contain a complete copy of the source data for that index. The more snapshots there are, the more copies will exist.
  • If the disk containing a snapshot fills up, and a page write fails, the snapshot will become useless, as it will not contain all necessary pages. Make sure the disk can’t fill up!

Database snapshots are a worthwhile addition to the arsenal of any SQL Server DBA, and fit well with other techniques, particularly when you may need to quickly revert a database, or if you need to maintain rolling snapshots. Remember the key advantages: high speed and low physical size. But also remember that these advantages diminish as the snapshot ages and grows, and if the number of snapshots increases.

Above all, database snapshots are fast and easy to use; it won’t cost you anything to try them out, and you will probably find them very useful indeed. If all you need to do with a point-in-time copy is select from it, or possibly revert to it, then a database snapshot is likely the best choice available