Tables, tables, tables

So, what’s a temporary table? or an in-memory table? or a pivot table?

An in-memory table is a table in some platforms that’s stored entirely in memory. These don’t really exist in MS SQL, although you could say that it is a table that’s been entirely cached, and so doesn’t result in any physical (hard disk) reads when queried. In earlier versions, the DBCC PINTABLE command allowed the “pinning” of tables in memory, but this was deprecated in SQL Server 2005.

Often, a table-valued variable, @tablename, might be stored in memory (although this is not guaranteed), and declared in a batch or function, with no persistence.

A temporary table is a table that will be automatically dropped when it’s no longer needed, usually when the creating session is terminated. In MS SQL, they begin with a # (or two hashes if they’re global temporary tables, shared between multiple sessions), and are often created with a SELECT INTO #TEMPTABLE … style query.

A pivot table is a special form of query where the values in several rows are summarised, “pivoted” on an axis, and become columns, where the summary data then becomes the rows. Frequently this happens where you’ve rows sorted on dates; these may then be “pivoted” so you end up with a column for January, one for February, one for March, etc.

PHP: require and include

Another simple feature that’s good to pick up on!

When you want to run the same functions from several pages, but don’t want to copy/paste (because that’s BAD), you do this.

Usually, you’d use require or include to include the code from one PHP script in another.

If you hadn’t done this before, or designed it in, you’ll probably need to extract the functions from your script(s), place them in a central file, and include that in all scripts that need it (including, of course, the one they were originally in!). Remember that includeing a file in another causes its embedded HTML to be included too, so you’ll typically include PHP files that have functions only, to achieve the desired result.

Then you just call the function, as if it were written in the script in the first place 🙂

SQL Server: Database Diagrams

A little feature that some people don’t know about but would find very very useful: database diagrams in SQL Server. If a picture paints a thousand words, then this feature will help you pick up on the structure of data much more easily.

Within SQL Server, there’s the “Database Diagram” container in each database, in Management Studio.

You can create a new diagram in there, and include all (or some) of your tables, so showing where the relationships are, and what the tables are.

You could also create several different diagrams, each showing a subset of tables and relationships useful to a particular use case.

Quality of Service: Hosting

There are several useful QoS parameters, although not all are as easily measured as e.g. 99.98% uptime, which is easily derived – 99.98% of the time, your server is available.

Some other useful metrics:

What sort of bandwidth do they support? How about if it’s a shared pipe and they’ve multiple high-volume clients? You’re unlikely to get an honest answer to this from the vendor.
If you’re on a shared host, what sort of competition for resources will you have on the box? Again, this is competitive information, so you’re unlikely to get an honest response.
How independent is each application? i.e. if another application on a co-hosted box loops MySQL, will your app suffer?
How often do they upgrade dependencies, and does that affect their 99.98% uptime?

Then there are the fluffy not-entirely-QoS related hosting problems:

Will they do your backups etc.?
How responsive are they to queries? Do they have a guaranteed response time e.g. 3 hours from first email to first response, 24/7?
Are they efficient? Have they resolved issues within a useful timeframe? A quick response-time is nice and feelgood, but if they don’t fix the problem, that’s not quite so good!
Are they patient with you when you suggest issues, but due to your own technical background are slightly off-base?

Of course you could always do your own backups, with external drives like this, or caddies (2.5″, 3.5″) for old spare hard drives you have lying around, but that doesn’t substitute for the technical support you get from a good host.

GROUP BY and single-row values

So, you’ve constructed a SQL statement with a GROUP BY clause, and you’re getting this message:

Column col is invalid in the select list because it is not
contained in either an aggregate function or the
GROUP BY clause.

Bear in mind that the result of a GROUP BY statement, or of a statement where one or columns uses an aggregate function, has each row containing a summary of other rows.

This means that if you try to include a column in your select clause that isn’t a summary (this includes values by which you’re grouping), then the server is going to have difficulties returning it; remember, it’s going to return one row per group, and any value that can’t be reduced to a single row per group will fail.

This gets interesting even when you’re hoping to return a single value per group, for example with CASE.

If the CASE expression you’re using is dependent on values of individual rows rather than summaries, you’ll also get the above message; you may only reference non-aggregate values (i.e. single-row values) in an aggregate function, or in the WHERE clause, so the solution would involve placing your CASE inside an aggregate function, in this case your SUM.

The same applies for the HAVING clause, as it is effectively selecting rows in the grouped resultset based on values in each group.

Returning a copy of an object

Consider an object passed (by reference) into a method, which then modifies the object. Should the method return the object on completion?

Well, it’s redundant. Here’s why.

There are two approaches to doing this – the classic way (caller chooses what to manipulate/change), and callee changes (which is what happens when you pass in an object by reference:

If the caller chooses what to manipulate, then your method will operate on a copy of the dataset, and return that copy, leaving the original untouched. Then the caller may overwrite the original, if desired. Many array operations in Perl work like this.
The alternative lets the called method do the in-place operation, in which case the return value is more typically a boolean (“Yes, I’ve succeeded!”) or an int containing the number of values affected. This is how database updates work.

Of course, if the object is not passed by reference (i.e. a copy is passed into the method) or if the method explicitly returns a modified copy, then returning an object is the right way to do it.

Redacting Text the Dan Brown Way.

Dan Brown’s The Lost Symbol makes a passing reference to some “redacted text”, where only a highlighted portion of the text becomes available.

In the story, we find a redacted text where results appear like this:

######## secret location UNDERGROUND where the ########
######## somewhere in WASHINGTON D.C., the coordinates ########
######## uncovered an ANCIENT PORTAL that led ###########
######## warning the PYRAMID holds dangerous ###########
######## decipher this ENGRAVED SYMBOLON to unveil #######

…so Trish Dunne, a metasystems expert, constructs a spider to find the document for Katherine Solomon, but the document itself had been redacted.

Now, constructing a “search spider” is not exactly the technical exercise Dan Brown makes it out to be, but the redacted text the search spider produced got me thinking.

A while back, I put together a regular expression for someone who wanted their search engine to produce a summary of searched text, only showing the surrounding words. This isn’t exactly what Trish achieves, but it gave me an idea….

Although it’s not a common requirement, redacting text looks like a bit of fun. Let’s try it in Perl. Given the following input text, let’s do a bit of searching and redacting:

“Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum”

Right. Let’s look for the words “dolor”, “nisi”, and “officia deserunt”:

#!/usr/bin/perl -w

$match = "dolor|nisi|officia deserunt";

while(){
    while(m!((?:w+bs*){2})($match)(w*s*(?:w+bs*){2})!g){
        $x = $`;
        $y = "$1U$2E$3";
        $_ = $';

        $x =~ tr/A-Za-z/#/;
        print "$x$y";
    }
    tr/A-Za-z/#/;
    print;
}

Now this isn’t quite as complex as we’d need it to be; although it allows multiple search words, and catches them all, it doesn’t catch longer strings of them. Nevermind, let’s see how it works.

Unsurprisingly, the match is the magic part. Here’s it exploded:

m!                      # start the match
    (                   # find and remember the two words before the match
        (?:             # define a word (but don't remember it)
            w+         # ...as a bunch of letters/numbers
            b          # (stop matching letters/numbers!)
            s*         # ...followed by zero or more spaces/tabs/etc.
        ){2}            # Now we know what a word is, we want two of them.
    )                   # that's the two words...
    (                   # Now remember the matched word itself.
        $match          # Use the $match variable above.
    )                   #
    (                   # Now the next two words
        w*s*          # Any number of alphanum (even zero, because "dolor"
                        # should also match "dolore")
        (?:             # second word (but don't remember it)
            w+bs*    # same as above
        ){2}
    )
!gx                     # the "!" matches "m!" above, g matches globally
                        # The "x" is explained below.

Little aside first: the exploded code above is perfectly valid Perl, and would work just as well as the code in the complete program above. The magic is performed by the x suffix on the expression; this tells perl to ignore whitespace and comments within a regular expression, and lets people like me explain what’s going on right there in the code. Handy, eh?

So, once we’ve identified each matching pattern, we display it (the $&) after showing “#” for each letter that comes before the match (the $`). Then we reset the text to match ($_) to the remainder of the string, and go again.

(Another aside: if we wanted to replace the spaces and punctuation with ‘#’, as well as all letters, we could do this instead of the tr///: $x = '#' x length($x);)

At the end of matching, we fail to match any more, so we simply translate the remainder to hashes again.

After running it against our file, we get this:

Lorem ipsum DOLOR sit amet, ########### ########### ####, ### ## ####### ###### ########## ## labore et DOLORe magna aliqua. ## #### ## ##### ######, #### ####### ############ ullamco laboris NISI ut aliquip ## ## ####### #########. #### aute irure DOLOR in reprehenderit ## ######### ##### esse cillum DOLORe eu fugiat ##### ########. ######### #### ######## ######### ### ########, #### ## culpa qui OFFICIA DESERUNT mollit anim ## ### #######.

Job done.

Representing Hierarchical data in PHP

I’ve come across a lot of people having a problem when representing data from their databases; hierarchical data provides us with a particular problem when displaying the relationships and containers required.

An example of hierarchical data

As an example, let’s say you have forums within categories. The category “Web Programming” may have forums like “PHP”, “JavaScript” and so on, so we want to list each forum under each category. The problem here is that you’ll have:

a table for categories
one for forums (with a foreign key for categories)
one for posts, with a foreign key for forums
and probably one for comments.

This fits quite neatly into a hierarchy: each comment belongs to a post, which belongs to a forum, which belongs to a category.

Using multiple resultsets

If you want to display the Categories with Forums under them in your page, one way to do this is by setting up multiple resultsets, each focusing on one level of the hierarchy. To achieve this, you’d need to get a resultset for the categories first, and then iterate over this list with another loop for your forum list.

The following code is an example of iterating using nested queries:

$cat_rs = mysql_query("select id, name from categories");
while($cat_row = mysql_fetch_array($cat_rs)){
     // print category name from $cat_row[1]
     $forum_rs = mysql_query("select name... "
                          . "from forums "
                          . "where cat_id = '" . $cat_row[0] ."'");
     while($forum_row = mysql_fetch_array($forum_rs)){
        //print forum stuff
     }
}

Although this matches the hierarchical nature of the data, and is pretty straightforward and intuitive to understand, it is somewhat inefficient; it issues a select statement for each category, plus one to list the categories in the first place. If there are many categories, the page might take some time to load as all queries are issued and processed.

Using a SQL `JOIN`: a single resultset

Another way to solve the problem would be to retrieve all relevant data in a single resultset, and use PHP to iterate over the rows, deciding on when to print each individual section.

A join will give you a single recordset with the category in one column, and the forum in another, giving you many rows for a single category.

$join_rs = mysql_query("select c.id, c.name, f.name,... "
                        . "from categories c inner join forums f "
                        . "on c.id = f.cat_id "
                        . "order by c.id"); // this line is crucial
$current_category = "";
while($join_row = mysql_fetch_array($join_rs)){
	if($join_row[0] != $current_category){
                //set up new category headings here
                $current_category = $join_row[0];
        }
        //print the forum stuff in the current category
}

The PHP is a bit more complex, because you have to close divs or tables correctly, within the loop. The order by line in the above example is vital, because without it, the categories may be dotted throughout the resultset. As we want all forum data related to a single category to appear together, we want rows belonging to that category to be adjacent in the resultset. The order by clause achieves this.

Despite the complexity, there are still benefits to this approach. The earlier example was simple and intuitive, using nested queries for each parent element. On the other hand, in this example we’ve only issued one select statement, so the page is likely to load more quickly, given that the number of round trips to MySQL will be substantially fewer.

PHP: Handling Parameter Arrays from Forms

Sometimes, when processing a form, you need to retrieve a number of values for a single variable. This is often a cause for confusion when beginning PHP. The particular problem faced: how do we process multiple values coming from a form to a databse, when using PHP?

Imagine we had a drop-down box containing a number of elements, and we want the user to select one or more. We’d use the select element in HTML.



Note the square brackets: for parameter arrays — parameters that allow multiple values — PHP requires that you name a GET or POST variable accordingly, so the select above needs to be named taglist[] in order to tell PHP that it will contain an array rather than a single value.
In order to process this, we then have to iterate over that array.
To iterate over the array in PHP code, we use something like this:
foreach($_POST["taglist[]"] as $s) {
    mysql_query("INSERT INTO articlestagged (article_id, tag) " .
       "VALUES ('$id','$s')")
        or die("Insert Error: ".mysql_error());
}

Obviously this example relates to adding new entries to a database, but the same would apply if we were querying a database, displaying multiple images, or indeed anywhere else we needed to operate on multiple concurrent values.

AWE (Address Windowing Extensions) in SQL Server

Memory is cheap these days; it’s not unusual to find laptops with 4GB or more (and yes, I know this post won’t date well!). This is a bit of a problem with some operating systems, though: 4GB is the maximum memory that many system are capable of handling. Yes, that’s right. Anything over 4GB is wasted on many of today’s operating systems, because of a limit imposed by 32 bit processors on how much memory they may address[1].

This becomes rather a problem on SQL Server installations, where having a lot of available memory is quite beneficial.

It’s worth mentioning at this point that 64-bit systems don’t suffer from the same memory limitations as 32-bit systems, so much of this article is irrelevant if your system is running on a 64-bit edition of Windows. The issue we’re talking about here is that 32 bit systems can ordinarily access only 4GB of RAM in total, which under default conditions means that your SQL Server instance can only access 2GB of useful RAM for its own needs, and putting one of these in your development laptop is quite pointless.

However, it is possible to grant access to more memory, but it does depend on a number of conditions.

Physical Address Extension

Since Windows 2000, Microsoft have supported Physical Address Extension (PAE) in a kernel, although the support requires using a non-default kernel. To use PAE, both the hardware and the operating system must support it.

Various 32 bit Windows servers support from 8-128GB, depending on edition. Windows 2000 Advanced and Datacenter Servers, Windows Server 2003 Enterprise and Datacenter Editions, and Windows Server 2008 Enterprise and Datacenter Edition support more than 4GB in 32 bit editions, by using PAE, with Windows 2003 Datacenter hitting the top score of 128GB.

Address Windowing Extensions

Address Windowing Extensions (AWE) is an API in Windows, used by SQL Server to support larger amounts of physical memory. It shouldn’t surprise you by now that AWE relies on PAE, which in turn has the other prerequisites as described above, so in order to enable AWE in SQL Server (using sp_configure), PAE must be supported by the underlying operating system.

The benefits are obvious; SQL Server will have access to more physical memory, which will make its job a lot easier. Your buffer cache will love you, as more and more data can be pulled out of memory rather than causing page hits on your oh-so busy hard disk. The drawbacks are slightly less obvious; memory granted to SQL Server using AWE is non-paged, which means it’s allocated until SQL Server is shut down. This will cause problems is SQL Server is sharing its host with other instances or other AWE-aware services. So keep your maximum memory settings tuned, and enjoy the headroom!

Footnote:
32 bits is 4 bytes (of 8 bits each). Just like a 3-digit number can store up to 1000 different values — 0-999 — an 8-bit number can store from 0-256. A 6-digit number can store 1000 x 1000, which is 1,000,000 different values, 0 – 999,999. Similarly, a 16-bit number can store 256 x 256, which is 65,536 different values. 32 bits can store 65,536 x 65,536 values, which is a number comfortably over 4 billion. This is where we get the 4GB upper limit for most 32 bit systems; they simply can’t point to the location of memory outside this range.