Representing Hierarchical data in PHP

I’ve come across a lot of people having a problem when representing data from their databases; hierarchical data provides us with a particular problem when displaying the relationships and containers required.

An example of hierarchical data

As an example, let’s say you have forums within categories. The category “Web Programming” may have forums like “PHP”, “JavaScript” and so on, so we want to list each forum under each category. The problem here is that you’ll have:

  • a table for categories
  • one for forums (with a foreign key for categories)
  • one for posts, with a foreign key for forums
  • and probably one for comments.

This fits quite neatly into a hierarchy: each comment belongs to a post, which belongs to a forum, which belongs to a category.

Using multiple resultsets

If you want to display the Categories with Forums under them in your page, one way to do this is by setting up multiple resultsets, each focusing on one level of the hierarchy. To achieve this, you’d need to get a resultset for the categories first, and then iterate over this list with another loop for your forum list.

The following code is an example of iterating using nested queries:

$cat_rs = mysql_query("select id, name from categories");
while($cat_row = mysql_fetch_array($cat_rs)){
     // print category name from $cat_row[1]
     $forum_rs = mysql_query("select name... "
                          . "from forums "
                          . "where cat_id = '" . $cat_row[0] ."'");
     while($forum_row = mysql_fetch_array($forum_rs)){
        //print forum stuff
     }
}

Although this matches the hierarchical nature of the data, and is pretty straightforward and intuitive to understand, it is somewhat inefficient; it issues a select statement for each category, plus one to list the categories in the first place. If there are many categories, the page might take some time to load as all queries are issued and processed.

Using a SQL JOIN: a single resultset

Another way to solve the problem would be to retrieve all relevant data in a single resultset, and use PHP to iterate over the rows, deciding on when to print each individual section.

A join will give you a single recordset with the category in one column, and the forum in another, giving you many rows for a single category.

$join_rs = mysql_query("select c.id, c.name, f.name,... "
                        . "from categories c inner join forums f "
                        . "on c.id = f.cat_id "
                        . "order by c.id"); // this line is crucial
$current_category = "";
while($join_row = mysql_fetch_array($join_rs)){
	if($join_row[0] != $current_category){
                //set up new category headings here
                $current_category = $join_row[0];
        }
        //print the forum stuff in the current category
}

The PHP is a bit more complex, because you have to close divs or tables correctly, within the loop. The order by line in the above example is vital, because without it, the categories may be dotted throughout the resultset. As we want all forum data related to a single category to appear together, we want rows belonging to that category to be adjacent in the resultset. The order by clause achieves this.

Despite the complexity, there are still benefits to this approach. The earlier example was simple and intuitive, using nested queries for each parent element. On the other hand, in this example we’ve only issued one select statement, so the page is likely to load more quickly, given that the number of round trips to MySQL will be substantially fewer.

PHP: Handling Parameter Arrays from Forms

Sometimes, when processing a form, you need to retrieve a number of values for a single variable. This is often a cause for confusion when beginning PHP. The particular problem faced: how do we process multiple values coming from a form to a databse, when using PHP?

Imagine we had a drop-down box containing a number of elements, and we want the user to select one or more. We’d use the select element in HTML.


Note the square brackets: for parameter arrays — parameters that allow multiple values — PHP requires that you name a GET or POST variable accordingly, so the select above needs to be named taglist[] in order to tell PHP that it will contain an array rather than a single value.

In order to process this, we then have to iterate over that array.

To iterate over the array in PHP code, we use something like this:

foreach($_POST["taglist[]"] as $s) {
    mysql_query("INSERT INTO articlestagged (article_id, tag) " .
       "VALUES ('$id','$s')")
        or die("Insert Error: ".mysql_error());
}

Obviously this example relates to adding new entries to a database, but the same would apply if we were querying a database, displaying multiple images, or indeed anywhere else we needed to operate on multiple concurrent values.

AWE (Address Windowing Extensions) in SQL Server

Memory is cheap these days; it’s not unusual to find laptops with 4GB or more (and yes, I know this post won’t date well!). This is a bit of a problem with some operating systems, though: 4GB is the maximum memory that many system are capable of handling. Yes, that’s right. Anything over 4GB is wasted on many of today’s operating systems, because of a limit imposed by 32 bit processors on how much memory they may address[1].

This becomes rather a problem on SQL Server installations, where having a lot of available memory is quite beneficial.

It’s worth mentioning at this point that 64-bit systems don’t suffer from the same memory limitations as 32-bit systems, so much of this article is irrelevant if your system is running on a 64-bit edition of Windows. The issue we’re talking about here is that 32 bit systems can ordinarily access only 4GB of RAM in total, which under default conditions means that your SQL Server instance can only access 2GB of useful RAM for its own needs, and putting one of these in your development laptop is quite pointless.

However, it is possible to grant access to more memory, but it does depend on a number of conditions.

Physical Address Extension

Since Windows 2000, Microsoft have supported Physical Address Extension (PAE) in a kernel, although the support requires using a non-default kernel. To use PAE, both the hardware and the operating system must support it.

Various 32 bit Windows servers support from 8-128GB, depending on edition. Windows 2000 Advanced and Datacenter Servers, Windows Server 2003 Enterprise and Datacenter Editions, and Windows Server 2008 Enterprise and Datacenter Edition support more than 4GB in 32 bit editions, by using PAE, with Windows 2003 Datacenter hitting the top score of 128GB.

Address Windowing Extensions

Address Windowing Extensions (AWE) is an API in Windows, used by SQL Server to support larger amounts of physical memory. It shouldn’t surprise you by now that AWE relies on PAE, which in turn has the other prerequisites as described above, so in order to enable AWE in SQL Server (using sp_configure), PAE must be supported by the underlying operating system.

The benefits are obvious; SQL Server will have access to more physical memory, which will make its job a lot easier. Your buffer cache will love you, as more and more data can be pulled out of memory rather than causing page hits on your oh-so busy hard disk. The drawbacks are slightly less obvious; memory granted to SQL Server using AWE is non-paged, which means it’s allocated until SQL Server is shut down. This will cause problems is SQL Server is sharing its host with other instances or other AWE-aware services. So keep your maximum memory settings tuned, and enjoy the headroom!


Footnote:
32 bits is 4 bytes (of 8 bits each). Just like a 3-digit number can store up to 1000 different values — 0-999 — an 8-bit number can store from 0-256. A 6-digit number can store 1000 x 1000, which is 1,000,000 different values, 0 – 999,999. Similarly, a 16-bit number can store 256 x 256, which is 65,536 different values. 32 bits can store 65,536 x 65,536 values, which is a number comfortably over 4 billion. This is where we get the 4GB upper limit for most 32 bit systems; they simply can’t point to the location of memory outside this range.

'Hello World' using AJAX

“Asynchronous Javascript And XML” is a mouthful for a web technology that’s in wide use these days. Simply put, it lets a web page download more content without reloading the page, and relies on a magic Javascript object called an XmlHttpRequest to do the work of requesting more data in the background, so it can fill in things on the page.

So, fiddling with AJAX, let’s do a little Hello Worlding:

Here’s a button, that when clicked, calls the “startRequest()” function in some javascript behind the entry:


And here’s a div:

<div id="ajaxtest">
<em>(although you can't see it as a separate div,
this text will be replaced)</em></div>
(although you can’t see it as a separate div, this text will be replaced)

Go on, watch the div text above, and click the button.

Here’s the javascript:

var xhr;
if (window.XMLHttpRequest){
  xhr = new XMLHttpRequest();
}
else if (window.ActiveXObject){ // in case it's IE
  xhr = new ActiveXObject("Microsoft.XMLHTTP");
 }
// set up the callback
xhr.onreadystatechange = function(){
  myDiv = document.getElementById("ajaxtest");
  if(xhr.readyState  == 4) {
  if(xhr.status  == 200){
   myDiv.style.background = "green";
   myDiv.innerHTML =  xhr.responseText;
  } else {
   myDiv.innerHTML = "Error: " + xhr.status;
  }
  }
}
function startRequest(){
  xhr.open('GET', '/ajaxtest.php', true);
  xhr.send(null);
}

Firstly, the code sets up a variable xhr, which contains the magical object. This is the XmlHttpRequest object, and is responsible for communicating from the live Javascript to the back end server.

The request object is first set up at the script’s execution, but the only thing we do with it is tell it what to do when its “ready state” changes; that is, when the request shifts from not initialised, to set up, to having been sent, to being in process, to completion.

These states are numbered 0-4 respectively. Hold that thought; we’ll use it later. Suffice to say that, on the script’s execution, we now have an otherwise anonymous function that will be called when the ready state changes.

Now, in order for something interesting to happen, we click the button. That kicks off startRequest(), which in turn sets up a asynchronous request to ajaxtest.php, and submits it with no parameters (that’s the null). Asynchronous means we don’t sit around waiting for a response, and that’s why we need the readystatechange callback function.

So, when something interesting happens to the xhr object, the ready state changes (e.g. when we get a response back after sending our request). The browser then executes the anonymous function: first, the function checks that the request is complete – remember Ready State 4 above? don’t forget the ready state changes several times per request. We only want to read the response text once, at completion. Once the function has checked the request has been successfully completed, it then uses the xhr’s response-text to fill in the div called “ajaxtest”.

Et voila, we have “Hello World” in AJAX.

SQL Server Compatibility Levels – to Upgrade or Not to Upgrade?

When you upgrade from one SQL Server type to another (most of my experience has been with 2000 -> 2005), one of the options available is to stick with the existing compatibility level, rather than upgrade.

SQL Server 2000 defaults to “Compatibility Level 80”, with other options including 65, 70. These represent the features and compatibility options for version 8.0 (SQL Server 2000), 6.5 and 7.0 respectively. In other words, it’s possible to coerce a database into thinking it’s running with features consistent with an earlier release of SQL Server.

When upgrading to 2005, it’s possible to retain compatibility level 80, but the recommended route is to switch to 90. But should you?

There are two ways to look at this:

* If you want to change your code (or install some collaborating SQL2005/2008 servers) in the future, your compatibility mode will affect that decision
* If you haven’t run the Upgrade Adviser, you may not be able to upgrade; compatibility level 90 disables certain syntaxes and features that were available in 80.

So, when upgrading, you don’t have to increase your compatibility level, but then you’ve to consider your future with the server too; you might be happy to run on the latest SQL Server instance, but with old-school database settings and syntaxes. On the other hand, collaboration with other servers and applications, or maintenance by future programmers and DBAs, will probably require the switch to a higher compatibility level.

Note: it’s entirely possible to have several databases with different compatibility levels on a single instance.

Tricks with ORDER BY

The ORDER BY clause in SQL Server is a bit more flexible than many people give it credit for.

In a training course I’m giving at the moment, we’ve just gone over a few of the lesser known uses, along with the widely used ones:

  • The obvious: ordering by a column name
    ORDER BY advance
  • Less obvious: ordering by an expression on a column
    ORDER BY id % 2 -- even IDs first, then odd
  • Ordering by multiple fields
    ORDER BY id % 2 desc, price
           -- odd orders first by even/odd IDs, then by prices within each
  • Ordering by visible column position
    SELECT id, customerid, price * quantity as "gross revenue"
    FROM orders
    ORDER BY 2
  • Ordering by column aliases
    SELECT id, customerid, price * quantity as "gross revenue"
    FROM orders
    ORDER BY "gross revenue"

The last one is particularly interesting, because column aliases can’t be used in other places e.g. in the where clause or from clause.