MySQL Blogging

I’ve taken up residence at https://blogs.oracle.com/jsmyth/ for all my MySQL bloggy goodness. I’ll still blog about all of the wide and varied other things here, because it’s entirely possible that the world of MySQL users out there don’t really care about Perl or how to configure Portugese Model M keyboards in Linux.

In the meantime, expect to see a bunch of different questions and answers over there that have cropped up over the last while in my MySQL course writing, questions from students and instructors, and interesting issues that pop up when challenges and conundrums face someone who isn’t a MySQL expert but has to work with it (hey, that describes about 95% of MySQL’s userbase, right?).

MySQL: Mass email change

It’s not unheard of for a company to change e-mail domain in mid-thrust; maybe it’s been bought out, or rebranded, or the parent company has spun it off to its own brand.

Only you’ve got hundreds of employees, each one with their own email address, and your MySQL database is in dire need of updating to reflect this.

To get around this, you’ll need to replace the relevant part of each email string within an update statement, grabbing the hostname substring (after the ‘@’) with a REPLACE, and replacing it.

UPDATE table SET email=REPLACE(email,'OLDHOST.com', 'newhost.com');

Note: REPLACE() is case-sensitive, so if needs be, you can use LOWER(email) inside the REPLACE function if you need to catch all case possibilities, as below:

UPDATE table SET email=REPLACE(LOWER(email),'oldhost.com', 'newhost.com');

This will also convert all your email addresses to lowercase, so be aware of that.

Tables, tables, tables

So, what’s a temporary table? or an in-memory table? or a pivot table?

An in-memory table is a table in some platforms that’s stored entirely in memory. These don’t really exist in MS SQL, although you could say that it is a table that’s been entirely cached, and so doesn’t result in any physical (hard disk) reads when queried. In earlier versions, the DBCC PINTABLE command allowed the “pinning” of tables in memory, but this was deprecated in SQL Server 2005.

Often, a table-valued variable, @tablename, might be stored in memory (although this is not guaranteed), and declared in a batch or function, with no persistence.

A temporary table is a table that will be automatically dropped when it’s no longer needed, usually when the creating session is terminated. In MS SQL, they begin with a # (or two hashes if they’re global temporary tables, shared between multiple sessions), and are often created with a SELECT INTO #TEMPTABLE … style query.

A pivot table is a special form of query where the values in several rows are summarised, “pivoted” on an axis, and become columns, where the summary data then becomes the rows. Frequently this happens where you’ve rows sorted on dates; these may then be “pivoted” so you end up with a column for January, one for February, one for March, etc.

GROUP BY and single-row values

So, you’ve constructed a SQL statement with a GROUP BY clause, and you’re getting this message:

Column col is invalid in the select list because it is not
contained in either an aggregate function or the
GROUP BY clause.

Bear in mind that the result of a GROUP BY statement, or of a statement where one or columns uses an aggregate function, has each row containing a summary of other rows.

This means that if you try to include a column in your select clause that isn’t a summary (this includes values by which you’re grouping), then the server is going to have difficulties returning it; remember, it’s going to return one row per group, and any value that can’t be reduced to a single row per group will fail.

This gets interesting even when you’re hoping to return a single value per group, for example with CASE.

If the CASE expression you’re using is dependent on values of individual rows rather than summaries, you’ll also get the above message; you may only reference non-aggregate values (i.e. single-row values) in an aggregate function, or in the WHERE clause, so the solution would involve placing your CASE inside an aggregate function, in this case your SUM.

The same applies for the HAVING clause, as it is effectively selecting rows in the grouped resultset based on values in each group.

Representing Hierarchical data in PHP

I’ve come across a lot of people having a problem when representing data from their databases; hierarchical data provides us with a particular problem when displaying the relationships and containers required.

An example of hierarchical data

As an example, let’s say you have forums within categories. The category “Web Programming” may have forums like “PHP”, “JavaScript” and so on, so we want to list each forum under each category. The problem here is that you’ll have:

  • a table for categories
  • one for forums (with a foreign key for categories)
  • one for posts, with a foreign key for forums
  • and probably one for comments.

This fits quite neatly into a hierarchy: each comment belongs to a post, which belongs to a forum, which belongs to a category.

Using multiple resultsets

If you want to display the Categories with Forums under them in your page, one way to do this is by setting up multiple resultsets, each focusing on one level of the hierarchy. To achieve this, you’d need to get a resultset for the categories first, and then iterate over this list with another loop for your forum list.

The following code is an example of iterating using nested queries:

$cat_rs = mysql_query("select id, name from categories");
while($cat_row = mysql_fetch_array($cat_rs)){
     // print category name from $cat_row[1]
     $forum_rs = mysql_query("select name... "
                          . "from forums "
                          . "where cat_id = '" . $cat_row[0] ."'");
     while($forum_row = mysql_fetch_array($forum_rs)){
        //print forum stuff
     }
}

Although this matches the hierarchical nature of the data, and is pretty straightforward and intuitive to understand, it is somewhat inefficient; it issues a select statement for each category, plus one to list the categories in the first place. If there are many categories, the page might take some time to load as all queries are issued and processed.

Using a SQL JOIN: a single resultset

Another way to solve the problem would be to retrieve all relevant data in a single resultset, and use PHP to iterate over the rows, deciding on when to print each individual section.

A join will give you a single recordset with the category in one column, and the forum in another, giving you many rows for a single category.

$join_rs = mysql_query("select c.id, c.name, f.name,... "
                        . "from categories c inner join forums f "
                        . "on c.id = f.cat_id "
                        . "order by c.id"); // this line is crucial
$current_category = "";
while($join_row = mysql_fetch_array($join_rs)){
	if($join_row[0] != $current_category){
                //set up new category headings here
                $current_category = $join_row[0];
        }
        //print the forum stuff in the current category
}

The PHP is a bit more complex, because you have to close divs or tables correctly, within the loop. The order by line in the above example is vital, because without it, the categories may be dotted throughout the resultset. As we want all forum data related to a single category to appear together, we want rows belonging to that category to be adjacent in the resultset. The order by clause achieves this.

Despite the complexity, there are still benefits to this approach. The earlier example was simple and intuitive, using nested queries for each parent element. On the other hand, in this example we’ve only issued one select statement, so the page is likely to load more quickly, given that the number of round trips to MySQL will be substantially fewer.