Choosing Between Database Platforms: MS SQL and MySQL, with a twist

Another StackOverflow question: Given a site using PHP and VB.NET, and a choice between Microsoft SQL Server and MySQL, which coupling would be easier to maintain?

VB.NET and Microsoft SQL Server are an obvious marriage, as are PHP and MySQL, but as both development languages are in use, we are faced with choosing between two odd combinations:

VB.NET and MySQL
or PHP and MS SQL

My response? I’d go with MySQL (possibly controversially!), although SQL Server is by far the superior platform, all things considered. If you’re comparing them as similar alternatives, you’re probably not going to use any of the features of MS SQL that make it the better platform, and so it’s not worth the extra hassle.

In summary, here’s why:

PHP’s support for MySQL is second to none (given the following caveat)
PHP’s support for SQL Server is suboptimal; Microsoft provide a PHP driver, and there are other techniques, but PHP’s simply not quite as database-agnostic as VB.NET
VB.NET, although it loves SQL Server, will happily talk to any OLEDB provider (e.g. an ODBC connection) with no problems whatsoever, and MySQL’s ODBC support is pretty mature.

How to do application support: Padre

I accidentally stumbled across Padre, a Perl-native IDE, the other day, and although I wasn’t hugely impressed with its featureset, I was massively impressed with one small aspect. It’s still fairly new, and although it’s only a year or so old, it already has quite a few necessary features.

I wouldn’t compare it to Eclipse or any other relatively mature IDE, but I was massively impressed with one idea, that of user support.

The screenshots page almost casually refers to the “Live Support” option. Yes, they have Live Support. I’m sure it doesn’t come with a guarantee of instant access to the developers, but for an open source project, it’s pure genius. The developers and enthusiastic early-adopters probably hang out in the IRC channel anyway, even without any payment or expectation of payment. Tying this to the users, via a freely available web-based IRC client, is a remarkably clever use of existing technologies and the developer culture.

I’ve been to application-specific IRC channels before, and they tend to be polarised to either rather elitist “are you a developer? no? then buzz off” attitudes, or the opposite, policed by folk who like answering easy questions, and are therefore lording it over their own fiefdoms, kicking curious lurkers who want to pick up the accumulated wisdom by osmosis.

Padre is the first example I’ve seen where, rather than expecting the curious (or desperate) users to get onto IRC (IRC? what’s that?) themselves, the turn it into a clickable “Live Support” option, and bring the users to them with little or no difficulty.

I wish more applications did this.

Running a command on a remote server with SQL Server

Another StackOverflow question: How do I run an OS command on a remote server, from within SQL Server? I answered as follows.

Something like rsh might do – have your local SQL instance invoke xp_cmdshell with an rsh command (or a batch file), connecting to the remote server and executing the command.

If you have SQL on the remote machine, it’d be far easier; call a stored procedure on that machine from the local machine, and let that stored procedure do the work (might need fiddling with proxy accounts and “execute as”).

A third option would be to create a .NET stored procedure, and do your socket programming or remote invocation from there – it runs in-process, so you get similar same performance and security benefits you’d get from writing it in T-SQL rather than hopping out to a cmdshell with all the mess that entails.

The rsh solution won the day.

Having said that, the rsh solution is not tied to SQL Server — rsh can be invoked from anything capable of running shell commands locally, even a batch file, or a Perl script.

Rsh is a feature Microsoft added to Windows a few versions ago, although a variety of it has been available in resource kits since NT 4.0. In simplest terms, it provides a remote command line, not unlike telnetting into a Unix server, only you get the Windows CMD prompt. Rsh allows the invoker to invoke a single command remotely, using command-line options, and without the need to write any further server-side code. This makes it suited to the quick-and-dirty route taken by the recipient of the above answer.

Data Warehousing: To Primary Key or not to Primary Key?

Constraints are a necessary part of the tug-and-thrust of database design, but once in a while, when designing a data warehouse, someone comes up with the question “Do I need to map those constraints to my DW?” The simple answer is “Yes”, but it’s also a somewhat naive answer that doesn’t take into account the needs of a DW.

For your primary key, consider using a surrogate or alternate key; you’ll need to cater for slowly changing dimensions, e.g. if you’re doing a report over the last 5 years on average sales per married/unmarried salesperson, you’ll want to register the fact that someone was unmarried for 2 years, then married for the last 3. This means your data warehouse will have two dimension-table rows for the same person. Following the OLTP structure for that will be hard, to say the least. Note that it’s not enough to create a composite key on the original primary key and e.g. marital status — the same person may also have other fields that change occasionally, such as address (zipcode), department, job title, or even gender!

As for other constraints such as default value or check constraints, we need to consider how a DW is populated, and what the arrival of data to the DW means. The DW is invariably populated from an OLTP (relational) system, the same system that typically has all the original constraints. Ideally, this means the data is already clean by the time the DW gets it. It also means that the performance hit required by checking these constraints in writing to the database has already been taken. And OLTP system is typically optimised for writes and reads, particularly the large tables that change frequently and are the subject of most of the storage in a DW.

On the other hand, Data Warehouses are massively optimised for reads (assuming you’re populating as a batch, with reasonably high latency), and constraints don’t really factor in read operations. You can typically get around any constraint issues with your DW populating job, if they haven’t already been dealt with in the OLTP system, and deal with nulls etc. at the analysis/reporting tools if necessary. It’s far more important to make sure default values fit with your conceptual data model, and don’t introduce issues at the DW client tools.

SQL: Why nulls slow down some queries

Someone asked a question on StackOverflow today about why null values slow down certain queries. This is roughly what I answered.

The main issue with null values and performance is to do with forward lookups.

If you insert a row into a table, with null values, it’s placed in the natural page that it belongs to. Any query looking for that record will find it in the appropriate place. Easy so far….

…but let’s say the page fills up, and now that row is cuddled in amongst the other rows. Still going well…

…until the row is updated, and the null value now contains something. The row’s size has increased beyond the space available to it, so the DB engine has to do something about it.

The fastest thing for the server to do is to move the row off that page into another, and to replace the row’s entry with a forward pointer. Unfortunately, this requires an extra lookup when a query is performed: one to find the natural location of the row, and one to find its current location.

So, the short answer to the question of performance is yes, making those fields non-nullable will help search performance. This is especially true if it often happens that the null fields in records you search on are updated to non-null.

Of course, there are other penalties (notably I/O, although to a tiny extent index depth) associated with larger datasets, and then you have application issues with disallowing nulls in fields that conceptually require them, but hey, that’s another problem!

Perl editing with Eclipse

Part of the problem with developing Java is the plethora of IDEs out there, and the lack of standardisation. It’s not really a problem with IDEs as much as with Java server platforms, as IDEs are largely the same; server platforms are rarely the same.

With Perl, the lack of standardisation in IDEs is not considered as much a problem, for the simple reason that many Perl programmers are really old-school, and tend to prefer simple text editors. Most of my recent Perl work has been done through Vim.

However, after teaching a Java course recently through a combination of Eclipse and ConTEXT, I had a look at Eclipse’s support for Perl, particularly with a view to debugging support; Vim doesn’t have native step-through debugging, and Eclipse seems already suited to things like that.

If you’re already familiar with debugging in Eclipse, then the EPIC plugin is well worth looking at for its Perl support.

It’s got stepped debugging within the Debug perspective, just like Eclipse has with other languages. Its Perl support is not as strong as the Java support — the Watch features and relatively simple editor features like refactoring support leave a lot to be desired — but it’s got an easier learning curve than e.g. “perl -d” (the ‘standard’ way to debug perl), or even learning a new editor like Emacs, with its Perl debugging integration. Of course, as a Vim user I haven’t even learned to hack Perl in Emacs…