Four Kitchens

What makes Pressflow scale: #1 faster core queries


Drupal has a number of queries with unfortunate scalability profiles.

URL alias counting (one instance in core)

The biggest offender in Drupal 5 and Drupal 6 is the query counting the number of URL aliases: SELECT COUNT(*) FROM url_alias. This query dates back to when nearly every Drupal site ran on MyISAM. That matters because MyISAM keeps an exact count of the rows in every table, making SELECT COUNT(*) FROM [table] an O(1) (read: fast, constant-time) operation.

But InnoDB, the engine of choice for high-scale Drupal sites, does not keep an exact row count for tables because its multiversion concurrency control (MVCC) makes such a count difficult and inefficient. MySQL with InnoDB still faithfully runs the query, but it does so by counting every row in the table. That is an O(n) operation, so the query slows down in proportion to the number of URL aliases on the site.

Such counting is particularly unfortunate because the URL alias system only cares whether the number is zero or the number is greater than zero. In Pressflow 5 and Pressflow 6, we replace this query with SELECT pid FROM url_alias LIMIT 1 (or equivalent), giving us just the information we need (“Is there at least one alias?”) in a way that runs in O(1) on both MyISAM and InnoDB.
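The existence check described above can be sketched as follows. This is a minimal illustration, not Pressflow's actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the url_alias schema is simplified, and the helper name has_any_alias is hypothetical. SQLite does not reproduce the MyISAM/InnoDB cost difference; the sketch only shows that the LIMIT 1 query answers the same yes/no question as the count.

```python
import sqlite3


def has_any_alias(conn: sqlite3.Connection) -> bool:
    """Return True if url_alias holds at least one row.

    Hypothetical helper: fetches a single pid instead of counting every row.
    """
    row = conn.execute("SELECT pid FROM url_alias LIMIT 1").fetchone()
    return row is not None


# Simplified stand-in for Drupal's url_alias table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")

# No aliases yet, so the check should report False.
empty_result = has_any_alias(conn)

conn.executemany(
    "INSERT INTO url_alias (src, dst) VALUES (?, ?)",
    [("node/%d" % i, "page-%d" % i) for i in range(1000)],
)

# The original approach: count everything, then compare with zero.
count_result = conn.execute("SELECT COUNT(*) FROM url_alias").fetchone()[0] > 0

# The replacement: stop at the first row found.
limit_result = has_any_alias(conn)
```

Both approaches agree on the answer; the difference is that the LIMIT 1 form never needs to visit more than one row, whatever the table size.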

Use of LOWER() for case-insensitivity (many places in core)

Drupal 5, 6, and 7 all currently use LOWER() on both sides of some queries to create database-agnostic, case-insensitive string matches. Applying LOWER() to a table column before comparison prevents the database from using an ordinary index on that column, so these queries automatically degrade to O(n) with respect to the number of users on the site. The more users a site has, the more time login and other frequent user operations take.

Drupal 5 and 6 use LOWER() because PostgreSQL’s LIKE operation performs case-sensitive comparisons. (PostgreSQL’s ILIKE operation is not cross-platform.) With LOWER(), the same query can run on MySQL and PostgreSQL without modification.

But Pressflow 5 and Pressflow 6 only explicitly support MySQL, so they can take advantage of MySQL’s case-insensitive collations and seamlessly drop the LOWER(). Without LOWER(), the comparison can use an index on the column, so user lookups happen in O(log(n)) time, which is very fast for even the largest sites.
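The effect of wrapping a column in LOWER() can be observed in a query plan. The sketch below is an analogy under stated assumptions rather than a reproduction of the MySQL behavior: it uses Python's built-in sqlite3 module, with SQLite's COLLATE NOCASE standing in for a MySQL case-insensitive collation, and the users table, the users_name index, and the plan helper are all simplified or hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, simplified users table; NOCASE plays the role of a
# case-insensitive MySQL collation on the name column.
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX users_name ON users (name)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("User%d" % i,) for i in range(1000)],
)


def plan(sql, params):
    """Hypothetical helper: return the detail text of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in rows)


# Portable form: LOWER() on the column forces every row to be examined.
lower_plan = plan("SELECT uid FROM users WHERE LOWER(name) = LOWER(?)", ("user42",))

# Collation form: the bare column comparison can seek into the index.
plain_plan = plan("SELECT uid FROM users WHERE name = ?", ("user42",))

# The collation still makes the match case-insensitive.
match = conn.execute("SELECT uid FROM users WHERE name = ?", ("user42",)).fetchone()
```

On recent SQLite versions the first plan is reported as a scan and the second as an index search, while the lowercase lookup still finds the mixed-case row. MySQL's EXPLAIN output looks different, but the underlying reason is the same: an index on the raw column value cannot serve a comparison against a function of that column.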