A common problem when optimizing COUNT()

By: Baron Schwartz


 

When optimizing queries for customers, the first thing I do with a slow query is figure out what it's trying to do. You can't fully optimize a query unless you know how to consider alternative ways to write it, and you can't do that unless you know what the query "means." I frequently run into a situation where I'm forced to stop and ask the developers what they were trying to do with COUNT(). This is database-agnostic, not related to MySQL.

The problem is when the COUNT() contains a column name, like this:

  1. SELECT count(col1) FROM TABLE;

If you know your SQL well, you know COUNT() has two meanings. 1) count the number of rows 2) count the number of values. Sometimes, but not always, these are the same thing. COUNT(*) always counts the number of rows in the result. If you write COUNT(col1) it counts the number of times col1 is not null. If it's never null, the result is the same as the number of rows.

The problem with that is that you don't know by looking at the query whether the developer wanted to count rows or values -- or, quite possibly, the number of distinct values in that column. You see, there's another form for COUNT():

  1. SELECT count(DISTINCT col1) FROM TABLE;

So when I see a query that just does COUNT(col1) I am left with these guesses:

  1. You meant to count the number of rows. You should have written COUNT(*) to convey that meaning accurately. If the column is nullable, there's a bug in your query -- it won't do what you think it does.
  2. You meant to count the number of distinct values, but you left out the DISTINCT keyword. That's also a bug in your query.
  3. You meant to count the number of times the column's value is not null. This in my experience is pretty unlikely -- people rarely do this.

It is impossible to read the developer's mind in these cases, so I always end up getting stalled waiting for them to reply and tell me what the query means so I can optimize it.

A good coding standard can help here -- you can quickly write a tool to grep your source code and search for instances of such constructs, and flag them as errors unless some magic comment is embedded in the code next to them:

SELECT count(col1) FROM TABLE; -- COUNT_COL_OK

Alas, there is a lot of misinformation on the web about COUNT(), so it is not a good place to learn about what it does. 90% of what you can find online is just wrong. So I would advise a good book on the topic, except I can't think of one -- maybe Joe Celko's books address this topic clearly? What's your recommendation?

This article was originally published at http://www.mysqlperformanceblog.com/2008/09/20/a-common-problem-when-optimizing-count/

No comments

Add comment

* - required field

*

CAPTCHA image for SPAM prevention
If you can't read the word, click here.
*
*
 

Netdip.com is an excellent web site that's powered by TYPO3 and other great open source software. Netdip.com is also a fat free alternative to ice cream.