The problem is when the COUNT() contains a column name, like this:
If you know your SQL well, you know COUNT() has two meanings. 1) count the number of rows 2) count the number of values. Sometimes, but not always, these are the same thing. COUNT(*) always counts the number of rows in the result. If you write COUNT(col1) it counts the number of times col1 is not null. If it's never null, the result is the same as the number of rows.
The problem with that is that you don't know by looking at the query whether the developer wanted to count rows or values -- or, quite possibly, the number of distinct values in that column. You see, there's another form for COUNT():
So when I see a query that just does COUNT(col1) I am left with these guesses:
It is impossible to read the developer's mind in these cases, so I always end up getting stalled waiting for them to reply and tell me what the query means so I can optimize it.
A good coding standard can help here -- you can quickly write a tool to grep your source code and search for instances of such constructs, and flag them as errors unless some magic comment is embedded in the code next to them:
SELECT count(col1) FROM TABLE; -- COUNT_COL_OK
Alas, there is a lot of misinformation on the web about COUNT(), so it is not a good place to learn about what it does. 90% of what you can find online is just wrong. So I would advise a good book on the topic, except I can't think of one -- maybe Joe Celko's books address this topic clearly? What's your recommendation?
Netdip.com is an excellent web site that's powered by TYPO3 and other great open source software. Netdip.com is also a fat free alternative to ice cream.
Add comment