Usually I found some questions on the forums about that topic. For instance:
- “WHERE or HAVING?”
- “Is the ON clause more efficient than the WHERE clause?”
- “Why the field aliases cannot be used with the GROUP BY clause?”
These three questions are syntax oriented. Actually there are a lot of requests about subqueries, temporary objects, sort operations and so on. This kind of questions can be replied reading this logical process document (pdf). I love it and I share it everywhere, also when training in classes/on the job.
It is a LOGICAL workflow used by the Query Processor in order to generate the related PHYSICAL process for retrieving data.
It was created by Itzik Ben-Gan.
This is the related query execution order:
- GROUP BY
- CUBE | ROLLUP
- ORDER BY
It’s simpler to reply to the questions now :)
For further information about this topic, I suggest to read this book and this post from Pinal Dave’s blog.
Why am I writing this post? Because I will never stop to say “read documents like this, try to go in depth, try to understand the query processor, always!“
Keep in mind that this workflow shows the logical process (and not the physical one) that SQL Server engine, in the Query Processor module, applies on our issued statements. What does happen just after we execute commands?
First of all, parsing phase with syntax check
- A tree with logical step is produced (high level, similar to the issued query)
- This tree is used for name resolution and dependency check
- The tree is sent to the Query optimizer in order to let the engine to create execution plans (physical one)
- The Query optimizer evaluates the costs of generated plans and picks the lowest-cost plan that will be used for retrieving data
- The plan may be stored in “memory” (plan cache)
- The engine executes the query and returns the data to the caller
So, the engine will not execute queries applying step by step the logical process. Some operations, for instance SELECTS, are in a single point in the pdf. Physically the engine will apply the SELECT in more than one step and when it’s necessary. The physical process is different also when using indexes. Think about the items you see while watching execution plans (nested loop, hash match, filters, sorts, etc.).
Why this query logical process is so important?
Trying to program in SQL without the knowledge of “logical processing order of query elements” is a straight path to lengthy, poor-performing code that is hard to maintain.
And now, replies to the questions on the top:
“WHERE or HAVING?”
The HAVING clause is executed after a WHERE and is designed for filter after an aggregation (i.e. a GROUP BY) and a already applied filter (the WHERE).
If you can filter using WHERE or HAVING, it is better to use WHERE, because you will take more performances (think about usage of the indexes).
“Can I use the ON clause as a WHERE clause?”
On INNER JOIN case, it’s near the same. The engine understands this behavior. On OUTER JOINs it’s completely different. Since the ON clause is executed just before the WHERE clause, it is used for considering the NULL rows (outer rows).
“Why the field aliases cannot be used with the GROUP BY clause?”
Because the SELECT is not occurred yet when executing the GROUP BY clause.
A topic like this needs more time to be explained. However it could be a starting point for a deeper discussion.