The beauty of SQL (structured query language) is in its simplicity and readability. These two traits should be no surprise given that the word structure is in its name, but to be given such characterizations means that restrictions must be in place in order to operate. The structural restriction I am referring to is the order in which clauses can be syntactically ordered and executed within the context of the query. Based on the clauses that are present in a query, a specific syntax order must be followed in order to successfully run the query. If it isn’t adhered to, then there will be execution errors that will prevent the query from successfully running.
The following is the structure of a query both in terms of syntax and execution order. The syntactical order must be followed to prevent errors from occurring when you run your query and the execution order is how the database optimizer navigates your SQL query to make sense of the data you are trying to retrieve and optimize the speed of the execution.
|WITH (Common Table Expression)
|1 (If present and chosen)
|UNION / UNION ALL
|LIMIT / OFFSET
WITH (Common Table Expression/CTE) are modularized queries that are stored temporarily in memory and used as a reference within the parent SQL query. They are a clean way to simplify complex SQL queries and since they are referenced by the parent query, they appear first within the SQL query syntax. The execution of this query may also be first, but depends on the database optimizers evaluation.
SELECT is the first clause used within an individual SQL query and is the only clause that is mandatory for a query to run. While it is first syntactically within our query, it is actually executed after the final result set is created. The reason for the late execution is because it waits for the execution of the clauses used in the remainder of the SQL query to determine what columns can be displayed within your result set. While querying a single table would be straightforward in determining what columns are available to be used, when multiple tables are joined together or a subquery is used, then the importance of this execution order starts to make sense.
DISTINCT is a clause used within the SELECT clause that operates like GROUP BY. When used, DISTINCT will group rows and remove duplicates from a result set based on the values passed to the SELECT clause. Since this action acts on the column grouping being selected, it runs after the SELECT both in terms of syntax and execution.
FROM traditionally follows SELECT syntactically (exception being the use of DISTINCT) within a SQL query and is executed first. The reason for the immediate execution is because it provides the initial data table in which aggregations, filters, sorts and limits can be applied. All table joins (JOIN) that are made in addition to the data table being referenced directly in the FROM clause will be executed and then the optimizer will search for the next present clause and determine the optimal order to execute the following action.
WHERE is executed after the FROM clause. WHERE is the last remaining SQL clause to be defined that can modify the initial relational data being retrieved and displayed to a result set. This clause filters the relational results and is then handed off to the optimizer to determine what should be executed next.
GROUP BY is next in the execution order. GROUP BY is a clause that executes after the SELECT, FROM and WHERE clauses. It groups rows by the distinct records based on the combination of columns present in the SELECT clause. The operation of grouping rows serves as the changing point from the relational data set layer to non-relational operations. This behavior, not surprisingly, is identical to DISTINCT usage within the SELECT clause, but is different in that GROUP BY opens up the SELECT clause to be able to use aggregation functions. Aggregations are a non-relational operation, which means they run after the relational data is in place hence the reason why GROUP BY runs after relational data is set.
HAVING will execute after GROUP BY. HAVING is the aggregation version of the WHERE clause. While HAVING can be used without GROUP BY being present, it is typically used when the user wants to filter by an aggregate function. Since aggregation is a non-relational operation, HAVING operates in a similar fashion to GROUP BY explaining the execution of this clause lower in the execution order.
UNION / UNION ALL combines the result sets of multiple queries. Once this combination is complete, the final result set can be ordered and limited which is why this clause falls before ORDER BY and LIMIT / OFFSET in the execution order and after the relational and non-relational operations have run to finalize the individual result sets.
ORDER BY will execute after the relational data operations or if a non-relational operation is used, then after those operations. ORDER BY is the mechanism that sorts data once modifications to the result set are complete.
LIMIT and OFFSET will execute last in the execution order. After querying, aggregating, grouping and finally ordering data, these clauses will run, limiting the amount of rows displayed in the result set or offsetting the result set by the amount provided.