GROUP BY
is the first non-relational layer clause that executes. This means that it occurs after the initial result set has been generated during the query. GROUP BY behaves identically to DISTINCT in that it deduplicates the records based on the column groupings present in the query at the time of execution, but where it is different is that it allows aggregation functions and the HAVING clause to be able to be added to the query. The values that can be provided to GROUP BY are comma-separated column names or the index value of the column name. When an aggregation is included in the SELECT clause, each non-aggregated column must be included as a value in the clause either by column name or index position reference.
Given the following table:
id | first_name | last_name | department_id |
---|---|---|---|
1 | Nanni | Saphin | 4 |
2 | Bea | McTurlough | 2 |
3 | Erryn | Chable | 3 |
4 | Flynn | Roman | 1 |
5 | Mart | Faragher | 1 |
Querying data with GROUP BY referencing the column name and index position:
SELECT department_id
FROM employee
GROUP BY department_id
Or
SELECT department_id
FROM employee
GROUP BY 1;
Will return the following result set:
department_id |
---|
1 |
2 |
3 |
4 |
As you can see, both the name and index position reference will return the same result set.
Querying data with DISTINCT and GROUP BY:
SELECT DISTINCT department_id
FROM employee;
Or
SELECT department_id
FROM employee
GROUP BY 1;
Will return the following result set:
department_id |
---|
1 |
2 |
3 |
4 |
This is the identical result set as the result set displayed in the previous example and you can see that there is no difference in the results between using DISTINCT and GROUP BY in this query.
Aggregation function with GROUP BY:
SELECT department_id,
COUNT(department_id)
FROM employee
GROUP BY 1;
Will return the following result set:
department_id | count |
---|---|
1 | 2 |
3 | 1 |
2 | 1 |
4 | 1 |
With this query a count of the records representing each department is aggregated and then grouped by the department_id