Filtering Groups with HAVING
The HAVING clause sets conditions on the GROUP BY clause similar to the way that WHERE interacts with SELECT. The HAVING clauses important characteristics are:
The HAVING clause comes after the GROUP BY clause and before the ORDER BY clause.
Just as WHERE limits the number of rows displayed by SELECT, HAVING limits the number of groups displayed by GROUP BY.
The WHERE search condition is applied before grouping occurs, and the HAVING search condition is applied after.
HAVING syntax is similar to the WHERE syntax, except that HAVING can contain aggregate functions.
A HAVING clause can reference any of the items that appear in the SELECT list.
The sequence in which the WHERE, GROUP BY, and HAVING clauses are applied is:
The WHERE clause filters the rows that result from the operations specified in the FROM and JOIN clauses.
The GROUP BY clause groups the output of the WHERE clause.
The HAVING clause filters rows from the grouped result.
To filter groups:
Following the GROUP BY clause, type:
search_condition is a search condition used to filter groups. search_condition can contain aggregate functions but otherwise is identical to the WHERE search condition, described in Filtering Rows with WHERE and subsequent sections in Chapter 4. You may combine and negate multiple HAVING conditions with the logical operators AND, OR, and NOT.
The HAVING search condition is applied to the rows in the output produced by grouping. Only the groups that meet the search condition appear in the result. You can apply a HAVING clause only to columns that appear in the GROUP BY clause or in an aggregate function.
In Listing 6.18 and Figure 6.18, I revisit Listing 6.9 earlier in this chapter, but instead of listing the number of books that each author wrote (or co-wrote), I use HAVING to list only the authors who have written three or more books.
Listing 6.18 List the number of books written (or co-written) by each author who has written three or more books. See Figure 6.18 for the result.
SELECT au_id, COUNT(*) AS "num_books" FROM title_authors GROUP BY au_id HAVING COUNT(*) >= 3;
Figure 6.18 Results of Listing 6.18.
In Listing 6.19 and Figure 6.19, the HAVING condition also is an aggregate expression in the SELECT clause. This query still works if you remove the AVG() expression from the SELECT list (Listing 6.20 and Figure 6.20).
Listing 6.19 List the number of titles and average revenue for the types with average revenue over $1 million. See Figure 6.19 for the result.
SELECT type, COUNT(price) AS "COUNT(price)", AVG(price * sales) AS "AVG revenue" FROM titles GROUP BY type HAVING AVG(price * sales) > 1000000;
Figure 6.19 Results of Listing 6.19.
Listing 6.20 Listing 6.19 still works without AVG(price * sales) in the SELECT list. See Figure 6.20 for the result.
SELECT type, COUNT(price) AS "COUNT(price)" FROM titles GROUP BY type HAVING AVG(price * sales) > 1000000;
Figure 6.20 Results of Listing 6.20.
In Listing 6.21 and Figure 6.21, multiple grouping columns count the number of titles of each type that each publisher publishes. The HAVING condition removes groups in which the publisher has one or fewer titles of a particular type. This query retrieves a subset of the result of Listing 6.14 earlier in this chapter.
Listing 6.21 List the number of books of each type for each publisher, for publishers with more than one title of a type. See Figure 6.21 for the result.
SELECT pub_id, type, COUNT(*) AS "COUNT(*)" FROM titles GROUP BY pub_id, type HAVING COUNT(*) > 1 ORDER BY pub_id ASC, "COUNT(*)" DESC;
Figure 6.21 Results of Listing 6.21.
In Listing 6.22 and Figure 6.22, the WHERE clause first removes all rows except for books from publishers P03 and P04. Then the GROUP BY clause groups the output of the WHERE clause by type. Finally, the HAVING clause filters rows from the grouped result.
Listing 6.22 For books from publishers P03 and P04, list the total sales and average price by type, for types with more than $10,000 total sales and less than $20 average price. See Figure 6.22 for the result.
SELECT type, SUM(sales) AS "SUM(sales)", AVG(price) AS "AVG(price)" FROM titles WHERE pub_id IN ('P03', 'P04') GROUP BY type HAVING SUM(sales) > 10000 AND AVG(price) < 20;
Figure 6.22 Results of Listing 6.22.
Generally, HAVING clauses should involve only aggregates. The only conditions that should be specified in the HAVING clause are those conditions that must be applied after the grouping operation has been performed. Its more efficient to specify conditions that can be applied before the grouping operation in the WHERE clause. The following statements, for example, are equivalent, but the first statement is preferable because it reduces the number of rows that have to be grouped:
SELECT pub_id, SUM(sales) --Faster FROM titles WHERE pub_id IN (P03, P04) GROUP BY pub_id HAVING SUM(sales) > 10000; SELECT pub_id, SUM(sales) --Slower FROM titles GROUP BY pub_id HAVING SUM(sales) > 10000 AND pub_id IN (P03, P04);