Aggregating Distinct Values with DISTINCT
You can use DISTINCT to eliminate duplicate values in aggregate function calculations; see Eliminating Duplicate Rows with DISTINCT in Chapter 4. The general syntax of an aggregate function is:
agg_func([ALL | DISTINCT] expr)
agg_func is MIN, MAX, SUM, AVG, or COUNT. expr is a column name, literal, or expression. ALL applies the aggregate function to all values, and DISTINCT specifies that each unique value is considered. ALL is the default and rarely is seen in practice.
With SUM(), AVG(), and COUNT(expr), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. DISTINCT isnt meaningful with MIN() and MAX(); you can use it, but it wont change the result. You cant use DISTINCT with COUNT(*).
To calculate the sum of a set of distinct values:
Type:
SUM(DISTINCT expr)
expr is a column name, literal, or numeric expression. The results data type is at least as precise as the most precise data type used in expr.
To calculate the average of a set of distinct values:
Type:
AVG(DISTINCT expr)
expr is a column name, literal, or numeric expression. The results data type is at least as precise as the most precise data type used in expr.
To count distinct non-null rows:
Type:
COUNT(DISTINCT expr)
expr is a column name, literal, or expression. The result is an integer greater than or equal to zero.
The queries in Listing 6.6 return the count, sum, and average of book prices. The non-DISTINCT and DISTINCT results in Figure 6.6 differ because the DISTINCT results eliminate the duplicates of prices $12.99 and $19.95 from calculations.
Listing 6.6 Some DISTINCT aggregate queries. See Figure 6.6 for the results.
SELECT COUNT(*) AS "COUNT(*)", COUNT(price) AS "COUNT(price)", SUM(price) AS "SUM(price)", AVG(price) AS "AVG(price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT)", SUM(DISTINCT price) AS "SUM(DISTINCT)", AVG(DISTINCT price) AS "AVG(DISTINCT)" FROM titles;
Figure 6.6 Results of Listing 6.6.
TIPS
DISTINCT in a SELECT clause andDISTINCT in an aggregate function dont return the same result.
The three queries in Listing 6.7 count the author IDs in the tabletitle_authors. Figure 6.7 shows theresults. The first query counts all the author IDs in the table. The second query returns the same result as the first query because COUNT() already has done its work and returned a value in single row before DISTINCT is applied. In the third query, DISTINCT is applied to the author IDs before COUNT() starts counting.
Listing 6.7 DISTINCT in a SELECT clause and DISTINCT in an aggregate function differ in meaning. See Figure 6.7 for the results.
SELECT COUNT(au_id) AS "COUNT(au_id)" FROM title_authors; SELECT DISTINCT COUNT(au_id) AS "DISTINCT COUNT(au_id)" FROM title_authors; SELECT COUNT(DISTINCT au_id) AS "COUNT(DISTINCT au_id)" FROM title_authors;
Figure 6.7 Results of Listing 6.7.
Mixing non-DISTINCT and DISTINCT aggregates in the same SELECT clause can produce misleading results.
The four queries in Listing 6.8 (following page) show the four combinations of non-DISTINCT and DISTINCT sums and counts. Of the four results in Figure 6.8 (following page) , only the first result (no DISTINCTs) and final result (all DISTINCTs) are consistent mathematically, which you can verify with AVG(price) and AVG(DISTINCTprice). In the second and third queries (mixed non-DISTINCTs and DISTINCTs), you cant calculate a valid average by dividing the sum by the count.
Listing 6.8 Mixing non-DISTINCT and DISTINCT aggregates in the same SELECT clause can produce misleading results. See Figure 6.8 for the results.
SELECT COUNT(price) AS "COUNT(price)", SUM(price) AS "SUM(price)" FROM titles; SELECT COUNT(price) AS "COUNT(price)", SUM(DISTINCT price) AS "SUM(DISTINCT price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT price)", SUM(price) AS "SUM(price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT price)", SUM(DISTINCT price) AS "SUM(DISTINCT price)" FROM titles;
Figure 6.8 Results of Listing 6.8.
Microsoft Access doesnt support DISTINCT aggregate functions. This statement, for example, is illegal in Access:
SELECT SUM(DISTINCT price) FROM titles; --Illegal in Access
But you can replicate it with this subquery (see the Tips in Using Subqueries as Column Expressions in Chapter 8):
SELECT SUM(price) FROM (SELECT DISTINCT price FROM titles);
This Access workaround wont let you mix non-DISTINCT and DISTINCT aggregates, however, as in the second and third queries in Listing 6.8.
In Microsoft SQL Server, if you use DISTINCT, expr must be a column name only. It cant include an arithmetic expression:
SELECT COUNT(DISTINCT price * sales) FROM titles; --Illegal in SQL Server
MySQL supports COUNT(DISTINCT expr) but not SUM(DISTINCT expr) and AVG(DISTINCT expr). Listings 6.6 and 6.8 wont run in MySQL.