## Aggregating Distinct Values with `DISTINCT`

You can use `DISTINCT `to eliminate duplicate values in aggregate function calculations; see “Eliminating Duplicate Rows with `DISTINCT`” in Chapter 4. The general syntax of an aggregate function is:

agg_func([ALL | DISTINCT]expr)

*agg_func *is `MIN`, `MAX`, `SUM`, `AVG`, or `COUNT`. *expr* is a column name, literal, or expression. `ALL `applies the aggregate function to all values, and `DISTINCT `specifies that each unique value is considered. `ALL `is the default and rarely is seen in practice.

With `SUM()`, `AVG()`, and `COUNT( expr)`,

`DISTINCT`eliminates duplicate values before the sum, average, or count is calculated.

`DISTINCT`isn’t meaningful with

`MIN()`and

`MAX()`; you can use it, but it won’t change the result. You can’t use

`DISTINCT`with

`COUNT(*)`.

#### To calculate the sum of a set of distinct values:

Type:

SUM(DISTINCT

*expr*)*expr*is a column name, literal, or numeric expression. The result’s data type is at least as precise as the most precise data type used in*expr.*

#### To calculate the average of a set of distinct values:

Type:

AVG(DISTINCT

*expr*)*expr*is a column name, literal, or numeric expression. The result’s data type is at least as precise as the most precise data type used in*expr.*

#### To count distinct non-null rows:

Type:

COUNT(DISTINCT

*expr*)*expr*is a column name, literal, or expression. The result is an integer greater than or equal to zero.

The queries in **Listing 6.6 **return the count, sum, and average of book prices. The non-`DISTINCT `and `DISTINCT `results in **Figure 6.6** differ because the `DISTINCT `results eliminate the duplicates of prices $12.99 and $19.95 from calculations.

#### Listing 6.6 Some `DISTINCT` aggregate queries. See Figure 6.6 for the results.

SELECT COUNT(*) AS "COUNT(*)", COUNT(price) AS "COUNT(price)", SUM(price) AS "SUM(price)", AVG(price) AS "AVG(price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT)", SUM(DISTINCT price) AS "SUM(DISTINCT)", AVG(DISTINCT price) AS "AVG(DISTINCT)" FROM titles;

**Figure 6.6 Results of Listing 6.6.**

**TIPS**

`DISTINCT`in a`SELECT`clause and`DISTINCT`in an aggregate function don’t return the same result.The three queries in

**Listing 6.7**count the author IDs in the table`title_authors`.**Figure 6.7**shows theresults. The first query counts all the author IDs in the table. The second query returns the same result as the first query because`COUNT()`already has done its work and returned a value in single row before`DISTINCT`is applied. In the third query,`DISTINCT`is applied to the author IDs before`COUNT()`starts counting.#### Listing 6.7

`DISTINCT`in a`SELECT`clause and`DISTINCT`in an aggregate function differ in meaning. See Figure 6.7 for the results.SELECT COUNT(au_id) AS "COUNT(au_id)" FROM title_authors; SELECT DISTINCT COUNT(au_id) AS "DISTINCT COUNT(au_id)" FROM title_authors; SELECT COUNT(DISTINCT au_id) AS "COUNT(DISTINCT au_id)" FROM title_authors;

**Figure 6.7 Results of Listing 6.7.**Mixing non-

`DISTINCT`and`DISTINCT`aggregates in the same`SELECT`clause can produce misleading results.The four queries in

**Listing 6.8**(following page) show the four combinations of non-`DISTINCT`and`DISTINCT`sums and counts. Of the four results in**Figure 6.8**(following page) , only the first result (no`DISTINCT`s) and final result (all`DISTINCT`s) are consistent mathematically, which you can verify with`AVG(price)`and`AVG(DISTINCTprice)`. In the second and third queries (mixed non-`DISTINCT`s and`DISTINCT`s), you can’t calculate a valid average by dividing the sum by the count.#### Listing 6.8 Mixing non-

`DISTINCT`and`DISTINCT`aggregates in the same`SELECT`clause can produce misleading results. See Figure 6.8 for the results.SELECT COUNT(price) AS "COUNT(price)", SUM(price) AS "SUM(price)" FROM titles; SELECT COUNT(price) AS "COUNT(price)", SUM(DISTINCT price) AS "SUM(DISTINCT price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT price)", SUM(price) AS "SUM(price)" FROM titles; SELECT COUNT(DISTINCT price) AS "COUNT(DISTINCT price)", SUM(DISTINCT price) AS "SUM(DISTINCT price)" FROM titles;

**Figure 6.8 Results of Listing 6.8.****Microsoft Access**doesn’t support`DISTINCT`aggregate functions. This statement, for example, is illegal in Access:SELECT SUM(DISTINCT price) FROM titles; --Illegal in Access

But you can replicate it with this subquery (see the Tips in “Using Subqueries as Column Expressions” in Chapter 8):

SELECT SUM(price) FROM (SELECT DISTINCT price FROM titles);

This Access workaround won’t let you mix non-

`DISTINCT`and`DISTINCT`aggregates, however, as in the second and third queries in Listing 6.8.In

**Microsoft SQL Server**, if you use`DISTINCT`,*expr*must be a column name only. It can’t include an arithmetic expression:SELECT COUNT(DISTINCT price * sales) FROM titles; --Illegal in SQL Server

**MySQL**supports`COUNT(DISTINCT`but not*expr*)`SUM(DISTINCT`and*expr*)`AVG(DISTINCT`. Listings 6.6 and 6.8 won’t run in MySQL.*expr*)