Publishers of technology books, eBooks, and videos for creative people

Home > Articles

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

Aggregating Distinct Values with DISTINCT

You can use DISTINCT to eliminate duplicate values in aggregate function calculations; see “Eliminating Duplicate Rows with DISTINCT” in Chapter 4. The general syntax of an aggregate function is:

agg_func([ALL | DISTINCT] expr)

agg_func is MIN, MAX, SUM, AVG, or COUNT. expr is a column name, literal, or expression. ALL applies the aggregate function to all values, and DISTINCT specifies that each unique value is considered. ALL is the default and rarely is seen in practice.

With SUM(), AVG(), and COUNT(expr), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. DISTINCT isn’t meaningful with MIN() and MAX(); you can use it, but it won’t change the result. You can’t use DISTINCT with COUNT(*).

To calculate the sum of a set of distinct values:

  • Type:

    SUM(DISTINCT expr)

    expr is a column name, literal, or numeric expression. The result’s data type is at least as precise as the most precise data type used in expr.

To calculate the average of a set of distinct values:

  • Type:

    AVG(DISTINCT expr)

    expr is a column name, literal, or numeric expression. The result’s data type is at least as precise as the most precise data type used in expr.

To count distinct non-null rows:

  • Type:

    COUNT(DISTINCT expr)

    expr is a column name, literal, or expression. The result is an integer greater than or equal to zero.

The queries in Listing 6.6 return the count, sum, and average of book prices. The non-DISTINCT and DISTINCT results in Figure 6.6 differ because the DISTINCT results eliminate the duplicates of prices $12.99 and $19.95 from calculations.

Listing 6.6 Some DISTINCT aggregate queries. See Figure 6.6 for the results.

SELECT
    COUNT(*) AS "COUNT(*)",
    COUNT(price) AS "COUNT(price)",
    SUM(price) AS "SUM(price)",
    AVG(price) AS "AVG(price)"
  FROM titles;


SELECT
    COUNT(DISTINCT price)
      AS "COUNT(DISTINCT)",
    SUM(DISTINCT price)
      AS "SUM(DISTINCT)",
    AVG(DISTINCT price)
      AS "AVG(DISTINCT)"
  FROM titles;

Figure 6.6Figure 6.6 Results of Listing 6.6.

TIPS

  • DISTINCT in a SELECT clause andDISTINCT in an aggregate function don’t return the same result.

    The three queries in Listing 6.7 count the author IDs in the tabletitle_authors. Figure 6.7 shows theresults. The first query counts all the author IDs in the table. The second query returns the same result as the first query because COUNT() already has done its work and returned a value in single row before DISTINCT is applied. In the third query, DISTINCT is applied to the author IDs before COUNT() starts counting.

    Listing 6.7 DISTINCT in a SELECT clause and DISTINCT in an aggregate function differ in meaning. See Figure 6.7 for the results.

    SELECT COUNT(au_id)
             AS "COUNT(au_id)"
      FROM title_authors;
    
    
    SELECT DISTINCT COUNT(au_id)
             AS "DISTINCT COUNT(au_id)"
      FROM title_authors;
    
    
    SELECT COUNT(DISTINCT au_id)
             AS "COUNT(DISTINCT au_id)"
      FROM title_authors;

    Figure 6.7Figure 6.7 Results of Listing 6.7.


  • Mixing non-DISTINCT and DISTINCT aggregates in the same SELECT clause can produce misleading results.

    The four queries in Listing 6.8 (following page) show the four combinations of non-DISTINCT and DISTINCT sums and counts. Of the four results in Figure 6.8 (following page) , only the first result (no DISTINCTs) and final result (all DISTINCTs) are consistent mathematically, which you can verify with AVG(price) and AVG(DISTINCTprice). In the second and third queries (mixed non-DISTINCTs and DISTINCTs), you can’t calculate a valid average by dividing the sum by the count.

    Listing 6.8 Mixing non-DISTINCT and DISTINCT aggregates in the same SELECT clause can produce misleading results. See Figure 6.8 for the results.

    SELECT
        COUNT(price)
          AS "COUNT(price)",
        SUM(price)
          AS "SUM(price)"
      FROM titles;
    
    
    SELECT
        COUNT(price)
          AS "COUNT(price)",
        SUM(DISTINCT price)
          AS "SUM(DISTINCT price)"
      FROM titles;
    
    
    SELECT
        COUNT(DISTINCT price)
          AS "COUNT(DISTINCT price)",
        SUM(price)
          AS "SUM(price)"
      FROM titles;
    
    
    SELECT
        COUNT(DISTINCT price)
          AS "COUNT(DISTINCT price)",
        SUM(DISTINCT price)
          AS "SUM(DISTINCT price)"
      FROM titles;

    Figure 6.8Figure 6.8 Results of Listing 6.8.


  • Microsoft Access doesn’t support DISTINCT aggregate functions. This statement, for example, is illegal in Access:

    SELECT SUM(DISTINCT price)
      FROM titles;   --Illegal in Access

    But you can replicate it with this subquery (see the Tips in “Using Subqueries as Column Expressions” in Chapter 8):

    SELECT SUM(price)
      FROM (SELECT DISTINCT price 
              FROM titles);

    This Access workaround won’t let you mix non-DISTINCT and DISTINCT aggregates, however, as in the second and third queries in Listing 6.8.

    In Microsoft SQL Server, if you use DISTINCT, expr must be a column name only. It can’t include an arithmetic expression:

    SELECT COUNT(DISTINCT price * sales) 
      FROM titles; --Illegal in SQL Server

    MySQL supports COUNT(DISTINCT expr) but not SUM(DISTINCT expr) and AVG(DISTINCT expr). Listings 6.6 and 6.8 won’t run in MySQL.

  • + Share This
  • 🔖 Save To Your Account