# Summarizing and Grouping Data in SQL

The preceding chapter described *scalar functions, *which operate on individual row values. This chapter introduces SQL’s *aggregate functions, *or *set functions, *which operate on a group of values to produce a single, summarizing value. You apply an aggregate to a set of rows, which may be:

All the rows in a table

Only those rows specified by a

`WHERE`clauseThose rows created by a

`GROUP BY`clause

No matter how many rows the set contains, an aggregate function returns a single statistic: a sum, minimum, or average, for example.

In this chapter, I’ll also cover the `SELECT `statement’s `GROUP BY `clause, which groups rows, and `HAVING `clause, which filters groups.

## Using Aggregate Functions

**Table 6.1 **lists SQL’s standard aggregate functions. The important characteristics of the aggregate functions are:

#### Table 6.1 Aggregate Functions

Function |
Returns |

MIN(expr) |
Minimum value in expr |

MAX(expr) |
Maximum value in expr |

SUM(expr) |
Sum of the values in expr |

AVG(expr) |
Average (arithmetic mean) of the values in expr |

COUNT(expr) |
The number of non-null values in expr |

COUNT(*) |
The number of rows in a table or set |

In Table 6.1,

*expr*often is a column name, but it also can be a literal, function, or any combination of column names, literals, and functions coupled by operators.`SUM()`and`AVG()`work with only numeric data types.`MIN()`and`MAX()`work with characters, numeric, and datetime datatypes.`COUNT(`and*expr*)`COUNT(*)`workwith all data types.All aggregate functions except

`COUNT(*)`ignore nulls. (You can use`COALESCE()`in an aggregate function argument to substitute a value for a null; see “Checking for Nulls with`COALESCE()`” in Chapter 5.)`COUNT(`and*expr*)`COUNT(*)`never returnnull but return either a positive integer or zero. The other aggregate functions return null if the set contains no rows or contains rows with only nulls.Use the

`DISTINCT`keyword to aggregate distinct values; see “Aggregating Distinct Values with`DISTINCT`” later in this chapter.Aggregate functions often are used with the

`GROUP BY`clause; see “Grouping Rows with`GROUP BY`” later in this chapter.Use a

`WHERE`clause to restrict the rows used in aggregate calculations; see “Filtering Rows with`WHERE`” in Chapter 4.Default column headings for aggregate expressions vary by DBMS; use

`AS`to name the result column. See “Creating Column Aliases with`AS`” in Chapter 4.An aggregate expression

*can’t*appear in a`WHERE`clause. If you want to find the title of the book with the highest sales, you*can’t*use:SELECT title_id --Illegal FROM titles WHERE sales = MAX(sales);

You

*can’t*mix nonaggregate (row-by-row) and aggregate expressions in a`SELECT`clause. A`SELECT`clause must contain either all nonaggregate expressions or all aggregate expressions. If you want to find the title of the book with the highest sales, you*can’t*use:SELECT title_id, MAX(sales) FROM titles; --Illegal

The one exception to this rule is that you

*can*mix nonaggregate and aggregate expressions for grouping columns (see “Grouping Rows with`GROUP BY`” later in this chapter):SELECT type, SUM(sales) FROM titles GROUP BY type; --Legal

You

*can*use more than one aggregate expression in a`SELECT`clause:SELECT MIN(sales), MAX(sales) FROM titles; --Legal

You

*can’t*nest aggregate functions:SELECT SUM(AVG(sales)) FROM titles; --Illegal

You

*can*use aggregate expressions in sub-queries. This statement finds the title of the book with the highest sales:SELECT title_id, price --Legal FROM titles WHERE sales = (SELECT MAX(sales) FROM titles);

You

*can’t*use subqueries (see Chapter 8) in aggregate expressions:`AVG(SELECT price FROM titles)`is illegal.**TIPS****Oracle**lets you nest aggregate expressions in`GROUP BY`queries. The following example calculates the average of the maximum sales of all book types. Oracle evaluates the inner aggregate`MAX(sales)`for the grouping column type and then aggregates the results again:SELECT AVG(MAX(sales)) FROM titles GROUP BY type; --Legal in Oracle

**MySQL**4.0 and earlier versions don’t support subqueries.DBMSes provide additional aggregate functions to calculate other statistics, such as the standard deviation; search your DBMS documentation for

*aggregate functions*or*group functions.*