Summarizing and Grouping Data in SQL
The preceding chapter described scalar functions, which operate on individual row values. This chapter introduces SQLs aggregate functions, or set functions, which operate on a group of values to produce a single, summarizing value. You apply an aggregate to a set of rows, which may be:
All the rows in a table
Only those rows specified by a WHERE clause
Those rows created by a GROUP BY clause
No matter how many rows the set contains, an aggregate function returns a single statistic: a sum, minimum, or average, for example.
In this chapter, Ill also cover the SELECT statements GROUP BY clause, which groups rows, and HAVING clause, which filters groups.
Using Aggregate Functions
Table 6.1 lists SQLs standard aggregate functions. The important characteristics of the aggregate functions are:
Table 6.1 Aggregate Functions
Function | Returns |
MIN(expr) | Minimum value in expr |
MAX(expr) | Maximum value in expr |
SUM(expr) | Sum of the values in expr |
AVG(expr) | Average (arithmetic mean) of the values in expr |
COUNT(expr) | The number of non-null values in expr |
COUNT(*) | The number of rows in a table or set |
In Table 6.1, expr often is a column name, but it also can be a literal, function, or any combination of column names, literals, and functions coupled by operators.
SUM() and AVG() work with only numeric data types. MIN() and MAX() work with characters, numeric, and datetime datatypes. COUNT(expr) and COUNT(*) workwith all data types.
All aggregate functions except COUNT(*) ignore nulls. (You can use COALESCE() in an aggregate function argument to substitute a value for a null; see Checking for Nulls with COALESCE() in Chapter 5.)
COUNT(expr) and COUNT(*) never returnnull but return either a positive integer or zero. The other aggregate functions return null if the set contains no rows or contains rows with only nulls.
Use the DISTINCT keyword to aggregate distinct values; see Aggregating Distinct Values with DISTINCT later in this chapter.
Aggregate functions often are used with the GROUP BY clause; see Grouping Rows with GROUP BY later in this chapter.
Use a WHERE clause to restrict the rows used in aggregate calculations; see Filtering Rows with WHERE in Chapter 4.
Default column headings for aggregate expressions vary by DBMS; use AS to name the result column. See Creating Column Aliases with AS in Chapter 4.
An aggregate expression cant appear in a WHERE clause. If you want to find the title of the book with the highest sales, you cant use:
SELECT title_id --Illegal FROM titles WHERE sales = MAX(sales);
You cant mix nonaggregate (row-by-row) and aggregate expressions in a SELECT clause. A SELECT clause must contain either all nonaggregate expressions or all aggregate expressions. If you want to find the title of the book with the highest sales, you cant use:
SELECT title_id, MAX(sales) FROM titles; --Illegal
The one exception to this rule is that you can mix nonaggregate and aggregate expressions for grouping columns (see Grouping Rows with GROUP BY later in this chapter):
SELECT type, SUM(sales) FROM titles GROUP BY type; --Legal
You can use more than one aggregate expression in a SELECT clause:
SELECT MIN(sales), MAX(sales) FROM titles; --Legal
You cant nest aggregate functions:
SELECT SUM(AVG(sales)) FROM titles; --Illegal
You can use aggregate expressions in sub-queries. This statement finds the title of the book with the highest sales:
SELECT title_id, price --Legal FROM titles WHERE sales = (SELECT MAX(sales) FROM titles);
You cant use subqueries (see Chapter 8) in aggregate expressions: AVG(SELECT price FROM titles) is illegal.
TIPS
Oracle lets you nest aggregate expressions in GROUP BY queries. The following example calculates the average of the maximum sales of all book types. Oracle evaluates the inner aggregate MAX(sales) for the grouping column type and then aggregates the results again:
SELECT AVG(MAX(sales)) FROM titles GROUP BY type; --Legal in Oracle
MySQL 4.0 and earlier versions dont support subqueries.
DBMSes provide additional aggregate functions to calculate other statistics, such as the standard deviation; search your DBMS documentation for aggregate functions or group functions.