Publishers of technology books, eBooks, and videos for creative people

# Summarizing and Grouping Data in SQL

• Print
This sample chapter introduces SQL’s aggregate functions, or set functions, which operate on a group of values to produce a single, summarizing value.
This chapter is from the book

### This chapter is from the book 

The preceding chapter described scalar functions, which operate on individual row values. This chapter introduces SQL’s aggregate functions, or set functions, which operate on a group of values to produce a single, summarizing value. You apply an aggregate to a set of rows, which may be:

• All the rows in a table

• Only those rows specified by a WHERE clause

• Those rows created by a GROUP BY clause

No matter how many rows the set contains, an aggregate function returns a single statistic: a sum, minimum, or average, for example.

In this chapter, I’ll also cover the SELECT statement’s GROUP BY clause, which groups rows, and HAVING clause, which filters groups.

## Using Aggregate Functions

Table 6.1 lists SQL’s standard aggregate functions. The important characteristics of the aggregate functions are:

#### Table 6.1 Aggregate Functions

 Function Returns MIN(expr) Minimum value in expr MAX(expr) Maximum value in expr SUM(expr) Sum of the values in expr AVG(expr) Average (arithmetic mean) of the values in expr COUNT(expr) The number of non-null values in expr COUNT(*) The number of rows in a table or set
• In Table 6.1, expr often is a column name, but it also can be a literal, function, or any combination of column names, literals, and functions coupled by operators.

• SUM() and AVG() work with only numeric data types. MIN() and MAX() work with characters, numeric, and datetime datatypes. COUNT(expr) and COUNT(*) workwith all data types.

• All aggregate functions except COUNT(*) ignore nulls. (You can use COALESCE() in an aggregate function argument to substitute a value for a null; see “Checking for Nulls with COALESCE()” in Chapter 5.)

• COUNT(expr) and COUNT(*) never returnnull but return either a positive integer or zero. The other aggregate functions return null if the set contains no rows or contains rows with only nulls.

• Use the DISTINCT keyword to aggregate distinct values; see “Aggregating Distinct Values with DISTINCT” later in this chapter.

• Aggregate functions often are used with the GROUP BY clause; see “Grouping Rows with GROUP BY” later in this chapter.

• Use a WHERE clause to restrict the rows used in aggregate calculations; see “Filtering Rows with WHERE” in Chapter 4.

• Default column headings for aggregate expressions vary by DBMS; use AS to name the result column. See “Creating Column Aliases with AS” in Chapter 4.

• An aggregate expression can’t appear in a WHERE clause. If you want to find the title of the book with the highest sales, you can’t use:

SELECT title_id           --Illegal
FROM titles
WHERE sales = MAX(sales);
• You can’t mix nonaggregate (row-by-row) and aggregate expressions in a SELECT clause. A SELECT clause must contain either all nonaggregate expressions or all aggregate expressions. If you want to find the title of the book with the highest sales, you can’t use:

SELECT title_id, MAX(sales)
FROM titles;              --Illegal
• The one exception to this rule is that you can mix nonaggregate and aggregate expressions for grouping columns (see “Grouping Rows with GROUP BY” later in this chapter):

SELECT type, SUM(sales)
FROM titles
GROUP BY type;            --Legal
• You can use more than one aggregate expression in a SELECT clause:

SELECT MIN(sales), MAX(sales)
FROM titles;              --Legal
• You can’t nest aggregate functions:

SELECT SUM(AVG(sales))
FROM titles;              --Illegal
• You can use aggregate expressions in sub-queries. This statement finds the title of the book with the highest sales:

SELECT title_id, price    --Legal
FROM titles
WHERE sales =
(SELECT MAX(sales) FROM titles);
• You can’t use subqueries (see Chapter 8) in aggregate expressions: AVG(SELECT price FROM titles) is illegal.

TIPS

• Oracle lets you nest aggregate expressions in GROUP BY queries. The following example calculates the average of the maximum sales of all book types. Oracle evaluates the inner aggregate MAX(sales) for the grouping column type and then aggregates the results again:

SELECT AVG(MAX(sales))
FROM titles
GROUP BY type;   --Legal in Oracle
• MySQL 4.0 and earlier versions don’t support subqueries.

DBMSes provide additional aggregate functions to calculate other statistics, such as the standard deviation; search your DBMS documentation for aggregate functions or group functions.