## Statistics: The Analyst’s Toolbox

*Statistics*—the manipulation and interpretation of data—is a large and complex area of mathematics that is the basis for analytics in F2P games. The multitude of tools, such as formulae and methods of data interpretation, is often bewildering to nonmathematicians.

For this reason, many leading F2P companies have extended their recruitment to city traders and other professional statisticians to fill analyst positions—a role previously unheard of in games. The considerable understanding of these experts provides deeper insight into the data of your games, highlighting correlations that might otherwise be missed or misunderstood.

Although a full explanation of all of the tools used by statisticians is outside the scope of this book, and of almost any book, there are a few terms and techniques you should be aware of.

### Averages: Mean, Mode and Median

*Averages*, the typical values in a data sample, are among the simplest and most useful tools that an analyst can use. When people talk about an average, they are commonly referring to the mean average: the sum of all the data divided by the sample size.

For example, if your game had 500,000 players in a given day and it made $25,000 in revenue, the mean average is the sum of the data ($25,000) divided by the sample size (500,000).

$25,000 / 500,000 = $0.05 ARPDAU
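The same calculation can be sketched in a few lines of Python. The function and variable names here are illustrative assumptions, not part of any analytics package; the figures are the ones from the example above.

```python
def mean_average(total, sample_size):
    """Return the mean: the sum of the data divided by the sample size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return total / sample_size


daily_revenue = 25_000   # dollars earned that day (from the example)
daily_players = 500_000  # players active that day (from the example)

arpdau = mean_average(daily_revenue, daily_players)
print(f"ARPDAU: ${arpdau:.2f}")  # prints: ARPDAU: $0.05
```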

Although the actual revenue or other data from a single player in isolation will vary greatly (the difference between the highest and lowest values is known as the *range*), the mean average will tell you the outcome you can expect to attribute to each player when considered in a group.

The mode and median averages are a bit less common, however. *Mode* is the most frequently occurring data value in a list and the *median* is the value found in the exact middle of a data set ordered from lowest to highest.

For example, if your $25,000 revenue came from three IAPs—5,000 sales at $1, 4,000 sales at $3 and 1,000 sales at $8—the mode IAP purchase would be $1 because it is the most commonly occurring value at 5,000 units. The mode tells you which option is most popular and therefore is most likely to occur when you consider a single purchase.

To calculate the median, however, you must first order the data from lowest to highest and find the middle value. You could eliminate the highest and lowest values in pairs until you are left with one value, which is the median. But when the number of samples is even, as in the preceding example, you will be left with two values. There are 10,000 samples, so the median falls between sample 5,000 ($1) and sample 5,001 ($3). In this instance the median value is the mean average of these two samples: ($1 + $3) / 2 = $2. Therefore, the median sales price is $2. Knowing the median allows you to understand where a sample sits in a data set, that is, whether it falls in the lower or upper half.
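As a rough check of the mode and median worked through above, the three IAP price points can be expanded into a list of 10,000 individual purchases and passed to Python's standard `statistics` module. The variable names are assumptions made for this sketch; the sales figures are those from the example.

```python
import statistics

# One list entry per purchase: 5,000 at $1, 4,000 at $3 and 1,000 at $8.
purchases = [1] * 5_000 + [3] * 4_000 + [8] * 1_000

mode_price = statistics.mode(purchases)      # most frequently occurring price
median_price = statistics.median(purchases)  # middle of the ordered list
mean_price = statistics.mean(purchases)      # total revenue / number of sales

print(f"Mode:   ${mode_price:.2f}")    # prints: Mode:   $1.00
print(f"Median: ${median_price:.2f}")  # prints: Median: $2.00
print(f"Mean:   ${mean_price:.2f}")    # prints: Mean:   $2.50
```

Because the list has an even number of entries, `statistics.median` averages the two middle values ($1 and $3), just as in the manual calculation. Note that the three averages disagree: the handful of $8 sales pulls the mean above both the mode and the median.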

**“DATA IS DANGEROUS. ASK THE WRONG QUESTION AND YOU’LL GET THE WRONG ANSWER, STEERING YOUR GAME DEVELOPMENT INTO TROUBLE.”** *HENRIQUE OLIFIERS, GAMER-IN-CHIEF, BOSSA STUDIOS*

### Causation and Variables

Proving *causation*—that one factor has a distinct and provable effect upon another—is the central purpose of analytics. Causation is what makes your hypothesis either fit the behavior of your players or prove to be wildly wrong.

Often, the aim is to find a link between a *dependent variable* and an *independent variable*. For instance, you could consider an output as a dependent variable, such as the number of players buying an IAP, and consider an input as an independent variable, such as an IAP’s price. When a dependent variable changes in relation to an independent variable, the two are correlated. Correlation alone does not prove causation, but when the independent variable is the only thing you changed, as in a controlled AB test, the change is evidence of causation and a basis for a hypothesis. This link can be described by using a technique called regression analysis.

### Regression Analysis

*Regression analysis* is a set of statistical techniques that estimates the relationship between variables. Regression analysis can build a model of, for instance, the links between price and sales of an IAP and therefore predict the price point that will return maximum revenue. It is commonly carried out by an analyst, but in some cases it can be partly automated in analytics software.

For example, imagine you have tested price and recorded the subsequent sales of an IAP in a multivariate test at $0.99, $1.99, $2.99, $6.99, $9.99 and $19.99 (**Figure 4.4**).

Figure 4.4. *IAP Price-Sales graph.*

From the data, you could suggest that the sales of IAPs (the dependent variable) decrease as price (the independent variable) increases. Specifically, the manner in which the drop occurs is an example of exponential decay. You could then predict and model sales at each dollar increment (**Figure 4.5**) using your own formula.

Figure 4.5. *Predicted Price-Sales graph.*
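One way such a model might be built is sketched below. It assumes sales follow an exponential decay of the form sales = a × e^(−b × price) and estimates a and b with an ordinary least-squares line fitted to the logarithm of sales. The six prices are those from the test described above, but the sales figures are invented purely for illustration (the underlying data for the figures is not given), and the function names are hypothetical.

```python
import math

# HYPOTHETICAL test results: (price in dollars, units sold).
# Prices come from the text; the sales numbers are made up for this sketch.
observations = [
    (0.99, 8200),
    (1.99, 6700),
    (2.99, 5500),
    (6.99, 2500),
    (9.99, 1350),
    (19.99, 185),
]


def fit_exponential_decay(points):
    """Fit sales = a * exp(-b * price) via least squares on log(sales)."""
    xs = [price for price, _ in points]
    ys = [math.log(sales) for _, sales in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # negative when sales fall with price
    intercept = mean_y - slope * mean_x  # log of predicted sales at price 0
    return math.exp(intercept), -slope


a, b = fit_exponential_decay(observations)


def predicted_sales(price):
    """Predicted units sold at a given price under the fitted model."""
    return a * math.exp(-b * price)


# Model sales at each dollar increment from $1 to $20.
for price in range(1, 21):
    print(f"${price:>2}: {predicted_sales(price):,.0f} predicted sales")
```

Fitting a straight line to log(sales) is only one of several ways to estimate a decay curve, and it weights low-sales price points more heavily than a direct fit would. It is used here because it needs nothing beyond the standard library.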

Using that data, you could predict revenue by multiplying sales by price, thereby finding the price that would produce the maximum revenue (**Figure 4.6**).

Figure 4.6. *Predicted Price-Revenue graph.*
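The revenue step can be sketched the same way. The snippet below assumes an exponential-decay sales model with hypothetical parameters (`A` and `B` are illustrative values, not figures from a real test), multiplies predicted sales by price at each dollar increment, and picks the price with the highest predicted revenue.

```python
import math

# HYPOTHETICAL model parameters for sales = A * exp(-B * price).
A = 10_000  # assumed predicted sales as price approaches $0
B = 0.2     # assumed decay rate per dollar


def predicted_sales(price):
    """Predicted units sold at a given price under the assumed model."""
    return A * math.exp(-B * price)


def predicted_revenue(price):
    """Predicted revenue: sales multiplied by price."""
    return price * predicted_sales(price)


candidate_prices = range(1, 21)  # each dollar increment from $1 to $20
best_price = max(candidate_prices, key=predicted_revenue)

for price in candidate_prices:
    marker = "  <-- maximum" if price == best_price else ""
    print(f"${price:>2}: ${predicted_revenue(price):>9,.0f}{marker}")
```

With these assumed parameters the maximum falls at $5. For a pure exponential-decay model that is what calculus predicts: revenue peaks where price equals 1/B. Real sales data will rarely follow a single curve this neatly, so the result is a starting point for further testing rather than a final answer.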

Although this is a very simple example, it does show that when regression analysis is used well, as with other tools of analytics, it enables you to have a greater understanding of player behavior. In turn, this information can be interpreted to serve your players via better games.