Median and Mean Absolute Error
Interactive histogram with mean absolute error graph
Recall that in our general notation, we have a data set with n points arranged in a frequency distribution with k classes. The class mark of the i'th class is denoted xi; the frequency of the i'th class is denoted fi, and the relative frequency of the i'th class is denoted pi = fi / n.
Recall that the median is the value that is halfway through the ordered data set. Specifically, if n is odd then the median is the value with rank (n + 1)/2, namely xj where j is the smallest integer satisfying
f1 + f2 + ... + fj ≥ (n + 1)/2;
if n is even then the median is (xj + xl)/2 where j and l are the smallest integers satisfying
f1 + f2 + ... + fj ≥ n/2 and f1 + f2 + ... + fl ≥ n/2 + 1.
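The rank-based definition of the median can be sketched in a few lines of Python. This is only an illustration: the function name `median_from_frequencies` and the sample class marks and frequencies are made up for the example, and the class marks are assumed to be listed in increasing order. For even n, the sketch averages the values with ranks n/2 and n/2 + 1.

```python
def median_from_frequencies(marks, freqs):
    """Median of a frequency distribution (marks sorted in increasing order)."""
    n = sum(freqs)

    def value_with_rank(r):
        # Return the mark of the first class whose cumulative frequency reaches r.
        cumulative = 0
        for x, f in zip(marks, freqs):
            cumulative += f
            if cumulative >= r:
                return x
        raise ValueError("rank exceeds the number of data points")

    if n % 2 == 1:
        # Odd n: the single middle value, with rank (n + 1)/2.
        return value_with_rank((n + 1) // 2)
    # Even n: average of the two middle values, with ranks n/2 and n/2 + 1.
    return (value_with_rank(n // 2) + value_with_rank(n // 2 + 1)) / 2


# Hypothetical data: 1, 1, 2, 4, 4 (n = 5) and 1, 2, 4, 4 (n = 4).
print(median_from_frequencies([1.0, 2.0, 4.0], [2, 1, 2]))
print(median_from_frequencies([1.0, 2.0, 4.0], [1, 1, 2]))
```

In the first data set the value with rank 3 is 2.0; in the second, the values with ranks 2 and 3 are 2.0 and 4.0, so the median is 3.0.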
A measure of center and the corresponding measure of spread are sometimes best thought of in the context of an error function. Generally, the error function gives a measure of the overall error when a number t is used to represent the entire distribution. Thus, the best measure of center, relative to this function, is the value of t that minimizes the error function, and the minimum value of the error function is the corresponding measure of spread.
In the previous section, for example, we saw that if we start with the mean square error function, then the best measure of center is the mean and the minimum error is the variance. If we start with the root mean square error function, then the best measure of center is again the mean, but the minimum error is the standard deviation.
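As a quick numerical reminder of that result, the following sketch evaluates the mean square error at the mean and compares it with the variance. The data values are hypothetical, and the variance is assumed here to be the version with divisor n (which matches weighting by relative frequencies); the helper name `mse` is also just for illustration.

```python
import statistics


def mse(t, data):
    """Mean square error when t is used to represent the data set."""
    return sum((x - t) ** 2 for x in data) / len(data)


data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical data set
mean = statistics.mean(data)

print(mse(mean, data))                    # error at the mean
print(statistics.pvariance(data))         # variance with divisor n
print(mse(mean + 0.5, data))              # any other t gives a larger error
```

Moving t away from the mean by 0.5 raises the error by 0.25, the square of the shift.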
In this section, we will explore an error function that seems very natural at first, and indeed is related to the median, but upon closer inspection has some definite drawbacks. The main point of this section is that the mean square error function has very special properties that make it the compelling choice. It is important that you understand this point, because other mean square error functions occur throughout statistics.
Mean Absolute Error
The mean absolute error function is given by
MAE(t) = p1 |x1 - t| + p2 |x2 - t| + ... + pk |xk - t|
As the name suggests, the mean absolute error is a weighted average of the absolute errors, with the relative frequencies as the weight factors.
Recall also that we can think of the relative frequency distribution as the probability distribution of a random variable X that gives the mark of the class containing a randomly chosen value from the data set. With this interpretation, MAE(t) is the first absolute moment of X about t:
MAE(t) = E[|X - t|]
MAE(t) may seem to be the simplest measure of overall error when t is used to represent the distribution.
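For readers who want to experiment outside the applet, here is a minimal Python sketch of the mean absolute error as a weighted average of absolute errors. The function name `mae` and the sample class marks and frequencies are assumptions made for this example only.

```python
def mae(t, marks, rel_freqs):
    """Weighted average of |x_i - t|, with relative frequencies p_i as weights."""
    return sum(p * abs(x - t) for x, p in zip(marks, rel_freqs))


# Hypothetical frequency distribution: class marks x_i and frequencies f_i.
marks = [1.0, 2.0, 4.0]
freqs = [2, 1, 2]
n = sum(freqs)
rel_freqs = [f / n for f in freqs]  # p_i = f_i / n

for t in [1.0, 2.0, 3.0]:
    print(f"MAE({t}) = {mae(t, marks, rel_freqs):.2f}")
```

For this distribution the error is 1.40 at t = 1, 1.20 at t = 2, and 1.40 at t = 3.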
As before, you can construct a frequency distribution and histogram for a continuous variable x by clicking on the horizontal axis from 0.1 to 5.0. In the applet above, when you click on points in the left graph to generate the distribution, MAE is shown in the right graph.
1. Note that MAE(t) is a continuous function of t for a fixed data set (that is, for given values of xi and pi) and its graph is composed of line segments.
2. In the applet, click on two distinct points to generate a distribution with two distinct points. Note the shape of the MAE graph.
3. Explicitly compute MAE(t) for the distribution in Exercise 2 and show that you get the same function as the one graphed in the applet.
Exercises 2 and 3 show a serious flaw in the mean absolute error function--in general, there does not exist a unique value of t minimizing MAE(t)!
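The flaw is easy to see numerically. The sketch below uses a hypothetical two-point data set (one value at 1.0 and one at 3.0, so each has relative frequency 1/2) and evaluates the mean absolute error at several values of t; the helper name `mae` is chosen just for this illustration.

```python
def mae(t, marks, rel_freqs):
    """Weighted average of |x_i - t|, with relative frequencies as weights."""
    return sum(p * abs(x - t) for x, p in zip(marks, rel_freqs))


# Hypothetical two-point distribution.
marks = [1.0, 3.0]
rel_freqs = [0.5, 0.5]

for t in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]:
    print(f"t = {t:.1f}  MAE(t) = {mae(t, marks, rel_freqs):.2f}")
```

Every t between 1.0 and 3.0 gives the same error, 1.00, while values outside that interval give a larger error. The graph is flat over the whole interval, so no single minimizer stands out.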
4. Click on additional points to generate a more complicated distribution. Note how the shape of the MAE graph changes as you add points. Try to formulate a conjecture about the set of t values that minimize MAE(t).
In Exercise 4, you should have observed the following general behavior of the mean absolute error function: If the number of points n is odd, then the median xj (in the notation above) is the unique value of t that minimizes MAE(t). However, if n is even, then the set of values minimizing MAE(t) is the "median interval" [xj, xl]. If xj = xl, then once again the median is the unique value of t minimizing MAE(t). However, if xj and xl are different, then the median
(xj + xl) / 2
has no better claim as the center of the distribution than any other point in the median interval!
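This behavior can be checked with a rough grid search. The sketch below is an illustration rather than a proof: the data sets are hypothetical, the grid 0.0, 0.1, ..., 5.0 is an arbitrary choice, and a small tolerance is used so that floating-point rounding does not hide ties.

```python
def mae(t, data):
    """MAE(t) for raw data: the average of |x - t| over the data points."""
    return sum(abs(x - t) for x in data) / len(data)


def minimizers(data, grid):
    """Grid points at which MAE attains its minimum over the grid."""
    values = [mae(t, data) for t in grid]
    best = min(values)
    # Tolerance guards against rounding error on the flat part of the graph.
    return [t for t, v in zip(grid, values) if v - best < 1e-12]


grid = [i / 10 for i in range(0, 51)]  # 0.0, 0.1, ..., 5.0

odd_data = [1.0, 2.0, 4.0]        # n = 3, median 2.0
even_data = [1.0, 2.0, 3.0, 4.0]  # n = 4, middle values 2.0 and 3.0

print(minimizers(odd_data, grid))
print(minimizers(even_data, grid))
```

With the odd data set, the only minimizing grid point is the median 2.0. With the even data set, every grid point from 2.0 through 3.0 minimizes the error, which is the median interval.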
The minimum value of MAE is referred to as the mean absolute deviation or MAD.
In the applet, the median ± MAD is drawn in the histogram, analogous to the mean ± standard deviation bar in the previous section. In the graph of the MAE function, a vertical red line is drawn from the median on the x-axis to the graph of MAE; the height of this line is the MAD.
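The same quantities can be computed directly. The following sketch uses Python's standard `statistics.median` and a hypothetical data set; the function name `mean_absolute_deviation` is an assumption for the example, and the deviation is taken about the median, as in the text.

```python
import statistics


def mean_absolute_deviation(data):
    """Minimum value of MAE: the average absolute deviation from the median."""
    m = statistics.median(data)
    return sum(abs(x - m) for x in data) / len(data)


data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical data set
m = statistics.median(data)
mad = mean_absolute_deviation(data)

# Endpoints of the "median ± MAD" bar.
print(f"median = {m}, MAD = {mad:.2f}")
print(f"bar from {m - mad:.2f} to {m + mad:.2f}")
```

Here the median is 2.0 and the MAD is 1.40, so the bar runs from 0.60 to 3.40.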
5. Reset the applet and click on points to generate a distribution. Note the general behavior of the MAE function described in the previous paragraph.
6. Try to prove algebraically that the MAE function has the behavior described above.
7. Construct a distribution of each of the types indicated below. In each case, note the position and size of the boxplot and the shape of the MAE graph.