
Time series

From Wikipedia, the free encyclopedia

Figure: Time series: random data plus trend, with best-fit line and different smoothings.

In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, typically measured at successive times spaced at uniform intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at Aswan. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to forecast future events based on known past events: to predict data points before they are measured. An example of time series forecasting in econometrics is predicting the opening price of a stock based on its past performance. Time series are very frequently plotted via line charts.

Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their education level, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses).

A time series model will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time, so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility).

Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and, more recently, wavelet analysis; the latter include auto-correlation and cross-correlation analysis.



Analysis
There are several types of data analysis available for time series which are appropriate for different purposes.

General exploration

• Graphical examination of data series
• Autocorrelation analysis to examine serial dependence
• Spectral analysis to examine cyclic behaviour which need not be related to seasonality. For example, sunspot activity varies over 11-year cycles.[1][2] Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity.

Description

• Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see decomposition of time series
• Simple properties of marginal distributions

Prediction and forecasting

• Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time periods in the future
• Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting)

Models
Models for time series data can have many forms and represent different stochastic processes. When modeling variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. These three classes depend linearly[3] on previous data points. Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA) model generalizes the former three. Extensions of these classes to deal with vector-valued data are available under the heading of multivariate time-series models, and sometimes the preceding acronyms are extended by including an initial "V" for "vector". An additional set of extensions of these models is available for use where the observed time series is driven by some "forcing" time series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous".

Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. More importantly, however, empirical investigations can indicate the advantage of using predictions derived from non-linear models over those from linear models.

Among other types of non-linear time series models, there are models to represent the changes of variance over time (heteroskedasticity). These models are called autoregressive conditional heteroskedasticity (ARCH) models, and the collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast to other possible representations of locally varying variability, where the variability might be modelled as being driven by a separate time-varying process, as in a doubly stochastic model.

In recent work on model-free analyses, wavelet-transform-based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales.

Notation
A number of different notations are in use for time-series analysis. A common notation specifying a time series X that is indexed by the natural numbers is written

$X = \{X_1, X_2, \ldots\}$.

Another common notation is

$Y = \{Y_t : t \in T\}$,

where T is the index set.

Conditions
There are two sets of conditions under which much of the theory is built:

• Stationary process
• Ergodicity

However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order stationarity. Both models and applications can be developed under each of these conditions, although the models in the latter case might be considered as only partly specified.

In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis, which makes use of a time-frequency representation of a time series or signal.[4]

Models
Main article: Autoregressive model

The general representation of an autoregressive model, commonly denoted AR(p), is

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t,$$

where the term $\varepsilon_t$ is the source of randomness and is called white noise. It is assumed to have the following characteristics:

1. $E[\varepsilon_t] = 0$
2. $E[\varepsilon_t^2] = \sigma^2$
3. $E[\varepsilon_t \varepsilon_s] = 0$ for all $t \neq s$

With these assumptions, the process is specified up to second-order moments and, subject to conditions on the coefficients, may be second-order stationary. If the noise also has a normal distribution, it is called normal white noise (denoted here by Normal-WN):

$$\varepsilon_t \sim \text{Normal-WN}(0, \sigma^2).$$

In this case the AR process may be strictly stationary, again subject to conditions on the coefficients.
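
The autoregressive recursion can be simulated directly. The following is a minimal sketch, assuming Gaussian white noise and zero start-up values; the function name `simulate_ar`, its parameters, and the chosen coefficient are illustrative and do not come from the article.

```python
import random


def simulate_ar(coeffs, n, c=0.0, sigma=1.0, seed=0):
    """Simulate n values of X_t = c + sum_i coeffs[i-1] * X_{t-i} + eps_t.

    eps_t is Gaussian white noise with mean 0 and standard deviation sigma.
    The p start-up values are assumed to be zero (an illustrative choice).
    """
    rng = random.Random(seed)
    p = len(coeffs)
    history = [0.0] * p  # assumed start-up values
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        # Pair phi_1 with the most recent value, phi_2 with the one before, ...
        recent = reversed(history[len(history) - p:])
        x_t = c + sum(phi * x for phi, x in zip(coeffs, recent)) + eps
        history.append(x_t)
        series.append(x_t)
    return series


# An AR(1) example; |0.8| < 1 is the usual AR(1) condition for
# second-order stationarity.
print(simulate_ar([0.8], n=5))
```

Setting `sigma=0.0` removes the noise term, which makes it easy to check the recursion by hand.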

Mean

In statistics, mean has two related meanings:

• the arithmetic mean (as distinguished from the geometric mean or harmonic mean);
• the expected value of a random variable, which is also called the population mean.

There are other sample-based statistical measures that are sometimes confused with averages, including the median and the mode. Other simple statistical analyses use measures of spread, such as range, interquartile range, or standard deviation.

For a real-valued random variable X, the mean is the expectation of X. Note that not every probability distribution has a defined mean (or variance); see the Cauchy distribution for an example.

For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers $x_1, x_2, \ldots, x_n$ is typically denoted by $\bar{x}$, pronounced "x bar". This mean is a type of arithmetic mean. If the data set is based on a series of observations obtained by sampling a statistical population, this mean is termed the "sample mean" to distinguish it from the "population mean". The mean is often quoted along with the standard deviation: the mean describes the central location of the data, and the standard deviation describes the spread. An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less mathematically tractable.

If a series of observations is sampled from a larger population (measuring the heights of a sample of adults drawn from the entire world population, for example), or from a probability distribution which gives the probabilities of each possible result, then the larger population or probability distribution can be used to construct a "population mean", which is also the expected value for a sample drawn from this population or probability distribution. For a finite population, this would simply be the arithmetic mean of the given property for every member of the population. For a probability distribution, this would be a sum or integral over every possible value weighted by the probability of that value.

By convention, the population mean is denoted by the symbol $\mu$.[1] In the case of a discrete probability distribution, the mean of a discrete random variable x is given by taking the product of each possible value of x and its probability P(x), and then adding all these products together, giving $\mu = \sum x P(x)$.[2] The sample mean may differ from the population mean, especially for small samples, but the law of large numbers states that the larger the sample, the more likely it is that the sample mean will be close to the population mean.[3]

As well as statistics, means are often used in geometry and analysis; a wide range of means have been developed for these purposes, which are not much used in statistics. These are listed below.
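
The rule for the mean of a discrete distribution (multiply each value by its probability and add the products) can be sketched in a few lines. This is only an illustration: the fair die and the helper name `discrete_mean` are assumptions, and exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction


def discrete_mean(pmf):
    """Return the sum of x * P(x) for a dict mapping each value x to P(x)."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(discrete_mean(die))  # 7/2
```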



Examples of means

Arithmetic mean (AM)
Main article: Arithmetic mean

The arithmetic mean is the "standard" average, often simply called the "mean". The mean may often be confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely value (mode). For example, mean income is skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income, and favors the larger number of people with lower incomes. The median or mode are often more intuitive measures of such data. Nevertheless, many skewed distributions are best described by their mean, such as the exponential and Poisson distributions. For example, the arithmetic mean of six values: 34, 27, 45, 55, 22, 34 is

$$\frac{34 + 27 + 45 + 55 + 22 + 34}{6} = \frac{217}{6} \approx 36.167.$$

Geometric mean (GM)

Main article: Geometric mean

The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean), e.g. rates of growth. For example, the geometric mean of six values: 34, 27, 45, 55, 22, 34 is:

$$(34 \cdot 27 \cdot 45 \cdot 55 \cdot 22 \cdot 34)^{1/6} = 1{,}699{,}493{,}400^{1/6} \approx 34.545.$$

Harmonic mean (HM)

Main article: Harmonic mean

The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed (distance per unit of time). For example, the harmonic mean of the six values: 34, 27, 45, 55, 22, and 34 is

$$\frac{6}{\frac{1}{34} + \frac{1}{27} + \frac{1}{45} + \frac{1}{55} + \frac{1}{22} + \frac{1}{34}} \approx 33.018.$$

Relationship between AM, GM, and HM

Main article: Inequality of arithmetic and geometric means

AM, GM, and HM satisfy these inequalities:

$$\mathrm{AM} \ge \mathrm{GM} \ge \mathrm{HM}.$$

Equality holds only when all the elements of the given sample are equal.
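
The three means of the six example values used above (34, 27, 45, 55, 22, 34) can be computed and compared numerically. This is a small sketch using only the standard library; the variable names are illustrative.

```python
import math

values = [34, 27, 45, 55, 22, 34]
n = len(values)

am = sum(values) / n                                 # arithmetic mean
gm = math.exp(sum(math.log(v) for v in values) / n)  # geometric mean via logarithms
hm = n / sum(1 / v for v in values)                  # harmonic mean

print(round(am, 3), round(gm, 3), round(hm, 3))

# The ordering AM >= GM >= HM holds for any positive sample.
assert am >= gm >= hm
```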

Generalized means

Power mean

The generalized mean, also known as the power mean or Hölder mean, is an abstraction of the quadratic, arithmetic, geometric and harmonic means. It is defined for a set of n positive numbers $x_i$ by

$$\bar{x}(m) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^m \right)^{1/m}.$$

By choosing the appropriate value for the parameter m we get

• $m = 2$: quadratic mean,
• $m = 1$: arithmetic mean,
• $m \to 0$: geometric mean,
• $m = -1$: harmonic mean.

f-mean

This can be generalized further as the generalized f-mean

$$\bar{x} = f^{-1}\left( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right)$$

and again a suitable choice of an invertible f will give

• $f(x) = 1/x$: harmonic mean,
• $f(x) = x^m$: power mean,
• $f(x) = \ln x$: geometric mean.
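
As a rough illustration of the generalized f-mean, the sketch below applies an invertible function f to each value, averages the results, and maps the average back through the inverse of f. The helper name `f_mean` and the explicit `f_inverse` argument are assumptions made for this example.

```python
import math


def f_mean(values, f, f_inverse):
    """Generalized f-mean: f_inverse applied to the arithmetic mean of f(x_i)."""
    return f_inverse(sum(f(x) for x in values) / len(values))


values = [34, 27, 45, 55, 22, 34]

harmonic = f_mean(values, lambda x: 1 / x, lambda y: 1 / y)  # f(x) = 1/x
quadratic = f_mean(values, lambda x: x ** 2, math.sqrt)      # f(x) = x^m with m = 2
geometric = f_mean(values, math.log, math.exp)               # f(x) = ln x

print(harmonic, geometric, quadratic)
```

Passing the inverse explicitly keeps the sketch short; it does not try to invert f numerically.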

Weighted arithmetic mean
The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes:

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$

The weights $w_i$ represent the sizes of the partial samples. In other applications they represent a measure of the reliability of the influence of the respective values upon the mean.
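
Combining sample averages with sample sizes as weights might look like the following sketch. The two class averages and the function name `weighted_mean` are made up for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must have a positive sum")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical data: an average of 80 from 20 students and 90 from 30 students.
print(weighted_mean([80, 90], [20, 30]))  # 86.0
```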

Truncated mean

Sometimes a set of numbers might contain outliers, i.e. a datum which is much lower or much higher than the others. Often, outliers are erroneous data caused by artifacts. In this case one can use a truncated mean. It involves discarding given parts of the data at the top or the bottom end, typically an equal amount at each end, and then taking the arithmetic mean of the remaining data. The number of values removed is indicated as a percentage of total number of values.

Interquartile mean

The interquartile mean is a specific example of a truncated mean. It is simply the arithmetic mean after removing the lowest and the highest quarter of values:

$$\bar{x} = \frac{2}{n} \sum_{i = n/4 + 1}^{3n/4} x_i,$$

assuming the values have been ordered, so it is simply a specific example of a weighted mean for a specific set of weights.
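
A truncated mean, with the interquartile mean as the 25% case, can be sketched as follows. This simplified version assumes the number of values to drop at each end is rounded down to a whole number, so it does not handle the fractional weights needed when the sample size is not divisible by four; the names are illustrative.

```python
def truncated_mean(values, proportion):
    """Drop the given proportion of values at each end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def interquartile_mean(values):
    """Truncated mean that removes the lowest and highest quarter of values."""
    return truncated_mean(values, 0.25)


data = [1, 2, 3, 4, 100, 5, 6, 7]  # 100 is an outlier
print(sum(data) / len(data))       # 16.0
print(interquartile_mean(data))    # 4.5
```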

Mean of a function

In calculus, and especially multivariable calculus, the mean of a function is loosely defined as the average value of the function over its domain. In one variable, the mean of a function f(x) over the interval (a,b) is defined by

$$\bar{f} = \frac{1}{b - a} \int_a^b f(x)\,dx.$$

(See also mean value theorem.) In several variables, the mean over a relatively compact domain U in a Euclidean space is defined by

$$\bar{f} = \frac{1}{\mathrm{Vol}(U)} \int_U f.$$

This generalizes the arithmetic mean. On the other hand, it is also possible to generalize the geometric mean to functions by defining the geometric mean of f to be

$$\exp\left( \frac{1}{\mathrm{Vol}(U)} \int_U \log f \right).$$

More generally, in measure theory and probability theory either sort of mean plays an important role. In this context, Jensen's inequality places sharp estimates on the relationship between these two different notions of the mean of a function. There is also a harmonic average of functions and a quadratic average (or root mean square) of functions.
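
For the one-variable case, the mean of a function over an interval (a, b) is the integral of the function divided by the interval length, and it can be approximated numerically. The sketch below assumes a midpoint-rule approximation with a fixed number of steps; `function_mean` and its parameters are illustrative names.

```python
def function_mean(f, a, b, steps=1000):
    """Approximate (1 / (b - a)) * integral of f over (a, b) by the midpoint rule."""
    if b <= a:
        raise ValueError("the interval must satisfy a < b")
    h = (b - a) / steps
    total = sum(f(a + (i + 0.5) * h) for i in range(steps))
    return total * h / (b - a)


# The exact mean of x^2 over (0, 3) is (1/3) * (3^3 / 3) = 3.
print(function_mean(lambda x: x * x, 0.0, 3.0))
```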

Mean of a Probability Distribution

See expected value

Mean of angles

Most of the usual means fail on circular quantities, such as angles, times of day, and fractional parts of real numbers. For those quantities a mean of circular quantities is needed.
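
One common way to average angles, sketched below as an assumption rather than as the article's own definition, is to average the corresponding unit vectors and take the direction of the result. The function name is illustrative.

```python
import math


def circular_mean_degrees(angles):
    """Direction of the summed unit vectors, in degrees, reduced modulo 360."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    if math.hypot(s, c) < 1e-12:
        raise ValueError("mean direction is undefined: the unit vectors cancel out")
    return math.degrees(math.atan2(s, c)) % 360.0


angles = [350, 10]
print(sum(angles) / len(angles))      # arithmetic mean: 180.0, pointing the wrong way
print(circular_mean_degrees(angles))  # close to 0 (equivalently 360), up to rounding
```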

Fréchet mean

The Fréchet mean gives a manner for determining the "center" of a mass distribution on a surface or, more generally, Riemannian manifold. Unlike many other means, the Fréchet mean is defined on a space whose elements cannot necessarily be added together or multiplied by scalars. It is sometimes also known as the Karcher mean (named after Hermann Karcher).

Other means

• Arithmetic-geometric mean
• Arithmetic-harmonic mean
• Cesàro mean
• Chisini mean
• Contraharmonic mean
• Elementary symmetric mean
• Geometric-harmonic mean
• Heinz mean
• Heronian mean
• Identric mean
• Lehmer mean
• Logarithmic mean
• Median
• Moving average
• Root mean square
• Stolarsky mean
• Weighted geometric mean
• Weighted harmonic mean
• Rényi's entropy (a generalized f-mean)

Properties
All means share some properties and additional properties are shared by the most common means. Some of these properties are collected here.

Weighted mean

A weighted mean M is a function which maps tuples of positive numbers to a positive number such that the following properties hold:

• "Fixed point": $M(1, 1, \ldots, 1) = 1$
• Homogeneity: $M(\lambda x_1, \ldots, \lambda x_n) = \lambda M(x_1, \ldots, x_n)$ for all $\lambda$ and $x_i$. In vector notation: $M(\lambda x) = \lambda M x$ for all n-vectors x.
• Monotonicity: If $x_i \le y_i$ for each i, then $M x \le M y$

It follows

• Boundedness: $\min x \le M x \le \max x$
• Continuity: $\lim_{x \to y} M x = M y$
• There are means which are not differentiable. For instance, the maximum number of a tuple is considered a mean (as an extreme case of the power mean, or as a special case of a median), but is not differentiable.
• All means listed above, with the exception of most of the generalized f-means, satisfy the presented properties.
  o If f is bijective, then the generalized f-mean satisfies the fixed point property.
  o If f is strictly monotonic, then the generalized f-mean also satisfies the monotonicity property.
  o In general a generalized f-mean will lack homogeneity.

The above properties imply techniques to construct more complex means: If C, M1, ..., Mm are weighted means and p is a positive real number, then A and B defined by

$$A x = C(M_1 x, \ldots, M_m x)$$
$$B x = \sqrt[p]{C(x_1^p, \ldots, x_n^p)}$$

are also weighted means.

Unweighted mean

Intuitively, an unweighted mean is a weighted mean with equal weights. Since our definition of weighted mean above does not expose particular weights, equal weights must be asserted in a different way. A different view on homogeneous weighting is that the inputs can be swapped without altering the result. Thus we define M to be an unweighted mean if it is a weighted mean and for each permutation $\pi$ of inputs, the result is the same.

Symmetry: $M x = M(\pi x)$ for all n-tuples x and permutations $\pi$ on n-tuples.

Analogously to the weighted means, if C is a weighted mean and M1, ..., Mm are unweighted means and p is a positive real number, then A and B defined by

$$A x = C(M_1 x, \ldots, M_m x)$$
$$B x = \sqrt[p]{M_1(x_1^p, \ldots, x_n^p)}$$

are also unweighted means.

Convert unweighted mean to weighted mean

An unweighted mean can be turned into a weighted mean by repeating elements. This connection can also be used to state that a mean is the weighted version of an unweighted mean. Say you have the unweighted mean M and weight the numbers by natural numbers $a_1, \ldots, a_n$. (If the numbers are rational, then multiply them with the least common denominator.) Then the corresponding weighted mean A is obtained by

$$A(x_1, \ldots, x_n) = M(\underbrace{x_1, \ldots, x_1}_{a_1}, \underbrace{x_2, \ldots, x_2}_{a_2}, \ldots, \underbrace{x_n, \ldots, x_n}_{a_n}).$$
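
The repetition idea can be tried out with any unweighted mean that accepts a list. The sketch below is illustrative: the helper name `weighted_by_repetition` and the sample numbers are assumptions, and the unweighted means come from Python's standard `statistics` module.

```python
import statistics


def weighted_by_repetition(mean, values, counts):
    """Apply an unweighted mean to a list in which values[i] appears counts[i] times."""
    repeated = [x for x, a in zip(values, counts) for _ in range(a)]
    return mean(repeated)


# With the arithmetic mean, repetition reproduces the weighted arithmetic mean:
# (2 * 80 + 3 * 90) / 5 = 86.
print(weighted_by_repetition(statistics.fmean, [80, 90], [2, 3]))  # 86.0

# The same construction gives a weighted median.
print(weighted_by_repetition(statistics.median, [1, 10], [3, 1]))
```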

Means of tuples of different sizes

If a mean M is defined for tuples of several sizes, then one also expects that the mean of a tuple is bounded by the means of partitions. More precisely:

Given an arbitrary tuple x, which is partitioned into y1, ..., yk, then

$$M x \in \mathrm{convexhull}(M y_1, \ldots, M y_k).$$

(See Convex hull.)

Population and sample means

The mean of a population has an expected value of $\mu$, known as the population mean. The sample mean makes a good estimator of the population mean, as its expected value is the same as the population mean. The sample mean of a population is a random variable, not a constant, and consequently it will have its own distribution. For a random sample of n observations from a normally distributed population, the sample mean distribution is

$$\bar{x} \sim N\left( \mu, \frac{\sigma^2}{n} \right).$$

Often, since the population variance is an unknown parameter, it is estimated by the mean sum of squares, which changes the distribution of the standardized sample mean from a normal distribution to a Student's t distribution with n − 1 degrees of freedom.

Median

In probability theory and statistics, a median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.[1][2]

In a sample of data, or a finite population, there may be no member of the sample whose value is identical to the median (in the case of an even sample size), and, if there is such a member, there may be more than one so that the median may not uniquely identify a sample member. Nonetheless, the value of the median is uniquely determined with the usual definition. A related concept, in which the outcome is forced to correspond to a member of the sample, is the medoid.

At most, half the population have values less than the median, and, at most, half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median. For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2.

The median can be used as a measure of location when a distribution is skewed, when end-values are not known, or when one requires reduced importance to be attached to outliers, e.g., because they may be measurement errors. A disadvantage of the median is the difficulty of handling it theoretically.


Notation
The median of some variable x is denoted either as $\tilde{x}$ or as $\mu_{1/2}(x)$.[3]

Measures of statistical dispersion

When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the interquartile range, the mean absolute deviation, and the median absolute deviation. Since the median is the same as the second quartile, its calculation is illustrated in the article on quartiles.

Medians of probability distributions

For any probability distribution on the real line with cumulative distribution function F, regardless of whether it is any kind of continuous probability distribution, in particular an absolutely continuous distribution (and therefore has a probability density function), or a discrete probability distribution, a median m satisfies the inequalities

$$P(X \le m) \ge \tfrac{1}{2} \quad\text{and}\quad P(X \ge m) \ge \tfrac{1}{2}$$

or

$$\int_{-\infty}^{m} dF(x) \ge \tfrac{1}{2} \quad\text{and}\quad \int_{m}^{\infty} dF(x) \ge \tfrac{1}{2},$$

in which a Lebesgue-Stieltjes integral is used. For an absolutely continuous probability distribution with probability density function f, we have

$$P(X \le m) = P(X \ge m) = \int_{-\infty}^{m} f(x)\,dx = \tfrac{1}{2}.$$

Medians of particular distributions

The medians of certain types of distributions can be easily calculated from their parameters:

• The median of a normal distribution with mean $\mu$ and variance $\sigma^2$ is $\mu$. In fact, for a normal distribution, mean = median = mode.
• The median of a uniform distribution in the interval [a, b] is (a + b) / 2, which is also the mean.
• The median of a Cauchy distribution with location parameter x0 and scale parameter y is x0, the location parameter.
• The median of an exponential distribution with rate parameter $\lambda$ is the natural logarithm of 2 divided by the rate parameter: $\lambda^{-1} \ln 2$.
• The median of a Weibull distribution with shape parameter k and scale parameter $\lambda$ is $\lambda (\ln 2)^{1/k}$.
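
The closed-form medians of the exponential and Weibull distributions can be checked by plugging them into the cumulative distribution functions, which should return one half. This is a small sketch; the parameter values are arbitrary choices, and the CDF formulas are the standard ones rather than anything stated in the text above.

```python
import math


def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate parameter."""
    return 1.0 - math.exp(-rate * x)


def weibull_cdf(x, shape, scale):
    """CDF of the Weibull distribution with shape k and scale lambda."""
    return 1.0 - math.exp(-((x / scale) ** shape))


rate = 2.0               # arbitrary example value
shape, scale = 1.5, 3.0  # arbitrary example values

exp_median = math.log(2) / rate
weibull_median = scale * math.log(2) ** (1 / shape)

print(exponential_cdf(exp_median, rate))          # approximately 0.5
print(weibull_cdf(weibull_median, shape, scale))  # approximately 0.5
```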

Medians in descriptive statistics

The median is used primarily for skewed distributions, which it summarizes differently than the arithmetic mean. Consider the multiset { 1, 2, 2, 2, 3, 14 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 4. Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

Theoretical properties

An optimality property
A median is also a central point that minimizes the average of the absolute deviations. In the above example, the median value of 2 minimizes the average of the absolute deviations, (1 + 0 + 0 + 0 + 1 + 12) / 6 = 2.33; in contrast, the mean value of 4 minimizes the average of the squares, (9 + 4 + 4 + 4 + 1 + 100) / 6 = 20.33. In the language of statistics, a value of c that minimizes

$$E(|X - c|)$$

is a median of the probability distribution of the random variable X. However, a median c need not be uniquely defined. Where exactly one median exists, statisticians speak of "the median" correctly; even when no unique median exists, some statisticians speak of "the median" informally. See also k-medians clustering.
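
The optimality claim can be checked numerically on the multiset { 1, 2, 2, 2, 3, 14 } from the example. The sketch below searches a grid of candidate centres; the grid spacing of 0.1 and the helper names are arbitrary choices made for illustration.

```python
data = [1, 2, 2, 2, 3, 14]


def mean_abs_dev(c):
    """Average absolute deviation of the data from the point c."""
    return sum(abs(x - c) for x in data) / len(data)


def mean_sq_dev(c):
    """Average squared deviation of the data from the point c."""
    return sum((x - c) ** 2 for x in data) / len(data)


candidates = [i / 10 for i in range(0, 151)]  # 0.0, 0.1, ..., 15.0

best_abs = min(candidates, key=mean_abs_dev)
best_sq = min(candidates, key=mean_sq_dev)

print(best_abs, round(mean_abs_dev(best_abs), 2))  # 2.0 2.33
print(best_sq, round(mean_sq_dev(best_sq), 2))     # 4.0 20.33
```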

An inequality relating means and medians

For continuous probability distributions, the difference between the median and the mean is never more than one standard deviation. See an inequality on location and scale parameters.

The sample median

Efficient computation of the sample median

Even though sorting n items requires O(n log n) operations, selection algorithms can compute the kth-smallest of n items (e.g., the median) with only O(n) operations.[4]
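
One selection algorithm is quickselect, sketched below. Note the assumption: with a random pivot it runs in linear time only on average, whereas a guaranteed O(n) bound requires a more careful pivot rule such as median of medians. The function name and the list-copying style are illustrative choices, not an optimized implementation.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element (k = 0 is the minimum), expected O(n) time."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    pool = list(items)
    while True:
        pivot = rng.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k < len(smaller):
            pool = smaller                       # answer lies among the smaller values
        elif k < len(smaller) + len(equal):
            return pivot                         # answer equals the pivot
        else:
            k -= len(smaller) + len(equal)       # skip everything up to the pivot
            pool = [x for x in pool if x > pivot]


values = [1, 5, 2, 8, 7]
print(quickselect(values, len(values) // 2))  # 5, the median of an odd-sized list
```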

Easy explanation of the sample median

For an odd number of values

As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7. Start by sorting the values: 1, 2, 5, 7, 8. In this case, the median is 5 since it is the middle observation in the ordered list. The median is the ((n + 1)/2)th item, where n is the number of values. For example, for the list {1, 2, 5, 7, 8}, we have n = 5, so the median is the ((5 + 1)/2)th item.

median = (6/2)th item
median = 3rd item
median = 5

For an even number of values

As an example, we will calculate the sample median for the following set of observations: 1, 5, 2, 8, 7, 2. Start by sorting the values: 1, 2, 2, 5, 7, 8. In this case, the average of the two middlemost terms is (2 + 5)/2 = 3.5. Therefore, the median is 3.5 since it is the average of the middle observations in the ordered list.

The formula median = ((n + 1)/2)th item, where n is the number of values, can also be used here. For the list 1, 2, 2, 5, 7, 8 we have n = 6, so the median is the ((6 + 1)/2)th = 3.5th item, that is, the value halfway between the 3rd item (2) and the 4th item (5):

median = 2 + 0.5 × (difference between the 4th and 3rd items)
median = 2 + 0.5 × (5 − 2)
median = 2 + 0.5 × 3
median = 2 + 1.5
median = 3.5
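
The odd and even cases can be combined into one short function. This is a minimal sketch that sorts the data first; the name `sample_median` is illustrative.

```python
def sample_median(values):
    """Return the middle value, or the mean of the two middle values."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # the ((n + 1)/2)th item, counting from 1
    return (ordered[middle - 1] + ordered[middle]) / 2


print(sample_median([1, 5, 2, 8, 7]))     # 5
print(sample_median([1, 5, 2, 8, 7, 2]))  # 3.5
```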

Other estimates of the median

If data are represented by a statistical model specifying a particular family of probability distributions, then estimates of the median can be obtained by fitting that family of probability distributions to the data and calculating the theoretical median of the fitted distribution. See, for example, Pareto interpolation.

Median-unbiased estimators, and bias with respect to loss functions

Any mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function, as observed by Gauss. A median-unbiased estimator minimizes the risk with respect to the absolute-deviation loss function, as observed by Laplace. Other loss functions are used in statistical theory, particularly in robust statistics.

The theory of median-unbiased estimators was revived by George W. Brown in 1947:

"An estimate of a one-dimensional parameter $\theta$ will be said to be median-unbiased, if, for fixed $\theta$, the median of the distribution of the estimate is at the value $\theta$; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation." [page 584]

Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl. In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. Besides being invariant under one-to-one transformations, median-unbiased estimators have surprising robustness.

In image processing

Main article: Median filter

In monochrome raster images there is a type of noise, known as salt and pepper noise, in which each pixel independently becomes black (with some small probability) or white (with some small probability), and is unchanged otherwise (with probability close to 1). An image constructed of median values of neighborhoods (such as a 3×3 square) can effectively reduce noise in this case.
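
A 3×3 median filter on a small grayscale image, stored as a list of rows, might be sketched as follows. Border handling is an assumption here (border pixels are simply copied), and the tiny test image is made up for illustration.

```python
import statistics


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (one of several possible conventions).
    """
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            result[r][c] = statistics.median(window)
    return result


noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],  # one white ("salt") pixel
    [10, 10, 0, 10],    # one black ("pepper") pixel
    [10, 10, 10, 10],
]

for row in median_filter_3x3(noisy):
    print(row)  # every row becomes [10, 10, 10, 10]
```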

In multidimensional statistical inference

In multidimensional statistical inference, the value c that minimizes $E(\|X - c\|)$ is also called a centroid.[5] In this case $\|\cdot\|$ indicates a norm for the vector difference, such as the Euclidean norm, rather than the one-dimensional case's use of an absolute value. (Note that in some other contexts a centroid is more like a multidimensional mean than the multidimensional median described here.) Like a centroid, a medoid minimizes $E(\|X - c\|)$, but c is restricted to be a member of a specified set. For instance, the set could be a sample of points drawn from some distribution.

Mode

In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution.[1] In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score.[2] Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions. The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, wherein all values are equally likely.


Mode of a probability distribution

The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. The mode of a continuous probability distribution is the value x at which its probability density function attains its maximum value, so, informally speaking, the mode is at the peak. As noted above, the mode is not necessarily unique, since the probability mass function or probability density function may achieve its maximum value at several points x1, x2, etc. The above definition tells us that only global maxima are modes. Slightly confusingly, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal).

In symmetric unimodal distributions, such as the normal (or Gaussian) distribution (the distribution whose density function, when graphed, gives the famous "bell curve"), the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric distribution, the sample mean can be used as an estimate of the population mode.

Mode of a sample

The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique: the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.

For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning frequency values to intervals of equal width, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or moderately sized samples the outcome of this procedure is sensitive to the choice of interval width, and degrades if the width is chosen too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable. An alternative approach is kernel density estimation, which essentially blurs point samples to produce a continuous estimate of the probability density function, which in turn can provide an estimate of the mode.

The following MATLAB code example computes the mode of a sample:

X = sort(x(:));                          % sort the sample into an ascending column vector
indices = find(diff([X; realmax]) > 0);  % indices where repeated values change
[modeL, i] = max(diff([0; indices]));    % longest persistence length of repeated values
modeValue = X(indices(i));               % named modeValue to avoid shadowing the built-in mode function

As a first step, the algorithm sorts the sample in ascending order. It then computes the discrete derivative (the successive differences) of the sorted list and finds the indices where this derivative is positive, that is, where a run of repeated values ends. Next it computes the discrete derivative of this set of indices, which gives the length of each run, and locates the maximum of these lengths. Finally, it evaluates the sorted sample at the index where that maximum occurs, which corresponds to the last member of the longest stretch of repeated values.
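The same sort-and-difference procedure can be sketched in Python using only the standard library. This is an illustrative port under stated assumptions rather than a reference implementation: the function name `sample_mode` is hypothetical, the input is assumed to be a non-empty list of comparable values, and ties are resolved in favour of the smallest mode.

```python
def sample_mode(values):
    """Mode of a non-empty sample via sorting and run lengths (smallest mode on ties)."""
    ordered = sorted(values)
    n = len(ordered)

    # 1-based positions at which a run of equal values ends,
    # i.e. where the next value is larger or the list stops.
    run_ends = [i + 1 for i in range(n) if i + 1 == n or ordered[i + 1] > ordered[i]]

    # Differences between consecutive run ends are the run lengths.
    run_lengths = [end - start for start, end in zip([0] + run_ends, run_ends)]

    # Index of the first longest run, then the last member of that run.
    longest = run_lengths.index(max(run_lengths))
    return ordered[run_ends[longest] - 1]


unique_mode = sample_mode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
first_of_two = sample_mode([1, 1, 2, 4, 4])
print(unique_mode, first_of_two)
```

For the bimodal sample only one of the two modes is reported, which is a limitation of returning a single value.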

Comparison of mean, median and mode

See also: mean and median

Comparison of common averages

Type | Description | Example | Result
Arithmetic mean | Total sum divided by number of values | (1+2+2+3+4+7+9) / 7 | 4
Median | Middle value that separates the greater and lesser halves of a data set | 1, 2, 2, 3, 4, 7, 9 | 3
Mode | Most frequent number in a data set | 1, 2, 2, 3, 4, 7, 9 | 2
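The three results in this comparison can be reproduced with Python's standard `statistics` module. This is a small check on the data set 1, 2, 2, 3, 4, 7, 9; the variable names are chosen for this example only.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 7, 9]

averages = {
    "mean": mean(data),      # total sum divided by the number of values
    "median": median(data),  # middle value of the sorted data
    "mode": mode(data),      # most frequent value
}
print(averages)
```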

When do these measures make sense?

Unlike mean and median, the concept of mode also makes sense for "nominal data" (i.e., data not consisting of numerical values). For example, taking a sample of Korean family names, one might find that "Kim" occurs more often than any other name. Then "Kim" would be the mode of the sample. In any voting system where a plurality determines victory, a single modal value determines the victor, while a multimodal outcome would require some tie-breaking procedure to take place.

Unlike the median, the concept of mean makes sense for any random variable assuming values from a vector space, including the real numbers (a one-dimensional vector space) and the integers (which can be considered embedded in the reals). For example, a distribution of points in the plane will typically have a mean and a mode, but the concept of median does not apply. The median makes sense when there is a linear order on the possible values. Note, however, that the centerpoint is sometimes used as a generalization of the median to higher-dimensional spaces.
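Because the mode only requires counting, it can be computed for nominal data directly. The sketch below uses `collections.Counter` on two small samples that are invented purely for illustration; the function name `plurality_winners` is likewise an assumption of this example.

```python
from collections import Counter


def plurality_winners(labels):
    """Return all labels sharing the highest count, sorted alphabetically."""
    counts = Counter(labels)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)


# Made-up sample of family names: a single modal value.
names = ["Kim", "Lee", "Park", "Kim", "Choi", "Lee", "Kim"]
single = plurality_winners(names)

# Made-up ballot with a tie: a multimodal outcome that needs a tie-break.
ballot = ["A", "B", "A", "B", "C"]
tie = plurality_winners(ballot)

print(single, tie)
```

Returning a list rather than a single label makes the tie case explicit, so a caller can decide how to break it.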

Uniqueness and definedness

For the remainder, the assumption is that we have (a sample of) a real-valued random variable. For some probability distributions, the expected value may be infinite or undefined, but if defined, it is unique. The mean of a (finite) sample is always defined. The median is the value such that the fractions not exceeding it and not falling below it are both at least 1/2. It is not necessarily unique, but never infinite or totally undefined. For a data sample it is the "halfway" value when the list of values is ordered in increasing value, where usually for a list of even length the numerical average is taken of the two values closest to "halfway". Finally, as said before, the mode is not necessarily unique. Certain pathological distributions (for example, the Cantor distribution) have no defined mode at all.[citation needed] For a finite data sample, the mode is one (or more) of the values in the sample.
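For finite samples, these uniqueness properties can be illustrated with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The sample values below are chosen only for illustration.

```python
from statistics import median, median_high, median_low, multimode

# The mode of a sample need not be unique: both 1 and 4 occur twice.
modes = multimode([1, 1, 2, 4, 4])

# For an even-length sample every value between the two middle elements
# satisfies the definition of a median; the usual convention averages them.
even_sample = [1, 2, 4, 7]
lower = median_low(even_sample)       # smaller of the two middle values
upper = median_high(even_sample)      # larger of the two middle values
conventional = median(even_sample)    # average of the two middle values

print(modes, lower, upper, conventional)
```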

Properties
Assuming definedness, and for simplicity uniqueness, the following are some of the most interesting properties.

All three measures have the following property: If the random variable (or each value from the sample) is subjected to the linear or affine transformation which replaces X by aX+b, so are the mean, median and mode.

However, under an arbitrary monotonic transformation, only the median follows; for example, if X is replaced by exp(X), the median changes from m to exp(m), but the mean and mode in general do not transform in the same way.

Except for extremely small samples, the mode is insensitive to "outliers" (such as occasional, rare, false experimental readings). The median is also very robust in the presence of outliers, while the mean is rather sensitive.

In continuous unimodal distributions the median lies, as a rule of thumb, between the mean and the mode, about one third of the way going from mean to mode. In a formula, $\text{median} \approx (2 \times \text{mean} + \text{mode})/3$. This rule, due to Karl Pearson, often applies to slightly non-symmetric distributions that resemble a normal distribution, but it is not always true and in general the three statistics can appear in any order.[3][4]

For unimodal distributions, the mode is within $\sqrt{3}$ standard deviations of the mean, and the root mean square deviation about the mode is between the standard deviation and twice the standard deviation.[5]
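The transformation properties can be checked numerically on a small sample. The sketch below is an illustration under assumptions: the data set and the constants `a` and `b` are chosen arbitrarily. It exercises the monotonic case only for the mean and the median, since the statement about the mode concerns continuous densities rather than repeated sample values.

```python
import math
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 7, 9]
a, b = 3, 5  # arbitrary constants for the affine map x -> a*x + b

# Affine transformation: all three measures transform the same way.
transformed = [a * x + b for x in data]
affine_mean_ok = mean(transformed) == a * mean(data) + b
affine_median_ok = median(transformed) == a * median(data) + b
affine_mode_ok = mode(transformed) == a * mode(data) + b

# Monotonic but non-linear transformation x -> exp(x):
# the median follows, the mean does not.
exp_data = [math.exp(x) for x in data]
median_follows = math.isclose(median(exp_data), math.exp(median(data)))
mean_follows = math.isclose(mean(exp_data), math.exp(mean(data)))

print(affine_mean_ok, affine_median_ok, affine_mode_ok, median_follows, mean_follows)
```

The mean of the exponentiated data is dominated by the largest value, which is why it ends up far from exp of the original mean.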

Example for a skewed distribution

A well-known example of a skewed distribution is personal wealth: few people are very rich, but among those some are extremely rich. However, many are rather poor.

A well-known class of distributions that can be arbitrarily skewed is given by the log-normal distribution. It is obtained by transforming a random variable X having a normal distribution into the random variable Y = exp(X). Then the logarithm of Y is normally distributed, hence the name.

Taking the mean of X to be 0, the median of Y will be 1, independent of the standard deviation σ of X. This is so because X has a symmetric distribution, so its median is also 0. The transformation from X to Y is monotonic, and so we find the median exp(0) = 1 for Y.

When X has standard deviation σ = 0.2, the distribution of Y is not very skewed. We find (see under Log-normal distribution), with values rounded to four digits:

mean = 1.0202
mode = 0.9608

Indeed, the median is about one third of the way from mean to mode.

When X has a much larger standard deviation, σ = 2, the distribution of Y is strongly skewed. Now

mean = 7.3891
mode = 0.0183

Here, Pearson's rule of thumb fails.
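The quoted figures can be checked against the standard closed forms for a log-normal variable Y = exp(X) with X normal with mean μ and standard deviation σ: mean = exp(μ + σ²/2), median = exp(μ) and mode = exp(μ − σ²). The sketch below evaluates them for μ = 0 with σ = 0.2 and with σ = 2, the latter being the value that reproduces 7.3891 and 0.0183. The function names are assumptions made for this example.

```python
import math


def lognormal_summary(mu, sigma):
    """Mean, median and mode of exp(X) for X ~ Normal(mu, sigma**2)."""
    return {
        "mean": math.exp(mu + sigma**2 / 2),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma**2),
    }


def pearson_estimate(summary):
    """Pearson's rule of thumb for the median: (2 * mean + mode) / 3."""
    return (2 * summary["mean"] + summary["mode"]) / 3


mild = lognormal_summary(0.0, 0.2)    # slightly skewed
strong = lognormal_summary(0.0, 2.0)  # strongly skewed

for label, summary in (("sigma = 0.2", mild), ("sigma = 2", strong)):
    print(label,
          "mean = %.4f" % summary["mean"],
          "mode = %.4f" % summary["mode"],
          "Pearson estimate of median = %.4f" % pearson_estimate(summary))
```

In the mildly skewed case the rule-of-thumb estimate lands very close to the true median of 1, while in the strongly skewed case it is several times too large.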