The coefficient of variation (CV), also known as relative standard deviation (RSD), is a statistical measure of the dispersion of data points in a data series around the mean. The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another.
For example, in finance, the coefficient of variation lets investors gauge how much volatility, or risk, they take on relative to the expected return of an investment. The lower the ratio of the standard deviation to the mean return, the better the risk-return trade-off.
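Written as a formula, the definition above reads as follows. The symbols are notation introduced here for illustration and do not come from the text: σ and μ denote the population standard deviation and mean, while s and x̄ are their sample counterparts computed from n observations.

```latex
\mathrm{CV} = \frac{\sigma}{\mu}
\qquad\text{or, estimated from a sample,}\qquad
\widehat{\mathrm{CV}} = \frac{s}{\bar{x}},
\quad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Multiplying the ratio by 100 expresses it as a percentage.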
Examples
- A data set of [100, 100, 100] has constant values. Its standard deviation is 0 and its average is 100, giving a coefficient of variation of 0 / 100 = 0.
- A data set of [90, 100, 110] has more variability. Its sample standard deviation is 10 (its population standard deviation is 8.165) and its average is 100, giving a coefficient of variation of 10 / 100 = 0.10.
- A data set of [1, 5, 6, 8, 10, 40, 65, 88] has still more variability. Its sample standard deviation is 32.9 and its average is 27.9, giving a coefficient of variation of 32.9 / 27.9 = 1.18.
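The figures above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `coefficient_of_variation` and its `sample` flag are illustrative choices, not part of any library. It computes both the sample (n - 1) and the population (n) version, since the choice matters for small data sets.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Ratio of standard deviation to mean (hypothetical helper).

    sample=True uses the n - 1 denominator, sample=False uses n.
    """
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / statistics.mean(values)

data_sets = [
    [100, 100, 100],
    [90, 100, 110],
    [1, 5, 6, 8, 10, 40, 65, 88],
]

# One (sample CV, population CV) pair per data set
results = [
    (coefficient_of_variation(d), coefficient_of_variation(d, sample=False))
    for d in data_sets
]

for d, (cv_sample, cv_population) in zip(data_sets, results):
    print(f"{d}: sample CV = {cv_sample:.4f}, population CV = {cv_population:.4f}")
```

For [90, 100, 110] the sample standard deviation is 10 and the population standard deviation is about 8.165, so the two versions give 0.10 and about 0.082.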
Advantages
- To compare data sets with different units or widely different means, use the coefficient of variation instead of the standard deviation. The standard deviation must always be interpreted in the context of the mean of the data. The CV, by contrast, is a dimensionless number: its value does not depend on the unit in which the measurements were taken.
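The unit independence described above can be illustrated with a small sketch. The height values are hypothetical and chosen only for illustration, and `coefficient_of_variation` is an illustrative helper name. The same measurements are expressed in metres and in centimetres: the standard deviation changes by a factor of 100, while the CV stays the same.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (hypothetical helper)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical heights, first in metres, then converted to centimetres
heights_m = [1.50, 1.62, 1.75, 1.81, 1.93]
heights_cm = [h * 100 for h in heights_m]

sd_m = statistics.stdev(heights_m)
sd_cm = statistics.stdev(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

print(f"SD: {sd_m:.4f} m vs {sd_cm:.2f} cm")   # differs by a factor of 100
print(f"CV: {cv_m:.4f} vs {cv_cm:.4f}")        # identical up to rounding
```

This holds for a pure change of unit, that is, multiplying every value by a constant. It does not hold when the conversion also shifts the zero point.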
Disadvantages
- When the mean is close to zero, the coefficient of variation approaches infinity and becomes highly sensitive to small changes in the mean. If the mean in the denominator (for example, an expected return) is zero, the CV is undefined, and if it is negative, the result can be misleading.
- Unlike the standard deviation, it cannot be used directly to construct confidence intervals for the mean.
- CVs are not an ideal index of the certainty of measurement when the number of replicates varies across samples because CV is invariant to the number of replicates while the certainty of the mean improves with increasing replicates. In this case, standard error in percent is suggested to be superior.
- Relative Value Problem: Suppose the data sets are temperature readings from two different sensors (a Celsius sensor and a Fahrenheit sensor) and you want to decide which sensor is better by picking the one with the least variability. Using the CV will mislead you, because the mean you divide by is measured on a relative scale with an arbitrary zero point rather than on an absolute scale.
- Drawback Example: Comparing coefficients of variation between parameters measured in relative units can produce differences that are not real. Consider the same set of temperatures expressed in Celsius and in Fahrenheit (both relative scales, whose associated absolute scales are kelvin and Rankine):
- Celsius: [0, 10, 20, 30, 40]
- Fahrenheit: [32, 50, 68, 86, 104]
- The sample standard deviations are 15.81 and 28.46, respectively. The CV of the first set is 15.81/20 = 79%. For the second set (which are the same temperatures) it is 28.46/68 = 42%.
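The temperature example can be reproduced in a few lines. This is a sketch using the standard `statistics` module, with `coefficient_of_variation` as an illustrative helper name. It also converts the readings to the absolute kelvin and Rankine scales, using the usual offset of 273.15 between Celsius and kelvin, to show that the CVs agree once the zero point is no longer arbitrary.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (hypothetical helper)."""
    return statistics.stdev(values) / statistics.mean(values)

celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]   # same temperatures, relative scale
kelvin = [c + 273.15 for c in celsius]           # absolute scale
rankine = [k * 9 / 5 for k in kelvin]            # absolute scale

cv_c = coefficient_of_variation(celsius)      # about 79%
cv_f = coefficient_of_variation(fahrenheit)   # about 42%
cv_k = coefficient_of_variation(kelvin)       # about 5.4%
cv_r = coefficient_of_variation(rankine)      # same as kelvin

for name, cv in [("Celsius", cv_c), ("Fahrenheit", cv_f),
                 ("Kelvin", cv_k), ("Rankine", cv_r)]:
    print(f"{name}: CV = {cv:.1%}")
```

The Celsius and Fahrenheit values disagree even though they describe the same temperatures, whereas the kelvin and Rankine values coincide because those scales differ only by a constant factor.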
Code
Example 1: Coefficient of Variation for a Single Series
# import library
import numpy as np
# create vector of data
data = [1, 3, 5, 7, 9, 11, 20]
# define function to calculate the CV in percent (ddof=1 gives the sample standard deviation)
def cv(x):
    return np.std(x, ddof=1) / np.mean(x) * 100
# calculate CV
cv(data)
78.73 (rounded to two decimals)
Example 2: Coefficient of Variation for all columns of a Data Frame
# import libraries
import numpy as np
import pandas as pd
# define function to calculate the CV in percent (ddof=1 gives the sample standard deviation)
def cv(x):
    return np.std(x, ddof=1) / np.mean(x) * 100
# create pandas DataFrame
df = pd.DataFrame({'a': [1, 3, 5, 7, 9, 11, 20],
                   'b': [110, 130, 150, 170, np.nan, 110, 120],
                   'c': [210, 230, 250, 270, 220, 290, np.nan]})
# calculate CV for each column in data frame
df.apply(cv)
a    78.726848
b    18.238394
c    12.580437
dtype: float64
Note: Missing values (NaN) are ignored in this calculation, because pandas skips them by default when computing a column's mean and standard deviation.
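The NaN handling above applies to pandas columns. With a plain list or NumPy array, `np.std` and `np.mean` propagate NaN, so the result is NaN. A minimal sketch of one way around this uses NumPy's NaN-aware functions `np.nanstd` and `np.nanmean`; the helper name `cv_percent` is hypothetical.

```python
import numpy as np

def cv_percent(values):
    """Sample CV in percent, skipping NaN entries (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    return np.nanstd(arr, ddof=1) / np.nanmean(arr) * 100

# Column 'b' from the example above, as a plain Python list
b = [110, 130, 150, 170, np.nan, 110, 120]

plain = np.std(b, ddof=1) / np.mean(b) * 100   # NaN propagates: result is nan
skipping = cv_percent(b)                       # about 18.24, matching column 'b'

print(plain, skipping)
```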