Variance

Variance is a measure of dispersion. It is the expectation of the squared deviation of a random variable from its population mean or sample mean. In simple words, it is a measure of how far a set of numbers is spread out from their average value.

Symbol of Variance is σ2 (Sigma Square).

Variance is the average of the squared differences from the Mean.

To calculate the variance follow these steps:

  • Find the Mean (the simple average of the numbers)
  • Subtract the Mean from each number
  • Square the Result (the squared difference) and take sum
  • Take the average of squared differences.

Example

Data = 8, 9, 10, 11, 12

  • Step 1: Find the Mean (the simple average of the numbers)
    • Mean: (8 + 9 + 10 + 11 + 12) ÷ 5 = 10
  • Step 2: Subtract the Mean from each number
ScoreDeviation from the mean
88 – 10 = -2
99 – 10 = 1
1010 – 10 = 0
1111 – 10 = 1
1215 – 10 = 5
  • Step 3: Square each deviation from the mean and take sum
    • (-2)2 + (1)2 + (0)2 + (1)2 + (2)2 = 10
  • Step 4: Divide the sum of squares by n – 1 or N
    • Divide the sum of the squares by n – 1 (for a sample) or N (for a population).
    • Since we’re working with a sample, we’ll use  n – 1, where n = 5.
    • 10/(5-1) = 2.5

Advantages

  • Statisticians use variance to see how individual numbers relate to each other within a data set, rather than using broader mathematical techniques such as arranging numbers into quartiles.
  • The advantage of variance is that it treats all deviations from the mean as the same regardless of their direction.
  • The squared deviations cannot sum to zero and give the appearance of no variability at all in the data.

Disadvantages

  • Variance gives added weight to outliers. These are the numbers far from the mean. Squaring these numbers can skew the data. Another pitfall of using variance is that it is not easily interpreted. Users often employ it primarily to take the square root of its value, which indicates the standard deviation of the data set. As noted above, investors can use standard deviation to assess how consistent returns are over time.

Video

Code

Starting Python 3.4, the standard library comes with the variance function (sample variance or variance n-1) as part of the statistics module:

# Python code
# variance() function of Statistics Module
 
# Import statistics module
import statistics
 
# Creating a sample of data
sample = [8, 9, 10, 11, 12]
 
# Print variance of the sample set
print(f"Variance of sample set is {statistics.variance(sample)}")
Variance of sample set is 2.5

The population variance (or variance n) can be obtained using the pvariance function:

# Python code
# pvariance() function of Statistics Module
 
# Import statistics module
import statistics
 
# Creating a sample of data
population = [8, 9, 10, 11, 12]

# Print variance of the sample set
print(f"Variance of population set is {statistics.pvariance(population)}")
Variance of population set is 2

Related Links

Leave a Comment

Keytodatascience Logo

Connect

Subscribe

Join our email list to receive the latest updates.

© 2022 KeyToDataScience