Skewness

Skewness measures the degree of asymmetry in a probability distribution, that is, how far the shape of a given set of data deviates from a symmetrical distribution such as the normal distribution (bell curve).

Figure: Types of skewness (symmetrical, right-skewed, left-skewed)

Types of Skewness:

  1. Symmetrical distribution
  2. Positively skewed or right-skewed
  3. Negatively skewed or left-skewed

Symmetrical Distribution:

Mean = Median = Mode

The normal distribution is the usual reference point for skewness because its data are symmetrically distributed. A symmetrical distribution has zero skewness, and all measures of central tendency lie in the middle. When data are symmetrically distributed, the left-hand side and the right-hand side of the centre mirror each other and contain the same number of observations. (If the dataset has 90 values, then the left-hand side has 45 observations, and the right-hand side has 45 observations.)
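
As a quick check of the symmetrical case, here is a minimal sketch that uses only the Python standard library. The data set and the helper name `moment_skewness` are made up for illustration; the helper implements the moment-based definition of skewness (third central moment divided by the second central moment raised to the power 1.5), which is also what `scipy.stats.skew` computes by default.

```python
import statistics


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data set that mirrors itself around the value 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = statistics.mean(symmetric)
median = statistics.median(symmetric)
mode = statistics.mode(symmetric)
skewness = moment_skewness(symmetric)

print("mean:", mean, "median:", median, "mode:", mode)
print("skewness:", skewness)
```

For this data the three measures of central tendency coincide and the skewness comes out as zero, because every deviation below the mean is cancelled by an equal deviation above it.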

Positively skewed or right-skewed:

Mean > Median > Mode

In statistics, a positively skewed distribution is one whose tail extends towards the right, that is, towards the higher values. Unlike symmetrically distributed data, where all measures of central tendency (mean, median, and mode) equal each other, in positively skewed data the measures spread apart. The name refers to the sign of the skewness value, which is positive rather than negative or zero; it does not mean that the mean, median, and mode themselves are positive.

In a positively skewed distribution, the mean is greater than the median: most of the data sit on the left-hand side, and a few large values stretch the tail out to the right. To put it another way, the bulk of the results is bunched towards the lower side. The tail pulls the mean above the median, which is the middle value, while the mode, the most frequent value, sits at the peak and is typically the smallest of the three.
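
The ordering Mean > Median > Mode can be seen on a small example. The sketch below uses a made-up data set with a few large values on the right and relies only on the standard `statistics` module.

```python
import statistics

# Hypothetical data: most values are small, a few large ones form the right tail
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(right_skewed)      # pulled upwards by 9 and 15
median = statistics.median(right_skewed)  # middle of the sorted values
mode = statistics.mode(right_skewed)      # most frequent value (the peak)

print("mean:", mean, "median:", median, "mode:", mode)
print("mean > median > mode:", mean > median > mode)
```

Note that this ordering is a rule of thumb for typical unimodal data, not a guarantee for every data set.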

Extreme positive skewness is usually undesirable, as a high level of skewness can cause misleading results. Data transformations help to bring skewed data closer to a normal distribution. For positively skewed distributions, the best-known transformation is the log transformation, which replaces each value in the dataset with its natural logarithm. It requires every value to be greater than zero.
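
To see the effect of a log transformation, the sketch below compares the skewness of a made-up, strictly positive data set before and after taking natural logarithms. The helper `moment_skewness` is a hypothetical name for the usual moment-based skewness formula; it is not part of any library.

```python
import math


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, strictly positive data with a long right tail
raw = [1, 2, 2, 3, 4, 5, 8, 13, 30, 100]

# Natural logarithm of each value; math.log raises ValueError for values <= 0
logged = [math.log(v) for v in raw]

skew_before = moment_skewness(raw)
skew_after = moment_skewness(logged)

print(f"skewness before log: {skew_before:.3f}")
print(f"skewness after log:  {skew_after:.3f}")
```

On this data the skewness stays positive but drops noticeably, because the logarithm compresses large values far more than small ones.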

Negatively skewed or left-skewed:

Mean < Median < Mode

A negatively skewed distribution is the exact reverse of a positively skewed distribution. In statistics, a negatively skewed distribution is one where more values are plotted on the right side of the graph and the tail of the distribution spreads out on the left side.

In a negatively skewed distribution, the mean is less than the median: a few small values stretch the tail out to the left and pull the mean down. The name refers to the sign of the skewness value, which is negative rather than positive or zero; it does not mean that the mean, median, and mode themselves are negative.
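
One way to see that a negatively skewed distribution is the mirror image of a positively skewed one is to reflect a right-skewed data set. The sketch below does this with a made-up data set; the reflection point 16 is an arbitrary choice that keeps all values positive.

```python
import statistics

# Hypothetical right-skewed data and its mirror image around the value 8
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
left_skewed = [16 - v for v in right_skewed]

mean = statistics.mean(left_skewed)      # pulled downwards by the small values
median = statistics.median(left_skewed)  # middle of the sorted values
mode = statistics.mode(left_skewed)      # most frequent value (the peak)

print("mean:", mean, "median:", median, "mode:", mode)
print("mean < median < mode:", mean < median < mode)
```

After the reflection the tail points to the left, and the ordering of the three measures is reversed.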

How Do We Transform Skewed Data?

Because skewed data can degrade a machine learning model's predictive capabilities, it is often better to transform skewed data towards a normal distribution. Here are some of the ways you can transform your skewed data:

  • Power Transformation
  • Log Transformation
  • Exponential Transformation

Note: The selection of transformation depends on the statistical characteristics of the data.
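
As a rough illustration of how the choice depends on the data, the sketch below applies two power transformations to made-up data sets: a square root (power below 1) to right-skewed data and a square (power above 1) to left-skewed data. The helper `moment_skewness` and both data sets are assumptions for this example only.

```python
import math


def moment_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data sets with a right tail and a left tail respectively
right_skewed = [1, 2, 2, 3, 4, 5, 8, 13, 30, 100]
left_skewed = [1, 7, 11, 12, 13, 13, 14, 14, 14, 15]

# A power below 1 compresses large values and shortens a right tail
sqrt_right = [math.sqrt(v) for v in right_skewed]

# A power above 1 stretches large values and shortens a left tail
squared_left = [v ** 2 for v in left_skewed]

print(f"right-skewed: {moment_skewness(right_skewed):.3f} -> {moment_skewness(sqrt_right):.3f}")
print(f"left-skewed:  {moment_skewness(left_skewed):.3f} -> {moment_skewness(squared_left):.3f}")
```

In both cases the skewness moves closer to zero without reaching it, which is why it is worth comparing several candidate transformations on the actual data.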

Code

Python Code for Skewness

# if the scipy library is not installed:
# pip install scipy

# import the skew function
from scipy.stats import skew

# create data
x = [55, 78, 65, 98, 97, 60, 67, 65, 83, 65]

# compute the sample skewness of the data
print(skew(x))

OUTPUT:
0.6475112950060684

Understand Skewness Output

  • Zero skewness value (= 0): indicates a symmetrical distribution.
  • Negative skewness value (< 0): indicates an asymmetrical distribution whose tail is longer on the left-hand side.
  • Positive skewness value (> 0): indicates an asymmetrical distribution whose tail is longer on the right-hand side.

So, in the above example, the skewness value is positive: the data are right-tailed, or positively skewed.
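
The interpretation rules can be wrapped in a small helper. The function name `describe_skewness` and its `tolerance` parameter are hypothetical; the tolerance is there because real data rarely yield a skewness of exactly zero.

```python
def describe_skewness(value, tolerance=1e-9):
    """Map a skewness value to a verbal description of the distribution shape."""
    if abs(value) <= tolerance:
        return "symmetrical"
    if value > 0:
        return "right-skewed (positive)"
    return "left-skewed (negative)"


# Skewness value taken from the example output
print(describe_skewness(0.6475112950060684))
```

In practice you may want a larger tolerance, since small non-zero values often reflect sampling noise rather than real asymmetry.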

© 2022 KeyToDataScience