# Univariate Analysis

Univariate analysis explores each variable (feature) in a dataset one at a time. “Uni” means “one” and “variate” means “variable”, so only a single variable is used at a time for statistical analysis and visualization.

Univariate analysis falls under descriptive statistics. Its objective is to describe and summarize data and to find patterns in it. Variables can be either categorical or numerical.

Before going into the details of univariate analysis, let’s first review the types of data in statistics:

Data Types in Statistics

1. Numerical (or Quantitative) data
• Discrete data
• Continuous data
2. Categorical (or Qualitative) data
• Nominal data
• Ordinal data

## Numerical

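A numerical variable (feature or attribute) takes numeric values. A continuous variable may take on any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose), while a discrete variable takes countable values (e.g., number of children).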

### 1. Measures of Central Tendency

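Mean, Median, Mode (Min and Max are not measures of central tendency, but they are usually reported alongside them to show where the data begins and ends).

Which measure of central tendency should be used, and when? As a rule of thumb, the mean suits roughly symmetric data, the median is more robust for skewed data or data with outliers, and the mode is the only one of the three that also applies to nominal categorical data.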
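As a small illustration of these measures, the sketch below uses Python's standard `statistics` module. The `scores` list is a hypothetical sample invented for this example; it contains one unusually high value to show how the mean is pulled toward it while the median is not.

```python
import statistics

# Hypothetical sample: exam scores with one unusually high value (98).
scores = [55, 60, 60, 65, 70, 75, 98]

lowest = min(scores)                      # smallest observation
highest = max(scores)                     # largest observation
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(lowest, highest, mean_score, median_score, mode_score)
```

For this sample the mean is 69, the median is 65, and the mode is 60: the single high score lifts the mean above the median.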

### 2. Measures of Variability

Range, Quantile, IQR, Variance, Standard Deviation, Coefficient of Variation

#### Range

The difference between the maximum and minimum values.

#### Quantile

Quantiles are ‘cut points’ that divide a sorted set of data into groups containing equal numbers of values (for example the median, quartiles, and percentiles).

• The only 2-quantile, which divides the set of data into 2 groups, is called the median
• The 3-quantiles are called tertiles or terciles → T
• The 4-quantiles are called quartiles → Q. The difference between the upper and lower quartiles is called the interquartile range, mid-spread, or middle fifty → IQR = Q3 − Q1.
• The 5-quantiles are called quintiles → QU
• The 6-quantiles are called sextiles → S
• The 7-quantiles are called septiles
• The 8-quantiles are called octiles
• The 10-quantiles are called deciles → D
• The 100-quantiles are called percentiles → P
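The cut points listed above can be computed with `statistics.quantiles` (available in Python 3.8 and later). This is a minimal sketch on a hypothetical data set; note that several interpolation conventions exist, so other tools may return slightly different values for the same data.

```python
import statistics

# Hypothetical data: the integers 1 through 11.
data = list(range(1, 12))

# n is the number of equal-sized groups; the result has n - 1 cut points.
quartiles = statistics.quantiles(data, n=4)   # 3 cut points: Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)    # 9 cut points: D1 ... D9
median = statistics.median(data)              # the single 2-quantile

print("quartiles:", quartiles)
print("deciles:", deciles)
print("median:", median)
```

With the module's default ("exclusive") method, the quartiles of this data set are 3.0, 6.0, and 9.0, and the second quartile coincides with the median.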

#### IQR (Interquartile Range)

The IQR is a measure of statistical dispersion based on dividing a data set into quartiles. It is the width of the middle 50% of the data.

Mid-spread or middle fifty → IQR = Q3 − Q1
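To make the formula concrete, the following sketch computes the IQR next to the range for a hypothetical data set that contains one extreme value (40). The quartiles come from `statistics.quantiles` with its default method, which is one convention among several.

```python
import statistics

# Hypothetical data with one extreme value at the end.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 40]

q1, q2, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1                                   # spread of the middle 50% of the data
data_range = max(values) - min(values)          # spread of all the data

print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "Range:", data_range)
```

Here the IQR is 8.0 while the range is 38, which shows that the range reacts strongly to a single extreme value and the IQR does not.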

#### Variance

Variance is a measure of data dispersion. In simple words, it measures how far a set of numbers is spread out from their average value. The population variance is denoted σ² (sigma squared); the sample variance, which divides by n − 1 instead of n, is usually denoted s².
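The difference between population and sample variance can be sketched by hand and checked against the standard `statistics` module. The `data` list is a hypothetical example chosen so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```

The squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, roughly 4.57.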

#### Standard Deviation

Standard deviation is a measure of data dispersion relative to the mean. It is calculated as the square root of the variance, so it is expressed in the same units as the data. Its symbol is σ (the Greek letter sigma).
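As a quick check of the square-root relationship, this sketch compares the standard deviation with the square root of the variance using the standard `statistics` module. The `data` list is a hypothetical example.

```python
import math
import statistics

# Hypothetical data with mean 5 and population variance 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(data)  # square root of the population variance
sample_sd = statistics.stdev(data)       # square root of the sample variance

print(population_sd, math.sqrt(statistics.pvariance(data)))
print(sample_sd, math.sqrt(statistics.variance(data)))
```

For this data the population standard deviation is 2, the square root of the population variance 4.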

#### Coefficient of Variation

The coefficient of variation (CV) represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another.
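The comparison described above can be sketched as follows. The helper `coefficient_of_variation` and both data series are hypothetical, written for this example; the sketch uses the sample standard deviation, which is an assumption, since the population version is equally common.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Two hypothetical series with the same spread but very different means.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: {cv_heights:.3f}, weights: {cv_weights:.3f}")
```

Both series have the same standard deviation, but the weights vary more relative to their mean, so their CV is larger. The CV is only meaningful when the mean is well away from zero.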

### 3. Curve Types

#### Skewness

Skewness is the degree of asymmetry in a distribution, that is, how far it deviates from a symmetrical shape such as the normal distribution (bell curve). A distribution can be symmetrical (skewness near zero), right skewed (positive skewness, long tail on the right), or left skewed (negative skewness, long tail on the left).
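One common way to put a number on this is the moment-based coefficient, the third central moment divided by the 1.5 power of the second. The sketch below is a plain-Python version of that formula; the function name `skewness` and the three data sets are hypothetical, and statistical libraries may apply an additional sample-size correction.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]               # long tail on the right
left_skewed = [10 - x for x in right_skewed]     # mirror image: long tail on the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

The symmetric sample gives zero, the right-skewed sample a positive value, and its mirror image a negative value.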

#### Kurtosis

Kurtosis describes how heavy the tails of a distribution are compared with a normal distribution. There are 3 types of kurtosis: mesokurtic (tails similar to the normal distribution), leptokurtic (heavier tails), and platykurtic (lighter tails).
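A rough way to tell the three types apart is excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3 (the value for a normal distribution). The function `excess_kurtosis` and the data sets below are hypothetical, and the formula is the uncorrected moment version, which libraries may adjust for sample size.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]                  # evenly spread values, light tails
heavy_tailed = [5, 5, 5, 5, 5, 5, 5, 5, -10, 20]    # mostly central, two extreme values

print(excess_kurtosis(flat))          # negative: platykurtic
print(excess_kurtosis(heavy_tailed))  # positive: leptokurtic
```

A value near zero would indicate a mesokurtic shape; the evenly spread sample comes out negative and the sample with two extreme values comes out positive.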

### 4. Numerical Visualization

#### Histogram

A histogram shows the underlying frequency distribution (shape) of a set of continuous data. Using a histogram, we can understand features of the data such as its distribution (normal or skewed), outliers, etc.

Histograms and bar plots look similar because both use bars whose height represents a count. However, a bar plot shows one bar per category, while a histogram is generally used on continuous data, grouping it into bins whose height indicates the number of data points in each range.
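The binning step behind a histogram can be sketched without any plotting library. The function `bin_counts` and the `ages` sample are hypothetical; the sketch assumes equal-width bins, with the maximum value placed in the last bin, which is one common convention.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; bin width would be zero")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

ages = [21, 22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 60]
counts = bin_counts(ages, n_bins=4)
print(counts)
```

With four bins of width 9.75 starting at 21, the counts are 4, 4, 2, and 2. Plotting these counts as adjacent bars gives the histogram.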

#### Boxplot

A boxplot (or box-and-whisker plot) visually shows the distribution of numerical data. It is the visual representation of the statistical five-number summary: the minimum, first (lower) quartile, median, third (upper) quartile, and maximum.
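The five numbers that a boxplot draws can be computed directly. The function `five_number_summary` and the sample data are hypothetical; the quartiles use the default method of `statistics.quantiles`, so a plotting library that follows another convention may place the box edges slightly differently.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical data with one extreme value.
values = [2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 40]
summary = five_number_summary(values)
print(summary)
```

For this sample the summary is (2, 4.0, 8.0, 12.0, 40): the box spans 4 to 12, the line inside it sits at 8, and the whiskers reach out toward 2 and 40.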

## Categorical

### 1. Frequency Table

A frequency table counts the occurrences of each category of a variable. In addition to the count, we can add the percentage of observations that fall into each category, giving both Count and Count%.

For example, for a variable that records the gender of each student in a class, a frequency table shows the count and percentage of boys and girls.
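The class example can be sketched with `collections.Counter` from the standard library. The `students` list is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical class of 8 students.
students = ["boy", "girl", "girl", "boy", "girl", "girl", "boy", "girl"]

counts = Counter(students)  # occurrences of each category
total = sum(counts.values())
percentages = {category: 100 * n / total for category, n in counts.items()}

# Print the table, most frequent category first.
for category, n in counts.most_common():
    print(f"{category:<6} {n:>3} {percentages[category]:>6.1f}%")
```

This class has 5 girls (62.5%) and 3 boys (37.5%).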

### 2. Categorical Visualization

#### Pie Chart

A pie chart is a circular graph in which each “pie slice” denotes the relative size of one category. Pie charts are mainly used to understand how a group divides into smaller pieces. The whole pie represents 100 percent.
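Before any drawing happens, each category count is converted into a share of the whole circle. The sketch below shows that conversion only; the function `pie_slices` and the `survey` counts are hypothetical, and a plotting library would normally do this step internally.

```python
def pie_slices(counts):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        category: (n * 100 / total, n * 360 / total)
        for category, n in counts.items()
    }

# Hypothetical survey results.
survey = {"A": 50, "B": 30, "C": 20}
slices = pie_slices(survey)

for category, (percent, degrees) in slices.items():
    print(f"{category}: {percent:.0f}% of the pie, {degrees:.0f} degrees")
```

The percentages add up to 100 and the angles to 360, matching the idea that the whole pie represents the full group.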

#### Bar Plot

The bar plot (or bar graph or bar chart) is well suited to comparing categories or groups of data. It can also help track and compare changes over time. It is mostly used for visualizing discrete data.
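The idea of one bar per category, with length proportional to the count, can be sketched as a text chart. The function `text_bar_chart` and the `commute` counts are hypothetical; a real chart would typically be drawn with a plotting library instead.

```python
def text_bar_chart(counts):
    """Render one line per category, with one '#' per counted item."""
    width = max(len(label) for label in counts)  # align the labels
    return [
        f"{label.ljust(width)} | {'#' * n} {n}"
        for label, n in counts.items()
    ]

# Hypothetical counts of how people commute.
commute = {"bus": 4, "car": 7, "bike": 2}
lines = text_bar_chart(commute)
print("\n".join(lines))
```

Comparing bar lengths makes it immediately visible that "car" is the largest category and "bike" the smallest.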