Data Science Course Syllabus

Table of Contents

1.  PROGRAMMING (30 Hours)

1.1  Excel
1.2  SQL
1.3  Python

2.  STATISTICS (10 Hours)

2.1  Descriptive vs Inferential Statistics
2.2  Terms
2.3  Data Exploration
2.3.1  Univariate
2.3.1.1  Categorical
a.  Count, Percentage
b.  Pie Chart, Bar Chart
2.3.1.2  Numerical
a.  Min, Max, Mean, Median, Mode
b.  Range, IQR, Variance, Standard Deviation, Coefficient of Variation
c.  Skewness, Kurtosis
d.  Histogram, Box plot
2.3.2  Bivariate
2.3.2.1  Categorical & Categorical
a.  Chi-square test
b.  Bar Chart, 2-y axis plot
2.3.2.2  Numerical & Numerical
a.  Correlation
b.  Scatter plot
2.3.2.3  Numerical & Categorical
a.  Z test, t test, ANOVA
b.  Bar & Line Chart, 2-y axis plot
2.3.3  Multivariate
2.3.3.1  Cluster Analysis
2.3.3.2  Principal Component Analysis
2.3.3.3  Correspondence Analysis

3.  ALGORITHMS (30 Hours)

3.1  Supervised
3.1.1  Regression
3.1.1.1  Linear Regression
3.1.1.2  Polynomial Regression
3.1.2  Classification
3.1.2.1  Logistic Regression
3.1.2.2  K Nearest Neighbours
3.1.2.3  Trees
a.  Decision Trees
b.  Random Forest
c.  XGBoost
3.2  Unsupervised
3.2.1  Clustering
3.2.2  Association
3.2.3  Dimensionality reduction

4.  PROJECT LIFE CYCLE (20 Hours)

4.1  Understand Problem/Objective
4.2  Data Collection
4.3  Data Preparation
4.3.1  Cleaning Data
4.3.2  EDA
4.3.3  Feature Engineering
4.3.4  Feature Selection
4.3.5  Train/Validation/Test Split
4.4  Modeling
4.5  Evaluation
4.5.1  Classification
4.5.1.1  Confusion Matrix
4.5.1.2  Lift Chart
4.5.1.3  Gain Chart
4.5.1.4  KS Statistic (Kolmogorov-Smirnov)
4.5.1.5  AUC-ROC
4.5.1.6  F1 Score
4.5.1.7  Log Loss
4.5.1.8  Gini Coefficient
4.5.1.9  Concordant-Discordant Ratio
4.5.2  Regression
4.5.2.1  RMSE
4.5.2.2  RSE
4.5.2.3  MAE
4.5.2.4  RAE
4.5.2.5  Coefficient of Determination (R²)
4.6  Model Deployment

5.  CASE STUDIES (40 Hours)

5.1  Educational
5.1.1  Classification: Titanic Survival Prediction
5.1.2  Regression: Boston Housing Prediction
5.2  Business
5.2.1  Click-Through Rate Prediction
5.2.2  Fraud Detection

1.  PROGRAMMING

1.1  Excel

1.2  SQL

1.3  Python

2.  STATISTICS

2.1  Descriptive vs Inferential Statistics

2.2  Terms

2.3  Data Exploration

2.3.1  Univariate

2.3.1.1  Categorical
a.  Count, Percentage
b.  Pie Chart, Bar Chart
2.3.1.2  Numerical
a.  Min, Max, Mean, Median, Mode
b.  Range, IQR, Variance, Standard Deviation, Coefficient of Variation
c.  Skewness, Kurtosis
d.  Histogram, Box plot

2.3.2  Bivariate

2.3.2.1  Categorical & Categorical
a.  Chi-square test
b.  Bar Chart, 2-y axis plot
2.3.2.2  Numerical & Numerical
a.  Correlation
b.  Scatter plot
2.3.2.3  Numerical & Categorical
a.  Z test, t test, ANOVA
b.  Bar & Line Chart, 2-y axis plot

2.3.3  Multivariate

2.3.3.1  Cluster Analysis
2.3.3.2  Principal Component Analysis
2.3.3.3  Correspondence Analysis
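
As an illustration of the univariate numerical summaries listed under Data Exploration (min, max, mean, median, mode, range, IQR, variance, standard deviation, coefficient of variation), here is a minimal sketch using only Python's standard library. The function name `describe` and the sample values are hypothetical, and the population (not sample) variance is an assumption made for simplicity. It requires Python 3.8 or later for `statistics.quantiles`.

```python
import statistics


def describe(values):
    """Return basic univariate summaries for a list of numbers.

    Assumption: population variance / standard deviation are used;
    swap in statistics.variance / statistics.stdev for sample versions.
    """
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(data, n=4)
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    return {
        "min": data[0],
        "max": data[-1],
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "variance": statistics.pvariance(data),
        "std": std,
        # Coefficient of variation: spread relative to the mean
        "cv": std / mean,
    }


# Hypothetical sample data
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Skewness, kurtosis and the plots are left out here because they are more conveniently handled with third-party libraries.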

3.  ALGORITHMS

3.1  Supervised

3.1.1  Regression

3.1.1.1  Linear Regression
3.1.1.2  Polynomial Regression

3.1.2  Classification

3.1.2.1  Logistic Regression
3.1.2.2  K Nearest Neighbours
3.1.2.3  Trees
a.  Decision Trees
b.  Random Forest
c.  XGBoost

3.2  Unsupervised

3.2.1  Clustering

3.2.2  Association

3.2.3  Dimensionality reduction

4.  PROJECT LIFE CYCLE

4.1  Understand Problem/Objective

4.2  Data Collection

4.3  Data Preparation

4.3.1  Cleaning Data

4.3.2  EDA

4.3.3  Feature Engineering

4.3.4  Feature Selection

4.3.5  Train/Validation/Test Split

4.4  Modeling

4.5  Evaluation

4.5.1  Classification

4.5.1.1  Confusion Matrix
4.5.1.2  Lift Chart
4.5.1.3  Gain Chart
4.5.1.4  KS Statistic (Kolmogorov-Smirnov)
4.5.1.5  AUC-ROC
4.5.1.6  F1 Score
4.5.1.7  Log Loss
4.5.1.8  Gini Coefficient
4.5.1.9  Concordant-Discordant Ratio
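
To make three of the classification metrics above concrete (confusion matrix, F1 score, log loss), the following is a rough sketch in plain Python for the binary case. The function names and the toy labels are hypothetical, and labels are assumed to be encoded as 0 and 1. In practice a library implementation would normally be used instead.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of the true labels.

    probs are predicted probabilities of class 1; they are clipped to
    [eps, 1 - eps] so that log(0) is never evaluated.
    """
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical toy example
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(log_loss([1, 0], [0.5, 0.5]))
```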

4.5.2  Regression

4.5.2.1  RMSE
4.5.2.2  RSE
4.5.2.3  MAE
4.5.2.4  RAE
4.5.2.5  Coefficient of Determination (R²)
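
The regression metrics listed above can be sketched in a few lines of plain Python. This is only an illustration: the function name `regression_metrics` and the sample values are hypothetical, and RSE and RAE are read here as relative squared error and relative absolute error (error relative to always predicting the mean), which is an assumption since the syllabus does not expand the abbreviations.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, RSE, RAE and the coefficient of determination.

    Assumption: RSE / RAE are the *relative* squared / absolute errors,
    i.e. model error divided by the error of a mean-only baseline.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    # Baseline errors: always predicting the mean of y_true
    sq_base = sum((t - mean_y) ** 2 for t in y_true)
    abs_base = sum(abs(t - mean_y) for t in y_true)
    rse = sq_err / sq_base
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": abs_err / n,
        "rse": rse,
        "rae": abs_err / abs_base,
        # Under this reading of RSE, R-squared is simply 1 - RSE
        "r2": 1 - rse,
    }


# Hypothetical sample values
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch assumes `y_true` is not constant; otherwise the baseline errors are zero and the relative metrics are undefined.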

4.6  Model Deployment

5.  CASE STUDIES

5.1  Educational

5.1.1  Classification: Titanic Survival Prediction

5.1.2  Regression: Boston Housing Prediction

5.2  Business

5.2.1  Click-Through Rate Prediction

5.2.2  Fraud Detection

© 2021 KeyToDataScience