Course Outline

The course is self-contained and designed for people at different skill levels. It suits beginners with no prior data science or programming experience, people with basic data science knowledge who want to advance their careers, and people from other fields who want to change their career trajectory. We cover the basics from the ground up and give you the tools and skills needed in what has been called the sexiest job of the 21st century.

This is a five-week, intensive, hands-on course that balances theory with practice. We believe the two are intertwined: understanding the mechanics of the underlying tools is a force multiplier that will help you stand out in this field, whether you want to be a business analyst, data engineer, data scientist, or academic researcher. We strive to cater to all of you.

Week 0
Elementary Math for Data Science
- Basic Linear Algebra
- Statistics and Probability Theory
- Calculus and Optimization
Week 1
Introduction to Data Science
- What is data science?
- Overview of the data science workflow
- Tools and resources for data science
- Basics of Python programming
- Introduction to data types, variables, and operators
- Data structures in Python: lists, tuples, dictionaries, and sets
- Object-Oriented Programming (OOP)
- I/O in Python
Week 2
Data Manipulation and Analysis
- Introduction to NumPy and Pandas
- Loading and manipulating data with Pandas
- Data cleaning and preprocessing
- Data visualization with Matplotlib and Seaborn
- Exploratory data analysis (EDA)
Week 3
Machine Learning Fundamentals
- Machine learning basics
- Supervised learning
- Unsupervised learning
- Introduction to scikit-learn
- Feature engineering
- Building and evaluating machine learning models
Week 4
Capstone Project
- Participants will work on a capstone project that demonstrates their understanding of the concepts covered in the previous four weeks.
- The project involves data cleaning, exploratory data analysis, and building a machine learning model.
- We will provide project ideas and data sets for participants to choose from, or participants can propose their own project ideas.

This course includes lectures, hands-on exercises, and assignments to reinforce the concepts covered. The capstone project is an opportunity for you to apply what you have learned and showcase your skills. By the end of the course, you should have a basic understanding of data science concepts and tools, and be able to manipulate and analyze data and build simple machine learning models.

Detailed Course Schedule

Week 0: Elementary Math for Data Science
Day 1: Basic Linear Algebra
- Introduction to Vectors and Matrices
- Vector Operations: Addition, Subtraction, Scalar Multiplication
- Matrix Operations: Addition, Subtraction, Scalar Multiplication, Matrix Multiplication
- Inverse and Transpose of a Matrix
- Applications of Linear Algebra
Resources:
- Khan Academy's Linear Algebra course
- Video lectures by Professor Gilbert Strang on Linear Algebra on MIT OpenCourseWare
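As a rough illustration of the Day 1 operations, the sketch below implements vector addition, scalar multiplication, transpose, matrix multiplication, and the inverse of a 2x2 matrix in plain Python. The function names and the sample matrix are assumptions made for this example; later in the course the same operations would normally be done with NumPy.

```python
# Vectors are lists of numbers; matrices are lists of rows.

def vec_add(u, v):
    """Add two vectors of equal length element by element."""
    return [a + b for a, b in zip(u, v)]

def scalar_mul(c, v):
    """Multiply every entry of a vector by the scalar c."""
    return [c * a for a in v]

def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    """Entry (i, j) is the dot product of row i of a with column j of b."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def inverse_2x2(m):
    """Invert a 2x2 matrix using the determinant formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [5, 3]]          # hypothetical example matrix
A_inv = inverse_2x2(A)
identity = mat_mul(A, A_inv)  # multiplying a matrix by its inverse gives the identity
```

Multiplying `A` by `A_inv` is a quick way to check that an inverse was computed correctly.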
Day 2: Statistics and Probability Theory
- Descriptive Statistics: Measures of Central Tendency (Mean, Median, Mode)
- Measures of Dispersion (Range, Variance, Standard Deviation)
- Probability: Basic Concepts, Probability Distributions (Normal), Bayes' Theorem
- Hypothesis Testing
Resources:
- Khan Academy's Statistics and Probability course
- Video lectures on Probability and Statistics by Professor Blitzstein (Harvard Statistics 110)
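The Day 2 measures can be tried out with Python's built-in `statistics` module. The sketch below is only an illustration: the sample scores and the test probabilities used for Bayes' theorem are made-up numbers, not course data.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion (population versions, dividing by n)
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# Bayes' theorem: P(condition | positive test)
# Assumed numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_condition = 0.01
p_pos_given_condition = 0.95
p_pos_given_healthy = 0.05

p_positive = (p_pos_given_condition * p_condition
              + p_pos_given_healthy * (1 - p_condition))
posterior = p_pos_given_condition * p_condition / p_positive

print(mean, median, mode, value_range, variance, std_dev)
print(round(posterior, 3))
```

With these assumed numbers the posterior is only about 16%, which shows why a positive result for a rare condition needs careful interpretation.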
Day 3: Calculus
- Limits
- Derivatives
- Chain Rule
- Applications of Derivatives (Maxima, Minima, Optimization)
Resources:
- Khan Academy's Calculus course
- MIT OpenCourseWare Calculus Course
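To connect derivatives with optimization, here is a minimal sketch of gradient descent on a one-variable function. The function `f`, the learning rate, and the step count are assumptions chosen for illustration. The numerical derivative is included only to check the hand-derived one.

```python
def f(x):
    """A simple function with its minimum at x = 3."""
    return (x - 3) ** 2 + 1

def f_prime(x):
    """Derivative by the chain rule: d/dx (x - 3)^2 = 2 * (x - 3) * 1."""
    return 2 * (x - 3)

def numerical_derivative(func, x, h=1e-5):
    """Central-difference approximation of the derivative at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

x_min = gradient_descent(f_prime, start=0.0)
print(round(x_min, 6), round(f(x_min), 6))
```

The same idea, applied to functions of many variables, underlies how many machine learning models are trained.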
Week 1: Introduction to Data Science
Day 1: Introduction to Data Science
- What is Data Science?
- What is a Data Scientist?
- What is the Data Science Process?
- What is the Data Science Workflow?
- What is the Data Science Pipeline?
- What is the Data Science Toolkit?
- What is the Data Science Venn Diagram?
- What is the Data Science Maturity Model?
- What is the Data Science Career Path?
Resources:
Day 2: Introduction to Python
- Data types, variables, and operators in Python
- Understanding data structures in Python: lists, tuples, dictionaries, and sets
- Understanding control flow in Python: if, else, for, while, and try/except
- Understanding functions in Python
- Understanding classes in Python
- Introduction to Jupyter Notebook
- Python Virtual Environments
Resources:
Day 3: Functions and I/O Management in Python
- Functions in Python
- Reading and writing files in Python
- Reading and writing data in different formats in Python
Resources:
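As a small illustration of reading and writing data in different formats, the sketch below round-trips a few records through CSV and JSON using only the standard library. The records, field names, and file names are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample records; CSV stores every value as text.
records = [
    {"name": "Ada", "score": "91"},
    {"name": "Grace", "score": "88"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "scores.csv"
    json_path = Path(tmp) / "scores.json"

    # Write the records to a CSV file with a header row.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["name", "score"])
        writer.writeheader()
        writer.writerows(records)

    # Read the CSV back into a list of dictionaries.
    with csv_path.open(newline="", encoding="utf-8") as fh:
        loaded = list(csv.DictReader(fh))

    # Write the same data as JSON, then read it back.
    json_path.write_text(json.dumps(loaded, indent=2), encoding="utf-8")
    from_json = json.loads(json_path.read_text(encoding="utf-8"))

print(loaded)
print(from_json)
```

The `with` blocks close each file automatically, even if an error occurs partway through.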
Week 2: Data Manipulation, Analysis and Visualization
Day 1: Introduction to NumPy and Pandas
- Overview of NumPy and Pandas
- Installation and Setup of NumPy and Pandas
- Working with arrays and dataframes
- Loading data from different sources (CSV, Excel, SQL)
- Selecting, Filtering and Sorting Data
Resources:
Day 2: Data Cleaning and Preprocessing
- Identifying and handling missing data
- Removing duplicates
- Dealing with outliers
Resources:
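The three Day 2 cleaning steps can be sketched in plain Python so the logic is visible before Pandas is used for the same job. The records below are invented for illustration, and the choices made here (filling with the median, the 1.5 x IQR rule for outliers) are common conventions rather than the only options.

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},     # exact duplicate
    {"id": 4, "age": 41},
    {"id": 5, "age": 38},
    {"id": 6, "age": 250},    # implausible outlier
]

# Step 1: remove exact duplicates, keeping the first occurrence.
seen = set()
deduped = []
for row in rows:
    key = (row["id"], row["age"])
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Step 2: fill missing ages with the median of the known ages.
known_ages = [r["age"] for r in deduped if r["age"] is not None]
median_age = statistics.median(known_ages)
filled = [
    {**r, "age": median_age if r["age"] is None else r["age"]}
    for r in deduped
]

# Step 3: drop rows whose age falls outside the 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles([r["age"] for r in filled], n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [r for r in filled if low <= r["age"] <= high]

print(cleaned)
```

The median is used for filling because, unlike the mean, it is not pulled far off by the outlier that is still present at that step.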
Day 3: Data Visualization with Matplotlib and Seaborn
- Introduction to Data Visualization
- Overview of Matplotlib and Seaborn
- Creating basic plots, histograms, and scatter plots
Resources:
- Matplotlib Tutorial (DataCamp)
- Seaborn Tutorial (Seaborn)
Week 3: Machine Learning Fundamentals
Day 1: Introduction to Machine Learning
- Overview of Machine Learning
- Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Applications of Machine Learning in various industries
Resources:
Day 2: Supervised Learning
- Introduction to Supervised Learning
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines
- Model evaluation metrics: R-squared, mean squared error, confusion matrix
Resources:
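The evaluation metrics named for Day 2 can be written out by hand to see what they measure. This is a sketch with made-up predictions. In practice scikit-learn provides equivalent functions, and the helper names below are this example's own.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels 0 and 1."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

# Regression example (hypothetical values)
y_true = [3, 5, 7, 9]
y_pred = [2.5, 5, 7.5, 9]
mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)

# Classification example (hypothetical labels)
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_counts(actual, predicted)

print(mse, r2, cm)
```

Mean squared error and R-squared apply to regression models, while the confusion matrix applies to classifiers.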
Day 3: Unsupervised Learning
- Introduction to Unsupervised Learning
- Clustering: K-Means and Hierarchical Clustering
- Dimensionality Reduction
Resources:
Day 4: Building and Evaluating Machine Learning Models
- Model selection and training
- Overfitting and underfitting
- Model Evaluation
- Cross-validation
Resources:
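To make cross-validation concrete, the sketch below splits sample indices into k folds so that each sample is used for testing exactly once. The function name is hypothetical and the split is not shuffled, which is a simplification. Library implementations such as scikit-learn's usually offer shuffling and stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be trained on each `train` split and scored on the matching `test` split. Averaging the scores gives a more stable estimate than a single train/test split and helps reveal overfitting.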
Week 4: Capstone Project
Day 1: Introduction to Capstone Project
- Overview of the capstone project
- Requirements for the capstone project
- Project proposal guidelines and examples
The capstone project

The capstone project is a hands-on opportunity for you to apply the concepts and skills you have learned in the previous four weeks to a real-world data problem. You will work on a data analysis project that involves cleaning, preprocessing, and analyzing a dataset, followed by building and evaluating a machine learning model to solve a particular problem.

The capstone project will consist of the following steps:
1. Data Collection: You will be given a dataset to work with, or you can propose your own. The dataset should be relevant to the problem you want to solve.
2. Data Cleaning and Preprocessing: You will clean the dataset and prepare it for analysis by handling missing values, dealing with duplicates, and scaling or transforming features as needed.
3. Data Analysis and Exploration: You will perform exploratory data analysis (EDA) to gain insights into the data, visualize the data, and identify patterns and trends.
4. Feature Engineering: You will identify relevant features for the machine learning model and engineer new features if needed.
5. Model Building: You will build a machine learning model using scikit-learn or another Python library. You will evaluate the model's performance using appropriate metrics and tune the model if needed.
6. Model Evaluation: You will evaluate the model's performance on the test set and compare it with the baseline model. You will also analyze the model's performance and identify areas for improvement.
7. Model Deployment: You will deploy the model using Flask or other web frameworks, allowing others to use the model for predictions.
8. Presentation: You will present your findings to the rest of the class, including the problem you solved, the methodology used, the insights gained, and the machine learning model built.
The capstone project will be an opportunity for you to showcase your skills and creativity, as well as to get hands-on experience in solving a real-world data problem. You will receive guidance and feedback from us and have access to online resources to help you complete the project.

Guidelines for the capstone project

The capstone project is mandatory and must be completed by each participant to receive a certificate of completion.
Participants must select a dataset relevant to their area of interest, subject to approval by the instructor.
Participants must document their work throughout the project and maintain a clean and well-organized repository on GitHub.
Participants must submit a final report that includes their findings, methodology, and any code or scripts used in the project.
Participants are encouraged to collaborate with other participants but must submit individual projects.
Participants must adhere to ethical guidelines for data collection, use, and analysis, and avoid misrepresenting or misinterpreting results.
Participants must follow good software engineering practices, such as modular code design, version control, and unit testing.
Participants must adhere to the project timeline and deadlines and seek assistance from the instructor or teaching assistants in case of any issues.
The final project must demonstrate proficiency in data cleaning, data analysis, and machine learning model building.
Participants will be evaluated based on the quality of their project, their ability to communicate their findings, and their adherence to the project guidelines.

By following these guidelines, participants will be able to successfully complete their capstone project and showcase their skills in data science and machine learning.

Datasets and resources to get started

Here are some datasets and resources to get you started on your capstone project:

Target Industries:

E-commerce and retail
Healthcare
Finance and banking
Marketing and advertising
Education

Data Science Applications:

Customer segmentation and targeting
Fraud detection and prevention
Predictive maintenance
Recommender systems
Sentiment analysis and opinion mining

Examples

Titanic Dataset: This dataset contains information on the passengers who were aboard the Titanic when it sank, including age, sex, class, and survival status. It is often used for predicting survival outcomes based on various features. https://www.kaggle.com/c/titanic/data
Iris Dataset: This dataset contains the petal and sepal lengths and widths of three different species of iris flowers. It is often used for classification tasks and clustering analysis. https://archive.ics.uci.edu/ml/datasets/iris
Wine Quality Dataset: This dataset contains information on the physicochemical properties of different wines, along with a quality rating. It is often used for classification and regression tasks. https://archive.ics.uci.edu/ml/datasets/wine+quality
Boston Housing Dataset: This dataset contains information on housing prices in Boston and various features that may influence those prices. It is often used for regression analysis. https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
Breast Cancer Wisconsin (Diagnostic) Dataset: This dataset contains information on various features of breast cancer tumors and a binary classification of whether the tumor is malignant or benign. It is often used for classification tasks. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
MNIST Handwritten Digits Dataset: This dataset contains a large number of images of handwritten digits, along with labels indicating the digit that each image represents. It is often used for image classification. http://yann.lecun.com/exdb/mnist/
California Housing Dataset: This dataset contains information on housing prices in California and various features that may influence those prices. It is often used for regression analysis. https://www.kaggle.com/datasets/camnugent/california-housing-prices
Pima Indians Diabetes Dataset: This dataset contains information on various features of Pima Indian women and a binary classification of whether or not they have diabetes. It is often used for classification tasks. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
Adult Income Dataset: This dataset contains information on various features of individuals and a binary classification of whether their income is above or below $50,000 per year. It is often used for classification tasks. https://archive.ics.uci.edu/ml/datasets/adult
Bank Marketing Dataset: This dataset contains information on various features of individuals and whether or not they subscribed to a term deposit following a bank's marketing campaign. It is often used for classification tasks and customer segmentation. https://archive.ics.uci.edu/ml/datasets/bank+marketing

These datasets are widely used in beginner data science capstone projects and provide a great starting point for learning various data science concepts and techniques.

Welcome to the Data Science Bootcamp for Beginners

Overview

By participating in this course, you can expect to gain the following skills:

Join the Data Science for Beginners course today to jumpstart your career in data science or take your data analysis skills to the next level!

Registering for this course

Course Payment

Course Prerequisites
