[Data Science] 2. Data Science and Machine Learning
Link
https://app.datascientist.fr/learn/learning/57/60/166/762
CRISP-DM Process
1. Opportunity Assessment & Business Understanding
2. Data Understanding & Acquisition
3. Data Preparation & Cleaning & Transformation
4. Modeling
5. Evaluation & Residuals & Metrics
6. Model Deployment & Application
Data Preparation
1. Data Collection
- Data augmentation : Create extra training examples from existing ones, e.g. for images, by rotating the originals, cropping them differently, or altering the lighting conditions
- Data labeling
2. Data Processing
- Formatting
- Cleaning : Remove messy data
- Sampling : If you have too much data
3. Data Transformation (Feature engineering)
- Scaling
- Normalizing
- Decomposition
- Feature aggregation : RGB, Channels
* Missing & Repeated values
* Outliers & Errors
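The Scaling and Normalizing steps above can be sketched as follows. The notes do not say which formulas are meant, so this sketch assumes min-max scaling to the [0, 1] range for "Scaling" and z-score standardization for "Normalizing". The function names and the sample values are made up for illustration.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # assumption: a constant feature maps to all zeros
    return [(v - lo) / span for v in values]

def standardize(values):
    """Shift to mean 0 and rescale to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # assumption: a constant feature maps to all zeros
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column (heights in cm)
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = min_max_scale(heights_cm)
standardized = standardize(heights_cm)
print(scaled)
print(standardized)
```

Min-max scaling keeps the shape of the distribution but is sensitive to outliers, which is one reason outliers are usually handled before this step.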
Machine Learning
Supervised Learning
1. Classification : Predict a category, e.g. a Yes/No question
ex) Will it be hot or cold tomorrow?
- Evaluation of Classification
+ Confusion Matrix
* Recall = TP/(TP+FN)
* Precision = TP/(TP+FP)
* Accuracy = (TP+TN)/(TP+TN+FP+FN)
- Types
+ Binary Classification
+ Multiclass Classification
+ Multilabel Classification
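The confusion-matrix formulas for Recall, Precision and Accuracy above can be checked with a small binary example. This is a minimal sketch: the labels are invented, class 1 is assumed to be the positive class, and returning 0.0 when a denominator is zero is a convention chosen here, not something the notes specify.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    # Assumption: report 0.0 instead of failing when the denominator is zero
    return numerator / denominator if denominator else 0.0

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
recall = safe_div(tp, tp + fn)                      # TP / (TP + FN)
precision = safe_div(tp, tp + fp)                   # TP / (TP + FP)
accuracy = safe_div(tp + tn, tp + tn + fp + fn)     # (TP + TN) / all
print(tp, fp, tn, fn)                # 3 1 5 1
print(recall, precision, accuracy)   # 0.75 0.75 0.8
```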
2. Regression : Predict a numerical value
ex) What will be the temperature tomorrow?
- Evaluation of Regression
+ MSE
+ RMSE
+ MAE
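The three regression metrics above can be computed directly from the prediction errors. This is a small sketch using the standard definitions (MSE as the mean of squared errors, RMSE as its square root, MAE as the mean of absolute errors). The temperature values are invented for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of MSE, in the unit of the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical temperatures (true vs. predicted)
y_true = [20.0, 22.0, 25.0, 19.0]
y_pred = [21.0, 22.0, 23.0, 20.0]

print(mse(y_true, y_pred))   # 1.5
print(rmse(y_true, y_pred))  # square root of 1.5
print(mae(y_true, y_pred))   # 1.0
```

Because MSE squares each error, the single error of 2 degrees contributes more to it than the two errors of 1 degree combined, while MAE weights all errors linearly.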
Unsupervised Learning
1. Clustering : Group observations into similar-looking groups
- Evaluation of Clustering
+ Internal Measures
* Cohesion
* Separation
+ External Measures
* Compare with Ground Truth
2. Recommender system : Suggest items a user is likely to be interested in
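The internal clustering measures listed under Clustering (Cohesion and Separation) can be illustrated with one common definition, assumed here since the notes give no formula: cohesion as the within-cluster sum of squared distances to each cluster centroid (lower is tighter), and separation as the size-weighted sum of squared distances from each cluster centroid to the overall centroid (higher is better separated). The points and labels are invented.

```python
from collections import defaultdict

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cohesion_and_separation(points, labels):
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    overall = centroid(points)
    cohesion = 0.0
    separation = 0.0
    for members in clusters.values():
        center = centroid(members)
        # Within-cluster spread around the cluster centroid
        cohesion += sum(sq_dist(p, center) for p in members)
        # Distance of the cluster centroid from the overall centroid, weighted by size
        separation += len(members) * sq_dist(center, overall)
    return cohesion, separation

# Hypothetical 2-D observations with cluster assignments
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
cohesion, separation = cohesion_and_separation(points, labels)
print(cohesion, separation)  # 1.0 32.0
```

With these definitions, cohesion plus separation equals the total sum of squared distances to the overall centroid, so for a fixed dataset improving one improves the other.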
Dataset
1. Training Dataset : The sample of data used to fit the model
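2. Validation Dataset : The sample of data used to tune and compare models while fitting
- Cross Validation
3. Test Dataset : The sample of data held out for the final evaluation of the fitted model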
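The three datasets and cross validation above can be sketched with plain index handling. This is a minimal illustration: the 60/20/20 split, the fixed seed, and the function names are assumptions made for the example, not values taken from the notes.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into test, validation and training parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_rows))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

rows = list(range(10))  # stand-in for 10 observations
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))  # 6 2 2

# Cross validation: every training row is used for validation exactly once
folds = list(k_fold_indices(len(train), k=3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

In cross validation the validation role rotates over the folds, so no separate validation sample is needed, while the test data stays untouched until the end.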
Overfitting & Underfitting
1. Overfitting : Force-fitting the training data, results look too good to be true
2. Appropriate fitting
3. Underfitting : Too simple to explain the variance
- Model complexity : Underfitting when too low, overfitting when too high
- Training Error < Test Error : A large gap indicates overfitting
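The three fitting regimes above can be illustrated on synthetic data. This is a sketch under stated assumptions: the data is generated as y = 2x + 1 plus Gaussian noise, and the three models (a constant mean, a least-squares line, and a nearest-neighbour "memorizer") are stand-ins chosen here for too simple, appropriate, and too complex. None of them come from the notes.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_mean(xs, ys):
    """Too simple: always predict the average target."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Appropriate here: ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Too complex: return the target of the nearest training point (noise included)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

rng = random.Random(0)

def make_data(n):
    # Assumed ground truth: y = 2x + 1 with Gaussian noise
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 2) for x in xs]
    return xs, ys

train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    train_err = mse(train_y, [model(x) for x in train_x])
    test_err = mse(test_y, [model(x) for x in test_x])
    results[name] = (train_err, test_err)
    print(f"{name:10s} train MSE = {train_err:.2f}   test MSE = {test_err:.2f}")
```

The memorizer reaches zero training error because it reproduces every training point, noise and all, but it cannot do so on the test data. That gap between training and test error is the signature of overfitting, while the constant mean shows high error on both.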