Introduction to Machine Learning

Catalina Canizares


  1. Types of Models
  2. Getting the terms right
  3. Machine Learning vs. Traditional Statistics
  4. The Tradeoff
  5. Types of Machine Learning Models
  6. The modeling process
  7. How do we spend our data?

Types of Models

  • Descriptive models: describe or illustrate characteristics of some data, often with no other purpose than to visually emphasize some trend or artifact in the data.
  • Inferential models: explore a specific hypothesis using statistical tests. An inferential model starts with a predefined conjecture or idea about a population and produces a statistical conclusion such as an interval estimate or the rejection of a hypothesis.

  • For example, the goal of a clinical trial might be to provide confirmation that a new therapy does a better job of prolonging life than an alternative.

  • Predictive models: produce the most accurate prediction possible, so that predicted values have the highest possible fidelity to the true values of new data.
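To make the first two types concrete, here is a minimal Python sketch on simulated data, assuming NumPy is available. The survival times, group sizes and treatment effect are invented for illustration, and the inferential step uses a permutation test, which is only one of several possible hypothesis tests and not a method prescribed by this course.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
# Hypothetical trial: simulated survival times (months) for two groups of patients
control = rng.normal(loc=20, scale=5, size=n)
treatment = rng.normal(loc=26, scale=5, size=n)

# Descriptive: summarize what the data show, with no hypothesis attached
print(f"mean survival: control {control.mean():.1f}, treatment {treatment.mean():.1f}")

# Inferential: permutation test of H0 "the therapy does not change mean survival"
observed = treatment.mean() - control.mean()
pooled = np.concatenate([control, treatment])
n_perm = 5000
null_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)          # break any link between group and outcome
    null_diffs[i] = shuffled[n:].mean() - shuffled[:n].mean()
p_value = np.mean(np.abs(null_diffs) >= abs(observed))   # two-sided p-value
print(f"observed difference = {observed:.1f} months, p-value = {p_value:.4f}")
```

The descriptive part stops at the summary; the inferential part turns the same data into a statistical conclusion about the hypothesis.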

Artificial Intelligence? Machine Learning? What?

  • Artificial intelligence is the name of a whole field of knowledge.

  • Machine learning is a part of artificial intelligence.

  • Neural networks are one type of machine learning model.

A Little History

Supervised vs Unsupervised Learning

Unsupervised learning: algorithms that analyze and cluster unlabeled datasets.

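  • Clustering: groups unlabeled data based on their similarities or differences.

  • Dimensionality reduction: reduces the number of variables while keeping as much of the information as possible, e.g., principal component analysis (PCA).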
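A minimal sketch of both unsupervised techniques, assuming scikit-learn and NumPy. The two well-separated synthetic blobs and the choice of k-means with two clusters are assumptions for illustration; note that no outcome variable is passed to either algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups in 3 dimensions; the algorithms never see labels
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 3)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 3)),
])

# Clustering: group observations by similarity (Euclidean distance for k-means)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Dimensionality reduction: project the 3 columns onto 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(np.bincount(cluster_labels))       # number of observations in each cluster
print(pca.explained_variance_ratio_)     # share of total variance kept by each component
```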

Supervised learning: uses labeled datasets to train algorithms that classify data or predict outcomes accurately.

There is a “y” or outcome variable.

Popular algorithms:

  • Naive Bayes.
  • Decision Trees.
  • Logistic Regression.
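The three algorithms listed above can each be fit on labeled data in a few lines. This sketch assumes scikit-learn and uses its bundled breast cancer dataset, where the outcome y is malignant vs. benign; the 75/25 split, the tree depth and the scaling step before logistic regression are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Labeled data: X holds the predictors, y is the outcome variable
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    # Scaling the predictors helps the logistic regression solver converge
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                       # learn from labeled examples
    accuracies[name] = model.score(X_test, y_test)    # share of correct test predictions
    print(f"{name}: {accuracies[name]:.3f}")
```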

How is it used in our daily lives?

Machine Learning vs. “Traditional Statistics”?

Traditional statistics:

  • Care about variability.
  • Care about quantifying how much an estimate would vary from sample to sample (the standard error).
  • Focus on estimating the betas (the coefficients in the model below):

\[{y} = \alpha + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon\]

Machine learning:

  • Care about prediction.

  • Focus on estimating y-hat:

    \[\hat{y} = \hat{f}(x_1, x_2, \dots, x_n)\]

  • \(\hat{y}\) represents the resulting prediction for \(y\).

  • \(\hat{f}\) represents the estimate for \(f\), which is often treated as a black box: no one is concerned with the exact form of \(\hat{f}\), provided that it yields accurate predictions for \(y\) (An Introduction to Statistical Learning).
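The contrast can be sketched with one simulated dataset, assuming NumPy. The same least-squares fit serves both views: the statistics view reports the estimated coefficients with their standard errors, while the machine learning view only uses the fitted model to produce \(\hat{y}\) for new observations. The true values \(\alpha = 1\), \(\beta_1 = 2\), \(\beta_2 = -0.5\) are invented for the simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated truth: alpha = 1, beta_1 = 2, beta_2 = -0.5, plus noise epsilon
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # the column of ones estimates alpha

# "Traditional statistics" view: coefficients and their standard errors
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
sigma2 = residuals @ residuals / (n - X.shape[1])            # residual variance
std_errors = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
for name, b, se in zip(["alpha", "beta_1", "beta_2"], coef, std_errors):
    print(f"{name}: {b:.2f} (SE {se:.2f})")

# Machine learning view: only the predictions y-hat for new observations matter
X_new = np.array([[1.0, 0.5, 1.0], [1.0, -1.0, 0.0]])
y_hat = X_new @ coef
print(y_hat)
```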


Bias/Variance Tradeoff

  • “When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to ‘bias’ and error due to ‘variance’.” (Fortmann-Roe, 2012)

  • “There is a tradeoff between a model’s ability to minimize bias and variance.” (Fortmann-Roe, 2012)


The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict.


The error due to variance is taken as the variability of a model prediction for a given data point.
  • The variance is how much the predictions for a given point vary between different realizations of the model.
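Both definitions can be checked by simulation. This sketch, assuming NumPy, draws many training sets from a known function \(f(x) = \sin(2\pi x)\) plus noise, fits a straight line and a degree-9 polynomial to each, and then measures at one point \(x_0\) the bias (average prediction minus the true value) and the variance (spread of the predictions across realizations). The function, noise level, sample sizes and degrees are all illustrative assumptions.

```python
import numpy as np

def true_f(x):
    """The 'correct' function we are trying to predict (known only in a simulation)."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x0 = 0.25                                   # point where predictions are examined; f(x0) = 1
n_datasets, n_points, noise_sd = 500, 25, 0.5

results = {}
for degree in (1, 9):                       # a rigid model vs. a very flexible one
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Each new training set gives a new realization of the model
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_sd, n_points)
        coeffs = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coeffs, x0)
    bias = preds.mean() - true_f(x0)        # expected prediction minus the correct value
    variance = preds.var()                  # variability of the prediction at x0
    results[degree] = (bias, variance)
    print(f"degree {degree}: bias^2 = {bias ** 2:.3f}, variance = {variance:.3f}")
```

The straight line should show a large squared bias and a small variance, and the degree-9 polynomial the reverse: the tradeoff in miniature.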

Graphical Representation of the Tradeoff

An Example with Data


Final Remarks About Bias/Variance

What are the different models?

So Where Do We Start?

How do we Spend our Data?

For machine learning, we split data into training and test sets:

  1. The training set is used to estimate model parameters.

  2. The test set is used to obtain an independent assessment of model performance.

🚫 CAUTION: Do not use the test set during training.
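A minimal sketch of the split, assuming scikit-learn's train_test_split function and synthetic data; the 80/20 proportion and the seed are illustrative choices. Fixing random_state makes the split reproducible, and stratify=y keeps the outcome proportions similar in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2023)
X = rng.normal(size=(100, 3))        # 100 synthetic observations, 3 predictors
y = rng.integers(0, 2, size=100)     # synthetic binary outcome

# Hold out 20% of the rows as the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123
)
print(X_train.shape, X_test.shape)   # (80, 3) (20, 3)

# From here on, only X_train and y_train are used until the final evaluation
```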


Let’s take a look


Resampling Methods

Resampling methods repeatedly draw samples from a dataset and compute statistics and metrics on each of those samples.


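K-fold cross-validation

This approach involves randomly dividing the set of observations into k folds of nearly equal size. The first fold is treated as a validation set and the model is fit on the remaining k - 1 folds. The procedure is repeated k times, each time with a different fold as the validation set, and the k validation errors are averaged.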
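The k-fold procedure can be written as an explicit loop. This sketch assumes scikit-learn, simulated regression data, k = 5, linear regression as the model and mean squared error as the metric; all of these are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # k = 5 folds of equal size
fold_mse = []
for train_idx, val_idx in kf.split(X):
    # Fit on k - 1 folds, then evaluate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

cv_estimate = np.mean(fold_mse)    # average of the k validation errors
print([round(m, 1) for m in fold_mse], round(cv_estimate, 1))
```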

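Leave-one-out cross-validation (LOOCV)

Only one observation is used for validation and the rest are used to fit the model. This is repeated for each of the n observations, and the n validation errors are averaged.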
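Leave-one-out is the special case of k-fold cross-validation with k equal to the number of observations. A minimal sketch, assuming scikit-learn's LeaveOneOut splitter and cross_val_score helper on simulated data with a linear model:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=50, n_features=3, noise=10.0, random_state=0)

loo = LeaveOneOut()    # n splits, each holding out a single observation
scores = cross_val_score(LinearRegression(), X, y, cv=loo,
                         scoring="neg_mean_squared_error")
# scikit-learn scorers follow "greater is better", so the MSE comes back negated
loocv_mse = -scores.mean()
print(len(scores), round(loocv_mse, 1))   # one fit per observation
```

Because LOOCV fits the model n times, it becomes expensive for large datasets or slow models; this is one reason 5- or 10-fold cross-validation is a common default.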


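Tuning Hyperparameters

Hyperparameters are settings that are not estimated from the data during fitting; they are chosen beforehand, typically by comparing candidate values with resampling. Some common examples: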

Method         Hyperparameter     Description
Lasso          lambda             Regularization strength (called alpha in scikit-learn)
KNN            n_neighbors        Number of neighbors to consider
KNN            weights            Weight function used in prediction: “uniform” or “distance”
Trees          max_depth          Maximum depth of the tree
Trees          min_samples_split  Minimum number of samples required to split an internal node
Trees          min_samples_leaf   Minimum number of samples required to be at a leaf node
Trees          max_features       Number of features to consider when looking for the best split
Random Forest  n_estimators       Number of decision trees in the forest
Random Forest  max_depth          Maximum depth of the decision trees
Random Forest  min_samples_split  Minimum number of samples required to split an internal node
Random Forest  min_samples_leaf   Minimum number of samples required to be at a leaf node
Random Forest  max_features       Number of features to consider when looking for the best split
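Hyperparameters such as those in the table are usually chosen by resampling. This sketch, assuming scikit-learn, uses GridSearchCV to try every combination of the two KNN hyperparameters with 5-fold cross-validation on the training data; the iris dataset, the candidate values and the scaling step are illustrative assumptions. The test set is scored only once, at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
# Candidate values for the two KNN hyperparameters; "knn__" targets the pipeline step
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))             # mean CV accuracy of the best setting
print(round(search.score(X_test, y_test), 3))   # single, final look at the test set
```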

The Actual Process:

  1. Collect data.
  2. Data exploration and preparation.
  3. Model training.
  4. Model evaluation (Don’t PANIC, we will cover this next session).
     • Look at RMSE or contingency table statistics (accuracy, sensitivity, specificity, etc.).
  5. Model improvement.
     • Tweak the preparation, re-parameterize a method, or use a different method.
  6. Use the test data to evaluate the final model.
  7. Share/publish the results.
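Step 4 mentions RMSE and contingency table statistics. A small sketch of how they are computed, assuming scikit-learn's confusion_matrix function; the outcomes and predictions below are made up for illustration.

```python
import math

from sklearn.metrics import confusion_matrix

# Hypothetical observed classes and model predictions (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)    # share of all cases classified correctly
sensitivity = tp / (tp + fn)                  # true positive rate
specificity = tn / (tn + fp)                  # true negative rate
print(accuracy, sensitivity, specificity)     # 0.8 0.75 0.8333333333333334

# For a numeric outcome, RMSE summarizes the typical size of the prediction errors
y_obs = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 3.0, 8.0]
rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_hat)) / len(y_obs))
print(round(rmse, 3))
```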

The actual Process - As an image

How do we implement all of this?