tidymodels
Catalina Canizares
tidymodels
By the end of the class, students will have:
A solid understanding of L1 and L2 penalties.
The ability to perform complete Lasso and Ridge analyses using the tidymodels syntax.
The residual sum of squares (RSS) measures the error remaining between the fitted regression function and the observed data.
A smaller RSS represents a regression function that is well-fit to the data.
\[ \text{RSS} = \sum_{i=1}^n\bigg( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\bigg)^2 \]
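As a quick illustration (using R's built-in mtcars data, not the class data), the RSS of a fitted model is just the sum of its squared residuals:
fit <- lm(mpg ~ wt + hp, data = mtcars)
sum(residuals(fit)^2)  # smaller RSS = regression function closer to the observed data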
Ridge regression is a regularization technique that adds a penalty term (\(\lambda\)) to the RSS.
It shrinks the regression coefficients towards zero, reducing their impact on the model.
It is also known as L2 regularization.
The formula for Ridge regression is:
\[ \sum_{i=1}^n\bigg( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\bigg)^2 + \lambda \sum_{j=1}^p \beta_j^2 \]
When \(\lambda\) is extremely large, all of the ridge coefficient estimates are essentially zero; this corresponds to the null model that contains no predictors.
Lasso (Least Absolute Shrinkage and Selection Operator) is another regularization technique that adds a penalty term (\(\lambda\)) to the RSS.
It has a built-in feature selection property, as it can shrink coefficients to exactly zero.
It is also known as L1 regularization.
\[ \sum_{i=1}^n\bigg( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\bigg)^2 + \lambda \sum_{j=1}^p | \beta_j| \]
When \(\lambda = 0\), the lasso simply gives the least squares fit; when \(\lambda\) becomes sufficiently large, the lasso gives the null model in which all coefficient estimates equal zero.
Depending on the value of \(\lambda\), the lasso can produce a model involving any number of variables.
Ridge will always include all of the variables in the model, although the magnitude of the coefficient estimates will depend on \(\lambda\).
The fact that some lasso coefficients are shrunken entirely to zero explains why the lasso performs feature selection.
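A small simulated example (not from the class data) makes the contrast concrete: at the same penalty, glmnet's ridge fit keeps every coefficient nonzero, while its lasso fit sets several of them exactly to zero.
library(glmnet)
set.seed(2023)
x <- matrix(rnorm(100 * 10), ncol = 10)   # 10 predictors, only the first 2 matter
y <- x[, 1] - 2 * x[, 2] + rnorm(100)
ridge_fit <- glmnet(x, y, alpha = 0)      # alpha = 0 -> ridge (L2) penalty
lasso_fit <- glmnet(x, y, alpha = 1)      # alpha = 1 -> lasso (L1) penalty
coef(ridge_fit, s = 0.5)                  # all coefficients shrunken but nonzero
coef(lasso_fit, s = 0.5)                  # most noise coefficients exactly zero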
\[ \sum_{i=1}^n\bigg( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\bigg)^2 + \lambda_1 \sum_{j=1}^p \beta_j^2 + \lambda_2 \sum_{j=1}^p |\beta_j| \]
The advantage of the elastic net is that it keeps the feature-selection property of the lasso penalty as well as the shrinkage effectiveness of the ridge penalty.
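In parsnip/glmnet terms, a single penalty plus a mixture argument blends the two penalties; a sketch of an elastic-net specification (the object name is illustrative):
library(tidymodels)
elastic_spec <-
  logistic_reg(penalty = tune(), mixture = 0.5) |>  # 0 < mixture < 1 -> elastic net
  set_engine("glmnet")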
tidymodels
Identify the key predictors, from a set of 42 risky behaviors spanning alcohol and drug use as well as other reckless activities, with the primary objective of accurately predicting the probability of texting while driving.
riskyBehaviors
riskyBehaviors_analysis <-
  riskyBehaviors |>
  mutate(
    # Dichotomize the outcome: responses of 1 become 0, responses 2-5 become 1,
    # and anything else becomes missing
    TextingDriving = case_when(
      TextingDriving == 1 ~ 0,
      TextingDriving %in% c(2, 3, 4, 5) ~ 1,
      TRUE ~ NA_real_
    )) |>
  mutate(TextingDriving = factor(TextingDriving)) |>
  drop_na(TextingDriving) |>
  # Remove variables that are not candidate predictors for this analysis
  select(-SourceVaping, -SourceAlcohol, -SexOrientation, -DrivingDrinking,
         -SexualAbuse, -SexualAbuseByPartner, -Grade, -Age)
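The split and resampling objects printed next are not shown being created; a sketch consistent with the 8170/2724 split and the 5 folds (the object names, the seed, and the exact prop/strata choices are assumptions):
set.seed(1990)                                    # hypothetical seed
texting_split <- initial_split(riskyBehaviors_analysis, prop = 0.75,
                               strata = TextingDriving)
texting_train <- training(texting_split)
texting_test  <- testing(texting_split)
texting_split

texting_folds <- vfold_cv(texting_train, v = 5)   # 5-fold cross-validation
texting_folds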
<Training/Testing/Total>
<8170/2724/10894>
# 5-fold cross-validation
# A tibble: 5 × 2
splits id
<list> <chr>
1 <split [6536/1634]> Fold1
2 <split [6536/1634]> Fold2
3 <split [6536/1634]> Fold3
4 <split [6536/1634]> Fold4
5 <split [6536/1634]> Fold5
texting_spec <-
logistic_reg(penalty = tune(),
mixture = 1) |>
# mixture = 1 specifies a pure lasso model,
# mixture = 0 specifies a ridge regression model, and
# 0 < mixture < 1 specifies an elastic net model, interpolating lasso and ridge.
# https://parsnip.tidymodels.org/reference/glmnet-details.html
set_engine('glmnet')
texting_spec
Logistic Regression Model Specification (classification)
Main Arguments:
penalty = tune()
mixture = 1
Computational engine: glmnet
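The workflow printed next pairs this specification with a five-step recipe. The recipe code itself is not shown; a sketch consistent with the listed steps (the selector choices are assumptions):
texting_recipe <-
  recipe(TextingDriving ~ ., data = texting_train) |>
  step_zv(all_predictors()) |>                    # drop zero-variance predictors
  step_impute_mode(all_nominal_predictors()) |>   # impute categorical NAs with the mode
  step_impute_mean(all_numeric_predictors()) |>   # impute numeric NAs with the mean
  step_normalize(all_numeric_predictors()) |>     # center and scale numeric predictors
  step_dummy(all_nominal_predictors())            # dummy-code categorical predictors

texting_workflow <-
  workflow() |>
  add_recipe(texting_recipe) |>
  add_model(texting_spec)

texting_workflow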
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
5 Recipe Steps
• step_zv()
• step_impute_mode()
• step_impute_mean()
• step_normalize()
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Logistic Regression Model Specification (classification)
Main Arguments:
penalty = tune()
mixture = 1
Computational engine: glmnet
# Penalty grid used in Dr. Balise's class
glmnet_grid <- data.frame(penalty = 10^seq(-6, -1, length.out = 20))
glmnet_grid
penalty
1 1.000000e-06
2 1.832981e-06
3 3.359818e-06
4 6.158482e-06
5 1.128838e-05
6 2.069138e-05
7 3.792690e-05
8 6.951928e-05
9 1.274275e-04
10 2.335721e-04
11 4.281332e-04
12 7.847600e-04
13 1.438450e-03
14 2.636651e-03
15 4.832930e-03
16 8.858668e-03
17 1.623777e-02
18 2.976351e-02
19 5.455595e-02
20 1.000000e-01
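The cross-validation metrics on the next slide come from tuning the penalty over the folds, along these lines (the metric set is an assumption, and the printed penalties suggest a different, 50-value grid may actually have been used):
texting_tune <-
  tune_grid(
    texting_workflow,
    resamples = texting_folds,
    grid = glmnet_grid,
    metrics = metric_set(accuracy, roc_auc)
  )

collect_metrics(texting_tune)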
# A tibble: 100 × 7
penalty .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 1 e-10 accuracy binary 0.579 5 0.00473 Preprocessor1_Model01
2 1 e-10 roc_auc binary 0.612 5 0.00547 Preprocessor1_Model01
3 1.60e-10 accuracy binary 0.579 5 0.00473 Preprocessor1_Model02
4 1.60e-10 roc_auc binary 0.612 5 0.00547 Preprocessor1_Model02
5 2.56e-10 accuracy binary 0.579 5 0.00473 Preprocessor1_Model03
6 2.56e-10 roc_auc binary 0.612 5 0.00547 Preprocessor1_Model03
7 4.09e-10 accuracy binary 0.579 5 0.00473 Preprocessor1_Model04
8 4.09e-10 roc_auc binary 0.612 5 0.00547 Preprocessor1_Model04
9 6.55e-10 accuracy binary 0.579 5 0.00473 Preprocessor1_Model05
10 6.55e-10 roc_auc binary 0.612 5 0.00547 Preprocessor1_Model05
# ℹ 90 more rows
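The single best penalty shown next can be pulled out with select_best() (choosing roc_auc as the selection metric is an assumption):
best_penalty <- select_best(texting_tune, metric = "roc_auc")
best_penalty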
# A tibble: 1 × 2
penalty .config
<dbl> <chr>
1 0.000543 Preprocessor1_Model34
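That penalty is then plugged back into the workflow and the model is fit on the training data, roughly:
final_workflow <- finalize_workflow(texting_workflow, best_penalty)

texting_fit <-
  final_workflow |>
  fit(data = texting_train)

texting_fit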
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
5 Recipe Steps
• step_zv()
• step_impute_mode()
• step_impute_mean()
• step_normalize()
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Logistic Regression Model Specification (classification)
Main Arguments:
penalty = 0.000542867543932386
mixture = 1
Computational engine: glmnet
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()
── Preprocessor ────────────────────────────────────────────────────────────────
5 Recipe Steps
• step_zv()
• step_impute_mode()
• step_impute_mean()
• step_normalize()
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial", alpha = ~1)
Df %Dev Lambda
1 0 0.00 0.065350
2 1 0.21 0.059540
3 1 0.39 0.054250
4 1 0.54 0.049430
5 1 0.66 0.045040
6 1 0.76 0.041040
7 1 0.85 0.037390
8 2 0.93 0.034070
9 4 1.07 0.031040
10 5 1.21 0.028290
11 7 1.36 0.025770
12 9 1.53 0.023480
13 10 1.69 0.021400
14 11 1.83 0.019500
15 11 1.95 0.017760
16 12 2.06 0.016190
17 14 2.15 0.014750
18 14 2.24 0.013440
19 15 2.33 0.012240
20 17 2.41 0.011160
21 20 2.50 0.010170
22 21 2.59 0.009263
23 24 2.67 0.008440
24 24 2.75 0.007690
25 26 2.82 0.007007
26 30 2.91 0.006384
27 31 3.00 0.005817
28 32 3.08 0.005300
29 32 3.14 0.004830
30 36 3.20 0.004401
31 38 3.25 0.004010
32 39 3.30 0.003653
33 39 3.35 0.003329
34 40 3.40 0.003033
35 43 3.43 0.002764
36 43 3.47 0.002518
37 43 3.50 0.002294
38 43 3.53 0.002091
39 43 3.55 0.001905
40 44 3.57 0.001736
41 44 3.58 0.001581
42 44 3.60 0.001441
43 45 3.61 0.001313
44 45 3.62 0.001196
45 45 3.62 0.001090
46 45 3.63 0.000993
...
and 8 more lines.
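The training-set predictions below can be generated with augment() on the fitted workflow (the trailing select() is only to match the columns shown):
texting_fit |>
  augment(new_data = texting_train) |>
  select(TextingDriving, .pred_class, .pred_1, .pred_0)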
# A tibble: 8,170 × 4
TextingDriving .pred_class .pred_1 .pred_0
<fct> <fct> <dbl> <dbl>
1 0 1 0.510 0.490
2 1 0 0.437 0.563
3 0 0 0.267 0.733
4 1 0 0.399 0.601
5 1 1 0.505 0.495
6 1 1 0.615 0.385
7 1 1 0.548 0.452
8 1 0 0.460 0.540
9 1 1 0.519 0.481
10 1 1 0.602 0.398
# ℹ 8,160 more rows
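The lasso coefficients at the selected penalty (next slide) come from tidying the underlying parsnip fit:
texting_fit |>
  extract_fit_parsnip() |>
  tidy()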
# A tibble: 51 × 3
term estimate penalty
<chr> <dbl> <dbl>
1 (Intercept) 0.469 0.000543
2 DrinkingDriver -0.0224 0.000543
3 WeaponCarrying 0.0593 0.000543
4 WeaponCarryingSchool 0.0666 0.000543
5 GunCarrying 0.0995 0.000543
6 UnsafeAtSchool 0.0439 0.000543
7 InjuredInSchool 0.00376 0.000543
8 PhysicalFight -0.0571 0.000543
9 SchoolPhysicalFight -0.0243 0.000543
10 TimesSexualAbuse 0.0484 0.000543
# ℹ 41 more rows
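The final evaluation on the held-out test set uses last_fit(); a sketch, with the metric set inferred from the output below:
texting_last_fit <-
  last_fit(
    final_workflow,
    split = texting_split,
    metrics = metric_set(accuracy, kap, roc_auc)
  )

texting_last_fit
collect_metrics(texting_last_fit)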
# Resampling results
# Manual resampling
# A tibble: 1 × 6
splits id .metrics .notes .predictions .workflow
<list> <chr> <list> <list> <list> <list>
1 <split [8170/2724]> train/test split <tibble> <tibble> <tibble> <workflow>
# A tibble: 3 × 4
.metric .estimator .estimate .config
<chr> <chr> <dbl> <chr>
1 accuracy binary 0.569 Preprocessor1_Model1
2 kap binary 0.105 Preprocessor1_Model1
3 roc_auc binary 0.599 Preprocessor1_Model1
yardstick
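These test-set predictions can be pulled out of the last_fit object with collect_predictions() (the object name texting_results is illustrative):
texting_results <- collect_predictions(texting_last_fit)
texting_results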
# A tibble: 2,724 × 7
id .pred_class .row .pred_0 .pred_1 TextingDriving .config
<chr> <fct> <int> <dbl> <dbl> <fct> <chr>
1 train/test split 0 3 0.589 0.411 1 Preprocess…
2 train/test split 0 6 0.527 0.473 0 Preprocess…
3 train/test split 1 8 0.492 0.508 0 Preprocess…
4 train/test split 1 13 0.417 0.583 1 Preprocess…
5 train/test split 1 20 0.460 0.540 0 Preprocess…
6 train/test split 0 24 0.545 0.455 1 Preprocess…
7 train/test split 1 28 0.305 0.695 1 Preprocess…
8 train/test split 1 29 0.411 0.589 0 Preprocess…
9 train/test split 1 30 0.221 0.779 1 Preprocess…
10 train/test split 1 32 0.456 0.544 0 Preprocess…
# ℹ 2,714 more rows
yardstick
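The sensitivity, specificity, accuracy, and kappa below can be computed by applying a yardstick metric set to those predictions (object names are assumptions):
texting_metric_set <- metric_set(sens, spec, accuracy, kap)

texting_results |>
  texting_metric_set(truth = TextingDriving, estimate = .pred_class)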
# A tibble: 4 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 sens binary 0.713
2 spec binary 0.389
3 accuracy binary 0.569
4 kap binary 0.105
library(vip)

# Variable importance from the final glmnet fit:
# plot the five most important predictors with positive and with negative coefficients
texting_last_fit |>
  extract_fit_engine() |>
  vi() |>
  group_by(Sign) |>
  slice_max(Importance, n = 5) |>
  ungroup() |>
  ggplot(aes(Importance, fct_reorder(Variable, Importance), fill = Sign)) +
  geom_col() +
  facet_wrap(vars(Sign), scales = "free_y") +
  labs(y = NULL) +
  theme(legend.position = "none")
tidymodels
To perform a ridge regression, what should we change in this code?
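One possible answer, as a minimal sketch: keep everything else the same and set mixture = 0 in the model specification, which swaps the L1 (lasso) penalty for the L2 (ridge) penalty.
ridge_spec <-
  logistic_reg(penalty = tune(),
               mixture = 0) |>   # mixture = 0 -> pure ridge (L2) penalty
  set_engine("glmnet")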