2024-06-05
Session 2 - Shake It Off: Mastering Data Tables with Taylor Swift’s Tracks © 2024 by Raymond Balise and Catalina Canizares is licensed under CC BY-NC-ND 4.0
This material is freely available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Some sections are based on content from other presentations, which are credited at the end of this presentation.
For more information on this license, please visit: Creative Commons License
table1 to make tables
gt and gtsummary to make complex tables
Tables are useful to show the exact values of your data or estimates.
They are not the best choice when you have a lot of data to show or need to fit it into a compact space.
They are not usually intended to give a quick, visual representation of data.
gt, kableExtra
flextable, huxtable
DT, reactable
Not all packages support all R Markdown output formats.
# A tibble: 240 × 29
album_name ep album_release track_number track_name artist featuring
<chr> <lgl> <date> <int> <chr> <chr> <chr>
1 Taylor Swift FALSE 2006-10-24 1 Tim McGraw Taylo… <NA>
2 Taylor Swift FALSE 2006-10-24 2 Picture To Bu… Taylo… <NA>
3 Taylor Swift FALSE 2006-10-24 3 Teardrops On … Taylo… <NA>
4 Taylor Swift FALSE 2006-10-24 4 A Place In Th… Taylo… <NA>
5 Taylor Swift FALSE 2006-10-24 5 Cold As You Taylo… <NA>
6 Taylor Swift FALSE 2006-10-24 6 The Outside Taylo… <NA>
7 Taylor Swift FALSE 2006-10-24 7 Tied Together… Taylo… <NA>
8 Taylor Swift FALSE 2006-10-24 8 Stay Beautiful Taylo… <NA>
9 Taylor Swift FALSE 2006-10-24 9 Should've Sai… Taylo… <NA>
10 Taylor Swift FALSE 2006-10-24 10 Mary's Song (… Taylo… <NA>
# ℹ 230 more rows
# ℹ 22 more variables: bonus_track <lgl>, promotional_release <date>,
# single_release <date>, track_release <date>, danceability <dbl>,
# energy <dbl>, key <int>, loudness <dbl>, mode <int>, speechiness <dbl>,
# acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
# tempo <dbl>, time_signature <int>, duration_ms <int>, explicit <lgl>,
# key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <list>
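The printout above is what you see when you print the `taylor_album_songs` tibble. The slides do not show the loading step, so the following is a minimal sketch that assumes the data come from the `taylor` package. The exact row count may differ with the package version.

```r
# Assumption: taylor_album_songs ships with the taylor package
# install.packages("taylor")
library(taylor)
library(dplyr)  # loads tibble printing and the %>% pipe

# Printing a tibble shows the first 10 rows and only the columns that fit
taylor_album_songs
```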
Name | taylor_album_songs |
Number of rows | 240 |
Number of columns | 29 |
_______________________ | |
Column type frequency: | |
character | 7 |
Date | 4 |
list | 1 |
logical | 3 |
numeric | 14 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
album_name | 0 | 1.00 | 5 | 29 | 0 | 11 | 0 |
track_name | 0 | 1.00 | 3 | 68 | 0 | 240 | 0 |
artist | 3 | 0.99 | 12 | 12 | 0 | 1 | 0 |
featuring | 217 | 0.10 | 4 | 33 | 0 | 20 | 0 |
key_name | 3 | 0.99 | 1 | 2 | 0 | 12 | 0 |
mode_name | 3 | 0.99 | 5 | 5 | 0 | 2 | 0 |
key_mode | 3 | 0.99 | 7 | 8 | 0 | 19 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
album_release | 0 | 1.00 | 2006-10-24 | 2024-04-19 | 2021-11-12 | 11 |
promotional_release | 228 | 0.05 | 2017-10-20 | 2023-11-29 | 2021-05-31 | 12 |
single_release | 211 | 0.12 | 2006-06-19 | 2024-04-19 | 2020-01-27 | 29 |
track_release | 0 | 1.00 | 2006-06-19 | 2024-04-19 | 2021-11-12 | 27 |
Variable type: list
skim_variable | n_missing | complete_rate | n_unique | min_length | max_length |
---|---|---|---|---|---|
lyrics | 0 | 1 | 238 | 4 | 4 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
ep | 0 | 1.00 | 0.00 | FAL: 240 |
bonus_track | 0 | 1.00 | 0.15 | FAL: 203, TRU: 37 |
explicit | 3 | 0.99 | 0.14 | FAL: 204, TRU: 33 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
track_number | 0 | 1.00 | 11.98 | 7.31 | 1.00 | 6.00 | 11.00 | 17.00 | 31.00 | ▇▇▆▃▁ |
danceability | 3 | 0.99 | 0.58 | 0.12 | 0.29 | 0.50 | 0.59 | 0.65 | 0.90 | ▂▅▇▃▁ |
energy | 3 | 0.99 | 0.56 | 0.18 | 0.13 | 0.42 | 0.56 | 0.70 | 0.93 | ▂▆▇▇▃ |
key | 3 | 0.99 | 4.39 | 3.54 | 0.00 | 1.00 | 4.00 | 7.00 | 11.00 | ▇▂▂▃▃ |
loudness | 3 | 0.99 | -7.77 | 2.78 | -15.49 | -9.73 | -7.38 | -5.77 | -1.91 | ▁▃▆▇▂ |
mode | 3 | 0.99 | 0.90 | 0.30 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▁▁▇ |
speechiness | 3 | 0.99 | 0.06 | 0.05 | 0.02 | 0.03 | 0.04 | 0.06 | 0.52 | ▇▁▁▁▁ |
acousticness | 3 | 0.99 | 0.34 | 0.33 | 0.00 | 0.03 | 0.21 | 0.67 | 0.97 | ▇▂▂▂▃ |
instrumentalness | 3 | 0.99 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | ▇▁▁▁▁ |
liveness | 3 | 0.99 | 0.14 | 0.08 | 0.04 | 0.09 | 0.12 | 0.15 | 0.61 | ▇▂▁▁▁ |
valence | 3 | 0.99 | 0.38 | 0.19 | 0.04 | 0.25 | 0.37 | 0.51 | 0.92 | ▅▇▇▃▁ |
tempo | 3 | 0.99 | 124.14 | 31.94 | 68.53 | 96.97 | 119.97 | 148.04 | 208.92 | ▆▇▆▅▁ |
time_signature | 3 | 0.99 | 3.96 | 0.34 | 1.00 | 4.00 | 4.00 | 4.00 | 5.00 | ▁▁▁▇▁ |
duration_ms | 3 | 0.99 | 237577.34 | 47151.74 | 131907.00 | 210240.00 | 233627.00 | 257773.00 | 613027.00 | ▆▇▁▁▁ |
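The overview above has the layout of a `skimr::skim()` summary. The call itself is not in the extracted text, so this is a sketch of how it could be reproduced, again assuming the data come from the `taylor` package.

```r
library(taylor)  # assumed source of taylor_album_songs
library(skimr)

# skim() reports missingness and summary statistics, grouped by column type
skim(taylor_album_songs)
```

The `n_missing` column is the quickest place to spot variables, such as `explicit`, that will need handling before building a stratified table.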
table1
The simplest way to create a nice table (in my opinion)
| | Overall (N=240) |
---|---|
album_name | |
1989 (Taylor's Version) | 23 (9.6%) |
evermore | 17 (7.1%) |
Fearless (Taylor's Version) | 26 (10.8%) |
folklore | 17 (7.1%) |
Lover | 18 (7.5%) |
Midnights | 26 (10.8%) |
Red (Taylor's Version) | 30 (12.5%) |
reputation | 15 (6.3%) |
Speak Now (Taylor's Version) | 22 (9.2%) |
Taylor Swift | 15 (6.3%) |
THE TORTURED POETS DEPARTMENT | 31 (12.9%) |
energy | |
Mean (SD) | 0.560 (0.179) |
Median [Min, Max] | 0.565 [0.131, 0.934] |
Missing | 3 (1.3%) |
danceability | |
Mean (SD) | 0.578 (0.117) |
Median [Min, Max] | 0.589 [0.292, 0.897] |
Missing | 3 (1.3%) |
explicit | |
Yes | 33 (13.8%) |
No | 204 (85.0%) |
Missing | 3 (1.3%) |
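A table like the one above can be produced with a single `table1()` call. The original call is not shown in the text, so treat this as a sketch that assumes the four variables were passed in a one-sided formula.

```r
library(taylor)  # assumed source of taylor_album_songs
library(table1)

# One-sided formula: each variable listed is summarized in one "Overall" column.
# Logical columns such as explicit are displayed as Yes/No by default.
table1(~ album_name + energy + danceability + explicit,
       data = taylor_album_songs)
```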
table1
Let’s explore which album has the most explicit songs
We have to wrangle the data a bit:
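The wrangling code itself is missing from the text. Later code refers to `taylor_no_na` and `explicit_factor`, so the sketch below assumes the step drops songs with a missing `explicit` flag and stores the flag as a factor. The exact calls are an assumption.

```r
library(taylor)  # assumed source of taylor_album_songs
library(dplyr)

taylor_no_na <- taylor_album_songs %>%
  # keep only songs whose explicit flag is known
  filter(!is.na(explicit)) %>%
  # store the flag as a factor so it can define the table columns
  mutate(explicit_factor = factor(explicit))
```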
Create the table
| | FALSE (N=204) | TRUE (N=33) | Overall (N=237) |
---|---|---|---|
album_name | |||
1989 (Taylor's Version) | 22 (10.8%) | 0 (0%) | 22 (9.3%) |
evermore | 11 (5.4%) | 6 (18.2%) | 17 (7.2%) |
Fearless (Taylor's Version) | 26 (12.7%) | 0 (0%) | 26 (11.0%) |
folklore | 12 (5.9%) | 5 (15.2%) | 17 (7.2%) |
Lover | 18 (8.8%) | 0 (0%) | 18 (7.6%) |
Midnights | 15 (7.4%) | 9 (27.3%) | 24 (10.1%) |
Red (Taylor's Version) | 28 (13.7%) | 2 (6.1%) | 30 (12.7%) |
reputation | 15 (7.4%) | 0 (0%) | 15 (6.3%) |
Speak Now (Taylor's Version) | 22 (10.8%) | 0 (0%) | 22 (9.3%) |
Taylor Swift | 15 (7.4%) | 0 (0%) | 15 (6.3%) |
THE TORTURED POETS DEPARTMENT | 20 (9.8%) | 11 (33.3%) | 31 (13.1%) |
energy | |||
Mean (SD) | 0.572 (0.180) | 0.483 (0.155) | 0.560 (0.179) |
Median [Min, Max] | 0.577 [0.131, 0.934] | 0.462 [0.240, 0.782] | 0.565 [0.131, 0.934] |
danceability | |||
Mean (SD) | 0.577 (0.115) | 0.580 (0.128) | 0.578 (0.117) |
Median [Min, Max] | 0.588 [0.292, 0.897] | 0.604 [0.316, 0.867] | 0.589 [0.292, 0.897] |
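The stratified table above has the shape `table1()` gives when a grouping factor follows `|` in the formula. This is a sketch under that assumption. The base R wrangling is included only so the snippet runs on its own, and `taylor_no_na` and `explicit_factor` are the names used elsewhere in these slides.

```r
library(taylor)  # assumed source of taylor_album_songs
library(table1)

# Drop songs with an unknown explicit flag, then store the flag as a factor
taylor_no_na <- subset(taylor_album_songs, !is.na(explicit))
taylor_no_na$explicit_factor <- factor(taylor_no_na$explicit)

# Variables left of "|" become rows; the factor on the right defines the columns.
# table1() appends an "Overall" column by default.
table1(~ album_name + energy + danceability | explicit_factor,
       data = taylor_no_na)
```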
gtsummary
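library(dplyr)      # %>% pipe
library(gtsummary)  # tbl_summary(), add_p(), modify_*()

taylor_no_na %>%
  tbl_summary(
    by = explicit_factor,   # one column per level of the explicit flag
    include = c(album_name, energy, danceability),
    label = list(album_name = "Album Name",
                 energy = "Energy",
                 danceability = "Danceability"),
    percent = "row"         # percentages add up to 100% across each row
  ) %>%
  add_p() %>%                      # add a p-value column
  modify_header(label = "") %>%    # blank out the header of the first column
  modify_caption("**Taylor's explicit albums**")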
| | FALSE, N = 204¹ | TRUE, N = 33¹ | p-value² |
---|---|---|---|
Album Name | |||
1989 (Taylor's Version) | 22 (100%) | 0 (0%) | |
evermore | 11 (65%) | 6 (35%) | |
Fearless (Taylor's Version) | 26 (100%) | 0 (0%) | |
folklore | 12 (71%) | 5 (29%) | |
Lover | 18 (100%) | 0 (0%) | |
Midnights | 15 (62%) | 9 (38%) | |
Red (Taylor's Version) | 28 (93%) | 2 (6.7%) | |
reputation | 15 (100%) | 0 (0%) | |
Speak Now (Taylor's Version) | 22 (100%) | 0 (0%) | |
Taylor Swift | 15 (100%) | 0 (0%) | |
THE TORTURED POETS DEPARTMENT | 20 (65%) | 11 (35%) | |
Energy | 0.58 (0.45, 0.71) | 0.46 (0.36, 0.60) | 0.006 |
Danceability | 0.59 (0.50, 0.66) | 0.60 (0.51, 0.65) | 0.8 |
¹ n (%); Median (IQR)
² Wilcoxon rank sum test
gt
library(dplyr)      # mutate() and %>%
library(gtsummary)  # tbl_summary(), as_gt()
library(gt)         # tab_style(), cell_text(), cells_body()

taylor_no_na %>%
  mutate(explicit_factor = factor(explicit, labels = c("Not Explicit", "Explicit"))) %>%
  tbl_summary(
    by = explicit_factor,
    include = c(album_name),
    label = list(album_name = "Album Name"),
    percent = "row"
  ) %>%
  modify_header(label = "") %>%
  modify_caption("**Taylor's explicit albums**") %>%
  as_gt() %>%  # convert to a gt table so gt styling functions apply
  # grey, right-aligned text for both statistic columns
  tab_style(
    style = cell_text(color = "darkgrey", align = "right"),
    locations = cells_body(columns = c(stat_1, stat_2))
  ) %>%
  # highlight the Explicit cell of the Midnights row
  tab_style(
    style = cell_text(color = "#E5446D", weight = "bold"),
    locations = cells_body(columns = c(stat_2), rows = label == "Midnights")
  )
| | Not Explicit, N = 204¹ | Explicit, N = 33¹ |
---|---|---|
Album Name | ||
1989 (Taylor's Version) | 22 (100%) | 0 (0%) |
evermore | 11 (65%) | 6 (35%) |
Fearless (Taylor's Version) | 26 (100%) | 0 (0%) |
folklore | 12 (71%) | 5 (29%) |
Lover | 18 (100%) | 0 (0%) |
Midnights | 15 (62%) | 9 (38%) |
Red (Taylor's Version) | 28 (93%) | 2 (6.7%) |
reputation | 15 (100%) | 0 (0%) |
Speak Now (Taylor's Version) | 22 (100%) | 0 (0%) |
Taylor Swift | 15 (100%) | 0 (0%) |
THE TORTURED POETS DEPARTMENT | 20 (65%) | 11 (35%) |
¹ n (%)
Slides 3 to 16 are an exact copy of Dr. Ray Balise’s Tables lesson for the BST-623 class.