Session 2 - Shake It Off: Mastering Data Tables with Taylor Swift’s Tracks

Raymond Balise, Ph.D. and Catalina Cañizares, Ph.D.

2024-06-05

About This Material

Session 2 - Shake It Off: Mastering Data Tables with Taylor Swift’s Tracks © 2024 by Raymond Balise and Catalina Cañizares is licensed under CC BY-NC-ND 4.0

This material is freely available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Some sections are based on content from other presentations, which are credited at the end of this presentation.

For more information on this license, please visit: https://creativecommons.org/licenses/by-nc-nd/4.0/

Objectives for Today

  • Understand when to use tables vs. graphics
  • Understand guiding principles for table design
  • Understand the many tools/packages for making pretty tables in R
  • Know how to use table1 to make tables
  • Know how to use gt and gtsummary to make complex tables

When Dr. Balise was your age…

  • When I was in school, journal tables (and figures) were monochrome and simple.
  • Now we have tables that are artfully formatted and have art that is ready for Twitter.
  • The distinction between tables and figures has completely blurred.

When you were your age…

  • The state-of-the-art tables are interactive and have embedded graphics.

When should you use a table?

  • Tables are useful to show the exact values of your data or estimates.

  • They are not the best choice when you have a lot of data or need to show the data in a compact space.

  • They are not usually intended to give a quick, visual representation of data.

    • However, the core psychological principles leading to good graphics apply to both charts and tables.
    • Read/Study/Live Visualizing Data and The Elements of Graphing Data by William S. Cleveland.

Tufte on Visualizations

  • One of the visualization thought leaders of the 1980s, Edward Tufte, stressed the importance of erasing all unnecessary ink from the page. The same holds true for tables.
  • When in doubt, erase.

Gestalt Principles

  • Gestalt Law of Proximity → Things close together are automatically grouped
  • Gestalt Law of Similarity → Things of the same color will be grouped. If one thing is a different color, it will pop off the page
  • Gestalt Law of Closure → You can draw part of a bounding box and the brain will fill in the rest

Schwabish’s Rules for Better Tables

Schwabish’s Ten Rules for Better Tables + Balise Three

  1. Offset the Heads from the Body
  2. Use Subtle Dividers Rather Than Heavy Gridlines
  3. Right-Align Numbers and Heads
  4. Left-Align Text and Heads
  5. Select the Appropriate Level of Precision
  6. Guide Your Reader with Space between Rows and Columns
  7. Remove Unit Repetition
  8. Highlight Outliers
  9. Group Similar Data and Increase White Space
  10. Add Visualizations When Appropriate
  11. Draw Attention to the Key Point(s)
  12. Use Annotations to Explain the Statistics
  13. Make Captions/Titles Self-Contained
  • Explain Sample Size, Who and When
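Several of these rules map onto individual gt calls. The sketch below is only an illustration, not part of the original lesson: the summary data frame and its column names (album_summary, n_songs, mean_energy) are made up for this example, and it assumes the taylor, dplyr, and gt packages are installed.

```r
library(dplyr)
library(gt)
library(taylor)

# Hypothetical summary table: songs per album and mean energy.
# The object and column names are illustrative assumptions.
album_summary <- taylor_album_songs %>%
  group_by(album_name) %>%
  summarise(
    n_songs = n(),
    mean_energy = mean(energy, na.rm = TRUE),
    .groups = "drop"
  )

rules_table <- album_summary %>%
  gt() %>%
  # Rule 13: a title that can be read without the surrounding text
  tab_header(
    title = "Songs and mean energy by album",
    subtitle = "Source: taylor_album_songs (taylor package)"
  ) %>%
  # Rule 1: readable column heads instead of raw variable names
  cols_label(
    album_name = "Album",
    n_songs = "Songs",
    mean_energy = "Mean energy"
  ) %>%
  # Rule 5: pick a sensible level of precision
  fmt_number(columns = mean_energy, decimals = 2) %>%
  # Rules 3 and 4: right-align numbers, left-align text
  cols_align(align = "right", columns = c(n_songs, mean_energy)) %>%
  cols_align(align = "left", columns = album_name) %>%
  # Rule 12: annotate how the statistic was computed
  tab_source_note("Songs with a missing energy value are excluded from the mean.")

rules_table
```

Each call touches one rule, so you can drop or reorder them while experimenting with a table of your own.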

What is the Key Take-Away Point in These Data?

Critique PReP Table 1 - Excel Default

  1. Heads ✅
  2. Subtle Dividers ❌
  3. Right-Align Numbers ❌
  4. Left-Align Text ❌
  5. Precision ✅
  6. Space ✅
  7. Remove Repetition ❌
  8. Outliers
  9. Group Similar ✅
  10. Visualizations
  11. Key Point(s) ❌
  12. Annotations
  13. Caption/Title ❌

Critique PReP Table 1 - Also Excel

  1. Heads ✅
  2. Subtle Dividers ✅
  3. Right-Align Numbers ❌
  4. Left-Align Text ✅
  5. Precision ✅
  6. Space ✅
  7. Remove Repetition ❌
  8. Outliers
  9. Group Similar ✅
  10. Visualizations
  11. Key Point(s) ✅
  12. Annotations
  13. Caption/Title ✅

Critique PReP Table 1 - (Version 1.1)

  1. Heads ✅
  2. Subtle Dividers ✅
  3. Right-Align Numbers ❌
  4. Left-Align Text ✅
  5. Precision ✅
  6. Space ✅
  7. Remove Repetition ✅
  8. Outliers
  9. Group Similar ✅
  10. Visualizations
  11. Key Point(s) ✅
  12. Annotations
  13. Caption/Title ✅

I wish there was a “best” tool.

  • You have a lot of package options for making static (i.e., Word, PDF or HTML) and dynamic (i.e., HTML) publication-ready tables.
  • All of these packages make web-friendly tables.
  • Some make static tables that are beautifully formatted for the web: gt, kableExtra.
  • Others make static tables that look great in Word: flextable, huxtable.
  • Still others are ideal for interactive web content: DT, reactable.
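To get a feel for the differences, you can pass the same small data frame to several of these packages. This is a minimal sketch rather than a recommendation; the object names (demo, static_web, static_word, interactive) are made up for illustration, and it assumes the taylor, gt, flextable, and DT packages are installed.

```r
library(taylor)

# A few rows and columns of the lesson data (the object name is illustrative)
demo <- head(taylor_album_songs[, c("album_name", "track_name", "energy")], 5)

# Static table aimed at HTML output
static_web <- gt::gt(demo)

# Static table that works well in Word documents
static_word <- flextable::flextable(demo)

# Interactive table (sortable, searchable) for web pages
interactive <- DT::datatable(demo, rownames = FALSE)
```

The package prefixes (gt::, flextable::, DT::) avoid any name clashes that could come from attaching all three packages at once.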

Output for Packages that Make Tables

Not all packages support all R Markdown output formats.

Let’s create the tables

The data

# install.packages("taylor")
library(taylor)
taylor_album_songs 
# A tibble: 240 × 29
   album_name   ep    album_release track_number track_name     artist featuring
   <chr>        <lgl> <date>               <int> <chr>          <chr>  <chr>    
 1 Taylor Swift FALSE 2006-10-24               1 Tim McGraw     Taylo… <NA>     
 2 Taylor Swift FALSE 2006-10-24               2 Picture To Bu… Taylo… <NA>     
 3 Taylor Swift FALSE 2006-10-24               3 Teardrops On … Taylo… <NA>     
 4 Taylor Swift FALSE 2006-10-24               4 A Place In Th… Taylo… <NA>     
 5 Taylor Swift FALSE 2006-10-24               5 Cold As You    Taylo… <NA>     
 6 Taylor Swift FALSE 2006-10-24               6 The Outside    Taylo… <NA>     
 7 Taylor Swift FALSE 2006-10-24               7 Tied Together… Taylo… <NA>     
 8 Taylor Swift FALSE 2006-10-24               8 Stay Beautiful Taylo… <NA>     
 9 Taylor Swift FALSE 2006-10-24               9 Should've Sai… Taylo… <NA>     
10 Taylor Swift FALSE 2006-10-24              10 Mary's Song (… Taylo… <NA>     
# ℹ 230 more rows
# ℹ 22 more variables: bonus_track <lgl>, promotional_release <date>,
#   single_release <date>, track_release <date>, danceability <dbl>,
#   energy <dbl>, key <int>, loudness <dbl>, mode <int>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, time_signature <int>, duration_ms <int>, explicit <lgl>,
#   key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <list>

The data

library(skimr)  # provides skim()
skim(taylor_album_songs)
Data summary
Name taylor_album_songs
Number of rows 240
Number of columns 29
_______________________
Column type frequency:
character 7
Date 4
list 1
logical 3
numeric 14
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
album_name 0 1.00 5 29 0 11 0
track_name 0 1.00 3 68 0 240 0
artist 3 0.99 12 12 0 1 0
featuring 217 0.10 4 33 0 20 0
key_name 3 0.99 1 2 0 12 0
mode_name 3 0.99 5 5 0 2 0
key_mode 3 0.99 7 8 0 19 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
album_release 0 1.00 2006-10-24 2024-04-19 2021-11-12 11
promotional_release 228 0.05 2017-10-20 2023-11-29 2021-05-31 12
single_release 211 0.12 2006-06-19 2024-04-19 2020-01-27 29
track_release 0 1.00 2006-06-19 2024-04-19 2021-11-12 27

Variable type: list

skim_variable n_missing complete_rate n_unique min_length max_length
lyrics 0 1 238 4 4

Variable type: logical

skim_variable n_missing complete_rate mean count
ep 0 1.00 0.00 FAL: 240
bonus_track 0 1.00 0.15 FAL: 203, TRU: 37
explicit 3 0.99 0.14 FAL: 204, TRU: 33

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
track_number 0 1.00 11.98 7.31 1.00 6.00 11.00 17.00 31.00 ▇▇▆▃▁
danceability 3 0.99 0.58 0.12 0.29 0.50 0.59 0.65 0.90 ▂▅▇▃▁
energy 3 0.99 0.56 0.18 0.13 0.42 0.56 0.70 0.93 ▂▆▇▇▃
key 3 0.99 4.39 3.54 0.00 1.00 4.00 7.00 11.00 ▇▂▂▃▃
loudness 3 0.99 -7.77 2.78 -15.49 -9.73 -7.38 -5.77 -1.91 ▁▃▆▇▂
mode 3 0.99 0.90 0.30 0.00 1.00 1.00 1.00 1.00 ▁▁▁▁▇
speechiness 3 0.99 0.06 0.05 0.02 0.03 0.04 0.06 0.52 ▇▁▁▁▁
acousticness 3 0.99 0.34 0.33 0.00 0.03 0.21 0.67 0.97 ▇▂▂▂▃
instrumentalness 3 0.99 0.00 0.03 0.00 0.00 0.00 0.00 0.33 ▇▁▁▁▁
liveness 3 0.99 0.14 0.08 0.04 0.09 0.12 0.15 0.61 ▇▂▁▁▁
valence 3 0.99 0.38 0.19 0.04 0.25 0.37 0.51 0.92 ▅▇▇▃▁
tempo 3 0.99 124.14 31.94 68.53 96.97 119.97 148.04 208.92 ▆▇▆▅▁
time_signature 3 0.99 3.96 0.34 1.00 4.00 4.00 4.00 5.00 ▁▁▁▇▁
duration_ms 3 0.99 237577.34 47151.74 131907.00 210240.00 233627.00 257773.00 613027.00 ▆▇▁▁▁

Data dictionary

table1

The simplest way to create a nice table (in my opinion)

library(table1)

table1(~ album_name + energy + danceability + 
         explicit, data = taylor_album_songs)
Overall (N=240)
album_name
1989 (Taylor's Version) 23 (9.6%)
evermore 17 (7.1%)
Fearless (Taylor's Version) 26 (10.8%)
folklore 17 (7.1%)
Lover 18 (7.5%)
Midnights 26 (10.8%)
Red (Taylor's Version) 30 (12.5%)
reputation 15 (6.3%)
Speak Now (Taylor's Version) 22 (9.2%)
Taylor Swift 15 (6.3%)
THE TORTURED POETS DEPARTMENT 31 (12.9%)
energy
Mean (SD) 0.560 (0.179)
Median [Min, Max] 0.565 [0.131, 0.934]
Missing 3 (1.3%)
danceability
Mean (SD) 0.578 (0.117)
Median [Min, Max] 0.589 [0.292, 0.897]
Missing 3 (1.3%)
explicit
Yes 33 (13.8%)
No 204 (85.0%)
Missing 3 (1.3%)
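The row names in this table are the raw variable names. One way to tidy them is to attach labels with table1's label() helper before building the table. The sketch below assumes the table1 and taylor packages are installed; the copy named songs and the label text are illustrative choices, not part of the original slides.

```r
library(table1)
library(taylor)

# Work on a copy with only the columns used in the table (name is illustrative)
songs <- as.data.frame(
  taylor_album_songs[, c("album_name", "energy", "danceability", "explicit")]
)

# Store the logical flag as a factor with readable level names
songs$explicit <- factor(
  songs$explicit,
  levels = c(TRUE, FALSE),
  labels = c("Explicit", "Not explicit")
)

# Attach display labels; table1() uses them as row headings
label(songs$album_name)   <- "Album"
label(songs$energy)       <- "Energy"
label(songs$danceability) <- "Danceability"
label(songs$explicit)     <- "Explicit lyrics"

labelled_table <- table1(
  ~ album_name + energy + danceability + explicit,
  data = songs
)

labelled_table
```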

table1

Let’s explore which album has the most explicit songs

We have to wrangle the data a bit:

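library(dplyr)  # provides %>%, filter(), and mutate()

# Drop songs with a missing explicit flag, then store the flag as a factor
taylor_no_na <- 
  taylor_album_songs %>% 
  filter(!is.na(explicit)) %>% 
  mutate(explicit_factor = factor(explicit))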
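It is worth confirming that the wrangling did what you expect before building the table. The skim summary earlier reports 3 songs with a missing explicit value, so in that version of the data the filter should drop 3 rows. This is a small sanity-check sketch; n_dropped is a name introduced here for illustration.

```r
library(dplyr)
library(taylor)

taylor_no_na <-
  taylor_album_songs %>%
  filter(!is.na(explicit)) %>%
  mutate(explicit_factor = factor(explicit))

# Number of rows removed by the filter (songs with a missing explicit flag)
n_dropped <- nrow(taylor_album_songs) - nrow(taylor_no_na)
n_dropped

# Songs per level of the new factor; these become the column Ns in the table
count(taylor_no_na, explicit_factor)
```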

Create the table

table1(~ album_name + energy + 
         danceability | explicit_factor, 
       caption = "Taylor's explicit albums",
       data = taylor_no_na)
Taylor's explicit albums
FALSE (N=204) TRUE (N=33) Overall (N=237)
album_name
1989 (Taylor's Version) 22 (10.8%) 0 (0%) 22 (9.3%)
evermore 11 (5.4%) 6 (18.2%) 17 (7.2%)
Fearless (Taylor's Version) 26 (12.7%) 0 (0%) 26 (11.0%)
folklore 12 (5.9%) 5 (15.2%) 17 (7.2%)
Lover 18 (8.8%) 0 (0%) 18 (7.6%)
Midnights 15 (7.4%) 9 (27.3%) 24 (10.1%)
Red (Taylor's Version) 28 (13.7%) 2 (6.1%) 30 (12.7%)
reputation 15 (7.4%) 0 (0%) 15 (6.3%)
Speak Now (Taylor's Version) 22 (10.8%) 0 (0%) 22 (9.3%)
Taylor Swift 15 (7.4%) 0 (0%) 15 (6.3%)
THE TORTURED POETS DEPARTMENT 20 (9.8%) 11 (33.3%) 31 (13.1%)
energy
Mean (SD) 0.572 (0.180) 0.483 (0.155) 0.560 (0.179)
Median [Min, Max] 0.577 [0.131, 0.934] 0.462 [0.240, 0.782] 0.565 [0.131, 0.934]
danceability
Mean (SD) 0.577 (0.115) 0.580 (0.128) 0.578 (0.117)
Median [Min, Max] 0.588 [0.292, 0.897] 0.604 [0.316, 0.867] 0.589 [0.292, 0.897]
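The same question (which album has the most explicit songs) can be cross-checked outside of table1 with a short dplyr summary. This is a sketch under the assumption that dplyr and taylor are installed; explicit_by_album and its column names are made up for this example.

```r
library(dplyr)
library(taylor)

explicit_by_album <-
  taylor_album_songs %>%
  filter(!is.na(explicit)) %>%
  group_by(album_name) %>%
  summarise(
    n_songs = n(),                       # songs with a known explicit flag
    n_explicit = sum(explicit),          # TRUE counts as 1
    pct_explicit = 100 * mean(explicit), # share of explicit songs within the album
    .groups = "drop"
  ) %>%
  arrange(desc(n_explicit))

explicit_by_album
```

Sorting by n_explicit ranks albums by count, while pct_explicit ranks them by share. The two orderings can differ, which is why the choice of column versus row percentages in a table matters.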

Can we make it prettier?

gtsummary

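library(gtsummary)  # provides tbl_summary(), add_p(), and the modify_*() helpers

taylor_no_na %>%
  tbl_summary(
    by = explicit_factor, 
    include = c(album_name, energy, danceability), 
    label = list(album_name = "Album Name",
                 energy = "Energy",
                 danceability = "Danceability"),
    percent = "row"  # percentages sum to 100% across each row
  ) %>%
  add_p() %>%  # add a p-value column comparing the two groups
  modify_header(label = "") %>% 
  modify_caption("**Taylor's explicit albums**")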
Taylor’s explicit albums
FALSE, N = 204¹ TRUE, N = 33¹ p-value²
Album Name
    1989 (Taylor's Version) 22 (100%) 0 (0%)
    evermore 11 (65%) 6 (35%)
    Fearless (Taylor's Version) 26 (100%) 0 (0%)
    folklore 12 (71%) 5 (29%)
    Lover 18 (100%) 0 (0%)
    Midnights 15 (62%) 9 (38%)
    Red (Taylor's Version) 28 (93%) 2 (6.7%)
    reputation 15 (100%) 0 (0%)
    Speak Now (Taylor's Version) 22 (100%) 0 (0%)
    Taylor Swift 15 (100%) 0 (0%)
    THE TORTURED POETS DEPARTMENT 20 (65%) 11 (35%)
Energy 0.58 (0.45, 0.71) 0.46 (0.36, 0.60) 0.006
Danceability 0.59 (0.50, 0.66) 0.60 (0.51, 0.65) 0.8
¹ n (%); Median (IQR)
² Wilcoxon rank sum test
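By default tbl_summary() reports the median and IQR for continuous variables, as the footnote shows. If you prefer the mean and SD, the statistic and digits arguments let you change that. The following is a sketch, assuming gtsummary, dplyr, and taylor are installed; summary_table is an illustrative name, and the choice of mean (SD) is only an example.

```r
library(dplyr)
library(gtsummary)
library(taylor)

summary_table <-
  taylor_album_songs %>%
  filter(!is.na(explicit)) %>%
  mutate(
    explicit_factor = factor(explicit, labels = c("Not Explicit", "Explicit"))
  ) %>%
  tbl_summary(
    by = explicit_factor,
    include = c(energy, danceability),
    label = list(energy = "Energy", danceability = "Danceability"),
    # Report mean (SD) instead of the default median (IQR)
    statistic = all_continuous() ~ "{mean} ({sd})",
    # Round continuous summaries to two decimals
    digits = all_continuous() ~ 2
  ) %>%
  add_overall() %>%  # add a column summarising all songs together
  bold_labels()      # bold the variable names

summary_table
```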

And better?…

gt

library(gt)  # provides tab_style(), cell_text(), and cells_body()

taylor_no_na %>%
  mutate(explicit_factor = factor(explicit, labels = c("Not Explicit", "Explicit"))) %>% 
  tbl_summary(
    by = explicit_factor, 
    include = c(album_name), 
    label = list(album_name = "Album Name"), 
    percent = "row"
  ) %>%
  modify_header(label = "") %>% 
  modify_caption("**Taylor's explicit albums**") %>% 
  as_gt() %>%  # convert the gtsummary object to a gt table
  # Grey, right-aligned text for both statistic columns
  tab_style(
    style = cell_text(color = "darkgrey", 
                      align = "right"),
    locations = cells_body(
      columns = c(stat_1, stat_2)
    )) %>%
  # Bold, colored text to draw attention to the key cell
  tab_style(
    style = cell_text(color = "#E5446D", 
                      weight = "bold"),
    locations = cells_body(
      columns = c(stat_2),
      rows = label == "Midnights"
    )
  )
Taylor’s explicit albums
Not Explicit, N = 204¹ Explicit, N = 33¹
Album Name
    1989 (Taylor's Version) 22 (100%) 0 (0%)
    evermore 11 (65%) 6 (35%)
    Fearless (Taylor's Version) 26 (100%) 0 (0%)
    folklore 12 (71%) 5 (29%)
    Lover 18 (100%) 0 (0%)
    Midnights 15 (62%) 9 (38%)
    Red (Taylor's Version) 28 (93%) 2 (6.7%)
    reputation 15 (100%) 0 (0%)
    Speak Now (Taylor's Version) 22 (100%) 0 (0%)
    Taylor Swift 15 (100%) 0 (0%)
    THE TORTURED POETS DEPARTMENT 20 (65%) 11 (35%)
¹ n (%)
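The highlighted row above is picked by name (label == "Midnights"). The rows argument of cells_body() also accepts a condition on the data, so the highlight can follow the numbers instead. The sketch below shows this on a plain gt table and is not the authors' code; explicit_pct, pct_explicit, and highlight_table are names made up for the example, and it assumes gt, dplyr, and taylor are installed.

```r
library(dplyr)
library(gt)
library(taylor)

# Share of explicit songs per album (names are illustrative)
explicit_pct <-
  taylor_album_songs %>%
  filter(!is.na(explicit)) %>%
  group_by(album_name) %>%
  summarise(pct_explicit = mean(explicit), .groups = "drop")

highlight_table <-
  explicit_pct %>%
  gt() %>%
  cols_label(album_name = "Album", pct_explicit = "Explicit songs") %>%
  # Show proportions as whole-number percentages
  fmt_percent(columns = pct_explicit, decimals = 0) %>%
  # Highlight whichever album has the largest share; no hard-coded name
  tab_style(
    style = cell_text(color = "#E5446D", weight = "bold"),
    locations = cells_body(
      columns = pct_explicit,
      rows = pct_explicit == max(pct_explicit)
    )
  )

highlight_table
```

In the table above, Midnights has the largest share of explicit songs (38%), so with that version of the data the condition should select the same row as the hard-coded name.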

Critique Taylor Swift Table

Taylor’s explicit albums
Not Explicit, N = 204¹ Explicit, N = 33¹
Album Name
    1989 (Taylor's Version) 22 (100%) 0 (0%)
    evermore 11 (65%) 6 (35%)
    Fearless (Taylor's Version) 26 (100%) 0 (0%)
    folklore 12 (71%) 5 (29%)
    Lover 18 (100%) 0 (0%)
    Midnights 15 (62%) 9 (38%)
    Red (Taylor's Version) 28 (93%) 2 (6.7%)
    reputation 15 (100%) 0 (0%)
    Speak Now (Taylor's Version) 22 (100%) 0 (0%)
    Taylor Swift 15 (100%) 0 (0%)
    THE TORTURED POETS DEPARTMENT 20 (65%) 11 (35%)
¹ n (%)
  1. Heads ✅
  2. Subtle Dividers ✅
  3. Right-Align Numbers ✅
  4. Left-Align Text ✅
  5. Precision ✅
  6. Space ✅
  7. Remove Repetition ❌
  8. Outliers
  9. Group Similar ✅
  10. Visualizations
  11. Key Point(s) ✅
  12. Annotations ✅
  13. Caption/Title ✅

Credit

Slides 3 to 16 are an exact copy of Dr. Ray Balise’s Tables lesson for the BST-623 class.

The end…