Session 2 - Shake It Off: Mastering Data Tables with Taylor Swift’s Tracks

Raymond Balise, Ph.D., and Catalina Cañizares, Ph.D.

2024-06-05

About This Material

Session 2 - Shake It Off: Mastering Data Tables with Taylor Swift’s Tracks © 2024 by Raymond Balise and Catalina Cañizares is licensed under CC BY-NC-ND 4.0

This material is freely available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Some sections are based on content from other presentations, which are credited at the end of this presentation.

For more information on this license, please visit: Creative Commons License

Objectives for Today

Understand when to use tables vs. graphics
Understand guiding principles for table design
Understand the many tools/packages for making pretty tables in R
Know how to use table1 to make tables
Know how to use gt and gtsummary to make complex tables

When Dr. Balise was your age…

When I was in school, journal tables (and figures) were monochrome and simple.
Now we have tables that are artfully formatted and have art that is ready for Twitter.
The distinction between tables and figures has completely blurred.

When you were your age…

The state-of-the-art tables are interactive and have embedded graphics.

Source

When should you use a table?

Tables are useful to show the exact values of your data or estimates.
They are not the best way to show large amounts of data or to present data compactly.
They are not usually intended to give a quick, visual representation of data.
- However, the core psychological principles leading to good graphics apply to both charts and tables.
- Read/Study/Live Visualizing Data and The Elements of Graphing Data by William S. Cleveland.

Tufte on Visualizations

Edward Tufte, one of the leading thinkers on visualization in the 1980s, stressed the importance of erasing all unnecessary ink from the page. The same holds true for tables.
When in doubt, erase.

Gestalt Principles

Gestalt Law of Proximity –> Things close together are automatically grouped
Gestalt Law of Similarity –> Things of the same color will be grouped. If one thing is a different color, it will pop off the page
Gestalt Law of Closure –> You can draw part of a bounding box and the brain will fill in the rest

Schwabish’s Rules for Better Tables

Schwabish’s Ten Rules for Better Tables + Balise Three

Offset the Heads from the Body
Use Subtle Dividers Rather Than Heavy Gridlines
Right-Align Numbers and Heads
Left-Align Text and Heads
Select the Appropriate Level of Precision
Guide Your Reader with Space between Rows and Columns
Remove Unit Repetition
Highlight Outliers
Group Similar Data and Increase White Space
Add Visualizations When Appropriate
Draw Attention to the Key Point(s)
Use Annotations to Explain the Statistics
Make Captions/Titles Self-Contained

Explain Sample Size, Who and When
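Several of these rules (right-align numbers, choose a sensible precision, state the unit once in the head) can be tried in base R before reaching for a table package. A minimal sketch, assuming the three tempo values are the min, median, and max that `skim()` reports for `taylor_album_songs` later in this session; the `Tempo (BPM)` head and the `stub`/`values` names are illustrative choices, not part of any package.

```r
# Tempo (beats per minute): p0, p50, and p100 reported by skim(taylor_album_songs)
tempo <- c(min = 68.53, median = 119.97, max = 208.92)

# One decimal is enough precision for tempo; width = 6 pads on the left,
# so the decimal points line up (right-aligned numbers)
tempo_cells <- formatC(tempo, format = "f", digits = 1, width = 6)

# Text column: format() pads on the right, giving left-aligned text and head
stub <- format(c("Statistic", names(tempo)))

# Number column: the unit appears once, in the head; formatC() right-aligns strings
values <- formatC(c("Tempo (BPM)", tempo_cells), width = 11)

cat(paste(stub, values), sep = "\n")
```

Keeping "BPM" in the head instead of every cell is the "Remove Unit Repetition" rule in its simplest form.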

What is the Key Take-Away Point in These Data?

Critique PReP Table 1 - Excel Default

Heads ✅
Subtle Dividers ❌
Right-Align Numbers ❌
Left-Align Text ❌
Precision ✅
Space ✅
Remove Repetition ❌
Outliers
Group Similar ✅
Visualizations
Key Point(s) ❌
Annotations
Caption/Title ❌

Critique PReP Table 1 - Also Excel

Heads ✅
Subtle Dividers ✅
Right-Align Numbers ❌
Left-Align Text ✅
Precision ✅
Space ✅
Remove Repetition ❌
Outliers
Group Similar ✅
Visualizations
Key Point(s) ✅
Annotations
Caption/Title ✅

Critique PReP Table 1 - (Version 1.1)

Heads ✅
Subtle Dividers ✅
Right-Align Numbers ❌
Left-Align Text ✅
Precision ✅
Space ✅
Remove Repetition ✅
Outliers
Group Similar ✅
Visualizations
Key Point(s) ✅
Annotations
Caption/Title ✅

I wish there were a “best” tool.

You have many package options for making static (e.g., Word, PDF, or HTML) and dynamic (e.g., HTML) publication-ready tables.
All of these packages make web-friendly tables.
Some make static tables that are beautifully formatted for the web: gt, kableExtra.
Others make static tables that look great in Word: flextable, huxtable.
Some are ideal for interactive web content: DT, reactable.

Output for Packages that Make Tables

Not all packages support all R Markdown output formats.

Let’s create the tables

The data

# install.packages("taylor")
library(taylor)
taylor_album_songs

# A tibble: 240 × 29
   album_name   ep    album_release track_number track_name     artist featuring
   <chr>        <lgl> <date>               <int> <chr>          <chr>  <chr>    
 1 Taylor Swift FALSE 2006-10-24               1 Tim McGraw     Taylo… <NA>     
 2 Taylor Swift FALSE 2006-10-24               2 Picture To Bu… Taylo… <NA>     
 3 Taylor Swift FALSE 2006-10-24               3 Teardrops On … Taylo… <NA>     
 4 Taylor Swift FALSE 2006-10-24               4 A Place In Th… Taylo… <NA>     
 5 Taylor Swift FALSE 2006-10-24               5 Cold As You    Taylo… <NA>     
 6 Taylor Swift FALSE 2006-10-24               6 The Outside    Taylo… <NA>     
 7 Taylor Swift FALSE 2006-10-24               7 Tied Together… Taylo… <NA>     
 8 Taylor Swift FALSE 2006-10-24               8 Stay Beautiful Taylo… <NA>     
 9 Taylor Swift FALSE 2006-10-24               9 Should've Sai… Taylo… <NA>     
10 Taylor Swift FALSE 2006-10-24              10 Mary's Song (… Taylo… <NA>     
# ℹ 230 more rows
# ℹ 22 more variables: bonus_track <lgl>, promotional_release <date>,
#   single_release <date>, track_release <date>, danceability <dbl>,
#   energy <dbl>, key <int>, loudness <dbl>, mode <int>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, time_signature <int>, duration_ms <int>, explicit <lgl>,
#   key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <list>

The data

library(skimr)
skim(taylor_album_songs)

Data summary
Name	taylor_album_songs
Number of rows	240
Number of columns	29
_______________________
Column type frequency:
character	7
Date	4
list	1
logical	3
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
album_name	0	1.00	5	29	11
track_name	0	1.00	3	68	240
artist	3	0.99	12	12	1
featuring	217	0.10	4	33	20
key_name	3	0.99	1	2	12
mode_name	3	0.99	5	5	2
key_mode	3	0.99	7	8	19

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
album_release	0	1.00	2006-10-24	2024-04-19	2021-11-12	11
promotional_release	228	0.05	2017-10-20	2023-11-29	2021-05-31	12
single_release	211	0.12	2006-06-19	2024-04-19	2020-01-27	29
track_release	0	1.00	2006-06-19	2024-04-19	2021-11-12	27

Variable type: list

skim_variable	n_missing	complete_rate	n_unique	min_length	max_length
lyrics	0	1	238	4	4

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
ep	0	1.00	0.00	FAL: 240
bonus_track	0	1.00	0.15	FAL: 203, TRU: 37
explicit	3	0.99	0.14	FAL: 204, TRU: 33

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
track_number	0	1.00	11.98	7.31	1.00	6.00	11.00	17.00	31.00	▇▇▆▃▁
danceability	3	0.99	0.58	0.12	0.29	0.50	0.59	0.65	0.90	▂▅▇▃▁
energy	3	0.99	0.56	0.18	0.13	0.42	0.56	0.70	0.93	▂▆▇▇▃
key	3	0.99	4.39	3.54	0.00	1.00	4.00	7.00	11.00	▇▂▂▃▃
loudness	3	0.99	-7.77	2.78	-15.49	-9.73	-7.38	-5.77	-1.91	▁▃▆▇▂
mode	3	0.99	0.90	0.30	0.00	1.00	1.00	1.00	1.00	▁▁▁▁▇
speechiness	3	0.99	0.06	0.05	0.02	0.03	0.04	0.06	0.52	▇▁▁▁▁
acousticness	3	0.99	0.34	0.33	0.00	0.03	0.21	0.67	0.97	▇▂▂▂▃
instrumentalness	3	0.99	0.00	0.03	0.00	0.00	0.00	0.00	0.33	▇▁▁▁▁
liveness	3	0.99	0.14	0.08	0.04	0.09	0.12	0.15	0.61	▇▂▁▁▁
valence	3	0.99	0.38	0.19	0.04	0.25	0.37	0.51	0.92	▅▇▇▃▁
tempo	3	0.99	124.14	31.94	68.53	96.97	119.97	148.04	208.92	▆▇▆▅▁
time_signature	3	0.99	3.96	0.34	1.00	4.00	4.00	4.00	5.00	▁▁▁▇▁
duration_ms	3	0.99	237577.34	47151.74	131907.00	210240.00	233627.00	257773.00	613027.00	▆▇▁▁▁
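`duration_ms` is printed to the millisecond and then two more decimals, which is far more precision than a reader needs. A minimal sketch of converting milliseconds to minutes:seconds before tabulating; `ms_to_min_sec()` is a hypothetical helper, and the inputs are the p0, mean, and p100 of `duration_ms` from the skim output above.

```r
# Hypothetical helper: milliseconds -> "m:ss", rounded to the nearest second
ms_to_min_sec <- function(ms) {
  total_s <- as.integer(round(ms / 1000))
  sprintf("%d:%02d", total_s %/% 60L, total_s %% 60L)
}

# p0, mean, and p100 of duration_ms from skim(taylor_album_songs)
durations <- ms_to_min_sec(c(131907, 237577.34, 613027))
durations  # returns "2:12", "3:58", "10:13"
```

The same helper could be applied inside a `mutate()` before building a table, so the precision choice is made once.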

Data dictionary

`table1`

The simplest way to create a nice table (in my opinion)

library(table1)

table1(~ album_name + energy + danceability + 
         explicit, data = taylor_album_songs)

	Overall (N=240)
album_name
1989 (Taylor's Version)	23 (9.6%)
evermore	17 (7.1%)
Fearless (Taylor's Version)	26 (10.8%)
folklore	17 (7.1%)
Lover	18 (7.5%)
Midnights	26 (10.8%)
Red (Taylor's Version)	30 (12.5%)
reputation	15 (6.3%)
Speak Now (Taylor's Version)	22 (9.2%)
Taylor Swift	15 (6.3%)
THE TORTURED POETS DEPARTMENT	31 (12.9%)
energy
Mean (SD)	0.560 (0.179)
Median [Min, Max]	0.565 [0.131, 0.934]
Missing	3 (1.3%)
danceability
Mean (SD)	0.578 (0.117)
Median [Min, Max]	0.589 [0.292, 0.897]
Missing	3 (1.3%)
explicit
Yes	33 (13.8%)
No	204 (85.0%)
Missing	3 (1.3%)

`table1`

Let’s explore which album has the most explicit songs

We have to wrangle the data a bit:

library(dplyr)

# Drop the 3 tracks with unknown explicit status and make a grouping factor
taylor_no_na <- 
  taylor_album_songs %>% 
  filter(!is.na(explicit)) %>% 
  mutate(explicit_factor = factor(explicit))
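Why the column heads come out as `FALSE` and `TRUE`: `factor()` builds its levels from the sorted unique non-missing values, so a logical column always becomes `FALSE` then `TRUE`. A small base-R sketch on a hypothetical toy vector (standing in for `explicit`) shows that `labels =`, as used later in the `gt` example, is matched to that sorted order, and that `NA` stays `NA` rather than becoming a level.

```r
# Toy logical vector standing in for taylor_album_songs$explicit (hypothetical values)
explicit <- c(TRUE, FALSE, NA, FALSE)

# Default: levels are the sorted unique non-missing values, FALSE before TRUE
f_default <- factor(explicit)
levels(f_default)

# labels are matched to those sorted levels: FALSE -> "Not Explicit", TRUE -> "Explicit"
f_labeled <- factor(explicit, labels = c("Not Explicit", "Explicit"))
as.character(f_labeled)
```

Reversing the order of `labels` would silently mislabel every track, so it is worth checking the levels before relabeling.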

Create the table

table1(~ album_name + energy + 
         danceability | explicit_factor, 
       caption = "Taylor's explicit albums",
       data = taylor_no_na)

Taylor's explicit albums
	FALSE (N=204)	TRUE (N=33)	Overall (N=237)
album_name
1989 (Taylor's Version)	22 (10.8%)	0 (0%)	22 (9.3%)
evermore	11 (5.4%)	6 (18.2%)	17 (7.2%)
Fearless (Taylor's Version)	26 (12.7%)	0 (0%)	26 (11.0%)
folklore	12 (5.9%)	5 (15.2%)	17 (7.2%)
Lover	18 (8.8%)	0 (0%)	18 (7.6%)
Midnights	15 (7.4%)	9 (27.3%)	24 (10.1%)
Red (Taylor's Version)	28 (13.7%)	2 (6.1%)	30 (12.7%)
reputation	15 (7.4%)	0 (0%)	15 (6.3%)
Speak Now (Taylor's Version)	22 (10.8%)	0 (0%)	22 (9.3%)
Taylor Swift	15 (7.4%)	0 (0%)	15 (6.3%)
THE TORTURED POETS DEPARTMENT	20 (9.8%)	11 (33.3%)	31 (13.1%)
energy
Mean (SD)	0.572 (0.180)	0.483 (0.155)	0.560 (0.179)
Median [Min, Max]	0.577 [0.131, 0.934]	0.462 [0.240, 0.782]	0.565 [0.131, 0.934]
danceability
Mean (SD)	0.577 (0.115)	0.580 (0.128)	0.578 (0.117)
Median [Min, Max]	0.588 [0.292, 0.897]	0.604 [0.316, 0.867]	0.589 [0.292, 0.897]

Can we make it prettier?

`gtsummary`

library(gtsummary)

taylor_no_na %>%
  tbl_summary(
    by = explicit_factor, 
    include = c(album_name, energy, danceability), 
    label = list(album_name = "Album Name",
                 energy = "Energy",
                 danceability = "Danceability"),
    percent = "row"
  ) %>%
  add_p() %>%
  modify_header(label = "") %>% 
  modify_caption("**Taylor's explicit albums**")

**Taylor’s explicit albums**
	FALSE, N = 204¹	TRUE, N = 33¹	p-value²
Album Name
1989 (Taylor's Version)	22 (100%)	0 (0%)
evermore	11 (65%)	6 (35%)
Fearless (Taylor's Version)	26 (100%)	0 (0%)
folklore	12 (71%)	5 (29%)
Lover	18 (100%)	0 (0%)
Midnights	15 (62%)	9 (38%)
Red (Taylor's Version)	28 (93%)	2 (6.7%)
reputation	15 (100%)	0 (0%)
Speak Now (Taylor's Version)	22 (100%)	0 (0%)
Taylor Swift	15 (100%)	0 (0%)
THE TORTURED POETS DEPARTMENT	20 (65%)	11 (35%)
Energy	0.58 (0.45, 0.71)	0.46 (0.36, 0.60)	0.006
Danceability	0.59 (0.50, 0.66)	0.60 (0.51, 0.65)	0.8
¹ n (%); Median (IQR)
² Wilcoxon rank sum test
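The `table1` and `gtsummary` tables report different percentages for the same counts. `table1` divides each count by its column total (9 of the 33 explicit tracks are on Midnights, 27.3%), while `percent = "row"` divides by the album total (9 of Midnights' 24 tracks, shown rounded as 38%). A base-R sketch of both calculations, using the counts copied from the `table1` output above:

```r
# Counts of non-explicit / explicit tracks per album, copied from the table1 output
counts <- matrix(
  c(22, 0,  11, 6,  26, 0,  12, 5,  18, 0,  15, 9,
    28, 2,  15, 0,  22, 0,  15, 0,  20, 11),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    album = c("1989 (Taylor's Version)", "evermore", "Fearless (Taylor's Version)",
              "folklore", "Lover", "Midnights", "Red (Taylor's Version)",
              "reputation", "Speak Now (Taylor's Version)", "Taylor Swift",
              "THE TORTURED POETS DEPARTMENT"),
    explicit = c("FALSE", "TRUE")
  )
)

# Column percentages (what table1 shows): each column sums to 100
col_pct <- 100 * prop.table(counts, margin = 2)

# Row percentages (percent = "row" in tbl_summary()): each row sums to 100
row_pct <- 100 * prop.table(counts, margin = 1)

round(col_pct["Midnights", ], 1)
round(row_pct["Midnights", ], 1)
```

Row percentages answer "how explicit is this album?", which is the question on the slide; column percentages answer "where do the explicit songs live?".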

And better?…

`gt`

library(dplyr)
library(gtsummary)
library(gt)

taylor_no_na %>%
  mutate(explicit_factor = factor(explicit, labels = c("Not Explicit", "Explicit"))) %>% 
  tbl_summary(
    by = explicit_factor, 
    include = c(album_name), 
    label = list(album_name = "Album Name"), 
    percent = "row"
  ) %>%
  modify_header(label = "") %>% 
  modify_caption("**Taylor's explicit albums**") %>% 
  as_gt() %>% 
  # Grey, right-aligned counts in both statistic columns
  tab_style(
    style = cell_text(color = "darkgrey", align = "right"),
    locations = cells_body(columns = c(stat_1, stat_2))
  ) %>%
  # Highlight the Midnights cell in the Explicit column
  tab_style(
    style = cell_text(color = "#E5446D", weight = "bold"),
    locations = cells_body(
      columns = c(stat_2),
      rows = label == "Midnights"
    )
  )

**Taylor’s explicit albums**
	Not Explicit, N = 204¹	Explicit, N = 33¹
Album Name
1989 (Taylor's Version)	22 (100%)	0 (0%)
evermore	11 (65%)	6 (35%)
Fearless (Taylor's Version)	26 (100%)	0 (0%)
folklore	12 (71%)	5 (29%)
Lover	18 (100%)	0 (0%)
Midnights	15 (62%)	9 (38%)
Red (Taylor's Version)	28 (93%)	2 (6.7%)
reputation	15 (100%)	0 (0%)
Speak Now (Taylor's Version)	22 (100%)	0 (0%)
Taylor Swift	15 (100%)	0 (0%)
THE TORTURED POETS DEPARTMENT	20 (65%)	11 (35%)
¹ n (%)
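The `gt` code hard-codes `"Midnights"` as the row to highlight. A minimal sketch of choosing that row from the data instead, using the counts from the `table1` output for the five albums that have any explicit tracks (`top_by_share` and `top_by_count` are illustrative names):

```r
# Explicit and total track counts, copied from the table1 output
# (all other albums have 0 explicit tracks)
explicit_n <- c("evermore" = 6, "folklore" = 5, "Midnights" = 9,
                "Red (Taylor's Version)" = 2, "THE TORTURED POETS DEPARTMENT" = 11)
album_n    <- c(17, 17, 24, 30, 31)

# Share of each album's tracks that are explicit (the row percentage)
explicit_share <- explicit_n / album_n

# Candidate albums to emphasize: highest share vs. highest raw count
top_by_share <- names(which.max(explicit_share))
top_by_count <- names(which.max(explicit_n))
top_by_share
top_by_count
```

The two criteria disagree: Midnights has the largest share (9 of 24), while THE TORTURED POETS DEPARTMENT has the most explicit songs (11). In the `tab_style()` call, `rows = label == top_by_share` should then stand in for the hard-coded name; which criterion to use is part of deciding the table's key point.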

Critique Taylor Swift Table


Heads ✅
Subtle Dividers ✅
Right-Align Numbers ✅
Left-Align Text ✅
Precision ✅
Space ✅
Remove Repetition ❌
Outliers
Group Similar ✅
Visualizations
Key Point(s) ✅
Annotations ✅
Caption/Title ✅
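The ❌ for repetition comes from " (Taylor's Version)" appearing on four album names. One possible fix, sketched in base R under the assumption that a single footnote will carry that information: strip the suffix and mark the affected rows with a dagger. The variable names are illustrative.

```r
albums <- c("1989 (Taylor's Version)", "evermore", "Fearless (Taylor's Version)",
            "folklore", "Lover", "Midnights", "Red (Taylor's Version)",
            "reputation", "Speak Now (Taylor's Version)", "Taylor Swift",
            "THE TORTURED POETS DEPARTMENT")

suffix <- " (Taylor's Version)"
is_tv  <- endsWith(albums, suffix)  # which names carry the repeated suffix

# Drop the suffix, then flag re-recordings with a dagger explained once
short        <- ifelse(is_tv, substr(albums, 1, nchar(albums) - nchar(suffix)), albums)
short_marked <- ifelse(is_tv, paste0(short, "\u2020"), short)
footnote     <- "\u2020 Taylor's Version (re-recording)"

short_marked
```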

Credit

Slides 3 to 16 are an exact copy of Dr. Ray Balise’s Tables lesson for the BST-623 class.


The end…
