Session 2 - `ggplot2`

Catalina Cañizares PhD

Francisco Cardozo MSc

Raymond Balise PhD

June 5, 2024

About This Material

Session 2 - ggplot2 © 2024 by Catalina Cañizares, Francisco Cardozo, and Raymond Balise is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International.

This material is freely available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

For more information on this license, please visit: Creative Commons License

For whom is this workshop?

Agenda

Why should we care?

I have 4 different data sets with 11 observations each and 2 variables:

Table

Let's look at them:
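The description above (four data sets, 11 observations, two variables each) matches Anscombe's quartet, which ships with base R as `anscombe`. Assuming that is the data behind the table, here is a minimal sketch of why a table alone is not enough: the summary statistics of the four sets are nearly identical.

```r
# Assumption: the four data sets are Anscombe's quartet (base R `anscombe`,
# with columns x1..x4 and y1..y4).
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set = i,
    mean_x = mean(x),
    mean_y = mean(y),
    cor_xy = cor(x, y)
  )
}))

# One row per data set; compare the columns across rows.
print(round(quartet_summary, 2))
```

The means and correlations barely differ across the four rows, so only a plot of each set can reveal how different the underlying patterns are.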

Importance of Data Viz

Uncovering Patterns: Data visualization helps in identifying and understanding complex patterns and relationships in data.
Simplifying the Complex: It transforms intricate data into a format that’s easier to grasp, effectively narrating a story.
Engaging Presentation: Visual representations are more visually appealing and engaging for the audience.
Effective Communication: Enables clear and concise communication of information, facilitating better understanding and decision-making.

Gestalt

Gestalt psychology is a school of thought dedicated to understanding the human brain’s perception and interpretation of experiences.

Researchers in this field have identified various laws of perception or principles, which are based on our inherent tendency to find order in chaos.

The principles applied to data visualization

Proximity

Things that are spatially near to one another seem to be related.

Similarity

Things that look alike seem to be related.

Closure

Our brains tend to ignore gaps and complete structures that have open areas.

Connection

Things that are visually tied to one another seem to be related.

Preattentive Processing or Pop-out

Some objects in our visual field are easier to see than others.

Pop-out makes some things on a data graphic easier to see or find than others.

Find the blue circle:

Preattentive Processing or Pop-out

Foundations Set, Let’s Navigate Further!

🌟 What We’ve Learned:

Why we should care about data visualization.
Leveraging Our Brain’s Perception to Make Clearer Graphs.

Next Step: Perfecting Graph Construction

Crafting Clear and Compelling Graphs

The Musts in a Graph

Ease of Interpretation: Do not require the reader to think unnecessarily hard.
Cognitive Comfort: Lighten the cognitive load of processing the graph.
Honest Scaling: Avoid misleading through y-axis manipulation.

Graphical excellence is the well-designed presentation of interesting data (Tufte, 1983, p. 51).

Ease of Interpretation

What are the common sources from which adolescents obtain the alcohol they consume?

Ease of Interpretation

Perspective Distortion
Inconsistent Baselines
Visual Complexity
Distraction from the Data

Perspective Distortion: The 3D perspective can distort the perception of the bar’s height because the top of the bar is not aligned with the corresponding value on the y-axis due to the angle of view. This makes it challenging to accurately determine the value at a glance.
Inconsistent Baselines: Depending on the angle, the baselines of the bars may not appear consistent across the graph, which can lead to misinterpretation of the comparative lengths of the bars.
Visual Complexity: The 3D effect adds visual complexity that is unnecessary for interpreting the data and can lead to cognitive overload, as the brain has to work harder to decipher the information.
Distraction from the Data: The 3D effect can draw attention away from the actual data being presented, as the aesthetic can overshadow the message.

Cognitive Comfort

What is the distribution of family-provided alcohol to adolescents across different racial groups?

Cognitive Comfort

Difficulty in Comparing Sections
Ineffective for Large Number of Categories
Reliance on Color or Patterns
Area Perception Issues
Difficulty in Reading Exact Values

Difficulty in Comparing Sections: It’s hard to judge the size of angles accurately, making it challenging to compare the size of different sections, especially when they are of similar size.

Ineffective for Large Number of Categories: Pie charts become cluttered and unreadable when there are many categories. The slices become too thin to be distinct or labeled clearly.

Reliance on Color or Patterns: To differentiate slices, pie charts often rely on different colors or patterns, which can be a problem for color-blind readers or when reproducing in black and white.

Misleading with Improper Use: When pie charts are not drawn to scale or have a 3D design, they can misrepresent the data, making some slices appear larger or smaller than their actual proportion.

Area Perception Issues: Humans are better at comparing lengths than areas. With pie charts, we are comparing areas (or angles), which is less intuitive and can lead to misinterpretation.

Difficulty in Reading Exact Values: It’s often difficult to read exact values from a pie chart unless they are annotated, and annotations can clutter the chart.

Honest Scaling

This chart shows the average number of Facebook likes on posts by pages of the political left. The point of the chart was to show the disparity between Mr Corbyn’s posts and the others.
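The original chart is not reproduced here. As a generic sketch of how y-axis choices change the story, the code below draws the same bars twice: once with a truncated axis and once with a zero baseline. The data frame `likes` and its values are made up for illustration and are not the data from the chart.

```r
library(ggplot2)

# Hypothetical values, made up for illustration; not the data from the chart.
likes <- data.frame(
  page = c("Page A", "Page B", "Page C"),
  avg_likes = c(980, 1000, 1040)
)

base_plot <- ggplot(likes, aes(x = page, y = avg_likes)) +
  geom_col()

# Truncated axis: coord_cartesian() zooms in without dropping data, so a
# difference of about 6% is drawn as a several-fold difference in bar height.
truncated <- base_plot + coord_cartesian(ylim = c(970, 1050))

# Zero baseline: bar length is proportional to the value.
honest <- base_plot + coord_cartesian(ylim = c(0, 1050))

print(truncated)
print(honest)
```

For bar charts, the zero baseline is usually the safer default because readers judge bars by their length.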

Quick Experiment

https://ig.ft.com/science-of-charts/

These results replicate

In the 1980s, Cleveland and McGill ran experiments in which participants estimated and compared values in charts.

The results of Cleveland and McGill

The overall pattern of results seems clear: performance worsens substantially as we move away from comparisons on a common scale to length-based comparisons, then to angles, and finally to areas.

In conclusion

These findings strongly suggest that there are better and worse ways of visually representing data when estimating and comparing values within the graph.
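To see the ranking of encodings in practice, the sketch below draws the same five values twice: as bars (position on a common scale) and as a pie (angles and areas). The data frame `shares` is hypothetical and the values are made up for illustration.

```r
library(ggplot2)

# Hypothetical values, made up for illustration only.
shares <- data.frame(
  category = c("A", "B", "C", "D", "E"),
  value = c(23, 21, 20, 19, 17)
)

# Position on a common scale: small differences are easy to rank.
bar_plot <- ggplot(shares, aes(x = category, y = value)) +
  geom_col()

# Angle and area encoding: the same values drawn as a pie chart.
pie_plot <- ggplot(shares, aes(x = "", y = value, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

print(bar_plot)
print(pie_plot)
```

Try ranking the categories from each plot: the bars make the order readable at a glance, while the similarly sized pie slices are much harder to tell apart.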

Key Takeaways:

🚫💭 Anything that makes a viewer need to think is bad!

📊 3D Charts: Misleading and Overcomplicated

🥧 Pie Charts: Hard to Compare Accurately

🍩 Donut Charts: Stylish, Yet Less Functional

📚 Stacked Graphics: Can Obscure Data Details

🖋️ Extra Ink: Clutters and Confuses

🔑 Key/Legend: Necessary but Keep It Simple

Applying the Takeaways

Designing with Precision: Checklist

🔍 Include Sample Size in Title/Caption.

🗑️ Cut Extra Information.

🖌️ Remove Unnecessary Ink.

💡 Highlight Key Points.

🌈 Use Mnemonic Colors (Color-Blind Friendly).

🔖 Directly Label Categories.

📏 Think about the Range of the Y Axis
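One way the checklist could translate into code is sketched below. The data frame `sources`, its counts, and the choice of "Family" as the key point are all hypothetical, made up for illustration.

```r
library(ggplot2)

# Hypothetical counts, made up for illustration only.
sources <- data.frame(
  source = c("Friends", "Family", "Store", "Other"),
  n = c(120, 80, 30, 20)
)
sources$percent <- 100 * sources$n / sum(sources$n)
sources$highlight <- sources$source == "Family"

checklist_plot <- ggplot(
  sources,
  aes(x = reorder(source, percent), y = percent, fill = highlight)
) +
  # No legend: the categories are labeled directly on the axis.
  geom_col(show.legend = FALSE) +
  # Direct value labels, so the percent axis text can be removed.
  geom_text(aes(label = paste0(round(percent), "%")), hjust = -0.2) +
  # Two colors from the Okabe-Ito palette: grey for context, orange for the key point.
  scale_fill_manual(values = c(`FALSE` = "#999999", `TRUE` = "#E69F00")) +
  # Start at zero and leave room for the labels.
  scale_y_continuous(limits = c(0, 60)) +
  coord_flip() +
  labs(
    title = "Source of alcohol (hypothetical data)",
    caption = paste0("N = ", sum(sources$n)),
    x = NULL,
    y = NULL
  ) +
  # Remove ink that carries no information.
  theme_minimal() +
  theme(panel.grid = element_blank(), axis.text.x = element_blank())

print(checklist_plot)
```

Each checklist item maps to one or two lines here, which makes it easy to drop or adjust a step for a different graph.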

We found

`ggplot2`

On June 10, 2007, Hadley Wickham officially released ggplot2.
It is an open-source data visualization package for R.
ggplot2 is an implementation of Leland Wilkinson’s Grammar of Graphics, a general scheme for data visualization.
It is a system for declaratively creating graphics.

The components of a `ggplot`

Component	Function	Explanation
Data	ggplot(data)	The raw data that you want to visualize.
Aesthetics	aes()	Links your data to how it’s shown on the graph.
Geometries	geom_*()	The geometric shape of a layer representing the data.
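The three components in the table can be sketched as one short plot. The example uses `mpg`, a data set that ships with ggplot2, as a stand-in for any data frame; the variable choices are only for illustration.

```r
library(ggplot2)

# Data: `mpg` ships with ggplot2 and stands in for any data frame.
# Aesthetics: map engine displacement to x and highway mileage to y.
# Geometry: draw one point per row.
components_plot <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

print(components_plot)
```

In everyday code the mapping is often passed directly, as in `ggplot(mpg, aes(x = displ, y = hwy))`; it is added separately here so that each line matches one row of the table.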

Advantages of `ggplot`

Active and helpful community.
Very flexible: you can build a plot specification layer by layer.
You can create themes for desired appearance.
Reproducibility
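A minimal sketch of the layering and theming points above, again using the `mpg` data that ships with ggplot2. The object names, including `workshop_theme`, are hypothetical.

```r
library(ggplot2)

# Start from a base specification and store it.
base_plot <- ggplot(mpg, aes(x = displ, y = hwy))

# Add layers one at a time; each `+` extends the specification.
layered_plot <- base_plot +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# A reusable theme object (hypothetical name) for a consistent look.
workshop_theme <- theme_minimal(base_size = 14) +
  theme(panel.grid.minor = element_blank())

final_plot <- layered_plot + workshop_theme

print(final_plot)
```

Because the whole graph is code, rerunning the script reproduces the same figure, and the same theme object can be added to any other plot.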

The `ggplot2` Showcase

Ariane Aumaitre - Reference

The `ggplot2` Showcase

Recreating the New York Times COVID-19 Spiral Graph

The `ggplot2` Showcase

Recreating the Economist Barplot

The `ggplot2` Showcase

Where do tornado outbreaks usually occur?

Reference - Tanya Shapiro

We are here

Our Mission

Examine the average number of alcoholic drinks consumed by adolescents, differentiated by sex, for the years 2017, 2019, and 2021

To Follow The Code

Or click here

The Data Set

Source: Youth Risk Behavior Surveillance System 2017, 2019, and 2021
Variables: Alcohol use in adolescents
R package: RiskExplorer

# install.packages("tidyverse")
# install.packages("devtools")
# devtools::install_github("ccani007/dissertationData")

library(tidyverse)
library(RiskExplorer)

data("youthAlcoholUse")