Code, Craft, Communicate: A Hands-On Journey into Data Science with R & Quarto

Catalina Cañizares

2025-05-29

Code, Craft, Communicate: A Hands-On Journey into Data Science with R & Quarto © 2025 by Catalina Cañizares is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International.

This material is freely available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

For more information on this license, please visit: Creative Commons License

Welcome!! 👋

I am…

Agenda for the Morning 📋

  • Introduction to Data Science and R
  • Setting up our development environment
  • Creating our first Quarto document
  • Working with data in R
  • Hands-on practice session

For the next 4 days we will be: 🚀

  • 📊 Learning how to use R and RStudio
  • 📝 Learning how to use Quarto
  • 🔧 Learning how to use tidyverse for data wrangling
  • 🤖 Learning how to use tidymodels for machine learning
    • We will learn about the following SUPERVISED Machine Learning Models:
      • Logistic Regression
      • LASSO Regression
      • Ridge Regression
      • Elastic Net Regression
      • Tree-based Models
      • Random Forest
      • K-Nearest Neighbors
  • 🌐 Deploying our class notes and products as a website!

How? 💡

  • This workshop is mainly hands-on: you will follow my presentations and code along with me.
  • You should have received an email invitation to Posit Cloud, which is the environment we will be using.
  • We will be using REAL life data! 📈

Quick Poll: How familiar are you with R?

  • Complete beginner

  • Some experience

  • Intermediate

  • Advanced

Before we start… ⚡

Some code of conduct clarifications: We will go as fast as the slowest person in the room! 🐢

  • This means, if you are not following, please ask questions! 🙋‍♀️

  • I will pause the lecture to troubleshoot on your computers. We will talk through the steps together, so that you can troubleshoot on your own if you run into the same problem later.

  • If you know how to help your neighbor, please do it! 🤝

  • All questions are welcome, valid and EVERYONE will respect your voice. 💫

  • No trolling, bullying, or any other behavior that is not respectful will be allowed or tolerated. ❌

Module Project

Create a website using Quarto that analyzes the 2023 Youth Risk Behavior Survey data and includes:

  1. A home page with a brief introduction to the project.
  2. A tab introducing yourself and your research interests.
  3. A tab per model, showing the model’s performance and a visualization of the model’s predictions.

This is a peek into the final project

Objectives for Today

  • Define and identify the primary responsibilities of a data scientist.
  • Gain insight into the history of R and RStudio.
  • Recognize the importance of working within projects and initiate a new project in RStudio (Locally).
  • Create the first Quarto document and understand its components.
  • Learn to import data from various sources into the Quarto file.
  • Explore a dataset and identify different data types.

Introduction to Data Science

What is Data Science?

Scan this QR Code

Defining Data Science

Data Science allows you to transform raw data into understanding, insight, and knowledge (Wickham, Çetinkaya-Rundel, and Grolemund 2023).

The Data Science Association:

  • Data Science: “The scientific study of the creation, validation and transformation of data to create meaning”

  • Statistics: “The practice or science of collecting and analyzing numerical data in large quantities”

Academics disagree over the terminology, value, and contribution of the two disciplines (Hassani et al. 2021).

Introduction to R and RStudio

Let’s talk about R

What learning R feels like

The best advice…

  • The best advice about working in R is to organize your work within projects.

  • Why?

  • Reproducibility

  • Organization

  • More reasons here
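One concrete reason projects help reproducibility: file paths can be written relative to the project folder, so the same code runs on any computer. A minimal sketch, assuming a data folder and the file name yrbs_2021.csv that appear later in these slides:

```r
# Build a path relative to the project folder instead of hard-coding
# something like "C:/Users/me/Desktop/...".
data_path <- file.path("data", "yrbs_2021.csv")
data_path
#> [1] "data/yrbs_2021.csv"

# Inside an RStudio project, the working directory is the project folder,
# so this check works the same way for everyone who opens the project.
getwd()
file.exists(data_path)
```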

Steps to create the project
Now that we have a project

  • Let’s create a script and explore the IDE

(Integrated Development Environment)

Creating the script

The RStudio IDE

Let’s take a break

Quarto Files

Inspecting the file

YAML

The Meta-Data Header: YAML

  • The first few lines of the document are the document header.
  • These lines give Quarto crucial information about how to build your report.
  • The entire header, and all of the document options in it, sit between two lines of three dashes (---), one above and one below.

The Meta-Data Header: YAML

Currently our header is quite basic. It includes:

  • The title of the document: title: "Hello Quarto"
  • Who wrote it: author: "Catalina Canizares"
  • Today’s date: date: "2024-05-30"
  • The output type: format: html
  • Bibliography
  • Type of citation and format

Different YAML examples

Code Chunks

Anatomy of a Chunk

Chunk Options

Source: Code Chunks

More about Chunks

Make chunks like a pro

Name chunks like a pro with kebab-case

Code chunk example
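A minimal sketch of what a chunk body can look like, assuming Quarto's `#|` option comments. In a .qmd file the chunk sits inside a fence that opens with {r}. The label session-summary is only an illustrative kebab-case name.

```r
#| label: session-summary
#| echo: true
#| warning: false

# Lines starting with `#|` are chunk options read by Quarto.
# To R they are plain comments, so everything below runs as ordinary R code.
r_version <- R.version.string
r_version
```

Here echo: true shows the code in the rendered report, and warning: false hides warnings from the output.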

Markup Text

Now that we have a report with a header and some code, we need to explain what the code is doing and why.

  • This is where the plain text comes in.

  • Outside of a code chunk, type anything you want. You can even include pictures and tables.

Cross-Reference

Tables

  • The chunk label always starts with tbl-

Figures

  • The chunk label always starts with fig-
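A short sketch of a figure chunk that can be cross-referenced, assuming the palmerpenguins package is installed. The label fig-flipper-boxplot and the caption text are illustrative choices. In the text of the report you would then write @fig-flipper-boxplot to refer to the figure.

```r
#| label: fig-flipper-boxplot
#| fig-cap: "Flipper length by penguin species"

library(palmerpenguins)

# Base R boxplot of flipper length for each species
boxplot(
  flipper_length_mm ~ species,
  data = penguins,
  xlab = "Species",
  ylab = "Flipper length (mm)"
)
```

A table chunk works the same way, with a label that starts with tbl- and a tbl-cap option.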

Text Formatting

Source: Get Started with Quarto

Text Headings

Source: Get Started with Quarto

Small break

Exercise

  • Let’s go to the project we created

  • Modify the YAML with your information, and name your file

  • Check the YAML so we can see the code and the results.

Exercise

  • Create a chunk and type:
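# Install the packages (only needed the first time)
install.packages("palmerpenguins")
install.packages("table1")

# Load the packages (needed in every new R session)
library(palmerpenguins)
library(table1)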
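Because a chunk runs every time the report renders, install.packages() inside a chunk reinstalls the packages on each render. One possible alternative, sketched here with base R only, is to check which packages are missing and install those once from the Console. The variable names are illustrative.

```r
# Packages this exercise needs
wanted <- c("palmerpenguins", "table1")

# requireNamespace() returns TRUE when a package is already installed
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!is_installed]

if (length(missing_pkgs) > 0) {
  # Only report what is missing; run install.packages() yourself in the Console
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are already installed.")
}
```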

Exercise

Are there differences between the penguin species regarding their flipper lengths?

  • Create a chunk and load the palmerpenguins data
data("penguins")
  • Create another chunk, name it, and explore the flipper length of the different penguin species with this code:
table1(~ flipper_length_mm | species, data = penguins)
  • Describe what you see in the table.

Let’s visualize our findings

  • Create another chunk and plot the data:
library(ggplot2)

# Boxplot plus jittered points of flipper length for each species
ggplot(
  data = penguins,
  aes(x = species, y = flipper_length_mm)
) +
  geom_boxplot(
    aes(color = species),
    width = 0.3,
    show.legend = FALSE
  ) +
  geom_jitter(
    aes(color = species),
    alpha = 0.5,
    show.legend = FALSE,
    position = position_jitter(width = 0.2, seed = 0)
  ) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    x = "Species",
    y = "Flipper length (mm)"
  )

We are ready to Render our report

Save your report and…

Look at your Console while it Renders

A little more of that reproducible magic

Inside your text you can include executable code with the syntax:

For example:

There are 344 rows in the penguins data set.

The truth:
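A minimal sketch of what can sit behind that sentence, assuming the number comes from nrow(penguins) (the expression on the original slide is not preserved here). In Quarto, inline code is usually typed as a backtick, the letter r, a space, an R expression, and a closing backtick. The R below shows the expression and the sentence it produces.

```r
library(palmerpenguins)

# The expression you would place inside the inline code
n_rows <- nrow(penguins)

# What the rendered sentence looks like once the value is filled in
paste("There are", n_rows, "rows in the penguins data set.")
#> [1] "There are 344 rows in the penguins data set."
```

If the data change, the number in the text updates on the next render, with no manual editing.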

More resources

Quarto Website

R Markdown: The Definitive Guide

Let’s take another break

Summary

  • Explored the foundational concepts of data science.
  • Delved into the history and evolution of R.
  • Learned how to create and manage projects in RStudio.
  • Gained proficiency in working with Quarto files.

Next Steps:

  • Understand how to structure and organize projects for scientific research.

How Should We Organize Our Projects in Science?

  • Scientific papers generally follow a consistent structure and require similar tools, especially when working with quantitative data.
  • I’ll share a useful trick to organize your files and start a project in alignment with the typical structure of scientific papers.

The rUM Package

The rUM Package on your own machine

  1. In your current RStudio window type in the console: install.packages("rUM")
  2. Go to “Session” and click “Restart R”
  3. Go to File –> New Project…

What happened?

Inspecting the Project

  • .gitignore: We will skip this for now, but it will become important later as it helps manage what files get tracked by Git.
  • analysis: This is the Quarto file where you will write your manuscript. It’s central to our project as it integrates our analysis and reporting.
  • What do you think about the organization of the Quarto file? Does it align with what we’ve just learned?
  • data folder: Currently, this folder is empty, but it will store the dataset we will use for our analysis. Proper organization of data is crucial for reproducible research.
  • packages.bib and references.bib: Two text files that hold the details for your paper’s bibliography.
  • the-new-england-journal-of-medicine.csl: The Citation Style Language (CSL) file, based on the New England Journal of Medicine requirements.

Reading data in our Quarto files

  • R can read data from a variety of formats, including:
    • Spreadsheets and delimited text files (e.g., CSV, XLSX, TXT)
    • SAS
    • Stata
    • SPSS
    • Other statistical packages

Let’s get the Data

  1. From Excel
  2. From .csv
  3. From .sav
  4. From R

rio package: A Swiss-Army Knife for Data


install.packages("rio")
library(rio)
yrbs_2021_df <- import("data/yrbs_2021.csv")

yrbs_2021_xlsx <- import("data/yrbs_2021.xlsx")

yrbs_2021_sav <- import("data/yrbs_2021.sav")
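Why does one import() call work for three file types? To my understanding of the rio documentation, rio looks at the file extension and hands the file to a matching reader. The toy sketch below only illustrates that idea in base R: guess_reader() is a hypothetical helper, not part of rio, and the reader names are the ones rio's documentation lists for these formats.

```r
# Hypothetical helper, NOT part of rio: pick a reader name from the extension
guess_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = "data.table::fread",
    xlsx = "readxl::read_excel",
    sav  = "haven::read_sav",
    stop("No reader listed for extension: ", ext)
  )
}

paths <- c("data/yrbs_2021.csv", "data/yrbs_2021.xlsx", "data/yrbs_2021.sav")

# One reader name per file, chosen only from the extension
vapply(paths, guess_reader, character(1))
```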

Importing data from a Package

install.packages("devtools")
library(devtools)

install_github("ccani007/dissertationData")
library(dissertationData)

data("clean_yrbs_2023")

Explore the data set

  • Let’s check out two ways to explore datasets in R!
  1. The simpler, traditional way:
library(dplyr) # glimpse() is available once dplyr is loaded
glimpse(clean_yrbs_2023)
Rows: 20,103
Columns: 110
$ weight                        <dbl> 0.8614, 0.8920, 0.5081, 1.1759, 0.8920, …
$ stratum                       <dbl> 103, 103, 103, 103, 103, 103, 103, 103, …
$ psu                           <dbl> 16294, 16294, 16294, 16294, 16294, 16294…
$ Sex                           <chr> "Female", "Male", "Male", "Female", "Mal…
$ Race                          <chr> NA, "White", "White", "White", "White", …
$ Age                           <int> 14, 15, 16, 17, 14, 16, 17, 15, 15, 17, …
$ Grade                         <chr> "9", "9", "11", "10", "9", "9", "11", "9…
$ SexOrientation                <chr> "Gay or Lesbian", "Heterosexual", "Bisex…
$ HispanicLatino                <fct> NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, …
$ Height                        <dbl> 1.65, NA, 1.68, NA, 1.85, 1.80, 1.83, 1.…
$ Weight                        <dbl> 81.65, NA, 74.84, NA, 56.70, 72.58, 74.8…
$ SeatBeltUse                   <fct> 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1…
$ DrinkingDriver                <fct> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
$ DrivingDrinking               <fct> NA, NA, 0, 0, NA, NA, NA, NA, NA, 0, NA,…
$ TextingDriving                <fct> NA, NA, 1, 1, NA, NA, NA, NA, NA, NA, NA…
$ WeaponCarryingSchool          <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ GunCarrying                   <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ UnsafeAtSchool                <fct> 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0…
$ InjuredInSchool               <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0…
$ PhysicalFight                 <fct> 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ SchoolPhysicalFight           <fct> 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ AttackedInNeighborhood        <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0…
$ ForcedSexualIntercourse       <fct> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ SexualViolence                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ SexualAbuseByPartner          <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ PhysicalDatingViolence        <fct> 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, NA, 0, 0,…
$ TreatedUnfairlyRace           <fct> 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0…
$ Bullying                      <fct> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CyberBullying                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Hopelessness                  <fct> 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0…
$ SuicideIdeation               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 1, NA, 0, 1, 0, …
$ SuicidePlan                   <fct> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0…
$ SuicideAttempts               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0…
$ SuicideInjuryMedicalAttention <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverTriedCigarette            <fct> 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, NA, …
$ FirstCigaretteBefore13        <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, …
$ CurrentlySmokingCigarette     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ MoreThan10CigarettesPerDay    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ EverTriedVaping               <fct> 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0…
$ CurrentlyVaping               <fct> 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ StoreSourceVaping             <fct> NA, NA, 0, 0, NA, NA, 0, NA, NA, NA, NA,…
$ CurrentlySmokelessTobacco     <fct> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CurrentlySmokingCigar         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ QuitTobacco                   <fct> NA, NA, 0, 0, 0, NA, 0, NA, NA, NA, NA, …
$ FirstAlcoholBefore13          <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0…
$ CurrentlyAlcohol              <fct> 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ CurrentlyBingeDrinking        <fct> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
$ MoreThan10Drinks              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ SomeoneSourceAlcohol          <fct> NA, NA, NA, 0, 0, NA, 0, NA, NA, NA, NA,…
$ EverUsedMarijuana             <fct> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1…
$ FirstMarijuanaBefore13        <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CurrentlyUsingMarijuana       <fct> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ PainMedicine                  <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ EverUsedCocaine               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverUsedInhalant              <fct> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverUsedHeroin                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverUsedMethamphetamine       <fct> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ EverUsedEcstasy               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverInjectedIllegalDrug       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, N…
$ EverHadSex                    <fct> 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, NA, …
$ SexBefore13                   <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, NA, …
$ Sex4OrMorePartners            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, …
$ SexuallyActive                <fct> 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, NA, …
$ AlcoholOrDrugsSex             <fct> NA, NA, 0, 0, 0, NA, 0, NA, NA, 0, NA, N…
$ UsedCondom                    <fct> NA, NA, 1, 0, 1, NA, 1, NA, NA, 1, NA, N…
$ BirthControl                  <fct> NA, NA, 1, 1, 0, NA, 0, NA, NA, 1, NA, N…
$ VeryOverweight                <fct> 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1…
$ WeightLoss                    <fct> 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1…
$ NoFruitJuice                  <fct> 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0…
$ NoFruit                       <fct> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0…
$ NoSalad                       <fct> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1…
$ NoPotatoes                    <fct> 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0…
$ NoCarrots                     <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0…
$ NoOtherVeggies                <fct> 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0…
$ NoSoda                        <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0…
$ NoBreakfast                   <fct> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0…
$ PhysicalActivity              <fct> 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0…
$ AttendedPEClass               <fct> 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0…
$ SportsTeam                    <fct> 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0…
$ ConcussionSports              <fct> 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1…
$ SocialMedia                   <fct> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1…
$ HIVTested                     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ STDTested                     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ DentistVisit                  <fct> 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0…
$ NotGoodMentalHealth           <fct> 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, …
$ EightOrMoreHoursSleep         <fct> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ Homelessness                  <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ GradesSchool                  <fct> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, …
$ SexualAbuseByOlderPerson      <fct> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ ParentalEmotionalAbuse        <fct> NA, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, …
$ ParentalPhysicalAbuse         <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0…
$ ParentalDomesticViolence      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CurrentMisusePainMedicine     <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ EverHallucinogenicDrugs       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ AskedForConsent               <fct> NA, NA, 1, 1, 0, 1, 1, NA, 0, 1, NA, NA,…
$ NoSportsDrinks                <fct> 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0…
$ NoDrinksWater                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ ExcerciseMuscles              <fct> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0…
$ Sunburn                       <fct> 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0…
$ AdultMeetingBasicNeeds        <fct> 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0…
$ ParentSubstanceUse            <fct> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2…
$ ParentMentalIllness           <fct> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ ParentIncarceration           <fct> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ SchoolConnectedness           <fct> 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2…
$ ParentalMonitoring            <fct> 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2…
$ UnfairDisciplineAtSchool      <fct> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ DifficultyConcentrating       <fct> 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, NA, …
$ EnglishProficiency            <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1…
$ SexSexualContact              <chr> "Never had sex", "Never had sex", "Both"…
$ Transgender                   <chr> "No", "No", "No", "No", "No", "No", "No"…

Explore the data set

  2. The neatest and most detailed way (in my personal opinion):
# install.packages("skimr")
library(skimr)
skim(clean_yrbs_2023)
Data summary
Name clean_yrbs_2023
Number of rows 20103
Number of columns 110
_______________________
Column type frequency:
character 6
factor 98
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Sex 158 0.99 4 6 0 2 0
Race 370 0.98 5 25 0 8 0
Grade 241 0.99 1 2 0 4 0
SexOrientation 2003 0.90 8 33 0 6 0
SexSexualContact 4722 0.77 4 13 0 4 0
Transgender 1802 0.91 2 33 0 4 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
HispanicLatino 254 0.99 FALSE 2 2: 15852, 1: 3997
SeatBeltUse 5032 0.75 FALSE 2 0: 8506, 1: 6565
DrinkingDriver 505 0.97 FALSE 2 0: 16674, 1: 2924
DrivingDrinking 11527 0.43 FALSE 2 0: 8095, 1: 481
TextingDriving 12034 0.40 FALSE 2 0: 4755, 1: 3314
WeaponCarryingSchool 508 0.97 FALSE 2 0: 18747, 1: 848
GunCarrying 8729 0.57 FALSE 2 0: 10861, 1: 513
UnsafeAtSchool 1121 0.94 FALSE 2 0: 16681, 1: 2301
InjuredInSchool 2129 0.89 FALSE 2 0: 16260, 1: 1714
PhysicalFight 8451 0.58 FALSE 2 0: 9309, 1: 2343
SchoolPhysicalFight 2643 0.87 FALSE 2 0: 16014, 1: 1446
AttackedInNeighborhood 5526 0.73 FALSE 2 0: 11082, 1: 3495
ForcedSexualIntercourse 2801 0.86 FALSE 2 0: 15639, 1: 1663
SexualViolence 4351 0.78 FALSE 2 0: 13912, 1: 1840
SexualAbuseByPartner 6662 0.67 FALSE 2 0: 12550, 1: 891
PhysicalDatingViolence 8596 0.57 FALSE 2 0: 10299, 1: 1208
TreatedUnfairlyRace 1817 0.91 FALSE 2 0: 12530, 1: 5756
Bullying 201 0.99 FALSE 2 0: 15856, 1: 4046
CyberBullying 205 0.99 FALSE 2 0: 16475, 1: 3423
Hopelessness 240 0.99 FALSE 2 0: 11755, 1: 8108
SuicideIdeation 436 0.98 FALSE 2 0: 15453, 1: 4214
SuicidePlan 1737 0.91 FALSE 2 0: 15154, 1: 3212
SuicideAttempts 767 0.96 FALSE 2 0: 17341, 1: 1995
SuicideInjuryMedicalAttention 5468 0.73 FALSE 2 0: 14263, 1: 372
EverTriedCigarette 2661 0.87 FALSE 2 0: 14331, 1: 3111
FirstCigaretteBefore13 3283 0.84 FALSE 2 0: 15682, 1: 1138
CurrentlySmokingCigarette 364 0.98 FALSE 2 0: 18933, 1: 806
MoreThan10CigarettesPerDay 19580 0.03 FALSE 2 0: 473, 1: 50
EverTriedVaping 299 0.99 FALSE 2 0: 12835, 1: 6969
CurrentlyVaping 1012 0.95 FALSE 2 0: 15705, 1: 3386
StoreSourceVaping 17457 0.13 FALSE 2 0: 2492, 1: 154
CurrentlySmokelessTobacco 1180 0.94 FALSE 2 0: 18431, 1: 492
CurrentlySmokingCigar 1161 0.94 FALSE 2 0: 18074, 1: 868
QuitTobacco 15766 0.22 FALSE 2 1: 2326, 0: 2011
FirstAlcoholBefore13 855 0.96 FALSE 2 0: 16404, 1: 2844
CurrentlyAlcohol 901 0.96 FALSE 2 0: 15050, 1: 4152
CurrentlyBingeDrinking 4662 0.77 FALSE 2 0: 14010, 1: 1431
MoreThan10Drinks 5689 0.72 FALSE 2 0: 14054, 1: 360
SomeoneSourceAlcohol 17077 0.15 FALSE 2 0: 1845, 1: 1181
EverUsedMarijuana 3500 0.83 FALSE 2 0: 11383, 1: 5220
FirstMarijuanaBefore13 864 0.96 FALSE 2 0: 18067, 1: 1172
CurrentlyUsingMarijuana 503 0.97 FALSE 2 0: 16193, 1: 3407
PainMedicine 576 0.97 FALSE 2 0: 17165, 1: 2362
EverUsedCocaine 3256 0.84 FALSE 2 0: 16411, 1: 436
EverUsedInhalant 5877 0.71 FALSE 2 0: 13344, 1: 882
EverUsedHeroin 781 0.96 FALSE 2 0: 19016, 1: 306
EverUsedMethamphetamine 1122 0.94 FALSE 2 0: 18641, 1: 340
EverUsedEcstasy 3608 0.82 FALSE 2 0: 16016, 1: 479
EverInjectedIllegalDrug 5006 0.75 FALSE 2 0: 14905, 1: 192
EverHadSex 3982 0.80 FALSE 2 0: 10701, 1: 5420
SexBefore13 1530 0.92 FALSE 2 0: 17976, 1: 597
Sex4OrMorePartners 2458 0.88 FALSE 2 0: 16527, 1: 1118
SexuallyActive 1764 0.91 FALSE 2 0: 14362, 1: 3977
AlcoholOrDrugsSex 16526 0.18 FALSE 2 0: 2918, 1: 659
UsedCondom 16194 0.19 FALSE 2 1: 2023, 0: 1886
BirthControl 16614 0.17 FALSE 2 0: 2794, 1: 695
VeryOverweight 8308 0.59 FALSE 2 0: 7900, 1: 3895
WeightLoss 6167 0.69 FALSE 2 0: 7529, 1: 6407
NoFruitJuice 3495 0.83 FALSE 2 0: 10723, 1: 5885
NoFruit 1076 0.95 FALSE 2 0: 16728, 1: 2299
NoSalad 4767 0.76 FALSE 2 0: 8497, 1: 6839
NoPotatoes 4858 0.76 FALSE 2 0: 9689, 1: 5556
NoCarrots 4827 0.76 FALSE 2 1: 8451, 0: 6825
NoOtherVeggies 4891 0.76 FALSE 2 0: 12186, 1: 3026
NoSoda 4920 0.76 FALSE 2 0: 10800, 1: 4383
NoBreakfast 5835 0.71 FALSE 2 0: 11427, 1: 2841
PhysicalActivity 1227 0.94 FALSE 2 0: 9814, 1: 9062
AttendedPEClass 6227 0.69 FALSE 2 0: 7283, 1: 6593
SportsTeam 7139 0.64 FALSE 2 1: 6697, 0: 6267
ConcussionSports 4594 0.77 FALSE 2 0: 13209, 1: 2300
SocialMedia 4900 0.76 FALSE 2 1: 11872, 0: 3331
HIVTested 4584 0.77 FALSE 2 0: 14447, 1: 1072
STDTested 7325 0.64 FALSE 2 0: 12048, 1: 730
DentistVisit 2179 0.89 FALSE 2 1: 13356, 0: 4568
NotGoodMentalHealth 4398 0.78 FALSE 2 0: 10897, 1: 4808
EightOrMoreHoursSleep 2662 0.87 FALSE 2 0: 13435, 1: 4006
Homelessness 5562 0.72 FALSE 2 0: 14043, 1: 498
GradesSchool 3568 0.82 FALSE 2 1: 11679, 0: 4856
SexualAbuseByOlderPerson 6280 0.69 FALSE 2 0: 12648, 1: 1175
ParentalEmotionalAbuse 6183 0.69 FALSE 2 0: 12129, 1: 1791
ParentalPhysicalAbuse 6186 0.69 FALSE 2 0: 13524, 1: 393
ParentalDomesticViolence 6350 0.68 FALSE 2 0: 13396, 1: 357
CurrentMisusePainMedicine 7722 0.62 FALSE 2 0: 11731, 1: 650
EverHallucinogenicDrugs 8932 0.56 FALSE 2 0: 10441, 1: 730
AskedForConsent 14611 0.27 FALSE 2 1: 4417, 0: 1075
NoSportsDrinks 8194 0.59 FALSE 2 0: 6550, 1: 5359
NoDrinksWater 6519 0.68 FALSE 2 0: 13145, 1: 439
ExcerciseMuscles 8906 0.56 FALSE 2 1: 5679, 0: 5518
Sunburn 9315 0.54 FALSE 2 1: 5868, 0: 4920
AdultMeetingBasicNeeds 4353 0.78 FALSE 2 1: 13575, 0: 2175
ParentSubstanceUse 6876 0.66 FALSE 2 2: 9398, 1: 3829
ParentMentalIllness 6798 0.66 FALSE 2 2: 8999, 1: 4306
ParentIncarceration 6761 0.66 FALSE 2 2: 10982, 1: 2360
SchoolConnectedness 8926 0.56 FALSE 2 1: 6016, 2: 5161
ParentalMonitoring 9253 0.54 FALSE 2 1: 9038, 2: 1812
UnfairDisciplineAtSchool 9520 0.53 FALSE 2 2: 8412, 1: 2171
DifficultyConcentrating 9157 0.54 FALSE 2 2: 5685, 1: 5261
EnglishProficiency 9308 0.54 FALSE 2 1: 10550, 2: 245

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
weight 0 1.00 1.00 1.27 0.00 0.15 0.41 1.36 22.54 ▇▁▁▁▁
stratum 0 1.00 228.37 73.29 101.00 201.00 212.00 300.00 300.00 ▃▁▆▁▇
psu 0 1.00 414571.69 235438.55 16294.00 214944.00 401089.00 573255.00 788430.00 ▆▃▇▇▆
Age 98 1.00 15.89 1.22 12.00 15.00 16.00 17.00 18.00 ▁▃▇▆▇
Height 2289 0.89 1.70 0.10 1.32 1.63 1.70 1.78 2.03 ▁▅▇▆▁
Weight 2289 0.89 68.68 18.47 31.30 55.79 64.41 77.11 180.99 ▆▇▂▁▁
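How n_missing and complete_rate relate: complete_rate is the share of rows that are not missing. A small check in base R, using three n_missing values and the row count copied from the output above (the variable names in the code are illustrative):

```r
# Values copied from the skim() output above
n_rows    <- 20103
n_missing <- c(Age = 98, Height = 2289, SexSexualContact = 4722)

# Share of non-missing rows, rounded to two decimals as skim() prints it
complete_rate <- round(1 - n_missing / n_rows, 2)
complete_rate
# Age 1.00, Height 0.89, SexSexualContact 0.77
```

This also explains why Age shows a complete_rate of 1.00 even though 98 values are missing: 98 out of 20,103 is less than half a percent, so the rate rounds up to 1.00.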

Types of variables

How to see it in the data?
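One way to see variable types without any extra packages is class(). The sketch below builds a toy data frame from the first four values of four columns shown in the glimpse() output (toy_yrbs is an illustrative name, not the real data set) and asks R for the type of each column.

```r
# Toy excerpt: first four values of four columns from the glimpse() output
toy_yrbs <- data.frame(
  Sex         = c("Female", "Male", "Male", "Female"),  # <chr> in glimpse()
  Age         = c(14L, 15L, 16L, 17L),                  # <int>
  Height      = c(1.65, NA, 1.68, NA),                  # <dbl>
  SeatBeltUse = factor(c(1, 0, 0, 1)),                  # <fct>
  stringsAsFactors = FALSE
)

# One class per column: character, integer, numeric, factor
col_types <- sapply(toy_yrbs, class)
col_types
```

Note that glimpse() prints <dbl> (double) where class() says "numeric"; both describe numbers with decimals.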

Now it’s your turn!

In your projects:

  1. Let’s open Posit Cloud and create our projects!

See ya next time!

References

Giorgi, Federico M, Carmine Ceraolo, and Daniele Mercatelli. 2022. “The R Language: An Engine for Bioinformatics and Data Science.” Life 12 (5): 648.
Hassani, Hossein, Christina Beneki, Emmanuel Sirimal Silva, Nicolas Vandeput, and Dag Øivind Madsen. 2021. “The Science of Statistics Versus Data Science: What Is the Future?” Technological Forecasting and Social Change 173: 121111.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. O’Reilly Media, Inc.