
Practice plotting using ggplot2: Lesson 2

Load the data

For these exercises, you will explore the Titanic data from kaggle.com. You will need to download the data and load it into R. Because this is a comma-separated file, you will need to explore the read.csv() function.

Description of the data:

Column                    Description
Survived                  0 = No, 1 = Yes
Pclass                    Ticket class / socioeconomic status (1 = 1st, 2 = 2nd, 3 = 3rd)
Name                      Passenger name
Sex                       Male / Female
Age                       Numeric age in years
Siblings/Spouses Aboard   Number of siblings / spouses aboard the Titanic
Parents/Children Aboard   Number of parents / children aboard the Titanic
Fare                      Passenger fare

Get the data from the URL given in the Question 1 solution.

Load ggplot2.

library(ggplot2)

Exercise Questions

Question 1

Load titanic.csv and save to an object named titanic.

Possible Solution

titanic <- read.csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv",header=TRUE)

Question 2

Explore the data. What is the structure of the data? Try str(). What are the column names? Try colnames(). How can you get help if you do not know how to use these functions?

Possible Solution

str(titanic) # get the structure 
## 'data.frame':    887 obs. of  8 variables:
##  $ Survived               : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass                 : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name                   : chr  "Mr. Owen Harris Braund" "Mrs. John Bradley (Florence Briggs Thayer) Cumings" "Miss. Laina Heikkinen" "Mrs. Jacques Heath (Lily May Peel) Futrelle" ...
##  $ Sex                    : chr  "male" "female" "female" "female" ...
##  $ Age                    : num  22 38 26 35 35 27 54 2 27 14 ...
##  $ Siblings.Spouses.Aboard: int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parents.Children.Aboard: int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Fare                   : num  7.25 71.28 7.92 53.1 8.05 ...
colnames(titanic) # get the column names. 
## [1] "Survived"                "Pclass"                 
## [3] "Name"                    "Sex"                    
## [5] "Age"                     "Siblings.Spouses.Aboard"
## [7] "Parents.Children.Aboard" "Fare"

?str # get help
?colnames
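
The column names printed above differ from the description table: for example, Siblings/Spouses Aboard became Siblings.Spouses.Aboard. The sketch below shows one likely reason, assuming the default check.names = TRUE behaviour of read.csv(). The two-row CSV text is made up for illustration and is not the real file.

```r
# Toy CSV text (hypothetical, NOT the real titanic.csv) whose header
# contains a slash and a space, like the header of the original file.
csv_text <- "Survived,Siblings/Spouses Aboard\n0,1\n1,0"

# Default: check.names = TRUE passes the header through make.names(),
# which replaces characters that are invalid in R names with dots.
demo_default <- read.csv(text = csv_text)
colnames(demo_default)

# check.names = FALSE keeps the header exactly as written in the file.
demo_raw <- read.csv(text = csv_text, check.names = FALSE)
colnames(demo_raw)
```

Names kept with check.names = FALSE need backticks when used in code, which is why the dotted defaults are often more convenient.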

Question 3

Make a simple scatter plot. Is there a relationship between the age of the passenger and the passenger fare?

Possible Solution

ggplot(titanic) + 
  geom_point(aes(x=Age, y=Fare))

Question 4

Color the points from question 3 by Pclass. Remember that Pclass is a proxy for socioeconomic status. While the values are treated as numeric upon loading, they are really categorical and should be treated as such. You will need to coerce Pclass into a categorical (factor) variable. See factor() and as.factor().

Possible Solution

ggplot(titanic) + 
  geom_point(aes(x=Age, y=Fare, color=as.factor(Pclass)))
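
As a rough sketch of the coercion itself: as.factor() simply converts a vector, while factor() also lets you fix the levels and attach labels. The vector below is a short made-up sample, not the full Pclass column.

```r
# A short made-up sample of ticket classes (hypothetical values).
pclass <- c(3, 1, 3, 1, 2)

# as.factor() converts as-is; the levels are the sorted unique values.
f1 <- as.factor(pclass)
levels(f1)

# factor() can also fix the level order and attach readable labels.
f2 <- factor(pclass, levels = c(1, 2, 3),
             labels = c("1st Class", "2nd Class", "3rd Class"))
as.character(f2)
```

Labels attached this way carry through to a ggplot2 legend, because the legend is built from the factor levels.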

Question 5

Manually scale the colors in question 4. 1st class = yellow, 2nd class = purple, 3rd class = seagreen. Also change the legend labels (1 = 1st Class, 2 = 2nd Class, 3 = 3rd Class).

Possible Solution

ggplot(titanic) + 
  geom_point(aes(x=Age, y=Fare, color=as.factor(Pclass)))+ 
  scale_color_manual(values=c("yellow","purple","seagreen"),
                     labels=c("1st Class","2nd Class","3rd Class"))

Question 6

Facet the plot made in question 5 by the column 'Sex'.

Possible Solution

ggplot(titanic) + 
  geom_point(aes(x=Age, y=Fare, color=as.factor(Pclass))) +
  scale_color_manual(values=c("yellow","purple","seagreen"),
                     labels=c("1st Class","2nd Class","3rd Class")) + 
  facet_wrap(~Sex)

Challenge question 1

Let's use some other geoms. Plot the number of passengers (a simple count) that survived by ticket class and facet by sex.

Possible Solution

ggplot(titanic) + 
  geom_bar(aes(x=Pclass, fill=factor(Survived)), 
           position=position_dodge()) + 
  facet_wrap(~Sex)+
  labs( y="Number of Passengers", x="Passenger Class", 
        title="Titanic Survival Counts by Passenger Class")

Challenge question 2

Add a variable called age_cat to the data frame (child = under 12, adolescent = 12-17, adult = 18+). Plot the number of passengers (a simple count) by age_cat, fill by Sex, and facet by class and survival.

Possible Solution

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
titanic %>%
  mutate(age_cat = case_when(
    Age < 12 ~ "child",
    Age >= 12 & Age < 18 ~ "adolescent",
    Age >= 18 ~ "adult"
  )) %>%
  ggplot() + 
  geom_bar(aes(x=age_cat, fill=factor(Sex)), 
           position=position_dodge()) + 
  facet_grid(Pclass~Survived)+
  labs( y="Number of Passengers", x="Age Category", 
        title="Titanic Survival")
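
A base R alternative to case_when() is cut(), sketched here on a few made-up ages. The breaks assume the same boundaries as the question (child under 12, adolescent 12-17, adult 18+), and right = FALSE makes each interval include its lower bound.

```r
# Made-up ages chosen to sit on the category boundaries (hypothetical).
ages <- c(5, 11.5, 12, 17, 18, 40)

age_cat <- cut(ages,
               breaks = c(-Inf, 12, 18, Inf),
               labels = c("child", "adolescent", "adult"),
               right = FALSE)   # intervals are [lower, upper)
as.character(age_cat)
```

One difference from the case_when() version is that cut() returns a factor whose levels follow the order of the breaks, so the bars would be drawn as child, adolescent, adult rather than alphabetically.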

Want more practice?

Let's use the dataset mtcars. According to the help documentation (?mtcars), "the data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)." Each question below will depend on code from the previous question.

Question 1

Let's check out the structure of the data.

Possible Solution

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Question 2

How might we plot automobile weight (wt) versus miles per gallon (mpg)?

Possible Solution

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point()

Question 3

What if we want to represent the number of cylinders (cyl) by color and shape?

Possible Solution

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(color = factor(cyl),shape = factor(cyl)))

Question 4

Make the size of the points change by the quarter-mile time (qsec).

Possible Solution

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(color = factor(cyl),shape = factor(cyl),size=qsec))

Question 5

Create subplots by transmission (am).

Possible Solution

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(color = factor(cyl),shape = factor(cyl),size=qsec))+
  facet_wrap(~am)
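
The facet strips in this plot show only 0 and 1. According to ?mtcars, am is coded 0 = automatic and 1 = manual, so one option is to build a labelled factor and facet on that instead. This is a sketch; the names mtcars2 and am_label are arbitrary choices made for this example.

```r
# Copy the built-in data so the original mtcars is left untouched.
# mtcars2 and am_label are hypothetical names used only in this sketch.
mtcars2 <- mtcars

# Recode 0/1 into a factor with readable labels.
mtcars2$am_label <- factor(mtcars2$am, levels = c(0, 1),
                           labels = c("automatic", "manual"))

# Count cars per transmission type to check the recoding.
table(mtcars2$am_label)
```

Swapping mtcars for mtcars2 and facet_wrap(~am) for facet_wrap(~am_label) in the plot code would then give named panels.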

Question 6

Model the trend using geom_smooth(). What is the default method used by geom_smooth()?

Possible Solution

ggplot(mtcars, aes(wt, mpg)) + 
  geom_point(aes(color = factor(cyl),shape = factor(cyl),size=qsec))+
  facet_wrap(~am) +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
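
The message above answers the question: for a data set this small, geom_smooth() falls back to a loess smoother. As a rough illustration of what such a smoother computes, the sketch below fits loess() from base R's stats package directly to the same two variables. It ignores the faceting by am, and it assumes geom_smooth() relies on loess() with its default settings, which is worth checking against ?geom_smooth.

```r
# Fit a local regression of mpg on wt using loess() defaults.
fit <- loess(mpg ~ wt, data = mtcars)

# Predict the smoothed mpg at a few weights (wt is in 1000 lbs).
new_wt <- data.frame(wt = c(2, 3, 4))
smoothed <- predict(fit, newdata = new_wt)
smoothed
```

To request a straight-line fit in the plot instead, geom_smooth(method = "lm") can be used in place of the default.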