This post is adapted from a section of The Jigsaw Puzzle Pieces - Creating the Graphs with ggplot2, Part II of a series of posts on building a Shiny app. That post created many bar plots and variations on them. For summary purposes, I want to bring all of them together here.
As usual, before we begin, let’s load all the packages we will need.
library(dplyr)
library(ggplot2)
First we’ll load the sample survey data.
## setwd()
load("~/Desktop/r sample files/survey")
head(survey)
## # A tibble: 6 x 17
## status country major Q1.1 Q1.2 Q1.3 top_reason place_options
## <fct> <fct> <fct> <chr> <chr> <chr> <chr> <chr>
## 1 Freshm… U.S. Undef… Never Never Never Find a quiet … quiet (occasio…
## 2 Freshm… China Undef… Occas… Often Often Find a quiet … crowded,focused
## 3 Freshm… U.S. Undef… Never Occa… Often Meet up with … (close to) sil…
## 4 Freshm… China Undef… Occas… Occa… Occa… Meet up with … focused,(close…
## 5 Freshm… U.S. Undef… Never Occa… Occa… Find a quiet … (close to) sil…
## 6 Freshm… China Undef… Occas… Often Often Find a quiet … (close to) sil…
## # ... with 9 more variables: space_lib <chr>, rank_crowded <dbl>,
## # rank_modpop <dbl>, rank_noisy <dbl>, rank_quiet <dbl>,
## # rank_silent <dbl>, rank_relaxed <dbl>, rank_focused <dbl>,
## # workshops <chr>
We’ll store a color palette in a list. This will make it easier to work with colors when plotting later.
palette <- list(purple = c("#351F39", "#351C4D", "#6c1f55", "#765285", "#8a6899"),
turquoise = c("#709FB0", "#849974", "#A0C1B8"),
golden = c("#D1A827", "#f3da4c"))
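As a small aside, the list elements can be reached by name as well as by position. The sketch below repeats the palette so it runs on its own and checks that both styles pick the same three colors used in the plots in this post. The variable names by_position and by_name are only illustrative.

```r
# Same palette as defined above, repeated so this snippet runs on its own
palette <- list(purple = c("#351F39", "#351C4D", "#6c1f55", "#765285", "#8a6899"),
                turquoise = c("#709FB0", "#849974", "#A0C1B8"),
                golden = c("#D1A827", "#f3da4c"))

# Positional access, as used in the plots in this post
by_position <- c(palette[[1]][4], palette[[2]][1], palette[[3]][1])

# Access by name picks the same colors and is easier to read
by_name <- c(palette$purple[4], palette$turquoise[1], palette$golden[1])

identical(by_position, by_name)
## [1] TRUE
```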
Let’s say we want to present how often respondents of different majors use the library website to search for articles. The data on website usage is in Q1.3 (“I use our library website to (how often) - Search for articles”), and the data on major is in major. Both Q1.3 and major contain discrete values, so bar plots and their variations can easily achieve this goal.
tail(survey[c("Q1.3","major")])
## # A tibble: 6 x 2
## Q1.3 major
## <chr> <fct>
## 1 Often Data Science & Interactive Media Business
## 2 Occasionally Business, Finance & Economics
## 3 Often Interactive Media Arts
## 4 Often Business, Finance & Economics
## 5 Occasionally Humanities & Social Sciences
## 6 Occasionally CS & Engineering
table(survey$Q1.3)
##
## Never Occasionally Often
## 77 126 117
table(survey$major)
##
## Business, Finance & Economics
## 118
## Humanities & Social Sciences
## 32
## Data Science & Interactive Media Business
## 24
## Interactive Media Arts
## 20
## Science
## 21
## CS & Engineering
## 27
## Mathematics
## 21
## Undefined
## 57
ggplot(survey, aes(major)) +
geom_bar(aes(fill = Q1.3), position = position_stack(reverse = TRUE), width = 0.4, alpha = 0.75) +
## geom_bar() adds a layer of stacked bars to the plot
## aes(fill = Q1.3) colors the segments of each bar by the value of Q1.3 (Never/Occasionally/Often); the length of each segment is its count
## position = position_stack(reverse = TRUE) reverses the order of the stacked bars
scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
## manually fills the bars with the preset color scheme
scale_x_discrete(limits = rev(levels(survey$major))) +
## reverse the order of levels of x axis
coord_flip() +
## flip the coordinates
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_line(size = 0),
legend.title = element_blank(),
legend.text = element_text(size = 10),
plot.margin = unit(c(1,1,1,1), "cm"))
Alternatively, we can have the stacked bars show the proportion of each value of Q1.3 within each major, rather than the count.
ggplot(survey, aes(major)) +
geom_bar(aes(fill = Q1.3), position = "fill", width = 0.4, alpha = 0.75) +
## position = "fill" stacks the bars and scales each one to the same height, so they show proportions instead of counts
scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
scale_x_discrete(limits = rev(levels(survey$major))) +
coord_flip() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_line(size = 0),
legend.title = element_blank(),
legend.text = element_text(size = 10),
plot.margin = unit(c(1,1,1,1), "cm"))
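To make the difference between counts and proportions concrete, here is a minimal sketch in base R. It uses a tiny made-up data frame (toy is a stand-in, not the real survey responses) and computes the two tables that the stacked and filled bars correspond to.

```r
# Toy data standing in for the survey (made-up values, not the real responses)
toy <- data.frame(
  major = c("Science", "Science", "Science", "Science", "Mathematics", "Mathematics"),
  Q1.3  = c("Never", "Often", "Often", "Often", "Occasionally", "Often")
)

# Counts per major and answer: what the stacked bars display
counts <- table(toy$major, toy$Q1.3)

# Row-wise proportions: what position = "fill" displays; each row sums to 1
shares <- prop.table(counts, margin = 1)

counts
shares
```

In this toy example, three of the four Science respondents answered "Often", so that segment would take up 0.75 of the Science bar.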
Or we can group the bars rather than stack them.
ggplot(survey, aes(major)) +
geom_bar(aes(fill = Q1.3), position = "dodge", width = 0.4, alpha = 0.75) +
## position = "dodge" sets the plot as grouped bars
scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
scale_x_discrete(limits = rev(levels(survey$major))) +
coord_flip() +
theme(axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_line(size = 0),
legend.title = element_blank(),
legend.text = element_text(size = 10),
plot.margin = unit(c(1,1,1,1), "cm"))
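The three bar plots above repeat the same theme() block. One possible way to cut the repetition, sketched here, is to store the shared settings in an object and add it to each plot. The name bar_theme and the toy data are hypothetical, and element_blank() is swapped in for element_line(size = 0) as an assumed equivalent way to hide the ticks.

```r
library(ggplot2)

# Hypothetical helper object holding the theme settings the bar plots share
bar_theme <- theme(axis.text.x = element_text(size = 12),
                   axis.text.y = element_text(size = 12, margin = margin(0, 3, 0, 0)),
                   axis.title.x = element_blank(),
                   axis.title.y = element_blank(),
                   axis.ticks.x = element_blank(),  # assumption: replaces element_line(size = 0)
                   legend.title = element_blank(),
                   legend.text = element_text(size = 10),
                   plot.margin = unit(c(1, 1, 1, 1), "cm"))

# Toy data standing in for the survey (made-up values)
toy <- data.frame(major = factor(c("Science", "Science", "Mathematics", "Mathematics")),
                  Q1.3 = c("Never", "Often", "Often", "Occasionally"))

# Each plot then only needs "+ bar_theme" instead of the full theme() call
p <- ggplot(toy, aes(major)) +
  geom_bar(aes(fill = Q1.3), width = 0.4, alpha = 0.75) +
  coord_flip() +
  bar_theme
p
```

Changing a font size or margin then only has to be done in one place.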
Bar charts are cognitively light to process, but they can also get boring. A lollipop chart, which is a variation of the bar chart, can do the same job with some visual variety.
To make a lollipop chart, we need a frequency table that summarizes the counts of each major.
tb <- survey %>% count(major) %>% data.frame() # generates a frequency table
tb <- tb %>% arrange(-n) %>% filter(major != "Undefined") # reorder by frequency; remove undefined majors
tb
## major n
## 1 Business, Finance & Economics 118
## 2 Humanities & Social Sciences 32
## 3 CS & Engineering 27
## 4 Data Science & Interactive Media Business 24
## 5 Science 21
## 6 Mathematics 21
## 7 Interactive Media Arts 20
Now let’s plot the lollipop chart from the frequency table.
ggplot(tb, aes(n, reorder(major, -n), label = n)) +
geom_segment(aes(x = 0, y = reorder(major, -n), xend = n, yend = reorder(major, -n)),
size = 0.5, color = "grey50") +
## reorder(major, -n) orders major by its frequency in descending order
geom_point(size = 8) +
## geom_point() draws a point at each pair of x and y values; here the points form the heads of the lollipops.
## If we added geom_point() before geom_segment(), the sticks would be drawn over the points and look like they stab them.
## So the order of the layers actually matters here.
geom_text(color = "white", size = 3) +
coord_flip() +
theme(axis.text.x = element_text(size = 12, angle = 90, hjust = 1),
## The x-axis labels would be too wide to lay out horizontally, so we rotate them 90 degrees. hjust = 1 means right-justified.
axis.text.y = element_text(size = 12),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_line(size = 0))
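If rotated labels are hard to read, a horizontal layout is another option. The sketch below is a variation rather than the plot above: it drops coord_flip(), keeps the majors on the y axis, and orders them so the largest group sits at the top. The frequency table is retyped from the output shown earlier so the snippet runs on its own.

```r
library(ggplot2)

# Frequency table copied from the output shown earlier in this post
tb <- data.frame(
  major = c("Business, Finance & Economics", "Humanities & Social Sciences",
            "CS & Engineering", "Data Science & Interactive Media Business",
            "Science", "Mathematics", "Interactive Media Arts"),
  n = c(118, 32, 27, 24, 21, 21, 20)
)

# Order the levels by ascending n; the last level is drawn at the top of the y axis
tb$major <- reorder(tb$major, tb$n)

p <- ggplot(tb, aes(x = n, y = major, label = n)) +
  # Sticks first, so the points are drawn on top of them
  geom_segment(aes(x = 0, xend = n, yend = major), color = "grey50") +
  geom_point(size = 8) +
  geom_text(color = "white", size = 3) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_blank())
p
```

With the majors on the y axis, the long labels read left to right and no rotation is needed.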