This post is adapted from a section of The Jigsaw Puzzle Pieces - Creating the Graphs with ggplot2, Part II of a series of posts on building a Shiny app. That post created many bar plots and variations on them; this one brings them all together in one place as a summary.

First Steps

As usual, before we begin, let’s load all the packages we will need.

library(dplyr)
library(ggplot2)

Sample data

First we’ll load the sample survey data.

## setwd()
load("~/Desktop/r sample files/survey")

head(survey)
## # A tibble: 6 x 17
##   status  country major  Q1.1   Q1.2  Q1.3  top_reason     place_options  
##   <fct>   <fct>   <fct>  <chr>  <chr> <chr> <chr>          <chr>          
## 1 Freshm… U.S.    Undef… Never  Never Never Find a quiet … quiet (occasio…
## 2 Freshm… China   Undef… Occas… Often Often Find a quiet … crowded,focused
## 3 Freshm… U.S.    Undef… Never  Occa… Often Meet up with … (close to) sil…
## 4 Freshm… China   Undef… Occas… Occa… Occa… Meet up with … focused,(close…
## 5 Freshm… U.S.    Undef… Never  Occa… Occa… Find a quiet … (close to) sil…
## 6 Freshm… China   Undef… Occas… Often Often Find a quiet … (close to) sil…
## # ... with 9 more variables: space_lib <chr>, rank_crowded <dbl>,
## #   rank_modpop <dbl>, rank_noisy <dbl>, rank_quiet <dbl>,
## #   rank_silent <dbl>, rank_relaxed <dbl>, rank_focused <dbl>,
## #   workshops <chr>

Color palette

We’ll store a color palette in a named list. This will make it easier to work with colors when plotting later.

palette <- list(purple = c("#351F39", "#351C4D", "#6c1f55", "#765285", "#8a6899"),
                turquoise = c("#709FB0", "#849974", "#A0C1B8"),
                golden = c("#D1A827", "#f3da4c"))
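Because the list is named, a color can be pulled out either by position or by name. The plots in this post use positional indexing such as palette[[1]][4]; the short check below (a sketch that simply repeats the palette definition so it runs on its own) suggests that the named form returns the same value. The name fill_colors is only illustrative.

```r
# Same palette as defined above, repeated so this snippet is self-contained
palette <- list(purple = c("#351F39", "#351C4D", "#6c1f55", "#765285", "#8a6899"),
                turquoise = c("#709FB0", "#849974", "#A0C1B8"),
                golden = c("#D1A827", "#f3da4c"))

# Positional and named indexing refer to the same element
palette[[1]][4]     # fourth purple, by position
palette$purple[4]   # fourth purple, by name

# The three fill colors used by the bar plots in this post
# (fill_colors is a hypothetical helper name)
fill_colors <- c(palette$purple[4], palette$turquoise[1], palette$golden[1])
fill_colors
```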

A Lot of Bars

Let’s say we want to present how often respondents of different majors use the library website to search for articles. Website usage is stored in Q1.3 (“I use our library website to (how often) - Search for articles”), and each respondent’s major is stored in major.

Q1.3 and major both contain discrete values, so bar plots and their variations are a natural fit for this goal.

tail(survey[c("Q1.3","major")])
## # A tibble: 6 x 2
##   Q1.3         major                                    
##   <chr>        <fct>                                    
## 1 Often        Data Science & Interactive Media Business
## 2 Occasionally Business, Finance & Economics            
## 3 Often        Interactive Media Arts                   
## 4 Often        Business, Finance & Economics            
## 5 Occasionally Humanities & Social Sciences             
## 6 Occasionally CS & Engineering
table(survey$Q1.3)
## 
##        Never Occasionally        Often 
##           77          126          117
table(survey$major)
## 
##             Business, Finance & Economics 
##                                       118 
##              Humanities & Social Sciences 
##                                        32 
## Data Science & Interactive Media Business 
##                                        24 
##                    Interactive Media Arts 
##                                        20 
##                                   Science 
##                                        21 
##                          CS & Engineering 
##                                        27 
##                               Mathematics 
##                                        21 
##                                 Undefined 
##                                        57
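The bar plots that follow draw the counts of Q1.3 within each major, that is, a two-way table rather than the two one-way tables above. As a rough illustration, the sketch below builds that cross-tabulation on a small made-up data frame (toy and its values are hypothetical stand-ins for the survey data). With the real data, table(survey$major, survey$Q1.3) should be the equivalent call.

```r
# Hypothetical toy data standing in for the survey; the values are made up
toy <- data.frame(
  major = c("Science", "Science", "Science", "Mathematics", "Mathematics"),
  Q1.3  = c("Often", "Never", "Often", "Occasionally", "Often")
)

# Rows are majors, columns are answers;
# each cell corresponds to the height of one bar segment
xtab <- table(toy$major, toy$Q1.3)
xtab
```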

Stacked bars (count)

ggplot(survey, aes(major)) + 
  ## geom_bar() adds a layer of bars to the plot
  ## aes(fill = Q1.3) splits each bar into colored segments, one per value of Q1.3 (Never/Occasionally/Often), sized by count
  ## position = position_stack(reverse = TRUE) reverses the stacking order of the segments
  geom_bar(aes(fill = Q1.3), position = position_stack(reverse = TRUE), width = 0.4, alpha = 0.75) +
  ## manually assign the preset palette colors to the fill values
  scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
  ## reverse the order of the levels on the x axis
  scale_x_discrete(limits = rev(levels(survey$major))) +
  ## flip the coordinates so the bars run horizontally
  coord_flip() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_line(size = 0),
        legend.title = element_blank(),
        legend.text = element_text(size = 10),
        plot.margin = unit(c(1,1,1,1), "cm")) 
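The theme() settings above are repeated verbatim in the next two plots. One optional way to cut the repetition, not used in the original post, is to store the theme in a variable and add it to each plot. In this sketch bar_theme and toy are hypothetical names, and axis.ticks.x uses element_blank() in place of element_line(size = 0), which should hide the ticks in a similar way (an assumption, not the original setting).

```r
library(ggplot2)

# Hypothetical toy data; the real plots use the survey data frame
toy <- data.frame(
  major = c("Science", "Science", "Mathematics"),
  Q1.3  = c("Often", "Never", "Often")
)

# Shared theme settings, stored once (bar_theme is an illustrative name)
bar_theme <- theme(axis.text.x = element_text(size = 12),
                   axis.text.y = element_text(size = 12, margin = margin(0, 3, 0, 0)),
                   axis.title.y = element_blank(),
                   axis.title.x = element_blank(),
                   axis.ticks.x = element_blank(),  # assumption: swapped in for element_line(size = 0)
                   legend.title = element_blank(),
                   legend.text = element_text(size = 10),
                   plot.margin = unit(c(1, 1, 1, 1), "cm"))

# Any of the bar plots can then end with "+ bar_theme"
p <- ggplot(toy, aes(major)) +
  geom_bar(aes(fill = Q1.3), width = 0.4, alpha = 0.75) +
  coord_flip() +
  bar_theme
p
```

Keeping the theme in one place also means a later change, such as a different text size, only has to be made once.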


Stacked bars (percentage)

Alternatively, we can scale every stacked bar to the same length, so that the segments show the percentage of each value of Q1.3 within a major rather than the raw count.

ggplot(survey, aes(major)) + 
  geom_bar(aes(fill = Q1.3), position = "fill", width = 0.4, alpha = 0.75) +
## position = "fill" stacks the segments and rescales each bar to a total of 1, so the bars show proportions instead of counts
  scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
  scale_x_discrete(limits = rev(levels(survey$major))) +
  coord_flip() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_line(size = 0),
        legend.title = element_blank(),
        legend.text = element_text(size = 10),
        plot.margin = unit(c(1,1,1,1), "cm")) 
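With position = "fill", each bar is rescaled so that its segments sum to 1. The proportions being drawn correspond to the row-wise shares of the major by Q1.3 table, which can be checked in base R with prop.table(). The toy data frame below is a made-up stand-in for survey, and shares is an illustrative name.

```r
# Hypothetical toy data standing in for the survey; the values are made up
toy <- data.frame(
  major = c("Science", "Science", "Science", "Science", "Mathematics", "Mathematics"),
  Q1.3  = c("Often", "Often", "Never", "Occasionally", "Often", "Never")
)

# margin = 1 divides each row by its row total, so every major sums to 1
shares <- prop.table(table(toy$major, toy$Q1.3), margin = 1)
round(shares, 2)
```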


Grouped bars

Or we can place the bars for each value of Q1.3 side by side rather than stacking them.

ggplot(survey, aes(major)) + 
  geom_bar(aes(fill = Q1.3), position = "dodge", width = 0.4, alpha = 0.75) +
## position = "dodge" draws the bars of each group side by side instead of stacking them
  scale_fill_manual(values = c(palette[[1]][4], palette[[2]][1], palette[[3]][1])) +
  scale_x_discrete(limits = rev(levels(survey$major))) +
  coord_flip() +
  theme(axis.text.x = element_text(size = 12),
        axis.text.y = element_text(size = 12, margin = margin(0,3,0,0)),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_line(size = 0),
        legend.title = element_blank(),
        legend.text = element_text(size = 10),
        plot.margin = unit(c(1,1,1,1), "cm")) 
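All three plots above pass an unnamed color vector to scale_fill_manual(), so the colors are assigned to the values of Q1.3 in order. A named vector makes the mapping explicit instead. This is an optional variation rather than what the original code does; fill_colors and toy are hypothetical names, and the hex codes are the same three palette entries used above.

```r
library(ggplot2)

# Hypothetical toy data standing in for the survey; the values are made up
toy <- data.frame(
  major = c("Science", "Science", "Mathematics", "Mathematics"),
  Q1.3  = c("Never", "Often", "Occasionally", "Often")
)

# Each color is tied to an answer by name rather than by position
fill_colors <- c(Never = "#765285", Occasionally = "#709FB0", Often = "#D1A827")

p <- ggplot(toy, aes(major)) +
  geom_bar(aes(fill = Q1.3), position = "dodge", width = 0.4, alpha = 0.75) +
  scale_fill_manual(values = fill_colors) +
  coord_flip()
p
```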


Lollipop chart

Bar charts are easy to read, but a page full of them can get monotonous. A lollipop chart, a variation on the bar chart, conveys the same information with some visual variety.

To make a lollipop chart, we need a frequency table that gives the count for each major.

tb <- survey %>% count(major) %>% data.frame() # generates a frequency table
tb <- tb %>% arrange(-n) %>% filter(major != "Undefined") # reorder by frequency; remove undefined majors
tb
##                                       major   n
## 1             Business, Finance & Economics 118
## 2              Humanities & Social Sciences  32
## 3                          CS & Engineering  27
## 4 Data Science & Interactive Media Business  24
## 5                                   Science  21
## 6                               Mathematics  21
## 7                    Interactive Media Arts  20

Now let’s plot the lollipop chart from the frequency table.

ggplot(tb, aes(n, reorder(major, -n), label = n)) +
  ## reorder(major, -n) orders the majors by frequency, in descending order
  geom_segment(aes(x = 0, y = reorder(major, -n), xend = n, yend = reorder(major, -n)), 
               size = 0.5, color = "grey50") +
  ## geom_point() draws a point at each (n, major) position; these are the lollipop heads.
  ## Layers are drawn in the order they are added, so geom_point() has to come after geom_segment().
  ## Otherwise the sticks would be drawn on top of the points.
  geom_point(size = 8) +
  ## print each count in white on top of its point
  geom_text(color = "white", size = 3) +
  coord_flip() +
  ## The major names are too wide to lay out horizontally, so rotate them 90 degrees. hjust = 1 means right-justified.
  theme(axis.text.x = element_text(size = 12, angle = 90, hjust = 1),
        axis.text.y = element_text(size = 12),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_line(size = 0))
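The call reorder(major, -n) is what puts the lollipops in descending order of frequency: it returns a factor whose levels are sorted by the second argument. A quick base R check, using the counts from the frequency table above (the data frame is retyped here so the snippet runs on its own, and ordered_major is an illustrative name):

```r
# Frequency table from above, retyped by hand
tb <- data.frame(
  major = c("Business, Finance & Economics", "Humanities & Social Sciences",
            "CS & Engineering", "Data Science & Interactive Media Business",
            "Science", "Mathematics", "Interactive Media Arts"),
  n = c(118, 32, 27, 24, 21, 21, 20)
)

# Sorting by -n puts the largest count first in the level order
ordered_major <- reorder(tb$major, -tb$n)
levels(ordered_major)
```

ggplot2 lays out a discrete axis in level order, which is why the most common major appears first on the axis.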