Week 5: Annotations

Tutorial
Get with the Plot!
Data Visualization
R
Learn how to add annotations and text labels in ggplot2.
Published: November 13, 2018

This is the fifth in a series of posts on how to use ggplot2 to visualise data in R.

We begin by loading the tidyverse package, which contains ggplot2 alongside other useful packages. If you haven’t installed it yet, first run install.packages("tidyverse").

library(tidyverse)

This week’s dataset comes from a study by Cuddy et al. (2009). Students from seven EU nations (Belgium, France, Germany, the Netherlands, Portugal, Spain, and the UK) rated how competent and how warm they thought other EU citizens perceived each of fifteen EU nations (including their own) to be.

dl <- read_rds("https://github.com/nilsreimer/data-visualisation-workshop/raw/master/materials/gwtp/dl_wk5.rds")
print(dl, n = 5)
# A tibble: 22 × 4
  country rater competence warmth
  <chr>   <chr>      <dbl>  <dbl>
1 Austria other      0.369 0.219 
2 Belgium other      0.378 0.418 
3 Belgium same       0.505 0.0903
4 Denmark other      0.422 0.365 
5 Finland other      0.351 0.327 
# ℹ 17 more rows

This dataset contains the aggregated competence and warmth ratings for each country. It contains two ratings for countries that were represented among respondents, one by raters from the same country (rater == "same") and one by raters from other countries (rater == "other").

In another post, we have already used geom_hline() to annotate a plot. We add geom_vline() to divide the plot into quadrants.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_point() +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

All of this should be familiar by now. If not, have a look at the other posts before reading on, or use the help() function. This plot shows that respondents rated only one country as both competent and warm (upper-right quadrant).[1]

We use now-familiar commands to compare how ratings by the same group and other groups differ.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

Overall, respondents tended to think that their own country was seen as warmer and more competent than respondents from other countries thought it was seen.
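To put numbers on this comparison, the two ratings for a country can be placed side by side and subtracted. The following is a minimal sketch, assuming a toy data frame (toy) with the same columns as dl. The Belgium values are rounded from the output printed earlier; "Examplia" and its values are made up purely for illustration.

```r
# Toy stand-in for `dl`: Belgium is rounded from the printed output,
# "Examplia" is a hypothetical country with made-up ratings.
toy <- data.frame(
  country    = c("Belgium", "Belgium", "Examplia", "Examplia"),
  rater      = c("other", "same", "other", "same"),
  competence = c(0.378, 0.505, 0.40, 0.45),
  warmth     = c(0.418, 0.090, 0.30, 0.35)
)

# Split by rater and join the two halves on country
same  <- subset(toy, rater == "same")
other <- subset(toy, rater == "other")
both  <- merge(same, other, by = "country", suffixes = c("_same", "_other"))

# Positive values: the country's own raters gave the higher rating
both$d_competence <- both$competence_same - both$competence_other
both$d_warmth     <- both$warmth_same - both$warmth_other

both[, c("country", "d_competence", "d_warmth")]
```

With the real data, the same steps would run on dl instead of toy; countries with only an "other" rating drop out of the merge.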

One striking exception is Belgium. Belgians seem to think that other EU citizens see Belgians as a lot less warm than respondents from other countries think. This, however, is difficult to tell from the plot. We add an annotate() layer to highlight this data point.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  annotate(
    geom = "point",
    x = 0.5048966268,
    y = 0.0903429925,
    size = 4,
    colour = "red"
  ) +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

The annotate() function does not inherit aesthetics (x, y) from the ggplot() function, so we pass the coordinates directly. It can take on the form of any other geom (in this case, geom = "point"). Note that we have added the annotate() layer before the geom_point() layer, so the red dot is drawn underneath the data point.
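The effect of layer order can be checked by building the same plot twice with the layers swapped. This is a small sketch on a toy data frame (toy, with values rounded from the output printed earlier), not the full dataset; the object names base, under, and over are only for illustration.

```r
library(ggplot2)

# Toy stand-in for `dl` (values rounded from the printed output)
toy <- data.frame(
  country    = c("Belgium", "Belgium", "Denmark"),
  rater      = c("other", "same", "other"),
  competence = c(0.378, 0.505, 0.422),
  warmth     = c(0.418, 0.090, 0.365)
)

base <- ggplot(toy, aes(x = competence, y = warmth)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

# Red dot first, data points second: the black point stays visible on top
under <- base +
  annotate(geom = "point", x = 0.505, y = 0.090, size = 4, colour = "red") +
  geom_point()

# Reversed order: the larger red dot is drawn last and covers the data point
over <- base +
  geom_point() +
  annotate(geom = "point", x = 0.505, y = 0.090, size = 4, colour = "red")

under
over
```

Layers are drawn in the order in which they are added, so anything meant as a backdrop goes first.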

This plot highlights Belgians’ ratings of how they thought Belgians were seen by others. We could explain this in the plot’s caption, but it’d be easier for the reader if we included that information in the plot itself. We add another annotate() layer.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  annotate(
    geom = "point",
    x = 0.5048966268,
    y = 0.0903429925,
    size = 4,
    colour = "red"
  ) +
  annotate(
    geom = "text",
    x = 0.5048966268,
    y = 0.0903429925,
    label = "Belgium",
    vjust = 1.5
  ) +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

This time, the annotate() layer creates a geom_text() layer. Note that geom_text() requires the label aesthetic in addition to the x and y aesthetics. The vjust aesthetic specifies the vertical justification of the text relative to its x-y coordinates (see the ggplot2 documentation for details).
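To get a feel for vjust, it can help to draw the same label with several values next to each other. The sketch below uses a made-up data frame (demo) with one point per vjust value; none of it comes from the dataset.

```r
library(ggplot2)

# Hypothetical demo data: four points on a line, one vjust value each
demo <- data.frame(
  x     = c(1, 2, 3, 4),
  y     = 1,
  vjust = c(0, 0.5, 1, 1.5)
)

p <- ggplot(demo, aes(x = x, y = y)) +
  geom_point(colour = "red") +
  # vjust is mapped inside aes() here so that each label uses its own value
  geom_text(aes(label = paste("vjust =", vjust), vjust = vjust)) +
  coord_cartesian(xlim = c(0.5, 4.5), ylim = c(0.9, 1.1))

p
```

A value of 0 rests the bottom of the text on the point, 0.5 centres it, and 1 hangs the text from its top edge; values above 1, such as the 1.5 used for the Belgium label, push the text further below the point.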

We might want to highlight another data point. We can use vectors, c(...), to annotate more than one point.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  annotate(
    geom = "point",
    x = c(0.5048966268, 0.6964091404), 
    y = c(0.0903429925, 0.1242246627),
    size = 4,
    colour = "red"
  ) +
  annotate(
    geom = "text",
    x = c(0.5048966268, 0.6964091404), 
    y = c(0.0903429925, 0.1242246627),
    label = c("Belgium", "UK"),
    vjust = 1.5
  ) +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

This plot shows that ratings by the same group and other groups are a lot closer for the UK than for Belgium. Still, this method gets cumbersome when we want to annotate more than two data points. Instead, we could add a layer based on another dataset.

For example, Cuddy et al. (2009) divided countries into three clusters: countries seen as low in competence and high in warmth (LC-HW), countries seen as high in competence and low in warmth (HC-LW), and countries seen as very high in competence and very low in warmth (HHC-LLW).

dc <- read_rds("https://github.com/nilsreimer/data-visualisation-workshop/raw/master/materials/gwtp/dc_wk5.rds")
print(dc)
# A tibble: 3 × 3
  competence warmth cluster
       <dbl>  <dbl> <chr>  
1      0.235  0.589 LC-HW  
2      0.444  0.356 HC-LW  
3      0.721  0.161 HHC-LLW

This dataset defines the centre of each cluster. We can use the data argument to add a geom_text() layer that uses the new dataset to position the cluster labels at the centre of each cluster.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  geom_text(data = dc, aes(label = cluster)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

Alternatively, we can use a geom_label() layer, which draws a box behind the text and makes it easier to read.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "grey20") +
  geom_line(aes(group = country)) +
  geom_point(aes(shape = rater)) +
  geom_label(data = dc, aes(label = cluster), alpha = 0.5) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

Note that we have added the geom_label() layer after the geom_point() layer, so the labels are drawn on top of the points, but have also made the labels’ background semi-transparent (alpha = 0.5) so that the points underneath remain visible. This plot shows that ratings by the same group and other groups are most similar for countries perceived as very competent.
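The data argument works so smoothly in the two plots above because dc has columns named competence and warmth, so the layer can reuse the x and y mappings from ggplot(). If the second dataset names its columns differently, the mappings have to be restated in that layer. A minimal sketch, assuming two toy data frames (pts and centres) whose names and values are invented for illustration:

```r
library(ggplot2)

# Hypothetical stand-ins for `dl` and `dc`
pts <- data.frame(
  competence = c(0.2, 0.7),
  warmth     = c(0.6, 0.2)
)
centres <- data.frame(
  cx      = c(0.25, 0.72),   # note: not called `competence`
  cy      = c(0.59, 0.16),   # note: not called `warmth`
  cluster = c("LC-HW", "HHC-LLW")
)

p <- ggplot(pts, aes(x = competence, y = warmth)) +
  geom_point() +
  # The inherited mapping refers to columns that `centres` does not have,
  # so x and y are mapped again for this layer only
  geom_label(data = centres, aes(x = cx, y = cy, label = cluster), alpha = 0.5) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

p
```

Renaming the columns of the second dataset to match the first would work just as well and keeps the layer call shorter.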

Still, this plot would be more informative if every data point were labelled. We again use a geom_text() layer.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_point(aes(shape = rater)) +
  geom_text(aes(label = country)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

This plot is difficult to read as points and labels overlap. One way to go from here would be to remove the points and rely on labels alone. This, however, would make it difficult to infer the exact ratings for a country.

Instead, we load an extension for ggplot2 that allows adding labels that repel one another and are repelled by the data points. If you haven’t yet, you need to install the ggrepel package by running install.packages("ggrepel").

library(ggrepel)

The geom_text_repel() layer requires the same aesthetics as a geom_text() layer.

ggplot(dl, aes(x = competence, y = warmth)) +
  geom_point(aes(shape = rater)) +
  geom_text_repel(aes(label = country)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

This plot looks a lot better, though we might want to make clearer which ratings are by the same group and by other groups.

ggplot(dl, aes(x = competence, y = warmth, colour = rater)) +
  geom_point(aes(shape = rater)) +
  geom_text_repel(aes(label = country)) +
  scale_colour_grey(start = 0.0, end = 0.5) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

This plot shows that, for ratings of other EU nations, competence and warmth seem to be negatively correlated. We might also want to add an annotate() layer to highlight the difference in ratings of Belgium.

ggplot(dl, aes(x = competence, y = warmth, colour = rater)) +
  annotate(
    geom = "segment",
    x = 0.3775843308, 
    y = 0.4178957764,
    xend = 0.5048966268, 
    yend = 0.0903429925,
    colour = "red",
    arrow = arrow(length = unit(0.2, "cm"), type = "closed")
  ) +
  geom_point(aes(shape = rater)) +
  geom_text_repel(aes(label = country)) +
  scale_colour_grey(start = 0.0, end = 0.5) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))
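The arrow's start and end points do not have to be typed in by hand; they can be looked up from the data, which also makes the direction of the arrow explicit (from the "other" rating to the "same" rating). The following is a sketch under the assumption of a toy data frame (toy) holding only the two Belgium rows, with values rounded from the output printed earlier; the helper objects from and to are hypothetical names.

```r
library(ggplot2)

# Toy stand-in for the two Belgium rows of `dl` (rounded values)
toy <- data.frame(
  country    = c("Belgium", "Belgium"),
  rater      = c("other", "same"),
  competence = c(0.378, 0.505),
  warmth     = c(0.418, 0.090)
)

# Start at the rating by other countries, end at Belgians' own rating
from <- subset(toy, country == "Belgium" & rater == "other")
to   <- subset(toy, country == "Belgium" & rater == "same")

p <- ggplot(toy, aes(x = competence, y = warmth)) +
  annotate(
    geom = "segment",
    x = from$competence, y = from$warmth,
    xend = to$competence, yend = to$warmth,
    colour = "red",
    arrow = arrow(length = unit(0.2, "cm"), type = "closed")
  ) +
  geom_point(aes(shape = rater)) +
  coord_fixed(1, xlim = c(0, 1), ylim = c(0, 1))

p
```

The arrowhead sits at (xend, yend), so swapping from and to reverses the arrow.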

And that’s it for this post. You now know how to use annotations and labels to create more informative figures. If you have a question or found a mistake, please comment on Twitter or send me an email.

Next week, we’ll tackle the important issue of how to visualise results from statistical models in ggplot2.

Footnotes

  1. It’s French people rating how they think French people are seen by other EU nations.