Visualization

Timelines in R

A simple worked example for a timeline in R with ggplot() and survey data.

We conducted two waves of a public opinion survey in Lebanon. One of the primary themes assessed in the survey was Lebanese perceptions of Syrian refugees in the country. Nearly 20% of the resident population of Lebanon are refugees from Syria, making Lebanon the country with the highest ratio of refugees to citizens in the world and in recent history. Before and during the second wave of surveying, the Lebanese Armed Forces (LAF) and Hizbullah were engaged in military operations in the Ras Baalbek region of Lebanon, seeking to expel the Syrian Al-Qaida affiliate Hay’at Tahrir al-Sham (formerly Jabhat al-Nusra) from this territory. These military operations took place between approximately 21 July and 30 August 2017. The operations were covered extensively by the media, and human rights groups also raised concerns about the LAF’s treatment of detained Syrian refugees.

As these operations could plausibly have affected Lebanese perceptions of Syrian refugees in the country, we sought to highlight the overlap between our surveys and these operations in order to explore more fully the relationship between these events and Lebanese public opinion. For example, the operations could have improved Lebanese perceptions of safety and security through the elimination of a potential threat, or they could have heightened Lebanese perceptions of the Syrian presence in the country as a ‘threat’ to security and stability, or both.

This example of a timeline in R uses selected data from the two survey waves, available here; alternatively, the data may be loaded directly from the console with the url() function. The data contain three variables: an end date and time for the survey, an indicator for the survey wave, and an indicator for the gender of the respondent.

library(tidyverse)
library(scales)
data <- read.csv(url("http://rnotr.com/assets/files/survey_dates.csv"), header = TRUE)
head(data)
                            end   wave gender
1 2017-04-24T13:07:13.000+03:00 Wave I   Male
2 2017-04-26T09:50:38.000+03:00 Wave I   Male
3 2017-04-26T10:27:34.000+03:00 Wave I   Male
4 2017-04-26T11:16:44.000+03:00 Wave I   Male
5 2017-04-26T11:20:12.000+03:00 Wave I   Male
6 2017-04-26T12:02:46.000+03:00 Wave I   Male

The data were collected with KoBo Toolbox, and the end time was recorded as the system date and time on the tablet when the form was submitted; however, we only want the calendar date. Looking at the end variable, we can see (a) that it is a factor (in R 4.0 and later, read.csv() returns a character vector by default, and the code below works either way) and (b) that the letter ‘T’ separates the date from the time. The time is recorded in hours:minutes:seconds, followed by a time-zone offset. We can use strsplit() to split each value at the ‘T’. Because strsplit() returns a list, we convert it to a data frame with one column per observation and transpose it with t(), so that each observation becomes a row that can be bound back to our original data frame. We then use as.Date() to convert the date column to the Date class, indicating that the format of the date is YYYY-MM-DD.

dates <- strsplit(as.character(data$end),"T") %>%
  data.frame() %>% 
  t() %>% 
  data.frame() 
colnames(dates) <- c("Date","Time")
dates <- cbind(dates,data)
dates$Date <- as.character(dates$Date) %>% 
  as.Date(format = "%Y-%m-%d") # capital 'Y', 4-digit year
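A more compact alternative, sketched here on the assumption that every timestamp follows the ISO 8601 layout shown in the head() output, is to strip everything from the ‘T’ onward with sub() and convert what remains. The vector end_times below is a hypothetical stand-in copied from the first rows printed above, not the full data set.

```r
# Hypothetical stand-in for data$end, copied from the head() output above
end_times <- c("2017-04-24T13:07:13.000+03:00",
               "2017-04-26T09:50:38.000+03:00",
               "2017-04-26T10:27:34.000+03:00")

# Drop the 'T' and everything after it, leaving only YYYY-MM-DD.
# sub() coerces its input to character, so a factor column also works.
date_strings <- sub("T.*$", "", end_times)

# Convert the remaining text to the Date class
survey_dates <- as.Date(date_strings, format = "%Y-%m-%d")

print(survey_dates)
```

This avoids the transpose step, at the cost of discarding the time component rather than keeping it in its own column.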

Because we want to overlay our survey observations with the timeframe of the military operations in Ras Baalbek, we make a small data frame with only a start and end date for the operations, specifying the date in the same format as our survey observations.

operations <- data.frame(xstart = as.Date("2017-07-21", format = "%Y-%m-%d"), 
                    xend = as.Date("2017-08-30", format = "%Y-%m-%d"))
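Once the window is defined, the overlap can also be checked numerically before plotting. The sketch below flags which interview dates fall inside the operations window. The interview_dates vector is made up for illustration and is not drawn from the survey data.

```r
# Operations window, using the start and end dates given in the text
operations <- data.frame(xstart = as.Date("2017-07-21", format = "%Y-%m-%d"),
                         xend   = as.Date("2017-08-30", format = "%Y-%m-%d"))

# Hypothetical interview dates, for illustration only
interview_dates <- as.Date(c("2017-04-26", "2017-07-20", "2017-07-21",
                             "2017-08-15", "2017-08-30", "2017-09-05"))

# TRUE where an interview falls on or between the start and end dates
during_ops <- interview_dates >= operations$xstart &
  interview_dates <= operations$xend

# Number of interviews inside and outside the window
print(table(during_ops))
```

With the real data, the same comparison on dates$Date would give the number of interviews completed while the operations were under way.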

We can then make a very basic plot, overlaying a histogram of observations on a rectangle indicating the timeframe of the military operations. While this works, there is much that can be done to improve this visualization.

ggplot()+
  geom_rect(data = operations, aes(xmin = xstart, xmax = xend, ymin = 0, ymax = Inf)) + 
  stat_bin(data = dates, aes(x=Date, group=wave, fill=wave), position="identity") +
  ggtitle("Not a Very Nice Plot")

[Figure: first timeline plot]

We can improve the plot by adding some further layers and parameters:

  • Make the rectangle indicating the timeline of operations semi-transparent.
  • Change the binwidth to one day and add a white border to the histogram bars.
  • Add in a text annotation over the rectangle.
  • Specify custom colors and replace the legend title.
  • Add axis labels and a plot title.
  • Increase both the x and y limits marginally to add some padding.
  • Remove the blank space between the x-axis and the histogram with expand = c(0,0).
  • Change the theme, centering the title and moving the legend to the bottom.

The result is our final timeline.

ggplot() +
  geom_rect(data = operations, aes(xmin = xstart, xmax = xend, ymin = 0, ymax = Inf), 
    alpha = 0.3) + 
  stat_bin(data = dates, aes(x=Date, group=wave, fill=wave), colour = "white", 
    binwidth=1, position="identity") + 
  annotate("text", label = "Operations in\nRas Baalbek", x = as.Date("2017-08-10", 
    format = "%Y-%m-%d"), y = 280, size = 4) +
  scale_fill_manual(values=c("#45B29D", "#E27A3F"), name = "Wave") +
  xlab("\nDate of Interview") + ylab("Frequency\n") + 
  ggtitle("\nTimeline of Interviews by Wave, April - September 2017 \n") +
  scale_y_continuous(limits = c(0,325), expand = c(0, 0)) +
  scale_x_date(breaks=date_breaks("1 month"), labels=date_format("%b %y"), 
    limits=c(as.Date("2017-04-01", format = "%Y-%m-%d"), 
    as.Date("2017-10-01", format = "%Y-%m-%d"))) +
  theme_bw(base_size = 14) + 
  theme(legend.position = "bottom", plot.title = element_text(hjust = .5)) 

[Figure: final timeline plot]
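The upper y limit and the height of the text annotation are easier to choose once the height of the tallest bar is known. As a rough sketch, with made-up dates standing in for dates$Date, the daily counts that a one-day binwidth approximates can be tabulated directly:

```r
# Hypothetical interview dates standing in for dates$Date
interview_dates <- as.Date(c("2017-08-01", "2017-08-01", "2017-08-02",
                             "2017-08-02", "2017-08-02", "2017-08-04"))

# One count per calendar day on which at least one interview took place
daily_counts <- table(interview_dates)

# Height of the tallest bar; the upper y limit should sit somewhat above it
max_count <- max(daily_counts)

# Day on which that maximum occurs
busiest_day <- as.Date(names(daily_counts)[which.max(daily_counts)])

print(max_count)
print(busiest_day)
```

Values such as the limit of 325 and the annotation height of 280 in the plot above can then be set relative to max_count rather than by trial and error.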

Some of the logistical patterns of the fieldwork are apparent, particularly in the Wave II histogram:

  • The small start is the pilot of the questionnaire.
  • The steady rise from the start date is the deployment of additional teams over successive days.
  • The small dips are weekends, when not all enumerators worked.
  • The largest dips in Wave II are the Eid al-Adha (1 September) and Islamic New Year (21 September) holidays.
  • The trailing end is the completion of replacement surveys and surveys in more remote and difficult-to-access areas.
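The weekend pattern can be checked by tabulating interviews by day of the week. The sketch below uses made-up dates covering a single week in September 2017, not the survey data, and relies on as.POSIXlt() so that the result does not depend on the system locale.

```r
# Hypothetical interview dates covering one full week, for illustration only
interview_dates <- as.Date(c("2017-09-04", "2017-09-04", "2017-09-05",
                             "2017-09-06", "2017-09-07", "2017-09-08",
                             "2017-09-09", "2017-09-10"))

# as.POSIXlt()$wday numbers the days 0 (Sunday) to 6 (Saturday)
day_index <- as.POSIXlt(interview_dates)$wday
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Interviews per day of the week; days with no interviews are kept as zero
by_weekday <- table(factor(day_labels[day_index + 1], levels = day_labels))

print(by_weekday)
```

Applied to dates$Date, consistently lower counts on particular days would support reading the small dips as weekends.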
