Filling in missing values with previous or next value of selected columns can be useful, when values are recorded only when they change and otherwise not recorded.
For instance, below we have a sample dataset that is part of a tracking log (transformed from JSON format), which recorded user behaviors of an online course site. User interactions with the course site are logged when events data are emitted by the server, the browser, or the mobile device.
head(session, 10)
## datetime event id
## 1 2016-02-21 09:14:49 <NA> 5
## 2 2016-02-21 09:14:48 <NA> 5
## 3 2016-03-10 19:52:23 <NA> 16
## 4 2016-03-10 19:46:06 <NA> 16
## 5 2016-03-10 19:51:33 seek_video 16
## 6 2016-03-10 19:46:06 pause_video 16
## 7 2016-03-10 19:48:41 seek_video 16
## 8 2016-03-10 19:46:06 play_video 16
## 9 2016-03-10 19:56:00 <NA> 16
## 10 2016-03-10 19:43:38 <NA> 16
What we want to do is fill in the missing values of event
for each id
using the previous value of event
.
We show how to do this with tidyr::fill()
and zoo::na.locf()
below.
tidyr::fill()
In fill()
, .direction
is the direction in which to fill missing values. We can change the direction to “down” (the default), “up”, “downup” (first down and then up) or “updown” (first up and then down).
library(tidyr)
library(dplyr)
session2 <- session %>%
mutate(event2 = event) %>%
group_by(id) %>%
fill(event2, .direction = "down")
head(session2, 20)
## # A tibble: 20 x 4
## # Groups: id [2]
## datetime event id event2
## <dttm> <fct> <int> <fct>
## 1 2016-02-21 09:14:49 <NA> 5 <NA>
## 2 2016-02-21 09:14:48 <NA> 5 <NA>
## 3 2016-03-10 19:52:23 <NA> 16 <NA>
## 4 2016-03-10 19:46:06 <NA> 16 <NA>
## 5 2016-03-10 19:51:33 seek_video 16 seek_video
## 6 2016-03-10 19:46:06 pause_video 16 pause_video
## 7 2016-03-10 19:48:41 seek_video 16 seek_video
## 8 2016-03-10 19:46:06 play_video 16 play_video
## 9 2016-03-10 19:56:00 <NA> 16 play_video
## 10 2016-03-10 19:43:38 <NA> 16 play_video
## 11 2016-03-10 19:46:51 play_video 16 play_video
## 12 2016-03-10 19:51:28 speed_change_video 16 speed_change_video
## 13 2016-03-10 19:55:46 seq_goto 16 seq_goto
## 14 2016-03-10 19:48:10 <NA> 16 seq_goto
## 15 2016-03-10 19:51:36 play_video 16 play_video
## 16 2016-03-10 19:51:30 play_video 16 play_video
## 17 2016-03-10 19:57:32 play_video 16 play_video
## 18 2016-03-10 19:55:49 load_video 16 load_video
## 19 2016-03-10 19:58:07 <NA> 16 load_video
## 20 2016-03-10 19:46:46 load_video 16 load_video
zoo::na.locf()
Package zoo
deals with time series data efficiently.
Below we use its function na.locf0()
, which is like na.locf()
except it defaults to na.rm = FALSE
and requires a single vector argument. na.locf()
is a generic function for replacing each NA with the most recent non-NA prior to it.
library(zoo)
session3 <- session %>%
group_by(id) %>%
mutate(event2 = na.locf0(event, fromLast = FALSE))
In na.locf0()
, fromLast
sets whether observations are to be carried backward or forward. fromLast = FALSE
fills the value with the previous entry. fromLast = TRUE
carries the next observation backward.
The result is shown below.
head(session3, 20)
## # A tibble: 20 x 4
## # Groups: id [2]
## datetime event id event2
## <dttm> <fct> <int> <fct>
## 1 2016-02-21 09:14:49 <NA> 5 <NA>
## 2 2016-02-21 09:14:48 <NA> 5 <NA>
## 3 2016-03-10 19:52:23 <NA> 16 <NA>
## 4 2016-03-10 19:46:06 <NA> 16 <NA>
## 5 2016-03-10 19:51:33 seek_video 16 seek_video
## 6 2016-03-10 19:46:06 pause_video 16 pause_video
## 7 2016-03-10 19:48:41 seek_video 16 seek_video
## 8 2016-03-10 19:46:06 play_video 16 play_video
## 9 2016-03-10 19:56:00 <NA> 16 play_video
## 10 2016-03-10 19:43:38 <NA> 16 play_video
## 11 2016-03-10 19:46:51 play_video 16 play_video
## 12 2016-03-10 19:51:28 speed_change_video 16 speed_change_video
## 13 2016-03-10 19:55:46 seq_goto 16 seq_goto
## 14 2016-03-10 19:48:10 <NA> 16 seq_goto
## 15 2016-03-10 19:51:36 play_video 16 play_video
## 16 2016-03-10 19:51:30 play_video 16 play_video
## 17 2016-03-10 19:57:32 play_video 16 play_video
## 18 2016-03-10 19:55:49 load_video 16 load_video
## 19 2016-03-10 19:58:07 <NA> 16 load_video
## 20 2016-03-10 19:46:46 load_video 16 load_video