Dates and times are not the most pleasant objects to deal with. At times they can be confusing and even annoying. Base R has built-in functions that handle the creation, conversion, extraction, and calculation of dates and times. Additionally, package lubridate has made manipulating dates and times more intuitive in terms of conversion, accessing components, and calculating time spans. Package zoo has made dealing with the neighboring values of dates and times a lot easier.
Below we will go through the fundamentals of working with dates and times with base R functions and package lubridate.
There are three basic date and time classes: Date, POSIXct and POSIXlt. Class Date handles dates without times. POSIXct (calendar time) and POSIXlt (local time) represent dates and times.
Date and POSIXct are stored internally as numbers: Date as the number of days since January 1, 1970, and POSIXct as the number of seconds since the start of that day in UTC. POSIXlt stores dates and times as a list of components: second, minute, hour, day, month, year, time zone, etc.
now <- Sys.time()
datetime1 <- as.POSIXct(now)
unclass(datetime1)
## [1] 1557735496
datetime2 <- as.POSIXlt(now)
unclass(datetime2)
## $sec
## [1] 16.06629
##
## $min
## [1] 18
##
## $hour
## [1] 16
##
## $mday
## [1] 13
##
## $mon
## [1] 4
##
## $year
## [1] 119
##
## $wday
## [1] 1
##
## $yday
## [1] 132
##
## $isdst
## [1] 0
##
## $zone
## [1] "CST"
##
## $gmtoff
## [1] 28800
##
## attr(,"tzone")
## [1] "" "CST" "CDT"
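The internal representations described above can be checked with a few fixed values. The snippet below is a minimal sketch; the variable names are illustrative, and the time zone is pinned to UTC so that the results do not depend on the machine's locale.

```r
# A Date is a count of days since 1970-01-01
d <- as.Date("1970-01-10")
days_since_epoch <- unclass(d)            # 9

# A POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
secs_since_epoch <- as.numeric(dt)        # 86400

# A POSIXlt is a list, so individual fields can be read by name
lt <- as.POSIXlt(dt)
day_of_month <- lt$mday                   # 2
```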
When working with dates and times, we use the simplest object that solves the problem. For instance, we use Date to work with dates without times, and POSIXct to work with date-times.
Base R functions as.Date(), as.POSIXct(), as.POSIXlt() and strptime(), and the group of lubridate functions ymd(), hm(), ymd_hms() etc. convert character strings to date and time objects.
as.Date(), as.POSIXct(), as.POSIXlt()
as.Date(), as.POSIXct() and as.POSIXlt() convert dates and times in character form to the corresponding date and time classes.
date_string <- "2019-01-14 14:17:30"
as.Date(date_string)
## [1] "2019-01-14"
as.POSIXct(date_string)
## [1] "2019-01-14 14:17:30 CST"
as.POSIXlt(date_string)
## [1] "2019-01-14 14:17:30 CST"
as.Date(), as.POSIXct() and as.POSIXlt() accept various input formats.
The default input formats are year-month-day hour:minutes:seconds or year/month/day hour:minutes:seconds.
as.Date("2019/01/14 14:17:30")
## [1] "2019-01-14"
as.POSIXct("2019/01/14 14:17:30")
## [1] "2019-01-14 14:17:30 CST"
as.POSIXlt("2019/01/14 14:17:30")
## [1] "2019-01-14 14:17:30 CST"
If the input is not in one of these standard formats, we need to set the format argument so that it describes the layout of the input string.
%b: abbreviated month name
%m: month as decimal number (01–12)
%c: date and time
%d: day of the month as decimal number (01–31)
%e: day of the month as decimal number (1–31)
%H: hours as decimal number (00–23); strings such as 24:00:00 are accepted for input
%I: hours as decimal number (01–12)
%M: minute as decimal number (00–59)
%S: second as integer (00–61)
%OS: seconds including fractional seconds
%Y: year with century
%y: year without century (00–99)
The full list of allowed formats can be found with ?strptime.
The first group of examples:
as.Date("14jan2019 14:17:30", "%d%b%Y")
## [1] "2019-01-14"
as.POSIXct("14jan2019 14:17:30", format = "%d%b%Y %H:%M:%S")
## [1] "2019-01-14 14:17:30 CST"
as.POSIXlt("14jan2019 14:17:30", format = "%d%b%Y %H:%M:%S")
## [1] "2019-01-14 14:17:30 CST"
Another group of examples:
as.Date("14/01/2019T14:17:30", "%d/%m/%Y")
## [1] "2019-01-14"
as.POSIXct("14/01/2019T14:17:30", format = "%d/%m/%YT%H:%M:%S", tz = "GMT")
## [1] "2019-01-14 14:17:30 GMT"
as.POSIXlt("14/01/2019T14:17:30", format = "%d/%m/%YT%H:%M:%S", tz = "GMT")
## [1] "2019-01-14 14:17:30 GMT"
To keep the fractional seconds, use %OS. They are stored in the object even though the default print method does not show them:
as.POSIXct("2016-01-24T10:51:53.798520+00:00", format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
## [1] "2016-01-24 10:51:53 UTC"
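The printed value above shows whole seconds only, so it is worth confirming that the fraction survived parsing. The sketch below reads the fractional part from the numeric form and shows two ways to display it; `frac` and `shown` are illustrative names. Note that the trailing "+00:00" in the input is not covered by the format string and is simply ignored.

```r
x <- as.POSIXct("2016-01-24T10:51:53.798520+00:00",
                format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Fractional part of the stored number of seconds (roughly 0.7985)
frac <- as.numeric(x) %% 1

# Ask for three decimal places explicitly when formatting
shown <- format(x, "%Y-%m-%d %H:%M:%OS3")

# Or change the printing default for the session, then restore it
op <- options(digits.secs = 3)
print(x)
options(op)
```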
Sometimes we get incomplete date-time information. For instance, in the example below, we want to plot a time series but our data only contains years and months. We prepend the first day of each month to the strings so that they can be parsed as dates and drawn as a line plot.
year_month <- c("Jan-1992", "Feb-1992", "Mar-1992", "Apr-1992", "May-1992", "Jun-1992", "Jul-1992", "Aug-1992", "Sep-1992", "Oct-1992", "Nov-1992", "Dec-1992")
value <- c(13, 63, 53, 84, 83, 82, 90, 45, 36, 54, 53, 56)
date <- as.Date(paste("01-", year_month, sep = ""), format = "%d-%b-%Y")
df <- data.frame(year_month, date, value)
head(df, 3)
## year_month date value
## 1 Jan-1992 1992-01-01 13
## 2 Feb-1992 1992-02-01 63
## 3 Mar-1992 1992-03-01 53
plot(df$date, df$value, type = "l", xlab = "date", ylab = "value")
strptime()
strptime() converts character strings to POSIXlt date-time objects.
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")
## [1] "2019-01-14 14:17:30 CST"
strptime("14/01/2019T14:17:30", "%d/%m/%YT%H:%M:%S")
## [1] "2019-01-14 14:17:30 CST"
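When the format string does not match the input, strptime() does not raise an error; it returns NA. The following sketch shows how to detect that case. The inputs are made up for illustration, and the time zone is fixed to UTC.

```r
fmt <- "%d/%m/%Y %H:%M:%S"

# Input that matches the format
ok <- strptime("14/01/2019 14:17:30", fmt, tz = "UTC")

# Input in a different layout: parsing fails and yields NA
bad <- strptime("2019-01-14 14:17:30", fmt, tz = "UTC")

class(ok)      # "POSIXlt" "POSIXt"
is.na(bad)     # TRUE
```

Checking is.na() on the result right after parsing is a cheap way to catch a wrong format before it propagates into later calculations.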
lubridate functions
lubridate provides more intuitive ways to convert characters to dates and times.
ymd(), ydm(), mdy(), myd(), dmy() and dym() parse dates with year, month, and day components in the order given by the function name; yq() parses a year and a quarter.
library(lubridate)
ymd("2019/01/14")
## [1] "2019-01-14"
ydm("2019-14-01")
## [1] "2019-01-14"
dmy("14jan2019")
## [1] "2019-01-14"
hm(), ms() and hms() parse periods with hour, minute, and second components.
hms("14:17:30")
## [1] "14H 17M 30S"
hm("14:17")
## [1] "14H 17M 0S"
ms("17:30")
## [1] "17M 30S"
ymd_hms(), ymd_hm(), ymd_h(), dmy_hms(), dmy_hm(), dmy_h(), mdy_hms(), mdy_hm(), mdy_h(), ydm_hms(), ydm_hm() and ydm_h() parse date-times with year, month, day, hour, minute, and second components.
ymd_hms("2019/01/14 14:17:30")
## [1] "2019-01-14 14:17:30 UTC"
dmy_hms("14jan2019 14:17:30")
## [1] "2019-01-14 14:17:30 UTC"
dmy_hms("14/01/2019T14:17:30")
## [1] "2019-01-14 14:17:30 UTC"
The Unix epoch is January 1, 1970, 00:00:00 UTC, and a Unix timestamp is the number of seconds that have elapsed since then. To convert Unix timestamps to a date-time object, we need to set the origin argument.
date <- c(1304362260, 1216256400, 1311344765, 1331309010, 1297437420, 1417795235)
date <- as.POSIXct(date, origin = "1970-01-01")
date
## [1] "2011-05-03 02:51:00 CST" "2008-07-17 09:00:00 CST"
## [3] "2011-07-22 22:26:05 CST" "2012-03-10 00:03:30 CST"
## [5] "2011-02-11 23:17:00 CST" "2014-12-06 00:00:35 CST"
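The output above is shown in the local time zone of the machine that ran the code. Passing tz explicitly makes the conversion reproducible elsewhere; the sketch below uses UTC and a small illustrative vector `ts`, and also checks that the conversion round-trips.

```r
ts <- c(0, 86400, 1304362260)

# Interpret the numbers as seconds since the epoch and display them in UTC
x <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
stamps <- format(x, "%Y-%m-%d %H:%M:%S")
# "1970-01-01 00:00:00" "1970-01-02 00:00:00" "2011-05-02 18:51:00"

# Converting back gives the original timestamps
back <- as.numeric(x)
```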
Converting a date-time object to a date is as easy as calling as.Date().
datetime <- Sys.time()
class(datetime)
## [1] "POSIXct" "POSIXt"
date <- as.Date(datetime)
class(date)
## [1] "Date"
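One detail to watch: for POSIXct input, as.Date() takes a tz argument that defaults to "UTC", so a late-evening local time can land on the next calendar day. A minimal sketch, assuming the "America/Chicago" zone is available in the system's time zone database:

```r
late <- as.POSIXct("2019-01-14 23:30:00", tz = "America/Chicago")

# Default: the date is taken in UTC, where it is already January 15
d_utc <- as.Date(late)

# Passing the time zone keeps the local calendar date
d_local <- as.Date(late, tz = "America/Chicago")

format(d_utc)      # "2019-01-15"
format(d_local)    # "2019-01-14"
```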
We can convert a date-time object back to its internal numeric form with as.numeric().
as.numeric(Sys.time())
## [1] 1557735497
We can use seq() to create date-time sequences.
seq(from = as.Date("2019/01/14"), to = as.Date("2020/01/14"), by = "month")
## [1] "2019-01-14" "2019-02-14" "2019-03-14" "2019-04-14" "2019-05-14"
## [6] "2019-06-14" "2019-07-14" "2019-08-14" "2019-09-14" "2019-10-14"
## [11] "2019-11-14" "2019-12-14" "2020-01-14"
seq(from = as.Date("2019/01/14"), to = as.Date("2019/07/14"), by = "week")
## [1] "2019-01-14" "2019-01-21" "2019-01-28" "2019-02-04" "2019-02-11"
## [6] "2019-02-18" "2019-02-25" "2019-03-04" "2019-03-11" "2019-03-18"
## [11] "2019-03-25" "2019-04-01" "2019-04-08" "2019-04-15" "2019-04-22"
## [16] "2019-04-29" "2019-05-06" "2019-05-13" "2019-05-20" "2019-05-27"
## [21] "2019-06-03" "2019-06-10" "2019-06-17" "2019-06-24" "2019-07-01"
## [26] "2019-07-08"
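seq() also works on POSIXct objects, and the by argument accepts a multiple of a unit such as "6 hours" or "2 days". When the end point is not known, length.out can replace to. A short sketch with illustrative names:

```r
start <- as.POSIXct("2019-01-14 00:00:00", tz = "UTC")

# Four time points, six hours apart
every_6h <- seq(from = start, by = "6 hours", length.out = 4)
format(every_6h, "%H:%M")    # "00:00" "06:00" "12:00" "18:00"

# Three dates, two days apart
every_2d <- seq(from = as.Date("2019-01-14"), by = "2 days", length.out = 3)
format(every_2d)             # "2019-01-14" "2019-01-16" "2019-01-18"
```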
We can also use paste() to concatenate characters and then convert them to date and time objects.
date <- c("14jan2019", "14feb2019", "14mar2019")
time <- c("14:17:30", "15:17:30", "16:17:30")
datetime <- paste(date, time)
datetime <- as.POSIXct(datetime, format = "%d%b%Y %H:%M:%S")
datetime
## [1] "2019-01-14 14:17:30 CST" "2019-02-14 15:17:30 CST"
## [3] "2019-03-14 16:17:30 CST"
make_datetime() and make_date() from package lubridate create date-time objects from numeric components.
make_datetime(year = 2019, month = 1, day = 14, hour = 14, min = 17, sec = 30)
## [1] "2019-01-14 14:17:30 UTC"
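Base R has a comparable helper, ISOdatetime(), which builds a POSIXct object from numeric components. It is shown here as an alternative for cases where lubridate is not available; the arguments are vectorized, so a whole column of components can be combined at once.

```r
# One date-time from its components
dt <- ISOdatetime(year = 2019, month = 1, day = 14,
                  hour = 14, min = 17, sec = 30, tz = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S")    # "2019-01-14 14:17:30"

# Vectorized: the 14th of January, February and March at midnight
monthly <- ISOdatetime(2019, 1:3, 14, 0, 0, 0, tz = "UTC")
format(monthly, "%Y-%m-%d")        # "2019-01-14" "2019-02-14" "2019-03-14"
```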
Earlier we said that POSIXlt objects store date and time components such as second, minute, hour, day, month, year and time zone as a list. To access the components, we can use the $ operator to subset the elements we need.
now <- Sys.time()
now <- as.POSIXlt(now)
now
## [1] "2019-05-13 16:18:17 CST"
now$year
## [1] 119
now$wday
## [1] 1
now$zone
## [1] "CST"
strptime()
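Since strptime() returns POSIXlt objects, we can also use $ to extract components from its result. The outputs are numeric and follow the POSIXlt conventions: year counts years since 1900, mon runs from 0 to 11, wday runs from 0 (Sunday) to 6, and yday runs from 0 to 365.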
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$year
## [1] 119
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$mon
## [1] 0
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$mday
## [1] 14
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$wday
## [1] 1
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$yday
## [1] 13
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$hour
## [1] 14
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$min
## [1] 17
strptime("14jan2019 14:17:30", "%d%b%Y %H:%M:%S")$sec
## [1] 30
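Because several POSIXlt fields are offset from the values people usually expect, a small adjustment is needed to get calendar numbers. A minimal sketch with illustrative variable names:

```r
lt <- as.POSIXlt("2019-01-14 14:17:30", tz = "UTC")

calendar_year  <- lt$year + 1900   # year is stored as years since 1900
calendar_month <- lt$mon + 1       # mon is stored as 0-11
day_of_year    <- lt$yday + 1      # yday is stored as 0-365
weekday_index  <- lt$wday          # 0 = Sunday, so 1 = Monday

c(calendar_year, calendar_month, day_of_year, weekday_index)
# 2019 1 14 1
```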
strftime()
We can also extract components with strftime(), which formats a date-time as a character string. A character input works if it is in a standard unambiguous format.
strftime("2019-01-14 14:17:30", "%Y")
## [1] "2019"
strftime("2019-01-14 14:17:30", "%H:%M:%S")
## [1] "14:17:30"
strftime("2019-01-14 14:17:30", "%H:%M")
## [1] "14:17"
strftime("2019-01-14 14:17:30", "%M:%S")
## [1] "17:30"
The outputs are characters.
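The same format codes work with format() on Date and POSIXct objects, and the character result can be converted when a number is needed. A brief sketch, with the time zone fixed to UTC:

```r
x <- as.POSIXct("2019-01-14 14:17:30", tz = "UTC")

year_chr <- format(x, "%Y")                  # "2019"
hour_int <- as.integer(format(x, "%H"))      # 14
date_chr <- format(as.Date(x), "%d/%m/%Y")   # "14/01/2019"
```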
weekdays(), months(), quarters()
weekdays(), months(), and quarters() are base R functions that extract parts of a POSIXt or Date object.
now <- Sys.time()
weekdays(now)
## [1] "Monday"
months(now)
## [1] "May"
quarters(now)
## [1] "Q2"
lubridate accessor functions
year(), month(), week(), date(), day(), mday() (day of the month), wday() (day of the week), hour(), minute(), second(), and tz() (time zone) are lubridate accessor functions that extract components from date-time objects.
now <- Sys.time()
year(now)
## [1] 2019
month(now)
## [1] 5
wday(now, label = TRUE)
## [1] Mon
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
second(now)
## [1] 17.1059
difftime()
x <- strptime("2019-01-14 14:17:30", "%Y-%m-%d %H:%M:%S")
y <- strptime("2018-12-14 18:10:12", "%Y-%m-%d %H:%M:%S")
x - y
## Time difference of 30.8384 days
The base R function difftime() calculates the difference between two date-time objects and returns a difftime object.
difftime(x, y)
## Time difference of 30.8384 days
difftime(x, y, units = "hours")
## Time difference of 740.1217 hours
difftime(x, y, units = "mins")
## Time difference of 44407.3 mins
difftime(x, y, units = "secs")
## Time difference of 2664438 secs
difftime(x, y, units = "days")
## Time difference of 30.8384 days
difftime(x, y, units = "weeks")
## Time difference of 4.405486 weeks
difftime objects can be converted to numeric objects with as.numeric().
z <- difftime(x, y)
as.numeric(z, units = "hours")
## [1] 740.1217
as.numeric(z, units = "mins")
## [1] 44407.3
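When units is left at its default, difftime picks a unit based on the size of the gap, so the same code can return days for one pair of values and minutes for another. Stating the unit explicitly avoids surprises in later arithmetic. A sketch using the same two instants, pinned to UTC:

```r
x <- as.POSIXct("2019-01-14 14:17:30", tz = "UTC")
y <- as.POSIXct("2018-12-14 18:10:12", tz = "UTC")

# Explicit units give a predictable number
gap <- difftime(x, y, units = "secs")
gap_secs <- as.numeric(gap)      # 2664438

# The unit of an existing difftime can be changed in place
units(gap) <- "hours"
gap_hours <- as.numeric(gap)     # 2664438 / 3600

# A short gap is reported in minutes by default
short <- x - (x - 90)
units(short)                     # "mins"
```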
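lubridate functions
In package lubridate, a time span can be measured in three ways: as a duration, a period, or an interval. A duration is an exact number of seconds, a period is expressed in calendar units such as months and days, and an interval is the span between two specific instants.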
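The difference between the three span types shows up as soon as a leap year is involved. The sketch below assumes lubridate is installed; the starting point is chosen for illustration.

```r
library(lubridate)

start <- ymd_hms("2016-01-01 00:00:00", tz = "UTC")

# Period: one calendar year later, whatever its length
after_period <- start + years(1)       # 2017-01-01

# Duration: exactly 365 * 86400 seconds later, which falls one day short
# because 2016 is a leap year
after_duration <- start + ddays(365)   # 2016-12-31

# Interval: anchored to two instants, so its length in days is exact
span <- interval(start, after_period)
span_days <- time_length(span, "day")  # 366
```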
Time zones and conventions such as daylight saving time and leap years or leap seconds are tricky things to watch out for when working with dates and times.
Let’s look at two examples from the lubridate documentation.
The first example shows the tricky case of a leap year, when one year is not 365 days but 366 days.
leap <- interval(ymd("2016-01-01"), ymd("2017-01-01"))
leap
## [1] 2016-01-01 UTC--2017-01-01 UTC
as.period(leap, unit = "days")
## [1] "366d 0H 0M 0S"
as.period(leap, unit = "years")
## [1] "1y 0m 0d 0H 0M 0S"
as.period(leap, unit = "seconds")
## [1] "31622400S"
as.duration(leap)
## [1] "31622400s (~1 years)"
Another example shows the tricky case of daylight saving time. Here the clocks are set back by one hour, so the day has 25 hours rather than 24.
dst <- interval(ymd("2016-11-06", tz = "America/Chicago"), ymd("2016-11-07", tz = "America/Chicago"))
dst
## [1] 2016-11-06 CDT--2016-11-07 CST
as.period(dst, unit = "hours")
## [1] "25H 0M 0S"
Like difftime objects, period objects can be converted to numeric objects.
per <- period(hours = 10, minutes = 6)
per
## [1] "10H 6M 0S"
as.numeric(per, unit = "hours")
## [1] 10.1
as.numeric(per, unit = "minutes")
## [1] 606
Functions round_date(), floor_date() and ceiling_date() from lubridate round date-time objects to a nearby unit of time.
round_date() takes a date-time object and rounds it to the nearest value of the specified time unit. It rounds up when the date-time value is exactly halfway between two consecutive units.
datetime <- ymd_hms("2019/01/14 14:17:30.59")
round_date(datetime)
## [1] "2019-01-14 14:17:31 UTC"
floor_date() takes a date-time object and rounds it down to the nearest boundary of the specified time unit.
floor_date(datetime)
## [1] "2019-01-14 14:17:30 UTC"
ceiling_date() takes a date-time object and rounds it up to the nearest boundary of the specified time unit.
ceiling_date(datetime)
## [1] "2019-01-14 14:17:31 UTC"
We can set the unit to which the date-time is rounded. Valid units are second, minute, hour, day, week, month, bimonth, quarter, season, halfyear and year.
round_date(datetime, "sec")
## [1] "2019-01-14 14:17:31 UTC"
round_date(datetime, "min")
## [1] "2019-01-14 14:18:00 UTC"
round_date(datetime, "15 mins")
## [1] "2019-01-14 14:15:00 UTC"
round_date(datetime, "hours")
## [1] "2019-01-14 14:00:00 UTC"
round_date(datetime, "day")
## [1] "2019-01-15 UTC"
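The unit argument works the same way for floor_date() and ceiling_date(), which is handy for grouping observations by hour or by month. A short sketch, assuming lubridate is installed:

```r
library(lubridate)

x <- ymd_hms("2019-01-14 14:17:30")

# Start and end of the enclosing hour
hour_start <- floor_date(x, "hour")       # 2019-01-14 14:00:00
hour_end   <- ceiling_date(x, "hour")     # 2019-01-14 15:00:00

# Start of the enclosing month and start of the next one
month_start <- floor_date(x, "month")     # 2019-01-01
next_month  <- ceiling_date(x, "month")   # 2019-02-01
```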
In R, the time zone of a date-time object is stored as a character string. For POSIXct objects, the time zone is an attribute that only controls printing; the underlying instant is not affected by it.
R relies on the user’s OS to interpret time zone names. We can get the complete list of time zone names with OlsonNames(), which draws on a database originally compiled by Arthur Olson. These names typically take the form “Area/Location”, such as “Africa/Abidjan”.
head(OlsonNames())
## [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
## [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
POSIXct and POSIXlt classes contain the time zone attribute.
as.POSIXct("2019-01-14 14:17:30", format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
## [1] "2019-01-14 14:17:30 GMT"
as.POSIXlt("2019-01-14 14:17:30", format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
## [1] "2019-01-14 14:17:30 GMT"
with_tz() and force_tz() from lubridate provide ways to change time zones.
with_tz() displays the date-time in a different time zone, while the actual instant is unchanged.
x <- strptime("2019-01-14 14:17:30", "%Y-%m-%d %H:%M:%S")
x
## [1] "2019-01-14 14:17:30 CST"
with_tz(x, tzone = "America/New_York")
## [1] "2019-01-14 01:17:30 EST"
force_tz() keeps the clock time as it is and assigns it to a different time zone, so the actual instant changes.
x
## [1] "2019-01-14 14:17:30 CST"
force_tz(x, tzone = "America/New_York")
## [1] "2019-01-14 14:17:30 EST"
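The distinction between changing the display and changing the instant can be reproduced in base R, which makes it easier to see what is going on. This is only a sketch of the idea, not how lubridate implements these functions; it assumes the "America/New_York" zone is available on the system.

```r
x <- as.POSIXct("2019-01-14 14:17:30", tz = "UTC")

# Same instant, different display (what with_tz() does conceptually):
# only the tzone attribute changes
same_instant <- x
attr(same_instant, "tzone") <- "America/New_York"
format(same_instant, "%H:%M")             # "09:17"
as.numeric(same_instant) == as.numeric(x) # TRUE

# Same clock time, different instant (what force_tz() does conceptually):
# re-parse the printed clock time in the new zone
same_clock <- as.POSIXct(format(x), tz = "America/New_York")
shift_secs <- as.numeric(same_clock) - as.numeric(x)   # 18000, i.e. 5 hours
```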