Dates and Times

Data types and storage types

Values can be numeric or strings. Numeric values are stored as five types differing in range and accuracy: byte, int, long, float or double. float is the default storage type. Strings are stored as str#, where # indicates the maximum length of the string.

Storage types affect mainly how much memory will be needed. Usually we do not need to worry about the storage types, and Stata will take care of that and find the most efficient ways to store data.

Dates and times usually come in the human readable string forms, such as “March 22, 2017 16:15 pm”, “2017.03.22 16:15” etc. But Stata internally stores dates and times as integers and reads them as numeric values.

In fact, Stata understands a date and time variable as the difference from the base.The base (the numeric value 0) of a datetime variable begins at 01jan1960 00:00:00.000 (the first millisecond of 01jan1960); therefore “25jan2016 08:30:25” will be 1769329825000 (milliseconds) for Stata.

Display format

Display format allows us to specify the output of the date/time variable.
. format timevar %td sets timevar as a date variable. Here timevar is already a numeric date and time variable.
. format timevar %tcHH:MM:SS further sets timevar to be displayed as hour (00-23) : minute (00-59) :second (00-60).

format var %fmt changes the display format but not the contents of the variable. % starts the formatting, which can be a number, date, string, business calendar etc.

There are multiple ways to display a year, month, week, day, hour, minute, second etc. For instance, there are three additional codes to display the hour: Hh (00-12), hH (0-23), or hh (0-12).

Check the Stata help pages for the full list and the default display formats.

Base Display format Storage type String-to-numeric function
datetime
(assuming 86400 s/day)
01jan1960 %tc must be double clock()
datetime
(equivalent to UTC)
01jan1960 %tC must be double Clock()
date 01jan1960 %td may be float or long date()
weekly date 1960 week 1 %tw may be float or int weekly()
monthly date jan1960 %tm may be float or int monthly()
quarterly date 1960 quarter 1 %tq may be float or int quarterly()
half-yearly date 1960 half-year 1 %th may be float or int halfyearly()
yearly date 0 A.D %ty may be float or int yearly()

*Scroll left and right if the table exceeds the screen on mobile devices.

*Adapted from tables in Working with dates and times and datetime – Date and time values and variables.

More on the syntax of display formats for dates and times from the Stata data management manual

Conversion

In the last column of the table above we have introduced the functions function(string, mask[,topyear]) that transform strings to the numeric date and time variables.

In those functions, mask specifies the order of the components appearing in the string.

Mask
M month
D day
Y 4-digit year
19Y 2-digit year to be interpreted as 19xx
20Y 2-digit year to be interpreted as 20xx
h hour of day
m minutes within hour
s seconds within minute
# ignore one element

*Scroll left and right if the table exceeds the screen on mobile devices.

Source: datetime translation – String to numeric date translation functions

Here are some examples of the strings and their corresponding masks:

datetime mask
“25jan2016 08:30:25” “DMYhms”
“2016-01-25 08:30:25” “YMDhms”
“16-01-25 08:30” “20YMDhm”
“08:30:25 UTC 01252016 ” “hms#MDY”

*Scroll left and right if the table exceeds the screen on mobile devices.

Suppose we have the following dataset and the string variable anntims:
. list in 1/5


     +----------------------+
     |  anntims     anndats |
     |----------------------|
  1. | 00:46:00   25feb2008 |
  2. | 00:46:00   25feb2008 |
  3. | 00:46:00   25feb2008 |
  4. | 00:46:00   25feb2008 |
  5. | 00:46:00   25feb2008 |
     +----------------------+		
		

We want to generate a new time variable from it:
. gen double anntims2 = clock(anntims,"hms")
. format anntims2 %tcHH:MM:SS
. list anntims2 in 1/5


     +----------+
     | anntims2 |
     |----------|
  1. | 00:46:00 |
  2. | 00:46:00 |
  3. | 00:46:00 |
  4. | 00:46:00 |
  5. | 00:46:00 |
     +----------+			
		

We also want to create a new variable with both date and time from the variables anntims (string) and anndats (numeric).
. gen double anndatims = dhms(anndats,hh(anntims2),mm(anntims2),ss(anntims2))
. format anndatims %tcCCYY.NN.DD_HH:MM:SS
. list anndatims in 1/5


     +---------------------+
     |           anndatims |
     |---------------------|
  1. | 2008.02.25 00:46:00 |
  2. | 2008.02.25 00:46:00 |
  3. | 2008.02.25 00:46:00 |
  4. | 2008.02.25 00:46:00 |
  5. | 2008.02.25 00:46:00 |
     +---------------------+		
		

Working with two-digit-years

When working with the two-digit years, in addition to specifying the 19Y or 20Y masks, we can specify the topyear option in date and time function(string, mask, topyear). The full year to be returned will be the year that does not exceed the largest value of the specified topyear.

This is useful when we want to interpret the first two digits of the year components differently of a variable.

For instance, suppose we have a date variable date (string). We want to obtain the full years in the range of 1949-2048 and, based on the knowledge of the data, would like to interpret digits equal to or larger than 49 as from the years 19** and others as from 20**:
. list in 1/5


     +--------+
     |   date |
     |--------|
  1. | 470505 |
  2. | 480404 |
  3. | 490101 |
  4. | 500202 |
  5. | 510303 |
     +--------+			
		

. gen float fulldate = date(date, "YMD", 2048)
. format fulldate %td
. list in 1/5


     +--------------------+
     |   date    fulldate |
     |--------------------|
  1. | 470505   05may2047 |
  2. | 480404   04apr2048 |
  3. | 490101   01jan1949 |
  4. | 500202   02feb1950 |
  5. | 510303   03mar1951 |
     +--------------------+			
		

More on string to numeric date translation functions from the Stata data management manual

Extraction

The functions below extract the components from the date and time variables.

time-of-day component function
hour of day hh(tc) or hhC(tC)
minutes of day mm(tc) or mmC(tC)
seconds of day ss(tc) or ssC(tC)
date component function
calendar year year(td)
calendar month month(td)
calendar day day(td)
day of week (0=Sunday) dow(td)
day of year (1=first day) doy(td)
week within year (1=first week) week(td)
quarter within year (1=first quarter) quarter(td)
half within year (1=first half) halfyear(td)

*Scroll left and right if the table exceeds the screen on mobile devices.

*Adapted from tables in datetime – Date and time values and variables

Suppose we want to extract the year from the variable anndats we've seen earlier:


     +----------------------+
     |  anntims     anndats |
     |----------------------|
  1. | 00:46:00   25feb2008 |
  2. | 00:46:00   25feb2008 |
  3. | 00:46:00   25feb2008 |
  4. | 00:46:00   25feb2008 |
  5. | 00:46:00   25feb2008 |
     +----------------------+		
		

. gen annyrs = year(anndats)
. list annyrs in 1/5


     +--------+
     | annyrs |
     |--------|
  1. |   2008 |
  2. |   2008 |
  3. |   2008 |
  4. |   2008 |
  5. |   2008 |
     +--------+			
		

Suppose we want to extract the minutes from the time variable anntims2:


     +----------+
     | anntims2 |
     |----------|
  1. | 00:46:00 |
  2. | 00:46:00 |
  3. | 00:46:00 |
  4. | 00:46:00 |
  5. | 00:46:00 |
     +----------+			
		
. gen mins = mm(anntims2)
. list mins in 1/5

     +------+
     | mins |
     |------|
  1. |   46 |
  2. |   46 |
  3. |   46 |
  4. |   46 |
  5. |   46 |
     +------+			
		

Concatenation

In one example above, we created a date and time variable by
. gen double anndatims = dhms(anndats,hh(anntims2),mm(anntims2),ss(anntims2))
. format anndatims %tcCCYY.NN.DD_HH:MM:SS
Here we used the dhms(datetime, h, m, s) function to concatenate the dates, hours, minutes and seconds.

Let's see another case, where we want to create a new year variable that started from November using the same dataset below:


     +-------------------------------+
     |  anntims     anndats   annyrs |
     |-------------------------------|
  1. | 00:46:00   25feb2008     2008 |
  2. | 00:46:00   25feb2008     2008 |
  3. | 00:46:00   25feb2008     2008 |
  4. | 00:46:00   25feb2008     2008 |
  5. | 00:46:00   25feb2008     2008 |
     +-------------------------------+
	


gen annyrs2 = .
local i = 2000
while `i' < 2017 {
	replace annyrs2 = `i' if anndats >= mdy(11,1,`i') & anndats < mdy(11,1,`i'+1)
	local i = `i' + 1
}				
		
Here we used the mdy(M, D, Y) function to concatenate the month, day, and year elements.

Calculation

Since Stata internally uses the difference from the base to read the dates and times, calculation of durations is simply addition or subtraction of the date and time units.

. list time in 1/5


     +----------------------------------+
     |                             time |
     |----------------------------------|
  1. | 2016-03-31T04:20:00.122823+00:00 |
  2. | 2016-03-31T04:24:25.447985+00:00 |
  3. | 2016-03-31T04:24:25.683670+00:00 |
  4. | 2016-03-31T04:24:27.338571+00:00 |
  5. | 2016-03-31T04:24:27.552821+00:00 |
     +----------------------------------+
			

Here are the codes to get them:
. gen datetime = substr(time,1,19)
. gen microsec = substr(time,21,6)
. destring microsec, replace
. replace microsec = microsec/1000
. gen double timestamp = clock(datetime, "YMD#hms") + microsec
. format timestamp %tcCCYY.NN.DD_HH:MM:SS.sss
. gen duration = timestamp - timestamp[_n-1]
. gen minutes = minutes(duration)

Read the full documentation on date and time values and variables from the Stata data management manual

Author: Yun Dai, 2018