Values can be numeric or strings. Numeric values are stored as five types differing in range and accuracy: byte, int, long, float or double. float is the default storage type. Strings are stored as str#, where # indicates the maximum length of the string.
Storage types affect mainly how much memory will be needed. Usually we do not need to worry about the storage types, and Stata will take care of that and find the most efficient ways to store data.
Dates and times usually come in the human readable string forms, such as “March 22, 2017 16:15 pm”, “2017.03.22 16:15” etc. But Stata internally stores dates and times as integers and reads them as numeric values.
In fact, Stata understands a date and time variable as the difference from the base.The base (the numeric value 0) of a datetime variable begins at 01jan1960 00:00:00.000 (the first millisecond of 01jan1960); therefore “25jan2016 08:30:25” will be 1769329825000 (milliseconds) for Stata.
Display format allows us to specify the output of the date/time variable.
. format timevar %td sets timevar as a date variable. Here timevar is already a numeric date and time variable.
. format timevar %tcHH:MM:SS further sets timevar to be displayed as hour (00-23) : minute (00-59) :second (00-60).
format var %fmt changes the display format but not the contents of the variable. % starts the formatting, which can be a number, date, string, business calendar etc.
There are multiple ways to display a year, month, week, day, hour, minute, second etc. For instance, there are three additional codes to display the hour: Hh (00-12), hH (0-23), or hh (0-12).
Check the Stata help pages for the full list and the default display formats.
Base | Display format | Storage type | String-to-numeric function | |
---|---|---|---|---|
datetime (assuming 86400 s/day) |
01jan1960 | %tc | must be double | clock() |
datetime (equivalent to UTC) |
01jan1960 | %tC | must be double | Clock() |
date | 01jan1960 | %td | may be float or long | date() |
weekly date | 1960 week 1 | %tw | may be float or int | weekly() |
monthly date | jan1960 | %tm | may be float or int | monthly() |
quarterly date | 1960 quarter 1 | %tq | may be float or int | quarterly() |
half-yearly date | 1960 half-year 1 | %th | may be float or int | halfyearly() |
yearly date | 0 A.D | %ty | may be float or int | yearly() |
*Scroll left and right if the table exceeds the screen on mobile devices.
*Adapted from tables in Working with dates and times and datetime – Date and time values and variables.
In the last column of the table above we have introduced the functions function(string, mask[,topyear]) that transform strings to the numeric date and time variables.
In those functions, mask specifies the order of the components appearing in the string.
Mask | |
---|---|
M | month |
D | day |
Y | 4-digit year |
19Y | 2-digit year to be interpreted as 19xx |
20Y | 2-digit year to be interpreted as 20xx |
h | hour of day |
m | minutes within hour |
s | seconds within minute |
# | ignore one element |
*Scroll left and right if the table exceeds the screen on mobile devices.
Source: datetime translation – String to numeric date translation functions
Here are some examples of the strings and their corresponding masks:
datetime | mask |
---|---|
“25jan2016 08:30:25” | “DMYhms” |
“2016-01-25 08:30:25” | “YMDhms” |
“16-01-25 08:30” | “20YMDhm” |
“08:30:25 UTC 01252016 ” | “hms#MDY” |
*Scroll left and right if the table exceeds the screen on mobile devices.
Suppose we have the following dataset and the string variable anntims:
. list in 1/5
+----------------------+
| anntims anndats |
|----------------------|
1. | 00:46:00 25feb2008 |
2. | 00:46:00 25feb2008 |
3. | 00:46:00 25feb2008 |
4. | 00:46:00 25feb2008 |
5. | 00:46:00 25feb2008 |
+----------------------+
We want to generate a new time variable from it:
. gen double anntims2 = clock(anntims,"hms")
. format anntims2 %tcHH:MM:SS
. list anntims2 in 1/5
+----------+
| anntims2 |
|----------|
1. | 00:46:00 |
2. | 00:46:00 |
3. | 00:46:00 |
4. | 00:46:00 |
5. | 00:46:00 |
+----------+
We also want to create a new variable with both date and time from the variables anntims (string) and anndats (numeric).
. gen double anndatims = dhms(anndats,hh(anntims2),mm(anntims2),ss(anntims2))
. format anndatims %tcCCYY.NN.DD_HH:MM:SS
. list anndatims in 1/5
+---------------------+
| anndatims |
|---------------------|
1. | 2008.02.25 00:46:00 |
2. | 2008.02.25 00:46:00 |
3. | 2008.02.25 00:46:00 |
4. | 2008.02.25 00:46:00 |
5. | 2008.02.25 00:46:00 |
+---------------------+
When working with the two-digit years, in addition to specifying the 19Y or 20Y masks, we can specify the topyear option in date and time function(string, mask, topyear). The full year to be returned will be the year that does not exceed the largest value of the specified topyear.
This is useful when we want to interpret the first two digits of the year components differently of a variable.
For instance, suppose we have a date variable date (string). We want to obtain the full years in the range of 1949-2048 and, based on the knowledge of the data, would like to interpret digits equal to or larger than 49 as from the years 19** and others as from 20**:
. list in 1/5
+--------+
| date |
|--------|
1. | 470505 |
2. | 480404 |
3. | 490101 |
4. | 500202 |
5. | 510303 |
+--------+
. gen float fulldate = date(date, "YMD", 2048)
. format fulldate %td
. list in 1/5
+--------------------+
| date fulldate |
|--------------------|
1. | 470505 05may2047 |
2. | 480404 04apr2048 |
3. | 490101 01jan1949 |
4. | 500202 02feb1950 |
5. | 510303 03mar1951 |
+--------------------+
The functions below extract the components from the date and time variables.
time-of-day component | function |
---|---|
hour of day | hh(tc) or hhC(tC) |
minutes of day | mm(tc) or mmC(tC) |
seconds of day | ss(tc) or ssC(tC) |
date component | function |
---|---|
calendar year | year(td) |
calendar month | month(td) |
calendar day | day(td) |
day of week (0=Sunday) | dow(td) |
day of year (1=first day) | doy(td) |
week within year (1=first week) | week(td) |
quarter within year (1=first quarter) | quarter(td) |
half within year (1=first half) | halfyear(td) |
*Scroll left and right if the table exceeds the screen on mobile devices.
*Adapted from tables in datetime – Date and time values and variables
Suppose we want to extract the year from the variable anndats we've seen earlier:
+----------------------+
| anntims anndats |
|----------------------|
1. | 00:46:00 25feb2008 |
2. | 00:46:00 25feb2008 |
3. | 00:46:00 25feb2008 |
4. | 00:46:00 25feb2008 |
5. | 00:46:00 25feb2008 |
+----------------------+
. gen annyrs = year(anndats)
. list annyrs in 1/5
+--------+
| annyrs |
|--------|
1. | 2008 |
2. | 2008 |
3. | 2008 |
4. | 2008 |
5. | 2008 |
+--------+
Suppose we want to extract the minutes from the time variable anntims2:
+----------+
| anntims2 |
|----------|
1. | 00:46:00 |
2. | 00:46:00 |
3. | 00:46:00 |
4. | 00:46:00 |
5. | 00:46:00 |
+----------+
. gen mins = mm(anntims2)
+------+
| mins |
|------|
1. | 46 |
2. | 46 |
3. | 46 |
4. | 46 |
5. | 46 |
+------+
In one example above, we created a date and time variable by
. gen double anndatims = dhms(anndats,hh(anntims2),mm(anntims2),ss(anntims2))
. format anndatims %tcCCYY.NN.DD_HH:MM:SS
Here we used the dhms(datetime, h, m, s) function to concatenate the dates, hours, minutes and seconds.
Let's see another case, where we want to create a new year variable that started from November using the same dataset below:
+-------------------------------+
| anntims anndats annyrs |
|-------------------------------|
1. | 00:46:00 25feb2008 2008 |
2. | 00:46:00 25feb2008 2008 |
3. | 00:46:00 25feb2008 2008 |
4. | 00:46:00 25feb2008 2008 |
5. | 00:46:00 25feb2008 2008 |
+-------------------------------+
gen annyrs2 = .
local i = 2000
while `i' < 2017 {
replace annyrs2 = `i' if anndats >= mdy(11,1,`i') & anndats < mdy(11,1,`i'+1)
local i = `i' + 1
}
Here we used the mdy(M, D, Y) function to concatenate the month, day, and year elements.
Since Stata internally uses the difference from the base to read the dates and times, calculation of durations is simply addition or subtraction of the date and time units.
. list time in 1/5
+----------------------------------+
| time |
|----------------------------------|
1. | 2016-03-31T04:20:00.122823+00:00 |
2. | 2016-03-31T04:24:25.447985+00:00 |
3. | 2016-03-31T04:24:25.683670+00:00 |
4. | 2016-03-31T04:24:27.338571+00:00 |
5. | 2016-03-31T04:24:27.552821+00:00 |
+----------------------------------+
Here are the codes to get them:
. gen datetime = substr(time,1,19)
. gen microsec = substr(time,21,6)
. destring microsec, replace
. replace microsec = microsec/1000
. gen double timestamp = clock(datetime, "YMD#hms") + microsec
. format timestamp %tcCCYY.NN.DD_HH:MM:SS.sss
. gen duration = timestamp - timestamp[_n-1]
. gen minutes = minutes(duration)