Why and when would we want to use automation tricks such as macros, loops, saved results, or even programs we write ourselves? Automating your work saves you from repeating very similar code over and over again. It also reduces the chance of mistakes each time you tweak the code in a given step. The time spent learning the programming basics is repaid in shorter, more reliable do-files.
Overview of Stata programming
UCLA: Statistical Consulting Group, Introduction to Stata Programming
Gabriel Rossman, Introduction to Stata Programming
Macros are named strings that can stand for variables, values, text, commands, statements, etc. A macro is either local or global.
Local macros only work in the current do-file or program.
local localmacro = exp
This is how you refer to a local macro: `localmacro'. Note that the left side is a backtick (`), usually found in the top left corner of the keyboard, and the right side is an apostrophe (').
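The role of the equals sign in the definition is easy to miss, so here is a minimal sketch of the difference. The macro names sum_text and sum_value are illustrative assumptions, not part of the tutorial's data.

```stata
* Illustrative names (sum_text, sum_value) are assumptions for this sketch.
* Without "=", the macro stores the text exactly as typed.
local sum_text 2 + 2
* With "=", Stata evaluates the expression first and stores the result.
local sum_value = 2 + 2

display "`sum_text'"
display "`sum_value'"
* Unquoted, the stored text is substituted into the command and then evaluated by display.
display `sum_text'
```

The first display shows the stored text 2 + 2, while the other two show 4.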
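Usually we can define a macro that holds a string with or without quotation marks. However, if the macro holds several items that themselves contain spaces, we need to put each item in double quotes and enclose the whole list in compound double quotes `" "'. For instance,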
local reason `" "Work on a class assignment/paper" "Use specialized databases (e.g. Bloomberg, Wind)" "'
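As a rough check that the two phrases above are kept as separate items, the sketch below counts them and loops over them. It only reuses the macro reason defined above; the helper names n and r are assumptions.

```stata
local reason `" "Work on a class assignment/paper" "Use specialized databases (e.g. Bloomberg, Wind)" "'
* Count the items: the inner double quotes keep each phrase together.
local n : word count `reason'
display `n'
* Loop over the items; foreach strips the inner quotes from each one.
foreach r of local reason {
    display "`r'"
}
```

The count is 2, and each phrase is displayed on its own line.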
Global macros work across the programs in Stata.
global globalmacro = exp
We use a dollar sign to refer to a global macro: $globalmacro.
Be careful with global macros: they are accessible from every do-file and program, so a global defined elsewhere can silently affect the code you are running now. Keep track of all the global macros you have created.
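A small sketch of defining, using and then dropping a global macro may help here. The folder and file names are hypothetical and chosen only for illustration.

```stata
* Hypothetical folder and file names, used only for illustration.
global datadir "data/raw"
display "$datadir"
* Braces mark where the macro name ends when other text follows it directly.
display "${datadir}_backup/survey.dta"
* Drop the global once it is no longer needed so it cannot leak into other do-files.
macro drop datadir
display "[$datadir]"
```

After macro drop, the global expands to nothing, so the last line displays an empty pair of brackets.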
To change the contents of a macro, we simply define it again with the new contents.
. local container apple orange
. local container apple melon papaya
The first line defines the local macro container; the second replaces its contents simply by assigning the macro again.
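To drop a macro, use macro drop. Local macros are stored internally with a leading underscore, so put an underscore before the name of a local macro; a name without the underscore refers to a global macro.
. macro drop _container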
A common use of macros is to hold variable lists for later use.
Below we create a local macro control to hold all control variables, and macros application, open_day and placement to hold the variable lists for application, open day performance and placement tests.
We can then use the macros in the OLS regression, where application, open_day and placement supply the independent variables of interest and control supplies the control variables.
. local control gender country
. local application personality academic
. local open_day writing interview participation
. local placement math english
. reg gpa `application' `open_day' `placement' `control'
When a macro holds a long variable list, we can build it up in steps by referring to the macro itself in its new definition:
. local control gender country
. local control `control' high_school entrance_exam admission_track
. local control `control' major class
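To see the accumulated list, here is a runnable sketch of the same pattern. It assumes the auto dataset that ships with Stata in place of the student data, so the variable names differ from those above.

```stata
* Assumption: the built-in auto dataset stands in for the student data.
sysuse auto, clear
local control mpg weight
* Each redefinition expands the old contents and appends new names.
local control `control' length turn
local control `control' foreign
display "`control'"
regress price `control'
```

The display line shows all five variable names in the order they were added.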
Macros can hold parts of commands.
To hold a condition in a macro:
. local condition "if level == 1 & track != 2"
. local placement math english
. summarize `placement' `condition'
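The same idea can be tried on data that ships with Stata. In this sketch the auto dataset and the names outcomes and condition are assumptions made for illustration.

```stata
* Assumption: the built-in auto dataset stands in for the student data.
sysuse auto, clear
* The outer double quotes delimit the definition and are not stored in the macro.
local condition "if foreign == 1 & rep78 != 3"
local outcomes price mpg
summarize `outcomes' `condition'
* The same condition can be reused with any command that accepts an if qualifier.
count `condition'
```

Because the condition lives in one place, changing the sample for every command takes a single edit.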
To use macros to hold multiple options to make a graph:
. local option1 msymbol(o) mcolor(cranberry) clcolor(cranberry) connect(l)
. local option2 msymbol(th) mcolor(cranberry) clcolor(cranberry) connect(l)
. local option3 msymbol(o) mcolor(dknavy) clcolor(dknavy) connect(l)
. local option4 msymbol(th) mcolor(dknavy) clcolor(dknavy) connect(l)
. graph twoway (scatter length_video week [w = click_video] if group == 1, xlabel(0 (1) 10) `option1') ///
(scatter length_video week [w = click_video] if group == 2, xlabel(0 (1) 10) `option2') ///
(scatter length_text week [w = click_text] if group == 1, xlabel(0 (1) 10) `option3') ///
(scatter length_text week [w = click_text] if group == 2, xlabel(0 (1) 10) `option4'), ///
legend (order(1 "Average User: Video" 2 "Heavy User: Video" 3 "Average User: Text" 4 "Heavy User: Text"))
Macros can store values, especially constants, for use in algorithms,
. local a = 1.232425
. local b = 3.899878
. local c = 2.566556
. display (-`b'+sqrt((`b')^2 - 4*`a'*`c'))/(2*`a')
or creating new variables.
. local 8hrs = 1000*60*60*8
. gen double gmt = utc + `8hrs'
which is equivalent to
. gen double gmt = utc + msofhours(8)
in case the function msofhours() did not come to mind.
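Both uses of constants can be checked with a short sketch. It reuses the values of a, b and c from above; the macro names root and check are assumptions added for illustration.

```stata
local a = 1.232425
local b = 3.899878
local c = 2.566556
* Store the root itself in a macro so it can be reused.
local root = (-`b' + sqrt((`b')^2 - 4*`a'*`c'))/(2*`a')
display `root'
* Plugging the root back into a*x^2 + b*x + c should give a value very close to 0.
local check = `a'*(`root')^2 + `b'*(`root') + `c'
display `check'
* The hand-built constant matches the built-in function (displays 1 for true).
display 1000*60*60*8 == msofhours(8)
```

The parentheses around the expanded root matter because the root is negative and the macro is substituted as plain text.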
Using macros will give us a clear structure in model specifications.
When we need to add groups of variables in nested regression models:
. local control gender country
. local application personality academic
. local open_day writing interview participation
. local placement math english
. eststo clear
. eststo: reg gpa `control' `application'
. eststo: reg gpa `control' `application' `open_day'
. eststo: reg gpa `control' `application' `open_day' `placement'
. esttab, b(%9.1f) t(%9.1f) r2(%9.6f)
In different models:
. reg offer `rubric1' `rubric2' `rubric3' `control'
. logit offer `rubric1' `rubric2' `rubric3' `control'
We will see below how macros can be useful in loops and programs.
foreach loops over the items of a list, storing the current item in a local macro.
foreach loopname in list {
    ...
}
foreach loopname of varlist varlist {
    ...
}
foreach loopname of numlist numlist {
    ...
}
foreach loopname of local localmacro {
    ...
}
foreach loopname of global globalmacro {
    ...
}
For each item in the list (a variable, number, word, or element of a local or global macro), foreach executes the commands inside the braces, with the current item stored in the local macro loopname.
To refer to the current item, we must wrap loopname in the pair of quotes: `loopname'. Note the difference between the backtick (`) and the apostrophe (').
Each loop starts with an open curly brace that must stay on the same line as foreach. Nothing except a comment should follow the brace on that line.
The closing curly brace must appear on a line by itself to conclude the loop.
Between the braces are the commands. You may notice the indentation before the commands: it is not required for the commands to run, but it is a good programming habit that gives you and your readers a clear view of the code's logic.
We will see examples below.
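Before the worked examples, here is a minimal sketch of three of the list types in action. It assumes the auto dataset that ships with Stata, and the loop names v, k and word are arbitrary.

```stata
* Assumption: the built-in auto dataset is used so the sketch runs anywhere.
sysuse auto, clear
* of varlist: Stata checks that the names are variables and expands abbreviations.
foreach v of varlist price mpg {
    summarize `v'
}
* of numlist: 1/3 expands to 1 2 3, so the loop visits 1, 2, 3 and 10.
local total = 0
foreach k of numlist 1/3 10 {
    local total = `total' + `k'
}
display `total'
* in: the items are taken exactly as typed, with no checking.
foreach word in alpha beta {
    display "`word'"
}
```

The running total ends at 16, the sum of the four numbers in the number list.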
Quite often we need to perform t-tests on a group of variables with very similar commands. Instead of changing the variable names one by one, we could simply write a loop to do the work for us.
. local course calculus writing1 writing2
. foreach var in `course' {
.     display _newline "ttest `var', by(gender)"
.     ttest `var', by(gender)
. }
Alternatively, we can say:
. local course calculus writing1 writing2
. foreach var of local course {
.     display _newline "ttest `var', by(gender)"
.     ttest `var', by(gender)
. }
The difference is that the second form is faster and uses less memory, because Stata reads the list directly from the macro instead of expanding it into the command first. Besides, if you would like to change the contents of the local macro course, it also allows adding new elements to the end of the list inside the loop.
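A runnable variant of the t-test loop, which also picks up the saved results after each test, might look like the sketch below. It assumes the auto dataset, with foreign standing in for gender and price, mpg and weight standing in for the course grades.

```stata
* Assumption: the built-in auto dataset replaces the course data.
sysuse auto, clear
local outcomes price mpg weight
foreach var of local outcomes {
    * Run the test silently, then report only the saved results r(t) and r(p).
    quietly ttest `var', by(foreign)
    display "`var': t = " %6.2f r(t) ", p = " %6.4f r(p)
}
```

Reporting only the saved results keeps the output to one line per variable, which is easier to scan when the list is long.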
In some cases we have hundreds of files to append. A loop that automatically appends every file in the directory saves a lot of time otherwise spent appending one file to another by hand.
. local dtafiles: dir . files "*.dta"
. foreach file of local dtafiles {
.     preserve
.     use "`file'", clear
.     save temp, replace
.     restore
.     append using temp
. }
. rm temp.dta
. save filename, replace
In this example, local dtafiles: dir . files "*.dta" stores the names of all Stata data files in the current working directory in the local macro dtafiles.
dir ["]dir["] {files|dirs|other} ["]pattern["] is the macro extended function for file names. Type help extended_fcn to find out more.
erase ["]filename["] removes files stored on disk. rm ["]filename["] is a synonym provided for the convenience of Mac and Unix users.
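When the files need no cleaning first, the loop can append each one directly. The sketch below is one possible shortcut, not the method used above: it writes two copies of the auto dataset under the made-up names part_a.dta and part_b.dta, and it assumes no other files matching part_*.dta exist in the working directory.

```stata
* Assumption: the built-in auto dataset provides the example data.
sysuse auto, clear
* Write two small example files (part_a.dta and part_b.dta are made-up names).
save part_a, replace
save part_b, replace
clear
* Collect the matching file names, then append each one to the data in memory.
local parts : dir . files "part_*.dta"
foreach f of local parts {
    append using "`f'"
}
count
* Clean up the example files.
erase part_a.dta
erase part_b.dta
```

Quoting the file name inside the loop keeps the command valid even when a name contains spaces.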
forvalues loops over consecutive values.
forvalues loopname = range {
    ...
}
Suppose we want to perform a series of t-tests on a single variable for each of its levels:
. forvalues level = 1/3 {
.     display _newline "ttest calculus, level = `level'"
.     ttest calculus if cal_placement == `level', by(gender)
. }
For this specific example there is in fact a more efficient way to loop over each level, which is to use levelsof. We will discuss it below.
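The range (or, for foreach ... of numlist, the number list) can be written in several ways:
#1(#2)#3     from #1 to #3 in increments of #2; #2 can be negative
#1/#2        #1 through #2 in steps of 1
#1 #2 #3     #1, #2 and #3 (number lists only; not valid as a forvalues range)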
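The notations above can be tried out with a short sketch that needs no data. The macro name seq is an assumption used to collect the values visited.

```stata
* #1/#2: counts 1, 2, 3.
forvalues i = 1/3 {
    display `i'
}
* #1(#2)#3 with a negative increment: counts 10, 5, 0.
local seq
forvalues i = 10(-5)0 {
    local seq `seq' `i'
}
display "`seq'"
* An explicit list of numbers needs foreach ... of numlist.
foreach i of numlist 1 4 9 {
    display `i'
}
```

The collected sequence shows that the end point 0 is included because the increment lands on it exactly.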
Loops can be nested.
To see how nesting works, let's run another set of t-tests.
. local course calculus writing1 writing2
. levelsof HSrank, local(level)
. foreach x of local course {
.     foreach y of local level {
.         display _newline "ttest `x' if HSrank == `y', by(gender)"
.         ttest `x' if HSrank == `y', by(gender)
.     }
. }
Here we performed a t-test by gender for every course at each level of HSrank.
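The same nesting pattern can be run on data that ships with Stata. This sketch assumes the auto dataset, with foreign playing the role of HSrank and summarize replacing the t-test.

```stata
* Assumption: the built-in auto dataset replaces the course data.
sysuse auto, clear
local outcomes price mpg
* levelsof stores the distinct values of foreign (0 and 1) in the macro groups.
levelsof foreign, local(groups)
foreach y of local outcomes {
    foreach g of local groups {
        display _newline "summarize `y' if foreign == `g'"
        summarize `y' if foreign == `g'
    }
}
```

The inner loop runs in full for every item of the outer loop, so two outcomes and two groups give four summaries.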
Let’s take a look at another example using nested loops to clean each file before appending, expanding on the example we have seen above.
. local csvfiles: dir . files "tracking.log-*.csv"
. foreach file of local csvfiles {
.     preserve
.     import delimited "`file'", clear
.     foreach id in "a" "b" "c" "d" "e" "f" "g" {
.         drop if username == "`id'"
.     }
.     drop event* context* page session host
.     save temp.dta, replace
.     restore
.     append using temp
. }
. rm temp.dta
. save test, replace
while repeats the commands for as long as the specified condition is true.
while exp {
    ...
}
Here is a simple example of how we may use while.
. local i = 1
. while `i' <= 3 {
.     display _newline "`i'"
.     sum gpa if placement == `i'
.     local i = `i' + 1
. }
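The counter logic can be checked without any data. In this sketch the macro total is an assumption added to show what the loop accumulates.

```stata
local i = 1
local total = 0
while `i' <= 3 {
    * Add the current counter value, then advance the counter.
    local total = `total' + `i'
    * Without this increment the condition would stay true and the loop would never end.
    local i = `i' + 1
}
display `total'
display `i'
```

The loop stops when the counter reaches 4, at which point the total is 6.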
We can further include if/else statements inside the loop.
while exp {
    if exp {
        ...
    }
    else {
        ...
    }
    ...
}
To expand on the last example:
. local i = 1
. while `i' <= 5 {
.     if `i' <= 3 {
.         sum writing if interview == `i'
.     }
.     else {
.         sum application if interview == `i'
.     }
.     display _newline "`i'"
.     local i = `i' + 1
. }
Note the difference between the branching if in if `i'<=3{} and the conditional if in if interview == `i'.
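The two kinds of if can be seen side by side in a short sketch. It assumes the auto dataset that ships with Stata and an illustrative macro k.

```stata
* Assumption: the built-in auto dataset is used so the sketch runs anywhere.
sysuse auto, clear
local k = 3
* Branching if: evaluated once, it decides whether the block runs at all.
if `k' <= 3 {
    * Conditional if: evaluated for every observation, it restricts the command to matching rows.
    count if rep78 == `k'
}
```

The branching if compares a single value held in a macro, while the conditional if compares a variable row by row.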
We can include if/else statements within if/else statements.
while exp {
    if exp {
        ...
    }
    else {
        if exp {
            ...
        }
        else {
            ...
        }
        ...
    }
    ...
}
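A concrete instance of this nested structure, runnable without data, might look like the sketch below. The labels low, middle and high and the macro labels are assumptions made for illustration.

```stata
local i = 1
local labels
while `i' <= 5 {
    if `i' <= 2 {
        local label low
    }
    else {
        * The inner if/else is reached only when the outer condition is false.
        if `i' <= 4 {
            local label middle
        }
        else {
            local label high
        }
    }
    display "`i': `label'"
    local labels `labels' `label'
    local i = `i' + 1
}
display "`labels'"
```

The values 1 and 2 are labelled low, 3 and 4 middle, and 5 high.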