Syntax is the grammar of Stata commands.
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
Except for the command itself, everything in brackets is optional.
The by varlist: prefix repeats the command on each subset of the data defined by the values of varlist (variable list). The data must be sorted by varlist before we use by. Alternatively, use bysort to sort and apply by in one step.
[varlist] is the list of variables the command acts on. If no variable list is specified, most commands assume all variables.
. sysuse auto
. sort foreign
. by foreign: tab rep78
or
. bysort foreign: tab rep78
-----------------------------------------------------------
-> foreign = Domestic

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        4.17        4.17
          2 |          8       16.67       20.83
          3 |         27       56.25       77.08
          4 |          9       18.75       95.83
          5 |          2        4.17      100.00
------------+-----------------------------------
      Total |         48      100.00

-----------------------------------------------------------
-> foreign = Foreign

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          3       14.29       14.29
          4 |          9       42.86       57.14
          5 |          9       42.86      100.00
------------+-----------------------------------
      Total |         21      100.00
The by prefix is not the same as the by() option that some commands accept, which tells a single command how to group observations (for example, in hierarchical data).
[= exp] is an algebraic expression specifying the value to be assigned to a variable.
[if exp] restricts the command to the observations that satisfy the condition in exp.
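As a rough sketch of how [= exp] and [if exp] combine, the commands below create a new variable from an expression for a subset of observations only. The variable name price_k is made up for this illustration.
. sysuse auto, clear
. * price_k is a hypothetical name: price in thousands of dollars, foreign cars only
. generate price_k = price/1000 if foreign == 1
. * observations that fail the if condition are left missing
. count if missing(price_k)
Here count should report 52, the number of domestic cars, since generate fills in price_k only where foreign == 1.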
[in range] specifies an observation range. Negative integers mean “from the end of the data” (e.g. -1 refers to the last observation). f refers to the first observation, and l refers to the last observation.
. sort price
. list make-rep78 in f/10
lists the make, price, mileage, and repair record of the ten cars with the lowest prices.
     +--------------------------------------+
     | make             price   mpg   rep78 |
     |--------------------------------------|
  1. | Merc. Zephyr     3,291    20       3 |
  2. | Chev. Chevette   3,299    29       3 |
  3. | Chev. Monza      3,667    24       2 |
  4. | Toyota Corolla   3,748    31       5 |
  5. | Subaru           3,798    35       5 |
     |--------------------------------------|
  6. | AMC Spirit       3,799    22       . |
  7. | Merc. Bobcat     3,829    22       4 |
  8. | Renault Le Car   3,895    26       3 |
  9. | Chev. Nova       3,955    19       3 |
 10. | Dodge Colt       3,984    30       5 |
     +--------------------------------------+
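Negative range values can be sketched on the same data. Assuming the auto data are loaded and sorted by price as above, this lists the five most expensive cars.
. sysuse auto, clear
. sort price
. * -5/l runs from the fifth-from-last observation through the last one
. list make price in -5/l
Writing the range as -5/-1 would be equivalent, since -1 and l both refer to the last observation.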
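[weight = exp] attaches a weight to each observation; exp gives the value of the weight.
. sysuse census, clear
. sum medage [weight = pop]
summarizes the median age across the 50 U.S. states, weighting each state by its population. Because no weight type is specified, Stata assumes analytic weights, as the note at the top of the output says.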
(analytic weights assumed)
    Variable |     Obs      Weight        Mean  Std. Dev.        Min        Max
-------------+-----------------------------------------------------------------
      medage |      50   225907472    30.11047    1.66933       24.2       34.7
[using filename] names a file for the command to read from or write to. How Stata reads and writes files is introduced elsewhere in this tutorial.
[, options] is a list of options following the comma. The available options differ by command; see each command's help file for the ones it accepts.
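A brief sketch of [using filename] and options together, using the auto data. The file name auto_prices.csv is hypothetical, and the file is written to the current working directory.
. sysuse auto, clear
. * detail is an option of summarize that adds percentiles, skewness, and kurtosis
. summarize price, detail
. * using names the file to write; the replace option overwrites an existing file
. export delimited make price using "auto_prices.csv", replace
Both commands follow the general syntax: the option list comes last, after a comma, and using appears before it.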
Stata commands can be abbreviated. For instance, instead of typing summarize, we can type sum.
Type help command to find out how each command can be abbreviated: the help file underlines the shortest legitimate form (e.g. the su in summarize is underlined, so the shortest legitimate form of summarize is su).
Stata commands are case-sensitive and must all be in lowercase.
Arithmetic operators apply to numbers.
+ addition
- subtraction or negation
* multiplication
/ division
^ exponentiation
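To try the arithmetic operators interactively, display evaluates an expression and prints the result. A small sketch:
. * exponentiation is evaluated first, then division, then addition
. display 2^3 + 4/2
. * parentheses override the default order of evaluation
. display (2 + 3) * 4
These print 10 and 20, respectively.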
The following operators apply to strings.
+ concatenation
e.g. "nyu" + "sh" produces "nyush"
* duplication
e.g. "nyu"*2 produces "nyunyu"
. clear
. input str20 A str20 B
"nyu" "sh"
"nyu" "ny"
end
. gen C = A + B
. gen D = A*2
. list
     +---------------------------+
     |   A    B       C        D |
     |---------------------------|
  1. | nyu   sh   nyush   nyunyu |
  2. | nyu   ny   nyuny   nyunyu |
     +---------------------------+
Relational expressions return results that are either true or false.
== equal
< less than
> greater than
>= greater than or equal to
<= less than or equal to
!= (or ~=) not equal to
Relational operators can be used on both numbers and strings. For instance, both 2 > 1 and "newyork" > "abudhabi" are true.
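A quick way to check such comparisons is display, which prints 1 for true and 0 for false. The sketch below also shows two points worth remembering: string comparisons are case-sensitive, and a numeric missing value (.) counts as larger than any number.
. display 2 > 1
. display "newyork" > "abudhabi"
. * uppercase and lowercase letters are different characters
. display "nyu" == "NYU"
. * numeric missing is larger than any nonmissing number
. display . > 1000
The four commands print 1, 1, 0, and 1.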
Note that a single = is for assignment and == is for comparison.
. sysuse auto, clear
. replace rep78 = 99 if rep78 == .
first checks whether rep78 is missing in each observation and then assigns 99 to those observations.
The logical operators return 1 for true and 0 for false.
& and
| or (called the “pipe”)
! (or ~) not
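. list make price mpg rep78 if price > 10000 & mpg > 30 | rep78 <= 2
lists the cars that either cost more than 10,000 and get more than 30 mpg, or have a repair record of 2 or below. Because & is evaluated before |, the condition is read as (price > 10000 & mpg > 30) | rep78 <= 2.
It is the same as
. list make price mpg rep78 if (price > 10000 | rep78 <= 2) & (mpg > 30 | rep78 <= 2)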
     +------------------------------------------+
     | make                 price   mpg   rep78 |
     |------------------------------------------|
 12. | Cad. Eldorado       14,500    14       2 |
 17. | Chev. Monte Carlo    5,104    22       2 |
 18. | Chev. Monza          3,667    24       2 |
 21. | Dodge Diplomat       4,010    18       2 |
 22. | Dodge Magnum         5,886    16       2 |
     |------------------------------------------|
 23. | Dodge St. Regis      6,342    17       2 |
 40. | Olds Starfire        4,195    24       1 |
 46. | Plym. Volare         4,060    18       2 |
 48. | Pont. Firebird       4,934    18       1 |
 52. | Pont. Sunbird        4,172    24       2 |
     +------------------------------------------+
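Note how it differs from what we would get if we group the conditions differently, for example by running
. list make price mpg rep78 if price > 10000 & (mpg > 30 | rep78 <= 2)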
     +--------------------------------------+
     | make             price   mpg   rep78 |
     |--------------------------------------|
 12. | Cad. Eldorado   14,500    14       2 |
     +--------------------------------------+
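The difference between the two listings comes from the order in which the logical operators are evaluated: & is applied before |, unless parentheses say otherwise. A minimal sketch with constants:
. * read as 1 | (0 & 0), which is true
. display 1 | 0 & 0
. * parentheses force | to be evaluated first, giving 1 & 0, which is false
. display (1 | 0) & 0
The first command prints 1 and the second prints 0.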