Syntax, or How to Speak Stata

Syntax

Syntax is the grammar of Stata commands.

[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]

Except for the command itself, everything in brackets is optional.

[by varlist:] repeats the command on each subset of the data defined by the values of varlist (variable list). The data must be sorted by varlist before we use by; alternatively, bysort sorts the data and applies by in one step. (The [varlist] after the command is a separate element: if it is omitted, most commands assume all variables will be used.)
. sysuse auto
. sort foreign
. by foreign: tab rep78
or
. bysort foreign: tab rep78


-----------------------------------------------------------
-> foreign = Domestic
      
     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        4.17        4.17
          2 |          8       16.67       20.83
          3 |         27       56.25       77.08
          4 |          9       18.75       95.83
          5 |          2        4.17      100.00
------------+-----------------------------------
      Total |         48      100.00
      
-------------------------------------------------------------
-> foreign = Foreign
      
     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          3       14.29       14.29
          4 |          9       42.86       57.14
          5 |          9       42.86      100.00
------------+-----------------------------------
      Total |         21      100.00
      
      		

The by prefix is not the same as the by() option, which some commands (for example, tabstat and graph) accept to group their own results by the values of a variable.

[= exp] is an algebraic expression specifying the value to be assigned to a variable.
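
As a minimal sketch of = exp, the commands below create a variable and then overwrite it. The variable name price_k is made up for this illustration.
. sysuse auto, clear
. * create a new variable holding the price in thousands of dollars
. generate price_k = price / 1000
. * overwrite it with the same value rounded to one decimal place
. replace price_k = round(price_k, 0.1)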

[if exp] restricts the command to the observations that satisfy the condition.
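
A short illustration of if exp on the auto data, offered as a sketch rather than a recipe. In this dataset foreign is coded 0 for Domestic and 1 for Foreign.
. sysuse auto, clear
. * summarize price for foreign cars only
. summarize price if foreign == 1
. * list only the cars that get more than 35 miles per gallon
. list make mpg if mpg > 35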

[in range] specifies an observation range. Negative integers mean “from the end of the data” (e.g. -1 refers to the last observation). f refers to the first observation, and l refers to the last observation.
. sort price
. list make-rep78 in f/10
lists the make and model, price, mileage and repair record of the ten cars with the lowest prices.


     +--------------------------------------+
     | make             price   mpg   rep78 |
     |--------------------------------------|
  1. | Merc. Zephyr     3,291    20       3 |
  2. | Chev. Chevette   3,299    29       3 |
  3. | Chev. Monza      3,667    24       2 |
  4. | Toyota Corolla   3,748    31       5 |
  5. | Subaru           3,798    35       5 |
     |--------------------------------------|
  6. | AMC Spirit       3,799    22       . |
  7. | Merc. Bobcat     3,829    22       4 |
  8. | Renault Le Car   3,895    26       3 |
  9. | Chev. Nova       3,955    19       3 |
 10. | Dodge Colt       3,984    30       5 |
     +--------------------------------------+
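
Negative numbers and l work the same way from the other end of the data. As a sketch, assuming the data are still sorted by price, the following lists the five most expensive cars:
. sysuse auto, clear
. sort price
. * -5/l means "from the fifth-to-last observation through the last one"
. list make price in -5/l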
      		

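[weight] is written as [weighttype = exp] and attaches a weight to each observation. Unlike the other brackets in the syntax diagram, these square brackets are typed as part of the command. If we write the generic weight instead of a specific type such as aweight or fweight, Stata uses the command's default weight type.
. sysuse census, clear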
. sum medage [weight = pop]
summarizes the median age across the 50 U.S. states, weighting each state by its population.


(analytic weights assumed)
      
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
      medage |      50   225907472    30.11047    1.66933       24.2       34.7
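
The note "(analytic weights assumed)" shows which weight type Stata picked. A minimal sketch of naming the type explicitly, which should give the same table without the note:
. sysuse census, clear
. * aweight requests analytic weights explicitly
. summarize medage [aweight = pop]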
      
      	

[using filename] names the file that the command reads from or writes to. How Stata reads and writes files is covered elsewhere in this tutorial.
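
As a hedged example of using, the commands below write the auto data to a CSV file and read it back. The file name auto_copy.csv is hypothetical and the file is created in the current working directory.
. sysuse auto, clear
. * write the data in memory to a comma-separated file
. export delimited using "auto_copy.csv", replace
. * read the file back, discarding the data currently in memory
. import delimited using "auto_copy.csv", clear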

[, options] gives a list of options after the comma. Each command has its own set of options; type help followed by the command name to see which ones it allows.
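
Two common options, shown as a small sketch on the auto data:
. sysuse auto, clear
. * detail adds percentiles, skewness and kurtosis to the summary
. summarize price, detail
. * missing makes tabulate count missing values as their own category
. tabulate rep78, missing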

Abbreviation

Stata commands can be abbreviated. For instance, instead of using summarize, we can use sum.

Type help followed by a command name to find out how that command can be abbreviated: the underlined part of the name is the shortest legitimate form (e.g. in help summarize the first two letters are underlined, so the shortest legitimate form of summarize is su).
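
For illustration, the three commands below should produce identical output, since each is a legitimate abbreviation of summarize:
. sysuse auto, clear
. summarize price
. sum price
. su price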

Case-sensitivity

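Stata is case-sensitive. Built-in commands are written in lowercase, so typing Summarize instead of summarize produces an error. Variable names are case-sensitive as well: price and Price are different variables.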
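
A small sketch of what happens when the case is wrong. It uses capture so that the error does not stop execution; the return code displayed at the end should be nonzero.
. sysuse auto, clear
. * the lowercase command works
. summarize price
. * the same command with a capital S is not recognized
. capture Summarize price
. * _rc holds the return code of the captured command
. display _rc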

Operators

Arithmetic

+ addition
- subtraction or negation
* multiplication
/ division
^ exponentiation
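
The display command can be used as a calculator to try these operators. In the sketch below, exponentiation is evaluated before multiplication, and multiplication before addition, so the three lines should display 50, 20 and 3.5.
. display 2 + 3 * 4^2
. display (2 + 3) * 4
. display 7 / 2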

String

+ concatenation
e.g. "nyu"+"sh" produces "nyush"

* duplication
e.g. "nyu"*2 produces "nyunyu"

. clear
. input str20 A str20 B
. "nyu" "sh"
. "nyu" "ny"
. end
. gen C = A + B
. gen D = A*2
. list


     +---------------------------+
     |   A    B       C        D |
     |---------------------------|
  1. | nyu   sh   nyush   nyunyu |
  2. | nyu   ny   nyuny   nyunyu |
     +---------------------------+
      
      		

Relational

Relational expressions return results that are either true or false.

== equal
< less than
> greater than
>= greater than or equal to
<= less than or equal to
!= (or ~=) not equal to

Relational operators can be used on both numbers and strings. Strings are compared character by character according to their sort order. For instance, both 2>1 and "newyork">"abudhabi" are true.
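
These comparisons can be checked with display, which shows 1 for true and 0 for false. In this sketch the first two lines should display 1 and the third 0.
. display 2 > 1
. display "newyork" > "abudhabi"
. display 2 == 3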

Note that = exp is for assignment and == is for comparison.
. sysuse auto, clear
. replace rep78 = 99 if rep78 == .
first checks with == whether rep78 is missing and then uses = to assign 99 to the observations where it is.
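
One related point worth a sketch: Stata treats a numeric missing value as larger than any number, so a condition such as rep78 > 4 also picks up observations where rep78 is missing. The commands below reload the original data and compare the two counts. Going by the tabulations earlier in this section, 11 cars have a repair record of 5 and 5 of the 74 cars have none, so the counts should be 16 and 11.
. sysuse auto, clear
. * counts cars with rep78 equal to 5 and cars with rep78 missing
. count if rep78 > 4
. * missing() excludes the missing values explicitly
. count if rep78 > 4 & !missing(rep78)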

Logical

The logical operators return 1 for true and 0 for false.

& and
| or (called the “pipe”)
! (or ~) not
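
A quick check of the three operators with display. In this sketch the lines should display 0, 1 and 0, in that order.
. display (2 > 1) & (3 > 4)
. display (2 > 1) | (3 > 4)
. display !(2 > 1)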

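. list make price mpg rep78 if price > 10000 & mpg > 30 | rep78 <= 2
Because & is evaluated before |, this lists cars that either cost more than 10,000 and get more than 30 miles per gallon, or have a repair record of 2 or lower.

It is the same as
. list make price mpg rep78 if (price > 10000 & mpg > 30) | rep78 <= 2
or, written out in full,
. list make price mpg rep78 if (price > 10000 | rep78 <= 2) & (mpg > 30 | rep78 <= 2)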


    +------------------------------------------+
    | make                 price   mpg   rep78 |
    |------------------------------------------|
12. | Cad. Eldorado       14,500    14       2 |
17. | Chev. Monte Carlo    5,104    22       2 |
18. | Chev. Monza          3,667    24       2 |
21. | Dodge Diplomat       4,010    18       2 |
22. | Dodge Magnum         5,886    16       2 |
    |------------------------------------------|
23. | Dodge St. Regis      6,342    17       2 |
40. | Olds Starfire        4,195    24       1 |
46. | Plym. Volare         4,060    18       2 |
48. | Pont. Firebird       4,934    18       1 |
52. | Pont. Sunbird        4,172    24       2 |
    +------------------------------------------+
      	
Note how this differs from what we get when the parentheses force | to be evaluated first:
. list make price mpg rep78 if price > 10000 & (mpg > 30 | rep78 <= 2)

    +--------------------------------------+
    | make             price   mpg   rep78 |
    |--------------------------------------|
12. | Cad. Eldorado   14,500    14       2 |
    +--------------------------------------+
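
To compare how many observations each condition selects without listing them, count can be used with the same if expressions. This is only a sketch on the original auto data; going by the two listings above, the first two counts should be 10 and the third should be 1.
. sysuse auto, clear
. * no parentheses: & is evaluated before |
. count if price > 10000 & mpg > 30 | rep78 <= 2
. * the same condition with the default grouping made explicit
. count if (price > 10000 & mpg > 30) | rep78 <= 2
. * parentheses force | to be evaluated first
. count if price > 10000 & (mpg > 30 | rep78 <= 2)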
      

Author: Yun Dai, 2018