A dummy variable records whether a condition is true (coded 1) or false (coded 0). Dummy variables are also called indicator variables.
For instance, foreign in Stata's auto dataset is a dummy variable: 1 if the car is foreign made and 0 if domestic made.
. sysuse auto
. codebook foreign
-------------------------------------------------------------------------------
foreign                                                                Car type
-------------------------------------------------------------------------------
                  type:  numeric (byte)
                 label:  origin

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/74

            tabulation:  Freq.   Numeric  Label
                            52         0  Domestic
                            22         1  Foreign
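In Stata, a dummy variable can serve as a condition by itself: put the variable name after if without spelling out a comparison. Stata treats any nonzero value as true, so if foreign means if foreign != 0 and if ~foreign means if foreign == 0. For a 0/1 variable with no missing values, such as foreign, these are the same as if foreign == 1 and if foreign == 0. Note that missing values are nonzero and therefore count as true.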
. list make if foreign
. list make if ~ foreign
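As a quick check of this shorthand, the sketch below counts observations both ways. It assumes only the auto dataset shipped with Stata. The last two commands illustrate why the shorthand is best reserved for 0/1 variables: rep78 is never 0 and has missing values, so every observation passes if rep78.

```stata
sysuse auto, clear

* The shorthand and the explicit comparison select the same cars
count if foreign
count if foreign == 1

count if ~foreign
count if foreign == 0

* Caution: missing values are nonzero, so they count as true
count if rep78
count if !missing(rep78)
```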
One way to create a dummy variable is to use generate with a logical expression and an if qualifier.
. sysuse auto
. gen rep78In = rep78 >= 5 if !missing(rep78)
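The result is 1 where the condition is true (repair record greater than or equal to 5) and 0 where it is false. The if !missing(rep78) qualifier leaves the new variable missing where rep78 is missing; without it, those observations would be coded 1, because Stata treats missing values as larger than any number.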
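To see what the if !missing(rep78) qualifier changes, the following sketch builds the indicator with and without it and tabulates both against rep78. The name rep78Raw is a hypothetical one introduced here for the unguarded version.

```stata
sysuse auto, clear

* Guarded version from the text: missing rep78 stays missing
gen byte rep78In = rep78 >= 5 if !missing(rep78)

* Unguarded version (hypothetical name): missing rep78 is coded 1,
* because Stata treats missing as larger than any number
gen byte rep78Raw = rep78 >= 5

* Cross-tabulate against the source variable, showing missing values
tabulate rep78 rep78In, missing
tabulate rep78 rep78Raw, missing
```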
In hierarchical data, generate and egen can be combined with the by prefix to create indicator variables at different levels of the hierarchy.
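. sysuse citytemp, clear
. by region (division), sort: gen heat_Ind1 = heatdd > 8000
flags each observation whose heating degree days (heatdd) exceed 8000. The data are sorted by region and, within region, by division (a subcategory of region), but generate evaluates the condition observation by observation, so the flag itself is the same as it would be without the by prefix.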
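. sysuse citytemp, clear
. by region (division), sort: egen heat_Ind2 = max(heatdd > 8000)
flags every observation in a region that contains at least one observation with heating degree days larger than 8000. Here the by prefix matters: egen takes the maximum of the 0/1 condition within each region.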
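The difference between the two levels can be sketched as follows. The variable names heat_obs, heat_reg and one_per_region are hypothetical, and the !missing(heatdd) guard is an added assumption: it keeps any missing heatdd values from being counted as larger than 8000.

```stata
sysuse citytemp, clear

* Observation level: 1 if this observation's heatdd exceeds 8000
* (the extra condition keeps missing heatdd from being coded 1)
gen byte heat_obs = heatdd > 8000 & !missing(heatdd)

* Region level: 1 for every observation in a region where at least
* one observation exceeds 8000
bysort region: egen byte heat_reg = max(heatdd > 8000 & !missing(heatdd))

* One row per region to inspect the region-level flag
egen byte one_per_region = tag(region)
list region heat_reg if one_per_region

* Share of flagged observations within each region
tabulate region heat_obs, row
```

The region-level flag can never be smaller than the observation-level flag, which gives a simple consistency check: assert heat_reg >= heat_obs.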
We can also use tabulate varname, generate(stubname) to create a series of indicator variables, one for each category of varname.
. sysuse auto, clear
. tab foreign, gen(import)
generates two new variables: import1, indicating whether the car is domestic, and import2, indicating whether the car is foreign made.
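A short sketch to confirm what the generated variables contain, using only the auto dataset and the import stub from the text:

```stata
sysuse auto, clear

* One indicator per category of foreign, named import1 and import2
tabulate foreign, generate(import)

* Each indicator matches the corresponding comparison on foreign
assert import1 == (foreign == 0)
assert import2 == (foreign == 1)

* Every car falls in exactly one category
assert import1 + import2 == 1
```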
More on creating indicator variables:
William Gould, StataCorp, How do I create dummy variables?
Factor variables create indicator variables from categorical variables.
The example below contains several factor variables:
. sysuse auto, clear
. reg price mpg c.weight##c.weight ib3.rep78 i.foreign
c.weight##c.weight enters weight and its square: the main effect of weight plus the interaction of weight with itself.
ib3.rep78 sets the base level at rep78=3 and creates an indicator for each of the other values of rep78.
i.foreign creates an indicator for each value of foreign other than the base level (0, Domestic).
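To make the factor-variable notation concrete, the sketch below fits the same model twice: once with factor variables and once with hand-built regressors. The names weight_sq and rep_1 through rep_5 are hypothetical; the two regressions should report the same coefficients, with rep_4 in the second corresponding to 4.rep78 in the first.

```stata
sysuse auto, clear

* Factor-variable version from the text
regress price mpg c.weight##c.weight ib3.rep78 i.foreign

* Hand-built equivalent (weight_sq and rep_1 to rep_5 are hypothetical names)
gen double weight_sq = weight^2
tabulate rep78, generate(rep_)

* Leaving out rep_3 makes rep78 == 3 the base level, as ib3. does
regress price mpg weight weight_sq rep_1 rep_2 rep_4 rep_5 foreign
```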
i. creates indicators for each level (unique value) of a categorical variable
c. marks a variable as continuous
o. omits a variable or indicator
# specifies an interaction
## specifies an interaction together with the main effects
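One way to see what these operators produce, without fitting a model, is fvexpand, which expands a factor-variable list and returns the result in r(varlist). This is a minimal sketch meant to be run from a do-file so that the local macro references resolve.

```stata
sysuse auto, clear

* # alone: interaction terms only
fvexpand i.foreign#i.rep78
display "`r(varlist)'"

* ##: main effects plus the interaction terms
fvexpand i.foreign##i.rep78
display "`r(varlist)'"

* c. with ##: a continuous variable and its square
fvexpand c.weight##c.weight
display "`r(varlist)'"
```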
i.foreign
indicator variables for each level of foreign
i.foreign#i.rep78
indicator variables for all combinations of each value of foreign and rep78
i.foreign##i.rep78
the same as i.foreign i.rep78 i.foreign#i.rep78
i.foreign#i.rep78#i.make
indicator variables for all combinations of each value of foreign, rep78 and make (for illustration only: make has 74 unique values and is a string variable, which Stata does not accept as a factor variable)
i.foreign##i.rep78##i.make
the same as i.foreign i.rep78 i.make i.foreign#i.rep78 i.rep78#i.make i.foreign#i.make i.foreign#i.rep78#i.make
i.rep78#c.mpg
one variable for each level of rep78. Each variable equals mpg for observations at that level of rep78 and 0 otherwise.
For instance, if the first observation has rep78=3 and mpg=22, then 3.rep78#c.mpg will be 22 and it will be 0 for 1b.rep78#c.mpg, 2.rep78#c.mpg, 4.rep78#c.mpg and 5.rep78#c.mpg.
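The variables implied by i.rep78#c.mpg can be built by hand to check this description. The sketch below is an illustration only; the names mpg_rep1 through mpg_rep5 are hypothetical, and observations with missing rep78 are left missing.

```stata
sysuse auto, clear

* One variable per level of rep78: mpg at that level, 0 at other levels
* (mpg_rep1 to mpg_rep5 are hypothetical names; missing rep78 stays missing)
forvalues k = 1/5 {
    gen mpg_rep`k' = mpg * (rep78 == `k') if !missing(rep78)
}

* First observation: rep78 is 3 and mpg is 22, so only mpg_rep3 is nonzero
list make rep78 mpg mpg_rep1-mpg_rep5 in 1
```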
3.rep78
the indicator for the level rep78=3
i3.rep78
the same as 3.rep78
i(2/4).rep78
selects the levels from rep78=2 through rep78=4
i(1 5).rep78
selects the levels rep78=1 and rep78=5
o(1 5).rep78
omits the levels rep78=1 and rep78=5
ib#.var
changes the base level of var, where # is the value to use as the base and b marks it as the base level
For instance, ib3.rep78 sets the base value at rep78=3.
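A brief sketch of level selection and base levels in use. It relies on fvexpand to display the expanded terms (run from a do-file so the macro references resolve) and fits the same one-variable model with two different base levels. The coefficients differ because each is measured relative to the chosen base, while the fitted values are the same.

```stata
sysuse auto, clear

* Select a range of levels
fvexpand i(2/4).rep78
display "`r(varlist)'"

* Mark rep78 == 3 as the base level
fvexpand ib3.rep78
display "`r(varlist)'"

* Same model, two base levels: coefficients are differences from the base
regress price ib3.rep78
predict double fit_b3

regress price ib5.rep78
predict double fit_b5

* The choice of base does not change the fitted values
assert reldif(fit_b3, fit_b5) < 1e-8 if e(sample)
```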