We use describe, codebook, list, and count to inspect variables and get a sense of our data.
describe produces a summary of the dataset, including each variable's storage type, display format, value label, and variable label.
If no variable is specified, describe will describe every variable in the dataset.
. sysuse auto
. describe m*
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------
make            str18   %-18s                 Make and Model
mpg             int     %8.0g                 Mileage (mpg)
The pattern m* matches every variable whose name starts with m, which here are make and mpg.
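As a minimal sketch of how describe can be used beyond a wildcard, the lines below assume the auto dataset shipped with Stata. The short option prints only the dataset-level header, and describe leaves the number of observations and variables in r(N) and r(k).

```stata
* Load the example data, discarding whatever is in memory
sysuse auto, clear

* Describe every variable (no varlist given)
describe

* Dataset-level summary only: observations, variables, sort order
describe, short

* describe stores its results in r(); show the counts it found
display "observations: " r(N)
display "variables:    " r(k)
```

The stored results are convenient in do-files, where a later step may need the dataset's size without hard-coding it.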
codebook reports more than describe does. In addition to each variable's name, label, and data type, it gives descriptive statistics for numeric variables and example values for string variables.
. sysuse auto
. codebook m*
---------------------------------------------------------------------------------------------------
make                                                                                 Make and Model
---------------------------------------------------------------------------------------------------

                  type:  string (str18), but longest is str17

         unique values:  74                       missing "":  0/74

              examples:  "Cad. Deville"
                         "Dodge Magnum"
                         "Merc. XR-7"
                         "Pont. Catalina"

               warning:  variable has embedded blanks

---------------------------------------------------------------------------------------------------
mpg                                                                                   Mileage (mpg)
---------------------------------------------------------------------------------------------------

                  type:  numeric (int)

                 range:  [12,41]                      units:  1
         unique values:  21                       missing .:  0/74

                  mean:   21.2973
              std. dev:    5.7855

           percentiles:        10%       25%       50%       75%       90%
                                14        18        20        25        29
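The following sketch, again assuming the shipped auto dataset, shows two other ways codebook is often used: the compact option for a one-line-per-variable overview, and a single-variable call to check for missing values. The choice of price, mpg, and rep78 is only illustrative.

```stata
* Load the example data
sysuse auto, clear

* One line per variable: observations, unique values, mean, min, max, label
codebook price mpg rep78, compact

* Full report for one variable; the "missing .:" entry shows how many
* observations have no repair record
codebook rep78

* Cross-check the missing count with count
count if missing(rep78)
```

The compact form is a quick first pass over many variables, while the full form is better for examining one variable in detail.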
list displays the values of variables, optionally restricted to a range of observations (in) and to a condition (if).
. sysuse auto
. list make-rep78 in 10/20 if price > 6000, table
     +--------------------------------------+
     | make             price   mpg   rep78 |
     |--------------------------------------|
 11. | Cad. Deville    11,385    14       3 |
 12. | Cad. Eldorado   14,500    14       2 |
 13. | Cad. Seville    15,906    21       3 |
     +--------------------------------------+
This lists the variables from make through rep78, in dataset order, for the observations between the 10th and the 20th whose price is greater than 6000. The table option displays the result as a table.
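A few more list variations are sketched below under the same assumption that the shipped auto dataset is in use. The options noobs and clean and the negative observation range are standard list syntax; the particular variables and cutoffs are only illustrative.

```stata
* Load the example data
sysuse auto, clear

* First five observations, without the observation-number column
list make price mpg in 1/5, noobs

* Last five observations: negative numbers count from the end, l means last
list make price in -5/l

* Only cars priced above 10000, printed without borders or separators
list make price if price > 10000, clean
```

When in and if are combined, as in the example above, an observation must satisfy both to be listed.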
count counts the number of observations that satisfy the specified conditions.
. sysuse auto
. count in 10/20 if price > 6000
This counts the observations between the 10th and the 20th whose price is greater than 6000. There are 3 such observations in the sample.
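The sketch below, which assumes the shipped auto dataset, shows how the result of count can be reused and illustrates a common pitfall: Stata treats numeric missing values as larger than any number, so a condition such as rep78 > 3 also matches observations where rep78 is missing.

```stata
* Load the example data
sysuse auto, clear

* count stores its result in r(N), which later commands can use
count if foreign == 1
display "foreign cars: " r(N)

* This count includes observations with missing rep78,
* because missing compares as greater than 3
count if rep78 > 3

* Exclude missing values explicitly to count only records of 4 or 5
count if rep78 > 3 & !missing(rep78)
```

The two rep78 counts differ by the number of observations with a missing repair record, so adding !missing() is a sensible habit whenever a greater-than condition is applied to a variable that may contain missing values.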