Describing Your Data

The commands describe, codebook, list, and count give a first look at the variables and observations in a dataset.

describe

describe produces a summary of the dataset, listing each variable's storage type, display format, value label, and variable label.

If no variable is specified, describe will describe every variable in the dataset.
. sysuse auto
. describe m*


              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------
make            str18   %-18s                 Make and Model
mpg             int     %8.0g                 Mileage (mpg)
      		
This gives us all variables whose names start with m, which are make and mpg.
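
When only an overview is needed, describe also accepts options. The following is a minimal sketch on the same auto dataset; the choice of options is illustrative rather than part of the original example. The short option prints only the dataset-level header, and the simple option prints only the variable names.

```stata
sysuse auto, clear

* Dataset-level summary only: number of observations and variables
describe, short

* describe leaves these counts in r(), so they can be reused
display r(N)
display r(k)

* Variable names only, in compact columns
describe, simple
```

The stored results r(N) and r(k) are handy in do-files that need to check the size of the dataset before continuing.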

codebook

codebook reports more detail than describe. In addition to each variable's name, label, and storage type, it gives summary statistics (range, mean, standard deviation, percentiles) for numeric variables and example values for string variables, along with counts of unique and missing values.
. sysuse auto
. codebook m*


---------------------------------------------------------------------------------------------------
make                                                                                 Make and Model
---------------------------------------------------------------------------------------------------
      
                  type:  string (str18), but longest is str17
      
         unique values:  74                       missing "":  0/74
      
              examples:  "Cad. Deville"
                         "Dodge Magnum"
                         "Merc. XR-7"
                         "Pont. Catalina"
      
               warning:  variable has embedded blanks
      
---------------------------------------------------------------------------------------------------
mpg                                                                                   Mileage (mpg)
---------------------------------------------------------------------------------------------------
      
                  type:  numeric (int)
      
                 range:  [12,41]                      units:  1
         unique values:  21                       missing .:  0/74
      
                  mean:   21.2973
              std. dev:    5.7855
      
           percentiles:        10%       25%       50%       75%       90%
                                14        18        20        25        29
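
The full codebook output can get long when many variables are involved. As a hedged sketch, the compact option condenses the report to one line per variable; the variables chosen here are only an illustration.

```stata
sysuse auto, clear

* One line per variable: observations, unique values, mean, min, max, label
codebook mpg price weight, compact
```

The compact form trades the percentiles and example values for a report that fits many variables on one screen.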
			

list

list displays the values of variables, optionally restricted to selected variables and observations.
. sysuse auto
. list make-rep78 in 10/20 if price > 6000, table
This lists the values of the variables from make through rep78 for observations 10 through 20 whose price is greater than 6000, displayed as a table.


    +--------------------------------------+
    | make             price   mpg   rep78 |
    |--------------------------------------|
11. | Cad. Deville    11,385    14       3 |
12. | Cad. Eldorado   14,500    14       2 |
13. | Cad. Seville    15,906    21       3 |
    +--------------------------------------+
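
The in range and the display options can be varied. The sketch below, an illustration on the same dataset and not part of the original example, lists the last five observations using a negative range, and uses the clean and noobs options to drop the table borders and the observation numbers.

```stata
sysuse auto, clear

* Last five observations: -5 counts back from the end, l means the last one
list make price mpg in -5/l

* Same rows without borders or observation numbers
list make price mpg in -5/l, clean noobs
```

The plain layout produced by clean is often easier to paste into notes or a log file.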
			

count

count counts the number of observations that satisfy the specified conditions.
. sysuse auto
. count in 10/20 if price > 6000
This counts the observations among the 10th through 20th whose price is greater than 6000. There are 3 such observations in the sample.
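
count also leaves its result in r(N), which makes it useful for checks inside a do-file. A minimal sketch, assuming the same auto dataset; the conditions are chosen only for illustration.

```stata
sysuse auto, clear

* Count observations that satisfy a condition, then reuse the result
count if price > 6000
display "Cars priced above 6000: " r(N)

* A common check: how many observations are missing a value
count if missing(rep78)
display "Missing repair records: " r(N)
```

Because r(N) is overwritten by each count, copy it into a local macro or scalar if it is needed later.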

Author: Yun Dai, 2018