Summarizing Your Data

The four commands all produce summary statistics. The choice of the command, however, depends on the question we ask.

summarize

summarize returns a variety of univariate summary statistics. If no options have been specified for the variables, it displays the number of observations, mean, standard deviation, minimum and maximum values.
. sysuse auto
. sum weight


					
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      weight |         74    3019.459    777.1936       1760       4840


				

. sum weight, detail produces additional statistics with the detail option.

                              Weight (lbs.)
-------------------------------------------------------------
      Percentiles      Smallest
 1%         1760           1760
 5%         1830           1800
10%         2020           1800       Obs                  74
25%         2240           1830       Sum of Wgt.          74

50%         3190                      Mean           3019.459
                        Largest       Std. Dev.      777.1936
75%         3600           4290
90%         4060           4330       Variance       604029.8
95%         4290           4720       Skewness       .1481164
99%         4840           4840       Kurtosis       2.118403
  				

tabulate; tab1; tabulate, summarize(); tabulate, generate()

tabulate

tabulate produces one- and two-way tables of frequencies.
. tab rep78 produces the distribution of the variable car repair record rep78.


     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00
      		

In combination with the categorical variable, tabulate can produce a cross-tabulation of summary statistics by category.
. tab rep78 foreign produces the car repair record frequencies by foreign or domestic car type, where foreign is a categorical variable.


    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69 
      
      		

. tab rep78 foreign, column row reports relative frequencies within the columns and rows when the options column and row are specified.


+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+
     
    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
           |    100.00       0.00 |    100.00 
           |      4.17       0.00 |      2.90 
-----------+----------------------+----------
         2 |         8          0 |         8 
           |    100.00       0.00 |    100.00 
           |     16.67       0.00 |     11.59 
-----------+----------------------+----------
         3 |        27          3 |        30 
           |     90.00      10.00 |    100.00 
           |     56.25      14.29 |     43.48 
-----------+----------------------+----------
         4 |         9          9 |        18 
           |     50.00      50.00 |    100.00 
           |     18.75      42.86 |     26.09 
-----------+----------------------+----------
         5 |         2          9 |        11 
           |     18.18      81.82 |    100.00 
           |      4.17      42.86 |     15.94 
-----------+----------------------+----------
     Total |        48         21 |        69 
           |     69.57      30.43 |    100.00 
           |    100.00     100.00 |    100.00 
      
      		

Using by with the categorical or indicator variables produces separate tables of the statistics by each category.
. by foreign: tab rep78 returns the summary statistics in two tables for each foreign and domestic car type.


------------------------------------------------------------
-> foreign = Domestic
    
     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        4.17        4.17
          2 |          8       16.67       20.83
          3 |         27       56.25       77.08
          4 |          9       18.75       95.83
          5 |          2        4.17      100.00
------------+-----------------------------------
      Total |         48      100.00
      
-------------------------------------------------------------
-> foreign = Foreign
     
     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          3       14.29       14.29
          4 |          9       42.86       57.14
          5 |          9       42.86      100.00
------------+-----------------------------------
      Total |         21      100.00
      
      		

tab1

tab1 is a convenience tool for one-way table of frequencies.
. tab1 rep78 foreign headroom returns three tables of frequencies for each variable, equivalent to
. tab rep78
. tab foreign
. tab headroom


-> tabulation of rep78  
      
     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00
      
-> tabulation of foreign  
      
  Car type |      Freq.     Percent        Cum.
-----------+-----------------------------------
  Domestic |         52       70.27       70.27
   Foreign |         22       29.73      100.00
-----------+-----------------------------------
     Total |         74      100.00
      
-> tabulation of headroom  
      
   Headroom |
      (in.) |      Freq.     Percent        Cum.
------------+-----------------------------------
        1.5 |          4        5.41        5.41
        2.0 |         13       17.57       22.97
        2.5 |         14       18.92       41.89
        3.0 |         13       17.57       59.46
        3.5 |         15       20.27       79.73
        4.0 |         10       13.51       93.24
        4.5 |          4        5.41       98.65
        5.0 |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00
      		

tabulate, summarize()

tabulate category, summarize(var) produces one- and two-way tables of means and standard deviations by category on var.
. tab foreign, sum(weight) returns the summary table of the means and standard deviations of the car weight by foreign and domestic car types.


            |      Summary of Weight (lbs.)
   Car type |        Mean   Std. Dev.       Freq.
------------+------------------------------------
   Domestic |   3,317.115   695.36374          52
    Foreign |   2,315.909   433.00345          22
------------+------------------------------------
      Total |   3,019.459   777.19357          74
      		

tabulate, generate()

tabulate var, generate(newvar) creates indicator variables.
. tab foreign, gen(import) generates two new variables import1, indicating whether the car is domestic made, and import2, indicating whether the car is foreign made.


   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00
      		

tabstat

tabstat varlist, by(category) statistics (stat1 stat2…) reports summary statistics by groups, where multiple statistics can be specified, including mean, sum, count, standard deviation, percentiles etc.
. tabstat price weight, by(foreign) statistics(n mean sd p50) reports the number of nonmissing observations, mean, standard deviation, and the 50th percentile (median) of variable price and weight by their car type of foreign or domestic made.


Summary statistics: N, mean, sd, p50
  by categories of: foreign (Car type)
      
 foreign |     price    weight
---------+--------------------
Domestic |        52        52
         |  6072.423  3317.115
         |  3097.104  695.3637
         |    4782.5      3360
---------+--------------------
 Foreign |        22        22
         |  6384.682  2315.909
         |  2621.915  433.0035
         |      5759      2180
---------+--------------------
   Total |        74        74
         |  6165.257  3019.459
         |  2949.496  777.1936
         |    5006.5      3190
------------------------------
      			

table

table category1, by(category2) contents (stat1 var stat2 var stat3 var…) is a more flexible command that produces one-, two-, and up-to-seven-way tables of various summary statistics. Up to four variables can be selected as categories and up to five statistics can be specified.
. table rep78, by(foreign) contents(n weight mean weight sd weight p50 weight) reports the mean and the 50th percentile of car weight by repair record rep78 and foreign/domestic car types. foreign is the main category under which each value of price is specified.


------------------------------------------------------------------
Car type  |
and       |
Repair    |
Record    |
1978      |    N(weight)  mean(weight)    sd(weight)   med(weight)
----------+-------------------------------------------------------
Domestic  |
        1 |            2         3,100       523.259         3,100
        2 |            8       3,353.8      445.9961         3,465
        3 |           27       3,442.2      645.3701         3,350
        4 |            9       3,532.2      806.2068         3,700
        5 |            2         1,960      226.2742         1,960
----------+-------------------------------------------------------
Foreign   |
        1 |                                                       
        2 |                                                       
        3 |            3         2,010      158.7451         2,070
        4 |            9       2,207.8      335.4764         2,160
        5 |            9       2,403.3      405.0926         2,240
------------------------------------------------------------------
      			

Author: Yun Dai, 2018