Reshaping Your Data

reshape

reshape wide varlist, i(i) j(j) converts data from long to wide format, and reshape long stubnames, i(i) j(j) converts data from wide to long format.

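i is the identifier of the higher-level unit, which becomes one observation in the wide format (it may be several variables, e.g. i(id semester));
j is the identifier of the lower-level unit within each i; in the wide format its values become suffixes of the new variable names;
wide means reshaping to the wide format;
long means reshaping to the long format.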

The long format has more observations and the wide format has more variables. In the long format, each value of j has its own observation, so the lower-level records belonging to one value of i are stacked in rows. In the wide format, each value of j has its own set of columns, so there is one observation per value of i.

. use long
. list


     +------------------------------------------+
     | id   semester   course    gpa   attend~e |
     |------------------------------------------|
  1. |  1          1        A   3.81          1 |
  2. |  1          1        B   3.82          2 |
  3. |  1          2        A   3.76          3 |
  4. |  1          2        B   3.77          4 |
  5. |  2          1        A   3.56          4 |
     |------------------------------------------|
  6. |  2          1        B   3.55          3 |
  7. |  2          2        A   3.45          2 |
  8. |  2          2        B   3.47          1 |
     +------------------------------------------+
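The examples load a file called long.dta that is not supplied with this section. As a sketch for following along, the do-file below rebuilds the same eight observations from the listing above and saves them under that name. The storage types (byte, str1, float) are an assumption; the original file may store the variables differently.

```stata
* Recreate the long-format example data listed above.
* Storage types are assumed, not taken from the original long.dta.
clear
input byte id byte semester str1 course float gpa byte attendance
1 1 A 3.81 1
1 1 B 3.82 2
1 2 A 3.76 3
1 2 B 3.77 4
2 1 A 3.56 4
2 1 B 3.55 3
2 2 A 3.45 2
2 2 B 3.47 1
end

* Save it so that "use long" works in the examples that follow.
save long, replace
list
```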

The following command converts the dataset from long to wide. The string option is required because the j variable, course, is a string:
. reshape wide gpa attendance, i(id semester) j(course) string


(note: j = A B)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        8   ->       4
Number of variables                   5   ->       6
j variable (2 values)            course   ->   (dropped)
xij variables:
                                    gpa   ->   gpaA gpaB
                             attendance   ->   attendanceA attendanceB
-----------------------------------------------------------------------------

. list


     +---------------------------------------------------+
     | id   semester   gpaA   attend~A   gpaB   attend~B |
     |---------------------------------------------------|
  1. |  1          1   3.81          1   3.82          2 |
  2. |  1          2   3.76          3   3.77          4 |
  3. |  2          1   3.56          4   3.55          3 |
  4. |  2          2   3.45          2   3.47          1 |
     +---------------------------------------------------+
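reshape wide only works when i() and j() together identify each observation uniquely. The sketch below checks that condition with isid before reshaping and then confirms that i() alone identifies the wide observations. It assumes long.dta is in the working directory, as in the examples above.

```stata
* Assumes long.dta (the data listed above) is in the working directory.
use long, clear

* i() and j() together must uniquely identify the long observations;
* isid exits with an error if they do not.
isid id semester course

reshape wide gpa attendance, i(id semester) j(course) string

* After reshaping, i() alone identifies each observation,
* and each value of j has become a variable-name suffix.
isid id semester
confirm variable gpaA gpaB attendanceA attendanceB
describe, short
```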

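Typing reshape long with no arguments changes it back to the long format, because Stata remembers the variables, i(), and j() from the previous reshape:
. reshape long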


(note: j = A B)

Data                               wide   ->   long
-----------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   6   ->       5
j variable (2 values)                     ->   course
xij variables:
                              gpaA gpaB   ->   gpa
                attendanceA attendanceB   ->   attendance
-----------------------------------------------------------------------
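The short form relies on Stata remembering the previous reshape. A sketch of the fully spelled-out command, which also works on a wide dataset opened fresh, is shown below. Here gpa and attendance are the stubs, and the suffixes A and B become the values of the string variable course. It again assumes long.dta is in the working directory.

```stata
* Assumes long.dta is in the working directory.
use long, clear
reshape wide gpa attendance, i(id semester) j(course) string

* Explicit form: stubs gpa and attendance match gpaA gpaB and
* attendanceA attendanceB; "string" recreates course as a string variable.
reshape long gpa attendance, i(id semester) j(course) string

* Restore the original row order and show each id as a block.
sort id semester course
list, sepby(id)
```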

collapse

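collapse (stat1) varlist1 (stat2) varlist2 ..., by(groupvars) replaces the dataset in memory with summary statistics, one observation per combination of the by() variables. Available statistics include mean, median, percentiles (p1 to p99), sd (standard deviation), semean (standard error of the mean), first and last values, max and min, sum, and count.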

. use long
. collapse (mean) gpa, by(semester id)
replaces the data with the mean gpa of each id in each semester, averaged over the two courses:
. list


     +-----------------------+
     | id   semester     gpa |
     |-----------------------|
  1. |  1          1   3.815 |
  2. |  2          1   3.555 |
  3. |  1          2   3.765 |
  4. |  2          2    3.46 |
     +-----------------------+
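Because collapse discards the original observations, it is often paired with preserve and restore. The sketch below requests several statistics at once and gives each result its own name with the newvar=varname form; the names mean_gpa, max_gpa, and total_attend are illustrative choices. It assumes long.dta is in the working directory.

```stata
* Assumes long.dta is in the working directory.
use long, clear

* Keep a copy of the full data, since collapse replaces it.
preserve

* Several statistics in one call; newvar=varname names each result.
collapse (mean) mean_gpa=gpa (max) max_gpa=gpa (sum) total_attend=attendance, by(id)
list

* One observation per id; mean gpa for id 1 is (3.81+3.82+3.76+3.77)/4.
assert _N == 2
assert abs(mean_gpa - 3.79) < 1e-5 if id == 1
assert total_attend == 10

* Bring back the original eight observations.
restore
```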