Reading and Saving Files of Various Types

Reading and writing Stata binary files is much faster than using text files. We may want to work with a file in Stata format as early as possible.

Read and save Stata files (.dta)

Stata will always go to the current working directory when working with files.

use filename, clear loads a Stata file (.dta) from the current working directory. clear clears the memory for Stata to load a new file. If the workspace is already clear, we can skip that as well.

save filename, replace saves a Stata file to the current working directory. replace overwrites the file saved earlier with the same file name. If such a file does not exist, we can skip the replace option.

To load a file located in a subdirectory under the working directory, we need to specify the relative path.
. use folder/mydata.dta, clear

To tell Stata to read a file elsewhere, we need to specify a full path.
. use ~/elsewhere/mydata.dta, clear

We can also read a file over the Internet.
. use http://www.stata-press.com/data/r14/auto, clear

sysuse loads example datasets installed with Stata
. sysuse auto, clear

Read and save delimited text files (.csv, .txt)

import/export delimited reads/saves text-delimited files. The data can be tab-separated (often .txt) or comma-separated (.csv). If the extension is not specified, Stata assumes that it is a .csv file.
. import delimited filename, clear
. import delimited var1 var2 var3 using filename.csv, clear loads the dataset and names the three variables var1, var2 and var3.
. export delimited output.txt, delimiter(tab) exports a file and saves it as a tab-delimited text file.

A Case

Sometimes we need to take a few more steps to read a text file. Let's look at a real case.

Let's say we have a text file downloaded from a database and we tried to read it into Stata by
. import delimited output.txt
. list in 1/10


     +-------------------------------------------------------------------------------------------+
     | v1                                                                                        |
     |-------------------------------------------------------------------------------------------|
  1. |                                                                                           |
  2. |                                                                                        .. |
  3. |                                                                                           |
  4. | ticker      evtdate       car                                                             |
  5. |  EGAS     12FEB2014     0.02356                                                           |
     |-------------------------------------------------------------------------------------------|
  6. |  EGAS     27JUL2012    -0.00276                                                           |
  7. |  HGR      03JUL2002     0.01478                                                           |
  8. |  HGR      03MAY2005    -0.05070                                                           |
  9. |  HGR      05AUG2002     0.02337                                                           |
 10. |  HGR      20JAN2004     0.04206                                                           |
     +-------------------------------------------------------------------------------------------+

We can see that all variables are squeezed into one column "v1", that there are empty rows, and that the variable names actually are the fourth row. To fix those problems, first let's try
. import delimited output.txt, rowrange(4:l) varnames(4) delimiters(space)
. list ticker-car in 1/5

rowrange(4:l) reads data from the fourth to the last row (l stands for "last"). varnames(4) means the fourth row is for the variable names. delimiters(space) tells Stata to use space as the delimiters.


     +--------------------------------------------------------------------------------------------------------+
     | ticker     v2   v3   v4   v5   v6     evtdate          v8   v9   v10       v11      v12      v13   car |
     |--------------------------------------------------------------------------------------------------------|
  1. |          EGAS    .    .    .    .   12FEB2014                                .   .02356        .       |
  2. |          EGAS    .    .    .    .   27JUL2012                          -.00276        .        .       |
  3. |           HGR    .    .    .    .               03JUL2002                    .        .   .01478       |
  4. |           HGR    .    .    .    .               03MAY2005                    .   -.0507        .       |
  5. |           HGR    .    .    .    .               05AUG2002                    .        .   .02337       |
     +--------------------------------------------------------------------------------------------------------+

Now the first few empty rows have been removed; and (some) variables are properly named. However, there was something not right. We had three variables, and now we've got a lot more (260! if we list them all). What happened? Stata used multiple spaces as delimiters, and due to that values from one variable went to different columns.

Here is what we want to do:
. import delimited output.txt, rowrange(4:l) varnames(4) delimiters(space, collapse)
Here delimiters(space, collapse) collapsed the multiple spaces into just one.

. list in 1/5


     +----------------------------------------+
     | ticker   evtdate         car        v4 |
     |----------------------------------------|
  1. |             EGAS   12FEB2014    .02356 |
  2. |             EGAS   27JUL2012   -.00276 |
  3. |              HGR   03JUL2002    .01478 |
  4. |              HGR   03MAY2005    -.0507 |
  5. |              HGR   05AUG2002    .02337 |
     +----------------------------------------+

Now it's about right. We got the correct variable names, just in the wrong places. We need to fix that.
. drop ticker
. rename evtdate ticker
. rename car evtdate
. rename v4 car
. list in 1/5


     +------------------------------+
     | ticker     evtdate       car |
     |------------------------------|
  1. |   EGAS   12FEB2014    .02356 |
  2. |   EGAS   27JUL2012   -.00276 |
  3. |    HGR   03JUL2002    .01478 |
  4. |    HGR   03MAY2005    -.0507 |
  5. |    HGR   05AUG2002    .02337 |
     +------------------------------+

Done.

Read and save Excel files

import/export excel reads/saves worksheets from Microsoft Excel (.xls and .xlsx) files.
. import excel filename.xlsx, clear loads the dataset but the variable names will be lost. What we see are Excel's column names.
. import excel filename.xls, firstrow loads the dataset with the original variable names.
. export excel output.xlsx, firstrow(varlabels) sheet("output1") saves the dataset as an Excel file "output.xlsx" to the sheet "output1".