Converting Numeric/String Variables

encode; decode

encode var, gen(newvar) creates a numeric variable newvar by assigning an integer to each distinct string value of var, in alphabetical order starting from 1. It also creates a value label (named newvar by default) that maps each integer back to the original string.

. sysuse auto
. encode make, gen(make_num)
creates the numeric variable make_num, with a value label of the same name attached to it.
. label list make_num lists that value label, that is, the mapping between the numeric values of make_num and the original strings.


make_num:
           1 AMC Concord
           2 AMC Pacer
           3 AMC Spirit
           4 Audi 5000
           5 Audi Fox
           6 BMW 320i
           7 Buick Century
           8 Buick Electra
           9 Buick LeSabre
          10 Buick Opel
          11 Buick Regal
          12 Buick Riviera
          13 Buick Skylark
          14 Cad. Deville
          15 Cad. Eldorado
          16 Cad. Seville
          17 Chev. Chevette
          18 Chev. Impala
          19 Chev. Malibu
          20 Chev. Monte Carlo
          21 Chev. Monza
          22 Chev. Nova
          23 Datsun 200
          24 Datsun 210
          25 Datsun 510
          26 Datsun 810
          27 Dodge Colt
          28 Dodge Diplomat
          29 Dodge Magnum
          30 Dodge St. Regis
          31 Fiat Strada
          32 Ford Fiesta
          33 Ford Mustang
          34 Honda Accord
          35 Honda Civic
          36 Linc. Continental
          37 Linc. Mark V
          38 Linc. Versailles
          39 Mazda GLC
          40 Merc. Bobcat
          41 Merc. Cougar
          42 Merc. Marquis
          43 Merc. Monarch
          44 Merc. XR-7
          45 Merc. Zephyr
          46 Olds 98
          47 Olds Cutl Supr
          48 Olds Cutlass
          49 Olds Delta 88
          50 Olds Omega
          51 Olds Starfire
          52 Olds Toronado
          53 Peugeot 604
          54 Plym. Arrow
          55 Plym. Champ
          56 Plym. Horizon
          57 Plym. Sapporo
          58 Plym. Volare
          59 Pont. Catalina
          60 Pont. Firebird
          61 Pont. Grand Prix
          62 Pont. Le Mans
          63 Pont. Phoenix
          64 Pont. Sunbird
          65 Renault Le Car
          66 Subaru
          67 Toyota Celica
          68 Toyota Corolla
          69 Toyota Corona
          70 VW Dasher
          71 VW Diesel
          72 VW Rabbit
          73 VW Scirocco
          74 Volvo 260		
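
Because the value label is attached to make_num, the new variable looks the same as make when listed, even though it stores integers. The short sketch below shows the difference; it uses only the auto dataset and the variable names from the example above.

```stata
sysuse auto, clear
encode make, gen(make_num)
* With labels shown, make and make_num look identical
list make make_num in 1/3
* With nolabel, the integers stored in make_num appear
list make make_num in 1/3, nolabel
* describe reports the storage type and the attached value label
describe make make_num
```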
	

decode var, gen(newvar) works the other way around. It creates a string variable newvar using the value labels attached earlier to the numeric variable var.
. decode foreign, gen(foreign_str) creates a string variable with two values, “Domestic” and “Foreign”.
. codebook foreign_str


-----------------------------------------------------------------------------------
foreign_str                                                                Car type
-----------------------------------------------------------------------------------

                  type:  string (str8)

         unique values:  2                        missing "":  0/74

            tabulation:  Freq.  Value
                            52  "Domestic"
                            22  "Foreign"			
		
Compare with
. codebook foreign

-----------------------------------------------------------------------------------
foreign                                                                    Car type
-----------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  origin

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/74

            tabulation:  Freq.   Numeric  Label
                            52         0  Domestic
                            22         1  Foreign			
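
A round trip can help confirm that decode and encode mirror each other. The sketch below is an illustration rather than part of the original example: foreign_back is a hypothetical variable name, and it relies on encode's label() option reusing the existing value label origin. Without that option, encode would number the categories 1 and 2 instead of 0 and 1.

```stata
sysuse auto, clear
decode foreign, gen(foreign_str)
* Re-encode with the existing label origin so the codes 0 and 1 are reused
encode foreign_str, gen(foreign_back) label(origin)
* The round trip should reproduce the original variable
assert foreign_back == foreign
```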
		

destring; tostring

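destring converts string variables that contain numbers to numeric variables; tostring converts numeric variables to strings. Each command requires either replace, which overwrites the original variable, or generate(), which creates a new one.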

. tostring make_num, gen(make_str) converts the numeric variable make_num, previously encoded from make, to a string variable. tostring ignores value labels, so make_str holds the numeric codes as text (“1”, “2”, ...) rather than the original make names.
. codebook make_str


-----------------------------------------------------------------------------------
make_str                                                             Make and Model
-----------------------------------------------------------------------------------

                  type:  string (str2)

         unique values:  74                       missing "":  0/74

              examples:  "21"
                         "35"
                         "49"
                         "62"
		

. destring make_str, replace converts make_str to a numeric variable.
. codebook make_str


-----------------------------------------------------------------------------------
make_str                                                             Make and Model
-----------------------------------------------------------------------------------

                  type:  numeric (byte)

                 range:  [1,74]                       units:  1
         unique values:  74                       missing .:  0/74

                  mean:      37.5
              std. dev:   21.5058

           percentiles:        10%       25%       50%       75%       90%
                                 8        19      37.5        56        67			
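
Real data often store numbers as strings with extra characters such as thousands separators. In that case destring leaves the variable unconverted unless told what to do with those characters. The sketch below builds such a variable from price and strips the commas with the ignore() option; price_str and price_num are hypothetical names used only for illustration.

```stata
sysuse auto, clear
* Write price as text with thousands separators, for example "4,099"
gen price_str = string(price, "%9.0fc")
* Remove the commas while converting back to a number
destring price_str, gen(price_num) ignore(",")
assert price_num == price
```

The force option is an alternative that converts whatever it can and sets the remaining values to missing, so it is worth checking the result when using it.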
		

real(); string()

string(n) and real(s) are functions that convert a numeric value to a string and a string to a numeric value, respectively.

string(n) is a synonym for strofreal(n) and converts numeric or missing values to strings.
. sysuse auto
. gen rep78_str = string(rep78)
converts the numeric and missing values of rep78 to strings.

real(s) converts strings to numeric or missing values. If a string cannot be read as a number, for example because it contains letters, real() returns a missing value.
. gen rep78_str_num = real(rep78_str) converts the string variable rep78_str created above back to numeric again.
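
The behavior described above can be checked with a few assertions. This is a minimal sketch using the variables from the example; the literal values "abc" and 3.14159 are made up for illustration.

```stata
sysuse auto, clear
gen rep78_str = string(rep78)
gen rep78_str_num = real(rep78_str)
* Missing rep78 becomes the string "." and converts back to missing
assert rep78_str_num == rep78
* A string that is not a number yields a missing value
assert missing(real("abc"))
* An optional format controls how the number is written
assert string(3.14159, "%4.2f") == "3.14"
```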

Summary

label define creates a value label, that is, a mapping from numeric values to text. label values then attaches that value label to a numeric variable.
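
As a brief illustration of that two-step process, the sketch below labels a new indicator variable. The variable expensive, the label name pricelbl, and the 6000 cutoff are hypothetical choices, not part of the original examples.

```stata
sysuse auto, clear
* Indicator: 1 if price is above 6000, 0 otherwise
gen byte expensive = price > 6000
* Step 1: define the mapping from values to text
label define pricelbl 0 "Cheap" 1 "Expensive"
* Step 2: attach the mapping to the variable
label values expensive pricelbl
tabulate expensive
```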

encode works on string variables. It creates a new numeric variable and assigns a numeric value to each distinct string value.
decode works on numeric variables that have value labels attached. It creates a new string variable whose values are taken from those labels.

destring works on string variables that contain numbers stored as strings, and converts them to numbers.
tostring converts numeric variables to strings.

real() and string() convert between strings and numbers, but they are functions rather than commands, so they must be used inside an expression, for example with generate or replace.

Author: Yun Dai, 2018