This module shows the use of if with common Stata commands. Let's use the auto data file.

    sysuse auto

For this module, we will focus on make, rep78, foreign, mpg, and price. We can use the keep command to keep just these variables.

    keep make rep78 foreign mpg price

Let's make a table of rep78 by foreign to look at the repair histories of the foreign and domestic cars.

    tabulate rep78 foreign

               |        foreign
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2
             2 |         8          0 |         8
             3 |        27          3 |        30
             4 |         9          9 |        18
             5 |         2          9 |        11
    -----------+----------------------+----------
         Total |        48         21 |        69

Suppose we wanted to focus on just the cars with repair histories of four or better. We can use the if suffix to do this.

    tabulate rep78 foreign if (rep78 >= 4)

               |        foreign
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             4 |         9          9 |        18
             5 |         2          9 |        11
    -----------+----------------------+----------
         Total |        11         18 |        29

Let's make the above table using the column and nofreq options. Note that column and nofreq come after the comma. These are options on the tabulate command, and options need to be placed after a comma.

    tabulate rep78 foreign if (rep78 >= 4), column nofreq

               |        foreign
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             4 |     81.82      50.00 |     62.07
             5 |     18.18      50.00 |     37.93
    -----------+----------------------+----------
         Total |    100.00     100.00 |    100.00

The use of if is not limited to the tabulate command. Here, we use it with the list command.

    list if (rep78 >= 4)

          make            price   mpg   rep78   foreign
      3.  AMC Spirit       3799    22       .         0
      5.  Buick Electra    7827    15       4         0
      7.  Buick Opel       4453    26       .         0
     15.  Chev. Impala     5705    16       4         0
     20.  Dodge Colt       3984    30       5         0
     24.  Ford Fiesta      4389    28       4         0
     29.  Merc. Bobcat     3829    22       4         0
     30.  Merc. Cougar     5379    14       4         0
     33.  Merc. XR-7       6303    14       4         0
     35.  Olds 98          8814    21       4         0
     38.  Olds Delta 88    4890    18       4         0
     43.  Plym. Champ      4425    34       5         0
     45.  Plym. Sapporo    6486    26       .         0
     47.  Pont. Catalina   5798    18       4         0
     51.  Pont. Phoenix    4424    19       .         0
     53.  Audi 5000        9690    17       5         1
     55.  BMW 320i         9735    25       4         1
     56.  Datsun 200       6229    23       4         1
     57.  Datsun 210       4589    35       5         1
     58.  Datsun 510       5079    24       4         1
     59.  Datsun 810       8129    21       4         1
     61.  Honda Accord     5799    25       5         1
     62.  Honda Civic      4499    28       4         1
     63.  Mazda GLC        3995    30       4         1
     64.  Peugeot 604     12990    14       .         1
     66.  Subaru           3798    35       5         1
     67.  Toyota Celica    5899    18       5         1
     68.  Toyota Corolla   3748    31       5         1
     69.  Toyota Corona    5719    18       5         1
     70.  VW Dasher        7140    23       4         1
     71.  VW Diesel        5397    41       5         1
     72.  VW Rabbit        4697    25       4         1
     73.  VW Scirocco      6850    25       4         1
     74.  Volvo 260       11995    17       5         1

Did you notice the values of rep78 that are shown as a period (.)? Those are missing values. For example, the value of rep78 for the AMC Spirit is missing. Stata treats a missing value as larger than any nonmissing number, in effect positive infinity. So when we said list if (rep78 >= 4), Stata included the observations where rep78 was . as well. If we want to include just the valid observations that are greater than or equal to 4, we can tell Stata that we want rep78 >= 4 and rep78 not missing:

    list if (rep78 >= 4) & !missing(rep78)

          make            price   mpg   rep78   foreign
      5.  Buick Electra    7827    15       4         0
     15.  Chev. Impala     5705    16       4         0
     20.  Dodge Colt       3984    30       5         0
     24.  Ford Fiesta      4389    28       4         0
     29.  Merc. Bobcat     3829    22       4         0
     30.  Merc. Cougar     5379    14       4         0
     33.  Merc. XR-7       6303    14       4         0
     35.  Olds 98          8814    21       4         0
     38.  Olds Delta 88    4890    18       4         0
     43.  Plym. Champ      4425    34       5         0
     47.  Pont. Catalina   5798    18       4         0
     53.  Audi 5000        9690    17       5         1
     55.  BMW 320i         9735    25       4         1
     56.  Datsun 200       6229    23       4         1
     57.  Datsun 210       4589    35       5         1
     58.  Datsun 510       5079    24       4         1
     59.  Datsun 810       8129    21       4         1
     61.  Honda Accord     5799    25       5         1
     62.  Honda Civic      4499    28       4         1
     63.  Mazda GLC        3995    30       4         1
     66.  Subaru           3798    35       5         1
     67.  Toyota Celica    5899    18       5         1
     68.  Toyota Corolla   3748    31       5         1
     69.  Toyota Corona    5719    18       5         1
     70.  VW Dasher        7140    23       4         1
     71.  VW Diesel        5397    41       5         1
     72.  VW Rabbit        4697    25       4         1
     73.  VW Scirocco      6850    25       4         1
     74.  Volvo 260       11995    17       5         1

We can use if with most Stata commands. Here, we get summary statistics for price for cars with repair histories of 1 or 2. Note that == represents IS EQUAL TO and | represents OR.
    summarize price if (rep78 == 1) | (rep78 == 2)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
       price |      10        5687    3216.375      3667      14500

A simpler way to say this would be:

    summarize price if (rep78 <= 2)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
       price |      10        5687    3216.375      3667      14500

Likewise, we can do this for cars with repair histories of 3, 4, or 5.

    summarize price if (rep78 == 3) | (rep78 == 4) | (rep78 == 5)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
       price |      59    6223.847    2880.454      3291      15906

Let's simplify this by saying rep78 >= 3.

    summarize price if (rep78 >= 3)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
       price |      64    6239.984    2925.843      3291      15906

Did you see the mistake we made? We accidentally included the missing values because we forgot to exclude them. The count rose from 59 to 64 because the five cars with a missing rep78 were counted as having rep78 >= 3. We really needed to say:

    summarize price if (rep78 >= 3) & !missing(rep78)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
       price |      59    6223.847    2880.454      3291      15906

Summary

Most Stata commands can be followed by if. For example:

Summarize if rep78 equals 2
    summarize if (rep78 == 2)
Summarize if rep78 is greater than or equal to 2
    summarize if (rep78 >= 2)
Summarize if rep78 is greater than 2
    summarize if (rep78 > 2)
Summarize if rep78 is less than or equal to 2
    summarize if (rep78 <= 2)
Summarize if rep78 is less than 2
    summarize if (rep78 < 2)
Summarize if rep78 is not equal to 2
    summarize if (rep78 != 2)

If expressions can be connected with | for OR and & for AND.

Missing values

Missing values are represented as . and are treated as the highest value possible.
Therefore, when values are missing, be careful with commands like these:

    summarize if (rep78 > 3)
    summarize if (rep78 >= 3)
    summarize if (rep78 != 3)

To omit missing values, use:

    summarize if (rep78 > 3) & !missing(rep78)
    summarize if (rep78 >= 3) & !missing(rep78)
    summarize if (rep78 != 3) & !missing(rep78)

Stata Learning Module
A statistical sampler in Stata

This module will give a brief overview of some common statistical tests in Stata. Let's use the auto data file for our examples.

    use auto

t-tests

Let's do a t-test comparing the miles per gallon (mpg) of foreign and domestic cars.

    ttest mpg, by(foreign)
    Two-sample t test with equal variances

    ------------------------------------------------------------------------------
       Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           0 |      52    19.82692     .657777    4.743297    18.50638    21.14747
           1 |      22    24.77273     1.40951    6.611187    21.84149    27.70396
    ---------+--------------------------------------------------------------------
    combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
    ---------+--------------------------------------------------------------------
        diff |           -4.945804    1.362162               -7.661225   -2.230384
    ------------------------------------------------------------------------------
    Degrees of freedom: 72

                      Ho: mean(0) - mean(1) = diff = 0

         Ha: diff < 0             Ha: diff ~= 0              Ha: diff > 0
           t =  -3.6308              t =  -3.6308              t =  -3.6308
       P < t =   0.0003        P > |t| =   0.0005          P > t =   0.9997

As you see in the output above, the domestic cars had significantly lower mpg (19.8) than the foreign cars (24.8).

Chi-square

Let's compare the repair rating (rep78) of the foreign and domestic cars. We can make a crosstab of rep78 by foreign, and we may want to ask whether these variables are independent. We can use the chi2 option to request a chi-square test of independence as well as the crosstab.

    tabulate rep78 foreign, chi2

               |        foreign
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2
             2 |         8          0 |         8
             3 |        27          3 |        30
             4 |         9          9 |        18
             5 |         2          9 |        11
    -----------+----------------------+----------
         Total |        48         21 |        69

              Pearson chi2(4) =  27.2640   Pr = 0.000

The chi-square approximation is unreliable when a table has empty cells or cells with small frequencies, and this table has both. In such cases, you can request Fisher's exact test with the exact option.
    tabulate rep78 foreign, chi2 exact

               |        foreign
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2
             2 |         8          0 |         8
             3 |        27          3 |        30
             4 |         9          9 |        18
             5 |         2          9 |        11
    -----------+----------------------+----------
         Total |        48         21 |        69

              Pearson chi2(4) =  27.2640   Pr = 0.000
               Fisher's exact =                 0.000

Correlation

We can use the correlate command to get the correlations among variables. Let's look at the correlations among price, mpg, weight, and rep78. (We include rep78 even though it is not continuous, to illustrate what happens when you use correlate with variables that have missing data.)

    correlate price mpg weight rep78

    (obs=69)

             |    price      mpg   weight    rep78
    ---------+------------------------------------
       price |   1.0000
         mpg |  -0.4559   1.0000
      weight |   0.5478  -0.8055   1.0000
       rep78 |   0.0066   0.4023  -0.4003   1.0000

Note that the output above said (obs=69). The correlate command drops data on a listwise basis: if any of the variables is missing, the entire observation is omitted from the correlation analysis. If we want correlations that delete missing data on a pairwise basis instead, we can use pwcorr (pairwise correlations). We will use the obs option to show the number of observations used for calculating each correlation.

    pwcorr price mpg weight rep78, obs

             |    price      mpg   weight    rep78
    ---------+------------------------------------
       price |   1.0000
             |       74
             |
         mpg |  -0.4686   1.0000
             |       74       74
             |
      weight |   0.5386  -0.8072   1.0000
             |       74       74       74
             |
       rep78 |   0.0066   0.4023  -0.4003   1.0000
             |       69       69       69       69
             |

Note how the correlations that involve rep78 have an N of 69, while the other correlations have an N of 74. This is because rep78 has five missing values, so it has only 69 valid observations, while the other variables have no missing data and so have 74 valid observations. This is also why the price-mpg correlation differs between the two outputs: -0.4559 on the 69 complete cases versus -0.4686 on all 74 cars.

Regression

Let's look at doing regression analysis in Stata. For this example, let's drop the cases where rep78 is 1 or 2 or missing.

    drop if (rep78 <= 2) | (rep78 == .)
    (15 observations deleted)

Now, let's predict mpg from price and weight. As you see below, weight is a significant predictor of mpg (p < 0.001), but price is not (p = 0.948).

    regress mpg price weight

      Source |       SS       df       MS                  Number of obs =      59
    ---------+------------------------------               F(  2,    56) =   47.87
       Model |  1375.62097     2  687.810483               Prob > F      =  0.0000
    Residual |  804.616322    56  14.3681486               R-squared     =  0.6310
    ---------+------------------------------               Adj R-squared =  0.6178
       Total |  2180.23729    58  37.5902981               Root MSE      =  3.7905

    ------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.         t   P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
       price |  -.0000139    .0002108    -0.066   0.948      -.0004362    .0004084
      weight |   -.005828    .0007301    -7.982   0.000      -.0072906   -.0043654
       _cons |   39.08279    1.855011    21.069   0.000       35.36676    42.79882
    ------------------------------------------------------------------------------

What if we wanted to predict mpg from rep78 as well? rep78 is really more of a categorical variable than a continuous one, so to include it in the regression we should convert rep78 into dummy variables. Fortunately, tabulate makes creating dummy variables easy. The gen(rep) option tells Stata that we want to generate dummy variables from rep78, and that we want the stem of the dummy variables to be rep.

    tabulate rep78, gen(rep)

          rep78 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |         30       50.85       50.85
              4 |         18       30.51       81.36
              5 |         11       18.64      100.00
    ------------+-----------------------------------
          Total |         59      100.00

Stata has created rep1 (1 if rep78 is 3), rep2 (1 if rep78 is 4), and rep3 (1 if rep78 is 5). We can use the tabulate command to verify that the dummy variables were created properly.
    tabulate rep78 rep1

               |    rep78== 3.0000
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             3 |         0         30 |        30
             4 |        18          0 |        18
             5 |        11          0 |        11
    -----------+----------------------+----------
         Total |        29         30 |        59

    tabulate rep78 rep2

               |    rep78== 4.0000
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             3 |        30          0 |        30
             4 |         0         18 |        18
             5 |        11          0 |        11
    -----------+----------------------+----------
         Total |        41         18 |        59

    tabulate rep78 rep3

               |    rep78== 5.0000
         rep78 |         0          1 |     Total
    -----------+----------------------+----------
             3 |        30          0 |        30
             4 |        18          0 |        18
             5 |         0         11 |        11
    -----------+----------------------+----------
         Total |        48         11 |        59

Now we can include rep1 and rep2 as dummy variables in the regression model. We leave rep3 out, so cars with a rep78 of 5 serve as the reference group. If all three dummies were included along with the constant, they would be perfectly collinear.

    regress mpg price weight rep1 rep2

          Source |       SS       df       MS              Number of obs =      59
    -------------+------------------------------           F(  4,    54) =   26.04
           Model |  1435.91975     4  358.979938           Prob > F      =  0.0000
        Residual |  744.317536    54  13.7836581           R-squared     =  0.6586
    -------------+------------------------------           Adj R-squared =  0.6333
           Total |  2180.23729    58  37.5902981           Root MSE      =  3.7126

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           price |  -.0001126   .0002133    -0.53   0.600    -.0005403    .0003151
          weight |   -.005107   .0008236    -6.20   0.000    -.0067584   -.0034557
            rep1 |  -2.886288   1.504639    -1.92   0.060    -5.902908    .1303314
            rep2 |   -2.88417   1.484817    -1.94   0.057    -5.861048    .0927086
           _cons |   39.89189   1.892188    21.08   0.000     36.09828     43.6855
    ------------------------------------------------------------------------------

Analysis of Variance

If you wanted to do an analysis of variance looking at the differences in mpg among the three repair groups, you can use the oneway command.

    oneway mpg rep78

                            Analysis of Variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      506.325167      2   253.162583      8.47     0.0006
     Within groups      1673.91212     56   29.8912879
    ------------------------------------------------------------------------
        Total           2180.23729     58   37.5902981

    Bartlett's test for equal variances:  chi2(2) =   9.9384  Prob>chi2 = 0.007
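The F statistic in the table above is the between-groups mean square divided by the within-groups mean square (253.162583 / 29.8912879, roughly 8.47). The following sketch recomputes that ratio from the results oneway stores, run as a do-file. It assumes the copy of auto.dta that ships with Stata, loaded with sysuse. It also assumes that oneway leaves r(mss), r(rss), r(df_m), r(df_r), and r(F) behind, as recent Stata versions do.

```stata
* Sketch: recompute the one-way ANOVA F statistic from oneway's stored results.
* Assumes the shipped auto.dta and recent-Stata stored results r(mss), r(rss),
* r(df_m), r(df_r) and r(F).
sysuse auto, clear
drop if (rep78 <= 2) | missing(rep78)    // same 59 cars as in the text

quietly oneway mpg rep78

* Each mean square is a sum of squares divided by its degrees of freedom
scalar ms_between = r(mss) / r(df_m)
scalar ms_within  = r(rss) / r(df_r)

* F is the ratio of the between-groups to the within-groups mean square
display "F computed by hand: " %6.2f ms_between / ms_within
display "F stored by oneway: " %6.2f r(F)
```

Both displayed values should match the 8.47 shown in the oneway table.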
If you include the tabulate option, you also get the mean mpg for each of the three groups. This shows that the group with the best repair rating (rep78 of 5) also has the highest mpg (27.4).

    oneway mpg rep78, tabulate
                |          Summary of mpg
          rep78 |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              3 |   19.433333   4.1413252          30
              4 |   21.666667   4.9348699          18
              5 |   27.363636   8.7323849          11
    ------------+------------------------------------
          Total |    21.59322   6.1310927          59

                            Analysis of Variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      506.325167      2   253.162583      8.47     0.0006
     Within groups      1673.91212     56   29.8912879
    ------------------------------------------------------------------------
        Total           2180.23729     58   37.5902981

    Bartlett's test for equal variances:  chi2(2) =   9.9384  Prob>chi2 = 0.007
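The means in this table are ordinary means of mpg within each rep78 group. The minimal sketch below reproduces them one group at a time with summarize, which leaves the mean and the number of observations in r(mean) and r(N). It assumes the copy of auto.dta that ships with Stata and uses the same 59-car subset.

```stata
* Sketch: per-group means of mpg, matching the "oneway ..., tabulate" table.
* Assumes the copy of auto.dta that ships with Stata.
sysuse auto, clear
drop if (rep78 <= 2) | missing(rep78)

* Summarize mpg separately for rep78 = 3, 4 and 5
forvalues r = 3/5 {
    quietly summarize mpg if rep78 == `r'
    display "rep78 = `r'  mean mpg = " %9.6f r(mean) "  n = " r(N)
}
```

After the loop, r() holds the results for the last group (rep78 = 5).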
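If you want to include covariates, you need to use the anova command. The continuous(price weight) option tells Stata that those variables are covariates.

    anova mpg rep78 price weight, continuous(price weight)

(In Stata 11 and later, the same model is written with factor-variable notation: anova mpg rep78 c.price c.weight.)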
                       Number of obs =      59     R-squared     =  0.6586
                       Root MSE      = 3.71263     Adj R-squared =  0.6333

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  1435.91975     4  358.979938      26.04     0.0000
                     |
               rep78 |  60.2987853     2  30.1493926       2.19     0.1221
               price |   3.8421233     1   3.8421233       0.28     0.5997
              weight |  529.932889     1  529.932889      38.45     0.0000
                     |
            Residual |  744.317536    54  13.7836581
          -----------+----------------------------------------------------
               Total |  2180.23729    58  37.5902981
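In this anova, rep78 enters as three groups and price and weight enter as covariates. That is the same model fitted earlier with regress and the dummies rep1 and rep2: both outputs show a Model SS of 1435.91975 on 4 df and an R-squared of 0.6586. The sketch below checks this. It assumes the copy of auto.dta that ships with Stata, and Stata 11 or later for the c. prefix, which marks price and weight as continuous in place of the continuous() option.

```stata
* Sketch: the dummy-variable regression and the anova with covariates fit the
* same model. Assumes the shipped auto.dta and Stata 11+ (c. prefix).
sysuse auto, clear
drop if (rep78 <= 2) | missing(rep78)
quietly tabulate rep78, generate(rep)        // rep1-rep3, as in the text

* rep3 is left out, so rep78 == 5 is the reference group
quietly regress mpg price weight rep1 rep2
scalar r2_regress = e(r2)

* rep78 is treated as categorical; price and weight as continuous covariates
quietly anova mpg rep78 c.price c.weight
display "R-squared from regress: " %6.4f r2_regress
display "R-squared from anova:   " %6.4f e(r2)
```

The two parameterizations differ, since anova reports sums of squares per term and regress reports a coefficient per dummy, but the fitted values and the R-squared are the same.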
Stata Learning Module
An overview of Stata syntax

This module shows the general structure of Stata commands. We will use summarize as the example, although this general structure applies to most Stata commands. Let's first use the auto data file.

    use auto

As you have seen, we can type summarize and it will give us summary statistics for all of the variables.

    summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
        make |       0
       price |      74    6165.257    2949.496      3291      15906
         mpg |      74     21.2973    5.785503        12         41
       rep78 |      69    3.405797    .9899323         1          5
      hdroom |      74    2.993243    .8459948       1.5          5
       trunk |      74    13.75676    4.277404         5         23
      weight |      74    3019.459    777.1936      1760       4840
      length |      74    187.9324    22.26634       142        233
        turn |      74    39.64865    4.399354        31         51
       displ |      74    197.2973    91.83722        79        425
      gratio |      74    3.014865    .4562871      2.19       3.89
     foreign |      74    .2972973    .4601885         0          1

It is also possible to name the variables you are interested in. Below, we get summary statistics just for mpg and price.

    summarize mpg price

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
         mpg |      74     21.2973    5.785503        12         41
       price |      74    6165.257    2949.496      3291      15906

We could further tell Stata to limit the summary statistics to just foreign cars by adding an if clause.

    summarize mpg price if (foreign == 1)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
         mpg |      22    24.77273    6.611187        14         41
       price |      22    6384.682    2621.915      3748      12990

The if clause can contain more than one condition. Here, we ask for summary statistics for the foreign cars that get less than 30 miles per gallon.

    summarize mpg price if (foreign == 1) & (mpg < 30)

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
         mpg |      17    21.94118    3.896643        14         28
       price |      17    6996.235    2674.552      3895      12990

We can use the detail option to ask Stata to give us more detail in the summary statistics.

Notice that the detail option goes after the comma. If the comma were omitted, Stata would give an error.

    summarize mpg price if (foreign == 1) & (mpg < 30), detail

                                 mpg
    -------------------------------------------------------------
          Percentiles      Smallest
     1%           14             14
     5%           14             17
    10%           17             17       Obs                  17
    25%           18             18       Sum of Wgt.          17

    50%           23                      Mean           21.94118
                            Largest       Std. Dev.      3.896643
    75%           25             25
    90%           26             25       Variance       15.18382
    95%           28             26       Skewness      -.4901235
    99%           28             28       Kurtosis       2.201759

                                price
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         3895           3895
     5%         3895           4296
    10%         4296           4499       Obs                  17
    25%         5079           4697       Sum of Wgt.          17

    50%         6229                      Mean           6996.235
                            Largest       Std. Dev.      2674.552
    75%         8129           9690
    90%        11995           9735       Variance        7153229
    95%        12990          11995       Skewness       .9818272
    99%        12990          12990       Kurtosis       2.930843

Note that even though we built these parts up one at a time, they do not have to go together. Let's look at some other forms of the summarize command. You can tell Stata which observation numbers you want using the in clause. Here we ask for summaries of observations 1 to 10. This is useful if you have a big data file and want to try out a command on a subset of your observations.

    summarize in 1/10

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
        make |       0
       price |      10      5517.4    2063.518      3799      10372
         mpg |      10        19.5     3.27448        15         26
       rep78 |       8       3.125    .3535534         3          4
      hdroom |      10         3.3    .7527727         2        4.5
       trunk |      10        14.7     3.88873        10         21
      weight |      10        3271    558.3796      2230       4080
      length |      10         194    19.32759       168        222
        turn |      10        40.2    3.259175        34         43
       displ |      10       223.9    71.77503       121        350
      gratio |      10       2.907    .3225264      2.41       3.58
     foreign |      10           0           0         0          0

Also, recall that you can ask Stata to perform summaries for foreign and domestic cars separately using by, as shown below.

    sort foreign
    by foreign: summarize

    -> foreign = 0

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
        make |       0
       price |      52    6072.423    3097.104      3291      15906
         mpg |      52    19.82692    4.743297        12         34
       rep78 |      48    3.020833     .837666         1          5
      hdroom |      52    3.153846    .9157578       1.5          5
       trunk |      52       14.75    4.306288         7         23
      weight |      52    3317.115    695.3637      1800       4840
      length |      52    196.1346    20.04605       147        233
        turn |      52    41.44231    3.967582        31         51
       displ |      52    233.7115    85.26299        86        425
      gratio |      52    2.806538    .3359556      2.19       3.58
     foreign |      52           0           0         0          0

    -> foreign = 1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
        make |       0
       price |      22    6384.682    2621.915      3748      12990
         mpg |      22    24.77273    6.611187        14         41
       rep78 |      21    4.285714    .7171372         3          5
      hdroom |      22    2.613636    .4862837       1.5        3.5
       trunk |      22    11.40909    3.216906         5         16
      weight |      22    2315.909    433.0035      1760       3420
      length |      22    168.5455    13.68255       142        193
        turn |      22    35.40909    1.501082        32         38
       displ |      22    111.2273    24.88054        79        163
      gratio |      22    3.507273    .2969076      2.98       3.89
     foreign |      22           1           0         1          1

Let's review all those pieces. A command can be preceded with a by clause, as shown below.

summarize preceded with by
    by foreign: summarize

Many parts can come after a command. Each is shown separately below.

summarize with names of variables
    summarize mpg price
summarize with in specifying records to summarize
    summarize in 1/10
summarize with simple if specifying records to summarize
    summarize if (foreign == 1)
summarize with complex if specifying records to summarize
    summarize if (foreign == 1) & (mpg > 30)
summarize followed by option(s)
    summarize, detail

So, putting it all together, the general syntax of the summarize command can be described as:

    [by varlist:] summarize [varlist] [in range] [if exp] , [options]

Understanding the overall syntax of Stata commands helps you remember them and use them more effectively. It also helps you understand the help in Stata. All the extra material about by, if, and in could otherwise be confusing.
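The pieces can also be combined in a single command. The sketch below uses a by prefix, a variable list, an if clause, and an option together. It assumes the copy of auto.dta that ships with Stata, loaded with sysuse. It leaves out an in range, because Stata does not accept in together with a by prefix.

```stata
* Sketch: several parts of the general syntax in one command
* (assumes the copy of auto.dta that ships with Stata).
sysuse auto, clear

* by requires the data to be sorted on the by variable
sort foreign

* [by varlist:] summarize [varlist] [if exp] , [options]
by foreign: summarize mpg price if (mpg < 30), detail
```

For foreign cars, this reproduces the detailed mpg and price tables shown earlier. A second set of tables appears for domestic cars.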
Let's look at the help for summarize, which makes more sense once you know what the by, if, and in parts mean.

    help summarize

    -------------------------------------------------------------------------------
    help for summarize                                     (manual:  [R] summarize)
    -------------------------------------------------------------------------------

    Summary statistics
    ------------------

        [by varlist:] summarize [varlist] [weight] [if exp] [in range]
                      [, { detail | meanonly } format ]

(The help lists if before in. Stata accepts the two in either order.)

Stata Learning Module
Using and saving files in Stata

Using and saving Stata data files

The use command gets a Stata data file from disk and places it in memory so you can analyze and/or modify it. A data file must be read into memory before you can analyze it. This is like opening a Word document: you need to read the document into Word before you can work with it. The command below gets the Stata data file called auto.dta and places it in memory so we can analyze and/or modify it. Since Stata data files end with .dta, you need only say use auto and Stata knows to read in the file called auto.dta. (The command shown uses sysuse, which loads the copy of auto.dta that ships with Stata. use auto reads auto.dta from the current directory.)

    sysuse auto

The describe command gives you information about the data that is currently in memory.

    describe

    Contains data from auto.dta
      obs:            74
     vars:            12                          17 Feb 1999 10:49
     size:         3,108 (99.6% of memory free)
    -------------------------------------------------------------------------------
      1. make      str17  %17s
      2. price     int    %9.0g
      3. mpg       byte   %9.0g
      4. rep78     byte   %9.0g
      5. hdroom    float  %9.0g
      6. trunk     byte   %9.0g
      7. weight    int    %9.0g
      8. length    int    %9.0g
      9. turn      byte   %9.0g
     10. displ     int    %9.0g
     11. gratio    float  %9.0g
     12. foreign   byte   %9.0g
    -------------------------------------------------------------------------------
    Sorted by:

Now that the data is in memory, we can analyze it. For example, the summarize command gives summary statistics for the data currently in memory.

    summarize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
        make |       0
       price |      74    6165.257    2949.496      3291      15906
         mpg |      74     21.2973    5.785503        12         41
       rep78 |      69    3.405797    .9899323         1          5
      hdroom |      74    2.993243    .8459948       1.5          5
       trunk |      74    13.75676    4.277404         5         23
      weight |      74    3019.459    777.1936      1760       4840
      length |      74    187.9324    22.26634       142        233
        turn |      74    39.64865    4.399354        31         51
       displ |      74    197.2973    91.83722        79        425
      gratio |      74    3.014865    .4562871      2.19       3.89
     foreign |      74    .2972973    .4601885         0          1

Let's make a change to the data in memory. We will compute a variable called price2, which will be double the value of price.

    generate price2 = 2*price

If we use the describe command again, we see that the variable we just created is part of the data in memory. We also see a note from Stata saying the dataset has changed since it was last saved. Stata knows that the data in memory has changed and would need to be saved to avoid losing the changes. This is like editing a Word document: if you do not save it, any changes you make will be lost. If we shut the computer off before saving, the changes we made would be lost.

    describe

    Contains data from auto.dta
      obs:            74
     vars:            13                          17 Feb 1999 10:49
     size:         3,404 (99.6% of memory free)
    -------------------------------------------------------------------------------
      1. make      str17  %17s
      2. price     int    %9.0g
      3. mpg       byte   %9.0g
      4. rep78     byte   %9.0g
      5. hdroom    float  %9.0g
      6. trunk     byte   %9.0g
      7. weight    int    %9.0g
      8. length    int    %9.0g
      9. turn      byte   %9.0g
     10. displ     int    %9.0g
     11. gratio    float  %9.0g
     12. foreign   byte   %9.0g
     13. price2    float  %9.0g
    -------------------------------------------------------------------------------
    Sorted by:
         Note:  dataset has changed since last saved

The save command is used to save the data in memory permanently on disk. Let's save this data and call it auto2 (Stata will save it as auto2.dta).

    save auto2
    file auto2.dta saved

Let's make another change to the dataset. We will compute a variable called price3, which will be three times the value of price.
    generate price3 = 3*price

Let's try to save this data again to auto2.

    save auto2
    file auto2.dta already exists
    r(602);

Did you see how Stata said "file auto2.dta already exists"? Stata is worried that you will accidentally overwrite your data file. You need to use the replace option to tell Stata that you know the file exists and you want to replace it.

    save auto2, replace
    file auto2.dta saved

Let's make another change to the data in memory by creating a variable called price4 that is four times the price.

    generate price4 = price*4

Suppose we want to use the original auto file and we don't care if we lose the changes we just made in memory (that is, losing the variable price4). We can try to use the auto file.

    sysuse auto
    no; data in memory would be lost
    r(4);

See how Stata refused to use the file, saying "no; data in memory would be lost"? Stata did not want you to lose the changes that you made to the data in memory. If you really want to discard the changes in memory, you need to use the clear option on the use command, as shown below.

    sysuse auto, clear

Stata tries to protect you from losing your data by doing the following:

1. If you want to save a file over an existing file, you need to use the replace option, e.g., save auto, replace.
2. If you try to use a file while the data in memory has unsaved changes, you need to use the clear option to tell Stata that you want to discard the changes, e.g., use auto, clear.

Before we move on to the next topic, let's clear out the data in memory.

    clear

Using files larger than 1 megabyte

When you use a data file, Stata reads the entire file into memory. By default, Stata limits the size of data in memory to 1 megabyte (PC version 6.0 Intercooled). You can view the amount of memory that Stata has reserved for data with the memory command.

    memory

    Total memory                        1,048,576 bytes     100.00%

    overhead (pointers)                         0             0.00%
    data                                        0             0.00%
                                     ------------
    data + overhead                             0             0.00%

    programs, saved results, etc.           1,152             0.11%
                                     ------------
    Total                                   1,152             0.11%

    Free                                1,047,424            99.89%

If you try to use a file that exceeds the amount of memory Stata has allocated for data, it will give you an error message like this:

    no room to add more observations
    r(901);

You can increase the amount of memory that Stata has allocated to data using the set memory command. For example, if you had a data file of 1.5 megabytes, you could set the memory to, say, 2 megabytes as shown below.

    set memory 2m
    (2048k)

Once you have increased the memory, you should be able to use the data file, provided you have allocated enough memory for it. (In Stata 12 and later, memory is managed automatically and set memory is no longer needed.)

Summary

To use the auto file from disk and read it into memory
    sysuse auto
To save the file auto from memory to disk
    save auto
To save a file when the file auto already exists
    save auto, replace
To use the file auto and clear out the current data in memory
    sysuse auto, clear
To clear out the data in memory, discarding any unsaved changes
    clear
To allocate 2 megabytes of memory for a data file
    set memory 2m
To view the allocation of memory to data and how much is used
    memory

Stata Learning Module
Inputting your data into Stata

This module shows how to input your data into Stata. It covers inputting comma delimited, tab delimited, space delimited, and fixed column data.

1. Typing data into the Stata editor

One of the easiest methods for getting data into Stata is the Stata data editor, which resembles an Excel spreadsheet. It is useful when your data is on paper and needs to be typed in, or when your data is already typed into an Excel spreadsheet. To learn more about the Stata data editor, see the edit module.

2. Comma/tab separated file with variable names on line 1

Two common file formats for raw data are comma separated files and tab separated files. Such files are commonly made from spreadsheet programs like Excel. Consider the comma delimited file shown below.

    type auto2.raw
    make, mpg, weight, price
    AMC Concord, 22, 2930, 4099
    AMC Pacer, 17, 3350, 4749
    AMC Spirit, 22, 2640, 3799
    Buick Century, 20, 3250, 4816
    Buick Electra, 15,4080, 7827

This file has two characteristics:

- The first line has the names of the variables, separated by commas.
- The following lines have the values for the variables, also separated by commas.

This kind of file can be read using the insheet command, as shown below.

    insheet using auto2.raw
    (4 vars, 5 obs)

We can check whether the data came in correctly using the list command.

    list

          make             mpg   weight   price
      1.  AMC Concord       22     2930    4099
      2.  AMC Pacer         17     3350    4749
      3.  AMC Spirit        22     2640    3799
      4.  Buick Century     20     3250    4816
      5.  Buick Electra     15     4080    7827

Since you will likely have more observations, you can use in to list just a subset of observations. Below, we list observations 1 through 3.

    list in 1/3

          make             mpg   weight   price
      1.  AMC Concord       22     2930    4099
      2.  AMC Pacer         17     3350    4749
      3.  AMC Spirit        22     2640    3799

Now that the file has been read into Stata, you can save it with the save command (we will skip that step).

The same insheet command could be used to read a tab delimited file. The insheet command is clever: it can figure out whether you have a comma delimited or a tab delimited file and then read it. (However, insheet cannot handle a file that uses a mixture of commas and tabs as delimiters.)

Before starting the next section, let's clear out the existing data in memory.

    clear

3. Comma/tab separated file (no variable names in file)

Consider a file that is identical to the one we examined in the previous section, except that it does not have the variable names on line 1.

    type auto3.raw
    AMC Concord, 22, 2930, 4099
    AMC Pacer, 17, 3350, 4749
    AMC Spirit, 22, 2640, 3799
    Buick Century, 20, 3250, 4816
    Buick Electra, 15,4080, 7827

This file can be read using the insheet command as shown below.

    insheet using auto3.raw
    (4 vars, 5 obs)

But where did Stata get the variable names? If Stata does not have names for the variables, it names them v1, v2, v3, and so on, as you can see below.

    list

          v1                v2       v3      v4
      1.  AMC Concord       22     2930    4099
      2.  AMC Pacer         17     3350    4749
      3.  AMC Spirit        22     2640    3799
      4.  Buick Century     20     3250    4816
      5.  Buick Electra     15     4080    7827

Let's clear out the data in memory and then try reading the data again.

    clear

Now, let's read the data again and give Stata the names of the variables on the insheet command.

    insheet make mpg weight price using auto3.raw
    (4 vars, 5 obs)

As the list command shows, Stata used the variable names supplied on the insheet command.

    list

          make             mpg   weight   price
      1.  AMC Concord       22     2930    4099
      2.  AMC Pacer         17     3350    4749
      3.  AMC Spirit        22     2640    3799
      4.  Buick Century     20     3250    4816
      5.  Buick Electra     15     4080    7827

The insheet command works equally well on files that use tabs as separators. Stata examines the file, determines whether commas or tabs are being used as separators, and reads the file accordingly.

Now that the file has been read into Stata, you can save it with the save command (we will skip that step). Let's clear out the data in memory before going on to the next section.

    clear

4. Space separated file

Consider a file where the variables are separated by spaces, like the one shown below.

    type auto4.raw
    "AMC Concord" 22 2930 4099
    "AMC Pacer" 17 3350 4749
    "AMC Spirit" 22 2640 3799
    "Buick Century" 20 3250 4816
    "Buick Electra" 15 4080 7827

Note that the make of car is enclosed in quotation marks. This is necessary because the names contain spaces. Without the quotes, Stata would think AMC is the make and Concord is the mpg. If the make did not have embedded spaces, the quotation marks would not be needed.

This file can be read with the infile command as shown below.

    infile str13 make mpg weight price using auto4.raw
    (5 observations read)

You may be asking yourself where the str13 came from. Since make is a character variable, we need to tell Stata that it is a character variable and how long it can be. The str13 tells Stata that make is a string variable up to 13 characters wide.
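You can try infile without first creating auto4.raw by hand. The sketch below writes the same five lines to a temporary file with Stata's file commands and then reads them back. The temporary file and the handle name fh are only scaffolding for this example; with a real data file you would simply name it after using.

```stata
* Sketch: create the space-separated example file, then read it with infile.
* The temporary file and the handle name fh are only scaffolding for this example.
clear
tempfile raw
tempname fh

* Write the five example lines; compound quotes keep the embedded " characters
file open `fh' using "`raw'", write text replace
file write `fh' `""AMC Concord" 22 2930 4099"' _n
file write `fh' `""AMC Pacer" 17 3350 4749"' _n
file write `fh' `""AMC Spirit" 22 2640 3799"' _n
file write `fh' `""Buick Century" 20 3250 4816"' _n
file write `fh' `""Buick Electra" 15 4080 7827"' _n
file close `fh'

* str13 declares make as a string variable up to 13 characters wide
infile str13 make mpg weight price using "`raw'"
describe make
list
```

"Buick Century" and "Buick Electra" are exactly 13 characters, which is why str13 is wide enough for every make in this file.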
The list command confirms that the data was read correctly.

list

     make            mpg   weight   price
  1. AMC Concord      22     2930    4099
  2. AMC Pacer        17     3350    4749
  3. AMC Spirit       22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step). Let's clear out the data in memory before moving on to the next section.

clear

5. Fixed format file

Consider a file using fixed column data like the one shown below.

type auto5.raw
AMC Concord   22 2930 4099
AMC Pacer     17 3350 4749
AMC Spirit    22 2640 3799
Buick Century 20 3250 4816
Buick Electra 15 4080 7827

Note that the variables are clearly defined by the column(s) in which they are located. Also, note that the make of car is not contained within quotation marks. The quotation marks are not needed because the columns define where the make begins and ends, and the embedded spaces no longer create confusion.

This file can be read with the infix command as shown below.

infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw
(5 observations read)

Here again we need to tell Stata that make is a string variable by preceding make with str. We did not need to indicate the length since Stata can infer that make can be up to 13 characters wide based on the column locations.

The list command confirms that the data was read correctly.

list

     make            mpg   weight   price
  1. AMC Concord      22     2930    4099
  2. AMC Pacer        17     3350    4749
  3. AMC Spirit       22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step). Let's clear out the data in memory before moving on to the next section.

clear

6. Other methods of getting data into Stata

This does not cover all possible methods of getting raw data into Stata, but does cover many common situations. See the Stata User's Guide for more comprehensive information on reading raw data into Stata.
Another method that should be mentioned is the use of data conversion programs. These programs can convert data from one file format into another file format. For example, they could directly create a Stata file from an Excel spreadsheet, a Lotus spreadsheet, an Access database, a dBase database, a SAS data file, an SPSS system file, etc. Two such examples are Stat/Transfer and DBMS/Copy. Both of these products are available on SSC PCs, and DBMS/Copy is available on Nicco and Aristotle. Finally, if you are using Nicco, Aristotle or the RS/6000 Cluster, there is a command specifically for converting SAS data into Stata called sas2stata. If you have SAS data you want to convert to Stata, this may be a useful way to get your SAS data into Stata.

7. Summary

Bring up the Stata data editor for typing data in.
edit

Read in the comma or tab delimited file called auto2.raw, taking the variable names from the first line of data.
insheet using auto2.raw, clear

Read in the comma or tab delimited file called auto3.raw, naming the variables make, mpg, weight and price.
insheet make mpg weight price using auto3.raw, clear

Read in the space separated file named auto4.raw. The variable make is surrounded by quotes because it has embedded blanks.
infile str13 make mpg weight price using auto4.raw, clear

Read in the fixed format file named auto5.raw.
infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw, clear

Other methods
DBMS/Copy, Stat/Transfer, sas2stata, and the Stata User's Guide.

Stata Learning Module
Using dates in Stata

This module will show how to use date variables, date functions, and date display formats in Stata.

Converting dates from raw data using the "date()" function

The trick to inputting dates in Stata is to forget they are dates, treat them as character strings, and then later convert them into a Stata date variable. You might have the following date data in your raw data file.
type dates1.raw
John 1 Jan 1960
Mary 11 Jul 1955
Kate 12 Nov 1962
Mark 8 Jun 1959

You can read these data by typing:

infix str name 1-4 str bday 6-17 using dates1.raw
(4 observations read)

Using the list command, you can see that the date information has been read correctly into bday.

list name bday

     name   bday
  1. John   1 Jan 1960
  2. Mary   11 Jul 1955
  3. Kate   12 Nov 1962
  4. Mark   8 Jun 1959

Since bday is a string variable, you cannot do any kind of date computations with it until you make a date variable from it. You can generate a date version of bday using the date() function. The example below creates a date variable called birthday from the character variable bday. The syntax is slightly different depending on which version of Stata you are using. The difference is in how the pattern is specified. In Stata 9 it should be lower case (e.g., "dmy"), and in Stata 10 it should be upper case for day, month, and year (e.g., "DMY") but lower case if you want to specify hours, minutes or seconds (e.g., "DMYhms"). Our data are in the order day, month, year, so we use "DMY" (or "dmy" if you are using Stata 9) within the date() function. (Unless otherwise noted, all other Stata commands on this page are the same for versions 9 and 10.)

In Stata version 9:
generate birthday=date(bday,"dmy")

In Stata version 10:
generate birthday=date(bday,"DMY")

Let's have a look at both bday and birthday.

list name bday birthday

     name   bday          birthday
  1. John   1 Jan 1960           0
  2. Mary   11 Jul 1955      -1635
  3. Kate   12 Nov 1962       1046
  4. Mark   8 Jun 1959        -207

The values for birthday may seem confusing. The value of birthday for John is 0 and the value of birthday for Mark is -207. Dates are actually stored as the number of days from Jan 1, 1960, which is convenient for the computer when storing dates and performing date computations, but is difficult for you and me to read. We can tell Stata that birthday should be displayed using the %d format to make it easier for humans to read.

format birthday %d
list name bday birthday

     name   bday           birthday
  1. John   1 Jan 1960    01jan1960
  2. Mary   11 Jul 1955   11jul1955
  3. Kate   12 Nov 1962   12nov1962
  4. Mark   8 Jun 1959    08jun1959

The date() function is very flexible and can handle dates written in almost any manner. For example, consider the file dates2.raw.

type dates2.raw
John Jan 1 1960
Mary 07/11/1955
Kate 11.12.1962
Mark Jun/8 1959

These dates are messy, but they are consistent. Even though the formats look different, each is always a month, day and year separated by a delimiter (e.g., space, slash, dot or dash). We can try using the syntax from above to read in our new dates. Note that, as discussed above, for Stata version 10 the order of the date is declared in upper case letters (i.e., "MDY") while for version 9 it is declared in all lower case (i.e., "mdy").

clear
infix str name 1-4 str bday 6-17 using dates2.raw
(4 observations read)
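At this point bday holds plain text, and the next step hands it to date(). To make two ideas from this section concrete (the "month, day, year separated by a delimiter" rule, and storage as days since 01jan1960), here is a rough Python analogue. It is a hypothetical sketch, not Stata's parser: it turns the delimiters into blanks, tries three month-day-year layouts with datetime.strptime, and returns the day count relative to 1 January 1960, or None standing in for Stata's missing value. The function name mdy_to_stata is made up for illustration, and English month names assume Python's default C locale.

```python
import re
from datetime import date, datetime

STATA_EPOCH = date(1960, 1, 1)   # day 0 of Stata's daily date scale

# Month-day-year layouts to try, in order. A simplified, hypothetical stand-in
# for the "MDY" mask; Stata's own parser accepts more variations than this.
MDY_PATTERNS = ("%m %d %Y", "%b %d %Y", "%B %d %Y")


def mdy_to_stata(text):
    """Return days since 01jan1960, or None (like Stata's missing) on failure."""
    s = re.sub(r"[/.,\-]", " ", text)            # slash, dot, comma, dash -> blank
    s = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", s)    # "Apr12" -> "Apr 12"
    s = " ".join(s.split())                       # collapse repeated blanks
    for pattern in MDY_PATTERNS:
        try:
            parsed = datetime.strptime(s, pattern).date()
        except ValueError:
            continue                              # try the next layout
        return (parsed - STATA_EPOCH).days
    return None


bdays = ["Jan 1 1960", "07/11/1955", "11.12.1962", "Jun/8 1959"]
for b in bdays:
    print(f"{b:<12} {mdy_to_stata(b)}")
```

For the four dates2.raw strings this prints 0, -1635, 1046 and -207, the same values shown for dates1.raw earlier, since both files hold the same birthdays. A string with no delimiter between day and year, such as Apr121990, matches none of the layouts and comes back as None.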
generate birthday=date(bday,"MDY")
format birthday %d
list name bday birthday

     name   bday          birthday
  1. John   Jan 1 1960   01jan1960
  2. Mary   07/11/1955   11jul1955
  3. Kate   11.12.1962   12nov1962
  4. Mark   Jun/8 1959   08jun1959

Stata was able to read those dates without a problem. Let's try an even tougher set of dates. For example, consider the dates in dates3.raw.

type dates3.raw
4-12-1990
4.12.1990
Apr 12, 1990
Apr12,1990
April 12, 1990
4/12.1990
Apr121990

Let's try reading these dates and see how Stata handles them. Again, remember that for Stata version 10 dates are declared "MDY" while for version 9 they are declared "mdy".

clear
infix str bday 1-20 using dates3.raw
(7 observations read)
generate birthday=date(bday,"MDY")
(1 missing value generated)
format birthday %d
list bday birthday

     bday              birthday
  1. 4-12-1990        12apr1990
  2. 4.12.1990        12apr1990
  3. Apr 12, 1990     12apr1990
  4. Apr12,1990       12apr1990
  5. April 12, 1990   12apr1990
  6. 4/12.1990        12apr1990
  7. Apr121990                .

As you can see, Stata was able to handle almost all of those crazy date formats. It was able to handle Apr12,1990 even though there was no delimiter between the month and day (Stata was able to figure it out since the month was character and the day was a number). The only date that did not work was Apr121990, and that is because there was no delimiter between the day and year. In short, the date() function can handle just about any date as long as there are delimiters separating the month, day and year. In certain cases Stata can read all-numeric dates entered without delimiters; see help dates for more information.

Converting dates from raw data using the mdy() function

In some cases, you may have the month, day, and year stored as numeric variables in a dataset. For example, you may have the following data for birth dates from dates4.raw.

type dates4.raw
 7 11 1948
 1  1 1960
10 15 1970
12 10 1971

You can read in this data using the following syntax to create a separate variable for month, day and year.
clear
infix month 1-2 day 4-5 year 7-10 using dates4.raw
(4 observations read)
list month day year

     month   day   year
  1.     7    11   1948
  2.     1     1   1960
  3.    10    15   1970
  4.    12    10   1971

A Stata date variable can be created using the mdy() function as shown below.

generate birthday=mdy(month,day,year)

Let's format birthday using the %d format so it displays better.

format birthday %d
list month day year birthday

     month   day   year    birthday
  1.     7    11   1948   11jul1948
  2.     1     1   1960   01jan1960
  3.    10    15   1970   15oct1970
  4.    12    10   1971   10dec1971

Consider the data in dates5.raw, which is the same as dates4.raw except that only two digits are used to signify the year.

type dates5.raw
 7 11 48
 1  1 60
10 15 70
12 10 71

Let's try reading these dates just like we read dates4.raw.

clear
infix month 1-2 day 4-5 year 7-10 using dates5.raw
(4 observations read)
generate birthday=mdy(month,day,year)
(4 missing values generated)
format birthday %d
list month day year birthday

     month   day   year   birthday
  1.     7    11     48          .
  2.     1     1     60          .
  3.    10    15     70          .
  4.    12    10     71          .

As you can see, the values for birthday are all missing. This is because Stata assumes that the years were literally 48, 60, 70 and 71 (it does not assume they are 1948, 1960, 1970 and 1971). You can force Stata to assume the century portion is 1900 by adding 1900 to the year as shown below (note that we use replace instead of generate since the variable birthday already exists).

replace birthday=mdy(month,day,year+1900)
(4 real changes made)
format birthday %d
list month day year birthday

     month   day   year    birthday
  1.     7    11     48   11jul1948
  2.     1     1     60   01jan1960
  3.    10    15     70   15oct1970
  4.    12    10     71   10dec1971

Computations with elapsed dates

Date variables make computations involving dates very convenient. For example, to calculate everyone's age on January 1, 2000, simply use the following conversion.

generate age2000=( mdy(1,1,2000) - birthday ) / 365.25
list month day year birthday age2000

     month   day   year    birthday    age2000
  1.     7    11     48   11jul1948   51.47433
  2.     1     1     60   01jan1960         40
  3.    10    15     70   15oct1970   29.21287
  4.    12    10     71   10dec1971   28.06023

Please note that this formula for age does not work well over very short time spans. For example, the age for a child on their first birthday will be less than one due to using 365.25. There are formulas that are more exact but also much more complex. Here is an example courtesy of Dan Blanchette. It counts the whole months from the birth month to January 2000, subtracts one if the birth day of the month falls after the 1st, and then converts months to whole years.

generate altage = floor(((ym(2000, 1) - ym(year(birthday), month(birthday))) - (1 < day(birthday))) / 12)

Other date functions

Given a date variable, one can have the month, day and year returned separately if desired, using the month(), day() and year() functions, respectively.

generate m=month(birthday)
generate d=day(birthday)
generate y=year(birthday)
list m d y birthday

      m    d      y    birthday
  1.  7   11   1948   11jul1948
  2.  1    1   1960   01jan1960
  3. 10   15   1970   15oct1970
  4. 12   10   1971   10dec1971

If you'd like to return the day of the week for a date variable, use the dow() function (where 0=Sunday, 1=Monday, etc.).

gen week_d=dow(birthday)
list birthday week_d

      birthday   week_d
  1. 11jul1948        0
  2. 01jan1960        5
  3. 15oct1970        4
  4. 10dec1971        5

Summary

The date() function converts strings containing dates to date variables. The syntax varies slightly by version.

In Stata version 9:
gen date2 = date(date, "dmy")

In Stata version 10:
gen date2 = date(date, "DMY")

The mdy() function takes three numeric arguments (month, day, year) and converts them to a date variable.
generate birthday=mdy(month,day,year)

You can display elapsed times as actual dates with display formats such as the %d format.
format birthday %d

Other date functions include the month(), day(), year(), and dow() functions.

For online help with dates, type help dates at the command line. For more detailed explanations about how Stata handles dates and date functions, please refer to the Stata User's Guide.

Stata Learning Module
Labeling data

This module will show how to create labels for your data.
Stata allows you to label your data file (data label), to label the variables within your data file (variable labels), and to label the values for your variables (value labels). Let's use a file called autolab that does not have any labels.

use http://www.ats.ucla.edu/stat/stata/modules/autolab.dta, clear

Let's use the describe command to verify that indeed this file does not have any labels.

describe

Contains data from autolab.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          23 Oct 2008 13:36
 size:         3,478 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s
price           int    %8.0gc
mpg             int    %8.0g
rep78           int    %8.0g
headroom        float  %6.1f
trunk           int    %8.0g
weight          int    %8.0gc
length          int    %8.0g
turn            int    %8.0g
displacement    int    %8.0g
gear_ratio      float  %6.2f
foreign         byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:

Let's use the label data command to add a label describing the data file. This label can be up to 80 characters long.

label data "This file contains auto data for the year 1978"

The describe command shows that this label has been applied to the version that is currently in memory.

describe

Contains data from autolab.dta
  obs:            74                          This file contains auto data for the year 1978
 vars:            12                          23 Oct 2008 13:36
 size:         3,478 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s
price           int    %8.0gc
mpg             int    %8.0g
rep78           int    %8.0g
headroom        float  %6.1f
trunk           int    %8.0g
weight          int    %8.0gc
length          int    %8.0g
turn            int    %8.0g
displacement    int    %8.0g
gear_ratio      float  %6.2f
foreign         byte   %8.0g
-------------------------------------------------------------------------------
Sorted by:

Let's use the label variable command to assign labels to the variables rep78, price, mpg and foreign.

label variable rep78 "the repair record from 1978"
label variable price "the price of the car in 1978"
label variable mpg "the miles per gallon for the car"
label variable foreign "the origin of the car, foreign or domestic"

The describe command shows these labels have been applied to the variables.

describe

Contains data from autolab.dta
  obs:            74                          This file contains auto data for the year 1978
 vars:            12                          23 Oct 2008 13:36
 size:         3,478 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s
price           int    %8.0gc                 the price of the car in 1978
mpg             int    %8.0g                  the miles per gallon for the car
rep78           int    %8.0g                  the repair record from 1978
headroom        float  %6.1f
trunk           int    %8.0g
weight          int    %8.0gc
length          int    %8.0g
turn            int    %8.0g
displacement    int    %8.0g
gear_ratio      float  %6.2f
foreign         byte   %8.0g                  the origin of the car, foreign or domestic
-------------------------------------------------------------------------------
Sorted by:
Let's make a value label called foreignl to label the values of the variable foreign. This is a two-step process where you first define the label, and then you assign the label to the variable. The label define command below creates the value label called foreignl that associates 0 with domestic car and 1 with foreign car.

label define foreignl 0 "domestic car" 1 "foreign car"

The label values command below associates the variable foreign with the label foreignl.

label values foreign foreignl

If we use the describe command, we can see that the variable foreign has a value label called foreignl assigned to it.

describe

Contains data from autolab.dta
  obs:            74                          This file contains auto data for the year 1978
 vars:            12                          23 Oct 2008 13:36
 size:         3,478 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s
price           int    %8.0gc                 the price of the car in 1978
mpg             int    %8.0g                  the miles per gallon for the car
rep78           int    %8.0g                  the repair record from 1978
headroom        float  %6.1f
trunk           int    %8.0g
weight          int    %8.0gc
length          int    %8.0g
turn            int    %8.0g
displacement    int    %8.0g
gear_ratio      float  %6.2f
foreign         byte   %12.0g      foreignl   the origin of the car, foreign or domestic
-------------------------------------------------------------------------------
Sorted by:

Now when we use the table foreign command, it shows the labels domestic car and foreign car instead of just 0 and 1.

table foreign

-------------+-----------
  the origin |
 of the car, |
  foreign or |
    domestic |      Freq.
-------------+-----------
domestic car |         52
 foreign car |         22
-------------+-----------

Value labels are used in other commands as well.
For example, below we issue the ttest mpg, by(foreign) command, and the output labels the groups as domestic and foreign (instead of 0 and 1).

ttest mpg, by(foreign)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
domestic |      52    19.82692     .657777    4.743297    18.50638    21.14747
 foreign |      22    24.77273     1.40951    6.611187    21.84149    27.70396
---------+--------------------------------------------------------------------
combined |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |           -4.945804    1.362162               -7.661225   -2.230384
------------------------------------------------------------------------------
Degrees of freedom: 72

                Ho: mean(domestic) - mean(foreign) = diff = 0

     Ha: diff < 0               Ha: diff != 0               Ha: diff > 0
       t =  -3.6308               t =  -3.6308               t =  -3.6308
   P < t =   0.0003         P > |t| =   0.0005           P > t =   0.9997

One very important note: these labels are assigned to the data that is currently in memory. To make these changes permanent, you need to save the data. When you save the data, all of the labels (data labels, variable labels, value labels) will be saved with the data file.

Summary

Assign a label to the data file currently in memory.
label data "1978 auto data"

Assign a label to the variable foreign.
label variable foreign "the origin of the car, foreign or domestic"

Create the value label foreignl and assign it to the variable foreign.
label define foreignl 0 "domestic car" 1 "foreign car"
label values foreign foreignl

Stata Learning Module
Creating and recoding variables

This module shows how to create and recode variables. In Stata you can create new variables with generate, and you can modify the values of an existing variable with replace and with recode.
Computing new variables using generate and replace

Let's use the auto data for our examples. In this section we will see how to compute variables with generate and replace.

use auto

The variable length contains the length of the car in inches. Below we see summary statistics for length.

summarize length

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      length |      74    187.9324    22.26634        142        233

Let's use the generate command to make a new variable, called len_ft, that has the length in feet instead of inches.

generate len_ft = length / 12

We should emphasize that generate is for creating a new variable. For an existing variable, you need to use the replace command (not generate). As shown below, we use replace to repeat the assignment to len_ft.

replace len_ft = length / 12
(49 real changes made)
summarize length len_ft

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      length |      74    187.9324    22.26634        142        233
      len_ft |      74    15.66104    1.855528   11.83333   19.41667

The syntax of generate and replace is identical, except:
- generate works when the variable does not yet exist and will give an error if the variable already exists.
- replace works when the variable already exists, and will give an error if the variable does not yet exist.

Suppose we wanted to make a variable called length2, which has length squared.

generate length2 = length^2
summarize length2

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     length2 |      74    35807.69    8364.045      20164      54289

Or we might want to make loglen, which is the natural log of length.

generate loglen = log(length)
summarize loglen

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      loglen |      74    5.229035    .1201383   4.955827   5.451038

Let's get the mean and standard deviation of length so that we can make Z-scores of length.

summarize length

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      length |      74    187.9324    22.26634        142        233

The mean is 187.93 and the standard deviation is 22.27, so zlength can be computed as shown below.

generate zlength = (length - 187.93) / 22.27
summarize zlength

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     zlength |      74    .0001092    .9998357  -2.062416   2.023799

With generate and replace you can use:
- + and - for addition and subtraction
- * and / for multiplication and division
- ^ for exponents (e.g., length^2)
- ( ) for controlling the order of operations.

Recoding new variables using generate and replace

Suppose that we wanted to break mpg down into three categories. Let's look at a table of mpg to see where we might draw the lines for such categories.

tabulate mpg

        mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
         12 |          2        2.70        2.70
         14 |          6        8.11       10.81
         15 |          2        2.70       13.51
         16 |          4        5.41       18.92
         17 |          4        5.41       24.32
         18 |          9       12.16       36.49
         19 |          8       10.81       47.30
         20 |          3        4.05       51.35
         21 |          5        6.76       58.11
         22 |          5        6.76       64.86
         23 |          3        4.05       68.92
         24 |          4        5.41       74.32
         25 |          5        6.76       81.08
         26 |          3        4.05       85.14
         28 |          3        4.05       89.19
         29 |          1        1.35       90.54
         30 |          2        2.70       93.24
         31 |          1        1.35       94.59
         34 |          1        1.35       95.95
         35 |          2        2.70       98.65
         41 |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

Let's convert mpg into three categories to help make this more readable. Here we convert mpg into three categories using generate and replace.

generate mpg3 = .
(74 missing values generated)
replace mpg3 = 1 if (mpg <= 18)
(27 real changes made)
replace mpg3 = 2 if (mpg >= 19) & (mpg <= 23)
(24 real changes made)
replace mpg3 = 3 if (mpg >= 24) & (mpg < .)
(23 real changes made)

Let's use tabulate to check that this worked correctly. Indeed, you can see that a value of 1 for mpg3 goes from 12-18, a value of 2 goes from 19-23, and a value of 3 goes from 24-41.
tabulate mpg mpg3

           |               mpg3
       mpg |         1          2          3 |     Total
-----------+---------------------------------+----------
        12 |         2          0          0 |         2
        14 |         6          0          0 |         6
        15 |         2          0          0 |         2
        16 |         4          0          0 |         4
        17 |         4          0          0 |         4
        18 |         9          0          0 |         9
        19 |         0          8          0 |         8
        20 |         0          3          0 |         3
        21 |         0          5          0 |         5
        22 |         0          5          0 |         5
        23 |         0          3          0 |         3
        24 |         0          0          4 |         4
        25 |         0          0          5 |         5
        26 |         0          0          3 |         3
        28 |         0          0          3 |         3
        29 |         0          0          1 |         1
        30 |         0          0          2 |         2
        31 |         0          0          1 |         1
        34 |         0          0          1 |         1
        35 |         0          0          2 |         2
        41 |         0          0          1 |         1
-----------+---------------------------------+----------
     Total |        27         24         23 |        74

Now, we could use mpg3 to show a crosstab of mpg3 by foreign to contrast the mileage of the foreign and domestic cars.

tabulate mpg3 foreign, column

           |        foreign
      mpg3 |         0          1 |     Total
-----------+----------------------+----------
         1 |        22          5 |        27
           |     42.31      22.73 |     36.49
-----------+----------------------+----------
         2 |        19          5 |        24
           |     36.54      22.73 |     32.43
-----------+----------------------+----------
         3 |        11         12 |        23
           |     21.15      54.55 |     31.08
-----------+----------------------+----------
     Total |        52         22 |        74
           |    100.00     100.00 |    100.00

The crosstab above shows that 21% of the domestic cars fall into the high mileage category, while 55% of the foreign cars fit into this category.

Recoding variables using recode

There is an easier way to recode mpg to three categories using generate and recode. First, we make a copy of mpg, calling it mpg3a. Then, we use recode to convert mpg3a into three categories: min-18 into 1, 19-23 into 2, and 24-max into 3.

generate mpg3a = mpg
recode mpg3a (min-18=1) (19-23=2) (24-max=3)
(74 changes made)

Let's double check to see that this worked correctly. We see that it worked perfectly.
tabulate mpg mpg3a

           |               mpg3a
       mpg |         1          2          3 |     Total
-----------+---------------------------------+----------
        12 |         2          0          0 |         2
        14 |         6          0          0 |         6
        15 |         2          0          0 |         2
        16 |         4          0          0 |         4
        17 |         4          0          0 |         4
        18 |         9          0          0 |         9
        19 |         0          8          0 |         8
        20 |         0          3          0 |         3
        21 |         0          5          0 |         5
        22 |         0          5          0 |         5
        23 |         0          3          0 |         3
        24 |         0          0          4 |         4
        25 |         0          0          5 |         5
        26 |         0          0          3 |         3
        28 |         0          0          3 |         3
        29 |         0          0          1 |         1
        30 |         0          0          2 |         2
        31 |         0          0          1 |         1
        34 |         0          0          1 |         1
        35 |         0          0          2 |         2
        41 |         0          0          1 |         1
-----------+---------------------------------+----------
     Total |        27         24         23 |        74
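The generate/replace approach and the recode approach should assign the same category to every car. As a cross-check outside Stata, the Python sketch below mirrors both rules, rebuilds the 74 mpg values from the tabulate mpg frequencies shown earlier, and confirms that the two rules agree and give the 27/24/23 split. The function names are hypothetical, and missing mpg values (which Stata's (mpg < .) condition guards against) are not modeled, since this dataset has none.

```python
from collections import Counter

# Frequencies of mpg in the auto data, copied from the tabulate mpg output above.
MPG_COUNTS = {12: 2, 14: 6, 15: 2, 16: 4, 17: 4, 18: 9, 19: 8, 20: 3, 21: 5,
              22: 5, 23: 3, 24: 4, 25: 5, 26: 3, 28: 3, 29: 1, 30: 2, 31: 1,
              34: 1, 35: 2, 41: 1}


def mpg3_replace(mpg):
    """Mirror generate mpg3 = . followed by the three replace ... if commands."""
    value = None                 # generate mpg3 = .   (None stands in for missing)
    if mpg <= 18:
        value = 1
    if 19 <= mpg <= 23:
        value = 2
    if mpg >= 24:                # Stata adds (mpg < .) to exclude missing mpg
        value = 3
    return value


def mpg3_recode(mpg):
    """Mirror recode mpg3a (min-18=1) (19-23=2) (24-max=3)."""
    rules = [(float("-inf"), 18, 1), (19, 23, 2), (24, float("inf"), 3)]
    for low, high, new in rules:
        if low <= mpg <= high:
            return new
    return mpg                   # values matching no rule are left unchanged


# One entry per car: 74 mpg values in all.
cars = [mpg for mpg, n in MPG_COUNTS.items() for _ in range(n)]
assert all(mpg3_replace(m) == mpg3_recode(m) for m in cars)

tally = Counter(mpg3_recode(m) for m in cars)
print(len(cars), sorted(tally.items()))  # 74 [(1, 27), (2, 24), (3, 23)]
```

The category counts match the Total row of both crosstabs above.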
Recodes with if

Let's create a variable called mpgfd that assesses the mileage of the cars with respect to their origin. Let this be a 0/1 variable called mpgfd which is:
- 0 if below the median mpg for its group (foreign/domestic)
- 1 if at/above the median mpg for its group (foreign/domestic).

sort foreign
by foreign: summarize mpg, detail

-> foreign = 0

                             mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12             12
 5%           14             12
10%           14             14       Obs                  52
25%         16.5             14       Sum of Wgt.          52

50%           19                      Mean           19.82692
                        Largest       Std. Dev.      4.743297
75%           22             28
90%           26             29       Variance       22.49887
95%           29             30       Skewness       .7712432
99%           34             34       Kurtosis       3.441459

-> foreign = 1

                             mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           14             14
 5%           17             17
10%           17             17       Obs                  22
25%           21             18       Sum of Wgt.          22

50%         24.5                      Mean           24.77273
                        Largest       Std. Dev.      6.611187
75%           28             31
90%           35             35       Variance       43.70779
95%           35             35       Skewness        .657329
99%           41             41       Kurtosis        3.10734

We see that the median is 19 for the domestic (foreign==0) cars and 24.5 for the foreign (foreign==1) cars. The generate and recode commands below recode mpg into mpgfd based on the domestic car median for the domestic cars, and based on the foreign car median for the foreign cars.

generate mpgfd = mpg
recode mpgfd (min-18=0) (19-max=1) if foreign==0
(52 changes made)
recode mpgfd (min-24=0) (25-max=1) if foreign==1
(22 changes made)

We can check this using the commands below, and the recoded variable mpgfd looks correct.

by foreign: tabulate mpg mpgfd

-> foreign = 0

           |        mpgfd
       mpg |         0          1 |     Total
-----------+----------------------+----------
        12 |         2          0 |         2
        14 |         5          0 |         5
        15 |         2          0 |         2
        16 |         4          0 |         4
        17 |         2          0 |         2
        18 |         7          0 |         7
        19 |         0          8 |         8
        20 |         0          3 |         3
        21 |         0          3 |         3
        22 |         0          5 |         5
        24 |         0          3 |         3
        25 |         0          1 |         1
        26 |         0          2 |         2
        28 |         0          2 |         2
        29 |         0          1 |         1
        30 |         0          1 |         1
        34 |         0          1 |         1
-----------+----------------------+----------
     Total |        22         30 |        52

-> foreign = 1

           |        mpgfd
       mpg |         0          1 |     Total
-----------+----------------------+----------
        14 |         1          0 |         1
        17 |         2          0 |         2
        18 |         2          0 |         2
        21 |         2          0 |         2
        23 |         3          0 |         3
        24 |         1          0 |         1
        25 |         0          4 |         4
        26 |         0          1 |         1
        28 |         0          1 |         1
        30 |         0          1 |         1
        31 |         0          1 |         1
        35 |         0          2 |         2
        41 |         0          1 |         1
-----------+----------------------+----------
     Total |        11         11 |        22

Summary

Create a new variable len_ft which is length divided by 12.
generate len_ft = length / 12

Change values of an existing variable named len_ft.
replace len_ft = length / 12

Recode mpg into mpg3, having three categories, using generate and replace if.
generate mpg3 = .
replace mpg3 = 1 if (mpg <= 18)
replace mpg3 = 2 if (mpg >= 19) & (mpg <= 23)
replace mpg3 = 3 if (mpg >= 24) & (mpg < .)

Recode mpg into mpg3a, having three categories, 1-3, using generate and recode.
generate mpg3a = mpg
recode mpg3a (min-18=1) (19-23=2) (24-max=3)

Recode mpg into mpgfd, having two categories, but using different cutoffs for foreign and domestic cars.
generate mpgfd = mpg
recode mpgfd (min-18=0) (19-max=1) if foreign==0
recode mpgfd (min-24=0) (25-max=1) if foreign==1

Stata Learning Module
Subsetting data

This module shows how you can subset data in Stata. You can subset data by keeping or dropping variables, and you can subset data by keeping or dropping observations. You can also subset data as you use a data file if you are trying to read a file that is too big to fit into the memory on your computer.

Keeping and dropping variables

Sometimes you do not want all of the variables in a data file. You can use the keep and drop commands to subset variables.
If we think of your data like a spreadsheet, this section will show how you can remove columns (variables) from your data. Let's illustrate this with the auto data file.

sysuse auto

We can use the describe command to see its variables.

describe

Contains data from C:\Program Files\Stata10\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2007 17:45
 size:         3,478 (99.7% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign

Suppose we want to have just make, mpg and price; we can keep just those variables, as shown below.

keep make mpg price

If we issue the describe command again, we see that indeed those are the only variables left.

describe

Contains data from C:\Program Files\Stata10\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:             3                          13 Apr 2007 17:45
 size:         1,924 (99.8% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

Remember, this has not changed the file on disk, but only the copy we have in memory. If we saved this file calling it auto, we would replace the existing file (with all the variables) with this file, which just has make, mpg and price. In effect, we would permanently lose all of the other variables in the data file. It is important to be careful when using the save command after you have eliminated variables, and it is recommended that you save such files to a file with a new name, e.g., save auto2.

Let's show how to use the drop command to drop variables. First, let's clear out the data in memory and use the auto data file.

sysuse auto, clear

Perhaps we are not interested in the variables displ and gear_ratio. We can get rid of them using the drop command shown below.

drop displ gear_ratio

Again, using describe shows that the variables have been eliminated.

describe

Contains data from C:\Program Files\Stata10\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            10                          13 Apr 2007 17:45
 size:         3,034 (99.7% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.)
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
     Note:  dataset has changed since last saved

If we wanted to make this change permanent, we could save the file as auto2.dta as shown below.
save auto2
file auto2.dta saved

Keeping and dropping observations

The above showed how to use keep and drop to eliminate variables from your data file. The keep if and drop if commands can be used to keep and drop observations. Thinking of your data like a spreadsheet, the keep if and drop if commands can be used to eliminate rows of your data. Let's illustrate this with the auto data. Let's use the auto file and clear out the data currently in memory.

sysuse auto, clear

The variable rep78 has values 1 to 5, and also has some missing values, as shown below.

tabulate rep78, missing

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.70        2.70
          2 |          8       10.81       13.51
          3 |         30       40.54       54.05
          4 |         18       24.32       78.38
          5 |         11       14.86       93.24
          . |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

We may want to eliminate the observations which have missing values using drop if, as shown below. The portion after drop if specifies which observations should be eliminated.

drop if missing(rep78)
(5 observations deleted)

Using the tabulate command again shows that these observations have been eliminated.

tabulate rep78, missing

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

We could make this change permanent by using the save command to save the file.

Let's illustrate using keep if to eliminate observations. First let's clear out the current file and use the auto data file.

sysuse auto, clear

The keep if command can also be used to eliminate observations, except that the part after keep if specifies which observations should be kept. Suppose we want to keep just the cars which had a repair rating of 3 or less. The easiest way to do this would be using the keep if command, as shown below.
keep if (rep78 <= 3)
(34 observations deleted)

The tabulate command shows that this was successful.

tabulate rep78, missing

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        5.00        5.00
          2 |          8       20.00       25.00
          3 |         30       75.00      100.00
------------+-----------------------------------
      Total |         40      100.00

Before we go on to the next section, let's clear out the data that is currently in memory.

clear

Selecting variables and observations with "use"

The above sections showed how to use keep, drop, keep if, and drop if for eliminating variables and observations. Sometimes, you may want to use a data file which is bigger than you can fit into memory, and you would wish to eliminate variables and/or observations as you use the file. This is illustrated below with the auto data file.

Selecting variables

You can specify just the variables you wish to bring in on the use command. For example, let's use the auto data file with just make, price and mpg.

use make price mpg using http://www.stata-press.com/data/r10/auto

The describe command shows us that this worked.

describe

Contains data from http://www.stata-press.com/data/r10/auto.dta
  obs:            74                          1978 Automobile Data
 vars:             3                          13 Apr 2007 17:45
 size:         1,924 (99.8% of memory free)  (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
-------------------------------------------------------------------------------
Sorted by:

Let's clear out the data before the next example.

clear

Suppose we want to just bring in the observations where rep78 is 3 or less. We can do this as shown below.

use http://www.stata-press.com/data/r10/auto if (rep78 <= 3)

We can use tabulate to double check that this worked.

tabulate rep78, missing

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        5.00        5.00
          2 |          8       20.00       25.00
          3 |         30       75.00      100.00
------------+-----------------------------------
      Total |         40      100.00

Let's clear out the data before the next example.

clear

Let's show another example. Let's read in just the cars that had a rating of 4 or higher.

use http://www.stata-press.com/data/r10/auto if (rep78 >= 4) & (rep78 < .)

The second condition, (rep78 < .), is needed because Stata treats a missing value as larger than any number, so (rep78 >= 4) on its own would also read in the five cars whose rep78 is missing.

Let's check this using the tabulate command.

tabulate rep78, missing

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          4 |         18       62.07       62.07
          5 |         11       37.93      100.00
------------+-----------------------------------
      Total |         29      100.00

Let's clear out the data before the next example.

clear

You can eliminate both variables and observations with the use command. Let's read in just make, mpg, price and rep78 for the cars with a repair record of 3 or lower.

use make mpg price rep78 if (rep78 <= 3) using http://www.stata-press.com/data/r10/auto

Let's check this using describe and tabulate.

describe

Contains data from http://www.stata-press.com/data/r10/auto.dta
  obs:            40                          1978 Automobile Data
 vars:             4                          13 Apr 2007 17:45
 size:         1,120 (99.9% of memory free)  (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
-------------------------------------------------------------------------------
Sorted by:
tabulate rep78

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        5.00        5.00
          2 |          8       20.00       25.00
          3 |         30       75.00      100.00
------------+-----------------------------------
      Total |         40      100.00

Let's clear out the data before the next example.

clear

Note that the ordering of if and using is arbitrary.

use make mpg price rep78 using http://www.stata-press.com/data/r10/auto if (rep78 <= 3)

Let's check this using describe and tabulate.

describe

Contains data from http://www.stata-press.com/data/r10/auto.dta
  obs:            40                          1978 Automobile Data
 vars:             4                          13 Apr 2007 17:45
 size:         1,120 (99.9% of memory free)  (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
-------------------------------------------------------------------------------
Sorted by:

tabulate rep78

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        5.00        5.00
          2 |          8       20.00       25.00
          3 |         30       75.00      100.00
------------+-----------------------------------
      Total |         40      100.00

Have a look at this command. Do you think it will work?

use make mpg if (rep78 <= 3) using http://www.stata-press.com/data/r10/auto
rep78 not found
r(111);

As you can see, rep78 was not one of the variables read in, so it could not be used in the if portion. To use a variable in the if portion, it has to be one of the variables that is read in.
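The usual fix is to read rep78 in as well and drop it once the if condition has done its work. Below is a minimal sketch of that, assuming a local copy of the auto data saved to a temporary file stands in for the stata-press.com URL so that it runs offline; the tempfile name autocopy is hypothetical.

```stata
* Make a local copy of the auto data (stand-in for the web file).
sysuse auto, clear
tempfile autocopy
save `autocopy'

* Read rep78 as well, so that the if condition can be evaluated ...
use make mpg rep78 if (rep78 <= 3) using `autocopy', clear

* ... then drop it, leaving only make and mpg for the selected cars.
drop rep78
describe, short
```

The price of this approach is that rep78 is briefly held in memory, which matters only when memory is very tight.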
Summary

Using keep/drop to eliminate variables
keep make price mpg
drop displ gear_ratio

Using keep if/drop if to eliminate observations
drop if missing(rep78)
keep if (rep78 <= 3)

Eliminating variables and/or observations with use
use make mpg price rep78 using auto
use auto if (rep78 <= 3)
use make mpg price rep78 using auto if (rep78 <= 3)

Stata Learning Modules
Collapsing data across observations

Sometimes you have data files that need to be collapsed to be useful to you. For example, you might have student data but you really want classroom data, or you might have weekly data but you want monthly data, etc. We will illustrate this using an example showing how you can collapse data across kids to make family level data.

Here is a file containing information about the kids in three families. There is one record per kid. birth is the order of birth (i.e., 1 is first), and age, wt and sex are the child's age, weight and sex. We will use this file for showing how to collapse data across observations.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
list

     famid   kidname   birth   age   wt   sex
  1.     1      Beth       1     9   60     f
  2.     1       Bob       2     6   40     m
  3.     1      Barb       3     3   20     f
  4.     2      Andy       1     8   80     m
  5.     2        Al       2     6   50     m
  6.     2       Ann       3     2   20     f
  7.     3      Pete       1     6   60     m
  8.     3       Pam       2     4   40     f
  9.     3      Phil       3     2   20     m

Consider the collapse command below. It collapses across all of the observations to make a single record with the average age of the kids.

collapse age
list

            age
  1.   5.111111

The above collapse command was not very useful, but you can combine it with the by(famid) option, and then it creates one record for each family that contains the average age of the kids in the family.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse age, by(famid)
list

     famid        age
  1.     1          6
  2.     2   5.333333
  3.     3          4

The following collapse command does the exact same thing as above, except that the average of age is named avgage and we have explicitly told the collapse command that we want it to compute the mean.
use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age, by(famid)
list

     famid     avgage
  1.     1          6
  2.     2   5.333333
  3.     3          4

We can request averages for more than one variable. Here we get the average for age and for wt all in the same command.

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age avgwt=wt, by(famid)
list

     famid     avgage   avgwt
  1.     1          6      40
  2.     2   5.333333      50
  3.     3          4      40

This command gets the average of age and wt like the command above, and also computes numkids, which is the count of the number of kids in each family (obtained by counting the number of observations with valid values of birth).

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear
collapse (mean) avgage=age avgwt=wt (count) numkids=birth, by(famid)
list

     famid     avgage   avgwt   numkids
  1.     1          6      40         3
  2.     2   5.333333      50         3
  3.     3          4      40         3

Suppose you wanted a count of the number of boys and girls in the family. We can do that with one extra step. We will create a dummy variable that is 1 if the kid is a boy (0 if not), and a dummy variable that is 1 if the kid is a girl (and 0 if not). The sum of the boy dummy variable is the number of boys and the sum of the girl dummy variable is the number of girls.

First, let's use the kids file (and clear out the existing data).

use http://www.ats.ucla.edu/stat/stata/modules/kids, clear

We use tabulate with the generate option to make the dummy variables.

tabulate sex, generate(sexdum)

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          f |          4       44.44       44.44
          m |          5       55.56      100.00
------------+-----------------------------------
      Total |          9      100.00

We can look at the dummy variables. sexdum1 is the dummy variable for girls and sexdum2 is the dummy variable for boys. The sum of sexdum1 is the number of girls in the family, and the sum of sexdum2 is the number of boys in the family.

list famid sex sexdum1 sexdum2

     famid   sex   sexdum1   sexdum2
  1.     1     f         1         0
  2.     1     m         0         1
  3.     1     f         1         0
  4.     2     m         0         1
  5.     2     m         0         1
  6.     2     f         1         0
  7.     3     m         0         1
  8.     3     f         1         0
  9.     3     m         0         1

The command below creates girls, which is the number of girls in the family, and boys, which is the number of boys in the family.

collapse (count) numkids=birth (sum) girls=sexdum1 boys=sexdum2, by(famid)

We can list out the data to confirm that it worked correctly.

list famid boys girls numkids

     famid   boys   girls   numkids
  1.     1      1       2         3
  2.     2      2       1         3
  3.     3      2       1         3

Summary

To create one record per family (famid) with the average of age within each family:
collapse age, by(famid)

To create one record per family (famid) with the average of age (called avgage) and the average weight (called avgwt) within each family:
collapse (mean) avgage=age avgwt=wt, by(famid)

Same as the above example, but also counts the number of kids within each family, calling that numkids:
collapse (mean) avgage=age avgwt=wt (count) numkids=birth, by(famid)

Counts the number of boys and girls in each family by using tabulate to create dummy variables based on sex and then summing the dummy variables within each family:
tabulate sex, generate(sexdum)
collapse (sum) girls=sexdum1 boys=sexdum2, by(famid)

Stata Learning Module
Working across variables using foreach

1. Introduction

This module illustrates (1) how to create and recode variables manually and (2) how to use foreach to ease the process of creating and recoding variables.

Consider the sample program below, which reads in income data for twelve months.

input famid inc1-inc12
1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818
2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
end
list

The output is shown below.

list famid inc1-inc12, clean

famid   inc1   inc2   inc3   inc4   inc5   inc6   inc7   inc8   inc9   inc10   inc11   inc12
    1   3281   3413   3114   2500   2700   3500   3114   3319   3514    1282    2434    2818
    2   4042   3084   3108   3150   3800   3100   1531   2914   3819    4124    4274    4471
    3   6015   6123   6113   6100   6100   6200   6186   6132   3123    4231    6039    6215

2. Computing variables (manually)

Say that we wanted to compute the amount of tax (10%) paid for each month. The simplest way to do this is to compute 12 variables (taxinc1-taxinc12) by multiplying each of the income variables (inc1-inc12) by .10, as illustrated below. As you see, this requires entering a command computing the tax for each month of data (for months 1 to 12) via the generate command.

generate taxinc1  = inc1  * .10
generate taxinc2  = inc2  * .10
generate taxinc3  = inc3  * .10
generate taxinc4  = inc4  * .10
generate taxinc5  = inc5  * .10
generate taxinc6  = inc6  * .10
generate taxinc7  = inc7  * .10
generate taxinc8  = inc8  * .10
generate taxinc9  = inc9  * .10
generate taxinc10 = inc10 * .10
generate taxinc11 = inc11 * .10
generate taxinc12 = inc12 * .10
3. Computing variables (using the foreach command)

Another way to compute 12 variables representing the amount of tax paid (10%) for each month is to use the foreach command. In the example below we use the foreach command to cycle through the variables inc1 to inc12 and compute the tax as taxinc1-taxinc12.

foreach var of varlist inc1-inc12 {
  generate tax`var' = `var' * .10
}

The initial foreach statement tells Stata that we want to cycle through the variables inc1 to inc12 using the statements that are surrounded by the curly braces. The first time we cycle through the statements, the value of var will be inc1, the second time the value of var will be inc2, and so on until the final iteration, where the value of var will be inc12. Each statement within the loop (in this case, just the one generate statement) is evaluated and executed. When we are inside the foreach loop, we can access the value of var by surrounding it with the funny quotation marks like this: `var'. The ` is the quote right below the ~ on your keyboard and the ' is the quote below the " on your keyboard. The first time through the loop, `var' is replaced with inc1, so the statement

generate tax`var' = `var' * .10

becomes

generate taxinc1 = inc1 * .10

This is repeated for inc2, then inc3, and so on until inc12. So, this foreach loop is the equivalent of executing the 12 generate statements manually, but much easier and less error prone.

4. Collapsing across variables (manually)

Often one needs to sum across variables (also known as collapsing across variables). For example, let's say the quarterly income for each observation is desired. In order to get this information, four quarterly variables incqtr1-incqtr4 need to be computed. Again, this can be achieved manually or by using the foreach command.
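A third route, sketched here as an alternative rather than the module's own method, is egen's rowtotal() function, which sums a list of variables within each observation. One difference matters: rowtotal() treats missing values as 0, whereas + returns missing if any term is missing. The block re-enters the income data from section 1 so it runs on its own; the locals first and last are hypothetical names.

```stata
clear
input famid inc1-inc12
1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818
2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
end

forvalues q = 1/4 {
    local first = 3*`q' - 2          // first month of quarter q
    local last  = 3*`q'              // last month of quarter q
    egen incqtr`q' = rowtotal(inc`first'-inc`last')
}
list famid incqtr1-incqtr4, clean noobs
```

Because inc`first'-inc`last' is a variable range, this relies on the inc variables being stored in month order, as they are after the input command.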
Below is an example of how to compute 4 quarterly income variables incqtr1-incqtr4 by simply adding together the months that comprise a quarter.

generate incqtr1 = inc1 + inc2 + inc3
generate incqtr2 = inc4 + inc5 + inc6
generate incqtr3 = inc7 + inc8 + inc9
generate incqtr4 = inc10 + inc11 + inc12
list incqtr1-incqtr4

The output is shown below.

     +---------------------------------------+
     | incqtr1   incqtr2   incqtr3   incqtr4 |
     |---------------------------------------|
  1. |    9808      8700      9947      6534 |
  2. |   10234     10050      8264     12869 |
  3. |   18251     18400     15441     16485 |
     +---------------------------------------+

5. Collapsing across variables (using the foreach command)

The same result as above can be achieved using the foreach command. The example below illustrates how to compute the quarterly income variables incqtr1-incqtr4 using the foreach command.

foreach qtr of numlist 1/4 {
  local m3 = `qtr'*3
  local m2 = (`qtr'*3)-1
  local m1 = (`qtr'*3)-2
  generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3'
}
list incqtr1-incqtr4

The output is shown below.

     +---------------------------------------+
     | incqtr1   incqtr2   incqtr3   incqtr4 |
     |---------------------------------------|
  1. |    9808      8700      9947      6534 |
  2. |   10234     10050      8264     12869 |
  3. |   18251     18400     15441     16485 |
     +---------------------------------------+

In this example, instead of cycling across variables, the foreach command is cycling across the numbers 1, 2, 3 and then 4, which we refer to as qtr and which represent the 4 quarterly variables that we wish to create. The trick is to find the relationship between a quarter and the month numbers that compose it, and to write a formula that relates the quarters to the months. For example, quarter 1 of data corresponds to months 3, 2 and 1, so we can say that when the quarter (qtr) is 1 we want the months represented by qtr*3, (qtr*3)-1 and (qtr*3)-2, yielding 3, 2, and 1. This is what the statements below from the foreach loop are doing: they relate the quarter to the months.
local m3 = `qtr'*3
local m2 = (`qtr'*3)-1
local m1 = (`qtr'*3)-2

So, when qtr is 1, the value for m3 is 1*3, the value for m2 is (1*3)-1 and the value for m1 is (1*3)-2. Then, imagine all of those values being substituted into the following statement from the foreach loop.

generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3'

This then becomes

generate incqtr1 = inc1 + inc2 + inc3

and for the next quarter (when qtr becomes 2) the statement would become

generate incqtr2 = inc4 + inc5 + inc6

In this example, with only 4 quarters of data, it would probably be easier to simply write out the 4 generate statements manually; however, if you had 40 quarters of data, the foreach loop can save you considerable time, effort and mistakes.

6. Identifying patterns across variables (using the foreach command)

The foreach command can also be used to identify patterns across variables of a dataset. Let's say, for example, that one needs to know which months had income that was less than the income of the previous month. To obtain this information, dummy indicators can be created to indicate in which months this occurred. Note that only 11 dummy indicators are needed for a 12 month period because the interest is in the change from one month to the next. When a month has income that is less than the income of the previous month, the dummy indicators lowinc2-lowinc12 get assigned a "1". When this is not the case, they are assigned a "0". This program is illustrated below (note that for simplicity we assume no missing data on income).
foreach curmon of numlist 2/12 {
  local lastmon = `curmon' - 1
  generate lowinc`curmon' = 1 if ( inc`curmon' < inc`lastmon' )
  replace lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' )
}

We can list out the original values of inc and lowinc and verify that this worked properly.

list famid inc1-inc12, clean noobs

famid   inc1   inc2   inc3   inc4   inc5   inc6   inc7   inc8   inc9   inc10   inc11   inc12
    1   3281   3413   3114   2500   2700   3500   3114   3319   3514    1282    2434    2818
    2   4042   3084   3108   3150   3800   3100   1531   2914   3819    4124    4274    4471
    3   6015   6123   6113   6100   6100   6200   6186   6132   3123    4231    6039    6215

list famid lowinc2-lowinc12, clean noobs

famid   lowinc2   lowinc3   lowinc4   lowinc5   lowinc6   lowinc7   lowinc8   lowinc9   lowinc10   lowinc11   lowinc12
    1         0         1         1         0         0         1         0         0          1          0          0
    2         1         0         0         0         1         1         0         0          0          0          0
    3         0         1         1         0         0         1         1         1          0          0          0

This time we used the foreach loop to compare the current month, represented by curmon, and the prior month, computed as `curmon'-1, creating lastmon. So, for the first pass through the foreach loop the value for curmon is 2 and the value for lastmon is 1, so the generate and replace statements become

generate lowinc2 = 1 if ( inc2 < inc1 )
replace lowinc2 = 0 if ( inc2 >= inc1 )

The process is repeated until curmon is 12, and then the generate and replace statements become

generate lowinc12 = 1 if ( inc12 < inc11 )
replace lowinc12 = 0 if ( inc12 >= inc11 )

If you were using foreach to span a large range of values (say 1/1000), it is more efficient to use forvalues, since it is designed to quickly increment through a sequential list, for example

forvalues curmon = 2/12 {
  local lastmon = `curmon' - 1
  generate lowinc`curmon' = 1 if ( inc`curmon' < inc`lastmon' )
  replace lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' )
}

Stata Learning Module
Introduction to graphs in Stata

This module will introduce some basic graphs in Stata 8, including histograms, boxplots, scatterplots, and scatterplot matrices.

Let's use the auto data file for making some graphs.
sysuse auto.dta

The histogram command can be used to make a simple histogram of mpg.

histogram mpg

The graph box command can be used to produce a boxplot, which can help you examine the distribution of mpg. If mpg were normal, the line (the median) would be in the middle of the box (the 25th and 75th percentiles), and the ends of the whiskers would be equidistant from the box. In Stata's boxplots the whiskers end at the adjacent values, the most extreme observations within 1.5 times the interquartile range of the box; points beyond them are plotted individually. The boxplot for mpg shows positive skew: the median is pulled to the low end of the box, and the upper whisker is stretched out away from the box.

graph box mpg

The boxplot can be done separately for foreign and domestic cars using the by() option.

graph box mpg, by(foreign)

A two way scatter plot can be used to show the relationship between mpg and weight. As we would expect, there is a negative relationship between mpg and weight.

graph twoway scatter mpg weight

Note that you can save typing like this:

twoway scatter mpg weight

We can show the regression line predicting mpg from weight like this:

twoway lfit mpg weight

We can combine these graphs as shown below.

twoway (scatter mpg weight) (lfit mpg weight)

We can add labels to the points, labeling them by make, as shown below. Note that mlabel is an option on the scatter command.

twoway (scatter mpg weight, mlabel(make) ) (lfit mpg weight)

We can get separate graphs for foreign and domestic cars as shown below, and we have requested confidence bands around the predicted values by using lfitci in place of lfit. Note that the by option is at the end of the command.

twoway (scatter mpg weight) (lfitci mpg weight), by(foreign)

You can request a scatter plot matrix with the graph matrix command. Here we examine the relationships among mpg, weight and price.

graph matrix mpg weight price

Stata Learning Module
Graphics: Overview of Twoway Plots

This module shows examples of the different kinds of graphs that can be created with the graph twoway command.
This is illustrated by showing the command and the resulting graph. For more information, see the Stata Graphics Manual, available over the web and from within Stata by typing help graph, and in particular the section on Two Way Scatterplots.

Basic twoway scatterplot

sysuse sp500
graph twoway scatter close date
Line Plot

graph twoway line close date
Connected Line Plot

graph twoway connected close date

Immediate scatterplot

graph twoway scatteri ///
    965.8 15239 (3) "Low 965.8" ///
    1373.73 15005 (3) "High 1373.73", msymbol(i)

Scatterplot and Immediate Scatterplot

graph twoway ///
    (scatter close date) ///
    (scatteri 965.8 15239 (3) "Low, 9/21, 965.8" ///
              1373.7 15005 (3) "High, 1/30, 1373.7", msymbol(i) )

Area Graph

drop if _n > 57
graph twoway area close date, sort
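drop if _n > 57 removes the later observations from memory, so every graph that follows uses only those 57 rows. If you would rather keep the full series, one alternative, sketched here, is to restrict only the graph with an if qualifier.

```stata
sysuse sp500, clear
* Plot only the first 57 observations; nothing is removed from memory.
graph twoway area close date if _n <= 57, sort
* All observations remain available for later graphs.
count
```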
Bar plot

graph twoway bar close date
Spike plot

graph twoway spike close date
Dropline plot

graph twoway dropline close date
Dot plot

graph twoway dot change date
Range plot with area shading

graph twoway rarea high low date
Function plot

graph twoway function y=normden(x), range(-4 4)

Stata Learning Module
Graphics: Twoway Scatterplots

This module shows some of the options when using the twoway command to produce scatterplots. This is illustrated by showing the command and the resulting graph. This includes links to the Stata Graphics Manual, available over the web and from within Stata by typing help graph.

Two Way Scatterplots

Basic twoway scatterplot

twoway (scatter read write)

Schemes
Scatterplot with jitter

twoway (scatter write read), jitter(3)

Without jitter

twoway (scatter write read)

Marker Label Options
Using small black square symbols

twoway (scatter write read), msymbol(square) msize(small) mcolor(black)
With markers red on the inside, black medium thick outline

twoway (scatter write read), mfcolor(red) mlcolor(black) mlwidth(medthick)

Identifying Observations with Marker Labels

twoway (scatter read write, mlabel(id))
Using large red marker labels at 12 o'clock

twoway (scatter read write if id <=100, mlabel(id) mlabposition(12) mlabsize(large) mlabcolor(red))
Marker labels at a 90 degree angle at 12 o'clock with a gap of 5

twoway (scatter read write if id <=100, mlabel(ses) mlabangle(90) mlabposition(12) mlabgap(5))

If the mlabgap option is omitted

twoway (scatter read write if id <=100, ///
    mlabel(ses) mlabangle(90) mlabposition(12))
Modifying marker label position separately for individual observations [1]

generate pos = 3
replace pos = 1 if (id == 5)
replace pos = 5 if (id == 6)
replace pos = 9 if (id == 3)
twoway (scatter read write if id <= 100, mlabel(ses) mlabv(pos))
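The three replace statements can also be folded into one generate using Stata's cond() function, which returns its second argument when the condition is true and its third otherwise. Below is a sketch on hypothetical stand-in data with ids 1 to 10, using the same clock positions as the code above.

```stata
clear
set obs 10
generate id = _n                       // hypothetical ids 1-10
* Default position 3 o'clock; ids 5, 6 and 3 get 1, 5 and 9 o'clock.
generate pos = cond(id == 5, 1, cond(id == 6, 5, cond(id == 3, 9, 3)))
list id pos, clean noobs
```

The nested form is compact, but the separate replace statements are easier to read when many observations need their own position.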
If option mlabv is not used

twoway (scatter read write if id <= 100, mlabel(ses))

Connect Options
Connecting with straight line

egen mread = mean(read), by(write)
twoway (scatter mread write, connect(l) sort)

If the sort option is omitted

twoway (scatter mread write, connect(l))
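egen's mean() with by() writes each group's mean back onto every observation in that group, so mread holds one repeated value per distinct writing score; connect(l) then joins points in the data's current order unless sort is given. The sketch below checks that behaviour with the auto data as a stand-in for the hsb2 file (mmpg is a hypothetical name).

```stata
sysuse auto, clear
egen mmpg = mean(mpg), by(foreign)   // group mean copied to every row
summarize mpg if foreign == 0
display "domestic mean: " r(mean)
```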
Medium thick black dotted connecting line

twoway (scatter mread write, connect(l) clwidth(medthick) clcolor(black) clpattern(dot) sort)
Show gaps in line when there are missing values

egen sdread = sd(read), by(write)
twoway (scatter sdread write, connect(l) sort cmissing(n))

Omitting cmissing option

twoway (scatter sdread write, connect(l) sort)

Footnotes

[1] Notice that the variable pos is used to control the position of the marker label. As shown in the code (repeated below), pos is assigned a value of 3, representing 3 o'clock; when id is 5 the position of the marker label is 1 o'clock, when id is 6 the position is 5 o'clock, and when id is 3 the position is 9 o'clock, allowing us to avoid labels that run off the edge of the graph or overwrite each other.

generate pos = 3
replace pos = 1 if (id == 5)
replace pos = 5 if (id == 6)
replace pos = 9 if (id == 3)

Stata Learning Module
Graphics: Combining Twoway Scatterplots

This module shows examples of combining twoway scatterplots. This is illustrated by showing the command and the resulting graph. This includes links to the Stata Graphics Manual, available over the web and from within Stata by typing help graph.

The data set used in these examples can be obtained using the following command:

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

This illustrates combining graphs in the following situations:
Plots for separate groups (using by)
Combining separate plots together into a single plot
Combining separate graphs together into a single graph

Plots for separate groups
Separate graphs by gender (male and female)

twoway (scatter read write), by(female)
Separate graphs by ses and gender

twoway (scatter read write), by(female ses)
Swapping position of ses and gender

twoway (scatter read write), by(ses female, cols(2))

Combining scatterplots and linear fit in one graph
Scatterplot with linear fit

twoway (scatter read write) ///
       (lfit read write), ///
       ytitle(Reading Score)
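lfit draws the straight line from a linear regression of the first variable on the second. The sketch below rebuilds the same line by hand with regress and predict, using the auto data as a stand-in for hsb2 (mpghat is a hypothetical name).

```stata
sysuse auto, clear
regress mpg weight
predict mpghat, xb            // fitted values from the regression
twoway (scatter mpg weight) (line mpghat weight, sort)
```

Building the line yourself is mainly useful when you also want the coefficients or the fitted values for other purposes; for the graph alone, lfit is shorter.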
Graphs separated by SES and female with linear fit lines and points identified by id

twoway (scatter read write, mlabel(id)) ///
       (lfit read write, range(30 70)), ///
       ytitle(Reading Score) by(ses female)

Graph for high ses females with linear fit with and without obs 51

twoway (scatter read write, mlabel(id)) ///
       (lfit read write, range(30 70)) ///
       (lfit read write if id != 51, range(30 70)) if female==1 & ses==3, ///
       ytitle(Reading Score) legend(lab(3 "Fitted values without Obs 51"))

Combining scatterplots with multiple variables and linear fits
Reading and math score by writing score

twoway (scatter read write) ///
       (scatter math write)
Reading and math score by writing score with fit lines

twoway (scatter read write) ///
       (scatter math write) ///
       (lfit read write) ///
       (lfit math write)
Final version of graph, making the line style the same as the dot style, and the ranges the same

twoway (scatter read write) ///
       (scatter math write) ///
       (lfit read write, pstyle(p1) range(25 80) ) ///
       (lfit math write, pstyle(p2) range(25 80) ), ///
       legend(label(3 "Linear Fit") label(4 "Linear Fit")) ///
       legend(order(1 3 2 4))

Combining scatterplots and linear fit for separate groups
Overlay graph of males and females in one graph

separate write, by(female)
twoway (scatter write0 read) (scatter write1 read), ///
       ytitle(Writing Score) legend(order(1 "Males" 2 "Females"))
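separate splits one variable into a new variable per group, named after the group's values, with each copy missing outside its own group; that is why write0 and write1 can be overlaid as two separately styled scatters. A sketch with the auto data as a stand-in, where foreign (0/1) plays the role of female and mpg the role of write:

```stata
sysuse auto, clear
separate mpg, by(foreign)
* mpg0 holds mpg for domestic cars, mpg1 for foreign cars;
* each is missing for the other group.
describe mpg0 mpg1
twoway (scatter mpg0 weight) (scatter mpg1 weight), ///
       legend(order(1 "Domestic" 2 "Foreign"))
```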
Overlay graph of males and females in one graph with linear fit lines

twoway (scatter write0 read) (scatter write1 read) ///
       (lfit write0 read) (lfit write1 read), ///
       ytitle(Writing Score) ///
       legend(order(1 "Males" 2 "Females" 3 "Lfit Males" 4 "Lfit Females"))

Combining separate graphs into one graph
Making the Graphs

First, we make 3 graphs (not shown).

twoway (scatter read write) (lfit read write), name(scatter)
regress read write
rvfplot, name(rvf)
lvr2plot, name(lvr)

Now we can use graph combine to combine these into one graph, shown below.

graph combine scatter rvf lvr
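Rerunning the commands above fails once graphs named scatter, rvf and lvr already exist in memory. A variant, sketched with the auto data as a stand-in for hsb2 and with hypothetical graph names, adds the replace suboption to name() and uses nodraw so the intermediate graphs are built in memory without being displayed.

```stata
sysuse auto, clear
twoway (scatter mpg weight) (lfit mpg weight), name(g_scatter, replace) nodraw
regress mpg weight
rvfplot, name(g_rvf, replace) nodraw
lvr2plot, name(g_lvr, replace) nodraw
* Combine the three in-memory graphs into one.
graph combine g_scatter g_rvf g_lvr, name(g_all, replace)
```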
Combining the graphs differently

We can move the place where the empty graph is located, as shown below.

graph combine scatter rvf lvr, hole(2)