
Stata: Getting Started and Being Productive with VA Data

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
--Abraham Lincoln

Todd Wagner
June 2007

Outline
Getting data into Stata
Editing in Stata
How does Stata handle data
Stata notation and help
Using Stata and Basic Stata commands

Transferring Data
Stat/Transfer or DBMS/Copy will work
Stat/Transfer often seeks to optimize the Stata dataset by default
If transferring data with SCRSSN, FORCE Stat/Transfer to transfer SCRSSN as double precision (a float holds only about 7 significant digits, so a long identifier can be rounded)

Stat/Transfer
CLICK ON DOUBLE

Editing in Stata
Any ASCII text editor will work
Stata has a built-in text editor, but it is limited.
I recommend using another text editor

http://fmwww.bc.edu/repec/bocode/t/textEditors.html

Handling Data
SAS processes one record at a time
Stata processes all the records at the same time
Loops are commonly used in SAS
Loops are very rarely used in Stata

Loading Data into Memory

Stata reads the data into memory
set mem 100m (before you load the data)
You must have enough memory for your dataset
With large datasets:
drop unnecessary variables
Use the compress command (but don't compress SCRSSN)
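The large-dataset advice above might look roughly like this in a do-file. This is a sketch only: Stata's shipped auto dataset stands in for a VA extract, the dropped variables are arbitrary, and make merely plays the role of SCRSSN.

```stata
* Sketch only: auto.dta stands in for a large VA extract.
clear

* Needed in older releases such as Stata 9; newer releases
* manage memory automatically and ignore this setting.
set mem 100m

sysuse auto, clear

* Drop variables the analysis does not need (arbitrary choice here)
drop trunk turn

* Compress every variable except the identifier to protect
* (make is a stand-in for SCRSSN)
ds make, not
compress `r(varlist)'

describe
```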

Stata Abbreviations

Many Stata commands can be abbreviated, often to their first three letters
regress income education female
could be written
reg income education female

Variable names can also be abbreviated if the abbreviation is unique
reg inc educ fem
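To try the abbreviation rules without VA data, the same idea can be sketched on Stata's shipped auto dataset. The variables are that dataset's, not the ones on the slide, and the example assumes variable abbreviation is left at its default (on).

```stata
* Sketch only: uses the shipped auto dataset.
sysuse auto, clear

* Full command name and full variable names
regress price mpg weight

* Same model: abbreviated command, abbreviated variable names
* (pri and wei each match exactly one variable in auto.dta)
reg pri mpg wei
```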

Stata Help

Stata's built-in help is great
help <command>
Stata manuals are great because they review theory

Stata and the Web


Stata is web aware
Check for updates periodically
update all
You can search for user-written programs
findit output
findit outreg (click to install)

Stata in Windows
Page Up scrolls through the previous commands
There is a graphical user interface (menus) if you forget a command
We have Stata on rocky and tasha: no graphical capabilities, no menus, and loss of some shortcuts

Using Stata
Create batch files called .do files
I work interactively
Run Stata and create a do-file as I go
I can then use the do-file as needed
Debugging code and exploratory data analysis are very fast in Stata

Sysdir, ls and cd

Stata recognizes some Unix commands, such as ls and cd
sysdir provides a listing of Stata's working directories

sysdir
STATA: C:\Program Files\Stata9\
UPDATES: C:\Program Files\Stata9\ado\updates\
BASE: C:\Program Files\Stata9\ado\base\
SITE: C:\Program Files\Stata9\ado\site\
PLUS: c:\ado\stbplus\
PERSONAL: c:\ado\personal\
OLDPLACE: c:\ado\

Delimiters
SAS recognizes ; as a delimiter
Stata recognizes the carriage return
Always add a carriage return after your last command
You can change delimiters to ;
#delimit ;
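A short sketch of switching delimiters and switching back. It assumes the lines are run from a do-file, since #delimit is not available when typing commands interactively, and it uses the shipped auto dataset for illustration.

```stata
* Sketch only: run from a do-file, not the command line.
sysuse auto, clear

#delimit ;

summarize price mpg
    weight length ;

#delimit cr

* Back to one command per line
summarize price
```

With ; as the delimiter, a long command can run over several lines, which is the main reason to switch.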

Missing Data
Stata and SAS both use . as missing
Stata implicitly values a missing as a very large number
SAS implicitly values a missing as a very small number
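Because Stata treats missing as larger than any number, a "greater than" condition silently picks up missing values. A minimal sketch on the shipped auto dataset, where rep78 is missing for some cars:

```stata
* Sketch only: uses the shipped auto dataset.
sysuse auto, clear

* Counts cars with rep78 of 5 AND cars where rep78 is missing,
* because missing sorts above every number
count if rep78 > 4

* Two equivalent ways to exclude the missing values
count if rep78 > 4 & !missing(rep78)
count if rep78 > 4 & rep78 < .
```

In SAS the trap runs the other way: a "less than" condition picks up the missing values.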

Generating and Recoding Variables

In SAS you type
quality=0;
if VA=1 then quality=1;

In Stata you type
gen quality=0
recode quality 0=1 if VA==1
or
replace quality=1 if VA==1

Boolean Logic

Stata is picky about Boolean logic
gen y=x if a==b          (must use two ==)
gen y=x if a>b & b>10    (must use &)
gen y=x if a<=b          (< or > must be before =)

Creating Dummy Variables

Goal: create dummy variable for each DRG
gen drgnum1=drg==1
or
tab drg, gen(drgnum)
This second command automatically creates a dummy variable for every value of drg (drgnum1, drgnum2, ...)
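Both approaches can be tried on the shipped auto dataset, with rep78 standing in for drg. The new variable names (rep_is3, repcat) are made up for this sketch. One difference worth knowing: a bare gen newvar = x==1 codes missing x as 0, whereas tab, gen() leaves the dummies missing, so the hand-built version below adds an if qualifier to match.

```stata
* Sketch only: rep78 in auto.dta stands in for drg.
sysuse auto, clear

* One dummy by hand; the if qualifier keeps missing rep78 as missing
gen rep_is3 = rep78 == 3 if !missing(rep78)

* All dummies at once: creates repcat1 ... repcat5
tab rep78, gen(repcat)

* The hand-built dummy matches the generated one
assert rep_is3 == repcat3
```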

Drop

drop <varnames> (drops variables)
drop if X==1 (drops cases where X is 1)

egen Commands

You want to generate total costs for a medical center
In SAS this is done by proc summary
In Stata, you can type
collapse (sum) cost, by(sta3n)
or
sort sta3n
by sta3n: egen sumcost=total(cost)
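The two approaches differ in what they leave behind: egen keeps every record and attaches the group total to each one, while collapse replaces the dataset with one record per group. A sketch on the shipped auto dataset, with price standing in for cost and foreign for sta3n:

```stata
* Sketch only: price stands in for cost, foreign for sta3n.
sysuse auto, clear

* egen: all 74 records kept, each carrying its group's total
bysort foreign: egen sumprice = total(price)

* collapse: data replaced by one record per group
collapse (sum) price (mean) sumprice, by(foreign)

* The two totals agree within each group
list
```

Since collapse destroys the record-level data in memory, save or preserve first if the detail is still needed.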

ICD-9 Codes
Stata has capabilities to handle ICD-9 diagnosis and procedure codes
You can
check to see if codes are valid
generate identifiers based on codes or ranges of codes
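A rough sketch of both tasks using Stata's icd9 command on a few toy diagnosis codes. The variable names dx and diabetes are hypothetical, and the exact output depends on the ICD-9 code list shipped with your Stata release.

```stata
* Sketch only: toy data, hypothetical variable names.
clear
input str6 dx
"250.00"
"428.0"
"410.91"
end

* Report any codes that are not valid ICD-9 diagnosis codes
icd9 check dx

* Flag records whose code falls in the 250 (diabetes) family
icd9 generate diabetes = dx, range(250*)

list
```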

Dates

Same date functions as SAS
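As an illustration, Stata (like SAS) stores a date as the number of days since 1 January 1960, and the mdy() function exists in both. The variable names below are borrowed from the merge slide, and the dates are made up.

```stata
* Sketch only: made-up dates.
clear
set obs 1

* Build dates from month, day, year
gen admitday = mdy(6, 1, 2007)
gen disday   = mdy(6, 15, 2007)

* Display the underlying day counts as dates
format admitday disday %td

* Dates are just numbers, so length of stay is a subtraction
gen los = disday - admitday

list
```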

Combining Data

Merge
merge scrssn admitday disday using data_y
This automatically creates a variable called _merge
_merge==1 obs. from master data
_merge==2 obs. from only one using dataset
_merge==3 obs. from at least two datasets, master or using

Append (stacking data)
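A toy merge shows how _merge gets filled in. The sketch assumes Stata 11 or later, where merge takes a 1:1 (or m:1, 1:m) prefix; the syntax on the slide is the older form. The datasets and variable names are made up.

```stata
* Sketch only: toy data; merge 1:1 requires Stata 11 or later.
clear
input id cost
1 100
2 200
end
tempfile costs
save `costs'

clear
input id age
2 60
3 70
end

* Master has ids 2 and 3; using has ids 1 and 2
merge 1:1 id using `costs'

* id 3 is master only (1), id 1 is using only (2), id 2 matched (3)
tab _merge
list
```

Tabulating _merge after every merge is a cheap way to catch unexpected non-matches.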

Explicit Subscripting

Identify the most recent encounter in an encounter database
gsort id -date       (ascending sort by ID and reverse by date)
by id : gen n=_n     (record counter from 1 to N per person)
by id : gen N=_N     (total number of records per person)
gen select=n==1      (flags the most recent record per person)

Using Stata

Stata Interface in Windows

Set, Clear and More

set: sets system parameters
Need to set memory size to open a database
set mem 100m
clear erases data from memory
When output is >1 page, you are asked to continue (set more off turns this off)

Summarizing Data
. sum gender age educ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      gender |      4085    1.496206    .5000468          1          2
         age |      4085     64.5601    9.451724         50         94
        educ |      4085    4.398286    1.662883          1          9

sum <varlist>, d provides more details on each variable
tabstat provides summary info, including totals

Tabulating Data
. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,058       50.38       50.38
          2 |      2,027       49.62      100.00
------------+-----------------------------------
      Total |      4,085      100.00

. table gender

----------------------
   gender |      Freq.
----------+-----------
        1 |      2,058
        2 |      2,027
----------------------

Tabulating Data
. tab gender age
too many values
r(134);

. tab age gender

           |        gender
       age |         1          2 |     Total
-----------+----------------------+----------
        50 |        49         69 |       118
        51 |        72         71 |       143
       ... |
        94 |         1          0 |         1
-----------+----------------------+----------
     Total |     2,058      2,027 |     4,085

Tabstat
. tabstat age, by(gender)

  gender |      mean
---------+----------
       1 |  64.77454
       2 |  64.34238
---------+----------
   Total |   64.5601
--------------------

. table gender, c(mean age)

----------------------
   gender | mean(age)
----------+-----------
        1 |  64.77454
        2 |  64.34238
----------------------

Graphing

Diagnostic graphics
Presenting results
[Figure: density plots of wtp by stage, one panel per stage (1 to 5)]

Basic Analytical Functions


OLS (reg)
Logistic, probit, count data (e.g., CLAD)
Multinomials
GLM/HLM
Duration models
Semi and non-parametric models

Output
Linear regression                               Number of obs =    1306
                                                F( 21,  1284) =   10.88
                                                Prob > F      =  0.0000
                                                R-squared     =  0.1398
                                                Root MSE      =  90.367

------------------------------------------------------------------------------
             |               Robust
         wtp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ethn1 |   1.990048   8.742036     0.23   0.820    -15.16019    19.14029
       ethn2 |  -25.74654   11.69993    -2.20   0.028    -48.69961   -2.793467
       ethn3 |  -35.59552   11.98309    -2.97   0.003     -59.1041   -12.08694
       ethn4 |  -3.244168   11.16836    -0.29   0.771    -25.15441    18.66607
     english |  -11.44402   9.699576    -1.18   0.238    -30.47277    7.584741
      lifeus |   37.34419   13.86037     2.69   0.007     10.15274    64.53564
     age1999 |  -.6272524   .3097408    -2.03   0.043    -1.234906   -.0195987
      income |   .8068256   .1714309     4.71   0.000     .4705102    1.143141
      incmis |   14.07434   9.404149     1.50   0.135    -4.374848    32.52352
       _cons |   111.3607   24.13083     4.61   0.000     64.02051    158.7009
------------------------------------------------------------------------------

Outreg
Outputs data to a delimited file
Delimited file can be read into Excel
Very flexible
Creates publishable tables
