33 views

Uploaded by Shehabzia

Stata

- Gwrgeographically Weighted Regression
- Foreign
- anthony1995.pdf
- Williams Et Al. - 2013 - Assumptions of Multiple Regression Correcting Two
- Exercise 15.12
- The Steps to Follow in a SAS Multiple Regression Analysis
- Regression New
- Arg02_Quantitative Tools and Excel
- 88997-164767-2-PB
- 4i2_coba (1)
- Work Engagement and Teacher Burnout (1)
- Priv_outliers Omitted_NGO Still Sign
- Eviews Tutorial
- Chap 005
- ie352l1_labmanual (1)
- Acknowledegement Abstract Etc Thesis Template
- eportfolio project
- Chapter 13
- Auto Clave Expansion
- 2015 Winter Econ497 Lec b1

You are on page 1of 65

Christopher F Baum

Faculty Micro Resource Center

Academic Technology Services, Boston College

January 2003

baum@bc.edu

http://fmwww.bc.edu/GStat/docs/StataIntro.pdf

Thanks to Petia Petrova and Nick Cox for their comments on

this draft.

1

What is Stata? And why should you care?

Stata is a full-featured statistical programming language for Win-

dows, Macintosh, UNIX, and Linux. It can be considered a stat

package, like SAS, SPSS, RATS, or eViews. Stata has tradi-

tionally been a command-line-driven package that operates in a

graphical (windowed) environment. The latest releaseStata

version 8 (January 2003)has introduced a graphical user in-

terface (GUI) for command entry. Stata may also be used in

a command-line environment on a shared system (e.g. UNIX)

if you do not have a graphical interface to that system. The

number of variables is limited to 2,047 in standard (Intercooled)

Stata, but can be much larger in Stata/SE. The number of ob-

servations is limited only by memory.

2

Portability

Stata is eminently portable, and its developers committed to

cross-platform compatibility. Stata runs the same way on Win-

dows, Macintosh, UNIX, and Linux systems. The only platform-

specic aspects of using Stata are those related to native oper-

ating system commands: e.g. is that le

C:\Stata\StataData\myfile.dta

or

/u/baum/statadata/myfile.dta

Andperhaps unique among statistical packagesStatas binary

data les may be freely copied from one platform to any other,

or even accessed over the Internet from any machine that runs

Stata.

3

Strengths of Stata: Graphics

Stata is advertised as having three major strengths: data manip-

ulation, statistics, and graphics. Stata graphics have been ex-

tensively improved and enhanced in version 8. They are excellent

tools for exploratory data analysis, and can produce highquality

2-D publication-quality graphics. At the present time, Stata does

not have 3-D graphics capabilities, but those are under develop-

ment in the new graphics system. Graphics may be customized,

and many specialized types of 2-D graphs are available.

4

Strengths of Stata: Data Manipulation

Stata is an excellent tool for data manipulation: moving data

from external sources into the program, cleaning it up, generat-

ing new variables, generating summary data sets, merging data

sets and checking for merge errors, reshaping data sets from

long to wide... In this context, Stata is an excellent pro-

gram for answering ad hoc questions about any aspect of the

data.

5

Strengths of Stata: Statistics

In terms of statistics, Stata provides all of the standard univari-

ate, bivariate and multivariate statistical tools, from descriptive

statistics and t-tests through one-, two- and N-way ANOVA, re-

gression, principal components, and the like. It has a very pow-

erful set of techniques for the analysis of limited dependent vari-

ables: logit, probit, ordered logit and probit, multinomial logit,

and the like. Statas regression capabilities are full-featured, in-

cluding regression diagnostics, prediction, robust estimation of

standard errors, instrumental variables / two-stage least squares,

seemingly unrelated regressions / three-stage least squares, and

the like.

6

Specialized Statistical Techniques

Statas breadth and depth really shines in terms of its specialized

statistical capabilities. Families of commands provide the lead-

ing techniques utilized in each of several categories: the xt

commands for cross-section/time-series or panel (longitudinal)

data; the svy commands for the handling of survey data with

complex sampling designs; the st commands for the handling

of survival-time data with duration models. Recent additions

to this arsenal include a set of commands for cluster analysis,

epidemiology and pharmacokinetics (Stata is very widely used in

health sciences research).

7

Cost, Availability and Support

And last (but certainly not least) Stata is very inexpensive! Their

GradPlan program makes the full version of Stata version 8 soft-

ware available to BC faculty and students for $89.00 (oneyear

license for students) or $129.00 (perpetual license for faculty)

with various options for purchasing documentation. A quite

thorough set of documentation is available for $129.00 (a 4-

volume reference manual and users guide). The Small Stata

version is available to students for $39.00 for a oneyear license;

it will handle a limited number of observations and variables, but

contains all the commands. GradPlan orders are made direct to

Stata, with delivery from oncampus inventory.

8

Stata is very well supported by telephone and email technical sup-

port, as well as the more informal support provided by other users

on StataList, the listserv. The manuals are usefulparticularly

the Users Guidebut full details of the command syntax are

available online, and in hypertext form in the GUI environment.

There are tutorials available within the program, and several

small canned datasets, to introduce you to a number of com-

mon tasks; use the command tutorial.

9

But why should I type commands?

But before we discuss the specics to back up these claims, lets

consider a meta-issue: why would you want to learn how to use

a command-line-driven package? Isnt that ever so 90s?

Stata may be used in an interactive mode, but even there you are

typing command lines, generally not pulling down menus. You

do use menus extensively to interact with the computers le

system, and with elements of the computer in generalto manage

multiple windows, to change screen defaults, print results and

graphs, and the like.

Let us consider a couple of reasons why a command-line-driven

package makes for an eective and ecient research strategy.

10

Reproducibility

First, the important issue of reproducibility. If you are conducting

scientic research, you must be able to reproduce your results.

Ideally, anyone with your programs and data should be able to

do so without your assistance. If you cannot produce such re-

producible research ndings, it can be argued that you are not

following the scientic method.

A thorough discussion of this issue is covered in our webpage,

http://fmwww.bc.edu/GStat/docs/pointclick.html.

11

In a computer program where all actions are point and click,

such as a spreadsheet, who can say how you arrived at a certain

set of results? Unless every step of your transformations of the

data can be retraced, how can you nd exactly how the sample

you are employing diers from the raw data? A command-driven

program is capable of this level of reproducibility, and one could

argue that we owe it to our students to instill this level of rigor

in their research practices.

Reproducibility also makes it very easy to perform an alternate

analysis of a particular model. What would happen if we added

this interaction, or introduced this additional variable, or decided

to handle zero values as missing? Even if many steps have been

taken since the basic model was specied, it is easy to go back

and produce a variation on the analysis if all the work is repre-

sented by a series of programs.

12

Stata makes this reproducibility very easy through a log facil-

ity, the ability to generate a command log (containing only the

commands you have entered), and a do-le editor which al-

lows you to easily enter, execute and save do-les: sequences

of commands, or program fragments. There is also an elabo-

rate hypertext-based help browser, providing complete access to

commands descriptions and examples of syntax. Each of these

components appears in a separate window on the screen.

13

Extensibility

Another clear advantage of the command-line driven environ-

ment is its interaction with the continual expansion of Statas

capabilities. A command, to Stata, is a verb instructing the

program to perform some action.

Commands may be built in commandsthose elements so fre-

quently used that they have been coded into the Stata kernel.

A relatively small fraction of the total number of ocial Stata

commands are built in, but they are used very heavily.

14

The vast majority of Stata commands are written in Statas own

programming languagethe ado-le language. If a command

is not built in to the kernel, Stata searches for it along the

adopath. Like the PATH in Unix, Linux or DOS, the adopath

indicates the several directories in which an ado-le might be

located. This implies that the ocial Stata commands are

not limited to those coded into the kernel. If Statas developers

tomorrow wrote a command named agglomerate, they would

make two les available on their web site: agglomerate.ado (the

ado-le code) and agglomerate.hlp (the associated help le).

Both are straight ASCII text.

15

Update facility

One of the great strengths of Stata versions 6, 7 and 8 is that

they may be updated over the Internet. Stata is actually a

web browser, so it may contact Statas web server and enquire

whether there are more recent versions of either Statas exe-

cutable (the kernel) or the ado-les. The kernel is updated rela-

tively infrequentlyonce a month at mostbut the ado-les may

be modied every ten days or so. This enables Statas develop-

ers to distribute bug xes, enhancements to existing commands,

and even entirely new commands during the lifetime of a given

release. Updates during the life of the version you own are free;

you need only have a licensed copy of Stata and access to the

Internet (which may be by proxy server) to check for and, if

desired, download the updates.

16

User extensibility: the Stata Journal

The importance of this program design goes far beyond the limits

of ocial Stata. Since the adopath includes both Stata direc-

tories and other directories on your hard disk (or on a servers

lesystem), you may acquire new Stata commands from a num-

ber of web sites. The Stata Journal (SJ), a quarterly refereed

journal, is the primary method for distributing user contributions.

Between 1991 and 2001, the Stata Technical Bulletin played this

role, and a complete set of issues of the STB Reprints are avail-

able in ONeill Library.

17

The SJ is a subscription publication (and available at ONeill

Library: see Quest), but the ado- and hlp-les may be freely

downloaded from Statas web site. The Stata command help

accesses help on all installed commands; the Stata command

search will locate commands that have been documented in

the STB, and with one click you may install them in your version

of Stata. Help for these commands will then be available in your

own Stata.

18

User extensibility: the SSC archive

But this is only the beginning. Stata users worldwide partici-

pate in the StataList listserv, and when a user has written and

documented a new general-purpose command to extend Stata

functionality, they announce it on the StataList listserv (to which

you may freely subscribe: see Statas web site). Since September

1997, all items posted to StataList (over 600) have been placed

in the Boston College Statistical Software Components (SSC)

Archive in RePEc, available from IDEAS (http://ideas.repec.org)

and EconPapers (http://econpapers.repec.org).

19

Any component in the SSC archive may be readily inspected with

a web browser, using IDEAS or EconPapers search functions,

and if desired you may install it with one command from the

archive from within Stata. For instance, if you know there is

a module in the archive named omninorm, you could use ssc

install omninorm to install it. Anything in the archive can be

accessed via Stata 7s ssc command: thus ssc describe omninorm

will locate this module, and make it possible to install it with one

click.

20

The importance of all this is that Stata is innitely extensible.

Any ado-le on your adopath is a full-edged Stata command.

Statas capabilities thus extend far beyond the ocial, supported

features described in the Stata manual to a vast array of addi-

tional tools. Since the current directory is on the adopath, if I

create an ado le hello.ado:

program define hello

display "hello from Stata!"

end

exit

Stata will now respond to the command hello. Its that easy.

21

Command syntax

Let us consider the form of Stata commands. One of Statas

great strengths, compared with many statistical packages, is that

its command syntax follows strict rulesin grammatical terms,

there are no irregular verbs. This implies that when you have

learned the way a few key commands work, you will be able to

use many more without extensive study of the manual or even

on-line help. The search command will allow you to nd the

command you need by entering one or more keywords, even if

you do not know the commands name.

22

The fundamental syntax of all Stata commands follows a tem-

plate. Not all elements of the template are used by all com-

mands, and some elements are only valid for certain commands.

But where an element appears, it will appear in the same place,

following the same grammar.

Like Unix or Linux, Stata is case sensitive. Commands must be

given in lower case. For best results, keep all variable names in

lower case to avoid confusion.

Following the examples in the Getting Started with Stata... man-

ual, we will make use of auto.dta, a dataset of 74 automobiles

characteristics.

23

The general syntax is:

[by varlist:] cmdname [varlist] [=exp] [if exp] [in range]

[weight] [using spec] [,options]

where elements in square brackets are optional for some com-

mands. In some cases, only the cmdname itself is required:

desc without arguments gives a description of the current con-

tents of memory (including the identier and timestamp of the

current dataset), while summ without arguments provides sum-

mary statistics for all (numeric) variables. Both may be given

with a varlist specifying the variables to be considered.

What are the other elements?

24

The varlist

varlist is a list of one or more variables on which the command

is to operate: the subject(s) of the verb. Stata works on the

concept of a single set of variables currently dened and con-

tained in memory, each of which has a name. As desc will show

you, each variable has a data type (various sorts of integers and

reals, and string variables of a specied maximum length). The

varlist species which of the dened variables are to be used in

the command.

The order of variables in the dataset matters, since you can

use hyphenated lists to include all variables between rst and

last. (The order and move commands can alter the order of

variables.) You can also use wildcards to refer to all variables

with a certain prex. If you have variables pop60, pop70, pop80,

pop90, you can refer to them in a varlist as pop* or pop?0.

25

exp clause

The exp clause is used in commands such as generate and replace

where an algebraic expression is used to produce a new (or up-

dated) variable. In algebraic expressions, the operators ==, &, |

and ! are used as equal, AND, OR and NOT, respectively. The

overloaded to denote concatenation of character strings.

26

if and in clauses

Stata diers from several common programs in that Stata com-

mands will automatically apply to all observations currently de-

ned. You need not write explicit loops over the observations.

You can, but it is usually bad programming practice to do so.

Of course you may want not to refer to all observations, but to

pick out those that satisfy some criterion. This is the purpose

of the if exp and in range clauses. For instance, we might:

sort price

list make price in 1/5

to determine the ve cheapest cars in auto.dta. The 1/5 is a

numlist: in this case, a list of observation numbers. is the last

observation, thus list make price in -5/ will list the ve most

expensive cars in auto.dta.

27

Even more commonly, you may employ the if exp clause. This

restricts the set of observations to those for which the exp

Boolean expressionevaluates to true. So:

list make price if foreign==1

lists only foreign cars, and

list make price if price > 10000

lists only expensive cars (in 1978 prices!) Note the double equal

in the exp. A single equal sign, as in the C language, is used for

assignment; double equal in comparison.

28

the using clause

Some commands access lesreading data from external les,

or writing to les. These commands contain a using clause, in

which the lename appears. If a le is being written, you must

specify the replace option to overwrite an existing le of that

name.

29

Stata binary les

Statas own binary le format, the .dta le, is cross-platform

compatible, even between machines with dierent byte orderings

(low-endian and high-endian). A .dta le may be moved from

one computer to another using ftp (in binary transfer mode).

30

To bring the contents of an existing Stata le into memory, the

command:

use le [,clear]

is employed (clear will empty the current contents of memory).

You must make sucient memory available to Stata to load the

entire le, since Statas speed is largely derived from holding the

entire data set in memory. Consult Getting Started... for details

on adjusting the memory allocation on your computer, since it

diers by operating system.

31

Reading and writing binary (.dta) les is much faster than dealing

with text (ASCII) les, and permits variable labels, value labels,

and other characteristics of the le to be saved along with the

le. To write a Stata binary le, the command

save le [,replace]

is employed. The compress command can be used to economize

on the disk space (and memory) required to store variables.

32

Transportability

Stata binary les may be easily transformed into SPSS or SAS

les with the third-party application Stat/Transfer. Stat/Transfer

is available for Windows and Mac OS X systems as well as on var-

ious Unix systems on campus. Personal copies of Stat/Transfer

version 7 (which handles Stata version 8 datales) are avail-

able at a discounted academic rate of $52.00 through the Stata

GradPlan.

Stat/Transfer can also transfer SAS, SPSS and many other le

formats into Stata format, without loss of variable labels, value

labels, and the like. It is a very useful tool.

33

Accessing data over the Web

The amazing thing about use lename is that it is by no means

limited to the les on your hard disk. Since Stata is a web

browser,

use http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.dta

will read this dataset over the web.

34

The type command can display any text le, whether on your

hard disk or over the Web; thus

type http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.des

will display the codebook for this le, and

copy http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.des

crime.codebook

will make a copy of the codebook on your own hard disk.

35

When you have used a dataset over the Web, you have loaded

it into memory in your desktop Stata. You cannot save it to the

Web, but can save the data to your own hard disk. The advan-

tages of this feature for instructional and collaborative research

should be clear. Students may be given a URL from which their

assigned data are to be accessed; it matters not whether they

are using Stata for Windows, Macintosh, Linux, or UNIX.

36

The options clause

Many commands make use of options (such as clear on use, or

replace on save). All options are given following a single comma,

and may be given in any order. Options, like commands, may

generally be abbreviated.

37

The by prex

The last standard element of the command syntax (other than

weights, which we will not discuss here) is a very powerful tool:

the by prex. When a command is prexed with a bylist, it is

performed repeatedly for each element of the variable or variables

in that list, each of which must be categorical. For instance,

by foreign: summ price

will provide descriptive statistics for both foreign and domestic

cars. If the data are not already sorted by the bylist variables,

the prex bysort should be used.

38

The option ,total will add the overall summary. What about a

classication with several levels, or a combination of values?

bysort rep78: summ price

bysort rep78 foreign: summ price

This is a very handy tool, which often replaces explicit loops that

must be used in other programs to achieve the same end.

39

The by prex should not be confused with the by option available

on some commands, which allows for specication of a grouping

variable: for instance

ttest price, by(foreign)

will run a t-test for the dierence of sample means across do-

mestic and foreign cars.

40

Another useful aspect of by is the way in which it modies the

meanings of the observation number symbol. Usually n refers

to the current observation number, which varies from 1 to N,

the maximum dened observation. Under a bylist, n refers to

the observation within the bylist, and N to the total number of

observations for that category. This is often useful in creating

new variables.

For instance, if you have individual data with a family identier,

these commands might be useful:

sort famid age

by famid: gen siblings = _N

by famid: gen birthorder = _N - _n +1

41

Missing values

Missing value codes in Stata appear as the dot (.) in printed

output (and a string missing value code as well: , the null

string). It takes on the largest possible positive value, so in the

presence of missing data you do not want to say

generate hiprice = (price > 10000), but rather

generate hiprice = (price > 10000 & price <.)

which then generates a dummy variable for high-priced cars

(for which price data are complete, with prices less than miss-

ing).

Stata version 8 allows for multiple missing value codes.

42

Display formats

Each variable may have its own default display format. This

does not alter the contents of the variable, but aects how it is

displayed. For instance, %9.2f would display a two-decimal-place

real number. The command

format varname %9.2f

will save that format as the default format of the variable.

43

Variable labels

Each variable may have its own variable label. The variable label

is a character string (maximum 80 characters) which describes

the variable, associated with the variable via

label variable varname "text"

Variable labels, where dened, will be used to identify the variable

in printed output, space permitting.

44

Value labels

Value labels associate numeric values with character strings.

They exist separately from variables, so that the same map-

ping of numerics to their denitions can be dened once and

applied to a set of variables (e.g. 1=very satised...5=not satis-

ed may be applied to all responses to questions about consumer

satisfaction). Value labels are saved in the dataset.

label define sexlbl 0 male 1 female

label values sex sexlbl

If value labels are dened, they will be displayed in printed output

instead of the numeric values.

45

Generating new variables

The command generate is used to produce new variables in the

dataset, whereas replace must be used to revise an existing vari-

able (and replace must be spelled out). The syntax just demon-

strated is often useful if you are trying to generate indicator

variables, or dummies, since it combines a generate and replace

in a single command.

A full set of functions are available for use in the generate com-

mand, including the standard mathematical functions, recode

functions, string functions, date and time functions, and special-

ized functions (help functions for details). Note that the sum()

function is a running sum.

46

The D., L., and F. operators may be used under a timeseries

calendar (including in the context of panel data) to specify rst

dierences, lags, and leads, respectively. These operators un-

derstand missing data, and numlists: e.g. L(1/4).x is the rst

through fourth lags of x.

Stata also contains a full-featured matrix language, with one

limitation: matrices may not be larger than 800 rows or columns

in standard Stata. This does not prevent the computation of

matrix expressions on much larger datasets, since commands

like matrix accum can form expressions over a larger dimension.

The matrix language allows any estimation results to be stored

in matrices and manipulated, and supports the programming of

many estimators. Matrix functions include the singular value

decomposition and eigensystem of a symmetric matrix.

47

Additionally, Stata is not limited to using the set of dened func-

tions. The egen (extended generate) command makes use of

functions written in the Stata ado-le language, so that gzap.ado

would dene the extended generate function zap(). This would

then be invoked as

egen newvar = zap(oldvar)

which would do whatever zap does on the contents of oldvar,

creating the new variable newvar.

A number of egen functions provide row-wise operations: row

sum, row average, row standard deviation, etc.

48

Estimation commands

All estimation commands share the same syntax. Multiple equa-

tion estimation commands use an eqlist (equation list) rather

than a varlist, where equations are dened with the eq com-

mand.

Estimation commands display condence intervals for the coef-

cients, and tests of the most common hypotheses. Predicted

values and residuals may be obtained after any estimation com-

mand with the predict command. For nonlinear estimators, pre-

dict will produce other statistics as well (e.g. the log of the odds

ratio from logistic regression).

Robust (Huber/White) estimates of the covariance matrix are

available for almost all estimation commands by employing the

robust option.

49

An overview of useful commands

Let us now describe several categories of commands, and the

facilities Stata provides for data manipulation, statistics, and

graphics. Some data manipulation commands you should learn:

help online help on a specific command

findit online references on a keyword or topic

ssc access routines from the SSC Archive

log logging to an external file

generate create a new variable

replace modify an existing variable

sort change the sort order of the dataset

compress economize on space used by variables

50

Data manipulation commands:

append combine datasets by stacking

merge merge datasets (one-to-one or match merge)

encode generate numeric variable from categorical variable

recode recode categorical variable

destring convert string variables to numeric

tab abbreviation for tabulate: 1- and 2-way tables

table tables of summary statistics

for repeat a Stata command over a varlist, numlist or anylist

foreach loop over elements of a list, performing a block of code

forvalues loop over a numlist, performing a block of code

while loop with a counter over a block of code

local define or modify a local macro (scalar variable)

51

Data manipulation commands:

use load a Stata data set

save write the contents of memory to a Stata data set

insheet load a text file in tab- or comma-delimited format

infile load a text file in space-delimited format

or as defined in a dictionary

outfile write a text file in space- or comma-delimited format

outsheet write a text file in tab- or comma-delimited format

clear clear memory

drop drop certain variables and/or observations

keep keep only certain variables and/or observations

rename rename variable

renvars rename a set of variables

contract make a dataset of frequencies

collapse make a dataset of summary statistics

52

Data manipulation commands:

pwd print the working directory

cd change the working directory

iis define the observation indicator for panel data

tis define the timeseries indicator for panel data

tsset define the time indicator for timeseries data

quietly do not show the results of a command

update query see if Stata is up to date

exit exit the program (,clear if dataset is not saved)

53

Useful statistical commands:

summarize descriptive statistics

correlate correlation matrices

ttest perform 1-, 2-sample and paired t-tests

anova 1-, 2-, n-way analysis of variance

regress least squares regression

predict generate fitted values, residuals, etc.

test test linear hypotheses on parameters

lincom linear combinations of parameters

cnsreg regression with linear constraints

testnl test nonlinear hypothesis on parameters

ivreg instrumental variables regression

prais regression with AR(1) errors

54

Useful statistical commands:

sureg seemingly unrelated regressions

reg3 three-stage least squares

probit binomial probit estimation

logit binomial logit estimation

tobit Tobit regression

cnreg censored normal regression

mfx marginal effects after nonlinear estimation

glm generalized linear models

heckman Heckmans selection model

oprobit ordered probit estimation

ologit ordered logit estimation

qreg quantile regression, including median regression

55

Useful statistical commands:

xt commands for panel data estimation, such as

xtreg,fe fixed effects estimator

xtreg,re random effects estimator

xtgls panel-data models using generalized least squares

xtlogit panel-data logit models

xtprobit panel-data probit models

xtpois panel-data Poisson regression

xtgee panel-data models using generalized estimating equations

svy commands for the handling of complex survey data, in-

cluding stratication and PSUs

56

Useful statistical commands:

Timeseries data commands, such as

arima Box-Jenkins models

arch models of autoregressive conditional heteroskedasticity

dfuller, pperron unit root tests

corrgram correlogram estimation

Nonlinear estimation commands, such as

bstrap bootstrap sampling

nl nonlinear least squares

ml maximum likelihood estimation with user-specified LLF

57

Graphics:

graph produces a variety of graphs, depending on variables listed

graph rep78 will produce a histogram of this categorical variable

graph price mpg will produce an Y vs X scatterplot

The points may optionally be labelled:

graph price mpg if foreign==1,symbol([make])

58

Any number of two-way scatterplots can be generated with one

command using the by modier:

by rep78: graph price mpg if rep78!=.,s([make])

and a visual inspection of the linkages among many variables

may be performed with the matrix option:

graph price mpg weight turn length, matrix

59

Graphs may also be readily combined into a single graphic for

presentation. For instance,

graph rep78, saving(g1,replace)

graph price mpg if foreign==0, saving(g2,replace)

graph price mpg if foreign==1, saving(g3,replace)

graph weight length, saving(g4,replace)

graph using g1 g2 g3 g4, ti("Some exploratory aspects of auto.dta")

60

Now let us walk though some analysis of nontrivial Stata datasets.

A list of over 100 datasets suitable for instructional use is avail-

able on the economics web pages as

http://fmwww.bc.edu/ec-p/data/ecfindata.html#teach

Lets consider the data Zvi Griliches used in his 1976 article on

the wages of young men (Journal of Political Economy, 84, S69-

S85). These are cross-sectional data on 758 individuals collected

over several survey years.

61

* StataIntro: cross-section example

use http://fmwww.bc.edu/ec-p/data/hayashi/griliches76.dta

describe

summarize

label define ur 0 rural 1 urban

label values smsa ur

tab smsa

tab mrt smsa, chi2

ttest med,by(smsa)

anova lw mrt smsa

anova lw mrt smsa mrt*smsa

anova,regress

regress lw tenure kww smsa

predict lweps,resid

graph lweps kww

62

sort smsa

graph lweps kww,by(smsa) total

bysort year: regress lw tenure kww smsa

graph iq kww age s expr lw,matrix

gen medrural = med*(smsa==0)

gen medurban = med*(smsa==1)

regress lw tenure kww medurban medrural

test medurban=medrural

The following example reads some daily Dow-Jones Averages

data, graphs daily returns, then performs Dickey-Fuller tests for

unit roots on the DJIA, its log, and its returns (log price rela-

tives). AR(3) models are then estimated on the returns series,

and tests are carried out on the tted model for AR(1) errors

and ARCH eects. A portmanteau test is then performed on the

residual series.

63

* StataIntro: time-series example

use http://fmwww.bc.edu/ec-p/data/micro/ddjia.dta

desc

summ

tsset

graph ret day,c(l) s(.)

for var djia ldjia ret: dfuller X,lags(22)

regress ret L(1/3).ret

regress ret L(1/3).ret, robust

dwstat

archlm, lags(20)

predict eps,resid

wntestq eps

64

- Gwrgeographically Weighted RegressionUploaded byJames Cormier-Chisholm
- ForeignUploaded byAli QasimPouri
- anthony1995.pdfUploaded byMike Galárraga
- Williams Et Al. - 2013 - Assumptions of Multiple Regression Correcting TwoUploaded byVíctor Jurado
- Exercise 15.12Uploaded byLeonard Gonzalo Saavedra Astopilco
- The Steps to Follow in a SAS Multiple Regression AnalysisUploaded byarijitroy
- Regression NewUploaded byRick Chua
- Arg02_Quantitative Tools and ExcelUploaded byAndrea Franzini
- 88997-164767-2-PBUploaded byAnonymous OFwyjaMy
- 4i2_coba (1)Uploaded byEspee
- Work Engagement and Teacher Burnout (1)Uploaded byJocelyn Amorio
- Priv_outliers Omitted_NGO Still SignUploaded byfedro120
- Eviews TutorialUploaded bylifeblue2
- Chap 005Uploaded byErica Jurkowski
- ie352l1_labmanual (1)Uploaded byanthony
- Acknowledegement Abstract Etc Thesis TemplateUploaded byJames Aurell Mortel
- eportfolio projectUploaded byapi-270024447
- Chapter 13Uploaded byLaura Matevosyan
- Auto Clave ExpansionUploaded byDiana Jenkins
- 2015 Winter Econ497 Lec b1Uploaded byNelson Castillo
- IRJET-Error Reduction in Data Prediction using Least Square Regression MethodUploaded byIRJET Journal
- QPAsyl04Uploaded byAbdelkader Abdelali
- Multivariate AnalysisUploaded bypopat vishal
- 7252455 Marketing Research 4Uploaded byVRSHABANU
- Sungwon Lee(2017)_Nonparametric_QR_Specification_Duration.pdfUploaded bypedrohcgs
- ReadmeUploaded byrkumaravelan4137
- Burns05 Im 1s9Uploaded byravirajmistry
- Econometrics I 2Uploaded bykhongaybyt
- ch14Uploaded byAbraham Zeus
- A Study on Unilever and DaldaUploaded bymohammadahmedkhan

- Student ResourcesUploaded byShehabzia
- Foreign Remittances -Inward and OutwardUploaded byShehabzia
- Malpractices in International Banking Bangladesh Perspective Dec 2012Uploaded byShehabzia
- MEcon Lecture 5- The HO ModelUploaded byShehabzia
- MEcon Lecture 3- The Ricardian ModelUploaded byShehabzia
- Finding and Formulating Your TopicUploaded byShehabzia
- FER_ActUploaded byShehabzia
- MEcon Lecture 4- Two by Two Model and Factor Intensity ReversalUploaded byShehabzia
- Walliman-12Uploaded byShehabzia
- reviewUploaded byvirde
- How to Succeed in Group WorkUploaded byra2g
- Discussion Questions for Chapter 5 International Marketing Cateora GrahamUploaded byShehabzia
- nonprice competition banglalinkUploaded byShehabzia
- What Tutors Look for When Marking EssaysUploaded byhnudyejma
- What Tutors Look for When Marking EssaysUploaded byhnudyejma
- Developing Memory TechniquesUploaded byShehabzia
- Bcs Seat PlanUploaded byShehabzia
- Pacific HealthcareUploaded byShehabzia
- Bank Research Proposal-draftUploaded byShehabzia
- afs.pdfUploaded byms.Ahmed

- Parametric TestsUploaded bymalyn1218
- JackknifeUploaded byw108bmg
- PCA.docxUploaded byZeeshan Khan
- CE334 Trip GenerationUploaded byAbhay Vikram
- Dsur i Chapter 02 Everything You Ever Wanted to Know About StatisticsUploaded byDanny
- anova1And2.pdfUploaded byAli Hyder Zaidi
- Rtimeseries-ohpUploaded byMax Greco
- regressionUploaded byLilian Ponce
- Chapter 14Uploaded bytami
- Briggs_Meta-Analysis.pdfUploaded byfernandonef
- ANCOVAUploaded byhisham00
- Andrew Rosenberg- Lecture 6: Logistic Regression CSC 84020 - Machine LearningUploaded byTuytm2
- AssignmentUploaded byPrasanth Talluri
- Ch 10 skewness kurtosisUploaded byiamaking
- 5-estimation.pdfUploaded byKevin Hyun
- Multiple Linear Regression Applications in Real Estate PricingUploaded bytheijes
- Assignment 02Uploaded bytauqeerahmed
- c7Uploaded byrgerwwaa
- Using R for Time Series Analysis — Time Series 0.2 DocumentationUploaded byMax Greco
- Statictics and Measures of Central TendencyUploaded byId Mohammad
- Problems on Averages and DispersionUploaded byAditya Vighne
- Problems in Standard deviation and EstimationUploaded bysjaswanth
- Time Series Analysis - Smoothing Methods.pdfUploaded byMayur Hampiholi
- i1936-900X-13-2-163Uploaded byKomang Ayu Eka Wijayanti
- piyush sachdeva - Homework 5Uploaded byLucas Triana
- thorndike1995.pdfUploaded byWidad Fethallah
- AUC and ConcordanceUploaded byDeepanshu Bhalla
- Interpreting Multiple RegressionUploaded byRalph Wajah Zwena
- os2Uploaded byapi-27836396
- 02 Chapter 2Uploaded byomar