
American Economic Association

Does Competition among Public Schools Benefit Students and Taxpayers? Reply
Author(s): Caroline M. Hoxby
Source: The American Economic Review, Vol. 97, No. 5 (Dec., 2007), pp. 2038-2055
Published by: American Economic Association
Stable URL: http://www.jstor.org/stable/30034600
Accessed: 07/02/2010 04:25

Does Competition Among Public Schools Benefit Students and Taxpayers? Reply

By CAROLINE M. HOXBY*

Hoxby (2000) tests several implications of the Tiebout (1956) model by comparing public schools across metropolitan areas where households can more or less easily choose among school districts ("Tiebout choice"). The article attempts to answer such questions as: when households have more Tiebout choice, do schools have higher productivity, do fewer students attend private school, and do households sort themselves more among school districts? Readers often focus on the achievement results, which suggest that, in a metropolitan area with substantially more Tiebout choice, student achievement is higher, all else equal. Some of the most cited results, based on the National Education Longitudinal Study (NELS), suggest that students' reading and mathematics performance would be about 0.3 to 0.5 of a standard deviation higher if they were to attend school in a metropolitan area with the maximum amount of choice observed in the United States (more than 100 districts) as opposed to a metropolitan area with the minimum amount of choice observed (no choice; only one district in the metropolitan area).1 A more realistic change of one standard deviation in Tiebout choice would generate an improvement in reading and mathematics achievement of about 0.1 standard deviations.

A significant share of the paper is devoted to the empirical challenge of identifying a source of exogenous variation in Tiebout choice. Because households are likely to shift from districts with unsuccessful schools to districts with successful schools, the Tiebout choice that we observe in a metropolitan area is endogenous to schools' performance. In particular, in a metropolitan area with a dysfunctional central city district, suburban districts may refuse to consolidate with the central district and the area may end up, endogenously, with a large number of districts. This challenge is addressed with instrumental variables based on a metropolitan area's number of streams. Essentially, the logic is that many streams generate many natural boundaries, which in history generated many school attendance areas, which left some areas with a lingering, large number of school districts.

All of the raw data and code used to make extracts from the raw data and compute the estimates are available to researchers. Researchers may request the data by contacting their license representative at the US Department of Education.2 The replication dataset was designed and documented for general use by instructors (several of whom had requested it), researchers working on related questions, and replicators. The dataset was created prior to Jesse Rothstein's dissemination of his 2003 Comment, and his claim (p. 4) that I created the dataset in response to his comment does not make sense. Indeed, the dataset could not have been created in response to his comment because he used it to write his comment (though he neglects some relevant parts of its documentation; see, for instance, the discussion of school codes below, which shows that his apparent neglect of the documentation caused him to generate important errors).3

* Department of Economics, Stanford University, Stanford, CA 94305 (e-mail: choxby@stanford.edu).

1 All references to the NELS are to United States Department of Education (1996).

2 Hoxby (2000) uses data from the NELS matched to data from the US census, school districts' administrative files, and the United States Geological Survey (USGS). Researchers require a license to obtain NELS data that includes the geographic codes on which matching is based. Many researchers who study education already hold such licenses but others may apply to the National Center for Education Statistics (NCES). I am greatly indebted to Bruce Daniel at Pinkerton Computer Consultants Incorporated (a contractor for NCES) for examining the code and data in the replication dataset. I am also greatly indebted to Jeffrey A. Owings and Cynthia Barton at NCES for working out a procedure for distributing the replication dataset.

3 Although the original replication dataset contained only data and code associated with the original Hoxby (2000) paper, I have subsequently added to it all of the raw data and code used to compute the specification tests described in this reply.
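
The instrumental-variables logic described above (streams shift the number of districts, while unobserved school dysfunction affects both the number of districts and achievement) can be illustrated with a small simulation. This is only a sketch on synthetic data, not the code in the replication dataset: the function names, the coefficients, and the data-generating process are all assumptions chosen for illustration, and the single-instrument, no-covariate setup is far simpler than the specifications in Hoxby (2000).

```python
import random


def ols(y, x):
    """Return (intercept, slope) from a one-regressor least-squares fit of y on x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x, z):
    """IV estimate of the effect of x on y, using z as the only instrument."""
    # First stage: regress the endogenous regressor (choice) on the instrument.
    intercept, slope = ols(x, z)
    x_hat = [intercept + slope * zi for zi in z]
    # Second stage: regress the outcome on the first-stage fitted values.
    _, beta = ols(y, x_hat)
    return beta


def simulate(n=20000, true_effect=1.0, seed=0):
    """Synthetic metropolitan areas (hypothetical data-generating process)."""
    rng = random.Random(seed)
    achievement, choice, streams = [], [], []
    for _ in range(n):
        s = rng.randint(0, 10)      # instrument: number of streams
        d = rng.gauss(0.0, 1.0)     # unobserved central-city dysfunction
        # Dysfunction raises the number of districts and lowers achievement,
        # so observed choice is endogenous to school performance.
        c = 0.5 * s + 1.5 * d + rng.gauss(0.0, 1.0)
        a = true_effect * c - 2.0 * d + rng.gauss(0.0, 1.0)
        streams.append(s)
        choice.append(c)
        achievement.append(a)
    return achievement, choice, streams


if __name__ == "__main__":
    y, x, z = simulate()
    print("OLS slope:", ols(y, x)[1])
    print("IV slope: ", two_stage_least_squares(y, x, z))
```

In this assumed setup the ordinary least squares slope is pulled well below the true effect of 1.0, because dysfunction both raises choice and lowers achievement, while the two-stage estimate, which uses only the variation in choice induced by streams, lands close to 1.0.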
VOL. 97 NO. 5    HOXBY: COMPETITION AMONG PUBLIC SCHOOLS: REPLY    2039

Before getting to the substantive elements of Rothstein's comment, it is important to point out that Rothstein's suggestion (2007, 2026) that I declined to provide him with the original data is misleading. Rather, the original data do not exist for the very simple reason that one of the original data sources has been corrected (by its creators, not by me), and the corrections were automatically incorporated in the replication dataset. Since the corrections make only a trivial difference to the results (as shown below), Rothstein's giving prominence to what is essentially a negligible issue is likely to mislead readers.

I was trained to write (and still believe in writing) code that takes a researcher all the way from the raw data to estimation. Such code may create intermediate datasets along the way, but they are replaced every time the code is run. Using this procedure has the important advantage that, when a correction or update is made, it feeds through completely. Obsolete datasets are not left sitting around to be used later, accidentally. The procedure prevents the unwitting propagation of erroneous or superseded data and code, and it is often used by modern empirical researchers who need to combine data from numerous sources, some of which are very large in scale. One of the raw datasets used in Hoxby (2000), the Common Core of Data (a sort of directory of public schools), is a dataset that I use routinely.4 (I have used it in more than a dozen papers subsequent to Hoxby 2000.) After the work in Hoxby (2000) was completed, the NCES released a corrected version of some of the raw data, in much the same way that the Bureau of Labor Statistics (BLS) will sometimes correct data it has previously released. When the corrected raw data were released, my code automatically substituted them for the original raw data. The corrected geographic codes, which generated small changes in the metropolitan areas, students, and streams included in the sample, yield nearly the same results. The fact that the correct geographic codes (which are in the replication dataset available from NCES) and uncorrected codes produced nearly the same estimates is a good sign because it indicates that the results are robust to small changes in the sample. The important points to take away are: (a) I did not decline to give Rothstein anything that existed: he asked for a dataset that had never existed independently of the raw data and the code; and (b) the corrections to the raw data hardly affect the results, as demonstrated below.

Rothstein's Comment (2007) contains a variety of criticisms. In rough order of most to least important, they are as follows: (a) he argues that the results change when students who attend private schools are included in the sample; (b) he argues that the streams variable should be measured in different ways and that the results are sensitive to the changes; (c) he argues that there are errors in the data and that they substantially affect the results; (d) he argues for a different manner of estimating the first-stage equation.

It is easy to summarize my response to the main criticisms. In each case where he argues that a correction or reasonable change would alter the results substantially (in claims 1, 2, and 3), he is incorrect. Such changes alter the results hardly, or not at all. Moreover, several of the "corrections" he identifies are wrong or are presented in a very misleading way. When Rothstein is able to alter the results, it is only by introducing substantial errors. For example, in the case of the private school students, he obtains different results not because he includes private school students (in fact, they make no difference) but because he uses the private school issue to motivate the introduction of an incorrect method of assigning students to their home localities. In doing so, he introduces errors and discards a substantial amount of data. It is not surprising that his wrong method generates wrong results. Since Tiebout's model suggests that the jurisdictional makeup of a student's locality affects his school, one cannot test the theory using a method in which students are assigned to the wrong localities and localities are systematically dropped. Moreover, there is no reason to assign students incorrectly: the raw data contain codes that allow researchers to assign students correctly. Rothstein ignored or set aside the documentation that describes the codes. Another example is the pair of stream variables. The reasonable alternative variables he suggests make no difference to the results; he is able to generate different results only when he constructs variables that measure something

4 For the Common Core of Data, see United States Department of Education (2004).
2040    THE AMERICAN ECONOMIC REVIEW    DECEMBER 2007

other than what is intended. With regard to claim 4, Rothstein does not estimate the appropriate first-stage equation and misinterprets key coefficients from the first-stage equation he does estimate.

In addition to his criticisms, Rothstein introduces a "preferred" sample and specification, making numerous choices almost entirely without justification. I show below that these choices do not, in fact, bear scrutiny. Some introduce errors; others make the sample less representative. Nevertheless, even with a free hand to make changes without much or any justification (a clear opportunity for specification searching), Rothstein alters the results only marginally.

I. Private School Students and Assigning Students to Localities

The number and composition of students who attend public schools in a metropolitan area may be endogenous to the absence or presence of Tiebout competition. One section of Hoxby (2000) shows that, where Tiebout choice is greater, a larger share of students attend public, as opposed to private, schools. A reasonable query is whether the positive effect of Tiebout competition on public school students' achievement arises partly because it induces "good" students to stay in the public schools.

Scholars may be interested in a parameter that cannot be identified econometrically: the "pure" effect of competition on public school students. By this I mean the following. Suppose that competition increases because of an exogenous increase in choice and that public schools improve or deteriorate in response. The pure effect of competition is the effect of increased choice on the outcomes of students who would have attended the public schools in the counterfactual where choice had not increased. We cannot identify this parameter because we never simultaneously observe the counterfactual and the increase in choice. Similarly, we cannot identify the pure effect of competition on private school students, which is a mirror image of the pure effect of competition on public school students.

Fortunately, we can identify the general equilibrium effect on public schools, which is the difference between the outcomes of students who attend the public schools after an exogenous increase in choice and the outcomes of students who attend the public schools before. This effect aggregates the pure effect of competition on public school students, the change in the students who attend the public schools, the changes in peer effects in public schools associated with the change in students, the changes in public school parents, the change in voters' support for public schools, the change in who teaches in public schools, and so on. In other words, the general equilibrium effect provides a reply to the typical policymaker when he asks, "When choice increases, won't able students desert my public schools, depriving the remaining students of good peers? Won't motivated parents desert too, depriving the schools of strong advocates and good governance? Won't voters who previously supported tax levies or took an interest in school board elections fall away? Won't talented people who taught in my public schools follow their children elsewhere?" And so on. Most debates on school choice do not focus exclusively on the pure effect of competition: they include the full panoply of general equilibrium effects.

We can also identify the general equilibrium effect of choice on all students, public and private. This is the difference between the outcomes of all students before and after an exogenous increase in choice. This effect aggregates the pure effect of competition on public school students; the pure effect of competition on private school students; the change in peer effects in public schools; the changes in parents, voters, teachers, and so on in the public schools; the change in peer effects in private schools; the change in private school parents' advocacy, monitoring, and financial resources; the change in donors to the private schools; the change in who teaches in the private schools, and so on.

Rothstein argues (2007, 2034-36) that, in Hoxby (2000), I produce a "biased" estimate of the pure effect of competition on public school students, but this is a misrepresentation. I do not attempt to estimate the pure effect of competition (which is not identifiable); I estimate the general equilibrium effect on public school students. That the estimation choice is intentional, and not an unwitting error, is obvious: the paper contains an entire section that describes why more Tiebout choice might cause students to switch from private to public school. Moreover, the section in question actually estimates the

share of students who shift from private to public schools as Tiebout choice rises.

TABLE 1-INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX,
PUBLIC SCHOOL STUDENTS AND ALL STUDENTS

                                       8th Grade         10th Grade        12th Grade
                                       Reading  Math     Reading  Math     Reading  Math
Public school students only (the       6.64**   5.36**   8.50**   7.98**   5.92*    4.12
  general equilibrium effect on        (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)
  public schools)
All, public and private, school        5.76**   3.51     8.16**   7.77**   5.55*    4.41*
  students (the general equilibrium    (2.73)   (2.16)   (2.89)   (2.55)   (3.39)   (2.56)
  effect on all students)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations.

Because, in Hoxby (2000), I actually estimate the shift, it is easy to see that the general equilibrium effect on public schools will be very similar in practice to the pure effect of competition on public schools. A two-standard-deviation increase in choice among public schools induces 2 percent of students to shift from private to public schools. In a typical metropolitan area, this means that 91 percent, rather than 89 percent, of students will attend public schools. The 2 percent of students who shift would have to have extraordinary scores, be extraordinary peers, have extraordinary parents, and attract extraordinary voters and teachers to change achievement significantly among public school students. Thus, even though one cannot estimate the pure effect of competition on public schools, it constitutes most of the general equilibrium effect on public schools.

Rothstein, instead of realizing that the pure competition and general equilibrium effects on public schools are different parameters, argues that the general equilibrium effect on public schools is a "biased" estimate of the pure competition effect. In a bizarre twist, he then asserts that "bias can be easily avoided," implying that he has a method of estimating the pure effect of competition on public schools. In fact, he does not do this at all. Instead, he estimates the general equilibrium effect on all students, a parameter that contains many more elements than the pure effect of competition on public schools. In particular, it contains many changes in private schools that could be important. This is because an increase in choice induces a movement of students from private to public schools that is tiny by public school standards but large by private school standards, where a fifth of students might depart. Thus, private schools' peer effects would change, their parents would change, their finances would change, and so on. The general equilibrium effect on all students is interesting, but it is certainly not an unbiased estimator of the pure effect of competition on public schools. Rothstein's statements about bias are confused and may lead future researchers, who need to keep track of the parameter they are estimating, astray.

In any case, Table 1 shows that, in the NELS sample, estimates of the general equilibrium effect on public schools (top row) are very similar to estimates of the general equilibrium effect on all students (bottom row). The estimated coefficient for the eighth-grade reading score, for instance, changes from 6.64 to 5.36; and the estimated coefficient for the tenth-grade reading score, for instance, changes from 8.50 to 8.16. In no case do the estimates suggest that the general equilibrium effect on public school students differs from the effect on all students. (That is, all the changes are far from being statistically significant. The changes mentioned are about 0.30, but the standard errors indicate that a coefficient would have to change by about 5.50 for the change to be statistically significant.)

Given these results, why does Rothstein apparently generate such different results in his Table 5 when he also estimates the general equilibrium

effect for all students? He generates different results because he reassigns the locality of every student in the NELS using a "zip-code-backing-out method" that is error-prone and that does not properly assign students to many districts. This zip-code-backing-out method is unnecessary because, as described in its documentation, the NELS contains codes that identify all schools in the study, public and private.5 Rothstein ignored or set aside the documentation and thereby (ostensibly) justified his use of a method that makes error-prone assignments of students to locations, drops covariates, and uses only subsamples of the data. It is important to note from his Table 5 that it is these actions, and not the inclusion of private school students, that cause him to obtain results substantially different from Hoxby (2000). The top row of his Table 5 excludes private school students and the bottom row includes them, but the difference between the two rows is inconsequential. It is his introduction of error, discarding of data, and dropping of covariates that cause him to report that the "[e]stimates are substantially smaller than those presented earlier." Misleadingly, he presents this conclusion as though it were related to the inclusion of private school students when, in fact, it has nothing to do with them.

The zip-code-backing-out method works as follows. There are several census variables in the restricted-access NELS that are associated with each school's zip code, but students' zip codes are not provided in the NELS. Rather, the NELS provides a few variables derived from the census that describe the zip code of a student's school, not his residence.6 By cross-referencing these variables, one can back out a unique school district location for some (only some) students. Unfortunately, this method associates other students with the incorrect district or multiple districts (as many as nine). Two errors are generated by this procedure. First, the method does not always generate a unique zip code: multiple zip codes may have the same values for the descriptive variables provided by the NELS. Second, zip codes are not aligned with school districts in the United States, and many zip codes cross school district boundaries.

How valid are the localities assigned by the zip-code-backing-out method? Because the NELS contains an identifying code for all of its schools, they can be associated with the school district in which they are actually located.7 Thus, by following NELS documentation, students can be assigned to the actual school district in which they live or, in the case of private schools, actually attend school.8 Table 2 compares the actual school districts in which NELS base-year schools are located to the districts that the zip-code-backing-out method associates with them. Table 2 shows that 39 percent of the public schools have problematic (missing, nonunique, or apparently unique but actually incorrect) "backed-out" codes. The table also shows that 41 percent of the private schools have problematic backed-out codes. Detailed breakdowns are in the table.

5 See United States Department of Education, National Education Longitudinal Study: Base Year Sample Design Report (1990, 12).

6 The census variables are drawn from 1990 Summary Tape File 3B.

7 Quality Education Data codes are used for both public and private schools in the NELS data. Public schools are also associated with their NCES code. The codes can be used to match the NELS to relevant directory-type data: either the Quality Education Data National Database (1988) or the combination of the public school directory (the Common Core of Data) and private school directory (Private School Universe Survey). Both of the latter directories have been published regularly by the United States Department of Education since the late 1980s (see United States Department of Education 2004 and 2003, respectively). Private schools' addresses are used to determine the school district in which they are located. This is standard geocoding. For the purposes of this paper, a tiny number of students (18) who may be attending private school have problematic Quality Education Data codes and an even smaller number (9) have a missing Quality Education Data code. To put these numbers in perspective, consider that, together, these students represent 0.1 percent of those who completed the base-year NELS survey.

8 It would be preferable to associate private school students with the district where they live, as opposed to the district where they attend school. Although the two districts are the same for many students, they are likely to be different for cities in which prestigious private schools and Catholic schools are located, for historical reasons, in neighborhoods that are more urban and poorer than the areas in which their students reside. By associating some private school students with the "wrong" localities, one introduces some degree of measurement error that is probably systematic rather than random. Given the small number of students affected, the resulting bias is likely to be small. It affects only the estimate of the general equilibrium effect on all students. The estimate of the general equilibrium effect on public schools is unbiased.

TABLE 2-PERCENTAGE OF NELS BASE-YEAR SCHOOLS FOR WHICH THE ZIP-CODE-BACKING-OUT
METHOD PRODUCES A MISSING, NONUNIQUE, OR APPARENTLY UNIQUE BUT INCORRECT DISTRICT CODE

                   Missing      Nonunique    Apparently unique
                   district     district     but actually incorrect
                   location     location     district location         Total
Public schools     10%          28%          1%                        39%
Private schools    18%          22%          1%                        41%

Notes: The table shows the percentage of base-year schools in the NELS for which the zip-code-backing-out method, described in the text, produces a school district code that is missing, nonunique, or apparently unique but incorrect. Comparison is made to the actual school district locations based on school codes provided by the NELS, as described in the text.
Source: Author's calculations.
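
The failure modes tallied in Table 2 can be made concrete with a toy version of the backing-out step. The sketch below is illustrative only: the zip codes, descriptor values, district labels, and function names are all hypothetical, and it is not Rothstein's procedure or the replication code. It assumes each school is described by a tuple of zip-level census descriptors, looks up every zip code with the same descriptors, collects the districts those zip codes touch, and compares the result with the school's actual district.

```python
from collections import Counter


def back_out_districts(descriptors, zip_descriptors, zip_to_districts):
    """Districts consistent with a school's zip-level census descriptors."""
    # Several zip codes can share the same descriptor values (first error).
    matching_zips = [z for z, d in zip_descriptors.items() if d == descriptors]
    districts = set()
    for z in matching_zips:
        # A single zip code can cross district boundaries (second error).
        districts |= zip_to_districts.get(z, set())
    return districts


def classify(candidates, actual_district):
    """Label the backed-out result relative to the school's actual district."""
    if not candidates:
        return "missing"
    if len(candidates) > 1:
        return "nonunique"
    return "correct" if candidates == {actual_district} else "incorrect"


# Hypothetical lookup tables: zip code -> descriptor tuple, zip code -> districts.
ZIP_DESCRIPTORS = {
    "00001": (1200, 31),
    "00002": (1200, 31),   # same descriptors as 00001
    "00003": (800, 10),
    "00004": (950, 22),
    "00005": (400, 5),
}
ZIP_TO_DISTRICTS = {
    "00001": {"A"},
    "00002": {"B"},
    "00003": {"C", "D"},   # zip code spans two districts
    "00004": {"E"},
    "00005": {"F"},
}

# Hypothetical schools: (school id, descriptors, actual district from school codes).
SCHOOLS = [
    ("s1", (1200, 31), "A"),
    ("s2", (800, 10), "C"),
    ("s3", (950, 22), "E"),
    ("s4", (999, 99), "G"),   # descriptors match no zip code
    ("s5", (400, 5), "H"),    # unique match, but not the actual district
]


def tally(schools):
    counts = Counter()
    for _, descriptors, actual in schools:
        candidates = back_out_districts(descriptors, ZIP_DESCRIPTORS, ZIP_TO_DISTRICTS)
        counts[classify(candidates, actual)] += 1
    return counts


if __name__ == "__main__":
    print(tally(SCHOOLS))
```

With the school codes available, the `actual` district is known directly, which is why the comparison in Table 2 is possible and why the backing-out step is unnecessary.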

In short, Rothstein obtains results that differ from those in Hoxby (2000) not because he adds private school students to the sample but because his zip-code-backing-out method misassigns about 40 percent of all students.

II. Measuring Streams by Flow Rather Than by Primary Location and Related Issues

In Hoxby (2000), two streams variables, smaller streams and larger streams, are used as instruments. For reasons discussed in my original paper (1222) and repeated below, it is important to divide streams into those that are more and less suitable for commercial navigation. However, the essential logic behind using streams as instruments is simple: streams cause variation in the number of school districts because, during the settlement of America, district boundaries often were streams. In fact, early American laws often stated that students should not have to cross streams to get to school. In other words, real walking distance, not distance as the crow flies, was what mattered for students' travel. The streams variables work as instruments because they affected initial district boundaries and there is substantial inertia in boundaries. (The idea evidently has very old origins: Maimonides' rule also stated that students should not have to cross streams to attend school.)

The USGS gathers accurate information on streams and conveys it in two forms that are relevant: topographic quadrangle maps and the Geographic Names Information System (GNIS).9 The maps show every feature known to the USGS that can be displayed at 1:24,000 resolution. The GNIS is a list of certain features shown on the maps. Every USGS feature is classified (as a "stream," "reservoir," "lake," "summit," etc.) and its location is described by its latitude and longitude. A stream's location is described by the latitude and longitude of both its "primary location" and its source. (Rothstein uses the word "mouth" for primary location, but I use the USGS terminology.) The vast majority, 93.5 percent, of streams have their source in the same metropolitan area as their primary location.

A. Streams by Primary Location and Streams by Flow

Hoxby (2000) associates streams with the metropolitan area in which they have their primary location, as defined by the USGS. At the time the paper was written, this was the form in which the GNIS data were available. Rothstein argues that streams ought to be associated with all the metropolitan areas in which they flow, and he proposes use of a GNIS dataset made available since Hoxby (2000) was written.10 The shift in classification (from streams-by-primary-location to streams-by-flow) is reasonable, but is unlikely to affect the results much because 93.5 percent of streams are classified identically under the two methods. The remaining 6.5 percent of streams are not removed from the count for the metropolitan areas that are their primary

9 See USGS (1993, 2004).

10 The new dataset is Dataware (1999). The dataset became available after the paper was written, though before the year of publication (2000) owing to standard publication delays.

locations; they are simply added to the counts of other metropolitan areas through which they flow. Thus, the change in classification scheme produces only a small change in the instrumental variables.

TABLE 3-INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX
USING ALTERNATIVE SCHEMES FOR CLASSIFYING STREAMS

                                       8th Grade         10th Grade        12th Grade
                                       Reading  Math     Reading  Math     Reading  Math
Larger and smaller streams, streams    6.64**   5.36**   8.50**   7.98**   5.92*    4.12
  classified by primary location       (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)
  (base-case estimates)
Larger and smaller streams, streams    6.11**   4.98**   7.98**   7.44**   5.80*    4.14
  classified by flow                   (2.50)   (2.05)   (2.78)   (2.38)   (3.32)   (2.60)
Smaller streams only                   5.86*    4.21**   7.82**   6.28**   3.88     2.35
                                       (2.81)   (2.03)   (3.12)   (2.49)   (3.19)   (2.32)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations.

The top two rows of Table 3 show that the estimates based on streams-by-primary-location and streams-by-flow are extremely similar. (These estimates use as instruments smaller and larger streams, where the latter variable is measured as described below. The instruments can be computed either by primary location or flow.) The estimated coefficient for the eighth-grade reading score, for instance, changes from 6.64 to 6.61; and the estimated coefficient for the tenth-grade math score, for instance, changes from 7.98 to 7.44. (The changes mentioned average 0.29, but the standard errors indicate that a coefficient would have to change by about 5.50 for the change to be statistically significant.) Indeed, Rothstein's own estimates (see his Table 4) show that using streams-by-flow does not affect the results. It is difficult to see, therefore, why he raises the matter.

B. Measuring Navigable Larger Streams

As described in Hoxby (2000) and at some length below, it is important to create separate counts of smaller streams and larger streams, where larger streams are defined as those that are potentially navigable for the purposes of commerce. Because width as well as length is important for assessing commercial navigability, I used the USGS topographic quadrangle maps to measure larger streams. There is no single rule for the width that makes a stream navigable for the purposes of commerce, but states where width is explicitly considered in the determination of navigability include Texas (30 feet wide), Washington (40 feet wide), Georgia (40 to 45 feet wide), and Arizona (similar to Georgia).11 Thus, I chose 40 feet wide as

11 The federal standard for navigability is based on a Supreme Court case known as The Daniel Ball, 77 U.S. (1870). In it, streams are defined as navigable-in-fact "when they are used or susceptible of being used, in their ordinary condition, as highways for commerce, over which trade and travel are or may be conducted in the customary modes of trade and travel on water." The federal test for navigability is comprised of four criteria: (a) the stream must be susceptible to navigation; (b) the navigation should be for commercial purposes, not merely navigation for any purpose; (c) the stream should be susceptible to navigation in its ordinary condition; and (d) the stream should be navigable by the customary mode of commercial transportation in the area. Most states, unless they have recently adopted a new definition of navigability based on recreation (irrelevant for the purposes of this paper), use the federal definition. The Texas Natural Resources Code § 21.001(3) defines as navigable a stream that "retains an average width of 30 feet from the mouth up." In the state of Washington, the precedent-setting case is Griffith v. Holman (1900, Wash. 347, 63 P. 239, 83 AmSt.Rep. 821 s), which says that a stream "averaging in width about 40 feet" or less is nonnavigable. Georgia and Arizona require that a stream be navigable by barges that were commonly used for shipping commercial goods in, respectively, 1863 and 1912. Givens v. Ichauway, Inc. (S97A1074., 268 Ga. 710, 493 SE2d 148, 1997) established

a typical standard and looked for streams that met this criterion and that were at least 3.5 miles in length (since shorter streams could not plausibly connect two trading centers even if they were wide). Curvilinear bodies of water that met these criteria were noted, checked to ensure that they were USGS-designated streams (to exclude nonstreams such as man-made bodies of water), and counted. Such measurement is painstaking, essentially because one has to carry information correctly over the edges of maps, but it is not fundamentally difficult. The count of larger streams is subtracted from the total number of streams to obtain a measure of smaller streams.

Rothstein argues that my method of processing USGS maps is subjective, and much of his discussion of subjectivity is devoted to a peculiar example meant to illustrate it. Focusing on the Fort Lauderdale metropolitan area, he counts man-made canals (which, straight or not, were created by dredging) and the Atlantic Intracoastal Waterway, an engineered channel. Considering that Fort Lauderdale's municipal and tourist offices heavily advertise the city's being known as the "Venice of America" and emphasize how its waterways were artificially created from the marsh, it is obviously a location where man-made water features - which are not streams and which blatantly violate the spirit of the instrumental variable - abound.12 It can be no surprise that, in a location like this, he comes up with a count that differs from mine: he is knowingly counting bodies of water that are not defined as streams by the USGS.13 He need not guess: he need only acknowledge the USGS designations. Rothstein is not identifying a problem of subjectivity. He is merely demonstrating that, by ignoring information that is pertinent and then focusing on the very location where the pertinent information is most useful, he can generate mismeasurement.

Moreover, Rothstein's argument that only GNIS data, and not map-based data, should be used is based on a fundamental misunderstanding. The GNIS data are derived from USGS maps. The difference between the maps and the GNIS is that the GNIS contains only a tiny fraction of the information on the maps. Thus, if a researcher devotes time to the maps, it is repaid with accurately measured variables. In contrast, there is only a certain amount a researcher can derive from the sparse set of variables available in the GNIS. To put it another way, suppose that the GNIS contained not only the latitude and longitude of a stream's origin and destination, but also the latitude, longitude, and shore-to-shore width at every turning point in a stream's course. A researcher could then measure a stream's size looking for a combination of length and width that is continuous (not interrupted by narrow stretches). This is what the maps allow one to do, except the maps are superior because they show width at all points, not merely occasional points. Another advantage of using maps is that it is so time-consuming to measure streams using those maps that the equivalent of "specification searching" is impracticable. One must adopt a metric for counting streams and stick with it.

C. Larger (Potentially Navigable) Streams

As argued in Hoxby (2000, 1222), the one shortcoming of streams as instruments is that large streams that are navigable for shipping purposes may be (or have been) important channels for trade. If large rivers attract commerce and cities are built where commerce thrives, then we might expect to find big city districts around important rivers. To take a particularly obvious example, consider Pittsburgh, a city that would not exist if it were not for the confluence of the Allegheny, Monongahela, and Ohio rivers. In other words, a large navigable river might attract commerce to a particular location. The dense population that gathers in the location may cause a central city jurisdiction to arise, and it may have the political power to swallow

that the width of the smallest such barge was 35 feet. Thus, in practice, Georgia and Arizona look for streams with a width of 40 to 45 feet.

12 The canals of Fort Lauderdale were deliberately dug to create housing developments and facilitate commercial traffic. Thus, they are a conspicuous example of what ought not to count as a stream if the instrumental variable is to work as intended. Even Fort Lauderdale's port, which is linked to its man-made canals and channels, is artificially constructed. For information and history on the canals of the Atlantic coast, including the Atlantic Intracoastal Waterway, see Aubrey Parkman (1983), especially pages 83-87. For a history, including descriptions of the canal and port creation, see Susan Gillis (2004), especially pages 30 and 39.

13 The USGS standard for what constitutes a stream is "a linear flowing body of water" (see USGS 2006).

up others. In short, large streams may, like small streams, generate more initial jurisdictions, but large streams may also create the conditions in which jurisdictions are more likely to consolidate into "central city" ones. Therefore, we would not expect that large and small streams should necessarily have the same effect on the choice index.

A separate concern about large streams is that they may indirectly affect achievement by affecting commerce which, in turn, may affect the type of person who decides to live in an area. The direction of such effects is unclear and need not be uniform across metropolitan areas. For instance, large streams may make one metropolitan area a center for agricultural goods, another a center for industrial goods, and a third a center for finance. The types of people who work in agriculture, industry, and finance are likely not the same.

In short, although all streams are created by nature, there is reason to expect that large and small streams will have different effects on the number of jurisdictions in an area. Moreover, large streams may have nonmonotonic effects: positive in some areas, negative in others. Thus, in Hoxby (2000), I argue that small streams (streams too small to be commercially navigable) are more credible instruments than are commercially navigable rivers. It is very credible that the number of smaller streams fulfills the requirement that an instrumental variable be uncorrelated with the unobserved determinants of achievement.

Fortunately, the numbers of smaller and larger streams are not highly collinear: the correlation is only 0.41. Thus, in Hoxby (2000), I let the two streams variables enter the first-stage equation separately to see whether their coefficients are the same (they are not). Moreover, in that paper, I compute estimates that rely solely on smaller streams in order to determine whether they differ statistically significantly from the estimates that rely on both streams variables (they do not). (See the bottom rows of Tables 4 and 6 of Hoxby 2000.) Similarly, the bottom panel of Table 3 shows that the estimates that rely on only the smaller streams variable are very similar to the estimates that rely on both stream variables. For instance, the estimated coefficient for the eighth-grade reading score changes from 6.64 to 5.86; and the estimated coefficient for the tenth-grade math score changes from 8.50 to 7.82. (The changes mentioned average 0.7, but the standard errors indicate that a coefficient would have to change by about six for the change to be statistically significant.)

As noted in Hoxby (2000), the two streams instruments have different coefficients in the first-stage regression. This may be an indication that larger streams have the offsetting effects mentioned earlier: more large streams mean more initial jurisdictions, but more large streams also create conditions favorable to consolidation. Also, while the first-stage regression that includes all metropolitan areas produces positive coefficients on both streams variables, a regression run on certain subsets of the metropolitan areas produces a coefficient on the larger streams variable that is not statistically significantly different from zero and occasionally has a point estimate with a negative sign. This is a hint - though not firm statistical evidence - that the larger streams variable is associated with commerce and central city consolidation in some metropolitan areas. Put another way, this is a hint that the larger streams variable may not fulfill the monotonicity condition for an instrumental variable. This would be another reason to think that it is the smaller streams variable that should be regarded as the more reliable instrument, as suggested in Hoxby (2000).

Rothstein argues that one should be able to add the large and small streams variables and use total streams as an instrument. But, as noted above, the two streams variables do not have similar coefficients. If there are potential monotonicity issues with the larger streams variable, adding it to the smaller streams variable produces a single contaminated variable - aggravating rather than remedying the problem. Rothstein also argues that one should be able to measure larger streams based solely on such criteria as length or whether a stream crosses a county boundary. But, as noted above, the goal is to separate streams that are useful for commercial purposes from those that are not. Merely being 3.5 miles long or crossing a boundary does not make it useful for commerce. It is width that is usually the binding criterion, not the 3.5-mile length, which is a simple minimum. There are literally thousands of creeks that run for 10 miles but that are only a few feet wide, and it is obvious that they can serve no

real commercial purpose (regardless of whether they cross a boundary). It is no wonder that, by introducing variables (streams that are merely longer than 3.5 miles, streams that merely cross boundaries) that do not even attempt to measure what one needs to measure, Rothstein is able to obtain results in his Table 4 that differ from mine.

III. A Variety of Claims about Errors or Corrections

Rothstein makes several claims that errors and/or changes in the program and variables contained in the replication dataset have a substantial effect on the results. These claims are incorrect. None of the issues he describes has much effect on the results. Moreover, his discussion of the errors and/or changes is consistently misleading, incorrect, or both.

A. Raw Data Corrected by the Census

Rothstein states that I declined to provide him with the original dataset from Hoxby (2000). As mentioned in the introduction, I refused him nothing that exists but provided him with all of the code and raw data. One source of raw data has been corrected since my work on the original paper: the coding of the metropolitan areas associated with school districts. This complicated geographic coding is performed by the Bureau of the Census in coordination with NCES. The codes become part of the Common Core of Data, which is a directory-like source routinely used by education researchers. When corrected Common Core data become available, they are released by NCES. Because I use the Common Core many times each year, I automatically download the corrected data.14 Therefore, when the code in the replication dataset calls the raw data from the Common Core, it calls the more correct data available now.

On this point, Rothstein's comment may mislead readers. First, he implies that he was refused access to some dataset that exists. In fact, no dataset ever existed independently of the raw data: the estimation sample was created from the raw data each time the code was rerun. Second, he claims that I changed an "assignment algorithm" for matching school districts to geographic codes for metropolitan areas (2007, 2027). In fact, the only thing that changed was the raw data. The word "algorithm" is itself misleading: the program just assigns each district to the metropolitan area in which it is located. Third, his language (2007, 2027-28) suggests that I generated the corrections to the data, and that I did it in response to his comment. In fact, the raw data were corrected by the census and the corrections predated his comment by years.15

In any case, despite Rothstein's repeated insistence on the issue, there is not much difference between using the codes available for the original paper and using the codes corrected by the census. A comparison of the top row of Table 1 (above) and the top row of Table 4 in Hoxby (2000) reveals very similar results. The similarity is also shown in Rothstein's own Table 1 (compare columns 1 and 2), which makes it hard to see why he raises the issue.

B. Four Ohio School Districts

Rothstein argues that the program incorrectly assigns four school districts in Ohio to a North Carolina metropolitan area. In fact, the program is correct. It is the raw data - the Common Core of Data - that contain an error and associate the Ohio districts with North Carolina. More importantly, the four districts' being misassigned has

14 There are still a few errors remaining in the Common Core. These are corrected by the code in the replication dataset wherever they are found. Compared to the data available in 1993, however, the currently available data are substantially error-free. Also, the data available in 1993 were a vast improvement on the parallel dataset for the 1980 census: Summary Tape File 3F. For more information on changes to the geographic codes, see United States Department of Commerce (1999) and United States Department of Education (2002).

15 For clarity, it is useful to distinguish between codes that were corrected and codes that were updated. When the census and NCES correct a code in, say, the 1987-88 administrative data, they give it the code it should have had at the time. When the census and NCES update a code between, say, the 1987-88 and the 1991-92 administrative years, they are switching from one correct code to another correct code in order to reflect changed circumstances. Thus, when I say that the program calls the correct geographic codes, it calls the correct codes for the year - not some later, updated year.

TABLE 4 - INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX
WITH OHIO DISTRICTS AS IN RAW DATA AND AS FIXED

                                  8th Grade         10th Grade        12th Grade
                                  Reading  Math     Reading  Math     Reading  Math
Four Ohio districts left as in    6.64**   5.36**   8.50**   7.98**   5.92*    4.12
raw Common Core data              (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)

Common Core data fixed for        6.64**   5.36**   8.50**   7.98**   5.92*    4.12
four Ohio districts               (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations.
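The two rows of Table 4 are identical because no NELS student attends any of the four Ohio districts. A minimal sketch of that logic follows, using invented district identifiers and metropolitan codes (none of them come from the Common Core or the replication dataset): changing the code of a district that contributes no students leaves the estimation sample unchanged.

```python
# Hypothetical toy data: district -> metropolitan-area code, as found in a raw file.
raw_codes = {
    "OH-1": "NC-metro",   # invented miscoded Ohio district
    "OH-2": "NC-metro",   # invented miscoded Ohio district
    "TX-1": "TX-metro",
    "GA-1": "GA-metro",
}

# The same map with the invented Ohio miscoding fixed.
fixed_codes = dict(raw_codes)
fixed_codes.update({"OH-1": "OH-metro", "OH-2": "OH-metro"})

# Hypothetical student records: (student id, district).
# No student sits in an Ohio district.
students = [(1, "TX-1"), (2, "TX-1"), (3, "GA-1")]


def estimation_sample(student_rows, codes):
    """Attach a metropolitan-area code to each student via the student's district."""
    return [(sid, district, codes[district]) for sid, district in student_rows]


sample_raw = estimation_sample(students, raw_codes)
sample_fixed = estimation_sample(students, fixed_codes)

# The two samples coincide, so any estimator run on them gives the same result.
print(sample_raw == sample_fixed)  # True
```

Because the student-level sample is built by looking up each student's district, a district with no students never enters the lookup, whatever code it carries.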

no effect on the results because there are no NELS students in them.

Just for completeness, Table 4 shows the results with and without the correction for the four Ohio districts. The results are identical; the error in the raw data is harmless. Rothstein's discussion of the Ohio districts (2007, 2028-29) is misleading: he implies that they are included in the NELS estimation sample and affect the results.

C. Use of Contemporaneous Metropolitan Area Codes

Rothstein argues that several school districts have "incorrect, invalid, or obsolete" metropolitan area codes (2007, 2028). The claim about "obsolete" codes is groundless. It is apparently based on his mistaken belief that districts should match current census codes rather than the codes that existed at the time of the NELS survey. This is wrong: the Bureau of the Census routinely updates the definitions of metropolitan areas.16 Thus, school districts in the NELS rightly have metropolitan area codes that were current at the time the NELS was conducted.

The base year of the NELS data is 1987-88, so I first matched NELS districts to the 1987-88 Common Core of Data. The first follow-up year of the NELS is 1989-90, and the census updated a few metropolitan areas to include new districts between the base and first follow-up years. Thus, NELS districts that remain unmatched after the 1987-88 sweep are matched to the 1989-90 Common Core. The second follow-up year of the NELS is 1991-92, and the census updated a few metropolitan areas between 1989-90 and 1991-92. Thus, a NELS district that remains unmatched after the first two sweeps is matched to the 1991-92 Common Core.

In fact, the number of districts that change codes between 1987-88 and 1991-92 is so small that it does not matter which of the three survey years' codes are used. This is shown in Table 5, which reveals that the coefficients hardly vary with the year of the codes.

In short, I match NELS survey data to contemporaneous administrative data. In this way, I accept the census's determination of which districts were in metropolitan areas in the survey years. This is the accurate way to assign districts to metropolitan areas. Rothstein's declaring codes to be obsolete amounts to nothing more than his having arbitrarily picked a later year's metropolitan area definitions and saying that codes are obsolete if they do not match that later year's.

D. A Typographical Error in the Program

Rothstein makes much of a typographical error in a program in the replication dataset - his discussion occupies his pages 2028 and 2029 and footnotes 6, 7, 8. The length and intensity of this discussion suggest that Rothstein has found a serious error related to missing school district codes, but actually all that he has found is that the word "update" is missing from two lines. Moreover, this typographical error has nothing

16 For information, see Historic Metropolitan Area Definitions, United States Department of Commerce (2005).

TABLE 5 - INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX
USING METROPOLITAN AREA CODES FROM VARIOUS CONTEMPORANEOUS YEARS

                                  8th Grade         10th Grade        12th Grade
                                  Reading  Math     Reading  Math     Reading  Math
Using the 1987-88                 6.64**   5.36**   8.50**   7.98**   5.92*    4.12
metropolitan area codes           (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)

Using the 1989-90                 6.56**   5.40**   8.32**   7.81**   5.68*    4.00
metropolitan area codes           (2.66)   (2.11)   (2.86)   (2.47)   (3.37)   (2.57)

Using the 1991-92                 6.58**   5.50**   8.39**   7.88**   5.62*    4.02
metropolitan area codes           (2.65)   (2.10)   (2.87)   (2.47)   (3.36)   (2.57)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations.
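The matching of NELS districts to contemporaneous Common Core years is sequential: a district is matched to the 1987-88 file if it appears there, otherwise to 1989-90, otherwise to 1991-92. The sketch below illustrates such a cascade with invented district identifiers and codes; the function name `match_contemporaneous` and the data layout are assumptions made for illustration and are not taken from the replication program.

```python
def match_contemporaneous(district_ids, directories):
    """Give each district the code from the earliest directory year that lists it.

    directories: list of (year_label, {district_id: metro_code}) in survey order.
    Returns {district_id: (year_label, metro_code)}; districts found in no
    directory are left out of the result.
    """
    matched = {}
    for year, codes in directories:
        for district in district_ids:
            # Only districts still unmatched after earlier sweeps are considered.
            if district not in matched and district in codes:
                matched[district] = (year, codes[district])
    return matched


# Hypothetical directories for the three survey years.
directories = [
    ("1987-88", {"A": 100, "B": 200}),
    ("1989-90", {"A": 100, "B": 200, "C": 300}),
    ("1991-92", {"A": 150, "B": 200, "C": 300, "D": 400}),
]

result = match_contemporaneous(["A", "B", "C", "D", "E"], directories)
for district, (year, code) in sorted(result.items()):
    print(district, year, code)
```

In this toy example district A keeps its 1987-88 code even though a later file lists a different one, district C is picked up in the 1989-90 sweep, district D in the 1991-92 sweep, and district E stays unmatched.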

to do with Hoxby (2000) because it was introduced when I was finalizing my creation of the replication dataset. (I made superficial changes in an effort to make the code transparent to users while complying with NCES's wishes regarding methods of indicating missing observations. The typographical error was introduced at that point).17 In his lengthy discussion, Rothstein suggests that the typographical error has a substantial effect on the results. This is incorrect: the typographical error has no statistically significant effect on the results and, if anything, weakens them. This is shown by comparing the top and bottom rows of Table 6. For instance, the estimated coefficient for the eighth-grade reading score is 6.64 without the typographical error and 4.41 with it. The estimated coefficient for the tenth-grade reading score is 8.50 without the error and 6.57 with it. (The changes in the table average about 1.1, but the standard errors indicate that a coefficient would have to change by about 5 for the change to be statistically significant.) Rothstein himself says that, with the typographical error, "The mean choice effect for 12th grade scores is 5.39, quite close to the 5.30 computed from the Hoxby/NCES data." His related Appendix Figure A1 also shows that the vast majority of estimates generated with the typographical error in place are so close to 5.30 that the difference is unimportant.

E. Summing Up Issues Related to Errors and Corrections

Tables 4 through 6 of this paper demonstrate that each of Rothstein's issues regarding an error or correction has no effect or little effect on the results. Indeed, although one would not know it from his discussion, Rothstein's own results reveal the same thing (compare columns 1 to 3 of his Table 1). Rothstein's long discussion of data issues that make no difference to the results creates an impression of numerous errors. The barrage of assertions with regard to errors simply sows confusion in readers' minds and thereby provides a justification for Rothstein's presenting a "preferred" specification (discussed below).

IV. The First Stage Regression

Rothstein argues that Hoxby (2000) was wrong to show a first-stage regression that is run at the metropolitan area-level and that includes all metropolitan areas. He argues that the article should have shown a first-stage regression that uses student-level observations and that includes only metropolitan areas in which NELS students live. However, the first-stage regression shown was a deliberate decision made as part of the editorial and refereeing process at this journal. It was intended to make it clear to readers that

17 A variety of missing value indicators are used in the NELS data. In an attempt to make the data transparent for the general user while still respecting NCES distinctions about missing data, I changed a few lines. When I did so, the word "update" should have been added to lines 59 and 69 of the main program.

TABLE 6 - INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX
WITH AND WITHOUT TYPOGRAPHICAL ERROR

                                  8th Grade         10th Grade        12th Grade
                                  Reading  Math     Reading  Math     Reading  Math
Without typographical error       6.64**   5.36**   8.50**   7.98**   5.92*    4.12
                                  (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)

With typographical error that     4.41**   4.18**   6.57**   7.84**   5.25*    3.59
was not in Hoxby (2000) anyway    (1.99)   (1.79)   (2.44)   (2.18)   (2.88)   (2.29)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations.
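The statement that the changes in Table 6 "average about 1.1" can be checked directly from the table. The short calculation below does so and, as a rough yardstick only, compares each change with 1.96 times the standard error of the estimate without the typographical error. That yardstick is an assumption made for illustration: it ignores the covariance between the two estimates and is not necessarily the test behind the "about 5" figure in the text.

```python
# Coefficients and standard errors copied from Table 6
# (8th reading, 8th math, 10th reading, 10th math, 12th reading, 12th math).
without_error = [6.64, 5.36, 8.50, 7.98, 5.92, 4.12]
with_error = [4.41, 4.18, 6.57, 7.84, 5.25, 3.59]
se_without = [2.69, 2.11, 2.91, 2.50, 3.42, 2.59]

# Change in each coefficient when the typographical error is present.
changes = [a - b for a, b in zip(without_error, with_error)]
mean_change = sum(changes) / len(changes)

# Illustrative threshold (assumption): 1.96 standard errors of the top-row estimate.
thresholds = [1.96 * s for s in se_without]
all_below = all(c < t for c, t in zip(changes, thresholds))

print(round(mean_change, 2))  # about 1.11
print(all_below)              # True: every change is below its rough threshold
```

The largest single change (eighth-grade reading, 2.23) is still well under the corresponding rough threshold of roughly 5.3.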

the first-stage regression made use of only metropolitan area variation because the dependent variable (the index of choice) varies only at the metropolitan area level. Readers could have been confused if they saw thousands of observations in a regression that really had only as much variation as there are metropolitan areas.18

Moreover, the same first-stage specification is used for several second-stage regressions in the paper. In the second-stage regressions where NELS achievement variables are the dependent variables, only some metropolitan areas (about 60 percent) are included: the number depends on the locations of the students whose achievement is being considered. But, in numerous second-stage regressions where variables like district-level variables were the dependent variables (for instance, school spending and the private school share), all metropolitan areas are included. Given space constraints in the journal, it was clear that only one first-stage regression could be shown. It was logical to show the one used for district-level dependent variables because it was most general since it included all metropolitan areas. The coefficients in this regression are as expected: there are more jurisdictions in areas with more streams, and smaller streams exert a more powerful influence on school district boundaries than larger streams - probably for the reasons described above. For those who are interested, the program in the replication dataset runs all the variants of the first-stage regression: the specification stays the same but the sample of the metropolitan areas that are included varies with the students in the regression.

Rothstein (Tables 2 and 3) shows first-stage results that drop many metropolitan areas and it is this drop that produces results that he can claim are "dramatically different." For instance, in panel C of his Table 2, he drops 37 percent of metropolitan areas and, in panel D, he drops 39 percent. His discussion of the first-stage results is therefore misleading: he does not mention how much of the sample is dropped and implies that the differences are due to changes in the data, not the sample drops. Also misleading is his commentary on how the coefficient on the larger streams variable changes (2007, 2029-30). He suggests that the changes in the coefficient are very worrisome, but - in fact - they were only to be expected since he is dropping large parts of the sample. As described above and in Hoxby (2000), the larger streams variable may exhibit a nonmonotonic relationship with the choice variable. Therefore, as one changes the sample, the coefficient may change. This is the reason for constructing measures of both larger and smaller streams.

V. Rothstein's "Preferred" Specification

Rothstein introduces a "preferred" specification, making numerous changes to the variables and the sample. Four of these changes are mentioned in footnote 10 but the vast majority are mentioned only in his appendices. A count of the changes cannot be constructed from his cursory

18 Hoxby (2000) also shows results of regressions that are run wholly at the metropolitan area level. Code that estimates such results is in the replication dataset.

TABLE 7 - INSTRUMENTAL VARIABLES ESTIMATES OF THE COEFFICIENT ON THE CHOICE INDEX, PROPER SPECIFICATION, AND
MOST REPRESENTATIVE SAMPLE VERSUS VARIOUS SPECIFICATION CHANGES AND ERRORS INTRODUCED BY ROTHSTEIN

                                                        8th Grade         10th Grade        12th Grade
                                                        Reading  Math     Reading  Math     Reading  Math
Properly specified metropolitan income variable and     6.64**   5.36**   8.50**   7.98**   5.92*    4.12
most representative sample available                    (2.69)   (2.11)   (2.91)   (2.50)   (3.42)   (2.59)

Misspecified metropolitan income variable recommended   6.74**   5.39**   8.45**   7.98**   5.91*    4.14
by Rothstein (the log of the metropolitan mean          (2.68)   (2.11)   (2.90)   (2.46)   (3.38)   (2.56)
household income)

Parental education from the parent survey (when         5.61**   4.87**   7.50**   6.95**   5.07     3.38
available, makes sample less representative, see text)  (2.37)   (2.08)   (2.68)   (2.39)   (3.20)   (2.57)

Family income from surveys other than the base-year     6.52**   5.29**   8.43**   8.13**   5.67*    4.32*
survey (when no base-year income available, makes       (2.60)   (2.11)   (2.76)   (2.40)   (3.26)   (2.48)
sample less representative, see text)

Notes: The table shows the instrumental variables estimate of the coefficient on the choice index for regressions in which observations are students in the NELS data. Stata's robust clustered standard errors are in parentheses. The clustering unit is the metropolitan area. ** (*) indicates statistical significance at 5 percent (10 percent). The data and code used to produce these estimates are available in the replication dataset. The specification is analogous to the "Base IV" regressions in Table 4 of Hoxby (2000).
Source: Author's calculations (see text for a description of each row).
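The first two rows of Table 7 differ only in how the metropolitan income variable is built: the mean of the districts' log mean incomes versus the log of the mean income. The two are not the same number, as the sketch below shows with invented district incomes. The unweighted averaging across districts is an assumption made for illustration; the weighting used in the replication code is not described in this section.

```python
import math

# Hypothetical district mean incomes within one metropolitan area (dollars).
district_mean_income = [30000.0, 45000.0, 90000.0]
n = len(district_mean_income)

# District-level covariate that would enter the regression: log mean income.
district_log_income = [math.log(y) for y in district_mean_income]

# Measure the text calls correct: the metropolitan mean of the district-level covariate.
mean_of_logs = sum(district_log_income) / n

# Substituted measure: the log of the mean income across districts.
log_of_mean = math.log(sum(district_mean_income) / n)

# By Jensen's inequality the gap is nonnegative, and positive when incomes differ.
gap = log_of_mean - mean_of_logs
print(round(mean_of_logs, 3), round(log_of_mean, 3), round(gap, 3))
```

Only `mean_of_logs` is the exact metropolitan-level average of the district-level covariate; the gap between the two measures grows with income dispersion across districts, so it varies from one metropolitan area to another.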

descriptions, but it would be scores - hundreds if multiple changes to the same variable are counted separately. Moreover, the descriptions of the changes - if the reader gets to them - provide little justification or discussion of the implications. This makes it hard for a reader to assess the validity of his "preferred" estimates. In this section, I consider just a few of the many changes to illustrate the problem created.

A. The Construction of the Metropolitan Income Variable

Rothstein (2007, footnote 10) substitutes, without explanation, an incorrect measure of metropolitan income: the natural log of the mean income of districts. The correct measure is the mean of the districts' log mean incomes. Essentially, one must include a metropolitan-level mean of any district-level covariate that is included in the regression. If this is not done, the district-level variables can be correlated with metropolitan area variables, and such correlation produces a phenomenon sometimes described as "Tiebout bias." The bias is eliminated by including proper metropolitan-level aggregates. Substituting the incorrect variable for the correct one generates a misspecification error. It is hard to understand why Rothstein introduces the error because he does not offer an argument for the substitution. This is despite the fact that there is a description of the correct measure in Hoxby (2000, 1217-18) and an even more detailed description in the code contained in the replication dataset.

The substitution of the incorrect metropolitan income variable does not, in itself, make a difference to the results. This is shown in Table 7, the top row of which uses the proper variable and second row of which uses the misspecified variable. The differences in the coefficients on the choice index are extremely small: 6.64 becomes 6.74, 5.36 becomes 5.39, and so on. (These changes average 0.7, but the changes would need to be about 5 to be statistically significant.)

B. Family Background Variables

In constructing his "preferred" estimates, Rothstein changes numerous family background variables (described only in his appendices).

Before considering these changes, let us remember why the estimation included family background variables at all. As emphasized in Hoxby (2000, 1226-27), they were not included because their coefficients bore causal interpretations (they do not). Rather, such variables were included because the NELS sample was designed to be nationally representative only. It is not representative of individual metropolitan areas. Thus, family background variables are used to adjust the outcomes for sampling

differences among metropolitan areas. For example, suppose that poor and middle-income districts are sampled in metropolitan area A and that middle-income and affluent districts are sampled in metropolitan area B. Metropolitan area B should not get "credit" for the better outcomes associated with affluent, as opposed to poor, students. Adjusting outcomes to account for sampling differences is important if we are to compare metropolitan areas A and B fairly.

Because my primary concern, in including family background variables, was making the sample maximally representative, it was important not to use measures of background variables that had been constructed in such a way that they would, by their inclusion, make the sample less representative.

Rothstein substitutes parental education information from the parent survey for parallel information from the student survey. This choice makes the sample less representative, largely because only a subset of parents responded to the survey. The NELS documentation says, "only the student and school datasets constitute fully representative national samples. While in various respects the parent dataset resembles a representative or probability sample ... several features of the NELS:88 parent component depart from the strict requirements for a probability sample."19 The documentation also points out that answers in the parent survey could be biased depending on the identity of the parent who responded to the survey (mother, father, stepparent, and so on) and on family composition (two parents at home, divorced parents, and so on). Given that a parental education variable was available in the student survey, that this variable generated no sampling problems or bias, and that parental education did not have a coefficient of interest but was merely added to account for sampling differences, it makes sense to use parental education based on the student survey.

Rothstein changes the construction of several family background variables (for instance, using family income from the second follow-up) in order to include students who were not part of the NELS base-year sample. However, as explained by Jeffrey Grogger and Derek Neal (2000), only the base-year sampling used by the NELS is representative. Its follow-up sampling is very problematic for applications related to school choice because the follow-ups do not sample randomly, but systemically drop students who attend a tenth-grade school that relatively few of their former eighth-grade (base-year) classmates attend.20 For instance, consider a public middle school that performs poorly and thereby alienates the parents of its eighth graders. In response, some parents might send their children to private schools for grades nine through twelve, other parents might enroll their children in magnet public schools for grades nine through twelve, and the remaining parents might send their children to the public high school that is the default. The students who attend the private and magnet schools have a high probability of being dropped from the sample, and the default public high school has a fair probability of being dropped too. Thus, the follow-up sampling is not only nonrepresentative, but is nonrepresentative in a way that is endogenous to the performance of schools in the base year.21 The "freshening" of the sample in the follow-ups does not relieve, but instead aggravates, this situation by adding new students from schools that are already "over-sampled" relative to, say, the schools in the example just provided.22

19 See United States Department of Education (1990, 1-2).

20 See United States Department of Education (1992, 39).

21 For the reasons discussed in this and the paragraph above, the results shown for the base-year (eighth-grade) tests are the most reliable. A reader may wish to focus on them rather than the results for the tenth-grade and twelfth-grade tests, which are less representative because of the manner in which NELS drops students. (The reweighting of students from the base year to the follow-up years may help somewhat to counteract the loss of representativeness, but the reweighting cannot do the job well. This is because the sort of student who remains in a school that students tend to leave is not the sort of student who leaves a school that students tend to leave. Even if they are alike on observable characteristics, "leavers" and "stayers" must differ in their unobservable characteristics.) Despite the fact that the eighth-grade results are the most reliable, I show results for all of the available grades in the spirit of transparency.

22 See United States Department of Education (1992, 40). Essentially, if a student is already being sampled in the follow-up, a hot-deck procedure based on his school's roster is used to add classmates who were not attending eighth grade in the United States in 1988, either because they were
abroad or were attending a grade other than eighth. Thus, a school that is sampled may have its sample augmented by the freshening. If a student has already been dropped from the follow-up because he attends, say, a private school that is not his eighth-grade school, not only does he have no chance to be represented, but also his school is denied the opportunity to add students in the freshening procedure. Thus, the freshened sample aggravates the over-sampling of certain schools and under-sampling of others.

Changing a single family background variable affects the results only trivially. For instance, Table 7 (third row) shows that using parental education data from the parent survey produces only slight differences relative to the top row.23 The estimated coefficient for the eighth-grade reading score is 5.61 instead of 6.64, for instance. Similarly, using family income from the second follow-up (twelfth-grade) survey, as Rothstein does, produces only slight differences.24 See the fourth row of Table 7, which shows, for example, that the estimated coefficient for the eighth-grade reading score is 6.52 instead of 6.64.

23 Following Rothstein, data from the parent survey are used if available, and data from the student survey are used if parent survey data are missing.

24 The twelfth-grade family income variable has some peculiar problems. By twelfth grade, students can be contributing to their family's income and such income is simultaneously determined with their achievement. Also, there is no accurate way to adjust the family income of a twelfth grader to make it comparable with the income his family had when he was an eighth grader. Yet, the two sources of income must somehow be made comparable because numerous NELS students have been dropped by the second follow-up, in accordance with the problematic sampling procedure described. I adjust incomes for the change in time between the base year and the second follow-up using BLS computations of mean family income.

C. Summing Up the "Preferred" Specification

To get his "preferred" specification, Rothstein makes numerous changes to covariates that are hard to justify if scrutinized. If we take the changes one at a time, each has little or no effect on the results. However, Rothstein makes many changes to the sample and to covariates, and it is unknown how many other changes were tested and rejected. Testing a large number of changes without much justification runs the danger of either intentional or unintentional pretest bias by leading an investigator to find the specification that best fits his priors. In any case, even with the large number of changes he made, Rothstein merely reduces the coefficient to being marginally statistically significant (2007, Table 1, column 4).

VI. A Note on the Standard Errors

Rothstein gives prominence to the fact that, in the replication dataset, I provide code for Stata's robust cluster standard errors but not for Brent R. Moulton (1986) standard errors as in Hoxby (2000). He then immediately goes on to say that the data from the replication dataset produce larger standard errors. The implication is that, somehow, the replication dataset has problematic standard errors. Yet, Rothstein is perfectly well aware of the fact that it is because the replication dataset uses Stata's robust clustered standard errors that it produces larger standard errors. He knows that the Moulton standard errors are consistently smaller than Stata's robust clustered standard errors; the fact is demonstrated throughout his tables. Rothstein is also perfectly well aware of why I used Stata's robust clustered errors in the replication dataset, despite the apparent loss of precision: I was concerned that the typical user would be unable to compute the Moulton standard errors with comfort since the computations involve some complicated decisions. My reasoning is clearly documented in the replication dataset. Rothstein even agrees that Moulton standard errors are beyond many users; this agreement is buried in his Appendix. Rothstein lays out none of this relevant information for readers. Instead, they are likely to be left with the impression that there is something wrong with the replication dataset.

VII. Conclusions

In each case where Rothstein argues that a reasonable change to the specifications estimated in Hoxby (2000) would substantially affect the results, I have shown that the argument is either incorrect or has negligible effects on the results (the latter is usually shown in his Comment as well). He is able to obtain substantially different results only by introducing important errors such as the error-prone zip-code-backing-out method. Similarly, reasonable alterations to the instrumental variables for streams do not change the results. The results change only when Rothstein introduces variables that do not
measure what the instruments are intended to measure. Rothstein discusses various "errors" and "corrections" in the data, but all of these have been shown here to have no effect or only a trivial effect on the results. Rothstein's own tables show the same lack of effect. So, why raise these points at all? Rothstein's style of discussion, which raises a long series of issues that he claims to be important but turn out not to be, is consistently misleading. Again and again, his Comment gives the impression that an important error has occurred or an important change needs to be made when, in fact, his own estimates show nothing of the kind. There is not a single case, however, in which he alerts the reader to the fact that an issue he has raised, sometimes at length, has negligible consequences or no consequences at all. Instead, his discussion suggests that he has found a slew of errors and misjudgments in Hoxby (2000), all of which affect its results to some degree. What his discussion disguises is the sharp discontinuity that really characterizes his results. The reasonable changes that he suggests produce results that differ trivially or not at all from Hoxby (2000). Only when he introduces major errors does he generate results that differ meaningfully. As readers may confirm for themselves (since the replication dataset is available), the results in Hoxby (2000) can be replicated well and are robust to a wide array of reasonable changes in the variables and sample.

REFERENCES

Dataware. 1999. GNIS Digital Gazetteer. Reston, VA: United States Geological Survey for Dataware.

Gillis, Susan. 2004. Fort Lauderdale: The Venice of America. Mount Pleasant, SC: Arcadia Publishing.

Grogger, Jeffrey, and Derek Neal. 2000. "Further Evidence on the Effects of Catholic Secondary Schooling." In Brookings-Wharton Papers on Urban Affairs 2000, ed. William G. Gale and Janet Rothenberg Pack, 151-202. Washington, DC: The Brookings Institution Press.

Hoxby, Caroline M. 2000. "Does Competition Among Public Schools Benefit Students and Taxpayers?" American Economic Review, 90(5): 1209-38.

Hoxby, Caroline M. 2004. "Competition Among Public Schools: A Reply to Rothstein (2004)." National Bureau of Economic Research Working Paper 11216, reissued 2007.

Moulton, Brent R. 1986. "Random Group Effects and the Precision of Regression Estimates." Journal of Econometrics, 32(3): 385-97.

Parkman, Aubrey. 1983. History of the Waterways of the Atlantic Coast of the United States. National Waterways Study commissioned by the United States Army Engineer. Washington, DC.

Quality Education Data. 1988. National Education Database™. Electronic data. Denver: Quality Education Data.

Rothstein, Jesse. 2003. "Does Competition among Public Schools Benefit Students and Taxpayers? Comment." Unpublished.

Rothstein, Jesse. 2007. "Does Competition among Public Schools Benefit Students and Taxpayers? Comment." American Economic Review, 97(5): 2026-2037.

Tiebout, Charles. 1956. "A Pure Theory of Local Expenditures." Journal of Political Economy, 64(5): 416-24.

United States Department of Commerce, Bureau of the Census, Population Division. 2005. Historical Metropolitan Area Definitions. Text file released on the Internet. Revised March 2005, http://www.census.gov/population/www/estimates/pastmetro.html.

United States Department of Commerce, Bureau of the Census. 1999. Metropolitan Areas and Components, 1990 with FIPS Codes. Text file released on the Internet. Revised April 1999, http://www.census.gov/population/estimates/metro-city/90mfips.txt.

United States Department of Commerce, Bureau of the Census. 2006. Race and Hispanic Origin of Householder-Families by Median and Mean Income: 1947 to 2004, Detailed Table F-5. Text file released on the Internet, http://www.census.gov/hhes/www/income/histinc/f05.html.

United States Department of Education, National Center for Education Statistics. 1990. National Education Longitudinal Study of 1988, Base Year: Parent Component Data File User's Manual. NCES publication 2890-466.

United States Department of Education, National Center for Education Statistics. 1990. National
Education Longitudinal Study of 1988, Base Year Sample Design Report. NCES publication 90-463.

United States Department of Education, National Center for Education Statistics. 1992. National Education Longitudinal Study of 1988, First Follow-Up: Student Component Data File User's Manual, Volume I. NCES publication 92-030.

United States Department of Education, National Center for Education Statistics. 1996. National Education Longitudinal Study of 1988, NELS 88/94 Restricted-Access cd-rom.

United States Department of Education, National Center for Education Statistics. 2003. Private School Universe Survey. 1989-90, 1991-92, 1993-94, 1995-96 editions. Electronic data.

United States Department of Education, National Center for Education Statistics. 2002. School Locale Codes 1987-2000. NCES publication 2002-02, by Nancy Speicher.

United States Department of Education, National Center for Education Statistics. 2004. The Common Core of Data. 1987-88, 1989-90, and 1991-92 editions. Electronic data.

United States Geological Survey. 1993. 1:24000 Series....Quadrangle, [State Name]. Maps (serial).

United States Geological Survey. 2006. Frequently Asked Questions about GNIS [Geographic Names Information System], http://geonames.usgs.gov/faqs.html.

United States Geological Survey. 2004. Geographic Names Information System, State Files. Electronic files.
