
Essays in political economy and public finance

Elliott Ash

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

in the Graduate School of Arts and Sciences

Columbia University

2016
© 2016

Elliott Ash

All rights reserved


ABSTRACT

Essays in political economy and public finance

Elliott Ash

This dissertation consists of three research articles in political economy and public finance.
The first chapter provides evidence on the effect of electoral institutions on the perfor-
mance of public officials. Using panel data on state supreme courts between 1947 and 1994,
we measure the effects of changes in judicial electoral processes on judge work quality –
as measured by citations by later judges. Judges selected by non-partisan elections write
higher-quality opinions than judges selected by partisan elections. Judges selected by tech-
nocratic merit commissions write higher-quality opinions than either partisan-elected judges
or non-partisan-elected judges. Election-year politics reduces judicial performance in both
partisan and non-partisan election systems. Giving stronger tenure to non-partisan-selected
judges improves performance, while giving stronger tenure to partisan-selected judges has
no effect. These results are consistent with the view that technocratic merit commissions
have better information about the quality of candidates than voters, and that political bias
can reduce the quality of elected officials.
The second chapter contributes to recent work in political economy and public finance
that focuses on how details of the tax code, rather than tax rates, are used to implement
redistributive fiscal policies. I use tools from natural language processing to construct a high-
dimensional representation of tax code changes from the text of 1.6 million statutes enacted
by state legislatures since 1963. A data-driven approach is taken to recover the effective tax
code – the set of legal phrases in tax law that have the largest impact on revenues, holding
major tax rates constant. Exogenous variation in tax legislation from judicial districts is used
to capture revenue impacts that are solely due to changes in the tax code language, with the
resulting phrases providing a robust out-of-sample predictor of tax collections. I then test
whether political parties differ in patterns of effective tax code changes when they control
state government. Relative to Republicans, Democrats use revenue-increasing language for
income taxes but use revenue-decreasing language for sales taxes – consistent with a more
redistributive fiscal policy – despite making no changes on average to statutory tax rates.
These results are consistent with the view that due to their relative salience, changing tax
rates is politically more difficult than changing the tax code.
The third chapter reports evidence on the potential benefits to local labor markets of
increasing property taxes as a source of local government revenue. The data come from
three states (308 tax districts, 16 years) where tax districts reassess properties on a state-
mandated staggered cycle, resulting in exogenous variation in assessments and accompanying
taxes. I find that an increase in taxes due to random assessment causes economic expansion,
with an increase in local population and the number of local business establishments. These
effects appear to be driven by increases in government revenues and expenditures, rather
than by changes in borrowing behavior. These results suggest that property taxes are
inefficiently low in this sample of states.
Contents

List of Tables vi

List of Figures ix

1 The Performance of Elected Officials: Evidence from State Courts 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Institutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Data Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Merit Selection and Governor Appointment . . . . . . . . . . . . . . 12
1.3.2 Selection of Judges by Election . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Campaign Incentives . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Measuring Judge Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4.1 Performance Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4.2 Performance Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Effect of Being Up For Election . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.1 Empirical Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Effect of the Selection Process on Judge Quality . . . . . . . . . . . . . . . . 27
1.6.1 Empirical Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.7 Variation in Response to Incentives . . . . . . . . . . . . . . . . . . . . . . . 32
1.7.1 Effect of Judge Retention Process . . . . . . . . . . . . . . . . . . . . 32
1.7.1.1 Empirical Strategy . . . . . . . . . . . . . . . . . . . . . . . 32
1.7.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.7.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.7.2 Relative Election-Year Effect on Judges Selected by Different Processes 36
1.7.2.1 Empirical Strategy . . . . . . . . . . . . . . . . . . . . . . . 36
1.7.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.7.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2 The political economy of tax laws in the U.S. states 43


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2 Related Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3 Political economy of tax policy . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.1 Tax policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3.2 Tax Politics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4 Data on tax policy and state politics . . . . . . . . . . . . . . . . . . . . . . 57
2.4.1 Tax policy data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.4.2 State Politics Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5 Tax Legislation Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5.1 Raw Text Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5.2 Processing Text Features . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.5.3 Extracting Tax Code Text Features . . . . . . . . . . . . . . . . . . . 65
2.6 Constructing the effective tax code . . . . . . . . . . . . . . . . . . . . . . . 69
2.6.1 Ordinary Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.6.2 Instrumental Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.6.3 High-Dimensional IV Estimation . . . . . . . . . . . . . . . . . . . . 74
2.6.4 First Stage Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.6.5 Out-of-sample prediction of revenue with the effective tax code . . . . 79
2.6.6 Analysis of phrases that affect tax revenues . . . . . . . . . . . . . . 82
2.7 Effect of political control on tax policy . . . . . . . . . . . . . . . . . . . . . 87
2.7.1 Empirical strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.7.2 Effect of political control on tax revenues and tax rates . . . . . . . . 89
2.7.3 Tax code language associated with political control . . . . . . . . . . 91
2.8 The effective tax code and the politics of redistribution . . . . . . . . . . . . 93
2.8.1 Testing for the effect of political control on textually predicted tax
revenue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
2.8.2 Assessing the granularity of the redistributive consequences of tax code
language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.8.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

3 Property taxes and local labor markets: Evidence from staggered property
reassessments 106
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.2 Conceptual Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.2.1 Property Taxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.2.2 Firms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.2.3 Households . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.3 Data and Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.4 Empirical Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.4.1 Staggered Tax Reassessment . . . . . . . . . . . . . . . . . . . . . . . 119
3.4.2 Instrumental Variables Framework . . . . . . . . . . . . . . . . . . . 127
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.5.1 Effect of Property Tax Changes on Government Finances . . . . . . . 128
3.5.2 Effect of property taxes on local labor market . . . . . . . . . . . . . 130
3.5.3 Robustness checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

Bibliography 136

Appendix 164

A Ch. 1: The Performance of Elected Officials: Evidence from State Courts 164


A.1 Model Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
A.1.1 Effect of bias and noise on judge quality . . . . . . . . . . . . . . . . 164
A.1.2 Effect of campaign incentives on effort . . . . . . . . . . . . . . . . . 165
A.2 Empirical Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.2.1 Notes on Institutional Reforms . . . . . . . . . . . . . . . . . . . . . 168
A.2.2 Additional Regression Results . . . . . . . . . . . . . . . . . . . . . . 169
A.2.3 Effect of Retention Process in Election Years . . . . . . . . . . . . . . 185

B Ch. 2: The political economy of tax laws in the U.S. states 193
B.1 Vector Representation of Tokens and Documents . . . . . . . . . . . . . . . . 193
B.2 Factor IV Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
B.3 Example phrases with related court cases . . . . . . . . . . . . . . . . . . . . 197
B.4 Decomposition of the party effect on tax revenues . . . . . . . . . . . . . . . 198
B.5 Substituting Phrases to Increase Tax Revenues . . . . . . . . . . . . . . . . . 200

C Ch. 3: Property taxes and local labor markets: Evidence from staggered
property reassessments 204
C.1 Model Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
C.1.1 Firms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
C.1.2 Households . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
C.1.3 Feedback to the Labor Market . . . . . . . . . . . . . . . . . . . . . . 209
C.2 Tax changes with no revenue changes . . . . . . . . . . . . . . . . . . . . . . 209
C.2.1 Tax Spillovers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
C.2.2 Econometric Framework . . . . . . . . . . . . . . . . . . . . . . . . . 210
C.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

List of Tables

1.1 Judicial Selection and Retention Systems . . . . . . . . . . . . . . . . . . . . 6


1.2 Summary Statistics on Judge Characteristics by Selection System . . . . . . 9
1.3 Summary Statistics on Judge-Year Performance Variables . . . . . . . . . . . 18
1.4 Summary Correlations on Performance Indexes . . . . . . . . . . . . . . . . 23
1.5 Effect of Being Up For Election . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6 Effect of Judicial Selection System on Judge Quality . . . . . . . . . . . . . 30
1.7 Effect of Changing the Retention System on Incumbent Judge Performance . 35
1.8 Relative Election-Year Effect On Judges Selected by Different Processes . . . 38
1.9 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.1 Summary Statistics on Tax Data . . . . . . . . . . . . . . . . . . . . . . . . 59


2.2 Summary Statistics on State Politics Data . . . . . . . . . . . . . . . . . . . 60
2.3 Most Similar Phrases to Revenue Source Labels . . . . . . . . . . . . . . . . 67
2.4 Phrases with a Significant 2SLS Effect on Tax Revenues . . . . . . . . . . . 84
2.5 Effect of Political Control on State Tax Policy . . . . . . . . . . . . . . . . . 90
2.6 Phrases with a Significant Relation to Political Party Control . . . . . . . . 92
2.7 Effect of Party Control on Text-Predicted Tax Revenue . . . . . . . . . . . . 96
2.8 Party Control and Text-Predicted Tax Revenue (Additional Specifications) . 98
2.9 Granularity of the Revenue-Politics Relation of Tax Code Language . . . . . 102

3.1 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.2 State Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
3.3 Effect of Property Assessments on Collections . . . . . . . . . . . . . . . . . 125
3.4 Effect of property tax increase on government revenues . . . . . . . . . . . . 130
3.5 Effect of property tax increase in government expenditures . . . . . . . . . . 131
3.6 Effect of property tax on labor market outcomes . . . . . . . . . . . . . . . . 132
3.7 Effect of government expenditures on labor market . . . . . . . . . . . . . . 133
3.8 Effect of taxes on home sale price and borrowing . . . . . . . . . . . . . . . . 133

A.1 Summary Statistics on Judge-Year Performance Variables (Additional Outcomes) . . . . . . . 170
A.2 Effect of Being Up For Election (Output and Effort) . . . . . . . . . . . . . 171
A.3 Effect of Being Up For Election (Quality and Impact) . . . . . . . . . . . . 172
A.4 Effect of Being Up For Election (Additional Outcomes) . . . . . . . . . . . . 174
A.5 Effect of Judicial Selection System on Judge Quality (Output and Effort) . . 175
A.6 Effect of Judicial Selection System on Judge Quality (Quality and Impact) . 176
A.7 Effect of Judicial Selection System on Judge Quality (Additional Outcomes) 178
A.8 Effect of Judicial Selection System on Judge Quality (All Years) . . . . . . . 179
A.9 Effect of Retention Process (Output and Effort) . . . . . . . . . . . . . . . . 180
A.10 Effect of Retention Process (Quality and Impact) . . . . . . . . . . . . . . . 181
A.11 Effect of Retention Process (Additional Outcomes) . . . . . . . . . . . . . . 182
A.12 Relative Election-Year Effect on Judges Selected by Different Processes (Out-
put and Effort) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A.13 Relative Election-Year Effect on Judges Selected by Different Processes (Qual-
ity and Impact) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
A.14 Relative Election-Year Effect on Judges Selected by Different Processes (Ad-
ditional Outcomes) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
A.15 Effect of Retention Process in Election Years . . . . . . . . . . . . . . . . . . 188
A.16 Effect of Retention Process in Election Years (Additional Outcomes) . . . . . 189
A.17 Effect of Partisan-to-Uncontested Retention Reform in Election Years . . . . 191
A.18 Relative Effect of Retention Process in Election Years . . . . . . . . . . . . . 192

B.1 Decomposition of Party Control Effects on Government Revenue . . . . . . . 199


B.2 Examples of Replaced Phrases . . . . . . . . . . . . . . . . . . . . . . . . . . 202

C.1 First Stage: New Jersey Analysis . . . . . . . . . . . . . . . . . . . . . . . . 212


C.2 Effect of tax changes due to spillovers . . . . . . . . . . . . . . . . . . . . . . 213

List of Figures

2.5.1 Scanned Session Laws and Resulting OCR . . . . . . . . . . . . . . . . . . . 62


2.6.1 Federal Circuit Court Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.6.2 Distribution of First-Stage F-Statistic . . . . . . . . . . . . . . . . . . . . . . 78
2.6.3 Instrument Phrases Have a Stronger Effect on Own Endogenous Phrase . . . 78
2.6.4 Out-of-Sample Tax Revenue Predictions . . . . . . . . . . . . . . . . . . . . 81
2.8.1 Dynamic Effect of Democrat Control on Text-Predicted Revenue . . . . . . . 100

3.4.1 Housing Value Trends by Reval Cohort . . . . . . . . . . . . . . . . . . . . . 121


3.4.2 Change in Property Assessments in Reval Years . . . . . . . . . . . . . . . . 122
3.4.3 Property Tax Collections Trend . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.4.4 Reval Effects on Business and Residential Collections . . . . . . . . . . . . . 126
3.5.1 Reduced Form: Reval Cycle and Direct Expenditures . . . . . . . . . . . . . . 129

B.2.1 Instrument Phrases Have a Stronger Effect on Own Endogenous Phrase . . . 197

Acknowledgements

I am grateful to Bentley MacLeod and Suresh Naidu for their supervision of this dissertation.
I thank Wojciech Kopczuk, Massimo Morelli, and Brendan O’Flaherty for additional support
and feedback on these projects.
Many thanks to Yisehak Abraham, Ankeet Ball, Eli Ben-Michael, Josh Brown, Josh Bur-
ton, Matthew Buck, Eammonn Campbell, Matthew Chou, Lesley Cordero, Jesse Depaoli,
Seth Fromer, Gohar Harutyunyan, Archan Hazra, Montague Hung, Dong Hyeun, Mithun
Kamath, James Kim, Michael Kurish, Jennifer Kutsunai, Steven Lau, Sharon Liao, Sarah
MacDougall, Justin McNamee, Joao de Mello, Sourabh Mishra, Brendan Moore, Arielle
Napoli, Mallika Patkar, Bryn Paslawski, Olga Peshko, Aranya Ram, Daniel Reuter, Quin-
ton Robbins, Ricardo Rogriguez, Xiaofeng Shi, Carol Shou, Alex Swift, Raina Tian, Holly
Toczko, Tom Verderame, Anna Vladymyrska, Sam Waters, Sophie Wilkowske, Dustin Wil-
son, John Yang, Ding Yuan, Geoffrey Zee, Qing Zhang, Grace Zheng, Fred Zhu, and Jon
Zytnick for helpful research assistance.
Columbia University’s Program for Economic Research, the National Science Foundation
Graduate Research Fellowship Program, NSF Grant SES-1260875, NSF Grant SES-145932,
and the Lincoln Institute for Land Policy provide financial support for this research.

Dedication

This dissertation is dedicated to my wife. She kept me from floating out the window on a
word cloud.

Chapter 1

The Performance of Elected Officials:


Evidence from State Courts

Elliott Ash and W. Bentley MacLeod

This paper provides evidence on the effect of electoral institutions on the performance of
public officials. Using panel data on state supreme courts between 1947 and 1994, we mea-
sure the effects of changes in judicial electoral processes on judge work quality – as mea-
sured by citations by later judges. Judges selected by non-partisan elections write higher-
quality opinions than judges selected by partisan elections. Judges selected by technocratic
merit commissions write higher-quality opinions than either partisan-elected judges or non-
partisan-elected judges. Election-year politics reduces judicial performance in both partisan
and non-partisan election systems. Giving stronger tenure to non-partisan-selected judges
improves performance, while giving stronger tenure to partisan-selected judges has no effect.
These results are consistent with the view that technocratic merit commissions have better
information about the quality of candidates than voters, and that political bias can reduce
the quality of elected officials.

1.1 Introduction

The goal of this paper is to contribute to our understanding of the labor market for elected
officials. As Epstein et al. [2013] observe for federal judges, the decision-making powers of
public officials can have large impacts upon our lives, yet their pecuniary rewards are by
design only weakly related to their performance. In consequence, a variety of concerns –
including career rewards [Ferejohn, 1986, Alesina and Tabellini, 2007, Dewatripont et al.,
1999], professionalism [Wilensky, 1964], and prosociality [Benabou and Tirole, 2006] – can
be decisive in determining the behavior of public officials. These motivations are in large part
intrinsic to the individual, meaning that the performance of public officials is determined
in part by the type of person who is selected to serve in the public interest.
In this paper we exploit the fact that the method used to select and reappoint judges to
state appellate courts varies over time and across states. In contrast to the U.S. Supreme
Court, where justices have lifetime tenure, most U.S. states use one of three types of regular
review: partisan elections, in which judges are explicitly affiliated with a political party on
the ballot; non-partisan systems, where there is a vote, but party affiliation is not listed; and
finally, a merit system in which judges are nominated by a commission of experts – senior
attorneys and retired judges – and confirmed by the governor.
There is a lively debate regarding which is the superior system. The fact that states have
experimented with different systems illustrates that it is not clear which system is optimal.
There is a body of research that shows that the political affiliation of a judge at the margin
affects the decisions that they make [Huber and Gordon, 2004, Lim, 2013, Canes-Wrone et al.,
2014]. Yet regardless of party affiliation, judges are tasked with interpreting and applying
the law as written. In a common law system where judges follow their predecessors, the
quality of a decision can have a large impact on the evolution of legal rules. The goal of this
paper is to assess how variations in the appointment system affect the quality of judicial
decisions.
To address these questions we have created a large panel dataset consisting of 400,000
opinions written by more than 1500 judges for all fifty states for the years 1947 through
1994. With this data we are able to construct a large number of diagnostic performance
measures, for example the number of decisions written, the length of decisions, and how
often those decisions are cited by later judges. To make the results intuitive, we construct
five performance indexes from the individual performance variables. These include Total
Output, Effort Per Case, Discretionary Opinions, Case Quality, and Total Impact. These
indexes provide useful summaries of how judges change their behavior in response to changes
in electoral procedures.
This data is uniquely suited to study the impact of appointment systems upon the per-
formance of public officials. First, the job of judging has varied little over this time period,
meaning that we can more credibly assign variations in outcomes to variations in the individual
performance by judges. Second, there is a great deal of experimentation over time by
states in the selection and retention processes.
This variation allows us to carry out a set of natural experiments to study how the quality
and performance of judges responds to electoral reforms. First, following the approach in Ash
and MacLeod [2015], we can explore within-judge changes of the electoral cycle. Specifically,
we compare the performance of a judge in a year in which he is up for election with years
in which he is not up for election. We find that in contested systems, election-year politics
takes away time from work. In an election year performance is reduced, in both partisan
and non-partisan elections. In uncontested systems where judges do not have a challenger,
there is no decrease in performance during election years.
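As a purely illustrative sketch of the within-judge comparison described above, the following Python snippet estimates an election-year effect after removing each judge's own mean from both the election-year indicator and the outcome. The function name, the record layout, and the toy numbers are hypothetical; they are not the specification or the data used in this chapter.

```python
from collections import defaultdict


def within_judge_effect(records):
    """Within-judge (fixed-effects) slope of an outcome on an election-year flag.

    records: iterable of (judge_id, election_year_flag, outcome) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    by_judge = defaultdict(list)
    for judge, flag, outcome in records:
        by_judge[judge].append((float(flag), float(outcome)))

    num = 0.0
    den = 0.0
    for rows in by_judge.values():
        # Demean the flag and the outcome within each judge.
        mean_flag = sum(f for f, _ in rows) / len(rows)
        mean_out = sum(y for _, y in rows) / len(rows)
        for f, y in rows:
            num += (f - mean_flag) * (y - mean_out)
            den += (f - mean_flag) ** 2
    if den == 0.0:
        raise ValueError("no within-judge variation in the election-year flag")
    return num / den


# Made-up toy data: each judge's outcome is 2 units lower in an election year.
toy = [
    ("A", 0, 10.0), ("A", 1, 8.0), ("A", 0, 10.0),
    ("B", 0, 5.0), ("B", 1, 3.0),
]
effect = within_judge_effect(toy)
```

Because each judge is compared only with the same judge in other years, permanent differences in ability across judges drop out of the estimate.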
In addition to the electoral cycle, we study the within-judge effect of reforming the
retention process. Moving from partisan to non-partisan elections reduces performance, while
moving from non-partisan to uncontested elections increases performance. However, there
is no effect on performance moving from partisan to uncontested elections. We demonstrate
that partisan-selected judges do not change their electoral behavior – even after the reform,
they reduce performance during election years.
A more challenging question is to measure the selection effects of the different electoral
processes. We do this by comparing the performance of judges on the same court, making
decisions in the same year, but selected under different systems. After adding a number of
controls and carrying out robustness checks, we find consistent evidence that judges selected
by a merit commission are better at their jobs than judges selected by voters.
In a recent paper, Choi et al. [2010] explore a similar set of issues using data from
1998, 1999 and 2000. First, they find that the correlation between appointment systems
and measures of judge effort are quite unstable and sensitive to the control variables that
they use. In their most highly controlled specification, the partisan judges are estimated to
work harder (write more decisions) than judges selected under other systems. The results on
quality are more stable, but tend to be close to zero, with a judge selected under a partisan
system having a slightly negative effect upon quality.
These results are interesting for two reasons. First, they illustrate how the cross-section
can give a very different picture from estimates that are able to more tightly control for
judge characteristics. Second, as a practical matter, state legislators would not have access
to estimates such as ours, but would have to base their choice of appointment system upon
observations of their current system and how it compares to other states. From our perspec-
tive, if the cross-sectional estimates are unstable, this implies that the choice of appointment
system is more likely to be random, and hence our identification strategy is more likely to yield
a measure of the causal effect of the change in appointment system.
To help with the interpretation of our results, we introduce a simple model of the appoint-
ment system based on Condorcet’s [1785] observation that elections are a way of aggregating
information. Specifically, we suppose that the representative voter gets a noisy signal of
judge ability. In such a model a merit plan can be viewed as a system in which the repre-
sentative voter (governor) receives a higher-quality signal of performance, and accordingly
the expected ability of the selected judge is higher than under a system that relies upon the
public’s impression of a judge.
Partisan elections can be distinguished from non-partisan elections by supposing that the
representative voter prefers a judge from her preferred political party. This is modeled by
adding a bias b in favor of the voter’s party. As the bias increases, the expected ability falls
and eventually approaches the expected ability that a one-candidate election would produce.
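The comparative statics described here can be illustrated with a minimal Monte Carlo sketch. It assumes, purely for illustration, two candidates whose abilities are independent standard normal draws, normally distributed signal noise, and a bias added to the signal of the candidate from the voter's party. The function name, the distributions, and the parameter values are assumptions and are not taken from the model in this chapter.

```python
import random


def mean_selected_ability(noise_sd, bias=0.0, n_draws=20000, seed=0):
    """Monte Carlo average ability of the winner of a two-candidate contest.

    Illustrative assumptions: abilities and signal noise are independent
    normal draws; candidate 0 belongs to the representative voter's
    preferred party and has `bias` added to that candidate's signal.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        ability = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        signal_0 = ability[0] + rng.gauss(0.0, noise_sd) + bias
        signal_1 = ability[1] + rng.gauss(0.0, noise_sd)
        # The voter picks the candidate with the higher (biased) signal.
        total += ability[0] if signal_0 >= signal_1 else ability[1]
    return total / n_draws


# Hypothetical parameter values: a precise signal stands in for merit selection.
merit = mean_selected_ability(noise_sd=0.5)
nonpartisan = mean_selected_ability(noise_sd=2.0)
partisan = mean_selected_ability(noise_sd=2.0, bias=3.0)
# A very large bias always selects candidate 0, as in a one-candidate election.
one_candidate = mean_selected_ability(noise_sd=2.0, bias=50.0)
```

Under these assumptions the average ability of the winner falls as the signal becomes noisier and as the bias grows, and it is close to the population mean of zero when the bias is large enough that the preferred-party candidate always wins.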
We find that this simple model is broadly consistent with the evidence on state supreme
court judges. This evidence is more broadly consistent with the early rational-choice ap-
proaches of Downs [1957] and Ferejohn [1986], in which voters use their information to make
the best decisions they can, conditional upon their policy preferences. But more information
is not always better; more information on candidate quality can improve performance [see
Pande, 2011], but more information on political affiliation can reduce performance.
The rest of the paper is organized as follows. Section 1.2 provides institutional background
on state supreme court selection and retention. Section 1.3 introduces a model of the
selection and incentive effects of judicial elections. Section 1.4 discusses the measurement of
judge performance. Sections 1.5, 1.6, and 1.7 report the results on the election-year effect,
the selection process, and variation in response to incentives, respectively. Section 1.8 provides
a concluding discussion.

1.2 Background

This section provides relevant background for the theoretical and empirical analysis. First,
Subsection 1.2.1 describes the electoral institutions that provide our treatment variation.
Subsection 1.2.2 provides an overview of our data sources. Subsection 1.2.3 describes some
related literature.

1.2.1 Institutions

Our institutional setting is the set of state supreme courts, also known as state courts of
last resort. As described in greater detail in Ash and MacLeod [2015], these courts serve as

Table 1.1: Judicial Selection and Retention Systems

State (Years)        Selection     Retention     State (Years)          Selection     Retention
Alaska               Merit         Uncontested   Mississippi            Partisan      Partisan
Alabama              Partisan      Partisan      Montana                Non-Partisan  Non-Partisan
Arkansas             Partisan      Partisan      North Carolina         Partisan      Partisan
Arizona (-1974)      Non-Partisan  Non-Partisan  North Dakota           Non-Partisan  Non-Partisan
Arizona (1975-)      Merit         Uncontested   Nebraska (-1962)       Partisan      Partisan
Colorado (-1966)     Partisan      Partisan      Nebraska (1963-)       Merit         Uncontested
Colorado (1967-)     Merit         Uncontested   New Mexico (-1988)     Partisan      Partisan
Florida (-1971)      Partisan      Partisan      New Mexico (1989-)     Partisan      Uncontested
Florida (1972-1976)  Non-Partisan  Non-Partisan  Nevada                 Non-Partisan  Non-Partisan
Florida (1977-)      Merit         Uncontested   New York (1978-)       Partisan      Partisan
Georgia (-1984)      Partisan      Partisan      Ohio                   Partisan      Non-Partisan
Georgia (1985-)      Non-Partisan  Non-Partisan  Oklahoma (-1967)       Partisan      Partisan
Iowa (-1962)         Partisan      Partisan      Oklahoma (1968-)       Merit         Uncontested
Iowa (1963-)         Merit         Uncontested   Oregon                 Non-Partisan  Non-Partisan
Idaho                Non-Partisan  Non-Partisan  Pennsylvania (1969-)   Partisan      Uncontested
Illinois (-1964)     Partisan      Partisan      South Dakota (-1980)   Non-Partisan  Non-Partisan
Illinois (1965-)     Partisan      Uncontested   South Dakota (1981-)   Merit         Uncontested
Indiana (-1970)      Partisan      Partisan      Tennessee (-1971)      Partisan      Partisan
Indiana (1971-)      Merit         Uncontested   Tennessee (1972-1977)  Merit         Uncontested
Kansas (-1958)       Partisan      Partisan      Tennessee (1978-)      Partisan      Partisan
Kansas (1959-)       Merit         Uncontested   Texas                  Partisan      Partisan
Kentucky (-1975)     Partisan      Partisan      Utah (-1951)           Partisan      Partisan
Kentucky (1976-)     Non-Partisan  Non-Partisan  Utah (1952-1985)       Non-Partisan  Non-Partisan
Louisiana            Partisan      Partisan      Utah (1986-)           Merit         Uncontested
Maryland (-1976)     Non-Partisan  Non-Partisan  Washington             Non-Partisan  Non-Partisan
Maryland (1977-)     Merit         Uncontested   Wisconsin              Non-Partisan  Non-Partisan
Michigan             Partisan      Non-Partisan  West Virginia          Partisan      Partisan
Minnesota            Non-Partisan  Non-Partisan  Wyoming (-1972)        Non-Partisan  Non-Partisan
Missouri             Merit         Uncontested   Wyoming (1973-)        Merit         Uncontested

Notes. This table lists the election systems for state supreme court judges observed in our data.
Election-system reforms are indicated by cell borders.

the state judiciary’s analogue to the U.S. Supreme Court, where judges review state court
cases rather than federal court cases. In each case, a judge writes an opinion explaining the
decision. The job of a supreme court judge does not change much over the course of a
career, and it does not vary across states.
While the work tasks are the same, the rules for selecting and retaining appellate judges
vary across states and over time. These rules are listed in Table 1.1, with rule changes indicated
by cell borders. These changes are used in our empirical section to identify the incentive and
selection effects of changing electoral systems.
We study three major regimes for selecting and retaining appellate judges. There is a
large literature in political science and political economy examining how these systems affect
voter behavior and the politics of judicial decision-making [e.g. Shepherd, 2009, Canes-Wrone
et al., 2010, Lim and Snyder, 2015]. There is also a separate legal scholarship discussing the
implications of these systems for legal rulemaking [e.g. Pozen, 2010]. For a discussion of the
political motivations behind reforms to these regimes see Hanssen [2004].
The first system, partisan elections, is used for both selection of new judges and retention
of incumbent judges. For these elections, judges are members of a political party, Republican
or Democrat. They must win a primary election for their party before running in a general
election, where their political affiliation is labeled on the ballot.1 Incumbent judges rarely
face a credible challenge in the primary, but in the general election they usually face a
challenger from the opposition political party.
Second, non-partisan elections are also used for both selection and retention. In this
system there are competitive elections, but there are no primaries and party affiliations are
not on the ballot. There are generally two candidates, an incumbent and a challenger, but
the incumbent is not identified as such.
The third major system is merit selection with uncontested retention elections, also known
as the Missouri Plan. In this system, judges are nominated by a commission of experts –
senior attorneys and retired judges – and confirmed by the governor. Incumbent judges
face an up-or-down retention vote with no challenger. This system is designed to be more
meritocratic, and to impose weaker political incentives, than electoral selection. In a fourth
hybrid system, judges are initially selected through partisan elections but thereafter face
uncontested retention elections.
These institutions provide the variation in selection and incentives that we study in the
empirical analysis. In the next section we formally analyze the key differences between these
procedures.
1
Ohio and Michigan state judicial elections are difficult to classify within the partisan/non-partisan
dichotomy because they have partisan primaries and nomination processes, but the political party is not on
the ballot in general elections. Following Nelson et al. [2013], we classify these states as partisan elections.
However, coding them as non-partisan, or leaving them out of the analysis, does not change our results.

1.2.2 Data Overview

The dataset used for the empirical analysis is an extension of that used in Ash and MacLeod
[2015]. It merges information on judge biographies, state-level court institutions, and
published judicial opinions. These data allow panel estimates of the effects of court institutions
on judge performance.
We have biographical data on almost all the judges working at state supreme courts
between 1900 and today. Table 1.2 reports summary statistics on the characteristics of
judges working in one of the three selection systems discussed in Section 1.2.1. For many of
the variables, the systems are comparable. Relative to the partisan judges, the non-partisan
and merit judges are more likely to be female. Merit judges are the most likely to have
judicial experience, while partisan judges are the most likely to have political experience.
Non-partisan and merit judges have longer career lengths. Merit judges are the least likely
to lose re-election.
Our performance measures were constructed from published state supreme court opinions
for the years 1947 through 1994, obtained (along with some annotated metadata) from
bloomberglaw.com. The full sample includes 1,025,461 cases. Because we are interested
in studying the behavior of individual judges, we drop opinions that do not have a named
author (per curiam decisions). We also drop cases that are less than seven sentences in
length – these are summary orders such as cert denials. The restricted sample includes
387,905 majority opinions (plus attached discretionary opinions), about 25 cases per judge
per year on average.

1.2.3 Literature

As previously mentioned, Choi et al. [2010] find in the cross section that elected judges write
more opinions but merit-selected judges write more highly cited opinions. Other work in this
vein includes Hall and Bonneau [2006], who find that a judge’s qualifications – experience,
salary, and other observable characteristics – increase the chance of being reelected.

Table 1.2: Summary Statistics on Judge Characteristics by Selection System
Partisan Elections Non-Partisan Elections Merit Selection
Mean Std. Dev. Mean Std. Dev. Mean Std. Dev.

Background
Start Age 53.6969 8.8354 52.8235 8.3454 52.1143 7.8792
Female 0.0305 0.1721 0.0663 0.2491 0.0616 0.2410
Top School 0.0973 0.2966 0.1040 0.3058 0.1081 0.3114

Previous Experience
Private Practice 0.6862 0.4644 0.8141 0.3897 0.6645 0.4737
Judiciary 0.6082 0.4886 0.5630 0.4969 0.6818 0.4673
Politics 0.2818 0.4503 0.2269 0.4197 0.1118 0.3162
Academia 0.0879 0.2834 0.1076 0.3105 0.1060 0.3088

Partisan Affiliation
Republican 0.5517 0.4675
Democrat 0.4483 0.4412

Career Length 11.8401 8.6620 13.6512 9.6248 13.2120 7.0797

How Ended
Retired 0.3293 0.4703 0.3896 0.4883 0.3825 0.4871
Resigned 0.1111 0.3145 0.2098 0.4077 0.0553 0.2291
Died in Office 0.1070 0.3094 0.1144 0.3188 0.0553 0.2291
Lost Election 0.0650 0.2468 0.0409 0.1983 0.0138 0.1170
Impeached 0.0054 0.0735 0.0000 0.0000 0.0046 0.0679

Judges 738 367 217


Notes. Biographical information by judge election system. Observation is a judge. Start Age is judge age upon joining the court. Female is a
dummy for being female. Top School means the judge attended law school at Yale, Harvard, Columbia, Stanford, or Chicago. The Previous
Experience items equal one if the judge has previous experience in the respective area. Republican is a dummy for being Republican, Democrat for
being Democrat. Career Length is number of years working on the court, conditional on having left the court before 2014. The How Ended items
equal one if the judge's state supreme court judgeship ended for this reason.
Lim and Snyder [2015] provide especially useful evidence in our context. They find that
bar association evaluations of judge candidate quality have a large effect on voting and
electoral success in non-partisan elections, reflecting that voters care about judge quality. In
partisan elections, however, the bar association evaluations have no effect on voter choices
– the information on quality is crowded out by the information on political affiliation. In
uncontested elections, the bar association evaluation correlates with voting but does not
affect electoral success because virtually all incumbents are retained.
The literature on elections and judge quality is part of a much larger literature examining
the effects of judge elections on the content of judicial rulings. For example, a range of
papers have shown that judges impose harsher criminal sentences in response to stronger
electoral pressure [Huber and Gordon, 2004, Gordon and Huber, 2007, Lim, 2013, Berdejo
and Yuchtman, 2013, Iaryczower et al., 2013, Park, 2014]. More generally, previous papers
have demonstrated that the politics of selection matter for the ideology of the selected
judges [Landes and Posner, 2009, Epstein et al., 2013], and that incumbent judges respond
to changes in the political preferences of the body responsible for retaining them [Shepherd,
2009, Canes-Wrone et al., 2010].
More broadly, our results add to the emerging empirical literature in political economy
on how to design the institutions for selecting and rewarding public officials. These papers
include Besley and Case [1995a], Besley and Coate [2003], List and Sturm [2006], Besley et al.
[2010], and Ash et al. [2015a]. Our focus on the information available to voters is relevant
to the literature on transparency, which includes Snyder and Stromberg [2010], Ferraz and
Finan [2011], and Pande [2011].
Deserving special mention are the models in Alesina and Tabellini [2007, 2008], analyzing
the differences in incentives for elected politicians versus tenured bureaucrats. One can view
the variation in the way state judges are (re-)appointed as a natural test bed for these
ideas. Judges selected and retained by partisan or non-partisan elections can be treated
as “politicians,” while judges selected by merit commissions and given strong tenure can
be treated as “bureaucrats.” Our evidence that merit-selected judges produce more highly
cited decisions is consistent with the hypothesis that in the case of appellate court judges,
individuals selected to be good “bureaucrats” perform as well as or better than elected
politicians.

1.3 Model

In this section we introduce a model based upon Condorcet’s [1785] jury theorem that views
voting as an information revelation problem.2 The model provides a simple framework that
is sufficiently rich to make clear predictions for the cases we consider.3 It is assumed that
each voter has a noisy measure of judge quality that is used to make their decisions. In
addition they care about the political views of judges, which is modeled as a bias in favor of
judges from their preferred party.
More precisely, suppose that there is an opening for a judge for which there are two
candidates, A and B. One of these could be an incumbent, but we abstract from this
and suppose that each judge j has a quality level qj drawn from a normal distribution:
qj ∼ N (0, 1) , j ∈ {A, B}. It is assumed that these draws are uncorrelated, though different
jurisdictions may have different distributions. The socially desirable outcome is to choose
the most able judge, though a judge’s political views may bias this decision.
The remaining subsections analyze how differences in information on judge candidates
may influence the expected quality qj of the judge selected, as well as the judge’s performance
once he is in office. Subsection 1.3.1 introduces a merit selection baseline where the better
judge is always selected. Subsection 1.3.2 considers the consequences of electoral selection,
where voters do not have perfect information, and may be biased by politics. Subsection 1.3.3
looks at the effects on an incumbent judge of electoral campaign demands.
2
See Young [1988] for a discussion.
3
See Ashworth and de Mesquita [2008] and Ashworth et al. [2015] for more sophisticated versions of this
class of models.

1.3.1 Merit Selection and Governor Appointment

The salient feature of merit selection is that there is a committee that looks carefully at each
potential candidate. We model this by supposing that qj is observable to members of the
commission. The merit commission is assumed to be able to communicate its finding clearly
to the governor, who in turn will select the more able candidate. Thus, the expected quality
of a judge under an appointment system is the expected value of the larger of the two draws:

\[ \bar{q}^M = E\{\max\{q_A, q_B\}\} = \frac{1}{\sqrt{\pi}} > 0. \]

If the expected ability of a randomly chosen candidate is 0, then selecting the better one
from a pool of only two judges results in positive expected quality. Increasing the size of the
pool would simply increase the expected quality of the appointed judge; it is the same logic
as Condorcet’s [1785] jury theorem.
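As a quick numerical check of the expression above, the following sketch draws candidate qualities from a standard normal distribution and averages the best draw in each pool. The function name, sample size, and seed are illustrative assumptions and are not part of the original analysis.

```python
import math
import numpy as np

def expected_best_of_pool(pool_size: int, n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max(q_1, ..., q_n)] for i.i.d. N(0, 1) qualities."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_draws, pool_size))
    # The merit commission observes quality directly and picks the best candidate.
    return float(q.max(axis=1).mean())

if __name__ == "__main__":
    print("simulated, pool of 2:", expected_best_of_pool(2))
    print("1 / sqrt(pi):        ", 1 / math.sqrt(math.pi))
    print("simulated, pool of 5:", expected_best_of_pool(5))
```

With two candidates the simulated mean should sit close to 1/sqrt(pi), and a larger pool should give a higher value, in line with the remark on pool size.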
We can compare this to an appointment system where political bias enters. As a matter
of convention we suppose that the governor (and later the representative voter) prefers Judge
A. We can model this as a bias b and suppose that Judge A is chosen if and only if:

qA + b ≥ qB . (1.3.1)

Let I (qA, qB , b) = 1 if (1.3.1) holds and zero otherwise. Let

q̄ G (b) = E {qA I (qA, qB , b) + (1 − I (qA, qB , b)) qB } . (1.3.2)

In the appendix we show:

Proposition 1. The average quality of judges chosen under an unbiased merit panel is
higher than that under governor appointment with bias: q̄ M = q̄ G (0) > q̄ G (b) , b ≠ 0. The
difference in quality rises as the level of political bias increases: q̄ G (b) is strictly decreasing
in b.

This rather intuitive result illustrates the cost associated with bias. In the absence of
any bias the best candidate is chosen. However, preference for one or the other candidate
can lead to the less able individual being chosen in some cases.
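The cost of bias can be illustrated with a small simulation of the selection rule (1.3.1). This is a sketch under the model's assumptions (independent standard normal qualities); the function name, bias values, and seed are hypothetical choices made for illustration.

```python
import math
import numpy as np

def quality_under_bias(b: float, n_draws: int = 400_000, seed: int = 0) -> float:
    """Monte Carlo estimate of expected selected quality when A is chosen iff q_A + b >= q_B."""
    rng = np.random.default_rng(seed)
    q_a = rng.standard_normal(n_draws)
    q_b = rng.standard_normal(n_draws)
    choose_a = q_a + b >= q_b
    # Quality of whichever candidate the biased decision maker selects.
    return float(np.where(choose_a, q_a, q_b).mean())

if __name__ == "__main__":
    for b in (0.0, 0.5, 1.0, 2.0):
        print(f"b = {b:.1f}: simulated expected quality = {quality_under_bias(b):.3f}")
```

Using the same seed for every value of b means the comparison across bias levels is made on identical quality draws, so differences reflect the bias alone.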

1.3.2 Selection of Judges by Election

Next we consider the effect on quality of selecting judges by election. This is modeled by
supposing that the quality of information held by the electorate is lower than that of the
merit panel. Suppose that the representative voter gets a signal of judge j’s quality:

\[ s_j = q_j + \varepsilon_j \]

where $\varepsilon_j$ is normally distributed with mean zero and variance $\sigma_j^2$. The precision is defined by
$\rho_j = 1/\sigma_j^2$. The representative voter observes the two signals and then assesses the relative
quality of the judges.
We distinguish partisan and non-partisan electoral systems by introducing bias b. As a
matter of convention suppose that judge A comes from the same party as the representative
voter, where b represents the voter’s utility weight on partisan affiliation. In a non-partisan
system b = 0, while a partisan system is characterized by b > 0.
After observing $s_j$, the voter’s posterior distribution on $q_j$ is normal with mean

\[ E\{q_j \mid s_j\} = \pi_j s_j \]

and precision $1 + \rho_j$, where $\pi_j = \rho_j / (1 + \rho_j)$ is the weight assigned to $s_j$. The representative voter
selects Judge A if and only if

\[ \pi_A s_A + b \ge \pi_B s_B . \]

As the bias in favor of a judge from the same party increases, the probability that Judge A is
selected increases. This can be understood as reducing the competitiveness of the election.

The expected quality of a judge selected under an electoral system with bias b is defined by:

q̄ E (b) = E {qA I (πA sA , πB sB , b) + qB (1 − I (πA sA , πB sB , b))} . (1.3.3)

In the appendix we show:

Proposition 2. When voters do not perfectly observe judge quality, the average quality of
elected judges is lower than that of merit-selected judges:

\[ \bar{q}^M \ge \bar{q}^G(b) > \bar{q}^E(b). \]

Average judge quality falls with the strength of political bias, and therefore quality with
partisan elections is lower than that with non-partisan elections: $\bar{q}^E(b)$ falls with b.

As in the previous case, bias reduces the effectiveness of the electoral system.
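The electoral selection rule can be simulated in the same spirit. The sketch below assumes, for simplicity, that both candidates' signals have the same precision rho; that restriction, the function name, and the parameter values are illustrative assumptions rather than part of the original analysis.

```python
import numpy as np

def quality_under_election(b: float, rho: float, n_draws: int = 400_000, seed: int = 0) -> float:
    """Monte Carlo estimate of expected selected quality under noisy signals and bias b."""
    rng = np.random.default_rng(seed)
    q_a = rng.standard_normal(n_draws)
    q_b = rng.standard_normal(n_draws)
    sigma = 1.0 / np.sqrt(rho)                 # signal noise s.d. implied by precision rho
    s_a = q_a + sigma * rng.standard_normal(n_draws)
    s_b = q_b + sigma * rng.standard_normal(n_draws)
    weight = rho / (1.0 + rho)                 # posterior weight on the signal
    choose_a = weight * s_a + b >= weight * s_b
    return float(np.where(choose_a, q_a, q_b).mean())

if __name__ == "__main__":
    for b in (0.0, 0.5):
        for rho in (1.0, 100.0):
            print(f"b = {b:.1f}, rho = {rho:5.1f}: "
                  f"simulated expected quality = {quality_under_election(b, rho):.3f}")
```

As the precision grows, the unbiased election should approach the merit benchmark, while any positive bias lowers expected quality at a given precision.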

1.3.3 Campaign Incentives

We now build upon the previous framework to analyze the incentives for a judge seeking re-
election. The most direct way to introduce campaign effort is to suppose that effort enhances
the quality of the signal observed by voters.
We formalize this idea as follows. We suppose that the individuals have a normal level
of effort for their work, given by ȳA and ȳB for the incumbent A and the challenger B,
respectively. In an election year the individuals divert effort to election-year politics. While
B is a challenger and is not sitting on the court, for simplicity we assume he faces the same
decision problem as the incumbent A. This approximates the situation where B is a judge
on another court – a federal court for example, or the state’s intermediate appellate court.
Thus in an election year it is assumed that the individuals supply yA and yB to their
jobs, resulting in election year effort:

eA = ȳA − yA ≥ 0,

eB = ȳB − yB ≥ 0.

The consequence is that the representative voter chooses judge A over judge B if and only if

πA (sA + eA ) + b ≥ πB (sB + eB ).

The probability of A winning is:

pA (eA , eB |qA , qB ) = E {I (πA (sA + eA ), πB (sB + eB ), b) |qA , qB } .

Correspondingly, define pB (eA , eB |qA , qB ) = 1 − pA (eA , eB |qA , qB ).


We suppose that candidate j has preferences:

\[ U_j = B \, p_j(e_A, e_B \mid q_A, q_B) - C_j(e_j), \]

where B is the intrinsic value from winning the election and $C_j(e_j) = C_j(\bar{y}_j - y_j)$ is the utility
cost of campaign effort. The campaigning cost $C_j(e)$ is assumed to be twice differentiable in
e and satisfies $C_j(0) = C_j'(0) = 0$, $C_j'' > 0$. This guarantees an interior solution.
Let us suppose that A is a sitting judge, while B is a potential challenger. In our data we
can observe the output of judges, and hence both ȳA , the output before an election year, and
yA , the output in an election year, are observable. Consider first uncontested elections
(the “Missouri Plan”), in which judges do not face a challenger. This can be understood
in the model notation as eB = 0; the challenger sets zero campaign effort. The incumbent
judge A sets eA accordingly.4
Next, we consider the equilibrium when there is an active challenger (details in the
appendix). If we suppose that ρA = ρB , the problem is symmetric and we have eA = eB .
The first-order conditions for effort in this case are given by:

\[ C_j'(e_j) = \sqrt{\frac{\rho}{2}} \, \phi\left( \sqrt{\frac{\rho}{2}} \, (q_A - q_B) + b \, \frac{1+\rho}{\sqrt{2\rho}} \right). \]    (1.3.4)

where φ(·) is the standard normal pdf. Since φ (x) achieves its maximum value at x = 0, we
see that effort is highest when:

\[ (q_A - q_B) + b \left( \frac{1+\rho}{\rho} \right) = 0. \]    (1.3.5)

These observations can be summarized as follows:

Proposition 3. When voters have the same quality of information regarding candidates,
the candidates choose the same level of campaign effort. Moreover, the amount of effort
is highest in the most competitive races, that is, when the left-hand side of (1.3.5) is close to zero. In particular, campaign
effort decreases with the bias b. This means that campaigns reduce judging effort more under
non-partisan elections than under partisan elections.

In the appendix we prove that an equilibrium to the campaign effort game exists and
that the effort of Judge A is greater than that of candidate B if and only if the electorate has a
better measure of Judge A’s quality.
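To make the comparative statics concrete, the sketch below evaluates the right-hand side of the first-order condition (1.3.4) in the symmetric case, read as sqrt(rho/2) times the standard normal density evaluated at sqrt(rho/2)(qA - qB) + b(1 + rho)/sqrt(2 rho). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_benefit_of_effort(q_a: float, q_b: float, b: float, rho: float) -> float:
    """Right-hand side of (1.3.4) for the symmetric case rho_A = rho_B = rho."""
    scale = math.sqrt(rho / 2.0)
    return scale * normal_pdf(scale * (q_a - q_b) + b * (1.0 + rho) / math.sqrt(2.0 * rho))

if __name__ == "__main__":
    # Equal-quality candidates: the marginal return to campaigning falls as bias grows.
    for b in (0.0, 0.5, 1.0):
        print(f"b = {b:.1f}: marginal benefit = {marginal_benefit_of_effort(0.0, 0.0, b, 1.0):.4f}")
```

Because the cost function is convex, a lower marginal benefit translates into lower equilibrium campaign effort, which is the mechanism behind the prediction that partisan elections divert less effort from judging.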
This proposition has the following implications in our data. First, uncontested elections
are the least competitive and have the weakest electoral incentives. Among the electoral
systems, they should have a smaller effect on judging effort than partisan elections or non-
partisan elections. Second, if non-partisan elections have less bias, then they are more
competitive than partisan elections. Therefore non-partisan elections should have a larger
negative effect on judging effort than partisan elections.
4
The only caveat is for judges who feel they may not get re-elected for whatever reason (for example, bad
press from a high-profile case). Thus, there may be some judges who do exert effort, in which case eA may
be positive. There is never any reason to observe a negative effort level.

1.4 Measuring Judge Performance

In this section we discuss the problem of measuring judicial performance. We have a large
number of performance variables that could each be used to assess judge performance, making
interpretation of the results difficult. Looking at the separate treatment effects on all of
these outcomes would present a multiple-comparisons problem.5 We resolve this issue by
aggregating the variables into a set of five performance indexes designed to summarize the
effects of the treatments on the work components of judging.
The set of performance variables, along with justifications of how they were divided into
indexes, is discussed in Subsection 1.4.1. The formal definitions of the indexes are described
in 1.4.2.

1.4.1 Performance Variables

The set of performance variables are listed by index in Table 1.3.6 The table also reports the
mean and standard deviation, where the data are constructed at the judge-year level. The
right-most column (ML Factor Scores) will be discussed further in Subsection 1.4.2. Before
indexes are constructed, all the metrics are transformed using the inverse hyperbolic sine.7
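For readers unfamiliar with the inverse hyperbolic sine transformation, a minimal sketch follows. It uses the standard definition log(x + sqrt(1 + x^2)) and compares it with NumPy's built-in `arcsinh`; the function name and the sample values are illustrative.

```python
import numpy as np

def inverse_hyperbolic_sine(x):
    """IHS transform log(x + sqrt(1 + x^2)); unlike log(x), it is defined at x = 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x + np.sqrt(1.0 + x ** 2))

if __name__ == "__main__":
    counts = np.array([0.0, 1.0, 25.0, 500.0])
    print("IHS:        ", inverse_hyperbolic_sine(counts))
    print("np.arcsinh: ", np.arcsinh(counts))
    # For large x the transform behaves like log(2x), so coefficients read much like a log model.
    print("log(2x):    ", np.log(2.0 * counts[1:]))
```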
The first set of variables constitute a Case Output Index. At the state supreme court
level, if judges accept more cases for review they are taking on more work. An additional
measure of total work is the total number of words written, and total number of sentences
written, in majority opinions. Similarly, the total amount of caselaw research performed,
as measured by the number of previous cases cited in a judge’s opinions, is included.
5
We report the effects of treatments on these individual measures in Appendix A.2.2.
6
See Ash and MacLeod [2015] for a detailed discussion of these variables.
7
Defined as $\sinh^{-1}(x) = \log(x + \sqrt{1 + x^2})$, and used instead of the log transformation to allow for zeros
in the data [Burbidge et al., 1988]. Our results are robust to using levels or logs of the dependent variable.
The adjusted $R^2$ is usually higher in the IHS or log specification than in levels.

Table 1.3: Summary Statistics on Judge-Year Performance Variables
Outcome Variable                                  Mean        Std. Dev.   ML Factor Scores

Case Output Index
Majority Opinions Written                         25.25       16.46       0.0564
Total Words in Majority Opinions                  55235.35    33630.15    0.64594
Total Sentences in Majority Opinions              2849.436    1937.005    0.21498
Previous Cases Cited in Majority Opinions         510.24      387.92      0.09826

Effort Per Case Index
Words Per Majority Opinion                        2453.38     1348.8      0.68332
Sentences Per Majority Opinion                    129.8098    98.83284    0.22265
Previous Cases Cited Per Majority Opinion         22.62       16.92       0.11854

Discretionary Opinions Index
Discretionary Opinions Written                    6.15        9.1         0.20884
Total Words in Discretionary Opinions             8034.01     15360.4     0.35855
Previous Cases Cited in Discretionary Opinions    86.17       179.71      0.45563

Case Quality Index
Positive Cites Per Opinion                        13.03       12.86       0.27311
Distinguishing Cites Per Opinion                  2.14        2.74        0.07903
Discuss Cites Per Opinion                         2.96        2.75        0.29684
Quoted Cites Per Opinion                          3.3         4.22        0.35495
Out-of-State Cites Per Opinion                    1.81        2.45        0.06997

Total Impact Index
Total Positive Cites                              291.31      275.64      0.2971
Total Distinguishing Cites                        44.92       55.54       0.04631
Total Discuss Cites                               65.35       54.15       0.35981
Total Quoted Cites                                70.12       72.73       0.26187
Total Out-of-State Cites                          43.1        79.49       0.06529

Notes. Observation is a judge-year, N=16,084. These statistics are constructed from each judge's yearly
output of cases. “Per Opinion” measures are divided by the number of majority opinions written that
year. See variable definitions in the accompanying text.

The second index is Effort Per Case. This includes two basic opinion length measures –
the average number of words, and average number of sentences, per majority opinion. We
also have a measure of the amount of research a judge engages in – the Previous Cases Cited
measure gives the number of previous authorities cited in her opinions. We include both the
number of sentences and the number of words partly because it is unclear a priori which
is a better measure of language output. It also solves the problem that three measures are
needed to construct ML Factors, as described in Subsection 1.4.2.
The third index, Discretionary Opinions, includes variables related to effort on
discretionary opinions. Whether to write a discretionary opinion (a concurrence or a dissent) is
up to the judge’s discretion and involves willingly taking on more work. Further, the number
of words and number of previous cases cited in those opinions are components of the time
spent on discretionary opinions. In previous versions of the paper, discretionary opinions
were merged in with majority opinions, but the covariance matrix for performance suggests
that they are separate factors in judicial decision making.
Fourth, we look at the Case Quality Index. To measure the quality of decision-making, we
use the number of citations to a judge’s opinions by other judges. In our data, Bloomberg Law
staff attorneys have categorized citations as positive, distinguishing, or negative. A positive
cite is a clear signal that a decision is found useful by a future judge. A distinguishing
cite means that part of the ruling is useful, but needs to be clarified – so this is perhaps
a weaker signal of opinion quality. In the set of positive citations, we also use information
about whether a case is discussed by the future court (rather than cited without comment)
and whether it is directly quoted by the citing court. These measures can be understood
as more direct signals that the citing court finds the opinion useful. The Out-of-State Cites
measure includes positive cites from out-of-state courts; as noted by Choi et al. [2010] among
others, this is perhaps the best measure because the cited case serves as persuasive rather
than binding precedent. Note that, while these citations provide a good signal of expert
evaluation, they may or may not reflect voter evaluation or what decision is best for social
welfare.
Fifth and finally, the Total Impact Index is a combined quality and quantity measure.
It gives the total number of positive, distinguishing, discussion, quoted-in, and out-of-state
cites to a judge’s work in a year. This serves to complement Case Output as a measure of
quantity, and Case Quality as a measure of quality.
In the results sections, the main text reports the effects of our treatments on the
performance indexes. In the appendix we report the effects on the individual variables listed,
as well as a larger set of performance variables not listed here. We break out the effects
on concurrences and dissents, for example, and show negative cites and number of cases
overruled. See Appendix A.2.2 for details.

1.4.2 Performance Indexes

We implemented two methods for constructing performance indexes. These include the Z-
score Index and the Maximum Likelihood Factor. Both of these indexes are used in the
regressions reported in the results sections.
First, the Z-score Index refers to the standard aggregation method used in O’Brien [1984],
Kling et al. [2007], and Deming [2009]. For this index, each of the performance variables
is residualized on a state and year fixed effect. Then these residuals are standardized by
dividing by the standard deviation. The index is constructed from the average of these
standardized variables for each judge-year observation.
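The three steps of the Z-score Index (residualize, standardize, average) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it treats the state and year fixed effects as additive dummy sets, and the function names and array layout are hypothetical.

```python
import numpy as np

def residualize(values, state_ids, year_ids):
    """Residuals from an OLS regression of values on state dummies and year dummies."""
    state_dummies = (state_ids[:, None] == np.unique(state_ids)[None, :]).astype(float)
    year_dummies = (year_ids[:, None] == np.unique(year_ids)[None, :]).astype(float)
    design = np.hstack([state_dummies, year_dummies])
    # lstsq copes with the collinearity between the two dummy sets; the fitted values are unique.
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def z_score_index(measures, state_ids, year_ids):
    """measures has shape (n_obs, n_measures); returns one index value per judge-year."""
    standardized = []
    for column in measures.T:
        resid = residualize(column, state_ids, year_ids)
        standardized.append(resid / resid.std())   # unit variance for each measure
    return np.mean(standardized, axis=0)           # equal-weight average across measures

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs = 200
    states = rng.integers(0, 5, n_obs)
    years = rng.integers(0, 10, n_obs)
    measures = rng.standard_normal((n_obs, 3))     # synthetic data for illustration only
    print(z_score_index(measures, states, years)[:5])
```

Equal weighting is what distinguishes this index from the factor-based index, which weights measures by their estimated loadings and error covariance.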
The second index, Maximum Likelihood Factor, uses factor analysis and is based on Rao
[1955] and Akaike [1987]. Defining this measure provides an intuitive way of understanding
judge quality and its impact on observed output.
Let k ∈ {1, 2, 3, 4, 5} index the set of factors underlying judge performance. In our case
those include case quality, effort per case, etc. Let i ∈ {1, 2, ..., mk } index the observed
measures of factor k, for example the four variables representing Case Output in Table 1.3.

Let $z^k_{ijt}$ represent the observed level of performance measure i in factor k of judge j in year
t, after being residualized and standardized as done for the Z-score Index. Therefore each
performance measure has zero mean and variance 1. This is natural, as we do not have an
absolute scale for judge performance and are interested in changes rather than levels.
In our model we suppose that rather than choosing the individual measure $z^k_{ijt}$, the judge
chooses the factor, $y^k_{jt}$. The factor $y^k_{jt}$ is related to measure $z^k_{ijt}$ by:

\[ z^k_{ijt} = \alpha^k_i \times y^k_{jt} + \varepsilon^k_{ijt}. \]

Given that $z^k_{ijt}$ is standardized for all i and k, factor analysis begins by supposing that $y^k_{jt}$ is
normally distributed with mean zero and unit variance over the whole population.
Now we apply the results from Rao [1955]. If the number of measures is greater than or
equal to three ($m_k \ge 3$), then we can estimate $\alpha^k_i$, the loading for measure $z^k_{ijt}$ on factor $y^k_{jt}$.
Given that the variances are all normalized to be 1, $\alpha^k_i \in (-1, 1)$. However,
if our interpretation is correct, then each component is positively correlated with $z^k_{ijt}$, hence
we should find $\alpha^k_i \ge 0$. This is indeed the case, providing further evidence in support of our
interpretation.
In general, factor analysis allows for several unobserved factors per group of observed
measures. But in our case the natural focus is a single factor model, where all of the
measures in a group are driven by a single factor. We found that a full set of 20 performance
measures is well-explained by a five-factor model, using the standard information criterion.
This is consistent with our interpretation of one factor per group, and allows a more natural
interpretation for each factor.
Now that $\alpha^k_i$ is estimated, notice that

\[ x^k_{ijt} \equiv \frac{z^k_{ijt}}{\alpha^k_i} = y^k_{jt} + \frac{\varepsilon^k_{ijt}}{\alpha^k_i} \]    (1.4.1)

is an unbiased estimate of the factor $y^k_{jt}$. We can compute the empirical covariance of
$\vec{x}^k_{jt} = (x^k_{1jt}, x^k_{2jt}, \ldots, x^k_{m_k jt})$ for the full sample, denoted by $\Sigma_x$. By construction the diagonal
elements will be $1/(\alpha^k_i)^2 > 1$. Next let $\vec{y}^k_{jt}$ be a vector of scalars, all equaling $y^k_{jt}$, with length
$m_k$ (the number of performance measures in group k). The covariance of $\vec{y}^k_{jt}$ is $JJ^T$, where
$J^T$ is a vector of ones with length $m_k$ and $JJ^T$ is an $m_k \times m_k$ matrix of ones. Finally, let

\[ \Sigma = \Sigma_x - JJ^T \]

be the covariance matrix for the vector of error terms $\varepsilon^k_{ijt} / \alpha^k_i$.
Next we form the predicted factor $\hat{y}^k_{jt}$. Since $x^k_{ijt}$ is an unbiased estimate of the factor
$y^k_{jt}$, it follows from O’Brien [1984, p.1082] that the best estimate of the factor is:

\[ \hat{y}^k_{jt} = \frac{J^T \Sigma^{-1} \vec{x}^k_{jt}}{J^T \Sigma^{-1} J}. \]    (1.4.2)

This prediction has variance $(J^T \Sigma^{-1} J)^{-1}$.
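The weighted estimate in (1.4.2) is a generalized-least-squares average of the rescaled measures. The sketch below assumes the rescaled measures and the error covariance matrix have already been obtained; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def predicted_factor(x, sigma):
    """GLS-weighted factor estimate.

    x:     (n_obs, m) array of rescaled measures, one row per judge-year.
    sigma: (m, m) covariance matrix of the rescaled error terms.
    Returns the estimates (n_obs,) and their common variance.
    """
    ones = np.ones(sigma.shape[0])
    weights = np.linalg.solve(sigma, ones)   # Sigma^{-1} J, without forming the inverse
    denom = ones @ weights                   # J' Sigma^{-1} J
    # Sigma is symmetric, so x @ Sigma^{-1} J equals J' Sigma^{-1} x row by row.
    return x @ weights / denom, 1.0 / denom

if __name__ == "__main__":
    # Toy case with independent errors: the estimate is a precision-weighted average.
    sigma = np.diag([1.0, 4.0])
    x = np.array([[2.0, 6.0]])
    y_hat, variance = predicted_factor(x, sigma)
    print(y_hat, variance)
```

With a diagonal error covariance the weights are proportional to the inverse error variances, so noisier measures contribute less to the index.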
We repeat this process for each of the five groups of measures to produce the five ML
Factor indexes used in the regression analysis. The estimated weight $\alpha^k_i$ for each measure
is given in the right-most column of Table 1.3. By construction, the mean of $\hat{y}^k_{jt}$ is zero.
The correlation matrices for each factor $y^k_{jt}$ with all the other factors are given in Table
1.4. The table reports the population correlations as well as the means of the within-judge
correlations. Judges who work hard overall will also work hard per case, and thus we observe
correlation between factors.

1.5 Effect of Being Up For Election

This section examines how judges change their behavior over time in response to the election
cycle. Ash and MacLeod [2015] show that contested elections reduce performance. We add
to that analysis by distinguishing between partisan and non-partisan elections. If judges
wish to be re-elected, then they should put effort into election-year politics, as implied
by the model in Section 1.3. This in turn leads to a reduction in output on the court. The
question is whether the way a judge is elected affects this effort.

Table 1.4: Summary Correlations on Performance Indexes

Population Correlations
                   Case Output   Case Effort   Discretionaries   Case Quality   Total Impact
Z-score Indexes
Case Output        1
Case Effort        0.2525        1
Discretionaries    0.3574        0.1498        1
Case Quality       0.1595        0.5635        0.0815            1
Total Impact       0.8078        0.2154        0.2828            0.5664         1

Maximum Likelihood Factors
Case Output        1
Case Effort        0.3402        1
Discretionaries    0.3406        0.1224        1
Case Quality       0.2044        0.5448        0.0893            1
Total Impact       0.8027        0.187         0.2786            0.5535         1

Mean Within-Judge Correlations
                   Case Output   Case Effort   Discretionaries   Case Quality   Total Impact
Z-score Indexes
Case Output        1
Case Effort        0.2684        1
Discretionaries    0.3397        0.1423        1
Case Quality       0.1906        0.4652        0.0694            1
Total Impact       0.7818        0.2194        0.2689            0.6088         1

Maximum Likelihood Factors
Case Output        1
Case Effort        0.3424        1
Discretionaries    0.3196        0.1091        1
Case Quality       0.2088        0.4390        0.0706            1
Total Impact       0.7877        0.1944        0.2675            0.5805         1

1.5.1 Empirical Strategy

The empirical strategy for examining the effects of electoral demands on judicial behavior is
to exploit the staggered election cycle for identification of stronger electoral incentives. The
election schedule is arbitrarily assigned by history, so it is reasonable to assume that the
schedule is uncorrelated with other institutional or socioeconomic factors that might affect
individual judge performance. For this analysis we used data provided by Kritzer [2011].
The electoral cycle is represented in our regressions as a vector of dummy variables Eist ,
which equals one for years that a judge is up for election. There is a different element of the
vector for partisan, non-partisan, and uncontested retention elections. The dummy variable
is coded as a one regardless of whether the judge actually ran for election – this is needed
to avoid endogeneity problems from the judge’s choice whether to actually run.
One possible source of bias in this analysis comes from time-invariant characteristics of
individual judges. Some judges may have higher or lower performance than others on average
due to unobservable characteristics, and they may be up for election more often or less often
for any number of reasons. To deal with this possibility, we include a full set of judge-specific
fixed effects. Therefore any estimated election coefficients are relative to a judge’s personal
average.
A second major source of bias comes from the time-varying changes in the court work
environment which may be correlated with the electoral schedule. For example, there may be
campaigning demands during election years on all judges – not just those up for election – if
they are asked to assist fellow members of their political party. To deal with this possibility,
we include a full set of state-year fixed effects. Therefore any estimated election coefficients
are also relative to the court average in each year. This means they effectively compare
judges sitting on the same court, working at the same time, but who are in different stages
of the electoral cycle.
Formally, we estimate

$$y_{ist} = \mathrm{JUDGE}_i + \mathrm{STATE}_s \times \mathrm{YEAR}_t + E_{ist}'\rho + \epsilon_{ist} \qquad (1.5.1)$$

where $\mathrm{JUDGE}_i$ is a judge fixed effect, $\mathrm{STATE}_s \times \mathrm{YEAR}_t$ is a state-year fixed effect for each
state $s$ and year $t$, and $E_{ist}$ includes the election-year treatments. Standard errors are clustered
by state.
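A specification with two sets of high-dimensional fixed effects can be estimated by sweeping both sets out of the outcome and the regressors and then running OLS on the residuals. The sketch below shows that idea with NumPy only. It is an illustrative stand-in, not the Stata routine used for the reported estimates: the function names are hypothetical, group identifiers are assumed to be integer codes 0, 1, 2, ..., and clustered standard errors are not computed.

```python
import numpy as np


def demean_by(x, groups):
    """Subtract group means from x; groups are integer codes 0..G-1."""
    sums = np.bincount(groups, weights=x)
    counts = np.bincount(groups)
    return x - (sums / counts)[groups]


def absorb_two_way(x, g1, g2, tol=1e-10, max_iter=1000):
    """Sweep out two sets of fixed effects by alternating demeaning."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        x_new = demean_by(demean_by(x, g1), g2)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x


def fe_ols(y, E, judge, state_year):
    """Point estimates of rho after absorbing judge and state-year effects.

    y: (n,) outcome; E: (n, k) election-year dummies;
    judge, state_year: (n,) integer group codes.
    """
    y_t = absorb_two_way(y, judge, state_year)
    E_t = np.column_stack(
        [absorb_two_way(E[:, k], judge, state_year) for k in range(E.shape[1])]
    )
    rho, *_ = np.linalg.lstsq(E_t, y_t, rcond=None)
    return rho
```

Because both sets of effects are removed, the coefficients are identified only from judges whose election-year status differs from that of colleagues on the same court in the same year, which is the comparison described in the text.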

1.5.2 Results

The coefficient estimates from Equation (1.5.1) are reported in Table 1.5. Each row is
from a separate regression, where the three columns report the estimate for partisan, non-
partisan, and uncontested elections, respectively. These coefficients can be interpreted as
the standard-deviation change in a judge’s performance when he is up for election relative
to his own average and the state-year average of his colleagues.
Columns 1 and 2 show the effects of being up for election under contested systems.
Under partisan elections, there is a decrease in performance across the board. Under non-partisan
elections, there is a decrease in discretionary opinions and in total impact.
Column 3 gives the effect of elections for uncontested systems. Here the effects are the opposite:
instead of a negative effect, there is a positive change in total output, effort per case,
discretionary opinions, and total impact.

1.5.3 Discussion

The fact that the point estimates for the partisan and non-partisan elections are negative
and have the same order of magnitude, while the estimated effects for uncontested elections
are, if anything, positive, is consistent with the expectation that election-year politics take
time. The results are generally supportive of the idea that judges reduce judging effort

Table 1.5: Effect of Being Up For Election
                         Partisan Election Year   Non-Partisan Election Year   Uncontested Election Year
Outcome                  (1)                      (2)                          (3)

Case Output
Z-index -0.106* -0.154 0.119+
(0.0473) (0.0959) (0.0609)
ML Factor -0.113* -0.164 0.126+
(0.0493) (0.100) (0.0643)

Effort Per Case


Z-index -0.0543+ -0.0253 0.0576*
(0.0281) (0.0281) (0.0290)
ML Factor -0.0558+ -0.0361 0.0582+
(0.0286) (0.0298) (0.0318)

Discretionary Opinions
Z-index -0.0640+ -0.0625** 0.0703+
(0.0340) (0.0231) (0.0387)
ML Factor -0.0627+ -0.0633** 0.0699+
(0.0369) (0.0233) (0.0406)

Case Quality
Z-index -0.0643+ -0.0415 0.0252
(0.0366) (0.0381) (0.0418)
ML Factor -0.0701+ -0.0532 0.0155
(0.0383) (0.0383) (0.0408)

Total Impact
Z-index -0.109* -0.165* 0.0840+
(0.0462) (0.0821) (0.0479)
ML Factor -0.112* -0.182* 0.0856+
(0.0487) (0.0882) (0.0492)

Treated States 23 17 19
Treated Judges 437 270 277
Election Events 810 517 451
N= 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.
Each row is from a separate regression for the stated outcome variable. Treatment variable is a dummy
equaling one for years judge is facing reelection. Regressions include a state-year fixed effect and judge
fixed effect, estimated using Stata's reg2hdfe module.

during election years to spend more time on campaigning. The least competitive election
system, uncontested elections, has the smallest effect on behavior as we would expect given
Proposition 3. There may actually be some positive effects, which might be consistent with
a desire to do a better job in an election year, though this would only be speculation.8
In contrast, there are negative effects on judging effort in the contested systems, where
judges spend time campaigning during election years. The results for the non-partisan
elections are consistent with the idea from Lim and Snyder [2015] that they are competitive
and require campaign work. What is possibly more surprising is the large negative effect in
the case of partisan elections. Given that voters tend to follow party lines, one would
not expect these elections to be as competitive
as the non-partisan elections. It would be interesting to know what role elections play in
campaign financing, and whether there are spillover effects between judicial campaigns and other
campaigns that are occurring at the same time.

1.6 Effect of the Selection Process on Judge Quality

In this section we investigate how changes to the procedure for selecting judges affect the quality
of chosen judges. This analysis is motivated by Proposition 2. Selection mechanisms that
use better information about candidates or have less bias should, all else equal, select better
candidates on average.
A priori, there is no reason to suppose that a judge chosen by the Missouri Plan faces less
bias than in, say, a non-partisan election. However, the intent of using a merit commission
is to create a pool of better qualified judges. Similarly, political parties have an incentive to
choose qualified judges that are consistent with the party’s views. Hence, it is an empirical
question whether the judges chosen by the Missouri Plan or by a partisan election
system are of higher or lower quality than those selected under a non-partisan system. What
the theory illustrates is that the presence of bias reduces quality, while more precise signals
increase quality.
8
Note that since the mid-1990s, third-party funding for negative advertising in Missouri Plan elections
has increased significantly. Our results may not extend to more recent years (our panel ends in 1994). This
is an important area for future research.

1.6.1 Empirical Strategy

This subsection describes the empirical strategy for measuring the effects on judge quality of
different judge selection systems. The source of identification used is the set of reforms to the
judicial selection systems, depicted in Table 1. Three states changed from partisan selection
to non-partisan selection: Georgia, Kentucky, and Utah.9 Six states moved from partisan
selection to merit selection: Colorado, Iowa, Indiana, Kansas, Nebraska, and Oklahoma.10
Three states moved from non-partisan selection to merit selection: Arizona, Maryland, and
South Dakota.11 The goal of the empirics is to compare the performance of judges selected
before these reforms to the performance of judges selected after these reforms.
We control for time-varying state-specific factors by including a full set of state-year
(interacted) fixed effects. This specification effectively compares the performance of judges
sitting on the same court at the same time, but selected under different regimes. We also carry
out robustness checks to ensure that timing issues, such as the age of the judge, do
not explain our results. First, we include a full set of dummies for years of judge
experience. This means that any estimates are made relative to other judges of the same
experience level.
Second, the regressions include a full set of dummies for the judge’s starting year. This
set of controls complements the years of experience, with the goal of controlling for cohort-specific
effects on performance. For example, judges beginning in the 1970s may be systematically
better than judges beginning in the 1980s, due to changes in the economy. These
indicators control for national variation in the market for judges as a function of time.
9
Florida also moved from partisan to non-partisan, but it is not included in this section because it changed
to merit selection five years later.
10
Tennessee moved to merit selection in 1972, but moved back to partisan selection in 1978. It is not
included in this analysis.
11
Florida also moved from non-partisan to merit, but it is not included in this section because it had
changed from partisan to non-partisan elections five years prior. Utah also moved from non-partisan to
merit, but our data set does not extend long enough to get observations with two merit-selected judges.
Wyoming also moved from non-partisan to merit, but there were not any years where there were more than
two judges selected from each system. Wyoming is therefore included in Table A.8, where there are similar
results.
Third, the treatment indicators are active only for years where there are at least two
judges selected from each system working on the court during that year. This is done to
make a clean comparison that is not biased by outlier pre-reform judges who remain on the
bench long after the other pre-reform judges. Appendix Table A.8 reports the results when
all years are included – they are similar.
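The sample restriction described above (treatment indicators switched on only in court-years with at least two judges from each selection regime) can be sketched as follows. The row layout and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def treatment_active_years(rows, min_judges=2):
    """Return the set of (state, year) cells with enough judges from both regimes.

    rows: iterable of (state, year, judge, post_reform) tuples, where
    post_reform is True for judges selected under the new system.
    """
    # For each court-year keep two sets of judges: index 0 = pre-reform, 1 = post-reform.
    judges = defaultdict(lambda: (set(), set()))
    for state, year, judge, post_reform in rows:
        judges[(state, year)][int(post_reform)].add(judge)
    return {
        cell
        for cell, (pre, post) in judges.items()
        if len(pre) >= min_judges and len(post) >= min_judges
    }


def treatment_indicator(row, active):
    """Treatment dummy: one for post-reform judges in an active court-year."""
    state, year, _judge, post_reform = row
    return int(post_reform and (state, year) in active)
```

A court-year with a single remaining pre-reform judge is therefore excluded from the comparison, which is the outlier case the restriction is meant to avoid.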
The estimating equation for performance variable $y_{ist}$ for judge $i$ in state $s$ at year $t$ is

$$y_{ist} = \mathrm{STATE}_s \times \mathrm{YEAR}_t + X_{ist}'\beta + S_{ist}'\rho + \epsilon_{ist} \qquad (1.6.1)$$

where $\mathrm{STATE}_s \times \mathrm{YEAR}_t$ includes the state-year fixed effects, $X_{ist}$ includes the indicators for
years of judge experience and judge’s starting year, and $S_{ist}$ includes the treatment indicators
equaling one for judges selected under the post-reform system. Standard errors are clustered
by state.
Given the inclusion of the fixed effects, the coefficients ρ capture the average difference
in performance between judges selected under the new system and judges selected under the
old system, controlling for other time-varying state-level factors, for years of experience, and
for cohort effects.12

1.6.2 Results

Table 1.6 reports the estimates from Equation (1.6.1). Column 1 compares non-partisan-
selected judges to partisan-selected colleagues. Column 2 compares merit-selected judges to
partisan-selected judges. Column 3 compares merit-selected judges to non-partisan-selected
judges.
12
Note that in the electoral selection systems, the judges may be initially appointed by the governor to
fill a vacant seat, rather than being initially selected through a competitive electoral process. We still code
the appointed judges as being selected under the electoral system – since the predecessor’s choice whether
to step down is endogenous to the system.

Table 1.6: Effect of Judicial Selection System on Judge Quality
                         Non-Partisan Judges           Merit-Selected Judges         Merit-Selected Judges
                         Relative to Partisan Judges   Relative to Partisan Judges   Relative to Non-Partisan Judges
Outcome                  (1)                           (2)                           (3)

Case Output
Z-index -0.0041 -0.084 -0.0261
(0.0914) (0.0750) (0.169)
ML Factor -0.0428 -0.0734 0.00604
(0.105) (0.0743) (0.192)

Effort Per Case


Z-index -0.280* 0.289* 0.305
(0.108) (0.141) (0.213)
ML Factor -0.317* 0.270+ 0.32
(0.125) (0.155) (0.209)

Discretionary Opinions
Z-index 0.341 -0.00129 0.428+
(0.249) (0.150) (0.246)
ML Factor 0.357 0.0262 0.434
(0.224) (0.159) (0.261)

Case Quality
Z-index 0.046 0.194* 0.405+
(0.0919) (0.0767) (0.206)
ML Factor 0.0221 0.194* 0.422*
(0.106) (0.0829) (0.191)

Total Impact
Z-index 0.141+ -0.0349 0.0774
(0.0798) (0.0802) (0.196)
ML Factor 0.115 -0.0614 0.0619
(0.0844) (0.0799) (0.191)

Treated States 3 6 3
Treated State-Years 24 86 24
Treated Judges 14 54 16
N = 16,084 judge-years. Estimate of the average difference between judges selected under a new system, relative
to judges selected under the old system, limited to years in which there are at least two judges on the court
selected from each system. Regressions include a state-year fixed effect, a full set of dummies for years of
experience, and a full set of dummies for starting years. Standard errors clustered by state in parentheses.
+ p < .1, * p < .05, ** p < .01.

The results can be summarized as follows. Non-partisan-selected judges have lower effort-
per-case but higher total impact than partisan-selected judges on average. Merit-selected
judges have higher effort-per-case and higher case quality than partisan-selected judges.
Merit-selected judges have higher discretionary effort and case quality than non-partisan-
selected judges.

1.6.3 Discussion

First, non-partisan judges have higher total impact than partisan judges. Lim and Snyder
[2015] find that party affiliation drives voter behavior, and hence our result is consistent
with bias, where having a public political affiliation results in worse candidates. The results
on merit selection suggest that merit commissions select better judges than elections. This
is consistent with the model’s notion that merit commissions have more information about
judge quality than voters.
It is worth pointing out that Choi et al. [2010], using 3 years of data and identifying
the effect from a cross section of judges, find a much larger effect of election system upon
output (measured by number of opinions).13 The difference in results illustrates the effect
of research design upon the estimated effects. Given all our controls, one might argue that
our results are a lower bound on the effect of the selection system.
It is worth highlighting the fact that when jurisdictions choose a particular election
system they must rely upon rather crude information regarding the causal effect of a reform.
In particular, the results of Choi et al. [2010] show that comparing the experience of two jurisdictions
with different systems can easily lead to larger perceived effects of the selection system.
Choi et al. [2010] suggest that out-of-state citations provide the best measure of quality.
They find that merit commissions have zero effect. Since they run a single regression they are
only measuring the correlation between a merit plan and cites. Our identification strategy
attempts to measure the effect of changing from partisan selection to a merit commission. Here we
find large significant effects on out-of-state citations (see Appendix A.2.2), consistent with
the hypothesis that merit commissions have access to better information regarding judicial
performance.
13
Rather than estimating the causal effect by comparing the judges on the same court, they run a single model
with a large set of controls. The results are reported in Table 6; in Model 1, the three coefficients for partisan,
non-partisan, and merit selection are: 1.219** (4.930), 0.738** (3.180), 0.651* (2.240) (standard errors in
parentheses). Given that we have a much longer time period, and more tightly controlled comparisons, the
fact that these coefficients are so large and so significant suggests that the results are driven by variation
across states rather than the variation in electoral system.

1.7 Variation in Response to Incentives

This section examines differences in the response to incentive changes based on how the
judge was selected.

1.7.1 Effect of Judge Retention Process

This subsection reports the results on how changing the system for judge retention affects
the performance of sitting judges. Subsection 1.3.3 discusses the model mechanism for the
effects of retention elections on incumbent judge behavior. More competitive elections result
in more campaigning, which will reduce effort spent on judging. We examine this issue using
judge fixed effects and institutional reforms to the retention system.

1.7.1.1 Empirical Strategy

Identification comes from discrete changes in the rules for retaining state supreme court
judges. The timing of these reforms is illustrated in Appendix Figure 1. Four states changed
from partisan retention elections to non-partisan retention elections: Florida, Georgia, Ken-
tucky, and Utah. Eight states moved from partisan retention to uncontested retention elec-
tions: Colorado, Illinois, Iowa, Indiana, Kansas, Nebraska, New Mexico, and Oklahoma.
Six states moved from non-partisan retention to uncontested retention: Arizona, Florida,
Maryland, South Dakota, Utah, and Wyoming.14
14
For some of these treatments, there were other types of judicial reforms occurring around the same time.
See Appendix B.1 for more details and robustness checks.

The regression framework is a standard differences-in-differences approach based on
Bertrand et al. [2004]. To control for time-invariant judge characteristics that may be corre-
lated with the retention system in various states, we include judge fixed effects. To control for
national trends in performance, we include year fixed effects. To control for pre-existing state
trends in performance that may be confounded with the reforms, we include state-specific
linear trends.
As in Ash and MacLeod [2015], we measure effects in a ten-year window around the
reforms. The regressions include an indicator equaling one for the baseline time window
of ten years before and ten years after a change to the retention system. The treatment
variable is a dummy for the ten years after the change. Thus, with the inclusion of the
judge fixed effects, the estimates can be interpreted as the average difference in within-judge
performance for the ten years after the policy change relative to the ten years before the
policy change. In a handful of states, we shrank the time window if the reform occurred
close to the beginning or end of the sample.15
Formally, we estimate

$$y_{ist} = \mathrm{YEAR}_t + \mathrm{JUDGE}_i + \mathrm{STATE}_s \times t + \bar{R}_{st}'\bar{\rho} + R_{st}'\rho + \epsilon_{ist} \qquad (1.7.1)$$

where $\mathrm{YEAR}_t$ is a fixed effect for year $t$, $\mathrm{JUDGE}_i$ is a judge fixed effect, and $\mathrm{STATE}_s \times t$ is
a state-level linear time trend for state $s$. The term $\bar{R}_{st}$ is a vector of indicators equaling one
for the baseline time windows of ten years before and ten years after each of the retention
reforms. $R_{st}$ is a vector of treatment indicators for the ten years after each rule change.
Standard errors are clustered by state.
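The two indicator vectors can be built from the reform dates alone. The sketch below constructs the baseline-window and treatment dummies for one state-year and one reform. The function name is hypothetical, `reform_year` is taken to be the first year in which the new rule is coded as in force, and the exact treatment of the window endpoints is an assumption rather than something stated in the text.

```python
def window_indicators(year, reform_year, half_width=10):
    """Return (baseline, treated) for one state-year and one retention reform.

    baseline = 1 in the half_width years before and the half_width years
               from reform_year onward (the comparison window);
    treated  = 1 only in the post-reform part of that window.
    Endpoint conventions here are assumed for illustration.
    """
    in_window = reform_year - half_width <= year < reform_year + half_width
    treated = in_window and year >= reform_year
    return int(in_window), int(treated)
```

With judge fixed effects in the regression, the coefficient on the treated dummy compares a judge's performance in the post-reform half of the window with the same judge's performance in the pre-reform half, while years outside the window serve only to pin down the year effects and state trends.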
With the inclusion of the judge fixed effects, the estimates for the elements of ρ can be
interpreted as the average difference in within-judge performance for the ten years after the
policy change relative to the ten years before the policy change. Notice that these results
apply to different types of judges. For example, moving from a partisan to non-partisan
system measures the effect of the change upon a judge selected under a partisan system.
15
These reforms are mostly enacted by voters through ballot referendums administered in November and
officially going into effect the subsequent January. In these cases the dummy variable would turn on in the
year following the vote. In cases where the policy is effective in the first half of the year, it is coded as
turning on in that year. Note that Florida changes from partisan to non-partisan and then to uncontested
elections. In the table regressions it is coded using the years depicted in the figure. Our results do not change
substantially if Florida is left out of the analysis.

1.7.1.2 Results

Table 1.7 reports our estimates for ρ from Equation (1.7.1). Each row is from a separate regression,
with the first column giving the partisan-to-nonpartisan effect, the second column giving the
partisan-to-uncontested effect, and the third column giving the nonpartisan-to-uncontested
effect. Each regression includes a year fixed effect, judge fixed effect, and state trend.
Column 1 gives the incentive effect on sitting judges of moving from a partisan system
to a non-partisan system. There is a negative effect on case output. Column 2 has the
effect of moving from a partisan system to an uncontested system. Here we see no effects on
performance. In Column 3, we see that a move from non-partisan to uncontested elections
is associated with an increase in discretionary effort and in case quality.

1.7.1.3 Discussion

Begin with the move from partisan to non-partisan. There is a decrease in case output.
Without placing too much emphasis on these estimates (and noting the small sample of states
for this reform), this is consistent with the point in the model that nonpartisan elections are
more competitive than partisan elections, and therefore impose greater electoral constraints
on a judge’s time.
What about the effect of moving from partisan to uncontested? There are no effects.
This could mean two things. First, it would be consistent with the idea from the model and
Lim and Snyder [2015] that partisan systems impose weak electoral incentives, so moving
to an uncontested system would not change incentives very much. Second, it could mean that
partisan systems select for judges that do not care about quality judging, so reducing electoral
Table 1.7: Effect of Changing the Retention System on Incumbent Judge Performance
                         Partisan Retention to     Partisan Retention to   Non-Partisan Retention to
                         Non-Partisan Retention    Uncontested Retention   Uncontested Retention
Outcome                  (1)                       (2)                     (3)

Case Output
Z-index -0.183* 0.0445 -0.0257
(0.0862) (0.0567) (0.103)
ML Factor -0.167+ 0.0355 -0.0324
(0.0849) (0.0568) (0.100)

Effort Per Case


Z-index 0.0748 -0.0931 0.225
(0.196) (0.118) (0.166)
ML Factor 0.0969 -0.0953 0.181
(0.216) (0.108) (0.182)

Discretionary Opinions
Z-index -0.00425 -0.0132 0.216+
(0.0652) (0.136) (0.115)
ML Factor 0.0169 -0.018 0.226+
(0.0644) (0.143) (0.127)

Case Quality
Z-index 0.0352 -0.0716 0.283*
(0.133) (0.104) (0.111)
ML Factor 0.039 -0.0722 0.262*
(0.166) (0.120) (0.115)

Total Impact
Z-index -0.142 0.0313 0.0595
(0.0918) (0.0813) (0.122)
ML Factor -0.154 0.0498 0.016
(0.101) (0.0929) (0.140)

Treated States 4 8 6
Treated Judges 25 65 35
N = 16,084 judge-years. Estimate of the average treatment effect of changing the judge retention system on
incumbent judges at the time of the reform. Regressions include a judge fixed effect, year fixed effect, and state
trends. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.

incentives does not result in increased judging effort.
We see a positive effect on work quality when moving from non-partisan retention to
uncontested retention. There is a statistically significant increase in discretionary effort and
case quality. In response to the weaker electoral incentives, the non-partisan judges improve
performance. This suggests that the strong electoral demands were taking time away from
judging.

1.7.2 Relative Election-Year Effect on Judges Selected by Different Processes

Finally we look at whether judges selected under different systems respond differently to
electoral demands. As in Section 1.6, we focus on the states that changed their procedures
for selecting judges. We then look at the effect of the electoral cycle separately for judges
selected under different systems.

1.7.2.1 Empirical Strategy

Again we have the vector of electoral dummies $E_{ist}$, which equal one for judges that are up for
election in year $t$. In addition, we have the vector of dummies $S_i$ for the process under which
a judge is selected. We estimate

$$y_{ist} = \mathrm{JUDGE}_i + \mathrm{STATE}_s \times \mathrm{YEAR}_t + E_{ist}'\rho + S_i E_{ist}'\eta + \epsilon_{ist} \qquad (1.7.2)$$

where we have included a judge and state-year fixed effect as in Section _. While $S_i$ itself is
absorbed by the judge fixed effects, the interactions with the electoral cycle are identified within
and across judges.
We are interested in the following estimates. First, ρ will give the baseline electoral-cycle
effects for non-partisan judges (in the states that went from partisan to non-partisan
elections). It will also give the baseline electoral-cycle effect for merit judges (in the states
that moved from elections to merit selection). The vector of coefficients η will include the
effect of non-partisan elections on partisan-selected judges relative to their non-partisan-selected
counterparts. It will also include the effect of uncontested elections on partisan-selected
judges relative to their merit-selected counterparts. Finally, it will include the
effect of uncontested elections on non-partisan-selected judges relative to their merit-selected
counterparts.

1.7.2.2 Results

Table 1.8 reports the relative effects of the electoral cycle by the process under which a judge is selected.
Column 1a gives the baseline effect of non-partisan elections on non-partisan-selected
judges. Column 1a has negative effects on discretionary opinions and total impact, similar
to the baseline election-cycle results. Column 1b shows the relative effect of non-partisan
elections on partisan judges. None of these estimates is statistically distinguishable from zero.
Column 2a gives the baseline effect of uncontested elections on merit-selected judges. As
with the baseline results, there are actually positive effects estimated for the election-cycle
effect in an uncontested system. When one only looks at the merit-selected judges, the
effect is stronger. Column 2b gives the relative effect of uncontested elections on partisan-
selected judges. There are significant negative effects. The coefficients are larger in absolute
value than the coefficients from Column 2a, meaning that uncontested elections actually
have a negative effect on performance for partisan-selected judges. Finally Column 2c gives
the relative effect of uncontested elections for non-partisan-selected judges, relative to merit
judges. There aren’t any significant differences here.
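Since Column 2b is a relative effect, the implied election-year effect for partisan-selected judges under uncontested elections is the sum of the baseline coefficient (Column 2a) and the interaction coefficient (Column 2b). The sketch below does this arithmetic for the Z-index rows with significant interactions in Table 1.8. It yields point estimates only: a standard error for the sum would require the covariance between the two coefficients, which the table does not report.

```python
# Z-index coefficients copied from Table 1.8: (Column 2a baseline, Column 2b relative effect).
z_index_rows = {
    "Case Output": (0.158, -0.198),
    "Effort Per Case": (0.0836, -0.128),
    "Total Impact": (0.129, -0.198),
}

# Implied election-year effect for partisan-selected judges = baseline + interaction.
implied_effect = {
    outcome: round(baseline + relative, 4)
    for outcome, (baseline, relative) in z_index_rows.items()
}

for outcome, effect in implied_effect.items():
    print(f"{outcome}: {effect:+.4f}")
```

In each of these rows the sum is negative, which is the sense in which the relative coefficients are larger in absolute value than the baseline coefficients.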

1.7.2.3 Discussion

The interesting effect in this section is that after changing to uncontested retention elections,
partisan-selected judges still demonstrate the same election-year behavior as they did under
partisan elections. This perhaps explains why there was no within-judge effect of moving

Table 1.8: Relative Election-Year Effect On Judges Selected by Different Processes
Columns:
(1a) Effect of Non-Partisan Elections on Non-Partisan-Selected Judges
(1b) Relative Effect of Non-Partisan Elections on Partisan-Selected Judges
(2a) Effect of Uncontested Elections on Merit-Selected Judges
(2b) Relative Effect of Uncontested Elections on Partisan-Selected Judges
(2c) Relative Effect of Uncontested Elections on Non-Partisan-Selected Judges

Outcome (1a) (1b) (2a) (2b) (2c)

Case Output
Z-index -0.157 0.136 0.158* -0.198* 0.0663
(0.0982) (0.178) (0.0642) (0.0984) (0.150)
ML Factor -0.168 0.131 0.170* -0.217* 0.0469
(0.103) (0.188) (0.0678) (0.0994) (0.162)
Effort Per Case
Z-index -0.025 -0.0133 0.0836* -0.128* 0.0336
(0.0272) (0.204) (0.0358) (0.0551) (0.111)
ML Factor -0.0356 -0.0178 0.0893* -0.139* -0.0118
(0.0291) (0.189) (0.0377) (0.0559) (0.123)
Discretionary Opinions
Z-index -0.0616* -0.0363 0.0694 0.00347 0.00336
(0.0239) (0.103) (0.0501) (0.0637) (0.117)
ML Factor -0.0622* -0.0429 0.0737 -0.0117 -0.0197
(0.0241) (0.0801) (0.0513) (0.0687) (0.128)
Case Quality
Z-index -0.0486 0.279 0.0626 -0.145 -0.0964
(0.0368) (0.277) (0.0384) (0.128) (0.135)
ML Factor -0.0593 0.239 0.0566 -0.159 -0.111
(0.0370) (0.329) (0.0404) (0.134) (0.119)
Total Impact
Z-index -0.173* 0.302 0.129* -0.198* -0.0234
(0.0837) (0.234) (0.0509) (0.0783) (0.0987)
ML Factor -0.189* 0.261 0.132* -0.206** -0.0226
(0.0902) (0.255) (0.0538) (0.0770) (0.104)
Treated States 2 2 11 8 4
Treated Judges 7 4 119 51 10
Election Events 8 5 201 90 16
N= 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01. Each row is from a separate regression for the
stated outcome variable. The estimated coefficient is a dummy equaling one for years judge is facing reelection, interacted with a dummy for if the judge is
selected under the new selection system. Regressions include a state-year fixed effect, judge fixed effect, and the baseline coefficient for the election-year
from partisan to uncontested elections – these judges are responding the same way to the
electoral cycle as they had been doing before the reform. This adds further evidence of a
difference in preferences between partisan and non-partisan systems. Partisan judges prefer
to reduce performance in election years, even when those are uncontested elections where
most everyone is retained. One possible interpretation is that because they have more
partisan preferences, they feel a desire to be involved in campaigning for other non-judge
candidates when they are up for election.

1.8 Conclusion

The goal of this paper has been to evaluate the effect of election processes on the quality of
individuals and the effort they put into their jobs. To address this question we exploit the
fact that the work of judging has remained relatively stable over time, which allows us to
build performance measures based on a database of written state appellate court decisions.
We exploit the fact that U.S. states have experimented with different methods to appoint
judges. This allows us to measure the causal effect of a change in the system upon perfor-
mance. We can also evaluate the selection effect by comparing the performance of judges
selected by different systems, but serving at the same time. Our results are summarized in
Table 1.9.
The evidence suggests that non-partisan elections select better judges than partisan elec-
tions. This is consistent with a selection model where a stronger signal on party affiliation
crowds out information on candidate quality, so candidates are lower quality on average. The
evidence also suggests that an expert, merit-based selection process selects better judges than
an election system. This is consistent with a selection model where better-informed experts
can choose higher-quality officials than voters on average. In the realm of selecting public
officials, more information is not always better for the quality of the person chosen.
For incumbent judges, we find that stronger electoral incentives reduce performance in

Table 1.9: Summary of Results

                                   Partisan Judges   Non-Partisan Judges   Merit Judges
Selection Process Effects
  Relative to Partisan Judges                        ↑                     ↑
  Relative to Non-Partisan Judges                                          ↑
Electoral Cycle Effects
  Partisan Election Year           ↓
  Non-Partisan Election Year       ↓                 ↓
  Uncontested Election Year        ↓                 ~                     ↑
Retention Reform Effects
  Move to Non-Partisan             ↓
  Move to Uncontested              ~                 ↑

Summary of results. The left-most column indicates the treatment, and the other column headers
indicate the sample of judges upon which the effect is being measured. Arrows indicate a positive or
negative effect on judge performance. A tilde (~) indicates no effect. See text for details.

election years, and that contested elections reduce performance more than uncontested elec-
tions. This is consistent with a simple model in which campaign effort takes time away from
judging. Moving from partisan to non-partisan elections reduces performance for incumbent
partisan-selected judges, which is consistent with the idea that partisan elections are less
competitive because voters are biased by political affiliations. Moving from non-partisan to
uncontested elections increases performance, consistent with the notion that non-partisan
contested elections are more demanding of a judge’s time than uncontested elections.
There is no within-judge effect of moving from partisan to uncontested elections. We
show that this occurs because the partisan-selected judges do not change their electoral
behavior – they continue to reduce performance during election years after the reform, even
though uncontested elections are not competitive. Merit-selected judges actually increase
some measures of performance during election years. These results highlight that these
electoral systems select for different types of individuals. These differences in abilities and
preferences result in measurable differences in their legal output.
A great deal of work remains. Even though we have a long panel, and arguably good
identification, the effects are often relatively small or barely significant. This fact may not
be all that surprising. If a single system had strong, consistent results, then we would have
expected the market to have moved in that direction quickly, consistent with Posner’s [1987]
view that legal institutions move in the direction of efficient exchange.
Yet, the fact that we do find a pattern of effects that are consistent with our simple model
helps explain why there is experimentation. The results are consistent with the hypothesis
that merit commissions select better judges, followed by non-partisan judges, and finally
partisan judges. Yet, judging is not a purely technical activity. There is a large literature in
political science showing that the political views of judges color their decisions, which may
explain why many jurisdictions prefer to allow the democratic process to be informed by
the political views of judges.16 In this paper we provide some evidence on the performance
consequences of these choices, which will hopefully inform decision-making on this important
institutional question.

16 See Epstein et al. [2013] for a discussion of federal judges, and copious citations to this large literature.

Chapter 2

The political economy of tax laws in the U.S. states

Elliott Ash

This paper contributes to recent work in political economy and public finance that focuses
on how details of the tax code, rather than tax rates, are used to implement redistributive
fiscal policies. I use tools from natural language processing to construct a high-dimensional
representation of tax code changes from the text of 1.6 million statutes enacted by state
legislatures since 1963. A data-driven approach is taken to recover the effective tax code –
the set of legal phrases in tax law that have the largest impact on revenues, holding major
tax rates constant. Exogenous variation in tax legislation from judicial districts is used to
capture revenue impacts that are solely due to changes in the tax code language, with the
resulting phrases providing a robust out-of-sample predictor of tax collections. I then test
whether political parties differ in patterns of effective tax code changes when they control
state government. Relative to Republicans, Democrats use revenue-increasing language for
income taxes but use revenue-decreasing language for sales taxes – consistent with a more
redistributive fiscal policy – despite making no changes on average to statutory tax rates.
These results are consistent with the view that due to their relative salience, changing tax
rates is politically more difficult than changing the tax code.

2.1 Introduction

Standard models in the political economy of tax policy feature tax rates, public goods, and
expenditures as the key tools for implementing a redistributive fiscal policy [Persson and
Tabellini, 2002]. A redistribution-oriented government can implement a progressive tax on
income and redistribute the proceeds as public goods or lump-sum transfers. A model of
what components of income are taxable, or how those components are legally specified, is
not needed for this approach.
Recent work in public finance has shown that the legal definition of the tax base has
important revenue and redistributive consequences [Kopczuk, 2005, Gordon and Kopczuk,
2014]. The base involves a complex set of policy choices that affect the allocation of the tax
burden. For example, giving income tax credits for dependent children will favor families with
children. Exempting groceries from sales tax will favor individuals who spend a relatively
large proportion of their income on groceries.
An attractive setting for the empirical study of tax policy is the U.S. states. With
panel data on fifty different state governments, one can analyze the political determinants
of redistribution. Previous work on state politics has documented that political control of
state government has an impact on tax revenues [Reed, 2006, Warren, 2009]. But how those
revenue changes are implemented – changes in tax rates, versus changes in the tax base –
presents an open question.
The difficulty in measuring the relative importance of tax rates and the tax base is that
the definition of the base must be embodied in the language of the tax code. The wording
of legislation can have large impacts: Legislators must specify which people count as
dependents, for example, and which items count as groceries. Because statutory language is
ambiguous, tax base provisions may have multiple interpretations. Legal experts, including
judges tasked with enforcing the code, often disagree on the tax consequences of these provi-
sions [Weisbach, 1999, 2002]. For the empirical researcher, this means that many provisions
cannot be reliably coded as data across states. The researcher interested in testing for the
revenue consequences of particular provisions across state tax codes would have to make
many subjective decisions.
This paper aims to provide a data-driven approach to this problem using tools from
natural language processing applied to the text of state tax legislation. These tools are
used to construct a high-dimensional representation of tax law from the text of 1.6 million
statutes enacted by state legislatures since 1963. Exogenous variation in the tax law comes
from diffusion of legal language within regional judicial districts. This variation is used to
estimate the impact of tax law text features on revenue. This method uncovers the effective
tax code – the set of text features in the tax code that have a measurable causal impact on
revenue collections. This data-driven method provides a more objective representation of the
tax code than would be possible with subjective coding of complex, potentially ambiguous,
provisions.
The advantage of a state-level analysis (relative to the federal government) is that one
can examine how variation in political party control is related to changes in the tax law.
In this paper, I measure the effect of a change in political party of state government on
tax rates and the tax code. Consistent with the previous literature, I document effects of
political control on tax revenue. But on average I find no effect of political control on the
major tax rates. Income tax revenues increase under Democrat control, while sales tax
revenues decrease.
The new contribution is in demonstrating the role of the effective tax code in the imple-
mentation of redistributive fiscal policy. Relative to Republican-controlled state government,
Democrat-controlled governments use revenue-increasing language on income taxes. On sales
taxes, they use revenue-decreasing language. Because income taxes are relatively progressive,
and sales taxes are relatively regressive, this pattern is consistent with more redistributive
fiscal policy choices by Democrats. The results suggest that in U.S. state governments, po-
litical parties implement fiscal policy primarily through the legal definition of the tax base,
rather than through changes to the major tax rate structures.

The data include state government financial accounts linked to the text of state tax laws,
for a 48-year time period (1963 through 2010) and for three separate taxes: personal income
tax, corporate income tax, and sales tax. These three state taxes together account for 73
percent of state government tax collections and 4 percent of U.S. GDP (as of 2014).
The first challenge is to represent the features of the tax base as analyzable data. For
example, the New York state tax agency web site lists eighty major exemptions to the
sales tax, and that excludes many relatively minor exemptions, deductions, and credits for
the sales tax in other tax code sections. Trying to measure the effects of each of these
individual rules on sales tax revenue in New York would be a difficult task – and this is
just one tax source, one state, and at one point in time. Analyzing all fifty states at once
requires new techniques from natural language processing to represent the tax base using
measurable features of tax legislation. Section 2.5 describes the application of these methods
to represent tax law changes as a frequency distribution over a vocabulary of 25,000 phrases.
The goal is not to estimate precisely the revenue impact of any particular phrase, but rather
to construct a ranking of the phrases that can be used to explore how political parties differ
in the language they insert into the tax code.
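As a rough illustration of this representation, the sketch below counts word n-grams in a statute and normalizes the counts into a frequency distribution over a fixed vocabulary. The tokenizer, the three-phrase toy vocabulary, the example statute, and the function names are assumptions made for illustration only; the actual text-processing steps are those described in Section 2.5.

```python
import re
from collections import Counter

def ngrams(text, n_max=3):
    """Return all word n-grams (n = 1..n_max) from a lowercased token stream."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def phrase_frequencies(text, vocabulary, n_max=3):
    """Share of in-vocabulary phrase occurrences accounted for by each phrase."""
    counts = Counter(g for g in ngrams(text, n_max) if g in vocabulary)
    total = sum(counts.values())
    return {p: (counts[p] / total if total else 0.0) for p in vocabulary}

# Hypothetical toy vocabulary; the paper works with 25,000 phrases.
VOCAB = ["sales tax", "exempt from", "tax credit"]
statute = "Groceries are exempt from sales tax. Clothing is exempt from sales tax."
freqs = phrase_frequencies(statute, VOCAB)
```

Each statute (or each state-year of statutes) then becomes one row of a sparse matrix with one column per phrase in the vocabulary.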
Tax code language is chosen endogenously in response to variables that are correlated with
tax revenue, so standard panel data methods comparing changes in revenue to changes in tax
laws would render inconsistent estimates. Determining which phrases have a causal effect on
tax revenues requires exogenous variation in these phrases. The solution to this problem is an
instrumental-variables setup related to Bartik’s [1991] identification of labor demand shocks.
Instruments for phrase frequencies in an individual state are constructed from the lagged
phrase frequencies in states in the same federal judicial circuit. This approach is motivated
by previous historical and empirical work demonstrating a shared legal community within
circuits in which legal ideas and legal language diffuse through cultural channels that are
orthogonal to the economic variables that otherwise underlie tax revenues [Carp, 1972, Bird
and Smythe, 2008, Hinkle, 2015]. These lagged features constitute a high-dimensional set
of sparse instruments, requiring the application of recently introduced dimension-reduction
methods [Belloni et al., 2012, Lin et al., 2015].
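To make the construction of the instruments concrete, the following sketch computes, for one phrase, the mean lagged frequency among the other states in a state's federal judicial circuit. The leave-one-out mean, the one-year lag, the function name, and the numbers are assumptions for illustration; the exact aggregation used in the paper is described in Section 2.6.

```python
def circuit_instrument(freq, circuit, state, year, lag=1):
    """Leave-one-out mean of a phrase's lagged frequency among the other
    states in `state`'s federal judicial circuit.

    freq:    dict mapping (state, year) -> phrase frequency
    circuit: dict mapping state -> circuit label
    Returns None when no other state in the circuit has lagged data.
    """
    peers = [s for s in circuit if circuit[s] == circuit[state] and s != state]
    lagged = [freq[(s, year - lag)] for s in peers if (s, year - lag) in freq]
    return sum(lagged) / len(lagged) if lagged else None

# Toy data with hypothetical frequencies. NY, CT, and VT sit in the
# Second Circuit; TX sits in the Fifth Circuit.
CIRCUIT = {"NY": "2nd", "CT": "2nd", "VT": "2nd", "TX": "5th"}
FREQ = {
    ("NY", 1999): 0.10, ("CT", 1999): 0.20,
    ("VT", 1999): 0.40, ("TX", 1999): 0.90,
}
z_ny = circuit_instrument(FREQ, CIRCUIT, "NY", 2000)  # mean of CT and VT in 1999
z_tx = circuit_instrument(FREQ, CIRCUIT, "TX", 2000)  # no peers in the toy data
```

Excluding the state's own lagged frequency keeps the instrument from mechanically reflecting that state's past legislative choices.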
The 2SLS regressions provide estimates of the predicted impact of phrases on tax revenue.
The most predictive phrases are then aggregated in a partial least squares regression model,
which can predict tax-revenue changes out of sample. The model works with both the actual
phrase frequencies and the instrumented phrase frequencies, demonstrating that the textual
features of legislation are predictive of and causally related to tax revenue. Analysis of the set
of revenue-relevant phrases suggests the importance of language defining tax expenditures:
deductions, exemptions, and credits.
The next step is to investigate the role of the tax code in the political economy of state
fiscal policy. The empirical strategy is to use panel data regressions estimating the effect
of Democrat control of state government, controlling for governor votes and legislative seat
shares as forcing variables. When new political parties take control of state government,
they do not change major tax rates on average.
The main results section looks at the effect of political control on the predicted revenue
impact of the effective tax code. For income taxes, Democrats choose revenue-increasing
language. For sales taxes, Democrats choose revenue-decreasing language. Moving from full
Republican control of government to full Democrat control of government is associated with
tax code changes that are predicted to raise an additional $2 billion of income tax revenue
in the average state, with a corresponding decrease of $1.7 billion in sales tax revenue.
Income taxes are relatively progressive, while sales taxes are regressive. The use of
revenue-increasing language by Democrats on progressive taxes but revenue-decreasing lan-
guage on regressive taxes is consistent with Democrats implementing a more redistributive
fiscal policy through the tax code. Tax code provisions defining the base – rather than
the tax rate – are the key policy tool in the political economy of fiscal policy in the U.S.
states. This is consistent with the view that major tax rates are politically more difficult
to change than the tax code, perhaps because rate changes would be more salient for voters
[Finkelstein, 2009, Chetty et al., 2009, Cabral and Hoxby, 2012].
These results are relevant to a broad literature in political economy, reviewed in Section
2.2. Thereafter Section 2.3 presents a model to guide analysis of the data. Section 2.4
describes the tax data, while Section 2.5 details the legislative text data and methods for
text processing. Section 2.6 provides methods and results for recovering the effective tax
code using the Bartik language instruments. Section 2.7 uses changes in political control to
estimate the effect of political control on tax policy. Section 2.8 relates the phrase effects on
revenue to the political effects on phrases to analyze the role of the tax code in redistributive
fiscal policy. Section 2.9 concludes.

2.2 Related Literature

The standard models in public finance assume that tax collections are a function of rates
and audit probabilities [Mirrlees, 1971, Atkinson and Stiglitz, 1976, Feldstein, 1999, Chetty,
2009]. In that case there is no scope for legal avoidance or gaming, and a deterrence model like
Allingham and Sandmo [1972] or Logue [2007] will suffice to explain the interaction between
tax agency and taxpayer. Good empirical evidence that increased audit rates reduce evasion
includes Kleven et al. [2011] and Pomeranz [2011].1
In the standard models, tax legislation is important because it encodes policies that have
socioeconomic impacts, but the wording of those statutes doesn’t have independent interest
because the policies are well-defined. On the other hand, there is a competing view among
tax law scholars that the tax code is not a complete description of policy: There is ambiguity
and indeterminacy in the language that makes a complete formal description impossible.2
1
In a Minnesota experiment, Slemrod et al. [2001] show that high-income individuals actually report less
income when threatened with a high probability of audit. This low-ball report can be understood as an
introductory offer in a bargaining exchange between taxpayer and tax agency, on the assumption that legal
ambiguity about liability creates scope for allocating a surplus. Cai and Liu [2009] report that tax avoidance
among Chinese firms is higher in more competitive industries.
2
“Between these extremes was a continuous range of transactions, and the policymaker had to decide
which were taxable and which were not. This type of problem is quite general in the tax law. The tax law
distinguishes between debt and equity, selling and holding, an independent contractor and employees. There

Graetz [1995], for example, notes that despite the use of accounting methods to evaluate tax
reforms, there are still “massive empirical uncertainties” precluding good predictions about
the revenue consequences.
More recent work has recognized that this simple model of the tax system is too limited
[Andreoni et al., 1998, Slemrod and Yitzhaki, 2002]. In reality, the tax code is an incomplete
set of written rules, and taxpayers face administrative and legal uncertainty in their dealings
with the tax authority.3 Honest mistakes do occur, so harsh rule-based penalties are often
inefficient. But discretionary standards that require adjudication are more easily gamed.4
The importance of interpretation and language in the operation of tax law rules is well-
known in legal scholarship on tax law. Livingston [1995] and Heen [1996] discuss the im-
portance of text, as well as the limits of plain-meaning textual analysis, in tax law. In tax
law especially, judges are encouraged to interpret the intentions of legislators and not to
interpret the text literally. Shaviro [2004] discusses the dual nature of legal language in tax
and fiscal policy – both for furthering political goals and for describing policy. This results
in indeterminate and confusing language.
Efforts in economics to extend the standard model demonstrate the pros and cons of more
complex tax rules. Kopczuk [2001] uses a model of heterogeneous avoidance ability among
taxpayers to show that avoidance can be optimal if mainly performed by low earners, or if
administrative costs are sufficiently high. Kleven and Kopczuk [2011] show that increased
complexity in eligibility requirements for social benefits can reduce takeup, but that optimal
programs must have complex eligibility rules to prevent false award grants.5 A well-known
are hundreds of these types of distinctions” [Weisbach, 1999]. Vasconcellos [2007] discusses the problems
judges often face of uncertainty in tax law, and how they have to appeal to policy interests or fairness.
3
These points are consistent with Givati’s [2009] observation that tax litigation filings and IRS internal
tax appeals are persistently high; if tax law were predictable, taxpayers would not invest in these costly
challenges.
4
Likhovski [2004] examines the history of tax-shelter adjudication beginning with Learned Hand’s Gregory
v. Helvering. Solan and Dean [2007] identify the importance of the rule of lenity, a statutory-construction
heuristic normally associated with criminal cases which advises strict construal of penal provisions against
the government. Because conservative judges construe tax provisions this way, corporations can avoid taxes
by structuring tax shelters that are arguably within the text of the statute but are unrelated to the policy
interest motivating the provision.
5
In practice, eligibility provisions can have undesirable consequences. In analogous work on the student

example of complex tax targeting is the set of multiple partially overlapping definitions of
child in the federal tax code, resulting in uncertainty for taxpayers about eligibility for credits
[Holtzblatt and McCubbin, 2003].6
Other work has analyzed the political incentives for complex tax legislation. Surrey [1957]
provides an early anecdotal account of the role of lobbyists in writing special tax provisions,
while Graetz [2007] provides a more recent account to the same effect. Holcombe [1998] pro-
poses that complex tax rules facilitate inefficient rent-seeking by giving legislators numerous
hidden opportunities to give interest groups special tax treatment. A more innocuous view
is that policymakers exploit the complexity of legislation to reduce the perceived tax burden
[Krishna and Slemrod, 2003]. Hettich and Winer [2005] argue that complex tax structures
emerge as a byproduct of electoral competition; political parties attempt to propose and
implement policies that discriminate as carefully as possible among heterogeneous voters, a
process held in check only by administrative costs.7
An important strand of this literature has focused on the definition of the tax base: The
set of transactions or components of income that are included as targets of tax collections.
In Weisbach [2002], the tax base is difficult to define and can only be measured by indirect
proxy. Tax shelters arise from efforts to exploit the limitations of these proxies. Kopczuk
[2005] examines the relation between the tax base and the income elasticity with respect to
taxes, showing that the direct effect of tax rates on taxable income is zero, but that there
financial aid system, Dynarski and Scott-Clayton [2006] show that a radically simplified process could repro-
duce the same distribution of aid with far lower administrative costs and less invasive collection of private
information.
6
Paul [1997] shows that the number of tax law reporter volumes published in a state is correlated with
state income tax revenue, suggesting some relationship between revenue and complexity. Slemrod [2005]
measures tax complexity by the number of lines in tax forms and the number of pages in tax instruction
booklets. He reports small correlations of higher tax complexity with older income tax systems, higher
legislator salaries, lower voter turnout, higher average tax rates, and higher education levels. Katz and
Bommarito II [2014] provide measurements of the complexity of the titles of the U.S. Code using measures
constructed from the text and its citations. Bommarito et al. [2011] provide a descriptive survey of the
population of U.S. Tax Court decisions.
7
Yet another idea is that the drafters of tax laws have an incentive to make those laws more complex
so they can earn rents after they leave government explaining the laws to clients [Weisbach, 2002]. Schizer
[2005] observes that private tax lawyers outmatch their government counterparts in sheer numbers, access
to information, and sheer expertise.

are large effects when deductions are available. This shows that previous models examining
income elasticity left out an important institutional component: the tax base. Follow-up
work by Gordon and Kopczuk [2014] shows that the choice of the tax base matters for the
incidence of the tax burden.
Another related literature examines tax expenditures – deductions and exemptions to
taxes that are designed to implement social policies [Howard, 1999]. Well-known examples
are the deduction for property taxes and mortgage interest, and the exclusion of imputed
rental income, which favor homeowners [Poterba and Sinai, 2008]. According to Slemrod
[2004], revenue losses due to corporate income tax shelters are growing and account for at
least half of the corporate tax gap.8 Desai [2005] describes how the legal distinction between
financial reporting of corporate income (for stock value) and tax reporting of income (for
tax liabilities) has led to a large gap between the two and under-collection of corporate
income taxes.9 Zucman et al. [2015] estimate that a full 8 percent of the world’s wealth
is held in tax havens. On the positive side, Chetty and Hendren [2013] show that higher
tax expenditures at the state and local level are related to better socioeconomic mobility
across generations. Methodologically, an active issue in public finance is how to measure tax
expenditures [Burman and Christopher Geissler, 2008]; the text-based methods developed
in this paper may be helpful in this area.
While there is less work on the tax base at the state level, Shaviro [1992] notes how
every state has different definitions for taxable income. This is part of a large literature
examining state tax systems. For example, Rork [2003] finds that states tend to follow the
rate changes in neighboring states for excise taxes, but not for personal income taxes or
general sales taxes. Chernick [2005] shows that deductibility of state and local taxes is an
8
The IRS estimates that the federal tax gap, based on audits, is 17%. Alm and Borders [2014] review
the small set of papers and reports on state-level tax gaps. They find tax gaps similar to the federal level,
ranging from 10% in Idaho to 20% in Montana.
9
See also GAO [2003] and Plesko [2007]. Ordower [2010] reviews the history of tax avoidance and the
transformation of corporate tax departments from compliance centers to profit centers. This is an old issue;
Griswold [1944] blamed the low tax collections in the 1940s on “uncertainty, confusion, discrimination, and
inconsistency” in tax rules.

important factor increasing progressivity.
The most relevant segment of this literature is that examining the effect of political
party control on state fiscal policy. Besley and Case [2003] provide a review of this literature
and present some evidence that Democrat control of the lower legislative chamber (but not
upper chamber) is associated with higher total taxes. Reed [2006] and Warren [2009] use
data from state legislatures from 1960 through 2000 and show that Democrat control of
both legislatures is associated with higher tax collections, but they do not look at rates nor
attempt to break things out by revenue source. Leigh [2008] analyzes the effect of governor
control in an RD setup using data for 1941 through 2002. He finds that the party of the
governor has no effect on rates or collections for personal income or corporate income.10 I
am not aware of any papers on political control and sales taxes.
The literature in behavioral public finance on tax salience provides evidence relevant to
the government’s tax policy choices. Chetty et al. [2009] show that consumer demand reacts
less strongly to sales taxes that are excluded from the posted purchase price. Goldin and
Homonoff [2013] show that low-income individuals respond just as strongly to less salient
cigarette taxes. Finkelstein [2009] shows that toll agencies increase toll rates significantly
in response to the implementation of automated toll collections that are less salient to the
taxpayer. Finally, the survey data reported in Cabral and Hoxby [2012] suggest that the
reason homeowners hate property taxes is that they pay a salient lump sum once a year,
rather than having the payments withheld (as is the case in payroll taxes for example).
Other works in this literature include Gamage and Shanske [2011] and Goldin [2015].

2.3 Political economy of tax policy

This section presents a model of the political economy of tax policy. The government can
affect tax revenue through the tax rate, tax code, and unobserved policies. The tax code
10
See also Besley and Case [1995b], who find that Democrat governors increase sales taxes, income taxes,
and corporate taxes when they face a binding term limit. Nelson [2000] analyzes how rates relate to electoral
competitiveness.

affects revenues through changing the tax base, broadly defined. The goal of the model is to
isolate sources of variation in the tax code and tax revenues, in order to clarify the role of
the tax code in setting fiscal policy.

2.3.1 Tax policy

A state government is setting policy for an income stream Y > 0, say personal income. Tax
policy has three elements. The first is the tax rate $\tau$, where I assume a linear marginal rate.
The second is the written tax code, modeled as a vector of text features $x \in \mathbb{R}^p$, where
$p > 0$ is the (arbitrary) number of measurable text features. The third element is other
(unobserved) policy measures that affect tax collections, denoted by $u \in \mathbb{R}^o$, where $o$ is the
dimensionality of the unobserved policy space. This includes all policies besides the rate and
the written tax code, including for example the appointment of a lax tax regulator.
Therefore tax policy is a vector (τ, x, u). Total government revenue G(·) is determined
by
G(τ, x, u) = τ B(x, u)Y (τ, x, u),

where B(x, u) ∈ (0, 1] is the tax base (the proportion of income that is taxable). We take
“tax base” to be broadly defined, as the aggregate result of all tax policies besides the tax
rate.
Define $g = \log(G/Y)$ as the government revenue as a share of income, known in the previous
literature as “tax burden” [Reed, 2006]. Let $b = \log B$. Then government revenue g is given
by
$$g(\tau, x, u) = \log \tau + b(x, u).$$
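A small numerical check may help fix ideas. The sketch below verifies that the log revenue share computed directly from G and Y equals log τ + b, and shows how a narrower base lowers g at a fixed rate. All numbers and the function name are hypothetical.

```python
import math

def log_tax_burden(tau, base_share):
    """g = log(tau) + b, where b = log(B) and B is the taxable share of income."""
    return math.log(tau) + math.log(base_share)

# Hypothetical values: a 5 percent rate, 80 percent of income taxable,
# and $100B of income.
tau, B, Y = 0.05, 0.80, 100.0
G = tau * B * Y                    # revenue in $B
g_direct = math.log(G / Y)         # log revenue share computed from G and Y
g_model = log_tax_burden(tau, B)   # the same quantity from the decomposition

# Narrowing the base from 80 to 72 percent of income (say, through a new
# exemption) at a fixed rate changes g by log(0.72 / 0.80) = log(0.9).
delta_g = log_tax_burden(tau, 0.72) - g_model
```

Because the rate enters additively in logs, any change in g at a fixed rate is attributed to the base term b(x, u).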

The goal of the analysis is to understand the effect of changing text feature i on government
revenue through its effect on the tax base. Holding rates and other policies fixed, the effect
on log revenue of changing text feature i is

$$\frac{\partial g}{\partial x_i} = \frac{\partial b}{\partial x_i}.$$

The goal of the empirical analysis is to provide estimates for this quantity. We want to identify
the set of tax code features for which $\partial g / \partial x_i > 0$ or $\partial g / \partial x_i < 0$. This set of features is the effective
tax code.
Extracting these features is a challenge empirically due to the presence of the unobserved
policies. Assuming (for simplicity) a linear specification for b(·) with data indexed by state
s and year t gives:
$$g_{st} = \log(\tau_{st}) + x_{st}'\beta + u_{st}'\pi + \epsilon_{st}. \tag{2.3.1}$$

The basic empirical goal is to identify the set of tax code features i for which

$$\beta_i \neq 0.$$

Each coefficient gives the average effect of increasing tax code feature i on the tax base
holding other policies constant.
Cross-sectional OLS could be used to estimate (2.3.1) while excluding $u_{st}$. OLS would
yield consistent estimates for β under the assumption that x is uncorrelated with the
unobserved policies u. However, states may have different unobserved policies that are
correlated with both the tax code and revenue. Cross-sectional estimates of β are therefore
likely inconsistent.
Panel data improve the situation through fixed effects estimation. If state-level changes
in x are uncorrelated with state-level changes in u, panel OLS with state and year fixed
effects will yield consistent estimates for β. However, if the changes are
correlated, then the OLS estimates would still be biased. Again, this type of correlation is
likely. The changes in x are likely correlated with changes in u because tax code reforms
are chosen jointly and endogenously with other non-written policy reforms. If there is a
change in the ruling political party in the state, for example, the new leaders will change the
statutes x as well as other non-legislative policies u. Therefore looking at the average effect
of the change in text over time would yield biased estimates.
To estimate β, one needs variation in x that is uncorrelated with changes in u. Obtaining
this variation through instrumental variables is the goal of the empirical strategy described
in Section 2.6.
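The identification problem can be illustrated with a small simulation. In the sketch below, a single text feature x is correlated with an unobserved policy u, so OLS of g on x is biased, while a single-instrument IV estimator using a variable z that shifts x but is unrelated to u recovers β. The data-generating process, parameter values, and sample size are all assumptions chosen for illustration; the paper's setting has many features and many sparse instruments.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(0)
n, beta, pi = 20000, 0.5, 1.0   # hypothetical true parameters

z = [random.gauss(0, 1) for _ in range(n)]   # instrument: shifts x, unrelated to u
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved policy
# Text feature: depends on the instrument and on the unobserved policy.
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# Log revenue: depends on the text feature and the unobserved policy.
g = [beta * xi + pi * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

beta_ols = cov(x, g) / cov(x, x)   # omits u, so it absorbs part of pi
beta_iv = cov(z, g) / cov(z, x)    # single-instrument IV estimator
```

Under this design the OLS slope converges to β + π·Cov(x, u)/Var(x) = 0.5 + 1/3, whereas the IV ratio converges to β.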

2.3.2 Tax Politics

This section discusses a change in political power. In a standard model of ideological political
parties without commitment, a new party will come in and change tax policy in line with its
ideological preferences. In the case of U.S. politics, for example, one would expect Democrats
to increase overall tax collections [Reed, 2006]. They could do so through changes to the tax
rate τ , as emphasized in standard political economy models, or through the base b(x, u) by
changing the tax code x. It is an open empirical question whether the tax rate or the tax
code is the more important component of state fiscal policy.
Consider a model with two ideological political parties, Democrat and Republican. Let
D = 1 for Democrat control and D = 0 for Republican control. The policy components
can be understood as functions of the ruling party: τ (D), x(D), and u(D). The empirical
work is designed to understand better the relative importance of these components in how
political parties implement fiscal policy.
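The effect of Democratic control on revenue can be decomposed as

$$\frac{\partial g}{\partial D} = \frac{\partial \log \tau}{\partial D} + \sum_{i=1}^{p} \frac{\partial g}{\partial x_i} \frac{\partial x_i}{\partial D} + \sum_{j=1}^{o} \frac{\partial g}{\partial u_j} \frac{\partial u_j}{\partial D}$$

$$\rho_g = \rho_\tau + \sum_{i=1}^{p} \beta_i \delta_i + U$$

where I have defined $\rho_g = \partial g / \partial D$, $\rho_\tau = \partial \log \tau / \partial D$, $\beta_i = \partial g / \partial x_i$, $\delta_i = \partial x_i / \partial D$, and
$U = \sum_{j=1}^{o} (\partial g / \partial u_j)(\partial u_j / \partial D)$. The goal
of this paper is to provide evidence on these quantities. Appendix B.4 uses the coefficients
estimated in the empirical section to compute this decomposition and in particular measure
U.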
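The bookkeeping behind this decomposition can be sketched as follows: given estimates of the total revenue effect, the rate effect, and the phrase-level coefficients, the unobserved-policy component U is the residual. The coefficient values and the function name below are hypothetical and are not estimates from the paper.

```python
def decompose(rho_g, rho_tau, beta, delta):
    """Split the revenue effect of party control into rate, tax code, and
    residual (unobserved policy) parts: rho_g = rho_tau + sum(b * d) + U."""
    code_effect = sum(b * d for b, d in zip(beta, delta))
    return {
        "rate": rho_tau,
        "code": code_effect,
        "U": rho_g - rho_tau - code_effect,
    }

# Hypothetical coefficients for three text features.
parts = decompose(
    rho_g=0.10,               # total effect of Democrat control on log revenue
    rho_tau=0.0,              # effect on the log tax rate
    beta=[0.5, -0.2, 0.1],    # revenue effect of each text feature
    delta=[0.1, -0.1, 0.2],   # effect of Democrat control on each feature
)
```

In this toy case the tax code term accounts for 0.09 of the 0.10 total effect, leaving 0.01 for unobserved policies.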
I observe g, τ , x, and D. I do not observe u. I have panel variation in D, as described
in Section 2.7. The effect of Democratic control on revenue, $\rho_g$, and on the tax rate, $\rho_\tau$, can
be obtained from estimating

$$g_{st} = \rho_g D_{st} + \epsilon_{st}$$

$$\log \tau_{st} = \rho_\tau D_{st} + \epsilon_{st}$$

Although u is unobserved, it is uncorrelated with treatment under the identification assumptions
described below. Therefore these quantities can be estimated consistently.
Similarly, one can estimate the average effect of Democratic control on each text feature
i, $\delta_i$, by estimating

$$x_{ist} = \delta_i D_{st} + \epsilon_{ist}, \quad \forall i.$$

Again, with variation over time in $D_{st}$, $\delta_i$ is consistently estimated in spite of u being omitted
from the regression. These estimates identify the set of tax code features for which $\partial x_i / \partial D > 0$
or $\partial x_i / \partial D < 0$. Then one can compare these features to those in the effective tax code – those
that have a causal effect on revenue ($\partial g / \partial x_i \neq 0$). This will provide insight into whether and
how political parties use the tax code (rather than tax rates) to implement fiscal policy.

2.4 Data on tax policy and state politics

This section describes the data sources for tax revenues and political control of state
government. Subsection 2.4.1 covers the tax policy data. Subsection 2.4.2 covers the
data on state politics. These data are used to analyze the role of the tax code in implementing
redistributive policies.

2.4.1 Tax policy data

There are three sources of tax data by state: actual tax revenues, statutory tax rates, and
the value of targeted income flows. The data consist of a 48-year panel (1963-2010) for all
fifty states for three taxes: personal income tax, corporate income tax, and sales tax. This
section discusses the sources for these data.
The data on taxes collected by state governments come from the State Government
Finances census. These data have been used in many previous papers analyzing the public
finances of state government [e.g. Serrato and Zidar, 2014, Fajgelbaum et al., 2015]. The
census has separate categories for the taxes. First, there is personal income tax. Second,
there is corporate net income tax. For sales tax, I use the general sales and gross receipts
tax category. The other major source of state tax revenue is the excise tax (selective sales
tax), which is an interesting topic for future work. Few state governments collect significant
revenue from property taxes, which primarily fund local government.
The state tax rate data are obtained from the World Tax Database and Tax Foundation.
The data include information on rates and brackets. The regressions condition on the rate
structure non-parametrically by including fixed effects for sets of years where the revenue
source had the same rates and brackets, excluding automatic bracket changes due to inflation.
This is preferable due to non-linearity in the tax rate structure.
The data on the value of the income flows are constructed from Bureau of Economic
Analysis (BEA) data. Personal income tax is the most straightforward; the BEA provides
data on total personal income in each state. Corporate income is measured as gross operating
surplus (corporate profits). The income flow for sales tax is measured as sectoral GDP for
retail trade (NAICS 44-45); alternative specifications use total state GDP for robustness. For
further robustness, I have also used the federal tax collections by state for personal income
and corporate income. If the rate stays the same, the ratio of state tax collections
to federal tax collections should be constant unless there are changes in the state law on the
tax base.

Table 2.1: Summary Statistics on Tax Data

Base               Variable             Mean      Median    Std. Dev.

Corporate Income   Income Value ($B)    61.60     38.22     79.72
                   Tax Rate             0.06      0.06      0.03
                   Revenue ($B)         0.61      0.28      1.06

Personal Income    Income Value ($B)    145.53    88.60     178.69
                   Tax Rate             0.05      0.06      0.04
                   Revenue ($B)         6.3       2.9       10.1

Sales              Income Value ($B)    173.58    19.89     535.96
                   Tax Rate             0.04      0.04      0.02
                   Revenue ($B)         6.68      4.02      8.26

Observation is a state-year. Dollar amounts deflated to 2007 dollars.

The tax data are defined for income source r (personal income, corporate income, and
sales), state s (all fifty states), and year t (every odd-numbered year between 1963 and
2010). The main outcome measure for the regressions below is the tax burden, used in
previous work on state public finance [Chernick, 2005, Reed, 2006, Leigh, 2008]. The tax
burden is the revenue collected divided by the value of the income flow. Define $g^r_{st}$ as the log
tax burden for source r in state s at time t, after being residualized on the source-state-rate
fixed effects and source-year fixed effects.
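As an illustration only, the construction of the outcome variable can be sketched in Python. The record layout, the toy numbers, and the use of plain state and year means are assumptions made for the example; the regressions use source-state-rate and source-year fixed effects, and sequential de-meaning of this kind is exact only for a balanced panel.

```python
import math
from collections import defaultdict

def log_tax_burden(revenue, income_value):
    """Log of revenue collected divided by the value of the income flow."""
    return math.log(revenue / income_value)

def demean_by(values, keys):
    """Subtract from each value the mean of its group (one fixed effect)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        totals[k] += v
        counts[k] += 1
    return [v - totals[k] / counts[k] for v, k in zip(values, keys)]

# Hypothetical records: (state, year, revenue in $B, income value in $B).
rows = [
    ("NY", 1963, 2.0, 50.0), ("NY", 1965, 3.0, 60.0),
    ("TX", 1963, 1.0, 40.0), ("TX", 1965, 1.2, 48.0),
]
burden = [log_tax_burden(rev, inc) for _, _, rev, inc in rows]

# Remove state means, then year means (exact for a balanced panel).
g = demean_by(burden, [state for state, *_ in rows])
g = demean_by(g, [year for _, year, *_ in rows])
```

After both passes the residualized burden sums to zero within every state and within every year of the toy panel.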
Table 2.1 reports summary statistics on tax variables in the sample. Each of the three
tax bases is responsible for large amounts of revenue for state governments. As noted in
Fajgelbaum et al. [2015], in recent years these three state taxes together have accounted for
four percent of U.S. GDP.

2.4.2 State Politics Data

This section describes the data on state politics. The empirical goal is to determine how
the revenue impacts of the effective tax code relate to the preference of the two political
parties to use that language. These data have been used in many previous papers analyzing
the politics of state fiscal policy [e.g. Besley and Case, 2003, Reed, 2006, Leigh, 2008].

Table 2.2: Summary Statistics on State Politics Data

Variable                                       Mean      Std. Dev.

Democrat Governor                              .5875     .4923
Democrat Lower Chamber                         .6627     .4728
Democrat Upper Chamber                         .6307     .4826

Previous Democrat Governor Vote Margin (%)     7.216     23.943
Lower Chamber Democrat Margin (%)              11.106    19.98
Upper Chamber Democrat Margin (%)              11.406    20.99

Tied Parties in Lower House                    .0320     .1761
Tied Parties in Upper House                    .0459     .2094

Log Financial Administration Expenditures      10.20     1.265

Summary statistics on state political variables.

The data include party control for both houses of the state legislatures as well as the
governorship, for the years 1963 through 2010. More specifically, they include the number of Democrat
and Republican seats in each legislature, and the number of Democrat and Republican votes
cast in the previous gubernatorial election. These measures allow me to estimate the effects of
party control on policy and on legislation using panel data.
Table 2.2 shows summary statistics for the political variables in the dataset. Democrats
had a small advantage in both legislatures and governorships during this time period. There
were many changes in control, however. There was some change in the partisan makeup of
state governments, whether in the legislature or governorship, in 72.8% of state-bienniums.
This is the variation used in the political analysis.

2.5 Tax Legislation Data

This section describes the approach for extracting and constructing statistical representations
of tax legislation. Text is becoming an important data source for empirical work in economics
and political science [Gentzkow and Shapiro, 2010, Quinn et al., 2010, Jensen et al., 2012,
Hansen et al., 2014, Gentzkow et al., 2015, Ash et al., 2015b]. This paper builds on this
previous work.
Subsection 2.5.1 describes the source and scope of the raw legislation text. Subsection 2.5.2
describes the methods for tokenizing the text for analysis. Subsection 2.5.3 discusses how to
extract tax legislation and represent it in the regression analysis.

2.5.1 Raw Text Data

The data on legislation consist of the full text of U.S. state session laws through 2010. The
data go back to inception for most states. The “session laws” consist of the collection of
statutes enacted by a legislature during a legislative session – published every year or every
two years. All of the data are constructed biennially to account for this issue. The sample
is all fifty states, and the 24 bienniums starting in 1963 and ending in 2010.
There is a large literature in political science examining the process of drafting and
enacting legislation [Tollison, 1988, Jansa et al., 2015]. State legislators can draft their own
statutes, and most of them are trained to do so through experience as attorneys. They also delegate
the task of drafting legislation to aides. Given the difficulty of crafting bills from scratch,
legislators often borrow language from other legislatures or from interest groups. For example,
Hertel-Fernandez and Kashin [2015] use text analysis to measure the influence of the
conservative lobbying group ALEC on state legislatures. There are also non-partisan professional
organizations such as the National Conference of State Legislatures and the American Law
Institute, which provide model legislation. These organizations provide information about
which states have adopted particular provisions. Legislators pay attention to what other
states are doing to make their state appear more competitive [Berry and Baybeck, 2005].
Legislation is the ideal source of legal text for examining the legal underpinnings of tax
policy. Unlike common-law subjects like criminal law and tort law, tax does not have a
substantial judge-made component. Shaviro [1990] recounts the cyclical back-and-forth in
tax legislation, where the base is narrowed and broadened over time.
There are some important caveats for interpreting these data. These statutes may amend
or repeal previous statutory provisions, or create new provisions. These documents give the
“flow,” rather than the “stock,” of legislation. Sometimes the laws include bills that failed or
were vetoed. A team of research assistants reviewed samples and found that these practices
do not change significantly within state over the time period.
Figure 2.5.1 shows an example page of a scanned statute, with the corresponding OCR. As
can be seen, the OCR is quite high-quality. The scans for the period 1963-2010 are mostly
high-quality.

Figure 2.5.1: Scanned Session Laws and Resulting OCR

Scanned image and resulting OCR text for an example statute in the text data. This example is from the
Texas Legislature for the 1889 session.

2.5.2 Processing Text Features

The first step is to merge and process all of this raw text. A script serves to append pages,
remove headers, footers, tables of contents, indexes, and other non-statute material. Then it
segments the text into individual bills, acts, and resolutions using text markers for the start
of new statutes. These include indicators for new Chapters, Articles, or Titles, such as a
line with “CHAPTER” followed by a Roman numeral. Some states have their own standard
indicators, such as “P.A” followed by a number to reflect a new “Public Act.” The script
also uses common text for the beginning of a statute preamble (e.g., “An act to...”) and for
enacting clauses (e.g., “Be it enacted that...”). Research assistants checked samples of the
statute segmenter for each state-year to make sure it worked well. This procedure results
in 1.56 million statutes for the years 1963 through 2010, or about 650 statutes on average
per state per year. For comparison, the federal government enacts about 5,000 statutes per
year.
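The segmentation step can be illustrated with a short Python sketch based on regular expressions. The pattern below is a hypothetical, simplified stand-in for the state-specific markers described above, not the script used to build the data, and it would over-split a statute whose chapter heading is followed by its own "An act to" preamble.

```python
import re

# Hypothetical start-of-statute markers, loosely following those named in the text.
STATUTE_START = re.compile(
    r"^(?:CHAPTER\s+[IVXLCDM]+\b"   # "CHAPTER" followed by a Roman numeral
    r"|P\.A\.?\s+\d+"                # "P.A" followed by a number (Public Act)
    r"|An\s+act\s+to\b)",            # common opening of a statute preamble
    re.IGNORECASE | re.MULTILINE,
)

def segment_statutes(text):
    """Split session-law text at each line that begins with a start marker.

    Text before the first marker (front matter) is discarded.
    """
    starts = [m.start() for m in STATUTE_START.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

sample = (
    "CHAPTER XII\n"
    "Be it enacted that the sales tax rate is amended.\n"
    "P.A. 101\n"
    "An income tax credit is created.\n"
)
parts = segment_statutes(sample)
```

On the toy input the function returns two segments, one per marker line.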
The next step is to process the text for analysis. The tax code is a complex object and
could be represented as data in any number of ways. A massive amount of dimension re-
duction is necessary, and following the previous literature I eliminate any long-range word
order information. In particular, only short-range (at most four words in a row) syntac-
tic information is preserved and the code is represented as a frequency distribution over
phrases. As there are improvements in storage and computer processing power, more refined
representations of language may be useful in future research.11
The basic methods for tokenizing text and representing documents as frequency distributions
over tokens have become relatively standardized in the literature on political text
analysis [Gentzkow and Shapiro, 2010, Quinn et al., 2010, Jensen et al., 2012, Gentzkow
et al., 2014, Ash et al., 2015b, Gentzkow et al., 2015, Jelveh et al., 2015]. A script converts
the text to lower case, splits text into sentences,12 and removes punctuation. It then splits sentences
into words and stems word endings using the Snowball stemmer [Porter, 2001]. This stemmer
is less aggressive than the better-known Porter stemmer. For example, “corporate” and
“corporation” would both become “corpor.” The Porter stemmer would reduce both words
to “corp,” which would confuse these corporation-related terms with unrelated terms like
“corpus.” On the other hand, I needed a more aggressive method than a lemmatizer, which
preserves “corporation” and “corporate” as separate tokens. Lemmatizers are also much more
computationally expensive than stemmers due to the required dictionary look-up.

11 For example, Levy and Goldberg [2014] use grammatically parsed sentences rather than word order to
train Word2vec embeddings.
12 I use NLTK’s Punkt Sentence Tokenizer, which “uses an unsupervised algorithm to build a model for
abbreviation words, collocations, and words that start sentences; and then uses that model to find sentence
boundaries. This approach has been shown to work well for many European languages.” See
nltk.org/api/nltk.tokenize.html.
Most previous social science papers using text analysis represent documents as frequency
distributions over stemmed words or n-grams. The disadvantage with a “bag of words”
approach is that important information about word order is left out. The segments “corpo-
rate tax on sales” and “sales tax on corporations” are treated as equivalent under a bag-of-
stemmed-words representation, even though they clearly concern taxes on different bases.
The disadvantage of a “bag of n-grams” approach is that some phrases are counted inde-
pendently even when they are clearly subordinate to a longer noun phrase. For example,
the segments “corporate income tax” and “personal income tax” would both include “income
tax” and “tax” as independent grams, even though the full three-word segments should be
represented as singular concepts.
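Both limitations can be reproduced in a few lines of Python. The stem table below is a hypothetical stand-in for the Snowball stemmer, included only so that the example is self-contained.

```python
from collections import Counter

# Hypothetical stand-in for a stemmer; only the words used below are covered.
TOY_STEMS = {"corporate": "corpor", "corporations": "corpor", "sales": "sale"}

def stem(word):
    return TOY_STEMS.get(word, word)

def bag_of_words(text):
    """Frequency distribution over stemmed words, ignoring word order."""
    return Counter(stem(w) for w in text.lower().split())

def ngrams(tokens, max_n=3):
    """All contiguous word sequences of length 1 to max_n."""
    return [
        "_".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

# The two segments collapse to the same bag of stemmed words.
same_bag = bag_of_words("corporate tax on sales") == bag_of_words("sales tax on corporations")

# The three-word phrase also contributes its sub-phrases as separate grams.
grams = Counter(ngrams("corporate income tax".split()))
```

Here `same_bag` is true, and `grams` counts "income_tax" and "tax" separately from "corporate_income_tax".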
This paper improves on previous approaches by tagging parts of speech and representing
documents as frequency distributions over informative noun phrases and verb phrases. For
example, “personal income tax” becomes “person_incom_tax.” To do this, the script first
tags each token by part of speech (nouns, verbs, adjectives, etc.) using the algorithm de-
scribed in Collins [2002].13 Then it links up phrases based on the part-of-speech patterns,
using a set of tag patterns based on Denny et al. [2015] but significantly extended for the
purposes of legal language.14 I consulted legal concept dictionaries to develop the list. For
example, “beyond a reasonable doubt” is preposition-determiner-adjective-noun (PDAN).

13 This is the averaged perceptron tagger implemented in the Python package TextBlob.
14 These include A, V, N, AN, NN, VN, VV, NV, VP, NNN, AAN, ANN, NAN, NPN, VAN, VNN,
AVN, VVN, VPN, ANV, NVV, VDN, VVV, NNV, VVP, VAV, VVN, NCN, VCV, ACA, PAN, NCVN, ANNN,
NNNN, NPNN, AANN, ANNN, ANPN, NNPN, NPAN, ACAN, NCNN, NNCN, ANCN, NCAN, PDAN,
PNPN, VDNN, VDAN, VVDN for Adjective, Noun, Verb, Preposition, Determiner. Verb particles are
coded as “V” to ensure verb phrases such as “go along” are connected.

To be tokenized, words that form phrases have to co-occur frequently relative
to how often they occur apart.15 As an example, the sentence

Eligible individuals must pay personal income tax on foreign business earnings

becomes

elig_individu must_pay person_incom_tax foreign_busi_earn.

Note that “on” is excluded from the processed output because only nouns, adjectives, and
verbs are included as single-word tokens.
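A minimal sketch of the phrase-linking step is given below. It makes several simplifying assumptions: the part-of-speech tags are assigned by hand rather than by a tagger, only a small subset of the tag patterns is used, matching is greedy from left to right, and both stemming and the co-occurrence threshold are omitted, so the output tokens are unstemmed.

```python
# Illustrative subset of the tag patterns (A = adjective, N = noun, V = verb).
PATTERNS = {"A", "N", "V", "AN", "NN", "VV", "ANN", "NNN"}
MAX_LEN = 4  # phrases contain at most four words

def chunk(tagged):
    """Greedy left-to-right chunking: take the longest tag pattern that matches."""
    phrases, i = [], 0
    while i < len(tagged):
        for n in range(min(MAX_LEN, len(tagged) - i), 0, -1):
            tags = "".join(tag for _, tag in tagged[i:i + n])
            if tags in PATTERNS:
                phrases.append("_".join(word for word, _ in tagged[i:i + n]))
                i += n
                break
        else:
            i += 1  # no pattern starts here (e.g. a lone preposition): drop the token
    return phrases

# Hand-tagged version of the example sentence (tags are assumptions).
sentence = [
    ("eligible", "A"), ("individuals", "N"), ("must", "V"), ("pay", "V"),
    ("personal", "A"), ("income", "N"), ("tax", "N"), ("on", "P"),
    ("foreign", "A"), ("business", "N"), ("earnings", "N"),
]
tokens = chunk(sentence)
```

With these inputs `tokens` contains four phrases, and the preposition is dropped because no pattern in the subset begins with it.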
Once the distribution of phrases is computed, infrequent phrases are excluded. Words
and phrases are included if they occur in at least 500 legislative sessions, or five states per
year on average. This results in a baseline vocabulary of 55,217 tokens. This threshold
was chosen for computational constraints and to exclude state-specific language (such as the
names of counties).
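The frequency filter amounts to counting, for each token, the number of legislative sessions in which it appears at least once. A sketch under that reading follows; the function name and the toy sessions are hypothetical.

```python
from collections import Counter

def build_vocabulary(session_tokens, min_sessions):
    """Keep tokens that appear in at least `min_sessions` distinct sessions."""
    session_counts = Counter()
    for tokens in session_tokens:
        session_counts.update(set(tokens))  # count each session at most once
    return {tok for tok, n in session_counts.items() if n >= min_sessions}

# Toy input: one token list per state-session.
sessions = [
    ["sale_tax", "county_x"],
    ["sale_tax", "sale_tax"],
    ["sale_tax", "incom_tax"],
]
vocab = build_vocabulary(sessions, min_sessions=2)
```

Only "sale_tax" survives the toy threshold of two sessions; the state-specific token "county_x" is dropped.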

2.5.3 Extracting Tax Code Text Features

The next step is to construct measures of phrase frequencies for each of the three tax sources:
corporate income, personal income, and sales. The approach is to weight the statutes by
their similarity to these sources using Word2Vec, a natural language tool for representing
words as vectors introduced in Mikolov et al. [2013]. This section describes this procedure.
There is no straightforward way to identify the tax statutes for each source. Some
statutes can have an impact on the tax sources without mentioning them explicitly, while
other statutes may mention the taxes but have little relation to them. This means that
searching for particular keywords would result in both false positives and false negatives.
With such a large database of statutes (1.56 million), meanwhile, manual classification is
also infeasible.

15 They have to meet a point-wise mutual information threshold [Church and Hanks, 1990]. This is given
by $\Pr(w_1, w_2) / (\Pr(w_1) \Pr(w_2))$: the probability that the words co-occur, divided by the product of the
probabilities (frequencies) that the words occur individually. The threshold chosen was informed by Mikolov
et al. [2013], but given the frequency threshold and part-of-speech restrictions, the PMI threshold did not
matter much. For three words, the threshold used was $\Pr(w_1, w_2, w_3) / (\Pr(w_1) \Pr(w_2) \Pr(w_3))$, and similarly
for four words.
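The co-occurrence ratio used in the point-wise mutual information criterion for phrase formation can be computed directly from token counts. The sketch below uses a toy token stream and estimates probabilities by relative frequencies, which is an assumption about how the counts are normalized.

```python
from collections import Counter

def pmi_ratio(phrase, unigram_counts, phrase_counts, n_tokens, n_phrases):
    """Pr(w1, ..., wn) / (Pr(w1) * ... * Pr(wn)) for an n-word phrase."""
    joint = phrase_counts[phrase] / n_phrases
    independent = 1.0
    for word in phrase:
        independent *= unigram_counts[word] / n_tokens
    return joint / independent

# Toy corpus of eight tokens and seven adjacent word pairs.
tokens = "income tax income tax sales tax income credit".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
ratio = pmi_ratio(("income", "tax"), unigrams, bigrams, len(tokens), len(tokens) - 1)
```

A ratio above one indicates that the words appear together more often than independence would imply; a pair would be joined into a phrase only if the ratio clears the chosen threshold.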
The approach is to use Word2Vec, which provides an off-the-shelf technique for mapping
the relations between words and phrases [Mikolov et al., 2013]. This tool has proven per-
formance on web search, language translation, and speech recognition. It can be trained
relatively quickly on a large corpus, and thereafter can quickly compute similarity statistics
between words and documents.
The model is described in detail in the appendix. The important point is that Word2Vec
provides a function for mapping phrases to vectors in $[-1, 1]^{300}$ using information from
surrounding phrases. For a given word, Word2Vec looks at the sequence of nearby words
and learns which other words/phrases in the vocabulary would fit into the same context. It
is best-known for recognizing analogies. After being trained on the state session laws corpus,
for example, the model knows that

vec["corporate income tax"] − vec["corporation"] + vec["person"]

≈ vec["personal income tax"].
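The analogy property can be illustrated with hand-made vectors. The three-dimensional values below are hypothetical and chosen only to make the arithmetic visible; they are not output from the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 3-dimensional vectors; the trained embeddings have 300 dimensions.
vec = {
    "corporate income tax": [0.9, 0.8, 0.1],
    "corporation":          [0.9, 0.1, 0.0],
    "person":               [0.1, 0.1, 0.9],
    "personal income tax":  [0.1, 0.8, 0.9],
    "sales tax":            [0.5, 0.6, 0.5],
}

def analogy(a, b, c):
    """Return the phrase closest (by cosine) to vec[a] - vec[b] + vec[c]."""
    target = [x - y + z for x, y, z in zip(vec[a], vec[b], vec[c])]
    candidates = [w for w in vec if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vec[w], target))

answer = analogy("corporate income tax", "corporation", "person")
```

In this toy vocabulary the nearest remaining phrase to the combined vector is "personal income tax".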

While Word2Vec is not the only solution to the problem of identifying tax legislation, it
does provide a quick and effective solution that provides intuitive rankings and can be used
feasibly on such a large corpus. The tool provides relations between similar phrases that can
be used to isolate tax code changes and better interpret results.
Classifying the statutes starts with three textual labels for the revenue sources, indexed by
r ∈ {person_incom_tax, corpor_incom_tax, sale_tax}. Represent by $\vec{r}$ the word vector
for income label r. Table 2.3 gives examples of the types of phrases that are most related to
the three labels, as scored by the trained model.
Table 2.3: Most Similar Phrases to Revenue Source Labels

Corporate Income Tax
corporate income tax       income tax credit        individual income tax
corporate income           income tax liability     credit for tax
corporate franchise tax    insurance premium tax    tax credit
income tax                 income tax return        state income tax

Personal Income Tax
personal income tax        for that taxable year    corporate income
corporate tax              state income tax         individual income tax
income tax                 taxpayer                 net income tax
income tax return          individual taxpayer      individual income tax return

Sales Tax
sales tax                  local sales              use tax revenue
use tax                    state sales              additional sales
sales and use tax          county sales             amount of sales
local sales tax            sales or use tax         sales tax revenue

Next the statutes k are scored by their relation to the three tax sources r. Let $P_k$ be the
set of words and phrases in k. The average cosine similarity between the tokens in k (with
corresponding vector $\vec{i}$) and tax source r (with corresponding vector $\vec{r}$) is

$$S(k, r) = \frac{1}{|P_k|} \sum_{i \in P_k} \frac{\vec{i} \cdot \vec{r}}{\|\vec{i}\| \, \|\vec{r}\|}$$

where $|P_k|$ is the number of phrases in statute k. The metric inside the summation, the
cosine similarity between the phrases, is the standard metric in the NLP literature on word
vectors [Levy et al., 2015].16 It will weight highly the statutes that have the words in Table 2.3,
and other words that appear in similar contexts.
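The statute score S(k, r) is a plain average of cosine similarities, which can be sketched as follows. The two-dimensional vectors are hypothetical, and tokens without a vector are skipped, which is an assumption about how out-of-vocabulary tokens are handled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def statute_similarity(statute_tokens, vectors, label):
    """Average cosine similarity between a statute's tokens and a source label."""
    r = vectors[label]
    sims = [cosine(vectors[tok], r) for tok in statute_tokens if tok in vectors]
    return sum(sims) / len(sims) if sims else 0.0

# Hypothetical 2-dimensional phrase vectors.
vectors = {
    "sale_tax": [1.0, 0.0],
    "retail":   [0.6, 0.8],
    "school":   [0.0, 1.0],
}
score = statute_similarity(["retail", "school"], vectors, "sale_tax")
```

The toy statute scores 0.3: the average of 0.6 for "retail" and 0 for "school".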
Next the statute similarities $S(k, r) \in [0, 1]$ are used as weights to construct phrase
frequencies for each state, year, and source. Let $K_{st}$ be the set of statutes enacted by the
government of state s at period t. Let $f_k^i$ equal the frequency of phrase i in statute k. The
weighted term frequency of phrase i for source r in state s at time t is

$$\sum_{k \in K_{st}} S(k, r) f_k^i.$$

16 Cosine similarity has also been used in recent political science work showing text reuse across states [e.g.
Hinkle, 2015, Jansa et al., 2015].

One could use this expression as the measure of text features, but in that case the effects may
be driven by the volume of legislation enacted, rather than the phrases chosen. The focus is
on the allocation, rather than the volume, of language, so proportional (relative) frequencies
are constructed. The proportional frequency for phrase i divides the term frequency for i by
the summed frequency over all phrases:

$$\dot{x}^{ir}_{st} = \frac{\sum_{k \in K_{st}} S(k, r) f_k^i}{\sum_{i=1}^{p} \sum_{k \in K_{st}} S(k, r) f_k^i} \qquad (2.5.1)$$

The numerator is the term frequency of i in state s during year t, weighted by the similarity
to tax source r of the statutes where it appeared. The denominator is the total phrase
frequency in a state-year for a given source. Therefore $\dot{x}^{ir}_{st}$ is the proportional frequency for
phrase i.
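Equation (2.5.1) can be sketched for a single state, period, and source as follows. The phrase counts and similarity weights are hypothetical, and the function assumes that at least one weighted count is positive.

```python
from collections import defaultdict

def proportional_frequencies(statutes, weights):
    """statutes: list of {phrase: count}; weights: S(k, r) for each statute."""
    weighted = defaultdict(float)
    for counts, s_kr in zip(statutes, weights):
        for phrase, f in counts.items():
            weighted[phrase] += s_kr * f   # numerator of (2.5.1)
    total = sum(weighted.values())         # denominator of (2.5.1), assumed positive
    return {phrase: w / total for phrase, w in weighted.items()}

# Two hypothetical statutes enacted in one state-period.
statutes = [
    {"sale_tax": 2, "exempt": 1},
    {"school": 3, "exempt": 1},
]
weights = [0.8, 0.1]  # similarity of each statute to the sales tax label
x_dot = proportional_frequencies(statutes, weights)
```

The resulting frequencies sum to one, so a session with a large volume of legislation does not mechanically receive larger feature values.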
As mentioned, the session laws give the flow rather than the stock of legislation. Therefore
$\dot{x}^{ir}_{st}$ can be seen as giving the within-state-source change in tax legislation. To control for
nationwide legislative trends by source, each $\dot{x}^{ir}_{st}$ is de-meaned by the average for each
source-year. Formally, define

$$x^{ir}_{st} = \dot{x}^{ir}_{st} - \frac{1}{n_{jt}} \sum_{j} \dot{x}^{ir}_{jt}$$

where the second term is the source-year average for the $n_{jt}$ states that imposed tax r at
biennium t. Finally, each text feature variable is standardized by dividing by the within-
source standard deviation.
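For one phrase and one source, the de-meaning and standardization steps can be sketched as below. The use of the sample standard deviation (dividing by n - 1) over all state-periods is an assumption about what the within-source standard deviation refers to, and the function assumes the centered values are not all zero.

```python
import math
from collections import defaultdict

def demean_and_standardize(x_dot):
    """x_dot maps (state, period) -> proportional frequency for one phrase and source."""
    by_period = defaultdict(list)
    for (_, period), value in x_dot.items():
        by_period[period].append(value)
    means = {t: sum(vs) / len(vs) for t, vs in by_period.items()}

    # Subtract the period average across states (the source-year mean).
    centered = {key: value - means[key[1]] for key, value in x_dot.items()}

    # Divide by the sample standard deviation of the centered values.
    n = len(centered)
    sd = math.sqrt(sum(v * v for v in centered.values()) / (n - 1))
    return {key: v / sd for key, v in centered.items()}

# Hypothetical values for two states and two periods.
x = demean_and_standardize({
    ("A", 1): 0.2, ("B", 1): 0.4,
    ("A", 2): 0.1, ("B", 2): 0.5,
})
```

After the transformation the values average to zero within each period and have unit sample variance.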
Let $n_r$ be the number of state-year observations for revenue source r. Define the $n_r \times p$
matrix

$$X^r = \begin{bmatrix} x^{1r}_{11} & \cdots & x^{pr}_{11} \\ \vdots & \ddots & \vdots \\ x^{1r}_{st} & \cdots & x^{pr}_{st} \\ \vdots & \ddots & \vdots \end{bmatrix}$$

as the matrix of residualized proportional phrase frequencies. The corresponding column vectors
are given by $x^{ir} = (x^{ir}_{11}, \ldots, x^{ir}_{st})$, and corresponding row vectors are
$x^r_{st} = (x^{1r}_{st}, x^{2r}_{st}, \ldots, x^{pr}_{st})$.

For the remainder of the analysis, p = 25,000 is selected for computational tractability,
where the 25,000 words and phrases with the highest document frequencies are included.
This cutoff was chosen to limit the number of phrases while ensuring that the phrases for
the relevant tax sources were included. This included the list of all tax sources: “sales tax,”
“personal income tax,” and “corporate tax.”

2.6 Constructing the effective tax code

This section describes the method for constructing the effective tax code by measuring the
effect on tax revenues of text features in tax legislation. The goal is not to estimate precisely
the effect on revenue of any particular phrase. One cannot measure the tax code perfectly,
and phrases are correlated with each other, so the coefficient for any particular phrase cannot
be treated as precisely estimated. Instead the objective is to construct a ranking of phrases
that can be used to explore how political parties use the tax code in their implementation
of state fiscal policy.
The approach is analogous to Gentzkow and Shapiro [2010], who use political floor speech
to score language by its association with Democrat or Republican congressmen. They then
use that measure to study political bias in newspaper articles. In this paper, phrases are
scored by their effect on tax revenue, for use in studying the role of the effective tax code in
the political economy of fiscal policy.
Subsection 2.6.1 outlines the approach for high-dimensional estimation in an OLS framework.
Subsection 2.6.2 constructs Bartik-type instruments for legislative text using variation
from statutes enacted in neighboring states. Subsection 2.6.3 describes the approach for
regularized 2SLS estimation using these instruments.

2.6.1 Ordinary Least Squares

This section presents the basic econometric framework for measuring the average effect of a
phrase on tax revenue collected. The estimation strategy is described first using an ordinary
least squares framework, to describe the basic structure of the data.
The data is indexed by st, for state s and biennium t. Let P be the set of phrases
in the vocabulary {1, 2, ..., p}. Let R be the set of revenue sources (corporate income tax,
personal income tax, sales tax). The goal is to estimate the effect $\beta_{ir}$ for each phrase i ∈ P on
government revenue $g^r_{st}$ for each source r ∈ R. A linear model of the effect of the proportional
frequency $x^{ir}_{st}$ of phrase i in legislation related to r enacted in state s at biennium t on the
tax burden $g^r_{st}$ from source r, holding all other phrases constant, is

$$g^r_{st} = \beta_{ir} x^{ir}_{st} + \epsilon^r_{st}. \qquad (2.6.1)$$

Recall that $g^r_{st}$ has been residualized on a state fixed effect and a year fixed effect, while $x^{ir}_{st}$ is
the flow of legislation and has been residualized on a year fixed effect. This means that this
regression controls for time-invariant state-level factors, as well as time-varying nationwide
factors. A positive βir means that when phrase i appears more in statutes related to source
r, there is a higher revenue for that source. A negative βir means that when phrase i appears
more in statutes related to source r, there is a lower measured revenue for that source. For
statistical inference one could cluster standard errors by state [Bertrand et al., 2004].
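For a single phrase, the regression in (2.6.1) reduces to a slope without an intercept, since both variables are already residualized. The sketch below computes that slope and a state-clustered standard error. The data are hypothetical and no finite-sample correction is applied, which is a simplifying assumption.

```python
import math
from collections import defaultdict

def ols_slope(x, g):
    """No-intercept OLS slope of g on x (both already residualized)."""
    return sum(a * b for a, b in zip(x, g)) / sum(a * a for a in x)

def clustered_se(x, g, clusters):
    """Cluster-robust standard error of the slope, without small-sample correction."""
    beta = ols_slope(x, g)
    scores = defaultdict(float)
    for xi, gi, c in zip(x, g, clusters):
        scores[c] += xi * (gi - beta * xi)  # sum of x * residual within cluster
    sxx = sum(a * a for a in x)
    return math.sqrt(sum(s * s for s in scores.values())) / sxx

# Hypothetical residualized data for two states observed twice each.
x = [1.0, -1.0, 2.0, -2.0]
g = [1.0, -1.0, 3.0, -3.0]
states = ["A", "A", "B", "B"]
beta_hat = ols_slope(x, g)
se_hat = clustered_se(x, g, states)
```

In this toy example the slope is 1.4, and the clustered standard error sums the score contributions within each state before squaring.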
Consistent estimation of (2.6.1) using OLS relies on the assumption that there are no
state-level time-varying factors affecting both the phrase frequencies $x^{ir}_{st}$ and revenue $g^r_{st}$.
However, tax legislation is chosen endogenously in response to other economic factors affecting tax
revenues; Chang [2014] documents this type of endogeneity in the context of state R&D tax
credits. These other factors may include other phrases j, which are correlated with phrase i
as well as government revenues. One could try to include other phrases in the regression, but
there would be a problem of multicollinearity if one tried to include all p = 25,000 phrases.
For these reasons, OLS will likely provide inconsistent estimates for many of the phrases.

2.6.2 Instrumental Variables

Because of these identification issues, to estimate $\beta_{ir}$ we need exogenous variation in $x^{ir}_{st}$ that
is uncorrelated with other policies that affect tax revenues. The approach to solving the
identification problem is to construct a set of Bartik-style instruments for phrase frequencies.
Exogenous variation comes from diffusion of text from other states in the same regional
judicial district.
Bartik [1991] constructs instruments for labor demand using nationwide industry-specific
shocks, which are exogenous from the perspective of any individual locality. If one interacts
this shock with the sectoral composition of a locality, one obtains exogenous cross-sectional
variation in labor demand. Another related instrument is that used for state tax rates in
Fajgelbaum et al. [2015], who used tax rates in neighboring states as instruments in 2SLS
estimates for labor supply elasticity with respect to top tax rates.
This paper uses regional variation over time in phrase frequencies from enacted legislation
by state governments. The basic motivation stems from previous work documenting diffusion
of policies from state to state [Berry and Berry, 1990, 1992, Case et al., 1993, Berry and
Berry, 1994, Mooney and Lee, 1995]. This diffusion includes not just discrete policies but
the actual wording of statutes; Jansa et al. [2015] document that state legislatures frequently
borrow the text of legislation from other states. The goal is to find variation in statute text
that is more or less randomly assigned conditional on the fixed effects.17
Cross-sectional variation is needed so that a year fixed effect can be included in the
regressions to control for national trends. Because the focus is on legal language, a channel
for preferential diffusion of legal language – as opposed to policies generally – is desirable.
A good fit for these needs is to use lagged regional variation in language within the federal
appellate court circuits, which comprise a set of eleven judicial districts in the federal court
system. Figure 2.6.1 illustrates the grouping of states into circuits that has been in place
since 1982. For the earlier years in the sample (1963-1981), Alabama, Florida, and Georgia
were part of the Fifth Circuit (rather than the Eleventh).
These districts were founded and are administered by the federal government (rather
than state governments) with a focus on federal law. The state governments have little
direct influence on the circuits or the decision-making of their judges, yet circuit judges are
asked to interpret and apply state law in numerous cases every year [Hoover, 1982]. Previous
empirical work has shown that policies diffuse between state governments in the same circuit
even more than they do between neighboring states or states in the same political party
[Bird and Smythe, 2008], supporting the idea that the circuit represents a regional legal
community [see also Carp, 1972]. Hinkle [2015] in particular uses anti-plagiarism technology
to show that the actual text of statutes preferentially diffuses to states in the same federal
circuit.

Figure 2.6.1: Federal Circuit Court Map

17 Balla [2001] shows that the text of insurance legislation preferentially diffuses in states whose
commissioners are members of the same insurance regulation professional association. Chernick [2005] documents
that the regressivity of taxes is actually negatively related to that of neighbors, showing that diffusion of
language is not necessarily accompanied by diffusion in substantive policy.
This is useful empirically because the timing of legislative choices in one state in a circuit
is likely unrelated to non-legislative factors affecting tax collections in other states in the cir-
cuit. While the groupings are more-or-less contiguous, they are not based on historically or
politically important relationships. Assignment is more or less arbitrary; for example, Wash-
ington and Utah are grouped together yet their state governments share little in common
politically.
The text instruments are constructed as follows. For each source r, state s, time t, and
phrase i, construct the leave-one-out average frequency for other states in the same federal
circuit for the previous year,

$$z^{ir}_{st} = \frac{1}{|J(s,t)| - 1} \sum_{j \neq s,\; j \in J(s,t)} x^{ir}_{j,t-1}$$

where j indexes the other states, J(s, t) is the set of states in s’s circuit at t, and |J(s, t)| is
the number of states in J(s, t). This gives the lagged leave-one-out average phrase frequency
for phrase i on legislation for source r in the circuit.
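The instrument can be sketched for one phrase and one source as follows. The circuit assignments and feature values are hypothetical, the state-to-circuit map is held fixed over time (whereas J(s, t) changes when circuits are redrawn), and the average is taken over the other states with an observed lagged value.

```python
def leave_one_out_instrument(x, circuit_of, state, t):
    """Mean of x[(j, t - 1)] over the other states j in `state`'s circuit."""
    circuit = circuit_of[state]
    others = [j for j, c in circuit_of.items() if c == circuit and j != state]
    values = [x[(j, t - 1)] for j in others if (j, t - 1) in x]
    return sum(values) / len(values) if values else None

# Hypothetical circuit membership and lagged phrase frequencies.
circuit_of = {"NY": 2, "CT": 2, "VT": 2, "TX": 5}
x = {("NY", 0): 0.3, ("CT", 0): 0.1, ("VT", 0): 0.5, ("TX", 0): 0.9}

z_ny = leave_one_out_instrument(x, circuit_of, "NY", t=1)
z_tx = leave_one_out_instrument(x, circuit_of, "TX", t=1)
```

For the first state the instrument is the average of the two other states in its toy circuit, and the state's own lagged value never enters. The last state has no neighbors in this toy map, so the function returns `None`.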
Define

$$Z^r = \begin{bmatrix} z^{1r}_{11} & \cdots & z^{pr}_{11} \\ \vdots & \ddots & \vdots \\ z^{1r}_{st} & \cdots & z^{pr}_{st} \\ \vdots & \ddots & \vdots \end{bmatrix}$$

the $n_r \times q$ matrix of Bartik phrase instruments for revenue source r. Let $z^r_{st}$ denote a row
vector from this matrix and consider the following two-stage least-squares framework. The
first stage for each phrase i is

$$x^{ir}_{st} = z^{r\prime}_{st} \gamma_i + \eta^{ir}_{st}, \quad \forall i, r \qquad (2.6.2)$$

where $\gamma_i \in \mathbb{R}^q$ is a row of the p × q matrix of first-stage coefficients Γ. The second stage
equation for the effect of $x^{ir}_{st}$ on revenue is the same as the OLS equation from above:

$$g^r_{st} = \beta_{ir} x^{ir}_{st} + \epsilon^r_{st}. \qquad (2.6.3)$$

The empirical goal is to obtain consistent estimates of $\beta_{ir}$ from Equation (2.6.3).
The key identifying assumption for this IV setup is that

$$\mathrm{Cov}(z^{ir}_{st}, \epsilon^r_{st}) = 0, \quad \forall i \in P, r \in R.$$

This requires that the instrument only affect $g^r_{st}$ through its effect on $x^{ir}_{st}$. That is, a state
legislature’s choices of tax law phrases will have an impact on the phrases chosen by other
state legislatures in the circuit, but will not otherwise affect tax revenue collections as a
share of income (conditional on the fixed effects). This is justified by the same arguments
that are used for traditional Bartik instruments. With the inclusion of state-source and
source-year fixed effects, this specification compares well to other recent work using related
methods [e.g. Bertrand et al., 2013, Acemoglu et al., 2014]. In the data, the instruments
are not significantly related to current period observables, including tax revenues and state
GDP. The 2SLS results reported below are not sensitive to the inclusion of a variety of sets
of covariates that one would expect to be correlated with tax collections, including a state’s
own GDP and/or the average GDP for the rest of the circuit.
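The logic of the two-stage setup can be seen in the simplest case of one phrase and one instrument. The data below are constructed by hand so that an unobserved factor u shifts both the phrase frequency and revenue while being orthogonal to the instrument; all numbers are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def two_stage_least_squares(z, x, g):
    """Single-instrument 2SLS on residualized data (no intercept)."""
    gamma = dot(z, x) / dot(z, z)              # first stage: x on z
    x_hat = [gamma * zi for zi in z]           # fitted phrase frequency
    return dot(x_hat, g) / dot(x_hat, x_hat)   # second stage: g on fitted x

# Hypothetical data: true effect of x on g is 2; u is an omitted factor.
z = [1.0, -1.0, 1.0, -1.0]
u = [1.0, 1.0, -1.0, -1.0]                     # orthogonal to z by construction
x = [zi + ui for zi, ui in zip(z, u)]
g = [2.0 * xi + ui for xi, ui in zip(x, u)]

beta_ols = dot(x, g) / dot(x, x)
beta_iv = two_stage_least_squares(z, x, g)
```

OLS gives 2.5 because u enters both x and g, while the instrumented estimate recovers the true coefficient of 2.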

2.6.3 High-Dimensional IV Estimation

Even if the instruments are valid, there are too many of them. The 2SLS estimator is
consistent only for small numbers of instruments relative to the sample size [Chao and
Swanson, 2005, Hansen et al., 2008]. In this dataset there are 25,000 instruments but just
3,500 observations. This subsection describes the use of regularization methods for dealing
with high dimensionality.
A set of recent econometrics papers has made progress in solving the many-weak-instruments
problem using regularization methods such as Lasso (Least Absolute Shrinkage
and Selection Operator). Lasso and related methods (such as Ridge regression and elastic net) can improve
the performance of IV under the assumption of a sparse first stage, that is, when a rela-
tively small number of instruments suffice to approximate the effect of all the instruments
on the endogenous regressors. This active research area includes Caner [2009], Gautier and
Tsybakov [2011], Okui [2011], and Carrasco [2012].
The main approach in this paper is based on Belloni et al. [2012], who use post-Lasso
to obtain optimal instruments under sparsity. That paper provides conditions under which
post-Lasso IV is consistent and asymptotically normal under heteroskedasticity and
non-normality. Another related paper is Lin et al. [2015], who use Lasso (and more general
regularization methods) in the case of a large number of instruments as well as a large
number of endogenous regressors. They prove consistency for a regularized 2SLS estimator
under sparse effects of the instruments and the endogenous regressors.18
In this case, the sparsity assumption means that there is a set of factors, traditions,
cultures, or ideas that are active within the federal judicial circuits and driving changes in
the tax code. Lasso provides a data-driven method for recovering proxies for these factors
from the lagged leave-one-out average phrase frequencies.
18
An alternative approach to dimension reduction is the factor IV method using principal components
analysis (PCA) to reduce the matrix of instruments [Bai and Ng, 2008]. This method is widely used in
the time series forecasting literature in empirical macroeconomics. Bai and Ng [2010] show that when
there are underlying factors driving both the endogenous regressors and the instruments, then the principal
components of the matrix of instruments will themselves provide the optimal instruments. For robustness,
all of the regressions below were alternatively implemented using factor IV in the first stage (as detailed in
Appendix B.2). The main results were similar under factor IV, but the out-of-sample prediction (Subsection
2.6.5) was worse, so the sparse-instruments specification is reported in the main text.

Lasso is implemented as follows. There are p = 25,000 phrases and q = 25,000 instruments.
Estimating the 625 million elements of Γ is computationally expensive. To ease the
computational burden, I first run each of the 625 million univariate regressions

$$x_{ist} = \gamma_{ij}\, z^{j}_{st} + \eta^{ij}_{st}$$

and exclude from the first stage any elements of z for which γ̂ij has a t-statistic below 3
(chosen arbitrarily, to ensure statistical significance).
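
The screening step can be illustrated with a short Python sketch. This is a minimal example on simulated data, not the code used for the results: the function name `univariate_tstats`, the homoskedastic standard errors, the two-sided reading of the threshold (|t| of at least 3), and the assumption that all variables are already fixed-effect-transformed are choices made here purely for illustration.

```python
import numpy as np


def univariate_tstats(x, Z):
    """t-statistics from regressing x on each column of Z separately.

    Hypothetical helper: no intercept is included because x and Z are
    assumed to be demeaned already; standard errors are homoskedastic.
    """
    n = x.shape[0]
    zz = (Z ** 2).sum(axis=0)                 # sum of squares per instrument
    gamma = Z.T @ x / zz                      # univariate OLS slopes
    resid = x[:, None] - Z * gamma            # one residual column per instrument
    sigma2 = (resid ** 2).sum(axis=0) / (n - 1)
    se = np.sqrt(sigma2 / zz)
    return gamma / se


# Simulated example: only instruments 0 and 1 truly drive the phrase.
rng = np.random.default_rng(0)
n, q = 500, 40
Z = rng.standard_normal((n, q))
x = 0.8 * Z[:, 0] - 0.6 * Z[:, 1] + rng.standard_normal(n)

t = univariate_tstats(x, Z)
keep = np.flatnonzero(np.abs(t) >= 3)         # instruments passed on to Lasso
```

Only the columns indexed by `keep` would enter the regularized first stage, which is what makes the later Lasso problem tractable.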
The first stage regression for phrase i solves

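$$\hat{\gamma}_i = \arg\min_{\gamma_i \in \mathbb{R}^q} \left\{ \frac{1}{2n} \lVert x_{ist} - Z\gamma_i \rVert_2^2 + \lambda \sum_{j=1}^{J} \lVert \gamma_{ij} \rVert_1 \right\} \tag{2.6.4}$$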

where the last term is the L1 (Lasso) penalty. The penalty parameter λ is chosen following
the methods in Belloni et al. [2012] and Lin et al. [2015].19
The regularized first stage forces sparsity; most elements of Γ go to zero. Lasso provides
its own regularized estimates for Γ̂, but following Belloni et al. [2012], the preferred approach
is to use post-Lasso.20 First-stage estimates are obtained by running OLS using only the
non-zero phrases from Lasso, with standard errors clustered by state. An advantage of using
post-Lasso is that it provides a first-stage F-statistic for evaluating instrument relevance.
This is discussed further in Subsection 2.6.4.21
19
An alternative specification included an L2 penalty in addition to the L1 penalty. This L1/L2 specifica-
tion is the elastic net model, which has better performance than Lasso under high levels of multi-collinearity
[Zou and Hastie, 2005]. The elastic net estimator also satisfies the assumptions of the more general regular-
ization framework in Lin et al. [2015]. Zou and Hastie [2005] show that one of the limiting cases for elastic
net is Lasso, while the other is equivalent to choosing regressors via soft thresholding. Caner and Zhang
[2014] study the elastic net in a GMM framework.
20
The Lasso and post-Lasso second-stage results were similar in this sample.
21
First stage regressions were implemented in Python using scikit-learn (for Lasso and elastic net) and
statsmodels (for OLS). I followed the advice of Dubé et al. [2012] in setting numerical tolerance levels.

The rest of the IV method is standard. The estimated Γ̂ is used to predict

$$\hat{X}_r = \begin{pmatrix} \hat{x}^{1r}_{11} & \cdots & \hat{x}^{pr}_{11} \\ \vdots & \ddots & \vdots \\ \hat{x}^{1r}_{st} & \cdots & \hat{x}^{pr}_{st} \\ \vdots & \ddots & \vdots \end{pmatrix},$$

the $n_r \times p$ matrix of instrumented (and fixed-effect-transformed) phrase frequencies for each
revenue source. This matrix includes only the exogenous variation in phrase changes due to
the instruments. Then the average partial effect of phrase i on tax revenues can be estimated
using

$$g^{r}_{st} = \beta_{ir}\, \hat{x}^{ir}_{st} + \epsilon^{r}_{st}. \tag{2.6.5}$$

This equation uses the instrumented phrase frequency $\hat{x}^{ir}_{st}$. Holding other phrases constant,
this will yield the average effect on tax revenues for source r of using phrase i once more
in statutes related to r.

2.6.4 First Stage Statistics

This section reports statistics on the first stage regressions. The main goal is to show that
the post-Lasso obtains a sufficiently high first-stage F-statistic, and therefore instrument
relevance, for a large set of phrases.
Figure 2.6.2 shows the distribution of the first-stage F-statistics. A set of 8,923 phrases
have a strong first stage. In the main analysis, phrases with a weak first stage are excluded.
This set of phrases is still large enough for prediction and analysis, as demonstrated below.
For comparison, Gentzkow and Shapiro [2010] use a vocabulary of 1,000 phrases.
Figure 2.6.3 is designed to assess the common-sense idea of whether the instrument
phrases are affecting their own phrase in other states, to substantiate the diffusion process.
The figure shows that when ranking the instruments j by the t-statistic of $\gamma_{ij}$ for any given
endogenous regressor i, the t-statistic for one’s own phrase tends to rank highly among the
set of phrases. This supports the idea that language diffusion is occurring through preference
for phrases in the same judicial circuit.

Figure 2.6.2: Distribution of First-Stage F-Statistic

Distribution of first-stage F-statistics for main IV specification. Vertical line at F = 10. The
mean is 14.1 and the median is 7.4. Out of a vocabulary of 25,000, 8,923 phrases have an
F-stat greater than 10.

Figure 2.6.3: Instrument Phrases Have a Stronger Effect on Own Endogenous Phrase

Frequency distribution over ranking of same phrase in first stage t-statistics.

To further assess the usefulness of the Bartik instrument, alternative specifications were
run that intuitively should have a weaker first stage. First, a ten-year lag was used rather
than a two-year lag, which results in a 20% smaller mean F-statistic and 23% smaller median
F-statistic. Second, a set of instruments were constructed from non-tax statutes (rather than
tax statutes), which results in a 10% decrease in the mean F-statistic and a 12% decrease in
the median F-statistic. These alternative specifications are weaker, as intuition would
suggest.

2.6.5 Out-of-sample prediction of revenue with the effective tax code

With thousands of regressors, reporting the individual 2SLS estimates is not very informative.
Many of them are significant just due to statistical noise. Therefore this section takes a
machine-learning approach to see whether a regression model trained on the textual features
of tax code changes can predict out-of-sample changes in tax revenue. The prediction is run
conditional on a constant rate structure, and uses the exogenous variation in the tax code
derived from the instruments.
The method for out-of-sample prediction is partial least squares regression (PLS). PLS is
a dimension-reduction technique similar to principal component analysis (PCA), where high-
dimensional data is projected down to a lower-dimensional space while retaining as much
information as possible. The key difference from PCA is that PLS is a supervised technique:
Components are constructed to maximize the predictiveness for an outcome variable [Chun
and Keleş, 2010]. Previous examples of PLS in social-science text analysis include Jensen
et al. [2012] and Jelveh et al. [2015].
The outcome variable is $g^{r}_{st}$, which has been residualized on a source-year fixed effect and
a source-state-rate fixed effect and then standardized. PLS is then used to predict $\hat{g}^{r}_{st}$. As the
explanatory data, the actual phrase frequencies $X_r$ and the instrumented phrase frequencies
$\hat{X}_r$ are alternatively used. The former should predict better, but the latter only uses causal
variation in the effective tax code. If the instrumented tax code changes predict changes in
tax revenues, that uncovers an aggregate causal effect of the tax code on tax revenues.
Chun and Keleş [2010] show that PLS can be inconsistent with a large number of non-
predictive noise variables. To avoid this problem, phrases with a weak t-statistic for βir
(below three) are excluded. In the set of instrumented phrases, any phrases with a first-
stage F-statistic below 10 are also excluded. The training data included a random sample
of 70% of the observations, while the test data included the remaining 30% of observations.
The best predictions were obtained for between 25 and 50 PLS components.22
Figure 2.6.4 illustrates the predictiveness of the PLS model for the three tax sources,
illustrating that the method is recovering revenue-relevant features from the tax codes. In
these graphs, the horizontal axis is the true tax-revenue change for each test observation.
The vertical axis is the PLS-predicted tax-revenue change based on the phrase frequencies
for that test observation. The red line gives the best linear fit for these observations. In the
left column, the actual phrases are used; in the right column, the instrumented phrases are
used.
The PLS model has good out-of-sample predictiveness. With the actual phrases, the
correlation between truth and prediction is very high for all three revenue sources: 0.88,
0.89, and 0.84, respectively. Using the instrumented phrases results in a worse prediction
(0.65, 0.53, and 0.41, respectively), perhaps because the filtering on F-statistic removes useful
predictors. But there is still a clear correlation between truth and prediction. Taking the
square of the correlation coefficient gives the R². With the actual phrase frequencies (R² of
about 0.77, 0.79, and 0.71), we can say that roughly 70 to 80% of the variance in tax revenues
(remaining after partialling out the source-year and source-state-rate fixed effects) is
explained by the text features of the tax code. As a comparison, Gentzkow and Shapiro [2010]
report an in-sample correlation of 0.61 for their measure of political ideology (they do not
report an out-of-sample correlation). The in-sample correlation for the PLS model used here is
over 0.9 for all the measures.
22
The regressions used the Python implementation of PLS from the scikit-learn package.

Figure 2.6.4: Out-of-Sample Tax Revenue Predictions

Panels: (a) Corporate Income Tax; (b) Personal Income Tax; (c) Sales Tax.

PLS model trained with most predictive phrases (p < .01) and 25 PLS components. Horizontal axis is the true tax-revenue change for that test
observation; the vertical axis is the PLS-predicted tax-revenue change based on the phrase frequencies for that test observation. The red line gives
the best linear fit. In the left column, the actual phrases are used; in the right column, the instrumented phrases are used.

These statistics demonstrate the out-of-sample predictiveness of tax code features, hold-
ing major tax rates constant. The PLS model is learning information about the tax base
from tax code changes and using it to predict revenue changes. This validates the use of this
measure in the subsequent analysis.

2.6.6 Analysis of phrases that affect tax revenues

The next step is to analyze the set of predictive phrases. Because the particular phrases
chosen by the algorithm do not play a major role in the empirical analysis, this section can
be seen as a set of descriptive statistics. These statistics are useful because they show how
the phrases in the tax code relate to changes in the tax base (rather than other features of
the law).
The 2SLS framework discussed so far yields a set of statistics for ranking phrases by
their predicted effect on tax revenues. First, the F-statistic for the first-stage regression can
be used to filter out phrases for which there isn’t sufficient exogenous variation in the phrase
from the instruments. Second, the t-statistic for the second-stage regression summarizes the
impact of the phrase on tax revenue, accounting for both the covariance and the noise in the
data.
The simplest approach would be to rank all of the phrases by their t-statistic and then
to look at the top and bottom phrases for each revenue source. This turns out not to be
very informative, since the phrases chosen are from a variety of topics, some of which are
not related to the tax base. To get more interpretability, I construct phrase topics and rank
the phrases within topic by their revenue effect.
Topics are constructed by using k-means clustering to partition the Word2Vec space into
clusters of related words and phrases [Yu et al., 2013, Guo et al., 2014]. Given a set of word
vectors $\{\vec{q}_1, \vec{q}_2, \ldots, \vec{q}_P\}$, the algorithm chooses clusters $Q = \{Q_1, Q_2, \ldots, Q_k\}$ to minimize the
within-cluster sum of squares. Formally, the model solves

$$\arg\min_{Q} \sum_{i=1}^{k} \sum_{\vec{q} \in Q_i} \lVert \vec{q} - \mu_i \rVert^2$$

where µi is the mean of the points (the centroid) for cluster Qi . Once initialized, the
algorithm re-assigns samples to clusters and recomputes centroids until convergence to a
threshold. The only parameter needed is the desired number of clusters. After experimenting
with between k = 5 and k = 250 topics, I settled on k = 25, which is small enough to allow
reports for all topics but still produced reasonable results in terms of interpretability.
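
As an illustration of the clustering step, the sketch below runs k-means on simulated vectors with scikit-learn's `KMeans`. The three-dimensional toy vectors stand in for Word2Vec embeddings and k = 3 replaces k = 25; both are assumptions made to keep the example small, and the original implementation is not specified in the text.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy "phrase vectors": 40 points scattered around each of 3 centers.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
vectors = np.vstack(
    [c + 0.3 * rng.standard_normal((40, 3)) for c in centers]
)

# k is the only substantive parameter; n_init restarts guard against
# poor random initializations.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

topics = km.labels_            # topic assignment for each phrase vector
centroids = km.cluster_centers_
wcss = km.inertia_             # the minimized within-cluster sum of squares
```

The cluster labels are arbitrary integers, which is why the topic numbers reported in the tables carry no meaning of their own.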
Within topic, the F-statistics and t-statistics are collected for each phrase by revenue
source. Phrases with low F-statistics and low t-statistics are filtered out, and the remainder
are ranked by the t-statistic. The full ranking of phrases is available in an appendix. In
Table 2.4, I report a selection of topics for personal income tax and sales tax, respectively,
which are relatively useful for interpretation. Words in bold are discussed in the text. The
numbers on the topics are arbitrary and were determined randomly by the algorithm.
The top half of Table 2.4 looks at phrases related to the income tax. First consider
Topic 3 (panel a), which includes phrases related to pensions and dependents. The phrase
“such dependent” refers to exemptions and credits for children and other dependents.23 The
phrase “such service” is found in income tax statutes giving deductions for certain service
expenses.24 More work is needed on this point, but the fact that using “such” increases
revenue may reflect the effect of higher clarity in the tax code, as the word “such” serves to
clarify the targets of deductions and exemptions.25
23
E.g. 1994 Kansas H.B. 2929: “Income earned on an individual development account shall be exempt
from state income taxation under the Kansas income tax act... There shall be no limit on the amount of
earned income of a dependent child, who is a recipient of aid to families with dependent children, deposited
in an individual development account of such dependent child that was created or organized to pay for
educational expenses of such dependent child.”
24
E.g. 1995 Idaho H.B. 132: “In the case of an individual, there shall be allowed as a deduction from gross
income either (1) or (2) at the option of the taxpayer: Itemized expenditures of not to exceed one thousand
dollars ($1,000) per cared for member incurred in providing personal care services to or for an immediate
member of the taxpayer’s family; such services may be provided either in the taxpayer’s home or the family
member’s home.”

Table 2.4: Phrases with a Significant 2SLS Effect on Tax Revenues

Phrase                        T-statistic    Phrase                          T-statistic

Personal Income Tax

Topic 3                                      Topic 7
such dependent                       5.89    buildings and structures              14.92
retirement purposes                  5.34    construct operate and maintain        14.51
such service                         5.34    adjacent land                         12.62
in excess of year                    4.54    street and road                       10.52
pay period                          -4.31    sewage disposal plant                 10.13
bi-weekly                           -4.31    curb gutter                            9.66
pension board                        3.71    aforesaid purposes                     9.07

Topic 19                                     Topic 22
dependent children                   7.09    school activity                       -7.14
daycare service                     -5.30    high school graduate                   5.88
self-support                         4.57    school graduate                        5.56
legal settlement                     4.44    educational purposes                   4.57
center                               4.00    adult education                        4.13
medical condition                   -4.00    academic                               3.99
admission                            3.92    vocation                               3.96

Sales Tax

Topic 8                                      Topic 12
not-for                              5.60    retail store                          -8.70
internal combustion engine           4.73    fell                                   8.11
certain motor vehicles              -4.60    fuel dealer                            6.84
snow                                -4.20    such distributor                       6.80
such vehicle                         4.00    wrapper                                6.45
antique                             -3.91    director of agriculture                5.59
movement of traffic                  3.62    frog                                   4.79

Topic 14                                     Topic 19
retail install sale                 -8.70    aid to families                        8.11
on the real property                -7.49    cost of health                         6.29
such dwelling                        6.20    retard service                         4.95
certificate of sale                 -4.93    state plan                             4.84
other rights                        -4.57    educate or train                      -4.69
valuable consideration               3.88    psychiatrist                          -4.57
execute and deliver                  3.87    first aid                             -4.37

Topic 7 (panel b) relates to construction projects and expenses. These phrases can
affect income tax through deductions and credits for various home-related expenses. For
example, the phrase “building or structure” can be used to define homes for the purposes of
homeowners’ exemptions.26
Next, Topic 19 (panel c) again has phrases related to dependents, but with an emphasis
on health care. The phrase “dependent children” occurs frequently in income tax statutes in
determining credits for parents of children. For example, some statutes provide for medical
expense deductions for dependent children.27 Similarly, “medical condition” is relevant to
income tax for determining what types of health expenses are deductible, or for determining
targeted benefits.28 Third, “daycare service” is another relevant deductible expense in state
income taxes, as part of deductible childcare expenses.29
Topic 22 (panel d) is related to education and training. “Adult education” is relevant to
income tax in light of the deductions for adult educational expenditures provided in many
states.30 Meanwhile, the word “vocation” is often found in income tax statutes as part of the
definition of income-generating activities that are taxable.31
The bottom half of Table 2.4 reports the revenue-relevant phrases by topic for sales taxes.
Topic 8 (panel a) has to do mainly with automobiles. These phrases often crop up in sales
tax statutes to define what types of vehicles and fuels are exempt from sales taxation.32
These phrases affect revenues through their influence on the exemptions.
Topic 12 (panel b) is related to retail trade. The phrases in this topic appear frequently
in sales tax legislation, for example to describe which retailers must collect sales tax.33 Note
again the inclusion of “such distributor”: just as we saw with income tax, adding the word
“such” tends to increase revenue.
We see the same trend in Topic 14 (panel c). Both “such dwelling” and “such transaction”
are predicted to increase sales tax revenues. Seeing all of these phrases together is suggestive
that clarifying language tends to increase tax collections. This suggests a role for good legal
writing in the efficient implementation of tax policies. Meanwhile, “valuable consideration”
is often used to define what constitutes a taxable sales transaction.34
Finally, Topic 19 (panel d) has phrases related to health care. Compare this set of
phrases to that selected for income tax; it is the same topic, but a different set of phrases
are chosen as relevant. This shows that the rankings are picking out different phrases for
different revenue sources, which makes intuitive sense. These phrases are going to mostly be
related to sales tax exemptions for health care services. However, they can also be used for
classifications related to non-profit status.35
25
In Appendix B.5, I show that using the 2SLS rankings to suggest replacements to increase revenue often
results in adding “such” or “said” before phrases.
26
E.g. 1997 California AB 2797: “For the purposes of this section, the term ’premises’ means a house or
a dwelling unit used to provide living accommodations in a building or structure and the land incidental
thereto, but does not include land only, unless the dwelling unit is a mobile home. The credit is not allowed
for any taxable year for the rental of land upon which a mobile home is located if the mobile home has been
granted a homeowners’ exemption under Section 218 in that year.”
27
E.g. 2001 Idaho HB 121: “’Eligible medical expense’ means an expense paid by the taxpayer for medical
care described in section 213(d) of the Internal Revenue Code, and long-term care expenses of the account
holder and the spouse, dependents and dependent children of the account holder.”
28
E.g. 1995 South Carolina SB 753: “There is allowed as a deduction in computing South Carolina taxable
income of an individual the following: Two thousand dollars for each adopted special needs child... For
purposes of this item, a special needs child is a person who is: unlikely to be adopted without assistance
as determined by the South Carolina Department of Social Services because of conditions such as ethnic
minority status, age, sibling group membership, medical condition, or physical, mental, or emotional
handicaps.”
29
E.g. 1995 New Mexico HB 11: “Any resident who files an individual New Mexico income tax return
and who is not a dependent of another taxpayer may claim a credit for child daycare expenses incurred and
paid to a caregiver in New Mexico during the taxable year by such resident...The caregiver shall furnish the
resident with a signed statement of compensation paid by the resident to the caregiver for daycare services.
Such statements shall specify the dates and the total number of days for which payment has been made.”
30
E.g. 2006 Kentucky HB 1: “An employer who assists an individual to complete his or her learning
contract under the provisions of this section shall receive a state income tax credit for a portion of the
released time given to the employee to study for the tests. The application for the tax credit shall be
supported with attendance documentation provided by the department for adult education and literacy.”
31
E.g. 1993 Mississippi SB 2720: “For the purposes of this article, except as otherwise provided, the term
’gross income’ means and includes the income of a taxpayer derived from salaries, wages, fees or compensation
for service, of whatever kind and in whatever form paid, including income from governmental agencies and
subdivisions thereof; or from professions, vocations, trades, businesses, commerce or sales, or renting or
dealing in property, or reacquired property.”
32
E.g. 2007 California SB 774: “There are exempted from the taxes imposed by this part the gross receipts
from the sale of, and the storage, use, or other consumption in this state of, by a qualified person any of the
following. . . . any motor fuel or mixture of motor fuels that is . . . Advertised, offered for sale, suitable
for use, or used as a motor fuel in an internal combustion engine.”
33
E.g. 23 VAC 210-630: “The preceding paragraph establishes when a fuel dealer must collect tax at the
time of sale, and it does not establish any rule of exemption for consumers.”
34
E.g. Oklahoma Code 68-1352: “’Sale’ means the transfer of either title or possession of tangible personal
property for a valuable consideration regardless of the manner, method, instrumentality, or device by
which the transfer is accomplished in this state.”

The rest of the word clouds for income tax and sales tax, and the word clouds for corporate
tax, are in the appendix. While some of the topics are not interpretable, the ones listed in
this section are suggestive. Overall, they suggest that the 2SLS estimates are measuring a
strong impact on revenue of tax expenditures: exemptions, deductions, and credits. This
is consistent with the view that the tax code has an important impact on tax revenues by
changing the legal definition of the tax base.

2.7 Effect of political control on tax policy

This section describes the empirical strategy for measuring the effect of political control on
tax policy. Subsection 2.7.1 describes the research design. Subsection 2.7.2 reports the results on
tax rates and revenues. Subsection 2.7.3 provides descriptive statistics on the tax law phrases
that are related to political party control.

2.7.1 Empirical strategy

There are many ways one could try to measure the effect of political control on state tax
policy. One could look at the number of years of political party control, for example. To
keep things simple, this paper estimates the sign of the average change from one party to
the other.
The empirical approach for identifying the effect of political control on tax rates and
tax code language is a panel data design similar to a regression discontinuity (RD) [Lee
and Lemieux, 2010]. This approach has gained traction in political economy through the
use of electoral votes as the forcing variable, with a cutoff at 50 percent of the popular
votes [e.g. Lee et al., 2004]. Leigh [2008] and Beland [2015] document causal effects on state
policy of barely electing a Democratic (rather than Republican) governor. Warren [2009] and
De Magalhães and Ferrero [2015] take the analogous approach to state legislatures, using the
number of legislative seats belonging to the political parties as the forcing variable. Warren
[2009] shows that there is a positive local treatment effect of a Democratic legislature on the
total tax burden.
35
E.g. Nebraska Reg. 1-090: “A nonprofit organization operating any of the following facilities that are
licensed under the Health Care Facility Licensure Act is only exempt on purchases for use at the facility. .
. . A health clinic, when one or more hospitals, or the parent corporations of the hospitals, own or control
the health clinic for the purpose of reducing the cost of health services...”

Caughey et al. [2015] show that an RD using seat shares in the legislature as the forcing
variable is associated with covariate imbalance. Therefore I do not use a standard RD with
small bandwidths around the threshold. The regressions include all the observations. To
control for the type of variation that RDs are designed to control for, I include polynomials
above and below the cutoff. These regressions are designed to isolate the variation from
going from minority Democrat to majority Democrat.
Let Dst be an indicator variable or set of indicator variables for stronger Democratic
control in state s at period t. This could include an indicator equaling one for a Democrat-
controlled lower chamber, for example. Let dst be the vote-share variable(s) (in percentage
points) determining Dst , with associated polynomial(s) f (dst ) for use in RD-type regressions.
The empirical analysis uses the party in charge of the legislative chambers, the governorship,
and an index for the number of these governing bodies that are controlled.
The estimating equation is a panel data regression with polynomials in the forcing vari-
ables. For outcome variable yst , estimate

$$y_{st} = \alpha_{st} + D_{st}'\rho + f(d_{st}) + \epsilon_{st}$$

where αst may include state and year fixed effects. For f (dst ), specifications include linear
or quadratic polynomials. Again I cluster standard errors by state.
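
The specification can be sketched in Python with statsmodels. Everything below is an illustrative assumption on simulated data: the variable names, the single Democratic-control indicator, the centering of the forcing variable at the cutoff, and the true effect of 0.5 are invented for the example and do not reproduce the actual data or estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated state-year panel.
rng = np.random.default_rng(0)
n_states, n_years = 50, 40
df = pd.DataFrame({
    "state": np.repeat(np.arange(n_states), n_years),
    "year": np.tile(np.arange(n_years), n_states),
})

# Forcing variable: Democratic margin in percentage points, centered at 0.
df["d"] = rng.uniform(-30, 30, len(df))
df["D"] = (df["d"] > 0).astype(int)               # Democratic control
df["d_above"] = df["d"] * df["D"]                 # linear term above cutoff
df["d_below"] = df["d"] * (1 - df["D"])           # linear term below cutoff

state_fe = rng.standard_normal(n_states)[df["state"].to_numpy()]
df["y"] = (0.5 * df["D"] + 0.01 * df["d"] + state_fe
           + rng.standard_normal(len(df)))

# Panel regression with state and year fixed effects, separate linear
# polynomials on each side of the cutoff, and state-clustered errors.
fit = smf.ols(
    "y ~ D + d_above + d_below + C(state) + C(year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["state"].to_numpy()})

rho_hat = fit.params["D"]
rho_se = fit.bse["D"]
```

A quadratic specification would add squared versions of `d_above` and `d_below`; using all observations with side-specific polynomials, rather than a narrow bandwidth, mirrors the design choice described in the text.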

2.7.2 Effect of political control on tax revenues and tax rates

This subsection provides estimates for the effect of political control on tax policy outcomes
besides the tax code. I provide estimates of the effect of political control on marginal tax
rates and tax revenue. I also estimate the share of the revenue effect due to the rate structure.
This analysis adds to the previous literature [Chernick, 2005, Reed, 2006, Leigh, 2008] by
providing separate estimates for income tax, corporate tax, and sales tax.
The first question is whether political parties change marginal tax rates when they come
into office. Let $m^{r}_{st}$ be the top marginal tax rate for source r in state s at time t. The effect
of political party control on the marginal rate is obtained from

$$m^{r}_{st} = \alpha_{st} + \rho^{r}_{m} D_{st} + f(d_{st}) + \epsilon^{r}_{st} \tag{2.7.1}$$

where αst includes state and year fixed effects by revenue source.
The second question is whether party control is associated with changes in tax revenues
as a share of income. This involves estimating $\rho_g$ from Subsection 2.3.2 for each revenue
source r. Let government revenue be given by $g^{r}_{st}$. The empirical model is

$$g^{r}_{st} = \alpha_{st} + \rho^{r}_{g} D_{st} + f(d_{st}) + \epsilon^{r}_{st} \tag{2.7.2}$$

where $\alpha_{st}$ includes state and year fixed effects by revenue source. In these regressions,
$D_{st} \in [0, 3]$ is defined as an index of Democrat control, equaling the number of governing bodies
(the legislative houses and the governorship) that Democrats control. A tied legislature
adds one half to this index. The f (dst ) term includes linear polynomials above and below
the cutoffs, for both legislatures and the governorship.
The third question is what share, if any, of $\rho^{r}_{g}$ is due to changes in the rate structure. This
is not given by the estimate for $\rho^{r}_{m}$ from (2.7.1), which just gives the effect of political control
on the marginal rate. The rate structure is complex, with multiple rates and brackets, so
one cannot estimate the share of the revenue change due to the rate structure, $\rho^{r}_{\tau}$, strictly
from the marginal rate. Instead, this quantity is obtained as follows. First, estimate $\hat{\rho}^{r}_{g}$ from
(2.7.2) as previously described. Second, estimate $\ddot{\rho}^{r}_{g}$ from (2.7.2) using state-rate-source fixed
effects, as described in Subsection 2.4.1. This provides an estimate for the effect of political
control on revenues purged of any effects from the rate structure. Then the share of the
revenue due to tax rates is obtained from $\rho^{r}_{\tau} = \hat{\rho}^{r}_{g} - \ddot{\rho}^{r}_{g}$.

Table 2.5: Effect of Political Control on State Tax Policy

                              (1)                  (2)                  (3)
                              Marginal Tax Rate    Tax Revenue          Tax Revenue
                                                   (including rates)    (net of rates)
Effect of Democrat Power
  Income Tax                   0.0384               0.0460               0.0134
                              (0.0782)             (0.0811)             (0.0765)
  Sales Tax                   -0.0766              -0.176               -0.157
                              (0.0644)             (0.114)              (0.110)

N                              3091                 3091                 3091
State-Source FE’s              Yes                  Yes                  Yes
State-Source-Rate FE’s                                                   Yes

Estimates for effect of Democrat Control index on the marginal tax rate and tax revenue, separately by tax source. Observation
is a state-source-year. Regressions include linear polynomials in the forcing variables for both houses and governor, separately
for values above and below the cutoffs. Outcome variables are standardized so coefficients can be interpreted as changes in the
standard deviation of the outcome variable. Standard errors in parentheses, clustered by state. + p<0.1, * p<0.05, ** p<0.01.

Table 2.5 reports estimates for the effect of Democrat control on marginal tax rates and
tax revenues. First, Column 1 reports the effect of party control on the marginal tax rate. If
tax rates are the most important component of fiscal policy, then changing political parties
should be associated with a change in the marginal tax rate. As can be seen in Column 1,
there is no statistically significant effect of party control on the tax rate.
Next, Columns 2 and 3 show the effect of political control on tax revenues, with
and without state-source-rate fixed effects. The estimates are noisy, and not statistically
significant. The coefficients are positive for income tax and negative for sales tax. As
expected, the coefficients are smaller when fixed effects for the rate structure are included.
These coefficients are used in the computation of U in Subsection 2.8.3 below.
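
As a worked illustration of the decomposition, the point estimates in Columns 2 and 3 of Table 2.5 can be differenced directly. Since none of the underlying coefficients is statistically significant, this arithmetic only shows how the calculation works and should not be read as a precise estimate:

$$\rho^{\text{income}}_{\tau} = 0.0460 - 0.0134 = 0.0326, \qquad \rho^{\text{sales}}_{\tau} = -0.176 - (-0.157) = -0.019.$$

Taken at face value, about 70 percent of the income tax point estimate (0.0326 out of 0.0460) and about 11 percent of the sales tax point estimate (0.019 out of 0.176) would be attributed to the rate structure.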

2.7.3 Tax code language associated with political control

This section discusses the method and provides summary statistics associated with the
effect of political control on tax code language. As with Subsection 2.6.6, the individual
phrase coefficients are not treated as precisely estimated. Instead, the goal is to construct
a rough ranking of the political party differences for use of phrases in tax legislation. Then
these scores can be used to analyze the political economy of state fiscal policy, as done in
Section 2.8 below.
The estimating equation is a phrase-wise panel data regression. The set of outcomes is
the vector of tax code language features xrst . There are separate regressions for each source
r, with the goal of testing whether different political parties have different priorities for the
incidence of tax liability. Formally, estimate

$$x^{ir}_{st} = \delta_{ir} D_{st} + f(d_{st}) + \epsilon^{ir}_{st}, \quad \forall i, r$$

for each phrase, to get the average effect of Democrat control Dst on the use of phrase i for
tax code provisions related to source r.
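The phrase-wise regression above can be sketched as follows. This is a minimal illustration rather than the estimation code used here: the function name `phrase_t_stats`, the array layout, and the use of conventional (homoskedastic) standard errors are all assumptions made for brevity, and the `controls` argument simply stands in for the terms in f(d_st).

```python
import numpy as np

def phrase_t_stats(X, D, controls):
    """OLS of every phrase column on Democrat control plus controls.

    X: (n_obs, n_phrases) phrase frequencies for one revenue source.
    D: (n_obs,) Democrat-control indicator or index.
    controls: (n_obs, k) columns standing in for f(d_st) (hypothetical layout).
    Returns the per-phrase coefficient on D and its conventional t-statistic.
    """
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), D, controls])  # intercept, D, controls
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    coefs = ZtZ_inv @ Z.T @ X                       # one column per phrase
    resid = X - Z @ coefs
    dof = n - Z.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof         # residual variance per phrase
    se = np.sqrt(sigma2 * ZtZ_inv[1, 1])            # standard error on D (not clustered)
    return coefs[1], coefs[1] / se
```

Because all phrases share the same regressors, one matrix solve yields every coefficient at once; the sign of each t-statistic then gives the rough Democrat/Republican ranking described in the text.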
Table 2.6 reports samples of phrases associated with Democrat and Republican control
for the same selection of topics used in Table 2.4. A positive t-statistic is associated with
Democrats; a negative t-statistic is associated with Republicans. Unlike Gentzkow and
Shapiro [2010], these phrases are not clearly partisan. This reflects that the text of legislation
is not as politicized as floor debate speech.
The top half of Table 2.6 reports the phrases that Democrats and Republicans prefer to
use on income tax legislation, with the bottom half doing so for sales tax legislation. These
particular phrases don’t play a large role in the analysis but show which types of policies the
parties spend time on legislating.

Table 2.6: Phrases with a Significant Relation to Political Party Control

Personal Income Tax
Topic 3 Topic 19
Democrat Phrases Republican Phrases Democrat Phrases Republican Phrases
written design other pension physical health home health care
rate of wage employee organization first aid response person
period of employ plan or system medical such commitment
service as member age of sixty service and supply private practice
become member compensation school of medicine federal social security act
normal retirement date patrolmen convincing evidence epileptic
attainment of age retirement contribution foster care

Sales Tax
Topic 8 Topic 12
Democrat Phrases Republican Phrases Democrat Phrases Republican Phrases
drive wagon use swim
commission of motor licensed motor vehicle put stockyard
drive vehicle thirty feet other means ink
respective jurisdiction livery material hook
vehicle vehicle or trailer firework wild
trip passenger motor vehicle apply to sale fur
operating motor vehicle clearance groceries prohibit the use

One notable example is the issue of “home health care” (Topic 19). Health care services
are an important but somewhat controversial target for tax expenditures, as a deduction
for income tax and an exemption for sales tax. Recent press articles have detailed how
tax-cutting Republicans tend to favor these exemptions and deductions.36 A second notable
example is the inclusion of “groceries” in sales tax legislation (Topic 12). Democrats have long
favored exempting groceries from sales tax, although Republicans are generally opposed.37
This is a clear example of a redistribution-focused tax expenditure.
For a more detailed discussion of tax code phrases, see Appendix B.3. That section
discusses phrases identified by the regressions as having both a political impact and a revenue
impact. The appendix discusses examples of where those phrases may be found in the
statutes, and also provides examples of court cases construing the language in revenue-
relevant caselaw.

2.8 The effective tax code and the politics of redistribution

This section analyzes the role of the effective tax code in how political parties implement
preferred redistributive policies. I provide two methods. In Subsection 2.8.1, I construct
predicted changes in revenue by state-year-source using the tax code features, and test how
that predicted measure responds to changes in political control. In Subsection 2.8.2, I focus
on the granularity of the language features, relating the average revenue impact of a phrase
to the average political impact on a phrase, separately by revenue source. Subsection 2.8.3
provides a discussion.
36 E.g. Pennsylvania State Capital Newsfeed, Rep. Will Tallman, Aug. 27 (2015).
37 E.g. “Alabama House Democrats make creating jobs a priority,” Nov. 2 (2011), quoting a House
Democrat this way: “It’s not like people have a choice about eating. The grocery tax is unfair, immoral
and it has to go.”

2.8.1 Testing for the effect of political control on textually predicted tax revenue

This section reports estimates for the effect of political control on the predicted revenue
changes from tax legislation. I construct a metric for the predicted change in tax revenue
based on the effective tax code. I then estimate the effect of changes in political party control
of state government on this metric.
The metric is constructed as follows. For each state, year, and revenue source, define

$$\tilde{g}^{r}_{st} = \sum_{i=1}^{p} \frac{\hat{\beta}_{ir}}{\hat{\sigma}_{ir}} \, x^{ir}_{st}$$

where σ̂ir gives the standard error for the 2SLS estimate β̂ir . Only phrases with a strong
first-stage F-statistic in the 2SLS framework are included. This can be understood as the
predicted tax revenue change in a state-year, weighted by the precisions of the estimated
effects of each phrase.
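The construction of the metric can be sketched in a few lines. The function name `text_predicted_revenue` and the first-stage threshold `f_min=10.0` are assumptions for illustration only; the text says that phrases with a strong first-stage F-statistic are kept but does not state a cutoff.

```python
import numpy as np

def text_predicted_revenue(x, beta_hat, se_hat, first_stage_f, f_min=10.0):
    """Precision-weighted sum of phrase frequencies for one revenue source.

    x: (n_obs, n_phrases) phrase frequencies by state-year.
    beta_hat, se_hat: (n_phrases,) 2SLS estimates and their standard errors.
    first_stage_f: (n_phrases,) first-stage F-statistics.
    f_min: hypothetical threshold for a "strong" first stage (an assumption).
    """
    keep = first_stage_f >= f_min               # drop weakly identified phrases
    weights = beta_hat[keep] / se_hat[keep]     # beta / sigma, i.e. a t-statistic
    return x[:, keep] @ weights                 # one index value per state-year
```

Dividing by the standard error means that imprecisely estimated phrases contribute little to the index, which is the sense in which the sum is weighted by precision.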
Then I regress
$$\tilde{g}^{r}_{st} = \alpha_{st} + \phi_r D_{st} + f(d_{st}) + \epsilon_{st}$$

to obtain the effect of Democrat control, φ̂r , on the predicted tax revenue change from the
effective tax code. I cluster standard errors by state.
Because the statute text gives the flow of legislation, the outcome variable is first-
differenced. This will eliminate bias from time-invariant state characteristics. The term
αst includes source-year fixed effects and state-source trends [Bertrand et al., 2004].
The source-year fixed effects control for bias associated with time-varying national trends in
the outcome variable. The state-source trends are designed to account for preexisting trends
in the outcome variable that may be correlated with treatment.
The term f (dst ) includes linear or quadratic polynomials in the forcing variables (vote
share for governor, seat shares for the legislatures), separately interacted with each revenue
source, and separately for observations above and below the cutoff. This allows for the model
to flexibly control for vote and seat shares. All observations are included, rather than only
observations near the cutoff as would be done in a standard RD. Including these time-varying
forcing variables is designed to control for other political institutions and factors that may
affect the tax code text.
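As an illustration of what f(d_st) contains, the sketch below builds polynomial terms in one forcing variable, with separate terms above and below the cutoff. The function name `forcing_terms` and the default cutoff of 0.5 (a two-party majority threshold) are assumptions; the interaction with revenue source is omitted for brevity.

```python
import numpy as np

def forcing_terms(share, cutoff=0.5, degree=1):
    """Polynomial terms in (share - cutoff), separately on each side of the cutoff.

    share: vote or seat share for one body (assumed to lie between 0 and 1).
    Returns an (n_obs, 2 * degree) matrix of control columns.
    """
    centered = np.asarray(share, dtype=float) - cutoff
    above = (centered > 0).astype(float)
    cols = []
    for p in range(1, degree + 1):
        cols.append(centered ** p * above)          # term active above the cutoff
        cols.append(centered ** p * (1.0 - above))  # term active below the cutoff
    return np.column_stack(cols)
```

With `degree=1` this gives the linear specification and with `degree=2` the quadratic one; stacking the output for the governor and both chambers gives the full set of forcing-variable controls.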
For Dst , I include three specifications. First, I include an index for Democrat control of
state government that counts the number of bodies controlled by Democrats, from zero to
three. Second, I break out the governor separately from the legislature, where the Legislative
Power index is the number of legislative chambers controlled by Democrats. Third, I include separate
regressors for each chamber. A tied chamber adds one-half to the index (and takes the value
one-half, rather than zero or one, in the separate indicators).
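The indices can be made concrete with a short sketch. The function names and the use of a two-party seat share as input are assumptions for illustration; only the scoring rule (one per body controlled, one-half for a tie) comes from the text.

```python
def body_score(dem_share, tol=1e-9):
    """1 if Democrats hold a majority, 0.5 for an exact tie, 0 otherwise."""
    if abs(dem_share - 0.5) <= tol:
        return 0.5
    return 1.0 if dem_share > 0.5 else 0.0

def democrat_power(governor_is_dem, upper_dem_share, lower_dem_share):
    """Democrat Power (0 to 3) and Legislative Power (0 to 2) for one state-year."""
    legislative = body_score(upper_dem_share) + body_score(lower_dem_share)
    return {
        "democrat_power": float(governor_is_dem) + legislative,
        "legislative_power": legislative,
    }
```

For example, a Democratic governor with a Democratic upper house and a tied lower house scores 2.5 on Democrat Power and 1.5 on Legislative Power.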
The regression results are reported in Table 2.7. These regressions analyze the combined
effects of changes in Democratic control on the tax revenue text. The regressions look at
the within-state effect of changes in political control to the three government bodies. The
regressions included corporate taxes, but those are not reported here because there were no
significant effects.
Columns 1 and 4 look at the aggregate effect of Democratic power in state government.
There is a significant positive effect on text-predicted income tax revenue, and a significant
negative effect on text-predicted sales tax revenue. When Democrats take control of an
additional wing of state government, there is a 0.14 standard deviation predicted increase in
income tax revenues due to tax code changes, and a 0.07 standard deviation decrease in the
predicted sales tax revenues due to tax code changes.
Columns 2 and 5 look at the separate effects of the governor and the legislatures, where
Legislative Power is the number of legislatures controlled by Democrats. This shows that
both the legislature and the governor have positive estimated effects on income-tax-increasing
tax code language. Only the effect of the governor is individually significant, however. The
sales-tax effects are wholly driven by the legislature.

Table 2.7: Effect of Party Control on Text-Predicted Tax Revenue

                      (1)        (2)        (3)        (4)        (5)        (6)

Income Tax
Democrat Power        0.0992**                         0.144**
                      (0.0337)                         (0.0478)
Legislative Power                0.0822+                          0.120
                                 (0.0480)                         (0.0771)
Democrat Governor                0.140+     0.147*                0.180*     0.187*
                                 (0.0724)   (0.0720)              (0.0838)   (0.0814)
Dem. Upper House                            0.0985                           0.113
                                            (0.0610)                         (0.0949)
Dem. Lower House                            0.0514                           0.0950
                                            (0.104)                          (0.132)

Sales Tax
Democrat Power        -0.0324                          -0.0677*
                      (0.0254)                         (0.0311)
Legislative Power                -0.0388                          -0.0865*
                                 (0.0284)                         (0.0382)
Democrat Governor                -0.0158    -0.0179               -0.0434    -0.0527
                                 (0.0442)   (0.0444)              (0.0538)   (0.0530)
Dem. Upper House                            -0.0362                          -0.121+
                                            (0.0458)                         (0.0604)
Dem. Lower House                            -0.0509                          -0.0990+
                                            (0.0460)                         (0.0536)

State-Source FD’s     X          X          X          X          X          X
Source-Year FE’s      X          X          X          X          X          X
State-Source Trends   X          X          X          X          X          X
Forcing Var Polys                                      X          X          X

Estimates from regressing an index for Democrat control on the predicted revenue change based on the text, as described
in Subsection 2.8.1, separately by tax source. N = 3,588 observations, state-source-year. Columns 4 through 6 include linear
polynomials in the forcing variables for both houses and governor, separately for values above and below the cutoffs. Outcome
variables are standardized so coefficients can be interpreted as changes in the standard deviation of the outcome variable.
Standard errors in parentheses, clustered by state. + p<0.1, * p<0.05, ** p<0.01.

Columns 3 and 6 include all three bodies as separate regressors. In the case of income
tax, all three bodies contribute materially to the effect in terms of magnitudes. Again, only
the governor effect is individually statistically significant. In the case of sales tax, both
the upper house and lower house of the legislature have a statistically significantly negative
estimated effect. The governor again has no effect.
Table 2.8 provides two additional specifications to probe the robustness of the results.
First, in Columns 1 through 3 the regressions include lagged covariates for state gross do-
mestic product and state expenditures on financial administration. These are two major
economic and political factors that may be correlated with tax code changes and tax rev-
enue collections. These do not change the results. Columns 4 through 6 add the lagged
dependent variable to test for further confounding trends. This also does not change the
results.
Furthermore, the results are robust to the inclusion of non-interacted linear or quadratic
polynomials in the forcing variables (rather than interacted). Adding an interacted quadratic
polynomial strengthens the sales tax effect but weakens the income tax effect. Adding state-
source fixed effects in addition to the state-source first-differences, which can be seen as
double-differencing the outcome, does not affect the income tax effect but weakens the sales
tax effect (without changing the sign). Using more lags of the covariates, and/or
using the current-period values, also does not change the results. Finally, adding the federal-
circuit average of state GDP also does not change the results.
The Democrat Power index can be interpreted as the predicted change in standard de-
viations of revenue from Democrat control of an additional wing of state government. This
means that moving from full Republican control to full Democrat control is associated with
a 0.42 standard deviation increase in income tax revenues due to the tax code. This roughly
translates to a 34.7% increase in income taxes as a share of personal income, and an addi-
tional $1.96 billion in income tax revenues (in 2007 dollars) in the average state.

Table 2.8: Party Control and Text-Predicted Tax Revenue (Additional Specifications)

                      (1)        (2)        (3)        (4)        (5)        (6)

Income Tax
Democrat Power        0.138**                          0.145**
                      (0.0458)                         (0.0418)
Legislative Power                0.107                            0.120+
                                 (0.0735)                         (0.0680)
Democrat Governor                0.186*     0.190*                0.182*     0.189*
                                 (0.0775)   (0.0763)              (0.0807)   (0.0794)
Dem. Upper House                            0.130                            0.164+
                                            (0.0879)                         (0.0818)
Dem. Lower House                            0.0738                           0.0708
                                            (0.134)                          (0.128)

Sales Tax
Democrat Power        -0.0829*                         -0.0780*
                      (0.0326)                         (0.0310)
Legislative Power                -0.106**                         -0.100*
                                 (0.0396)                         (0.0419)
Democrat Governor                -0.0503    -0.0596               -0.0477    -0.0567
                                 (0.0579)   (0.0575)              (0.0499)   (0.0497)
Dem. Upper House                            -0.143*                          -0.155**
                                            (0.0606)                         (0.0559)
Dem. Lower House                            -0.105+                          -0.0777
                                            (0.0590)                         (0.0617)

State-Source FD’s     X          X          X          X          X          X
Source-Year FE’s      X          X          X          X          X          X
State-Source Trends   X          X          X          X          X          X
Forcing Var Polys     X          X          X          X          X          X
Lagged Covariates     X          X          X          X          X          X
Lagged Dep. Var.                                       X          X          X

Estimates from regressing an index for Democrat control on the predicted revenue change based on the text, as described
in Subsection 2.8.1, separately by tax source. N = 3,588 observations, state-source-year. Columns 4 through 6 include linear
polynomials in the forcing variables for both houses and governor, separately for values above and below the cutoffs. Outcome
variables are standardized so coefficients can be interpreted as changes in the standard deviation of the outcome variable.
Standard errors in parentheses, clustered by state. + p<0.1, * p<0.05, ** p<0.01.

For sales tax, moving from full Republican control to full Democrat control is associated
with a predicted 0.21 standard deviation decrease in sales tax revenues due to the tax code.
That corresponds to a 25.9% decrease in sales taxes as a share of sales receipts income. In
2007 dollars, the average state loses $1.73 billion in sales tax revenue based on tax code
changes.
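The full-control magnitudes quoted for income and sales tax follow from scaling the per-body Democrat Power coefficients by the three bodies of state government (governor, upper house, lower house). Using the rounded per-body effects of 0.14 and 0.07 standard deviations reported in the text:

```latex
\[
3 \times 0.14 = 0.42 \quad \text{(income tax)}, \qquad
3 \times (-0.07) = -0.21 \quad \text{(sales tax)}.
\]
```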
To show these results a different way, Figure 2.8.1 plots the change in text-predicted revenue
before and after a political takeover of the state legislatures. The top figure shows the lower
chamber effect, while the bottom figure shows the upper chamber effect. The purple line
gives the trend in text-predicted revenue for income tax, while the orange line does so for
sales tax. Note that the houses often change party control the same year or in nearby years,
which explains the similarity of the trend. Republican takeovers are also included in the
graph – with the sign of the outcome variable reversed so that the treatment is treated
symmetrically. Excluding Republican takeovers results in a similar trend.
These graphs show that after a change in political control, the text features of income
tax legislation change in a way that would predict increasing revenues. Conversely, the text
features of sales tax legislation change in a way that would predict decreasing revenues. These
graphs support the idea that the political parties put different types of language into the tax
code when they are in power, in such a way that Democrats increase revenues from income
tax but decrease revenues from sales tax.

2.8.2 Assessing the granularity of the redistributive consequences of tax code language

This section complements the previous section by looking at language features directly. The
approach tests for differences in how political parties use phrases based on their predicted
revenue consequences. The goal is to assess the level of textual subtlety that is driving the
effects. Democrats may be selecting broadly different policies and topics than Republicans,
or they may be making specific textual substitutions within the same topics. This section
provides evidence on this question.

Figure 2.8.1: Dynamic Effect of Democrat Control on Text-Predicted Revenue

(a) Dynamic Effect of Democrat Takeover of Lower House

(b) Dynamic Effect of Democrat Takeover of Upper House

Event study graphs for change in text-predicted revenue before and after Democratic takeover of the lower
legislature (panel a) and upper legislature (panel b), respectively. The vertical axis is the metric for
text-predicted revenue g̃, as described in the text. The horizontal axis is years before and after a change in
political control. Republican takeovers are also included, with the sign of the outcome variable reversed.

For each phrase i in the vocabulary P , I have a set of statistics from the previous sections.
First, I have a t-statistic for the 2SLS effect of phrase i on revenue from source r, β̃ir .
Second, I have a t-statistic for the effect of Democrat control on frequency of phrase i for
tax legislation on source r, δ̃ir . To test whether the language used by political parties is
systematically related to the revenue consequences of that language, I regress

$$\tilde{\beta}_{ir} = \alpha + \psi_r \tilde{\delta}_{ir} + \epsilon_{ri}, \quad \forall r \in R \qquad (2.8.1)$$

to estimate ψ̂r . A positive ψ̂r means that relative to Republicans, Democrats tend to use
revenue-increasing phrases on revenue source r. A negative ψ̂r means that Democrats tend
to use revenue-decreasing phrases on revenue source r. The regression is weighted by the
average frequencies of the phrase observations.
Topics are constructed using the k-means clustering method described in Subsection
2.6.6. The topics are used, firstly, to cluster standard errors by 50 topics.38 Next, a varying
number of topic fixed effects are added to the regression. Then the regression obtains the
within-topic relationship of Democrat control and the revenue effect of language. The goal
is to assess the subtlety of the tax code differences that lead to the observed effects. If the
effect disappears after adding a few topics, that suggests the party-control effect is driven
by choices across broad topics or policies. If the effect remains after adding fixed effects for
a large number of topics, that means the effect is driven by highly specific choices between
closely related words.
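A minimal sketch of regression (2.8.1) with frequency weights and topic fixed effects is given below. The function name `within_topic_slope` is hypothetical, and the topic fixed effects are absorbed by subtracting weighted topic means, which gives the same slope as including topic dummies in a weighted regression. Standard errors are omitted.

```python
import numpy as np

def within_topic_slope(beta_t, delta_t, weights, topic=None):
    """Weighted least-squares slope of beta_t on delta_t, optionally within topics.

    beta_t: revenue t-statistics by phrase; delta_t: party t-statistics by phrase.
    weights: average phrase frequencies; topic: optional topic label per phrase.
    """
    beta_t = np.asarray(beta_t, dtype=float)
    delta_t = np.asarray(delta_t, dtype=float)
    w = np.asarray(weights, dtype=float)
    if topic is None:
        topic = np.zeros(len(beta_t), dtype=int)  # single group: plain intercept
    topic = np.asarray(topic)
    y = beta_t.copy()
    x = delta_t.copy()
    for k in np.unique(topic):
        m = topic == k
        y[m] -= np.average(beta_t[m], weights=w[m])   # remove weighted topic mean
        x[m] -= np.average(delta_t[m], weights=w[m])
    return (w * x * y).sum() / (w * x * x).sum()
```

Phrases that are alone in their topic contribute nothing after demeaning, which is why the estimate becomes noisy as the number of topics approaches the size of the vocabulary.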
There are 8,923 phrases in the vocabulary. This means, for example, that with 100 topic
fixed effects, each topic will have 89 words on average. As one adds more topics, we are
looking at small groups of words on average – around 9 words each for 1000 topics, for
example. With 4000 topics, many words will have their own topic, and the topics that do
remain will be groups of closely related words and phrases. If there is still a significant
language effect with this many topics, we can say that the fiscal policy differences between
Republicans and Democrats are embodied in highly specific language choices in the tax code.
38 The basic results are statistically significant with at least 10 topic clusters.

Table 2.9: Granularity of the Revenue-Politics Relation of Tax Code Language

                      (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)

Income Tax Effect     0.0528+    0.0621*    0.0646*    0.119**    0.0739**   0.114**    0.139*     0.0804
of Dem Power          (0.0274)   (0.0243)   (0.0241)   (0.0317)   (0.0221)   (0.0376)   (0.0604)   (0.0724)

Sales Tax Effect      -0.0802**  -0.0647*   -0.0714**  -0.0688*   -0.109*    -0.0254    -0.0614    -0.110
of Dem Power          (0.0251)   (0.0258)   (0.0225)   (0.0336)   (0.0470)   (0.0630)   (0.0650)   (0.106)

Topic Fixed Effects   -          10         100        1000       2000       3000       4000       5000
Mean Words per Topic  8923       892        89.2       8.92       4.46       2.97       2.23       1.78

Estimates from regressing the revenue effect of a phrase (beta) on the party effect on a phrase (delta), separately by tax source.
Columns use a different number of topic fixed effects. Outcome variables and explanatory variables are standardized. N = 8,923
phrases with strong first-stage F-statistics. Standard errors in parentheses, clustered by 50 phrase topics. Regressions weighted
by average frequency of the phrase. + p<0.1, * p<0.05, ** p<0.01.

Table 2.9 reports the regression coefficients from (2.8.1). The column specifications grad-
ually add more fixed effects for topics. As before, we generally see that Democrats prefer
revenue-increasing language on income taxes, but revenue-decreasing language on sales taxes.
For sales tax, the effect persists for up to 2000 topics. For income tax, the effect persists for
up to 4000 topics. Above those thresholds, the number of topics is large enough that the
effects go to zero.
These results support the view that the policy effects of tax code language are encoded in
relatively specific choices of legal wording. For sales tax, the effects are still significant with
2000 topics; that is, the effects come from the within-topic choices among 4 to 5 phrases
on average. Income tax legislation is even more granular: the effects are still significant for
4000 topics. This means that the effects of income tax legislation come from the within-topic
choices among 2 to 3 phrases on average. The redistributive fiscal policies implemented by
the political parties in the U.S. states consist of highly specific choices in the tax code.
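The phrase counts per topic quoted above are simply the vocabulary size divided by the number of topics. A short check, with the function name `mean_words_per_topic` chosen for illustration:

```python
VOCAB_SIZE = 8923  # number of phrases in the vocabulary, as stated in the text

def mean_words_per_topic(n_topics, vocab_size=VOCAB_SIZE):
    """Average number of phrases in a topic when the vocabulary is split into n_topics."""
    return vocab_size / n_topics

for k in (10, 100, 1000, 2000, 3000, 4000, 5000):
    print(k, round(mean_words_per_topic(k), 2))
```

With 2000 topics the average is about 4.46 phrases per topic and with 4000 topics about 2.23, which is the basis for the "4 to 5" and "2 to 3" phrase descriptions.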
Identifying these subtle differences would likely be difficult for researchers taking a more
standard approach of subjectively coding discrete policy changes. The natural language pro-
cessing tools are needed. Moreover, with such small clusters of phrases having an important
association with the politics of redistribution, it may be useful for researchers and policy-
makers to analyze these phrases more systematically. This demonstrates the usefulness of
natural language processing tools in the analysis of the tax code.

2.8.3 Discussion

Personal income taxes are progressive taxes. Sales taxes are regressive taxes. If Democrats
prefer more redistribution, then one would expect them to increase income taxes but decrease
sales taxes. As shown in Subsection 2.3.2, they do not change the major tax rates. However,
as we see here, they do change the tax code in line with this intuition. These results are
consistent with the idea that the tax code plays an important role in the political economy of
fiscal policy in the U.S. states. This may reflect that because tax rates are salient, political
bargaining is difficult and tends to stalemate. Instead, political parties have to implement
redistributive policies in the specifics of legislation, which allow for tradeoffs across different
issues.
These results are related to the evidence in Finkelstein [2009], who found that toll rates
were difficult to increase when they were salient and known to drivers, but could be increased
when the toll rates became less salient. In the case of state governments, politicians who are
interested in changing redistributive policy will have trouble doing so by changing the rates.
Because the major rates are so salient for voters, it is politically costly to change them. On
the other hand, changing the text of the tax code is less politically costly, since these textual
features are not salient to voters.

2.9 Conclusion

This paper has examined the role of the tax code in the political economy of fiscal policy
in the U.S. states. I used a data-driven method to extract the effective tax code – those
text features of legislation that have a causal impact on tax collections. The paper then
showed which phrases are related to changes in political control. Democrat control of state
government is associated with a preference for tax code language that is predicted to increase
the progressivity of the state tax system. The tax code, rather than the tax rate, is the
more important fiscal policy tool in the U.S. states. Work on state tax policy cannot limit
attention to changes in tax rates.
This paper’s analysis has focused on the positive questions of how the tax code affects
revenues and how political parties differ in the language they insert into the tax code. In
future work one could use these methods to analyze the equity and welfare consequences of
tax code features. An example of this analysis is provided in the appendix, which uses the
method to find replacement phrases that are predicted to increase tax revenues.
A natural extension of this project is in linking the text features of the tax code to other
text data. First, it would be important to understand the role of courts in legal tax
avoidance. Second, it would be interesting to measure connections between legislative text
and newspaper text, to see how media attention influences the salience of tax code reform.
This approach has the potential to open up a new area for research in political economy
and public finance. Economists tend to view economic systems through national accounts
and other numerical data sets. Yet complex economies will not run well without a complex
corpus of statutes regulating them, and a well-managed system of courts enforcing those laws as
written. A data-driven approach to legal text will help uncover the impact of written laws
on the real economy.
As natural language processing technology improves, there will be a growing set of tools
for lawyers and legislators to use for designing legislation that more effectively implements
desired policy goals. This method is not limited in use to tax legislation. It could be applied
to any set of legal documents with a defined quantitative policy goal. For example, exoge-
nous variations in criminal laws could be analyzed for their effects on crime rates. Exogenous
variations in contract laws could be analyzed for their effects on transaction efficiency. And
“laws” in this context include not just legislation but court cases and administrative regulations.

Chapter 3

Property taxes and local labor markets: Evidence from staggered property reassessments

Elliott Ash

This paper reports evidence on the potential benefits to local labor markets of increasing
property taxes as a source of local government revenue. The data come from three states (308
tax districts, 16 years) where tax districts reassess properties on a state-mandated staggered
cycle, resulting in exogenous variation in assessments and accompanying taxes. I find that
an increase in taxes due to random assessment causes economic expansion, with an increase
in local population and the number of local business establishments. These effects appear to
be driven by increases in government revenues and expenditures, rather than by changes in
borrowing behavior. These results suggest that property taxes are inefficiently low in this
sample of states.

3.1 Introduction

Although the property tax ranks as “the worst tax” in taxpayer surveys [Cabral and Hoxby,
2012], it remains the largest single source of local government revenue in the United States
and worldwide [Brülhart et al., 2015]. Economists have criticized the property tax as an
inefficient tax on capital [Arnott and Petrova, 2006], a regressive tax on the poor [Davis,
2015], and a deterrent to firm entry [Papke, 1991]. But they have also admired its capacity to
match costs of public services to their beneficiaries [Wallis, 2001], its resistance to evasion,1
and its tendency to align government incentives with the interests of residents [Glaeser,
1996].2
There is little existing empirical evidence on the economic impacts of property taxes.
Bakija and Slemrod [2004] report state-level panel regressions where higher property tax
collections were associated with reduced population. Johansson et al. [2008] report cross-
country panel regressions showing that increases in property taxes are associated with the
smallest negative impact on GDP of all the taxes studied. These studies do not address the
problem that governments set property taxes endogenously in response to the business cycle
and in conjunction with other relevant policy variables [e.g. Coate, 2011]. In consequence,
as Fullerton and Metcalf [2002] observe, the economic impacts of property taxes “have never
been reliably tested” (p. 1822). Fischel et al. [2011] agree that “our understanding of the
incidence of local property taxes is in a sad state” (p. 1).

1 Real property requires the owner to record a deed with the local cadastre; England, France, and the
United States have had cadastres since the 1700s. The tax district always knows where to send the tax bill
and can seize the property as collateral in the event of non-payment. Many developing countries do not have
cadastres, which leads to paper-trail avoidance problems like those in Gordon and Li [2009]. See also Besley
et al. [2015], who note that property taxes may be less prone to evasion than other taxes since compliance
with the property tax is more established as a social norm.
2 See Reschovsky [2013] and Norregaard [2013] for reviews of these arguments.

The goal of this paper is to provide empirical evidence on the local labor market impacts
of changes in property taxes as a source of government revenue and expenditures. I begin
with a local public finance model, which makes the simple point that the impact of a property
tax change depends on the status quo level of the tax. If property taxes are inefficiently low,
raising them will result in market expansion. If they are inefficiently high, raising them
will result in market contraction. These ideas motivate the empirical analysis, which uses
exogenous variation in local property taxes due to a state-mandated reassessment cycle to
measure the effect of tax changes on local population and local business establishments.
In my estimates, increasing property taxes is associated with expansion in the local labor
market. In the model’s logic, these results suggest that property taxes are below the optimum
on average for this sample of tax districts.
The empirical approach is to construct a set of instruments for property taxes using
state-mandated cycles for reassessing properties. In the sample of three states, an arbitrary
fraction of tax districts reassess their properties for tax purposes in each year. Tax districts
were assigned to reassessment cohorts many decades ago, so cohort membership is exogenous
to changes in housing values due to state-wide housing-market fluctuations. The first step
in the econometrics is to isolate variation in assessed property values due only to cohort
assignment. The second step is to use this variation as an instrument for taxes in two-stage
least-squares regression.
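The second step can be illustrated with a bare-bones two-stage least-squares estimator. This is a sketch under stated assumptions, not the specification estimated in this chapter: the function name `two_stage_least_squares` and the variable roles (an outcome such as log population, an endogenous regressor such as log property tax, and instruments derived from reassessment-cohort assignment) are hypothetical, and fixed effects and clustered standard errors are left out.

```python
import numpy as np

def two_stage_least_squares(y, endog, instruments, exog=None):
    """2SLS coefficient on a single endogenous regressor.

    y: (n,) outcome; endog: (n,) endogenous regressor.
    instruments: (n,) or (n, m) excluded instruments.
    exog: optional (n, k) exogenous controls; a constant is always included.
    """
    n = len(y)
    const = np.ones((n, 1))
    W = const if exog is None else np.column_stack([const, exog])
    Z = np.column_stack([W, instruments])   # controls plus excluded instruments
    X = np.column_stack([W, endog])         # controls plus endogenous regressor
    # First stage: project the regressors on the full instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return coef[-1]
```

When the regressor is correlated with unobserved local conditions, ordinary least squares is biased, while the instrumented estimate uses only the variation in taxes that comes from the instruments.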
This approach is used to measure the impacts of property tax changes on local public
finance and local labor markets. I find that these exogenous changes in assessments are
associated with higher overall tax revenues and government expenditures. Property tax
revenue increases cause the labor market to expand, with an increase in population and
number of businesses. In the preferred specifications, a 1 percent increase in the property
tax is associated with a 0.26 percent increase in local population and a 0.15 percent increase
in local business establishments. The coefficients on employment and wages are positive
but not statistically significant. Follow-up results suggest that the effects are not driven by
changes in other taxes or by changes in borrowing behavior.
These results are consistent with the view that the benefits of increased property taxes
can outweigh the costs when they go into government revenue. In this sample of tax districts,
higher tax revenues induce labor market growth. One cannot make strong welfare claims
about this effect, for example due to unobserved effects on other tax districts. But one
interpretation is that property taxes are below the optimum in this set of tax districts.
These points will be useful to policymakers seeking to increase property taxes with the goal
of encouraging local economic growth.
The findings are in line with Fajgelbaum et al. [2015], whose structural estimates suggest
that increasing state taxes would result in firm and worker in-migration. They also agree
with Cellini et al. [2010], who find using a regression discontinuity design that increasing
school facility investments resulted in a greater-than-one-to-one increase in home values in
California school districts. This paper provides complementary evidence from three other
states that increases in expenditures due to higher property taxes can have a positive effect.
A different result is reported in Haughwout et al. [2004], who find in a sample of three
cities that property taxes are probably too high in the sense that raising rates does not
result in more revenue collected. Using Italian data, Surico and Trezzi [2015] find that
a national property tax increase affecting some households more than others resulted in
reduced household consumption expenditures.
Other related evidence includes the bunching papers demonstrating housing market dis-
tortions due to housing transfer taxes [Slemrod et al., 2012, Best and Kleven, 2013, Kopczuk
and Munroe, 2014]. These papers can estimate the impacts of these taxes on the buyers and
sellers in this segment of the market and show that transfer taxes with a discrete threshold
cause the market to unravel around the neighborhood of the cutoff. My results look at the
average taxation effect on all homeowners.3
3 See also the large capitalization literature on the elasticity of property values with respect to property taxes [Palmon and Smith, 1998, Bradbury et al., 2001, Feldman and Beer Sheva, 2010, Borge and Rattsø, 2014, Bai et al., 2014].
The results on business establishments add to the previous literature on tax arbitrage
by firms. Wilson [1985] argues that business property taxes should be zero since business
capital is mobile, while housing capital is immobile. Some previous empirical papers have
found a negative effect of taxes on entry [Holmes, 1998, Serrato and Zidar, 2014], while
others have found no effect [Guimaraes et al., 2004, Duranton et al., 2011]. Given the results
in this paper, the null results could be explained by the public amenities that firms enjoy
in higher-tax jurisdictions. Note, moreover, that the tax increases observed in my data are
in the main imposed on residential property. This is consistent with Wilson’s view that the
property tax burden should be placed more on residential capital.
A positive view of property taxes in economics goes back to George [1879], who proposed
a 100 percent tax on land rents. The “Henry George Theorem,” later formalized by Stiglitz
[1977] and Arnott and Stiglitz [1979], states that government spending on public goods
increases land rents by the same amount, meaning that a tax on those rents is distortion-free
and would provide all the revenues needed for public-goods expenditures [see also Solow
and Vickrey, 1971, Hamilton, 1976]. Subsequent papers extending this work have shown
that taxing improvements (in addition to land value) can be optimal under different sets of
assumptions [Song and Zenou, 2006, Lyytikäinen, 2009, Banzhaf and Lavery, 2010, Behrens
et al., 2015].4
More recently, interest in the property tax has been further encouraged by Piketty’s [2013]
work on wealth inequality and in particular his advocacy of a wealth tax.5 In the United
States at least, the property tax is the closest thing we have to a non-inheritance-based tax
on wealth. Increasing taxes on real property value (net of liens), in the absence of other
distortions, could further the same goals as an explicit tax on wealth. This is especially true
in light of Rognlie’s [2014] point that “recent trends in both capital wealth and income are
driven almost entirely by housing.”
The remainder of this paper is organized as follows. Section 2 introduces the model to
organize ideas. Section 3 describes the data sources and provides summary statistics. Section
4 discusses the institutional background and how to obtain exogenous variation in property
taxes. Section 5 reports results. Section 6 concludes.
4 See also Zodrow and Mieszkowski [1986].
5 Recent theory work in this vein includes Piketty and Saez [2013].

3.2 Conceptual Framework

This section introduces a model of the effects of higher property taxes on local labor and
housing markets. The key point is that depending on the current tax, an increase in property
taxes could represent either a net gain or a net loss to households or firms. The basic
mechanism is that higher property taxes impose direct costs on residents, but they also
create benefits through public goods expenditures and in reducing distortions from other
taxes. This section outlines the argument when possible; see Appendix A for technical
details.

3.2.1 Property Taxes

I study economic activity in a city. The city is assumed to be small enough that there are
no general equilibrium effects that feed back from other cities.
The government collects a property tax at rate τ ≥ 0 on housing capital H and at rate
κ ≥ 0 on business capital K. For simplicity the model abstracts away from fiscal policy as
much as possible. Besides imposing a direct cost on households and firms from the payments,
the taxes produce revenues
G = τ H + κK

which generate benefits to households and firms through expenditures on public services.
These benefits are given by A(G) for households and B(G) for firms. These include the
benefits from increasing expenditures on public goods, for example better roads, schools,
and sewers. They may also include potential benefits from reducing other taxes, charges,
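and fees, such as local sales taxes or fees for trash pickup. To ensure interior solutions these
functions satisfy the following conditions:

\[
\frac{\partial A}{\partial G}, \ \frac{\partial B}{\partial G} > 0, \qquad
\frac{\partial^2 A}{\partial G^2}, \ \frac{\partial^2 B}{\partial G^2} < 0,
\]
\[
\lim_{G \to 0^+} \frac{\partial A}{\partial G}, \ \lim_{G \to 0^+} \frac{\partial B}{\partial G} = \infty, \qquad
\lim_{G \to \infty} \frac{\partial A}{\partial G}, \ \lim_{G \to \infty} \frac{\partial B}{\partial G} = 0.
\]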

The assumption that property taxes provide some benefits, and that the marginal benefits
are greater than the marginal cost at low tax rates, is justified by the fact that almost all local
governments in the United States are funded by a property tax. If this were a bad assumption
then there would be more heterogeneity in whether a property tax were imposed or not. The
assumption that the marginal benefits go to zero reflects that there are a finite number
of valuable government services, and the marginal value of a dollar spent on government
expenditures would decrease for higher expenditures. In addition, if the property tax is very
high, moving taxes to other less-taxed bases would probably reduce market distortions.
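As a concrete illustration, a simple power function satisfies all four conditions. The functional form and the parameters $a$ and $\gamma$ are assumptions made here for illustration only; the text does not commit to any particular form for A or B.

\[
A(G) = a G^{\gamma}, \qquad a > 0, \quad 0 < \gamma < 1,
\]
\[
\frac{\partial A}{\partial G} = a \gamma G^{\gamma - 1} > 0, \qquad
\frac{\partial^2 A}{\partial G^2} = a \gamma (\gamma - 1) G^{\gamma - 2} < 0 \qquad \text{for } G > 0,
\]
\[
\lim_{G \to 0^+} a \gamma G^{\gamma - 1} = \infty, \qquad
\lim_{G \to \infty} a \gamma G^{\gamma - 1} = 0.
\]

The same check applies to B(G) with its own scale and curvature parameters.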
I make no assumptions on how the government sets property taxes. The goal of the
model is to study the effect of increasing or decreasing the tax from a status quo (τ̄ , κ̄).

3.2.2 Firms

This subsection outlines the role of business establishments (firms) in the model. There is a
market for a single good, with price normalized to one. Firms are identical except that they
have heterogeneous productivity in this city, a common assumption in the trade literature
[e.g. Greenaway and Kneller, 2007].
The profit for firm j is given by

\[
\pi_j = y(k, l; \tau) - (\rho + \tau) k - w l + b_j
\]

where y(·) is output, k is the value of capital, l is the number of workers, ρ is the national
price of capital, w is the regional wage, and $b_j$ is the heterogeneous productivity shifter for
firm j. The productivity parameter is uniformly distributed:

\[
b_j \sim U\left[ -\frac{1}{2\psi}, \ \frac{1}{2\psi} \right].
\]

Any firms that make non-negative profit will enter the city market.
The firm production function is

\[
y(k, l) = B(\tau) k^{\beta} l^{\beta}.
\]

This specification is chosen for simplicity, and the results hold under weaker assumptions on
the functional forms. Assuming the same productivity parameters β for capital and labor
is a way to conserve on notation and has no bearing on the results. A more important
assumption is that β < 1/2, meaning that the function features decreasing returns to scale.
Besides being consistent with empirical evidence [e.g. Basu and Fernald, 1997], this pins
down the number of firms without assuming market imperfections.6
The firm’s optimal capital-labor ratio is

\[
\frac{k}{l} = \frac{w}{\rho + \tau},
\]

which means that an increase in property taxes decreases the capital per worker. There will
be more workers per firm with higher property taxes, for a given stock of capital.
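The ratio can be traced back to the two first-order conditions of the profit function. The following short derivation uses only the profit and production functions defined above:

\[
\frac{\partial \pi_j}{\partial k} = \beta B(\tau) k^{\beta - 1} l^{\beta} - (\rho + \tau) = 0, \qquad
\frac{\partial \pi_j}{\partial l} = \beta B(\tau) k^{\beta} l^{\beta - 1} - w = 0.
\]

Moving the input prices to the right-hand side and dividing the second condition by the first gives

\[
\frac{\beta B(\tau) k^{\beta} l^{\beta - 1}}{\beta B(\tau) k^{\beta - 1} l^{\beta}} = \frac{k}{l} = \frac{w}{\rho + \tau}.
\]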
Using the capital-labor ratio, we construct the profit equation in terms of labor:

\[
\max_{l} \ B(\tau) \left( \frac{w}{\rho + \tau} \right)^{\beta} l^{2\beta} - w l,
\]

which can be solved to obtain

\[
l^* = \left( \frac{2 \beta B(\tau)}{(\rho + \tau)^{\beta} w^{1 - \beta}} \right)^{\frac{1}{1 - 2\beta}}
\]

and correspondingly

\[
k^* = \left( \frac{2 \beta B(\tau)}{(\rho + \tau)^{1 - \beta} w^{\beta}} \right)^{\frac{1}{1 - 2\beta}}.
\]

6 Assuming monopolistic competition with a continuum of goods pins down the number of firms with constant-returns-to-scale production [e.g. Basu, 1995].

Increasing the property tax has two effects on the input choice. First, there is a negative
effect by increasing the cost of capital (since ρ + τ is in the denominator of each expression).
Because β < 1/2, the effect on capital is stronger. Second, there is a positive effect of the tax
through increasing B(τ).
With these expressions one can write the optimal output in terms of the exogenous
parameters:

\[
y^* = \beta_y \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1 - 2\beta}}
\]

where βy > 0 is a constant (see appendix). The effects on output of changing the property
tax go in the same direction as the effect on the input choices.
Now back to the profit equation. Equilibrium profits for firm j are equal to

\[
\pi_j^* = y^* - w l^* - (\rho + \tau) k^* + b_j
= \beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1 - 2\beta}} + b_j
\]

where βπ > 0 is a constant (see appendix). The effects on profit move in the same direction
in response to a property tax change as output and the input choices.
Assuming free entry means that the marginal firm makes zero profit. Any firms that
make non-negative profit in the city will enter. Normalize the total number of firms to one.
Then the number of firms in the city is given by

\[
E^* = \frac{1}{2} + \psi \beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1 - 2\beta}}
\]
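This expression follows directly from the uniform distribution of the productivity shifter. As a short derivation, write $\tilde{\pi}(\tau)$ for the part of equilibrium profit that is common to all firms (this shorthand is introduced here for convenience and is not used elsewhere in the text), and assume $|\tilde{\pi}(\tau)| \leq 1/(2\psi)$ so that the marginal firm lies inside the support of $b_j$. Since $b_j$ has density ψ on its support,

\[
E^* = \Pr\left( \tilde{\pi}(\tau) + b_j \geq 0 \right)
= \Pr\left( b_j \geq -\tilde{\pi}(\tau) \right)
= \psi \left( \frac{1}{2\psi} + \tilde{\pi}(\tau) \right)
= \frac{1}{2} + \psi \tilde{\pi}(\tau).
\]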

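The effect of the property tax on the number of firms is

\[
\frac{\partial E}{\partial \tau} = \psi \beta_\pi \left( \frac{1}{1 - 2\beta} \right)
\left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{2\beta}{1 - 2\beta}}
\left[ \frac{B'(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} - \frac{\beta B(\tau)}{(\rho + \tau)^{1 + \beta} w^{\beta}} \right].
\]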

For given τ̄ , this expression has the same sign as the tax effect on profit, output, and the
input choices. The leading terms outside the bracket are always positive, as are both terms
within the brackets. So the entire expression is positive when

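\[
\frac{B'(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \geq \frac{\beta B(\tau)}{(\rho + \tau)^{1 + \beta} w^{\beta}},
\]

which simplifies to

\[
\frac{B'(\tau)}{B(\tau)} \geq \frac{\beta}{\rho + \tau},
\]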

and negative otherwise. Given the assumptions on B(τ), this means there is a unique cutoff
$\tau_E^*$ determining the effect of the property tax.
Proposition 1. For $\bar{\tau} < \tau_E^*$, a small increase in property taxes will increase
workers per firm, capital per firm, output per firm, and the number of firms.
For $\bar{\tau} > \tau_E^*$, a small increase in property taxes will have the opposite effects.
Note that since firm-level labor and the number of firms move in the same direction,
this also means that the same cutoff determines the effect on the total number of employed
workers in the city.
Further, note that this result depends on the compensating beneficial effect in B(τ). If
B′(τ) equals zero, an increase in property taxes will decrease labor, capital, output, and the
number of establishments.
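To make the cutoff concrete, the following minimal sketch evaluates the common term B(τ)/((ρ+τ)^β w^β) on a grid and compares its peak with the closed-form solution of B′(τ)/B(τ) = β/(ρ+τ). The benefit function B(τ) = τ^γ and every parameter value are assumptions chosen purely for illustration; the text does not specify a functional form for B.

```python
# Toy illustration of the firm-side cutoff. All functional forms and
# parameter values below are assumptions made for illustration only.

BETA = 0.3    # common output elasticity of capital and labor (must be < 1/2)
GAMMA = 0.1   # hypothetical curvature of the benefit function B(tau) = tau**GAMMA
RHO = 0.05    # national price of capital
WAGE = 1.0    # regional wage


def benefit(tau):
    """Hypothetical firm benefit function B(tau) = tau**GAMMA."""
    return tau ** GAMMA


def scale_term(tau):
    """B(tau) / ((rho + tau)**beta * w**beta), the term that drives l*, k*, y* and E*."""
    return benefit(tau) / ((RHO + tau) ** BETA * WAGE ** BETA)


def closed_form_cutoff():
    """Solve B'(tau)/B(tau) = beta/(rho + tau) when B(tau) = tau**GAMMA.

    Here B'/B = GAMMA/tau, so GAMMA/tau = BETA/(RHO + tau) gives
    tau = GAMMA * RHO / (BETA - GAMMA), which requires GAMMA < BETA.
    """
    return GAMMA * RHO / (BETA - GAMMA)


def grid_argmax(func, lo, hi, steps):
    """Return the grid point in [lo, hi] at which func is largest."""
    best_x, best_val = lo, func(lo)
    for n in range(1, steps + 1):
        x = lo + (hi - lo) * n / steps
        val = func(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


tau_e = closed_form_cutoff()
tau_peak = grid_argmax(scale_term, 1e-4, 0.5, 50_000)
print(f"closed-form cutoff: {tau_e:.4f}, grid peak: {tau_peak:.4f}")
```

Under these assumed values, the scale term rises with τ below the cutoff and falls above it, which is the pattern stated in Proposition 1. Setting GAMMA at or above BETA removes the interior cutoff in this toy specification, so the example only applies when the benefit function is sufficiently concave.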

3.2.3 Households

This section analyzes the effect of property taxes on the local housing market. The housing
market is treated separately from the labor market, which greatly simplifies the analysis.
One can think of any beneficial changes to residents from the labor market – higher wages,
for example, or reduced commuting time – as being reflected in A(τ ).
Utility for household i is given by

\[
u_i = A(\tau) + w - (1 + \tau) h + a_i
\]

where h is the cost of housing and $a_i$ is household i’s idiosyncratic preference for this city.
It is uniformly distributed:

\[
a_i \sim U\left[ -\frac{\phi}{2}, \ \frac{\phi}{2} \right].
\]

Utility in other cities is given by ū, so person i decides to live in this city when ui ≥ ū.
Normalize the total number of people to one and assume that each resident inelastically
demands one unit of housing and supplies one unit of labor. City population (and housing
demand) is equal to

\[
N = N_0 + \frac{A(\tau) + w - (1 + \tau) h}{\phi}
\]

where N0 is a positive constant and represents the population of the city when residents
are immobile. The parameter φ measures how mobile/responsive city residents are to city
features. If it is low, they are responsive; if it is high, they are not responsive.
Housing supply (the marginal cost of housing) is

\[
h = \frac{N}{\sigma}.
\]

The parameter σ measures the elasticity of housing supply. The market equilibrium for
housing costs is

\[
h^* = \frac{\phi N_0 + A(\tau) + w}{\phi \sigma + 1 + \tau}.
\]

As before, the property tax has two effects on the price. It increases the price through the
amenities A(τ), but decreases it through the direct cost of the tax.

Given the assumptions, city population is given by

\[
N^* = \sigma \, \frac{\phi N_0 + A(\tau) + w}{\phi \sigma + 1 + \tau}.
\]

This means, intuitively, that housing prices and local population move in the same
direction in response to the property tax. If σ is low, housing supply is inelastic, so the local
population doesn’t change very much in response to economic variables.
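The effect of a tax on housing prices is

\[
\frac{\partial h}{\partial \tau} = \frac{A'(\tau)}{\phi \sigma + 1 + \tau} - \frac{\phi N_0 + A(\tau) + w}{(\phi \sigma + 1 + \tau)^2}
\]

with $\partial N / \partial \tau = \sigma \, \partial h / \partial \tau$. Multiplying through by $(\phi \sigma + 1 + \tau)^2 > 0$, an increase in property taxes increases housing prices and city population
when

\[
A'(\tau) (\tau + \phi \sigma + 1) \geq A(\tau) + \phi N_0 + w.
\]

Given the assumptions on A(τ), the left-hand side minus the right-hand side is strictly
decreasing in τ, since its derivative is $A''(\tau)(\tau + \phi \sigma + 1) < 0$. The difference is positive
for low τ, where A′(τ) is very large, and negative for high τ, where A′(τ) goes to zero. This means
there is a unique $\tau_H^*$ where this expression is satisfied with equality.
Proposition 2. For $\bar{\tau} < \tau_H^*$, increasing property taxes will increase city population
and increase home prices. For $\bar{\tau} > \tau_H^*$, increasing property taxes will reduce
city population and reduce home prices.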
Therefore the effect of a change in property taxes could be either positive or negative in
the housing market.
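The housing-side cutoff can be illustrated numerically in the same spirit. The sketch below locates the tax rate at which ∂h/∂τ changes sign, using bisection on the numerator of the derivative, and then checks that the equilibrium price is higher there than at nearby rates. The amenity function A(τ) = 3√τ and all parameter values are hypothetical choices for this illustration, not values from the text.

```python
import math

# All functional forms and numbers below are illustrative assumptions.
PHI = 2.0      # dispersion of idiosyncratic city preferences
SIGMA = 1.5    # housing supply parameter
N0 = 0.5       # baseline population term
WAGE = 1.0     # regional wage
SCALE = 3.0    # hypothetical amenity scale in A(tau) = SCALE * sqrt(tau)


def amenity(tau):
    """Hypothetical amenity function A(tau)."""
    return SCALE * math.sqrt(tau)


def amenity_slope(tau):
    """Derivative A'(tau) of the hypothetical amenity function."""
    return SCALE / (2.0 * math.sqrt(tau))


def house_price(tau):
    """Equilibrium price h* = (phi*N0 + A(tau) + w) / (phi*sigma + 1 + tau)."""
    return (PHI * N0 + amenity(tau) + WAGE) / (PHI * SIGMA + 1.0 + tau)


def price_slope_numerator(tau):
    """dh*/dtau multiplied by (phi*sigma + 1 + tau)**2, which is positive."""
    denom = PHI * SIGMA + 1.0 + tau
    return amenity_slope(tau) * denom - (PHI * N0 + amenity(tau) + WAGE)


def bisect_root(func, lo, hi, tol=1e-10):
    """Bisection for a function that is positive at lo and negative at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


tau_h = bisect_root(price_slope_numerator, 1e-6, 50.0)
population_at_cutoff = SIGMA * house_price(tau_h)  # N* = sigma * h*
print(f"cutoff tax rate: {tau_h:.4f}")
print(f"price at cutoff: {house_price(tau_h):.4f}, population: {population_at_cutoff:.4f}")
```

Because N* = σh*, the population peaks at the same rate as the housing price, which is the content of Proposition 2. The magnitude of the cutoff in this toy example carries no empirical meaning; it depends entirely on the assumed amenity function.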

3.3 Data and Summary Statistics

This section describes the data sources for the empirical analysis and provides summary
statistics. The merged data set features tax collections variables, data on local government
finances from the census, housing market variables, and variables on firms and employment.
The data on local property taxes were collected from the state tax agencies for Connecti-
cut, South Carolina, and Tennessee. The key variables are assessed values and collections.
For South Carolina, the assessed values of real property and the county tax collections from
real property were used. For Connecticut and Tennessee, the assessed values and collections
for real property were constructed by summing the values of residential, commercial, and
industrial property (including apartments).
The data on local government financial accounts comes from the IndFin local govern-
ment finances census dataset. This is a survey of all local governments administered every
five years; if the localities do not provide previous years’ data, those values are imputed
by census statisticians.7 The survey includes items on revenues, expenditures, assets, and
liabilities. For Connecticut, the municipality government data is used. For South Carolina
and Tennessee, the county government data is used.
Data on population were assembled from the population census and from state labor
department data. I used the revised average annual data points.
Employment statistics are assembled from state labor agencies. These data are collected
for the Quarterly Census of Employment and Wages (QCEW) from administrative data.
They contain information on the number of private and public employer establishments
(firms), number of workers, wages, and industry classifications. This data is available at the
county-quarter level and municipality-year level.
The data on housing prices comes from the real estate web site Zillow. The data items
include Zillow’s “home value index” (ZHVI, details available on the web site), sale and list
prices, and the number of sales. These data are monthly and at the county or municipality
level. Finally, I obtained some data on new loan acquisitions from Fannie Mae and Freddie
Mac.

7 My results are similar if using only non-imputed years.

Table 3.1: Summary Statistics

Variable                        Mean     S.D.      Variable                   Mean    S.D.

Assessed Value ($M)             1150     2260      Total Revenues ($M)        108     178
Tax Rate                        0.04     0.03      Sales Tax                  5.44    22.9
Collections ($M)                31.1     59.9      Charges                    17.3    58.5
                                                   Inter-Govt Revenues        29.8    57.1
Assessed Value (Residential)    1020     1860
Assessed Value (Business)       282      687       Direct Expenditures ($M)   101     159
Collections (Residential)       25.2     42.7      Education                  46.5    76.9
Collections (Business)          7.9      23.6      Healthcare                 10.4    54.3
                                                   Police                     5.85    10.7
Population (1000s)              45.28    84.8      Roads                      3.95    5.06
                                                   Fire Protection            2.33    5.14
Establishments (1000s)          1.07     2.16      Govt. Staff                1.9     3.82
Employment (1000s)              17.14    40.36     Prisons/Jails              1.8     6.96
Annual Wages ($1000s)           22.64    23.36

Summary means and standard deviations for key variables, 1997-2013. Observation is a district-year.

Table 3.1 reports summary statistics. Note that regression results are reported in logs so
the particular levels here do not play an important role in the analysis. It is worth noting
however that with $31.1M in collections and a population of about 45,000, we are looking at roughly $688 in
property taxes per person on average, or about $2,752 for a family of four.

3.4 Empirical Strategy

This section describes the empirical strategy for identifying changes in property taxes due
to the staggered reassessment schedule.

3.4.1 Staggered Tax Reassessment

In all jurisdictions that use a property tax, there is some mechanism for reassessing prop-
erty to keep up with market prices and inflation. In most cases, this means that parcels
are revalued periodically by trained assessors. More frequent assessments mean higher
administrative costs but a more equitable distribution of the tax burden. Longer delays act
as a subsidy to homeowners whose property values have increased the most since the last
revaluation.
There is wide variation across states and counties in this delay: while some states do a full
reassessment every year, Utah once went 20 years without revaluing. According to Walden
and Denaux [2002], infrequent reassessments cost the state of North Carolina $320 million in
forgone tax revenues between 1980 and 1995. In Michigan and California, revaluations only
occur at transfer – resulting in a subsidy to long-time home-owners and a wedge between
the buyer price and the seller price [Skidmore et al., 2010, Wasi and White, 2005].
Given that revaluations change the size and the composition of the property tax bur-
den, they present an opportunity for studying the effects of property tax changes on local
economies. However, in states with discretion over revaluation timing, the timing of prop-
erty reassessments may be correlated with regional socioeconomic factors, such as mean
income and business activity [Stine, 2010]. In this case, a standard difference-in-differences strategy
comparing revaluations with socioeconomic outcomes would yield biased estimates due to
the endogeneity of revaluation timing.
To solve this problem, I focus on the set of states where property revaluations occur on
a state-mandated staggered schedule. The sample includes Connecticut, South Carolina,
and Tennessee. In these three states, jurisdictions must revalue every few years regardless
of other socioeconomic conditions. The schedule is adhered to nearly perfectly because the
assessment agencies face legal penalties if they are late. In South Carolina and Tennessee,
the reassessment unit is the county; a subset of counties revalues each year. In Connecticut,
the unit is the municipality. In the three states in the sample, there are no state or other
government benefits tied to property assessments.
Figure 3.4.1 shows the housing value trends for 2001-2013 in the sample of tax districts,
separately by the year of the first revaluation in the sample. The different levels for the series
demonstrate that assignment to cohorts is not random. Given that the analysis
uses fixed effects, random assignment is not needed, just parallel trends. The graph does
support the latter. In all years, the lines are near parallel and always moving in the same
direction.

Figure 3.4.1: Housing Value Trends by Reval Cohort

Table 3.2 reports some summary statistics on the characteristics of the states. These
states are economically and politically diverse. Connecticut is high-income and urbanized,
while South Carolina and Tennessee are relatively low-income and rural. The proportion of
revenue from property taxes in these states is comparable to the amount in other states in
the country.
The first step is to verify the effect of the revaluation cycle on assessed property values.
Figure 3.4.2a shows a kernel density estimate of the distribution of the log change in property
values for reval years and non-reval years. It is clear from the figure that assessed property
values change significantly more, both positively and negatively, in reval years. Similarly,
Figure 3.4.2b shows the average changes by year in the cycle. In all three states, there is
a significantly larger increase in assessed property values during reval years than in other
years.
The next step is to verify whether higher assessments due to cohort assignment are
associated with higher tax collections. Previous empirical studies of the levy process show
that most of the time, tax districts will increase the actual tax take in response to increasing
property values [Ross and Yan, 2013, Brien, 2014]. Lutz [2008] estimates a property tax
revenue elasticity with respect to housing prices of 0.4 on average, using a nationwide sample
of cities.

Figure 3.4.2: Change in Property Assessments in Reval Years

(a) Kernel density estimate of the first-differenced log assessed value of all real property, separately by reval
years and non-reval years. CT, SC, TN data, 1997-2014.

(b) Average first-differenced log assessed value of all real property by year in revaluation cycle. CT, SC, TN
data, 1997-2014.

Table 3.2: State Characteristics

                        Connecticut    South Carolina    Tennessee

Tax Districts           166 towns      46 counties       95 counties
Cycle                   5 years        5 years           4-6 years
Res/Bus Ratio           1.0            0.381             0.625
% Rev. from Prop Tax    42.0%          35.8%             27.6%
Med. Income             $67K           $43K              $43K
Manuf. %                11.2%          17.3%             15.9%
Urban Pop. %            88%            66.3%             66.4%
Med. Home Sale          $210K          $162K             $140K
Apt. Share              9.8%           7.7%              8.3%

Some basic graphical evidence of the effect of the revaluation cycle is presented in Figure
3.4.3. In these graphs, the series are split by “Up Market” and “Down Market,” which
I define as whether residential housing prices are increasing (“up-market”) or decreasing
(“down-market”) at the time of the reval. In the sample used for this and subsequent figures,
that corresponds to 2001-2006 being up-market, and 2007-2013 being down-market. Using
more complex specifications for this distinction did not make a qualitative difference in the
reported figures.
The takeaway from these figures is that the reval cycle does have a large impact on
the property taxes collected. In up-market years where housing values are going up, tax
collections are increasing significantly due to a reval. In down-market years, collections are
still increasing on average, but not as much. Figure 3.4.3b shows that in an up-market, the
distribution of the change in collections is heavily skewed to the right in reval years. In
a down-market, we do not see the same skew and we see that some places are actually
decreasing collections in response to the decreasing property values.

Figure 3.4.3: Property Tax Collections Trend

(a) The average first-differenced log tax collections, residualized on year fixed effect. Up-Market = 2001-2006;
Down-Market = 2007-2013.

(b) Kernel density estimate of first-differenced log property tax collections, separately by up and down
market, and reval years and non-reval years. CT, SC, TN data, 2001-2013.

Table 3.3: Effect of Property Assessments on Collections

                        OLS                     IV1         IV2
                        (1)         (2)         (3)         (4)

Tax Collections         1.032**     0.368**     0.941**     1.075**
                        (0.0326)    (0.0534)    (0.117)     (0.397)

First-Stage F-stat                              20.88       16.02
Observations            4889        4889        3572        3572
Tax Districts           307         307         250         250
State-Year FE’s         X           X           X           X
Tax District FE’s                   X           X           X

Effect of log assessed real property value on log real property tax collections. Standard errors in parentheses, clustered by tax district. Observations
weighted by pre-treatment property values. + p < .1, * p < .05, ** p < .01.

Table 3.3 provides regression estimates to complement the graphical evidence. This table
measures the elasticity of tax collections with respect to property assessments. Column 1,
which only includes state-year fixed effects, is interesting in that it shows that collections
increase one for one with assessed value, across districts. Column 2 shows that as assessments
change over time, about 37% go into higher tax collections. This is close to the 0.4 elasticity
estimated by Lutz [2008] in a national sample of city governments. The IV columns (see
the IV specification in subsection 4.2 below) are also included, using assessments as the
endogenous variable and collections as the outcome variable. When assessments go up just
due to cohort assignment, this is associated with a one-for-one increase in tax collections.
Figure 3.4.4 fleshes out the connection between housing market fluctuations and the
allocation of the property tax burden. As shown in panel a, an up-market reval is associated
with an increase in residential taxes, while a down-market reval is associated with a decrease
in residential taxes. In panel b, for business taxes, we see a more-or-less opposite effect. This
is further clarified in panel c, showing that the ratio of residential taxes to business taxes can
increase dramatically in up-market reval years and fall dramatically in down-market reval
years. These graphs emphasize the fact that when residential values increase more than
total collections, business taxes go down. This will be important for interpreting the results
later on; the increasing taxes due to reassessment are disproportionately borne by residential
property. This is an allocation that would be recommended by standard economic models
like Wilson [1985].

Figure 3.4.4: Reval Effects on Business and Residential Collections

(a) Kernel density estimate of the first-differenced log tax collections for residential property, separately by
reval years and non-reval years. Data for CT, TN, 2001-2013.

(b) Kernel density estimate of the first-differenced log tax collections for business property, separately by
reval years and non-reval years. Data for CT, TN, 2001-2013.

(c) Kernel density estimate of first-differenced log residential-to-business tax ratio, separately by up and
down market, and reval years and non-reval years. CT, TN data, 2001-2013.

3.4.2 Instrumental Variables Framework

To formalize the econometric approach, consider tax district i in state s at period t. We
consider log outcome $Y_{ist}$, such as population, and look at the effect of log property tax
collections $T_{ist}$. The second stage equation is

\[
Y_{ist} = \alpha_i + \alpha_{st} + \rho T_{ist} + \epsilon_{ist} \tag{3.4.1}
\]

where $\alpha_i$ is a tax-district fixed effect and $\alpha_{st}$ is a state-year fixed effect. Instruments for $T_{ist}$
are constructed using assignment to reassessment cohort. In all regressions, standard errors
are clustered by tax district.
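The logic of instrumenting the tax variable in a fixed-effects panel can be sketched on simulated data. The example below is a toy illustration under stated assumptions, not the estimation code behind the reported tables: it uses a single state (so the state-year effect reduces to a year effect), a single made-up instrument, a made-up "true" coefficient of 0.3, and no clustering of standard errors. With one instrument, 2SLS reduces to the ratio of the instrument's covariance with the outcome to its covariance with the endogenous variable, computed after removing the fixed effects.

```python
import random

random.seed(0)  # fixed seed so the toy example is reproducible

N_DISTRICTS, N_YEARS = 200, 10
TRUE_RHO = 0.3  # made-up "true" effect used only to generate the fake data


def two_way_demean(panel):
    """Remove district and year means from a balanced panel (list of district rows)."""
    n, t = len(panel), len(panel[0])
    grand = sum(sum(row) for row in panel) / (n * t)
    row_means = [sum(row) / t for row in panel]
    col_means = [sum(panel[i][s] for i in range(n)) / n for s in range(t)]
    return [
        [panel[i][s] - row_means[i] - col_means[s] + grand for s in range(t)]
        for i in range(n)
    ]


def dot(a, b):
    """Sum of elementwise products over two panels of the same shape."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Simulate a panel in which an unobserved confounder moves both taxes and the outcome.
year_effects = [random.gauss(0, 1) for _ in range(N_YEARS)]
instrument, taxes, outcome = [], [], []
for _ in range(N_DISTRICTS):
    district_effect = random.gauss(0, 1)
    z_row, t_row, y_row = [], [], []
    for s in range(N_YEARS):
        z_val = random.gauss(0, 1)        # exogenous shifter of taxes
        confounder = random.gauss(0, 1)   # drives both taxes and the outcome
        t_val = district_effect + year_effects[s] + z_val + confounder
        y_val = (district_effect + year_effects[s] + TRUE_RHO * t_val
                 + confounder + random.gauss(0, 1))
        z_row.append(z_val)
        t_row.append(t_val)
        y_row.append(y_val)
    instrument.append(z_row)
    taxes.append(t_row)
    outcome.append(y_row)

z_dm, t_dm, y_dm = (two_way_demean(p) for p in (instrument, taxes, outcome))

ols_estimate = dot(t_dm, y_dm) / dot(t_dm, t_dm)  # biased upward by the confounder
iv_estimate = dot(z_dm, y_dm) / dot(z_dm, t_dm)   # just-identified 2SLS
print(f"OLS: {ols_estimate:.3f}, IV: {iv_estimate:.3f}, true: {TRUE_RHO}")
```

In this simulation the fixed-effects OLS coefficient absorbs the confounder and overstates the effect, while the instrumented estimate is close to the value used to generate the data.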
The first IV specification (IV1) uses dummies for cohort assignment. This allows for arbitrary
non-linearities in the state-wide response to changes in the housing market. Specifically,
for each state and year, define a set of dummy variables $z^{s\tau}_{ist}$. Let $z^{s\tau}_{ist} = 1$ if the last revaluation
in i occurred at year τ, and $z^{s\tau}_{ist} = 0$ otherwise. For example, if i is in Tennessee and
revalued in 2005, the dummy $TN2005_{ist}$ will equal one for the years 2005 through 2009, and
zero otherwise.
The first stage is

\[
T_{ist} = \alpha_i + \alpha_{st} + \sum_{s} \sum_{\tau} \pi_{s\tau} z^{s\tau}_{ist} + \eta_{ist} \tag{3.4.2}
\]

The identification assumption is that, conditional on the fixed effects, the $z^{s\tau}_{ist}$ are uncorrelated
with $\epsilon_{ist}$. This follows from the state-mandated cycle. With a few dozen instruments, (3.4.2) is
over-identified. Optimal GMM is used to estimate the system.
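The construction of the cohort dummies can be sketched as follows. This is a minimal illustration: the function names, the input format, and the example revaluation years (2005 and 2010) are hypothetical and chosen to mirror the Tennessee example above.

```python
def cohort_dummies(state, reval_years, panel_years):
    """Map each panel year to the name of the active cohort dummy, e.g. 'TN2005'.

    reval_years: years in which the district was revalued (hypothetical input).
    A year before the first observed revaluation has no active dummy (None).
    """
    active = {}
    for year in panel_years:
        past = [r for r in reval_years if r <= year]
        active[year] = f"{state}{max(past)}" if past else None
    return active


def dummy_row(active_name, all_names):
    """Expand the active cohort name into a full row of 0/1 indicators."""
    return {name: int(name == active_name) for name in all_names}


# Hypothetical Tennessee district revalued in 2005 and again in 2010.
active = cohort_dummies("TN", [2005, 2010], range(2003, 2013))
names = sorted({name for name in active.values() if name is not None})
for year in sorted(active):
    print(year, dummy_row(active[year], names))
```

For this hypothetical district, the indicator TN2005 is one from 2005 through 2009 and zero in all other years, matching the description in the text.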
The second IV specification (IV2) uses the deviation in assessed values from market
values due to the delay since the last reval process. For each state-year-district, compute the
average Zillow Home Value Index $V_{jst}$ over the other tax districts j in the state:

\[
\bar{v}_{ist} = \frac{1}{n - 1} \sum_{j \neq i, \ j \in s} V_{jst}
\]

where n is the number of tax districts in state s. This provides a measure of market value
trends that is separate from the tax assessments.
Moreover, it does not include a tax district’s own endogenous factors. The instrument for
$T_{ist}$ is the divergence between this year’s leave-one-out average $\bar{v}_{ist}$ and the leave-one-out
average from the year of the previous reval process (τ):

\[
v_{ist} = \bar{v}_{ist} - \bar{v}_{is\tau}.
\]

This is included in the first stage as

\[
T_{ist} = \alpha_i + \alpha_{st} + \psi v_{ist} + \eta_{ist} \tag{3.4.3}
\]

which is used in conjunction with the second stage (3.4.1) using 2SLS.
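The leave-one-out instrument can be sketched in a few lines. The district labels and index values below are made up for illustration, and the nested-dictionary layout is an assumption of this sketch rather than the format of the underlying data.

```python
def leave_one_out_mean(values_by_district, district):
    """Average of the index over all districts in the state except `district`."""
    others = [v for d, v in values_by_district.items() if d != district]
    return sum(others) / len(others)


def iv2_instrument(index_by_year, district, year, last_reval_year):
    """Leave-one-out average this year minus the same average in the last reval year."""
    current = leave_one_out_mean(index_by_year[year], district)
    at_reval = leave_one_out_mean(index_by_year[last_reval_year], district)
    return current - at_reval


# Made-up home value index for three hypothetical districts in one state.
index_by_year = {
    2005: {"A": 100.0, "B": 110.0, "C": 120.0},
    2008: {"A": 150.0, "B": 121.0, "C": 133.0},
}

# District A was last revalued in 2005; its own values never enter the instrument.
v_a_2008 = iv2_instrument(index_by_year, "A", 2008, 2005)
print(v_a_2008)
```

In this toy case the instrument for district A depends only on districts B and C, so the large own-district increase from 100 to 150 plays no role. This is the sense in which the measure excludes a tax district's own endogenous factors.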

3.5 Results

This section reports the empirical results on the effects of property taxes on local government
finances and on local labor market outcomes. Subsection 5.1 reports the results on local
government finances. Subsection 5.2 reports the results on the local labor market.

3.5.1 Effect of Property Tax Changes on Government Finances

This section reports the responses of local government financial accounts to raising more
property taxes. This data comes from the IndFin series administered by the U.S. Census.
There are effects on both revenues and expenditures.
As a first pass, Figure 3.5.1 provides reduced-form evidence on the major government
variable of interest. In up-market reval years, direct government expenditures increase significantly,
reflecting the higher property tax collections in those years. In down-market years,
there is no reval effect on direct expenditures. This reflects a key finding in the workings
of local fiscal policy – these local governments don’t seem to be spending enough money
on public expenditures, and they are able to raise more revenue and increase spending by
raising property taxes due to higher assessments.

Figure 3.5.1: Reduced Form: Reval Cycle and Direct Expenditures

Plot of first-differenced log direct expenditures, residualized on year FE, by years since last reval, separately by up and down
market. CT, SC, TN data, 2001-2013.

Table 3.4 reports the regression estimates for the effect on local public finance variables.
The increase in taxes due to the staggered cycle is associated with a significant increase in
total government revenues, consistent with the finding in Dye and McGuire [1997] for Illinois
towns facing a tax cap. The effects on the other revenue items are positive but of mixed
significance. This suggests that any labor-market effects from increased property taxes are
not due to decreases in other revenue sources.
Table 3.5 estimates the effects of changes in property taxes on government expenditure
measures. As can be seen in the top row (columns 3 and 4), half of the higher tax collections
go into increased direct expenditures. The coefficients on individual categories are positive
but noisy. This is likely due to the heterogeneous needs of governments. Expenditures are
increasing, but how the money is spent varies across districts. Given this increase in expenditures,
it is plausible that any observed labor-market effects from increased property taxes
are due in part to these expenditures.

Table 3.4: Effect of property tax increase on government revenues

                            OLS                     IV1         IV2
Log Outcome                 (1)         (2)         (3)         (4)

Total Revenues              0.802**     0.323**     0.420*      0.833**
                            (0.0439)    (0.0795)    (0.185)     (0.309)
Sales Tax Revenue           0.573**     0.130+      0.526*      0.361
                            (0.122)     (0.0716)    (0.246)     (0.438)
Charges and Fees            0.940**     0.292*      0.275       1.607**
                            (0.0708)    (0.135)     (0.261)     (0.501)
Inter-government Revenue    0.617**     0.353**     0.467*      0.703
                            (0.0619)    (0.103)     (0.215)     (0.504)

First-Stage F-stat                                  13.24       35.67
Observations                2980        2980        2980        2265
Tax Districts               303         303         303         246
Tax District FE’s                       X           X           X

Endogenous variable: log property tax collections. Standard errors in parentheses, clustered by tax district. Includes state-year
fixed effect. Weighted by pre-treatment property value. + p < .1, * p < .05, ** p < .01.

3.5.2 Effect of property taxes on local labor market

This section reports the effects of the exogenous property tax changes on local labor market
outcomes.
Table 3.6 reports the main results on property taxes and labor market outcomes. Columns
1 and 2 report the OLS estimates. These regressions are estimating the second stage coeffi-
cients without instrumenting for the tax. Naturally, places with higher tax collections also
have better labor markets (Column 1). Within tax district, higher property taxes are as-
sociated with more population, establishments, and employment. However, these estimates
could be due to other socioeconomic factors that drive improvements in the labor market,
which mechanically would increase property tax collections.
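The concern with the OLS estimates can be illustrated on simulated data. The sketch below is not the estimation used in this chapter: the data, the true elasticity of 0.25, the variable names, and the single generic instrument are all assumptions made for illustration, and it omits the fixed effects, weights, and clustered standard errors used in the tables. It only shows how an unobserved shock that raises both tax collections and population biases OLS upward, while a two-stage least squares estimate recovers the assumed effect.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

def tsls_slope(y, x, z):
    """2SLS slope with one endogenous regressor x and one instrument z."""
    # First stage: fitted values of x from a regression on the instrument.
    pi = ols_slope(x, z)
    intercept = mean(x) - pi * mean(z)
    x_hat = [intercept + pi * zi for zi in z]
    # Second stage: regress y on the first-stage fitted values.
    return ols_slope(y, x_hat)

rng = random.Random(0)
n = 20000
TRUE_EFFECT = 0.25  # hypothetical causal elasticity, not an estimate from the text

z = [rng.gauss(0, 1) for _ in range(n)]       # hypothetical instrument
shock = [rng.gauss(0, 1) for _ in range(n)]   # unobserved local economic shock
log_tax = [0.5 * zi + s + rng.gauss(0, 1) for zi, s in zip(z, shock)]
log_pop = [TRUE_EFFECT * t + s + rng.gauss(0, 1) for t, s in zip(log_tax, shock)]

ols_estimate = ols_slope(log_pop, log_tax)     # picks up the shared shock
iv_estimate = tsls_slope(log_pop, log_tax, z)  # uses only instrument-driven variation

print(f"OLS: {ols_estimate:.3f}  2SLS: {iv_estimate:.3f}  assumed truth: {TRUE_EFFECT}")
```

With a single instrument and a single endogenous regressor, this two-step procedure gives the same slope as the ratio of the reduced-form to the first-stage coefficient.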
The IV estimates address this endogeneity problem, and the estimated effects are broadly
similar. According to these estimates, there is a statistically significant effect on local
population as well as the number of local establishments. Increasing property taxes by 10
percent would be associated with a 2.5-5 percent increase in population and a 1.5-10 percent
increase in local businesses. There is a positive coefficient on employment, but the effect is
not statistically significant. There is no detectable effect on average wages.

Table 3.5: Effect of property tax increase on government expenditures

                            OLS         OLS         IV1         IV2
Outcome                     (1)         (2)         (3)         (4)
Direct Expenditure          0.803**     0.326**     0.497*      0.554*
                            (0.0459)    (0.0842)    (0.214)     (0.217)
Education                   0.685**     0.292**     0.157       3.300
                            (0.0485)    (0.0521)    (0.127)     (7.699)
Police                      0.984**     0.0651      0.153       0.476
                            (0.0622)    (0.0894)    (0.297)     (0.410)
Roads                       0.507**     0.116       0.0699      0.663
                            (0.0542)    (0.147)     (0.310)     (0.629)
Healthcare                  2.592**     0.128       0.462*      -0.257
                            (0.955)     (0.0936)    (0.216)     (0.497)
Fire Protection             1.253**     0.164       -0.432      1.765
                            (0.147)     (0.134)     (0.304)     (1.607)
Govt. Staff                 0.778**     -0.0473     -0.393      1.242
                            (0.0395)    (0.193)     (0.429)     (1.539)
Prisons                     0.626**     0.258+      0.950+      0.149
                            (0.128)     (0.149)     (0.563)     (0.429)
Sewers                      0.254       -0.0183     -0.213      -1.620*
                            (0.254)     (0.164)     (0.438)     (0.715)
First-Stage F-stat                                  13.24       35.62
Observations                2980        2980        2979        2265
Tax Districts               303         303         303         246
Tax District FE’s                       X           X           X
Endogenous variable: log property tax collections. Standard errors in parentheses, clustered by tax district. Includes state-year
fixed effect. Weighted by pre-treatment property value. + p < .1, * p < .05, ** p < .01.

Table 3.6: Effect of property tax on labor market outcomes

                            OLS         OLS         IV1         IV2
Log Outcome                 (1)         (2)         (3)         (4)
Population                  0.742**     0.160**     0.256**     0.502**
                            (0.0305)    (0.0309)    (0.0809)    (0.109)
Business Establishments     0.870**     0.130**     0.148+      1.04+
                            (0.0245)    (0.0370)    (0.0915)    (0.620)
Employment                  0.949**     0.118*      0.121       1.38+
                            (0.0330)    (0.0543)    (0.120)     (0.806)
Average Wage                0.127**     -0.0183     -0.0698     0.0407
                            (0.0207)    (0.0291)    (0.0621)    (0.132)
Total Wage                  1.076**     0.0995      0.0523      1.415+
                            (0.0367)    (0.0686)    (0.162)     (0.782)
First-Stage F-stat                                  18.23       45.5
Observations                4292        4292        4292        3288
Tax Districts               303         303         303         246
Tax District FE’s                       X           X           X
Endogenous variable: log property tax collections. Standard errors in parentheses, clustered by tax district. Includes state-year
fixed effect. Weighted by pre-treatment property value. + p < .1, * p < .05, ** p < .01.

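Because both the outcomes and property tax collections enter in logs, the IV coefficients in Table 3.6 can be read as elasticities, which is where the percentage ranges for population and business establishments come from. The short sketch below reproduces that arithmetic. The function names are illustrative, and the "exact" figure assumes the log-log relationship holds for a discrete 10 percent change, which is an assumption added here rather than a claim made in the text.

```python
# Coefficients copied from IV columns (3) and (4) of Table 3.6.
IV_ELASTICITIES = {
    "Population": (0.256, 0.502),
    "Business Establishments": (0.148, 1.04),
}

def approx_pct_change(elasticity, tax_increase_pct):
    """First-order approximation: percent change in outcome = elasticity x percent change in tax."""
    return elasticity * tax_increase_pct

def exact_pct_change(elasticity, tax_increase_pct):
    """Change implied by a log-log model for a discrete tax change, in percent."""
    return ((1.0 + tax_increase_pct / 100.0) ** elasticity - 1.0) * 100.0

for outcome, (iv1, iv2) in IV_ELASTICITIES.items():
    low, high = approx_pct_change(iv1, 10), approx_pct_change(iv2, 10)
    print(f"{outcome}: roughly {low:.1f} to {high:.1f} percent for a 10 percent tax increase")
```

For small changes the two calculations nearly coincide, so the first-order reading of the coefficients is adequate at the level of precision used in the text.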
Table 3.7 adds to this analysis by estimating the model using direct expenditure (from
the local government census) as the endogenous variable. We lose some observations in this
setting, but we still get enough power to estimate effects. Overall the coefficients are quite
similar to those in Table 3.6, which used property tax collections as the endogenous variable.
This supports the idea that the property tax increases are increasing in-migration of residents
and establishments through the channel of higher government expenditures.
Table 3.8 shows that there is no discernible effect on home sale prices or on the number
of cash-out refinancing loans. There are quality issues with both of these data series, but
overall they suggest that the labor market effects are not due to changes in household wealth
or borrowing. This is somewhat different from the result in Bhutta and Keys [2014] that
households increased cash-out refinancing during the 2000s housing bubble.

Table 3.7: Effect of government expenditures on labor market

                            OLS         OLS         IV1         IV2
Log Outcome                 (1)         (2)         (3)         (4)
Population                  0.800**     0.0552**    0.184*      0.796**
                            (0.0464)    (0.0177)    (0.0729)    (0.286)
Business Establishments     0.893**     0.0499**    0.184+      0.946+
                            (0.0462)    (0.0173)    (0.0976)    (0.593)
Employment                  1.011**     0.0337      0.103       1.278
                            (0.0497)    (0.0206)    (0.108)     (0.826)
Average Wage                0.133**     -0.00978    -0.0691     0.0480
                            (0.0275)    (0.00920)   (0.0535)    (0.186)
Total Wage                  1.144**     0.0241      0.0331      1.325
                            (0.0562)    (0.0239)    (0.145)     (0.940)
First-Stage F-stat                                  14.30       4.681
Observations                3000        3000        3000        2279
Tax Districts               304         304         304         244
Tax District FE’s                       X           X           X
Endogenous variable: log direct expenditures. Standard errors in parentheses, clustered by tax district. Includes state-year
fixed effect. Weighted by pre-treatment property value. + p < .1, * p < .05, ** p < .01.

Table 3.8: Effect of taxes on home sale price and borrowing

                            OLS         OLS         IV1         IV2
Outcome                     (1)         (2)         (3)         (4)
Home Sale Price             0.145**     0.0620      -0.0189     -0.0353
                            (0.0512)    (0.0491)    (0.100)     (0.155)
First-Stage F-stat                                  32.20       7.606
Observations                2655        2655        2650        2205
Tax Districts               195         195         190         174

Outcome                     (1)         (2)         (3)         (4)
Cash-Out Refis              0.0356      0.339       0.380       -5.145
                            (0.0782)    (0.252)     (0.470)     (12.58)
First-Stage F-stat                                  13.77       0.780
Observations                1463        1463        1398        1007
Tax Districts               180         180         115         87
Tax District FE’s                       X           X           X
Endogenous variable: log property tax collections. Standard errors in parentheses, clustered by tax district. Weighted by
pre-treatment property values.

3.5.3 Robustness checks

A number of robustness checks were performed to assess the sensitivity of the results. Using
unweighted regressions, or weighting by pre-treatment population, changes the magnitudes of
the coefficients but not their signs. The results remain significant when clustering by
reassessment cohort rather than by tax district, and when using robust standard errors without
clustering. Adding time-varying controls for available demographic information does not
substantially affect the results. When looking at individual states, the effects go in the same
direction as in the three states pooled together, but some of the results are not statistically
significant in individual states.

3.6 Conclusion

When property taxes go into government revenue, they cause the labor market to expand.
There is an increase in the number of local business establishments and local population
in response. The benefits of higher property taxes appear to outweigh the costs when they
go into government revenue. These results are consistent with a model in which higher
property taxes reduce fiscal policy distortions, which expands the labor market and causes
in-migration.
Are property taxes too high? Making welfare claims is contentious, since, among other
things, we do not observe the effects in other tax districts that may be losing out. But
one interpretation is that property taxes are below the optimum on average. These results
may be useful for local fiscal policymakers. One piece of advice is that properties
should be assessed as closely as possible to their market values.
These data on property taxes present many avenues for further research. Besides affecting
household and firm budgets, property taxes affect land values. Capitalization of taxes into
land values is an active research topic in local public finance [Slemrod and Bakija, 2008,
Feldman and Beer Sheva, 2010]. The data may be able to answer some of the open questions,
since there is exogenous variation in taxes as well as expectations about the next reassessment
cycle. While the evidence is more consistent with an effect of expenditures, there are still
likely effects coming from household responses to the assessment itself. In future work one
could try to decompose the effects of assessments from the effects of expenditures.
On the business property side, more fine-grained data on firm responses could elucidate
the effects of property taxes on businesses. This is a tax on capital, so the results are relevant
to the recent work on optimal capital taxation [e.g. Piketty and Saez, 2013].
Finally, property taxes are the largest local revenue contributor to education. Therefore
differences in property tax funds due to the instruments may influence school funding. This
may be a useful way to study the effects of school funding on education outcomes.

Bibliography

Daron Acemoglu, Suresh Naidu, Pascual Restrepo, and James A
Robinson. Democracy does cause growth. Technical report,
National Bureau of Economic Research, 2014.
Hirotugu Akaike. Factor analysis and aic. Psychometrika, 52(3):
317–332, 1987.
Alberto Alesina and Guido Tabellini. Bureaucrats or politicians?
part i: A single policy task. The American Economic Review, 97
(1):169–179, 2007. URL
http://www.jstor.org.ezproxy.cul.columbia.edu/stable/30034389.
Alberto Alesina and Guido Tabellini. Bureaucrats or politicians?
part ii: Multiple policy tasks. Journal of Public Economics, 92
(3-4):426–447, 2008. doi: 10.1016/j.jpubeco.2007.06.004.
M.G. Allingham and A. Sandmo. Income tax evasion: A theoretical
analysis. Journal of public economics, 1(3-4):323–338, 1972.
James Alm and Kyle Borders. Estimating the tax gap at the state
level: The case of georgia’s personal income tax. Public
Budgeting and Finance, 34(4):61–79, 2014.
J. Andreoni, B. Erard, and J. Feinstein. Tax compliance. Journal
of economic literature, 36(2):818–860, 1998.
Richard Arnott and Petia Petrova. The property tax as a tax on
value: Deadweight loss. International Tax and Public Finance,
13(2-3):241–266, 2006. ISSN 0927-5940. doi:
10.1007/s10797-006-4938-6. URL
http://dx.doi.org/10.1007/s10797-006-4938-6.
Richard J Arnott and Joseph E Stiglitz. Aggregate land rents,
expenditure on public goods, and optimal city size. The
Quarterly Journal of Economics, pages 471–500, 1979.
Elliott Ash and W. Bentley MacLeod. Intrinsic motivation in public
service: Theory and evidence from state supreme courts. Journal
of Law and Economics, forthcoming 2015.
Elliott Ash, Massimo Morelli, and Richard Van Weelden. Elections
and divisiveness: Theory and evidence. Working Paper 21422,
National Bureau of Economic Research, 2015a.
Elliott Ash, Massimo Morelli, and Richard Van Weelden. Elections
and divisiveness: Theory and evidence. NBER, 2015b.
Scott Ashworth and Ethan Bueno de Mesquita. Electoral selection,
strategic challenger entry, and the incumbency advantage.
Journal of Politics, 70(4):1006–1025, 2008. ISSN 0022-3816. doi:
10.1017/s0022381608081024.
Scott Ashworth, Ethan Bueno de Mesquita, and Amanda Friedenberg.
Accountability and information in elections. mimeo, University
of Chicago, May 2015.
Anthony Barnes Atkinson and Joseph E Stiglitz. The design of tax
structure: direct versus indirect taxation. Journal of public
Economics, 6(1):55–75, 1976.
ChongEn Bai, Qi Li, and Min Ouyang. Property taxes and home
prices: A tale of two cities. Journal of Econometrics, 180(1):1–15,
2014. ISSN 0304-4076. doi:
http://dx.doi.org/10.1016/j.jeconom.2013.08.039. URL
http://www.sciencedirect.com/science/article/pii/S0304407613002674.
Jushan Bai and Serena Ng. Large dimensional factor analysis. Now
Publishers Inc, 2008.
Jushan Bai and Serena Ng. Instrumental variable estimation in a
data rich environment. Econometric Theory, 26(06):1577–1606,
2010.
Jon Bakija and Joel Slemrod. Do the rich flee from high state
taxes? evidence from federal estate tax returns. Technical
report, National Bureau of Economic Research, 2004.
Steven J Balla. Interstate professional associations and the
diffusion of policy innovations. American Politics Research, 29
(3):221–245, 2001.
H Spencer Banzhaf and Nathan Lavery. Can the land tax help curb
urban sprawl? evidence from growth patterns in pennsylvania.
Journal of Urban Economics, 67(2):169–179, 2010.
Timothy J Bartik. Who benefits from state and local economic
development policies? Books from Upjohn Press, 1991.
Susanto Basu. Intermediate goods and business cycles:
Implications for productivity and welfare. The American Economic
Review, pages 512–531, 1995.
Susanto Basu and John G. Fernald. Returns to scale in u.s.
production: Estimates and implications. Journal of Political
Economy, 105(2):249–283, 1997. ISSN 00223808. URL
http://www.jstor.org/stable/10.1086/262073.
Kristian Behrens, Yoshitsugu Kanemoto, and Yasusada Murata. The
henry george theorem in a second-best world. Journal of Urban
Economics, 85:34–51, 2015.
Louis-Philippe Beland. Political parties and labor market
outcomes: Evidence from us states. American Economic Journal:
Applied Economics, 2015.
Alexandre Belloni, Daniel Chen, Victor Chernozhukov, and Christian
Hansen. Sparse models and methods for optimal instruments with
an application to eminent domain. Econometrica, 80(6):2369–2429,
2012.
Roland Benabou and Jean Tirole. Incentives and prosocial behavior.
American Economic Review, 96(5):1652–1678, 2006.
Carlos Berdejo and Noam Yuchtman. Crime, punishment, and politics:
an analysis of political cycles in criminal sentencing. Review
of Economics and Statistics, 95(3):741–756, 2013.
Frances Stokes Berry and William D. Berry. State lottery adoptions
as policy innovations: An event history analysis. American
Political Science Review, 84:395–415, 6 1990. ISSN 1537-5943.
doi: 10.2307/1963526. URL
http://journals.cambridge.org/article_S0003055400192561.
Frances Stokes Berry and William D. Berry. Tax innovation in the
states: Capitalizing on political opportunity. American Journal
of Political Science, 36(3):715–742, 1992. ISSN 00925853.
URL http://www.jstor.org/stable/2111588.
Frances Stokes Berry and William D. Berry. The politics of tax
increases in the states. American Journal of Political Science,
38(3):855–859, 1994. ISSN 00925853. URL
http://www.jstor.org/stable/2111610.
William D Berry and Brady Baybeck. Using geographic information
systems to study interstate competition. American Political
Science Review, 99(04):505–519, 2005.
Marianne Bertrand, Esther Duflo, and Sendhil Mullainathan. How
much should we trust differences-in-differences estimates?
Quarterly Journal of Economics, 119(1):249–275, Feb 2004. URL
http://www.jstor.org.ezproxy.cul.columbia.edu/stable/25098683.
Marianne Bertrand, Jessica Pan, and Emir Kamenica. Gender identity
and relative income within households. Technical report,
National Bureau of Economic Research, 2013.
T. Besley and A. Case. Does electoral accountability affect
economic-policy choices - evidence from gubernatorial term
limits. Quarterly Journal of Economics, 110(3):769–798, 1995a.
ISSN 0033-5533. doi: 10.2307/2946699.
Timothy Besley and Anne Case. Does electoral accountability affect
economic policy choices? evidence from gubernatorial term
limits. The Quarterly Journal of Economics, pages 769–798,
1995b.
Timothy Besley and Anne Case. Political institutions and policy
choices: evidence from the united states. Journal of Economic
Literature, pages 7–73, 2003.
Timothy Besley and Stephen Coate. Elected versus appointed
regulators: Theory and evidence. Journal of the European
Economic Association, 1(5):1176–1206, 2003.
Timothy Besley, Torsten Persson, and Daniel M. Sturm. Political
competition, policy and growth: Theory and evidence from the
us. The Review of Economic Studies, 77(4):1329–1352, 2010.
ISSN 00346527. URL http://www.jstor.org/stable/40836649.
Timothy J Besley, Anders Jensen, and Torsten Persson. Norms,
enforcement, and tax evasion. 2015.
Michael Carlos Best and Henrik Jacobsen Kleven. Housing market
responses to transaction taxes: Evidence from notches and
stimulus in the uk. London School of Economics, 2013.
Neil Bhutta and Benjamin J Keys. Interest rates and equity
extraction during the housing boom. University of Chicago
Kreisman Working Papers Series in Housing Law and Policy, (3),
2014.
Robert C. Bird and Donald J. Smythe. The structure of american
legal institutions and the diffusion of wrongful-discharge laws,
1978-1999. Law and Society Review, 42(4):833–864, 2008. ISSN
1540-5893. doi: 10.1111/j.1540-5893.2008.00360.x. URL
http://dx.doi.org/10.1111/j.1540-5893.2008.00360.x.
Michael James Bommarito, Daniel Martin Katz, and Jillian
Isaacs-See. An empirical survey of the population of united
states tax court written decisions. Virginia Tax Review, 30(2),
2011.
Lars-Erik Borge and Jørn Rattsø. Capitalization of property
taxes in norway. Public Finance Review, 42(5):635–661, 2014.
doi: 10.1177/1091142113489845. URL
http://pfr.sagepub.com/content/42/5/635.abstract.
Katharine L Bradbury, Christopher J Mayer, and Karl E Case.
Property tax limits, local fiscal behavior, and property values:
Evidence from massachusetts under proposition 2½. Journal of
Public Economics, 80(2):287–311, 2001.
Spencer T Brien. Compensating changes to the property tax levy?
an empirical test of the residual rule. 2014.
Marius Brülhart, Sam Bucovetsky, and Kurt Schmidheiny. Chapter 17
- taxes in cities. In Gilles Duranton, J. Vernon Henderson, and
William C. Strange, editors, Handbook of Regional and Urban
Economics, volume 5 of Handbook of Regional and Urban Economics,
pages 1123–1196. Elsevier, 2015. doi:
http://dx.doi.org/10.1016/B978-0-444-59531-7.00017-X. URL
http://www.sciencedirect.com/science/article/pii/B978044459531700017X.
John B. Burbidge, Lonnie Magee, and A. Leslie Robb. Alternative
transformations to handle extreme values of the dependent
variable. Journal of the American Statistical Association, 83
(401):123–127, 1988. doi: 10.1080/01621459.1988.10478575. URL
http://www.tandfonline.com/doi/abs/10.1080/01621459.1988.10478575.
Leonard Burman, Christopher Geissler, and Eric J. Toder. How big are
total individual income tax expenditures, and who benefits from
them? The American Economic Review, 98(2):79–83, 2008. ISSN
00028282. URL http://www.jstor.org/stable/29729999.
Marika Cabral and Caroline Hoxby. The hated property tax:
salience, tax rates, and tax revolts. Technical report, National
Bureau of Economic Research, 2012.
H. Cai and Q. Liu. Competition and corporate tax avoidance:
Evidence from chinese industrial firms. The Economic Journal,
119(537):764–795, 2009.
Mehmet Caner. Lasso-type gmm estimator. Econometric Theory, 25
(01):270–290, 2009.
Mehmet Caner and Hao Helen Zhang. Adaptive elastic net for
generalized methods of moments. Journal of Business & Economic
Statistics, 32(1):30–47, 2014.
Brandice Canes-Wrone, Tom S. Clark, and Jee-Kwang Park. Judicial
independence and retention elections. Journal of Law, Economics
& Organization, 28(2):211, 2010.
Brandice Canes-Wrone, Tom S. Clark, and Jason P. Kelly. Judicial
selection and death penalty decisions. American Political
Science Review, 108:23–39, 2 2014. ISSN 1537-5943. doi:
10.1017/S0003055413000622. URL
http://journals.cambridge.org/article_S0003055413000622.
Robert A Carp. The scope and function of intra-circuit judicial
communication: A case study of the eighth circuit. Law and
Society Review, pages 405–426, 1972.
Marine Carrasco. A regularization approach to the many instruments
problem. Journal of Econometrics, 170(2):383–398, 2012.
Anne C. Case, Harvey S. Rosen, and James R. Hines. Budget
spillovers and fiscal policy interdependence. Journal of Public
Economics, 52(3):285–307, 1993. ISSN 0047-2727. doi:
http://dx.doi.org/10.1016/0047-2727(93)90036-S. URL
http://www.sciencedirect.com/science/article/pii/004727279390036S.
Devin Caughey, Christopher Warshaw, and Yiqing Xu. The policy
effects of the partisan composition of state government. 2015.
Stephanie Riegg Cellini, Fernando Ferreira, and Jesse Rothstein.
The value of school facility investments: Evidence from a
dynamic regression discontinuity design. The Quarterly Journal
of Economics, 125(1):215–261, 2010.
Andrew C Chang. Tax policy endogeneity: evidence from r&d tax
credits. 2014.
John C Chao and Norman R Swanson. Consistent estimation with a
large number of weak instruments. Econometrica, 73(5):1673–1692,
2005.
Howard Chernick. On the determinants of subnational tax
progressivity in the u.s. National Tax Journal, 58(1):93–112,
2005. ISSN 00280283. URL
http://www.jstor.org/stable/41790177.
Raj Chetty. Is the taxable income elasticity sufficient to
calculate deadweight loss? the implications of evasion and
avoidance. American Economic Journal: Economic Policy, 1(2):
31–52, 2009.
Raj Chetty and Nathaniel Hendren. The economic impacts of tax
expenditures: Evidence from spatial variation across the u.s.
2013.
Raj Chetty, Adam Looney, and Kory Kroft. Salience and taxation:
Theory and evidence. The American Economic Review, 99(4):1145,
2009.
Stephen J. Choi, G. Mitu Gulati, and Eric A. Posner. Professionals
or politicians: The uncertain empirical case for an elected
rather than appointed judiciary. Journal of Law, Economics, and
Organization, 26(2):290, 2010. doi: 10.1093/jleo/ewn023.
Hyonho Chun and Sündüz Keleş. Sparse partial least squares
regression for simultaneous dimension reduction and variable
selection. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 72(1):3–25, 2010.
Kenneth Ward Church and Patrick Hanks. Word association norms,
mutual information, and lexicography. Computational linguistics,
16(1):22–29, 1990.
Stephen Coate. Property taxation, zoning, and efficiency: A
dynamic analysis. Technical report, National Bureau of Economic
Research, 2011.
Michael Collins. Discriminative training methods for hidden markov
models: Theory and experiments with perceptron algorithms. In
Proceedings of the ACL-02 Conference on Empirical Methods in
Natural Language Processing - Volume 10, EMNLP ’02, pages 1–8,
Stroudsburg, PA, USA, 2002. Association for Computational
Linguistics. doi: 10.3115/1118693.1118694. URL
http://dx.doi.org/10.3115/1118693.1118694.
Marquis de Condorcet. Essay on the application of analysis to the
probability of majority decisions. 1785.
Carl Davis. Who Pays?: A Distributional Analysis of the Tax
Systems in All 50 States. Institute on Taxation & Economic
Policy, 2015.
Leandro De Magalhães and Lucas Ferrero. Separation of powers and
the tax level in the us states. Southern Economic Journal, 2015.
David Deming. Early childhood intervention and life-cycle skill
development: Evidence from head start. American Economic
Journal: Applied Economics, pages 111–134, 2009.
Matthew J. Denny, Brendan O’Connor, and Hanna Wallach. A little
bit of nlp goes a long way: Finding meaning in legislative
texts with phrase extraction. Technical report, 2015.
M.A. Desai. The degradation of reported corporate profits. The
Journal of Economic Perspectives, 19(4):171–192, 2005.
Mathias Dewatripont, Ian Jewitt, and Jean Tirole. The economics of
career concerns, part ii: Application to missions and
accountability of government agencies. Review of Economic
Studies, 66(1):199–217, 1999. ISSN 00346527. URL
http://www.jstor.org.ezproxy.cul.columbia.edu/stable/2566956.
Anthony Downs. An economic theory of political action in a
democracy. Journal of Political Economy, 65(2):135–150,
1957. ISSN 00223808. URL http://www.jstor.org/stable/1827369.
Jean-Pierre Dubé, Jeremy T Fox, and Che-Lin Su. Improving the
numerical performance of static and dynamic aggregate discrete
choice random coefficients demand estimation. Econometrica, 80
(5):2231–2267, 2012.
Gilles Duranton, Laurent Gobillon, and Henry G Overman. Assessing
the effects of local taxation using microgeographic data. The
Economic Journal, 121(555):1017–1046, 2011.
Richard F Dye and Therese J McGuire. The effect of property tax
limitation measures on local government fiscal behavior. Journal
of Public Economics, 66(3):469–487, 1997.
S.M. Dynarski and J.E. Scott-Clayton. The cost of complexity in
federal student aid: Lessons from optimal tax theory and
behavioral economics. Technical report, National Bureau of
Economic Research, 2006.
Lee Epstein, William M. Landes, and Richard A. Posner. The
Behavior of Federal Judges. Harvard University Press, 2013. URL
https://read.amazon.com/.
Pablo Fajgelbaum, Eduardo Morales, Juan Carlos Suarez Serrato, and
Owen Zidar. State taxes and spatial misallocation. Technical
report, Working Paper, 2015.
Naomi E Feldman and Israel Beer Sheva. A reevaluation of property
tax capitalization: The case of michigan’s proposal a. 2010.
Martin Feldstein. Tax avoidance and the deadweight loss of the
income tax. Review of Economics and Statistics, 81(4):674–680,
1999.
J. Ferejohn. Incumbent performance and electoral control. Public
Choice, 50(1-3):5–25, 1986. ISSN 0048-5829. doi:
10.1007/bf00124924.
Claudio Ferraz and Frederico Finan. Electoral accountability and
corruption: Evidence from the audits of local governments.
American Economic Review, 101(4):1274–1311, 2011. ISSN
0002-8282. doi: 10.1257/aer.101.4.1274.
Amy Finkelstein. E-ztax: Tax salience and tax rates. Quarterly
Journal of Economics, 124(3), 2009.
William Fischel, Wallace Oates, and Joan Youngman. Are local
property taxes regressive, progressive, or what? Unpublished
Manuscript, University of Maryland, College Park, 2011.
Don Fullerton and Gilbert E Metcalf. Tax incidence. Handbook of
public economics, 4:1787–1872, 2002.
David Gamage and Darien Shanske. Three essays on tax salience:
Market salience and political salience. Tax L. Rev., 65:19,
2011.
General Accountability Office GAO. Internal revenue service,
challenges remain in combating abusive tax shelters, reprinted
in gao rep. No. 04-104T, 2003.
Eric Gautier and Alexandre Tsybakov. High-dimensional instrumental
variables regression and confidence sets. arXiv preprint
arXiv:1105.2454, 2011.
Matt Gentzkow, Jesse M Shapiro, and Matt Taddy. Measuring
polarization in high-dimensional data: Method and application
to congressional speech. 2015.
Matthew Gentzkow and Jesse M Shapiro. What drives media slant?
evidence from us daily newspapers. Econometrica, 78(1):35–71,
2010.
Matthew Gentzkow, Jesse M Shapiro, and Michael Sinkinson.
Competition and ideological diversity: Historical evidence from
us newspapers. The American Economic Review, 104(10):3073–3114,
2014.
Henry George. Progress and Poverty: An Enquiry Into the Cause of
Industrial Depressions, and of Increase of Want with Increase of
Wealth. The Remedy. K. Paul, Trench & Company, 1879.
Xavier Giroud and Joshua Rauh. State taxation and the reallocation
of business activity: Evidence from establishment-level data.
Working Paper 21534, National Bureau of Economic Research,
September 2015. URL http://www.nber.org/papers/w21534.
Y. Givati. Resolving legal uncertainty: The unfulfilled promise
of advance tax rulings. Virginia Tax Review, 29:137, 2009.
Edward L Glaeser. The Incentive Effects of Property Taxes on Local
Governments. Public Choice, 89(1-2):93–111, October 1996. URL
http://ideas.repec.org/a/kap/pubcho/v89y1996i1-2p93-111.html.
Jacob Goldin. Optimal tax salience. Journal of Public Economics,
forthcoming, 2015.
Jacob Goldin and Tatiana Homonoff. Smoke gets in your eyes:
Cigarette tax salience and regressivity. American Economic
Journal: Economic Policy, 5(1):302–336, 2013.
Roger Gordon and Wei Li. Tax structures in developing countries:
Many puzzles and a possible explanation. Journal of public
Economics, 93(7):855–866, 2009.
Roger H Gordon and Wojciech Kopczuk. The choice of the personal
income tax base. Journal of Public Economics, 118:97–110, 2014.
S.C. Gordon and G.A. Huber. The effect of electoral
competitiveness on incumbent behavior. Quarterly Journal of
Political Science, 2(2):107–138, 2007. doi: 10.1561/100.00006035.
Michael J. Graetz. Paint-by-numbers tax lawmaking. Columbia Law
Review, 95(3):609–682, 1995. ISSN 00101958. URL
http://www.jstor.org/stable/1123226.
M.J. Graetz. Tax reform unraveling. The Journal of Economic
Perspectives, 21(1):69–90, 2007.
David Greenaway and Richard Kneller. Firm heterogeneity, exporting
and foreign direct investment. The Economic Journal, 117(517):
F134–F161, 2007. ISSN 1468-0297. doi:
10.1111/j.1468-0297.2007.02018.x. URL
http://dx.doi.org/10.1111/j.1468-0297.2007.02018.x.
Erwin N. Griswold. The need for a court of tax appeals. Harvard
Law Review, 57(8):1153–1192, 1944. ISSN 0017811X. URL
http://www.jstor.org/stable/1334533.
Paulo Guimaraes, Octávio Figueiredo, and Douglas Woodward.
Industrial location modeling: Extending the random utility
framework. Journal of Regional Science, 44(1):1–20, 2004.
Jiang Guo, Wanxiang Che, Haifeng Wang, and Ting Liu. Revisiting
embedding features for simple semi-supervised learning. In
Proceedings of EMNLP, pages 110–120, 2014.
M.G. Hall and C.W. Bonneau. Does quality matter? challengers in
state supreme court elections. American Journal of Political
Science, 50(1):20–33, January 2006. doi:
10.1111/j.1540-5907.2006.00167.x.
Bruce W Hamilton. Capitalization of intrajurisdictional
differences in local tax prices. The American Economic Review,
pages 743–753, 1976.
Christian Hansen, Jerry Hausman, and Whitney Newey. Estimation
with many instrumental variables. Journal of Business and
Economic Statistics, 26(4), 2008.
Stephen Hansen, Michael McMahon, and Andrea Prat. Transparency and
deliberation within the fomc: a computational linguistics
approach. 2014.
F.A. Hanssen. Is there a politically optimal level of judicial
independence? The American Economic Review, 94(3):712–729,
2004. URL
http://www.jstor.org.ezproxy.cul.columbia.edu/stable/3592949.
Andrew Haughwout, Robert Inman, Steven Craig, and Thomas Luce.
Local revenue hills: evidence from four us cities. Review of
Economics and Statistics, 86(2):570–585, 2004.
Mary L Heen. Plain meaning, the tax code, and doctrinal
incoherence. Hastings LJ, 48:771, 1996.
Alexander Hertel-Fernandez and Konstantin Kashin. Capturing
business power across the states with text reuse. In annual
conference of the Midwest Political Science Association,
Chicago, April, pages 16–19, 2015.
W. Hettich and S.L. Winer. Democratic choice and taxation: A
theoretical and empirical analysis. Cambridge Univ Pr, 2005.
Rachael K Hinkle. Into the words: Using statutory text to explore
the impact of federal courts on state policy diffusion. American
Journal of Political Science, 2015.
RG Holcombe. Tax policy from a public choice perspective. National
Tax Journal, 51(2):359–371, 1998.
Thomas J Holmes. The effect of state policies on the location of
manufacturing: Evidence from state borders. Journal of
Political Economy, 106(4):667–705, 1998.
J. Holtzblatt and J. McCubbin. Whose child is it anyway?
simplifying the definition of a child. National Tax Journal, 56
(3):701–718, 2003.
Craig A Hoover. Deference to federal circuit court interpretations
of unsettled state law: Factors, etc., inc. v. pro arts, inc.
Duke Law Journal, pages 704–732, 1982.
Christopher Howard. The hidden welfare state: Tax expenditures
and social policy in the United States. Princeton University
Press, 1999.
G. A. Huber and S. C. Gordon. Accountability and coercion: Is
justice blind when it runs for office? American Journal of
Political Science, 48(2):247–263, 2004. ISSN 0092-5853. doi:
10.2307/1519881.
Matias Iaryczower, Garrett Lewis, and Matthew Shum. To elect or to
appoint? bias, information, and responsiveness of bureaucrats
and politicians. Journal of Public Economics, 97:230–244, 2013.
Joshua M Jansa, Eric R Hansen, and Virginia H Gray. Copy and paste
lawmaking: The diffusion of policy language across american
state legislatures. 2015.
Zubin Jelveh, Bruce Kogut, and Suresh Naidu. Political language in
economics. 2015.
Jacob Jensen, Suresh Naidu, Ethan Kaplan, Laurence Wilse-Samson,
David Gergen, Michael Zuckerman, and Arthur Spirling. Political
polarization and the dynamics of political language: Evidence
from 130 years of partisan speech [with comments and
discussion]. Brookings Papers on Economic Activity, pages 1–81,
2012.
Åsa Johansson, Christopher Heady, Jens Arnold, Bert Brys, and Laura
Vartia. Taxation and economic growth. 2008.
Daniel Martin Katz and Michael J Bommarito II. Measuring the
complexity of the law: the united states code. Artificial
Intelligence and Law, 22(4):337–374, 2014.
H.J. Kleven and W. Kopczuk. Transfer program complexity and the
take-up of social benefits. American Economic Journal: Economic
Policy, 3(1):54–90, 2011.

H.J. Kleven, M.B. Knudsen, C.T. Kreiner, S. Pedersen, and E. Saez.
Unwilling or unable to cheat? evidence from a tax audit
experiment in denmark. Econometrica, 79(3):651–692, 2011.
Jeffrey R Kling, Jeffrey B Liebman, and Lawrence F Katz.
Experimental analysis of neighborhood effects. Econometrica, 75
(1):83–119, 2007.
W. Kopczuk. Redistribution when avoidance behavior is
heterogeneous. Journal of Public Economics, 81(1):51–71, 2001.
Wojciech Kopczuk. Tax bases, tax rates and the elasticity of
reported income. Journal of Public Economics, 89(11):2093–2119,
2005.
Wojciech Kopczuk and David J Munroe. Mansion tax: The effect of
transfer taxes on the residential real estate market. Technical
report, National Bureau of Economic Research, 2014.
A. Krishna and J. Slemrod. Behavioral public finance: tax design
as price presentation. International Tax and Public Finance, 10
(2):189–203, 2003.
Herbert M. Kritzer. Competitiveness in state supreme court
elections, 1946–2009. Journal of Empirical Legal Studies, 8(2):
237–259, 2011. doi: 10.1111/j.1740-1461.2011.01208.x.
William M. Landes and Richard A. Posner. Rational judicial
behavior: A statistical study. The Journal of Legal Analysis,
1:775–831, 2009. URL http://ssrn.com/abstract=1126403.
D. S. Lee, E. Moretti, and M. J. Butler. Do voters affect or elect
policies? evidence from the us house. Quarterly Journal of
Economics, 119(3):807–859, 2004. doi: 10.1162/0033553041502153.
David S. Lee and Thomas Lemieux. Regression discontinuity designs
in economics. Journal of Economic Literature, 48(2):281–355, June 2010.
doi: 10.1257/jel.48.2.281.
Andrew Leigh. Estimating the impact of gubernatorial partisanship
on policy settings and economic outcomes: A regression
discontinuity approach. European Journal of Political Economy,
24(1):256–268, 2008.
Omer Levy and Yoav Goldberg. Dependency-based word embeddings. In
Proceedings of the 52nd Annual Meeting of the Association for
Computational Linguistics, volume 2, pages 302–308, 2014.
Omer Levy and Yoav Goldberg. Linguistic
regularities in sparse and explicit word representations.
CoNLL-2014, page 171, 2014.
Omer Levy, Yoav Goldberg, and Ido Dagan. Improving distributional
similarity with lessons learned from word embeddings.
Transactions of the Association for Computational Linguistics,
3:211–225, 2015.
A. Likhovski. The duke and the lady: Helvering v. gregory and the
history of tax avoidance adjudication. Cardozo Law Review, 25,
2004.
Claire Lim and James M. Snyder. Is more information always
better? party cues and candidate quality in u.s. judicial
elections. Journal of Public Economics, April 2015. doi:
10.1016/j.jpubeco.2015.04.006.
Claire H S Lim. Preferences and incentives of appointed and
elected public officials: Evidence from state trial court
judges. American Economic Review, 2013.
Wei Lin, Rui Feng, and Hongzhe Li. Regularization methods for
high-dimensional instrumental variables regression with an
application to genetical genomics. Journal of the American
Statistical Association, 110(509):270–288, 2015.
John A. List and Daniel M. Sturm. Elections matter: Theory and
evidence from environmental policy. Quarterly Journal of
Economics, 121(4):1249–1281, 2006. ISSN 0033-5533. doi:
10.1093/qje/121.4.1249.
Michael Livingston. Practical reason, purposivism, and the
interpretation of tax statutes. Tax. L. Rev., 51:677, 1995.
K.D. Logue. Optimal tax compliance and penalties when the law is
uncertain. Va. Tax Rev., 27:241, 2007.
Byron F Lutz. The connection between house price appreciation and
property tax revenues. National Tax Journal, pages 555–572,
2008.
Teemu Lyytikäinen. Three-rate property taxation and housing
construction. Journal of Urban Economics, 65(3):305–313, 2009.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff
Dean. Distributed representations of words and phrases and their
compositionality. In Advances in neural information processing
systems, pages 3111–3119, 2013.
James A Mirrlees. An exploration in the theory of optimum income
taxation. The review of economic studies, pages 175–208, 1971.
Christopher Z Mooney and Mei-Hsien Lee. Legislative morality in
the american states: The case of pre-roe abortion regulation
reform. American Journal of Political Science, pages 599–627,
1995.
Michael A Nelson. Electoral cycles and the politics of state tax
policy. Public Finance Review, 28(6):540–560, 2000.
Michael J Nelson, Rachel Paine Caufield, and Andrew D Martin. Oh,
mi: A note on empirical examinations of judicial elections.
State Politics & Policy Quarterly, page 1532440013503838, 2013.
Mr John Norregaard. Taxing Immovable Property Revenue Potential
and Implementation Challenges. Number 13-129. International
Monetary Fund, 2013.
Peter C. O’Brien. Procedures for comparing samples with multiple
endpoints. Biometrics, pages 1079–1087, 1984.
Ryo Okui. Instrumental variable estimation in the presence of many
moment conditions. Journal of Econometrics, 165(1):70–86, 2011.
H. Ordower. The culture of tax avoidance. Saint Louis University
Law Journal, Vol. 55, 2010, Saint Louis U. Legal Studies
Research Paper No. 2010-06, 2010.
Oded Palmon and Barton A Smith. New evidence on property tax
capitalization. Journal of Political Economy, 106(5):1099–1111,
1998.
Rohini Pande. Can informed voters enforce better governance?
experiments in low-income democracies. Annual Review of
Economics, 3(1):215–237, 2011. doi:
10.1146/annurev-economics-061109-080154. URL
http://dx.doi.org/10.1146/annurev-economics-061109-080154.
Leslie E. Papke. Interstate business tax differentials and new
firm location: Evidence from panel data. Journal of Public
Economics, 45(1):47–68, 1991. ISSN 0047-2727. doi:
http://dx.doi.org/10.1016/0047-2727(91)90047-6. URL
http://www.sciencedirect.com/science/article/pii/0047272791900476.
Kyung H Park. Judicial Elections and Discrimination in Criminal
Sentencing. PhD thesis, Harris School, University of Chicago,
Chicago, IL, April 2014.
D.L. Paul. Sources of tax complexity: How much simplicity can
fundamental tax reform achieve, the. NCL Rev, 76:151, 1997.
Torsten Persson and Guido Enrico Tabellini. Political economics:
explaining economic policy. MIT press, 2002.
Thomas Piketty. Capital in the 21st century. Cambridge: Harvard
University Press, 2013.
Thomas Piketty and Emmanuel Saez. A theory of optimal inheritance
taxation. Econometrica, 81(5):1851–1886, 2013.
G.A. Plesko. Estimates of the magnitude of financial and tax
reporting conflicts. Technical report, National Bureau of
Economic Research, 2007.
D. Pomeranz. No taxation without information. 2011.
Martin F Porter. Snowball: A language for stemming algorithms,
2001.
Richard A. Posner. The law and economics movement. The American
Economic Review, 77(2):pp.1–13, 1987. ISSN 00028282. URL
http://www.jstor.org.ezproxy.cul.columbia.edu/stable/1805421.
James Poterba and Todd Sinai. Tax expenditures for owner-occupied
housing: Deductions for property taxes and mortgage interest
and the exclusion of imputed rental income. The American
Economic Review, 98(2):84–89, 2008. ISSN 00028282. URL
http://www.jstor.org/stable/29730000.
David Pozen. Judicial elections as popular constitutionalism.
Columbia Law Review, 110:2047–2134, 2010.
Kevin M Quinn, Burt L Monroe, Michael Colaresi, Michael H Crespin,
and Dragomir R Radev. How to analyze political attention with
minimal assumptions and costs. American Journal of Political
Science, 54(1):209–228, 2010.
C Radhakrishna Rao. Estimation and tests of significance in factor
analysis. Psychometrika, 20(2):93–111, 1955.
W Robert Reed. Democrats, republicans, and taxes: Evidence that
political parties matter. Journal of Public Economics, 90(4):
725–750, 2006.
Andrew Reschovsky. Usually the best available tax, but it’s a
complex question. Cityscape, pages 247–254, 2013.
Matthew Rognlie. A note on piketty and diminishing returns to
capital. Available at
http://www.mit.edu/~mrognlie/piketty_diminishing_returns.pdf, 2014.
Jonathan C Rork. Coveting thy neighbors’ taxation. National Tax
Journal, pages 775–787, 2003.
Justin M Ross and Wenli Yan. Fiscal illusion from property
reassessment? an empirical test of the residual view. National
Tax Journal, 2013.
D.M. Schizer. Enlisting the tax bar. Tax L. Rev., 59:331, 2005.
Juan Carlos Suárez Serrato and Owen Zidar. Who benefits from state
corporate tax cuts? a local labor markets approach with
heterogeneous firms. Technical report, National Bureau of
Economic Research, 2014.
Daniel Shaviro. Beyond public choice and public interest: A study
of the legislative process as illustrated by tax legislation in
the 1980s. University of Pennsylvania Law Review, pages 1–123,
1990.
Daniel Shaviro. An economic and political look at federalism in
taxation. Michigan Law Review, pages 895–991, 1992.
Daniel Shaviro. Rethinking tax expenditures and fiscal language.
Tax Law Review, 57:187, 2004.
Joanna M. Shepherd. The influence of retention politics on judges’
voting. The Journal of Legal Studies, 38(1):169–206, 2009. doi:
10.1086/592096.
Mark Skidmore, Charles L Ballard, and Timothy R Hodge. Property
value assessment growth limits and redistribution of property
tax payments: evidence from michigan. National Tax Journal, 63
(3):509–538, 2010.
J. Slemrod. The economics of corporate tax selfishness. Technical
report, National Bureau of Economic Research, 2004.
J. Slemrod. The etiology of tax complexity: Evidence from us
state income tax systems. Public Finance Review, 33(3):279,
2005.
J. Slemrod and S. Yitzhaki. Tax avoidance, evasion, and
administration. Handbook of public economics, 3:1423–1470, 2002.
J. Slemrod, M. Blumenthal, and C. Christian. Taxpayer response to
an increased probability of audit: evidence from a controlled
experiment in minnesota. Journal of Public Economics, 79(3):
455–483, 2001.
Joel Slemrod and Jon Bakija. Taxing ourselves: a citizen’s guide
to the debate over taxes. MIT Press Books, 1, 2008.
Joel Slemrod, Caroline Weber, and Hui Shan. The lock-in effect of
housing transfer taxes: Evidence from a notched change in dc
policy. University of Michigan: Michigan, 2012.
James M. Snyder and David Stromberg. Press coverage and political
accountability. Journal of Political Economy, 118(2):355–408,
2010. ISSN 0022-3808.
L.M. Solan and S.A. Dean. Tax shelters and the code: Navigating
between text and intent. 26 Va. Tax Rev. 879, 2006:879, 2007.
Robert M Solow and William S Vickrey. Land use in a long narrow
city. Journal of Economic Theory, 3(4):430–447, 1971.
Yan Song and Yves Zenou. Property tax and urban sprawl: Theory
and implications for us cities. Journal of Urban Economics, 60
(3):519–534, 2006.
Joseph E. Stiglitz. The theory of local public goods. The
Economics of Public Services, 1977.
William F Stine. Estimating the determinants of property
reassessment duration: An empirical study of pennsylvania
counties. Journal of Regional Analysis and Policy, 40(2), 2010.
Paolo Surico and Riccardo Trezzi. Consumer spending and property
taxes. Technical report, Federal Reserve Discussion Papers,
2015.
Stanley S Surrey. The congress and the tax lobbyist: How special
tax provisions get enacted. Harvard Law Review, pages 1145–1182,
1957.
Robert D Tollison. Public choice and legislation. Virginia Law
Review, pages 339–371, 1988.
Roberto P Vasconcellos. Vague concepts and uncertainty in tax law:
The case of comparative tax judicial review. 2007.
Michael L Walden and Zulal Denaux. Lags in real property
revaluations and estimates of shortfalls in property tax
collections in north carolina. Journal of Agricultural and
Applied Economics, 205:213, 2002.
John Joseph Wallis. A history of the property tax in america.
Property Taxation and Local Government Finance, edited by
Wallace Oates. Cambridge, MA, 2001.
Patrick L Warren. State parties and taxes: A comment on reed in
the context of close legislatures. Available at SSRN 1144057,
2009.
Nada Wasi and Michelle J White. Property tax limitations and
mobility: The lock-in effect of california’s proposition 13.
Technical report, National Bureau of Economic Research, 2005.
D.A. Weisbach. Formalism in the tax law. The University of Chicago
Law Review, pages 860–886, 1999.
D.A. Weisbach. An economic analysis of anti-tax-avoidance
doctrines. American Law and Economics Review, 4(1):88–115, 2002.
Harold L. Wilensky. The professionalization of everyone? American
Journal of Sociology, 70(2):pp.137–158, 1964. ISSN 00029602.
John D Wilson. Optimal property taxation in the presence of
interregional capital mobility. Journal of Urban Economics, 18
(1):73–89, 1985.
H. P. Young. Condorcet’s theory of voting. The American Political
Science Review, 82(4):pp. 1231–1244, 1988. ISSN 00030554. URL
http://www.jstor.org.ezproxy.columbia.edu/stable/1961757.
Mo Yu, Tiejun Zhao, Daxiang Dong, Hao Tian, and Dianhai Yu.
Compound embedding features for semi-supervised learning. In
HLT-NAACL, pages 563–568, 2013.
George R Zodrow and P Mieszkowski. Pigou, tiebout, property
taxation and the underprovision of local public goods.
Journal of Urban Economics, 19:357, 1986.
Hui Zou and Trevor Hastie. Regularization and variable selection
via the elastic net. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 67(2):301–320, 2005.
Gabriel Zucman et al. The hidden wealth of nations. University of
Chicago Press Economics Books, 2015.

Appendix

Appendix A

Ch. 1: The Performance of Elected Officials: Evidence from State Courts

This appendix enumerates the proofs for the major theoretical results from Section 1.3.
Subsection A.1.1 formalizes the effects of bias and noise on the quality of selected judges.
A.1.2 formalizes the role of bias and noise in campaign incentives for judge effort.

A.1 Model Appendix

A.1.1 Effect of bias and noise on judge quality

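Let $\phi$, $\Phi$ respectively denote the standard normal's probability density and cumulative
distribution functions. The expected quality of judges selected by the governor, expression 1.3.2,
can be written as:

\[
\begin{aligned}
\bar{q}^{G}(b) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl(q_A + (q_B - q_A)\, I(q_A, q_B, b)\bigr)\, \phi(q_B)\, \phi(q_A)\, dq_B\, dq_A \\
&= \int_{-\infty}^{\infty} \left( q_A + \int_{q_A + b}^{\infty} (q_B - q_A)\, \phi(q_B)\, dq_B \right) \phi(q_A)\, dq_A. \qquad \text{(A.1.1)}
\end{aligned}
\]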

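Clearly $\bar{q}^{M} = \bar{q}^{G}(0)$. Notice that:

\[
\begin{aligned}
\frac{d\bar{q}^{G}(b)}{db} &= \int_{-\infty}^{\infty} \bigl(-b\, \phi(q_A + b)\bigr)\, \phi(q_A)\, dq_A \\
&= -b \int_{-\infty}^{\infty} \phi(q_A + b)\, \phi(q_A)\, dq_A \\
&= -\frac{1}{2\sqrt{\pi}}\, b \exp\!\left(-\frac{b^2}{4}\right) < 0 \quad \text{for } b > 0. \qquad \text{(A.1.2)}
\end{aligned}
\]

The last equality uses the fact that $\int \phi(q_A + b)\, \phi(q_A)\, dq_A$ is the density at $b$ of the
difference of two independent standard normals, which is $N(0, 2)$. This shows that a small amount
of bias has a small negative effect on quality, and that the loss grows with $b$. This proves
Proposition 1.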
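
As a quick numerical sanity check on the derivative above, the following sketch compares a trapezoid-rule evaluation of the integral $-b \int \phi(q_A + b)\, \phi(q_A)\, dq_A$ with the closed form $-b \exp(-b^2/4)/(2\sqrt{\pi})$ implied by the $N(0,2)$ density. The function names, integration bounds, and grid size are illustrative assumptions and not part of the original analysis.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def dq_db_numeric(b, lo=-10.0, hi=10.0, n=20000):
    """Trapezoid-rule value of the integral of -b * phi(q + b) * phi(q) over q."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        q = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (-b) * phi(q + b) * phi(q)
    return total * h


def dq_db_closed(b):
    """Closed form: -b * exp(-b^2 / 4) / (2 * sqrt(pi))."""
    return -b * math.exp(-b * b / 4.0) / (2.0 * math.sqrt(math.pi))


# Compare the two expressions at a few (arbitrary) bias levels.
for b in (0.25, 1.0, 2.0):
    print(b, dq_db_numeric(b), dq_db_closed(b))
```

The two columns should agree to many decimal places, and both are negative for every positive bias level.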
Next we consider the expected quality with elections. In this case the expected payoff is
over $q_A$ and $q_B$, with selection determined by the signals:

\[
\bar{q}^{E}(b) = \int\!\!\int \bigl(q_A + (q_B - q_A)\, \Pr[s_B > s_A + b \mid q_A, q_B]\bigr)\, \phi(q_A)\, \phi(q_B)\, dq_A\, dq_B.
\]

Notice that $(q_B - q_A)\, I(q_A, q_B, b) > (q_B - q_A) \Pr[s_B > s_A + b \mid q_A, q_B]$ and hence we have
immediately that $\bar{q}^{G}(b) > \bar{q}^{E}(b)$. Also, since

\[
\frac{d \Pr[s_B > s_A + b \mid q_A, q_B]}{db} < 0
\]

for all $q_A, q_B$, we have that the expected ability of judges falls with $b$. This implies
Proposition 2.
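
The comparison between selection by the governor and selection by voters can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that qualities are independent standard normals, that the governor observes quality exactly, and that voters observe $s_j = q_j + \text{noise}$ with a hypothetical noise standard deviation `noise_sd`; none of these numerical choices come from the original analysis.

```python
import random


def simulate(b, noise_sd, n=50000, seed=0):
    """Mean quality of the selected judge when B is chosen iff s_B > s_A + b.

    noise_sd = 0 corresponds to a selector who observes quality exactly
    (the governor); noise_sd > 0 corresponds to voters with noisy signals.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        q_a = rng.gauss(0.0, 1.0)
        q_b = rng.gauss(0.0, 1.0)
        s_a = q_a + noise_sd * rng.gauss(0.0, 1.0)
        s_b = q_b + noise_sd * rng.gauss(0.0, 1.0)
        total += q_b if s_b > s_a + b else q_a
    return total / n


governor_unbiased = simulate(b=0.0, noise_sd=0.0)
governor_biased = simulate(b=1.0, noise_sd=0.0)
voters_biased = simulate(b=1.0, noise_sd=1.0)
print(governor_unbiased, governor_biased, voters_biased)
```

Under these assumptions the simulated averages are ordered as the propositions suggest: bias lowers the expected quality of the selected judge, and adding signal noise at the same level of bias lowers it further.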

A.1.2 Effect of campaign incentives on effort

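We can write the signals observed by the voters as:

\[
s_j = m_j + r_j \epsilon_j = \pi_j (x_j + e_j) + \pi_j \sigma_j \epsilon_j,
\]

where $\epsilon_j$ follows a standard normal distribution. Let us compute:

\[
\Pr[m_A + r_A \epsilon_A + b \geq m_B + r_B \epsilon_B].
\]

The inequality can be rewritten as:

\[
m_A + b - m_B \geq r_B \epsilon_B - r_A \epsilon_A = \sqrt{r_B^2 + r_A^2}\, \epsilon,
\]

where $\epsilon$ is a standard normal random variable and the last equality holds in distribution.
Hence, we have:

\[
\Pr[m_A + b + r_A \epsilon_A \geq m_B + r_B \epsilon_B] = \Phi\!\left( \frac{m_A + b - m_B}{\sqrt{r_B^2 + r_A^2}} \right),
\]

where $\Phi(\cdot)$ is the standard normal cdf. In our case we have

\[
m_j = \frac{\rho_j}{1 + \rho_j} (x_j + e_j)
\]

and

\[
r_j = \frac{\rho_j}{1 + \rho_j}\, \sigma_j = \frac{\rho_j}{1 + \rho_j}.
\]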
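
The winning probability above is easy to evaluate and to check by simulation. The sketch below is a minimal illustration; the function names and the parameter values in the example call are hypothetical and chosen only to exercise the formula.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(m_a, m_b, r_a, r_b, b):
    """Pr[m_A + r_A*eps_A + b >= m_B + r_B*eps_B] for independent N(0,1) shocks."""
    return std_normal_cdf((m_a + b - m_b) / math.sqrt(r_a ** 2 + r_b ** 2))


def win_frequency(m_a, m_b, r_a, r_b, b, n=50000, seed=1):
    """Simulated frequency of the same event, as a cross-check."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        s_a = m_a + r_a * rng.gauss(0.0, 1.0)
        s_b = m_b + r_b * rng.gauss(0.0, 1.0)
        if s_a + b >= s_b:
            wins += 1
    return wins / n


# Hypothetical parameter values, for illustration only.
exact = win_probability(0.3, 0.1, 0.5, 0.8, 0.2)
simulated = win_frequency(0.3, 0.1, 0.5, 0.8, 0.2)
print(exact, simulated)
```

With identical candidates and no bias the probability is one half, and it rises with the bias term $b$ in favor of Judge A.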

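Taking the effort of the other judge as given, the first order condition for a judge defines an
optimal effort choice:

\[
\begin{aligned}
C_j'(e_j) &= B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{m_A + b - m_B}{\sqrt{r_B^2 + r_A^2}} \right) \qquad \text{(A.1.3)} \\
&= B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{\frac{\rho_A}{1+\rho_A}(q_A + e_A) + b - \frac{\rho_B}{1+\rho_B}(q_B + e_B)}{\sqrt{r_B^2 + r_A^2}} \right). \qquad \text{(A.1.4)}
\end{aligned}
\]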

Observe that if $\pi_A = \pi_B$, both judges choose the same level of effort, and this has no
effect on the probability of winning; it is a negative-sum game.

Assumption. Effort costs are strongly convex given $\rho_j$, $j \in \{A, B\}$, if for every
$x \in \mathbb{R}$ the solution to the following equation is unique:

\[
C_j'(e) = B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{\pi_j e}{\sqrt{r_B^2 + r_A^2}} + x \right), \quad j \in \{A, B\}.
\]

Such functions exist because $\phi > 0$ and $\phi'$, $\phi''$ are bounded, and $C_j(0) = C_j'(0) = 0$, $C_j'' > 0$.
More generally, given any function $C(e)$ satisfying $C(0) = C'(0) = 0$, $C'' > 0$, and precisions
$\rho_j$ for $j \in \{A, B\}$, one can choose $\gamma_j > 0$ sufficiently large that this condition holds for
$C_j(e) = \gamma_j C(e)$.

Proposition 4. If effort costs are strongly convex given $\rho_j$, $j \in \{A, B\}$, then there exists a
Nash equilibrium in campaign effort. Moreover, Judge A chooses more effort than Judge B
($e_A > e_B$) if and only if the quality of information regarding Judge A is higher ($\pi_A > \pi_B$).

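Proof. Notice that the maximum effort possible for judge $j$ satisfies:

\[
C_j'\!\left(e_j^{\max}\right) = B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi(0).
\]

Let $m = \max\{\pi_A e_A^{\max}, \pi_B e_B^{\max}\}$ and define the function:

\[
h : [-m, m] \to [-m, m]
\]

by:

\[
h(x) = \frac{\rho_A}{1 + \rho_A}\, e_A(x) - \frac{\rho_B}{1 + \rho_B}\, e_B(x),
\]

where:

\[
C_j'(e_j(x)) = B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{\frac{\rho_A}{1+\rho_A} q_A + b - \frac{\rho_B}{1+\rho_B} q_B + x}{\sqrt{r_B^2 + r_A^2}} \right).
\]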

Strong convexity ensures that $e_j(x)$ is a uniquely defined continuous function of $x$ that
maximizes the payoff of judge $j$ given the effort of the other judge. Hence $h(x)$ is continuous,
and by Brouwer's fixed point theorem there exists $x^*$ such that $h(x^*) = x^*$, which
is in turn by construction a Nash equilibrium, where:

\[
\begin{aligned}
C_j'(e_j^*) &= B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{\frac{\rho_A}{1+\rho_A} q_A + b - \frac{\rho_B}{1+\rho_B} q_B + x^*}{\sqrt{r_B^2 + r_A^2}} \right), \\
&= B\, \frac{\pi_j}{\sqrt{r_B^2 + r_A^2}}\, \phi\!\left( \frac{\frac{\rho_A}{1+\rho_A}(q_A + e_A^*) + b - \frac{\rho_B}{1+\rho_B}(q_B + e_B^*)}{\sqrt{r_B^2 + r_A^2}} \right).
\end{aligned}
\]
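
The fixed-point construction in the proof can be sketched numerically. The example below assumes quadratic costs $C_j(e) = \gamma e^2 / 2$ (so that $C_j'(e) = \gamma e$ and $e_j(x)$ has a closed form), writes $\pi_j$ for $\rho_j / (1 + \rho_j)$, and uses hypothetical parameter values; the cost function, the name `equilibrium_efforts`, and all numbers are assumptions made only for illustration.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def equilibrium_efforts(pi_a, pi_b, q_a, q_b, b, r_a, r_b, prize, gamma,
                        tol=1e-12, max_iter=1000):
    """Iterate x -> h(x) = pi_A * e_A(x) - pi_B * e_B(x) until it settles.

    Assumes quadratic costs C_j(e) = gamma * e**2 / 2, so the first order
    condition gamma * e_j = prize * pi_j / s * phi(.) gives e_j(x) directly.
    """
    s = math.sqrt(r_a ** 2 + r_b ** 2)
    base = pi_a * q_a + b - pi_b * q_b

    def efforts(x):
        density = phi((base + x) / s)
        return (prize * pi_a * density / (gamma * s),
                prize * pi_b * density / (gamma * s))

    x = 0.0
    for _ in range(max_iter):
        e_a, e_b = efforts(x)
        x_new = pi_a * e_a - pi_b * e_b
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    e_a, e_b = efforts(x)
    return e_a, e_b, x


# Hypothetical parameters: voters are better informed about Judge A.
e_a, e_b, x_star = equilibrium_efforts(
    pi_a=0.8, pi_b=0.5, q_a=0.2, q_b=0.1, b=0.0,
    r_a=0.8, r_b=0.5, prize=1.0, gamma=5.0)
print(e_a, e_b, x_star)
```

With a large enough cost parameter the map is a contraction, so simple iteration converges; in this parameterization the better-observed judge exerts more effort, in line with Proposition 4.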

A.2 Empirical Appendix

This appendix includes some further notes on the data and the institutional reforms, as well
as further regression specifications.

A.2.1 Notes on Institutional Reforms

This section provides some notes on the institutional reforms. The key point is that other
reforms, such as the introduction of an intermediate appellate court, often coincided with the
electoral reforms. To deal with this, we reran all the regressions leaving out one state at a
time. None of the results changed substantially in these checks. Note that these coincident
reforms only threaten identification in the analysis of retention-process reforms. When we look
at the electoral cycle and when we look at selection effects, we are holding court-specific
incentives constant.
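
The leave-one-state-out check can be sketched as follows. This is a stripped-down illustration with a single regressor and no fixed effects, run on synthetic numbers with hypothetical state labels; it is not the specification or the data used in the chapter.

```python
def ols_slope(xs, ys):
    """Slope from a bivariate OLS regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def leave_one_state_out(rows):
    """rows: list of (state, treatment, outcome). Returns {dropped_state: slope}."""
    states = sorted({state for state, _, _ in rows})
    estimates = {}
    for dropped in states:
        kept = [(t, y) for state, t, y in rows if state != dropped]
        estimates[dropped] = ols_slope([t for t, _ in kept], [y for _, y in kept])
    return estimates


# Synthetic illustration only; labels and numbers are hypothetical.
rows = [
    ("S1", 0, 1.0), ("S1", 1, 1.4),
    ("S2", 0, 0.9), ("S2", 1, 1.5),
    ("S3", 0, 1.1), ("S3", 1, 1.6),
    ("S4", 0, 1.0), ("S4", 1, 1.3),
]
full_sample = ols_slope([t for _, t, _ in rows], [y for _, _, y in rows])
loo = leave_one_state_out(rows)
for state, slope in loo.items():
    print(state, round(slope, 4), round(slope - full_sample, 4))
```

If no single state drives the estimate, the leave-one-out slopes stay close to the full-sample slope, which is the pattern the robustness check looks for.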
Colorado instituted an intermediate appellate court in 1971, four years after the election
reform. Changing Colorado to a four-year window does not change the results. Florida
moved from partisan to non-partisan elections in 1972, then moved from non-partisan to
merit-uncontested in 1977. Florida is not included in the selection-process regressions. In
the retention-process regressions we treat these as separate reforms with five-year effect
windows. Removing Florida from the regressions does not change the results.
At the same time that Illinois changed from partisan retention to uncontested retention
(November 1962), the state also increased judge term lengths from nine years to ten years.
However, the term-lengths change went into effect in January 1963, two years before the
election reform went into effect.
At the same time it moved from partisan to merit-uncontested, Indiana increased term
lengths from six years to ten years.
Kentucky instituted an intermediate appellate court at the same time that it moved from
partisan to non-partisan elections.
The Maryland governor began selecting new appointees by merit commission in 1971. When
the state moved from non-partisan retention to uncontested retention, the term length was
reduced from 15 years to 10 years.
Oklahoma instituted an intermediate appellate court at the same time it moved from
partisan to merit-uncontested.
In 1973, South Dakota increased its term length from six years to eight years, eight years
before the non-partisan to merit-uncontested reform.
Tennessee moved from partisan to merit-uncontested in 1972, then moved back to partisan
elections in 1975. It is not included in the analysis.
Utah instituted an intermediate appellate court in 1988, two years after the reform from
non-partisan to merit-uncontested.

A.2.2 Additional Regression Results

This appendix reports additional empirical results.


Table A.1 reports summary statistics for the additional set of outcome variables. We have
dissents and concurrences reported separately. The outcome used in the main tables, discre-
tionary opinions written, includes concurrences, dissents, and opinions that are concurring
in part and dissenting in part.

Table A.1: Summary Statistics on Judge-Year Performance Variables (Additional Outcomes)

Outcome Variable                             Mean      Std. Dev.   Min   Max

Dissents Written                             3.755     5.694       0     129
Concurrences Written                         1.816     3.448       0     60

Negative Cites Per Opinion                   0.575     0.895       0     20.625
Federal Circuit Cites Per Opinion            0.649     3.624       0     153.11
Multiple-Use Cites Per Opinion               1.288     2.024       0     96
Proportion of Cases Overruled                0.052     0.194       0     5.52
Proportion of Cases Superseded by Statute    0.047     0.106       0     2.5

Total Negative Cites                         11.629    15.932      0     246
Total Federal Circuit Cites                  16.509    106.101     0     3503
Total Multiple-Use Cites                     26.141    31.348      0     800
Total Cases Overruled                        1.063     3.174       0     116
Total Cases Superseded by Statute            0.947     1.868       0     28

Notes. Observation is a judge-year, N=16,084. These statistics are constructed from each judge's
yearly output of cases. “Per Opinion” measures are divided by the number of majority opinions
written that year. See variable definitions in the accompanying text.

We also report in the appendix an additional set of citations measures. These measures
were excluded from the main text for brevity and because they are relatively rare, as can be
seen in the summary statistics. Negative citations are coded as negative by Bloomberg staff
attorneys. Federal circuit cites are citations from federal circuit courts. “Multiple-use”
cites means that an opinion is cited multiple times by a later court, which is similar to the
discussion cites measure used in the text. A case can be overruled by the state supreme
court at a later date, or it can be overruled by the U.S. Supreme Court on appeal. Finally, a
case can be superseded by statute – which means the state legislature passes a law to reverse
a court ruling.
Tables A.7, A.4, A.11, and A.16 report regression results from the same equations esti-
mated in the corresponding main tables but with the additional outcome measures. These
results generally line up with those in the main tables. Note that there are often strong
effects on negative cites, but that may be due to the initially low average for this measure
(see Table A.1).
Table A.2: Effect of Being Up For Election (Output and Effort)

                                           Partisan        Non-Partisan    Uncontested
                                           Election Year   Election Year   Election Year
Outcome                                    (1)             (2)             (3)

Majority Opinions Written                  -0.0583*        -0.101          0.0654
                                           (0.0296)        (0.0658)        (0.0425)
Words in Majority Opinions                 -0.0786*        -0.119          0.0896+
                                           (0.0350)        (0.0739)        (0.0470)
Cases Cited in Majority Opinions           -0.0816*        -0.108          0.100*
                                           (0.0396)        (0.0771)        (0.0453)
Discretionary Opinions Written             -0.0736**       -0.0699*        0.061
                                           (0.0285)        (0.0354)        (0.0383)
Words in Discretionary Opinions            -0.208+         -0.154          0.362**
                                           (0.123)         (0.128)         (0.126)
Cases Cited in Discretionary Opinions      -0.1            -0.131**        0.0711
                                           (0.0854)        (0.0404)        (0.0873)

Words Per Majority Opinion                 -0.0196+        -0.0166         0.0218+
                                           (0.0108)        (0.0130)        (0.0128)
Cases Cited Per Majority Opinion           -0.0211         -0.00571        0.0331*
                                           (0.0168)        (0.0204)        (0.0163)
Words Per Discretionary Opinion            -0.142          -0.0881         0.333**
                                           (0.117)         (0.138)         (0.103)
Cases Cited Per Discretionary Opinion      -0.0377         -0.0499         0.0531
                                           (0.0678)        (0.0608)        (0.0667)

Treated States                             23              17              19
Treated Judges                             437             270             277
Election Events                            810             517             451

Notes. N = 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1,
* p < .05, ** p < .01. Each row is from a separate regression for the stated outcome variable.
Treatment variable is a dummy equaling one for years judge is facing reelection. Regressions
include a state-year fixed effect and judge fixed effect, estimated using Stata's reg2hdfe module.

Table A.3: Effect of Being Up For Election (Quality and Impact)

                                           Partisan        Non-Partisan    Uncontested
                                           Election Year   Election Year   Election Year
Outcome                                    (1)             (2)             (3)

Positive Cites Per Opinion                 -0.0390*        -0.0222         -0.00167
                                           (0.0183)        (0.0231)        (0.0186)
Distinguishing Cites Per Opinion           -0.038          -0.0297         0.0252
                                           (0.0246)        (0.0275)        (0.0354)
Discuss Cites Per Opinion                  -0.026          -0.0273+        0.0109
                                           (0.0186)        (0.0166)        (0.0153)
Quoted Cites Per Opinion                   -0.0273+        -0.0229         0.00125
                                           (0.0151)        (0.0150)        (0.0204)
Out-of-State Cites Per Opinion             -0.0213         0.00414         0.0231
                                           (0.0177)        (0.0254)        (0.0232)

Total Positive Cites                       -0.106*         -0.158+         0.0661
                                           (0.0416)        (0.0882)        (0.0451)
Total Distinguishing Cites                 -0.124*         -0.188*         0.0839
                                           (0.0552)        (0.0907)        (0.0692)
Total Discuss Cites                        -0.0888*        -0.160*         0.0795+
                                           (0.0444)        (0.0715)        (0.0427)
Total Quoted Cites                         -0.0930*        -0.161*         0.0685
                                           (0.0424)        (0.0725)        (0.0416)
Total Out-of-State Cites                   -0.0970*        -0.104          0.0929*
                                           (0.0448)        (0.0782)        (0.0431)

Treated States                             23              17              19
Treated Judges                             437             270             277
Election Events                            810             517             451

Notes. N = 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1,
* p < .05, ** p < .01. Each row is from a separate regression for the stated outcome variable.
Treatment variable is a dummy equaling one for years judge is facing reelection. Regressions
include a state-year fixed effect and judge fixed effect, estimated using Stata's reg2hdfe module.

Tables A.2 and A.3 show the effects of the electoral cycle on individual performance
variables. As shown in Column 1, partisan elections are associated with reduced
performance/output. First, there is a decrease in the number of majority opinions and
discretionary opinions written. The point estimate indicates about an 8% decrease in the number
of words written, though the estimate is quite noisy. Average length of each opinion decreases
as well. Opinion quality declines slightly (a decrease in positive cites), so that, combined
with the lower output, total cites fall significantly for all of the measures.
Column 2 shows the effect for non-partisan elections. The number of discretionary opin-
ions and total words written decrease by 7% and 11% respectively, but as in the case of
a partisan election the estimate is quite noisy. There aren’t significant effects on average
opinion quality, except discussion cites. But the combined effect of slightly fewer opinions
and slightly lower quality results in statistically significant decreases in most of the total
cites measures. Though the coefficients are imprecisely estimated, the point estimates are of
the same order of magnitude as in the case of a partisan election.
Column 3 shows the effect of uncontested elections. There aren’t any negative effects
from the electoral cycle in this system. There are actually some positive effects, with an
increase in total words written, majority opinion length, length of table of cases, and some
of the cites measures.
In Table A.4, note that in partisan and non-partisan election years, there is a decrease in
the number of dissents, but no effect on concurrences. In uncontested elections, meanwhile,
there is a large positive election-year effect on dissents.
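
The election-year regressions in Tables A.2 to A.4 include judge and state-year fixed effects. As a rough illustration of that kind of estimator, the sketch below partials out two sets of group effects by alternating demeaning and then computes the slope on an election-year dummy. It is not the algorithm used by the reg2hdfe module, and the balanced panel, the single state, and all numbers are synthetic assumptions.

```python
def demean_by(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def two_way_within(values, judge, state_year, sweeps=100):
    """Alternate judge and state-year demeaning (alternating projections)."""
    out = list(values)
    for _ in range(sweeps):
        out = demean_by(demean_by(out, judge), state_year)
    return out


def fe_slope(y, d, judge, state_year):
    """Coefficient on d after partialling out both sets of fixed effects."""
    y_t = two_way_within(y, judge, state_year)
    d_t = two_way_within(d, judge, state_year)
    return sum(a * b for a, b in zip(d_t, y_t)) / sum(a * a for a in d_t)


# Synthetic balanced panel: 4 judges observed over 4 years in one state.
judge_effect = [0.5, -0.2, 0.1, 0.3]
year_effect = [0.0, 0.4, -0.1, 0.2]
true_beta = -0.07  # hypothetical election-year effect
y, d, judge, state_year = [], [], [], []
for j in range(4):
    for t in range(4):
        up_for_election = 1.0 if (j + t) % 4 == 0 else 0.0
        d.append(up_for_election)
        judge.append(j)
        state_year.append(t)
        y.append(judge_effect[j] + year_effect[t] + true_beta * up_for_election)

beta_hat = fe_slope(y, d, judge, state_year)
print(beta_hat)
```

Because the synthetic outcome contains no noise, the estimate recovers the assumed effect up to floating-point error; with real data the same within-transformation removes judge-specific and state-year-specific levels before the election-year contrast is computed.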
Tables A.5 and A.6 report the estimates from Equation 1.6.1 for the individual variables.
Column 1 estimates the average difference in performance between non-partisan judges and
partisan judges. Relative to the partisan judges, the non-partisan judges write shorter
opinions, but of higher quality. Opinions written by non-partisan judges have more positive
cites and more distinguishing cites than opinions written by partisan judges in the same court
and same year. The effect is larger and more significant when we consider the total cites.
Column 2 estimates the performance measure differential for merit-selected judges relative

173
Table A.4: Effect of Being Up For Election (Additional Outcomes)
Partisan Election Non-Partisan Uncontested Election
Year Election Year Year
Outcome (1) (2) (3)

Number of Concurrences Written -0.0282 -0.0227 -0.00666


(0.0320) (0.0294) (0.0348)
Number of Dissents Written -0.0563** -0.0621* 0.0616*
(0.0201) (0.0254) (0.0270)

Negative Cites Per Opinion -0.0244* -0.0229+ 0.018


(0.0124) (0.0120) (0.0188)
Federal Circuit Cites Per Opinion -0.0154* -0.0042 -0.00141
(0.00679) (0.00532) (0.00971)
Multiple-Use Cites Per Opinion -0.0132 -0.0199+ 0.00565
(0.0105) (0.0114) (0.0175)
Proportion of Cases Overruled -0.000786 0.00634 0.00553
(0.00566) (0.00889) (0.00973)
Proportion of Cases Superseded by Statute -0.00555 -0.000988 0.00865
(0.00355) (0.00352) (0.00636)

Total Negative Cites -0.115* -0.169** 0.1


(0.0505) (0.0654) (0.0680)
Total Federal Circuit Cites -0.101** -0.0664 0.0262
(0.0305) (0.0491) (0.0468)
Total Multiple-Use Cites -0.0819* -0.142** 0.0586
(0.0413) (0.0512) (0.0489)
Cases Overruled -0.0332 -0.0485 0.0504
(0.0329) (0.0385) (0.0583)
Cases Superseded by Statute -0.0454* -0.0251 0.0760+
(0.0224) (0.0252) (0.0413)

174
Table A.5: Effect of Judicial Selection System on Judge Quality (Output and Effort)
Non-Partisan Judges Merit-Selected Judges Merit-Selected Judges
Relative to Partisan Relative to Partisan Relative to Non-
Judges Judges Partisan Judges
Outcome (1) (2) (3)

Majority Opinions Written 0.0844 -0.148+ -0.111


(0.0546) (0.0804) (0.115)
Words in Majority Opinions -0.0525 -0.0566 0.00523
(0.0750) (0.0569) (0.147)
Cases Cited in Majority Opinions -0.0897 0.0186 0.0108
(0.124) (0.0581) (0.126)
Discretionary Opinions Written 0.368 -0.13 0.453*
(0.373) (0.147) (0.208)
Words in Discretionary Opinions 0.579 0.159 1.235
(0.766) (0.456) (0.817)
Cases Cited in Discretionary Opinions 0.891** 0.152 0.837
(0.265) (0.341) (0.536)

Words Per Majority Opinion -0.135* 0.0918 0.116


(0.0507) (0.0595) (0.0722)
Cases Cited Per Majority Opinion -0.167 0.166** 0.121
(0.117) (0.0571) (0.0856)
Words Per Discretionary Opinion 0.206 0.32 0.863
(0.445) (0.372) (0.676)
Cases Cited Per Discretionary Opinion 0.509** 0.296 0.447
(0.106) (0.245) (0.389)

Treated States 3 6 3
Treated State-Years 24 86 24
Treated Judges 14 54 16
N= 16,084 judge-years..Estimate of the average difference between judges selected under a new system, relative to to
judges selected under the old system, limited to years in which there are at least two judges on the court selected from each
system. Regressions include a state-year fixed effect, a full set of dummies for years of experience, and a full set of
dummies for starting years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.

175
Table A.6: Effect of Judicial Selection System on Judge Quality (Quality and Impact)
Non-Partisan Judges Merit-Selected Judges Merit-Selected Judges
Relative to Partisan Relative to Partisan Relative to Non-
Judges Judges Partisan Judges
Outcome (1) (2) (3)

Positive Cites Per Opinion 0.0692+ 0.0690+ 0.114


(0.0382) (0.0391) (0.0765)
Distinguishing Cites Per Opinion 0.108+ 0.0992+ 0.251+
(0.0582) (0.0518) (0.131)
Discuss Cites Per Opinion 0.0355 0.0702* 0.154*
(0.0652) (0.0290) (0.0593)
Quoted Cites Per Opinion -0.0751 0.0942* 0.239**
(0.0562) (0.0442) (0.0891)
Out-of-State Cites Per Opinion -0.0217 0.121** 0.194
(0.0674) (0.0341) (0.136)

Total Positive Cites 0.148** -0.0777 -0.00391


(0.0553) (0.0776) (0.156)
Total Distinguishing Cites 0.418** -0.0102 0.13
(0.131) (0.106) (0.217)
Total Discuss Cites 0.124 -0.0708 0.0372
(0.0966) (0.0648) (0.152)
Total Quoted Cites -0.0242 -0.0255 0.123
(0.0785) (0.0669) (0.176)
Total Out-of-State Cites 0.0324 0.0308 0.0847
(0.145) (0.0793) (0.232)

Treated States 3 6 3
Treated State-Years 24 86 24
Treated Judges 14 54 16
N = 16,084 judge-years. Estimate of the average difference between judges selected under a new system, relative to
judges selected under the old system, limited to years in which there are at least two judges on the court selected from each
system. Regressions include a state-year fixed effect, a full set of dummies for years of experience, and a full set of
dummies for starting years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.

to partisan-selected judges. While merit judges write fewer opinions than partisan-elected
judges, their opinions are higher-quality on a range of measures. Merit judges do more caselaw
research, as seen in the Length of Table of Cases. Their opinions also receive more citations on
all of our metrics: positive, distinguishing, discussed-in, quoted-in, and out-of-state.
In Column 3 we look at the difference between merit-selected judges and non-partisan-
selected judges. First, the merit-selected judges write more discretionary opinions than
the non-partisan-selected judges. In terms of opinion quality, merit-selected judges write
higher-quality opinions than non-partisan-selected judges for most of the citation measures.
For distinguishing cites, discussion cites, and quoted cites, the estimates are statistically
significant.
Table A.8 reports an alternative specification for the selection-process results. Recall
that for the Table 1.6 estimates, we only included years where there were at least two judges
from each system working together on the court. Table A.8 includes all years, so it includes
years where there is only one judge from one of the systems – so the effect is identified off
that judge’s difference from the rest of the court. The estimated coefficients are different in
this table, but the results all go in the same direction. Non-partisan judges are better than
their partisan colleagues. Merit judges are better than their election colleagues.
Tables A.9 and A.10 report retention-system reform effects on individual performance
variables. Column 1 gives the incentive effect on sitting judges of moving from a partisan
system to a non-partisan system. We see small negative coefficients for number of majority
opinions, total words written, and total discussion cites. These are only marginally signifi-
cant, however.
Column 2 has the effect of moving from a partisan system to an uncontested system.
Here we see no effects on performance.
Finally we look at Column 3. In contrast with the other reforms, moving from non-
partisan to uncontested elections is associated with an increase in performance on a range
of measures. While the number of majority opinions doesn’t change, the number of discre-

Table A.7: Effect of Judicial Selection System on Judge Quality (Additional Outcomes)
Non-Partisan Judges Merit-Selected Merit-Selected
Relative to Partisan Judges Relative to Judges Relative to
Judges Partisan Judges Non-Partisan Judges
Outcome (1) (2) (3)

Number of Concurrences Written 0.22 -0.065 0.269+


(0.324) (0.0602) (0.141)
Number of Dissents Written 0.301+ -0.106 0.211
(0.162) (0.132) (0.247)

Negative Cites Per Opinion 0.0455** 0.0395** 0.133**


(0.0159) (0.0146) (0.0452)
Federal Circuit Cites Per Opinion 0.00357 0.0249+ 0.0677**
(0.0124) (0.0141) (0.0107)
Multiple-Use Cites Per Opinion -0.0439 0.0484* 0.202**
(0.0554) (0.0197) (0.0671)
Proportion of Cases Overruled 0.0171 0.00511 -0.00256
(0.0144) (0.00806) (0.00931)
Proportion of Cases Superseded by Statute 0.00881 0.00505 0.0365**
(0.00934) (0.00498) (0.0133)

Total Negative Cites 0.246** 0.0213 0.141


(0.0373) (0.0540) (0.110)
Total Federal Circuit Cites 0.0509 0.00275 0.224+
(0.0875) (0.0588) (0.124)
Total Multiple-Use Cites 0.000095 0.0148 0.162
(0.137) (0.0404) (0.181)
Cases Overruled 0.00862 -0.0263 -0.00949
(0.0542) (0.0493) (0.0616)
Cases Superseded by Statute 0.0473 0.0241 0.141
(0.0324) (0.0412) (0.0895)

Table A.8: Effect of Judicial Selection System on Judge Quality (All Years)
Non-Partisan Judges Merit-Selected Judges Merit-Selected Judges
Relative to Partisan Relative to Partisan Relative to Non-
Judges Judges Partisan Judges
Outcome (1) (2) (3)

Majority Opinions Written 0.149** -0.0925 -0.0976


(0.0451) (0.0894) (0.0783)
Discretionary Opinions Written 0.204 -0.173 0.119
(0.297) (0.119) (0.195)
Total Words Written 0.0873 -0.015 -0.0829
(0.0786) (0.0646) (0.141)

Length of Majority Opinion -0.0706+ 0.0899+ 0.05


(0.0390) (0.0484) (0.102)
Length of Table of Cases -0.143 0.124* 0.0534
(0.0925) (0.0575) (0.0903)

Positive Cites Per Opinion -0.01 0.0362 0.125*
(0.0570) (0.0379) (0.0602)
Distinguishing Cites Per Opinion -0.00849 0.0668 0.123
(0.0596) (0.0466) (0.0762)
Discuss Cites Per Opinion 0.022 0.0389 0.132*
(0.0286) (0.0274) (0.0586)
Quoted Cites Per Opinion -0.044 0.0478 0.171*
(0.0517) (0.0368) (0.0775)
Out-of-State Cites Per Opinion 0.00122 0.0697** 0.139+
(0.0373) (0.0246) (0.0806)

Total Positive Cites 0.134 -0.0689 0.0256
(0.0998) (0.0800) (0.140)
Total Distinguishing Cites 0.168 0.0345 0.0392
(0.144) (0.0967) (0.179)
Total Discuss Cites 0.176* -0.0454 0.0427
(0.0825) (0.0761) (0.131)
Total Quoted Cites 0.0725 -0.0232 0.0906
(0.116) (0.0731) (0.146)
Total Out-of-State Cites 0.148+ 0.034 0.0888
(0.0781) (0.0709) (0.143)

Table A.9: Effect of Retention Process (Output and Effort)
Non-Partisan
Partisan Retention to Partisan Retention to Retention to
Non-Partisan Uncontested Uncontested
Retention Retention Retention
Outcome (1) (2) (3)

Majority Opinions Written -0.153+ 0.0602 -0.0899


(0.0852) (0.0590) (0.100)
Words in Majority Opinions -0.115+ 0.0274 -0.0363
(0.0587) (0.0418) (0.0713)
Cases Cited in Majority Opinions -0.155+ 0.033 0.0694
(0.0780) (0.0579) (0.0786)
Discretionary Opinions Written -0.101 0.0174 0.190*
(0.0738) (0.120) (0.0906)
Words in Discretionary Opinions 0.0974 -0.149 0.743
(0.299) (0.479) (0.459)
Cases Cited in Discretionary Opinions 0.116 -0.0221 0.43
(0.124) (0.278) (0.268)

Words Per Majority Opinion 0.0371 -0.0317 0.0516


(0.0832) (0.0376) (0.0682)
Cases Cited Per Majority Opinion -0.00619 -0.0272 0.155*
(0.0781) (0.0766) (0.0654)
Words Per Discretionary Opinion 0.219 -0.186 0.597
(0.279) (0.409) (0.461)
Cases Cited Per Discretionary Opinion 0.235* -0.0457 0.28
(0.0933) (0.202) (0.238)

Treated States 4 8 6
Treated Judges 25 65 35
N = 16,084 judge-years. Estimate of the average treatment effect of changing the judge retention system on incumbent
judges at the time of the reform. Regressions include a judge fixed effect, year fixed effect, and state trends. Standard
errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.

Table A.10: Effect of Retention Process (Quality and Impact)
Non-Partisan Retention
Partisan Retention to Partisan Retention to to Uncontested
Non-Partisan Retention Uncontested Retention Retention
Outcome (1) (2) (3)

Positive Cites Per Opinion 0.0492 -0.0173 0.111*


(0.0841) (0.0716) (0.0551)
Distinguishing Cites Per Opinion 0.0328 -0.0745 0.304**
(0.0505) (0.0559) (0.0949)
Discuss Cites Per Opinion 0.0144 -0.0315 0.0915+
(0.0718) (0.0500) (0.0478)
Quoted Cites Per Opinion -0.00402 -0.0355 0.102*
(0.0786) (0.0474) (0.0481)
Out-of-State Cites Per Opinion -0.00613 -0.013 0.0768
(0.0461) (0.0357) (0.0680)

Total Positive Cites -0.108 0.0574 0.0143


(0.0966) (0.0867) (0.126)
Total Distinguishing Cites -0.0996 -0.0713 0.308*
(0.0692) (0.0833) (0.134)
Total Discuss Cites -0.140+ 0.0497 -0.00665
(0.0791) (0.0876) (0.125)
Total Quoted Cites -0.149 0.0288 0.00288
(0.0955) (0.0719) (0.120)
Total Out-of-State Cites -0.157 0.0629 0.00356
(0.105) (0.0706) (0.121)

Treated States 4 8 6
Treated Judges 25 65 35
N = 16,084 judge-years. Estimate of the average treatment effect of changing the judge retention system on incumbent
judges at the time of the reform. Regressions include a judge fixed effect, year fixed effect, and state trends. Standard
errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01.

Table A.11: Effect of Retention Process (Additional Outcomes)
Non-Partisan Retention
Partisan Retention to Partisan Retention to to Uncontested
Non-Partisan Retention Uncontested Retention Retention
Outcome (1) (2) (3)

Number of Concurrences Written -0.191** 0.00801 0.0557


(0.0624) (0.0567) (0.106)
Number of Dissents Written -0.0375 0.034 0.0386
(0.0731) (0.0772) (0.0617)

Negative Cites Per Opinion 0.0206 -0.0189 0.134**


(0.0137) (0.0300) (0.0474)
Federal Circuit Cites Per Opinion -0.0406** -0.00426 0.00738
(0.00522) (0.0113) (0.0221)
Multiple-Use Cites Per Opinion -0.0108 -0.0187 0.0694
(0.0294) (0.0235) (0.0501)
Proportion of Cases Overruled 0.0238* 0.00099 -0.00551
(0.0118) (0.00827) (0.00503)
Prop. Cases Superseded by Statute 0.00336 -0.00738* 0.0274**
(0.00398) (0.00365) (0.00710)

Total Negative Cites -0.00581 0.0377 0.307*


(0.0654) (0.0624) (0.124)
Total Federal Circuit Cites -0.168** 0.0341 -0.0255
(0.0465) (0.0575) (0.101)
Total Multiple-Use Cites -0.146+ 0.0569 0.0127
(0.0852) (0.0709) (0.103)
Cases Overruled 0.0992 0.0377 -0.136*
(0.0616) (0.0348) (0.0546)
Cases Superseded by Statute 0.0288 -0.0141 0.216**
(0.0395) (0.0255) (0.0715)

tionary opinions does increase. Caselaw research also increases. There are large positive
effects on the quality of opinions written, as reflected in positive cites, distinguishing cites,
discuss cites, and quoted cites. The total-cites measures are noisier and less significant,
but still positive.
In Table A.11, the non-partisan-to-uncontested reform has interesting effects. Under this
reform, judges are overruled less often by later courts, but their decisions are more often
superseded by statute, that is, overruled by the legislature. Negative cites per opinion are
also higher. This may be a sign of greater judicial independence.
Tables A.12 and A.13 report the election-selection interaction effects on individual per-
formance variables. Column 1a gives the baseline effect of non-partisan elections on non-
partisan selected judges. Column 1a has negative effects, which are similar to the estimates

Table A.12: Relative Election-Year Effect on Judges Selected by Different Processes (Output and Effort)
Effect of Non- Relative Effect Effect of Relative Effect Relative Effect
Partisan of Non-Partisan Uncontested of Uncontested of Uncontested
Elections on Elections on Elections on Elections on Elections on
Non-Partisan- Partisan- Merit-Selected Partisan- Non-Partisan-
Selected Judges Selected Judges Judges Selected Judges Selected Judges

Outcome (1a) (1b) (2a) (2b) (2c)

Majority Opinions Written -0.104 0.0996 0.0853+ -0.0994 0.0322


(0.0673) (0.129) (0.0459) (0.0732) (0.0954)
Words in Majority Opinions -0.122 0.103 0.123* -0.157* 0.0194
(0.0757) (0.137) (0.0495) (0.0701) (0.122)
Cases Cited in Majority Ops -0.112 0.164 0.131** -0.166* 0.096
(0.0788) (0.149) (0.0475) (0.0739) (0.134)
Disc. Opinions Written -0.0696+ -0.0107 0.0398 0.0684 0.106
(0.0364) (0.242) (0.0520) (0.0568) (0.126)
Words in Discretionary Ops -0.151 -0.135 0.372* -0.0276 -0.0784
(0.131) (0.172) (0.162) (0.198) (0.374)

Cases Cited in Discr. Ops -0.128** -0.107 0.1 -0.0966 -0.14
(0.0420) (0.130) (0.102) (0.178) (0.352)

Words Per Majority Opinion -0.0167 0.00379 0.0347* -0.0544* -0.0177


(0.0130) (0.0616) (0.0151) (0.0215) (0.0489)
Cases Cited Per Maj. Op -0.00733 0.0638 0.0429+ -0.0607+ 0.0592
(0.0201) (0.104) (0.0228) (0.0347) (0.0567)
Words Per Discy Opinion -0.0849 -0.127 0.367** -0.105 -0.188
(0.142) (0.243) (0.126) (0.159) (0.352)
Cases Cited Per Disc Op -0.0474 -0.101 0.102 -0.165 -0.228
(0.0623) (0.205) (0.0724) (0.123) (0.306)

Treated States 2 2 11 8 4
Treated Judges 7 4 119 51 10
Election Events 8 5 201 90 16
N= 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01. Each
row is from a separate regression for the stated outcome variable. The estimated coefficient is a dummy
equaling one for years judge is facing reelection, interacted with a dummy for if the judge is selected under the
new selection system. Regressions include a state-year fixed effect, judge fixed effect, and the baseline
Table A.13: Relative Election-Year Effect on Judges Selected by Different Processes (Quality and Impact)
Effect of Non- Relative Effect Effect of Relative Effect Relative Effect
Partisan of Non-Partisan Uncontested of Uncontested of Uncontested
Elections on Elections on Elections on Elections on Elections on
Non-Partisan- Partisan- Merit-Selected Partisan- Non-Partisan-
Selected Judges Selected Judges Judges Selected Judges Selected Judges

Outcome (1a) (1b) (2a) (2b) (2c)

Positive Cites Per Opinion -0.0254 0.127 0.0214 -0.098 -0.0289


(0.0232) (0.120) (0.0172) (0.0813) (0.0483)
Disting. Cites Per Opinion -0.0336 0.153 0.0416 -0.0899 0.0549
(0.0277) (0.140) (0.0317) (0.0870) (0.119)
Discuss Cites Per Opinion -0.0285+ 0.0463 0.0216 -0.037 -0.0439
(0.0158) (0.138) (0.0166) (0.0411) (0.0393)
Quoted Cites Per Opinion -0.0255+ 0.101 0.022 -0.0759 -0.0701
(0.0144) (0.179) (0.0212) (0.0606) (0.0633)
Out-of-State Cites Per Op. -0.00167 0.229* 0.0402+ -0.0465 -0.12
(0.0244) (0.116) (0.0232) (0.0530) (0.101)

Total Positive Cites -0.164+ 0.258 0.110* -0.202* 0.00656
(0.0903) (0.200) (0.0519) (0.0814) (0.0951)
Total Distinguishing Cites -0.197* 0.364 0.132* -0.240+ 0.0794
(0.0925) (0.294) (0.0639) (0.127) (0.159)
Total Discuss Cites -0.164* 0.159 0.113* -0.147* -0.0152
(0.0732) (0.211) (0.0456) (0.0637) (0.0986)
Total Quoted Cites -0.166* 0.215 0.113** -0.192** -0.0455
(0.0742) (0.284) (0.0439) (0.0669) (0.0888)
Total Out-of-State Cites -0.115 0.434* 0.132** -0.147* -0.12
(0.0784) (0.171) (0.0478) (0.0722) (0.0944)

Treated States 2 2 11 8 4
Treated Judges 7 4 119 51 10
Election Events 8 5 201 90 16
N= 16,084 judge-years. Standard errors clustered by state in parentheses. + p < .1, * p < .05, ** p < .01. Each
row is from a separate regression for the stated outcome variable. The estimated coefficient is a dummy
equaling one for years judge is facing reelection, interacted with a dummy for if the judge is selected under the
new selection system. Regressions include a state-year fixed effect, judge fixed effect, and the baseline
for non-partisan elections in Table 5. Column 1b shows the relative effect of non-partisan
elections on partisan judges. These are mostly zeros, with a likely spurious positive effect
on out-of-state cites.
Column 2a gives the baseline effect of uncontested elections on merit-selected judges. This
column is similar to column 3 from Table 5, which gave the average effect of uncontested
elections. As with that table, there are actually positive effects estimated for the election-
cycle effect in an uncontested system. When one only looks at the merit-selected judges, the
effect is stronger.
Column 2b gives the relative effect of uncontested elections on partisan-selected judges.
There are significant negative effects. The coefficients are larger in absolute value than the
coefficients from Column 2a, meaning that uncontested elections have a net negative effect on
performance for partisan-selected judges. In other words, partisan-selected judges respond to
elections in the opposite direction from merit-selected judges.
Finally Column 2c gives the relative effect of uncontested elections for non-partisan-
selected judges, relative to merit judges. There aren’t any significant differences here.
Notice that the point estimates on out-of-state citations for partisan judges are very large.
When facing a competitive non-partisan election there is a 36% increase in citations, but an
18% decrease in an uncontested election. The pattern is consistent with the hypothesis that
partisan judges are more sensitive to incentives. This provides some direct evidence that the
characteristics of the judges vary by the selection procedure.
Table A.14 reports additional outcomes for the interacted study of incentives and selec-
tion. The estimates are similar to those in Table 8. Partisan-selected judges respond to
uncontested elections with a reduction in negative cites, circuit cites, and multiple-use cites.

A.2.3 Effect of Retention Process in Election Years

In this section we look at the retention reforms and the electoral cycle together. We look at
the interacted effect of a retention process reform in years that a judge is up for election, to

Table A.14: Relative Election-Year Effect on Judges Selected by Different Processes (Additional Outcomes)
Columns:
(1a) Effect of Non-Partisan Elections on Non-Partisan-Selected Judges
(1b) Relative Effect of Non-Partisan Elections on Partisan-Selected Judges
(2a) Effect of Uncontested Elections on Merit-Selected Judges
(2b) Relative Effect of Uncontested Elections on Partisan-Selected Judges
(2c) Relative Effect of Uncontested Elections on Non-Partisan-Selected Judges
Outcome (1a) (1b) (2a) (2b) (2c)

Number of Concurrences Written -0.0196 -0.127* -0.0243 0.0542 0.132


(0.0301) (0.0578) (0.0455) (0.0657) (0.134)
Number of Dissents Written -0.0659* 0.155 0.0635+ -0.014 0.0122
(0.0262) (0.144) (0.0336) (0.0667) (0.146)

Negative Cites Per Opinion -0.0243* 0.0573 0.0254 -0.0438 0.0117


(0.0118) (0.0846) (0.0213) (0.0490) (0.0423)
Federal Circuit Cites Per Opinion -0.0058 0.0654 0.0116 -0.0611** -0.0302
(0.00591) (0.0707) (0.0111) (0.0231) (0.0497)

Multiple-Use Cites Per Opinion -0.0232* 0.136* 0.0107 -0.0303 0.00815
(0.0105) (0.0669) (0.0201) (0.0355) (0.0554)
Proportion of Cases Overruled 0.00626 0.00308 0.00515 -0.0001 0.00694
(0.00911) (0.0164) (0.0120) (0.0112) (0.0129)
Proportion of Cases Superseded by Statute -0.000715 -0.0111 0.0109 -0.0134 0.00431
(0.00357) (0.0151) (0.00779) (0.0135) (0.0200)

Total Negative Cites -0.176** 0.277 0.157* -0.276* -0.106


(0.0669) (0.231) (0.0754) (0.133) (0.163)
Total Federal Circuit Cites -0.0733 0.281 0.0836 -0.279** -0.106
(0.0509) (0.307) (0.0543) (0.0565) (0.117)
Total Multiple-Use Cites -0.153** 0.444** 0.0950+ -0.201** 0.011
(0.0512) (0.127) (0.0504) (0.0559) (0.123)
Cases Overruled -0.0493 0.0286 0.0496 -0.0431 0.15
(0.0393) (0.208) (0.0701) (0.0590) (0.0986)
Cases Superseded by Statute -0.0194 -0.232+ 0.0962* -0.0899 -0.0634
(0.0257) (0.141) (0.0460) (0.114) (0.180)
see whether the observed effect is due to changes in campaigning behavior.
This subsection describes the empirical strategy for looking at the election-year effects
of the retention process reforms. The regression approach combines the approach from
Subsection 6.1 on the electoral cycle with the approach from Subsection 6.2 on the retention
process reforms.
As before, we have the vector of election dummies $E_{ist}$ that equal one when judge $i$ from
state $s$ is up for election at year $t$, with a separate set of dummies for each retention system.
We have the vector of treatment indicators for the retention treatments, $R_{st}$, which go into
effect relative to the 10-year treatment window $\bar{R}_{st}$ as described in Subsection 6.2.1. As in
Subsection 6.2, our regressions include year fixed effects, judge fixed effects, and state-specific
time trends.
The regressions include the full set of interactions $E_{ist}' R_{st}$. Specifically, we estimate

$$y_{ist} = \text{TIME}_t + \text{JUDGE}_i + \text{STATE}_s \times t + \bar{R}_{st}' \bar{\rho} + E_{ist}' R_{st} \rho + \epsilon_{ist} \qquad \text{(A.2.1)}$$

where again we cluster standard errors by state. The components of ρ include the effects
of the reform in non-election years (E = 0) as well as the effects in election years (E = 1).
Because the interactions are included, this is relative to the election-year average before the
reform.
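
As a rough illustration of how the election-year and non-election-year components of ρ can be separated, the sketch below splits a single 0/1 reform indicator by a 0/1 election-year dummy. The function name, the column names, and the toy rows are hypothetical and are not taken from the replication code; the sketch only shows one way such interaction regressors could be coded for one reform.

```python
def reform_terms(election, reform):
    """Split a 0/1 reform indicator into two regressors:
    one active in non-election years, one active in election years.
    (Hypothetical coding, for illustration only.)"""
    return {
        "reform_nonelection": reform * (1 - election),
        "reform_election": reform * election,
    }

# Hypothetical judge-year rows: (judge, year, election dummy, reform dummy).
toy_rows = [
    ("judge_a", 1970, 0, 0),  # before reform, no election
    ("judge_a", 1971, 1, 0),  # before reform, election year
    ("judge_a", 1972, 0, 1),  # after reform, no election
    ("judge_a", 1973, 1, 1),  # after reform, election year
]

# Attach the two interaction regressors to each judge-year.
design = [
    {"judge": j, "year": y, **reform_terms(e, r)}
    for (j, y, e, r) in toy_rows
]
```

In a regression with judge effects, year effects, and state trends, the coefficients on the two constructed columns would correspond to the "a" (non-election) and "b" (election) columns reported in the tables, under the coding assumed here.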
An additional specification is reported in Appendix Table A.18. In that specification, the
election-year effect is measured relative to a baseline for all years after the reform (rather
than looking at non-election years and election years separately).
The results on the effect of the retention process reform on election and non-election
years are reported in Table 7. The “a” columns report the effect in non-election years. The
“b” columns report the effect in election years. We report the results for partisan to non-
partisan, and non-partisan to uncontested. The results from partisan to uncontested (they
are all zeros) are in Appendix Table A.17.

Table A.15: Effect of Retention Process in Election Years
Partisan Retention to Non-Partisan Non-Partisan Retention to
Retention Uncontested Retention
Non-Election Non-Election
Years Election Years Years Election Years
Outcome (1a) (1b) (2a) (2b)

Majority Opinions Written -0.137 -0.252** -0.0972 -0.0771


(0.0839) (0.0882) (0.0968) (0.0863)
Discretionary Opinions Written -0.0905 -0.0898 0.136+ 0.183*
(0.0639) (0.0644) (0.0762) (0.0754)
Total Words Written -0.0965+ -0.236 -0.022 0.0609
(0.0536) (0.142) (0.0694) (0.0687)

Length of Majority Opinion 0.0395 0.0187 0.047 0.0757


(0.0795) (0.116) (0.0678) (0.0805)
Length of Table of Cases -0.0165 0.0127 0.135* 0.201**
(0.0613) (0.144) (0.0604) (0.0736)

Positive Cites Per Opinion 0.0377 0.0596 0.0919+ 0.177*


(0.0739) (0.0754) (0.0461) (0.0672)
Distinguishing Cites Per Opinion 0.0149 0.0851 0.221** 0.291**
(0.0347) (0.0893) (0.0707) (0.0740)
Discuss Cites Per Opinion 0.00857 0.0184 0.0678+ 0.119**
(0.0530) (0.0686) (0.0382) (0.0419)
Quoted Cites Per Opinion -0.00617 -0.000161 0.0712+ 0.155**
(0.0600) (0.0768) (0.0392) (0.0543)
Out-of-State Cites Per Opinion -0.0147 0.0548 0.0513 0.0875
(0.0361) (0.0511) (0.0503) (0.0572)

Total Positive Cites -0.0914 -0.183* -0.00721 0.125


(0.101) (0.0858) (0.127) (0.103)
Total Distinguishing Cites -0.0882 -0.144 0.274* 0.403**
(0.0624) (0.182) (0.129) (0.119)
Total Discuss Cites -0.118 -0.252** -0.0272 0.0699
(0.0812) (0.0870) (0.124) (0.0972)
Total Quoted Cites -0.125 -0.290* -0.0251 0.12
(0.104) (0.124) (0.116) (0.0995)
Total Out-of-State Cites -0.139 -0.127 -0.0156 0.0527
(0.106) (0.114) (0.111) (0.114)
Treated States 4 6
Treated Judges 25 35
N = 16,084 judge-years. Each row is a separate regression. The “a” columns give the baseline effect of changing the judge
retention system on incumbent judges at the time of the reform, while the “b” columns give the additional effect during
judge election years. Regressions include a judge fixed effect, year fixed effect, and state trends. Standard errors clustered
by state in parentheses. + p < .1, * p < .05, ** p < .01.

Table A.16: Effect of Retention Process in Election Years (Additional Outcomes)
Partisan Retention to Non-Partisan Non-Partisan Retention to
Retention Uncontested Retention
Non-Election Non-Election
Years Election Years Years Election Years
Outcome (1a) (1b) (2a) (2b)

Number of Concurrences Written -0.184** -0.25 0.0619 0.0219


(0.0541) (0.174) (0.0998) (0.151)
Number of Dissents Written -0.0427 -0.00385 0.0388 0.0925+
(0.0781) (0.0564) (0.0622) (0.0544)

Negative Cites Per Opinion 0.0244+ 0.00355 0.133** 0.151**


(0.0130) (0.0389) (0.0490) (0.0403)
Federal Circuit Cites Per Opinion -0.0405** -0.0372 0.0106 -0.0234
(0.00662) (0.0254) (0.0209) (0.0324)
Multiple-Use Cites Per Opinion -0.016 0.0306 0.0645 0.0909+
(0.0291) (0.0458) (0.0502) (0.0527)
Proportion of Cases Overruled 0.0191* 0.0594 -0.00372 -0.0134
(0.00837) (0.0535) (0.00589) (0.00986)
Prop. Cases Superseded by Statute 0.00463 -0.00642 0.0257** 0.0387**
(0.00449) (0.00577) (0.00776) (0.00802)

Total Negative Cites 0.0127 -0.147 0.282* 0.338*


(0.0658) (0.131) (0.134) (0.135)
Total Federal Circuit Cites -0.151** -0.309* -0.0176 -0.177
(0.0490) (0.121) (0.0971) (0.158)
Total Multiple-Use Cites -0.129 -0.295* -0.0215 0.0576
(0.0852) (0.117) (0.109) (0.110)
Cases Overruled 0.0976+ 0.0904 -0.147* -0.123
(0.0568) (0.141) (0.0620) (0.0841)
Cases Superseded by Statute 0.0459 -0.125** 0.197** 0.248+
(0.0472) (0.0431) (0.0732) (0.129)

Columns 1a and 1b look at the election/non-election effects for the partisan-to-non-partisan
reform. These results bolster what was found in Subsection 6.2. We see significant
negative effects when we look at election years specifically. There is a decrease in majority
opinions written, total positive cites, total discuss cites, and total quoted cites.
Columns 2a and 2b show the effect of the nonpartisan-to-uncontested reform.
The baseline effect is comparable to the estimate from Table 6, with clear improvements
in opinion quality. Moreover, as seen in Column 2b, this effect is even stronger in election
years.
The non-partisan-to-uncontested results show that there is both a durable effect across
the whole term, as well as an especially large effect from relieving electoral campaigning

demands. Overall, these results substantiate that the effects of these reforms are due in part
to the weakening of electoral demands.
Table A.17 shows the estimates for the partisan-to-uncontested reform by election year,
which were left out of Table 7. These are almost all zeros – there is no within-judge electoral
effect of this reform.
Table A.18 extends the analysis from Table 7 but looks at the effect of election
years relative to a post-reform baseline, rather than looking at the effects on non-election
years and election years separately. In this specification, we lose statistical significance in
the partisan-to-nonpartisan reform. In the partisan-to-uncontested reform, we see a couple
more positive effects of the reform in election years. In the nonpartisan-to-uncontested
reform, meanwhile, the coefficients are positive in both columns, meaning that there is a
statistically significant additional positive effect during election years.
The partisan-to-non-partisan reform shows the stronger electoral demands from non-
partisan elections. Relative to the case before the reform (partisan elections), the election
years in non-partisan elections are more demanding and cause a larger reduction in per-
formance. This is consistent with those elections being more competitive. Particularly
noteworthy is the 25% decline in writing majority opinions in election years. This is consis-
tent with anecdotal evidence that in election years other judges on the court help reduce the
load on judges up for re-election. It seems that this pro-social behavior is more evident on
non-partisan benches.

Table A.17: Effect of Partisan-to-Uncontested Retention Reform in Election Years
Non-Election Non-Election
Years Election Years Years Election Years
Outcome (2a) (2b) Outcome (2a) (2b)

Majority Opinions Written 0.0531 0.117+ Number of Concurrences Written 0.00274 0.0428
(0.0611) (0.0651) (0.0556) (0.0691)
Discretionary Opinions Written -0.00694 0.168 Number of Dissents Written 0.0158 0.133
(0.0902) (0.117) (0.0723) (0.127)
Total Words Written 0.0062 0.0531 Negative Cites Per Opinion -0.019 -0.0258
(0.0441) (0.0728) (0.0303) (0.0364)
Length of Majority Opinion -0.0263 -0.061 Federal Circuit Cites Per Opinion -0.00115 -0.0177
(0.0385) (0.0385) (0.0113) (0.0172)
Length of Table of Cases -0.0166 -0.0668 Multiple-Use Cites Per Opinion -0.0144 -0.0365
(0.0711) (0.0783) (0.0245) (0.0326)
Positive Cites Per Opinion -0.00175 -0.102 Proportion of Cases Overruled 0.00146 -0.000169
(0.0601) (0.0889) (0.00867) (0.00919)

Distinguishing Cites Per Opinion -0.0525 -0.0956 Proportion of Cases Superseded by Statute -0.00836* -0.00244
(0.0412) (0.0690) (0.00331) (0.00714)
Discuss Cites Per Opinion -0.0155 -0.0605 Total Negative Cites 0.0432 0.0485
(0.0384) (0.0403) (0.0637) (0.151)
Quoted Cites Per Opinion -0.0175 -0.0688 Total Federal Circuit Cites 0.0453 0.0112
(0.0355) (0.0487) (0.0638) (0.0829)
Out-of-State Cites Per Opinion -0.00572 -0.0228 Total Multiple-Use Cites 0.0737 0.0236
(0.0273) (0.0463) (0.0800) (0.131)
Total Positive Cites 0.0732 -0.0116 Cases Overruled 0.0361 0.0765
(0.0957) (0.165) (0.0343) (0.0588)
Total Distinguishing Cites -0.048 -0.0783 Cases Superseded by Statute -0.0211 0.0548
(0.0809) (0.177) (0.0320) (0.0753)
Total Discuss Cites 0.0565 0.0327
(0.0906) (0.107)
Total Quoted Cites 0.0405 0.00651
(0.0734) (0.126)
Total Out-of-State Cites 0.0621 0.0966
(0.0756) (0.0978)
Table A.18: Relative Effect of Retention Process in Election Years
Partisan Retention to Non-Partisan Partisan Retention to Uncontested Non-Partisan Retention to
Retention Retention Uncontested Retention
Baseline Effect Election Years Baseline Effect Election Years Baseline Effect Election Years
Outcome (1a) (1b) (2a) (2b) (3a) (3b)

Majority Opinions Written -0.104 -0.0747 -0.0175 0.0957* -0.117 -0.0132


(0.0676) (0.166) (0.101) (0.0420) (0.126) (0.118)
Discretionary Opinions Written -0.0461 -0.0715 -0.0321 0.188* 0.104 0.0329
(0.0750) (0.132) (0.0860) (0.0723) (0.0849) (0.0921)
Total Words Written -0.0764+ -0.0483 -0.0689 0.0668 -0.0436 0.0504
(0.0442) (0.304) (0.0879) (0.0460) (0.0975) (0.126)
Length of Majority Opinion 0.0226 0.0274 -0.0242 -0.0412 0.0426 0.0318
(0.0657) (0.135) (0.0298) (0.0272) (0.0704) (0.0434)
Length of Table of Cases -0.0338 0.159 -0.0318 -0.0142 0.148* 0.0722+
(0.0510) (0.188) (0.0541) (0.0462) (0.0683) (0.0389)
Positive Cites Per Opinion 0.0328 0.0609 -0.0195 -0.145+ 0.110* 0.0720+
(0.0688) (0.0713) (0.0470) (0.0752) (0.0512) (0.0395)

Distinguishing Cites Per Opinion -0.00419 0.119 -0.0557 -0.0392 0.245** 0.105**
(0.0310) (0.123) (0.0358) (0.0556) (0.0782) (0.0279)
Discuss Cites Per Opinion 0.00112 0.02 -0.0241 -0.0195 0.0764+ 0.0936*
(0.0485) (0.0618) (0.0292) (0.0240) (0.0432) (0.0420)
Quoted Cites Per Opinion -0.0186 0.0406 -0.0252 -0.0659 0.0851+ 0.118**
(0.0504) (0.0694) (0.0274) (0.0397) (0.0477) (0.0370)
Out-of-State Cites Per Opinion -0.0137 0.0754 -0.0189 0.024 0.0573 0.032
(0.0353) (0.0814) (0.0223) (0.0405) (0.0533) (0.0275)
Total Positive Cites -0.0677 0.019 -0.0526 -0.105 0.00315 0.0707
(0.100) (0.250) (0.126) (0.156) (0.151) (0.134)
Total Distinguishing Cites -0.091 0.0472 -0.146+ 0.0413 0.315* 0.165
(0.0727) (0.437) (0.0815) (0.172) (0.124) (0.146)
Total Discuss Cites -0.0955 -0.0646 -0.0502 0.0502 -0.0272 0.11
(0.0898) (0.249) (0.104) (0.0628) (0.146) (0.140)
Total Quoted Cites -0.112 -0.0486 -0.0652 -0.0259 -0.0148 0.135
(0.0956) (0.282) (0.100) (0.107) (0.136) (0.143)
Total Out-of-State Cites -0.122 0.176 -0.0555 0.192** -0.0132 0.0043
(0.104) (0.291) (0.125) (0.0682) (0.133) (0.127)
Appendix B

Ch. 2: The political economy of tax laws in the U.S. states

B.1 Vector Representation of Tokens and Documents

The algorithm for representing the linguistic meaning of words and phrases (tokens) as data
is called Word2Vec, a machine-learning model developed by Google researchers [Mikolov
et al., 2013]. The model is inspired by Harris’s distributional hypothesis that words in
similar contexts have similar meanings. Recent work in natural language processing has
made progress in representing tokens as dense vectors, culminating in the skip-gram with
negative sampling training method, better-known as Word2Vec.
Levy and Goldberg provide an accessible introduction to Word2Vec. The model assumes
a corpus of tokens (words and phrases) x1 , x2 , ..., xn , each drawn from vocabulary Vx . Each
token is observed in an associated context, which is an ordered set of the tokens appearing
in an l-sized window around the token: {xi−l , ..., xi−1 , xi+1 , ..., xi+l }. The standard window
used in NLP tasks is l = 5, which is used in my analysis. The vocabulary of contexts (a
very long list of all possible combinations of preceding and succeeding tokens in the corpus)
is given by Vc .
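
The context definition above can be sketched in a few lines of Python. This is only an illustration of the l-sized window; the function name and the toy token list are hypothetical, and the stemmed tokens are made up to resemble the processed statute text.

```python
def contexts(tokens, l=5):
    """Yield (token, context) pairs, where the context is the tuple of up to
    l tokens on each side of position i (fewer near the edges of the corpus)."""
    for i, tok in enumerate(tokens):
        left = tokens[max(0, i - l):i]      # up to l preceding tokens
        right = tokens[i + 1:i + 1 + l]     # up to l succeeding tokens
        yield tok, tuple(left + right)

# Hypothetical stemmed tokens, for illustration only.
toy_tokens = ["impos", "a", "corpor_incom_tax", "on", "net", "incom"]
pairs = list(contexts(toy_tokens, l=2))
```

With l = 2, the context of the third token is the two tokens before it and the two tokens after it; tokens at the start or end of the corpus simply have shorter contexts.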

Each token x has an associated vector x ∈ Rd , where d is the dimensionality of the word
vector space. A standard choice in the NLP literature is d = 300, which also gives good
results in this dataset (although further experimentation is needed). Next, each context has
an associated vector c ∈ Rd , which plays a role in training the model but is not used further
in the analysis.
Conceptually, Word2Vec starts from an adjacency matrix of collocations, where each entry Aij is
the number of times token i appears within l tokens of token j. This high-dimensional |W |×|W |
matrix is then factored into a pair of matrices of dimension |W | × |C| and |C| × |W |, where
the vector space C can be understood as the latent “contexts” of the token. Taking the first
matrix, we get a mapping vec between tokens and points in a |C|-dimensional vector space.
Tokens that are “similar” are located near each other in context space, in that they tend to
be surrounded by similar token sequences. By looking at the sequences of tokens that occur
before and after a particular token (the “contexts” of the token), Word2Vec “learns” which
other words in the vocabulary could fit into the same context.1
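The collocation counts described above can be sketched as follows. This is only an illustration of the counting step: skip-gram training does not build this matrix explicitly, and the function name and toy tokens are hypothetical.

```python
from collections import Counter


def collocation_counts(tokens, l=5):
    """Count, for each ordered pair (i, j), how often token j appears
    within l positions of token i in the token sequence."""
    counts = Counter()
    for pos, tok in enumerate(tokens):
        lo = max(0, pos - l)
        hi = min(len(tokens), pos + l + 1)
        for other in range(lo, hi):
            if other != pos:
                counts[(tok, tokens[other])] += 1
    return counts


# Hypothetical token sequence, for illustration only.
toy_tokens = ["incom_tax", "rate", "sale_tax", "rate"]
A = collocation_counts(toy_tokens, l=1)
```

With this construction the counts are symmetric, so the entry for (i, j) equals the entry for (j, i).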
Word2Vec has several desirable features for this paper’s purposes. First, it can be trained
in eight hours on the corpus of statutes. Once trained, it can quickly compute similarity
statistics between phrases and documents. Importantly, the vector dimensions encode infor-
mation about the underlying relations between tokens. This is why analogies work:

vec['corpor_incom_tax'] − vec['corpor'] + vec['individu'] ≈ vec['individu_incom_tax']

This example shows that the word dimensions are encoding semantic information about
types of taxes.
The Word2Vec model is implemented in Python’s gensim package. I train the model on
the processed statutes for 1963 through 2010 in random sequence. For parameters, I select
C = 300 dimensions. This is the default and works well on Wikipedia, which is much larger
than my dataset. I choose a context window of l = 5, which means that Word2Vec learns
relations within five tokens of each other. This is also the default.

1
See “A Word is Worth a Thousand Vectors” by Chris Moody (2015), available at
multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/.

In the trained model, each token p is represented as a vector

\[ \vec{p} = \mathrm{vec}[p] = \frac{1}{|\mathrm{words}(p)|} \sum_{w_i \in \mathrm{words}(p)} \mathrm{vec}[w_i], \]

with a value between -1 and 1 for each of d ∈ {1, 2, ..., 300} dimensions. While it may appear
that we lose a lot of information by taking the mean, recall that our tokens are already
filtered so as to be noun phrases and word phrases. “Similarity” between a token p and q is
computed using cosine similarity between the vectors for those tokens:

\[ \mathrm{sim}(\vec{p}, \vec{q}) = \frac{\vec{p} \cdot \vec{q}}{\|\vec{p}\| \cdot \|\vec{q}\|}. \]

This metric is between -1 and 1, with higher numbers meaning the tokens are more similar
[Levy et al., 2014]. For example, sim('democrat', 'republican') = 0.86.
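The two formulas above (the phrase vector as the mean of its word vectors, and cosine similarity) can be sketched in plain Python. The three-dimensional vectors below are hypothetical stand-ins for the 300-dimensional trained vectors, and the helper names are assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]


def cosine_similarity(p, q):
    """Cosine of the angle between p and q; lies between -1 and 1."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical 3-dimensional word vectors (the text uses d = 300).
toy_vec = {
    "incom": [1.0, 0.0, 1.0],
    "tax": [0.0, 1.0, 1.0],
    "sale": [0.0, 0.0, 1.0],
}
incom_tax = mean_vector([toy_vec["incom"], toy_vec["tax"]])
sale_tax = mean_vector([toy_vec["sale"], toy_vec["tax"]])
similarity = cosine_similarity(incom_tax, sale_tax)
```

The same `mean_vector` helper could be applied to the phrase vectors of a statute to obtain a document-level vector, as the text suggests for future work.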
In future work one could analyze the vectors directly rather than the tokens. It may turn
out that some dimensions encode important information for certain parts of tax law, such
as defining the base, enforcement, or the style of legal writing. A document (statute) can be
represented as a vector – as the mean or sum of the constituent phrase vectors. Then one
could measure the effect of treatments on vector dimensions rather than phrase frequencies.
In some unreported exploratory work I have found that some dimensions are correlated with
higher tax revenue across bases, for example, or with changes in political party. This may
provide better measures at the document level than using phrase frequencies.

B.2 Factor IV Approach

This appendix section describes the factor IV approach to estimating the first stage. As
discussed in the text, I obtained similar second-stage results using this approach (the Section
2.8 results looking at the effects of political control on language), but the out-of-sample PLS
prediction was worse. That said, a potential problem with Lasso arises if the sparsity assumption
fails. This can occur if the true Γ is actually a dense matrix, in which case Lasso will wrongly
exclude many elements of Γ. If the included instruments are correlated with the excluded ones,
the estimates of those elements of Γ will also be inconsistent.
An alternative dimension reduction method that addresses this problem is PCA [Bai and
Ng, 2010]. PCA projects high-dimensional data down to a lower-dimensional space while
retaining as much information as possible. Formally, PCA finds the n × p projection matrix
Z̃ that solves

\[ \min \|\tilde{Z} Z - Z\|^2. \]

The columns of Z̃ are principal components, which are orthogonal to each other and are
ordered by their explanatory power for Z. Taking the first few components of Z̃ is a con-
venient way to reduce the dimensionality of Z while preserving as much information as
possible. Since each row in Z̃ is a linear combination of a row in Z, the reduced matrix
inherits any exogeneity properties of the original matrix. Moreover, the components in-
cluded in Z̃ are orthogonal to any excluded components, which solves the exclusion issue of
correlated instruments we faced with Lasso.2
In the baseline implementation, I included enough components to explain 90 percent of
the variance in Z. In the baseline specification this required 205 components. To select
among these components, I again used Lasso to estimate (2.6.4) but with the PCA compo-
nents Z̃ as the instruments rather than the original Z matrix.
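The rule of keeping enough components to explain 90 percent of the variance can be sketched as below. The explained-variance shares are hypothetical; in practice they would come from the fitted decomposition, and the function name is an assumption made for illustration.

```python
def components_for_variance(explained_ratios, target=0.90):
    """Smallest number of leading components whose cumulative
    explained-variance share reaches the target."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    # Target not reached: keep every component.
    return len(explained_ratios)


# Hypothetical shares, ordered from most to least explanatory.
toy_ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
n_components = components_for_variance(toy_ratios, target=0.90)
```

In this toy example the first four components are kept; the 205 components reported in the text come from the actual instrument matrix and are not reproduced here.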
Figure B.2.1 shows a heat map (bivariate histogram) in which each observation is a
phrase-component pair (i, j). The horizontal axis is the correlation between the instrument
phrase z i and the component z̃ j . The vertical axis is the t-statistic for the first-stage effect
of component z̃ j on endogenous phrase xi ; that is, the element γ̂ij in the matrix of first-
stage coefficients Γ̂. This shows that the components that are correlated with particular
instrument phrases also tend to have stronger effects on those same endogenous phrases.
This supports the idea that language diffusion is occurring through preference for phrases in
the same judicial circuit.

2
In the empirical analysis, PCA is implemented with Python’s scikit-learn package, using the truncated
singular value decomposition algorithm.

Figure B.2.1: Instrument Phrases Have a Stronger Effect on Own Endogenous Phrase

B.3 Example phrases with related court cases

A foremost issue in this paper is how tax code features are used to implement redistributive
fiscal policy. Text features that have the effect of broadening the income tax base would serve
to increase the progressivity of the tax code. If used preferentially by Democrats, that would
be consistent with Democrats using these phrases to implement a progressive redistributive
policy. Analysis of these phrases quickly procured two examples: “old age” and “fire-fighter.”
The term “old age” is used in income tax provisions related to age-related exemptions.
For example, 2005 Pa. ALS 40 states that “The term ’compensation’ shall not mean or
include: payments commonly recognized as old age or retirement benefits paid to persons
retired from service after reaching a specific age or after a stated period of employment.” In
this case the term is evidence of a deduction, but the 2SLS estimate for “old age” indicates
that it is associated with increased income tax revenues on average. Moreover, this phrase
is associated with Democrat political control. The Commonwealth Court of Pennsylvania
construed this clause in a 1987 opinion (Bickford v. Commonwealth, 111 Pa. Commonw.
246), finding that a pension plan from a private employer was not covered by this clause and
therefore was taxable. In this case the clause did not decrease revenues generated. However,
in Pugliese v. Township of Upper St. Clair, 660 A.2d 155 (1995), the same court held that
a similar corporate incentive plan (with a longer deferral) was exempt from taxation.
The term “fire-fighter” is also predicted to increase income tax revenue and is associated
with Democrats. An example of a statute where it may appear is 2006 Al. ALS 352,
providing that “the following exemptions from income taxation shall be allowed to every
individual resident taxpayer: The first $8,000 of any retirement compensation, retirement
allowances, pensions and annuities, or optional allowances, received by any eligible fire-
fighter.” An Alabama case construing this type of clause is Ex parte Melof, 735 So.2d 1172
(1999), wherein the Supreme Court of Alabama held that firefighters could be given special
tax treatment in spite of a state constitutional amendment forbidding special tax treatment
for public sector workers.
These cases are good examples of the indeterminacy and unpredictability of how statutory
language will be construed by courts. Before these cases were decided, a researcher interested
in coding the policies in these provisions would have had difficulty deciding where they would
apply. The phrases demonstrate that the machine learning method can effectively identify
revenue-relevant tax code language using a data-driven approach.

B.4 Decomposition of the party effect on tax revenues

This appendix uses the notation in the model (Subsection 2.3.2) to compute the share of
revenue changes due to party control that can be assigned to the various components. In
particular, although $\partial g / \partial u_j$ and $\partial u_j / \partial D$ cannot be estimated, the summation U
can be computed as

\[ U = \rho_g - \rho_\tau - \sum_{i=1}^{p} \beta_i \delta_i, \]

assuming the effects are separable. These estimates can be used to compute the relative
importance of the tax rate, the tax code, and other policies in the implementation of fiscal
policy in the U.S. states.

Table B.1: Decomposition of Party Control Effects on Government Revenue

Effect Component      Notation                                                    Income Tax   Sales Tax

Revenue               $\rho_g$                                                    0.046        -0.176
Tax Rate              $\rho_\tau = \rho_g - \tilde{\rho}_g$                       0.033        -0.019
Tax Code              $\sum_{i=1}^{p} \beta_i \delta_i$                           0.144**      -0.068*
Unobserved Policies   $U = \rho_g - \rho_\tau - \sum_{i=1}^{p} \beta_i \delta_i$  -0.131       -0.089

Decomposition of political control effects on government revenue, separately for Income Tax and Sales Tax. Units are in
standard deviations. Values computed in previous sections.
The estimates for $\rho_g$ and $\rho_\tau$ were computed in Subsection 2.7.2. The language effect term
$\sum_{i=1}^{p} \beta_i \delta_i$ was computed in Subsection 2.8.1. That provides enough information to compute
U.
The estimates from the empirical section, along with their relation to U , are reported
in Table B.1. The table highlights that the tax code has a larger effect than the tax rate.
The best estimates for U are -0.131 for the income tax and -0.089 for the sales tax. These unobserved policies are
comparable in importance to the effect of the tax code. They suggest that the unobserved
policies implemented by Democrats – besides tax rates and the tax code – are associated
with reduced tax revenues for both income tax and sales tax.
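As a check on the arithmetic, the residual U can be recomputed from the other three rows of Table B.1. The function and variable names are assumptions made for illustration; the numbers are those reported in the table.

```python
def unobserved_policy_effect(rho_g, rho_tau, tax_code_effect):
    """U = rho_g - rho_tau - sum_i beta_i * delta_i, assuming separable effects."""
    return rho_g - rho_tau - tax_code_effect


# (revenue effect, tax rate effect, tax code effect) from Table B.1,
# in standard deviations.
table_b1 = {
    "Income Tax": (0.046, 0.033, 0.144),
    "Sales Tax": (-0.176, -0.019, -0.068),
}
U = {
    base: round(unobserved_policy_effect(*values), 3)
    for base, values in table_b1.items()
}
```

Rounded to three decimals, this gives the Unobserved Policies row of the table for both tax bases.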

B.5 Substituting Phrases to Increase Tax Revenues

In this appendix, I use the machinery to try to re-write statutes to increase tax revenues. I
iterate through phrases in statutes and find closely related words or phrases that are predicted
to increase tax capacity. These substitutions are credible because the predicted changes in
revenue are derived from the instrumental variables estimates described previously.
The method for phrase substitution works as follows. Consider a given document, which
is a list of phrases, indexed by p. For each p, search the nearby words in the Word2Vec
space. In these small clusters, the phrases q are closely related and sometimes synonymous.
I take the first-stage F-statistics, coefficients βq , and standard errors for each phrase in the
cluster. Then I analyze suggested replacements if there is an improvement in predicted tax
capacity.
To find possible replacements, I look for any q such that βq > βp , then make pair-wise
comparisons based on the phrase statistics. First, I filter out any q with weak F-statistics.
Next, let σp and σq equal the standard errors for βp and βq . I use a Wald test statistic to
compute whether they are significantly different:

\[ W(p, q) = \left( \frac{\beta_q - \beta_p}{\sqrt{\sigma_p^2 + \sigma_q^2}} \right)^2. \]

This test statistic follows an F -distribution. If I cannot reject the null that βq = βp , I exclude
q from the list of possible replacements. If there are any phrases remaining that satisfy these
criteria, I choose the q that results in the largest predicted improvement in tax capacity, that is,
the highest βq .
To illustrate this approach, I analyze the 4000 phrases in the vocabulary with the highest
cosine similarity to “tax.” For each phrase in this subset, I assess the twenty most similar
phrases. Of these twenty, I exclude any phrases with cosine similarity less than 0.5. I
further skip any potential replacement where the first-stage F-statistic for p or q is below 5.
Finally I exclude any proposed replacements where W (p, q) < 10. If all q are excluded, no
replacement is made. If any q are left, I select the one with the highest βq . This turns out
to be a relatively conservative specification, resulting in about 2% of phrases replaced.
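The selection rule can be sketched as a single function, assuming the candidate list already contains the most similar phrases for p. The dictionary keys, function name, and all numbers below are hypothetical; only the thresholds (cosine similarity 0.5, first-stage F of 5, Wald statistic 10) come from the text.

```python
def choose_replacement(p_stats, candidates, min_sim=0.5, min_f=5.0, min_wald=10.0):
    """Pick the replacement phrase with the highest beta that survives all
    filters, or None if no candidate qualifies.

    p_stats: dict with keys 'beta', 'se', 'f_stat' for the original phrase p.
    candidates: list of dicts with keys 'phrase', 'beta', 'se', 'f_stat', 'sim'.
    """
    if p_stats["f_stat"] < min_f:
        return None  # weak first stage for p itself
    best = None
    for q in candidates:
        if q["sim"] < min_sim or q["f_stat"] < min_f:
            continue  # not similar enough, or weak first stage for q
        if q["beta"] <= p_stats["beta"]:
            continue  # no predicted improvement
        wald = (q["beta"] - p_stats["beta"]) ** 2 / (p_stats["se"] ** 2 + q["se"] ** 2)
        if wald < min_wald:
            continue  # difference not statistically distinguishable
        if best is None or q["beta"] > best["beta"]:
            best = q
    return None if best is None else best["phrase"]


# Hypothetical statistics, for illustration only.
p = {"beta": 0.1, "se": 0.05, "f_stat": 12.0}
cands = [
    {"phrase": "cand_a", "beta": 0.5, "se": 0.05, "f_stat": 20.0, "sim": 0.8},
    {"phrase": "cand_b", "beta": 0.9, "se": 0.40, "f_stat": 30.0, "sim": 0.9},
    {"phrase": "cand_c", "beta": 0.7, "se": 0.05, "f_stat": 3.0, "sim": 0.7},
    {"phrase": "cand_d", "beta": 0.6, "se": 0.05, "f_stat": 15.0, "sim": 0.4},
]
chosen = choose_replacement(p, cands)
```

In this toy example `cand_b` has the highest coefficient but is too imprecisely estimated to pass the Wald filter, `cand_c` has a weak first stage, and `cand_d` is not similar enough, so `cand_a` is selected.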
Table B.2 reports the set of proposed replacements for these phrases. I have roughly
organized them into groups of related phrases. As can be seen immediately, many of these
replacements don’t make a lot of intuitive sense. Machine-learning methods are necessarily
imperfect and will pick up a lot of nonsensical relations. That said, there are some replace-
ments recommended here that deserve a closer look. As can be seen in the predicted revenue
change column, making these substitutions could result in significant increases in revenue.
In particular, the recommendation to replace “failure” with “such failure,” and “person or
firm” with “such person or firm,” both make intuitive sense from a statutory interpretation
perspective. By adding “such,” these replacements increase the clarity of a tax statute and
likely increase revenue by reducing avoidance.
The other replacements are not as intuitive but still deserve discussion. First, naturally
enough, there are a couple of accounting-related suggestions. “Submit report” and “certified
public accountant” are predicted to increase revenue, which is consistent with an effect of
better record-keeping [Gordon and Li, 2009].
Next there is a collection of phrases related to businesses and corporations. Anchoring
tax liabilities to the “principal place” of business rather than where the “business is located”
appears to increase revenue. For the insurance-related phrases, changing phrases about
insurance premiums to those about life credits seems to make a difference.
There are also suggestive phrases related to collections and enforcement. “Collect rev-
enue” is preferred to “collect rate,” while “amount of revenue” is preferred to “tax revenue.”
These might be better specifications of the collections process. The replacements for appeals
and penalties likely reflect that the structure of these statutes has an important impact on
collections.
In the debt category, the phrase “payment of interest” seems to matter a lot. This is likely
related to paying interest on delinquent taxes. Changing “rate of interest” to “maximum rate”
increases revenues, perhaps reflecting a higher rate of interest paid on delinquent taxes.

Phrase Replacement Rev. ($M) Phrase Replacement Rev. ($M)

Accounting, Business, Insurance, and Debt Dates

annual report submit report 0.87 annum cent annum 0.45


audit book certified public accountant 2.79 annum from date cent month 4.33
calendar import 0.44
business business state 2.88 day after receipt receipt of request 3.81
business is located principal place 4.24 expiration year expiration date 3.09
corporate corporate law 1.72 file file within day 2.51
corporate limit said city 0.87 final determination file within day 2.25
failure such failure 1.41 first day month date retire 2.38
file article certificate of incorporation 0.89 first month last day 1.47
incorporate corporate law 1.72 succeed calendar last day 3.99
operating business business 1.12 such estimate next fiscal year 2.37
person or firm such person or firm 2.67 sunday legal holiday following day 3.55
person or partnership such person or firm 2.79 thirty-first day june year 2.82
purpose author corporate purpose 3.28 thirty-first day december last day 1.54
twelfth twenty-seventh 1.17
bank corporate federal deposit insurance 2.13
inheritance tax executor 0.51 Amounts
insurance premium credit life 0.93
premium finance credit life 1.29 cent one-half cent 1.22
state unemployment unemployment trust 2.28 exceed sum exceed 0.71
per one-half 1.21
Collections and Enforcement proportion same ratio 2.56
proportion amount same proportion 2.42
anticipated revenue adopt budget 2.17 seven-tenth cent 1.06
collect rate collect revenue 3.20 three-fourth
subject tax tax 0.92
tax jurisdiction state tax 0.76 Local Issues
tax revenue amount revenue 3.25
total contribution contributor 0.94 additional levy author levy 3.07
certified state local district 2.26
appeal notice appeal 1.72 charge fee service 1.11
become delinquent enforce collect 1.69 county auditor county clerk 0.37
enforce other remedy 1.43 county general fund fund of county 5.06
fix penalty penalty violation 3.76 county township city village 1.3
same penalty fail neglect 4.21 fee be paid fee 3.86
gas water heat power 4.3
annual tax sufficient pay interest 4.12 hospital district director district 1.44
bond interest payment of interest 2.36 proposition majority voter 1.65
other obligations other obligations issued 1.49 question such proposition 1.39
person liability liability 1.27 record deed file for record 2.14
rate of interest maximum rate 1.10 record in office county record 1.85
rate per annum exceed forty year 2.75 referendum such referendum 1.67
sink payment of interest 2.06 royalty school land 1.14
sink fund payment of interest 1.45 said license payment fee 1.64
such installments equal installments 3.16 territory include territory 3.94
such rate exceed forty year 1.70 town such town 1.15
sufficient pay sufficient pay interest 5.87
Sales Tax
Miscellaneous
said vehicle motor 2.43
addition to power have power 3.00 tax stamp retail dealer 2.34
be in effect be effect 2.05 purchase price trade-in 1.65
code title annotated 0.50
not inconsistent not inconsistent herewith 4.44
nothing herein provided however nothing 2.85

List of phrases, proposed replacements, and predicted increase in revenue due to the replacement (in millions
of dollars). Top 4000 phrases that are most similar to “tax.”

Table B.2: Examples of Replaced Phrases

Another striking observation is the large number of suggested replacements for dates.
This emphasizes that the timing of tax obligations is an important tool for legal avoidance
[Slemrod and Bakija, 2008]. The technical phrasing of statutes is important for facilitating
this type of avoidance. Similarly, the technical phrasing related to amounts (e.g., “propor-
tion” versus “same ratio”) can have large impacts on tax liability.
The three other groups of phrases – Miscellaneous, Local Issues, and Sales Tax – are
perhaps less intuitive. As mentioned, this is a machine learning method that will produce
some results that are not useful. This emphasizes that any suggested replacements will
require verification by lawyers and policymakers.

Appendix C

Ch. 3: Property taxes and local labor markets: Evidence from staggered
property reassessments

C.1 Model Appendix

This appendix provides details and proofs for the model propositions.

C.1.1 Firms

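First-order conditions for the inputs k and l give

\[ \beta B(\tau) k^{\beta-1} l^{\beta} = \rho + \tau \]

\[ \beta B(\tau) k^{\beta} l^{\beta-1} = w \]

which combine (dividing the second condition by the first) to

\[ \frac{k}{l} = \frac{w}{\rho + \tau}. \]

The derivative is

\[ \frac{\partial (k/l)}{\partial \tau} = -\frac{w}{(\rho + \tau)^2}, \]

which is negative.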
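The profit equation in terms of labor:

\[ \max_{l} \; B(\tau) \left( \frac{w}{\rho + \tau} \right)^{\beta} l^{2\beta} - w l, \]

which can be solved to obtain

\[ l^* = \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{1-\beta}} \right)^{\frac{1}{1-2\beta}}. \]

Differentiating with respect to l gives

\[ 2\beta B(\tau) \left( \frac{w}{\rho + \tau} \right)^{\beta} l^{2\beta-1} = w \]

and solving for l gives

\begin{align*}
l^{2\beta-1} &= \frac{w}{2\beta B(\tau) \left( \frac{w}{\rho + \tau} \right)^{\beta}} \\
l &= \left( \frac{(\rho + \tau)^{\beta} w^{1-\beta}}{2\beta B(\tau)} \right)^{\frac{1}{2\beta-1}} \\
l^* &= \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{1-\beta}} \right)^{\frac{1}{1-2\beta}}
\end{align*}

and correspondingly

\[ k^* = \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{1-\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}}. \]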

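The effect of property tax on the labor input is

\[ \frac{\partial l}{\partial \tau} = \left( \frac{1}{1-2\beta} \right) \left( \frac{2\beta}{w^{1-\beta}} \right)^{\frac{1}{1-2\beta}} \left( \frac{B(\tau)}{(\rho + \tau)^{\beta}} \right)^{\frac{2\beta}{1-2\beta}} \left[ \frac{B'(\tau)}{(\rho + \tau)^{\beta}} - \frac{\beta B(\tau)}{(\rho + \tau)^{\beta+1}} \right]. \]

This is positive when

\begin{align*}
\frac{B'(\tau)}{(\rho + \tau)^{\beta}} &\geq \frac{\beta B(\tau)}{(\rho + \tau)^{\beta+1}} \\
\frac{B'(\tau)}{B(\tau)} &\geq \frac{\beta}{\rho + \tau}.
\end{align*}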

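Given the assumptions on $B'(\tau)$, this equation has a unique cutoff τ. For example, assuming
$B(\tau) = \tau^{\alpha}$ for α ∈ (0, 1), we would have

\begin{align*}
\frac{\alpha \tau^{\alpha-1}}{\tau^{\alpha}} &= \frac{\beta}{\rho + \tau} \\
\frac{\alpha}{\tau} &= \frac{\beta}{\rho + \tau} \\
\alpha\rho + \alpha\tau &= \beta\tau \\
(\beta - \alpha)\tau &= \alpha\rho \\
\tau_l^* &= \frac{\alpha\rho}{\beta - \alpha}.
\end{align*}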

Note that α < β is required in this specification to guarantee an interior solution.
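A quick numerical check of the cutoff under the example specification B(τ) = τ^α is sketched below. The parameter values are hypothetical and chosen only so that 0 < α < β < 1/2; the function name is an assumption made for illustration.

```python
import math


def labor_cutoff(alpha, beta, rho):
    """Cutoff tax rate alpha * rho / (beta - alpha) for B(tau) = tau ** alpha."""
    if not 0 < alpha < beta:
        raise ValueError("an interior cutoff requires 0 < alpha < beta")
    return alpha * rho / (beta - alpha)


def growth_rate_of_B(alpha, tau):
    """B'(tau) / B(tau) for B(tau) = tau ** alpha."""
    return alpha / tau


# Hypothetical parameter values.
alpha, beta, rho = 0.2, 0.4, 0.05
tau_star = labor_cutoff(alpha, beta, rho)

# At the cutoff the two sides of the condition coincide.
lhs_at_cutoff = growth_rate_of_B(alpha, tau_star)
rhs_at_cutoff = beta / (rho + tau_star)

# Below the cutoff the condition B'/B >= beta / (rho + tau) holds; above it fails.
holds_below = growth_rate_of_B(alpha, 0.01) >= beta / (rho + 0.01)
holds_above = growth_rate_of_B(alpha, 0.20) >= beta / (rho + 0.20)
```

With these values the cutoff is approximately 0.05: the labor input is increasing in the property tax below that rate and decreasing above it.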


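The effect of property tax on the capital input is

\[ \frac{\partial k}{\partial \tau} = \left( \frac{1}{1-2\beta} \right) \left( \frac{2\beta}{w^{\beta}} \right)^{\frac{1}{1-2\beta}} \left( \frac{B(\tau)}{(\rho + \tau)^{1-\beta}} \right)^{\frac{2\beta}{1-2\beta}} \left[ \frac{B'(\tau)}{(\rho + \tau)^{1-\beta}} - \frac{\beta B(\tau)}{(\rho + \tau)^{2-\beta}} \right]. \]

This is positive when

\begin{align*}
\frac{B'(\tau)}{(\rho + \tau)^{1-\beta}} &\geq \frac{\beta B(\tau)}{(\rho + \tau)^{2-\beta}} \\
\frac{B'(\tau)}{B(\tau)} &\geq \frac{\beta}{\rho + \tau}
\end{align*}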

which means that the effect of the property tax on capital moves in the same direction as
the effect on labor.

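The optimal output in terms of the exogenous parameters is:

\begin{align*}
y(k, l) &= B(\tau) k^{\beta} l^{\beta} \\
&= B(\tau) \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{1-\beta}} \right)^{\frac{\beta}{1-2\beta}} \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{1-\beta} w^{\beta}} \right)^{\frac{\beta}{1-2\beta}} \\
y^* &= \beta_y \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}}
\end{align*}

where $\beta_y = (2\beta)^{\frac{2\beta}{1-2\beta}}$.
The derivative with respect to property taxes is

\[ \frac{\partial y}{\partial \tau} = \beta_y \left( \frac{1}{w} \right)^{\frac{\beta}{1-2\beta}} \left[ \frac{B'(\tau)}{1-2\beta} B(\tau)^{\frac{2\beta}{1-2\beta}} \left( \frac{1}{\rho + \tau} \right)^{\frac{\beta}{1-2\beta}} - \left( \frac{\beta}{1-2\beta} \right) B(\tau)^{\frac{1}{1-2\beta}} \left( \frac{1}{\rho + \tau} \right)^{\frac{1-\beta}{1-2\beta}} \right]. \]

This is positive when

\begin{align*}
\frac{B'(\tau)}{1-2\beta} B(\tau)^{\frac{2\beta}{1-2\beta}} \left( \frac{1}{\rho + \tau} \right)^{\frac{\beta}{1-2\beta}} &\geq \left( \frac{\beta}{1-2\beta} \right) B(\tau)^{\frac{1}{1-2\beta}} \left( \frac{1}{\rho + \tau} \right)^{\frac{1-\beta}{1-2\beta}} \\
B'(\tau) &\geq \beta B(\tau) \left( \frac{1}{\rho + \tau} \right) \\
\frac{B'(\tau)}{B(\tau)} &\geq \frac{\beta}{\rho + \tau}
\end{align*}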

which gives the same result as with the input choices.


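Firm profits are equal to

\begin{align*}
\pi_j^* &= y^* - w l^* - (\rho + \kappa) k^* + b_j \\
&= (2\beta)^{\frac{2\beta}{1-2\beta}} B(\tau)^{\frac{1}{1-2\beta}} \left( \frac{1}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} - w \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{1-\beta}} \right)^{\frac{1}{1-2\beta}} - (\rho + \kappa) \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{1-\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} + b_j \\
&= (2\beta)^{\frac{1}{1-2\beta}} \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} - \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} - \left( \frac{2\beta B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} + b_j \\
&= \beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} + b_j
\end{align*}

where

\[ \beta_\pi = (2\beta)^{\frac{1}{1-2\beta}} \left( (2\beta)^{\frac{1}{1-2\beta}} - 1 \right). \]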

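The zero-profit equilibrium condition means that the probability a firm locates in the city is
given by

\begin{align*}
\Pr\left( \beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} + b_j > 0 \right) &= 1 - \Pr\left( b_j < -\beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}} \right) \\
E^* &= \frac{1}{2} + \psi \beta_\pi \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{1}{1-2\beta}}
\end{align*}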

Normalizing the total number of firms to one, this gives the number of local business estab-
lishments.
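The effect of the property tax on the number of firms is

\[ \frac{\partial E}{\partial \tau} = \psi \beta_\pi \left( \frac{1}{1-2\beta} \right) \left( \frac{B(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} \right)^{\frac{2\beta}{1-2\beta}} \left[ \frac{B'(\tau)}{(\rho + \tau)^{\beta} w^{\beta}} - \beta \frac{B(\tau)}{(\rho + \tau)^{1+\beta} w^{\beta}} \right]. \]

The term in brackets is the same as in $\partial y / \partial \tau$, so again this is positive when

\[ \frac{B'(\tau)}{B(\tau)} \geq \frac{\beta}{\rho + \tau}. \]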

C.1.2 Households

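The probability that person i lives in this city is $\Pr(u_i > \bar{u})$. This means that the population is

\begin{align*}
N &= \Pr(u_i > \bar{u}) \\
&= \Pr(A(\tau) + w - (1 + \tau)h + a_i > \bar{u}) \\
&= \Pr(a_i > \bar{u} - A(\tau) - w + (1 + \tau)h) \\
&= \frac{1}{2} + \frac{A(\tau) + w - (1 + \tau)h - \bar{u}}{\phi} \\
&= N_0 + \frac{A(\tau) + w - (1 + \tau)h}{\phi}
\end{align*}

where

\[ N_0 = \frac{1}{2} - \frac{\bar{u}}{\phi}. \]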

Housing prices are determined by equating housing supply and housing demand

\begin{align*}
\phi\sigma h &= \phi N_0 + A(\tau) + w - (1 + \tau)h \\
(\phi\sigma + 1 + \tau) h &= \phi N_0 + A(\tau) + w \\
h &= \frac{\phi N_0 + A(\tau) + w}{\phi\sigma + 1 + \tau}.
\end{align*}

C.1.3 Feedback to the Labor Market

So far I have assumed that there is no feedback between the housing market and labor
market. This is justified when firms can hire freely from outside the jurisdiction. Relaxing
this assumption does not change the results substantially. The main difference is that
changes in the property tax could affect the wage. And it could either increase or decrease
the wage depending on the status quo, the same as the other propositions. The result on
population would be strengthened if the property tax also increased the wage. However, the
effect on number of firms and on employment would be weakened if the additional marginal
workers had to be paid a higher wage.

C.2 Tax changes with no revenue changes

To complement the main analysis, I report estimates from New Jersey on the effects of
property taxes due to assessment spillovers. The key contribution of this data is that these
property tax changes do not affect revenue – they only affect how county taxes are allocated
across constituent municipalities. This means there is no mechanism, such as more pub-
lic goods or reduced distortions from other taxes, that could result in improved economic
outcomes.

C.2.1 Tax Spillovers

In New Jersey, county taxes (and consolidated school district taxes) are allocated across con-
stituent municipalities proportionally to their total assessed property values. For example,
if Town A revalues its properties and they decrease, the county taxes paid by Towns B and
C (in the same county) will increase.
To illustrate the situation, consider a county with three towns, Maplewood, Newark,
and Orange. Assume that these towns have 20%, 50%, and 30% of the property values,
respectively. Let’s say that Newark reassesses properties and their assessed value falls by
10%. This means that Newark’s share of the county levy falls by 50% × 10% = 5%. Note
that Maplewood’s share of the non-Newark property value is 20%/50% = 40%. Therefore
Maplewood’s share of compensating for the reduced total county property valuation is 40%×
5% = 2%. Because Maplewood initially paid for 20% of the levy, the relative increase in
Maplewood’s county taxes is 2%/20% = 10%. Originally, Maplewood property taxes went
25% to the municipal government, 50% to the school district, and 25% to the county. So the
increase in the Maplewood property tax because of the Newark revaluation is 10% of 25%,
or 2.5%. Importantly, these property tax shifts do not affect revenues or expenditures for
any government entities (unlike the tax cap overrides); they only affect the distribution of
costs across towns in the county.

C.2.2 Econometric Framework

More formally, consider a county with a set of municipalities i that have to share a county
tax based on their property value $v_i$. Let $V = \sum_i v_i$ be the total valuation. The proportional
change in the county tax $T_i^C$ for town i due to changes in other towns’ valuations is
approximated by

\[ \Delta T_i^C = -\sum_{j \neq i} \Delta v_j \times \frac{v_j}{V} \frac{v_i}{V - v_j}. \]

Let $\Delta T_i$ be the proportional change in total property taxes paid in i, and let $C_i$ be the
proportion of property taxes going to the county. Then the proportional change in property
taxes due to changes in other town valuations is given by

\[ \Delta T_i = \Delta T_i^C C_i. \]

Now I construct an instrument for property taxes $Z_{ict}$, where I have county c and year t.
When constructing the instrument I use the levels $v_i$, $V$, $T_i$, and $C_i$ from the previous year.
The terms $\Delta v_{jct}$ give the change between last year and this year. Therefore the formula for
the instrument is

\[ Z_{ict} = \Delta T^C_{ict} C_{ict-1} = -\sum_{j \neq i} \Delta v_{jct} \times \frac{v_{jct-1}}{V_{ct-1}} \frac{v_{ict-1}}{V_{ct-1} - v_{jct-1}} C_{ict-1}. \]
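A minimal sketch of the instrument for one county-year follows. It implements the formula exactly as written above and treats Δv as a proportional change in a town's valuation, which is an interpretation consistent with the worked example in the text. The function name, town labels, and all numbers are hypothetical.

```python
import math


def spillover_instrument(values_prev, value_changes, county_share_prev):
    """Compute Z for every town in one county-year.

    values_prev: dict town -> valuation v in the previous year.
    value_changes: dict town -> proportional change in valuation this year
        (towns that are absent are treated as unchanged).
    county_share_prev: dict town -> share C of property taxes going to the
        county in the previous year.
    """
    total = sum(values_prev.values())
    z = {}
    for i, v_i in values_prev.items():
        delta_county_tax = 0.0
        for j, v_j in values_prev.items():
            if j == i:
                continue  # only other towns' revaluations enter the sum
            change_j = value_changes.get(j, 0.0)
            delta_county_tax -= change_j * (v_j / total) * (v_i / (total - v_j))
        z[i] = delta_county_tax * county_share_prev[i]
    return z


# Hypothetical county with three towns; town "C" revalues downward by 20%.
values_prev = {"A": 100.0, "B": 300.0, "C": 600.0}
value_changes = {"C": -0.2}
county_share_prev = {"A": 0.5, "B": 0.25, "C": 0.25}
Z = spillover_instrument(values_prev, value_changes, county_share_prev)
```

In this toy county the revaluing town gets an instrument value of zero, because its own valuation change is excluded from the sum, while the other two towns get positive values.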

I estimate the first-differenced second-stage equation:

\[ \Delta Y_{ict} = \Delta \alpha_{ict} + \rho \Delta T_{ict} + \Delta X_{ict}' \beta + \Delta \epsilon_{ict} \]

where $Y_{ict}$ is a log outcome variable, $\alpha_{ict}$ may include fixed effects or trends, and $X_{ict}$ includes
other time-varying controls. Since $Y_{ict}$ is in logs and $\Delta T_{ict}$ gives the proportional change, the
estimate $\hat{\rho}$ can be interpreted as the elasticity of $Y_{ict}$ with respect to the property tax.
The first stage is

\[ \Delta T_{ict} = \Delta \alpha_{ict} + \pi Z_{ict} + \Delta X_{ict}' \beta + \eta_{ict}. \]

The exogeneity assumption is that $Z_{ict}$ is uncorrelated with $\Delta \epsilon_{ict}$. When Town A revalues,
the assessed property values converge to market values, spilling over into Town B’s county
tax payments.

Table C.1: First Stage: New Jersey Analysis

                     Average Residential Tax    Average Business Tax

Coeff. on Zict       0.0671*                    0.1580**
                     (0.0334)                   (0.0564)

N                    9018                       7577
Clusters             21                         21
1st-Stage F-stat     6.805                      15.16

Effect of the spillover instrument on the log average property tax bill, first-differenced by municipality. Regression includes a year fixed effect.
Standard errors in parentheses, clustered by county. + p < .1, * p < .05, ** p < .01.

C.2.3 Results

The first stage statistics for tax spillovers are presented in Table C.1. The F-statistics are
strong for a variety of specifications of the estimating equation. The spillover instrument has
a statistically significant effect on the property tax bill. Table C.2 reports the IV estimates
for the effects of tax spillovers on the local labor market. Relative to the effects from
the assessment cycle, these are small and mostly insignificant. Column 4 is the preferred
specification that uses the spillover instrument while including county-year fixed effects.
While most of the effects are insignificant, there is a statistically significant negative effect of
higher property taxes on the number of business establishments. A 10 percent increase in
business property taxes due to spillovers is associated with a 5.5 percent reduction in the
number of businesses. The other measures are not significant, but they all have negative
coefficients, consistent with the same idea. When higher property taxes are not accompanied
by benefits, they reduce performance in the labor market. This is similar to the result in
Giroud and Rauh [2015], who find an elasticity of -0.4 for state corporate taxes, which
cause business establishments to locate in other states.

Table C.2: Effect of tax changes due to spillovers

OLS IV
(1) (2) (3)
Effect of Residential Tax
Population 0.0390 -0.014** -0.031
(0.170) (0.0024) (0.353)
Effect of Business Tax
Establishments 0.616** 0.989 -0.567**
(0.117) (0.577) (0.182)
Employment 1.060** -0.0152 -0.250
(0.134) (0.0366) (0.177)
Wages 0.204** -0.0101 -0.074
(0.0316) (0.0111) (0.202)
First-Stage F-stat 15.17
County-Year FE’s X X X
Town FD X X
Endogenous variable: log property tax collections. Instrument: Tax spillover instrument. Standard errors in parentheses,
clustered by county, + p < .1, * p < .05, ** p < .01. 4696 town-years, 545 towns, 21 counties.
