
Article

Big Data in the Public Sector: Lessons for Practitioners and Scholars

Administration & Society
2017, Vol. 49(7) 1043-1064
© The Author(s) 2014
DOI: 10.1177/0095399714555751
journals.sagepub.com/home/aas

Kevin C. Desouza1 and Benoy Jacob2

Abstract
In this essay, we consider the role of Big Data in the public sector. Motivating
our work is the recognition that Big Data is still in its infancy and many
important questions regarding the true value of Big Data remain unanswered.
The question we consider is as follows: What are the limits, or potential, of
Big Data in the public sector? By reviewing the literature and summarizing
insights from a series of interviews from public sector Chief Information
Officers (CIOs), we offer a scholarly foundation for both practitioners and
researchers interested in understanding Big Data in the public sector.

Keywords
big data, public organizations, public management, policy analysis

The amount of data in our world has been exploding, and analyzing large
datasets, so-called Big Data, will become a key basis of competition,
underpinning new waves of productivity growth, innovation, and consumer
surplus.
McKinsey Global Institute (2010)

1Arizona State University, Phoenix, AZ, USA


2University of Colorado, Denver, CO, USA

Corresponding Author:
Kevin C. Desouza, Arizona State University, 411 N. Central Ave., M/C 3520, Suite #750,
Phoenix, AZ 85004-0685, USA.
Email: kev.desouza@gmail.com

The era of Big Data has begun. Computer scientists, physicists, economists,
mathematicians, political scientists, bio-informaticists, sociologists, and many
others are clamoring for access to the massive quantities of information
produced by and about people, things, and their interactions.

Boyd and Crawford (2012)

Big Data is indeed a Big Deal.

Dr. John Holdren (2012; Director of the White House Office of Science and
Technology Policy)

Introduction
As suggested in the introductory quotes, there is an increasingly popular per-
ception that Big Data holds vast potential for improving the decision-making
processes of both public and private organizations. In the hopes of solving
previously intractable problems, scholars, analysts, and entrepreneurs, from a
wide range of fields, are actively pursuing novel approaches to mining the
digital traces and deposit data that comprise Big Data (Boyd & Crawford,
2012). Against this emerging backdrop, policymakers, public managers, and
citizens have started to consider the ways in which Big Data can be used to
improve public sector outcomes, that is, public policies, programs, and dem-
ocratic processes.
Several Big Data initiatives have recently emerged in the public sector.
For example, in March of 2012, the Obama Administration put forward the
Big Data Research and Development Initiative. The objective of this initiative was to "understand the technologies needed to manipulate and mine massive amounts of information; apply that knowledge to other scientific fields as well as address the national goals in the areas of health, energy, defense, education and research" (Mervis, 2012, p. 22). Big Data efforts have also
been initiated at other levels of government. For example, a host of munici-
palities have created open data platforms and held civic hackathons to
engage citizens with public data. Several novel applications have emerged
from these efforts addressing a wide range of local issues such as providing
information on blighted properties, identifying local resources for under-
served citizens, and helping parents access information on local schools.1
Simply stated, it appears that Big Data can, indeed, provide the public sector with "a powerful arsenal of strategies and techniques for boosting productivity and achieving higher levels of efficiency and effectiveness" (Manyika et al., 2011, p. 54). That said, Big Data is still in its infancy and many important questions regarding the true value of Big Data remain unanswered (Boyd
& Crawford, 2012; Desouza, 2014). Indeed, observers have noted that Big Data solutions are being promoted as a way to address public issues, but with little consideration of how, where, and when they are most likely to be successful.2 Thus, it appears that there is at least some tension between the promise of Big Data and the reality.3 The question at hand, then, is as follows: What are the limits, or potential, of Big Data in the public sector?
In this article, we begin to address this question by reviewing the nascent
Big Data literature as it pertains to the management of public organizations.
The insights we draw are further informed by findings from a recent survey
of Chief Information Officers (CIOs) in different public organizations.4 As
such, we provide a scholarly foundation for both practitioners and research-
ers interested in understanding Big Data in the public sector.
Following this introduction, our article is organized into five sections. The
next four sections summarize key themes from the Big Data literature and
consider the implications of each for public organizations; in particular, the
bounds of Big Data, governance and privacy, decision-making, and the end
of theory. The final section offers a short summary of lessons for practitio-
ners and some thoughts on potential research directions for scholars.

The Bounds of Big Data


Because Big Data is a relatively new phenomenon, much of the literature focuses on defining the bounds of Big Data: that is, what is Big Data, and what does it mean to operate in a Big Data environment? Despite the ubiquity of the term, Big Data is difficult to define (Franks, 2012; Laney, 2001; Manyika et al., 2011). There is, however, some consensus among scholars and practitioners that four factors characterize Big Data: volume, velocity, variety, and complexity.5 More than just simple semantics, these characteristics have potentially important implications for management practices.

Big Data is just too big for us. Where . . . big data begin[s] and end[s] is not
known. I have been struggling to identify digestible bites for us to take to move
on big data . . .

First, at its core, Big Data must clearly be big. Big Data datasets are "beyond the ability of typical database software tools to capture, store, manage and analyze" (Franks, 2012, p. 4).6 Thus, Big Data, in terms of volume, is a function of the underlying and pre-existing capacity of an organization to collect, store, and analyze its data. This definition suggests that Big Data is, in terms of volume, a moving target. For example, household demographics that were once difficult to manage now fit on a thumb drive and can be analyzed by a low-end laptop (Franks, 2012, p. 24).7

The second defining characteristic of Big Data is its velocity. This refers
to the speed at which data are being created and stored, and their associated
rates of retrieval (Kaisler, Armour, Espinosa, & Money, 2013). Much like the
volume of data, however, there is no established benchmark by which to con-
sider when data velocity meets a Big Data threshold. Rather, the salient issue
is that the data are being created at historically fast rates. An example of the
current velocity of data is provided by Mayer-Schonberger and Cukier
(2013):

Google processes more than 24 petabytes of data per day, a volume that is
thousands of times the quantity of all printed material in the U.S. Library of
Congress. Facebook, a company that didn't exist a decade ago, gets more than
10 million new photos uploaded every hour. Facebook members click a "like"
button or leave a comment nearly three billion times a day, creating a digital
trail that the company can mine to learn about users' preferences. Meanwhile,
the 800 million monthly users of Google's YouTube service upload over an
hour of video every second. The number of messages on Twitter grows at
around 200 percent a year and by 2012 exceeded 400 million tweets a day.
(p. 8)

A third defining characteristic of Big Data is its variety. Big Data is com-
prised of data in a wide range of forms, including text, images, and videos.
Generally speaking, then, it will include data that are structured, semi-struc-
tured, or unstructured. Structured data refer to data that have an organized
structure and are, thus, clearly identifiable. A simple example would be a
database with specific information that is stored in columns and rows. Semi-
structured data do not conform to a formal structure per se; however, they contain tags that help separate the data records or fields. For example, data in many bibliographical software programs reflect semi-structured data. That is, the file is composed of records, but the structure is not regular in the sense that fields may be missing or be comprised of more open formats, such as a "Notes" section. Finally, unstructured data, as the name implies, have no identifiable structure. Examples of unstructured data, then, include the following: text messages, photos, videos, and audio files.
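The three forms can be made concrete with a short sketch. The records below are invented for illustration; in practice, structured data would sit in a relational database, and semi-structured records would come from formats such as JSON or bibliographic files.

```python
import json

# Structured: fixed fields, every value present and typed.
structured_row = {"parcel_id": 1042, "zoned_units": 2, "year_built": 1951}

# Semi-structured: tagged fields, but the set of fields varies by record
# (here, a "notes" field appears while "year" is missing).
semi_structured = json.loads('{"title": "Budget Report", "notes": "draft"}')

# Unstructured: no identifiable field structure at all.
unstructured = "Resident called to report overcrowding at the corner property."

# Structured data can be queried directly ...
units = structured_row["zoned_units"]

# ... semi-structured data needs defensive access for optional fields ...
year = semi_structured.get("year", "unknown")

# ... and unstructured data requires text analytics before any query works.
mentions_overcrowding = "overcrowding" in unstructured.lower()
print(units, year, mentions_overcrowding)  # 2 unknown True
```

The point of the sketch is the gradient of effort: each step away from structure pushes more work from the database onto the analyst.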
As datasets become increasingly complex, from structured to unstructured, the processing and analytical capabilities required to collect, manage, and analyze the data increase significantly. Thus, a better understanding of the defining characteristics of an organization's real, or potential, data repositories offers insights into the level and types of investments needed.
The final defining characteristic of Big Data is its complexity: the degree to which the data are interconnected. Many of the novel applications and/or insights that have emerged from Big Data applications are a result of
connecting otherwise unrelated datasets.

Figure 1. Data continuum: volume, velocity, variety, and complexity.
- Small Data: low volume, low velocity, low variety, low complexity (example: land use data for a small city)
- High volume, low velocity, low variety, low complexity (example: census data)
- High volume, high velocity, low variety, low complexity (example: Twitter data)
- High volume, high velocity, high variety, low complexity (example: datasets or video feeds with different structures)
- Big Data: high volume, high velocity, high variety, high complexity (example: linked datasets with different structures)

One oft-cited example is the joint effort between Google and the Center for Disease Control (CDC). In this
case, Google was able to connect its database of search terms entered in its search engine, such as "cold medicine" and "flu symptoms," with the CDC's data on the H1N1 virus. In doing so, analysts were able to predict the spread of the H1N1 virus by connecting two previously disconnected datasets.8
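The mechanics of that example, correlating a fast signal from one dataset with a slow outcome from another, can be sketched in a few lines. The weekly counts below are invented placeholders, not actual Google or CDC figures.

```python
# Hypothetical weekly counts: flu-related search queries (per 1,000 searches)
# and confirmed cases reported through official channels.
search_volume = [12, 15, 21, 30, 44, 61, 58, 40]
reported_cases = [100, 130, 190, 270, 400, 560, 530, 360]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# A correlation near 1.0 suggests the search signal could serve as a
# near-real-time proxy for the slower official reporting channel.
r = pearson(search_volume, reported_cases)
print(round(r, 3))
```

The analytical value comes entirely from placing the two previously disconnected series side by side; neither dataset reveals the relationship on its own.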
Given that many public organizations are in the nascent stages of implementing data-based decision processes, the characteristics of an organization's data provide a simple framework for assessing the potential data needs and subsequent investments of the organization. More precisely, if we consider the data environment as varying along a data continuum (Figure 1),9 whereby Big Data reflects one extreme, in which the volume, velocity, variety, and complexity of the data are very high, we can better appreciate the potential data infrastructure and subsequent investment that an organization requires.
First, consider organizations at the "small data" end of the data spectrum, where data are characterized as low in terms of volume, velocity, variety, and complexity. Such organizations must think about data differently
than their "bigger data" counterparts. More precisely, at the low end of the data spectrum, by definition, the organization is unlikely to have a great deal of data to begin with (low volume and low velocity). Moreover, relevant data are unlikely to be accessible through complementary organizations (low complexity). Such an environment is not unusual for contemporary public organizations. First, many public organizations have yet to adopt, for good reasons, data-based decision processes. In addition, the nature of many public programs does not lend itself to the creation/collection of reliable data. In
this context, the issue at hand will be to first determine what data, if any, will
improve programmatic outcomes. To the degree that outcomes could be
improved, the primary investment will be to increase the volume and velocity
of data. That is, to generate appropriate data (increased volume) and then to
ensure that the data are collected in an ongoing fashion (increased velocity).
In contrast to "small data" organizations, many public organizations have spent a great deal of time generating, collecting, and storing data. These public organizations are likely to be characterized by higher volumes of data, often spread across multiple departments. In these contexts, the primary question will be as follows: How can existing data resources be better employed to improve programmatic outcomes? For example, one high-visibility Big Data application in the public sector is the Office of Policy and
Strategic Planning in the Office of the Mayor of New York City. This office
employs a small group of analysts who mine data from approximately 60 different city agencies to address a host of issues, including building and development issues, infrastructure problems, the selling of bootleg cigarettes, and the "flipping" of business licenses (Howard, 2012). This office connects,
and then mines, a host of existing datasets from different agencies within
the city. Simply stated, this office leverages existing analytical capabilities by
bringing together otherwise unconnected datasets to find correlations that
help them refocus their programmatic efforts in more efficient ways. A tan-
gible example offers some additional clarity.
Every year, New York City receives approximately 20,000 complaints for "illegal conversions," that is, situations where an apartment or house is zoned to accommodate a certain number of people but is accommodating many more.
Given the potential safety problems associated with this issue, it is critical
that the City be able to identify illegal conversions before a problem arises.
Historically, the only way to address this problem was to investigate com-
plaints. Such a process is idiosyncratic at best; with only about 200 inspectors
to address the complaints, the City had been unable to get ahead of the
problem. By compiling a wide range of information such as property tax,
building structure, and age of the building, the data analytics team found a
curious correlation between illegal conversions and a particular building
characteristic. This allowed inspectors to proactively inspect buildings with
this characteristic. This led to a sharp increase in the number of illegal con-
versions that the City was able to fix.10
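The targeting logic can be illustrated with a toy sketch. The building records, fields, and weights below are hypothetical stand-ins; the City's actual analysis drew on far more data.

```python
# Hypothetical building records compiled from several agency datasets.
buildings = [
    {"id": "B1", "tax_delinquent": True,  "year_built": 1962},
    {"id": "B2", "tax_delinquent": False, "year_built": 1995},
    {"id": "B3", "tax_delinquent": True,  "year_built": 1929},
]

def inspection_priority(b):
    """Score a building on characteristics assumed (for illustration) to
    correlate with illegal conversions."""
    score = 0
    if b["tax_delinquent"]:
        score += 2
    if b["year_built"] < 1950:  # older housing stock
        score += 1
    return score

# Inspect the highest-scoring buildings first, instead of working the
# complaint queue in order of arrival.
queue = sorted(buildings, key=inspection_priority, reverse=True)
print([b["id"] for b in queue])  # ['B3', 'B1', 'B2']
```

The shift captured here is from reactive (investigate each complaint in turn) to proactive (rank all buildings by risk), which is how a fixed pool of roughly 200 inspectors could cover a far larger problem.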
In this example, the data being used are, from the perspective of Figure 1, not quite Big Data. They do, however, reflect "bigger data": high volume (large datasets)11 and high complexity (high interconnectedness) but relatively low velocity and variety. In this context, which characterizes many public organizations, the practical issue is to leverage existing resources to take full advantage of existing data. This idea of leveraging existing data was
evident throughout the CIO interviews. Indeed, many of the CIOs were cog-
nizant of the fact that they actually collect and maintain large stores of data.
But the analytical potential of these data had yet to be fully realized. As noted
by one CIO:

We need to focus on analyzing the data we currently have stored [in our
systems]. My guess is that we only analyze about 30% of it . . . there is a huge
opportunity for us to work on the rest [of the data] and create value . . .

Finally, at the Big Data end of the spectrum are those organizations that use data that are characterized by high volume, velocity, variety, and complexity. While there are likely few examples of truly Big Data in the public sector, there are some. For example, the Los Angeles Police Department's
Real-Time Analysis and Critical Response Division, in collaboration with
researchers from University of California, Los Angeles (UCLA), uses both
historical and real-time data (that includes live feeds of city and traffic cam-
eras) to predict where future crime might occur. This division provides real-time investigative information to officers and detectives throughout the city and region.12 These data allow the Los Angeles Police Department
(LAPD) to concentrate resources in geographically defined areas. The data
used in this case provide a rare, but important, example of how Big Data, with high volume, velocity, variety, and complexity, can support public efforts. The issue at hand, however, is that the investment and management issues involved with this type of data are different from those that arise when dealing with smaller types of data. In particular, the organization must recognize data in its unstructured form and then understand how to connect it to more conventional forms of data. In this particular case, the LAPD had to first recognize that live feeds and video-streams are best thought of as data, and that these data could be connected, through geocoding, to other data.
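The geocoding step amounts to reducing both sources to a shared spatial key. The sketch below is a hypothetical illustration of that idea, not a description of the LAPD's actual system.

```python
# Events extracted from a camera feed (an unstructured source, already
# tagged with coordinates) and conventional incident records.
camera_events = [{"lat": 34.0525, "lon": -118.2431, "label": "crowd"}]
incident_records = [{"lat": 34.0522, "lon": -118.2437, "type": "theft"}]

def geokey(lat, lon, precision=2):
    """Round coordinates to a grid cell so nearby observations share a key."""
    return (round(lat, precision), round(lon, precision))

# Index incidents by grid cell, then look up each camera event's cell.
incidents_by_cell = {}
for rec in incident_records:
    incidents_by_cell.setdefault(geokey(rec["lat"], rec["lon"]), []).append(rec)

for ev in camera_events:
    nearby = incidents_by_cell.get(geokey(ev["lat"], ev["lon"]), [])
    print(ev["label"], [r["type"] for r in nearby])  # crowd ['theft']
```

Once both sources share a key, the unstructured feed can be joined to any other geocoded dataset the organization holds.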
This discussion of the characteristics of data suggests insights for both practitioners and scholars. First, the primary insight is that large efficiencies can be achieved through analytics by simply recognizing the type of data that
do, or are likely to, characterize an organization. Indeed, many CIOs have already recognized the importance of appreciating their organizations' existing data resources. This idea, that public investments for data should begin with an assessment of existing data reservoirs, suggests a first step for
scholarship on Big Data in the public sector. More precisely, an important
first step in this research agenda should be to survey and assess the types of
data that public organizations are collecting. This simple, albeit difficult, task
would provide the foundation to consider several important questions. For
example, what are the characteristics of organizations that are most likely to
have and benefit from Big Data? What types of organizations are mostly
characterized by small data? And, are different types of data being used to
support different types of decision processes?
A second insight to draw from this discussion, particularly when considering organizations that have more than "small data," is that maximizing the analytical power of existing data will likely require that they be considered in relation to data in other parts of the organization. Increasing the data complexity in this way, by connecting across departments, poses unique challenges. This issue of collaboration, and its importance, is further developed in the next section on governance and privacy.

Governance and Privacy


As described above, one of the defining characteristics of Big Data is its complexity: that is, the degree to which an organization's data are drawn from, and connected to, data in other departments and other organizations. As such, a large portion of the Big Data literature is focused on understanding the networked nature of Big Data and the challenges it presents. Two key issues emerge from this literature: governance and personal privacy.
First, while an extensive body of work has developed that explores the
issue of collaboration, particularly in the public sector, very little of this work
has focused on the issue of collaboration with respect to data. This omission
is non-trivial. Data present unique challenges for collaborative forms of gov-
ernance. For example, even in cases where public organizations collect significant amounts of data, those data tend to be fragmented. More precisely, public agencies often operate in silos when it comes to their information technologies; there is limited, if any, interoperability among the information systems used across agencies. In and of itself, this data fragmentation is not problematic. However, as it relates to the idea of creating and leveraging big(ger) data, fragmentation inhibits the integration of individual datasets into big data.
Coordinating between these different silos is difficult and certainly not costless. The coordination costs associated with data-sharing may be greater than those of other types of collaborative endeavors in the public sector.13
The importance and difficulty of governing data is a major theme to
emerge in the CIO interviews. Indeed, many CIOs stated that poor data gov-
ernance was a significant factor in limiting their efforts to pursue Big Data.
For example, the CIOs noted that because data are localized within particu-
lar departments and agencies, reconciling data across systems is challenging.
In addition, it can be difficult to initiate collaborative efforts that involve sharing data, because there is limited guidance in terms of policy and legal frameworks (Desouza, 2014). This second issue reflects an important theme
in Big Data literature, more generally, that is, the implications of Big Data for
individual privacy.
Much of the power of Big Data arises because of how it connects and finds
correlations between previously disparate sets of data. This dimension of Big
Data has led to concerns about privacy, because these connections can lead to
insights about an individual to which (s)he did not consent. Consider the following case: public agencies in New York came under critical scrutiny for a public disclosure that was an uncalculated, and some might argue emotional, response to a real-time situation. In the wake of the Connecticut shooting incident, a group of researchers obtained, through the Freedom of Information Act, information regarding gun owners living in the suburban counties of Westchester, Rockland, and Putnam in New York. In addition to publishing an article about the licensed gun owners in the neighborhood, the authors also published an interactive visual map that provides information about gun owners' names and addresses (Worley, 2012). The information was published with the intention of providing open knowledge about individuals' possession of arms, but, at the same time, the information presented in the article can assist criminals. Criminals can use this information to target homeowners who do not own guns, or to target homes to steal guns and profit through the sale of illegal stolen guns (Mackey, 2013). The issue of
privacy is particularly acute when we consider Big Data efforts in the
public sector.
A citizen's right to information about the government, and how its decisions and processes might affect his or her personal interest, is considered an essential value of democratic societies (Galnoor, 1975; Piotrowski & Rosenbloom, 2002). Responding to specific instruction from President
Obama, the Office of Management and Budget issued an Open Government
Directive in December 2009. The stated objective of this directive was to direct executive departments and agencies "to take specific actions to implement the principles of transparency, participation, and collaboration," which are argued to be the cornerstone of an open government. Fundamental to
this directive is the development, maintenance, and accessibility of public
data. For example, the directive makes clear that data should be made available online and in open formats. This directive sets the stage for much of the current interest in public sector Big Data. That is, as more public sector data become openly available, the use of Big Data analytical tools seems inevitable.14 By initiating open data programs, government officials are hoping that Big Data techniques can be used to improve transparency and accountability. That said, in both the public and private sectors, these novel data connections are increasingly concerning because it is unclear what might be revealed about particular individuals. The concern is particularly salient because many of the policy domains that generate data in the public sector are also governed by a host of privacy regulations (e.g., the Health Insurance Portability and Accountability Act [HIPAA] in the health care domain or restrictions on micro-level education data).
In sum, the complexity of Big Data datasets introduces two management challenges: governance and privacy. That said, there is a lengthy literature on public sector collaboration and governance. While this literature points to several management issues that will support efforts to effectively develop collaborative systems, data, and Big Data in particular, pose unique challenges that public officials need to consider. More precisely, the governance issues associated with collaborating across agencies, departments, and even working groups are compounded by potential security and privacy issues associated with data. Unfortunately, the literature offers few insights for public
managers and policymakers on how to mitigate the potential privacy issues
around Big Data efforts. However, the CIOs interviewed suggest that, in the absence of clearly defined protocols, leadership and transparency are critical factors for overcoming the governance and privacy issues associated with
Big Data initiatives (Desouza, 2014). For example, many CIOs described a
similar process of creating interdepartmental or interagency working groups
as the initial step in an effective Big Data strategy.

Decision-Making in a Big Data Environment


Not surprisingly, a large portion of the Big Data literature is focused on understanding the ways in which Big Data has improved, or can improve, decision-making. As it relates to the public sector, the value of Big Data is often at the
programmatic level. For example, in the cases of New York and Los Angeles
offered above, Big Data was used to improve programmatic outcomes. One
stream of the literature suggests, however, that Big Data can enhance higher
order decision processes. That is, not only can Big Data enhance particular
programs, but it can also support the creation and development of public
policy, more generally. The underlying logic of this argument is that Big Data
technologies can engage citizens in novel ways and, thus, improve the aggre-
gation and revelation of citizen preferences, with respect to public policies.
In a democratically governed society, a requisite objective of the government is to ensure that its policies, and the subsequent provision of public goods and services, reflect the preferences of its citizens. A well-established finding in political economy, however, is that, because of the heterogeneity of preferences found in large groups, the provision of collective goods will always be suboptimal; society will be provided with a level of public goods that does not account for the heterogeneous preferences of citizens (see, for example, Alesina, Baqir, & Easterly, 1999; Olson, 1965; Samuelson, 1954).
One explanation for this problem is the inability of traditional democratic mechanisms to adequately aggregate preferences (Arrow, 1950). That is, the outcomes of traditional voting mechanisms, majority voting and representative democracy, do not necessarily lead to preferred outcomes.15 In this context, policy outcomes are highly reliant on information offered by policy experts and/or agenda setters, neither of whom necessarily represents the will of the people. An important component of the argument for Big Data applications in the public sector is the promise that they can solve this problem. That is, they can provide information about the preferences of citizens regarding public policies without relying on policy experts. Big Data proponents point to two applications that will allow policymakers to undertake better assessments of the will of the people: prediction markets and sentiment analysis. From an investment point of view, these efforts require new forms of data, whereas the previous examples we have discussed leveraged existing data through novel analytics. Thus, the level of
investment is likely greater than, or at the very least quite different from, that of other forms of Big Data investment. Despite some of the enthusiasm around the potential for Big Data in the policy process, a close reading of the literature suggests to us that the Big Data mechanisms proposed to improve public policies, particularly prediction markets and sentiment analysis, face important limitations.
First, scholars and practitioners have considered how Big Data can take advantage of the "wisdom of crowds," through prediction markets, to predict potential outcomes. These markets, also known as "information markets" or "event futures," are markets where participants trade contracts whose payoff depends on unknown future events (Wolfers & Zitzewitz, 2006). In traditional markets, the equilibrium outcome reflects the market price. In prediction markets, the equilibrium outcome reflects the market's expectation of an outcome. For example, if a contract pays US$1 if an event occurs (and nothing otherwise) and the contract last trades at 30 cents, then the market's expectation of that event occurring is 0.30. Thus, prediction
markets provide a mechanism to access the wisdom of crowds (Surowiecki,
2004) to determine the likelihood of an event occurring. These types of mar-
kets have been shown to be extremely accurate predictors of events, such as
Oscar winners, sales of new products, and presidential elections.16
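The pricing arithmetic in the 30-cent example generalizes to a one-line conversion. This is a minimal sketch of the standard binary-contract logic, not code from any particular market.

```python
def implied_probability(last_price_cents, payoff_cents=100):
    """For a binary contract paying `payoff_cents` if the event occurs (and
    nothing otherwise), the last trade price implies the market's
    probability estimate for the event."""
    return last_price_cents / payoff_cents

# The 30-cent contract from the text implies a 0.30 probability.
p = implied_probability(30)
print(p)  # 0.3

# Consistency check: at that probability, the expected payoff equals the price.
expected_payoff_cents = p * 100 + (1 - p) * 0
assert abs(expected_payoff_cents - 30) < 1e-9
```

The same conversion read in reverse explains why thin markets are unreliable: with few traders, the last price can sit far from any genuine consensus probability.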
As suggested by the title of his recent book, Accelerating Democracy: Transforming Governance Through Technology, John O. McGinnis (2012) argues that prediction markets, and other contemporary forms of technology, can be used to foster better public policies. He argues that policymakers and citizens need to take advantage of the vast stores of data and information currently available, in particular, through the use of policy prediction markets. As platforms for the public to speculate on election and policy outcomes, prediction markets, McGinnis argues, can aggregate vast sums of information from an array of individuals and can, thus, assess the likely effects of policies before they are implemented (McGinnis, 2012, p. 60). Despite McGinnis's optimism, the broader literature seems to suggest a more cautious approach with respect to the potential for Big Data to support public policy.
First, it might not be politically feasible to benefit from trading on the
outcome of critical issues (Green, Armstrong, & Graefe, 2007). For example,
in the wake of the 9/11 terrorist attacks, the Defense Advanced Research Projects Agency (DARPA) established a prediction market to predict events related to
national security, such as regime changes in the Middle East or the likelihood
of terrorist attacks. This market was immediately criticized by a host of poli-
ticians and citizens, and was, subsequently, canceled only 1 day after it was
announced (Wolfers & Zitzewitz, 2006). Second, many public policy issues
are difficult to translate into contracts for prediction markets. Most public policies, even at the local level, are sufficiently complex that characterizing them in a single tradable contract will be difficult. Finally, a requisite
component of effective marketsprediction or otherwiseis a large number
of participants. Thin markets are less likely to reflect true equilibrium out-
comes. In the case of prediction markets, fewer participants will lead to less
reliable predictions. Along this vein, there is limited evidence to suggest that
prediction marketsparticularly for public policy mattersare likely to gen-
erate the requisite levels of participation that would lead to efficient or true
predictions. So while prediction markets provide, in principle, a tool that
could support public policy, there are important limits that, we feel, constrain
its potential.
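The article does not specify a market mechanism, but the kind of belief aggregation McGinnis describes can be sketched with Hanson's logarithmic market scoring rule (LMSR), one standard design for prediction markets. The share quantities and liquidity parameter below are invented purely for illustration:

```python
import math

# Quantities and the liquidity parameter b are invented for illustration.
def lmsr_price(q_yes, q_no, b=100.0):
    """Instantaneous price of the YES contract under the logarithmic
    market scoring rule; readable as the market's aggregate probability."""
    e_yes = math.exp(q_yes / b)
    e_no = math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

def lmsr_cost(q_yes, q_no, b=100.0):
    """LMSR cost function; a trade moving the market from state s to s'
    costs C(s') - C(s), which bounds the sponsor's loss."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

# An untraded market is uninformative: implied probability 0.5.
print(lmsr_price(0, 0))             # 0.5
# After traders buy 50 YES shares, the implied probability rises.
print(round(lmsr_price(50, 0), 3))  # 0.622
```

Loosely, the liquidity parameter `b` echoes the thin-market concern discussed above: when effective liquidity is low, individual trades swing the implied probability sharply, making the market's "prediction" less reliable.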
Another Big Data application that proponents argue will improve policy
outcomes is sentiment analysis.17 Sentiment analysis draws on recent efforts
by various governments to assess the subjective well-being of citizens, rela-
tive to the policy environment.18 Such analytics can provide assessments of
how citizens are responding to proposed changes to legislation. This can take
different forms. For example, analysts could assess real-time data that
emerge during a policy speech or they could consider archived citizen
responses over the evolution of a particular piece of legislation. Either way,
sentiment analysis for public policy draws on the increasing acceptance of
social media as a platform for real-time public communication. For exam-
ple, a group of researchers at Northeastern University and Harvard have
established a project titled Pulse of a Nation, which uses Twitter data to assess
the mood throughout each day.19 Another example is the United Nations
(UN) Global Pulse initiative in collaboration with Crimson Hexagon (a social
media analysis and analytics platform developed at Harvard University). This
effort launched a research project to analyze tweets to understand the senti-
ments, choices, and socioeconomic conditions of people (Lopez & Amand,
2012). These data could be used to correlate changes in sentiment related to
specific policies. As such, sentiment analysis, particularly when undertaken
using Big Data sources such as Twitter, can offer a unique understanding of the
degree to which citizen preferences have been met, or are likely to be met,
through public policies and programs.
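The basic mechanics can be sketched with a toy lexicon-based scorer. The word lists and tweets below are invented, and production systems such as Crimson Hexagon rely on trained models and labeled data rather than fixed word lists:

```python
# The lexicon and tweets are invented; production systems (e.g., Crimson
# Hexagon) use trained models rather than fixed word lists.
POSITIVE = {"support", "great", "good", "approve", "love"}
NEGATIVE = {"oppose", "bad", "terrible", "reject", "hate"}

def tweet_sentiment(text):
    """Score one tweet as (# positive hits) - (# negative hits)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate_sentiment(tweets):
    """Mean per-tweet score: above 0 leans favorable, below 0 unfavorable."""
    return sum(tweet_sentiment(t) for t in tweets) / len(tweets)

tweets = [
    "I support the new transit bill great idea",
    "terrible plan and I oppose this legislation",
    "good step for the city",
]
print(round(aggregate_sentiment(tweets), 2))  # 0.33
```

Tracking such an aggregate score over time, before and after a policy announcement, is the essence of the real-time assessments described above, though at vastly larger scale.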
Sentiment analysis, however, is also subject to important limitations. Not
surprisingly, some of these reflect long-standing challenges in the democratic
process, such as engaging otherwise marginalized groups, limiting the influence
of potentially biased agenda-setters, and validating the sources of information.
More precisely, drawing on existing critiques of social media data in science
and research, we point to three key limitations of this application in the
public sector: the unequal distribution of real-time data, deliberate data
manipulation by key stakeholders, and the limited ability to engage citizens
in policy matters.
First, real-time data, such as those from Twitter, are not necessarily representative of the
population. Moreover, Twitter accounts and users are not equivalent. This
idea and its importance are clearly articulated by Boyd and Crawford (2012):

Some users have multiple accounts. Some accounts are used by multiple
people. Some people never establish an account, and simply access Twitter
using the web. Some accounts are bots that produce automated content
without involving a person. Furthermore, the notion of an active account is
problematic. While some users post content frequently through Twitter, others
participate as listeners . . . Due to uncertainties about what an account
represents and what engagement looks like, it is standing on precarious ground
to sample Twitter and make claims about people and users. (p. 6)

Relatedly, there is the issue of the digital divide: the unequal access to
information and communication technology across socioeconomic groups. For
example, groups at the lower end of the socioeconomic
ladder tend to have less access to computers and information technology.
Thus, even if online platforms such as Twitter reflected true usage, the data
generated from these sources would be biased in familiar ways, favoring the
upper-middle-class segments of American society. Not everyone is online.
Simply mining data created by those who leave digital footprints can produce
misleading results, given that we are sampling only a section of society.
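This sampling concern can be made concrete with a small simulation. The group sizes, approval rates, and online-access rates below are invented solely to illustrate the bias:

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical population of 2,000: each tuple is (group, approves_policy).
# Approval rates (70% vs. 30%) and online rates are invented for illustration.
population = (
    [("high_ses", True)] * 700 + [("high_ses", False)] * 300 +
    [("low_ses", True)] * 300 + [("low_ses", False)] * 700
)
true_approval = sum(approves for _, approves in population) / len(population)

# Online platforms over-sample higher-SES residents.
ONLINE_RATE = {"high_ses": 0.8, "low_ses": 0.2}
online_sample = [approves for group, approves in population
                 if random.random() < ONLINE_RATE[group]]
online_approval = sum(online_sample) / len(online_sample)

print(true_approval)                    # 0.5
print(online_approval > true_approval)  # True: the online estimate skews high
```

Because the higher-approval group is four times more likely to appear online, the "online" approval estimate lands well above the true 50%, even though every individual response is recorded accurately.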
Building upon the idea that Twitter accounts are not equivalent, a sec-
ond, and important, critique of using Twitter (or other real-time data) for
sentiment analysis is that it can be subject to manipulation. For example, in
2011, the Obama administration was weighing approval of the Keystone XL
pipeline project, which would carry tar sands oil from Alberta to Texas
(Olson & Efstathiou, 2011; Sheppard, 2011). To counter the concerns expressed
by stakeholders, the American Petroleum Institute and other lobbyists
manipulated social media sentiment to show support for the project. Using fake
Twitter accounts, they sent an inordinate number of tweets in support of the
project, which did not accurately represent the sentiment on the ground. While
this example is perhaps more nefarious than typical, it does reflect a real
concern regarding the potential for manipulating data to sway public policy.
An additional critique of both prediction markets and sentiment analysis is
based on the underlying assumption of engagement. That is, proponents of
these approaches seem to assume that the current lack of engagement, and the
subsequent loss of information in the policy process, is due to limits in
technology. As it relates to serious policy deliberations and discussions, we
disagree. Much of the limited engagement is more likely a function of the
cumbersome nature of the democratic process more generally. Given the length
of time it takes to enact laws, change codes, revise policies, and so on, it
seems unlikely that, regardless of the technology, citizens will stay engaged
throughout the process in a way that lets policymakers take advantage of
real-time data. It is more likely that citizens would
participate in a priori or post hoc opinion surveys. To be clear, this critique
does not imply that Big Data efforts should be avoided but only that they are
unlikely to replace the existing data efforts and should subsequently be
viewed, more reasonably, as another input into the policy-making process.
In sum, Big Data approaches to informing the policy process are limited in
important ways. Both prediction markets and sentiment analysis using real-time
data seem unlikely to represent the populace better than traditional
democratic mechanisms. In addition, prediction markets are difficult to
implement for complex or controversial policies. Thus, the claim that Big
Data, and its related technologies and analytical approaches, offers novel
opportunities for policymakers to assess the true preferences of the populace,
and subsequently make better policies, seems overstated. Both these Big
Data tools suffer from important limitations that constrain their ability to
broadly enhance public policy. That said, this circumspect view suggests an
important research agenda. Scholars need to explore the policy areas where
different Big Data efforts, particularly those that use novel data collection
and aggregation tools, can best support discourse around public policy.

The End of Theory


In the first section of our article, the bounds of data, we defined Big Data
in terms of its underlying structure: volume, velocity, variety, and
complexity. An important stream of the literature, however, offers a different tack on
the issue of defining Big Data. Many argue that Big Data is less about a
change in the structure of data, as much as it is a change in the way we think
about research and analytics (Boyd & Crawford, 2012; Mayer-Schonberger
& Cukier, 2013). The nature of this shift is defined in terms of a move away
from causal theories and toward simple correlations. The logic underly-
ing this shift is that in a Big Data environment, correlations provide clear
evidence that an event is occurring. And this is enough information upon
which to base decisions. For example, if we can save money by knowing the
best time to buy a plane ticket without understanding the method behind
airfare madness, that's good enough (Mayer-Schonberger & Cukier, 2013, p.
55). This idea is summarized nicely in the following quote from Chris
Anderson's (2008) controversial essay about Big Data, "The End of Theory":

This is a tool where massive amounts of data and applied mathematics replace
every other tool that might be brought to bear. Out with every theory of human
behavior from linguistics to sociology. Forget taxonomy, ontology, and
psychology. Who knows why people do what they do? The point is that they do
it, and we can track and measure it with unprecedented fidelity. With enough
data, the numbers speak for themselves.

That said, simple correlations may not serve the longer-term goals of
public organizations. More precisely, because of the wicked nature of the
problems that characterize the work of public sector organizations, it seems
that correlations may be necessary but certainly not sufficient.
Wicked problems are defined by uncertainty. The problems themselves
are often ill-defined, and the solution-set is often ambiguous. Moreover,
many of the solutions or programmatic options involve important tradeoffs.
Examples of wicked problems in the public sector include poverty, homeless-
ness, homeland security, and sustainability. In this context, analytics, Big
Data or otherwise, are necessary but not sufficient conditions for effective
programmatic decision-making (see, for example, deLeon & Denhardt, 2000;
Durant & Legge, 2006; Roberts, 2002).
While Big Data efforts may offer analysts novel insights into problems, they
are equally likely, given the complexity of public problems, to surface a host
of spurious relationships. This concern, to some degree, underlies some of the
current debate on police tactics such as stop and frisk and the National
Security Agencys monitoring of phone records and conversations. Stated
differently, public programs driven by findings on correlations are unlikely to
address underlying social issues and could lead to a misallocation of
resources.
In the two cases described in the second section of this article, the LAPD
and the Office of Policy and Strategic Planning in New York City used Big
Data to achieve programmatic outcomes, reducing crime and improving pub-
lic safety. To some degree, applying this approach to public sector organiza-
tions makes sense. That is, from the analytical perspective described in this
section, a critique of data-centrism would be perceived as missing the
point. From the data-centered point of view, the salient issue is that analysts
were able to discern previously unperceived correlations, and these correlations
helped address key social problems. If programmatic outcomes are improved
through a novel (Big Data) analytical approach, the definitional issue (and
the related resources it requires) is less problematic. That said, like others, we
caution against letting the pendulum swing too far in this direction. For
example, scholars have pointed out that, while for some stakeholders the
correlation is all that will matter, for policymakers such correlative insights
may be less valuable. From a policymaker's point of view, understanding
why and how correlations matter, that is, what the causal issues are, will be
critical for making long-term decisions. To appreciate this insight, consider
again the example of the New York Office of Policy and Strategic Planning.
From a programmatic, data-centered point of view, this example is clearly
a Big Data success story. The data helped focus resources to better address a
critical public issue. That said, the correlation said nothing about the
underlying causes of illegal conversions, such as the low supply of housing
for lower-income residents.
As such, it seems plausible (if not likely) that illegal conversions will just
shift to different types of buildings. The inspectors and analytical department
will continue to show success in terms of the numbers of conversions fixed,
but the underlying social issue will remain. The point of this then, is that even
if Big Data offers valuable insights that support the day-to-day operations of
a public organization, public managers and policymakers need to ensure that
they do not lose sight of the broader issues the public sector needs to
address.
Consider this case: The city of Boston introduced the Street Bump app,
which automatically detects potholes and sends reports to city administrators.
As residents drive, the app collects data on the smoothness of the ride, which
could potentially aid in planning investments to fix the city's roads. After
the launch of the app, however, it was found that the program directed crews
to wealthier neighborhoods, because residents there were more likely to have
smartphones (Rampton, 2014). This example illustrates that public agencies
cannot simply leverage technologies to address urban challenges; they need to
think through several dimensions. Who uses the application? Are those users
representative of all sections of society? Data ownership also becomes a
crucial issue: Who owns the data? How can people be used as sensors without
violating their privacy? Incidents like these will affect cities' resilience
and livability.
While this is an important stream of the Big Data literature, the idea of
true causal relationships seems far from the minds of public sector CIOs.
Indeed, the interview data suggest that public organizations are so early in
their data efforts that they have not yet been able to consider the full
potential of Big Data analytics. As noted in Desouza's (2014) report:

CIOs overwhelmingly report that they are just getting started with big data
efforts. While big data as a concept has been discussed in the popular press and
the academic literature for years, public agencies have not yet fully embraced
the concept.

For scholars, this suggests an opportunity to examine and assess the analyti-
cal approaches currently being used by public agencies. In doing so, we can
better understand the causal connections between data, policy, public pro-
grams, and the social issues they are supposed to address.

Conclusion
We were motivated to write this article by what we saw as a dearth of scholar-
ship on Big Data in the public sector. That is, the extant literature offers few,
explicit insights about the limits and potential of Big Data in the public sec-
tor. Thus, drawing on the broad literature on Big Data, as well as data from
interviews with public sector CIOs, we identified some important bounds
to the potential for Big Data. Throughout the article, we offered lessons for
practitioners with respect to developing a Big Data program. That said, we
also hinted at some ways that scholars can support Big Data programs in the
public sector. The primary insight we offer for scholars, however, is that there
is room, indeed a need, for the development of a systematic research
agenda. In this concluding section, we highlight the components of this
proposed agenda.
First, what types of data characterize public organizations? This question
could lead to an important typology of public organizations and how they are,
or could, use different types of data. Second, how are public officials
overcoming the privacy issues associated with the data sharing that is
fundamental to Big Data programs? Third, given the nascent nature of Big Data
in the public sector, most related efforts have targeted the low-hanging fruit
found at the programmatic level. That said, some recent
scholarship suggests that the true value of Big Data lies in its ability to
enhance public policy-making more generally. This literature is unsettled, at
best, and subsequently there is a great deal of work to be done exploring (a)
the degree to which this is true (i.e., can Big Data enhance public policy-
making?) and (b) which public policy domains are best suited for Big Data
analytics. Finally, we described a growing body of work that pushes
against the Big Data narrative that analytics need to focus exclusively, or at
least primarily, on correlations. Scholars and analysts writing in this area
have noted that an emphasis on correlations comes at the expense of under-
standing the underlying causal relationships. In some organizational con-
texts, this might be a trivial omission. In the public sector, however, where
many of the problems being addressed are wicked in nature, understanding
the causal connection between factors is critical for developing policies and
programs that address the longer-term components of the problem. From the
point of view of future research, we argue that scholars should look closely at
how data are currently being used and assess the degree to which they are
being underutilized.
Big Data offers the potential to address many public sector problems.
There is, however, some tension between the promise of Big Data and reality.
In this article, we have looked to the extant literature for lessons that will
support public officials in their efforts to leverage Big Data. That said, for
the potential of Big Data to be realized, we argue that scholars have an
important
role to play. With this in mind, we have also set forth the beginnings of a
research agenda that considers the limits and potential of Big Data in the
public sector.

Declaration of Conflicting Interests


The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research,
authorship, and/or publication of this article: Kevin C. Desouza gratefully
acknowledges funding received from the IBM Center for the Business of Government.

Notes
1. See Code for America website (http://codeforamerica.org/) for more information
on these types of apps. That said, we will demonstrate that these types of open
data exercises are, in most instances, not truly representative of Big Data. They
are novel and interesting, but not Big Data.
2. As described in a recent article in the Atlantic Cities: We are hearing from the
mayors office like, a cat got stuck in a tree, can we have a hackathon to get it
down? (Badger, 2013).
3. Most recently, in a series of articles in The New York Times, Paul Krugman
and James Glanz offer competing views on the potential benefits of Big Data
for economic productivity (see http://krugman.blogs.nytimes.com/2013/08/18/
the-dynamo-and-big-data/?_r=0).
4. The full findings from the study are reported in Desouza (2014).
5. The list of characteristics seems to be growing. The 3Vs were popularized in
Laney (2001).
6. This of course suggests that, as storage becomes cheaper and analytical tools
become more powerful, Big Data, defined purely in terms of volume, will be
a moving target (Manyika et al., 2011).
7. For more insights on the volume of data being created, see Bohn and Short
(2010); Bohn, Short, and Baru (2011); and Shapiro and Varian (1998).
8. A recent study revealed that the Google team has been overestimating flu
outbreaks since 2011; its predictions are two points off compared with
Centers for Disease Control and Prevention (CDC) estimates. In addition, using
its traditional methods of estimation, the CDC has been accurately predicting
flu outbreaks (Lazer, Kennedy, King, & Vespignani, 2014).
9. A similar framework has been put forward by Birnhack (2014).
10. For more details on this particular case, see Franks (2012).
11. These datasets are not big, just large. They are still analyzed using
traditional analytical tools (e.g., SPSS, Excel) and hence do not meet the
standard definition of Big Data.
12. www.lapdonline.org/home/pdf_view/39375
13. John Bryson has written (with a variety of coauthors) extensively on this issue,
particularly as it relates to the sharing of information and resources (see, for
example, Bryson, Ackermann, & Eden, 2007; Bryson, Crosby, & Stone, 2006).
14. Similar open data efforts have been initiated at the local level. For example, a
recent newsletter from Alliance for Innovation, titled The Digital Future: Open
Data to Open Doors highlights open data initiatives in Hawaii, Austin, Texas,
and Palo Alto, California.
15. See almost any introductory textbook on political economy. For a simple
exposition of this issue, see Jonathan Gruber's (2014) textbook.
16. See Wolfers and Zitzewitz (2006).
17. McGinnis refers to this as dispersed media.
18. For example, the City of Santa Monica received one of five US$1 million
awards granted through the Bloomberg Philanthropies Mayors Challenge. This
effort is focused on the development of a Local Well-Being Index, a dynamic
measurement tool that will provide a multidimensional picture of our
community's strengths and challenges across key elements of well-being
(economics, social connections, health, education & care, community
engagement, and physical environment). The index is intended to help city
officials make data-driven decisions and target resource allocation.
http://www.smgov.net/uploadedFiles/Wellbeing/Project-Summary.pdf
19. http://www.ccs.neu.edu/home/amislove/twittermood/

References
Alesina, A., Baqir, R., & Easterly, W. (1999). Public goods and ethnic divisions.
Quarterly Journal of Economics, 114, 1243-1284.
Anderson, C. (2008, June 23). The end of theory: The data deluge makes the scien-
tific method obsolete. Wired. Retrieved from http://archive.wired.com/science/
discoveries/magazine/16-07/pb_theory
Arrow, K. J. (1950). A difficulty in the concept of social welfare. Journal of Political
Economy, 58, 328-346.
Badger, E. (2013). Are civic hackathons stupid? The Atlantic cities: Place matters.
Retrieved from http://www.theatlanticcities.com/technology/2013/07/are-hack-
athons-stupid/6111/
Birnhack, M. (2014). S-M-L-XL data: Big data as a new informational privacy para-
digm. In Big data and privacy: Making ends meet (Future of Privacy Forum).
The Center for Internet and Society, Stanford Law School. Retrieved from http://
www.futureofprivacy.org/big-data-privacy-workshop-paper-collection/
Bohn, R., & Short, J. (2010). How much information 2009: Report on American con-
sumers. San Diego: Global Information Industry Center, University of California,
San Diego.
Bohn, R., Short, J., & Baru, C. (2011). How much information 2010: Report on enter-
prise server information consumers. San Diego: Global Information Industry
Center, University of California, San Diego.
Boyd, D., & Crawford, K. (2012). Critical questions for big data. Information,
Communication & Society, 15, 662-679.
Bryson, J. M., Ackermann, F., & Eden, C. (2007). Putting the resource-based view
of strategy and distinctive competencies to work in public organizations. Public
Administration Review, 67, 702-718.
Bryson, J. M., Crosby, B. C., & Stone, M. M. (2006). The design and implemen-
tation of cross-sector collaborations: Propositions from the literature. Public
Administration Review, 66, 44-56.
Chow, B. (2012, February 8). LAPD pioneers high tech crime fighting war room.
Retrieved from: http://losangeles.cbslocal.com/2012/02/08/lapd-pioneers-high-
tech-crime-fighting-war-room/
deLeon, L., & Denhardt, R. B. (2000). The political theory of reinvention. Public
Administration Review, 60, 89-97.
Desouza, K. C. (2014). Realizing the promise of big data. Washington, DC: IBM
Center for the Business of Government.
Durant, R. F., & Legge, J. S. (2006). Wicked problems, public policy, and adminis-
trative theory: Lessons from the GM food regulatory arena. Administration &
Society, 38, 309-334.
Franks, B. (2012). Taming the big data tidal wave. Hoboken, NJ: John Wiley.
Galnoor, I. (1975). Government secrecy: Exchanges, intermediaries and middlemen.
Public Administration Review, 35, 32-42.
Green, K. C., Armstrong, J. S., & Graefe, A. (2007). Methods to elicit forecasts from
groups: Delphi and prediction markets compared. Foresight: The International
Journal of Applied Forecasting, 8, 17-20.
Gruber, J. (2014). Public finance and public policy (4th ed.). New York, NY: Worth
Publishers.
Howard, A. (2012). Predictive data analytics is saving lives and taxpayer dollars in
New York City. Retrieved from http://strata.oreilly.com/2012/06/predictive-data-
analytics-big-data-nyc.html
Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big data issues and chal-
lenges moving forward. Proceedings from the 46th Hawaii International Conference
on System Sciences. Retrieved from http://www.cse.hcmut.edu.vn/~ttqnguyet/
Downloads/SIS/References/Big%20Data/(2)%20Kaisler2013%20-%20Big%20
Data-%20Issues%20and%20Challenges%20Moving%20Forward.pdf
Laney, D. (2001). 3D data management: Controlling data volume, velocity, and
variety. META Group. Retrieved from http://blogs.gartner.com/doug-laney/
files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-
and-Variety.pdf
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google
flu: Traps in big data analysis. Science, 343, 1203-1205.
Lopez, G., & Amand, W. S. (2012). Discovering global socio-economic trends hid-
den in big data. Retrieved from http://www.unglobalpulse.org/discoveringtrend
sinbigdata-CHguestpost
Mackey, A. (2013). In wake of Journal News publishing gun permit holder maps,
nation sees push to limit access to gun records. The News Media and The Law,
Winter 2013, 37(1). Retrieved from http://www.rcfp.org/browse-media-law-
resources/news-media-law/news-media-and-law-winter-2013/wake-journal-
news-publishin
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Hung
Byers, A. (2011). Big data: The next frontier for innovation, competition, and
productivity. McKinsey Global Institute.
Mayer-Schonberger, V., & Cukier, K. (2013). Big data: A revolution that will trans-
form how we live, work and think. New York, NY: Houghton Mifflin Harcourt.
McGinnis, J. O. (2012). Accelerating democracy: Transforming governance through
technology. Princeton, NJ: Princeton University Press.
Mervis, J. (2012). Agencies rally to tackle Big Data. Science, 336, 22.
Olson, M. (1965). The logic of collective action: Public goods and the theory of
groups. Cambridge, MA: Harvard University Press.
Olson, B., & Efstathiou, J., Jr. (2011, November 16). Enbridge's pipeline threatens
TransCanada's Keystone XL plan. Bloomberg Businessweek. Retrieved from
http://www.businessweek.com/news/2011-11-16/enbridge-s-pipeline-threatens-
transcanada-s-keystone-xl-plan.html
Piotrowski, S. J., & Rosenbloom, D. H. (2002). Nonmission-based values in results-
oriented public management: The case of freedom of information. Public
Administration Review, 62, 643-657.
Rampton, R. (2014, April 27). White House looks at how big data can discrimi-
nate. Retrieved from http://uk.reuters.com/article/2014/04/27/uk-usa-obama-
privacy-idUKBREA3Q00S20140427
Roberts, N. C. (2002). Keeping public officials accountable through dialogue:
Resolving the accountability paradox. Public Administration Review, 62,
658-669.
Samuelson, P. (1954). The pure theory of public expenditures. Review of Economics
and Statistics, 36, 386-389.
Shapiro, C., & Varian, H. R. (1998). Information rules: A strategic guide to the net-
work economy. Cambridge, MA: Harvard Business Press.
Sheppard, K. (2011, August 24). What's all the fuss about the Keystone XL pipeline?
Mother Jones. Retrieved from http://www.motherjones.com/blue-marble/2011/08/
pipeline-protesters-keystone-xl-tar-sands
Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few
and how collective wisdom shapes business, economies, societies and nations.
New York, NY: Little, Brown.
Wolfers, J., & Zitzewitz, E. (2006). Prediction markets in theory and practice (NBER
Working Paper 12083). Retrieved from http://www.nber.org/papers/w12083.pdf
Worley, D. R. (2012). The gun owner next door: What you don't know about the
weapons in your neighborhood. Retrieved from http://www.lohud.com/apps/pbcs.
dll/article?AID=2012312230056&nclick_check=1

Author Biographies
Kevin C. Desouza is the associate dean for research in the College of Public Programs,
a professor in the School of Public Affairs, and the interim director for the Decision
Theater in the Office of Knowledge Enterprise Development at Arizona State
University.
Benoy Jacob is the director of the Center for Local Government Research and
Training, and is an assistant professor in the School of Public Affairs at the University
of Colorado, Denver.
