ACM
cacm.acm.org 01/2019 VOL.62 NO.01

Face2Face: Real-Time Face Capture and Reenactment of RGB Videos
Quantum Leap
Illegal Pricing Algorithms
Intelligent Systems for Geosciences
Open Collaboration in an Age of Distrust

Association for Computing Machinery
Complexes of physically interacting proteins constitute fundamental functional units that drive almost all biological processes within cells. A faithful reconstruction of the entire set of protein complexes (the “complexosome”) is therefore important not only to understand the composition of complexes but also the higher-level functional organization within cells. In this book, we systematically walk through computational methods devised to date (approximately between 2000 and 2016) for identifying protein complexes from the network of protein interactions (the protein-protein interaction (PPI) network). We present a detailed taxonomy of these methods, and comprehensively evaluate them for protein complex identification across a variety of scenarios, including the absence of many true interactions and the presence of false-positive interactions (noise) in PPI networks. Based on this evaluation, we highlight challenges faced by the methods, for instance in identifying sparse, sub-, or small complexes and in discerning overlapping complexes, and reveal how a combination of strategies is necessary to accurately reconstruct the entire complexosome.
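The identification task described above is, at its core, a graph-clustering problem: proteins are nodes, interactions are edges, and candidate complexes show up as dense subgraphs. As a hedged illustration only — the toy network, thresholds, helper names, and greedy strategy below are invented for this sketch, and the surveyed methods (MCODE, MCL, and the like) are considerably more sophisticated — a density-based search might look like this:

```python
# Toy sketch of density-based complex detection on a PPI network.
# Everything here (network, thresholds, greedy growth) is illustrative.
from itertools import combinations

def density(nodes, adj):
    """Edge density of the induced subgraph: |E| / C(n, 2)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / (n * (n - 1) / 2)

def greedy_complexes(adj, min_density=0.7, min_size=3):
    """Grow a candidate complex greedily from every seed protein."""
    found = []
    for seed in adj:
        cluster, frontier = {seed}, set(adj[seed])
        while frontier:
            # Admit the neighbor that keeps the cluster densest.
            best = max(frontier, key=lambda v: density(cluster | {v}, adj))
            if density(cluster | {best}, adj) < min_density:
                break
            cluster.add(best)
            frontier = (frontier | adj[best]) - cluster
        if len(cluster) >= min_size and frozenset(cluster) not in found:
            found.append(frozenset(cluster))
    return set(found)

# A dense triangle (a putative complex) attached to a sparse tail.
ppi = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
print(greedy_complexes(ppi))  # the triangle A-B-C is reported as a complex
```

The density threshold directly exhibits the trade-off the book evaluates: lower it and sparse attachments (here, protein D) get absorbed into the complex; raise it and sparse or small complexes are missed entirely.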
COMMUNICATIONS OF THE ACM

By Sergio Orenga-Roglá and Ricardo Chalmeta

66 The Church-Turing Thesis: Logical Limit or Breachable Barrier?
In its original form, the Church-Turing thesis concerned computation as Alan Turing and Alonzo Church used the term in 1936—human computation.
By B. Jack Copeland and Oron Shagrir

85 Deception, Identity, and Security: The Game Theory of Sybil Attacks
Classical mathematical game theory helps to evolve the emerging logic of identity in the cyber world.
By William Casey, Ansgar Kellner, Parisa Memarmoshrefi, Jose Andre Morales, and Bud Mishra

About the Cover: This month’s cover story illustrates the essence of Face2Face—an innovative approach for the highly convincing transfer of facial expressions from one source to a target video in real time. Cover illustration by Vault49.

Image by Photo Bank Gallery.
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO: Vicki L. Hanson
Deputy Executive Director and COO: Patricia Ryan
Director, Office of Information Systems: Wayne Graves
Director, Office of Financial Services: Darren Ramdin
Director, Office of SIG Services: Donna Cappo
Director, Office of Publications: Scott E. Delman

ACM COUNCIL
President: Cherri M. Pancake
Vice-President: Elizabeth Churchill
Secretary/Treasurer: Yannis Ioannidis
Past President: Alexander L. Wolf
Chair, SGB Board: Jeff Jortner
Co-Chairs, Publications Board: Jack Davidson and Joseph Konstan
Members-at-Large: Gabriele Anderst-Kotis; Susan Dumais; Renée McCauley; Claudia Bauzer Mederios; Elizabeth D. Mynatt; Pamela Samuelson; Theo Schlossnagle; Eugene H. Spafford
SGB Council Representatives: Sarita Adve; Jeanna Neefe Matthews

BOARD CHAIRS
Education Board: Mehran Sahami and Jane Chu Prey
Practitioners Board: Terry Coatta and Stephen Ibaraki

REGIONAL COUNCIL CHAIRS
ACM Europe Council: Chris Hankin
ACM India Council: Abhiram Ranade
ACM China Council: Wenguang Chen

PUBLICATIONS BOARD
Co-Chairs: Jack Davidson; Joseph Konstan
Board Members: Phoebe Ayers; Edward A. Fox; Chris Hankin; Xiang-Yang Li; Nenad Medvidovic; Sue Moon; Michael L. Nelson; Sharon Oviatt; Eugene H. Spafford; Stephen N. Spencer; Divesh Srivastava; Robert Walker; Julie R. Williamson

ACM U.S. Public Policy Office: Adam Eisgrau, Director of Global Policy and Public Affairs, 1701 Pennsylvania Ave NW, Suite 300, Washington, DC 20006 USA; T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association

STAFF
Director of Publications: Scott E. Delman, cacm-publisher@cacm.acm.org
Executive Editor: Diane Crawford
Managing Editor: Thomas E. Lambert
Senior Editor: Andrew Rosenbloom
Senior Editor/News: Lawrence M. Fisher
Web Editor: David Roman
Editorial Assistant: Danbi Yu
Art Director: Andrij Borys
Associate Art Director: Margaret Gray
Assistant Art Director: Mia Angelica Balaquiot
Production Manager: Bernadette Shade
Intellectual Property Rights Coordinator: Barbara Ryan
Advertising Sales Account Manager: Ilia Rodriguez
Columnists: David Anderson; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission: permissions@hq.acm.org
Calendar items: calendar@cacm.acm.org
Change of address: acmhelp@acm.org
Letters to the Editor: letters@cacm.acm.org

WEBSITE: http://cacm.acm.org
AUTHOR GUIDELINES: http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701; T (212) 626-0686; F (212) 869-0481
Advertising Sales Account Manager: Ilia Rodriguez, ilia.rodriguez@hq.acm.org; Media Kit: acmmediasales@acm.org

EDITORIAL BOARD
Editor-in-Chief: Andrew A. Chien, eic@cacm.acm.org
Deputy to the Editor-in-Chief: Lihan Chen, cacm.deputy.to.eic@gmail.com
Senior Editor: Moshe Y. Vardi
News — Co-Chairs: Marc Snir and Alain Chesnais; Board Members: Monica Divitini; Mei Kobayashi; Michael Mitzenmacher; Rajeev Rastogi; François Sillion
Viewpoints — Co-Chairs: Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom; Board Members: Stefan Bechtold; Michael L. Best; Judith Bishop; Andrew W. Cross; Mark Guzdial; Haym B. Hirsch; Richard Ladner; Carl Landwehr; Beng Chin Ooi; Francesca Rossi; Loren Terveen; Marshall Van Alstyne; Jeannette Wing; Susan J. Winter
Practice — Co-Chairs: Stephen Bourne and Theo Schlossnagle; Board Members: Eric Allman; Samy Bahra; Peter Bailis; Betsy Beyer; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Jessie Frazelle; Benjamin Fried; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker
Contributed Articles — Co-Chairs: James Larus and Gail Murphy; Board Members: William Aiello; Robert Austin; Kim Bruce; Alan Bundy; Peter Buneman; Jeff Chase; Carl Gutwin; Yannis Ioannidis; Gal A. Kaminka; Ashish Kapoor; Kristin Lauter; Igor Markov; Bernhard Nebel; Lionel M. Ni; Adrian Perrig; Marie-Christine Rousset; Krishan Sabnani; m.c. schraefel; Ron Shamir; Alex Smola; Josep Torrellas; Sebastian Uchitel; Hannes Werthner; Reinhard Wilhelm
Research Highlights — Co-Chairs: Azer Bestavros and Shriram Krishnamurthi; Board Members: Martin Abadi; Amr El Abbadi; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; David Brooks; Stuart K. Card; Jon Crowcroft; Alexei Efros; Bryan Ford; Alon Halevy; Gernot Heiser; Takeo Igarashi; Sven Koenig; Greg Morrisett; Tim Roughgarden; Guy Steele, Jr.; Robert Williamson; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller
Special Sections — Co-Chairs: Sriram Rajamani and Jakob Rehof; Board Members: Tao Xie; Kenjiro Taura; David Padua
Web Board — Chair: James Landay; Board Members: Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2019 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@hq.acm.org or fax (212) 869-0481. For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER: Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA.

Association for Computing Machinery (ACM), 2 Penn Plaza, Suite 701, New York, NY 10121-0701; T (212) 869-7440; F (212) 869-0481. Printed in the USA.
Open Collaboration in an Age of Distrust

FOR OVER 30 YEARS, computing has been pursued in an environment of trust with computing research advances and publications shared openly within a truly integrated international community. At the heart is the explosive 20-year rise of open source software [a]—shared touchstones sufficient to build enterprise-scale software systems and giving rise to multibillion-dollar companies and entire new service sectors.

The bounty of open sharing is the rapid advance of computing technologies—the Internet, WWW, and a wide variety of Internet and cloud services. Equally important, open source sharing has been a boon for education, building an open international community that included developed countries in Europe and North America as well as developing countries such as Brazil, Russia, India, and China. All have contributed and benefitted tremendously in return.

The global backdrop for computing’s open sharing was an environment of international trust and secular trend toward global integration of economy and society. We are manifestly in a new era of international relations—“An Age of Distrust”—where the trend toward increased trade and integration has stalled, if not reversed. And, a new superpower competition between the U.S. and China for global scientific, economic, and other forms of leadership is reshaping perspective and strategy. [b]

It is time for the computing community to begin thinking and discussing what it means to engage in open collaboration in an Age of Distrust. Why must the computing community change?

While computing has supported military technology (design) and tactics (gunnery tables) from its earliest days, [c] they were not the direct tools of aggression. The evidence is undeniable that computing is now a dual-use technology with capability for direct aggression.

˲˲ Cybersecurity technologies are used extensively as instruments of aggression by governments and non-governmental organizations for industrial espionage, sabotage, and subversion of elections, [d] and even entire countries’ infrastructure. Cybersecurity technology is used for asymmetric attacks on the wealthy and powerful—nations, companies, CEO’s, but can also be turned on the poor, weak, and individuals.

˲˲ Artificial intelligence technologies have growing capabilities for surveillance, espionage, and more intimidating potential to create autonomous and robotic systems. So serious are these concerns that leading AI researchers have called for a ban on development of autonomous weapons, [e] and others have protested and prevented their company’s participation in military applications. [f] Most countries believe AI is not only commercially important, but also strategic for intelligence and warfare in cyberspace and the physical world.

…ditional notions of control [g] irrelevant. Companies face increasing assertion of national sovereignty and control—government access to data, citizen data privacy rights, even information control. [h] Universities and research institutes face increasing questions about whom to collaborate with, to share information with, and to allow to work on projects. At issue are the ethical and moral implications of research. Export control regulations proliferate, “deemed export” is increasingly challenging, and new regulations controlling information sharing and research seem likely.

Within science, the physics community has faced these concerns for much of the 20th century, and recently so has the biology community. Within computing, the cryptography community is no stranger to these concerns. We should seek to learn from them.

Let me be clear, I am not advocating banning, control, or classification of research topics. The computing community is too large and international for any single country or organization to limit the progress in computing technologies. However, such efforts will inevitably arise, so we, as computing professionals, must begin the difficult conversations of how to shape the development and use of technologies so that they can be a responsible and accountable force in society.

Let’s begin the conversation!

Andrew A. Chien, EDITOR-IN-CHIEF

[a] S. Phipps. Open source software: 20 years and counting, (Feb. 3, 2018); opensource.com
[b] China v America: The end of engagement, the world’s two superpowers have become rivals. Economist, (Oct. 18, 2018); J. Perlez. U.S.-China clash at Asian summit was over more than words. NY Times, (Nov. 19, 2018).
[c] History of Computing Hardware; https://bit.ly/2IHzgP4
[d] M.S. Schmidt and D.E. Sanger. 5 in China army face U.S. charges of cyberattack. NY Times, (May 19, 2014); A. Greenberg. How an entire nation became Russia’s test lab for cyberwar. WIRED, (June 20, 2017); The untold story of NotPetya, the most devastating cyberattack in history. WIRED, (Aug. 22, 2018).
[e] Autonomous weapons: An open letter from AI & robotics researchers; https://futureoflife.org/open-letter-autonomous-weapons/
[f] D. Wakabayashi and S. Shane. Google will not renew Pentagon contract that upset employees. NY Times, (June 1, 2018).
[g] UN Office for Disarmament Affairs. Treaty on the Non-Proliferation of Nuclear Weapons; https://bit.ly/2gxxd2j
[h] E.C. Economy. The great firewall of China: Xi Jinping’s Internet shutdown. The Guardian, (June 29, 2018); European Union: General data protection regulation; https://gdpr-info.eu/
DOI:10.1145/3292820
Vinton G. Cerf

A People-Centered Economy

Innovation for Jobs (i4j.info) recently published a book [a] describing a new, people-centered view of work. In some ways, this is a kind of revolutionary Copernican view of work. Rather than organizing work around tasks, the idea is to organize work around people and their skills. One thesis of this book is that organizing work around tasks leads companies to focus on reducing the cost of tasks by increasing productivity, reducing the need for people to do work. Automation and robotics derive their attraction in part from this incentive. An alternative view seeks to increase the value of people by maximizing their utility and shaping work/jobs around their strengths. I have written before about strengths and noted, in particular, the Gallup Corporation’s StrengthsFinder application [b] that helps people discover and rank-order the skills and capabilities they have.

As we ponder the future of work, it is important to recognize how essential work is to global socioeconomic conditions and how important it is to the individuals who perform it. In a world in which money is the primary medium of exchange, payment for work is essential. The authors of The People Centered Economy recognize that much effort has gone into encouraging people to spend more (think advertising), but not so much into helping people earn more (that is, to make themselves more valuable). Meaningful work is fulfilling and payment for it enables people to support their families and participate in the economy.

In capitalist societies, there is typically a distinction made between owners and workers. The owners participate in the value of the company while the workers are paid to work. This distinction creates a disparity between these two cohorts, particularly in the case of successful companies. With relatively few exceptions, the workers do not participate in the value of the company except to the extent they are paid for their work. Stockholders (that is, owners) participate in the value of the company. Gallup is an exception, for example, because the company is owned by its employees who participate in the value of the company as well as being paid for their work. Without the efforts of the workers the company would not have value so the idea that the workers and owners ought to be the same cohort has a great deal of attraction. Wealth creation is tied to ownership and the work that creates value. One can see the attraction of linking these together in the form of owner-workers.

Making people more valuable is also tied to the capacity to produce value. Increasing skills and knowledge increases the potential to do valuable work so education is part of the equation. We are seeing new forms of education emerging, partly through online access to information and partly as a consequence of longer lives and thus longer careers. No longer does it seem possible to learn for a while, earn for a while, and then retire. Careers may extend over periods of six decades or more during which time technology will have changed society and its needs dramatically. Continued learning will be needed during the course of a working career. Indeed, long-lived people may have multiple careers over time.

As we contemplate the future of work, it seems inescapable that technology will play a major role in increasing human ability to do work that is of value to the society. While there is a popular meme today that seeks to demonize automation and robotics, the alternative view is that these technologies will enhance our ability to do productive work. I see them as a means for augmenting our capacity to be productive and innovative, making each of us potentially more valuable to each other and our society.

[a] The People Centered Economy: The New Ecosystem for Work. IIIJ Foundation, 2018. ISBN: 1729145922.
[b] https://www.gallupstrengthscenter.com/home/en-us/benefits-of-cliftonstrengths-34-vs-top-5

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author/owner.
THE COMPUTING FIELD went through a perfect storm in the early 2000s: the dot-com and telecom crashes, the offshoring scare, and a research-funding crisis. After its glamour phase in the late 1990s, the field seems to have lost its luster, and academic computing enrollments have declined precipitously. This was referred to as the Image Crisis. We seem to be going through another image crisis, of a different nature, these days. Last year the columnist Peggy Noonan described Silicon Valley executives as “moral Martians who operate on some weird new postmodern ethical wavelength.” Niall Ferguson, a Hoover Institution historian, described cyberspace as “cyberia, a dark and lawless realm where malevolent actors range.” Salesforce’s CEO, Marc Benioff, declared: “There is a crisis of trust concerning data privacy and cybersecurity.”

Many view this crisis as an ethical crisis. The Boston Globe asserted in March 2018, “Computer science faces an ethics crisis. The Cambridge Analytica scandal proves it!” The New York Times reported in October 2018, “Some think chief ethics officers could help technology companies navigate political and social questions.” Many academic institutions are hurriedly launching new courses on computing, ethics, and society. Others are taking broader initiatives, integrating ethics across their computing curricula. The narrative is that what ails tech today is a deficit of ethics, and the remedy, therefore, is an injection of ethics.

This narrative, however, leaves me deeply skeptical. It is not that I am against ethics, but I am dubious of the diagnosis and the remedy. As an example, consider the Ford Model T, the first mass-produced and mass-consumed automobile. The Ford Model T went into production in 1908 and started the automobile age. With the automobile came automobile crashes, which today kill annually more than 1,000,000 people. But the fatality rate has been going down for the past 100+ years. Reducing the fatality rate has been accomplished by improving the safety of automobiles, the safety of roads, licensing of drivers, drunk-driving laws, and the like. The solution to automobile crashes is not ethics training for drivers, but public policy, which makes transportation safety a public priority.

Last year I wrote [a] on how “information freedom” leads Internet companies to use targeted advertising as their basic monetization mechanism, which requires them to collect personal data and offer it to their advertisers. The social scientist Shoshana Zuboff described this business model in 2014 as “surveillance capitalism.” There is a direct line between this business model and the 2018 Facebook–Cambridge Analytica scandal, when it was revealed that Cambridge Analytica collected personal data of millions of people’s Facebook profiles without their consent and used it for political purposes. We must remember, however, that the advertising-based Internet business is enormously profitable. It is unlikely Internet companies will abandon this lucrative business model because of some ethical qualms, even under Apple’s CEO Tim Cook’s blistering attack on the “data industrial complex.”

The problem with surveillance capitalism is not that it is unethical, but that it is completely legal in many countries. It is unreasonable to expect for-profit corporations to avoid profitable and legal business models. In my opinion, the criticism of Internet companies for “unethical” business models is misguided. If society finds the surveillance business model offensive, then the remedy is public policy, in the form of laws and regulations, rather than an ethics outrage. Of course, public policy cannot be divorced from ethics. We ban human-organ trading because we find it ethically repugnant, but the ban is enforced via public policy, not via an ethics debate.

The IT industry has successfully lobbied for decades against any attempt to legislate/regulate IT public policy under the mantra “regulation stifles innovation.” In response to the investigation of Tesla’s CEO Elon Musk by the U.S. Securities and Exchange Commission for possible securities-law violation, a recent Wired magazine headline proclaimed, “The case against Elon Musk will chill innovation!” Of course regulation chills innovation. In fact, the whole point of regulation is to chill certain kinds of innovation, the kind that public policy wishes to chill. At the same time, regulation also encourages innovation. There is no question that automobile regulation increased automobile safety and fuel efficiency, for example. Regulation can be a blunt instrument and must be wielded carefully; otherwise, it can chill innovation in unpredictable ways. Public policy is hard, but it is better than anarchy. [b]

Do we need ethics? Of course! But the current crisis is not an ethics crisis; it is a public policy crisis.

[a] https://bit.ly/2FvmGGt
[b] See the Point/Counterpoint debate in the December 2018 issue.

Moshe Y. Vardi (vardi@cs.rice.edu) is the Karen Ostrum George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University, Houston, TX, USA. He is the former Editor-in-Chief of Communications.

Copyright held by author.
DOI:10.1145/3290404 http://cacm.acm.org/blogs/blog-cacm
Questions About …

…vented computers in the first place. Yet, the new school curricula across the world have lost focus on hardware and …

Drawing 2. Drawing 3.

…they were often insistent that there should be a fan in there. They knew that there would be wires inside, and that it would need a battery to make it work. The child who created Drawing 1 has made a nice job of piecing together a possible design from what they knew about computers—can you spot what is missing, though?

The artist of Drawing 2 knows there is a chip inside (made by HP, in this case) and to their credit, they know there is code, too. Notice that the code is not physically located on the memory or the chip, but somewhere in the wires. In general, there was some puzzlement about how code related to the computer, as exemplified by the artist of Drawing 3, who confessed, “I know a computer is full of code and all devices. I am not sure what it looked like, so I just scribbled.”

Often, the children spent a while thinking about what is outside the computer and how information might get inside. It was quite common to see pictures in which the artist had folded the page to show this distinction but it was often a mystery how pressing a key or touching the screen might make something happen in the computer. Children who had spent time tinkering with computers at home had an advantage here: “I broke my keyboard once and I saw what was inside. It would send a signal from key to computer to the monitor.”

What the pictures and subsequent classroom discussions told me is that the children know names of components within a computer, and possibly some isolated facts about them. None of the pictures showed accurately how the components work together to perform computation, although the children were ready and willing to reason about this with their classmates. Although some of the children had programmed in the visual programming language, none of them knew how the commands they wrote in Scratch would be executed in the hardware inside a computer. One boy, who had been learning about variables in Scratch the previous day, wanted to know whether, if he looked in his computer, he would really see apps with boxes full of variables in them. I love that question, because it reveals the mysterious boundary between intangible, invisible information and the small lump of silicon that processes it.

To be clear, I am not criticizing the children, who were curious, interested, and made perfectly reasonable inferences based on the facts they picked up in their everyday lives. But I think that computer science educators can do better here. Our discipline is built upon the remarkable fact that we can write instructions in a representation that makes sense to humans, and then automatically translate them into an equivalent representation that can be followed by a machine dumbly switching electrical pulses on and off. Children are not going to be able to figure that out for themselves by dissecting old computers or by making the Scratch cat dance. We need to get better at explicitly explaining this in interesting ways.

Children are currently piecing together their everyday experiences with technology with facts that adults tell them to try to make sense of how computers work. This can lead to some confusion, particularly if the adults in their lives are also unsure. One child thought, for example, that if you paid more money, then it would make Wi-Fi stronger. Others were curious about how Wi-Fi works on a train, and whether you really need to stop using your phone on a plane. A student advised the class that if we needed to save space on our phones, then we should delete videos from YouTube. The children, like most Windows users, wanted to know why their computers “freeze,” speculating that it could be because the chip is asleep or that too many people are using Wi-Fi. There was also a sense of wonderment and curiosity. A young boy was fascinated when he read about supercomputers and wanted to know more: Do supercomputers have really big chips in them? A class of 11-year-olds gravely debated whether people would be more or less clever if the computer had never been invented. These are the sorts of questions about computers that children want to explore. It’s our job as computer scientists, and as educators, to help them.

[This article was based on a keynote talk at the Workshop in Primary and Secondary Computing Education (WiPSCE) 2018.]

Judy Robertson is professor of Digital Learning at the University of Edinburgh, U.K.

© 2019 ACM 0001-0782/19/1 $15.00
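The "remarkable fact" the piece points to — human-readable instructions mechanically translated into steps a machine can follow — can be made visible with Python's own translation layer. This is an illustrative aside only (the tiny function is invented for the demo): `dis` prints the bytecode instructions the CPython interpreter actually steps through for a one-line function.

```python
# Disassemble a human-readable function into the interpreter's
# machine-facing instruction list using the standard-library `dis` module.
import dis

def add(a, b):
    return a + b

dis.dis(add)  # prints the LOAD / add / RETURN steps behind `a + b`
```

Showing a class the handful of LOAD and RETURN instructions behind `a + b` is one concrete way to cross the "mysterious boundary" the one boy asked about: the variables are not boxes inside apps, but named slots the instructions load and combine.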
Quantum Leap
A new proof supports a 25-year-old claim
of the unique power of quantum computing.
HOPES FOR QUANTUM computing have long been buoyed by the existence of algorithms that would solve some particularly challenging problems with exponentially fewer operations than any known algorithm for conventional computers. Many experts believe, but have been unable to prove, that these problems will resist even the cleverest non-quantum algorithms. Recently, researchers have shown the strongest evidence yet that even if conventional computers were made much more powerful, they probably still could not efficiently solve some problems that a quantum computer could.
That such problems exist is a long-standing conjecture about the greater capability of quantum computers. "It was really the first big conjecture in quantum complexity theory," said computer scientist Umesh Vazirani of the University of California, Berkeley, who proposed the conjecture with then-student Ethan Bernstein in the 1993 paper (updated in 1997) that established the field.

That work, now further validated, challenged the cherished thesis that any general computer can simulate any other efficiently, since quantum computers will sometimes be out of reach of conventional emulation. Quantum computation is the only computational model, then or now, "that violates the extended Church-Turing thesis," Vazirani said. "It overturned this basic fabric of computer science, and said: 'here's a new kid on the block, and it's completely different and able to do totally different things.'"

Quantum Resources
Conventional "classical" computers store information as bits that can be in one of two states, denoted 0 and 1. In contrast, a quantum degree of freedom, such as the spin of an electron or the polarization of a photon, can exist
JANUARY 2019 | VOL. 62 | NO. 1 | COMMUNICATIONS OF THE ACM 11
news
Bernstein and Vazirani defined a new complexity class called BQP (Bounded Quantum Polynomial), which has access to quantum resources. BQP is closely analogous to the conventional class BPP (Bounded Probabilistic Polynomial), which has access to a perfect random-number generator and must not give a wrong answer too often. Currently, some problems having only stochastic solutions are known, but it is hoped that deterministic, "de-randomized" algorithms will eventually be found for them.

Consulting the Oracle
The relationship of the quantum class BQP to various conventional classes, however, continues to be studied, long after Bernstein and Vazirani suggested it includes problems beyond the scope of conventional techniques. "We have our conjectures and we can feel strongly about them, but every so often they are wrong," Vazirani said. "A proof is really something to be celebrated."

The new proof of separation does not apply to the pure versions of BQP and the other complexity classes addressed by the Vazirani-Bernstein conjecture. Similar to the long-standing unproven relationship of P and NP, "We almost never are able to actually separate these important classes of complexity theory," said computer scientist Ran Raz of Princeton University in New Jersey and the Weizmann Institute in Israel. "We don't know how."

Instead, Raz and his former student Avishay Tal (now at Stanford University) performed what is called an oracle separation. Like its namesake from ancient Greece (or The Matrix movies), an oracle provides answers to profound questions without explaining how it got them. Roughly speaking, Raz and Tal compared the capabilities of quantum and classical algorithms that were given access to an oracle that answers a specific question. Provided with this oracle, they showed the quantum system could solve a carefully chosen problem more efficiently than the classical system could using the same oracle.

Lance Fortnow, a computer scientist at the Georgia Institute of Technology, said hundreds of proofs in complexity theory have relied upon such oracle separations. "They are a way for us to understand what kinds of problems are hard to prove and what kinds of results might be possible, but they're not a definite proof technique," he said.

"We didn't prove a separation between these two classes," Raz agreed. "I can't imagine that [such a separation] will be proved in our lifetime."

"Already there were oracle separations of BQP and NP, BQP and P, and other classes," Raz said. He and Tal now extend the argument to a supercharged class called the polynomial hierarchy, or PH. "This is what is stronger in our result," he said. PH can be viewed as an infinite ladder of classes, starting with P and NP, in which successive rungs can build on the earlier ones by using logical constructions. Later classes invoke the earlier ones rather like a subroutine, for example by defining problems using them in a phrase such as "for every," or "there exists." "Almost all the problems that we encounter in everyday life are somewhere in the polynomial hierarchy," Raz said.

If all NP problems had polynomial-time solutions, though, it turns out that the entire polynomial hierarchy would collapse into one class, PH=NP=P. The new result, though, shows that oracle-assisted BQP would still be separate. "The way I view the Raz-Tal oracle is they're saying that even if P happened to equal NP—that's an unlikely case," Fortnow said, "it's still possible that quantum can do more than classical machines can."

What Is It Good For?
"If we choose the right oracle," Raz said, "we show that there is one problem that BQP will solve better than PH." In addition to choosing the right oracle, he and Tal had to choose a problem that reveals quantum computation's strength—and classical computation's weakness—but they only needed one example.

They adapted an earlier suggestion by Scott Aaronson (then at the Massachusetts Institute of Technology) in which the computer must determine if one sequence of bits is (approximately) the Fourier transform of another. Computing such frequency spectra is a natural task for quantum computations, and Shor's algorithm exploits precisely this strength to identify periodicities that expose prime factors of the target. "The basic ability to do Fourier transformation," Fortnow said, "that's the heart of the power of quantum, at least most of the algorithms we know."

"The hard part is to give the lower bound for the polynomial hierarchy," Raz said. To show that no such algorithm, even with access to the oracle, could solve it efficiently, he and Tal tweaked Aaronson's suggestion so they could apply recent discoveries about pseudorandom sequences.

These and the earlier results illustrate what quantum computers will be able to do, once they get very large and perform like the idealized models, Vazirani said. What is less clear is how to effectively use the less-capable machines that are now being developed. "What will we be able to do with those?" he asked. "That's one of the things that we are working hard to try to figure out."

Further Reading

Bernstein, E. and Vazirani, U.
Quantum Complexity Theory, SIAM J. Comput. 26, 1411–1473 (1997).

Shor, P.W.
Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM J. Comput. 26, 1484–1509 (1997).

Raz, R. and Tal, A.
Oracle Separation of BQP and PH, Electronic Colloquium on Computational Complexity, Report No. 107 (2018).

Don Monroe is a science and technology writer based in Boston, MA, USA.

© 2019 ACM 0001-0782/19/1 $15.00
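The Aaronson-style decision problem mentioned above, deciding whether one ±1 sequence is (approximately) the Fourier transform of another, is easy to sketch classically by brute force. The sketch below is purely illustrative: the vectors, random seed, and thresholds are my own choices, not anything from the Raz-Tal paper. It only shows what the underlying quantity measures, using the Walsh-Hadamard transform as the Boolean Fourier transform:

```python
import numpy as np

def hadamard(n):
    """2^n x 2^n Walsh-Hadamard matrix with entries (-1)^(x . y)."""
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

def forrelation(f, g):
    """How strongly g correlates with the Hadamard (Boolean Fourier)
    transform of f; f and g are +/-1 vectors of length 2^n."""
    n = int(np.log2(len(f)))
    return float(f @ hadamard(n) @ g) / 2 ** (1.5 * n)

rng = np.random.default_rng(0)
n = 6
f = rng.choice([-1.0, 1.0], size=2 ** n)

# "Yes" instance: g copies the signs of f's transform, so the pair is
# strongly related. "No" instance: an independent random g is not.
g_yes = np.where(hadamard(n) @ f >= 0, 1.0, -1.0)
g_no = rng.choice([-1.0, 1.0], size=2 ** n)

print(forrelation(f, g_yes))  # large: g tracks f's spectrum
print(forrelation(f, g_no))   # near zero: no relationship
```

A quantum computer can estimate this correlation with very few queries to f and g in superposition, which is exactly the strength the oracle problem is built to expose.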
DEEP NEURAL NETWORKS (DNNs) have advanced to the point where they underpin online services from image search to speech recognition, and are now moving into the systems that control robots. Yet numerous experiments have demonstrated that it is relatively easy to force these systems to make mistakes that seem ridiculous, but with potentially catastrophic results. Recent tests have shown autonomous vehicles could be made to ignore stop signs, and smart speakers could turn seemingly benign phrases into malware.

Five years ago, as DNNs were beginning to be deployed on a large scale by Web companies, Google researcher Christian Szegedy and colleagues showed making tiny changes to many of the pixels in an image could cause DNNs to change their decisions radically; a bright yellow school bus became, to the automated classifier, an ostrich. But the changes made were imperceptible to humans.

At the time, researchers questioned whether such adversarial examples would translate into the physical domain because cameras would smooth out the high-frequency noise mixed into the digitized images that Szegedy and others were presenting directly to their DNNs. Within several years, examples of real-world attacks appeared. In one case, stickers attached to a stop sign made a DNN interpret it as a 45 m.p.h. (miles per hour) sign even though the word "stop" remained clearly visible.

Although most of the research into subverting DNNs using adversarial examples has been within the realm of image recognition and classification, similar vulnerabilities have been found in networks trained for other applications, from malware classification to robot control. Audio systems such as smart speakers seem just as susceptible to attack using the same concepts. Similar to the effects of camera processing on images, the low-pass filtering of microphones and speakers makes some attacks more feasible than others in the real world.

As a Ph.D. student working with David Wagner at the University of California at Berkeley, Nicholas Carlini started looking at fooling speech engines in 2015 as part of a project to examine the vulnerabilities of wearable devices. The UC Berkeley researchers thought practical wearable devices would rely on speech recognition for their user interfaces. Their focus switched to in-home systems when products such as Amazon's Echo started to become popular.

"We were able to construct audio that to humans sounded like white noise, that could get the device to perform tasks such as open up Web pages," says Carlini, now a research scientist at Google Brain. "It was effective, but it was very clear to anyone who heard it that something was going on: you could hear that there was noise."

In 2017, a team from Facebook AI Research and Bar-Ilan University in Israel showed it was possible to hide messages in normal speech, though a limitation of their so-called Houdini method was that it needed to use replacement phrases, the spoken versions of which were phonetically similar to those being targeted. In November of that year, Carlini found it was possible to push attacks on speech-based systems much further.

"I don't like writing, and for two or three weeks I had been working on a paper and managed to submit it with 15 minutes to go on the deadline. I woke up the next morning and said, 'let's do something fun,'" Carlini explains.

The target was the DeepSpeech engine published as open-source code by Mozilla. "Fifteen hours of work later, I had broken it," Carlini claims.

Rather than using noise to confuse the system, he had found the engine was susceptible to slightly modified recordings of normal speech or music. The system could be forced to recognize a phrase as something completely different to what a human would hear. The attacks buried subtle glitches and clicks in the speech or music at a level that makes it hard for a human hearing the playback to detect. Some glitches buried in normal phrases convinced the network it was hearing silence.

"I was incredibly surprised it worked so easily. You don't expect things to break so easily. However, much of it was because I had spent a year and a half on developing attacks to break neural networks in general," Carlini explains.

However, as a practical attack, the method did not work on audio played through a speaker and into a microphone. Distortions caused by amplifiers and microphones altered the glitches enough to cause the attacks to fail. In Carlini's version, the adversarial exam-
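The core move in Szegedy-style attacks, nudging every input value a tiny amount in whichever direction most raises the wrong answer's score, can be illustrated on a toy linear classifier. Everything here (the weights, the "image," the 0.05 budget) is a made-up stand-in for a trained network, not code from any of the work discussed:

```python
import numpy as np

# For a linear score s(x) = w . x, the gradient with respect to the
# input is just w, so the per-pixel step eps * sign(w) is the largest
# score increase possible when no pixel may change by more than eps.
rng = np.random.default_rng(0)
w = rng.normal(size=784)              # hypothetical trained weights
x = rng.uniform(0.0, 0.95, size=784)  # a flattened 28x28 "image"

eps = 0.05                            # imperceptible per-pixel budget
x_adv = np.clip(x + eps * np.sign(w), 0.0, 1.0)

shift = w @ x_adv - w @ x
print(f"max per-pixel change: {np.abs(x_adv - x).max():.3f}")
print(f"score pushed up by {shift:.1f}, enough to flip any example "
      f"whose decision margin is smaller than that")
```

Deep networks are not linear, but locally they behave enough like this that essentially the same one-step trick (later popularized as the fast gradient sign method) reliably flips their predictions.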
HIGH ATOP THE Thomas Jefferson Memorial in Washington, D.C., is a layer of biofilm covering the dome, darkening and discoloring it. Biofilm is "a colony of microscopic organisms that adheres to stone surfaces," according to the U.S. National Park Service, which needed to get a handle on its magnitude to get an accurate cost estimate for the work to remove it.

Enter CyArk, a non-profit organization that uses three-dimensional (3D) laser scanning and photogrammetry to digitally record and archive some of the world's most significant cultural artifacts and structures. CyArk spent a week covering "every inch" of the dome, processed the data, and returned a set of engineering drawings to the Park Service "to quantify down to the square inch how much biofilm is on the monument," says CEO John Ristevski.

"This is an example of where data is being used to solve a problem," to help preserve a historical structure, he says. Ristevski says the Park Service was not charged for the data, and the work CyArk did was funded by individual donors in the San Francisco Bay Area, where the company is located.

[Image caption: Capturing photogrammetric data for the digital reconstruction of a badly damaged temple in the ancient city of Bagan, in central Myanmar.]

CyArk is one of several organizations using 3D scanning to help protect and preserve historic structures from looting, destruction, urbanization, and mass tourism. Iconem, a French start-up founded in 2013, also specializes in the digitization of endangered cultural heritage sites in 3D. Like CyArk, Iconem works on-site with local partners; in its case, in 22 countries. One of those partners is Microsoft Research, and Iconem's technology utilizes the software giant's artificial intelligence and computer vision algorithms to integrate multiple levels of photogrammetry data to build extremely precise 3D models, says Yves Ubelmann, an architect who co-founded the company.

This type of work has raised the tricky question of who owns the rights to these digital scans. Officials at organizations involved in utilizing these techniques for historic preservation say they address this up front to avoid any contentious battles later on.

Iconem's projects are either self-financed or paid for by a client, says Ubelmann. "If Iconem is the sole stakeholder, we share the images with scientific or governmental authorities in the relevant country. They have the right to use them to raise awareness of their historical sites," he says. "It is vital to us that countries be able to share their cultural heritage with their citizens and the international community." When a client finances a project, the rights to the images are determined on a case-by-case basis, he notes. Iconem works with the client to determine if, how, and where the images can be circulated, but the client retains the rights to the images. "Our ultimate goal is to share the images and models with the widest audience possible while respecting the countries and their heritage."
Ristevski also says ownership depends on the terms of a contract signed prior to any work being done. However, he adds that regardless of the way the agreement is worded, "the other party gets a free and fully unrestricted license. This is always articulated up front before we hit the ground and do the work. None of this is ambiguous."

CyArk has been doing this type of work for almost 15 years, in more than 50 countries, "and if we were a bad player, we'd never be allowed back in these countries," Ristevski says. He stresses that if CyArk owns the scanned data, it is the company's policy to never monetize it.

CyArk has partnered with Google Arts & Culture on the Open Heritage Project, which is using the laser technology to capture relevant data and store it in Google Cloud. Ristevski thinks "people are suspicious whenever Google gets involved [in a project] and how they might monetize it," but notes that many museums also work with the search giant's research division. "There are some beautiful exhibits" housed in Google Cloud, he says, but "because Google is involved, there's automatically an assumption that there is some evil bent to it."

Erich Hatala Matthes, an assistant philosophy professor and member of the advisory faculty for environmental studies at Wellesley College, says that from a moral perspective, anyone involved in 3D scanning work should keep the data open and available. Matthes says three-dimensional scanning projects "that often originate in Europe and the U.S. and focus on threatened heritage in the Middle East should make every effort to make scans open and accessible to the people and institutions of those countries. There is a worry that digital scanning efforts will recapitulate colonial museum practices that have involved the illicit acquisition of objects from dominated cultural groups, and the retention and control of those objects under the banner of preservation."

Rather than using terms like "shared" or "universal" heritage as licensing claims to ownership or control, Hatala Matthes believes "We should view those ideals in terms of responsibilities, especially to those who are most vulnerable."

Like CyArk and Iconem, the Institute for Digital Exploration (IDEx) at the University of South Florida (USF) works with local partners on the preservation of culturally sensitive areas that are under threat. "A lot of the work we do aims to help major tourist sites strike a balance between access and preservation," explains co-founder Michael Decker. For example, IDEx is working with Vil-
viewpoints

ON JUNE 6, 2015, the U.S. Department of Justice brought the first-ever online marketplace prosecution against a price-fixing cartel. One of the special features of the case was that prices were set by algorithms. Topkins and his competitors designed and shared dynamic pricing algorithms that were programmed to act in conformity with their agreement to set coordinated prices for posters sold online. They were found to engage in an illegal cartel. Following the case, the Assistant Attorney General stated that "[w]e will not tolerate anticompetitive conduct, [even if] it occurs...over the Internet using complex pricing algorithms." The European Commissioner for Competition endorsed a similar position, stating that "companies can't escape responsibility for collusion by hiding behind a computer program."

Competition laws forbid market players from engaging in cartels, loosely defined as agreements among market players to restrict competition, without offsetting benefits to the public. This prohibition is based on the idea that competition generally increases welfare, and that for competition to exist, competitors must make independent decisions. Accordingly, price-fixing agreements among competitors are considered the "ultimate evil" and may result in a jail sentence in the U.S., as well as in other jurisdictions, unless the agreement increases consumers' well-being.

Until recently, formation of a cartel necessitated human intent, engagement, and facilitation. But with the advent of algorithms and the digital economy, it is becoming technologically possible for computer programs to autonomously coordinate prices and trade terms. Indeed, algorithms can make coordination of prices much easier and faster than ever before, at least under some market conditions. Their speed and sophistication can help calculate a high price that reacts to changing market conditions and benefits all competitors; the speed at which they can detect and respond to deviations from a coordinated high price equilibrium reduces the incentives of competitors to offer lower prices. Indeed, if one algorithm sets a lower price in an attempt to lure more consumers, a competitor's algorithm may be designed to immediately respond by lowering its price, thereby shrinking the benefits to be had from lowering the price in the first place. Moreover, as John von Neumann suggested, algorithms serve a dual purpose: as a set of instructions, and as a file to be read by other programs. Accordingly, by reading another algorithm's accessible source code, algorithms, unlike humans, can determine how other algorithms will react to their own actions, even before any action is performed by the other side. This enables competitors to design their coordinated reactions, even before any price is set.

The questions thus arise of when the use of pricing algorithms constitutes an illegal cartel, and whether legal liability could be imposed on those who employ algorithms, as well as on those who design them. The stakes are high: if we cast the net too narrowly and algorithm-facilitated coordination falls under the radar, market competition may be harmed and prices may be raised; if we cast the net too widely, we might chill the many instances in which algorithms bring about significant benefits.
To prove an illegal cartel, an agreement must be shown to exist. An agreement requires communication among competitors, which signals intent to act in a coordinated way, and reliance on the other to follow suit, in a manner that creates a concurrence of wills. Some scenarios that involve pricing algorithms easily fall within the definition. A simple scenario involves the use of algorithms to implement or monitor a prior agreement among competitors, as was done in the Topkins case, mentioned here. In such situations, a clear agreement exists, and the algorithms simply serve as tools for its execution. U.S. Federal Trade Commissioner Maureen Ohlhausen suggested a simple test that captures many of these easy cases: If the word "algorithm" can be replaced by the phrase "a guy named Bob," then algorithms can be dealt with in the same way as traditional agreements.

A more complicated scenario arises when competitors deliberately use a joint algorithmic price setter, which is designed to maximize the profits of its users. Such a scenario was recently analyzed by Luxembourg's Competition Authority. There, numerous taxi drivers jointly used a booking platform that employed an algorithm to determine taxi prices for all participating drivers. The algorithm set the price based on predetermined criteria such as the length of journey, the hour of service, traffic congestion, and so on. The price was non-negotiable. This arrangement was found to constitute an agreement to fix prices. It was nonetheless exempted on the grounds that the efficiencies it generated (including reduction of wait time and lower prices for some consumers) were larger than the harm caused by the coordination, and that these efficiencies could not be achieved by less-restrictive means. Much depends, however, on the specific facts of a given case, including the price formula used by the algorithm and the efficiencies it creates.

Should the algorithm not create large, countervailing benefits for consumers, its employment might constitute an illegal cartel. The U.S. Department of Justice opposed the Google Books Settlement on such grounds. There, Google agreed with the associations of book authors and publishers that a pricing algorithm will set the default prices for the use of Google Books. The U.S. Authority argued that it is unlawful for competitors to agree with one another to delegate pricing decisions to a common agent, unless the agreement creates countervailing benefits. Interestingly, the fact that the pricing algorithm was designed to mimic pricing in a competitive market was regarded as insufficient. Actual bilateral negotiations on book prices were seen as preferable. This argument was not pursued further by the courts.

The more challenging cases arise when algorithms are designed independently by competitors to include decisional parameters that react to other competitors' decisions in a way that strengthens or maintains a joint coordinated outcome. For example, suppose each firm independently codes its algorithm to take into account its competitors' probable and actual reactions, as well as their joint incentive to cooperate, and the combination of these independent coding decisions leads to higher prices in the market. Coordination occurs even though no prior agreement to coordinate exists. Even more difficult questions arise when algorithms
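The dynamic described above, independent reactive algorithms sustaining a high price without any agreement, can be seen in a deliberately simplistic simulation. The pricing rule, the numbers, and the single deviation below are invented for illustration; no real market or case data is involved:

```python
HIGH = 10.0  # the tacitly coordinated price level

def react(rival_last):
    """One firm's independently coded rule: price one step above the
    rival's last price, capped at HIGH. A price cut is therefore
    answered almost one-for-one in the next period; otherwise prices
    drift back up toward the high level."""
    return min(HIGH, rival_last + 1.0)

a, b = HIGH, HIGH
trace = [(a, b)]
for t in range(1, 8):
    a_new = 6.0 if t == 1 else react(b)  # firm A tries one price cut
    b_new = react(a)                     # firm B reacts to A's last price
    a, b = a_new, b_new
    trace.append((a, b))

print(trace)  # B matches the cut, the advantage evaporates, and both
              # prices climb back to 10.0 within a few periods
```

Neither rule mentions the other firm's code or any agreement; the coordinated outcome emerges purely from each firm's anticipation that undercutting will be matched.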
Technology Strategy and Management

CRISPR: An Emerging Platform for Gene Editing
Considering a potential platform candidate in the evolving realm of gene-editing technology research.

WHEN THINKING ABOUT which areas of research might form the basis for new industry platforms, in the past we have focused on information technologies such as computers, Internet software, smartphones, cloud services, artificial intelligence and machine learning, and even quantum computing (see "The Business of Quantum Computing," Communications, Oct. 2018). These technologies early on had the potential to generate what we call "multi-sided markets" with powerful "network effects." Network effects are self-reinforcing feedback loops where, as the number of users or complementary innovations increases, the more widely used and valuable the platform becomes (see "The Evolution of Platform Thinking," Communications, Jan. 2010).

Another early-stage technology suited to platform dynamics is gene editing. Research began several decades ago, leading to various tools and techniques. It is still uncertain which approach will become the dominant foundation for further research and applications development, but there are some platform candidates.

One particularly promising technology is CRISPR, or "Clustered Regularly Interspaced Short Palindromic Repeats."12 CRISPR refers to small pieces of DNA that bacteria use to recognize viruses. What scientists observed years ago is that specialized segments of RNA and associated enzymes in one organism can modify genes (DNA sequences) in another organism. For example, this happens naturally when the immune system in bacteria fights against an invading virus. In 2012, several scientists discovered they could use CRISPR sequences of DNA as well as "guide RNA" to locate target DNA and then deploy CRISPR-associated enzymes as "molecular scissors" to cut, modify, or replace genetic material. The potential applications include diagnostic tools and treatments for genetic diseases as well as genetic reengineering more broadly.8 An August 2016 article in National Geographic magazine described CRISPR's potential: "CRISPR places an entirely new kind of power into human hands. For the first time, scientists can
quickly and precisely alter, delete, and rearrange the DNA of nearly any living organism, including us. In the past three years, the technology has transformed biology … No scientific discovery of the past century holds more promise—or raises more troubling ethical questions. Most provocatively, if CRISPR were used to edit a human embryo's germ line—cells that contain genetic material that can be inherited by the next generation—either to correct a genetic flaw or to enhance a desired trait, the change would then pass to that person's children, and their children, in perpetuity. The full implications of changes that profound are difficult, if not impossible, to foresee."10

DNA resembles a programming language and data-storage technology, useful in different applications. Gene editing provides opportunities for companies to pursue product solutions, such as to build standalone diagnostic tools or gene therapies. It also enables some institutions and companies to create products, tools, or components that other firms can build upon. Like today's quantum computers, each use of CRISPR seemed to require specialized domain knowledge (that is, the genome of a particular organism and disease) and then tailoring to the application, such as to use CRISPR to design a diagnostic test or therapeutic product for a specific disease, or to reengineer a plant to fight off insects. But, along with rising numbers of CRISPR researchers, platform-like network effects and multisided market dynamics were also appearing and helping the industry evolve. In particular, more research publications have led to improvements in tools and reusable component libraries, which have attracted more researchers and applications, which in turn have inspired more research, tool development, applications, venture capital investments, and so on.

At the center of an emerging CRISPR ecosystem is a non-profit foundation called Addgene, founded in 2004 by MIT students. It funds itself by selling plasmids, small strands of DNA used in laboratories to manipulate genes. Since 2013, it has been collecting and distributing CRISPR technologies to help researchers get started on their experiments.14 The Addgene tools library consisted of different enzymes and DNA or RNA sequences useful to identify, cut, edit, tag, and visualize particular genes.a

There were also numerous startups, some of which have already gone public. CRISPR Therapeutics (founded in 2013) was trying to develop gene-based medicines to treat cancer and blood-related diseases, and collaborating closely with Vertex and Bayer. Editas Medicine (2013) and Exonics Therapeutics (2017) were tackling diseases such as cancer, sickle cell anemia, muscular dystrophy, and cystic fibrosis.b Beam Therapeutics (2018) planned to use CRISPR to edit genes and correct mutations.1 Mammoth Biosciences (2018) was following more of a platform strategy and developing diagnostic tests that could be the basis for new therapies. It was broadly licensing its technology and encouraging other firms to explore therapies based on its testing technology.11 In fact, Mammoth's goal was to create "a CRISPR-enabled platform [italics added] capable of detecting any biomarker or disease containing DNA or RNA." In a recent public statement, the company summarized its strategy to cultivate an applications ecosystem: "Imagine a world where you could test for the flu right from your living room and determine the exact strain you've been infected with, or rapidly screen for the early warning signs of cancer. That's what we're aiming to do at Mammoth—bring affordable testing to everyone. But even beyond healthcare, we're aiming to build the platform for CRISPR apps [italics added] and offer the technology across many industries."3

a See https://www.addgene.org/crispr/
b See A. Regalado, "Startup Aims to Treat Muscular Dystrophy with CRISPR," MIT Technology Review (Feb. 27, 2017) and http://www.editasmedicine.com/pipeline
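The "guide RNA locates the target, the enzyme cuts" mechanism described above can be sketched in a few lines. This is a cartoon of the commonly described SpCas9 convention (a 20-nucleotide match followed immediately by an "NGG" PAM motif, with a blunt cut three bases inside the match); the sequences are invented for the example, and real guide design is far more involved:

```python
def find_cut_sites(genome: str, guide: str) -> list[int]:
    """Positions where a Cas9-style cut would fall: the guide must
    match the DNA exactly and be followed immediately by an NGG PAM."""
    sites = []
    for i in range(len(genome) - len(guide) - 2):
        target = genome[i:i + len(guide)]
        pam = genome[i + len(guide):i + len(guide) + 3]
        if target == guide and pam[1:] == "GG":   # N = any base
            sites.append(i + len(guide) - 3)      # cut 3 bp before PAM
    return sites

# A made-up 31-base "genome" containing the guide match at position 3,
# followed by the PAM "AGG".
genome = "ATG" + "CCGTACGATTACAGGCTTGC" + "AGG" + "TTACG"
guide = "CCGTACGATTACAGGCTTGC"  # 20 nt
print(find_cut_sites(genome, guide))  # [20]
```

Real guide design also has to weigh near-matches elsewhere in the genome (off-target cuts), the opposite strand, and delivery into cells, which is part of why the shared tools and component libraries described above matter.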
V
viewpoints
Historical Reflections
Hey Google, What’s a Moonshot?
How Silicon Valley Mocks Apollo
Fifty years on, NASA’s expensive triumph is a widely
misunderstood model for spectacular innovation.
T
HE RADIO IN my kitchen is
tuned to a public station.
One day it startled me by
delivering a lecture, “The
unexpected benefit of cel-
ebrating failure,” by the implausibly
named Astro Teller who, according to
his website, enjoys an equally idiosyn-
cratic list of accomplishments: novel-
ist, entrepreneur, scientist, inventor,
speaker, business leader, and IT ex-
pert. That talk concerned his day job:
“Captain of Moonshots” at X (formerly
Google X, now a separate subsidiary of
its parent company Alphabet).a It cen-
tered on the classic Silicon Valley ideal
of being prepared to fail fast and use
this as a learning opportunity. Teller
therefore advised teams to spend the
first part of any project trying to prove
it could not succeed. Good advice, but
maybe not so new: even 1950s “wa-
terfall” methodologies began with a
feasibility stage intended to identify
reasons the project might be doomed. Still, many of us have had the experience of putting months, or even years, into zombie projects with no path to success.b The HBO television series "Silicon Valley" captured that problem, in an episode where a new executive asked for the status of a troubled project.c Each level of management sugarcoated the predictions it passed upward and avoided asking hard questions of those below it.

To be honest, I was more intrigued by the "moonshot captain" thing. Teller briefly paid homage to President Kennedy and the huge scope of the real moonshot achieved by the Apollo program of the 1960s. By pro-

[Photo caption: Astronaut Alan L. Bean walks from the moon-surface television camera toward the lunar module during the first extravehicular activity of the November 1969 Apollo 12 mission, the second lunar landing in the NASA Apollo program. The mission included placing the first color television camera on the surface of the moon but transmission was lost when Bean accidentally pointed the camera at the sun, disabling the camera. Image courtesy of NASA.]
solutions.”d X boasts of uniting “inven- price tag, but after the USSR checked
tors, engineers, and makers” including off the first few items, by launching a
aerospace engineers, fashion design- The moonshot satellite and sending a human into or-
ers, military commanders, and laser was a triumph of bit, that suddenly looked like money
experts. Teller’s most dramatic exam- worth spending. In 1961, Kennedy an-
ple of an X moonshot that failed ad- management as nounced his intentions to Congress
mirably was that staple technology of much as engineering. and won the first in a series of massive
alternate worlds, an airship “with the increases for NASA’s budget. Like Ken-
potential to lower the cost, time, and nedy’s other initiatives, the moon pro-
carbon footprint of shipping.” Accord- gram became more popular and politi-
ing to Teller, X achieved the “clever cally secure after his death, thanks to
set of breakthroughs” needed to mass Lyndon Johnson’s political arm twist-
produce robust, affordable blimps, in wonder as Neil Armstrong and ing and huge congressional majorities.
but had to give up when it estimated Edward Aldrin planted the Ameri- Apollo, like Medicare, was part of
a cost of “$200 million to design and can flag on the moon [after] the larg- a dramatic expansion in federal gov-
build the first one” which was “way too est managed research project of all ernment spending. A future of inter-
expensive.” X relies on “tight feedback time…. The Saturn V rocket had a di- planetary exploration and coloniza-
loops of making mistakes and learning ameter of 33 feet (three moving vans tion was already an article of faith
and new designs.” Spending that much could have been driven, side by side, for American science fiction writers
“to get the first data point” was not re- into the fuel tanks for the first stage) in the “golden age” of the 1940s, but
motely possible. and a height of 363 feet (about the size they were better at imagining rockets
At this point, I would like you to of a 36-story building). At liftoff, the than economic changes. One of Rob-
imagine the record-scratching noise vehicle weighed 6.1 million pounds, ert Heinlein’s most famous stories,
that TV shows use for dramatic inter- and when the five engines of the first “The Man Who Sold The Moon,” de-
ruptions. That’s what played in my stage were fired … they generated 7.5 scribed a moon landing in the 1978 by
head, accompanied by the thought million points of thrust ... [burning] an eccentric businessman. Described
“this guy doesn’t know what the moon- three tons of fuel a second ... ”3 as the “last of the robber barons” he
shot was.” Teller’s pragmatic, iterative, Those statistics tell you something funded his dream by, among other
product-driven approach to innovation important: the moonshot was about do- things, promising to cancel postage
is the exact opposite of what the U.S. ing something absurdly expensive and stamps in a temporary lunar post of-
did after Kennedy charged it to “com- difficult once (followed by a few encore fice, sell the naming rights to craters,
mit itself to achieving the goal, before performances), not doing something and engraving the names of support-
this decade is out, of landing a man on useful cheaply and routinely. Apollo 11 ers onto a plaque.e Rather than the big
the moon and returning him safely to pushed a gigantic rocket though the government approach of NASA, had
the earth.” Letting Silicon Valley steal atmosphere and into space, launch- Heinlein imagined a space program
the term “moonshot” for projects with ing three men toward the moon at run like a Kickstarter project. The gov-
quite different management styles, more than 24,000 miles an hour. Two ernment’s sudden and mobilization
success criteria, scales, and styles of in- of them descended in a flimsy little of overwhelming resources for the
novation hurts our collective ability to box wrapped in foil, took pictures, col- moonshot took science fiction writers
understand just what NASA achieved lected rocks, and flew back into lunar by surprise.
50 years ago and why nothing remotely orbit. All three returned to Earth, or The moonshot was a triumph of
comparable is actually under way today rather to sea, hurtling back through management as much as engineer-
at Google, or anywhere else. the atmosphere in a tiny capsule that ing. Meeting a fixed launch deadline
splashed into the ocean. meant working backward to identify
The Actual Moonshot Apollo was the capstone to a series the points by which thousands of sub-
As historians of technology Ruth of gigantic American technological systems had to be ready for testing and
Schwartz Cowan and Matthew Hersch projects, beginning with the Manhat- integration, and further back to the
tell the story: “Eight year later, on July tan Project of the 1940s and continu- dates by which they had to be designed
20, 1969, millions of people all over ing into the Cold War with the devel- and ordered. Steven Johnson’s book
the world watched their televisions opment of nuclear submarines, Atlas The Secret of Apollo looked at the sys-
and Minuteman missiles, and hydro- tems and techniques developed to turn
d X grew out of the lab that “graduated” to be- gen bombs. It was shaped by a vision the efforts of hundreds of subcontrac-
come Waymo, now a separate company suc-
cessfully selling technology for self-driving
for the U.S. space program devised by tors into a successful moonshot.7 As he
cars. It was also the group responsible for former Nazi rocket engineer Werhner points out, NASA and its partners suc-
Google Glass, whose camera/screen eyeglass- von Braun, whose heavily accented lec- ceeded in doing something apparently
es went abruptly from next big thing to epic tures on space stations and manned paradoxical: bureaucratizing innova-
flop in 2014, for the Loon project to deliver missions to the Moon and Mars were
Internet access via high-altitude balloons, and
for a fleet of experimental delivery drones. The
popularized during the 1950s with the e This short story was written in 1949 and ap-
most balanced portrait of its workings was all-American aid of Walt Disney. Their peared as the title story in Robert A. Heinlein,
given in https://bit.ly/2gqMi8s. elaborate agenda came with a huge The Man Who Sold the Moon (Shasta, 1950).
JA N UA RY 2 0 1 9 | VO L. 6 2 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 25
viewpoints
tion. Rather than attempt to do lots of an explosion in space, “each of us” was
new things at once, an approach that haunted by “indelible memories of
had produced problems for the early Project management that awful day three years earlier” when
U.S. space program, von Braun en- tools may have “we had failed our crew.”
forced a careful step-by-step approach. In the end the Apollo 13 astronauts
These techniques built on those devel- improved, but were fine, but the space program was
oped for other Cold War projects, de- human nature not. Diminishing political returns led
scribed by historian Thomas Hughes to Apollo’s early cancellation, like a
in his book Rescuing Prometheus.6 For continues to undercut briefly buzzy TV show that lost its au-
example, the PERT project manage- best practices. dience and thus its reason to exist.
ment tool, now a crucial part of project No human has been further than 400
management software, was developed miles from Earth since 1972. With
in the 1950s to support the U.S. Navy’s the Soviets defeated in the moon race
Polaris nuclear submarine project. there was no need to increase spend-
So was MRP (Materials Requirements ing still further to tackle the remaining
Planning), which evolved into the technologies to levels of performance, items on von Braun’s to-do list: moon
foundation for the enterprise software reliability, or miniaturization that bases, space stations, manned Mars
packages that run almost all modern would not otherwise be economically missions, and so on. Facing shrinking
corporations. practical. Given a choice of two tech- budgets and diminished political will,
NASA management placed a se- nologically workable ways to do some- NASA instead delivered disconnected
ries of milestones along the road to thing, NASA would take the better- fragments of the plan—a space shuttle
the moon landing, paralleling some proven and more expensive way. to assemble large structures in orbit
aspects of the incremental approach Despite this technological conser- and, many years later, a space station
practiced by modern technology lead- vatism, the focus on fixed deadlines to give the shuttle something to do.
ers. That is why the moon landing was still caused deadly trade-offs. After the Twenty-first century America is not
Apollo 11: previous flights had tested Apollo 1 crew died when fire engulfed without enemies, but ISIS and the Tal-
the rockets, the command module, their capsule in a ground test in Janu- iban never developed space programs.
the docking capabilities, and so on. ary 1967, manned flights were halted Generations of American politicians
Apollo 8, for example, flew a crew into for 20 months. A review identified have nevertheless tried to prove their
lunar orbit and back, giving an inte- several management failures that had visionary leadership by ordering new
grated test of many of the key system contributed to the accident, including space missions. None committed
components. Before those flights came a flawed escape system, poor wiring, anything like the funds needed for a
a series of Gemini missions flown dur- and the use of pure oxygen instead of a true moonshot effort. George W. Bush
ing the mid-1960s to test technologies less dangerous air-like mixture. After- dusted off von Braun’s old dreams in
and develop techniques for challenges ward, mission controller Gene Kranz 2004, terminating the space shuttle
such as orbital rendezvous and space- confessed to his team that “We were and directing NASA to restart manned
walks. Systematic ground tests focused too gung-ho about the schedule and moon missions by 2020 as a stepping-
on space suits, engines, and other new we locked out all of the problems we stone to Mars. This set a leisurely 16-
technologies in isolation before inte- saw every day in our work. Every ele- year schedule for a moon landing,
grating them into larger systems. ment of the program was in trouble … but a progress review five years later
Teller stressed the need to prototype Not one of us stood up and said, ‘Dam- concluded that the program was al-
rapidly and cheaply and to be ready to mit, stop!’”9 Half a century later, the ready so underfunded, overbudget,
kill any “moonshot” in its early stages, same words could be applied to many and behind schedule as to be unsal-
but NASA agreed to non-negotiable of Silicon Valley’s highest-profile proj- vageable. In 2012, Newt Gingrich, en-
goals for time (by the end of 1969) ects, from Tesla’s spectacularly hu- joying a brief surge in support for his
and scope (landing and returning a bristic attempt reinvent the assembly presidential candidacy, promised vot-
man) without building testable pro- line to Uber’s lethally ambitious self- ers he could build a permanent moon
totypes. When Kennedy announced driving car program. Project manage- base and launch a manned Mars mis-
those objectives in 1961, NASA had ment tools may have improved, but sion by 2020 while still slashing gov-
achieved just 15 minutes of manned human nature continues to undercut ernment spending and cutting taxes.
flight in space and its managers had best practices. Rather than prove Gingrich’s gravitas
not even decided whether to launch a Although Teller, as “Captain of on a trip to the White House, the moon
single integrated spacecraft or send up Moonshots,” wants to celebrate failure base express took him straight back to
modules to assemble in Earth orbit. that is not how NASA reacted when it the political fringes. More recently,
One cannot plan out a schedule that lost Gus Grissom, Ed White, and Robert President Trump held a ceremony to
depends on fundamental scientific B. Chaffee. Kranz named his memoir sign a policy directive directing NASA
breakthroughs, since those do not oc- Failure is Not an Option, after “the creed to head back to the moon and then on-
cur on a fixed timescale. A project of we all lived by.” Explaining the title, he ward to Mars. Cynics noted the direc-
that kind is about spending money wrote that in 1970, as his team strug- tive made no mention of new funding
to mitigate risk, by pushing existing gled to save the crew of Apollo 13 after and set no timeline.
A Moonshot Is Awesome and Pointless
In 1962, Kennedy campaigned for his plan by saying "We choose to go to the Moon in this decade and do the other things, not because they are easy, but because they are hard." His moonshot was about spending a $25 billion fortune to do something absurdly difficult with no direct economic return. It showed America's technological capabilities, political will, and economic might in its long struggle with the Soviet Union (or, as Kennedy put it, "to organize and measure the best of our energies and skills … "). Nothing economically viable or practical deserves to be called a moonshot. Scaled up for the size of the U.S. economy, a similarly impressive investment today would be approximately $600 billion. Apollo was a monumental accomplishment, like the construction of the Pyramids. For Google to emulate that might mean erecting a 10-mile-high earthquake-resistant skyscraper, to literally overshadow Apple and provide an object of public marvel. Does that sound like something Google management would authorize a massive bond issue for? No, it does not—even though the project would surely spur advances in architectural engineering, improvements in materials science, and create a lot of engineering and construction jobs.

In his talk, Teller explained the true goal of his moonshot factory was "making the world a radically better place." I was a little surprised to hear that cliché used in earnest, several years after "Silicon Valley" skewered it in a montage of fake TechCrunch pitches centered on phrases like "making the world a better place through scalable fault tolerant databases with ACID transactions."f I suppose that is why he had to promise "radical" global betterment.

I am having a hard time imagining Kennedy's famous speech working as a TechCrunch pitch to "make the world a better place by spending billions of dollars to harvest 381 kilos of rocks." Was the Apollo program's goal to make the world a radically better place? Enough people doubted that at the time to make Apollo the most obvious symbol of the failure of technology to make the world a better place. "If they can put a man on the moon," asked critics, "why can't they do [X]." Common values for X were "cure the common cold," "end urban poverty" and "fix traffic problems." The modern version of that might be "If Elon Musk can launch a Tesla at Mars, why can't his car factory come close to production metrics for quantity and quality that other carmakers hit routinely." Sometimes the rocket science is the easy part.

The Apollo program did little to directly advance scientific understanding. The decision to meet arbitrary deadlines by rushing special purpose hardware, rather than maximizing the scientific value of the missions or their contribution to longer term goals, caused tensions within NASA at the time.g Apollo did more to push technology and build engineering capabilities. Apollo created good jobs for scientists, mathematicians, programmers, and engineers, at NASA itself and with contractors. Political considerations spread the work out to facilities around the country, rather than concentrating it in a handful of urban areas. It is easy to decry that spending as corporate welfare or help for the already privileged but, as the recent movie Hidden Figures showed, the beneficiaries were not all white men with easy lives. The Apollo program also contributed to the development of software engineering techniques—the guidance code had to work reliably the first time. Margaret Hamilton, who led its software team, eventually won the Presidential Medal of Freedom for her work on the project.

There were some significant technology spin-offs from Apollo, though contrary to popular belief, the powdered drink Tang was developed previously, as were Velcro and Teflon. Space technology improved freeze-dried food, microelectronics, scratch-resistant sunglass lenses, and lightweight foil blankets. Most notably, the need for reliable, miniaturized control electronics drove the emergence of a commercial market for microchips, years before they were competitive for ground-based applications. Each Apollo guidance computer used approximately 5,000 simple chips of a standard design, providing enough demand to drop the cost per chip from around $1,000 down to $20 or so.2 The technique of using redundant control computers, now a standard approach for "fly by wire" commercial airliners, was pioneered by IBM in its work on the Saturn V control systems. One of the most popular database management packages of the early 1970s, IBM's Information Management System (IMS), had its roots in a system built with North American Rockwell in 1965 to handle the proliferation of Apollo parts.5 Despite those accomplishments, the moonshot was not a cost-effective way to boost technology. Giving a quarter of the money to the National Science Foundation would surely have accomplished more, as would directing NASA to spend it on satellites and unmanned space probes. But would politicians ever have made those choices? Spending the money to drop more napalm on Vietnam or stockpile more nuclear weapons would have accomplished less than nothing.

If the moonshot made the world a "radically better place" it was by redirecting history in subtle ways. Like medieval jousting, the space race offered a non-lethal, and proudly phallic, substitute for real military clashes. Despite the flag waving, people across the world thrilled to the spectacle and took collective pride in the accomplishments of our species. The "Earthrise" photograph of a gibbous Earth rising over the lunar horizon was taken in 1968 by the first humans to venture beyond low Earth orbit. It has been credited with

f "Silicon Valley"'s relationship to real Silicon Valley culture is discussed in A. Marantz, "How 'Silicon Valley' nails Silicon Valley," The New Yorker (June 9, 2016), which reports that Teller was not amused when the show parodied his "moonshot factory."
g The scientific side of the Apollo program is the focus of W.D. Compton, Where No Man Has Gone Before: A History of the Apollo Lunar Exploration Missions. U.S. Government Printing Office, Washington, D.C., 1989.
sion with spare hardware, the ARPANET had received less than one-thousandth of its funding.

The ARPANET was immediately useful and soon became more useful when network email, rather than the remote logins used to justify its construction, provided an unexpected "killer application." It evolved continually, in response to the needs of its users. The Apollo program, in contrast, had accomplished its objective by the time the Apollo 11 astronauts rode in their tickertape parade down Broadway in New York City.

Since then the divergence of the moonshot and ARPANET approaches has been rather dramatic. As of this writing, only four of the planet's seven billion human inhabitants have walked on the moon. The youngest of them is now 83 years old, so that number seems more likely to fall than rise. In contrast, approximately half of the world's population uses the Internet, the direct descendent of ARPANET, and millions more connect to it every day. The incremental, exploratory development of the ARPANET provided the modern tech firms with their model of innovation as well as the Internet infrastructure they rely on.

The End of Innovation?
I am glad Google still spends some money exploring new product opportunities outside its core businesses, unlike many other modern firms, but do not forget that is something big companies used to do routinely without blathering about "moonshots." Fifty years ago Ford, General Electric, Kodak, Xerox, RCA, AT&T, Dow Chemical, 3M, and a host of aerospace firms were investing heavily in such projects. Consulting firm Arthur D. Little specialized in helping companies apply newly developed materials, with stunts like turning a sow's ear into a silk purse.8 Many of those firms also supported labs doing basic research in relevant areas of science, which Google and its peers do not attempt. Today's leading tech companies are not short of cash, but their focus is on minor improvements and the development of new features and applications within their existing platforms.

Tech companies have not always been so wary of moonshot-scale projects. In my January 2018 column I mentioned IBM's System/360 development project in the 1960s, which reportedly required a commitment of twice the firm's annual revenues when the project was launched. For Alphabet today, two years of revenue would be over $200 billion. Yet its "moonshot captain" had to kill what he claims was a highly promising project, just because an initial investment of $200 million was unworkable. Poor Astro was three zeros and one comma away from being able to live up to that ridiculous job title. (Talking of absurd job titles, X recently lost its 'Head of Mad Science' to a sexual harassment scandal.)

Perhaps that is a good thing. Apollo's politically driven, money-no-object pushing of technology toward a fixed goal made for great television but did not bring us closer to routine space flight. Like the Concorde supersonic jetliner, sponsored by the French and British governments, it was a technological marvel but an economic dead end. On the other hand, the Silicon Valley model has not delivered nearly as much economic growth as all the talk about innovation and disruption might lead you to believe. Notwithstanding all the amazing things your cellphone does, technological change in the developed world has slowed to a fraction of its former rate. The 1960s were a highwater mark for confidence in the effectiveness of investment in bold technological projects like Apollo, System/360, or ARPANET. In The Rise and Fall of American Growth, economist Robert Gordon suggested a century of spectacular growth in living standards, life expectancy, and economic productivity began to stall around 1970, just as the focus of technological innovation shifted toward computers and networks.4 These have not produced anything like the broad and sustained productivity gains created by electricity or assembly lines. Widespread adoption of the Internet gave productivity growth a significant jolt a decade ago, but that has already faded away.

It is inaccurate to blame this slowdown on public reluctance to fund moonshot-sized projects without direct economic returns. More likely, the end of rapid American growth and the end of moonshot projects are two consequences of a political and ideological shift away from long-term public and corporate investment in a range of areas, from infrastructure to education. At the height of the Apollo project, federal spending on research and development was more than twice its level in recent decades. A decades-long push for tax cuts, combined with rising government spending on healthcare and social security, has hollowed out investment in research and infrastructure and left a massive deficit.

Companies are likewise more focused than ever on quarterly earnings and shareholder value. Alphabet has the money to fund something close to a real moonshot, if its investors allowed it. In 2015 its total spending on non-core business, not just the "moonshot factory" but potentially vast emerging business areas like fiber-optic Internet service, life sciences, home automation, venture capital, and self-driving cars, accounted for only approximately 5% of its revenues. Even that was viewed by investors as irresponsible, given that they generated less than 1% of its income, and in early 2017 Alphabet reportedly launched an "apparent bloodbath," killing ambitious plans for delivery drones, modular cellphones, and the rollout of fiber-optic Internet access to more cities.i Subsequent reports tied a transition in which "futurism has taken a back seat to more pressing concerns" to the withdrawal of Google co-founder Larry Page from hands-on management.1

What would modern tech companies do with a windfall big enough to fund an actual moonshot? Thanks to

i https://bit.ly/2PLDigV
Viewpoint
UCF's 30-Year REU Site in Computer Vision
A unique perspective on experiences encouraging students to focus on further education.

The U.S. government's National Science Foundation (NSF) started the Research Experiences for Undergraduates (REU) program in the mid-1980s to attract undergraduates in STEM fields into research careers and to consider going to graduate school. The REU program offers grants to universities to plan and oversee research experiences that enrich undergraduate students' educational experiences. It is believed these experiences encourage the participants to pursue leadership careers in the fields of science, technology, engineering, or mathematics.

The University of Central Florida's (UCF) Computer Vision group was in the first group of selected sites: only three REU sites in NSF's Division of Computer and Information Science and Engineering (CISE) were awarded funding in 1987. The grant duration was one year, so continued funding would require a new application for renewal the following year. A few years later, the grant duration was increased to three years, and remarkably, UCF has been funded continuously for the past 30 years, through a total of 14 grants. The NSF-funded site pays stipends to 10 undergraduates each year who immerse themselves in research and gain useful insight into …

… about 80 have published their projects in high-quality venues. Each year, we solicit applications, and we receive well over 150. After a careful interview, we make offers until our 10 positions are filled. Given our successful streak, we try to offer some perspective on our efforts and experiences; see http://crcv.ucf.edu/REU/ …

… to have contributed independently to our longevity.

Focus: Computer vision. Our site is focused on exciting and appealing topics in computer vision, which facilitate a condensed short course covering key topics, coordination among faculty and graduate student mentors, and interaction and exchanging ideas

[Photo caption: The Harris Engineering Center, home of the School of Electrical Engineering and Computer Science at the University of Central Florida, USA. Image by Fljujitsu/Wikimedia.]
summer background training, inclusion of and proper scheduling of the vast variety of activities. The pre-summer activities of planning the research topics in advance have also required greater attention. The recent change of adding new faculty to the Center for Research in Computer Vision (CRCV) has permitted flexibility in how the 10 students are subgrouped for their weekly reporting meetings and how they are mentored each day, and has opened up new research areas within computer vision and machine learning.

Changes in Content
The field of computer vision is rapidly evolving and the REU site has kept pace with the changes. Machine learning approaches started to appear in computer vision as they were able to contribute to object recognition solutions during the mid-1990s. Approaches such as neural networks, boosting, and support vector machines were actively competing for ascendance during the early 2000s. Since its advent in the 2010s, deep learning has slowly gained acceptance as the dominant paradigm in computer vision; today, research in computer vision must start with a quick study of deep learning approaches, and novices must acquire competence in running practical experiments with large data sets in deep learning implementation environments. Consequently, our own short course now has a strong emphasis on environments like Keras and TensorFlow, and a shift to teaching Python (away from MATLAB).

Sample Topics. Looking at the topics pursued over the past 30 years indicates the student projects have evolved with the growth of computer vision. Over the six five-year periods, two topics per period are listed here.
˲˲ 1987–1992: Object Recognition using Multiple Sensors; Detection and Representation of Events in Motion Trajectories.
˲˲ 1992–1997: Visual Lipreading Using Eigensequences; Screening Mammogram Images for Abnormalities.
˲˲ 1997–2002: Person-on-Person Vio- …
˲˲ 2002–2007: …tion; Scale Space Based Grammar for Hand Detection.
˲˲ 2007–2012: Optimizing One-Shot Recognition with Micro-Set Learning; Part-based Multiple-Person Tracking with Partial Occlusion Handling.
˲˲ 2012–2017: How to Take a Good Selfie?; GIS-Assisted Object Detection and Geo-spatial Localization.

Broadening Participation
UCF's REU has a strong commitment to broadening participation among underrepresented groups. Of the 50 participating UGRs in the past five years, 23 are female, and 10 of the 27 males are African-American or Hispanic. This diversity in the cohort contributes to increasing the pipeline of students pursuing graduate careers.

Conclusion
After 30 years (and approximately 300 students), some patterns have emerged. Approximately half the students have proceeded to graduate school. Many of the participants have proceeded to leadership positions in their professions: becoming faculty members, starting their own companies, and rising to managerial positions in Fortune 500 technology companies. Details about student successes are provided in the booklet at http://crcv.ucf.edu/REU/Booklet_071117.pdf

UCF's CRCV has seen many benefits from its cultivated REU strength. UGRs have provided an opportunity to explore research directions and to develop mentoring skills among faculty (older and newer) and graduate students. CRCV-trained UGRs have populated graduate programs around the nation. Our models of evaluation and attentiveness have allowed for best practices to be tested and employed. The commitment of time, effort, and resources is expected to continue into future decades.

Niels Da Vitoria Lobo (niels@cs.ucf.edu) is an Associate Professor at the Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Mubarak A. Shah (shah@cs.ucf.edu) is the founding Director of the Center for Research in Computer Vision, University of Central Florida, Orlando, FL, USA.
Viewpoint
Modeling in Engineering
and Science
Understanding behavior by building models.
For more than 40 years—since 1978—I have been working on computers that interact directly with the physical world. People now call such combinations "cyber-physical systems," and with automated factories and self-driving cars, they are foremost in our minds. Back then, I was writing assembly code for the Intel 8080, the first in a long line of what are now called x86 architectures. The main job for those 8080s was to open and close valves that controlled air-pressure-driven robots in the clinical pathology lab at Yale New Haven Hospital. These robots would move test tubes with blood samples through a semiautomated assembly line of test equipment. The timing of these actions was critical, and the way I would control the timing was to count assembly language instructions and insert no-ops as needed. Even then, this was not completely trivial because the time taken for different instructions varied from four to 11 clock cycles. But the timing of a program execution was well defined, repeatable, and precise.

The models I was working with then were quite simple compared to today's equivalents. My programs could be viewed as models of a sequence of timed steps punctuated with I/O actions that would open or close a valve. My modeling language was the 8080 assembly language, which itself was a model for the electrical behavior of NMOS circuits in the 8080 chips. What was ultimately happening in the physical system was electrons sloshing around in silicon and causing mechanical relays to close or open. I did not have to think about these electromechanical processes, however. I just thought about my more abstract model.

Today, getting real-time behavior from a microprocessor is more complicated. Today's clock frequencies are more than three orders of magnitude higher (more than 2GHz vs. 2MHz), but the timing precision of I/O interactions has not improved and may have actually declined, and repeatability has gone out the window. Today, even if we were to write programs in x86 assembly code, it would be difficult, maybe impossible, to use the same style of design. Instead, we use timer interrupts either directly or through a real-time operating system. To understand the timing behavior, we have to model many details of the hardware and software, including the memory architecture, pipeline design, I/O subsystem, concurrency management, and operating system design.

During these 40-plus years, a subtle but important transformation occurred in the way we approach the design of a real-time system. In 1978, my models specified the timing behavior, and it was incumbent on the physical system to correctly emulate my model. In 2018, the physical system gives me some timing behavior, and it is up to me to build models of that timing behavior. My job as an engineer has switched from designing a behavior to understanding a behavior over which I have little control.

To help understand a behavior over which I have little control, I build models. It is common in the field of real-time systems, for example, to estimate the "worst case execution time" (WCET) of a section of code using a detailed model of the particular hardware that the program will run on. We can then model the behavior of a program using that WCET, obtaining a higher-level, more abstract model.

There are two problems with this approach. First, determining the WCET on a modern microprocessor can be extremely difficult. It is no longer sufficient to understand the instruction set, the x86 assembly language. You have to model every detail of the sili-

JANUARY 2019 | VOL. 62 | NO. 1 | COMMUNICATIONS OF THE ACM 35
Using Remote Cache Service for Bazel

SOFTWARE PROJECTS TODAY are getting more and more complex. Code accumulates over the years as organization growth increases the volume of daily commits. Projects that used to take minutes to complete a full build now start with fetching from the repository and may require an hour or more to build. A developer who maintains the infrastructure constantly has to add more machines to support the ever-increasing workload for builds and tests, at the same time facing pressure from users who are unhappy with the long submit time. Running more parallel jobs helps, but this is limited by the number of cores on the machine and the parallelizability of the build. Incremental builds certainly help, but might not apply if

own remote cache service. In practice, this can reduce the build time by almost an order of magnitude.

How Does It Work?
Users run Bazel (https://docs.bazel.build/versions/master/user-manual.html) by specifying targets to build or test. Bazel determines the dependency graph of actions to fulfill the targets after analyzing the build rules. This process is incremental, as Bazel will skip the already completed actions from the last invocation in the workspace directory. After that, it goes into the execution phase and executes actions according to the dependency graph. This is when the remote cache and execution systems come into play.

An action in Bazel consists of a command, arguments to the command, and the environment variables, as well as lists of input files and output files. It also contains the description of the platform for remote execution, which is outside the scope of this article. The information about an action can be encoded into a protocol buffer (https://developers.google.com/protocol-buffers/) that works as a fingerprint of the action. It contains the command, arguments, and environment variables combined as a digest, and a Merkle tree digest from the input files. The Merkle tree is generated as follows: files are the leaf nodes and are digested using their corresponding content; directories are the tree nodes and are digested using digests from their contained files and subdirectories.
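The fingerprinting scheme just described can be sketched in Python. The function names and the choice of SHA-256 are illustrative assumptions for this sketch, not Bazel's actual implementation.

```python
import hashlib

def file_digest(content: bytes) -> str:
    # Leaf node: a file is digested from its content.
    return hashlib.sha256(content).hexdigest()

def dir_digest(entries: dict) -> str:
    # Tree node: a directory is digested from the digests of its
    # children, combined in a deterministic (sorted) order.
    h = hashlib.sha256()
    for name in sorted(entries):
        child = entries[name]
        digest = dir_digest(child) if isinstance(child, dict) else file_digest(child)
        h.update(name.encode())
        h.update(digest.encode())
    return h.hexdigest()

def action_digest(command: str, args: list, env: dict, inputs: dict) -> str:
    # The action fingerprint combines the command, arguments,
    # environment variables, and the Merkle tree digest of the inputs.
    h = hashlib.sha256()
    for part in [command, *args, *(f"{k}={v}" for k, v in sorted(env.items()))]:
        h.update(part.encode())
    h.update(dir_digest(inputs).encode())
    return h.hexdigest()
```

Because the digest depends only on the action's declared inputs, two machines holding identical source trees compute identical digests, which is what makes cache sharing possible.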
tion cache, known as the action digest or action key. If there is a hit, the result contains a list of output files or output directories and their corresponding digests. Bazel downloads the contents of a file using the file digest from the CAS (content-addressable store). Looking up the digest of an output directory from the CAS results in the contents of the entire directory tree, including subdirectories, files, and their corresponding digests. Once all the output file directories are downloaded, the action is completed without the need to execute locally.

The cost of completing this cached action comes from the computation of digests of input files and the network round trips for the lookup and transfer of the output files. This cost is usually substantially less than executing the action locally.

In case of a miss, the action is executed locally, and each of the output files is uploaded to the CAS and indexed by the content digests. Standard output and error are uploaded similarly to files. The action cache is then updated to record the list of output files, directories, and their digests.

Because Bazel treats build actions and test actions equally, this mechanism also applies to running tests. In this case, the inputs to a test action will be the test executable, runtime dependencies, and data files.

The scheme does not rely on incremental state, as an action is indexed by a digest computed from its immediate inputs. This means once the cache is populated, running a build or test on a different machine will reuse all the already-computed outputs as long as the source files are identical. A developer can iterate on the source code; then build outputs from every iteration will be cached and can be reused.

Another key design element is that cache objects in the action cache and CAS can be independently evicted, as Bazel will fall back to local execution in the case of a cache miss or an error reading from either one. The number of cache objects will grow over time since Bazel does not actively delete. It is the responsibility of the remote cache service to perform eviction.

Remote Cache Usage
Two storage buckets are involved in the remote cache system: a CAS that stores files and directories and an action cache that stores the list of output files and directories. Bazel uses the HTTP/1.1 protocol (https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html) to access these two storage buckets. The storage service needs to support two HTTP methods for each of the storage buckets: the PUT method, which uploads the content for a binary blob, and the GET method, which downloads the content of a binary blob.

The most straightforward way to enable this feature with Bazel is to add the flags in the following example to the ~/.bazelrc file:

build --remote_http_cache=http://build/cache
build --experimental_remote_spawn_cache

This enables remote cache with local sandboxed execution. The first flag, --remote_http_cache, specifies the URL of the remote cache service. In this example, Bazel uses the path /ac/ (that is, http://build/cache/ac) to access the action cache bucket and the path /cas/ (http://build/cache/cas) to access the storage bucket for the CAS. The second flag, --experimental_remote_spawn_cache, enables the use of remote cache for eligible actions with sandboxed execution in case of a cache miss. When downloading from or uploading to a bucket, the last segment of the path (aka a slug) is a digest. The next example shows two possible URLs that Bazel might use to access the cache service:

http://build/cache/cas/cf80cd8aed482d5d1527d7dc72fceff84e6326592848447d2dc0b0e87dfc9a90
http://build/cache/ac/cf80cd8aed482d5d1527d7dc72fceff84e6326592848447d2dc0b0e87dfc9a90

To more finely control the kinds of actions that will use the remote cache without local sandboxed execution, you can use the flags shown in the following example. Individual actions can be opted in to use the remote cache service by using the flag --strategy=<action_name>=remote.

build --remote_http_cache=http://build/cache
build --spawn_strategy=remote
build --genrule_strategy=remote
build --strategy=Javac=remote

The default behavior of Bazel is to read from and write to the remote cache, which allows all users of the remote cache service to share build and test outputs. This feature has been used in practice for a Bazel build on machines with identical configurations in order to guarantee identical and reusable build outputs.

[Figure: Remote cache service using open source components. Two Kubernetes pods, each running a Hazelcast cache instance in a Java virtual machine, serve HTTP/1.1 requests through a load balancer, with replication between the pods and JMX monitoring.]
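A minimal client for this two-bucket protocol can be sketched with the Python standard library. The class and method names are invented for illustration, and a real deployment would add error handling and retries.

```python
import hashlib
import urllib.request

class RemoteCacheClient:
    """Sketch of a client for the two-bucket HTTP cache protocol:
    PUT uploads a blob, GET downloads it, and the last path
    segment (the slug) is the blob's digest."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def _url(self, bucket: str, digest: str) -> str:
        # bucket is "ac" (action cache) or "cas" (content-addressable store)
        return f"{self.base_url}/{bucket}/{digest}"

    def put(self, bucket: str, content: bytes) -> str:
        # Upload a blob indexed by its own content digest.
        digest = hashlib.sha256(content).hexdigest()
        req = urllib.request.Request(self._url(bucket, digest),
                                     data=content, method="PUT")
        urllib.request.urlopen(req)
        return digest

    def get(self, bucket: str, digest: str) -> bytes:
        # Download the blob stored under the given digest.
        with urllib.request.urlopen(self._url(bucket, digest)) as resp:
            return resp.read()
```

With a base URL of http://build/cache, this produces exactly the /ac/ and /cas/ URL shapes shown above.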
Bazel also has experimental support for using a gRPC (gRPC Remote Procedure Call) service to access the remote cache service. This feature might provide better performance but may not have a stable API. The Bazel Buildfarm project (https://github.com/bazelbuild/bazel-buildfarm) implements this API. These APIs are still under development.

A cache service similar to the second example in the previous section can be used by Bazel as the remote cache service. A few successful implementations have been reported. Google Cloud Storage (https://cloud.google.com/storage/) is the easiest to set up if you are already a user. It is fully managed, and you are billed depending on storage needs and network traffic. This option provides good network latency and bandwidth if your development environment and build infrastructure are already hosted in Google Cloud. It might not be a good option if you have network restrictions or the build infrastructure is not located in the same region. Similarly, Amazon S3 (Simple Storage Service; https://aws.amazon.com/s3/) can be used.

For onsite installation, nginx (https://nginx.org/en/) with the WebDAV (Web Distributed Authoring and Versioning) module (http://nginx.org/en/docs/http/ngx_http_dav_module.html) will be the simplest to set up, but it lacks data replication and other reliability properties if installed on a single machine.

The accompanying figure shows an example system architecture implementation of a distributed Hazelcast (https://hazelcast.com/) cache service (https://hazelcast.com/use-cases/caching/cache-as-a-service/) running in Kubernetes (https://kubernetes.io/). Hazelcast is a distributed in-memory cache running in a JVM (Java Virtual Machine). It is used as a CaaS (cache-as-a-service) with support for the HTTP/1.1 interface. In the figure, two instances of Hazelcast nodes are deployed using Kubernetes and configured with asynchronous data replication within the cluster. A Kubernetes Service (https://kubernetes.io/docs/concepts/services-networking/service/) is configured to expose a port for the HTTP service, which is load-balanced within the Hazelcast cluster. Access metrics and data on the health of the JVM are collected via JMX (Java Management Extensions). This example architecture is more reliable than a single-machine installation and easily scalable in terms of QPS (queries per second) and storage capacity.

You can also implement your own HTTP cache service to suit your needs. In all implementations of the cache service it is important to consider cache eviction. The action cache and CAS will grow indefinitely since Bazel does not perform any deletions. Controlling the storage footprint is always a good idea. The example Hazelcast implementation in the figure can be configured to use a least recently used eviction policy with a cap on the number of cache objects, together with an expiration policy. Users have also reported success with random eviction and by emptying the cache daily. In any case, recording metrics about cache size and cache hit ratio will be useful for fine-tuning.

Best Practices
Following the best practices outlined here will avoid incorrect results and maximize the cache hit rate. The first best practice is to write your build rules without any side effects. Bazel tries very hard to ensure hermeticity by requiring the user to explicitly declare input files to any build rule. When the build rules are translated to actions, input files are known and must be present during execution. Actions are executed in a sandbox by default, and then Bazel checks that all the declared output files are created. You can, however, still write a build rule with side effects using genrule or a custom action written in the Skylark language (https://docs.bazel.build/versions/master/skylark/language.html), used for extensions. An example is writing to the temporary directory and using the temporary files in a subsequent action. Undeclared side effects will not be cached and might cause flaky build failures regardless of whether remote cache is used.

Some built-in rules such as cc_library and cc_binary have implicit dependencies on the toolchain installed on the system and on system libraries. Because they are not explicitly declared as inputs to an action, they are not included in the computation of the action digest for looking up the action cache. This can lead to the reuse of object files compiled with a
different compiler or from a different CPU architecture. The resulting build outputs might be incorrect.

Docker containers (https://www.docker.com/what-container) can be used to ensure that all build workers have exactly the same system files, including toolchain and system libraries. Alternatively, you can check in a custom toolchain to your code repository and teach Bazel to use it, ensuring all users have the same files. The latter approach comes with a penalty, however. A custom toolchain usually contains thousands of files such as the compiler, linker, libraries, and many header files. All of them will be declared as inputs to every C and C++ action. Digesting thousands of files for every compilation action will be computationally expensive. Even though Bazel caches file digests, it is not yet smart enough to cache the Merkle tree digest of a set of files. The consequence is that Bazel will combine thousands of digests for each compilation action, which adds considerable latency.

Nonreproducible build actions should be tagged accordingly to avoid being cached. This is useful, for example, to put a timestamp on a binary, an action that should not be cached. The following genrule example shows how the tags attribute is used to control caching behavior. It can also be used to control sandboxing and to disable remote execution.

genrule(
    name = "timestamp",
    srcs = [],
    outs = ["date.txt"],
    cmd = "date > date.txt",
    tags = ["no-cache"],
)

Sometimes a single user can write erroneous data to the remote cache and cause build errors for everyone. You can limit Bazel to read-only access to the remote cache by using the flag shown in the next example. The remote cache should be written only by managed machines such as the build workers from a continuous integration system.

build --remote_upload_local_results=false

A common cause of a cache miss is an environment variable such as TMPDIR. Bazel provides a feature to standardize environment variables such as PATH for running actions. The next example shows how .bazelrc enables this feature:

build --experimental_strict_action_env

Future Improvements
With just a few changes, the remote cache feature in Bazel will become even more adept at boosting performance and reducing the time necessary to complete a build.

Optimizing the remote cache. When there is a cache hit after looking up the remote cache using the digest computed for an action, Bazel always downloads all the output files. This is true for all the intermediate outputs in a fully cached build. For a build that has many intermediate actions, this results in a considerable amount of time and bandwidth spent on downloading.

A future improvement would be to skip downloading unnecessary action outputs. The result of successfully looking up the action cache would contain the list of output files and their corresponding content digests. This list of content digests can be used to compute the digests to look up the dependent actions. Files would be downloaded only if they are the final build artifacts or are needed to execute an action locally. This change should help reduce bandwidth and improve performance for clients with weak network connections.

Even with this optimization, the scheme still requires many network round trips to look up the action cache for every action. For a large build graph, network latency will become the major factor of the critical path.

Buck has developed a technique to overcome this issue (https://bit.ly/2OiFDzZ). Instead of using the content digests of input files to compute a digest for each action, it uses the action digests from the corresponding dependency actions. If a dependency action outputs multiple files, each can be uniquely identified by combining the action digest from its generating action and the path of the output file. This mechanism needs only the content digests of the source files and the action dependency graph to compute every action digest in the entire graph. The remote cache service can be queried in bulk, saving the network round trips.

The disadvantage of this scheme is that a change in a single source file—even a trivial one such as changing the code comments—will invalidate the cache for all dependents. A potential solution is to index the action cache with the action digests computed using both methods.

Another shortcoming in the implementation of remote cache in Bazel is the repeated computation of the Merkle tree digest of the input files. The content digests of all the source files and intermediate action outputs are already cached in memory, but the Merkle tree digest for a set of input files is not. This cost becomes evident when each action consumes a large number of input files, which is common for compilation using a custom toolchain for Java or C and C++. Such build actions have large portions of the input files coming from the toolchain and will benefit if parts of the Merkle tree are cached and reused.

Local disk cache. There is ongoing development work to use the file system to store objects for the action cache and the CAS. While Bazel already uses a disk cache for incremental builds, this additional cache stores all build outputs ever produced and allows sharing between workspaces.

Conclusion
Bazel is an actively developed open source build and test system that aims to increase productivity in software development. It has a growing number of optimizations to improve the performance of daily development tasks. Remote cache service is a new development that significantly saves time in running builds and tests. It is particularly useful for a large code base and any size of development team.

Related articles
on queue.acm.org

Borg, Omega, and Kubernetes
Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, and John Wilkes
https://queue.acm.org/detail.cfm?id=2898444

Nonblocking Algorithms and Scalable Multicore Programming
Samy Al Bahra
https://queue.acm.org/detail.cfm?id=2492433

Unlocking Concurrency
Ali-Reza Adl-Tabatabai, Christos Kozyrakis, and Bratin Saha
https://queue.acm.org/detail.cfm?id=1189288

Alpha Lam is a software engineer. His areas of interest are video technologies and build systems. Most recently he worked at Two Sigma Investments. He currently works at Google.

Copyright held by author/owner.
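The dependency-digest scheme that Buck is described as using above can be sketched as follows. The graph representation and names are hypothetical illustrations, not Buck's actual code: each action's digest is derived from its source digests and its dependencies' action digests, so the entire graph can be digested without any per-action cache round trips.

```python
import hashlib

def graph_digests(actions: dict) -> dict:
    """Compute a digest for every action in a dependency graph.

    actions maps an action name to (source_digests, dep_names), where
    source_digests are content digests of the action's source inputs
    and dep_names are the actions it depends on. An action's digest
    is derived from its sources and its dependencies' *action*
    digests, so no cache lookups are needed along the way.
    """
    digests = {}

    def digest(name: str) -> str:
        if name not in digests:
            sources, deps = actions[name]
            h = hashlib.sha256(name.encode())
            for s in sources:
                h.update(s.encode())
            for d in sorted(deps):
                h.update(digest(d).encode())  # recurse into dependencies
            digests[name] = h.hexdigest()
        return digests[name]

    for name in actions:
        digest(name)
    return digests
```

The sketch also exhibits the downside the article notes: changing one source digest changes the digest of that action and of every dependent action, invalidating their cache entries.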
Article development led by
queue.acm.org
Research for Practice: Security for the Modern Age

WHEN DEPLOYING APPLICATIONS in the cloud, practitioners seek to use the most operable set of tools for the job; determining the "right" tool is, of course, nontrivial. Back in 2013, Docker won the hearts of developers by being easy to use, but Linux containers themselves have been around since 2007, when control groups (cgroups) were added to the kernel. Today, containers have spawned a large ecosystem of new tools and practices that many professionals are using on a daily basis. The foundational technologies making up containers are not new, however. Unlike Solaris Zones or FreeBSD Jails, Linux containers are not discrete kernel components built with isolation in mind. Rather, Linux containers are a combination of technologies in the kernel: namespaces, cgroups, AppArmor, and SELinux, to name a few.

Containers are not the abstraction an application developer typically encounters today. The trend is toward functions and "serverless," allowing the user to run a single function in the cloud. Because of the way applications and functions are run in the cloud, there will likely be a new generation of isolation techniques built around running a single process securely in an easy and minimal way.

While evidence has shown that "a container with a well-crafted secure computing mode (seccomp) profile (which blocks unexpected system calls) provides roughly equivalent security to
DOI:10.1145/3287299

Article development led by queue.acm.org

SQL Is No Excuse to Avoid DevOps

Automation and a little discipline allow better testing, shorter release cycles, and reduced business risk.

BY THOMAS A. LIMONCELLI

A FRIEND RECENTLY said to me, "We can't do DevOps, we use a SQL database." I nearly fell off my chair. Such a statement is wrong on many levels.

"But you don't understand our situation!" he rebuffed. "DevOps means we'll be deploying new releases of our software more frequently! We can barely handle deployments now and we only do it a few times a year!"

I asked him about his current deployment process. "Every few months we get a new software release," he explained. "Putting it into production requires a lot of work. Because we use SQL, the deployment looks something like this: First, we kick out all the users and shut down the application. Next the DBAs (database administrators) modify the database schema. Once their work is done, the new software release is installed and enabled. The process takes many hours, so we tend to do it on the weekend, which I hate. If it fails, we have to revert to the backup tapes and restore everything from scratch and start again."

He concluded, "Just scheduling such an event takes weeks of negotiation. We usually lose the negotiation, which is why we end up doing it on the weekend. Doing this every few months is painful and the number-one source of stress around here. If we had to do this for weekly releases, most of us would just quit. We would have no weekends! Heck, I've heard some companies do software releases multiple times a day. If we did that, our application would always be down for upgrades!"

Wow. There is a lot to unpack there. Let me start by clearing up a number of misconceptions, then let's talk about some techniques for making those deployments much, much easier.

First, DevOps is not a technology, it is a methodology. The most concise definition of DevOps is that it is applying Agile/lean methods from source code all the way to production. This is done to "deliver value faster," which is a fancy way of saying reducing the time it takes for a feature to get from idea to production. More frequent releases mean less time a newly written feature sits idle waiting to be put into production.

DevOps does not require or forbid any particular database technology—or any technology, for that matter. Saying you can or cannot "do DevOps" because you use a particular technology is like saying you cannot apply Agile to a project that uses a particular language. SQL may be a common "excuse of the month," but it is a weak excuse.

I understand how DevOps and the lack of SQL databases could become inexorably linked in some people's minds. In the 2000s and early 2010s, companies that were inventing and popularizing DevOps were frequently big websites that were, by coincidence, also popularizing NoSQL (key/value store) databases. Linking the two, however, is confusing correlation with causation. Those same companies were also populariz-
something is risky there is a natural inclination to seek to do it less. Counterintuitively, this actually in-

deploy. You should always, in principle, be driving down the fixed cost of deployment toward zero. Increasing

To that end, the application should manage the schema. Each version of the schema should be numbered. An ap-
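The fragment above says the application should manage the schema and that each schema version should be numbered. A minimal sketch of that idea, using SQLite; the version table, migration list, and DDL here are hypothetical examples, not from the article.

```python
import sqlite3

# Ordered, numbered schema migrations. On startup the application
# applies any migration newer than what the database reports.
# (Illustrative DDL only.)
MIGRATIONS = {
    1: "CREATE TABLE people (id INTEGER PRIMARY KEY, full_name TEXT)",
    2: "ALTER TABLE people ADD COLUMN email TEXT",
}

def current_version(db) -> int:
    # The database records which numbered schema it is running.
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = db.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def upgrade(db) -> int:
    # Apply each pending migration in order, recording the new version.
    version = current_version(db)
    for v in sorted(MIGRATIONS):
        if v > version:
            db.execute(MIGRATIONS[v])
            db.execute("INSERT INTO schema_version VALUES (?)", (v,))
            version = v
    db.commit()
    return version

db = sqlite3.connect(":memory:")
upgrade(db)
```

Because upgrades are idempotent and ordered, any instance can check the schema number on startup and bring the database forward without a coordinated maintenance window.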
ing to figure out which change caused own schema) and deploy a release that adding fields, phase 5 is skipped be-
the problem. If you make one change at modifies the schema but doesn’t use the cause there is nothing to be removed.
a time, and there is a failure, the search field. With the right transactional lock- The process reduces to what was de-
becomes a no-brainer. It is also easier to ing hullabaloo, the first instance that scribed earlier in this article. Phases 4
back out one change than many. is restarted with the new software will and 5 can be combined or overlapped.
Heck, even Google, with its highly cleanly update the schema. If there is Alternatively, phase 5 from one schema
sophisticated testing technologies and a problem, the canary will die. You can change can be merged into phase 2 of
methodologies, understands that sub- fix the software and try a new canary. Re- the next schema change.
tle differences between the staging en- verting the schema change is optional. With these techniques you can roll
vironment and the production environ- Since the schema and software are through the most complex schema
ment may result in deployment failures. decoupled, developers can start using changes without downtime.
They “canary” their software releases: the new field at their leisure. While in
upgrading one instance, waiting to see the past upgrades required finding a Summary
if it starts properly, then upgrading the maintenance window compatible with Using SQL databases is not an impedi-
remaining instances slowly over time. This is not a testing methodology; this is an insurance policy against incomplete testing—not that their testing people are not excellent, but nobody is perfect. The canary technique is now an industry best practice and is even embedded in the Kubernetes system. (The term canary is derived from "canary in a coalmine." The first instance to be upgraded dies as a warning sign that there is a problem, just as coal miners used to bring with them birds, usually canaries, which are more sensitive to poisonous gas than humans. If the canary died, it was a sign to evacuate.)

Since these problems are caused by software being tightly coupled to a particular schema, the solution is to loosen the coupling. The two can be decoupled by writing software that works for multiple schemas at the same time. This is separating rollout and activation.

The first phase is to write code that doesn't make assumptions about the fields in a table. In SQL terms, this means SELECT statements should specify the exact fields needed, rather than using SELECT *. If you do use SELECT *, don't assume the fields are in a particular order. LAST_NAME may be the third field today, but it might not be tomorrow.

With this discipline, deleting a field from the schema is easy. New releases are deployed that don't use the field, and everything just works. The schema can be changed after all the instances are running updated releases. In fact, since the vestigial field is ignored, you can procrastinate and remove it later, much later, possibly waiting until the next (otherwise unrelated) schema change. Adding a new field is a simple matter of creating it in the schema ahead of the first software release that uses it. We use Technique 1 (applications manage their [...]

[...] multiple teams, now the process is decoupled and all parties can work in a coordinated way but not in lockstep.

More complicated changes require more planning. When splitting a field, removing some fields, adding others, and so on, the fun really begins. First, the software must be written to work with both the old and new schemas and, most importantly, must also handle the transition phase. Suppose you are migrating from storing a person's complete name in one field to splitting it into individual fields for first, middle, last name, title, and so on. The software must detect which field(s) exist and act appropriately. It must also work correctly while the database is in transition and both sets of fields exist. Once both sets of fields exist, a batch job might run that splits names and stores the individual parts, nulling the old field. The code must handle the case where some rows are unconverted and others are converted.

The process for doing this conversion is documented in the accompanying sidebar "The Five Phases of a Live Schema Change." It has many phases, involving creating new fields, updating software, migrating data, and removing old fields. This is called the McHenry Technique in The Practice of Cloud System Administration (of which I am coauthor with Strata R. Chalup and Christina J. Hogan); it is also called Expand/Contract in Release It!: Design and Deploy Production-Ready Software by Michael T. Nygard.

The technique is sophisticated enough to handle the most complex schema changes on a live distributed system. Plus, each and every mutation can be rolled back individually. The number of phases can be reduced for special cases. If one is only [...]

[...]ment to doing DevOps. Automating schema management and a little developer discipline enables more vigorous and repeatable testing, shorter release cycles, and reduced business risk.

Automating releases liberates us. It turns a worrisome, stressful, manual upgrade process into a regular event that happens without incident. It reduces business risk but, more importantly, creates a more sustainable workplace. When you can confidently deploy new releases, you do it more frequently. New features that previously sat unreleased for weeks or months now reach users sooner. Bugs are fixed faster. Security holes are closed sooner. It enables the company to provide better value to customers.

Acknowledgments
Thanks to Sam Torno, Mark Henderson, and Taryn Pratt, SRE, Stack Overflow Inc.; Steve Gunn, independent; Harald Wagener, iNNOVO Cloud GmbH; Andrew Clay Shafer, Pivotal; Kristian Köhntopp, Booking.com, Ex-MySQL AB.

Related articles on queue.acm.org

The Small Batches Principle
Thomas A. Limoncelli
https://queue.acm.org/detail.cfm?id=2945077

Adopting DevOps Practices in Quality Assurance
James Roche
https://queue.acm.org/detail.cfm?id=2540984

Thomas A. Limoncelli is the SRE manager at Stack Overflow Inc. in New York City. His books include The Practice of System and Network Administration, The Practice of Cloud System Administration, and Time Management for System Administrators. He blogs at EverythingSysadmin.com and tweets at @YesThatTom.

Copyright held by owner/author. Publication rights licensed to ACM. $15.00
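This field discipline, combined with transition handling, can be sketched as follows. The `person` table and its columns (`full_name`, `first_name`, `last_name`) are invented for illustration, with SQLite standing in for the production database:

```python
import sqlite3

def read_name(cur, person_id):
    """Return (first, last) under the old schema (one hypothetical
    full_name column), the new schema, or the transition, by detecting
    which columns exist. Fields are always named explicitly: no SELECT *."""
    cols = {r[1] for r in cur.execute("PRAGMA table_info(person)")}
    wanted = [c for c in ("full_name", "first_name", "last_name") if c in cols]
    row = cur.execute(
        "SELECT %s FROM person WHERE id = ?" % ", ".join(wanted), (person_id,)
    ).fetchone()
    rec = dict(zip(wanted, row))
    if rec.get("first_name") is not None:        # converted (or new-schema) row
        return rec["first_name"], rec["last_name"]
    first, _, last = rec["full_name"].partition(" ")  # unconverted row
    return first, last
```

Because the code asks the database which columns exist rather than assuming a fixed layout, the same release runs correctly before, during, and after the schema change.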
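A batch conversion job of the kind this approach relies on (split the name, store the parts, null the old field) might look like this sketch; the `person` table and its columns are hypothetical, with SQLite standing in for the production database:

```python
import sqlite3

def convert_batch(conn, limit=100):
    """One pass of a hypothetical name-splitting migration: split the
    full_name column into first_name/last_name and null the old field.
    Small batches let the job run while the system stays live.
    Returns the number of rows converted; 0 means the migration is done."""
    rows = conn.execute(
        "SELECT id, full_name FROM person "
        "WHERE full_name IS NOT NULL LIMIT ?", (limit,)
    ).fetchall()
    for person_id, full_name in rows:
        first, _, last = full_name.partition(" ")
        conn.execute(
            "UPDATE person SET first_name = ?, last_name = ?, "
            "full_name = NULL WHERE id = ?", (first, last, person_id)
        )
    conn.commit()
    return len(rows)
```

Running the job repeatedly until it returns 0 converts the table incrementally, which is what makes it safe on a live system: reads that tolerate both row states keep working the whole time.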
JANUARY 2019 | VOL. 62 | NO. 1 | COMMUNICATIONS OF THE ACM
contributed articles

DOI:10.1145/3210753

Autonomous Tools and Design: A Triple-Loop Approach to Human-Machine Learning

key insights
• Autonomous tools are able to generate remarkable design outcomes, but the designers using them also need to change the way they do their design work.
• Designing with autonomous tools requires that designers understand and actively interact with the "mental models" of the tools, in addition to the design artifact and the design process, what we call the "triple loop" model of learning.
• Designers working with autonomous tools need to build capabilities described here in terms of framing, evaluating, and adjusting to navigate this new design process.

DESIGNERS INCREASINGLY LEVERAGE autonomous software tools that make decisions independent of the designer. Examples abound in virtually every design field. For example, semiconductor chip designers use tools that make decisions about placement and logic checking. Game designers rely on software that generates initial drafts of virtual worlds. Autonomous tools employ artificial intelligence methods, including machine learning, pattern recognition, meta-heuristics, [...]erate design artifacts beyond any human's capabilities.

A naïve view suggests these tools will someday replace human designers in the design process. An alternative perspective is that humans will continue to play an important role but also that this role is changing. To account for the changing role of humans in design processes powered by autonomous tools, we describe in this article a "triple-loop approach" to human-machine learning. Increasing amounts of design activity are most certainly being carried [...]

[...] must still actively frame, evaluate, and adjust the "mental" models embedded in autonomous tools, in the form of algorithms.a Organizations employing autonomous tools in their design processes must thus account for these activities in their design processes. [...] triple-loop approach, followed by illustrative examples from research into the design of semiconductors, video games, software interfaces, and artificial intelligence. We conclude by identifying practices that enable designers to frame, evaluate, and adjust the mental models embedded in autonomous tools.

a We say "mental model embedded in an autonomous tool" to indicate that just as human designers have mental models that guide their design activity, including their use of tools, autonomous tools also have models that guide their design activity. Both types of model change over time.

Design as Triple-Loop Human-Machine Learning
Design processes are a form of knowledge-intensive work that relies on designers' capacity to learn. In his semi[...]
[...] autonomous tools. Moreover, the tool may change its own model as it relates to what the human wants and how the human perceives the tool; this may result in changes in the interface or the design parameters being applied. Much like two people with different mental models learn from each other and work together to reconcile their mental models, autonomous tools and humans likewise have different models related to design goals and processes they may seek to reconcile through various loops of learning.

[...] loop 1). Designers then learn from the experiments in a way that helps them improve the input parameters for the next round of experiments, as in Figure 2, loop 2. Developers of the algorithms interact with the chip designers in order to learn how the chip designers change the parameters interactively after evaluating design outcomes. The developers learn about the effects of the mental models embedded in the tools, as well as the designers' mental models. This involves learning about the specific assumptions of the designers while rooting out inefficiencies of the tools by updating and rewriting the source code for the tools, as in Figure 2, loop 3. Tool developers then carefully calibrate the mental models embedded in the autonomous tool to fit with the mental models of the designers.

[Figure 1. Double-loop learning; based on Argyris.3,4]

Software interface design at Adobe Labs. Interface designers today make extensive use of machine learning to improve interface designs. For example, researchers at Adobe Labs created tools intended to control complex user processes.13 In particular, visual designers wanted to be able to control procedural models that render complex shapes (such as trees and bushes) by growing them artificially from digital seeds into mature plants. Designers had difficulty harnessing these models because the array of approximately 100 parameters controlling such a growth process had to be manipulated in unison, thereby making it an incredibly complex problem space. Machine learning provided a solution, enabling Adobe Labs' developers to reduce this high-dimensionality problem to a three-dimensional space comprehensible by human designers. Moreover, the three-dimensional space was controllable through three slider bars. Using this intuitive interface, designers can more easily configure the model to match a given image. The example shows that autonomous tools do not have to correspond precisely to the mental models of humans. Instead, they often provide an expressive but low-dimensional interface. Humans learn through interacting with this interface, and the machine and the human both participate in learning. The interface amplifies the ability of a human designer to explore a large design space. In this design process, the autonomous tools create an interface a designer can use to generate alternative outputs, as in Figure 2, loop 1. Through practice, designers learn how to control the outputs (see Figure 2, loop 2). Over time, the designer's experience can be used to refine the interface, as in Figure 2, loop 3. In such user-interface design, the machine-learning system begins with the goal of reducing the dimensionality of the interface from 100 dials to three slider bars. Although the mental model of the human can never be entirely aligned with the underlying mental model embedded in the tool due to the limits of human cognition, the new interface provides a level of abstraction necessary for effective learning.

Designing Landscapes at Ubisoft. Tools have a long track record in video game development. Algorithmically generated content may include a variety of game elements, including textures, buildings, road networks, and component behaviors like explosions.7 In extreme cases, autonomous tools are able to generate large parts of the game content that only later are [...]
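The core idea of the Adobe Labs example, compressing a roughly 100-parameter space into three controls, can be sketched in a few lines. Plain-NumPy PCA stands in here for the autoencoder used in the cited work,13 and the "plant" data is random, invented for illustration:

```python
import numpy as np

# Hypothetical data: 500 procedural models, each defined by 100 parameters.
rng = np.random.default_rng(0)
params = rng.normal(size=(500, 100))

# Fit a 3-component linear "encoder" via SVD (plain-NumPy PCA);
# the cited work used an autoencoder instead.
mean = params.mean(axis=0)
_, _, vt = np.linalg.svd(params - mean, full_matrices=False)
sliders_basis = vt[:3]            # three directions = three slider bars

def decode(sliders):
    """Map three slider values back to a full 100-parameter setting."""
    return mean + np.asarray(sliders) @ sliders_basis
```

The designer then manipulates three numbers instead of one hundred; `decode` is the machine's half of the interface, expanding each slider position into a full parameter setting.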
[...] outputs in a way that leads to new hypotheses with regard to what sets of input parameters should be tested in the next batch of experiments.

Adjustment. Evaluation by human designers can lead to the adjustment of parameter values (see Figure 2, loop 1) or even to changes in the mental model embedded in the autonomous tool, resulting in changes in the algorithms used; moreover, it might also change the mental models of human designers in terms of goals, cognitive rules, and underlying reasoning. Changes of the mental model embedded in the autonomous tool could change the tool's constraints and propensities and require changes to the mental models of designers; likewise, changes in the mental models of designers could require changes to the algorithms and thus the mental model embedded in the tool. Following each experiment, designers might thus have to continuously reconcile their mental models with the counterpart models embedded in the autonomous tool (see Figure 2, loop 3). In order to change the mental model embedded in an autonomous tool, designers have to modify the underlying algorithm. The original mental model embedded in the tool—the one implemented by the tool designer—can thus evolve over time.

Competencies related to these design practices become critically important for achieving complex design outcomes. Having a detailed understanding of the designed artifact, as well as of the consequences of specific local decisions, becomes less important. This explains why, in the context of, say, chip design, we see software engineers displacing electrical engineers with a deep understanding of physical aspects of chip design. Because the design is increasingly mediated by software that needs to be parameterized and evaluated, designers' software skills become crucial; the table here outlines key implications in terms of emergent interrelated designer activities.

Some substitution of human design activity through autonomous tools is indeed occurring. To a certain degree, demand for specific, manual-type competencies in design professions, including software development, is decreasing, while the demand for skills focused on how to work with software tools is increasing. Organizations need to engage more effectively with new forms of autonomous tools supporting design processes. This is not simply a shift of tasks from humans to machines but a deeper shift in the relationship between humans and machines in the context of complex knowledge work. The shift puts humans in the role of coaches who guide tools to perform according to their expectations and requirements (see Figure 2, loop 1) or in the role of laboratory scientists conducting experiments to understand and modify the behavior of complex knowledge artifacts (see Figure 2, loop 2 and loop 3).

The Road Ahead
Engaging with autonomous tools requires reshaping the competencies designers need. Designers envision certain results and thus need to interact with autonomous tools in ways that help them realize their design vision. At the same time, the use of autonomous tools opens unprecedented opportunities for creative problem solving. Consider the example of video game production, where autonomous tools are increasingly able to procedurally generate artifacts of a scope and scale that was not possible in the past. Future designers will constantly be challenged to rethink their mental models, including their general approach to design. The continuous reconciliation of mental models embedded in both designer cognition and their tools is an extension of traditional design processes that involve artful making, where human actors gradually adjust their mental models to converge on solutions.5

The proposed three-loop model contributes to the ongoing debate on how artificial intelligence will change knowledge work, challenging knowledge workers to operate at a different level. Designers may become increasingly removed from the actual artifact but still use tools to create artifacts of a complexity never imagined before.

Acknowledgments
This material is based in part on work supported by the National Science Foundation under grants IIS-1422066, CCF-1442840, IIS-1717473, and IIS-1745463.

References
1. Allen, R.B. Mental models and user models. Chapter in Handbook of Human-Computer Interaction, Second Edition, M.G. Helander, T.K. Landauer, and P.V. Prabhu, Eds. North-Holland, Amsterdam, the Netherlands, 1997, 49–63.
2. Argyris, C. The executive mind and double-loop learning. Organizational Dynamics 11, 2 (Autumn 1982), 5–22.
3. Argyris, C. Teaching smart people how to learn. Harvard Business Review 69, 3 (May-June 1991).
4. Argyris, C. Double-loop learning. Chapter in Wiley Encyclopedia of Management, C.L. Cooper, P.C. Flood, and Y. Freeney, Eds. John Wiley & Sons, Inc., New York, 2014.
5. Austin, R.D. and Devin, L. Artful Making: What Managers Need to Know About How Artists Work. Financial Times Press, Upper Saddle River, NJ, 2003.
6. Brown, C. and Linden, G. Chips and Change: How Crisis Reshapes the Semiconductor Industry. MIT Press, Cambridge, MA, 2009.
7. Hendrikx, M., Meijer, S., Van Der Velden, J., and Iosup, A. Procedural content generation for games: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications 9, 1 (Feb. 2013), 1.
8. Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., and Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. In Proceedings of the Fifth International Conference on Learning Representations (Toulon, France, Apr. 24–26, 2017).
9. Lake, B., Ullman, T., Tenenbaum, J., and Gershman, S. Building machines that learn and think like people. Behavioral and Brain Sciences 40, E253 (2017).
10. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., and Petersen, S. Human-level control through deep reinforcement learning. Nature 518, 7540 (Feb. 2015), 529–533.
11. Sennett, R. The Craftsman. Allen Lane, London, U.K., 2008.
12. Werle, G. and Martinez, B. Ghost Recon Wildlands: Terrain tools and technologies. Game Developers Conference (San Francisco, CA, Feb. 27–Mar. 3, 2017); https://666uille.files.wordpress.com/2017/03/gdc2017_ghostreconwildlands_terrainandtechnologytools-onlinevideos1.pdf
13. Yumer, M.E., Asente, P., Mech, R., and Kara, L.B. Procedural modeling using autoencoder networks. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (Charlotte, NC, Nov. 11–15). ACM Press, New York, 2015, 109–118.

Stefan Seidel (stefan.seidel@uni.li) is a professor and the Chair of Information Systems and Innovation at the Institute of Information Systems at the University of Liechtenstein, Vaduz, Liechtenstein.

Nicholas Berente (nberente@nd.edu) is an associate professor of IT, analytics, and operations in the Mendoza College of Business at the University of Notre Dame, Notre Dame, IN, USA.

Aron Lindberg (alindberg@stevens.edu) is an assistant professor of information systems in the School of Business of Stevens Institute of Technology, Hoboken, NJ, USA.

Kalle Lyytinen (kalle@case.edu) is a Distinguished University Professor and Iris S. Wolstein Professor of Management Design at Case Western Reserve University, Cleveland, OH, USA.

Jeffrey V. Nickerson (jnickers@stevens.edu) is a professor of information systems and the Associate Dean of Research of the School of Business at Stevens Institute of Technology, Hoboken, NJ, USA.

Copyright held by authors.
contributed articles

DOI:10.1145/3210752

Framework for a Big Data Ecosystem in Organizations

Featuring the various dimensions of data management, it guides organizations through implementation fundamentals.

BY SERGIO ORENGA-ROGLÁ AND RICARDO CHALMETA

key insights
• This fresh approach to the problem of creating frameworks helps project managers and system developers implement big data ecosystems in business organizations.
• The related literature review of big data for business management covers some of the existing frameworks used for this purpose.

ENORMOUS AMOUNTS OF data have been generated and stored over the past few years. The McKinsey Global Institute reports this huge volume of data, which is generated, stored, and mined to support both strategic and operational decisions, is increasingly relevant to businesses, government, and consumers alike,7 as they extract useful knowledge from it.11

There is no globally accepted definition of "big data," although the Vs concept introduced by Gartner analyst Doug Laney in 2001 has emerged as a common structure to describe it. Initially, 3Vs were used, and another 3Vs were added later.13 The 6Vs that characterize big data today are volume, or very [...] value, or aiming to generate significant value for the organization; veracity, or reliability of the processed data; and variability, or the flexibility to adapt to new data formats through collecting, storing, and processing.

Big data sources can include an overall company itself (such as through log files, email messages, sensor data, internal Web 2.0 tools, transaction records, and machine-generated), as well as external applications (such as [...]

This data cannot be managed efficiently through traditional methods17 (such as relational databases) since big data requires balancing data integrity and access efficiency, building indices for unstructured data, and storing data with flexible and variable structures. Aiming to address these challenges, the NoSQL and NewSQL database systems provide solutions for different scenarios.

Big data analytics can be used to extract useful knowledge and analyze large-scale, complex data from applications to acquire intelligence and extract unknown, hidden, valid, and useful relationships, patterns, and information.1 Various methods are used to deal with such data, including text analytics, audio analytics, video analytics, social media analytics, and predictive analytics; see the online appendix "Main Methods for Big Data Analytics," dl.acm.org/citation.cfm?doid=3210752&picked=formats.

[...] with most trying to collect, process, and manage the potential offered by big data.5 To take advantage, organizations need to generate or obtain a large [...]

[...] and refine internal processes.25 Big data analytics can be seen as a more advanced form of business intelligence that works with structured [...]

Big data frameworks focus on assisting organizations to take advantage of big data technology for decision making. Each has its good points, although
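The "flexible and variable structures" that document-oriented NoSQL stores offer can be illustrated with a toy sketch. Plain Python stands in for a real document database here; the records and field names are invented:

```python
# Records in one "collection" need not share a schema, so new fields
# arrive without a migration (no ALTER TABLE step).
collection = []

def insert(document):
    collection.append(dict(document))

insert({"user": "ann", "clicks": 3})
insert({"user": "bob", "sensor": {"temp": 21.5}})  # different fields, stored as-is

def find(field):
    """Query by whatever fields a document happens to have."""
    return [d for d in collection if field in d]
```

Contrast this with a relational table, where adding the `sensor` structure would require a schema change before the second record could be stored.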
contributed articles
each also has weaknesses that must be ing, and visualizing necessary to make
addressed, including that none include use of it. However, unlike other frame-
all dimensions (such as data architec- works, it focuses not only on operations
ture, organization, data sources, data affecting data but also other aspects of
quality, support tools, and privacy/se-
curity). Moreover, they lack a method- Data and management like human and material
resources, economic feasibility, profit es-
ology to guide the steps to be followed information are timation, type of data analysis, business
processes re-engineering, definition of
thus primary assets
in the process of developing and imple-
menting a big data ecosystem, making indicators, and system monitoring.
the process easier. They fail to provide
strong case studies in which they are
for organizations, The BD-IRIS framework includes
seven interrelated dimensions (see Fig-
evaluated, so their validity has not been with most trying ure 1): methodology, data architecture,
proved. They do not consider the im-
pact of the implementation of big data
to collect, process, organization, data sources, data qual-
ity, support tools, and privacy/security.
on human resources or organizational and manage The core is the methodology dimen-
and business processes. They do not
consider previous feasibility studies of the potential sion that serves as a guide for the steps
involved in implementing an ecosys-
big data ecosystem projects. They lack offered by big data. tem with big data technology includes
systems monitoring and a definition of phases, activities, and tasks supported
indicators. They fail to study or identify by the six other dimensions. These
the type of knowledge they need to man- other dimensions include various tech-
age. Moreover, they fail to define the niques, tools, and good practices that
type of data analysis required to address support each phase, activity, and task
organizational goals; see the online ap- of the methodology. Additionally, they
pendix for more on the frameworks and include properties and characteristics
their features and weaknesses. that must be fulfilled in certain stages
In addition to big data frameworks, of such development. With the excep-
system developers should also consid- tion of a methodology, the other six
er big data maturity models that define dimensions are included in some of
the states, or levels, where an enter- the seven frameworks outlined earlier,
prise or system can be situated, a set of though none includes all dimensions.
good practices, goals, and quantifiable Methodology dimension. This is the
parameters that make it possible to de- main axis of the framework; the other
termine on which of the levels the en- dimensions are techniques, tools, and
terprise stands, and a series of propos- good practices that support each phase,
als with which to evolve from one level and the activities and tasks within it. The
of maturity to a higher level.2 Several methodology provides practical guid-
such models have been proposed,15,16,24 ance for managing an entire project life
all focused on assessing big data matu- cycle by indicating the steps needed to
rity (the “as is”) and building a vision execute development and implementa-
for what the organization’s future big tion of big data ecosystems. The meth-
data state should be and why (the “to odology consists of phases that in turn
be”). There is thus a need for a new consist of activities that in turn consist
framework for managing big data eco- of tasks, whereby each one must be com-
systems that can be applied effectively pleted before the next one can begin.
and simply, accounting for the main Table 2 (see in the online appendix) lists
features of big data technology and the phases and activities that constitute
avoiding the weaknesses so identified. the methodology, along with the main
dimensions that support execution of
Proposed Framework the activities and tasks. The support-
In this context, the IRIS (the Spanish tools dimension is not included in Table
acronym for Systems Integration and 2 because it is present or can be present
Re-Engineering) research group at the in all tasks of the methodology, as dif-
Universitat Jaume I of Castellón, Spain, ferent information technology tools are
has proposed the Big Data IRIS (BD-IRIS) available to support each of them.
framework to deal with big data ecosys- The methodology can be applied in
tems, reflecting the literature dealing waterfall mode, or sequentially, for each
with this line of research. The BD-IRIS phase, activity, and task. It can also be
framework focuses on data and the tasks applied iteratively, whereby the project
of collecting, storing, processing, analyz- is divided into subprojects executed in
waterfall mode, with each subproject be- terns are applied by software engineers vanced data-analysis techniques are
gun when the previous one has finished; to ensure only valuable data is collect- applied, perhaps divided into two
for example, each subproject can cover ed. Traditional data sources are easier main groups: research and modeling.
an individual knowledge block or a tool. to link to because they consist of struc- Valuable information is obtained as
Data architecture dimension. This tured data. But social software poses a result of applying these techniques
dimension identifies the proposed a greater technological challenge, as to the collected data. Metadata is also
steps the software engineer performs it contains human information that generated, reducing the complexity
during data analysis. The order in is complex, unstructured, ubiquitous, and processing of queries or opera-
which each task is executed in each of multi-format, and multi-channel. tions that must be performed while
the steps and its relationship with the Enhancement. The main objec- endowing the data with meaning.
other dimensions of the framework are tives here are to endow the collected Data and metadata are stored in a da-
specified in the methodology dimen- data with value, identify and extract tabase for future queries, processing,
sion. The data architecture dimension information, and discover otherwise generation of new metadata, and/or
is divided into levels ranging from unknown relationships and patterns. training and validation of the models.
identifying the location and structure To add such endowment, various ad- Inquiry. Here, the system can ac-
of the data to the display of the results
requested by the organization. Figure Figure 1. BD-IRIS framework dimensions.
2 outlines the levels that make up the
data architecture, including:
Content. Here, the location and char-
acteristics of the data are identified
Data
(such as format and source of required Architecture
data, both structured and unstruc-
tured). In addition, the software engi-
neer performs a verification process to
Organizational Support Tools
check that data location and character-
istics are valid for the next level. Data
can be generated offline, through the Methodology
traditional ways of entering data (such
as open data sources and relational
databases in enterprise resource plan- Data Quality Data Sources
ning, customer relationship manage-
ment systems, and other management Privacy
information systems). In addition, data
and Security
can also be obtained online through
social media (such as LinkedIn, Face-
book, Google+, and Twitter).
Acquisition. Here, filters and pat-
Research
Connectors Query Plan
Structured Analysis User
Term Analysis Analysis Interaction
Data Sources Highlighting
Enhancement
Visualization
Acquisition
Query
Inquiry
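The flow through these data-architecture levels can be sketched as a toy end-to-end pipeline. The function names and the naive "sentiment" metadata below are invented for illustration; they are not part of the BD-IRIS framework:

```python
def content():
    """Identify/locate raw records (an in-memory stand-in for real sources)."""
    return [{"text": "great product"}, {"text": "late delivery"}, {"text": ""}]

def acquisition(records):
    """Apply filters so only valuable data is collected."""
    return [r for r in records if r["text"]]

def enhancement(records):
    """Derive metadata that endows the data with meaning."""
    for r in records:
        r["sentiment"] = "pos" if "great" in r["text"] else "neg"
    return records

def inquiry(records, sentiment):
    """Query the stored data by its metadata."""
    return [r["text"] for r in records if r["sentiment"] == sentiment]

# Content -> Acquisition -> Enhancement; Inquiry then runs over the store.
store = enhancement(acquisition(content()))
```

Each stage consumes the previous stage's output, which mirrors the framework's rule that each level must be valid before the next level runs.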
JA N UA RY 2 0 1 9 | VO L. 6 2 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 61
contributed articles
cess the data and metadata stored in dimension is related to the character- project’s target users, including custom-
the system database generated at the istics and needs of the organization to ers, suppliers, and employees. It is also
enhancement level. The main mode provide data and processing and mak- necessary to define the overall corporate
of access is through queries, usually ing use of it. It is also related to all the transformation it is willing to make and
based on the Structured Query Lan- decisions the organization has to make the new business roles required to ex-
guage, that extract the required infor- to adapt the system to its needs. ploit big data technology. For example,
mation as needed. On the one hand, the organization’s a big data project could aim to use the
Visualization. This level addresses strategy must be analyzed, since big knowledge extracted from customer
presentation and visualization of the data projects must align with the or- data, products, and operations through
results, as well as interpretation of ganization’s business strategy. If not the organization’s processes to change
the meaning of the discovered infor- aligned, the results obtained may not be its business model and create value, op-
mation. Due to the nature of big data as valuable as they could be for the orga- timize business management, and iden-
and the large amount of data to be nization’s decision making. To achieve tify new business opportunities. These
processed, clarity and precision are important in the presentation and visualization of the results.

Organizational dimension. This […] such alignment, the organization must determine the objectives the project is intended to achieve, as well as the organizational challenges involved and the […] projects are thus potentially able to increase customer acquisition and satisfaction, as well as increase loyalty and reduce the rate of customer abandonment. They can also improve business efficiency by, say, eliminating overproduction and reducing the launch time of new products or services. In addition, they can help negotiate better prices with suppliers and improve customer service. The project will thus be defined by the organization's business strategy. On the other hand, the resources offered and the knowledge acquired through big data technology allow optimization of existing business processes by improving them as much as possible.

To integrate enterprise strategy, business process, and human resources, the BD-IRIS framework uses the ARDIN (the Spanish acronym for Reference Architecture for INtegrated Development) enterprise reference architecture, allowing project managers to redefine the conceptual aspects of the enterprise (such as mission, vision, strategy, policies, and enterprise values), redesign and implement the new business process map, and reorganize and manage human resources in light of the new information and communication technologies—big data in this case—to improve them.6

In addition, models of the business processes must be developed so weak points and areas in need of improvement are detected. BD-IRIS uses several modeling languages:

I*. I* makes it possible for project engineers to gain a better understanding of organizational environments and business processes, understand the motivations, intentions, goals, and rationales of organizational management, and illustrate the various characteristics seen in the early phases of requirement specification.30

Criteria for selecting appropriate tools.
What is the price?
Is it a new product and/or company or well established?
Is it an open source or commercial tool?
If commercial, is a trial version available?
If commercial, is licensing per seat or per core?
Is it platform independent?
What is the implementation time?
What is the implementation cost?
Does it work in the cloud and use MapReduce and NoSQL features?
Can real-time features be used or integrated into a real-time system?
How easy is it to upgrade?
How scalable is it?
Can it work in batch and/or programmable mode?
How easy is it to use? Is a GUI available?
What learning curve should be expected?
How compatible is it with other products?
Does it work with big data?
Does it offer an API?
Can it integrate with geospatial data (such as GIS)?
Does it provide modern techniques for data analysis?
Can it handle missing data and data cleaning?
Will it be possible to incorporate new techniques (such as add-ons or modules) different from those already implemented, as user needs evolve?
What is the speed of computations? Does it use memory efficiently?
Does it support programming languages (such as C++, Python, Java, and R) rather than just some internal ad hoc language?
Is it able to fetch data from the Internet or from databases (such as SQL-supported)?
Does it require connectors for databases? If yes, what do they cost?
Does it support the SQL language?
Are visualization capabilities available?
Does it offer a Web or mobile client?
Is good technical support, training, and documentation available?
Is benchmarking available?
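The criteria listed above can be operationalized as a simple weighted scorecard when comparing candidate tools. The sketch below is illustrative only: BD-IRIS does not prescribe a scoring scheme, and the criteria names, weights, and per-tool scores are invented for the example.

```python
# Illustrative weighted scorecard for tool selection.
# Criteria, weights, and scores are invented for this example;
# the framework itself does not prescribe any scoring scheme.

def score_tool(scores, weights):
    """Weighted average of per-criterion scores (each on a 0-5 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in weights) / total_weight

# Hypothetical weights reflecting one project's priorities.
weights = {"price": 0.3, "scalability": 0.25, "api": 0.15,
           "visualization": 0.15, "support": 0.15}

# Hypothetical candidate tools scored against the criteria.
candidates = {
    "tool_a": {"price": 4, "scalability": 3, "api": 5,
               "visualization": 2, "support": 4},
    "tool_b": {"price": 2, "scalability": 5, "api": 4,
               "visualization": 4, "support": 3},
}

# Rank candidates by weighted score, best first.
ranked = sorted(candidates,
                key=lambda t: score_tool(candidates[t], weights),
                reverse=True)
print(ranked)  # prints ['tool_a', 'tool_b']
```

Adjusting the weights to a project's scenario (as the article recommends) can change the ranking, which is the point of making the trade-offs explicit.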
Business Process Model and Notation (BPMN). BPMN,20 designed to model an overall map of an enterprise's business processes, includes 11 graphical, or modeling, elements classified into four categories: core elements (the BPD core element set), flow objects, connecting objects, and "swimlanes" and artifacts. BPMN 2.0 extends BPMN.

Unified Modeling Language. UML 2.0,19 is also used to model interactions among users and the technological platform in greater detail without ambiguity.

In selecting these modeling languages, we took into account that they are intuitive, well known by academics and practitioners alike, useful for process modeling and information-system modeling, and proven in real-world enterprise-scale settings.

Support-tools dimension. This dimension consists of information-technology tools that support all dimensions in the framework, facilitating execution of the tasks to be performed in each dimension. Each such task can be supported by tools with certain characteristics; for example, some tools support only certain tasks, and some tasks can be carried out with and without the help of tools.

The tools that can be used in each dimension, except for data architecture, are standard tools that can be used in any software-engineering project. Types of tools include business management, office, case, project management, indicator management, software testing, and quality management. The data architecture dimension requires specific tools for each of its levels; see Table 3 in the online appendix for examples of tools that can be used at each level in the data architecture dimension.

Several tools are able to perform the same tasks, and the choice of appropriate tool for each project depends on the scenario in which it is used. The table here lists criteria to help prompt the questions that project engineers must address when choosing the appropriate tools for the particular needs of each project.

Data sources dimension. Considering that the foundation of big data ecosystems is data, it is essential that such data is reliable and provides value. This dimension refers to the sources of the data processed in big data ecosystems. Big data technology is able to process both structured data (such as from relational databases, ERPs, CRMs, and open data), as well as data from semi-structured and unstructured sources (such as from log files, machine-generated data, social media, transaction records, sensor data, and GPS signals). Objectives depend on the data that is available to the organization. To ensure optimal performance, the organization must define what data is of interest, identify its data sources and formats, and perform, as needed, the pre-processing of raw data. Data is transformed into a format that is more readily "processable" by the system. Methods for preprocessing raw data include feature extraction (selecting the most significant specific data for certain contexts), transformation (modifying it to fit a particular type of input), sampling (selecting a representative subset from a large dataset), normalization (organizing it with the aim of allowing more efficient access to it), and "de-noising" (eliminating existing noise in it). Once such operations are performed, data is available to the system for processing.

Data-quality dimension. The aim here is to ensure quality in the acquisition, transformation, manipulation, and analysis of data, as well as in the validity of the results. Quality is the consequence of multiple factors, including complexity (lack of simplicity and uniformity in the data), usability (how readily data can be processed and integrated with existing standards and systems), time (timeliness and frequency of data), accuracy (degree of accuracy describing the measured phenomenon), coherence (how the data meets standard conventions and is internally consistent, over time, with other data sources), linkability (how readily the data can be linked or joined with other data), validity (the data reflects what it is supposed to measure), accessibility (ease of access to information), clarity (availability of clear and unambiguous descriptions, together with the data), and relevance (the degree of fidelity of the results with regard to user needs, in terms of measured concepts and represented populations).29

The United Nations Economic Commission for Europe29 has identified the actions software engineers should perform to ensure quality in data input and […]
JA N UA RY 2 0 1 9 | VO L. 6 2 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 63
contributed articles
[…] how to improve business strategy or when and how to carry out reengineering of a business process using big data. As a result, opportunities for improving business performance can be lost.

For this reason, the BD-IRIS framework needs to be structured in all seven dimensions. The main innovation is the BD-IRIS methodology dimension, along with the fact that it takes into account all the dimensions a big data framework should have within a single framework. The BD-IRIS methodology represents a guide to producing a big data ecosystem according to a process, covering the big data project life cycle and identifying when and how to use the approaches proposed in the other six dimensions.

The utility of the framework and its completeness, level of detail, and accuracy of the relations among the methodology tasks and the approaches to other dimensions were validated in 2016 by five expert professionals from a Spanish consulting company with experience in big data projects, and by managers of the two organizations (not experts in big data projects) participating in our case studies. Lack of validation is a notable weakness of the existing frameworks.

Conclusion
This article has explored a framework for guiding development and implementation of big data ecosystems. We developed its initial design from the existing literature while providing additional knowledge. We then debugged, refined, improved, and validated this initial design through two methods—expert assessment and case studies—in a Spanish metal fabrication company and the Spanish division of an international oil and gas company. The results show the framework is considered valuable by corporate management where the case studies were applied. The framework is useful for guiding […]

Although the framework has been validated through two different methods—expert evaluation and case studies—it also involves some notable limitations. For example, the methods we used for the analysis and validation in the two case studies are qualitative, not as precise as quantitative ones, and based on the perceptions of the people involved in the application of the framework in the case studies and the consultants who evaluated it. Moreover, the evaluation experts were chosen from the same consulting company to avoid potential bias. Finally, we applied the framework in two companies in two different industrial sectors but have not yet tested its validity in other types of organization.

Regarding the scope of future work, we are exploring four areas: apply and assess the framework in companies from different industrial sectors; evaluate the ethical implications of big data systems; refine techniques for converting different input data formats into a common format to optimize the processing and analysis of data in big data systems; and, finally, refine the automatic identification of people in different social networks, allowing companies to gather information entered by the same person in a given social network.

References
1. Adams, M.N. Perspectives on data mining. International Journal of Market Research 52, 1 (Jan. 2010), 11–19.
2. Ahern, M., Clouse, A., and Turner, R. CMMI Distilled: A Practical Introduction to Integrated Process Improvement, Second Edition. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2003.
3. Alfouzan, H.I. Big data in business. International Journal of Scientific & Engineering Research 6, 5 (May 2015), 1351–1352.
4. Bharadwaj, A., El Sawy, O.A., Pavlou, P.A., and Venkatraman, N. Digital business strategy: Toward a next generation of insights. MIS Quarterly 37, 2 (June 2013), 471–482.
5. Brown, B., Chui, M., and Manyika, J. Are you ready for the era of 'big data'? McKinsey Quarterly 4 (Oct. 2011), 24–35.
6. Chalmeta, R., Campos, C., and Grangel, R. Reference architectures for enterprise integration. Journal of Systems and Software 57, 3 (July 2001), 175–191.
7. Chui, M., Manyika, J., and Bughin, J. Big data's potential for businesses. Financial Times (May 13, 2011); https://www.ft.com/content/64095dba-7cd5-11e0-994d-00144feabdc0
8. Das, T.K. and Kumar, P.M. Big data analytics: A framework for unstructured data analysis. […]
[…] July 16–20). Lecture Notes in Computer Science, 8557. Springer International Publishing, Switzerland, 2014, 214–227.
12. Ferguson, M. Architecting a Big Data Platform for Analytics. IBM White Paper, Oct. 2012; http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IML14333USEN
13. Flouris, I., Giatrakos, N., Deligiannakis, A., Garofalakis, M., Kamp, M., and Mock, M. Issues in complex event processing: Status and prospects in the big data era. Journal of Systems and Software 127 (May 2017), 217–236.
14. Gèczy, P. Big data management: Relational framework. Review of Business & Finance Studies 6, 3 (2015), 21–30.
15. Halper, F. and Krishnan, K. TDWI Big Data Maturity Model Guide. TDWI Research, Renton, WA, 2013; https://tdwi.org/whitepapers/2013/10/tdwi-big-data-maturity-model-guide.aspx
16. Hortonworks. Hortonworks Big Data Maturity Model, 2016; http://hortonworks.com/wp-content/uploads/2016/04/Hortonworks-Big-Data-Maturity-Assessment.pdf
17. Jagadish, H.V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J.M., Ramakrishnan, R., and Shahabi, C. Big data and its technical challenges. Commun. ACM 57, 7 (July 2014), 86–94.
18. Miller, H.G. and Mork, P. From data to decisions: A value chain for big data. IT Professional 15, 1 (Jan.-Feb. 2013), 57–59.
19. Object Management Group. Unified Modeling Language. OMG, 2000; http://www.uml.org/
20. Object Management Group. Business Process Model and Notation. OMG, 2011; http://www.omg.org/spec/BPMN/2.0
21. Orenga-Roglá, S. and Chalmeta, R. Social customer relationship management: Taking advantage of Web 2.0 and big data technologies. SpringerPlus 5, 1462 (Aug. 2016), 1–17.
22. Orenga-Roglá, S. and Chalmeta, R. Methodology for the implementation of knowledge management systems 2.0: A case study in an oil and gas company. Business & Information Systems Engineering (Dec. 2017), 1–19; https://doi.org/10.1007/s12599-017-0513-1
23. Pawlowski, J. and Bick, M. The global knowledge management framework: Towards a theory for knowledge management in globally distributed settings. Electronic Journal of Knowledge Management 10, 1 (Jan. 2012), 92–108.
24. Radcliffe, J. Leverage a Big Data Maturity Model to Build Your Big Data Roadmap. Radcliffe Advisory Services, Ltd., Guildford, U.K., 2014.
25. Sagiroglu, S. and Sinanc, D. Big data: A review. In Proceedings of the International Conference on Collaboration Technologies and Systems (San Diego, CA, May 20–24). IEEE Press, 2013, 42–47.
26. Shin, D.H. and Choi, M.J. Ecological views of big data: Perspectives and issues. Telematics and Informatics 32, 2 (May 2015), 311–320.
27. Sun, H. and Heller, P. Oracle Information Architecture: An Architect's Guide to Big Data. Oracle White Paper, Aug. 2012; https://d2jt48ltdp5cjc.cloudfront.net/uploads/test1_3021.pdf
28. Tekiner, F. and Keane, J.A. Big data framework. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (Manchester, U.K., Oct. 13–16). IEEE Press, 2013, 1494–1499.
29. United Nations Economic Commission for Europe. A Suggested Framework for the Quality of Big Data. Deliverables of the UNECE Big Data Quality Task Team. UNECE, Dec. 2014; http://www.unece.org/unece/search?q=A+Suggested+Framework+for+the+Quality+of+Big+Data.+&op=Search
30. Yu, E. Why agent-oriented requirements engineering. In Proceedings of the Third International Workshop on Requirements Engineering: Foundation of Software Quality (Barcelona, Spain, June 16–17). Presses Universitaires de Namur, Namur, Belgium, 1997, 171–183.
contributed articles
DOI:10.1145/3198448

The Church-Turing Thesis: Logical Limit or Breachable Barrier?

In its original form, the Church-Turing thesis concerned computation as Alan Turing and Alonzo Church used the term in 1936—human computation.

BY B. JACK COPELAND AND ORON SHAGRIR

THE CHURCH-TURING THESIS (CTT) underlies tantalizing open questions concerning the fundamental place of computing in the physical universe. For example, is every physical system computable? Is the universe essentially computational in nature? What are the implications for computer science of recent speculation about physical uncomputability? Does CTT place a fundamental logical limit on what can be computed, a computational "barrier" that cannot be broken, no matter how far and in what multitude of ways computers develop? Or could new types of hardware, based perhaps on quantum or relativistic phenomena, lead to radically new computing paradigms that do breach the Church-Turing barrier, in which the uncomputable becomes computable, in an upgraded sense of "computable"? Before addressing these questions, we first look back to the 1930s to consider how Alonzo Church and Alan Turing formulated, and sought to justify, their versions of CTT. With this necessary history under our belts, we then turn to today's dramatically more powerful versions of CTT.

key insights
- The term "Church-Turing thesis" is used today for numerous theses that diverge significantly from the one Alonzo Church and Alan Turing conceived in 1936.
- The range of algorithmic processes studied in modern computer science far transcends the range of processes a "human computer" could possibly carry out.
- There are at least three forms of the "physical Church-Turing thesis"—modest, bold, and super-bold—though, at the present stage of physical inquiry, it is unknown whether any of them is true.

Turing stated what we will call "Turing's thesis" in various places and with varying degrees of rigor. The following formulation is one of his most accessible.

Turing's thesis. "L.C.M.s [logical computing machines, Turing's expression for Turing machines] can do anything that could be described as … 'purely mechanical'."38

Turing also formulated his thesis in terms of numbers. For example, he said, "It is my contention that these operations [the operations of an L.C.M.] include all those which are used in the computation of a number,"36 and "[T]he 'computable numbers' include all numbers which would naturally be regarded as computable."36

Church (who, like Turing, was working on the German mathematician David Hilbert's Entscheidungsproblem) advanced "Church's thesis," which he expressed in terms of definability in his lambda calculus.

Church's thesis. "We now define the notion … of an effectively calculable
function of positive integers by identifying it with the notion of a recursive function of positive integers (or of a λ-definable function of positive integers)."5

Church chose to call this a definition. American mathematician Emil Post, on the other hand, referred to Church's thesis as a "working hypothesis" and criticized Church for masking it in the guise of a definition.33

Upon learning of Church's "definition," Turing quickly proved that λ-definability and his own concept of computability (over positive integers) are equivalent. Church's thesis and Turing's thesis are thus equivalent, if attention is restricted to functions of positive integers. (Turing's thesis, more general than Church's, also encompassed computable real numbers.) However, it is important for a computer scientist to appreciate that despite this extensional equivalence, Turing's thesis and Church's thesis have distinct meanings and so are different theses, since they are not intensionally equivalent. A leading difference in their meanings is that Church's thesis contains no reference to computing machinery, whereas Turing's thesis is expressed in terms of the "Turing machine," as Church dubbed it in his 1937 review of Turing's paper.

It is now widely understood that Turing introduced his machines with the intention of providing an idealized
description of a certain human activity—numerical computation; in Turing's day computation was carried out by rote workers called "computers," or, sometimes, "computors"; see, for example, Turing.37 The Church-Turing thesis is about computation as the term was used in 1936—human computation. Church's term "effectively calculable function" was intended to refer to functions that are calculable by an idealized human computer; and, likewise, Turing's phrase "numbers which would naturally be regarded as computable" was intended to refer to those numbers that could be churned out, digit by digit, by an idealized human computer working ceaselessly.

Here, then, is our formulation of the historical version of the Church-Turing thesis, as informed by Turing's proof of the equivalence of his and Church's theses:

CTT-Original (CTT-O). Every function that can be computed by the idealized human computer, which is to say, can be effectively computed, is Turing-computable.

Some mathematical logicians view CTT-O as subject ultimately to either mathematical proof or mathematical refutation, like open mathematical conjectures, as in the Riemann hypothesis, while others regard CTT-O as not amenable to mathematical proof but supported by philosophical arguments and an accumulation of mathematical evidence. Few logicians today follow Church in regarding CTT-O as a definition. We subscribe to Turing's view of the status of CTT-O, as we outline later.

In computer science today, algorithms and effective procedures are, of course, associated not primarily with humans but with machines. (Note, while some expositors might distinguish between the terms "algorithm" and "effective procedure," we use the terms interchangeably.) Many computer science textbooks formulate the Church-Turing thesis without mentioning human computers at all; examples include the well-known books by Hopcroft and Ullman24 and Lewis and Papadimitriou.29 This is despite the fact that the concept of human computation was at the heart of both Turing's and Church's analysis of computation.

We discuss several important modern forms of the Church-Turing thesis, each going far beyond CTT-O. First, we look more closely at the algorithmic form of thesis, as stated to a first approximation by Lewis and Papadimitriou29: "[W]e take the Turing machine to be a precise formal equivalent of the intuitive notion of 'algorithm'."

What Is an Algorithm?
The range of algorithmic processes studied in modern computer science far transcends the range of processes a Turing machine is able to carry out. The Turing machine is restricted to, say, changing at most one bounded part at each sequential step of a computation. As Yuri Gurevich pointed out, the concept of an algorithm keeps evolving: "We have now parallel, interactive, distributed, real-time, analog, hybrid, quantum, etc. algorithms."22 There are enzymatic algorithms, bacterial foraging algorithms, slime-mold algorithms, and more. The Turing machine is incapable of performing the atomic steps of algorithms carried out by, say, an enzymatic system (such as selective enzyme binding) or a slime mold (such as pseudopod extension). The Turing machine is similarly unable to duplicate (as opposed to simulate) John Conway's Game of Life, where—unlike a Turing machine—every cell updates simultaneously.

A thesis aiming to limit the scope of algorithmic computability to Turing computability should thus not state that every possible algorithmic process can be performed by a Turing machine. The way to express the thesis is to say the extensional input-output function ια associated with an algorithm α is always Turing-computable; ια is simply the extensional mapping of α's inputs to α's outputs. The algorithm the Turing machine uses to compute ια might be very different from α itself. A question then naturally arises: If an algorithmic process need not be one a Turing machine can carry out, save in the weak sense just mentioned, then where do the boundaries of this concept lie? What indeed is an algorithm?

The dominant view in computer science is that, ontologically speaking, algorithms are abstract entities; however, there is debate about what abstract entities algorithms are. Gurevich defined the concept in terms of abstract-state machines, Yiannis Moschovakis in terms of abstract recursion, and Noson Yanofsky in terms of equivalence classes of programs, while Moshe Vardi has speculated that an algorithm is both abstract-state machine and recursor. It is also debated whether an algorithm must be physically implementable. Moschovakis and Vasilis Paschalis (among others) adopt a concept of algorithm "so wide as to admit 'non-implementable' algorithms,"30 while other approaches do impose a requirement of physical implementability, even if only a very mild one. David Harel, for instance, writes: "[A]ny algorithmic problem for which we can find an algorithm that can be programmed in some programming language, any language, running on some computer, any computer, even one that has not been built yet but can be built … is also solvable by a Turing machine. This statement is one version of the so-called Church/Turing thesis."23

Steering between these debates—and following Harel's suggestion that the algorithms of interest to computer science are always expressible in programming languages—we arrive at the following program-oriented formulation of the algorithmic thesis:

CTT-Algorithm (CTT-A). Every algorithm can be expressed by means of a program in some (not necessarily currently existing) Turing-equivalent programming language.

There is an option to narrow CTT-A by adding "physically implementable" before "program," although in our view this would be to lump together two distinct computational issues that are better treated separately.

The evolving nature and open-endedness of the concept of an algorithm is matched by a corresponding open-endedness in the concept of a programming language. But this open-endedness notwithstanding, CTT-A requires that all algorithms be bounded by Turing computability.

Later in this article we examine complexity-theoretic and physical versions of the Church-Turing thesis but first turn to the question of the justification of the theses introduced so far. Are CTT-O and CTT-A correct?

What Justifies the Church-Turing Thesis?
Stephen Kleene—who coined the term "Church-Turing thesis"—catalogued four types of argument for CTT-O: First, […]
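The locality restriction noted in the discussion above (a Turing machine changes at most one bounded part of its tape at each sequential step) can be made concrete with a toy simulator. The sketch below is a standard textbook-style illustration, not drawn from the article; the example machine computes the unary successor function, i.e., its extensional input-output function in the article's sense.

```python
# A minimal single-tape Turing machine simulator. Each step reads one cell,
# writes one cell, and moves the head one square at most: the "one bounded
# part" locality discussed in the text. The example machine (unary successor)
# is a standard textbook illustration, not taken from the article.

def run_tm(delta, tape, state="q0", head=0, max_steps=10_000):
    tape = dict(enumerate(tape))          # sparse tape; '_' denotes blank
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        write, move, state = delta[(state, symbol)]
        tape[head] = write                # exactly one cell changes per step
        head += move
    cells = (tape[i] for i in sorted(tape))
    return "".join(cells).strip("_")

# Unary successor: scan right over the 1s, append a 1 at the first blank.
delta = {
    ("q0", "1"): ("1", +1, "q0"),
    ("q0", "_"): ("1", 0, "halt"),
}
print(run_tm(delta, "111"))  # prints "1111"
```

Whatever algorithm a massively parallel system uses, CTT-A only requires that its input-output mapping be computable by some machine of this sequential, local kind; the simulating algorithm may look nothing like the original.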
[…] Cobham-Edmonds thesis, while Yao40 introduced the term "Extended Church-Turing thesis." The thesis is of interest only if P ≠ NP, since otherwise it is trivial.

Quantum-computation researchers also use a variant of this thesis, as expressed in terms of probabilistic Turing machines. Bernstein and Vazirani3 said: "[C]omputational complexity theory rests upon a modern strengthening of [the Church-Turing] thesis, which asserts that any 'reasonable' model of computation can be efficiently simulated on a probabilistic Turing machine."3

Aharonov and Vazirani1 give the following formulation of this assumption, naming it the "Extended Church-Turing thesis"—though it is not quite the same as Yao's earlier thesis of the same name, which did not refer to probabilistic Turing machines:

CTT-Extended (CTT-E). "[A]ny reasonable computational model can be simulated efficiently by the standard model of classical computation, namely, a probabilistic Turing machine."1

As is well known in computer science, Peter Shor's quantum algorithm for prime factorization is a potential counterexample to CTT-E; the algorithm runs on a quantum computer in polynomial time and is much faster than the most-efficient known "classical" algorithm for the task. But the counterexample is controversial. Some computer scientists think the quantum computer invoked is not a physically reasonable model of computation, while others think accommodating these results might require further modifications to complexity theory.

We turn now to extensions of the Church-Turing thesis into physics.

Physical Computability
The issue of whether every aspect of the physical world is Turing-computable was broached by several authors in the 1960s and 1970s, and the topic rose to prominence in the mid-1980s.

In 1985, Stephen Wolfram formulated a thesis he described as "a physical form of the Church-Turing hypothesis," saying, "[U]niversal computers are as powerful in their computational capacities as any physically realizable system can be, so that they can simulate any physical system."39 In the same year, David Deutsch, who laid the foundations of quantum computation, independently stated a similar thesis, describing it as "the physical version of the Church-Turing principle."17 The thesis is now known as the Church-Turing-Deutsch thesis and the Church-Turing-Deutsch-Wolfram thesis.

Church-Turing-Deutsch-Wolfram thesis (CTDW). Every finite physical system can be simulated to any specified degree of accuracy by a universal Turing machine.

Deutsch pointed out that if "simulated" is understood as "perfectly simulated," then the thesis is falsified by continuous classical systems, since such classical systems necessarily involve uncomputable real numbers, and went on to introduce the concept of a universal quantum computer, saying such a computer is "capable of perfectly simulating every finite, realizable physical system." Other physical formulations were advanced by Lenore Blum et al., John Earman, Itamar Pitowsky, Marian Pour-El, and Ian Richards, among others.

We next formulate a strong version of the physical Church-Turing thesis we call the "total physical computability thesis." (We consider some weaker versions later in the article.) By "physical system" we mean any system whose behavior is in accordance with the actual laws of physics, including non-actual and idealized systems.

Total physical computability thesis (CTT-P). Every physical aspect of the behavior of any physical system can be calculated (to any specified degree of accuracy) by a universal Turing machine.

As with CTT-E, there is also a probabilistic version of CTT-P, formulated in terms of a probabilistic Turing machine. Arguably, the phrase "physical version of the Church-Turing thesis" is an inappropriate name for this and related theses, since CTT-O concerns a form of effective or algorithmic activity and asserts the activity is always bounded by Turing computability, while CTT-P and CTDW, on the other hand, entail that the activity of every physical system is bounded by Turing computability; the system's activity need not be algorithmic/effective at all. Nevertheless, in our "CTT-" nomenclature, we follow the Deutsch-Wolfram tradition throughout this article.

Is CTT-P true? Not if physical systems include systems capable of producing unboundedly many digits of a random binary sequence; Church showed such sequences are uncomputable, as we discussed elsewhere.8 Moreover, speculation that there may be deterministic physical processes whose behavior cannot be calculated by the universal Turing machine stretches back over several decades; for a review, see Copeland.9 In 1981, Pour-El and Richards34 showed that a system evolving from computable initial conditions in accordance with the familiar three-dimensional wave equation is capable of exhibiting behavior that falsifies CTT-P; even today, however, it is an open question whether these initial conditions are physically possible. Earlier papers, from the 1960s, by Bruno Scarpellini, Arthur Komar, and Georg Kreisel, in effect questioned CTT-P, with Kreisel stating: "There is no evidence that even present-day quantum theory is a mechanistic, i.e., recursive theory in the sense that a recursively described system has recursive behavior."27 Other potential counterexamples to CTT-P have been described by a number of authors, including what are called "relativistic" machines. First introduced by Pitowsky,32 they will be examined in the section called "Relativistic Computation."

CTT-P and Quantum Mechanics
There are a number of theoretical countermodels to CTT-P arising from quantum mechanics. For example, in 1964, Komar26 raised "the issue of the macroscopic distinguishability of quantum states," asserting there is no effective procedure "for determining whether two arbitrarily given physical states can be superposed to show interference effects." In 2012, Eisert et al.19 showed "[T]he very natural physical problem of determining whether certain outcome sequences cannot occur in repeated quantum measurements is undecidable, even though the same problem for classical measurements is readily decidable." This is an example of a problem that refers unboundedly to the future but not to any specific time. Other typical physical problems take the same form; Pitowsky gave as examples "Is the solar system stable?" and "Is the motion of a given system, in a known initial state, periodic?"

Cubitt et al.14 described another such undecidability result in a 2015 Nature article, outlining their proof that "[T]he
spectral gap problem is algorithmically undecidable: There cannot exist any algorithm that, given a description of the local interactions, determines whether the resultant model is gapped or gapless." Cubitt et al. also said this is the "first undecidability result for a major physics problem that people would really try to solve."

The spectral gap, an important determinant of a material's properties, refers to the energy spectrum immediately above the ground-energy level of a quantum many-body system, assuming a well-defined least-energy level of the system exists; the system is said to be "gapless" if this spectrum is continuous and "gapped" if there is a well-defined next-least energy level. The spectral gap problem for a quantum many-body system is the problem of determining whether the system is gapped or gapless, given the finite matrices (at most three) describing the local interactions of the system.

In their proof, Cubitt et al.14 encoded the halting problem in the spectral gap problem, showing the latter is at least as hard as the former. The proof involves an infinite family of two-dimensional lattices of atoms. But they pointed out their result also applies to finite systems whose size increases, saying, "Not only can the lattice size at which the system switches from gapless to gapped be arbitrarily large, the threshold at which this transition occurs is uncomputable." Their proof offers an interesting countermodel to CTT-P, involving a physically relevant example of a finite system of increasing size. There exists no effective method for extrapolating the system's future behavior from (complete descriptions of) its current and past states.

It is debatable whether any of these quantum models correspond to real-world quantum systems. Cubitt et al.14 admitted the model invoked in their proof is highly artificial, saying, "Whether the results can be extended to more natural models is yet to be determined." There is also the question of whether the spectral gap problem becomes computable when only local Hilbert spaces of realistically low dimensionality are considered. Nevertheless, these results are certainly suggestive: CTT-P cannot be taken for granted, even in a finite quantum universe.

Summarizing the current situation with respect to CTT-P, we can say, although theoretical countermodels in which CTT-P is false have been described, there is at present—so far as we know—not a shred of evidence that CTT-P is false in the actual universe. Yet it would seem most premature to assert that CTT-P is true.

Weaker Physical Computability Theses
Piccinini31 has distinguished between two different types of physical versions of the Church-Turing thesis, both commonly found in the literature, describing them as "bold" and "modest" versions of the thesis, respectively. The bold and modest versions are weaker than our "super-bold" version just discussed (CTT-P). Bold versions of the thesis state, roughly, that "Any physical process can be simulated by some Turing machine."31 The Church-Turing-Deutsch-Wolfram thesis (CTDW) is an example, though Piccinini emphasized that the bold versions proposed by different researchers are often "logically independent of one another" and that, unlike the different formulations of CTT-O, which exhibit confluence, the different bold formulations in fact exhibit "lack of confluence."31

CTDW and other bold forms are too weak to rule out the uncomputability scenarios described by Cubitt et al.14 and by Eisert et al.19 This is because the physical processes involved in these scenarios may, so far as we know, be Turing-computable; it is possible that each process can be simulated by a Turing machine, to any required degree of accuracy, and yet the answers to certain physical questions about the processes are, in general, uncomputable. The situation is similar in the case of the universal Turing machine itself. The machine's behavior (consisting of the physical actions of the read/write head) is always Turing-computable since it is produced by the Turing machine's program, yet the answers to some questions about the behavior (such as whether or not the machine halts given certain inputs) are not computable.

Nevertheless, bold forms (such as CTDW) are interesting empirical hypotheses in their own right and the world might confute them. For instance, CTDW fails in the wave-equation countermodel due to Pour-El and Richards,34 where the mapping between the wave equation's "inputs" and "outputs" is not a Turing-computable (real) function; although, as noted earlier, the physicality of this countermodel can readily be challenged. We discuss some other potential countermodels later in the article, but turn first to what Piccinini termed "modest" versions of the thesis.

Modest versions maintain in essence that every physical computing process is Turing-computable; for two detailed formulations, see Gandy20 and Copeland.8 Even if CTT-P and CTDW are in general false, the behavior of the subset of physical systems that are appropriately described as computing systems may nevertheless be bounded by Turing-computability. An illustration of the difference between modest versions on the one hand and CTT-P and CTDW on the other is given by the fact that the wave-equation example is not a countermodel to the modest thesis, assuming, as seems reasonable, that the physical dynamics described by the equation do not constitute a computing process. Here, we formulate a modest version of the physical Church-Turing thesis we call the "Physical Computation" thesis, then turn to the question of whether it is true.

Figure. Relationships between the three physical computability theses: CTT-P, CTDW, and CTT-P-C.
Physical computability theses:
super-bold — CTT-P: Total Physical Computability Thesis
bold — CTDW: Church-Turing-Deutsch-Wolfram Thesis
modest — CTT-P-C: Physical Computation Thesis
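For a fixed, finite system the gap itself is easy to compute; what Cubitt et al. prove uncomputable is its behavior in the limit of arbitrarily large lattices. A minimal sketch of the finite case, using a two-spin transverse-field Ising Hamiltonian of our own choosing (not their construction):

```python
# Toy illustration of the spectral gap E1 - E0 for a *finite* system:
# diagonalize a small Hermitian Hamiltonian and subtract the two lowest
# energy levels. The two-spin model below is an illustrative choice, not
# the lattice construction of Cubitt et al.; no finite diagonalization
# like this settles the thermodynamic-limit question.
import numpy as np

def spectral_gap(h: np.ndarray) -> float:
    """E1 - E0 for a Hermitian matrix h (eigvalsh returns sorted energies)."""
    energies = np.linalg.eigvalsh(h)
    return float(energies[1] - energies[0])

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # Pauli X
Z = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli Z
I2 = np.eye(2)

def h_ising(g: float) -> np.ndarray:
    """H = -Z1 Z2 - g (X1 + X2) on two spins."""
    return -np.kron(Z, Z) - g * (np.kron(X, I2) + np.kron(I2, X))

print(spectral_gap(h_ising(0.0)))  # degenerate ground states: gap 0
print(spectral_gap(h_ising(0.5)))  # field splits them: gap sqrt(2) - 1
```

At g = 0 the two ground states are degenerate and the finite-size gap is 0; a small transverse field g = 0.5 splits them, giving a gap of sqrt(2) − 1 ≈ 0.414.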
Physical Computation Thesis
This form of the thesis maintains that physical computation is bounded by Turing-computability.

Physical computation thesis (CTT-P-C). Every function computed by any physical computing system is Turing-computable.

Is CTT-P-C true? As with the stronger physical computability theses, it seems too early to say. CTT-P-C could be false only if CTT-P and CTDW turn out to be false, since each of them entails CTT-P-C (see the figure here, which outlines the relationships among CTT-P, CTDW, and CTT-P-C). If all physical computation is effective in the 1930s sense of Turing and Church, then CTT-P-C is certainly true. If, however, the door is open to a broadened sense of computation, where physical computation is not necessarily effective in the sense of being bounded by Turing-computability, then CTT-P-C makes a substantive claim.

There is, in fact, heated debate among computer scientists and philosophers about what counts as physical computation. Moreover, a number of attempts have sought to describe a broadened sense of computation in which computation is not bounded by Turing-computability; see, for example, Copeland.6 Computing machines that compute "beyond the Turing limit" are known collectively as "hypercomputers," a term introduced in Copeland and Proudfoot.11 Some of the most thought-provoking examples of notional machines that compute in the broad sense are called "supertask" machines. These "Zeno computers" squeeze infinitely many computational steps into a finite span of time. Examples include accelerating machines,7,12 shrinking machines, and the intriguing relativistic computers described in the next section.

Notional machines all constitute rather theoretical countermodels to CTT-P-C, so long as it is agreed that they compute in a broadened sense, but none has been shown to be physically realistic, although, as we explain, relativistic computers come close. In short, the truth or falsity of CTT-P-C remains unsettled.

Relativistic Computation
Relativistic machines operate in spacetime structures with the property that the entire endless lifetime of one component of the machine is included in the finite chronological past of another component, called "the observer." The first component could thus carry out an infinite computation (such as calculating every digit of π) in what is, from the observer's point of view, a finite timespan of, say, one hour. (Such machines are in accord with Einstein's general theory of relativity, hence the term "relativistic.") Examples of relativistic computation have been detailed by Pitowsky, Mark Hogarth, and Istvan Németi.

In this section we outline a relativistic machine RM consisting of a pair of communicating Turing machines, TE and TO, in relative motion. TE is a universal machine, and TO is the observer. RM is able to compute the halting function, in a broad sense of computation. Speaking of computation here seems appropriate, since RM consists of nothing but two communicating Turing machines.

Here is how RM works. When the input (m,n), asking whether the mth Turing machine (in some enumeration of the Turing machines) halts or not when started on input n, enters TO, TO first prints 0 (meaning "never halts") in its designated output cell and then transmits (m,n) to TE. TE simulates the computation performed by the mth Turing machine when started on input n and sends a signal back to TO if and only if the simulation terminates. If TO receives a signal from TE, TO deletes the 0 it previously wrote in its output cell and writes 1 in its place (meaning "halts"). After one hour, TO's output cell shows 1 if the mth Turing machine halts on input n and shows 0 if the mth machine does not halt on n.

The most physically realistic version of this setup to date is due to Németi and his collaborators in Budapest. TE, an ordinary computer, remains on Earth, while the observer TO travels toward and enters a slowly rotating Kerr black hole. TO approaches the outer event horizon, a bubble-like hypersurface surrounding the black hole. Németi theorized that the closer TO gets to the event horizon, the faster TE's clock runs relative to TO due to Einsteinian gravitational time dilation, and this speeding up continues with no upper limit. TO's motion proceeds until, relative to a time t on TO's clock, the entire span of TE's computing is over. If any signal was emitted by TE, the signal will have been received by TO before time t. So TO will fall into the black hole with 1 in its output cell if TE halted and 0 if TE never halted. Fortunately, TO can escape annihilation if its trajectory is carefully chosen in advance, says Németi; the rotational forces of the Kerr hole counterbalance the gravitational forces that would otherwise "spaghettify" TO. TO thus emerges unscathed from the hole and goes on to use the computed value of the halting function in further computations.

Németi and colleagues emphasize their machine is physical in the sense it is "not in conflict with presently accepted scientific principles" and, in particular, "the principles of quantum mechanics are not violated."2 They suggest humans might "even build" a relativistic computer "sometime in the future."2 This is, of course, highly controversial. However, our point is that Németi's theoretical countermodel, which counters not only CTT-P-C but also CTT-P and CTDW, helps underscore that the "physical version of the Church-Turing thesis" is quite independent of CTT-O, since the countermodel stands whether or not CTT-O is endorsed. We next reconsider CTT-A.

CTT-A and Computation in the Broad Sense
The continuing expansion of the concept of an algorithm is akin to the extension of the concept of number from integers to signed integers to rational, real, and complex numbers. Even the concept of human computation underwent an expansion; before 1936, computation was conceived of in terms of total functions, and it was Kleene in 1938 who explicitly extended the conception to also cover partial functions.

Gurevich argued in 2012 that formal methods cannot capture the algorithm concept in its full generality due to the concept's open-ended nature; at best, formal methods provide treatments of "strata of algorithms" that "have matured enough to support rigorous definitions."22 An important question for computer science is whether CTT-A is a reasonable constraint on the growth of new strata. Perhaps not. In 1982, Jon Doyle18 suggested equilibrating systems with discrete spectra (such as molecules and other quantum many-body systems) illustrate a concept of effectiveness that is broader than the
classical concept, saying, "[E]quilibrating can be so easily, reproducibly, and mindlessly accomplished" that we may "take the operation of equilibrating as an effective one," even if "the functions computable in principle given Turing's operations and equilibrating include non-recursive functions."

Over the years, there have been several departures from Turing's 1936 analysis, as the needs of computer science led to a broadening of the algorithm concept. For example, Turing's fourth axiom, which bounds the number of parts of a system that can be changed simultaneously, became irrelevant when the algorithm concept broadened to cover parallel computations. The future computational landscape might conceivably include more extensive revisions of the concept, if, for example, physicists were to discover that hardware effective in Doyle's extended sense is a realistic possibility.

If such hardware were to be developed—hardware in which operations are effective in the sense of being "easily, reproducibly, and mindlessly accomplished" but not bounded by Turing computability—then would the appropriate response by computer scientists be to free the algorithm concept from CTT-A? Or should CTT-A remain as a constraint on algorithms, with instead two different species of computation being recognized, called, say, algorithmic computation and non-algorithmic computation? Not much rides on a word, but we note we prefer "effective computation" for computation that is bounded by Turing computability and "neo-effective computation" for computation that is effective in Doyle's sense and not bounded by Turing computability, with "neo" indicating a new concept related to an older one.

The numerous examples of notional "hypercomputers" (see Copeland9 for a review) prompt similar questions. Interestingly, a study of the expanding literature about the concept of an infinite-time Turing machine, introduced by Joel Hamkins and Andy Lewis in 2000, shows that a number of computer scientists are prepared to describe the infinite-time machine as computing the halting function. Perhaps this indicates the concept of computation is already in the process of bifurcating into "effective" and "neo-effective" computation.

Conclusion
In the computational literature the term "Church-Turing thesis" is applied to a variety of different propositions usually not equivalent to the original thesis—CTT-O; some even go far beyond anything either Church or Turing wrote. Several but not all are fundamental assumptions of computer science. Others (such as the various physical computability theses we have discussed) are important in the philosophy of computing and the philosophy of physics but are highly contentious; indeed, the label "Church-Turing thesis" should not mislead computer scientists or anyone else into thinking they are established fact or even that Church or Turing endorsed them.

References
1. Aharonov, D. and Vazirani, U.V. Is quantum mechanics falsifiable? A computational perspective on the foundations of quantum mechanics. Chapter in Computability: Gödel, Turing, Church, and Beyond, B.J. Copeland, C.J. Posy, and O. Shagrir, Eds. MIT Press, Cambridge, MA, 2013.
2. Andréka, H., Németi, I., and Németi, P. General relativistic hypercomputing and foundation of mathematics. Natural Computing 8, 3 (Sept. 2009), 499–516.
3. Bernstein, E. and Vazirani, U. Quantum complexity theory. SIAM Journal on Computing 26, 5 (Oct. 1997), 1411–1473.
4. Castelvecchi, D. Paradox at the heart of mathematics makes physics problem unanswerable. Nature 528 (Dec. 9, 2015), 207.
5. Church, A. An unsolvable problem of elementary number theory. American Journal of Mathematics 58, 2 (Apr. 1936), 345–363.
6. Copeland, B.J. The broad conception of computation. American Behavioral Scientist 40, 6 (May 1997), 690–716.
7. Copeland, B.J. Even Turing machines can compute uncomputable functions. Chapter in Unconventional Models of Computation, C. Calude, J. Casti, and M. Dinneen, Eds. Springer, Singapore, 1998.
8. Copeland, B.J. Narrow versus wide mechanism: Including a re-examination of Turing's views on the mind-machine issue. The Journal of Philosophy 97, 1 (Jan. 2000), 5–32.
9. Copeland, B.J. Hypercomputation. Minds and Machines 12, 4 (Nov. 2002), 461–502.
10. Copeland, B.J. The Essential Turing: Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life, Plus the Secrets of Enigma. Oxford University Press, Oxford, U.K., 2004.
11. Copeland, B.J. and Proudfoot, D. Alan Turing's forgotten ideas in computer science. Scientific American 280, 4 (Apr. 1999), 98–103.
12. Copeland, B.J. and Shagrir, O. Do accelerating Turing machines compute the uncomputable? Minds and Machines 21, 2 (May 2011), 221–239.
13. Copeland, B.J. and Shagrir, O. Turing versus Gödel on computability and the mind. Chapter in Computability: Gödel, Turing, Church, and Beyond, B.J. Copeland, C.J. Posy, and O. Shagrir, Eds. MIT Press, Cambridge, MA, 2013.
14. Cubitt, T.S., Perez-Garcia, D., and Wolf, M.M. Undecidability of the spectral gap. Nature 528, 7581 (Dec. 2015), 207–211.
15. Davis, M. Why Gödel didn't have Church's thesis. Information and Control 54, 1-2 (July 1982), 3–24.
16. Dershowitz, N. and Gurevich, Y. A natural axiomatization of computability and proof of Church's thesis. Bulletin of Symbolic Logic 14, 3 (Sept. 2008), 299–350.
17. Deutsch, D. Quantum theory, the Church-Turing principle and the universal quantum computer. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 400, 1818 (July 1985), 97–117.
18. Doyle, J. What is Church's thesis? An outline. Minds and Machines 12, 4 (Nov. 2002), 519–520.
19. Eisert, J., Müller, M.P., and Gogolin, C. Quantum measurement occurrence is undecidable. Physical Review Letters 108, 26 (June 2012), 1–5.
20. Gandy, R.O. Church's thesis and principles for mechanisms. In Proceedings of the Kleene Symposium, J. Barwise, H.J. Keisler, and K. Kunen, Eds. (Madison, WI, June 1978). North-Holland, Amsterdam, Netherlands, 1980.
21. Goldreich, O. Computational Complexity: A Conceptual Perspective. Cambridge University Press, New York, 2008.
22. Gurevich, Y. What is an algorithm? In Proceedings of the 38th Conference on Current Trends in the Theory and Practice of Computer Science (Špindlerův Mlýn, Czech Republic, Jan. 21–27), M. Bieliková, G. Friedrich, G. Gottlob, S. Katzenbeisser, and G. Turán, Eds. Springer, Berlin, Heidelberg, Germany, 2012.
23. Harel, D. Algorithmics: The Spirit of Computing, Second Edition. Addison-Wesley, Reading, MA, 1992.
24. Hopcroft, J.E. and Ullman, J.D. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA, 1979.
25. Kleene, S.C. Introduction to Metamathematics. Van Nostrand, New York, 1952.
26. Komar, A. Undecidability of macroscopically distinguishable states in quantum field theory. Physical Review 133, 2B (Jan. 1964), 542–544.
27. Kreisel, G. Mathematical logic: What has it done for the philosophy of mathematics? Chapter in Bertrand Russell: Philosopher of the Century, R. Schoenman, Ed. Allen and Unwin, London, U.K., 1967.
28. Kripke, S.A. Another approach: The Church-Turing 'thesis' as a special corollary of Gödel's completeness theorem. Chapter in Computability: Gödel, Turing, Church, and Beyond, B.J. Copeland, C.J. Posy, and O. Shagrir, Eds. MIT Press, Cambridge, MA, 2013.
29. Lewis, H.R. and Papadimitriou, C.H. Elements of the Theory of Computation. Prentice Hall, Upper Saddle River, NJ, 1981.
30. Moschovakis, Y.N. and Paschalis, V. Elementary algorithms and their implementations. Chapter in New Computational Paradigms: Changing Conceptions of What Is Computable, S.B. Cooper, B. Lowe, and A. Sorbi, Eds. Springer, New York, 2008.
31. Piccinini, G. The physical Church-Turing thesis: Modest or bold? The British Journal for the Philosophy of Science 62, 4 (Aug. 2011), 733–769.
32. Pitowsky, I. The physical Church thesis and physical computational complexity. Iyyun 39, 1 (Jan. 1990), 81–99.
33. Post, E.L. Finite combinatory processes: Formulation I. The Journal of Symbolic Logic 1, 3 (Sept. 1936), 103–105.
34. Pour-El, M.B. and Richards, I.J. The wave equation with computable initial data such that its unique solution is not computable. Advances in Mathematics 39, 3 (Mar. 1981), 215–239.
35. Sieg, W. Mechanical procedures and mathematical experience. Chapter in Mathematics and Mind, A. George, Ed. Oxford University Press, New York, 1994.
36. Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem (1936); in Copeland.10
37. Turing, A.M. Lecture on the Automatic Computing Engine (1947); in Copeland.10
38. Turing, A.M. Intelligent Machinery (1948); in Copeland.10
39. Wolfram, S. Undecidability and intractability in theoretical physics. Physical Review Letters 54, 8 (Feb. 1985), 735–738.
40. Yao, A.C.C. Classical physics and the Church-Turing thesis. Journal of the ACM 50, 1 (Jan. 2003), 100–105.

B. Jack Copeland (jack.copeland@canterbury.ac.nz) is Distinguished Professor of Philosophy at the University of Canterbury in Christchurch, New Zealand, and Director of the Turing Archive for the History of Computing, also at the University of Canterbury.

Oron Shagrir (oron.shagrir@gmail.com) is Schulman Professor of Philosophy and Cognitive Science at the Hebrew University of Jerusalem, Jerusalem, Israel.

Copyright held by the authors. Publication rights licensed to ACM. $15.00
Intelligent Systems for Geosciences: An Essential Research Agenda

ARINDAM BANERJEE, University of Minnesota
KIRK BORNE, Booz Allen Hamilton
GARY BUST, Johns Hopkins University
MICHELLE CHEATHAM, Wright State University
IMME EBERT-UPHOFF, Colorado State University
CARLA GOMES, Cornell University
MARY HILL, University of Kansas
JOHN HOREL, University of Utah
LESLIE HSU, Columbia University
JIM KINTER, George Mason University
CRAIG KNOBLOCK, University of Southern California
DAVID KRUM, University of Southern California
VIPIN KUMAR, University of Minnesota
PIERRE LERMUSIAUX, Massachusetts Institute of Technology
YAN LIU, University of Southern California
CHRIS NORTH, Virginia Tech
VICTOR PANKRATIUS, Massachusetts Institute of Technology
SHANAN PETERS, University of Wisconsin-Madison
BETH PLALE, Indiana University Bloomington
ALLEN POPE, University of Colorado Boulder
SAI RAVELA, Massachusetts Institute of Technology
JUAN RESTREPO, Oregon State University
AARON RIDLEY, University of Michigan
HANAN SAMET, University of Maryland
SHASHI SHEKHAR, University of Minnesota

MANY ASPECTS OF geosciences pose novel problems for intelligent systems research. Geoscience data is challenging because it tends to be uncertain, intermittent, sparse, multiresolution, and multiscale. Geosciences processes and objects often have amorphous spatiotemporal boundaries. The lack of ground truth makes model evaluation, testing, and comparison difficult. Overcoming these challenges requires breakthroughs that would significantly transform intelligent systems, while greatly benefitting the geosciences in turn. Although there have been significant and beneficial interactions between the intelligent systems and geosciences communities,4,12 the potential for synergistic research in intelligent
review articles

systems for geosciences is largely untapped. A recently launched Research Coordination Network on Intelligent Systems for Geosciences followed a workshop at the National Science Foundation on this topic.1 This expanding network builds on the momentum of the NSF EarthCube initiative for geosciences, and is driven by practical problems in Earth, ocean, atmospheric, polar, and geospace sciences.11 Based on discussions and activities within this network, this article presents a research agenda for intelligent systems inspired by geosciences challenges.

Geosciences research aims to understand the Earth as a system of complex, highly interactive natural processes and their interactions with human activities. Current approaches have fundamental shortcomings given the complexity of geosciences data. First, using data alone is insufficient to create models of the very complex phenomena under study, so prior theories need to be taken into account. Second, data collection can be most effective if steered using knowledge about existing models to focus on data that will make a difference. Third, combining disparate data and models across disciplines requires capturing and reasoning about extensive qualifications and context to enable their integration. These are all illustrations of the need for knowledge-rich intelligent systems that incorporate significant amounts of geosciences knowledge.

The article begins with an overview of research challenges in geosciences. It then presents a research agenda and vision for intelligent systems to address those challenges. It concludes with an overview of ongoing activities in the newly formed research network of intelligent systems for geosciences that is fostering a community to pursue this interdisciplinary research agenda.

The pace of geosciences investigations today can hardly keep up with the urgency presented by societal needs to manage natural resources, respond to geohazards, and understand the long-term effects of human activities on the planet.6–11 In addition, recent unprecedented increases in data availability together with a stronger emphasis on societal drivers emphasize the need for research that crosses over traditional knowledge boundaries. Different disciplines in geosciences are facing these challenges from different motivations and perspectives:

˲ Forecasting rates of sea level change in polar ice shelves: Polar scientists, along with atmospheric and ocean scientists, face an urgent need to understand sea level rise around the globe. Ice-shelf environments represent extreme environments for sampling and sensing. Current efforts to collect sensed data are limited and use tethered robots with traditional sampling frequency and collection limitations. The ability to collect extensive data about conditions at or near the ice shelves will inform our understanding about changes in ocean circulation patterns, as well as feedbacks with wind circulation. New research on intelligent sensors would support selective data collection, onboard data analysis, and adaptive sensor steering. New submersible robotic platforms could detect and respond to interesting situations while adjusting sensing frequencies that could be triggered depending on the data being collected in real time.

˲ Unlock deep Earth time: Earth scientists focus on understanding the dynamics of the Earth, including the interior of the Earth or deep Earth (such as tectonics, seismology, magnetic or gravity fields, and volcanic activity) and the near-surface Earth (such as the hydrologic cycle, the carbon cycle, the food production cycle, and the energy cycle). While collecting data from the field is done by individuals in select locations, the problems under consideration cover spatially vast regions of the planet. Moreover, scientists have been collecting data at different times in different places and reporting results in separate repositories and often unconnected publications. This has resulted in a poorly connected collection of information that makes wide-area analyses extremely difficult and impossible to reproduce. Earth systems are integrated, but current geoscience data and models are not. To unravel significant questions about topics such as Deep Earth Time, geoscientists need intelligent systems to efficiently integrate data from disparate locations, data types, and collection efforts within a wide area.

˲ Predict critical atmosphere and geospace events: Atmospheric and geospace science research aims to improve understanding of the Earth's atmosphere and its interdependencies with all of the other Earth components, and to understand the important physical dynamics, relationships, and coupling between the incident solar wind stream and the magnetosphere, ionosphere, and thermosphere of the Earth. Atmospheric research investigates phenomena operating from planetary to micro spatial scales and from millennia to microseconds. Although the data collected is very large, it is minuscule given the complexity of the phenomena under study. Therefore, the data available must be augmented with knowledge about physical laws underlying the phenomena in order to generate effective models.

˲ Detect ocean-land-atmosphere-ice interactions: Our ability to understand the Earth system is heavily dependent on our ability to integrate geoscience models across time, space, and discipline. This requires sophisticated approaches that support composition, discover structure, diagnose and compensate for compound model errors and uncertainties, and generate rich visualizations of multidimensional information that take into account a scientist's context.

The accompanying figure illustrates intelligent systems research directions inspired by these geoscience challenges, organized at various scales. Studying the Earth as a system requires fundamentally new capabilities to collect […]

key insights
˽ Advances in artificial intelligence are needed to collect data where and when it matters, to integrate isolated observations into broader studies, to create models in the absence of comprehensive data, and to synthesize models from multiple disciplines and scales.
˽ Intelligent systems need to incorporate extensive knowledge about the physical, geological, chemical, biological, ecological, and anthropogenic factors that affect the Earth system while leveraging recent advances in data-driven research.
˽ A new generation of knowledge-rich intelligent systems have the potential to significantly transform geosciences research practices.
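The adaptive sensor steering described in the ice-shelf scenario (adjusting sensing frequencies based on the data arriving in real time) can be reduced to a simple control loop. The sketch below is our illustration, not from the article; the threshold and the two sampling intervals are arbitrary placeholders.

```python
# Illustrative adaptive-sampling rule: sample densely while recent
# readings are volatile, sparsely while they are quiet. The numbers
# (60 s base interval, 5 s fast interval, 0.5 volatility threshold)
# are invented for the example, not taken from any deployed system.
from statistics import pstdev

def next_interval_s(window, base_s=60.0, fast_s=5.0, threshold=0.5):
    """Choose the next sampling interval from the last few readings."""
    if len(window) >= 2 and pstdev(window) > threshold:
        return fast_s    # interesting dynamics: sample densely
    return base_s        # quiescent: conserve power and bandwidth

print(next_interval_s([10.0, 10.1, 10.0]))  # calm readings -> 60.0
print(next_interval_s([10.0, 12.5, 9.0]))   # volatile readings -> 5.0
```

In a real platform the "window" would be the onboard buffer of recent sensor values, and the returned interval would drive the sampling scheduler.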
review articles
and the application of Linked Open Data are all areas of active research to facilitate search and integration of data without a great deal of manual effort.5

2. Capturing scientific processes, hypotheses, and theories. To complement the ontologies and data representations just discussed, a great challenge is representing the ever-evolving, uncertain, complex, and dynamic scientific knowledge and information. Important challenges will arise in representing dynamic processes, uncertainty, theories and models, hypotheses and claims, and many other aspects of a constantly growing scientific knowledge base. These representations need to be expressive enough to capture complex scientific knowledge, but they also need to support scalable reasoning that integrates disparate knowledge at different scales. In addition, scientists will need to understand the representations and trust the outcomes.

3. Interoperation of diverse scientific knowledge. Scientific knowledge comes in many forms that use different tacit and explicit representations: hypotheses, models, theories, equations, assumptions, data characterizations, and others. These representations are all interrelated, and it should be possible to translate knowledge fluidly as needed from one representation to another. A major research challenge is the seamless interoperation of alternative representations of scientific knowledge, from descriptive to taxonomic to mathematical, from facts to interpretation and alternative hypotheses, from smaller to larger scales, and from isolated processes to complex integrated phenomena.

4. Authoring scientific knowledge collaboratively. Formal knowledge representation languages, especially if they are expressive and complex, are not easily accessible to scientists for encoding understanding. A major challenge will be creating authoring tools that enable scientists to create, interlink, reuse, and disseminate knowledge. Scientific knowledge needs to be updated continuously, allow for alternative models, and separate facts from interpretation and hypotheses. These are new challenges for knowledge capture and authoring research. Finally, scientific knowledge should be created collaboratively, allowing different contributors to weigh in based on their diverse expertise and perspectives.

5. Automated extraction of scientific knowledge. Not all scientific knowledge needs to be authored manually. Much of the data known to geoscientists is stored in semi-structured formats, such as spreadsheets or text, and is inaccessible to structured search mechanisms. Automated techniques are needed to identify and import these kinds of data into structured knowledge bases.

Research vision: Knowledge maps. We envision rich knowledge graphs that will contain explicit interconnected representations of scientific knowledge linked to time and space to form multidimensional knowledge maps. Interpretations and assumptions will be well documented and linked to observational data and models. Today's semantic networks and knowledge graphs link together distributed facts on the Web, but they contain simple facts that lack the depth and grounding needed for scientific research. Knowledge maps will have deeper spatiotemporal representations of processes, hypotheses, and theories and will be grounded in the physical world, interconnecting the myriad models of geoscience systems.

Robotics and sensing. Knowledge-informed sensing and data collection has great potential to do more cost-effective data gathering across the geosciences.

Research directions:
1. Optimizing data collection. Geoscience data is needed across many scales, both spatial and temporal. Since it is not possible to monitor every measurement at all scales all of the time, there is a crucial need for intelligent methods for sensing. New research is needed to estimate the cost of data collection prior to sensor deployment, whether that means storage size, energy expenditure, or monetary cost. A related research challenge is trade-off analysis of the cost of data collection versus the utility of the data to be collected.

2. Active sampling. Geoscience knowledge can be exploited to inform autonomous sensing systems to not only enable long-term data collection, but to also increase the effectiveness of sensing through adaptive sampling, resulting in richer datasets at lower costs. Interpreting sensor data onboard allows autonomous vehicles to make decisions guided by real-time variations in data, or to react to unexpected deviations from the current physical model.

3. Crowdsourcing data collection for costly observations. Citizen scientists can contribute useful data (for example, collected through geolocated mobile devices) that would otherwise be very costly to acquire. One challenge in data collection through crowdsourcing is in ensuring the high quality of data required by geoscience research. A potential area of research is to improve methods of evaluating crowdsourced data collection empirically, and to gain an understanding of the biases involved in the collection process.

Research vision: Model-driven sensing. New research on sensors will create a new generation of devices that will contain more knowledge of the scientific context for the data being collected. These devices will use that knowledge to optimize their performance and improve their effectiveness. This will result in new model-driven sensors that will have more autonomy and exploratory capabilities.

Information integration. Data, models, information, and knowledge are scattered across different communities and disciplines, causing great limitations to current geosciences research. Their integration presents major research challenges that will require the use of scientific knowledge for information integration.

Research directions:
1. Integrating data from distributed repositories. The geosciences have phenomenal data integration challenges. Most of the hard geoscience problems require that scientists work across subdisciplinary boundaries and share very large amounts of data. Another facet of this issue is that the data spans a wide variety of modalities and greatly varying temporal and spatial scales. Distributed data discovery tools, metadata translators, and more descriptive standards are emerging in this context. Open issues include cross-domain concept mapping, entity resolution and scientifically valid data linking, and effective tools for finding, integrating, and reusing data.
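The adaptive-sampling direction described above (react to unexpected deviations from a physical model by sampling more densely) can be illustrated with a minimal decision rule. This is a sketch only, not a system from the article; the threshold, rates, and model predictions are illustrative assumptions.

```python
# Sketch of knowledge-informed adaptive sampling: a sensor compares each
# observation against a physical-model prediction and boosts its sampling
# rate when the deviation exceeds a threshold (all values are illustrative).

def adaptive_rate(observed: float, predicted: float,
                  base_rate: float = 1.0, boost: float = 10.0,
                  threshold: float = 2.0) -> float:
    """Return a sampling rate: boosted on unexpected deviations, else base."""
    deviation = abs(observed - predicted)
    return boost * base_rate if deviation > threshold else base_rate

# A stream of (observation, model prediction) pairs; the third pair deviates,
# so only that step triggers dense sampling.
stream = [(10.1, 10.0), (10.4, 10.2), (15.3, 10.3), (10.2, 10.1)]
rates = [adaptive_rate(obs, pred) for obs, pred in stream]
```

A real model-driven sensor would replace the fixed prediction with an onboard physical model and fold in the data-collection costs discussed under "Optimizing data collection."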
ing highly complex nonlinear models from data, which usually requires large amounts of labeled data. However, in most cases, obtaining labels can be extremely costly and demand significant effort from domain experts, costly experiments, or long time periods. Therefore, a significant research challenge is to effectively utilize a limited labeling effort for better prediction models. In machine learning, this area of research is known as active learning. Many relevant active sampling algorithms, such as clustering-based active learning, have been developed. New challenges emerge when existing active learning algorithms are applied in geosciences, due to issues such as high dimensionality, extreme events, and missing data. In addition, in some cases, we may have abundant labeled data for some sites while being interested in building models for other locations (for example, remote areas). Transfer active learning aims to solve this problem with algorithms that can significantly reduce the number of labeling requests and build an effective model by transferring knowledge from areas with large amounts of labeled data. Transfer active learning is still in its early stages, and many opportunities exist for novel machine learning research.

8. Interpretive models. In the past few decades, we have witnessed many successes of powerful but complex machine learning algorithms, exemplified by the recent peak of deep learning models. They are usually treated as a black box in practical applications, but have been accepted by more communities given the rise of big data and their modeling power. However, in applications such as geosciences, we are interested in both predictive modeling and scientific understanding, which requires explanatory and interpretive modeling. A significant research area for machine learning is the incorporation of domain knowledge and causal inference to enable the design of interpretive machine learning approaches that can be understood by scientists and related to existing geosciences theories and models.

Research vision: Theory-guided learning. Geosciences data presents new challenges to machine learning approaches due to the small sample sizes relative to the complexity and non-linearity of the phenomena under study, the lack of ground truth, and the high degree of noise and uncertainty. New approaches for theory-guided learning will need to be developed, where knowledge about underlying geosciences processes will guide the machine learning algorithms in modeling complex phenomena.

Intelligent user interaction. Scientific research requires well-integrated user interfaces where data can easily flow from one to another, and that include and exploit the user's context to guide the interaction. New forms of interaction, including virtual reality and haptic interfaces, should be explored to facilitate understanding and synthesis.

Research directions:
1. Knowledge-rich context-aware recommender systems. Scientists would benefit from proactive systems that understand the task at hand and make recommendations for potential next steps, suggest datasets and analytical methods, and generate perceptually effective visualizations. A major research challenge is to design recommender systems that appropriately take into account the complex science context of a geoscientist's investigation.

2. Embedding visualizations throughout the science process. Pervasive use of visualizations and direct manipulation interfaces throughout the science process would need to link data to hypotheses and allow scientists to experience models from completely new perspectives. These visualization-based interactive systems require research on the design and validation of novel visual representations that effectively integrate diverse data in 2D, 3D, multidimensional, multiscale, and multispectral views, as well as on how to link models to the relevant data used to derive them.

3. Intelligent design of rich interactive visualizations. In order to be more ubiquitous throughout the research process, visualizations must be automatically generated and interactive. One research challenge is the automatic generation of such visualizations; another is the design of visualizations that fit a scientist's problem. An important area of future research is interactive visualizations and direct manipulation interfaces that would enable scientists to explore data and gain a better understanding of the underlying phenomena.

4. Immersive visualizations and virtual reality. There are new opportunities for low-cost usable immersive visualizations and physical interaction techniques that virtually put geoscientists into the physical space under investigation, while also providing access to other related forms of data. This research agenda requires bridging prior distinctions in scientific visualization, information visualization, and immersive virtual environments.

5. Interactive model building and refinement through visualizations that combine models and data. Interactive environments for model building and refinement would enable scientists to gain improved understanding of how models are affected by changes in initial data and assumptions, how model changes affect results, and how data availability affects model calibration. Developing such interactive modeling environments requires visualizations that integrate data with models, ensembles of models, model parameters, model results, and hypothesis specifications. These integrated environments would be particularly useful for developing machine learning approaches to geosciences problems, for example in assisting with parameter tuning and selecting training data. A major challenge is the heterogeneity and complexity of these different kinds of information that need to be represented.

6. Interfaces for spatiotemporal information. The vast majority of geosciences research products are geospatially localized and carry temporal references. Geospatial information requires specialized interfaces and data management approaches. New research is needed in intelligent interfaces for spatiotemporal information that exploit the user's context and goals to identify implicit location, to disambiguate textual location specification, or to decide what subset of information to present. The small form factor of mobile devices is also a constraint in developing applications that involve spatial data.

7. Collaboration and assistance for data analysis and scientific discovery processes. Intelligent workflow systems could help scientists by automating routine aspects of their work.
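The pool-based active learning idea discussed above can be shown with a toy uncertainty-sampling loop. This is a generic sketch, not an algorithm from the article: a one-dimensional threshold classifier queries the unlabeled site closest to its current decision boundary, where a new expert label is most informative.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# The "model" is a 1-D threshold classifier; its most uncertain point is
# the unlabeled value nearest the decision boundary.

def decision_boundary(labeled):
    """Midpoint between the class means of the labeled (x, y) pairs."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def query_most_uncertain(labeled, pool):
    """Return the pool point nearest the boundary (maximum uncertainty)."""
    b = decision_boundary(labeled)
    return min(pool, key=lambda x: abs(x - b))

labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]   # costly expert labels
pool = [2.0, 5.2, 8.0]                                # unlabeled sites
query = query_most_uncertain(labeled, pool)           # boundary is 5.0
```

Transfer active learning, as described above, would additionally seed `labeled` with data from well-instrumented sites before querying labels at remote ones.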
Because each scientist has a unique workflow of activities, and because their workflow changes over time, a research challenge is that these systems need to be highly flexible and customizable. Another research challenge is to support a range of workflows and processes, from common ones that can be reused to those that are highly exploratory in nature. Such workflow systems must enable collaborative design and analysis and be able to coordinate the work of teams of scientists. Finally, workflow systems must also support emerging science processes, including crowdsourcing for problems such as data collection and labeling.

Research vision: Integrative workspaces. New research is required to allow scientists to interact with all forms of knowledge relevant to the phenomenon at hand, to understand uncertainties and assumptions, and to provide many alternative views of integrated information. This will result in user interfaces focused on integrative workspaces, where visualizations and manipulations will be embedded throughout the analytic process. These new intelligent user interfaces and interaction modalities will support the exploration not only of data but of the relevant models and knowledge that provide context to the data. Research activities will flow seamlessly from one user interface to another, each appropriate to the task at hand and rich in user context.

Conclusion
This article presented research opportunities in knowledge-rich intelligent systems inspired by geosciences challenges. Crucial capabilities are needed that require major research in knowledge representation, selective sensing, information integration, machine learning, and interactive analytics. Enabling these advances requires that intelligent systems and geosciences researchers work together to formulate knowledge-rich frameworks, algorithms, and user interfaces. Recognizing that these interactions are not likely to occur without significant facilitation, a new Research Coordination Network on Intelligent Systems for Geosciences has been created to enable sustained communication across these fields that do not typically cross paths. This network focuses on three major goals. First, the organization of joint workshops and other forums will foster synergistic discussions and collaborative projects. Second, repositories of challenge problems and datasets with crisp problem statements will lower the barriers to getting involved. Third, a curated repository of learning materials to educate researchers and students alike will reduce the steep learning curve involved in understanding advanced topics in the other discipline. Additionally, members of the Research Coordination Network are engaging with other synergistic efforts, programs, and communities, such as artificial intelligence for sustainability, climate informatics, science gateways, and the U.S. NSF Big Data Hubs.

A strong research community in this area has the potential to have transformative impact in artificial intelligence research, with significant concomitant advances in geosciences as well as in other science disciplines, accelerating discoveries and innovating how science is done.

Acknowledgments
This work was sponsored in part by the Directorate for Computer and Information Science and Engineering (CISE) and the Directorate for Geosciences (GEO) of the U.S. National Science Foundation under awards IIS-1533930 and ICER-1632211. We thank NSF CISE and GEO program directors for their guidance and suggestions, in particular Hector Muñoz-Avila and Eva Zanzerkia for their guidance, and Todd Leen, Frank Olken, Sylvia Spengler, Amy Walton, and Maria Zemankova for suggestions and feedback. We also thank all the participants in the Research Coordination Network on Intelligent Systems for Geosciences for creating the intellectual space for productive discussions across these disciplines.

References
1. Gil, Y. and Pierce, S. (Eds). Final Report of the 2015 NSF Workshop on Information and Intelligent Systems for Geosciences. National Science Foundation Workshop Report, October 2015; http://dl.acm.org/collection.cfm?id=C13 and http://is-geo.org/
2. Berners-Lee, T. Linked data. Design Issues (retrieved Nov. 11, 2017); https://www.w3.org/DesignIssues/LinkedData.html
3. Karpatne, A. et al. Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering 29, 10 (2017), 2318–2331.
4. Mithal, V., Nayak, G., Khandelwal, A., Kumar, V., Oza, N.C. and Nemani, R. RAPT: Rare class prediction in absence of true labels. IEEE Transactions on Knowledge and Data Engineering, 2017; DOI: 10.1109/TKDE.2017.2739739.
5. Narock, T. and Fox, P. The Semantic Web in Earth and space science: Current status and future directions. Studies in the Semantic Web. IOS Press, 2015.
6. National Research Council, Committee on Challenges and Opportunities in the Hydrologic Sciences, Water Science and Technology Board, Division on Earth and Life Studies. Challenges and Opportunities in the Hydrologic Sciences. National Academies Press, Washington, D.C., 2012, 188. ISBN 978-0-309-22283-9.
7. National Research Council, Committee on a Decadal Strategy for Solar and Space Physics (Heliophysics); Space Studies Board; Aeronautics and Space Engineering Board; Division of Earth and Physical Sciences. Solar and Space Physics: A Science for a Technological Society. National Academies Press, Washington, D.C., 2013, 466. ISBN 978-0-309-16428-3.
8. National Research Council, Committee on Guidance for NSF on National Ocean Science Research Priorities: Decadal Survey of Ocean Sciences; Ocean Studies Board; Division on Earth and Life Studies. Sea Change: 2015–2025 Decadal Survey of Ocean Sciences. National Academies Press, Washington, D.C., 2014, 98. ISBN 978-0-309-36688-5.
9. National Research Council, Committee on New Research Opportunities in the Earth Sciences. New Research Opportunities in the Earth Sciences at the National Science Foundation. National Academies Press, Washington, D.C., 2012, 216. ISBN 978-0-309-21924-2.
10. National Research Council, Committee to Review the NSF AGS Science Goals and Objectives. Review of the National Science Foundation's Division on Atmospheric and Geospace Sciences Goals and Objectives Document. National Academies Press, Washington, D.C., 2014, 36. ISBN 978-0-309-31048-2.
11. National Science Foundation. Dynamic Earth: GEO Imperatives and Frontiers 2015–2020. Advisory Committee for Geosciences, 2014.
12. Peters, S.E., Zhang, C., Livny, M. and Ré, C. A machine reading system for assembling synthetic paleontological databases. PLoS ONE 9, 12 (2014).

Yolanda Gil, University of Southern California; Suzanne A. Pierce, The University of Texas at Austin; Hassan Babaie, Georgia State University; Arindam Banerjee, University of Minnesota; Kirk Borne, Booz Allen Hamilton; Gary Bust, Johns Hopkins University; Michelle Cheatham, Wright State University; Imme Ebert-Uphoff, Colorado State University; Carla Gomes, Cornell University; Mary Hill, University of Kansas; John Horel, University of Utah; Leslie Hsu, Columbia University; Jim Kinter, George Mason University; Craig Knoblock, University of Southern California; David Krum, University of Southern California; Vipin Kumar, University of Minnesota; Pierre Lermusiaux, Massachusetts Institute of Technology; Yan Liu, University of Southern California; Chris North, Virginia Tech; Victor Pankratius, Massachusetts Institute of Technology; Shanan Peters, University of Wisconsin-Madison; Beth Plale, Indiana University Bloomington; Allen Pope, University of Colorado Boulder; Sai Ravela, Massachusetts Institute of Technology; Juan Restrepo, Oregon State University; Aaron Ridley, University of Michigan; Hanan Samet, University of Maryland; Shashi Shekhar, University of Minnesota.

Correspondence regarding this article should be directed to Yolanda Gil (gil@isi.edu).

Copyright held by authors/owners.

Watch the authors discuss this work in the exclusive Communications video: https://cacm.acm.org/videos/intelligent-systems-for-geosciences
Deception, Identity, and Security: The Game Theory of Sybil Attacks
“When the world is destroyed, it will be destroyed
not by its madmen but by the sanity of its experts
and the superior ignorance of its bureaucrats.”
— John le Carré
DECADES BEFORE THE advent of the Internet, Fernando António Nogueira Pessoa assumed a variety of identities with the ease that has become common in cyber-social platforms—those where cyber technologies play a part in human activity (for example, online banking and social networks). Pessoa, a Portuguese poet, writer, literary critic, translator, publisher, and philosopher, wrote under his own name as well as 75 imaginary identities. He would write poetry or prose using one identity, then criticize that writing using another identity, then defend the original writing using yet another identity. Described by author Carmela Ciuraru as "the loving ringmaster, director, and traffic cop of his literary crew," Pessoa is one

key insights
˽˽ Cyber systems have reshaped the role of identity. The low cost to mint cyber identities facilitates greater identity fluidity. This simplicity provides a form of privacy via anonymity or pseudonymity by disguising identity, but also hazards the proliferation of deceptive, multiple, and stolen identities. With growing connectivity, designing the verification/management algorithms for cyber identity has become complex, and requires examining what motivates such deception.
˽˽ Signaling games provide a formal mathematical way to analyze how identity and deception are coupled in cyber-social systems. The game theoretic framework can be extended to reason about dynamical system properties and behavior traces.
costly signaling (for example, using M-coins). There are analogous systems used by eusocial organisms to maintain identity and reputation by diffusing costly hard-to-produce Cuticular Hydrocarbon Chemicals (CHCs): in ant colonies only the queen ants produce and distribute such chemicals.

Figure 2. Extensive form games.

The game below is played between a sender identity and a receiver, where the senders, endowed with invisible types by nature: C (Cooperative) and D (Defective), signal the receivers by sending messages, c or d, either honestly or deceptively. The game starts in the center of the figure with the sender being assigned a type, which is only known to the sender, and the sender branches to the left or right. The sender then signals c (branching up) or d (branching down). The receiver, who knows the persistent pseudo-identity of the sender, but not the type, may trust the sender or verify (audit) the sender. The challenge results in different utilities for the senders and the receivers, which they rationally optimize. The inner box encapsulates the selections of the agent utilizing the identity. The audit report may also be made visible to the recommenders and verifiers, thus affecting the reputation (and other credible threats) assigned to the sender's identity.

[Game tree diagram: type D branches left, type C branches right; signal c branches up, d branches down; receiver actions t (trust) or a (audit/challenge) lead to leaf outcomes o1–o4 on the C side and o5–o8 on the D side.]

tally but constrained like CHCs, aim to have similar effects for the utility and identity of nodes within a WANET.

The game. Traditional mathematical game theory23,35 models scenarios where outcomes depend on multiple agent preferences. Not all outcomes are alike; under various conditions some outcomes feature greater stability (that is, non-deviation)24,25 and are computable.16,17,27 Interesting game scenarios yield differing rewards to agents depending on outcome. Thus, agents evaluate scenarios insofar as common and private knowledge allows, and they act rationally (that is, to optimize utility) by selecting their own action in the context of how other agents act. As the case of Pessoa's creative use of identities suggests, private knowledge is important in shaping outcomes.

To accommodate these types of scenarios, game theory has developed a branch of models known as incomplete/partial information games,22,30 of which the Lewis signaling game is one example.4,14,19,31,34 Signaling games have been studied in diverse

S may use a strategic deception by claiming either a fabricated identity or making a malicious attempt to impersonate another's identity. Within a WANET we will consider two natural types of nodes TC and TD to indicate respectively a cooperative node that employs no deceptions (preserving the desired systemwide properties of identity management), and a deceptive node that directly employs a deception. In either case, the node will communicate a signal to a receiver node R including a status of c to indicate it is cooperative with respect to system security, or a status of d to indicate anomalous behavior (such as compromised status). A receiver node R, given the signal of a sender node S but unaware of the sender node's true type, must select an action to take.

One option for the receiver is to simply trust the sender node, denoted as t; alternatively, the receiver node may pose a challenge action, denoted as a, which creates an attempt to reveal the sender's nature and leads to costly outcomes for deception. While any individual challenge may not reveal completely the nature of a sender, repeated challenges may eventually expose Sybil identities, as senders who are frequently challenged are under pressure to manage their resources for verifying their identity.

We sketch the outcomes of an encounter scenario graphically with an extensive-form game tree illustrated in Figure 2. Starting in the center, the sender S has type TC (cooperative) or TD (deceptive). Next, the sender selects a signal c (cooperative) or d (otherwise); the receiver selects an action t (trust) or a (challenge). We explore the outcomes and payoffs for identity as illustrated in the accompanying table.

Outcomes. Outcome o1 describes a
sender S that is cooperative by nature and offers a nominal proof of identity to the receiver R. The receiver R then trusts S and acts upon the information provided, for example, relaying the communicated message.

Outcome o2 describes a scenario like o1, except the receiver R challenges S to provide a more rigorous proof of identity. In this case, given the cooperative nature of the sender, the challenge is unnecessary, netting cost burdens to maintaining a trusted network.

Outcome o3 describes a cooperative sender S not willing (or able) to offer a nominal proof of identity (for example, after being repeatedly but maliciously challenged by "suspicious" receivers to the point of insolvency).a The receiver R nonetheless trusts S, and in this case the exchange is altruistic, helping to recover a trustworthy node in distress.

type; and let D be the imputed cost to the sender for being deceptive (identified by a receiver's challenge).

Repeated games and strategy. Repeated interactions occur as a sequence of plays between two identities. While in classical signaling games there is little need for a distinction to be made between identity and agent, here we highlight the identity fluidity with which an identity or cyber asset can be usurped by another agent. Games are played between two identities, and identities are bound to physical agents (the resident decision control at the time of play). Agent types will remain fixed by nature, but note that in subsequent plays the control of an identity can pass from one agent to another; consequently the type changes accordingly. This type of perturbation is intended to be explored by our model, in order that cybersecurity issues such as Sybil attacks (where identities are stolen or fabricated) can be adequately ex-

Table. Outcome labels, payoff, transaction costs, and DFA codes for the identity management signal game.

Outcome labels, payoff (S, R), transaction cost, and encoding:

  Sender S            Receiver R    Outcomes
  Type    Signal      action        Label    Payoff    tcost    DFA Code
  C       c           trust         o1       (B, B)    1        s1 •
  [remaining rows (o2–o8) not recovered in this extraction]
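The one-shot structure of this sender–receiver game can be sketched in a few lines. The outcome labels follow Figure 2 (o1–o4 for a cooperative sender, o5–o8 for a deceptive one); apart from o1's payoff (B, B), which appears in the table, the numeric payoff values below are illustrative assumptions, not the authors' table.

```python
# Sketch of the Figure 2 signaling game: a typed sender picks a signal,
# the receiver picks trust or challenge, and the pair maps to an outcome.
# Payoffs other than o1's (B, B) are illustrative assumptions.

OUTCOME = {
    ("C", "c", "trust"): "o1", ("C", "c", "challenge"): "o2",
    ("C", "d", "trust"): "o3", ("C", "d", "challenge"): "o4",
    ("D", "c", "trust"): "o5", ("D", "c", "challenge"): "o6",
    ("D", "d", "trust"): "o7", ("D", "d", "challenge"): "o8",
}

B, A, D_COST = 2.0, 1.0, 3.0   # benefit, audit cost, deception penalty (assumed)

PAYOFF = {                      # (sender utility, receiver utility), assumed
    "o1": (B, B),               # honest signal, trusted: both gain (from the table)
    "o2": (B - A, B - A),       # unnecessary audit of a cooperator
    "o5": (B + A, -B),          # deception trusted: sender gains, receiver loses
    "o6": (-D_COST, B - A),     # deception caught by a challenge
}

def play(sender_type, signal, action):
    """Resolve one encounter; outcomes without assumed payoffs map to (0, 0)."""
    label = OUTCOME[(sender_type, signal, action)]
    return label, PAYOFF.get(label, (0.0, 0.0))

label, payoff = play("D", "c", "challenge")   # a challenged deceptive sender
```

The point of the sketch is the information asymmetry: the receiver sees only the signal, never the type, so its trust/challenge choice must be a policy over signals rather than over types.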
pressed and tested for their ability to During a generation, pairs of agents
destabilize a desired equilibrium. will encounter one another to play re-
To accommodate this, we encode peated signaling games; the encoun-
change to the population over time (for ters are determined by an encounter
example, by invasion of mutants) over
repeated games by using determin- When a deceptive distribution. At the completion of a
generation, agents evaluate rewards
istic finite automata (DFA). The DFA
strategy space offers a vastly reachable
identity succeeds, obtained from their implemented
strategies. This evaluation results in
space of dynamic strategic structures. it will be used their performance measure. Next, per-
This provides the means to explore
the uses of identity in repeated signal-
numerous times formance measures are compared
within a set of peer agents that coop-
ing interactions. as there is no erte to inform each agents’ reselection
The DFA state codes noted in the
table determine the (type, signal) of a
reason to abandon stage. During the reselection stage,
agents determine a strategy to use in
sender’s controlling agent, or the ac- it after one the next generation, as achieved by a
tion as receiver. Each DFA encounter
determines a sequence of outcomes interaction. boosting probability distribution that
preferentially selects strategies based
as illustrated in the example that follows. Consider the strategy of Figure 3(c) as sender matched against the strategy of (d) as receiver with a transaction budget of two units. The sender starts in state s1, and the receiver starts in state s3; they play at the cost of one unit against the transaction budget. Note that the discount for deception will entail additional communication efforts. Next, the sender transitions to state s7 by following the s3-labeled transition, and the receiver loops back to state s3; they both play at the cost of a half unit since state s7 uses deception. Next, the sender transitions to state s1 while the receiver transitions to state s6 to exhaust the transaction budget and complete the game. The computed outcome sequence is o1, o7, o2, resulting in a sender aggregate utility of (A + B) and a receiver aggregate utility of (B − (A + C)).

Evolutionary strategy. Evolutionary game theory models a dynamic population of agents capable of modifying their strategy and predicts population-level effects.2,3,5,19,32 Formally, evolutionary games are a dynamic system with stochastic variables. The agents in evolutionary games may (both individually and collectively) explore strategy structures directly (via mutation and peer-informed reselection), and they may exploit strategies where and when competitive advantages are found.

To implement this system, the time domain is divided into intervals called generations. The system is initialized by fixing a finite set of agents and assigning each agent a strategy determined with a seeding probability distribution. In each generation, agents interact and reselect strategies based on performance. After reselection, some agents are mutated with a mutation probability distribution. This step completes the generation and establishes the strategies implemented during the next generation.

The agents evolve discrete strategic forms (DFA); a strategic mutation network is graphed in Figure 3(e) to provide a sense of scale. The dynamic system thus evolves a population measure over strategies. Within the WANET, nodes freely mutate, forming deceptive strategies as often as they augment cooperative ones. Evolutionary games allow us to elucidate the stability and resilience of various strategies arising from mutations and a selection process ruled by non-cooperation and rationality.

We augment the basic structure of reselection by considering carefully how strategic information is shared. Upon noticing that deceptive and cooperative strategies differ fundamentally in their information-asymmetric requirements, we introduce a technique referred to as split-boosting, which modulates the information flow components of the network.

Recreate by split-boosting. During the Recreate phase, agents select strategies preferentially by comparing performance measured only among the set of agents that share this pooled information. Splitting the set of agents into components, we limit the boosting to include only strategies available from the component. Within a component (subset) S, let v_i be the performance measure for the strategy used by agent i ∈ S. Letting [...] and [...], we can safely transfer the performance measures to the
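The Recreate step can be made concrete with a short sketch. The selection rule below (adopt the best-performing strategy pooled within one's component) is our own illustrative stand-in for the paper's peer-informed reselection, not the authors' implementation:

```python
import random
from collections import namedtuple

Agent = namedtuple("Agent", "strategy perf")  # perf plays the role of v_i

def recreate(agents, components, mutate, p_mut=0.05, rng=random):
    """One split-boosted Recreate step: each agent compares performance
    only within its information-pooling component S, reselects, and is
    then mutated with probability p_mut to complete the generation."""
    nxt = {}
    for comp in components:                  # comp: iterable of agent ids
        pool = [(agents[i].perf, agents[i].strategy) for i in comp]
        best_perf, best_strat = max(pool, key=lambda t: t[0])
        for i in comp:
            # peer-informed reselection, boosted only by strategies pooled in S
            chosen = best_strat if agents[i].perf < best_perf else agents[i].strategy
            nxt[i] = mutate(chosen) if rng.random() < p_mut else chosen
    return nxt
```

With p_mut = 0 every agent adopts its component's best pooled strategy, while strategies held outside the component stay invisible to it, which is exactly the flow restriction split-boosting imposes.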
JANUARY 2019 | VOL. 62 | NO. 1 | COMMUNICATIONS OF THE ACM
review articles
setting to illustrate the intuition that costly signaling and verified information flows among cooperative types can stabilize behavior in WANETs. More generally, simulations (as a computational technique) can evaluate a variety of mechanisms and how they influence system behaviors.

Our major control in experiments examines how differing information pooling for cooperative vs. deceptive types leads to differing qualitative behavior outcomes. We consider a reference system S0 and reengineer it with a device to express improved information pooling among cooperative types to create alternate system S1. The systems feature the same competitive pressures and are identical in every way except in their implementation of the reselection step. Game parameters are A, B, C, D = 4, 0.5, 0.5, 4.0, with 800 network nodes and 400 generations. In both systems, the same seeding distribution initializes the simulations from a state where no nodes employ (immediately) deceptive or Sybil identities. From these initial conditions, mutations allow nodes to quickly use deceptive strategies and test their efficacy.

In the first system S0, all agents select strategies using common and identical information pooling. Therefore, both cooperative and deceptive types are treated alike, specifically with the same awareness to and distortions of pooled information guiding strategic exploration.

In the second system S1, agents select strategy with boosting split by type. Strategic information, once verified as cooperative, is offered to all agents with an openly shared common database of clean strategies. This modification enhances information for cooperative types while conversely imposing isolating effects for deceptive types. Also, in our simulations, the deceptive types maintain rationality, so when a deceptive strategy is found to be performing poorly (less than the cooperative group average), the agents abandon the deceptive strategy as being non-productive, thereby coming clean and reselecting strategies from the shared database as the best survival option.

In Figure 4 we show typical simulated traces for systems S0 and S1, plotting the proportion of the population employing deceptive strategies (a crude estimation of deception as defined in the sidebar "Defining Deception"). The differing properties for information flows affecting reselections offer strong controls to stabilize the dynamic equilibrium favorable to cooperators. In S1 the advantages of deception are short-lived, and cooperative behaviors are promoted even when agents remain free to explore for niche use of deception.

Conclusion and Future Work
Several insights and contributions emerge from our experiments. One key insight is that challenging an agent in such a way that deceptive agents either fail the challenge or face greater risk can deter deception. Another key insight is that many instances where agents use deceptive identities in cyber-social systems are repeated games. When a deceptive identity succeeds, it will be used numerous times as there is no reason to abandon it after one interaction. Moreover, it is precisely the repeated interactions that are needed to develop trust. Thus, formalizing these insights, we devised a mathematical game to model strategic interactions, while recognizing a possibility of permissive and malleable identities. With the dilemma between privacy and intent clarified formally in signaling games, we computationally considered various strategies such as those based in behavior learn-
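The S1 abandonment rule (a rational deceptive agent performing below the cooperative group average comes clean and reselects from the shared database) can be sketched as follows; the field names and the uniform draw from the database are illustrative assumptions, not the paper's code:

```python
import random

def reselect_s1(agents, clean_db, rng=random):
    """S1 reselection sketch: rational deceptive agents performing below
    the cooperative group average come clean, drawing a replacement
    strategy from the shared database of verified (clean) strategies."""
    coop = [a["perf"] for a in agents if not a["deceptive"]]
    coop_avg = sum(coop) / len(coop)
    for a in agents:
        if a["deceptive"] and a["perf"] < coop_avg:
            a["strategy"] = rng.choice(clean_db)  # reselect a vetted strategy
            a["deceptive"] = False                # coming clean
    return agents
```

Deceptive agents performing above the cooperative average keep their strategy, matching the observation that a successful deceptive identity has no reason to be abandoned after one interaction.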
Figure 4. Results. For cyber-social systems, we can use simulation to study a variety of equilibria (or lack thereof) affected by various mechanisms. Here, and with few additional assumptions concerning Sybil attackers, the effect of using a shared database of verified cooperative strategic forms is shown to deter deceptive types (b), in contrast to instances where no such advantage is given to cooperative strategic forms (a). The x-axis represents the temporal dimension (generations); the blue graph and quantile figure represent the proportion of the population using honest identity signaling, the red otherwise.
[Figure 4 plot area: panels (a) system S0 and (b) system S1; y-axis: Type Counts, 0–800; legend: Cooperative.]
ing and costly signaling. Our computational simulations uncovered several interesting information flow properties that may be leveraged to deter deception, specifically by enhancing the flow of information regarding cooperative strategies while reinforcing the cooperative group's identity. Interestingly, this result indicates an identity management system, typically thought to hinge on the precision of true positives and the astronomical unlikeliness of false-positive recognition, may rather critically depend on how learned behavior and strategic information can be shared.

Our computational experiment offers new insights for achieving strong deterrence of identity deception within ad hoc networks such as WANETs; however, much is left as future work. Our larger practical goal is M-coin, a design strategy and system for cooperation-enhancing technologies. M-coin may be thought of as an abstract currency guiding an open recommender-verification system that incorporates new agent types (to verify identities, behavior histories, and cooperative strategies, as well as the consistency of distrusted information); the new types promote efficiencies supporting cooperative coalitions. The main step forward, as demonstrated here, is recognizing the effects of pooled and verified strategic information and its flow constraints (as well as its capabilities to operate in the open). Vetted strategic information assists cooperators to rapidly adapt to and out-compete deceptive strategies.

Still, many challenges remain outstanding. The possibility of an agent not compelled by utility presents a problem, as that agent may persist within the network indefinitely to form effective attacks. Future work may focus on how the expression of rationality could be fortified for identities/nodes. Critically, deceptively minded actors will need to prefer a base level of utility, and this remains an open challenge (although the solution could lie in the many possibilities suggested by biological systems). Additionally, technologies supporting the tedious aspects of information gathering and validation must be aligned to user incentives.

Properly constructed recommender-verifier architectures could be used in WANETs, HFNs, and other fluid-identity cyber-social and cyber-physical systems to reliably verify private but trustworthy identities and limit the damage of deceptive attack strategies. Starting with WANETs, we motivate an elegant solution using formalisms we originally developed for signaling games. Nonetheless, we are encouraged by analogous biological solutions derived naturally under Darwinian evolution.

Acknowledgments. We thank the anonymous reviewers for their insightful comments. This material is based upon work funded and supported by U.S. Department of Defense Contract No. FA8702-15-D-0002 with Carnegie Mellon University Software Engineering Institute and New York University and ARO grant A18-0613-00 (B.M.). This material has been approved for public release and unlimited distribution, ref DM17-0409.

References
1. Argiento, R., Pemantle, R., Skyrms, B. and Volkov, S. Learning to signal: Analysis of a micro-level reinforcement model. Stochastic Processes and their Applications 119, 2 (2009), 373–390.
2. Axelrod, R. An evolutionary approach to norms. American Political Science Review 80, 4 (1986), 1095–1111.
3. Axelrod, R. The Evolution of Cooperation. Basic Books, 2006.
4. Banks, J. and Sobel, J. Equilibrium selection in signaling games. Econometrica: J. Econometric Society, (1987), 647–661.
5. Binmore, K. and Samuelson, L. Evolutionary stability in repeated games played by finite automata. J. Economic Theory 57, 2 (1992), 278–305.
6. Casey, W., Memarmoshrefi, P., Kellner, A., Morales, J.A. and Mishra, B. Identity deception and game deterrence via signaling games. In Proceedings of the 9th EAI Intern. Conf. Bio-inspired Information and Communications Technologies, 73–82.
7. Casey, W., Morales, J.A. and Mishra, B. Threats from inside: Dynamic utility (mis)alignments in an agent-based model. J. Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications 7 (2016), 97–117.
8. Casey, W., Morales, J.A., Nguyen, T., Spring, J., Weaver, R., Wright, E., Metcalf, L. and Mishra, B. Cyber security via signaling games: Toward a science of cyber security. In Proceedings of the Intern. Conf. Distributed Computing and Internet Technology, 34–42.
9. Casey, W., Morales, J.A., Wright, E., Zhu, Q. and Mishra, B. Compliance signaling games: Toward modeling the deterrence of insider threats. Computational and Mathematical Organization Theory 22, 3 (2016), 318–349.
10. Casey, W., Weaver, R., Morales, J.A., Wright, E. and Mishra, B. Epistatic signaling and minority games, the adversarial dynamics in social technological systems. Mobile Networks and Applications 21, 1 (2016), 161–174.
11. Casey, W., Wright, E., Morales, J.A., Appel, M., Gennari, J. and Mishra, B. Agent-based trace learning in a recommendation verification system for cybersecurity. In Proceedings of the 9th IEEE Intern. Conf. on Malicious and Unwanted Software: The Americas, (2014), 135–143.
12. Casey, W., Zhu, Q., Morales, J.A. and Mishra, B. Compliance control: Managed vulnerability surface in social-technological systems via signaling games. In Proceedings of the 7th ACM CCS Intern. Workshop on Managing Insider Security Threats, (2015), 53–62.
13. Catteeuw, D., Manderick, B. et al. Evolution of honest signaling by social punishment. In Proceedings of the 2014 Annual Conf. Genetic and Evolutionary Computation, (2014), 153–160.
14. Cho, I-K. and Sobel, J. Strategic stability and uniqueness in signaling games. J. Economic Theory 50, 2 (1990), 381–413.
15. Chung, H. and Carroll, S.B. Wax, sex and the origin of species: Dual roles of insect cuticular hydrocarbons in adaptation and mating. BioEssays, (2015).
16. Daskalakis, C., Goldberg, P.W. and Papadimitriou, C.H. The complexity of computing a Nash equilibrium. SIAM J. Computing 39, 1 (2009), 195–259.
17. Fabrikant, A., Papadimitriou, C. and Talwar, K. The complexity of pure Nash equilibria. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, (2004), 604–612.
18. Hamblin, S. and Hurd, P.L. When will evolution lead to deceptive signaling in the Sir Philip Sidney game? Theoretical Population Biology 75, 2 (2009), 176–182.
19. Huttegger, S.M., Skyrms, B., Smead, R. and Zollman, K.J.S. Evolutionary dynamics of Lewis signaling games: Signaling systems vs. partial pooling. Synthese 172, 1 (2010), 177–191.
20. Jee, J., Sundstrom, A., Massey, S.E. and Mishra, B. What can information-asymmetric games tell us about the context of Crick's 'frozen accident'? J. Royal Society Interface 10, 88 (2013).
21. King, D. The Haiti earthquake: Breaking new ground in the humanitarian information landscape. Humanitarian Exchange Magazine 48, (2010).
22. Lewis, D. Convention: A Philosophical Study. John Wiley & Sons, 2008.
23. Nash, J. Non-cooperative games. Annals of Mathematics, (1951), 286–295.
24. Nash, J. et al. Equilibrium points in n-person games. In Proceedings of the National Academy of Sciences 36, 1 (1950), 48–49.
25. Nash, J.F. Jr. The bargaining problem. Econometrica: J. Econometric Society, (1950), 155–162.
26. Newsome, J., Shi, E., Song, D. and Perrig, A. The Sybil attack in sensor networks: Analysis & defenses. In Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, (2004), 259–268.
27. Papadimitriou, C. Algorithms, games, and the Internet. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, (2001), 749–753.
28. Sharma, K.R., Enzmann, B.L. et al. Cuticular hydrocarbon pheromones for social behavior and their coding in the ant antenna. Cell Reports 12, 8 (2015), 1261–1271.
29. Silk, J.B., Kaldor, E. and Boyd, R. Cheap talk when interests conflict. Animal Behavior 59, 2 (2000), 423–432.
30. Skyrms, B. The Stag Hunt and the Evolution of Social Structure. Cambridge University Press, 2004.
31. Skyrms, B. Signals: Evolution, Learning, and Information. Oxford University Press, 2010.
32. Smith, J.M. Evolution and the Theory of Games. Cambridge University Press, 1982.
33. Smith, J.M. Honest signaling: The Philip Sidney game. Animal Behaviour 42, 6 (1991), 1034–1035.
34. Sobel, M.J. et al. Non-cooperative stochastic games. The Annals of Mathematical Statistics 42, 6 (1971), 1930–1935.
35. von Neumann, J. and Morgenstern, O. Theory of Games and Economic Behavior. Princeton University Press, 2007.
36. Zollman, K.J.S., Bergstrom, C.T. and Huttegger, S.M. Between cheap and costly signals: The evolution of partially honest communication. In Proceedings of the Royal Society of London B: Biological Sciences, (2012).

William Casey (wcasey@cmu.edu) is a senior member of Carnegie Mellon University, Software Engineering Institute, Pittsburgh, PA, USA.

Ansgar Kellner is a research fellow at the Institute of System Security at Technische Universität Braunschweig, Germany.

Parisa Memarmoshrefi is a research staff member at University of Göttingen, Germany.

Jose Andre Morales is a researcher at the Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, USA.

Bud Mishra (mishra@nyu.edu) is a professor at New York University Courant Institute, Tandon School of Engineering and School of Medicine, New York, NY, USA.

©2019 ACM 0001-0782/1/19
research highlights

P. 95 Technical Perspective: Photorealistic Facial Digitization and Manipulation
By Hao Li

P. 96 Face2Face: Real-Time Face Capture and Reenactment of RGB Videos
By Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner

P. 105 Technical Perspective: Attacking Cryptographic Key Exchange with Precomputation
By Dan Boneh

P. 106 Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice
By David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelin, and Paul Zimmermann
Technical Perspective
Photorealistic Facial Digitization and Manipulation
By Hao Li

To view the accompanying paper, visit doi.acm.org/10.1145/3292039
FOR MORE THAN a decade, computer graphics (CG) researchers and visual effects experts have been fascinated with bringing photorealistic digital actors to the screen. Crossing the well-known "uncanny valley" in CG humans has been one of the most difficult and crucial challenges, due to hypersensitivity to synthetic humans lacking even the slightest and most subtle features of genuine human faces. Given sufficient resources and time, photorealistic renderings of digital characters have been achieved in recent years. Some of the most memorable cases are seen in blockbuster movies, such as The Curious Case of Benjamin Button, Furious 7, and Rogue One: A Star Wars Story, in which large teams of highly skilled digital artists use cutting-edge digitization technologies. Despite the progress of 3D-scanning solutions, facial animation systems, and advanced rendering techniques, weeks of manual work are still needed to produce even just a few seconds of animation.

When depth cameras, such as structured light systems or time-of-flight sensors, were introduced, the 3D acquisition of highly deformable surfaces became possible. Graphics and vision researchers started to investigate the possibility of directly capturing complex facial performances, instead of manually key-framing them or applying complex simulations. While marker-based motion capture technologies are already widely adopted in industry, massive amounts of hand-tweaking and post-processing are still needed to generate lifelike facial movements. On the other hand, markerless solutions based on real-time RGB-D sensors provide dense and accurate facial shape measurements and were poised to automate and scale animation production.

The release of the mainstream Kinect depth sensor from Microsoft brought these techniques to the consumer space, most notably through several seminal SIGGRAPH publications between 2010 and 2013, as well as the popular facial animation software, Faceshift, later acquired by Apple. While computer vision-based facial landmark detectors are suitable for puppeteering CG faces using conventional RGB cameras, they do not capture nuanced facial expressions, as only sparse features are tracked. However, when dense depth measurements are available, an accurate 3D face model can be computed by refining the shape of a statistical face model to fit a dense input depth map. Not only can this face-fitting problem be solved in real time using efficient numerical optimization, but the shape and expression parameters of the face can be fully recovered and used for retargeting purposes. If facial performance capture is possible for conventional RGB videos in real time, then believable facial expressions can be transferred effortlessly from one person to another in a live-action scenario. This capability is demonstrated by the Face2Face system of Thies et al. detailed in the following paper.

As opposed to animating a CG character in a virtual environment, the key challenge is to produce a photorealistic video of a target subject whose facial performance matches the source actor. In addition to being able to track and transfer dense facial movements at the pixel level, the facial albedo and lighting environment also must be estimated on the target video, in order to ensure a consistent shading with the original footage. The solution consists of a real-time GPU implementation of a photometric consistency optimization that solves for parameters of a morphable face model originally introduced by Blanz and Vetter, extended with linear facial expression blendshapes. The authors also handle the appearance deformations of the mouth, for which plausible textures are retrieved instead of being rendered using a parametric model. Such an approach is particularly effective in producing a photorealistic output, as it bypasses the traditional and more complex rendering pipeline. While some limitations remain, such as the inability to control the head pose in the target video sequence, very convincing photorealistic facial reenactments are demonstrated on footage of celebrities and politicians obtained from YouTube.

While the original intent of performance-driven video was to advance immersive communication, teleconferencing, and visual effects, the ease and speed with which believable manipulations can be created with such technology has garnered widespread media attention, and raised concerns about the authenticity and ethical aspects of artificially generated videos.

Recent progress in artificial intelligence, such as deep generative models, is further accelerating these capabilities and making them even easier for ordinary people to use. For instance, Pinscreen's photorealistic avatar creation technology requires only a single input picture and can be used to create compelling video game characters at scale, but face replacement technologies, such as DeepFake, have been exploited to create inappropriate and misleading video content.

I highly recommend the following paper, as it is one of the first that promotes awareness of modern technology's capability to manipulate videos, at a time in which social media is susceptible to the spread of doctored videos and fake news.

Hao Li (hao@hao-li.com) is assistant professor of computer science at the University of Southern California, director of the Vision and Graphics Lab of the USC Institute for Creative Technologies, and CEO of Pinscreen.
DOI:10.1145/3292039
velocity measure and search for the k-nearest neighbors based on time stamps and flow distance. Saragih et al.15 present a real-time avatar animation system from a single image. Their approach is based on sparse landmark tracking, and the mouth of the source is copied to the target using texture warping.

2.5. Online reenactment
Recently, first online facial reenactment approaches based on RGB(-D) data have been proposed. Kemelmacher-Shlizerman et al.10 enable image-based puppetry by querying similar images from a database. They employ an appearance cost metric and consider rotation angular distance. While they achieve impressive results, the retrieved stream of faces is not temporally coherent. Thies et al.19 show the first online reenactment system; however, they rely on depth data and use a generic teeth proxy for the mouth region. In this paper, we address both shortcomings: (1) our method is the first real-time RGB-only reenactment technique; (2) we synthesize the mouth regions exclusively from the target sequence (no need for a teeth proxy or direct source-to-target copy).

2.6. Follow-up work
The core component of the proposed approach is the dense face reconstruction algorithm. It has already been adapted for several applications, such as head-mounted display removal,22 facial projection mapping,17 and avatar digitization.9 FaceVR22 demonstrates self-reenactment for head-mounted display removal, which is particularly useful for enabling natural teleconferences in virtual reality. The FaceForge17 system enables real-time facial projection mapping to dynamically alter the appearance of a person in the real world. The avatar digitization approach of Hu et al.9 reconstructs a stylized 3D avatar that includes hair and teeth from just a single image. The resulting 3D avatars can, for example, be used in computer games.

3. USE CASES
The proposed facial tracking and reenactment has several use cases that we want to highlight in this section. In movie productions, the idea of facial reenactment can be used as a video editing tool, for example to change the expression of an actor in a particular shot. Using the estimated geometry of an actor, it can also be used to modify the appearance of a face in a post-process, for example, changing the illumination. Another field in post-production is the synchronization of an audio channel to the video. If a movie is translated to another language, the movements of the mouth do not match the audio of the so-called dubber. Nowadays, to match the video, the audio including the spoken text is adapted, which might result in a loss of information. Using facial reenactment instead, the expressions of the dubber can be transferred to the actor in the movie, and thus the audio and video are synchronized. Since our reenactment approach runs in real time, it is also possible to set up a teleconferencing system with a live interpreter that simultaneously translates the speech of a person to another language.

In contrast to state-of-the-art movie production setups that work with markers and complex camera setups, our system presented in this paper only requires commodity hardware without the need for markers. Our tracking results can also be used to animate virtual characters. These virtual characters can be part of animation movies, but can also be used in computer games. With the introduction of virtual reality glasses, also called head-mounted displays (HMDs), the realistic animation of such virtual avatars becomes more and more important for immersive game-play. FaceVR22 demonstrates that facial tracking is also possible if the face is almost completely occluded by such an HMD. The project also paves the way to new applications like teleconferencing in VR based on HMD removal.

Besides these consumer applications, one can also think of numerous medical applications. For example, one can build a training system that helps patients to train expressions after a stroke.

4. METHOD OVERVIEW
In the following, we describe our real-time facial reenactment pipeline (see Figure 2). Input to our method is a monocular target video sequence and a live video stream captured by a commodity webcam. First, we describe how we synthesize facial imagery using a statistical prior and an image formation model (see Section 5). We find optimal parameters that best explain the input observations by solving a variational energy minimization problem (see Section 6). We minimize this energy with a tailored, data-parallel GPU-based Iteratively Reweighted Least Squares (IRLS) solver (see Section 7). We employ IRLS for off-line non-rigid model-based bundling (see Section 8) on a set of selected keyframes to obtain the facial identity of the source as well as of the target actor. This step jointly recovers the facial identity, expression, skin reflectance, and illumination from monocular input data. At runtime, both source and target animations are reconstructed based on a model-to-frame tracking strategy with a similar energy formulation. For reenactment, we propose a fast and efficient deformation transfer approach that directly operates in the subspace spanned by the used statistical prior (see Section 9). The mouth interior that best matches the re-targeted expression is retrieved from the input target sequence (see Section 10) and is warped to produce an accurate fit. We demonstrate our complete pipeline in a live reenactment setup that enables the modification of arbitrary video footage and perform a comparison to state-of-the-art tracking as well as reenactment approaches (see Section 11). In Section 12, we show the limitations of our proposed method.

Since we are aware of the implications of a video editing tool like Face2Face, we included a section in this paper that discusses the potential misuse of the presented technology (see Section 13). Finally, we conclude with an outlook on future work (see Section 14).

5. SYNTHESIS OF FACIAL IMAGERY
The synthesis of facial imagery is based on a multi-linear face model (see the original Face2Face paper for more details). The first two dimensions represent facial identity — that is,
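The IRLS solver mentioned above turns a robust (non-quadratic) energy into a sequence of weighted least-squares problems. The paper's solver is a tailored, data-parallel GPU implementation; purely to illustrate the reweighting idea, here is IRLS applied to a toy ℓ1 line fit (our own example, not the authors' energy or solver):

```python
import numpy as np

def irls_l1(A, b, iters=50, eps=1e-8):
    """Minimize ||A x - b||_1 by iteratively reweighted least squares:
    each iteration solves the weighted normal equations with weights
    1 / max(|residual|, eps), a quadratic surrogate of the energy."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # ordinary LS start
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(A @ x - b), eps)  # per-residual weights
        x = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * b))
    return x

# Points on y = 2x plus one gross outlier; the L1 fit ignores the
# outlier, where an ordinary least-squares fit would be pulled far off.
A = np.array([[1.0], [2.0], [3.0], [4.0]])
b = np.array([2.0, 4.0, 6.0, 40.0])
slope = irls_l1(A, b)[0]
```

Each IRLS iteration is just a (weighted) linear solve, which is what makes the scheme amenable to a data-parallel GPU implementation.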
this information a compute shader calculates the final derivatives that are needed for the optimization.

8. NON-RIGID MODEL-BASED BUNDLING
To estimate the identity of the actors in the heavily underconstrained scenario of monocular reconstruction, we introduce a non-rigid model-based bundling approach. Based on the proposed objective, we jointly estimate all parameters over k key-frames of the input video sequence. The estimated unknowns are the global identity {α, β} and intrinsics κ, as well as the unknown per-frame pose {δ^k, R^k, t^k}_k and illumination parameters {γ^k}_k. We use a similar data-parallel optimization strategy as proposed for model-to-frame tracking, but jointly solve the normal equations for the entire keyframe set. For our non-rigid model-based bundling problem, the non-zero structure of the corresponding Jacobian is block dense. Our PCG solver exploits the non-zero structure for increased performance (see original paper). Since all keyframes observe the same face identity under potentially varying illumination, expression, and viewing angle, we can robustly separate identity from all other problem dimensions. Note that we also solve for the intrinsic camera parameters of Π, thus being able to process uncalibrated video footage.

The employed Gauss-Newton framework is embedded in a hierarchical solution strategy (see Figure 3). The underlying hierarchy enables faster convergence and avoids getting stuck in local minima of the optimized energy function. We start optimizing on a coarse level and lift the solution to the next finer level using the parametric face model. In our experiments we used three levels with 25, 5, and 1 Gauss-Newton iterations for the coarsest, the medium, and the finest level, respectively. In each Gauss-Newton iteration, we employ 4 PCG steps to efficiently solve the underlying normal equations. Our implementation is not restricted to the number k of used keyframes, but the processing time increases linearly with k. In our experiments we used k = 6 keyframes for the estimation of the identity parameters, which results in a processing time of only a few seconds (∼20s).

9. EXPRESSION TRANSFER
To transfer the expression changes from the source to the target actor while preserving person-specificness in each actor's expressions, we propose a sub-space deformation transfer technique. We are inspired by the deformation transfer energy of Sumner et al.,18 but operate directly in the space spanned by the expression blendshapes. This not only allows for the precomputation of the pseudo-inverse of the system matrix, but also drastically reduces the dimensionality of the optimization problem, allowing for fast real-time transfer rates. Assuming source identity α^S and target identity α^T fixed, transfer takes as input the neutral [...], deformed source δ^S, and the neutral target expression. Output is the transferred facial expression δ^T directly in the reduced sub-space of the parametric prior.

As proposed by Sumner and Popović,18 we first compute the source deformation gradients A_i ∈ R^{3×3} that transform the source triangles from neutral to deformed. The deformed target [...] is then found based on the undeformed state [...] by solving a linear least-squares problem. Let (i0, i1, i2) be the vertex indices of the i-th triangle, and [...], then the optimal unknown target deformation δ^T is the minimizer of:

(7)

This problem can be rewritten in the canonical least-squares form by substitution:

(8)
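Because both identities are fixed, the system matrix of this least-squares problem is constant, so its pseudo-inverse can be precomputed once and each frame's transfer reduces to a single small matrix product. A schematic numpy sketch with a random stand-in matrix (the real matrix stacks the deformation-gradient constraints of Equation (7); the sizes used here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in sizes: n stacked constraint rows, m expression blendshape
# coefficients (m << n). M is fixed once both identities are fixed.
n, m = 300, 8
M = rng.standard_normal((n, m))   # hypothetical system matrix
M_pinv = np.linalg.pinv(M)        # pseudo-inverse, precomputed once

def transfer(b):
    """Per-frame solve of min_delta ||M @ delta - b||_2 via the cache."""
    return M_pinv @ b

# Sanity check: a right-hand side generated from known coefficients
# is recovered at the cost of one matrix-vector product.
delta_true = rng.standard_normal(m)
recovered = transfer(M @ delta_true)
```

Working in the m-dimensional blendshape subspace rather than over per-vertex coordinates is what makes this per-frame solve small enough for real-time transfer rates.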
Figure 5. Results of our reenactment system. Corresponding run times are listed in Table 1. The lengths of the source and resulting output sequences are 965, 1436, and 1791 frames, respectively; the lengths of the input target sequences are 431, 286, and 392 frames, respectively.
14. CONCLUSION
[…] experts. Our approach is a game changer, since it enables editing of videos in real time on a commodity PC, which makes this technology accessible to non-experts. We hope that the numerous demonstrations of our reenactment systems will teach people to think more critically about the video content they […]

Acknowledgments
We would like to thank Chen Cao and Kun Zhou for the blendshape models and comparison data, as well as Volker Blanz, Thomas Vetter, and Oleg Alexander for the provided face data. The facial landmark tracker was kindly provided by TrueVisionSolution. We thank Angela Dai for the video voice over and Daniel Ritchie for video reenactment. This research is funded by the German Research Foundation (DFG), grant GRK-1773 Heterogeneous Image Systems, the ERC Starting Grant 335545 CapReal, and the Max Planck Center for Visual Computing and Communications (MPC-VCC). We also gratefully acknowledge the support from NVIDIA Corporation for hardware donations.

References
a Standard deviations w.r.t. the final frame rate are 0.51, 0.56, and 0.59 fps, respectively. Note that CPU and GPU stages run in parallel.

1. Blanz, V., Vetter, T. A morphable model for the synthesis of 3d faces. Proc. SIGGRAPH (1999), ACM Press/Addison-Wesley Publishing Co., 187–194.
2. Bouaziz, S., Wang, Y., Pauly, M. Online modeling for realtime facial animation. ACM TOG 32, 4 (2013), 40.
3. Bregler, C., Covell, M., Slaney, M. Video rewrite: Driving visual speech with audio. Proc. SIGGRAPH (1997), ACM Press/Addison-Wesley Publishing Co., 353–360.
4. Cao, C., Bradley, D., Zhou, K., Beeler, T. Real-time high-fidelity facial performance capture. ACM TOG 34, 4 (2015), 46:1–46:9.
5. Cao, C., Hou, Q., Zhou, K. Displaced dynamic expression regression for real-time facial tracking and animation. ACM TOG 33, 4 (2014), 43.
6. Chen, Y.-L., Wu, H.-T., Shi, F., Tong, X., Chai, J. Accurate and robust 3d facial capture using a single rgbd camera. Proc. ICCV (2013), 3615–3622.
7. Garrido, P., Valgaerts, L., Rehmsen, O., Thormaehlen, T., Perez, P., Theobalt, C. Automatic face reenactment. Proc. CVPR (2014).
8. Garrido, P., Valgaerts, L., Sarmadi, H., Steiner, I., Varanasi, K., Perez, P., Theobalt, C. Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. Computer Graphics Forum, Wiley-Blackwell, Hoboken, New Jersey, 2015.
9. Hu, L., Saito, S., Wei, L., Nagano, K., Seo, J., Fursund, J., Sadeghi, I., Sun, C., Chen, Y., Li, H. Avatar digitization from a single image for real-time rendering. ACM TOG 36, 6 (2017), 195:1–195:14.
10. Kemelmacher-Shlizerman, I., Sankar, A., Shechtman, E., Seitz, S.M. Being John Malkovich. In Computer Vision—ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part I (2010), 341–353.
11. Li, H., Yu, J., Ye, Y., Bregler, C. Realtime facial animation with on-the-fly correctives. ACM TOG 32, 4 (2013), 42.
12. Li, K., Xu, F., Wang, J., Dai, Q., Liu, Y. A data-driven approach for facial expression synthesis in video. Proc. CVPR (2012), 57–64.
13. Ramamoorthi, R., Hanrahan, P. A signal-processing framework for inverse rendering. Proc. SIGGRAPH (ACM, 2001), 117–128.
14. Saragih, J.M., Lucey, S., Cohn, J.F. Deformable model fitting by regularized landmark mean-shift. IJCV 91, 2 (2011), 200–215.
15. Saragih, J.M., Lucey, S., Cohn, J.F. Real-time avatar animation from a single image. Automatic Face and Gesture Recognition Workshops (2011), 213–220.
16. Shi, F., Wu, H.-T., Tong, X., Chai, J. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM TOG 33, 6 (2014), 222.
17. Siegl, C., Lange, V., Stamminger, M., Bauer, F., Thies, J. Faceforge: Markerless non-rigid face multi-projection mapping. IEEE Transactions on Visualization and Computer Graphics, 2017.
18. Sumner, R.W., Popović, J. Deformation transfer for triangle meshes. ACM TOG 23, 3 (2004), 399–405.
19. Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C. Real-time expression transfer for facial reenactment. ACM TOG 34, 6 (2015).
20. Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Demo of face2face: Real-time face capture and reenactment of RGB videos. ACM SIGGRAPH 2016 Emerging Technologies, SIGGRAPH '16 (ACM, 2016), New York, NY, USA, 5:1–5:2.
21. Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Face2Face: Real-time face capture and reenactment of RGB videos. Proc. Comp. Vision and Pattern Recog. (CVPR), IEEE (2016).
22. Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. FaceVR: Real-time facial reenactment and eye gaze control in virtual reality. ArXiv, Non-Peer-Reviewed Prepublication by the Authors, abs/1610.03151 (2016).
23. Vlasic, D., Brand, M., Pfister, H., Popović, J. Face transfer with multilinear models. ACM TOG 24, 3 (2005), 426–433.
24. Weise, T., Bouaziz, S., Li, H., Pauly, M. Realtime performance-based facial animation. ACM TOG 30, 4 (2011), 77.
25. Weise, T., Li, H., Gool, L.V., Pauly, M. Face/off: Live facial puppetry. Proc. 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (Proc. SCA'09), ETH Zurich, August 2009. Eurographics Association.

Justus Thies and Matthias Nießner ({justus.thies, niessner}@tum.de), Technical University Munich, Garching, Germany.
Michael Zollhöfer (zollhoefer@cs.stanford.edu), Stanford University, Stanford, CA, USA.
Marc Stamminger (marc.stamminger@fau.de), University of Erlangen-Nuremberg, Erlangen, Germany.
Christian Theobalt (theobalt@mpi-inf.mpg.de), Max-Planck-Institute for Informatics, Saarbrücken, Germany.

Copyright held by authors/owners. Publication rights licensed to ACM. $15.00

Watch the authors discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/face2face
Technical Perspective
To view the accompanying paper,
visit doi.acm.org/10.1145/3292035
THE DIFFIE-HELLMAN KEY exchange protocol is at the heart of many cryptographic protocols widely used on the Internet. It is used for session setup in HTTPS (TLS), in SSH, in IPsec, and others. The original protocol, as described by Diffie and Hellman, operates by choosing a large prime p and computing certain exponentiations modulo this prime. For the protocol to be secure one needs, at the very least, that the discrete-log problem modulo the prime p be difficult to solve. This problem is quite easy to state: fix a large prime p, and an integer 0 < g < p (a generator). Next, choose an integer 0 < x < p and compute h = g^x modulo p. The discrete-log problem is to compute x given only p, g, and h. If this problem could be solved efficiently, for most h, then the Diffie-Hellman protocol for the chosen (p, g) would be insecure.

The authors of the following paper show that, in practice, implementations that use Diffie-Hellman tend to choose a universally fixed prime p (and fixed g). For example, many SSH servers and IPsec VPNs use a fixed universal 1,024-bit prime p. The same is true for HTTPS Web servers, although to a lesser extent.

Is it safe to use the same 1,024-bit prime p everywhere? The authors show that the answer is no. The reason is a beautiful precomputation attack on the discrete-log problem modulo a prime. A precomputation attack proceeds in two steps: First, in a one-time offline phase, before trying to attack any particular victim, the attacker works hard to compute a certain table based on the fixed p and g. Then, when attacking a victim session, the attacker uses the precomputed table to quickly compute discrete-log and break the session. The same precomputed table can be used to quickly break many sessions.

Precomputation attacks affect many cryptographic schemes. For example, they are often used to break weak passwords, where a precomputed table makes it possible to quickly break many hashed passwords. The beautiful insight of this paper is that precomputation can be devastating for systems that use Diffie-Hellman modulo a prime. Precomputation attacks are a real threat and must be taken into account when choosing parameters for real-world cryptography.

The authors speculate that a precomputation attack on discrete-log modulo a fixed 1,024-bit prime is within reach for a nation state. Because a small number of fixed primes is employed by a large number of websites, a precomputation attack on a few primes can be used to compromise encrypted Internet traffic at many sites.

To make matters worse, the authors show there is no need to break 1,024-bit primes to attack TLS. The reason is a weak TLS cryptography suite called TLS Export. This suite was included in TLS due to export control regulations that were in effect at the time that TLS was designed. TLS Export includes support for 512-bit primes, where discrete-log is woefully insecure. Sadly, TLS Export is still supported by many websites, and many (82%) use a fixed 512-bit prime shipped with the Apache Web server. The precomputation attack is extremely effective against this 512-bit prime. The authors carry out the offline precomputation phase in a few days, and the resulting table enables an online attack on a victim session in just under a minute.

To make matters even worse, the authors describe a new clever attack on TLS 1.2, called Logjam, which lets an attacker downgrade a victim connection to TLS Export. The resulting session is then vulnerable to a precomputation attack. Logjam exposes a significant flaw in the design of TLS 1.2.

So, what should we do? The short answer is that websites must migrate to TLS 1.3. TLS 1.3 is a recent significant upgrade to the TLS protocol. Compliant implementations must support Diffie-Hellman using an elliptic curve group called NIST P-256. It is likely that many websites will use Diffie-Hellman in this group. Using a universally fixed group seems as bad as using a universal prime p; however, currently there is no known practical precomputation attack on elliptic curve Diffie-Hellman, so the precomputation attacks discussed earlier do not apply, as far as we know. One point of concern is NSA's August 2015 announcement recommending that companies stop their transition to elliptic curve cryptography or, if they already have transitioned, use larger elliptic curve parameters. The official reason in the notice is the concern over a quantum computer that can break elliptic curve Diffie-Hellman. One may wonder, however, if there are other reasons behind this announcement. Is there a yet-to-be discovered practical preprocessing attack on P-256? Currently, there is no indication that such an attack exists.

In summary, preprocessing attacks are a real concern in cryptography. It is critically important to take them into account when choosing cryptographic parameters. The following paper is a wonderful illustration of this.

Dan Boneh is a professor of computer science and electrical engineering at Stanford University, and co-director of the Stanford Computer Security Lab, Stanford, CA, USA.
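The discrete-log problem and the two-phase precomputation attack described in the perspective can be illustrated at toy scale (tiny numbers chosen purely for demonstration; real attacks use the number field sieve against 512- and 1024-bit primes, and the table below stands in for its far more sophisticated precomputed database):

```python
# Toy illustration of discrete log and precomputation (tiny numbers only).
p, g = 1019, 2                 # fixed public prime and generator
x = 347                        # a victim's secret exponent
h = pow(g, x, p)               # easy direction: h = g^x mod p

# One-time offline phase for the fixed (p, g): tabulate all powers of g.
table = {pow(g, i, p): i for i in range(p - 1)}

# Online phase: any session using this (p, g) now breaks with one lookup.
assert table[h] == x
assert all(table[pow(g, s, p)] == s for s in (5, 77, 600))
```

The cost of building the table is paid once; every additional victim that reuses the same (p, g) is broken at lookup speed, which is exactly the amortization argument the perspective makes.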
Figure 1. Number field sieve for discrete logarithms. This algorithm consists of a precomputation stage that depends only on the prime p
and a descent stage that computes individual logarithms. With sufficient precomputation, an attacker can quickly break any Diffie-Hellman
instance that uses a particular p.
[…] Diffie-Hellman parameters. A prominent example is the Oakley groups,17 which give "safe" primes of length 768 (Oakley Group 1), 1024 (Oakley Group 2), and 1536 (Oakley Group 5). These groups were published in 1998 and have been used for many applications since, including IKE, SSH, Tor, and Off-the-Record Messaging (OTR).

When primes are of sufficient strength, there seems to be no disadvantage to reusing them. However, widespread reuse of Diffie-Hellman groups can convert attacks that are at the limits of an adversary's capabilities into devastating breaks, since it allows the attacker to amortize the cost of discrete logarithm precomputation among vast numbers of potential targets.

3. ATTACKING TLS
TLS supports Diffie-Hellman as one of several possible key exchange methods, and prior to public disclosure of our attack, about two-thirds of popular HTTPS sites supported it, most commonly using 1024-bit primes. However, a smaller number of servers also support legacy "export-grade" Diffie-Hellman using 512-bit primes that are well within reach of NFS-based cryptanalysis. Furthermore, for both normal and export-grade Diffie-Hellman, the vast majority of servers use a handful of common groups.

In this section, we exploit these facts to construct a novel attack against TLS, which we call the Logjam attack. First, we perform NFS precomputations for the two most popular 512-bit primes on the web, so that we can quickly compute the discrete logarithm for any key exchange message that uses one of them. Next, we show how a man-in-the-middle, so armed, can attack connections between popular browsers and any server that allows export-grade Diffie-Hellman, by using a TLS protocol flaw to downgrade the connection to export-strength and then recovering the session key. We find that this attack with our precomputations can compromise connections to about 8% of HTTPS servers among Alexa Top Million domains.

3.1. TLS and Diffie-Hellman
The TLS handshake begins with a negotiation to determine the cryptographic algorithms used for the session. The client sends a list of supported ciphersuites (and a random nonce cr) within the ClientHello message, where each ciphersuite specifies a key exchange algorithm and other primitives. The server selects a ciphersuite from the client's list and signals its selection in a ServerHello message (containing a random nonce sr).

TLS specifies ciphersuites supporting multiple varieties of Diffie-Hellman. Textbook Diffie-Hellman with unrestricted strength is called "ephemeral" Diffie-Hellman, or DHE, and is identified by ciphersuites that begin with TLS_DHE_*.c In DHE, the server is responsible for selecting the Diffie-Hellman parameters. It chooses a group (p, g), computes g^b, and sends a ServerKeyExchange message containing a signature over the tuple (cr, sr, p, g, g^b) using the long-term signing key from its certificate. The client verifies the signature and responds with a ClientKeyExchange message containing g^a.

To ensure agreement on the negotiation messages, and to prevent downgrade attacks, each party computes the TLS master secret from g^ab and calculates a Message Authentication Code (MAC) of its view of the handshake transcript. These MACs are exchanged in a pair of Finished messages and verified by the recipients.

To comply with 1990s-era U.S. export restrictions on cryptography, SSL 3.0 and TLS 1.0 supported reduced-strength DHE_EXPORT ciphersuites that were restricted to primes no longer than 512 bits. In all other respects, DHE_EXPORT protocol messages are identical to DHE. The relevant export restrictions are no longer in effect, but many servers maintain support for backward compatibility.

To understand how HTTPS servers in the wild use Diffie-Hellman, we modified the ZMap6 toolchain to offer DHE and DHE_EXPORT ciphersuites and scanned TCP/443 on both the full public IPv4 address space and the Alexa Top Million domains. The scans took place in March 2015. Of 539,000 HTTPS sites among Top Million domains, we found that 68.3% supported DHE and 8.4% supported DHE_EXPORT. Of 14.3mn IPv4 HTTPS servers with browser-trusted certificates, 23.9% supported DHE and 4.9% DHE_EXPORT.

While the TLS protocol allows servers to generate their own Diffie-Hellman parameters, just two 512-bit primes account for 92.3% of Alexa Top Million domains that support DHE_EXPORT (Table 1), and 92.5% of all servers with browser-trusted certificates that support DHE_EXPORT. The most popular 512-bit prime was hard-coded into many versions of Apache; the second most popular is the mod_ssl default for DHE_EXPORT.

Table 1. Top 512-bit Diffie-Hellman primes for TLS.d

Source    Popularity  Prime
Apache    82%         9fdb8b8a004544f0045f1737d0ba2e0b 274cdf1a9f588218fb435316a16e3741 71fd19d8d8f37c39bf863fd60e3e3006 80a3030c6e4c3757d08f70e6aa871033
mod_ssl   10%         d4bcd52406f69b35994b88de5db89682 c8157f62d8f33633ee5772f11f05ab22 d6b5145b9f241e5acc31ff090a4bc711 48976f76795094e71e7903529f5a824b
(others)  8%          (463 distinct primes)

c New ciphersuites that use elliptic curve Diffie-Hellman (ECDHE) are gaining in popularity, but we focus exclusively on the traditional prime field variety.
d 8.4% of Alexa Top Million HTTPS domains allow DHE_EXPORT, of which 92.3% use one of the two most popular primes, shown here.

3.2. Active downgrade to export-grade DHE
Given the widespread use of these primes, an attacker with the ability to compute discrete logarithms in 512-bit groups could efficiently break DHE_EXPORT handshakes for about 8% of Alexa Top Million HTTPS sites, but modern browsers never negotiate export-grade ciphersuites. To circumvent this, we show how an attacker can downgrade a regular DHE connection to use a DHE_EXPORT group, and thereby break both the confidentiality and integrity of application data.

The attack, which we call Logjam, is depicted in Figure 2 and relies on a flaw in the way TLS composes DHE and […]
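The DHE key agreement at the core of Section 3.1 reduces to the following sketch (a toy 64-bit prime is used for illustration only; real DHE groups are 512 bits and up, and the signature over the ServerKeyExchange and the Finished MACs are omitted):

```python
import secrets

# Toy finite-field ephemeral Diffie-Hellman (DHE) exchange.
# WARNING: 64-bit prime for illustration only -- trivially breakable.
p = 2**64 - 59          # a prime modulus (toy size, not a real DHE group)
g = 2                   # generator

b = secrets.randbelow(p - 2) + 1      # server's ephemeral secret
gb = pow(g, b, p)                     # sent in ServerKeyExchange

a = secrets.randbelow(p - 2) + 1      # client's ephemeral secret
ga = pow(g, a, p)                     # sent in ClientKeyExchange

# Both sides derive the same shared secret g^(ab) mod p; TLS then
# derives the master secret and Finished MACs from it.
assert pow(gb, a, p) == pow(ga, b, p)
```

Note that only g^a and g^b cross the wire; the downgrade attack works not by inverting these directly but by forcing the exchange into a 512-bit group for which the discrete-log precomputation has already been done.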
TLS False Start. Even when clients enforce shorter timeouts and servers do not reuse values for b, the attacker can still break the confidentiality of user requests that use TLS False Start. Recent versions of Chrome, Internet Explorer, and Firefox implement False Start, but their policies on when to enable it vary. Firefox 35, Chrome 41, and Internet Explorer (Windows 10) send False Start data with DHE. In these cases, a man-in-the-middle can record the handshake and decrypt the False Start payload at leisure.

4. NATION-STATE THREATS TO DIFFIE-HELLMAN
The previous sections demonstrate the existence of practical attacks against Diffie-Hellman key exchange as currently used by TLS. However, these attacks rely on the ability to downgrade connections to export-grade cryptography. In this section we address the following question: how secure is Diffie-Hellman in broader practice, as used in other protocols that do not suffer from downgrade, and when applied with stronger groups?

To answer this question we must first examine how the number field sieve for discrete logarithms scales to 768- and 1024-bit groups. As we argue below, 768-bit groups in relatively widespread use are now within reach for academic computational resources. Additionally, performing precomputations for a small number of 1024-bit groups is plausibly within the resources of nation-state adversaries. The precomputation would likely require special-purpose hardware, but would not require any major algorithmic improvements. In light of these results, we examine several standard Internet security protocols — IKE, SSH, and TLS — to determine their vulnerability. Although the cost of the precomputation for a 1024-bit group is several times higher than for an RSA key of equal size, a one-time investment could be used to attack millions of hosts, due to widespread reuse of the most common Diffie-Hellman parameters. Finally, we apply this new understanding to a set of recently published documents to evaluate the hypothesis that the National Security Agency has already implemented such a capability.

4.1. Scaling NFS to 768- and 1024-bit Diffie-Hellman
Estimating the cost for discrete logarithm cryptanalysis at larger key sizes is far from straightforward due to the complexity of parameter tuning. We attempt estimates up to 1024-bit discrete logarithm based on the existing literature and our own experiments, but further work is needed for greater confidence. We summarize all the costs, measured or estimated, in Table 2.

DH-768: done in 2016. When the ACM CCS version of this article was prepared, the latest discrete logarithm record was a 596-bit computation. Based on that work, and on prior experience with the 768-bit factorization record in 2009,12 we made the conservative prediction that it was possible, as explained in Section 2, to put more computational effort into sieving for the discrete logarithm case than for factoring, so that the linear algebra step would run on a slightly smaller matrix. This led to a runtime estimate of around 37,000 core-years, most of which was spent on linear algebra.

This estimate turned out to be overly conservative, for several reasons. First, there have been significant improvements in our software implementation (Section 3.3). In addition, our estimate did not use the Joux-Lercier alternative polynomial selection method,11 which is specific to discrete logarithms. For 768-bit discrete logarithms, this polynomial selection method leads to a significantly smaller computational cost.

In 2016, Kleinjung et al. completed a 768-bit discrete logarithm computation.13 While this is a massive computation on the academic scale, a computation of this size has likely been within reach of nation-states for more than a decade. This data is mentioned in Table 2.

DH-1024: Plausible with nation-state resources. Experimentally extrapolating sieving parameters to the 1024-bit case is difficult due to the trade-offs between the steps of the algorithm and their relative parallelism. The prior work proposing parameters for factoring a 1024-bit RSA key is thin, and we resort to extrapolating from asymptotic complexity. For the number field sieve, the complexity is exp((k + o(1)) (log N)^(1/3) (log log N)^(2/3)), where N is the integer to factor or the prime modulus for discrete logarithm and k is an algorithm-specific constant. This formula is inherently imprecise, since the o(1) in the exponent can hide polynomial factors. This complexity formula, with k = 1.923, describes the overall time for both discrete logarithm and factorization, which are both dominated by sieving and linear algebra in the precomputation. Evaluating the formula for 768- and 1024-bit N gives us estimated multiplicative factors by which time and space will increase from the 768- to the 1024-bit case.
[…] traffic are sent to the Cryptanalysis and Exploitation Services (CES).21,23,25 Within the CES enclave, a specialized "attack orchestrator" attempts to recover the ESP decryption key with assistance from high-performance computing resources as well as a database of known PSKs ("CORALREEF").21,23,25 If the recovery was successful, the decryption key is returned from CES and used to decrypt the buffered ESP traffic such that the encapsulated content can be processed.21,24

Evidence for a discrete logarithm attack. The ability to decrypt VPN traffic does not necessarily indicate a defeat of Diffie-Hellman. There are, however, several features of the described exploitation process that support this hypothesis. The IKE protocol has been extensively analyzed3,15 and is not believed to be exploitable in standard configurations under passive eavesdropping attacks. Absent a vulnerability in the key derivation function or transport encryption, the attacker must recover the decryption keys. This requires the attacker to calculate SKEYID generated from the Phase 1 Diffie-Hellman shared secret after passively observing an IKE handshake. While IKE is designed to support a range of Diffie-Hellman groups, our Internet-wide scans (Section 4.3) show that the vast majority of IKE endpoints select one particular 1024-bit Diffie-Hellman group even when offered stronger groups. Conducting an expensive, but feasible, precomputation for this single 1024-bit group (Oakley Group 2) would allow the efficient recovery of a large number of Diffie-Hellman shared secrets used to derive SKEYID and the subsequent KEYMAT.

Given an efficient oracle for solving the discrete logarithm problem, attacks on IKE are possible provided that the attacker can obtain the following: (1) a complete two-sided IKE transcript, and (2) any PSK used for deriving SKEYID in IKEv1. The available documents describe both of these as explicit prerequisites for the VPN exploitation process outlined above and provide the reader with internal resources available to meet these prerequisites.23

Of course, this explanation is not dispositive and the possibility remains that NSA could defeat VPN encryption using alternative means. A published NSA document refers to the use of a router "implant" to allow decryption of IPsec traffic, indicating the use of targeted malware is possible. However, this implant "allows passive exploitation with just ESP"23 without the prerequisite of collecting the IKE handshake messages. This indicates it is an alternative mechanism to the attack described above.

The most compelling argument for a pure cryptographic attack is the generality of NSA's VPN exploitation process. This process appears to be applicable across a broad swath of VPNs without regard to the endpoint's identity or the ability to compromise individual endpoints.

Figure 3. NSA's VPN decryption infrastructure. This classified illustration published by Der Spiegel25 shows captured IKE handshake messages being passed to a high-performance computing system, which returns the symmetric keys for ESP session traffic. The details of this attack are consistent with an efficient break for 1024-bit Diffie-Hellman.

4.3. Effects of a 1024-bit break
In this section, we use Internet-wide scanning to assess the impact of a hypothetical DH-1024 break on IKE, SSH, and HTTPS. Our measurements, performed in early 2015, indicate that these protocols would be subject to widespread compromise by a nation-state attacker who had the resources to invest in precomputation for a small number of 1024-bit groups.

IKE. We measured how IPsec VPNs use Diffie-Hellman in practice by scanning a 1% random sample of the public IPv4 address space for IKEv1 and IKEv2 (the protocols used to initiate an IPsec VPN connection) in May 2015. We used the ZMap UDP probe module to measure support for Oakley Groups 1 and 2 (two popular 768- and 1024-bit, built-in groups) and which group servers prefer. Of the 80K hosts that responded with a valid IKE packet, 44.2% were willing to negotiate a connection using one of the two groups. […] supported Oakley Group 1 (768-bit) while 86.1% and 91.0% respectively supported Oakley Group 2 (1024-bit). In our sample of IKEv1 servers, 2.6% of profiled servers preferred […]
centers. sion throughout the university, please visit the will be considered, with a preference for those
Applicants should submit a cover letter, Office for Institutional Diversity at http://www. who demonstrate a potential to contribute to
curriculum vita, research vision, teaching bc.edu/offices/diversity. cross-disciplinary teaching and research in con-
junction with the planned Schiller Institute for record of research excellence. All successful can- and hiring process should contact the Office of
Integrated Science and Society at Boston College. didates are expected to develop a vibrant, high- Inclusion, Diversity and Equal Opportunity at
A Ph.D. in Computer Science or a closely related quality externally sponsored research program, 216-368-8877 to request a reasonable accommo-
discipline is required for all positions. See https:// supervise graduate students, and interact and dation. Determinations as to granting reasonable
www.bc.edu/bc-web/schools/mcas/departments/ collaborate with faculty across the department accommodations for any applicant will be made
computer-science.html and https://www.bc.edu/ and campus. Applicants should have a strong on a case-by-case basis.
bc-web/schools/mcas/sites/schiller-institute. commitment to high quality teaching at the un-
html for more information. dergraduate and graduate levels. Candidates
Successful candidates for the position of As- must have a Ph.D. in Computer Science or a close- Columbia Quantum Initiative at
sistant Professor will be expected to develop ly related field. Current departmental strengths Columbia University
strong research programs that can attract exter- include Artificial Intelligence, Bioinformatics, Open Rank Faculty Positions in the School of
nal research funding in an environment that also Internet of Things, Machine Learning, Networks Engineering and Applied Science
values high-quality undergraduate teaching. and Distributed Systems, Cyber-Security and Pri-
Minimum requirements for all positions in- vacy, and Software Engineering, and successful Columbia Engineering is pleased to invite appli-
clude a Ph.D. in Computer Science or closely re- candidates will be expected to be synergistic with cations for faculty positions in Quantum Science
lated discipline, an energetic research program these strengths. and Technology as part of the Quantum Initiative
that promises to attract external funding, and Non-Tenure-Track Faculty Position in Com- at Columbia University in the City of New York.
a commitment to quality in undergraduate and puter Science: We are seeking applicants dedi- Applications at all ranks will be considered. Areas
graduate education. cated to curriculum development and teaching of interest in computing, communication, and
in foundational areas of Computer and Data Sci- theoretical research include novel computation
To apply go to https://apply.interfolio. ences, including introductory programming, dis- and communication approaches, programming
com/54208. crete mathematics, data structures, data science, paradigms, algorithms, and protocols for quan-
Application review begins October 1, 2018. and computer systems. The rank of the candidate tum information applications. Areas of interest
will be commensurate with experience. In addi- in experimental research include novel physical
Boston College is a Jesuit, Catholic university tion to teaching, successful candidates are also phenomena, electronic/optical materials, de-
that strives to integrate research excellence with expected to be involved in departmental service. vices, circuits and integrated systems for quan-
a foundational commitment to formative liberal Applicants must submit (i) a cover letter, tum communication, computing, sensing, and
arts education. We encourage applications from (ii) current curriculum vita, (iii) statement of metrology. We are seeking researchers who can
candidates who are committed to fostering a di- research interests, (iv) statement of teaching in- benefit from the highly multidisciplinary envi-
verse and inclusive academic community. Boston terests, and (v) contact information for at least ronment and the state-of-the-art shared facilities/
College is an Affirmative Action/Equal Opportu- three references for a junior position and six ref- infrastructure within Columbia University such
nity Employer and does not discriminate on the erences for a senior position. Applications will be as the Columbia Nano Initiative and the Data Sci-
basis of any legally protected category including reviewed starting immediately and will continue ence Institute. The candidate is expected to hold
disability and protected veteran status. To learn until the positions are filled. a full or joint appointment in the Departments
more about how BC supports diversity and inclu- of Computer Science, Electrical Engineering,
sion throughout the university, please visit the Application materials may be sent by email to: Applied Physics and Applied Mathematics, In-
Office for Institutional Diversity at http://www. Faculty Search Committee dustrial Engineering and Operations Research,
bc.edu/offices/diversity. Dept. of Electrical Engineering and or Mechanical Engineering and is expected to
Computer Science contribute to the advancement of their field, the
Case Western Reserve University department(s) and the School by developing an
Case Western Reserve University c/o YoLonda Stiggers (yxs307@case.edu) original and leading externally funded research
Faculty Positions 10900 Euclid Avenue, Glennan 321 program, establishing strong collaborations in
Cleveland, OH 44106-7071 research and education with related disciplines
The Department of Electrical Engineering and such as Physics and Chemistry, and contributing
Computer Science at Case Western Reserve Uni- Founded in 1826, Case Western Reserve Uni- to the undergraduate and graduate educational
versity invites applications for three faculty posi- versity is a highly ranked private research uni- mission of the Department(s) and the School.
tions: versity located in Cleveland, Ohio. As a vibrant Columbia fosters multidisciplinary research and
Tenure-Track Faculty Position in Data Sci- and up-and-coming city, Cleveland was named encourages collaborations with academic depart-
ence: While exceptional candidates in all areas of one of the top 15 best places to live in the US by ments and units across Columbia University.
Computer and Data Sciences will be considered timeout.com in 2016. The campus is in the heart Candidates must have a Ph.D. or its profes-
for this position, our priority areas include Big of University Circle, a world-renowned area for its sional equivalent by the starting date of the ap-
Data Management and Systems, Databases, Data cultural vibrancy, hosting the Cleveland Museum pointment. Applicants for this position must
Mining, and Machine Learning. While all ranks of Art (the second highest ranked art museum in demonstrate the potential to do pioneering re-
will be considered, preference will be given to the country), Cleveland Orchestra, the Museum search and to teach effectively. The school is es-
candidates at the Assistant Professor level. of Natural History, Cleveland Institute of Music, pecially interested in qualified candidates who
Tenure-Track Faculty Position in Cyber-Secu- and the Cleveland Botanical Garden, as well as can contribute, through their research, teaching,
rity: In conjunction with the Institute for Smart, two world-class health institutions, The Cleve- and/or service, to the diversity and excellence of
Secure, and Connected Systems (ISSACS), we are land Clinic and University Hospitals of Cleveland. the academic community.
seeking candidates with research interests in- With generous support from the Cleveland Foun- For additional information and to apply,
cluding but not limited to: theory and algorithms dation, Case Western Reserve University recently please see: http://engineering.columbia.edu/
(e.g., cryptography, secure computing, secure launched the Institute for Smart, Secure and Con- faculty-job-opportunities. Applications should
data analysis, data privacy), systems (e.g., secure nected Systems and is an anchor partner in the be submitted electronically and include the fol-
networks, distributed systems, cloud and virtual- IOT Collaborative. lowing: curriculum-vitae including a publication
ized environments, mobile devices), and applica- In employment, as in education, Case West- list, a description of research accomplishments, a
tions (e.g., security in Internet-of-Things, cyber- ern Reserve University is committed to Equal statement of research and teaching interests and
physical systems, health, computer forensics). Opportunity and Diversity. Women, veterans, plans, contact information for three experts who
While all ranks will be considered, preference members of underrepresented minority groups, can provide letters of recommendation, and up to
will be given to candidates at the Associate or Full and individuals with disabilities are encouraged three pre/reprints of scholarly work. All applica-
Professor level. to apply. tions received by February 1, 2019 will receive full
For the tenure-track positions, candidates Case Western Reserve University provides consideration.
for the junior positions should have potential for reasonable accommodations to applicants with Applicants can consult http://www.engineer-
excellence in innovative research. Candidates for disabilities. Applicants requiring a reasonable ing.columbia.edu for more information about
the senior positions should have an established accommodation for any part of the application the school.
ted to excellence in both research and teaching. existing departmental strengths, (2) leverage
Salary is highly competitive and dependent upon exceptional interdisciplinary collaboration op-
qualifications. portunities, and (3) align with vital college-level,
The Department of Computer Science (www. cross-cutting research themes including smart &
ACM Transactions cs.memphis.edu) offers B.S., M.S., and Ph.D. connected communities, transformative comput-
programs as well as graduate certificates in ing, healthcare transformations, and agile manu-
on Social Computing Data Science and Information Assurance, and
participates in an M.S. program in Bioinformat-
facturing (for details on these initiatives, please
visit: http://cec.sc.edu/employment).
ics (through the College of Arts and Sciences).
The Department has been ranked 55th among Applicants from all traditional as well as non-
CS departments with federally funded research. traditional and interdisciplinary areas of Com-
The Department regularly engages in large-scale puter Science and Engineering are urged to apply.
ACM TSC seeks to publish multi-university collaborations across the na- Research areas of special interest include:
work that covers the tion. For example, CS faculty led the NIH-funded
Big Data “Center of Excellence for Mobile Sensor
˲˲ Human in the loop or knowledge-enhanced
AI, deep learning, natural language processing,
full spectrum of social Data-to-Knowledge (MD2K)” and the “Center for question-answering/conversational AI, brain-
Information Assurance (CfIA)”. inspired computing, semantic/cognitive/percep-
computing including The Institute for Intelligent Systems consists tual computing;
theoretical, empirical, of 54 faculty members across 14 departments in-
cluding Communication Sciences and Disorders,
˲˲ Big data - including social, sensor, biological,
and health - and scalable computing/analysis of
systems, and design Computer Science, Engineering, Education, Lin- big data;
research contributions. guistics, Philosophy and Psychology. The IIS of-
fers a graduate certificate in Cognitive Science, a
˲˲ Computer vision, robotics, and human-com-
puter interaction Including personal digital/as-
TSC welcomes research minor in Cognitive Science, and is affiliated with sistive technology;
˲˲ Cyber-physical systems and Internet of Things;
employing a wide range BA and MS programs in other departments. The
IIS receives $4-5 million in external awards per ˲˲ Software analysis and testing, adaptive and au-
of methods to advance year from federal agencies such as NSF, IES, DoD,
and NIH. Further information about the Institute
tonomous systems, and search-based software
engineering; and
the tools, techniques, for Intelligent Systems can be found at http://iis. ˲˲ Next generation networking, cybersecurity, and
understanding, and memphis.edu.
Known as America’s distribution hub, Mem-
privacy
The Department of Computer Science and
practice of social phis ranked as America’s 6th best city for jobs by Engineering offers B.S. degrees in Computer
Upstart Puzzles
Randomized Anti-Counterfeiting

CONSUMERS AND PHARMACEUTICAL companies have been known to disagree over drug prices but have at least one interest in common: Neither wants the consumer to consume counterfeit drugs. The drug companies do not want to lose the sales, and consumers may have a critical need for the drug they paid for.

[Illustration: a jar of numbered pills. "If at most one of the two packages in this jar has fake pills, how can I be pretty sure to consume exactly one real pill per day, no more, no less?"]

Counterfeiters have other ideas, however, and produce and sell packages full of fakes (usually simple sugar pills) in a $100 billion-per-year worldwide business. The drug companies have fought such fakery by incorporating special packaging (holograms, unique numbers, sometimes even electronic tags) on the drug containers. With so much money to gain by selling sugar pills for high prices, however, the counterfeiters have managed to copy the packaging very expertly.

A clever but so far fictitious drug company has implemented the following random algorithm-style invention: Give each drug package (or bottle) a unique identifier, number each pill within a package—1, 2, 3, . . .—and insert an innocuous food coloring, from a palette of at least two colors, inside each pill. The food coloring is invisible until the pill is consumed.

Now suppose each package receives a random sequence of red or blue food colors, with equal probability. For concreteness, suppose package 133 has colors in the following numeric sequence:

1: red, 2: red, 3: blue, 4: blue, 5: blue, 6: red, 7: blue, 8: red, 9: red, 10: blue

Package 152 has:

1: blue, 2: red, 3: blue, 4: blue, 5: red, 6: red, 7: red, 8: red, 9: blue, 10: blue

While the colors are invisible before consumption, the consumer sees the color after taking the pill. A consumer worried about counterfeiting can log into a drug company website and, upon demonstrating some proof of purchase, look up the package number to see the pills associated with each number (such as 1: red, 2: red, 3: blue, . . . for package 133). Once the consumer starts taking the pills, the consumer can compare the pill's color with the one intended for that pill number in that package. If even a single color does not match, that pill, as well as all the other pills in the package, should be presumed fake. Because the counterfeiter would have to guess the coloring, after only two pills have been consumed, the consumer has probability 1 − ((1/2)×(1/2)), or ¾, of knowing the package is fake.

But mistakes happen. For example, suppose the consumer has opened packages 133 and 152 and separated the pills into two jars—J1 and J2—but does not remember which jar corresponds to which package. The consumer does not want to take two real pills on any particular day (they can be toxic in high doses) and wants to, of course, avoid taking only fake pills any day.

Problem. How can the consumer determine whether the pills in each of the two jars are fake with a probability of ¾ for each jar after taking at most six pills altogether?

Solution. The consumer picks a number i between 1 and 10, such that the intended color of pill i from package 133 differs from that of pill i from package 152. In our example, i could be 1 because pill 1 in package 133 should be red, and pill 1 in package 152 should be blue. If the consumer picks pill 1 from jar J1 and it is red, the consumer [CONTINUED ON P. 119]
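The ¾ figure comes from the counterfeiter having to guess two independent fair coin flips. A short simulation (a sketch of the reasoning, not part of the column itself; the function name and trial count are illustrative) confirms it:

```python
import random

def detection_probability(trials=100_000, pills_checked=2, seed=0):
    """Estimate the chance a consumer detects a fake package.

    The counterfeiter must guess each pill's hidden color (red/blue)
    uniformly at random; the consumer compares `pills_checked` consumed
    pills against the colors published for that package number.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        # True colors registered on the drug company's website.
        real = [rng.choice("RB") for _ in range(pills_checked)]
        # Counterfeiter's independent guesses for the same pill numbers.
        fake = [rng.choice("RB") for _ in range(pills_checked)]
        if real != fake:  # any single mismatch exposes the package
            detected += 1
    return detected / trials

print(detection_probability())  # ≈ 0.75, i.e., 1 - (1/2)**2
```

The same estimate generalizes: after k pills the detection probability is 1 − (1/2)^k, which is why even a very short prefix of the sequence is revealing.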