
Beyond Big Data¹

Hal R. Varian²

There is now a computer in the middle of most economic transactions. These computer-mediated
transactions enable data collection and analysis, personalization and customization, continuous
experimentation, and contractual innovation. Taking full advantage of the potential of these new
capabilities will require increasing sophistication in knowing what to do with the data that are now
available.
Keywords: big data, machine learning, privacy, personalization, experimentation, monitoring.

A billion hours ago, modern Homo sapiens emerged.
A billion minutes ago, Christianity began.
A billion seconds ago, the IBM PC was released.
A billion Google searches ago was this morning.
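
These four comparisons can be checked with back-of-the-envelope arithmetic. The sketch below is
illustrative only: it assumes a 365.25-day year and a 30-day month, treats the date of the talk
(September 10, 2013) as "now," and uses the figure of 100 billion searches a month quoted in the
text. The variable names are made up for this example.

```python
from datetime import datetime, timedelta

BILLION = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: average year length
TALK_DATE = datetime(2013, 9, 10)       # date of the talk, used as "now"


def years(seconds: float) -> float:
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


billion_seconds_in_years = years(BILLION)         # roughly 32 years
billion_minutes_in_years = years(BILLION * 60)    # roughly 1,900 years
billion_hours_in_years = years(BILLION * 3600)    # roughly 114,000 years
billion_seconds_ago = TALK_DATE - timedelta(seconds=BILLION)

# Assumption: 100 billion queries a month, spread evenly over a 30-day month.
queries_per_hour = 100 * BILLION / (30 * 24)
hours_per_billion_searches = BILLION / queries_per_hour

print(f"A billion seconds: {billion_seconds_in_years:,.1f} years "
      f"(back to {billion_seconds_ago:%B %Y})")
print(f"A billion minutes: {billion_minutes_in_years:,.0f} years")
print(f"A billion hours:   {billion_hours_in_years:,.0f} years")
print(f"A billion searches: {hours_per_billion_searches:.1f} hours")
```

Under these assumptions a billion seconds reaches back to within a few months of the IBM PC's
August 1981 release, and a billion searches take a little over seven hours, which is consistent
with "this morning."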

There is a lot of press about "big data": how important it is, how powerful it is, and how it will
change our lives. All that may be true, but I believe that there is something more fundamental going
on, which I refer to as computer-mediated transactions.

Due to the dramatic cost decrease in computers and communication, there is now a computer in the
middle of virtually every transaction. This computer could be as simple as a cash register or as
complex as a data center. These computers were mostly put in place for accounting reasons, but now
that they are available these computers have several other uses.

¹ Presented at the NABE Annual Meeting, September 10, 2013, San Francisco, CA.
² Hal R. Varian is the chief economist at Google. Since 2002 he has been involved in many aspects of
the company, including auction design, econometric analysis, finance, corporate strategy, and public
policy. He also holds academic appointments at the University of California, Berkeley in three
departments: business, economics, and information management. He has taught at MIT, Stanford,
Oxford, Michigan, and other universities. He is a fellow of the Guggenheim Foundation, the
Econometric Society, and the American Academy of Arts and Sciences. He was co-editor of the
American Economic Review from 1987 to 1990 and holds honorary doctorates from the University of
Oulu in Finland and the University of Karlsruhe in Germany. He has published numerous papers in
economic theory, industrial organization, financial economics, econometrics, and information
economics, and several books. He received his S.B. from MIT and his M.A. in mathematics and his
Ph.D. in economics from the University of California at Berkeley.

Today, I want to talk to you about four such uses:

Data extraction and analysis.

Personalization and customization.

Continuous experiments.

New kinds of contracts due to better monitoring.

The first one, data extraction and analysis, is what everyone is talking about when they talk
about big data. It is important, but I think that the other three uses go beyond big data and will,
in time, become even more important than the first. But let's start at the beginning and look at
them in order.³

Data Extraction and Analysis
According to published reports, Google has seen 30 trillion URLs, crawls over 20 billion of those
a day, and answers 100 billion search queries a month [Singhal 2012]. Ordinary databases just can't
handle these magnitudes, so we have had to develop new types of databases that can store data in
massive tables spread across thousands of machines and can process queries on more than a trillion
records in a few seconds.

We published descriptions of these tools in the academic literature, and independent developers
have created open source tools that have similar functionality. These tools are now widely
available and run on cloud computing engines such as Amazon Web Services, Google Compute
Engine, and other services.

From the economic point of view, what was previously a fixed cost (deploying and managing a
data center capable of dealing with massive data) is now a variable cost. As any economist knows, if
you lower the barriers to entry you will get lots of new entrants, and we have seen a number of
startups in this area.

³ Some of the material in this presentation is covered in more detail in Varian [2010].

But tools for data manipulation are only part of the story. We have also seen significant
developments in methods for data analysis that have emerged from the machine learning community.
Nowadays we hear a lot about "predictive analytics," "data mining," and "data science." The
techniques from these subjects, along with some good old-fashioned statistics and econometrics, have
allowed for deeper analysis of these vast data sets, enabled by computer-mediated transactions.

There have also been significant developments in open source programs that can be used to
apply these tools, such as the R language, Weka (Waikato Environment for Knowledge Analysis), and
others. One of the most important features of these languages is the thriving communities of users
who provide peer support on the web. Given the fact that cloud hardware, database tools, analysis
tools, and developer support are now widely available, it is not surprising that we have seen many
new entrants in the data analysis area.

Back in 2006, Netflix realized that 75 percent of movie views in its library were driven by
recommendations. They created the Netflix Prize of $1 million that would be awarded to the group
that developed the best machine learning system for recommendations, as long as it improved the
current version by at least 10 percent. They provided training data of about 100M ratings, 500,000
users, and 18,000 movies. In 2009, the prize was won by a team that blended 800 statistical models
together using model averaging. The success of the Netflix challenge led to the establishment of a
startup named Kaggle, which sets up Netflix-like challenges. (Note: I am an investor in and an
adviser to Kaggle.)
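
To see why blending models can help, here is a minimal sketch of model averaging on simulated
data. It is not the winning team's method, which was considerably more elaborate; it only
illustrates that an equal-weight average of several predictors with independent errors tends to
have a lower root mean squared error (RMSE) than any one of them. The ratings, biases, and noise
levels are all invented for the example.

```python
import random
from statistics import mean


def rmse(predictions, truth):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" ratings on a 1-to-5 scale.
truth = [rng.uniform(1, 5) for _ in range(2000)]


def noisy_model(bias, noise_sd):
    """A stand-in for a fitted model: the truth plus a bias and independent noise."""
    return [t + bias + rng.gauss(0, noise_sd) for t in truth]


# Five imperfect models with different (assumed) biases and the same noise level.
models = [noisy_model(bias, 1.0) for bias in (-0.2, -0.1, 0.0, 0.1, 0.2)]

# Equal-weight model averaging: average the five predictions for each rating.
blend = [mean(preds) for preds in zip(*models)]

individual_rmse = [rmse(m, truth) for m in models]
blend_rmse = rmse(blend, truth)

print("individual RMSEs:", [round(r, 3) for r in individual_rmse])
print("blended RMSE:    ", round(blend_rmse, 3))
```

The gain comes from the independence of the errors: averaging cancels much of the noise. When
the models make correlated mistakes, the improvement from blending is smaller.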

Nowadays, there are many organizations that have interesting data but no internal expertise in
data analysis. At the same time, there are data analysts all over the world who have expertise but
no data (and could use some money). Kaggle puts the two sides of the market together. They now have
114,000 data scientists who tackle the submitted problems. Their motto is "We make data science a
sport." Here are some examples of their projects:

Heritage Health Prize: $3 million to predict hospital readmits

Gesture recognition for Microsoft Kinect: $10,000

GE flight optimization: $250,000

Belkin energy disaggregation for appliances: $25,000

Recognizing Parkinson's disease from smartphone accelerometer data: $10,000

... and many more

When you combine Data + Tools + Techniques + Expertise, you can solve a lot of hard problems!

Personalization and customization
Nowadays, people have come to expect personalized search results and ads. If you ask Google
for "pizza near me" you will get back what you expect. When you go to Amazon, they recommend
products just for you. The story has it that Jeff Bezos recently signed on to his Amazon account
and saw a message that said "Based on your recent purchases, we recommend The New York Times,
the Chicago Tribune, and the Los Angeles Times."

These personalized searches, services, and ads have the potential of revolutionizing marketing.
The average marketing cost per car sold in the US is about $650. But much of that marketing
expenditure is wasted. Why show ads to someone who just bought a car?

Larry Page used to say that the trouble with Google was that you had to ask it questions. He
thought Google should know what you want and tell it to you before you ask the question. We all
thought he was joking, but Larry's vision has been realized by Google Now, an application that runs
on Android phones. One day my phone buzzed and I looked at a message from Google Now. It said:
"Your meeting at Stanford starts in 45 minutes and the traffic is heavy, so you better leave now."
The kicker is that I had never told Google Now about my meeting. It just looked at my Google
Calendar, saw where I was going, sent my current location and destination to Google Maps, and
figured out how long it would take me to get to my appointment given current traffic conditions.

Some people think that's the coolest thing in the world, and others are just completely freaked
out by it. The issue is that Google Now has to know a lot about you and your environment to provide
these services. This worries some people. But, of course, I share highly private information with
my doctor, lawyer, accountant, trainer, and others because I receive identifiable benefits and I
trust them to act in my interest. If I want to get a mortgage, I have to send the bank two years of
income tax returns, a month of paychecks, a printout of my net worth, and dozens of other documents.
Why am I willing to share all this private information? Because I get something in return: the
mortgage.

One easy way to forecast the future is to predict that what rich people have now, middle class
people will have in five years, and poor people will have in ten years. It worked for radio, TV,
dishwashers, mobile phones, flat screen TV, and many other pieces of technology.

What do rich people have now? Chauffeurs? In a few more years, we'll all have access to
driverless cars. Maids? We will soon be able to get housecleaning robots. Personal assistants?
That's Google Now.⁴ This area will be an intensely competitive environment: Apple already has Siri
and Microsoft is hard at work developing its own digital assistant. And don't forget IBM's Watson.

Of course there will be challenges. But these digital assistants will be so useful that everyone
will want one, and the scare stories you read today about privacy concerns will just seem quaint and
old-fashioned.

Experiments
As all economists are aware, correlation is not the same thing as causation. Observational
data, no matter how big it is, can usually only measure correlation, not causality. To take a
trivial example, suppose we observe that there are more police in areas with higher crime rates.
Can we conclude that police cause crime? More specifically, does the correlation mean that if you
assign more police to an area you will get more crime? You may have a very good model that can
predict the crime rate by precinct depending on how many police have been assigned to that precinct,
but that model could fail miserably in estimating how the crime rate changes when you add more
policemen to a precinct. It is quite possible to see a positive relationship based on the
observational data and a negative relation based on an actual experiment.

⁴ For an idea of Google's current efforts, see Singhal [2012].

Likewise, bigger fires have more firemen. Do more firemen cause bigger fires? If you assign
more firemen, will the fire get bigger? Or, to take an example closer to home, what about the
impact of advertising on sales? Consider this probably apocryphal question asked of a marketing
manager: "How do you know increased advertising will generate more sales?" "Look at this chart," he
responded. "Every December I increase ad spend, and every December I get more sales."

Companies are very interested in using their big data to estimate demand functions: if I cut my
price, how will the amount sold change? This is actually one of the most common requests for big
data analysts. Alas, observational data usually cannot answer this question. Suppose my sales fall
when disposable income is low, so I respond by cutting price. Conversely, sales rise when disposable
income is high, so I raise price. Over time, we see high prices associated with high sales and low
prices with low sales, so we get an upward-sloping demand function! The historical relationship
between price and quantity would not be a good guide for future pricing decisions.

This is where econometrics comes in: our goal is to estimate the causal response from
changing price, which will often be different from the historical relationship between price and
sales. In these simple examples the problem is obvious, but in other cases it can be tricky to
distinguish causal effects from mere correlation.
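
The upward-sloping demand curve described above is easy to reproduce in a small simulation. The
sketch below is a stylized illustration, not an estimate from real data: the demand equation, its
coefficients, and the seller's pricing rule are all assumptions chosen for the example. It fits a
simple regression of quantity on price twice, once when the seller raises price with income and
once when price is set at random, independently of income.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


TRUE_PRICE_EFFECT = -2.0   # assumed causal effect of price on quantity
INCOME_EFFECT = 3.0        # assumed effect of disposable income on quantity

rng = random.Random(1)


def quantity_sold(price, income):
    """Hypothetical demand: falls with price, rises with income, plus noise."""
    return 50 + INCOME_EFFECT * income + TRUE_PRICE_EFFECT * price + rng.gauss(0, 1)


incomes = [rng.uniform(10, 30) for _ in range(5000)]

# Observational data: the seller charges more when income is high.
obs_prices = [5 + 0.5 * income + rng.gauss(0, 0.5) for income in incomes]
obs_quantities = [quantity_sold(p, inc) for p, inc in zip(obs_prices, incomes)]

# Experimental data: price is assigned at random, independently of income.
exp_prices = [rng.uniform(10, 20) for _ in incomes]
exp_quantities = [quantity_sold(p, inc) for p, inc in zip(exp_prices, incomes)]

obs_slope = ols_slope(obs_prices, obs_quantities)
exp_slope = ols_slope(exp_prices, exp_quantities)

print(f"slope from observational data: {obs_slope:+.2f}")
print(f"slope from randomized prices:  {exp_slope:+.2f}")
print(f"true causal price effect:      {TRUE_PRICE_EFFECT:+.2f}")
```

In the observational data, price stands in for income, so the fitted slope is positive even though
the causal effect of price is negative by construction. Randomizing price breaks the link with
income, and the fitted slope comes out close to the true effect.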

What's the solution? Experiments. These are the gold standard for causality. More
specifically, what you want are randomly assigned treatment-control experiments. Ideally these would
be carried on continuously. This is pretty easy to do on the web. You can assign treatment and
control groups based on traffic, cookies, usernames, geographic areas, and so on.
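
One common way to assign units such as cookies or usernames to treatment and control is to hash
the identifier together with an experiment name. The sketch below shows that general technique; it
is not a description of Google's system, which the text does not detail, and the function name,
experiment names, and cookie values are hypothetical.

```python
import hashlib


def assign_group(unit_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a unit (cookie, username, ...) to an experiment arm.

    Hashing the experiment name together with the unit id means the same unit
    always lands in the same arm of a given experiment, while its assignments
    in different experiments are effectively independent of one another.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # pseudo-uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical cookies and experiment name, for illustration only.
cookies = [f"cookie-{i}" for i in range(10_000)]
groups = [assign_group(c, "new-ranking") for c in cookies]
treated_share = groups.count("treatment") / len(groups)

print(f"share of cookies in treatment: {treated_share:.3f}")
```

Because the assignment is a pure function of the identifier, no lookup table is needed, and a
returning visitor sees a consistent experience for as long as the cookie persists.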

Google runs about 10,000 experiments a year in search and ads. There are about 1,000
running at any one time, and when you access Google you are in dozens of experiments.

What types of experiments? There are many:

user interface experiments

ranking algorithms for search and ads

feature experiments

product design

tuning experiments

All of these can be ongoing experiments or special-purpose experiments. Google has been so
successful with our own experiments that we have made them available to our advertisers and
publishers in two programs. The first, Advertiser Campaign Experiments (ACE), allows advertisers to
experiment with bids, budgets, creatives⁵, and so on, in order to find the optimal settings for
their ads. The second, Contents Experiment Platform, is part of Google Analytics; it allows
publishers to experiment with different web page designs to find the one(s) that perform best for
them.
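
Once an experiment has run, the outcomes in the two groups have to be compared. A minimal sketch
of one standard approach, a two-proportion z-test on conversion rates, is shown below. It is a
generic textbook calculation, not the method used inside ACE or Google Analytics, and the visitor
and conversion counts are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns the difference in rates (b minus a), the z statistic, and the
    two-sided p-value under the normal approximation with a pooled rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return rate_b - rate_a, z, p_value


# Hypothetical results: 10,000 visitors per page design.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)

print(f"lift in conversion rate: {lift:.4f}")
print(f"z statistic: {z:.2f}, two-sided p-value: {p_value:.4f}")
```

With many experiments running at once, some will look significant by chance, so in practice a
threshold stricter than the conventional 5 percent, or a correction for multiple comparisons, is
often appropriate.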

So big data is only part of the story. If you really want to understand causality, you have to
run experiments. And if you run experiments continuously, you can continuously improve your system.
In 1910, Henry Ford and his colleagues were on the factory floor every day, fine-tuning the assembly
line, a method of production that revolutionized manufacturing. In the 1970s, the buzz in
manufacturing was kaizen, the Japanese term for continuous improvement. Now we have computer
kaizen, where the experimentation can be entirely automated. Just as mass production changed the
way products were assembled and continuous improvement changed how manufacturing was done,
continuous experimentation will improve the way we optimize business processes in our
organizations.

⁵ Material used to generate leads and sell advertising, developed and designed by art directors
and/or copywriters in an ad agency. http://marketing.about.com/od/marketingglossary/g/creativesdef.htm

Monitoring and contracts
My last example of how computer-mediated transactions affect economic activity has to do
with contracts. Contracts can be very simple: "You give me a latte and I will give you $2." This
contract is easily verified: I can see that I got my latte, you can see that you got your $2.
However, there are other contracts that are not so easy to verify. When I rent a car, somewhere in
the fine print there is a statement to the effect that I will operate the car in a safe manner. But
how can they verify that? There used to be no way, but now insurance companies can put vehicular
monitoring systems in the car [Scism 2013 and Stross 2012]. They can use these systems to verify
whether or not you are fulfilling your part of the contract. They get lower accident rates and I
get lower prices.

Here is another car example: "I will lease a car to you, if you send in your monthly payments on
time." What happens if you stop sending in the monthly payments? The lender will likely send out a
repo man to repossess the car. Nowadays it's a lot easier just to instruct the vehicular monitoring
system not to allow the car to be started and to send a message to the leasing company telling it
where to go to pick it up.

The examples of contractual innovation aren't all about misbehavior. Suppose an advertiser
went to a newspaper and said "I will buy an ad in your publication, as long as I only have to pay
for the people that saw my ad and then came to my store." It would be nearly impossible to verify
performance in that contract. But now, in pay-per-click advertising, it is common for advertisers
to pay only for those users who click through to their website.

Because transactions are now computer mediated, we can observe behavior that was
previously unobservable and write contracts on it. This enables transactions that were simply not
feasible before. Let me give you an example: Premise Data Corporation. This is a small startup
funded by Google Ventures that collects economic data. (Note: I am an adviser to Premise.)

Suppose you want to track the price of pork in Shanghai. Premise will send 20 college
students out with mobile phones to photograph the price of pork in hundreds of stores in Shanghai.
The nice thing is that they can prove that they actually went to the store and weren't sitting in
a coffee shop typing prices into a spreadsheet. In fact, their mobile phones provide the proof: a
geolocated, timestamped photograph of the pork price in a store. You can do the same thing with
other data collections: counting cars in parking lots, people in shopping centers, or traffic
through an intersection. Premise uses crowdsourcing + smartphone + camera + timestamp +
geolocation, all of which can be used to verify data integrity.
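
As a rough illustration of how a timestamp and geolocation can back up a reported price, the
sketch below checks that an observation was recorded near the store and during the assigned
collection window. Premise's actual verification process is not described here; the record layout,
the 100-meter threshold, the coordinates, and the function names are all hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class PriceObservation:
    """One crowdsourced price reading (hypothetical record layout)."""
    price: float
    lat: float
    lon: float
    taken_at: datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_plausible(obs, store_lat, store_lon, window_start, window_end, max_distance_m=100.0):
    """True if the photo was taken close to the store and inside the time window."""
    near_store = haversine_m(obs.lat, obs.lon, store_lat, store_lon) <= max_distance_m
    in_window = window_start <= obs.taken_at <= window_end
    return near_store and in_window


# Illustrative store location in central Shanghai and a one-day collection window.
STORE_LAT, STORE_LON = 31.2304, 121.4737
WINDOW = (datetime(2013, 9, 10, 9, 0), datetime(2013, 9, 10, 17, 0))

in_store = PriceObservation(28.5, 31.2308, 121.4737, datetime(2013, 9, 10, 10, 30))
coffee_shop = PriceObservation(28.5, 31.2404, 121.4737, datetime(2013, 9, 10, 10, 30))

print(is_plausible(in_store, STORE_LAT, STORE_LON, *WINDOW))
print(is_plausible(coffee_shop, STORE_LAT, STORE_LON, *WINDOW))
```

A check like this only screens out the crudest shortcuts. The photograph itself remains the
primary evidence, and location or clock data from a phone can be spoofed, so a real system would
presumably combine several signals.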

Premise is using the latest technology, but there are historical examples of computer-mediated
transactions that date back more than a century. The classic cash register was invented by saloon
keeper James Ritty in 1883, who used the name "the incorruptible cashier" for his invention. What
kind of monitoring did it provide? First, it went "ka-ching" when the cash drawer was opened, so
the cashier knew to pay attention to what was going on. Second, it used a paper tape to record all
transactions, so they could be reconciled with cash flow and product flow. Some historians claim
that cash registers were a critical invention for economic growth, since they allowed retail
businesses to hire clerks and cashiers from outside the trusted family.⁶

Computer-mediated transactions have enabled new business models that were simply not
feasible before. Here are two more examples of contractual innovation:

Uber. This service allows you to load an app on your phone and use it to call a black car sedan
when you need it. You see the car on a map on your phone and literally watch it come to you. You
get in the car, go to where you're going, and automatically pay by using your mobile phone. The
entire transaction is monitored. If something goes wrong with the transaction, you can use the
computerized record to find what went wrong. Uber is giving both the driver and the passenger a
"no surprise" experience through the identity verification that computer mediation makes possible.

⁶ For a discussion of how modern technology is affecting the restaurant industry, see Lohr [2013].

Airbnb. This company allows you to rent out a spare room, an apartment, or an in-law unit.
Just as with Uber, computers verify identity on both sides of the transaction, and each side of the
transaction can rate the performance of the other side. What was previously hard-to-find
information about the reputation of the renter and the rentee has now become easily accessible,
allowing people to trust more because verification has been automated.

Summary
Computer-mediated transactions can make a big difference to economic performance. They
allow for analysis, personalization, experimentation, and monitoring. These capabilities allow for
new transactions that were not feasible before. Kaggle, Premise, Uber, and Airbnb are all examples,
but there are many more.

Many companies have the data: they just don't know what to do with it. Missing ingredients:
data tools (easy), knowledge (hard), and experience (very hard). Three years ago I said "statistics
will be the sexy job of the next decade," which endeared me to statisticians everywhere. Since then,
applications to the Stanford Statistics Department have tripled. Is this causation or correlation?
I don't know, and they don't know, but I'm willing to take credit anyway.

References

Lohr, Steve. 2013. "How Surveillance Changes Behavior: A Restaurant Workers Case Study," New
York Times, August 26.
Scism, Leslie. 2013. "State Farm is There: As You Drive," Wall Street Journal, August 15.
Singhal, Amit. 2012. "Breakfast with Google's Search Team,"
http://www.youtube.com/watch?v=8a2VmxqFg8A#!
Stross, Randall. 2012. "So You're a Good Driver? Let's Go to the Monitor," New York Times,
November 24.
Varian, Hal R. 2010. "Computer Mediated Transactions," American Economic Review, 100(2): 1-10.