REPORT OF DR. INGRAM OLKIN
BACKGROUND AND QUALIFICATIONS
I am currently a Professor of Statistics and Education at Stanford University. My
background is a Bachelor's degree in mathematics from the College of the City
of New York in 1947, a Master's degree in mathematical statistics from Columbia
University in 1948, and a Doctorate in theoretical statistics from the University of North
Carolina, Chapel Hill. From 1951 to 1960 I served on the faculties of Michigan State
University and the University of Minnesota, and from 1961 to the present, on the faculty
of Stanford University.
I have been Chair of the Department of Statistics at Stanford University and the
University of Minnesota. I am the recipient of a Guggenheim Fellowship; an Alexander
von Humboldt Senior Research Fellowship; a Lady Davis Fellowship at Hebrew
University; and an Overseas Fellowship, Churchill College, Cambridge. I have received an
honorary Doctor of Science degree from De Montfort University as well as the Wilks
Medal from the American Statistical Association. I have received a Lifetime
Achievement Award from the American Psychological Association and have been
elected a Fellow of several statistical societies. I have served on governmental
panels and committees including the National Science Foundation, National Institutes of
Health, National Center for Education Statistics, Bureau of the Census, and Research
Council of the National Academy of Sciences.
My field of statistical expertise is in statistical inference, statistical models in the
social and behavioral sciences, multivariate statistical analysis, and methods for
combining information. I have written on sampling methods and have taught courses in
this area. I have authored or edited numerous books and over 150 scientific papers. My
curriculum vitae and bibliography, which includes a listing of all my published papers,
is attached as Appendix A.
I have not given court testimony during the past four years. I am being
compensated at my standard hourly rate of $400.
SCOPE OF WORK
I was asked by counsel for the plaintiffs in the lawsuits against NAPSTER
(Nos. C99-05183 and C00-0074 (MHP)) to design and implement a sampling protocol
that would allow one to draw fair conclusions about the activities occurring on the
NAPSTER system regarding, and to provide an opinion on, the following two subjects:
(a) the percent of NAPSTER users that make copyrighted works available to others for
unauthorized exchange using the NAPSTER system (the "User Project"); and
(b) the frequency with which NAPSTER users actually trade copyrighted works
without permission (the "Download Project").
SUMMARY OF CONCLUSIONS
1. User Project
Based on the sampling that I directed, I have concluded that at any given time,
virtually all users of NAPSTER are making copyrighted files available for downloading
by others without the permission of the copyright owners.
2. Download Project
Based on the sampling that I directed, I have concluded that a very high percentage
of files actually exchanged by NAPSTER's users are copyrighted works that are likely
being traded without permission.
DISCUSSION OF SAMPLING PROTOCOL
In order to create any sampling procedure, the sample should be representative of
the population and not give undue weight to an "item." An item might be a time of day,
an individual, and so on.
A second aspect is the determination of the size of the sample to be taken in order
to provide with 95% confidence a margin of error that is no greater than 3%, which is a
standard used, for example, in opinion polls. To accomplish this accuracy, standard
statistical methodology yields a required sample size of 1067 items. A larger sample size
would provide an even smaller margin of error, and accordingly, to be conservative,
I used the figure of 1150 items.
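
The figure of 1067 can be reproduced with a short calculation. The sketch below is illustrative only and is not taken from the report: it assumes the usual normal-approximation formula for a proportion, n = z^2 p(1 - p) / e^2, with the worst-case value p = 0.5 and no finite-population correction. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Unrounded sample size for estimating a proportion (assumed formula)."""
    # Two-sided critical value; roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * z * p * (1 - p) / margin ** 2


def margin_of_error(n, confidence=0.95, p=0.5):
    """Margin of error achieved by a simple random sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


n_exact = required_sample_size(0.03)
print(round(n_exact))         # 1067 (rounding up instead would give 1068)
print(margin_of_error(1150))  # a little under 0.03
```

Under these assumptions, a sample of 1150 items gives a margin of error of roughly 2.9%, which is consistent with the statement that the larger sample is conservative.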
The basis for the sampling procedure that I used is what is called "simple random
sampling." In such a sampling procedure, each item has the same opportunity to be
chosen as does every other item. Thus, for example, the choice of a random hour during a
24-hour day is obtained by labeling tickets from 1 to 24 and drawing one ticket from an
urn. This is the model of a lottery. However, it is more efficient to have a computer
provide the draws instead of actually drawing tickets from an urn, and there currently
exist a number of algorithms that provide for simple random sampling. Accordingly, all
sampling of the data taken from NAPSTER was performed using an algorithm.
After we generated the two populations of material (User and Download), the files
were sent to other persons for a copyright analysis. Specifically, I understand that the
information was sent to persons at the Recording Industry Association of America
("RIAA") and persons at the Harry Fox Agency ("HFA") to determine copyright
ownership of the songs set forth in the data.
I have drawn my conclusions based upon the information obtained as a result of
that analysis, and I did not independently perform such analysis. For purposes of my
conclusions I assume reasonable accuracy of that analysis.
The actual sampling was designed in two parts: (1) for the User Project, a
sampling procedure that took place independently of NAPSTER's participation, and (2)
for the Download Project, a sampling procedure of data that NAPSTER captured at times
that I designated.
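
The lottery model of simple random sampling described in this section can be illustrated in a few lines. This is a minimal sketch and not the algorithm actually used for the report, which is not identified; the seed value is arbitrary and serves only to make the draw repeatable.

```python
import random

rng = random.Random(2000)  # arbitrary fixed seed so the draw can be repeated

# Urn model: one ticket per hour of the day, labeled 1 through 24.
tickets = list(range(1, 25))

# Draw a single ticket; each ticket has the same chance (1 in 24) of selection.
hour = rng.choice(tickets)

# Draw several distinct tickets without replacement: a simple random sample.
hours = rng.sample(tickets, k=4)

print(hour, hours)
```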
1. Sampling Protocol for the User Project
The sample was taken once every hour for four days for a total period of 96 hours.
This permitted us to generate a very large database of millions of files and many
thousands of users.
In order to use the NAPSTER search engine, a "word" is required. The term word
is not well defined. It could be (1) the name of a performer, (2) the name of a song, (3) a
meaningless set of letters, or (4) an actual word taken from the dictionary. We decided
not to employ method (3), that is, meaningless letter combinations, on the assumption
that this procedure would not replicate how NAPSTER users typically search. Methods
(1) or (2), moreover, could generate some bias in that they would guide the result
obtained, for example, known songs by known artists. Thus, for example, the name of a
popular performer would be expected to yield an overestimate of usage of that
performer's works. In order to avoid such a bias we chose method (4) and randomly
sampled words from an electronic dictionary.
Twenty-five of these randomly selected words were input into NAPSTER during
each sampling session. To choose a word, a letter was chosen in proportion to its
frequency in the English language. Given a letter, words were then chosen at random
from the Random House Unabridged Electronic Dictionary.
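
The two-stage word selection can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: the letter weights and the word list are small hypothetical placeholders (not the frequency table or dictionary used), the function names are invented, and the sketch allows the same word to be drawn more than once in a session, which the report does not address.

```python
import random

# Hypothetical placeholder data: relative letter weights and a tiny word list.
LETTER_WEIGHTS = {"e": 12.7, "t": 9.1, "a": 8.2, "s": 6.3}
WORDS = ["apple", "anchor", "echo", "ember", "salt", "stone", "table", "tiger"]


def draw_word(rng, weights, words):
    """Pick a letter in proportion to its weight, then a word starting with it."""
    letters = list(weights)
    letter = rng.choices(letters, weights=[weights[c] for c in letters], k=1)[0]
    candidates = [w for w in words if w.startswith(letter)]
    if not candidates:
        raise ValueError(f"no words begin with {letter!r}")
    # Every candidate word for the chosen letter is equally likely.
    return rng.choice(candidates)


def draw_session(rng, weights, words, k=25):
    """Draw the k search words used in one sampling session."""
    return [draw_word(rng, weights, words) for _ in range(k)]


session = draw_session(random.Random(1), LETTER_WEIGHTS, WORDS)
print(len(session))  # 25
```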
Each search provided a list of users and file names. We used the results of these
random searches to identify individual NAPSTER users. Then, the files that each user
was offering through the NAPSTER system at that time were also captured.
The outcome of this procedure led to the collection of approximately 24,000,000
songs and approximately 28,000 users in the four-day period. It was now required to
choose a subsample of 1150 users.
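
One way to draw such a subsample by computer is to shuffle the full list of users into a random order and keep the first 1150. The sketch below is illustrative only: the user identifiers are synthetic placeholders, the function name is hypothetical, and the seed is arbitrary.

```python
import random


def permutation_subsample(items, k, rng):
    """Return the first k items of a uniformly random ordering of items."""
    order = list(items)   # copy so the caller's list is left untouched
    rng.shuffle(order)    # put the copy into a random order
    return order[:k]


# Synthetic stand-ins for the roughly 28,000 users collected.
user_ids = [f"user{i:05d}" for i in range(28_000)]

subsample = permutation_subsample(user_ids, 1150, random.Random(12))
print(len(subsample))  # 1150
```

Because every ordering is equally likely, every user has the same chance of landing among the first 1150 positions, and no user can be selected twice.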
There are a variety of methods to choose a subsample from a larger sample. One
such method is to list the items in the larger sample in all possible orders, thereby giving
equal weight to every ordering of the items. Once this is done we choose a single
ordering at random, and then take the required number of elements from the beginning
of that ordering for the subsample. This
method is called a random permutation procedure, and is readily carried out by computer.
To accomplish this I generated a random permutation from which the first 1150 users
constituted the final subsample. Henceforth, for the purposes of this discussion, the word
sample will actually refer to the subsample, when appropriate.
2. Sampling Protocol for the Download Project
I estimated that downloads from eight separate times would provide a statistically
valid panorama of usage that would be representative of the general availability of files
on the NAPSTER system. Two groups of days were chosen, one from weekdays
(Monday, Tuesday, Wednesday, Thursday), and one from weekends (Friday, Saturday,
Sunday). I then generated 16 time-day combinations from the weekday group, and 12
time-day combinations from the weekend group. Subsequently I randomly selected four
times to be sampled from each group for a total of eight sample times. This allowed for
statistically valid sampling of the actual downloading occurring in NAPSTER's system.
Further, the sampling assured that the time-day choices were different. This
resulted in the following list of eight time-day choices:
01:00 Thursday     02:00 Friday
02:00 Monday       10:00 Sunday
17:00 Monday       15:00 Friday
22:00 Wednesday    17:00 Saturday
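
A two-stage selection of this kind can be sketched as follows. The report does not say how the 16 weekday and 12 weekend candidate combinations were formed, so the sketch assumes, purely for illustration, that candidates are drawn without replacement from all day and whole-hour pairs in each group. The function name and seed are hypothetical, and the output will not match the list above.

```python
import random
from itertools import product

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday"]
WEEKEND = ["Friday", "Saturday", "Sunday"]


def pick_times(days, n_candidates, n_final, rng):
    """Draw candidate (day, hour) slots, then select the final sample times."""
    slots = list(product(days, range(24)))        # every day and hour pair
    candidates = rng.sample(slots, n_candidates)  # assumed candidate stage
    return rng.sample(candidates, n_final)        # distinct final choices


rng = random.Random(8)  # arbitrary seed for repeatability
weekday_times = pick_times(WEEKDAYS, 16, 4, rng)
weekend_times = pick_times(WEEKEND, 12, 4, rng)
schedule = weekday_times + weekend_times

for day, hour in schedule:
    print(f"{hour:02d}:00 {day}")
```

Sampling without replacement at both stages guarantees that the eight time-day choices are all different, as the report requires.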
For each of the eight times that a sample was run, we captured data from
NAPSTER's servers for a five-minute period. As a result of this data collection we
obtained a population of approximately 574,185 files actually downloaded. We then
generated a list of 1150 songs in a manner analogous to that described in the User Project
to obtain our sample.
CONCLUSIONS AND FINDINGS
1. The User Project
For the User Project, the persons analyzing the songs were asked to review the
files that each user had to offer to determine whether they could find two (2) entries that
represent copyrighted songs offered for downloading without permission. We found this
to be the case for all of the 1150 users selected.
2. The Download Project
For the Download Project, I have concluded the following based on my review of
the data I have collected as set forth above:
(a) 1002 files, or 87.1 percent of the files in the sample, are songs that belong to or are
administered by plaintiffs or other copyright holders and were selected for
downloading using the NAPSTER system, without permission.
(b) 37 files, or 3.2 percent of the files in the sample, are songs that are likely
copyrighted and being traded without the authority of the rights holder, based on the
preliminary analysis of RIAA/HFA, but sufficient information was not available to
allow for definitive confirmation in the available time.
(c) 3 files, or less than 0.26 percent of the files in the sample, are songs which are
apparently being made available on NAPSTER without objection from the rights
holders.
(d) 108 files, or 9.4 percent of the files in the sample, did not present enough data for
a conclusion regarding copyright ownership in the available time.
The sampling procedures adhered to, in my opinion, indicate that the sampled
files and users are representative of the NAPSTER population as a whole. Therefore, I am
able to conclude that virtually all NAPSTER users offer copyrighted materials for
distribution on NAPSTER without authority, and that copyright infringement appears to
be a central and dominant part of the NAPSTER system.
I declare that the foregoing represents my opinions and conclusions under penalty
of perjury.
Stanford, California
June 12, 2000
Ingram Olkin, Ph.D.