Copyright 2010-2014 Cloudera. All rights reserved. Not to be reproduced without prior written consent.
Writing a MapReduce Program in Java
Chapter 3.1
Chapter Topics
Review: The MapReduce Flow
A Sample MapReduce Program: WordCount
[Figure: sample WordCount output; surviving entries include "sofa 1" and "the 4"]
Our MapReduce Program: WordCount
Getting Data to the Mapper
Example: TextInputFormat
TextInputFormat
  The default
  Creates LineRecordReader objects
  Treats each \n-terminated line of a file as a value
  Key is the byte offset of that line within the file
Input file:
  the cat sat on the mat\n
  the aardvark sat on the sofa\n
LineRecordReader output:
  key   value
  0     the cat sat on the mat
  23    the aardvark sat on the sofa
  52    ...
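The keys 0, 23 and 52 in the TextInputFormat example can be checked without Hadoop. The following is a minimal sketch in plain Java, assuming single-byte (ASCII) text and \n line endings. The names LineOffsets and records are illustrative and are not part of the Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the pairs a LineRecordReader produces; not Hadoop code.
class LineOffsets {

    // Maps the byte offset at which each line starts to the text of that line.
    static Map<Long, String> records(String fileContents) {
        Map<Long, String> result = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            result.put(offset, line);
            // Advance past the bytes of the line plus its terminating newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "the cat sat on the mat\nthe aardvark sat on the sofa\n";
        records(input).forEach((key, value) -> System.out.println(key + "\t" + value));
    }
}
```

The first line is 22 bytes plus one newline, so the second line starts at offset 23; the second line is 28 bytes plus one newline, so a third line would start at 52.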
Other Standard InputFormats
FileInputFormat
  Abstract base class used for all file-based InputFormats
KeyValueTextInputFormat
  Maps \n-terminated lines as 'key [separator] value'
  By default, [separator] is a tab
SequenceFileInputFormat
  Binary file of (key, value) pairs with some additional metadata
SequenceFileAsTextInputFormat
  Similar, but maps (key.toString(), value.toString())
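The 'key [separator] value' rule used by KeyValueTextInputFormat can be sketched in plain Java. This is an illustration only: the class and method names are hypothetical, and the handling of a line with no separator (whole line as key, empty value) is stated here as an assumption about Hadoop's behaviour rather than taken from the slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative sketch of splitting a line into a key and a value; not Hadoop code.
class KeyValueSplit {

    // Splits at the FIRST separator only; later separators stay in the value.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // Assumption: with no separator, the whole line is the key.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair = split("aardvark\tan animal", '\t');
        System.out.println("key=" + pair.getKey() + " value=" + pair.getValue());
    }
}
```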
Keys and Values are Objects
What is Writable?
The Writable interface makes serialization quick and easy for Hadoop
Any value's type must implement the Writable interface
Hadoop defines its own 'box classes' for strings, integers, and so on
  IntWritable for ints
  LongWritable for longs
  FloatWritable for floats
  DoubleWritable for doubles
  Text for strings
  Etc.
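To make the idea of a box class concrete, here is a minimal sketch in plain Java. SimpleWritable is a local stand-in that declares the same two methods as Hadoop's Writable interface (write and readFields); IntBox and roundTrip are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    // Local stand-in mirroring the two methods of Hadoop's Writable interface.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // A box class for an int, in the spirit of IntWritable.
    static class IntBox implements SimpleWritable {
        private int value;

        IntBox() { }
        IntBox(int value) { this.value = value; }

        int get() { return value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);          // serialize the wrapped int
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();         // overwrite this object with the stored int
        }
    }

    // Serializes a boxed int to bytes and reads it back into a fresh box.
    static int roundTrip(int original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new IntBox(original).write(new DataOutputStream(bytes));
            IntBox copy = new IntBox();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```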
What is WritableComparable?
The Driver Code: Introduction
The Driver: Complete Code

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {

  public static void main(String[] args) throws Exception {

    if (args.length != 2) {
      System.out.printf("Usage: WordCount <input dir> <output dir>\n");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(WordCount.class);
    job.setJobName("Word Count");   // any descriptive name

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}
The Driver: Import Statements

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

You will typically import these classes into every MapReduce job you write. We will omit the import statements in future slides for brevity.
The Driver: Main Code
The Driver Class: main Method
Sanity Checking The Job's Invocation
Configuring The Job With the Job Object

To configure the job, create a new Job object. Identify the Jar which ...

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
Creating a New Job Object
The Job class allows you to set configuration options for your MapReduce job
  The classes to be used for your Mapper and Reducer
  The input and output directories
  Many other options
Any options not explicitly set in your driver code will be read from your Hadoop configuration files
  Usually located in /etc/hadoop/conf
Any options not specified in your configuration files will use Hadoop's default values
You can also use the Job object to submit the job, control its execution, and query its state
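The lookup order described above (driver code first, then the configuration files, then Hadoop's defaults) can be sketched in plain Java. This is only a model of the precedence rule, not Hadoop's Configuration class; the class name, method name and the option name "example.option" are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative model of option precedence; not Hadoop code.
class ConfigPrecedence {

    // Returns the value from the most specific source that defines the option.
    static String resolve(String name,
                          Map<String, String> driver,
                          Map<String, String> configFiles,
                          Map<String, String> defaults) {
        if (driver.containsKey(name)) {
            return driver.get(name);        // set explicitly in driver code
        }
        if (configFiles.containsKey(name)) {
            return configFiles.get(name);   // read from the configuration files
        }
        return defaults.get(name);          // fall back to the built-in default
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> files = Collections.singletonMap("example.option", "from-file");
        Map<String, String> defaults = Collections.singletonMap("example.option", "default");
        System.out.println(resolve("example.option", none, files, defaults));
    }
}
```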
Configuring the Job: Setting the Name
Configuring the Job: Specifying Input and Output Directories
Next, specify the input directory from which data will be read, and the output directory to which final output will be written.
Configuring the Job: Specifying the InputFormat

job.setInputFormatClass(KeyValueTextInputFormat.class);

Configuring the Job: Determining Which Files To Read
Configuring the Job: Specifying Final Output With OutputFormat
Configuring the Job: Specifying the Mapper and Reducer Classes

job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);

Give the Job object information about which classes are to be instantiated as the Mapper and Reducer.
Default Mapper and Reducer Classes
IdentityMapper
  [Figure: map() passes the pair "mahout / an elephant driver" through unchanged]
IdentityReducer
Configuring the Job: Specifying the Intermediate Data Types

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

Specify the types for the intermediate output keys and values.
Configuring the Job: Specifying the Final Output Data Types

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

Specify the types for the Reducer's output keys and values.
Running The Job (1)

boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);

Start the job and wait for it to complete. The parameter is a Boolean specifying verbosity: if true, display progress to the user. Finally, exit with a return code.
Running The Job (2)
Reprise: Driver Code
WordCount Mapper Review
[Figure: input data (HDFS file) -> Record Reader -> (key, value) -> Mapper map()]
Input data (HDFS file):
  the cat sat on the mat
  the aardvark sat on the sofa
Mapper output for the first line:
  the 1
  cat 1
  sat 1
  on 1
  the 1
  mat 1
The Mapper: Complete Code

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

The Mapper: import Statements
We will omit the import statements in future slides for brevity.
The Mapper: Main Code
The Mapper: Class Declaration (1)
The Mapper: Class Declaration (2)
The map Method
The map Method: Processing The Line (1)
The map Method: Processing The Line (2)
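The Mapper splits each line with the regular expression \W+ and skips zero-length words. The sketch below, in plain Java with hypothetical names (WordSplit, words), shows why that length check matters: when a line starts with a non-word character, String.split returns an empty first element.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the line-splitting step; not Hadoop code.
class WordSplit {

    // Splits on runs of non-word characters and drops empty tokens.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {   // a leading separator yields "" as the first token
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("(the cat) sat!"));
    }
}
```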
The map Method: Outputting Intermediate Data
Reprise: The map Method
WordCount Review: SumReducer
[Figure: each reduce() call in SumReducer receives a key and its value list and emits one output pair]
  key    value list     output
  on     1,1            on 2
  sat    1,1            sat 2
  sofa   1              sofa 1
  the    1,1,1,1,1,1    the 6
The Reducer: Complete Code

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;

    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}
The Reducer: Import Statements

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

We will omit the import statements in future slides for brevity.
The Reducer: Main Code
The Reducer: Class Declaration

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

Reducer classes extend the Reducer base class. The four generic type parameters are: input (intermediate) key and value types, and final output key and value types.
The reduce Method

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {

The reduce method receives a key and an Iterable collection of objects (which are the values emitted from the Mappers for that key); it also receives a Context object.
The reduce Method: Processing The Values

for (IntWritable value : values) {
  wordCount += value.get();
}

We use the Java for-each syntax to step through all the elements in the collection.
The reduce Method: Writing The Final Output

context.write(key, new IntWritable(wordCount));

Finally, we write the output key-value pair to HDFS using the write method of our Context object.
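The whole WordCount flow (map, then grouping by key, then reduce) can be imitated in memory to see what the Mapper and Reducer compute together. This is a sketch in plain Java under the assumption that a sorted map is an adequate stand-in for Hadoop's shuffle and sort; it involves no Hadoop classes, and the names LocalWordCount and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the WordCount data flow; not Hadoop code.
class LocalWordCount {

    static Map<String, Integer> run(List<String> lines) {
        // Map step plus shuffle: emit (word, 1) and group the 1s under each word.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                if (word.length() > 0) {
                    grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }
        // Reduce step: sum the value list for each key, as SumReducer does.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int wordCount = 0;
            for (int value : entry.getValue()) {
                wordCount += value;
            }
            counts.put(entry.getKey(), wordCount);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(
                "the cat sat on the mat",
                "the aardvark sat on the sofa")));
    }
}
```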
Reprise: The Reducer Main Code
Ensure Types Match (1)
Ensure Types Match (2)
Integrated Development Environments
What Is The Old API?
New API vs. Old API: Some Key Differences (1)

New API Mapper:

public class MyMapper extends Mapper {
  public void map(Keytype k, Valuetype v, Context c) {
    ...
    c.write(key, val);
  }
}

Old API Mapper:

public class MyMapper extends MapReduceBase implements Mapper {
  public void map(Keytype k, Valuetype v, OutputCollector o, Reporter r) {
    ...
    o.collect(key, val);
  }
}
New API vs. Old API: Some Key Differences (2)

New API Reducer:

public class MyReducer extends Reducer {
  public void reduce(Keytype k, Iterable<Valuetype> v, Context c) {
    for (Valuetype eachval : v) {
      // process eachval
      c.write(key, val);
    }
  }
}

Old API Reducer:

public class MyReducer extends MapReduceBase implements Reducer {
  public void reduce(Keytype k, Iterator<Valuetype> v, OutputCollector o, Reporter r) {
    while (v.hasNext()) {
      // process v.next()
      o.collect(key, val);
    }
  }
}
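One visible difference between the two Reducer signatures is how the values arrive: as an Iterable in the New API and as an Iterator in the Old API. The plain-Java sketch below, with hypothetical names, shows the two loop styles side by side on ordinary integers. It uses no Hadoop classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative comparison of the two iteration styles; not Hadoop code.
class IterationStyles {

    // New-API style: an Iterable can be walked with a for-each loop.
    static int sumIterable(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Old-API style: an Iterator is consumed with hasNext() and next().
    static int sumIterator(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(sumIterable(ones) + " " + sumIterator(ones.iterator()));
    }
}
```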
MRv1 vs MRv2, Old API vs New API
There is a lot of confusion about the New and Old APIs, and MapReduce version 1 and MapReduce version 2
The chart below should clarify what is available with each version of MapReduce
[Chart: Old API and New API availability under MapReduce v1 and MapReduce v2]
Summary: Code using either the Old API or the New API will run under MRv1 and MRv2
Key Points
InputFormat
  Parses input files into key/value pairs
WritableComparable, Writable classes
  'Box' or 'wrapper' classes to pass keys and values
Driver
  Sets InputFormat and input and output types
  Specifies classes for the Mapper and Reducer
Mapper
  map() method takes a key/value pair
  Call Context.write() to output intermediate key/value pairs
Reducer
  reduce() method takes a key and iterable list of values
  Call Context.write() to output final key/value pairs
Writing a MapReduce Program Using Streaming
Chapter 3.2
The Streaming API: Motivation
The Streaming API: Advantages and Disadvantages
How Streaming Works
Streaming: Example Mapper

#!/usr/bin/env perl
while (<>) {                    # Read lines from stdin
  chomp;                        # Get rid of the trailing newline
  (@words) = split /\W+/;       # Create an array of words
  foreach $w (@words) {         # Loop through the array
    next if $w eq "";           # Skip the empty field produced by a leading separator
    print "$w\t1\n";            # Print out the key and value
  }
}
Streaming Reducers: Caution
Recall that in Java, all the values associated with a key are passed to the Reducer as an Iterable
Using Hadoop Streaming, the Reducer receives its input as one key/value pair per line
Your code will have to keep track of the key so that it can detect when values from a new key start
Streaming: Example Reducer

#!/usr/bin/env perl
$sum = 0;
$last = "";
while (<>) {                        # read lines from stdin
  chomp;                            # remove the trailing newline from the value
  ($key, $value) = split /\t/;      # obtain the key and value
  $last = $key if $last eq "";      # first time through
  if ($last ne $key) {              # has the key changed?
    print "$last\t$sum\n";          # if so, output last key/value
    $last = $key;                   # start with the new key
    $sum = 0;                       # reset sum for the new key
  }
  $sum += $value;                   # add value to tally sum for key
}
print "$last\t$sum\n" if $last ne "";   # print the final pair, unless there was no input
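The key-tracking logic of a Streaming Reducer is the same in any language. As a comparison with the Java Reducer shown earlier, here is a sketch of that logic in plain Java. It assumes the input lines are already sorted by key and that every line contains a tab; the class and method names are hypothetical. A real Streaming Reducer would read the lines from stdin and print the results to stdout rather than use lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of streaming-style reduction; not Hadoop code.
class StreamingStyleReducer {

    // Sums the values of consecutive lines that share a key ("key<TAB>value").
    static List<String> reduce(List<String> sortedLines) {
        List<String> output = new ArrayList<>();
        String lastKey = null;
        long sum = 0;
        for (String line : sortedLines) {
            String[] parts = line.split("\t", 2);
            String key = parts[0];
            long value = Long.parseLong(parts[1].trim());
            if (lastKey != null && !lastKey.equals(key)) {
                output.add(lastKey + "\t" + sum);   // key changed: emit the finished total
                sum = 0;                            // reset the tally for the new key
            }
            lastKey = key;
            sum += value;
        }
        if (lastKey != null) {
            output.add(lastKey + "\t" + sum);       // emit the final key
        }
        return output;
    }

    public static void main(String[] args) {
        reduce(Arrays.asList("on\t1", "on\t1", "sat\t1")).forEach(System.out::println);
    }
}
```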
Launching a Streaming Job
Key Points
The Hadoop Streaming API allows you to write Mappers and Reducers in any language
  Data is read from stdin and written to stdout
Other Hadoop components (InputFormats, Partitioners, etc.) still require Java
Unit Testing MapReduce Programs
Chapter 3.3
Chapter Topics
  Unit Testing
  The JUnit and MRUnit Testing Frameworks
  Writing Unit Tests with MRUnit
  Running Unit Tests
An Introduction to Unit Testing
Why Write Unit Tests?
Why MRUnit?
JUnit Basics (1)
@Test
  Java annotation
  Indicates that this method is a test which JUnit should execute
@Before
  Java annotation
  Tells JUnit to call this method before every @Test method
  Two @Test methods would result in the @Before method being called twice
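The statement that two @Test methods cause the @Before method to run twice can be demonstrated without the JUnit library. The sketch below defines its own Before and Test annotations and a tiny reflective runner. These are hypothetical stand-ins written for this illustration, not JUnit's classes, and the real framework does considerably more.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Toy imitation of the @Before/@Test lifecycle; not JUnit code.
class LifecycleDemo {

    @Retention(RetentionPolicy.RUNTIME) @interface Before { }
    @Retention(RetentionPolicy.RUNTIME) @interface Test { }

    static class SampleTests {
        static int setUpCalls = 0;

        @Before public void setUp() { setUpCalls++; }
        @Test public void testOne() { }
        @Test public void testTwo() { }
    }

    // Runs every @Test method on a fresh instance, calling @Before methods first.
    static int runAll(Class<?> testClass) {
        int testsRun = 0;
        try {
            for (Method test : testClass.getDeclaredMethods()) {
                if (!test.isAnnotationPresent(Test.class)) {
                    continue;
                }
                Object instance = testClass.getDeclaredConstructor().newInstance();
                for (Method candidate : testClass.getDeclaredMethods()) {
                    if (candidate.isAnnotationPresent(Before.class)) {
                        candidate.invoke(instance);
                    }
                }
                test.invoke(instance);
                testsRun++;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return testsRun;
    }

    public static void main(String[] args) {
        int testsRun = runAll(SampleTests.class);
        System.out.println(testsRun + " tests, setUp called " + SampleTests.setUpCalls + " times");
    }
}
```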
JUnit Basics (2)
JUnit: Example Code

import org.junit.Before;
import org.junit.Test;

Using MRUnit to Test MapReduce Code
MRUnit: Example Code - Mapper Unit Test (1)

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TestWordCount {   // class name assumed; the declaration is missing from the source

  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    WordMapper mapper = new WordMapper();
    mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
    mapDriver.setMapper(mapper);
  }

  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(1), new Text("cat dog"));
    mapDriver.withOutput(new Text("cat"), new IntWritable(1));
    mapDriver.withOutput(new Text("dog"), new IntWritable(1));
    mapDriver.runTest();
  }
}
MRUnit: Example Code - Mapper Unit Test (2)

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

Import the relevant JUnit classes and the MRUnit MapDriver class, as we will be writing a unit test for our Mapper.
MRUnit: Example Code - Mapper Unit Test (3)

MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

MapDriver is an MRUnit class (not a user-defined driver).
MRUnit: Example Code - Mapper Unit Test (4)

@Before
public void setUp() {
  WordMapper mapper = new WordMapper();
  mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();
  mapDriver.setMapper(mapper);
}

Set up the test. This method will be called before every test, just as with JUnit.
MRUnit: Example Code - Mapper Unit Test (5)

@Test
public void testMapper() {
  mapDriver.withInput(new LongWritable(1), new Text("cat dog"));
  mapDriver.withOutput(new Text("cat"), new IntWritable(1));
  mapDriver.withOutput(new Text("dog"), new IntWritable(1));
  mapDriver.runTest();
}

The test itself. Note that the order in which the output is specified is important: it must match the order in which the output will be created by the Mapper.
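The point about output order can be illustrated without MRUnit. The sketch below is plain Java with hypothetical names: mapLine stands in for the Mapper under test, and matches compares expected and actual output pair by pair. It models only the order-sensitive comparison described above and is not how MRUnit is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of an order-sensitive output check; not MRUnit code.
class OrderedOutputCheck {

    // Stand-in for the Mapper under test: emits "word<TAB>1" per word, in input order.
    static List<String> mapLine(String line) {
        List<String> output = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                output.add(word + "\t1");
            }
        }
        return output;
    }

    // List equality is positional, so the same pairs in another order do not match.
    static boolean matches(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        List<String> actual = mapLine("cat dog");
        System.out.println(matches(Arrays.asList("cat\t1", "dog\t1"), actual));
        System.out.println(matches(Arrays.asList("dog\t1", "cat\t1"), actual));
    }
}
```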
MRUnit Drivers (1)
MRUnit Drivers (2)
Chapter Topics
  Unit Testing
  The JUnit and MRUnit Testing Frameworks
  Writing Unit Tests with MRUnit
  Running Unit Tests
  Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
Running Unit Tests From Eclipse
Compiling and Running Unit Tests From the Command Line
OK (3 tests)
Key Points