Evangelos Vazaios
vagvaz at gmail.com
Technical University of Crete
What is Hadoop?
Flat Scalability
MapReduce
List Processing
Hadoop MapReduce
word_max = findWordWithMaxCount(dict)
emit(char,word_max)
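The slide shows only the last two lines of this reducer. Below is a minimal plain-Java sketch of the whole reduce call. It assumes that the values for a key `char` are the words starting with that character, and that `dict` is a word-to-count map built from those words. `MaxWordReducer`, `buildDict` and `reduce` are hypothetical names. No Hadoop classes are used, so the sketch runs standalone:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reduce step: for one starting character, find the most frequent word.
class MaxWordReducer {

    // Build dict: count how often each word appears among the values for this key
    static Map<String, Integer> buildDict(List<String> words) {
        Map<String, Integer> dict = new HashMap<>();
        for (String w : words) {
            dict.merge(w, 1, Integer::sum);
        }
        return dict;
    }

    // Return the word with the highest count (ties broken alphabetically)
    static String findWordWithMaxCount(Map<String, Integer> dict) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    // Mirrors the pseudocode: word_max = findWordWithMaxCount(dict); emit(char, word_max)
    static Map.Entry<Character, String> reduce(char key, List<String> words) {
        String wordMax = findWordWithMaxCount(buildDict(words));
        return new AbstractMap.SimpleEntry<>(key, wordMax);
    }
}
```

For example, reducing key `'h'` with the values `hadoop, hdfs, hadoop, hive` emits the pair `('h', "hadoop")`.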
Writable Example
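import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
class ComplexNum implements Writable {
    private float real;
    private float im;
    // Serialize the fields in a fixed order
    public void write(DataOutput out) throws IOException {
        out.writeFloat(real);
        out.writeFloat(im);
    }
    // Deserialize the fields in the same order they were written
    public void readFields(DataInput in) throws IOException {
        real = in.readFloat();
        im = in.readFloat();
    }
}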
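The sketch below shows what `write` and `readFields` do by serializing a complex number to bytes and reading it back. It relies on a stand-in interface `SimpleWritable` (an assumption) that declares the same two methods as Hadoop's `Writable`, so it runs without Hadoop on the classpath. `ComplexNumSketch`, `toBytes` and `fromBytes` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable (assumption:
// used only so this sketch compiles and runs without Hadoop on the classpath)
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class ComplexNumSketch implements SimpleWritable {
    float real;
    float im;

    ComplexNumSketch() {}  // empty instance, filled in later by readFields
    ComplexNumSketch(float real, float im) { this.real = real; this.im = im; }

    public void write(DataOutput out) throws IOException {
        out.writeFloat(real);  // 4 bytes
        out.writeFloat(im);    // 4 bytes
    }

    public void readFields(DataInput in) throws IOException {
        real = in.readFloat(); // same order as write()
        im = in.readFloat();
    }

    // Serialize any SimpleWritable into a byte array
    static byte[] toBytes(SimpleWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a ComplexNumSketch from bytes produced by toBytes
    static ComplexNumSketch fromBytes(byte[] data) {
        try {
            ComplexNumSketch c = new ComplexNumSketch();
            c.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The fields must be read in the same order they were written, because the byte stream carries no field names (here it holds just 8 bytes). The no-argument constructor matters because Hadoop creates Writable instances by reflection and then calls `readFields` on them.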
Advanced Features
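Combiner: this class runs after the mapper and before the reducer.
A combiner is a mini-reducer that operates only on map output produced
locally on one machine, before that output is sent to the reducers.
Hadoop may run it zero, one, or several times, so it must not change the final result.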
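Below is a minimal sketch of the combiner idea for word count. It is plain Java with hypothetical names (`CombinerSketch`, `map`, `combine`). The `(word, 1)` pairs emitted by one map task are summed locally, so fewer records would have to be shuffled to the reducers. It simulates the data flow only and does not use Hadoop's `Reducer` API:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates one map task's word-count output and a combiner applied to it locally.
class CombinerSketch {

    // Map step: emit (word, 1) for every word in this task's input lines
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        return out;
    }

    // Combine step: sum counts per word, but only over this one task's output
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partial.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return partial;
    }
}
```

With the input lines `"a b a"` and `"b a"`, the map step produces 5 pairs and the combiner shrinks them to 2 (`a=3`, `b=2`). Summing is safe as a combiner because addition is associative and commutative, so the reducer can still add up partial counts coming from many map tasks.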
InputFormat & OutputFormat: describe the input/output specification of the job
InputSplit: represents the data to be processed by an individual Mapper
RecordReader: converts the byte-oriented view of an InputSplit into <key, value> records for the Mapper
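The sketch below mimics what a line-oriented RecordReader does. It turns the raw bytes of a split into `<byte offset, line>` records, which is the same key/value shape that Hadoop's `TextInputFormat` passes to a Mapper. `LineRecordSketch` is a simplified, hypothetical class. It ignores `\r\n` line endings and records that cross split boundaries, both of which a real RecordReader has to handle:

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified line reader: byte-oriented split -> <offset, line> records
class LineRecordSketch {

    static List<Map.Entry<Long, String>> read(byte[] split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= split.length; i++) {
            // A record ends at '\n' or at the end of the split
            if (i == split.length || split[i] == '\n') {
                // Skip only the empty tail after a final newline; keep empty lines inside
                if (i < split.length || i > start) {
                    String line = new String(split, start, i - start, StandardCharsets.UTF_8);
                    records.add(new AbstractMap.SimpleEntry<>((long) start, line));
                }
                start = i + 1;
            }
        }
        return records;
    }
}
```

For the bytes of `"hello world\nfoo\n"` this yields two records: `(0, "hello world")` and `(12, "foo")`.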
Chaining jobs: not every problem can be solved with a single MapReduce
program. Many problems can instead be solved by writing several
MapReduce steps that run in sequence, each taking the previous step's output as its input:
<Stage 1> -> <Stage 2> -> <Stage 3>
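As an illustration of chaining, the earlier character/most-frequent-word example could be split into two stages: a word count, followed by a per-character maximum. The sketch below runs both stages in memory and passes the output of stage 1 directly to stage 2. In real chained jobs, the second job would instead read the first job's output files. All names are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical in-memory chain: each stage consumes the previous stage's output.
class ChainSketch {

    // Stage 1: word count over all input lines
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Stage 2: group words by first character, keep the most frequent word per character
    static Map<Character, String> maxWordPerChar(Map<String, Integer> counts) {
        Map<Character, String> best = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            char c = e.getKey().charAt(0);
            String cur = best.get(c);
            if (cur == null || e.getValue() > counts.get(cur)) {
                best.put(c, e.getKey());
            }
        }
        return best;
    }

    // Run the stages in sequence: stage 2 only sees what stage 1 produced
    static Map<Character, String> run(List<String> lines) {
        Function<List<String>, Map<String, Integer>> stage1 = ChainSketch::wordCount;
        Function<Map<String, Integer>, Map<Character, String>> stage2 = ChainSketch::maxWordPerChar;
        return stage1.andThen(stage2).apply(lines);
    }
}
```

For the lines `"hadoop hdfs hadoop"` and `"map mapper map"`, the chain returns `{h=hadoop, m=map}`.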
MapReduce Data Flow
Zoom In
Reliability
Scalability
Streaming Reads
More on HDFS
Metadata is handled by the NameNode
io.sort.mb: memory size (in MB) used for sorting map outputs
io.sort.factor: number of streams merged at once when sorting
(each merge stream gets io.sort.mb / io.sort.factor of memory)
fs.inmemorysize.mb: size of the reduce-side buffer used for sorting and merging
the outputs of multiple maps before spilling to disk
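The effect of io.sort.factor can be estimated with a simplified model. This model is an assumption and not Hadoop's exact merge policy: each pass merges at most io.sort.factor sorted streams into one, and passes repeat until a single stream remains. The sketch also applies the io.sort.mb / io.sort.factor rule from the slide. Class and method names are hypothetical:

```java
// Simplified model (not Hadoop's exact merge policy): each pass merges up to
// `factor` sorted streams into one, until a single sorted stream remains.
class MergePassSketch {

    static int mergePasses(int spillFiles, int factor) {
        if (factor < 2) throw new IllegalArgumentException("factor must be >= 2");
        int streams = spillFiles;
        int passes = 0;
        while (streams > 1) {
            streams = (streams + factor - 1) / factor; // ceil(streams / factor)
            passes++;
        }
        return passes;
    }

    // Memory per merge stream, following io.sort.mb / io.sort.factor
    static double mbPerStream(int ioSortMb, int ioSortFactor) {
        return (double) ioSortMb / ioSortFactor;
    }
}
```

With 100 spill files and io.sort.factor = 10, the model gives 2 passes (100 -> 10 -> 1). With io.sort.mb = 100, each merge stream gets 10 MB. A larger factor means fewer passes over the on-disk data, but more open files and less memory per stream.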
Useful Tools
Distributed
Basic API:
How to Hadoop...
Independent computations