MapReduce - A MapReduce job usually splits the input data-set into independent chunks which
are processed by the map tasks in a completely parallel manner. The framework sorts the outputs
of the maps, which are then input to the reduce tasks. Typically both the input and the output of
the job are stored in a file-system.
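The split/map/sort/reduce flow described above can be sketched in a few lines of Python. This is a toy, single-process illustration (the hypothetical names `map_phase`, `shuffle_sort`, and `reduce_phase` are not part of any real framework); in an actual MapReduce system the map and reduce tasks run in parallel across a cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(chunks, map_fn):
    # Each input chunk is processed independently; a real framework
    # would run these map tasks in parallel on different machines.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_fn(chunk))
    return pairs

def shuffle_sort(pairs):
    # The framework sorts the map outputs by key before reducing.
    pairs.sort(key=itemgetter(0))
    return groupby(pairs, key=itemgetter(0))

def reduce_phase(grouped, reduce_fn):
    return {key: reduce_fn(key, [v for _, v in group])
            for key, group in grouped}

# The classic word-count example.
def word_map(chunk):
    return [(word, 1) for word in chunk.split()]

def word_reduce(key, values):
    return sum(values)

chunks = ["the quick brown fox", "the lazy dog", "the fox"]
result = reduce_phase(shuffle_sort(map_phase(chunks, word_map)), word_reduce)
print(result["the"])  # 3
```

Note how the framework, not the user code, owns the sort-and-group step between the two phases; the user supplies only the map and reduce functions.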
BigTable - Bigtable is a compressed, high-performance, proprietary data storage system built on the Google File System, the Chubby Lock Service, SSTable (log-structured storage, as in LevelDB), and a few other Google technologies.
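The SSTable component mentioned above is, at its core, an immutable file of sorted key-value pairs looked up by binary search. A toy, in-memory sketch of that idea (the class name `ToySSTable` is hypothetical; real SSTables are on-disk files with block indexes and compression):

```python
import bisect

class ToySSTable:
    """Toy sketch of the SSTable idea: an immutable, sorted
    sequence of key-value pairs with binary-search lookup."""

    def __init__(self, items):
        # Keys are sorted once at build time; the table is then immutable.
        self._items = sorted(items)
        self._keys = [k for k, _ in self._items]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._items[i][1]
        return None  # key absent

table = ToySSTable([("b", 2), ("a", 1), ("c", 3)])
print(table.get("b"))  # 2
```

Immutability is what makes the format log-structured friendly: new writes go to fresh tables, and old tables are merged in the background rather than updated in place.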
Twister - Another programming model; it supports iterative MapReduce for faster data handling.
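Iterative MapReduce means running the same map and reduce steps repeatedly in a driver loop until the result converges, as in k-means clustering. A minimal sketch (plain Python, not the Twister API; `kmeans_map` and `kmeans_reduce` are illustrative names) showing why keeping state across iterations matters:

```python
def kmeans_map(points, centroids):
    # Map: assign each point to the index of its nearest centroid.
    return [(min(range(len(centroids)), key=lambda i: abs(p - centroids[i])), p)
            for p in points]

def kmeans_reduce(pairs, k):
    # Reduce: recompute each centroid as the mean of its assigned points.
    sums, counts = [0.0] * k, [0] * k
    for i, p in pairs:
        sums[i] += p
        counts[i] += 1
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]

points = [1.0, 2.0, 9.0, 10.0, 11.0]
centroids = [0.0, 5.0]
for _ in range(10):  # the driver loop that iterative MapReduce optimizes
    new = kmeans_reduce(kmeans_map(points, centroids), 2)
    if new == centroids:
        break
    centroids = new
print(centroids)  # [1.5, 10.0]
```

A classic MapReduce implementation would re-read the input and restart tasks on every pass of this loop; an iterative runtime keeps the static data and long-running tasks cached between iterations, which is the source of the speedup.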
Dryad - Dryad was a research project at Microsoft Research for a general purpose runtime for
execution of data parallel applications.
Hadoop - Apache Hadoop is an open-source software framework for distributed storage and processing of very large data sets using the MapReduce programming model. It runs on computer clusters built from commodity hardware. All modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework.
Program Library
Many efforts have been made to design a VM image library to manage images used in academic
and commercial clouds. The basic cloud environments described in this chapter also include
many management features that allow convenient deployment and configuration of images (i.e.,
they support IaaS).
DPFS
This covers the support of file systems such as Google File System (MapReduce), HDFS
(Hadoop), and Cosmos (Dryad) with compute-data affinity optimized for data processing. It
would be possible to link DPFS to a basic blob- and drive-based architecture, but it is simpler to use DPFS as an application-centric storage model with compute-data affinity, and blobs and drives as the repository-centric view.
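Compute-data affinity means the scheduler prefers to place a task on a node that already holds the task's data block, moving computation to the data instead of the reverse. A minimal scheduling sketch (the function `schedule` and the node/block names are hypothetical, not any framework's API):

```python
def schedule(task_block, block_locations, free_workers):
    """Prefer a free worker that holds the block locally
    (compute-data affinity); otherwise fall back to any free worker."""
    local = [w for w in block_locations.get(task_block, []) if w in free_workers]
    if local:
        return local[0]          # data-local placement: no network transfer
    return next(iter(free_workers))  # remote placement: block must be fetched

# Replica map as a DPFS-style namenode might report it.
block_locations = {"block-1": ["nodeA", "nodeB"], "block-2": ["nodeC"]}
free_workers = {"nodeB", "nodeC"}
print(schedule("block-1", block_locations, free_workers))  # nodeB
```

This is the optimization HDFS and GFS enable for MapReduce-style jobs: reading a local disk is far cheaper than shipping a multi-gigabyte block across the network.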