I. INTRODUCTION
Big data is currently one of the most interesting trends [1].
Big data is characterized by high volume, velocity and variety
(the 3Vs). Velocity contributes greatly to the growth in volume.
Big data processing basically aims to produce value in the form
of knowledge or patterns (knowledge discovery) that can be used
to support decision making. Processing high-velocity data
requires real-time processing, which in turn requires expensive
methods and technologies. If velocity can be reduced to a
certain level, the result is big data with reduced volume that
can be processed in batches, which requires cheaper methods and
technologies. The challenge is how to reduce velocity without
significantly reducing the meaning of the data. In addition, the
reduced data are expected to be closer to the final knowledge,
so that further analysis can be run with more conventional
methods and technologies, over a longer time span. The velocity
of a data stream can be reduced by sampling, filtering and
clustering techniques. Clustering, however, has its own
challenges related to the constraints of data stream mining and
is therefore discussed separately in this paper.

This paper is a preliminary study in our research on the
reduction of big data streams. It presents a literature study of
big data, covering its definition and characteristics, the
analytical methods used, and the technologies in use today. It
also discusses methods of data stream analysis, especially data
stream clustering. The discussion concludes with a review of the
current state of big data research, the challenges ahead, and
the research carried out by the author over the last 3 years.

Figure 1 The Wisdom Hierarchy [3]

So, what is expected of big data is information assets that are
processed into knowledge that can generate value for the
organization.

B. Characteristics

According to [4], big data is characterized by the 3Vs: high
volume, velocity and variety. In general, what counts as high
volume in the context of big data follows Moore's law [5].
However, the current characteristics of big data are depicted in
Figure 2.

Figure 2 Big Data Characteristics

Big data has high volume, ranging from terabytes to zettabytes.
Consequently, it requires data storage and processing capacity
that cannot be handled by conventional methods and technologies.
Current applied methods and