Beruflich Dokumente
Kultur Dokumente
4.2.2 Velocity
Velocitty is classicallyy seen as a Bigg Data characteeristic since
system s dealing with high volumes oof data are assoociated with
high veelocity of data processing [5, 20]. Velocity iss also often
understtood to refer to high velocity off data acquisitionn but not all
data haas velocity in thhis sense. We uunderstand veloccity as a Big
Data chharacteristic in tthe accepted sennse of data that is generated
and/or processed at hhigh velocity annd we understannd the data
charactteristic of velociity in Big Data aas support for daata which is
generatted and processeed at high velocity.
4.2.3 Variety
Varietyy is sometimes used as a shorrthand to indicaate that Big
Data suupports unstruc tured and semi-structured dataa as against
(relatioonal) structured ddata [4]. As disccussed in 3.2, wee argue that
Big Daata may includde structured annd relational datta and that
differennt data formats may exist withhin the same Biig Data set.
Figure 1: The components
c of a Big Data Defin
nition We unnderstand the daata characteristicc of variety in B Big Data as
AAs discussed in n section 2, Biig Data cannot be defined lik ke supportt for variety in ddata formats.
relational data with
w reference to o a single unify ying underpinnin ng
thheory. The da ata characteristtics element of o the definitio on 4.3 A
Architectures and Proccessing
ddescribes the datta component off Big Data. Data a Characteristiccs, Definittions of Big Datta which includee architectural eelements are
aas discussed in sections 3.2 and 3.4 do not by th hemselves provid de typicallly at a lower level of abstraaction than attrribute based
a sufficient definnition of Big Datta since data attributes may not beb definitiions and may sppecify physical eelements such aas horizontal
uunique to Big DataD and underrstanding of datta attributes maay scalingg and parallel prrocessing [31] annd even specificc algorithms
eevolve over timee. Technical ad dvances mean th hat ‘Volume’, fo or such as Hadoop MapReduce. Specifying a particular
eexample, is a con nstantly evolvingg concept. We argue
a that the data implemmentation strateggy becomes a snaapshot of the staate of the art
ccharacteristics off Big Data cannot be understood d separately fromm in Big Data rather thaan a definition. T The MapReducee algorithm,
thhe Big Data env vironment which h includes the ap pplication contexxt for ex ample, is stronngly associatedd with Big Daata but has
aand for this reaason include Arrchitectures and d processing an nd acknow wledged limitatiions [34] and iss likely to be aamended or
A B Data as parrt of our definittion. This refleccts
Applications of Big replaceed over time; varriants of Hadoopp MapReduce suuch as Spark
thhe fact that the elements
e of Big Data are interdeependent and neeed have beeen discussed foor a number of yyears [35]. As aarchitectures
too be considered d together rather than in isolationn. The volume of o and proocessing evolvee to meet new challenges, impplementation
BBig Data, for ex xample, determiines the processsing requiremen nts based ddefinitions becom me out of date. A high level off abstraction
bbut the processinng determines ho ow data is supplied to application ns is needded in this elemennt.
wwhich in turn drive the demand forf data volume. We adoopt a data driveen approach to sspecifying the ‘‘how’ of the
WWe next describee each of the com mponents of thee definition of Biig Big Daata environmentt. This recognisees that Big Datta cannot be
DData in terms of data characterisstics, architecturees and processin
ng understtood in isolatiion from the Big Data impplementation
aand the applicatiions of Big Dataa and then provide a definition ofo environnment and view ws the architectuural and processing element
BBig Data as drivven by the data requirements raather than as chharacteristics
which ddefine Big Data. In this approacch to definition, Hadoop, for
44.2 Data
a Characteriistics of Big Data examplle, is a techhnology requireed to supportt the data
FFor the data cha aracteristics elemment of the defiinition, the ‘whaat’ charactteristics of Big DData rather than a technology whhich defines
oof Big Data, we adopt the tradiitional 3 ‘V’s ap pproach (Volum me, Big Daata. This approacch means that thee understanding of Big Data
VVariety, Velocity y) and give a deefinition of thesee elements beloww. is not ttied to a particullar implementatiion technology, recognising
WWe do not includ de any of the pro oposed extension ns to the 3 ‘V’s in that tecchnologies tend tto converge.
thhe data characteeristic element ass we regard charracteristics such as We undderstand Big Daata architecturess and Big Data pprocessing as
vvalue, for exammple, as data qu uality characterristics rather thaan scalablee architectures which support tthe processing rrequirements
eelements uniquelly descriptive off Big Data. of dataa which has highh volume and wwhich may havee a variety of
data foormats and may include high vvelocity data acqquisition and
44.2.1 Volumee processsing.
VVolume is widelly accepted as a key characterristic of Big Datta.
HHowever, it is also
a accepted th hat volume cann not be defined in 4.4 A
Applications of Big Datta
teerms of a fixed quantity of dataa [20] and for th
his reason, volum
me The appplications of Biig Data, form thhe ‘why’ of Biig Data, the
iss usually defiined in the liteerature in comp parative terms, as third ellement in our ddefinition of Bigg Data. Most deefinitions of
vvolumes too laarge to be han ndled by tradittional processin ng Big D Data focus on data characteeristics only oor on data
ccapabilities [16] or in terms ofo the architectu ure or processin ng charactteristics and aarchitectures annd processing although a
required [31]. Ass discussed in 4.1,
4 the data charracteristics of Big
B discusssion of Big Daata analytics foorms part of a number of
DData cannot bee understood separately from m the Big Daata definitiions of Big Datta [7, 20]. We iinclude applicattions of Big
eenvironment and d application con
ntext. We undersstand ‘Volume’ as Data in our definiition because the data chaaracteristics,
architecctures, processses and uses of Big Datta interact.
Architecture and processing is driven by the way in which Big Government (CSIBIG) 1-5 http://doi.org/
Data is used as well as by the characteristics of Big Data. Big 10.1109/CSIBIG.2014.7056930
Data applications are not limited to analytics and include a range [4] Khan,M,Uddin, M. & Gupta N. (2014) Seven V’s of Big
of use cases such as Big Data OLTP [32] and applications which Data; Understanding Big Data to extract value. Proceedings
are better described as reporting than as an analysis applications. of the 2014 Zone 1 Conference of the American Society for
The use cases of Big Data are best seen as a spectrum ranging Engineering Education “Engineering Education: Industry
from, at the reporting end, Big Data OLTP [32] to analytics Involvement and Interdisciplinary Trends” ASEE Zone 1
which are closer to artifical intelligence at the other end of the 2014 http://doi.org/10.1109/ASEEZone1.2014.6820689
spectrum [33]. For this reason we describe the use cases of Big
Data as the Applications of Big Data rather than as Big Data [5] Bedi, P., Jindal, V., & Gautam, A. (2014) Beginning with big
Analytics. data simplified. 2014 International Conference on Data
Mining and Intelligence Computing (ICDMIC) 1-7,
We understand Big Data applications as a family of applications http://doi.org/10.1109/ICDMIC.2014.6954229
which operates upon Big Data and which includes, but is not
limited to, Big Data analytics. [6] Demchenko, Y., Grosso, P., De Laat, C., & Membrey, P.
(2013). Addressing big data issues in Scientific Data
4.5 DEFINING BIG DATA Infrastructure. In 2013 International Conference on
Based on the existing definitions of Big Data reviewed in this Collaboration Technologies and Systems (CTS) (pp. 48–55).
paper, we understand Big Data as: http://doi.org/10.1109/CTS.2013.6567203
The term Big Data describes a data environment in which scalable [7] Demchenko, Y., Gruengard, E. & Klous, S., 2014.
architectures support the requirements of analytical and other Instructional Model for Building Effective Big Data
applications which process, with high velocity, high volume data Curricula for Online and Campus Education. 2014 IEEE 6th
which may have a variety of data formats and which may include International Conference on Cloud Computing Technology
high velocity data acquisition. and Science, pp.935–941. Available at:
http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnum
5. CONCLUSIONS AND FUTURE WORK ber=7037787
We identified as one of the limitations of some existing Big Data [8] Marr, B., 2014. Big Data: The 5 Vs Everyone Must Know.
definitions that the focus on data characteristics meant that the LinkedIn Pulse.
importance of the Big Data environment was not recognised. We https://www.linkedin.com/pulse/20140306073407-
propose an understanding of Big Data which recognises the role 64875646-big-data-the-5-vs-everyone-must-know
of the Big Data environment but provides a high level view which [9] Press, G., 2014. 12 Big Data Definitions: What’s Yours?
is not dependent upon current implementation technologies. We Forbes. http://www.forbes.com/sites/gilpress/2014/09/03/12-
include in our discussion the applications of Big Data to big-data-definitions-whats-yours/#611ecbcc21a9
emphasise the importance of the use cases of Big Data but do not
[10] Hu, H., Wen, Y., Chua, T.-S., & Li, X. (2014). Toward
link this to any specific implementation technology or use case,
Scalable Systems for Big Data Analytics: A Technology
recognising that the uses of Big Data may change over time. We
Tutorial. IEEE Access, 2, 652–687.
base our definition on the traditional 3 Vs approach but extend
http://doi.org/10.1109/ACCESS.2014.2332453
this approach to recognise that while Big Data has the potential to
support velocity and variety, not all Big Data applications will [11] Durham,E.E., Rosen, A. & Harrison R.W. (2014) . A model
require the characteristics of velocity and variety. This is architecture for Big Data applications using relational
particularly significant given that structured data is increasingly databases 2014 International Conference on Big Data 9-16.
being used in Big Data applications. We exclude from our http://doi.org/10.1109/BigData.2014.700
definition data characteristics which would form part of a Data [12] Navathe, S.B., (1992). Evolution of Data Modeling for
Quality Dimension and this recognises the distinction between the Databases. Communications ACM 35, 112–123.
essential characteristics of Big Data and characteristics which are doi:10.1145/130994.131001
targets or goals or markers for data quality. Future work, building
on the understanding of Big Data discussed in this paper, is to [13] Codd, E.F., 1970. A Relational Model of Data for Large
develop Data Quality Dimensions for Big Data to support our Shared Data Banks. Communications. ACM, 13(6), 377–387
work on Big Data quality. [14] Angles, R., & Gutierrez, C. (2008). Survey of Graph
Database Models. ACM Comput. Surv., 40(1), 1:1–1:39.
6. REFERENCES http://doi.org/10.1145/1322432.1322433
[1] Wand, Y & Wang R.Y. (1996) Anchoring Data Quality
[15] Gartner Research http://www.gartner.com/it-glossary/big-
Dimensions in Ontological Foundations Communications of
data/
the ACM 39, 86-95 http://doi.org/10.1145/240455.240479
[16] Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R.,
[2] Gupta, P., Tyagi, N., 2015. An approach towards big data; A
Roxburgh, C., Byers, A.H., (2011) Big data: The next
review, 2015 International Conference on Computing,
frontier for innovation, competition, and productivity.
Communication Automation (ICCCA). Presented at the 2015
McKinsey Global Institute
International Conference on Computing, Communication
http://www.mckinsey.com/business-functions/business-
Automation (ICCCA), 118–123.
technology/our-insights/big-data-the-next-frontier-for-
http://doi.org/10.1109/CCAA.2015.7148356
innovation
[3] Suresh, J. (2014) Bird’s Eye View on Big Data Management
[17] Jacobs, A. (2009). The Pathologies of Big Data. Queue, 7(6),
2014 Conference on IT in Business, Industry and
10. http://doi.org/10.1145/1563821.1563874Jacobs (2009)
[18] Gantz, B. J., & Reinsel, D. (2011). Extracting Value from Science Journal, 14(0), 2. http://doi.org/10.5334/dsj-2015-
Chaos State of the Universe : An Executive Summary. IDC 002
iView, (June), 1–12. Retrieved from [29] Strong, D.M., Lee, Y.W., Wang, R.Y., (1997). Data Quality
http://idcdocserv.com/1142Gantz & Reinsel, 2011 in Context. Communications ACM 40, 103–110.
[19] Chen,M., Mao,S. & Liu, Y.(2014) Big Data: A survey doi:10.1145/253769.253804
Mobile Networks and Applications,19,171-209. [30] Cooper,M., Mell, P. (2012) Tackling Big Data NIST
http://doi.org/10.1007/s11036-013-0489-0 Computer Security Resource Centre
[20] Gandomi, A. & Haider, M., 2015. Beyond the hype: Big data (fcsm_june2012_cooper_mell).
concepts, methods, and analytics. International Journal of [31] NIST Big Data Public Working Group, & Subgroup, T.
Information Management, 35(2), pp.137–144. Available at: (2015). NIST Special Publication XXX-XXX DRAFT NIST
http://www.sciencedirect.com/science/article/pii/S026840121 Big Data Interoperability Framework : Volume 1 ,
4001066. Definitions DRAFT NIST Big Data Interoperability
[21] IBM Big Data & Analytics Hub http://www.ibmbigdatahub Framework : Volume 1 , Definitions, 1.
[22] Saha, B., & Srivastava, D. (2014). Data quality: The other [32] Cetintemel, U. et al., 2014. S-Store: a streaming NewSQL
face of Big Data. Proceedings - International Conference on system for big velocity applications. Proceedings of the
Data Engineering, 1294–1297. VLDB Endowment, 7(13), pp.1633–1636. Available at:
http://doi.org/10.1109/ICDE.2014.6816764 http://dl.acm.org/citation.cfm?doid=2733004.2733048.
[23] Datafloq https://datafloq.com/read/3vs-sufficient-describe- [33] Moreno, A., & Redondo, T. (2016). Text Analytics: the
big-data/166 convergence of Big Data and Artificial Intelligence.
[24] Sagiroglu, S. & Sinanc, D., 2013. Big data: A review. 2013 International Journal of Interactive Multimedia and
International Conference on Collaboration Technologies and Artificial Intelligence, 3(6), 57.
Systems (CTS). pp. 42–47 http://doi.org/10.9781/ijimai.2016.369
[25] Xhafa, F., Naranjo, V., Barolli, L., & Takizawa, M. (2015). [34] Weets, J., Kakhani, M. K., & Kumar, A. (2015). Limitations
On Streaming Consistency of Big Data Stream Processing in and Challenges of HDFS and MapReduce. In 2015
Heterogenous Clutsers. 2015 18th International Conference International Conference on Green Computing and Internet
on Network-Based Information Systems, 476–482. of Things (ICGCIoT), 2015.
http://doi.org/10.1109/NBiS.2015.122 http://doi.org/10.1109/ICGCIoT.2015.7380524
[26] Batini, C., & Scannapieco, M. (2006). Data Quality: [35] Sakr, S., Liu, A., & Fayoumi, A. G. (2013). The Family of
Concepts, Methodologies and Techniques (Data-Centric MapReduce and Large-Scale Data Processing Systems. ACM
Systems and Applications). Secaucus, NJ, USA: Springer- Computing Surveys, 46(1), 1–44.
Verlag New York, Inc. http://doi.org/10.1145/2522968.2522979