Volume
No longer just GBs of data: TB, PB, EB, ZB scales
Velocity
High-frequency data, e.g. stock-market ticks
Variety
Structured and unstructured data
Complexity
No proper understanding of the underlying data
Storage
How to accommodate a large amount of data on a single physical machine
Performance
How to process a large amount of data efficiently and effectively so as to increase performance
12 June 2017 | Proprietary and confidential information. © Mphasis 2014
• Database
• Network
• Data
Throughput | Data size        | Read + transfer time
10 MB/s    | 10 MB            | 1 + 1 = 2 s
10 MB/s    | 100 MB           | 10 + 10 = 20 s
10 MB/s    | 1000 MB (1 GB)   | 100 + 100 = 200 s (~3.33 min)
10 MB/s    | 1000 GB (1 TB)   | 100,000 + 100,000 s ≈ 55 hours
Data is moved back and forth over the network to the machine where the application is running.
The efficiency and performance of any application is therefore determined by how fast the data can be read.
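The read-and-transfer arithmetic above can be sketched in a few lines. The 10 MB/s throughput figure for both the disk and the network link is taken from the numbers in this section; everything else is illustrative:

```python
# Sketch of the arithmetic behind the timings above: total time to read data
# from disk and ship it over the network, assuming ~10 MB/s of throughput
# for both the disk and the network link (the figure used in this section).

DISK_MB_PER_S = 10   # assumed disk read throughput
NET_MB_PER_S = 10    # assumed network transfer throughput

def transfer_time_seconds(size_mb):
    """Read time plus network time for `size_mb` megabytes of data."""
    read_s = size_mb / DISK_MB_PER_S
    net_s = size_mb / NET_MB_PER_S
    return read_s + net_s

for size_mb in (10, 100, 1_000, 1_000_000):   # 10 MB up to 1 TB
    total = transfer_time_seconds(size_mb)
    print(f"{size_mb:>9} MB -> {total:,.0f} s (~{total / 3600:.1f} h)")
```

For 1 TB this gives 200,000 s, roughly the 55 hours quoted above, which is why moving the whole dataset to the application does not scale.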
Storage is a problem
Cannot store large amounts of data on a single machine
Upgrading the hard disk will also not solve the problem (hardware limitation)
Performance degradation
Upgrading RAM will not solve the problem (hardware limitation)
Reading
Larger data requires more time to read
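A minimal sketch of why distributing the data addresses the read-time problem: if the dataset is split across N machines that each read their local share in parallel, the wall-clock read time divides by N. The 10 MB/s throughput matches the figures used earlier in this section; the machine counts are illustrative:

```python
# Toy model of parallel local reads: each machine reads an equal share of
# the data from its own disk at the same time, so wall-clock time shrinks
# linearly with the number of machines (assumed throughput: 10 MB/s).

DISK_MB_PER_S = 10  # assumed per-disk read throughput

def parallel_read_seconds(total_mb, machines):
    """Wall-clock time when `machines` nodes each read an equal share locally."""
    share_mb = total_mb / machines
    return share_mb / DISK_MB_PER_S

one_tb_mb = 1_000_000
for machines in (1, 10, 100, 1000):
    t = parallel_read_seconds(one_tb_mb, machines)
    print(f"{machines:>4} machines -> {t:,.0f} s (~{t / 3600:.2f} h)")
```

With 1000 machines the 1 TB read drops from ~28 hours of disk time to 100 seconds, which is the core idea behind moving computation to the data instead of data to the computation.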
Distributed Framework
Should not shut down the entire system if a few machines are down
Should result in graceful degradation of performance
Recoverability
Data Availability
Consistency
If machines/components fail, the outcome of the job should not be affected
Data Reliability
Upgrading
Adding more machines should not require a full restart of the system
New machines should be able to join the existing system gracefully and participate in job execution
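The consistency requirement above can be illustrated with a toy scheduler (not Hadoop's actual implementation): tasks from failed machines are simply re-assigned to the surviving ones, so the job's outcome is identical and only wall-clock time degrades:

```python
# Toy sketch of fault tolerance: assign tasks round-robin to whatever
# machines are still healthy. The set of completed tasks (the "outcome")
# is the same no matter which machines failed, as long as one survives.

def run_job(tasks, machines, failed):
    """Return (completed task set, task -> machine assignment)."""
    healthy = [m for m in machines if m not in failed]
    if not healthy:
        raise RuntimeError("no machines left to run the job")
    assignment = {}
    for i, task in enumerate(tasks):
        assignment[task] = healthy[i % len(healthy)]
    return set(assignment), assignment

tasks = [f"block-{i}" for i in range(8)]
machines = ["m1", "m2", "m3", "m4"]

done_all, _ = run_job(tasks, machines, failed=set())
done_degraded, _ = run_job(tasks, machines, failed={"m3", "m4"})
print(done_all == done_degraded)  # same outcome despite failures
```

Each surviving machine now carries more tasks, which is exactly the "graceful degradation of performance" the requirement describes.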
Storage
Performance
IO Bandwidth
What is big data
Challenges in Big Data
What are the changes in Traditional Applications
What makes Hadoop special
In-built intelligence to speed up the application
Speculative execution
What are the milestones of Hadoop
Fit for different applications:
Web log processing
Page Indexing, page ranking
Complex event processing
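Web log processing, the first application listed above, is the classic fit for this model. A hedged sketch of the kind of job it involves, written as a single-machine MapReduce-style pass (the log format and field positions here are assumptions for illustration; real access logs vary):

```python
# MapReduce-style word-count pattern over web log lines: the map phase
# emits (url, 1) pairs, the reduce phase sums counts per URL. Log lines
# below are fabricated examples in a simplified "ip method url status" form.

from collections import Counter

log_lines = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 GET /index.html 304",
]

# Map phase: emit one (url, 1) pair per log line.
pairs = [(line.split()[2], 1) for line in log_lines]

# Reduce phase: sum the counts for each url.
hits = Counter()
for url, count in pairs:
    hits[url] += count

print(hits.most_common())
```

In a real cluster the map phase runs on the machines that hold each log block locally, and only the small per-URL counts move over the network, sidestepping the transfer-time problem described earlier.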