Reg. no. 1825209016
MCA 4th semester

Content
Introduction
What is BIG DATA
Examples of BIG DATA
Characteristics of BIG DATA
The structure of BIG DATA
Why BIG DATA
How it is different
BIG DATA sources
Tools used in BIG DATA
Applications of BIG DATA
Risks of BIG DATA
Benefits of BIG DATA
How BIG DATA impacts IT
Future of BIG DATA
Conclusion

Introduction
BIG DATA is a term for data sets so large or complex that traditional data processing applications are inadequate to handle them. Working with BIG DATA involves capturing, storing, sharing, transferring, analyzing, visualizing and querying the data, as well as protecting information privacy.

What is BIG DATA
Big Data literally means very big data: a collection of datasets so large that they cannot be processed using traditional computing techniques. Big data is not merely data; it has become a complete subject, which involves various tools, techniques and frameworks.

Examples of BIG DATA
Examples include Facebook, Flickr, YouTube, Twitter, Google and Google News.
Facebook reports 2.5 billion content items, 105 terabytes of data each half hour, and 300M photos and 4M videos posted per day.
Twitter has over 651 million users, generating over 6,000 tweets per second.
300 hours of video are uploaded to YouTube every minute, with more than 1 trillion video views.
Google has indexed 50 billion pages and handles more than 2.4 million queries every minute.
Google News collects articles from over 10,000 sources in real time.
More than 4.5 million photos are uploaded to Flickr every day.
It is estimated that all the global data generated from the …

Characteristics of BIG DATA
Volume
The quantity of data generated and stored every second. Here we are talking about zettabytes or more. It is the task of big data to convert such data into valuable information. Data is generated by machines, networks and human interaction on systems like social media. The volume of data to be analyzed is massive.
Velocity
The speed at which data is generated.
Data may also need to be processed and acted upon as it arrives. The highest-velocity data normally streams directly into memory rather than being written to disk. Some Internet of Things (IoT) applications require real-time evaluation and action. For example, every minute:
almost 2.5 million queries are performed on Google;
around 20 million photos are viewed;
100 hours of video are uploaded to YouTube;
300,000 tweets are sent;
over 200 million emails are sent.
Variety
BIG DATA is not just numbers, dates and strings. It also includes 3D data, geospatial data, audio, video and unstructured text, including log files and social media. Traditional database systems were designed to handle smaller volumes of structured data, fewer updates and a predictable, consistent data structure. BIG DATA includes many different types of data.
Veracity
An extension of the definition of BIG DATA that refers to data quality and data value. The quality of captured data can vary greatly, which affects the accuracy of analysis. Data quality may be unreliable because data often comes from uncontrolled environments.

The Structure of BIG DATA
Nowadays the model has been extended to 8 Vs and even 10 Vs.

Why BIG DATA
The growth of BIG DATA is driven by:
increasing storage capacity;
increasing processing power;
availability of data (different data types).
Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has been created in the last 2 years alone.

How it is different
It is automatically generated by a machine (e.g. a sensor embedded in an engine).
It is typically an entirely new source of data (e.g. use of the internet).
It is not designed to be friendly (e.g. text streams).
It may not have much value (we need to focus on the important parts).

Tools used in BIG DATA
Distributed servers / cloud (Amazon EC2): where processing is hosted.
Distributed storage (Amazon S3): where data is stored.
Distributed processing (MapReduce): the programming model.
High-performance schema-free database (MongoDB): where data is stored and indexed.
Analytic/semantic processing: where operations are performed on the data.
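The tools list names MapReduce as the distributed processing programming model. As an illustration only, here is a minimal single-process Python sketch of that model's map, shuffle and reduce steps applied to word counting. It does not use Hadoop or any cluster API; the function names are hypothetical, and in a real deployment the framework would spread the map and reduce calls across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster each document could be mapped on a different machine;
    # in this sketch everything runs sequentially in one process.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Running it counts 2 for "big", "data" and "is" and 1 for "everywhere". Keeping map and reduce as pure functions of their inputs is what allows a framework to run many copies of them in parallel.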
Continued…
Hadoop: a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Facebook, LinkedIn, Twitter and eBay use Hadoop.
HBase: a scalable, distributed database that supports structured data storage for large tables.
Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying.

Applications of BIG DATA

Benefits of BIG DATA
Real-time big data is not just a process for storing petabytes or exabytes of data in a data warehouse; it is about the ability to make better decisions and take meaningful actions at the right time. Fast forward to the present, and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it. Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the underlying data structures.
Continued…
Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem. BIG DATA is already an important part of the $64 billion database and data analytics market. It offers commercial opportunities on a scale comparable to enterprise software in the late 1980s, the internet boom of the 1990s and the social media explosion of today.

How BIG DATA impacts IT
BIG DATA is a disruptive force that presents both opportunities and challenges to IT organizations. By 2015, 4.4 million IT jobs were expected to be created to support BIG DATA, 1.9 million of them in the US alone. India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the BIG DATA space.

Future of BIG DATA
$15 billion has been spent on software firms that specialize only in data management and analytics. This industry on its own is worth more than $100 billion and is growing at almost 10% a year, roughly twice as fast as the software business as a whole.
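To make a figure like "almost 10% a year" concrete, here is a small compound-growth sketch in Python. It assumes the rate stays constant every year, which real markets rarely do, and the helper names are illustrative rather than taken from any library.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """How many times larger a quantity becomes after compounding
    annual_rate (e.g. 0.10 for 10%) for the given number of years."""
    return (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Constant annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # An industry growing at about 10% a year, assumed constant.
    print(f"10%/year over 5 years: x{growth_factor(0.10, 5):.2f}")
    print(f"Doubling time at 10%/year: {doubling_time(0.10):.1f} years")
```

At a constant 10% a year the industry would be about 1.61 times larger after five years and would double in roughly 7.3 years. The cagr helper works the same way on the start and end values of a forecast, such as the Wikibon revenue figures quoted in this section: cagr(5.1, 53.4, 5) is roughly 0.60, i.e. about 60% a year.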
In February 2012, the open source analyst firm Wikibon released the first market forecast for BIG DATA, listing $5.1 billion in revenue in 2012 with growth to $53.4 billion in 2017. The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020.

Conclusion
BIG DATA is now a reality with huge profit potential. Tools and technologies are available as open source. Each one of us can benefit from working with BIG DATA, either in its pure (dynamic) form or in its traditional (static) form.

THANK YOU