Computational and Statistical Methods for Analysing Big Data with Applications

About this ebook

Due to the scale and complexity of data sets currently being collected in areas such as health, transportation, environmental science, engineering, information technology, business and finance, modern quantitative analysts are seeking improved and appropriate computational and statistical methods to explore, model and draw inferences from big data. This book aims to introduce suitable approaches for such endeavours, providing applications and case studies for the purpose of demonstration.

Computational and Statistical Methods for Analysing Big Data with Applications starts with an overview of the era of big data. It then goes on to explain the computational and statistical methods that have been commonly applied in the big data revolution. For each of these methods, an example is provided as a guide to its application. Five case studies are presented next, focusing on computer vision with massive training data, spatial data analysis, advanced experimental design methods for big data, big data in clinical medicine, and analysing data collected from mobile devices, respectively. The book concludes with some final thoughts and suggested areas for future research in big data.

  • Advanced computational and statistical methodologies for analysing big data are developed
  • Experimental design methodologies are described and implemented to make the analysis of big data more computationally tractable
  • Case studies are discussed to demonstrate the implementation of the developed methods
  • Five high-impact areas of application are studied: computer vision, geosciences, commerce, healthcare and transportation
  • Computing code/programs are provided where appropriate
Language: English
Release date: Nov 20, 2015
ISBN: 9780081006511

    Book preview

    Computational and Statistical Methods for Analysing Big Data with Applications - Shen Liu


    1

    Introduction

    Abstract

    This chapter provides an introduction to the era of big data, with the primary emphasis on three characteristics: volume, velocity and variety. The structure of this book is also described in this chapter.

    Keywords

    Big data; Volume; Velocity; Variety

    The history of humans storing and analysing data dates back to about 20,000 years ago, when tally sticks were used to record and document numbers. Palaeolithic tribespeople notched sticks or bones to keep track of supplies or trading activities, and the notches could be compared to carry out calculations, enabling them to make predictions such as how long their food supplies would last (Marr, 2015). As one can imagine, data storage and analysis in ancient times were very limited. After a long journey of evolution, however, people are now able to collect and process huge amounts of data, such as business transactions, sensor signals, search engine queries, multimedia materials and social network activities. As a 2011 McKinsey report (Manyika et al., 2011) stated, the amount of information in our society has been exploding, and consequently analysing large datasets, referred to as big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation and consumer surplus.

    1.1 What is big data?

    Big data is not a new phenomenon, but part of a long evolution of data collection and analysis. Among the numerous definitions of big data that have been introduced over the last decade, the one provided by Mayer-Schönberger and Cukier (2013) appears to be the most comprehensive:

    • Big data is "the ability of society to harness information in novel ways to produce useful insights or goods and services of significant value" and "things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value."

    In the community of analytics, it is widely accepted that big data can be conceptualized by the following three dimensions (Laney, 2001):

    • Volume

    • Velocity

    • Variety

    1.1.1 Volume

    Volume refers to the vast amounts of data being generated and recorded. Although big data and large datasets are different concepts, to most people big data implies an enormous volume of numbers, images, videos or text. Nowadays, the amount of information being produced and processed is increasing tremendously, as illustrated by the following facts:

    • 3.4 million emails are sent every second;

    • 570 new websites are created every minute;

    • More than 3.5 billion search queries are processed by Google every day;

    • On Facebook, 30 billion pieces of content are shared every day;

    • Every two days we create as much information as we did from the beginning of time until 2003;

    • In 2007, the number of bits of data stored in the digital universe is thought to have exceeded the number of stars in the physical universe;

    • The total amount of data being captured and stored by industry doubles every 1.2 years;

    • Over 90% of all the data in the world were created in the past 2 years.

    As claimed by Laney (2001), increases in data volume are usually handled by utilizing additional online storage. However, the relative value of each data point decreases proportionately as the amount of data increases. As a result, attempts have been made to profile data sources so that redundancies can be identified and eliminated. Moreover, statistical sampling can be performed to reduce the size of the dataset to be analysed.
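
    As a simple illustration of the sampling idea mentioned above, the following MATLAB sketch draws a simple random subsample from a large data matrix before analysis. The matrix X and the subsample size are placeholders chosen purely for illustration, not data from any case study in this book.

    % Minimal sketch: simple random sampling to reduce a large dataset.
    % X is an illustrative placeholder; replace it with the data of interest.
    X = randn(1e6, 5);        % 1,000,000 observations on 5 variables
    n = size(X, 1);           % total number of observations
    m = 10000;                % desired subsample size
    idx = randperm(n, m);     % m distinct row indices drawn uniformly at random
    Xsample = X(idx, :);      % reduced dataset to be analysed in place of X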

    1.1.2 Velocity

    Velocity refers to the pace of data streaming, that is, the speed at which data are generated, recorded and communicated. Laney (2001) stated that the boom of e-commerce has increased point-of-interaction speed and, consequently, the pace at which data are used to support interactions. According to the International Data Corporation (https://www.idc.com/), the global annual rate of data production is expected to reach 5.6 zettabytes in 2015, double the figure for 2012, and by 2020 the amount of digital information in existence is expected to have grown to 40 zettabytes. To cope with such high velocity, people need to access, process, comprehend and act on data much faster and more effectively than ever before. The major issue related to velocity is that data are being generated continuously. Traditionally, there was a large time gap between data collection and data analysis; in the era of big data such a gap is problematic, as a large portion of the data would be wasted over so long a period. In the presence of high velocity, data collection and analysis need to be carried out as an integrated process. Initially, research interests were directed towards the large-volume characteristic, whereas companies are now investing in big data technologies that allow data to be analysed while they are being generated.

    1.1.3 Variety

    Variety refers to the heterogeneity of data sources and formats. Since there are numerous ways to collect information, it is now common to encounter various types of data coming from different sources. Before the year 2000, the most common format of data was the spreadsheet, where data are structured and fit neatly into tables or relational databases. In the 2010s, however, most data are unstructured, extracted from photos, video/audio documents, text documents, sensors, transaction records, etc. The heterogeneity of data sources and formats makes datasets too complex to store and analyse using traditional methods, and significant efforts have to be made to tackle the challenges posed by such variety.

    As stated by the Australian Government (http://www.finance.gov.au/sites/default/files/APS-Better-Practice-Guide-for-Big-Data.pdf), traditional data analysis takes a dataset from a data warehouse, which is clean and complete, with gaps filled and outliers removed. Analysis is carried out after the data are collected and stored in a storage medium such as an enterprise data warehouse. In contrast, big data analysis uses a wider variety of available data relevant to the analytics problem. The data are usually messy, consisting of different types of structured and unstructured content, with complex coupling relationships arising from syntactic, semantic, social, cultural, economic, organizational and other aspects. Rather than simply interrogating the data, analysts explore it to discover insights and understanding, such as relevant data and relationships to explore further.

    1.1.4 Another two V’s

    It is worth noting that in addition to Laney’s three Vs, veracity and value have been frequently mentioned in the big data literature (Marr, 2015). Veracity refers to the trustworthiness of the data, that is, the extent to which data are free of bias, noise and abnormality. Efforts should be made to keep data tidy and clean, and methods should be developed to prevent dirty data from being recorded. Value, on the other hand, refers to the amount of useful knowledge that can be extracted from data. Big data can deliver value in a broad range of fields, such as computer vision (Chapter 4 of this book), geosciences (Chapter 5), finance (Chapter 6), civil aviation (Chapter 6), health care (Chapters 7 and 8) and transportation (Chapter 8). In fact, the applications of big data are endless, and everyone benefits from them as we have entered a data-rich era.

    1.2 What is this book about?

    Big data analysis involves a collection of techniques that help extract useful information from data. With this objective in mind, we develop and implement advanced statistical and computational methodologies for use in various high-impact areas where big data are being collected.

    In Chapter 2, classification methods will be discussed, which have been extensively applied to big data in fields such as customer segmentation, fraud detection, computer vision, speech recognition and medical diagnosis. In brief, classification can be viewed as a labelling process for new observations, aiming to determine to which of a set of categories an unlabelled object belongs. Fundamentals of classification will be introduced first, followed by a discussion of several classification methods that have been popular in big data applications, including the k-nearest neighbour algorithm, regression models, Bayesian networks, artificial neural networks and decision trees. Examples will be provided to demonstrate the implementation of these methods.
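
    To give a flavour of the classification methods discussed in Chapter 2, the following sketch implements the k-nearest neighbour rule using base MATLAB only. The two-class training data, the new observations and the choice k = 5 are synthetic and purely illustrative, not drawn from the book's examples.

    % Minimal k-nearest neighbour sketch with synthetic two-class data.
    rng(1);
    Xtrain = [randn(50, 2); randn(50, 2) + 3];   % two well-separated classes
    ytrain = [ones(50, 1); 2*ones(50, 1)];       % class labels 1 and 2
    Xnew   = [0.2 0.1; 3.1 2.9];                 % unlabelled observations
    k = 5;                                       % number of neighbours
    ynew = zeros(size(Xnew, 1), 1);
    for i = 1:size(Xnew, 1)
        % Euclidean distances from the i-th new observation to all training points
        d = sqrt(sum(bsxfun(@minus, Xtrain, Xnew(i, :)).^2, 2));
        [~, order] = sort(d);                    % nearest training points first
        ynew(i) = mode(ytrain(order(1:k)));      % majority vote among the k nearest
    end
    disp(ynew)                                   % predicted class labels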

    Whereas classification methods are suitable for assigning an unlabelled observation to one of several existing groups, in practice groups of data may not have been identified. In Chapter 3, three methods for finding groups in data will be introduced: principal component analysis, factor analysis and cluster analysis. Principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through linear combinations of those variables; its general objectives are data reduction and interpretation. Factor analysis can be considered an extension of principal component analysis, aiming to describe the covariance structure of all observed variables in terms of a few underlying factors. The primary objective of cluster analysis is to categorize objects into homogeneous groups, where objects in one group are relatively similar to each other but different from those in other groups. Both hierarchical and non-hierarchical clustering procedures will be discussed and demonstrated with applications. In addition, we will study fuzzy clustering methods, which do not assign observations exclusively to one group; instead, an individual is allowed to belong to more than one group, with an estimated degree of membership associated with each group.
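
    As a small taste of Chapter 3, the sketch below carries out principal component analysis via the eigendecomposition of the sample covariance matrix, again using base MATLAB only; the correlated synthetic data are an illustrative placeholder rather than any dataset analysed in the book.

    % Minimal principal component analysis sketch on synthetic data.
    rng(2);
    X = randn(500, 4) * [2 0.5 0 0; 0 1 0.3 0; 0 0 0.5 0; 0 0 0 0.1];  % correlated variables
    Xc = bsxfun(@minus, X, mean(X, 1));          % centre each variable
    [V, D] = eig(cov(Xc));                       % loadings and eigenvalues
    [lambda, order] = sort(diag(D), 'descend');  % order by variance explained
    V = V(:, order);                             % ordered loading vectors
    scores = Xc * V;                             % principal component scores
    explained = lambda / sum(lambda);            % proportion of variance per component
    disp(explained')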

    Chapter 4 will focus on computer vision techniques, which have countless applications in areas such as medical diagnosis, face recognition and verification systems, video surveillance and transportation. Over the past decade, computer vision has proven successful in solving real-life problems. For instance, the registration plate of a vehicle using a tollway is identified from the picture taken by the monitoring camera, and the corresponding driver is then notified and billed automatically. In this chapter, we will discuss how big data facilitate the development of computer vision technologies, and how these technologies can be applied in big data applications. In particular, we will discuss deep learning algorithms and demonstrate how this state-of-the-art methodology can be applied to solve large-scale image recognition problems. A tutorial will be given at the end of the chapter for the purpose of illustration.

    Chapter 5 will concentrate on spatial datasets, which are very common in statistical analysis since a broad range of phenomena can be described by spatially distributed random variables (e.g. greenhouse gas emissions, sea level, etc.). In this chapter, we will propose a computational method for analysing large spatial datasets. An introduction to spatial statistics will be provided first, followed by a detailed discussion of the proposed computational method. The MATLAB code used to implement this method will then be listed and discussed, and a case study of an open-pit mining project will be carried out.

    In Chapter 6, experimental design techniques will be considered for analysing big data as a way of extracting relevant information in order to answer specific questions. Such an approach can significantly reduce the size of the dataset to be analysed and potentially overcome concerns about poor data quality due to, for example, sample bias. We will focus on a sequential design approach for extracting informative data. When fitting relatively complex models (e.g. those that are non-linear), the performance of a design in answering specific questions will generally depend upon the assumed model and the corresponding values of its parameters. As such, it is useful to consider prior information for such sequential design problems. We argue that this can be obtained in big data settings through the use of an initial learning phase, where data are extracted from the big dataset so that appropriate models for analysis can be explored and prior distributions of parameters can be formed. Given such prior information, sequential design is undertaken as a way of identifying informative data to extract from the big dataset. This approach is demonstrated in an example where there is interest in determining how particular covariates affect the chance of an individual defaulting on their mortgage, and we also explore the appropriateness of a model developed in the literature for the chance of a late arrival in domestic air travel. We will also show that this approach can provide a methodology for identifying gaps in big data, which may reveal limitations in the types of inferences that may be drawn.
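
    As a highly simplified illustration of the sequential design idea, the sketch below scores each row of a candidate pool by how much it would increase the determinant of the Fisher information of a logistic regression model at the current parameter estimates (a D-optimality criterion), and selects the highest-scoring row to extract next. The pool, the initial learning-phase sample and the parameter values are synthetic assumptions, not the mortgage or airline data analysed in Chapter 6.

    % Minimal sketch of one sequential design step for a logistic model.
    rng(3);
    Xpool = [ones(5000, 1) randn(5000, 2)];      % candidate design points (with intercept)
    Xcur  = Xpool(randperm(5000, 100), :);       % initial learning-phase sample
    beta  = [-0.5; 1.0; -0.8];                   % current parameter estimates (assumed here)
    p     = 1 ./ (1 + exp(-Xcur * beta));        % fitted probabilities under the model
    Mcur  = Xcur' * bsxfun(@times, Xcur, p .* (1 - p));   % current Fisher information
    gain  = zeros(size(Xpool, 1), 1);
    for j = 1:size(Xpool, 1)
        x = Xpool(j, :)';
        w = 1 / (1 + exp(-x' * beta));           % probability at the candidate point
        Mj = Mcur + w * (1 - w) * (x * x');      % information if this point were added
        gain(j) = log(det(Mj)) - log(det(Mcur)); % D-optimality gain
    end
    [~, best] = max(gain);                       % most informative point to extract next
    fprintf('Next point to extract: row %d of the pool\n', best);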
