
PHILIPPE JULIO – BIG DATA ANALYTICS ARCHITECT

PERSONAL INTERESTS

Philippe Julio – Big Data Analytics Architect


https://www.linkedin.com/in/juliophilippe/

Artificial Intelligence, Consulting, Architecture, Big Data, Analytics, Astronomy, Digital, Cloud Computing, Data Center, Data Warehousing, Technology, Healthcare, Oncology, Data Science

Big data OncoAnalytics 2


ONCOANALYTICS INSIGHT

ONCOPHYSICS
See into the present
Radioscopy vision
• Objects : x-rays, gamma-rays, magnetic resonance, ultrasounds, laser light...
• Quantum mechanics : photon, electron, magnetism…

ONCOPHARMACEUTICS
See into the present
Microscopic vision
• Objects : tumors, molecules, proteins, hormones, enzymes, biomarkers, amino acids…
• Quantum chemistry : biomolecular modeling, chemical energy, spectra analysis…

ONCOGENOMICS
See into the present
Microscopic vision
• Objects : tumors, cells, chromosomes, genes, DNA, enzyme, hormone, antibody…
• Quantum biology : DNA mutation, cellular respiration…

ONCOANALYTICS
See into the future
Computer vision
• Objects : data, bits, Qbits
• Quantum computer : artificial intelligence, algorithms, cryptography, search, simulation, linear equations, prediction, recommendation, risk…

Infinite Analytics Use Cases
Quantum Physics and Artificial Intelligence



KEY QUESTIONS

• Where are we heading in the fight against cancer?
• How is big data analytics transforming oncology?
• Which big data OncoAnalytics architecture patterns?
• Which big data OncoAnalytics solutions?
• Which big data OncoAnalytics solution examples?
• How to get started with a big data OncoAnalytics project?
• What is the big data OncoAnalytics value proposition?

Where are we heading in the
fight against cancer?
WHAT IS CANCER AND ONCOLOGY ?

• Cancer can start any place in the body
• Cells grow out of control and crowd out normal cells
• There are more than 200 different types of cancer
• Cancer starts when gene changes make one cell or a few cells begin to grow and multiply too much
• This cell growth is called a tumor
• Primary tumor is the name for where a cancer starts
• Secondary tumor, or metastasis, is the name for where the cancer spreads to other parts of the body
• Oncology is a branch of medicine that specializes in the diagnosis and treatment of cancer. It includes medical oncology, radiation oncology and surgical oncology

Grade Tumor Growth and Spread Classification
• Gx: Grade cannot be assessed (undetermined grade)
• G1: Well differentiated (low grade)
• G2: Moderately differentiated (intermediate grade)
• G3: Poorly differentiated (high grade)
• G4: Undifferentiated (high grade)

Tumor Nodes Metastasis Classification
• T : (0-4) tumor size or direct extent of the primary tumor
• N : (0-3) degree of spread to regional lymph nodes
• M : (0-1) presence of distant metastasis
• Sub classifications for some cancer types
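
The two classifications above lend themselves to a small data structure. The sketch below is only an illustration, not a clinical tool: the names `TNMStage`, `parse_tnm` and `GRADES` are hypothetical, the accepted ranges (T 0-4, N 0-3, M 0-1) are an assumption based on the general TNM scheme, and the sub classifications used for some cancer types are deliberately not handled.

```python
import re
from dataclasses import dataclass

# Assumed ranges for this sketch: T 0-4, N 0-3, M 0-1.
# Sub classifications such as "T1a" are not handled.
_TNM_PATTERN = re.compile(r"^T([0-4])N([0-3])M([01])$")

# Grade labels as listed on the slide.
GRADES = {
    "GX": "Grade cannot be assessed (undetermined grade)",
    "G1": "Well differentiated (low grade)",
    "G2": "Moderately differentiated (intermediate grade)",
    "G3": "Poorly differentiated (high grade)",
    "G4": "Undifferentiated (high grade)",
}


@dataclass(frozen=True)
class TNMStage:
    t: int  # size or direct extent of the primary tumor
    n: int  # degree of spread to regional lymph nodes
    m: int  # presence of distant metastasis

    @property
    def has_distant_metastasis(self) -> bool:
        return self.m == 1


def parse_tnm(code: str) -> TNMStage:
    """Parse a simple code such as 'T2N1M0' into its three components."""
    match = _TNM_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a simple TNM code: {code!r}")
    t, n, m = (int(group) for group in match.groups())
    return TNMStage(t, n, m)


stage = parse_tnm("T2N1M0")
```

Rejecting anything outside the simple pattern keeps malformed codes out of downstream analytics instead of silently guessing a stage.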



ONCOLOGY GOALS

• Building a world where every cancer patient receives the right care at the right place at the right time
• Curing patients or considerably prolonging their lives, and ensuring the best possible quality of life for cancer survivors
• Providing the most effective treatments in an equitable and sustainable way, through early detection, accurate diagnosis and staging, and adherence to evidence-based standards of care
• Forecasting cancer to offer a real opportunity for clinical benefit based on anticipatory action



CANCER GENOMICS BASICS

Genome is unique for every person

• All the DNA molecules contained in our cells make up our genome
• Genome sequencing completed in 2004
• 99.9% of our genome is the same
• 0.1% is enough to make each one of us unique
• On average, 1-3 bases per gene differ from person to person
• Differences can change the shape and function of a protein, or they can change how much protein is made, when it's made, or where it's made
• In most cells, the genome is packaged into two sets of chromosomes: one set from our mother and one set from our father

Cancer is a disease of the genome

• In cancer cells, small changes in the genetic letters can change what a genomic word or sentence means
• A changed letter (A, C, T, G) can cause the cell to make a protein that doesn’t allow the cell to work as it should
• Scientists can discover what letter changes are causing a cell to become a cancer
• The genome of a cancer cell can also be used to tell one type of cancer from another
• In some cases, studying the genome in a cancer can help identify a subtype of cancer within that type

Cell, Chromosome, DNA, RNA, Gene, Protein, Amino Acid

• Humans have 23 pairs of chromosomes in every cell: 22 pairs of autosomes and 1 pair of gonosomes (XY male, XX female)
• DNA is a complex molecule containing deoxyribose, with bases in a specific order: a sequence of 3.2 billion nucleotides (Adenine, Cytosine, Thymine, Guanine) that includes 20,000 to 25,000 protein-coding genes
• RNA is a molecule composed of nitrogenous bases (Guanine, Uracil, Adenine, Cytosine), similar to DNA but containing ribose rather than deoxyribose. RNA is formed upon a DNA template. There are several classes of RNA molecules (messenger RNA : mRNA, transfer RNA : tRNA and ribosomal RNA : rRNA)
• Protein is a large molecule (enzyme, hormone, antibody…) composed of one or more chains of amino acids
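
To make the "changed letter" idea concrete, here is a minimal sketch of how a single-base substitution changes the amino acid a codon encodes. The nine-letter sequence is a toy example, not a real gene; `CODON_TABLE` is only a small excerpt of the standard genetic code, and the helper names `translate` and `substitute` are hypothetical.

```python
# Small excerpt of the standard genetic code (DNA codons, coding strand).
CODON_TABLE = {
    "ATG": "Met",
    "GGT": "Gly",
    "GTT": "Val",
    "GAT": "Asp",
    "GCT": "Ala",
    "TAA": "Stop",
}


def translate(dna: str) -> list[str]:
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


def substitute(dna: str, position: int, base: str) -> str:
    """Return a copy of the sequence with one base replaced (0-based position)."""
    return dna[:position] + base + dna[position + 1:]


normal = "ATGGGTGCT"                    # toy sequence: ATG GGT GCT
mutated = substitute(normal, 4, "T")    # second codon GGT becomes GTT

normal_protein = translate(normal)
mutated_protein = translate(mutated)
```

Comparing `normal_protein` with `mutated_protein` shows that one changed letter swaps glycine for valine while the rest of the chain is untouched, which is the kind of difference a sequencing pipeline looks for.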



WORLD CANCER STATISTICS 2018

Cancer burden rises to 18.1 million new cases and 9.6 million cancer deaths in the world

Cancer is the second leading cause of death in the world

The most common cancers in the world are (in decreasing order) those of the lung, breast, colorectum and prostate

Source : International Agency for Research on Cancer



CANCER RISK FACTORS

• Tobacco
• Alcohol
• Nutrition
• Certain classes of drugs
• Genome
• Physical inactivity
• Radio frequency
• Phytosanitary products
• Pollution
• Radioactivity
• Skin exposure (sun, ultraviolet light)
• Infectious agents (HPV, HBV, HCV, HIV…)

Source : American Institute for Cancer Research



CANCER MANAGEMENT

Diagnostics

• Health history review
• Physical examination
• Laboratory tests (blood, urine…)
• Biopsy
• Medical imaging (X-ray, PET/CT, MRI, Ultrasound…)
• Endoscopy
• Genetic tests

Treatments

• Surgery
• Chemotherapy
• Radiotherapy (Brachytherapy, Proton Therapy)
• Immunotherapy
• Genetic CRISPR-Cas9
• Stem cell transplant
• Hyperthermia
• Photodynamic therapy
• Laser
• Blood products, transfusion

Supportive Care

• Psychiatry
• Psychology
• Sophrology, Meditation, Mindfulness, Wellness
• Dietetics
• Speech therapy
• Physiotherapy
• Spiritual
• Rehabilitation
• Social work
• Volunteer



CANCER MEDICINE MODEL

Cancer Medicine of the Future

• Personal medicine : taking into account the patient genetic or protein profile
• Preventive medicine : taking into consideration health problems by focusing on wellness and not disease
• Pathway medicine : connecting between different medical and para-medical actors, outpatient medicine, digital hospital and home care
• Predictive medicine : indicating the most appropriate treatments for the patient and trying to avoid drug reactions
• Proof medicine : proving medical service to patients, particularly when it is based on connected health and remote medicine
• Participative medicine : leading patients to be more responsible for their health and care



FUTURE OF THE CANCER TREATMENT

• Personal medicine
Vision that all people one day will be offered customized care, with treatments that match our genomic
profiles and personal histories

• Immunotherapy drugs
Immunotherapy is the fruition of a century-old idea: that a person’s own immune system can be stimulated to
fight cancer

• Cell-based therapies
Patient’s own T cells are directly manipulated to more readily attack cancer cells. In this treatment, T cells are
collected from a patient’s blood, genetically engineered to recognize certain proteins on cancer cells, and
loaded back into the patient’s bloodstream

• Epigenetic therapies
Cancer could be treated in a different way, by transforming cancer cells back to normal rather than destroying
them. CRISPR technology is used to easily alter DNA sequences and modify gene function. The protein Cas9 (or
"CRISPR-associated") is an enzyme that acts like a pair of molecular scissors, capable of cutting strands of DNA

• Battling metastases
Metastatic tumor cells have a remarkable tendency to cling to blood vessels, a survival mechanism that might
be important for the spread of many types of cancer



How is big data analytics
transforming oncology?
ONCOANALYTICS NEEDS

• Providing the best diagnosis and treatment plans for the patient
• Using the significant advances in data to better prioritize resources, lower costs and improve patient outcomes
• Developing a cognitive interface between clinicians and technology
• Integrating various primary and secondary sources to enhance patient and treatment pathway insights
• Providing a high-performance, scalable, available and secure architecture supporting a large volume of varied data



ONCOANALYTICS MATURITY MODEL

Value increases with maturity

ANALYTICS
• DESCRIPTIVE ANALYTICS : What happened?
• DIAGNOSTIC ANALYTICS : Why did it happen?

ADVANCED ANALYTICS
• PREDICTIVE ANALYTICS : What could happen?
• PRESCRIPTIVE ANALYTICS : What should be done?



ONCOANALYTICS USE CASES EXAMPLES

Analytics : The future of Cancer Care

• Protein isoform biomarkers analysis
Better estimating cancer patient survival with protein isoform biomarkers

• Survivorship analysis for breast cancer
Understanding what happens to patients after breast cancer diagnosis

• Rapid diagnosis of rare leukemia
Identifying cause and life-saving therapy faster for a rare leukemia

• Improving early detection of breast cancer
Detecting breast cancer at the earliest stage

• Approach to developing cancer drugs
Developing cancer drugs based on the genetic mutations of tumors

• Biomarkers for relapse in lymphoma patients
Understanding why some lymphoma patients relapse and others don’t



ONCOANALYTICS EMERGING TECHNOLOGIES

ARTIFICIAL INTELLIGENCE
Makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks.
Computers can be trained to accomplish specific tasks by processing large amounts of data and recognizing
patterns in the data

COMPUTER VISION
Trains computers to interpret and understand the visual world. Using images, machines can accurately identify and classify objects and then react to what they “see”

NATURAL LANGUAGE PROCESSING
Helps computers communicate with doctors and researchers in their own language, making it possible for computers to read text, hear speech, interpret it, measure sentiment and determine which parts are important

MACHINE LEARNING
Learns without being explicitly programmed to do so

DEEP LEARNING
Makes the computation of multi-layer neural networks feasible

IoT PLATFORM
Facilitates communication, data flows, device management and application functionalities

CONVERSATIONAL AI PLATFORM
Uses a set of technologies that enable computers to simulate real conversations

KNOWLEDGE GRAPHS
Creates a knowledge domain with the help of intelligent machine learning algorithms

BLOCKCHAIN FOR DATA SECURITY
Creates a continuously growing list of digital records in packages (called blocks) which are linked and secured using cryptography

QUANTUM COMPUTING
Uses quantum-mechanical phenomena such as superposition and entanglement to perform computation



ONCOANALYTICS BIG DATA MODEL

Big Data for OncoAnalytics

Volume
Great use of precision medicine, big data explosion in cancer care, especially as genomic and environmental data become more ubiquitous

Variety
Great data variety combining traditional clinical and administrative data, unstructured data (genomics, imaging, text…), socioeconomic data and social data

Velocity
Rapidly increasing speed at which new data is being created by technological advances, and the corresponding need for that data to be integrated and analyzed in near real-time

Veracity
Good data quality. Data source is authoritative. Privacy and data protection safeguards. Data are regularly updated. Data are unambiguous, complete, easy to find, understand and use

Value
Significant advances in data to improve diagnosis, treatment plans and patient outcomes, better prioritize resources and lower costs



ONCOANALYTICS ACTORS AND ROLES

• Doctor : gets personalized guidance on treatment decisions by matching each patient’s care against quality standards and data from patients like theirs
• Researcher : receives new insight for discovery through access to a massive body of de-identified patient care data to analyze patterns
• Manager : has access to metrics and tools that support high-quality, efficient care and costs
• Administrator : has access to metrics and tools that support high-quality, efficient data and IT costs



ONCOANALYTICS DATA SOURCES

Diagnosis Treatments Supportive cares


Imaging Radiology Literature
Patient data Financial Text Pharmaceutics
Trial data
Research data
Biologic Video Genomics
Publications Studies Documents
Social Economic Environment



ONCOANALYTICS PATIENT PRIVACY

Data Privacy Method

• De-identification replaces each original patient identification code number with a unique random code number, creating a de-identified dataset. It is a reversible process
• Anonymization destroys all links between the de-identified dataset and the original dataset. It is a non-reversible process
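
The difference between the two methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a compliant implementation: the `Deidentifier` class and its method names are hypothetical, and the link table is simply held in memory, whereas a real system would store it securely and separately from the de-identified dataset.

```python
import secrets


class Deidentifier:
    """Replace patient codes with random codes and keep a private link table."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # original code -> random code
        self._reverse: dict[str, str] = {}  # random code -> original code

    def deidentify(self, patient_code: str) -> str:
        """Return the random code for a patient, creating it on first use."""
        if patient_code not in self._forward:
            token = secrets.token_hex(8)
            while token in self._reverse:  # regenerate on the unlikely collision
                token = secrets.token_hex(8)
            self._forward[patient_code] = token
            self._reverse[token] = patient_code
        return self._forward[patient_code]

    def reidentify(self, token: str) -> str:
        """Reverse the mapping; possible only while the link table exists."""
        return self._reverse[token]

    def anonymize(self) -> None:
        """Destroy the link table so tokens can no longer be mapped back."""
        self._forward.clear()
        self._reverse.clear()


deidentifier = Deidentifier()
token = deidentifier.deidentify("PATIENT-0001")  # hypothetical code number
```

As long as the link table exists, `reidentify(token)` returns the original code; after `anonymize()` the same call raises `KeyError`, which is the non-reversible case.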

Data Privacy Rules

• Private patient data must be used for medical and non-commercial purposes
• Only the patient and their doctor are authorized to access private patient data

Integration Process

• Patient data extraction : private data
• Patient data transformation : de-identification
• Patient data loading : public data



ONCOANALYTICS PRIVATE PATIENT DATA

Data Sharing Agreement

Private patient data must be de-identified before integration within public databases according to data sharing agreements

• Names
• All geographic subdivisions (except country)
• All elements of dates (except year)
• Telephone numbers
• Fax numbers
• Email addresses
• Social security numbers
• Medical record numbers
• Health plan beneficiary numbers
• Account numbers
• Certificate/license numbers
• Vehicle identifiers and serial numbers
• Device identifiers and serial numbers
• URL
• IP address
• Biometric identifiers, including finger and voice prints
• Full-face photographs and any comparable images
• Any other unique identifying number, characteristic, or
code
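
The list above can be applied as a simple scrubbing step before loading. The following sketch assumes a hypothetical record layout: every field name in `DIRECT_IDENTIFIERS`, the `address` dictionary with a `country` key, and the `scrub` function itself are illustrative assumptions, and a real schema would need its own mapping. It drops direct identifiers, keeps only the year of dates and keeps only the country of geographic data.

```python
from datetime import date

# Hypothetical field names standing in for the identifiers listed above.
DIRECT_IDENTIFIERS = {
    "name", "phone", "fax", "email", "social_security_number",
    "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}


def scrub(record: dict) -> dict:
    """Return a copy of the record without the listed private data."""
    clean = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if key == "address":
            clean["country"] = value.get("country")  # keep only the country
        elif isinstance(value, date):
            clean[f"{key}_year"] = value.year  # keep only the year
        else:
            clean[key] = value
    return clean


record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "birth_date": date(1970, 5, 17),
    "address": {"city": "Lyon", "country": "FR"},
    "diagnosis": "breast cancer",
}
public_record = scrub(record)
```

A deny-list like this only covers fields it knows about; the last item of the slide ("any other unique identifying number, characteristic, or code") still needs a review of each new data source.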



ONCOANALYTICS DISCOVERY

Use Cases Examples


• Molecular analysis
• Survivorship analysis
• Early detection of cancer
• Cancer drugs development
• Benchmarking and standardization of care
• Cancer prevention and recommendations
• Prediction with better precision
• Diagnosis and treatment plans
• …

Collecting data from various sources by detecting patterns and outliers with the help of guided advanced analytics and visual navigation of data, thus enabling consolidation of cellular, patient and population data

Cellular : looking for patterns in the data of individual cancer cells to discover genetic biomarkers. Finding common features could help us better predict how individual tumors might mutate and what drug treatments might be most effective

Patient : patient medical history and DNA data could be used to help define the best combination of therapies and recommendations for them, based on their tumor, their genes and the effects of treatments on patients with similar disease patterns and genetics

Population : population data can be analyzed to inform treatment strategies for patients based on their different lifestyles, geographies, and cancer types



ONCOANALYTICS CLINICAL

Use Cases Examples


• Patient diagnosis, treatments and supportive care
• Expenses and investments
• Billing and reimbursement
• Profits and margin
• Cost savings
• Quality control of care
• Return on investment
• Payment models
• Donation
• Forecasting
• Predictive financial modeling
• …

Expenses over time : diagnostics, treatments, care support, coordination, others
Making use of patient data to generate insights, take decisions, increase revenues, enhance care coordination, minimize abuse and fraud and save on costs

Investments over time : clinical trials, research, technology, others
Making use of investments to generate insights and enhance diagnosis and treatments



Which big data OncoAnalytics
architecture patterns ?
BIG DATA ONCOANALYTICS ARCHITECTURE PATTERNS

An integrated and modular architecture for OncoAnalytics

Definitions

• Big data analytics is the often complex process of examining large and varied data sets
• Big data warehouse is mainly technology, which stands on volume, velocity and variety of data sources
• Big data fabric is a system that provides seamless, real-time integration and access across the multiple data sources
• Big data management is the organization, administration and governance of large volumes of both structured and unstructured data
• Big data infrastructure is the consistent, efficient hardware architecture, massively parallel, highly scalable and available to handle very large data volumes up to several petabytes

Architecture layers (top to bottom)

• Users : Doctor, Researcher, Manager, Administrator
• BIG DATA ANALYTICS : Prescriptive Analytics, Predictive Analytics, Diagnostic Analytics, Descriptive Analytics, Visualization
• BIG DATA WAREHOUSE : Data Discovery, Data Clinical, Data Lake
• BIG DATA FABRIC : Extraction, Transformation, Loading
• Data sources : Patient, Trial and Research data

Cross-cutting layers

• BIG DATA MANAGEMENT : Governance, Administration
• BIG DATA INFRASTRUCTURE : Network, Storage, Servers



BIG DATA ANALYTICS PATTERNS

Helping doctors, researchers and managers to run different types of analytics, from dashboard and visualization to big data processing, real-time analytics, and machine learning to guide better decisions

Users : Doctor, Researcher, Manager

• Descriptive Analytics : using data aggregation and data mining to provide insight into the past
• Diagnostic Analytics : using techniques such as drill-down, data mining and correlations
• Predictive Analytics : using techniques such as statistics, predictive modeling and forecasting
• Prescriptive Analytics : using techniques such as graph analysis, simulation, complex event processing, machine learning, neural networks
• Visualization



BIG DATA WAREHOUSE PATTERNS

Centralizing all data at any scale with flexible software and available architecture for massively parallel data processing on a network of lower-cost commodity hardware

Appliance architecture

Stack (top to bottom) : Connectors, Languages, Software system, Database, Operating System, Hardware

Analytics Optimized System (Appliance)

• Software system and hardware integrated
• Languages (R, SQL, NoSQL…)
• Unstructured/structured data
• Components redundancy
• Massively Parallel Processing
• In-memory computing
• Resources management
• Partitioning, indexing
• Column database or HDFS
• Compression
• Connectivity (Hadoop…)
• Scalability, high availability
• Security

Storage patterns

Moving data processing to storage
• Improve data access performance
• Reduce network latency
• Improve data flows

Direct Attach Storage (DAS)
• Improve data access performance
• Storage inside each server (DAS)



BIG DATA FABRIC PATTERNS

Combining relevant data residing in different sources and providing doctors, researchers and managers with a unified view of them

Phases

• Extraction phase : extracts data from various data sources
• Transformation phase : transforms data to store it in a de-identified format or structure for the purposes of querying and analysis
• Loading phase : loads data within the big data warehouse
• Phases run as real-time streaming or batch processing

ETL (Data sources → Extract → Transform → Load → Database)
• While data is being extracted, the transformation phase is executed and the already received data are prepared for loading. As soon as there is some data ready to be loaded into the big data warehouse, the data loading kicks off without waiting for the completion of the previous phase

ELT (Data sources → Extract → Load → Database → Transform)
• While data is being extracted, the already received data are prepared for loading. As soon as there is some data ready to be loaded into the big data warehouse, the data loading kicks off and transformation is executed in-database without waiting for the completion of the previous phase
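
The two patterns can be contrasted with a short sketch. It uses Python generators so that each row is transformed and loaded as soon as it is extracted, and an in-memory SQLite database to stand in for the big data warehouse. The table names, the `patient` and `dose_mg` fields and the function names are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Iterable, Iterator


def extract(source: Iterable[dict]) -> Iterator[dict]:
    """Yield rows one at a time so later phases can start immediately."""
    for row in source:
        yield row


def transform(rows: Iterable[dict]) -> Iterator[tuple]:
    """Normalize each row as it arrives, without waiting for the full extract."""
    for row in rows:
        yield (row["patient"].upper(), float(row["dose_mg"]))


def run_etl(source: Iterable[dict], db: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load the finished rows."""
    db.execute("CREATE TABLE doses (patient TEXT, dose_mg REAL)")
    db.executemany("INSERT INTO doses VALUES (?, ?)", transform(extract(source)))


def run_elt(source: Iterable[dict], db: sqlite3.Connection) -> None:
    """ELT: load raw rows first, then transform inside the database."""
    db.execute("CREATE TABLE raw_doses (patient TEXT, dose_mg TEXT)")
    db.executemany(
        "INSERT INTO raw_doses VALUES (:patient, :dose_mg)", extract(source)
    )
    db.execute(
        "CREATE TABLE doses AS "
        "SELECT upper(patient) AS patient, CAST(dose_mg AS REAL) AS dose_mg "
        "FROM raw_doses"
    )


source_rows = [
    {"patient": "p1", "dose_mg": "2.5"},
    {"patient": "p2", "dose_mg": "10"},
]
query = "SELECT patient, dose_mg FROM doses ORDER BY patient"

etl_db = sqlite3.connect(":memory:")
run_etl(source_rows, etl_db)
etl_result = etl_db.execute(query).fetchall()

elt_db = sqlite3.connect(":memory:")
run_elt(source_rows, elt_db)
elt_result = elt_db.execute(query).fetchall()
```

Both runs end with the same `doses` table; the design difference is where the transformation work happens, in the integration layer for ETL or inside the warehouse engine for ELT.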



BIG DATA MANAGEMENT PATTERNS

An integrated, modular environment to manage application data and optimize data-driven applications over their lifetime

Administration

• Monitoring and scheduling system
• Hardware failure, disk or server crash, rack failure
• Data deletion, data corruption
• Site failure, disaster (fire, water, network, power…)
• Backup and restore management

Components : Monitoring, Backup/Restore, Storage Array, data replication from the primary site to the disaster recovery site, data backup, data restore

Governance

• Maintaining a full audit history across all data in a single place
• Tracking, classifying and locating data to comply with governance and compliance rules
• Visualizing the upstream and downstream lineage of data to verify reliability
• Defining and automating complex data lifecycle activities with integrated metadata policies
• Verifying access privileges
• Searching metadata and visualizing lineage
• Encrypting or decrypting data

Governance domains

• Organization : doctors and researchers relationship, people management and costs control
• Metadata : managing data about other data, generally referred to as content data (catalog, dictionary, taxonomy)
• Data Security : a way to maintain data integrity and to make sure that the data is not accessible by unauthorized parties or susceptible to corruption
• Data Quality : set of characteristics of data that must fulfill requirements : completeness, validity, accuracy, consistency, availability and timeliness
• Master Data Management : processes, policies, standards and tools that consistently define and manage the critical data to provide a single point of reference
• Data Life Cycle Management : managing information throughout its lifecycle, from requirements through retirement. Data archiving and lineage
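
Two of the data quality characteristics named above, completeness and validity, are easy to measure. The sketch below shows one possible way to compute them as ratios; the function names, the record fields and the rule that a valid grade is one of GX, G1 to G4 are assumptions made for the example, not definitions taken from a standard.

```python
from typing import Callable, Iterable


def _present(value) -> bool:
    """Treat None and the empty string as missing."""
    return value not in (None, "")


def completeness(records: list[dict], fields: Iterable[str]) -> float:
    """Share of expected field values that are actually filled in."""
    fields = list(fields)
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(_present(r.get(f)) for r in records for f in fields)
    return filled / total


def validity(records: list[dict], field: str, is_valid: Callable) -> float:
    """Share of present values for one field that pass a validation rule."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 1.0
    return sum(bool(is_valid(v)) for v in values) / len(values)


VALID_GRADES = {"GX", "G1", "G2", "G3", "G4"}

records = [
    {"age": 54, "grade": "G2"},
    {"age": None, "grade": "G9"},  # missing age, invalid grade
    {"age": 61, "grade": ""},      # missing grade
]

completeness_ratio = completeness(records, ["age", "grade"])  # 4 of 6 filled
grade_validity = validity(records, "grade", VALID_GRADES.__contains__)
```

Ratios like these can be tracked per data source over time, which gives governance a measurable signal rather than a one-off check.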



BIG DATA INFRASTRUCTURE PATTERNS

An efficient infrastructure, from traditional SQL database to big data : massively parallel, highly scalable and available to handle very large data volumes

Traditional Infrastructure

• SQL
• High availability
• Limited scalability
• Structured data
• Limited data volume
• Cluster
• Up to several terabytes

Limited vertical scalability (scale-up)
• CPU add-on
• RAM add-on
• Disks add-on
• I/O controller add-on

Limited horizontal scalability (scale-out)
• Server add-on
• Network switch add-on
• Appliance add-on

Big Data Infrastructure

• SQL and NoSQL
• Very high availability
• Infinite horizontal scalability
• Unstructured and structured data
• Massively Parallel Processing
• High data volume
• Grid
• Up to several petabytes

Limited vertical scalability (scale-up)
• CPU add-on
• RAM add-on
• Disks add-on
• I/O controller add-on

Infinite horizontal scalability (scale-out)
• Server add-on
• Network switch add-on
• Appliance add-on
• Rack add-on



BIG DATA CLOUD PATTERNS

An integrated, modular environment to manage application data and lower investment costs

Service models

• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)

Each model covers the same stack (Software, Database, OS Virtualization, Hardware), with responsibility split between the hospital and the cloud provider

Deployment models

PRIVATE CLOUD
The cloud is hosted within the hospital data center. It is tailored to needs and infrastructure. The data is located within the hospital data center. Investment costs are managed by the hospital

PUBLIC CLOUD
The hardware and software are owned and managed by a cloud services provider. It is fast and inexpensive to set up. It adapts quickly to the fluctuation of needs. De-identified data is hosted by the provider. Services are billed by use

HYBRID CLOUD
The hybrid cloud is a mix of private cloud and public cloud. Cloud services are distributed according to the needs. Costs are split between investment and services



Which big data OncoAnalytics
solutions ?
NO ONE-SIZE-FITS-ALL SOLUTION

The solution lies in greater collaboration, working together to use multiple software products, each chosen for specific features

• New tools and techniques are required to efficiently process all information as more data sources emerge
• Just as there is no one cure-all for cancer, there is no single tool for data analytics
• Supercomputing power is required to rapidly process huge volumes of structured and unstructured data

• The solution is classified into 6 domains :

1. BIG DATA FABRIC
2. BIG DATA WAREHOUSE
3. BIG DATA ANALYTICS
4. BIG DATA MANAGEMENT
5. BIG DATA HADOOP
6. BIG DATA CLOUD PLATFORM



BIG DATA MARKET ANALYSIS

BIG DATA FABRIC
Big Data Fabric, Q2 2018 (Source : Forrester)
• Provides a single enterprise-class solution for data integration, data quality, data profiling and text data processing
• Integrates data from multiple, varied sources
• Provides rich connectivity to many sources and targets
• Manages services

BIG DATA ANALYTICS
Big Data Predictive Analytics and Machine Learning Solutions, Q3 2018 (Source : Forrester)
• Discovers hidden insights with a personalized analytics experience
• Provides high performance analytics, analytical data preparation, data discovery
• Provides superfast processing for large-scale data manipulation, exploration, advanced analytics, artificial intelligence and machine learning
• Manages services

BIG DATA WAREHOUSE
Big Data Warehouse, Q2 2017 (Source : Forrester)
• Provides a flexible, high-performance, secure platform
• Delivers a powerful ready-to-run enterprise platform that is pre-configured and optimized specifically for big data
• Builds a platform for big data storage, refinement and analytics of structured and unstructured data
• Connects to Hadoop
• Manages services

BIG DATA MANAGEMENT
Data Governance Stewardship and Discovery Providers, Q2 2017 (Source : Forrester)
• Complies with regulations
• Executes health center programs based on secured, quality data
• Ensures analytics and reports can be trusted
• Leverages understanding and knowledge with data and context
• Measures, analyzes and visualizes the care programs
• Executes administration processes (backup, disaster recovery…)



BIG DATA HADOOP MARKET ANALYSIS

HADOOP FRAMEWORK

• HDFS : scalable, fault-tolerant, high-performance distributed file system. The namenode holds filesystem metadata. Files are broken up and spread over the datanodes
• MAPREDUCE : software framework for distributed computation. The JobTracker schedules and manages jobs. The TaskTracker executes individual map and reduce tasks on each cluster node
• YARN : foundation of the new generation of Hadoop. Architectural center that allows multiple data processing engines

HADOOP ADD-ON

• SPARK : fast and general engine for large-scale data processing. Faster than MapReduce. Combines SQL, streaming, and complex analytics
• STORM : a system for processing streaming data in real time that adds real-time data processing capabilities to Enterprise Hadoop
• KAFKA : a distributed streaming platform. Kafka brokers massive message streams for low-latency analysis in Enterprise Apache Hadoop

BIG DATA HADOOP
Big Data Hadoop-Optimized Systems, Q2 2016 (Source : Forrester)

• Delivers quick setup, higher performance and automation for running Hadoop
• Optimizes the infrastructure with automation, balanced system resources, and integrated testing
• Runs the Hadoop framework
• Uses Apache Spark and Storm as options
• Manages services
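
The map and reduce tasks described for MAPREDUCE can be illustrated without a cluster. The sketch below is a single-process imitation of the programming model in plain Python, not the Hadoop API: it counts how many samples carry a mutation in each gene. The sample identifiers and the per-sample gene lists are made-up illustration data, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: tuple[str, list[str]]) -> Iterator[tuple[str, int]]:
    """Map task: emit (gene, 1) for every mutated gene in one sample."""
    _sample_id, mutated_genes = record
    for gene in mutated_genes:
        yield gene, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce task: combine all values emitted for one gene."""
    return key, sum(values)


def map_reduce(records: Iterable[tuple[str, list[str]]]) -> dict[str, int]:
    pairs = chain.from_iterable(map_phase(record) for record in records)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


samples = [
    ("sample-1", ["TP53", "KRAS"]),
    ("sample-2", ["TP53"]),
    ("sample-3", ["BRCA1", "TP53"]),
]
mutation_counts = map_reduce(samples)
```

In a real cluster the map calls run in parallel on the nodes holding the data blocks and the shuffle moves data over the network; the structure of the computation is the same.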



BIG DATA CLOUD MARKET ANALYSIS

Big data fabric, big data warehouse, big data analytics and big data management integrated in a single OncoAnalytics Cloud Platform

Examples (provider, cloud, product)

• IBM, IBM CLOUD : Watson for Oncology
• IBM, IBM CLOUD : Watson for Genomics
• FLATIRON, AWS CLOUD : OncoCloud
• VARIAN, MICROSOFT CLOUD : OncoPeer
• ASCO, SAP CLOUD : CancerLinQ
• SEVEN BRIDGES, GOOGLE CLOUD : Cancer Genomics Cloud
• GOOGLE, GOOGLE CLOUD : Google Genomics

BIG DATA CLOUD PLATFORM
Forrester Wave : Global Public Cloud Platforms For Enterprise Developers, Q3 2016 (Source : Forrester)

• Provides a cloud solution with massive volumes of medical documents and patient data
• Integrates big data fabric, big data warehouse, big data analytics and big data management
• Provides clinicians with artificial intelligence for open-domain question answering
• Loads de-identified patient data
• Manages cloud services



Which Big data OncoAnalytics
solution examples ?
BIG DATA ONCOANALYTICS ARCHITECTURE

Users : Doctor, Researcher, Manager, Administrator

Software Architecture

• Data Discovery cloud platform
• Data Lake Hadoop appliance
• Data Clinical appliance
• Data Integration software
• Data Analytics software
• Data Management software
• Software support

Technical Architecture

• Servers
• Storage
• InfiniBand & Ethernet Network
• Hardware support

Architecture layers (top to bottom)

• BIG DATA ANALYTICS : Prescriptive Analytics, Predictive Analytics, Diagnostic Analytics, Descriptive Analytics, Visualization
• BIG DATA WAREHOUSE : Data Discovery, Data Clinical, Data Lake
• BIG DATA FABRIC : Extraction, Transformation, Loading
• Data sources : Patient, Trial and Research data

Cross-cutting layers

• BIG DATA MANAGEMENT : Governance, Administration
• BIG DATA INFRASTRUCTURE : Network, Storage, Servers



BIG DATA ONCOANALYTICS DATA FLOWS

[Figure : data flows from SOURCES to the DATA LAKE, then to DISCOVERY and CLINICAL, then to ANALYTICS, under MANAGEMENT (Administrator)]

• Sources : public data, private patient data, private trial data, private research data
• Data labels in the flow : public data, de-identified public data, private data
• Discovery analytics users : Researcher, Doctor
• Clinical analytics users : Doctor, Manager



BIG DATA ONCOANALYTICS INFRASTRUCTURE

Users: Doctor, Researcher, Manager, Administrator (Web Browser, 1 to 10Gb Ethernet)

PRIMARY SITE
• MANAGEMENT, INTEGRATION, ANALYTICS: Web/Application Servers
• BACKUP: Backup Array
• DATA LAKE: Hadoop Appliance
• CLINICAL: Analytics Appliance

SECONDARY SITE
• MANAGEMENT, INTEGRATION, ANALYTICS: Web/Application Servers
• CLINICAL: Analytics Appliance
• DATA LAKE: Hadoop Appliance
• BACKUP: Backup Array

DISCOVERY: Oncology Cloud Platform (Internet)

Networks: 40Gb InfiniBand, 10Gb Ethernet, Internet

Asynchronous data replication from primary to secondary site

* Persistent Staging Area
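
Asynchronous replication from the primary to the secondary site means a write is acknowledged on the primary without waiting for the secondary copy. The Python sketch below illustrates that idea only; the `AsyncReplicator` class and its in-memory dictionaries are hypothetical stand-ins and say nothing about how the actual appliances replicate data.

```python
import queue
import threading


class AsyncReplicator:
    """Writes land on the primary at once; a background worker copies them to the secondary."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = queue.Queue()
        # Daemon thread so the worker never blocks interpreter shutdown.
        self._worker = threading.Thread(target=self._replicate, daemon=True)
        self._worker.start()

    def write(self, key, value):
        """Store on the primary and queue the change for later replication."""
        self.primary[key] = value
        self._pending.put((key, value))

    def _replicate(self):
        # Drain the queue forever, applying each change to the secondary.
        while True:
            key, value = self._pending.get()
            self.secondary[key] = value
            self._pending.task_done()

    def wait_until_replicated(self):
        """Block until every queued change has reached the secondary."""
        self._pending.join()


replicator = AsyncReplicator()
replicator.write("study-1", "version 1")
replicator.write("study-1", "version 2")
replicator.wait_until_replicated()
```

The trade-off of this design is that the secondary site may briefly lag behind the primary, so a failover can lose the changes still waiting in the queue.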



How to get started with a big data
OncoAnalytics project ?
BIG DATA ONCOANALYTICS PROJECT - GETTING STARTED

• Gain big data insight – understand the concepts of big data, its domains, needs, opportunities,
market and social behavior
• Talk with key people – doctors, researchers, managers…
• Identify new skills and competencies – data scientists, architects…
• Identify alliances – service providers, software and hardware vendors
• Build a case – there are few public proof points or metrics to leverage, so create most of it from scratch;
focus on a single problem and only a handful of metrics
• Use internal data first – Electronic Medical Records already exist in the care center. Integrate external
data as a second step
• Evangelize big data in financial and social terms – prepare an evangelization deck that explains how the
care center will benefit from big data and the financial and social opportunities it creates. The
objective is for clinicians to embrace it and include it in their plans. Make it friendly
• Identify a sponsor – this is the challenge with big data technology: look for someone dynamic
who understands the stakes and believes that technology can drive competitive advantage
• Capture metrics and use them to tell a story – identify only a few metrics to measure and
tell a story; people will remember the story long after they forget the numbers in the case
• Emphasize the big data opportunity – some people can't see big data, and it's hard to
get passionate about abstract concepts. Visualize the problem and the opportunity.
Demonstrate a big data project and show what new results it will produce. A picture is
worth a thousand words



BIG DATA ONCOANALYTICS PROJECT

Consulting (IT Consultant, IT Architect)
• Oncology goals
• Oncology needs
• Actors and roles
• Data sources
• Data flows
• Data model
• Data volume
• Software and tools
• Macro architecture
• Documentation

Designing (IT Consultant, IT Architect)
• Data structures
• Data flows
• Algorithms and data processing
• Dashboards and reports
• Architecture design
• Data administration and governance
• Test plan
• Documentation

Building (IT Developer, IT Engineer)
• Databases building
• Data integration
• Data analytics
• Dashboards and reports
• Data administration and governance
• Application deployment
• Hardware, software installation
• Tests
• Documentation

Running (IT Engineer, IT Administrator)
• Hardware and software support according to the Service Level Agreement
• Hardware requirements
• Monitoring and scheduling
• Data security
• Master data
• Data quality
• Data life cycle
• Data backup and disaster recovery
• Documentation

Managing (IT Manager)
• Planning
• Team organization and coordination
• Meetings with doctors, researchers, managers and IT people



Which big data OncoAnalytics
value proposition ?
BIG DATA ONCOANALYTICS VALUE PROPOSITION

• Clinicians, researchers and IT engineers working together to fight cancer
• Patient diagnosis, treatment and care support plans
• Cancer predictions and recommendations
• High-quality, cost-efficient care
• Best practices, publications and cancer literature shared across
many care centers and laboratories
• EMR and EHR integration (best practices, documents, genomics,
biology, radiology, pharmacology, financial…)
• Advanced algorithms based on natural language, artificial
intelligence, machine learning and deep learning
• Unified discovery and clinical platform integrating hardware,
software and value-added services
• Available, scalable and flexible high-performance platform
• Hardware and software vendor partnerships
• Reference architectures



KEY LINKS

• https://www.iarc.fr
• https://en.e-cancer.fr
• https://www.cancer.org
• https://www.cancer.gov
• https://www.cancer.net
• globalcancermap.com
