
Data Processing Pipeline

School of Data Skill Set

schoolofdata.org

Data Sources

Discovery and Acquisition

Extraction

Cleansing, Transformation and Integration

Analytical Modeling

Presentation, Analysis, Publishing

N web pages Data Relevance

N Searching and Finding N Crowd Sourcing

1 Loading to Data Store 1

Data Formats and Standards

1 Data Granularity

1 2 Using Reference Data

1 Pivoting

1 Visualisation and Plotting N

3 Sorting and Filtering 1 N Story Telling 1

text documents

N 1 Completeness and Stopping 1 Manual Digitisation

2 Scraping

1 Merge/Join

2 Mapping 2

Handling Manual Corrections

2 OLAP

2 Business Rules 2 Regression 2 Clustering Graph/Network Metrics Outliers

Visualisation Method Selection

2 Map Geo-Tagging

structured documents

2 Parsing 2 Crawling 2 Automation Normalisation

2 Entity Uniqueness

Treating Duplicates

Publishing Online

Audit of Existing Resource databases

2 Indexing and Optimisation 2 3

Handling Changing Dimensions

3 Bulk Digitisation scientific data Natural Language Processing

Concept Modelling

3 Fuzzy Matching Simulation 3

3 Data Pipes

Governance

2 ETL Process Management

2 Data Quality Management

N Auditability and Provenance

2 Reference Data Management

1 Metadata

Tools and Technologies


SQL

1 Programming Basics

2 Advanced Programming

Level: N = non-technical, 1 = beginner, 2 = advanced, 3 = expert

License: CC BY-SA

Stefan Urbanek @Stiivi v0.3
