DATA ENGINEERING FOR EVERYONE
Hadrien Lacroix
Content Developer at DataCamp
A general definition
Data processing: converting raw data into meaningful information
Remove unwanted data. Example: no long-term need for testing feature data
Optimize memory, process and network costs. Example: can't afford to store and stream files this big
Convert data from one type to another. Example: convert songs from .flac to .ogg
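Two of the tasks above, removing unwanted data and converting data from one type to another, can be sketched in a few lines. This is only an illustration: the records, the field names (`title`, `duration_s`, `is_test`) and the `process` helper are hypothetical and do not come from the slides.

```python
from typing import Dict, List

# Hypothetical raw records; field names and values are made up for illustration.
RAW_SONGS: List[Dict[str, str]] = [
    {"title": "Song A", "duration_s": "215", "is_test": "false"},
    {"title": "Test feature track", "duration_s": "10", "is_test": "true"},
    {"title": "Song B", "duration_s": "187", "is_test": "false"},
]


def process(records: List[Dict[str, str]]) -> List[dict]:
    """Drop test records and convert the duration from text to an integer."""
    cleaned = []
    for record in records:
        if record["is_test"] == "true":
            continue  # remove unwanted data
        cleaned.append(
            {
                "title": record["title"],
                "duration_s": int(record["duration_s"]),  # convert str to int
            }
        )
    return cleaned


songs = process(RAW_SONGS)
print(songs)
```

The same pattern (filter, then convert) applies to real pipelines, only with larger inputs and dedicated tools.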
What it consists of
Scheduling
Can apply to any task listed in data processing
Sensor scheduling: automatically run if a specific condition is met
Example: update the department tables if a new employee was added
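Sensor scheduling can be sketched as a loop that keeps checking a condition and runs the task once the condition holds. The sketch below makes assumptions throughout: `run_when`, the in-memory `employees` list and the `update_department_tables` function are hypothetical stand-ins, not part of any real scheduler.

```python
import time
from typing import Callable


def run_when(condition: Callable[[], bool], task: Callable[[], None],
             poll_seconds: float = 0.01, max_checks: int = 100) -> bool:
    """Poll `condition` and run `task` once as soon as it is true.

    Returns True if the task ran, False if the condition never held.
    """
    for _ in range(max_checks):
        if condition():
            task()
            return True
        time.sleep(poll_seconds)
    return False


# Hypothetical state standing in for an employee table and department tables.
employees = ["Ada"]
known_count = len(employees)
department_updates = []


def new_employee_added() -> bool:
    return len(employees) > known_count


def update_department_tables() -> None:
    department_updates.append(list(employees))


ran_before = run_when(new_employee_added, update_department_tables, max_checks=3)
employees.append("Grace")  # a new employee is added
ran_after = run_when(new_employee_added, update_department_tables, max_checks=3)
print(ran_before, ran_after, department_updates)
```

The sensor has to keep checking, so it uses some resources even while nothing happens. That is the trade-off compared with running the task at a fixed time.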
Parallel computing
Basis of modern data processing tools
Necessary:
Mainly because of memory
How it works:
Split tasks up into several smaller subtasks
Advantages
Extra processing power
Disadvantages
Moving data incurs a cost
Communication time
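The idea of splitting a task into smaller subtasks and combining the partial results can be sketched as follows. Assumptions: the `split` and `subtask` helpers are hypothetical, and threads are used only to keep the example short. In CPython, CPU-bound work usually needs separate processes (for example `ProcessPoolExecutor`) to gain real processing power.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(items: List[int], n_chunks: int) -> List[List[int]]:
    """Split a list into at most n_chunks contiguous, roughly equal chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def subtask(chunk: List[int]) -> int:
    """Work done independently on one chunk: sum of squares."""
    return sum(x * x for x in chunk)


data = list(range(1000))
chunks = split(data, 4)  # split the task into smaller subtasks

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(subtask, chunks))  # run subtasks concurrently

total = sum(partial_results)  # combine the partial results
print(total)
```

Splitting the data and collecting the partial results are the steps where the cost of moving data and the communication time show up. For small inputs, that overhead can outweigh the gain.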
Cloud computing for data processing
Servers on premises
Bought
Processing power unused at quieter times
Servers on the cloud
Rented
The closer to the user the better
Actually, YOU are the champion!
How important it is
Parallel computing
Cloud computing