Azure SQL DWH
A closer look into Microsoft's DWH solution
About me
• Shy Engelberg, CTO @
• Email: Shy@Valinor.co.il
• Phone: 054-771-711-5
• Twitter: @ShyEngelberg
2 | 4/4/2016 Azure SQL DWH

Agenda
• SQL DWH introduction
• Architecture
• Creating a DWH
• Loading data
• Tools
• Summary

Objectives
• Know what Azure SQL Data Warehouse is
• Know how Azure SQL Data Warehouse works
• Know how to create and connect to Azure SQL Data Warehouse
• Know the basic tools and methods to get started with developing
• Identify scenarios that this solution might suit

The data warehouse fairytale
• Once upon a time, a data warehouse was an appliance that required fixed combinations of storage and compute, often underutilizing expensive resources.
• Meaning: monstrous hardware was lying unused.
What is Azure SQL Data Warehouse
• An enterprise-class, distributed database capable of processing massive volumes of relational and non-relational data.
• It is the industry's first cloud data warehouse that combines proven SQL capabilities with the ability to grow, shrink, and pause in seconds.

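– Cloud: an Azure PaaS
– Distributed: an MPP
– Massive volumes: up to PBs
– Relational and non-relational: a relational DB that can also query non-relational data
– Proven SQL capabilities: based on the product we know and love
– Grow, shrink, and pause: use what you need, when you need it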

• Easily deploys in seconds.
• Pay for query performance only when you need it (or you can pause it completely).
• Fully managed service, removes the hassle of software patching, maintenance, and backups.

SQL Data Warehouse uses Microsoft's massively parallel processing (MPP) architecture. You pay for time-to-insight, not hardware. (Details are a few slides ahead.)

Using PolyBase, leverage Transact-SQL to query seamlessly across both relational data in a relational database and non-relational data in common Hadoop formats.

SQL Data Warehouse is based on the proven relational database engine of SQL Server and includes the features you expect, including stored procedures, UDFs, partitioning, indexes, and collations.
If you already know Transact-SQL, it's easy to transfer your knowledge to SQL Data Warehouse.
You can grow or shrink compute power in minutes. Take full advantage of storage at cloud scale, and apply query compute based on changing performance needs.
When compute is paused, you pay only for storage.

Architecture
• At its core, SQL Data Warehouse uses Microsoft's massively parallel processing (MPP) architecture, originally designed to run some of the largest on-premises enterprise data warehouses.

Architecture – MPP
The coordinated processing of a program by multiple processors working on different parts of the program.
Each processor has its own operating system and memory.
[Figure: Mission: process a lot of data. The SMP way vs. the MPP way.]
[Figure: Scale for better performance. The SMP way vs. the MPP way.]
• Breaks large queries across nodes for simultaneous processing.
• Every node is "working" on a local subset of the data.
• Capable of higher data ingestion rates through parallelization.
• Scale horizontally by adding nodes, rather than moving to a server with more CPUs or higher storage capacity.
• Unlike SMP, there is no single bottleneck.
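The MPP idea above can be sketched in a few lines of Python. This is only an illustration, not how the service is implemented: threads stand in for nodes, and the names `split_across_nodes`, `node_partial_sum`, and `mpp_sum` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_across_nodes(rows, node_count):
    """Give each node its own local subset of the rows (simple striping)."""
    return [rows[i::node_count] for i in range(node_count)]


def node_partial_sum(local_rows):
    """Work done by one node: aggregate only its local subset."""
    return sum(local_rows)


def mpp_sum(rows, node_count):
    """Run the per-node work side by side, then combine the partial results."""
    subsets = split_across_nodes(rows, node_count)
    # Threads are a stand-in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, subsets))
    return sum(partials)


if __name__ == "__main__":
    print(mpp_sum(list(range(1, 1001)), node_count=4))
```

Adding nodes only changes how many subsets exist; the combining step stays the same, which is what makes horizontal scaling straightforward.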
Architecture – Azure SQL Data Warehouse
• SQL Data Warehouse independently scales compute and storage.
• This concept is what allows us to pause compute, scale performance in seconds, and pay only for the performance we need.
[Diagram: a Control node (MPP engine) and Compute nodes 1 and 2. Every node runs SQL Server alongside the Data Movement Service; the Control node holds the control DBs, Master, and TempDB; the Compute nodes work on user data kept in Azure blob storage.]
• The Control node "controls" the system.
• It is the front end that interacts with all applications and connections.
• It is powered by SQL Database, and connecting to it looks and feels the same.
• Under the surface, the Control node coordinates all of the data movement and computation required to run parallel queries on your distributed data.
• When you submit a T-SQL query to SQL Data Warehouse, the Control node transforms it into separate queries that will run on each Compute node in parallel.

• The Compute nodes are SQL Databases which process your query steps and manage your data.
• The Compute nodes are the workers that run the parallel queries on your data.
• After processing, they pass the results back to the Control node.
• To finish the query, the Control node aggregates the results and returns the final result.

• Data Movement Service (DMS) is our technology for moving data between the nodes.
• DMS gives the Compute nodes access to data they need for joins and aggregations.
• DMS is not an Azure service. It is a Windows service that runs alongside SQL Database on all the nodes.
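Why a join needs data movement at all can be shown with a toy shuffle. The sketch below is an assumption-laden illustration, not the DMS algorithm: rows are plain tuples whose first column is an integer join key, nodes are Python lists, and all function names are invented for the example.

```python
from collections import defaultdict


def shuffle_by_key(rows_per_node, key_index, node_count):
    """Move every row to the node chosen by its join key."""
    target = [[] for _ in range(node_count)]
    for rows in rows_per_node:
        for row in rows:
            target[hash(row[key_index]) % node_count].append(row)
    return target


def local_join(left_rows, right_rows):
    """Join two row sets that already sit on the same node (key is column 0)."""
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[0]].append(row)
    return [l + r[1:] for l in left_rows for r in by_key.get(l[0], [])]


def distributed_join(left_per_node, right_per_node):
    """Shuffle both sides by key, then let every node join its own rows."""
    node_count = len(left_per_node)
    left = shuffle_by_key(left_per_node, 0, node_count)
    right = shuffle_by_key(right_per_node, 0, node_count)
    joined = []
    for node in range(node_count):
        joined.extend(local_join(left[node], right[node]))
    return joined


if __name__ == "__main__":
    orders = [[(1, "book")], [(2, "pen"), (1, "ink")]]
    customers = [[(2, "Dana")], [(1, "Avi")]]
    print(sorted(distributed_join(orders, customers)))
```

After the shuffle, rows with the same key are on the same node, so each node can join locally without seeing the rest of the data.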

• Data is stored in Azure Storage Blobs.
• When Compute nodes interact with data, they write and read directly to and from blob storage.
• Since Azure storage expands transparently and limitlessly, SQL Data Warehouse can do the same.
• Since compute and storage are independent, SQL Data Warehouse can automatically scale storage separately from scaling compute, and vice versa.
• Azure Storage is fully fault tolerant.
Architecture – scaling
• Since each compute node only works on a subset of the data, if we want to scale, all we need to do is add more compute nodes.
• The "magic" is that we can add more compute nodes without moving (redistributing) the data.
• The scaling takes only a couple of minutes (initializing the compute node).
Architecture – scaling
• Changing the amount of compute is as simple as moving a slider to the left or right, but it can also be scheduled using T-SQL or PowerShell.
• Compute usage in SQL Data Warehouse is measured using SQL Data Warehouse Units (DWUs).
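For the scheduled route, a small helper can produce the T-SQL to run at a given hour. This is a sketch under assumptions: the `ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...)` form should be checked against the current documentation, and the database name, the DWU levels, and the schedule are hypothetical.

```python
# Hypothetical schedule: hour of day (0-23) -> DWU level to switch to.
SCHEDULE = {7: 400, 19: 100}


def scale_statement(database, dwu):
    """Build the T-SQL that changes a database's service objective (DWU level)."""
    if dwu <= 0:
        raise ValueError("dwu must be a positive integer")
    return f"ALTER DATABASE [{database}] MODIFY (SERVICE_OBJECTIVE = 'DW{dwu}');"


def statement_for_hour(database, hour, schedule=SCHEDULE):
    """Return the statement to run at this hour, or None if nothing is due."""
    dwu = schedule.get(hour)
    return None if dwu is None else scale_statement(database, dwu)


if __name__ == "__main__":
    print(statement_for_hour("MyDwh", 7))
```

A scheduler (an agent job, a cron-like service, or a PowerShell script) would call this once an hour and execute the statement when one is returned.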
Architecture – data distribution
• All tables are distributed.
• Each distribution is like a bucket, storing a unique subset of the data.
• For now, SQL DW has 60 distributions.
Each table is divided into 60 different distributions from the moment it's created.
When there's only one compute node, it holds all distributions; when there are more, the distributions are spread among them.
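How 60 fixed distributions can be spread over a changing number of compute nodes is easy to picture in code. The modulo rule below is an assumption chosen for simplicity; the deck does not say how the service actually assigns distributions to nodes.

```python
DISTRIBUTION_COUNT = 60  # from the slide: "For now, SQL DW has 60 distributions"


def assign_distributions(node_count):
    """Spread the fixed set of distributions over the compute nodes."""
    assignment = {node: [] for node in range(node_count)}
    for distribution in range(DISTRIBUTION_COUNT):
        # Assumed rule: deal distributions out to the nodes in turn.
        assignment[distribution % node_count].append(distribution)
    return assignment


if __name__ == "__main__":
    for nodes in (1, 2, 6):
        sizes = [len(d) for d in assign_distributions(nodes).values()]
        print(nodes, "node(s):", sizes)
```

The contents of each distribution never change here; scaling only changes which node owns which distribution, which matches the point that no data has to be redistributed.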
Architecture – finally
• Bring all the data you want, pay only for the storage.
• If you want to query your data (duh), pay only for the compute you need, when you need it.
• Classic MPP design: scaling is almost linear.
• We don't really need to know how many nodes or distributions there are under the covers. It's a PaaS; we are guaranteed a certain amount of performance.
Creating a DWH
DEMO
• Creating one is as simple as 1, 2, 3…
• A DWH is defined in a "Server", just like Azure SQL Database.
• Pause and scale are a button away.
Creating a table – distribution
• When a table is created, it is spread across all of the distributions.
• We need to choose how to distribute the data:
– Hash
– Round-robin (evenly but randomly, the default behavior)
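The difference between the two options can be sketched as follows. CRC32 is only a stand-in here, since the deck does not say which hash function the service uses, and the function names are invented for the example.

```python
import zlib
from itertools import count

DISTRIBUTION_COUNT = 60


def hash_distribution(key):
    """Deterministic: the same key always maps to the same distribution."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTION_COUNT


def make_round_robin():
    """Return a function that ignores the row and cycles through distributions."""
    counter = count()
    return lambda _row: next(counter) % DISTRIBUTION_COUNT


if __name__ == "__main__":
    place = make_round_robin()
    print([place(row) for row in ("a", "b", "c")])
    print(hash_distribution("customer-42") == hash_distribution("customer-42"))
```

Hash distribution keeps all rows with the same key together, which helps joins and aggregations on that key but can skew if a few keys dominate. Round-robin always spreads rows evenly but gives no co-location.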
Creating a table – type
• The default behavior is that a table is created with a clustered columnstore index (which makes Azure SQL DWH a columnar database).
• Choose the type of table at creation time.
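Both choices, distribution and table type, end up in the `WITH` clause of `CREATE TABLE`. The helper below is a sketch that only assembles the statement text; the table and column names are hypothetical, and the exact option spelling should be verified against the product documentation.

```python
def create_table_statement(table, columns, distribution_column=None,
                           index="CLUSTERED COLUMNSTORE INDEX"):
    """Assemble a CREATE TABLE statement with distribution and index options.

    columns is a list of (name, sql_type) pairs. Without a distribution
    column the table is round-robin distributed.
    """
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    if distribution_column is None:
        distribution = "ROUND_ROBIN"
    else:
        distribution = f"HASH({distribution_column})"
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"WITH (DISTRIBUTION = {distribution}, {index});"
    )


if __name__ == "__main__":
    print(create_table_statement(
        "dbo.FactSales",
        [("CustomerId", "INT NOT NULL"), ("Amount", "DECIMAL(10, 2)")],
        distribution_column="CustomerId",
    ))
```

Passing `index="HEAP"` instead would ask for a heap table, for example for a staging table that is loaded and then dropped.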
Creating a table – statistics
• Statistics are not created automatically; we have to create them ourselves!
• Statistics are not updated automatically!
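Since statistics are manual, it helps to generate the statements rather than type them. This is a sketch with hypothetical table and column names and an invented naming convention (`stats_<table>_<column>`); it only builds the T-SQL text.

```python
def create_statistics_statements(table, columns):
    """One single-column CREATE STATISTICS statement per listed column."""
    short_name = table.split(".")[-1]
    return [
        f"CREATE STATISTICS stats_{short_name}_{col} ON {table} ({col});"
        for col in columns
    ]


def update_statistics_statement(table):
    """Refresh all statistics on a table, for example after a load."""
    return f"UPDATE STATISTICS {table};"


if __name__ == "__main__":
    for statement in create_statistics_statements("dbo.FactSales", ["CustomerId", "SaleDate"]):
        print(statement)
    print(update_statistics_statement("dbo.FactSales"))
```

A reasonable habit is to create statistics on join, filter, and group-by columns right after creating the table, and to run the update statement at the end of every load.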

Connecting and creating a table
DEMO
 Add firewall rule
 Connect using SSDT
 Create a table
 Create statistics
Loading data
• SQL Data Warehouse supports many loading methods:
– SSIS
– BCP
– SQLBulkCopy API
– Azure Data Factory
– PolyBase
• The "Push" methods: a query that goes through the Control node, which becomes a bottleneck (single-client gated).
• PolyBase: by far the fastest and most scalable SQL Data Warehouse loading method to date.
Loading data – PolyBase
• PolyBase is a scalable query processing framework compatible with Transact-SQL that can be used to combine and bridge data across relational database management systems and Azure Blob Storage.
• Currently PolyBase can load data from UTF-8 encoded delimited text files as well as the popular Hadoop file formats RC File, ORC, and Parquet.
• PolyBase can load data from gzip, zlib, and Snappy compressed files.
• A "Pull" method.
• Every compute node has an HDFS bridge in the DMS service. Every bridge can connect to external resources in parallel.
• PolyBase data loading is not limited by the Control node, so as you scale out your DWU, your data transfer throughput also increases.
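The difference between push and pull loading can be put into a back-of-the-envelope model. All numbers below are hypothetical and the linear scaling is idealized; the sketch only shows which quantity depends on the node count.

```python
def push_throughput(node_count, control_node_mb_s):
    """Push loads funnel through the Control node, so nodes do not help."""
    return control_node_mb_s


def pull_throughput(node_count, per_node_mb_s):
    """Pull loads let every Compute node read its share in parallel."""
    return node_count * per_node_mb_s


if __name__ == "__main__":
    # Hypothetical rates, in MB/s, purely for illustration.
    for nodes in (1, 2, 6):
        print(nodes, push_throughput(nodes, 100), pull_throughput(nodes, 50))
```

With these made-up rates a single node loads faster by push, but pull overtakes it as soon as more than two nodes are reading in parallel.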
• A recommended way of loading data:
1. Write your source data to CSV files.
2. Put the files in Azure Blob Storage.
3. Load using PolyBase.
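Step 1 can be sketched with the Python standard library alone. The function name, the file name, and the choice to omit a header row are assumptions for this example; it writes UTF-8, comma-delimited, gzip-compressed text, which is one of the formats listed above.

```python
import csv
import gzip


def write_rows_as_gzip_csv(path, rows):
    """Write rows as UTF-8, comma-delimited, gzip-compressed text."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        # No header row is written: an assumption, so that every line is data.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    write_rows_as_gzip_csv("sales.csv.gz", [(1, "book", 9.5), (2, "pen", 1.25)])
    # Step 2: upload sales.csv.gz to a blob container with your usual tool.
    # Step 3: load it through PolyBase from inside the warehouse.
```

Splitting a large export into several files is usually worth it, since more files give the compute nodes more to read in parallel.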
DEMO
 Query using PolyBase
 CTAS to load data using PolyBase
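The CTAS step from the demo can be sketched as a statement builder. It assumes an external table over the blob files has already been defined; `ext.Sales`, `dbo.Sales`, and `SaleId` are hypothetical names, and the statement text should be checked against the documentation before use.

```python
def ctas_statement(target_table, external_table, distribution_column):
    """Build a CREATE TABLE AS SELECT that copies an external table's rows."""
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (DISTRIBUTION = HASH({distribution_column}), "
        f"CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {external_table};"
    )


if __name__ == "__main__":
    print(ctas_statement("dbo.Sales", "ext.Sales", "SaleId"))
```

Querying the external table directly reads the files on every run, while the CTAS copy stores the rows inside the warehouse in distributed form.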
Azure SQL Data Warehouse – features
This is SQL Server inside – you know it all.
• Uses almost the same T-SQL.
• Supports views, stored procedures, partitions, and many other known and loved features.
• Built-in HADR (it's a PaaS, remember?)
• Out-of-the-box backup and restore service.
Azure SQL Data Warehouse – tools
• SSMS is not yet supported (but very soon).
• Visual Studio (SSDT) is supported.
• Integrates easily with Azure ML, Power BI, and Data Factory.
• Many 3rd party solutions are available.
Azure SQL Data Warehouse – Summary
• A relational columnar DWH.
• An MPP service that allows us to scale compute separately from storage.
• We can pause the compute whenever needed.
• Using the infinite power of the cloud, we can process as much data as we want, and use as much power as we want.
• We don't need to buy expensive hardware.
Azure SQL Data Warehouse – Summary
Best for:
• Your data is already in Azure.
• You need scheduled computing power.
• You have a lot of data, but don't want to spend a lot of money.
Please fill in the online evaluations for both the speakers and the overall event.
You have both links in the last EVENT UPDATE email:
Session evaluation form:
http://www.sqlsaturday.com/481/sessions/sessionevaluation.aspx
Overall event evaluation form:
http://www.sqlsaturday.com/481/EventEval.aspx
Special thanks to our great sponsors!
