
Big Data & Machine Learning

<Start Training>
Module #1: Introducing Google Cloud Platform
</Start Training>

Version #1.1

Cloud OnBoard
Agenda

Introduction to Google Cloud Platform
Quiz

Computing trends toward pay-as-you-go, fully automated services

Now: storage, processing, memory, and network are user-configured, managed, and maintained.
Next: storage, processing, memory, and network are fully automated (serverless).

Every company is a data company.

GCP offers a range of computing architectures

Compute Engine: IaaS
Kubernetes Engine: Hybrid
App Engine: PaaS
Cloud Functions: Serverless logic
Managed services: Automated elastic resources

The range runs from managed infrastructure (Compute Engine) toward dynamic infrastructure (managed services).

Google network: 100,000s of km of fiber cable, 8 subsea cables

Network sea cable investments:
● Unity (US, JP) 2010
● SJC (JP, HK, SG) 2013
● FASTER (US, JP, TW) 2016
● Monet (US, BR) 2017
● Junior (Rio, Santos) 2017
● Tannat (BR, UY, AR) 2017
● PLCN (HK, LA) 2019
● Indigo (SG, ID, AU) 2019

Edge points of presence: >100
Edge node locations: >1000

The Network Matters

[Diagram: the path between a typical cloud provider and the user, compared with Google Cloud, where traffic travels Google Cloud → Google PoP → Google PoP → ISP → User.]

Google Cloud Platform regions

[Map: current regions and their number of zones, future regions and their number of zones, and announced regions.]

Americas: Oregon, Iowa, Los Angeles, N. Virginia, S. Carolina, Montreal, São Paulo
Europe, Middle East, & Africa: London, Belgium, Netherlands, Frankfurt, Finland, Zürich
Asia Pacific: Tokyo, Osaka, Hong Kong, Taiwan, Mumbai, Singapore, Jakarta, Sydney

Google offers customer-friendly pricing innovations

● Billing in sub-hour increments: for virtual machines and containers in the cloud; data processing and other services too
● Discounts for sustained use: automatically applied to virtual machine use over 25% of a month
● Custom VM instance types: pay only for the resources you need for your application

Security is designed into Google’s technical infrastructure

Layer: notable security measures (among others)

● Operational security: intrusion detection systems; techniques to reduce insider risk; employee U2F use; software development practices
● Internet communication: Google Front End; designed-in Denial of Service protection
● Storage services: encryption at rest
● User identity: central identity service with support for U2F
● Service deployment: encryption of inter-service communication
● Hardware infrastructure: hardware design and provenance; secure boot stack; premises security

Why choose Google Cloud Platform?

Google Cloud Platform enables developers to build, test, and deploy applications on Google’s highly secure, reliable, and scalable infrastructure.

Review: Google Cloud Platform offers a range of compute services

Compute: Compute Engine, Kubernetes Engine, App Engine, Cloud Functions

Google Cloud Platform offers a range of storage services

Compute: Compute Engine, Kubernetes Engine, App Engine, Cloud Functions
Storage: Cloud Bigtable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore

Google Cloud Platform offers services for getting value from data

Compute: Compute Engine, Kubernetes Engine, App Engine, Cloud Functions
Storage: Cloud Bigtable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore
Big Data: BigQuery, Pub/Sub, Dataflow, Dataproc, Datalab
Machine Learning: Natural Language API, Vision API, Machine Learning, Speech API, Translate API

Quiz

● Name some of Google Cloud Platform’s pricing innovations.
● Name some benefits of using Google Cloud Platform other than its pricing.

More resources

Why Google Cloud Platform?
https://cloud.google.com/why-google/

Pricing philosophy
https://cloud.google.com/pricing/philosophy/

Data centers
https://www.google.com/about/datacenters/

Google Cloud Platform product overview
http://cloud.google.com/products/

Google Cloud Platform solutions
http://cloud.google.com/solutions/

Big Data & Machine Learning

<Start Training>
Module #2: Getting Started with Google Cloud Platform
</Start Training>

Version #1.1

Cloud security requires collaboration

● Google is responsible for managing its infrastructure security.
● You are responsible for securing your data.
● Google helps you with best practices, templates, products, and solutions.

[Table: responsibility by layer, marked customer-managed or Google-managed, for on-premises, Infrastructure as a Service, Platform as a Service, and managed services. Layers: content; access policies; usage; deployment; web application security; identity; operations; access and authentication; network security; OS, data, and content; audit logging; network; storage and encryption; hardware.]

Agenda

Google Cloud Platform resource hierarchy
Identity and Access Management (IAM)
Interacting with Google Cloud Platform
Cloud Marketplace
Quiz

Projects organize resources

● Global resource collection
○ Track resource and quota usage
○ Enable or disable services and APIs
○ Control permissions and credentials
○ Enable billing account
● Provides an isolation boundary between resources
○ You can create explicit trust across projects
● All Google Cloud Platform services you use are associated with one and only one project

Resource hierarchy levels define trust boundaries

● Group your resources with folders and projects according to your organization structure
● Levels of the hierarchy provide trust boundaries and resource isolation

The organization node organizes projects

● The organization node is the root node for Google Cloud resources
● Notable organization roles:
○ Organization Administrator: broad control over all cloud resources
○ Project Creator: fine-grained control of project creation

[Diagram: jenny@example.com is Organization Admin; robin@example.com is Project Creator and can create projects such as Ex Drive and Ex Mail.]

An example IAM resource hierarchy

Hierarchy: Organization → Folders → Projects → Resources

● A policy is set on a resource
○ Each policy contains a set of roles and role members
● Resources inherit policies from parent
○ Resource policies are a union of parent and resource
● A less restrictive parent policy overrides a more restrictive resource policy
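The inheritance rule on this slide (a resource's effective policy is the union of its own policy and its parents' policies) can be illustrated with a small in-memory model. This is a sketch only: the `Node` class, the role names, and the member addresses are hypothetical and are not the IAM API.

```python
from typing import Dict, Optional, Set


class Node:
    """One level of the hierarchy: organization, folder, project, or resource."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        # role -> members granted that role directly on this node
        self.policy: Dict[str, Set[str]] = {}

    def grant(self, role: str, member: str) -> None:
        self.policy.setdefault(role, set()).add(member)


def effective_roles(node: Node, member: str) -> Set[str]:
    """Union of the roles granted to `member` on `node` and on every ancestor."""
    roles: Set[str] = set()
    current: Optional[Node] = node
    while current is not None:
        for role, members in current.policy.items():
            if member in members:
                roles.add(role)
        current = current.parent
    return roles


# Hypothetical hierarchy: organization -> folder -> project -> bucket
org = Node("example.com")
folder = Node("engineering", parent=org)
project = Node("project_a", parent=folder)
bucket = Node("bucket_1", parent=project)

project.grant("owner", "jenny@example.com")   # broad grant on the parent
bucket.grant("viewer", "jenny@example.com")   # narrower grant on the resource
```

Because the effective policy is a union, the narrower `viewer` grant on the bucket does not take away the `owner` role inherited from the project.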

Identity and Access Management (IAM)

Cloud Identity

● Integrate your cloud and on-premises directories in one IDaaS platform
● Single sign-on supports SAML 2.0, OAuth 2.0 and OpenID
● Google-grade security and scale
● Suspicious activity detection
○ Session management tools
○ Security alerts
○ Multi-factor login support and enforcement

Google’s Current Offering

[Diagram: Cloud Identity provides access (SAML + OIDC), provisioning, and sync.]

One independent platform to host and manage identity

Managing Identity And Access

Cloud Identity admin console: for managing users and authentication
● User accounts
● Groups
● Authentication options for developers

Cloud Console IAM: for granting authorization to cloud resources
● Defining Identity and Access Management roles

Each action in your environment needs to answer 3 questions

Who can do what on which resource?

The most common ways to identify users or machines (the “who”)

● Organization-managed users hosted on Google’s secure IDaaS platform, including G Suite users: you@domain.com
● User-managed Google account: test@gmail.com
● Service account: test@project_id.iam.gserviceaccount.com

Service Account

Belongs to your application or a virtual machine (VM), instead of to an individual end user

● Provides a machine identity for carrying out server-to-server/service interactions
● Default service accounts managed by Google: <project_number>-compute@developer.gserviceaccount.com
● User-defined service accounts: <name>@<project_ID>.iam.gserviceaccount.com
○ Provide a meaningful name
○ Use minimal privilege
○ Rotate keys periodically
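The address formats listed above are enough to tell the kinds of identity apart by inspection. The sketch below assumes simplified patterns derived only from the formats on these slides; the function name and the returned labels are hypothetical, and real account addresses may have forms this does not cover.

```python
import re

# Simplified patterns based on the formats shown on the slides (assumptions).
DEFAULT_COMPUTE_SA = re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$")
USER_DEFINED_SA = re.compile(r"^[^@\s]+@[^@\s.]+\.iam\.gserviceaccount\.com$")


def classify_identity(email: str) -> str:
    """Return a rough label for the kind of identity an address represents."""
    if DEFAULT_COMPUTE_SA.match(email):
        return "default service account"
    if USER_DEFINED_SA.match(email):
        return "user-defined service account"
    if email.endswith("@gmail.com"):
        return "user-managed Google account"
    return "organization-managed or other account"
```

For example, `classify_identity("test@project_id.iam.gserviceaccount.com")` falls into the user-defined service account branch.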

Example: Service Accounts and IAM

● VMs running FrontEnd are granted Editor access to project_b using Service Account 1
● VMs running BackEnd are granted objectViewer access to bucket_1 using Service Account 2
● Service account permissions can be changed without recreating VMs

[Diagram: the FrontEnd VM uses Service Account 1 (Editor); the BackEnd VM uses Service Account 2 (Storage.objectViewer on bucket_1). Other labels in the diagram: Ex Mail, Ex Drive, bucket_2.]

There are three types of IAM roles (the “can do what”)

● Primitive
● Predefined
● Custom

IAM primitive roles apply across all GCP services in a project

Primitive roles answer “can do what” on all project resources.

IAM primitive roles offer fixed, coarse-grained levels of access

Roles (columns): Viewer, Editor, Owner, Billing Admin

x x    Manage billing
x x    Add and remove administrators
x x x  Read-only access
x x    Configure services
x x    Modify code
x      Deploy applications
x      Invite members
x      Remove members
x      Delete projects

A project can have multiple owners, editors, viewers, and billing administrators.

IAM predefined roles

A set of permissions that are grouped together, answering “can do what” on resources in this project, folder, or org

InstanceAdmin Role (example.com):
compute.instances.delete
compute.instances.get
compute.instances.list
compute.instances.setMachineType
compute.instances.start
compute.instances.stop

Permission format: <service>.<resource>.<verb>
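The `<service>.<resource>.<verb>` format makes a role easy to model as a set of permission strings. The following is a minimal sketch, assuming every permission has exactly three dot-separated parts as on the slide; `parse_permission` and `role_allows` are illustrative helpers, not part of any Google library.

```python
from typing import NamedTuple, Set


class Permission(NamedTuple):
    service: str
    resource: str
    verb: str


def parse_permission(text: str) -> Permission:
    """Split a '<service>.<resource>.<verb>' string into its three parts."""
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <service>.<resource>.<verb>, got {text!r}")
    return Permission(*parts)


# The InstanceAdmin permissions listed on the slide.
INSTANCE_ADMIN: Set[str] = {
    "compute.instances.delete",
    "compute.instances.get",
    "compute.instances.list",
    "compute.instances.setMachineType",
    "compute.instances.start",
    "compute.instances.stop",
}


def role_allows(role_permissions: Set[str], permission: str) -> bool:
    """True if the role's permission set contains the (well-formed) permission."""
    parse_permission(permission)  # raises ValueError on malformed input
    return permission in role_permissions
```

A custom role can be modeled the same way, with a hand-picked set of permission strings instead of a predefined one.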

IAM Custom roles

Let you define a precise set of permissions, answering “can do what” on resources in this project, folder, or org

SecurityAudit Role (example.com):
compute.instances.get
compute.instances.list
containers.pods.getLogs
appengine.instances.get
logging.logs.list

Google Groups Best Practices

● Assign permissions to groups rather than individuals
● Make Groups own resources and projects for continuity
● Groups can also contain service accounts
● Create Groups for each team in your organisation
● Nest Groups for fine-grained control

[Diagram: example groups SecOps, Developers, and NetOps, with nested groups App A and App B.]

Audit Logs

● Cloud Console activity page
● Stackdriver Logging

Principle of least privilege

[Diagram: granting everybody the Owner role on the organization, contrasted with granting Security Admin Group A the Security Admin role on Project A.]

Interacting with Google Cloud Platform

There are four ways to interact with GCP

● Cloud Platform Console: web user interface
● Cloud Shell and Cloud SDK: command-line interface
● Cloud Console Mobile App: for iOS and Android
● REST-based API: for custom applications

Google Cloud Platform Console

● Centralized console for all project data
● Developer tools
○ Cloud Source Repositories
○ Cloud Shell
○ Test Lab (mobile app testing)
● Access to product APIs
● Manage and create projects

Google Cloud SDK

● SDK includes CLI tools for Cloud Platform products and services
○ gcloud, gsutil (Cloud Storage), bq (BigQuery)
● Available as Docker image
● Available via Cloud Shell
○ Containerized version of Cloud SDK running on Compute Engine instance

RESTful APIs

● Programmatic access to products and services
○ Typically use JSON as an interchange format
○ Use OAuth 2.0 for authentication and authorization
● Enabled through the Google Cloud Platform Console
● Most APIs include daily quotas and rates (limits) that can be raised by request
○ Important to plan ahead to manage your required capacity
● Experiment with APIs Explorer
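To make the JSON and OAuth 2.0 points concrete, the sketch below builds (but does not send) an HTTP request with a JSON body and a bearer token, using only the Python standard library. The URL, the token value, and the payload are placeholders chosen for illustration; they do not refer to a real Google endpoint.

```python
import json
import urllib.request


def build_request(url: str, access_token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON body and an OAuth 2.0 bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and token, for illustration only.
request = build_request(
    "https://api.example.com/v1/items",
    "PLACEHOLDER_TOKEN",
    {"name": "demo"},
)
```

Sending the request would be a separate step (for example with `urllib.request.urlopen`), and each call would count against the API's quota.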

Cloud Console Mobile App

● Manage virtual machines and database instances
● Manage apps in Google App Engine
● Manage your billing
● Visualize your projects with a customizable dashboard

APIs Explorer

● The APIs Explorer is an interactive tool that lets you easily try Google APIs using a browser.
● With the APIs Explorer, you can:
○ Browse quickly through available APIs and versions.
○ See methods available for each API and what parameters they support along with inline documentation.
○ Execute requests for any method and see responses in real time.
○ Easily make authenticated and authorized API calls.

Client Libraries

● Cloud Client Libraries
○ Community-owned, hand-crafted client libraries
● Google API Client Libraries
○ Open source, generated
○ Support various languages: Java, Python, JavaScript, PHP, .NET, Go, Node.js, Ruby, Objective-C, Dart

GCP Marketplace gives quick access to solutions

● A solution marketplace containing pre-packaged, ready-to-deploy solutions
○ Some offered by Google
○ Others by third-party vendors
● You pay for the underlying GCP resource usage.
○ Some solutions also assess third-party license fees.

Quiz

● True or False: If a Google Cloud IAM policy gives you Owner permissions at the project level, your access to a resource in the project may be restricted by a more restrictive policy on that resource.
● True or False: All Google Cloud Platform resources are associated with a project.

Quiz: Service Accounts

Service accounts are used to provide which of the following?

● Authentication between Google Cloud Platform services
● Key generation and rotation when used with App Engine and Compute Engine
● A way to restrict the actions a resource (such as a VM) can perform
● A way to allow users to act with service account permissions
● All of the above

More resources

Google Cloud Platform security
https://cloud.google.com/security/

Configuring permissions
https://cloud.google.com/docs/permissions-overview

Identity and Access Management (IAM)
https://cloud.google.com/iam/

Cloud SDK installation and quick start
https://cloud.google.com/sdk/#Quick_Start

gcloud tool guide
https://cloud.google.com/sdk/gcloud/

Big Data & Machine Learning

<Start Training>
Module #3: Data Analysis on the Cloud
</Start Training>

Version #1.1

Agenda

Google Cloud Big Data products
Stepping stones to transformation
Your SQL database in the cloud
Managed Hadoop in the cloud

Google’s mission is to organize the world’s information and make it universally accessible and useful

To organize the world’s information, Google has been building the most powerful infrastructure on the planet

In terms of software, organizing the world’s information has meant that Google needed to invent data processing methods

[Timeline, 2002 to 2018. Technologies shown: GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, F1, Pub/Sub, TensorFlow, TPU.]

http://research.google.com/pubs/papers.html

Google Cloud opens up that innovation and infrastructure to you

[Timeline, 2002 to 2018. Products shown: Cloud Storage, Bigtable, Dataproc, BigQuery, Datastore, Dataflow, Pub/Sub, Cloud Spanner, ML Engine, Auto ML.]

A suite of products that can be put together for data processing

● Foundation: Compute Engine, Cloud Storage
● Databases: Cloud Spanner, Cloud SQL, Cloud Bigtable
● Analytics and ML: BigQuery, Cloud Datalab, ML APIs
● Data-handling frameworks: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc
● ...

Spotify illustrates the typical journey of companies that come to Google Cloud: From lower costs to increased reliability to business transformation

1. Spend less: no-ops, pay for use, secure
2. Flexible, complete
3. Innovative, powerful

A suite of products that can be put together for data processing

● Change where you compute
● Improve scalability and reliability
● Change how you compute

FIS was able to improve reliability and scalability on a massive data-processing challenge

Figures shown on the slide:
● 6 billion market events per hour
● 10 BN per hour bursts
● 1.7 gigabytes written per second
● 10 terabytes written per hour
● 6 TBs per hour

The Consolidated Audit Trail (CAT) is a data repository of all equities and options orders, quotes, and events; FIS processed the CAT to organize 100 billion market events into an “order lifecycle” in a 4-hour window using Cloud Bigtable.
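A little arithmetic puts these numbers in perspective. The sketch below uses only the figures quoted on the slide (100 billion events, a 4-hour window, 1.7 gigabytes per second) and assumes decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
# Figures quoted on the slide.
TOTAL_EVENTS = 100_000_000_000   # 100 billion market events
WINDOW_HOURS = 4                 # processing window
WRITE_GB_PER_SECOND = 1.7        # sustained write rate

SECONDS_PER_HOUR = 3600

# Average event rate needed to finish within the window.
events_per_hour = TOTAL_EVENTS / WINDOW_HOURS
events_per_second = TOTAL_EVENTS / (WINDOW_HOURS * SECONDS_PER_HOUR)

# Hourly write volume implied by the per-second rate (decimal units assumed).
write_tb_per_hour = WRITE_GB_PER_SECOND * SECONDS_PER_HOUR / 1000

print(f"{events_per_hour:,.0f} events/hour")
print(f"{events_per_second:,.0f} events/second")
print(f"{write_tb_per_hour:.2f} TB/hour")
```

Under these assumptions the window requires an average of 25 billion events per hour, and 1.7 GB per second works out to roughly 6 TB per hour, which is in line with the "6 TBs per hour" figure on the slide.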

Rooms to Go transformed its business with data and machine learning

[Diagram: Rooms to Go collects data from Google Analytics Premium (landing pages, views) and from its CRM (Customer Relationship Manager: customer demographics, past purchases), combines the data, and analyzes it in BigQuery to produce completely designed room packages.]

https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html

In summary, Google Cloud offers you ways to…

● Spend less on ops and administration
● Incorporate real-time data into apps and architectures
● Apply machine learning broadly and easily
● Become a truly data-driven company

Stepping stones to transformation

Google Cloud Platform began in 2008, with App Engine, a serverless way to run web applications

[Diagram: (1) develop your code, (2) upload it to App Engine, (3) App Engine autoscales it and runs it reliably.]

http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html
http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html

[Diagram: stepping stones into the cloud: Compute Engine, Kubernetes Engine, App Engine Flex, App Engine.]

There [was] something fundamentally wrong with what we were doing in 2008 … We didn't get the right stepping stones into the cloud …
-- Eric Schmidt, Executive Chairman, Google

GCP now consists of a suite of products that together provide these stepping stones in a business’ transformative journey

● Change where you compute: cost effective virtual machines, storage, Hadoop, and MySQL to migrate your current workloads to the public cloud.
● Flexibility, scalability and reliability: reliable, autoscaling messaging, data processing, and storage.
● Change how you compute: fully managed products for data warehousing, data analysis, streaming, and machine learning.

Machine learning. This is the next transformation … the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want.
-- Eric Schmidt, Executive Chairman, Google

WIRED’s headline

“If you want to teach a neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out.”

Machine Learning is not new, but it is now mainstream

● Search
● People who bought ...
● Spam filtering
● Suggest next video
● Route planning
● Smart Reply

What’s common to all of these use cases of Machine Learning?

There are three components in a recommendation system

● Rating: users rate a few houses explicitly or implicitly
● Training: a machine learning model is created to predict a user’s rating of a house
● Recommending: for each user, the model is applied to every unrated house and the top 5 houses for that user are saved

What else is needed?

The ML algorithm essentially clusters users and items

1. Who is like this user?
2. Is this a good house?
3. Predict rating: is this house similar to houses that people similar to this user like?

Predicted rating = user-preference * item-quality

How often do you need to compute the predicted ratings? Where would you save them?
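The formula "predicted rating = user-preference * item-quality" and the "top 5 unrated houses" step can be sketched in a few lines. This example assumes both quantities are small numeric vectors combined with a dot product, which is one common reading of the formula rather than something the slide states; the house names and numbers are made up.

```python
from typing import Dict, List, Set


def predict_rating(user_preference: List[float], item_quality: List[float]) -> float:
    """Dot product of a user's preference vector and an item's quality vector."""
    return sum(u * q for u, q in zip(user_preference, item_quality))


def recommend(
    user_preference: List[float],
    items: Dict[str, List[float]],
    already_rated: Set[str],
    top_n: int = 5,
) -> List[str]:
    """Score every unrated item and return the names of the top_n highest scores."""
    scores = {
        name: predict_rating(user_preference, quality)
        for name, quality in items.items()
        if name not in already_rated
    }
    # Sort by descending score; break ties by name so the output is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]


# Hypothetical data: two latent factors per user and per house.
user = [1.0, 0.0]
houses = {
    "house_a": [3.0, 1.0],
    "house_b": [5.0, 0.0],
    "house_c": [1.0, 4.0],
    "house_d": [4.0, 2.0],
}
top_two = recommend(user, houses, already_rated={"house_b"}, top_n=2)
```

In a real system the vectors would come from the training step, and the saved top-N lists would be recomputed on whatever schedule the slide's closing questions lead you to.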

In addition to the ML algorithm, you also need sophisticated data management

● Data Collection: scalable front end to collect customer actions
● Data Analysis: data that is accessible and not siloed
● Machine Learning: (re-)training and experimentation
● Serving: scalable, real-time system to serve recommendations

Your SQL database in the cloud

Choose your storage solution based on your access pattern

Product | Cloud Storage | Cloud SQL | Datastore | Bigtable | BigQuery
Capacity | Petabytes+ | Gigabytes | Terabytes | Petabytes | Petabytes
Access metaphor | Like files in a file system | Relational database | Persistent Hashmap | Key-value(s), HBase API | Relational
Read | Have to copy to local disk | SELECT rows | Filter objects on property | scan rows | SELECT rows
Write | One file | INSERT row | put object | put row | Batch/stream
Update granularity | An object (a “file”) | Field | Attribute | Row | Field
Usage | Store blobs | No-ops SQL database on the cloud | Structured data from AppEngine apps | No-ops, high throughput, scalable, flattened data | Interactive SQL* querying fully managed warehouse

Cloud SQL is a fully managed database service

Cloud SQL: Google-managed MySQL or Postgres

● Familiar
● Flexible pricing
● Managed backups
● Automatic replication
● Fast connection from GCE & GAE
● Connect from anywhere
● Google Security
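"Familiar" here means the usual relational workflow of INSERT, UPDATE, and SELECT inside transactions. The sketch below shows that workflow using Python's built-in `sqlite3` module purely as a local stand-in; it is not Cloud SQL, and the table and column names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

# Each `with conn:` block is one transaction: committed on success,
# rolled back if an exception is raised inside the block.
with conn:
    conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("alice", 120.0))
    conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("bob", 80.0))

with conn:
    # Field-level update of a single row.
    conn.execute("UPDATE orders SET total = ? WHERE customer = ?", (95.0, "bob"))

rows = conn.execute("SELECT customer, total FROM orders ORDER BY id").fetchall()
```

With a managed MySQL or Postgres instance the SQL would look much the same; only the driver and connection details would differ.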

Managed Hadoop in the cloud

There is a rich open-source ecosystem for big data

http://hadoop.apache.org/
http://pig.apache.org/
http://hive.apache.org/
http://spark.apache.org/

How do you process large amounts of data?
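One widely used answer, and the model behind Hadoop, is MapReduce: split the input, map each piece to key-value pairs, group the pairs by key, and reduce each group. The sketch below runs the three phases in a single process on a word-count example; the function names are illustrative, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))
```

For example, `word_count(["big data", "Big query"])` counts `big` twice and the other two words once each.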

Typical Spark and Hadoop deployments involve

Typical Dataproc deployments involve...

Create cluster (0 seconds) → Configure cluster (20 seconds) → Use cluster (90 seconds)

Scale anytime

Dataproc reduces the cost and complexity associated with Spark and Hadoop clusters

Dataproc: Google-managed Hadoop, Pig, Hive, Spark

● Familiar
● Image versioning
● Resize in seconds
● Automated cluster management
● Integrates with Google Cloud
● Flexible VMs
● Google Security

Module review (1 of 2)

Relational databases are a good choice when you need:
(select all of the correct options)

● Streaming, high-throughput writes
● Fast queries on terabytes of data
● Aggregations on unstructured data
● Transactional updates on relatively small datasets

Module review (2 of 2)

Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)

● It’s the same API, but Google implements it better
● Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
● Fully-managed versions of the software offer no-ops
● Running it on Google infrastructure offers reliability and cost savings

Resources

Cloud SQL https://cloud.google.com/sql/
Cloud Dataproc https://cloud.google.com/dataproc/
Cloud Solutions https://cloud.google.com/solutions/
http://gcp.solutions/

Big Data & Machine Learning

<Start Training>
Module #4: Scaling Data Analysis
</Start Training>

Version #1.1

Agenda

Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo

Choosing where to store data on GCP

● Unstructured data: Firebase Storage, Cloud Storage
● Structured data, transactional workload:
○ SQL: Cloud SQL; Cloud Spanner for horizontal scalability
○ No-SQL: Firebase Realtime DB, Cloud Datastore
● Structured data, data analytics workload:
○ Millisecond latency: Cloud Bigtable
○ Latency in seconds: BigQuery
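The decision tree on this slide can be written down as a small function. This is a sketch of the slide's branches only, under the assumption that the tree reads as listed above; the function name, parameter names, and the `"transactional"` / `"analytics"` labels are hypothetical.

```python
from typing import List


def choose_storage(
    structured: bool,
    workload: str = "transactional",
    needs_sql: bool = True,
    needs_horizontal_scale: bool = False,
    needs_millisecond_latency: bool = False,
) -> List[str]:
    """Return the candidate products for a workload, following the slide's tree."""
    if not structured:
        return ["Firebase Storage", "Cloud Storage"]
    if workload == "transactional":
        if needs_sql:
            return ["Cloud Spanner"] if needs_horizontal_scale else ["Cloud SQL"]
        return ["Firebase Realtime DB", "Cloud Datastore"]
    if workload == "analytics":
        return ["Cloud Bigtable"] if needs_millisecond_latency else ["BigQuery"]
    raise ValueError(f"unknown workload: {workload!r}")
```

For instance, `choose_storage(True, "analytics", needs_millisecond_latency=True)` follows the analytics branch to Cloud Bigtable. Real choices also depend on capacity, cost, and the other factors compared on the following slides.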

Cloud Spanner is horizontally scalable and globally consistent

Typical workloads
• Transactional
• Scale-out
• Global data plane
• Database consolidation

Client libraries in popular languages
• Java, Python, Go, Node.js
• JDBC driver

Use Cloud Spanner if you need globally consistent data or more than one Cloud SQL instance

Source: https://quizlet.com/blog/quizlet-cloud-spanner

Comparing storage options: technical details

Product | Cloud Datastore | Bigtable | Cloud Storage | Cloud SQL | Cloud Spanner | BigQuery
Type | NoSQL document | NoSQL wide column | Blobstore | Relational SQL for OLTP | Relational SQL for OLTP | Relational SQL for OLAP
Transactions | Yes | Single-row | No | Yes | Yes | No
Complex queries | No | No | No | Yes | Yes | Yes
Capacity | Terabytes+ | Petabytes+ | Petabytes+ | 500 GB | Petabytes | Petabytes+
Unit size | 1 MB/entity | ~10 MB/cell, ~100 MB/row | 5 TB/object | Determined by DB engine | 10,240 MiB/row | 10 MB/row

Comparing storage options: use cases

Product | Cloud Datastore | Bigtable | Cloud Storage | Cloud SQL | Cloud Spanner | BigQuery
Type | NoSQL document | NoSQL wide column | Blobstore | Relational SQL for OLTP | Relational SQL for OLTP | Relational SQL for OLAP
Best for | Getting started, App Engine applications | “Flat” data, heavy read/write, events, analytical data | Structured and unstructured binary or object data | Web frameworks, existing applications | Large-scale database applications (> ~2 TB) | Interactive querying, offline analytics
Use cases | Getting started, App Engine applications | AdTech, financial and IoT data | Images, large media files, backups | User credentials, customer orders | Whenever high I/O, global consistency is needed | Data warehousing

49
Choosing where to store data on GCP

Unstructured data:
  Cloud Storage, Firebase Storage
Structured data, transactional workload:
  SQL: Cloud SQL; Cloud Spanner (horizontal scalability)
  NoSQL: Firebase Realtime DB, Cloud Datastore
Structured data, data analytics workload:
  Millisecond latency: Cloud Bigtable
  Latency in seconds: BigQuery

Cloud Bigtable features

● Global Availability
  Place your service and data where you want it, with available regions located around the world.

● Security & Permissions
  Data is encrypted both in-flight and at rest. Enjoy full control over access to data stored in Cloud Bigtable.

● Redundant Autoscaling Storage
  Bigtable is built leveraging a redundant internal storage strategy for high durability, all managed for you.

Bigtable: Scalable, fast NoSQL with auto-balancing

Big: Large quantities (>1 TB) of semi-structured or structured data
Fast: Data that is rapidly changing, or with a high throughput
NoSQL: Transactions and strong relational semantics are not required

and especially good for...

Time Series: Time-series data, or data with a natural semantic ordering
Machine Learning: Data involved with machine learning algorithms
Big Data: Asynchronous batch operations or real-time processing

Bigtable is meant for high throughput data where access is primarily for a range of Row Key prefixes

Row Key               Column data
                      MD:SYMBOL:  MD:LASTSALE:  MD:LASTSIZE:  MD:TRADETIME:  MD:EXCHANGE:
NASDAQ#1426535612045  ZXZZT       600.58        300           1426535612045  NASDAQ
...                   ...         ...           ...           ...            ...

Tables should be tall and narrow
Store changes as new rows
Bigtable will automatically compact the table

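The row-key layout above can be illustrated with a small sketch. It assumes a plain sorted Python list as a stand-in for Bigtable's storage, which keeps rows sorted by row key; the helper names `make_row_key` and `prefix_scan` are hypothetical and are not part of any Bigtable client library.

```python
import bisect


def make_row_key(exchange: str, trade_time_ms: int) -> str:
    """Build a key such as 'NASDAQ#1426535612045' (exchange, '#', timestamp)."""
    return f"{exchange}#{trade_time_ms}"


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix` from a sorted list of keys."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # '\uffff' sorts after every character used in these ASCII keys,
    # so prefix + '\uffff' is an upper bound for the prefix range.
    end = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


# Each change is stored as a new row, so the table grows tall and stays narrow.
keys = sorted([
    make_row_key("NASDAQ", 1426535612045),
    make_row_key("NASDAQ", 1426535612046),
    make_row_key("NYSE", 1426535612045),
])

nasdaq_rows = prefix_scan(keys, "NASDAQ#")
```

Because the keys are sorted, all rows for one prefix sit next to each other, which is why a range of row-key prefixes can be read as one contiguous scan.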
Bigtable separates processing and storage

[Diagram: clients connect to a fully managed cluster. Processing: Bigtable nodes. Storage: Colossus filesystem holding the tables. Shard tables into contiguous rows.]

Cloud Bigtable learns access patterns...

[Diagram: clients; three processing nodes serving data ranges A, B, C, D, E; storage filesystem.]

...and rebalances data accordingly

[Diagram: clients; data ranges A to E redistributed across the three processing nodes; storage filesystem.]

Throughput can be controlled by node count

[Chart: QPS (0 to 4,000,000) versus number of Bigtable nodes (0 to 400).]

Agenda

Fast random access
Warehouse and interactively query petabytes
Interactive, iterative development + Demo

BigQuery is a fully managed data warehouse that lets you do ad-hoc SQL queries on massive volumes of data

[Diagram: the BigQuery service contains projects (Project X, Project Y); projects contain datasets (Dataset A to Dataset D); each dataset contains tables (Table 1, Table 2).]

A demo of BigQuery on a 10 billion-row dataset shows what it is and what it can do

#standardsql
SELECT
  language, SUM(views) AS views
FROM `bigquery-samples.wikipedia_benchmark.Wiki10B`
WHERE
  title LIKE "%google%"
GROUP BY language
ORDER BY views DESC

Familiar, SQL 2011 query language
Interactive ad-hoc analysis of petabyte-scale databases
No need to provision clusters
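To try the shape of the demo query without a BigQuery project, the same aggregation can be sketched locally. This swaps in Python's built-in sqlite3 module and a four-row hypothetical table named `wiki`; it is not BigQuery, and SQLite's LIKE is case-insensitive for ASCII text, which may differ from BigQuery's behavior. The alias is renamed to `total_views` here to keep it distinct from the `views` column.

```python
import sqlite3

# In-memory stand-in for the Wikipedia benchmark table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki (title TEXT, language TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO wiki VALUES (?, ?, ?)",
    [
        ("google maps", "en", 120),
        ("google", "en", 300),
        ("google", "de", 80),
        ("bing", "en", 50),
    ],
)

# Same structure as the demo query: filter by title, sum views per language.
rows = conn.execute(
    """
    SELECT language, SUM(views) AS total_views
    FROM wiki
    WHERE title LIKE '%google%'
    GROUP BY language
    ORDER BY total_views DESC
    """
).fetchall()

for language, total_views in rows:
    print(language, total_views)
```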

Three ways of loading data into BigQuery

● Files on disk or Cloud Storage
● Stream data
● Federated data source

[Figure labels: CSV, JSON, AVRO, Google Sheets, Serverless ETL, POST.]

Tables and jobs

Project (billing, top-level container)
A project contains users and datasets
Use Project to:
● Limit access to datasets and jobs
● Manage billing

Dataset (organization, access control)
A dataset contains tables and views
● Access Control Lists for Reader/Writer/Owner
● Applied to all tables/views in dataset

Table (data with schema)
A table is a collection of columns
● Columnar storage
● Views are virtual tables defined by SQL query
● Tables can be external (e.g., on Cloud Storage)

Job (query, import, export, copy)
A job is a potentially long-running action
● Can be canceled
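The project, dataset, table hierarchy shows up directly in table names such as `bigquery-samples.wikipedia_benchmark.Wiki10B`. A minimal sketch of splitting such a name into its three levels follows; `TableRef` and `parse_table_ref` are hypothetical helpers written for illustration, not functions of the BigQuery client library.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels of a fully qualified table name."""
    project: str  # billing, top-level container
    dataset: str  # organization, access control
    table: str    # data with schema


def parse_table_ref(ref: str) -> TableRef:
    """Split 'project.dataset.table' (optionally backtick-quoted) into its parts."""
    parts = ref.strip("`").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {ref!r}")
    return TableRef(*parts)


ref = parse_table_ref("`bigquery-samples.wikipedia_benchmark.Wiki10B`")
print(ref.project, ref.dataset, ref.table)
```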

BigQuery storage is columnar

Relational database
● Record-oriented storage
● Supports transactional updates

BigQuery Storage
● Each column in separate, compressed, encrypted file that is replicated 3+ times
● No indexes, keys or partitions required
● For immutable, massive datasets

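The difference between record-oriented and columnar layout can be sketched in a few lines. This is only an in-memory illustration with hypothetical sample rows; it does not model BigQuery's actual file format, compression, encryption, or replication.

```python
# Record-oriented layout: one dict per row, as a relational database would store it.
records = [
    {"title": "google", "language": "en", "views": 300},
    {"title": "google", "language": "de", "views": 80},
    {"title": "bing", "language": "en", "views": 50},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Columnar layout: each column is stored separately.
columns = to_columns(records)

# An aggregate over one column only needs to read that column.
total_views = sum(columns["views"])
```

A query such as SUM(views) touches a single column in the columnar layout, whereas the record-oriented layout has to visit every full row.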
BigQuery offers standard SQL functions

● Standard SQL provides many functions:
  ○ Aggregate functions
  ○ String functions
  ○ Analytic (window) functions
  ○ Datetime functions
  ○ Array functions
  ○ Other functions and operators

● BigQuery also supports UDFs (user-defined functions)
  ○ SQL or JavaScript

Example architecture for data analytics

[Diagram: "Build a mobile gaming analytics platform", a reference architecture. Tools shown include Tableau and QlikView.]

BigQuery is a great choice because...

● Near-real time analysis of massive datasets
● No-ops; Pay for use
● Durable (replicated), inexpensive storage
● Immutable audit logs
● Mashing up different datasets to derive insights

Increasingly, data analysis and machine learning are carried out in self-descriptive, shareable, executable notebooks

A typical notebook contains code, charts, and explanations

[Figure labels: Share, Code, Output, Markup. Image source: Git logo from Wikipedia.]

Datalab is an open-source notebook built on Jupyter (IPython)

● Datalab is free: just pay for Google Cloud resources
● Analyze data in BigQuery, Compute Engine or Cloud Storage
● Use existing Python packages

Datalab notebooks are developed in an iterative, collaborative process

Development process in Cloud Datalab:
PHASE 1: Write code in Python
PHASE 2: Run cell (Shift+Enter)
PHASE 3: Examine output
PHASE 4: Write commentary in markdown
PHASE 5: Share and collaborate

Datalab supports BigQuery

Module review

Match the use case on the left with the product on the right

Use cases:
● Global consistency needed
● High-throughput writes of wide-column data
● Warehousing structured data
● Develop Big Data algorithms interactively in Python

Products:
1. Datalab
2. Bigtable
3. BigQuery
4. Spanner

Module #5: Machine Learning

Agenda

Machine learning with TensorFlow + Demo
Pre-built machine learning models + Demo

Stage 1: Train an ML model with examples

An ML model is a mathematical function.
Make tiny adjustments to the model function so that its output is closer to the label for a given input.

[Figure: labeled inputs ("cat", "dog", "car", "apple") are fed to the model as (label, input) pairs, and its output is compared with the label.]

Stage 2: Predict with a trained model

[Figure: an unlabeled photo is fed to the trained model, which outputs "cat".]

Stage 1 & 2 using TensorFlow and Cloud ML Engine

[Figure: Training: labeled examples ("cat", "dog", "car", "apple") go through model training on Cloud ML Engine to produce a model. Prediction: the model outputs "cat".]

In supervised learning, you have labels

Unsupervised learning: the data is not labeled.
  Example model: clustering. Is this employee on the "fast-track" or not?
  [Chart: Income vs Job Tenure; x-axis: years at company, y-axis: income.]

Supervised learning: we are learning from past examples to predict future values.
  [Chart: Restaurant Tips by Gender (female, male); x-axis: total bill amount, y-axis: tip amount.]

Supervised learning example: Regression, predicting taxi fare

1. Gather data: gather training data (input features and labels)
2. Create: create model
3. Train: train the model based on input data (minimize error)
4. Use: use the model on new data (prediction)

[Chart: amount (labels) versus travel distance (input feature), with example points A and B and the mean error. You want to predict the amount at 10 km.]
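The four steps can be walked through with a toy version of the taxi fare regression. This is a sketch under stated assumptions: the training data is hypothetical (it follows fare = 2 * distance + 3 exactly), the model is a straight line rather than a neural network, and plain gradient descent on the mean squared error stands in for whatever optimizer a real framework would use.

```python
def train_linear_model(distances_km, fares, learning_rate=0.01, steps=5000):
    """Fit fare = w * distance + b by nudging w and b to reduce the mean squared error."""
    w, b = 0.0, 0.0
    n = len(distances_km)
    for _ in range(steps):
        # Error of the current model on every training example.
        errors = [(w * x + b) - y for x, y in zip(distances_km, fares)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, distances_km))
        grad_b = 2.0 / n * sum(errors)
        # Tiny adjustment in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Step 1: gather training data (input feature: distance, label: fare). Hypothetical values.
distances = [1.0, 2.0, 4.0, 6.0, 8.0]
fares = [5.0, 7.0, 11.0, 15.0, 19.0]

# Steps 2 and 3: create the model and train it.
w, b = train_linear_model(distances, fares)

# Step 4: use the model on new data, a 10 km trip.
predicted_fare_10km = w * 10.0 + b
```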

Demo: Playing with neural networks to learn what they are

http://playground.tensorflow.org/

Supervised machine learning requires features and labels

[Figure: input features feed a neural network that outputs a prediction; the prediction is scored by a cost.]

Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons

Machine Learning with TensorFlow involves four steps:

1. Gather data: gather training data (input features and labels)
2. Create: create model
3. Train: train the model based on input data
4. Use: use the model on new data

Gather training data and select input features

Step 1: Gather data
[Figure labels: input features, discard, target.]

All input features need to be numeric

Step 1: Gather data
[Figure labels: use as-is, one-hot encoding.]

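One-hot encoding turns a categorical value into a vector of numbers with a single 1. A minimal sketch follows; the category list `DAYS`, the feature names, and the helper names are hypothetical and chosen only for illustration, and numeric features are passed through as-is.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


# Hypothetical categorical feature: day of the week.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def encode_row(day, min_temp, max_temp, rain):
    """One-hot encode the categorical feature and use the numeric ones as-is."""
    return one_hot(day, DAYS) + [float(min_temp), float(max_temp), float(rain)]


features = encode_row("Fri", 60, 80, 0.0)
print(features)
```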
Create a neural network model, defining the number of feature columns and hidden units

Step 2: Create
[Figure labels: npredictors, nhidden, noutputs.]

estimator = DNNRegressor(hidden_units=[5], feature_columns=[...])

Train the model on the collected data

Step 3: Train
[Figure: npredictors inputs feed the model, which outputs the predicted value of taxicab demand. The cost compares it with the true value of taxicab demand, and the model is updated based on the cost.]

estimator.fit(predictors, targets, steps=1000)

To predict, the model needs only the input features

Step 4: Use
[Figure: input features (rain, max temp) feed the model, which outputs the predicted value of taxicab demand. Also shown: cost, update model based on cost, true value of taxicab demand.]

Use the model to predict

Step 4: Use

import pandas as pd

# DNNRegressor is the same estimator class used on the previous slides.
input = pd.DataFrame.from_dict(data =
    {'dayofweek' : [4, 5, 6],
     'mintemp' : [60, 15, 60],
     'maxtemp' : [80, 80, 65],
     'rain' : [0, 0.8, 0]})

# read trained model from /tmp/trained_model
estimator = DNNRegressor(model_dir='/tmp/trained_model',
                         hidden_units=[5])

pred = estimator.predict(input.values)
print(pred)

ML on Google Cloud Platform

[Diagram: Machine Learning APIs (application developers), Cloud AutoML, Cloud ML Engine (data scientists & ML practitioners).]

The accuracy achieved on an ML problem is driven largely by the size and quality of the dataset; this is why ML requires massive compute

[Chart: scale of compute problem; accuracy versus size of dataset.]

https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

Cloud ML Engine simplifies the use of distributed TensorFlow

[Figure label: size of dataset.]

General process of developing and deploying a machine learning model

Training: Inputs -> Pre-processing -> Feature creation -> Train model (with hyper-parameter tuning) -> Model -> Deploy
Serving: Clients -> REST API call with input variables -> Cloud MLE -> Prediction

Using Cloud Machine Learning Engine

taxifare/
taxifare/PKG-INFO
taxifare/setup.cfg
taxifare/setup.py
taxifare/trainer/
taxifare/trainer/__init__.py
taxifare/trainer/task.py
taxifare/trainer/model.py

gcloud ml-engine jobs submit training $JOBNAME \
  --region=$REGION \
  --module-name=trainer.task \
  --job-dir=$OUTDIR --staging-bucket=gs://$BUCKET \
  --scale-tier=BASIC \
  REST as before

ML APIs are pre-trained ML models (trained off Google’s data) for common tasks; they are accessible through REST APIs

Use your own data to train models: TensorFlow, Cloud Machine Learning Engine
Machine Learning as an API: Cloud Vision API, Cloud Speech API, Cloud Natural Language API, Cloud Translation API, Cloud Video Intelligence API

Logo Detection

Face detection

"faceAnnotations" : [
  {
    "headwearLikelihood" : "VERY_UNLIKELY",
    "surpriseLikelihood" : "VERY_UNLIKELY",
    "rollAngle" : -4.6490049,
    "angerLikelihood" : "VERY_UNLIKELY",
    "landmarks" : [
      {
        "type" : "LEFT_EYE",
        "position" : {
          "x" : 691.97974,
          "y" : 373.11096,
          "z" : 0.000037421443
        }
      },
      ...
    ],
    "boundingPoly" : {
      "vertices" : [
        {
          "x" : 743,
          "y" : 449
        },
        ...
    "detectionConfidence" : 0.93568963,
    "joyLikelihood" : "VERY_LIKELY",
    "panAngle" : 4.150538,
    "sorrowLikelihood" : "VERY_UNLIKELY",
    "tiltAngle" : -19.377356,
    "underExposedLikelihood" : "VERY_UNLIKELY",
    "blurredLikelihood" : "VERY_UNLIKELY"
    ...

Web annotations

{
  "entityId": "/m/0gff2yr",
  "score": 5.92256,
  "description": "ArtScience Museum"
}

{
  "entityId": "/m/0h898pd",
  "score": 7.4162,
  "description": "Harry Potter (Literary Series)"
}

{
  "entityId": "/m/016ms7",
  "score": 1.44038,
  "description": "Ford Anglia"
}

CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240

[Demo] Try it in the browser with your own images

cloud.google.com/vision

The Translation API supports 100+ languages

https://cloud.google.com/translate/

Wootric uses the Cloud Natural Language API (entity and sentiment) to make sense of qualitative customer feedback

Extracted entities are tied into a knowledge graph

Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith, is a British novelist, screenwriter and film producer best known as the author of the Harry Potter fantasy series

{
  "name": "Joanne 'Jo' Rowling",
  "type": "PERSON",
  "metadata": {
    "mid": "/m/042xh",
    "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
  }
}

{
  "name": "British",
  "type": "LOCATION",
  "metadata": {
    "mid": "/m/07ssc",
    "wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom"
  }
}

{
  "name": "Harry Potter",
  "type": "PERSON",
  "metadata": {
    "mid": "/m/078ffw",
    "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
  }
}

When you analyze sentiment, you get a score (positive/negative) as well as a magnitude (how intense?)

The food was excellent, I would definitely go back!

{
  "documentSentiment": {
    "score": 0.8,
    "magnitude": 0.8
  }
}
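A response like the one above can be turned into a short readable summary. This sketch only parses the JSON shown on the slide; it makes no API call, and the function name `describe_sentiment` and the 0.25 threshold are assumptions chosen for illustration, not values defined by the Natural Language API.

```python
import json

# The response body from the slide, as a JSON string.
response_text = '{"documentSentiment": {"score": 0.8, "magnitude": 0.8}}'


def describe_sentiment(response: dict, threshold: float = 0.25) -> str:
    """Label a documentSentiment as positive, negative, or neutral/mixed."""
    sentiment = response["documentSentiment"]
    score = sentiment["score"]          # positive or negative
    magnitude = sentiment["magnitude"]  # how intense
    if score >= threshold:
        label = "positive"
    elif score <= -threshold:
        label = "negative"
    else:
        label = "neutral or mixed"
    return f"{label} (score={score}, magnitude={magnitude})"


summary = describe_sentiment(json.loads(response_text))
print(summary)
```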

The Cloud Speech API can be used to transcribe audio to text

http://cloud.google.com/speech

Like the Vision API, the Video Intelligence API can identify labels in a video, along with a timestamp

{
  "description": "Bird's-eye view",
  "language_code": "en-us",
  "locations": {
    "segment": {
      "start_time_offset": 71905212,
      "end_time_offset": 73740392
    },
    "confidence": 0.96653205
  }
}

https://cloud.google.com/video-intelligence/

Demo 2, Part 3: Machine Learning APIs

Use several of the Machine Learning APIs (Vision, Translate, Natural Language Processing, Speech)

“Thanks to the Google Cloud Platform, Ocado was able to use the power of cloud computing and train our models in parallel.”

Improves natural language processing of customer service claims

“Hi Ocado,
I love your website. I have children so it’s easier for me to do the shopping online.
Many thanks for saving my time!
Regards”

Labels: Feedback; Customer is happy

50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development by 2021 (Gartner)

● Custom image model to price cars
● Build off NLP API to route customer emails
● Use Vision API as-is to find text in memes
● Use Dialogflow to create a new shopping experience

Introducing Cloud AutoML

A technology that can automatically create a Machine Learning Model

DATA PREPROCESSING -> ML MODEL DESIGN -> TUNE ML MODEL PARAMETERS -> EVALUATE -> DEPLOY -> UPDATE

Cloud AutoML Vision

● Upload and label images
● Train your model in minutes or one day
● Evaluate

[Figure: Cloud AutoML with example labels Handbag, Shoe, Hat.]

Model is now trained and ready to make predictions.
This model can scale as needed to adapt to customer demands.

“How much is this car worth?”

Module review

Match the use case on the left with the product on the right

Use cases:
● Create, test new machine learning methods
● No-ops, custom machine learning applications at scale
● Automatically reject inappropriate image content
● Build application to monitor Spanish Twitter feed
● Transcribe customer support calls

Products:
1. Vision API
2. TensorFlow
3. Speech API
4. Cloud ML
5. Translation API

Resources (1 of 2)

Cloud Spanner    https://cloud.google.com/spanner/
Cloud Bigtable   https://cloud.google.com/bigtable/
Google BigQuery  https://cloud.google.com/bigquery/
Cloud Datalab    https://cloud.google.com/datalab/
TensorFlow       https://www.tensorflow.org/

Resources (2 of 2)

Cloud Machine Learning  https://cloud.google.com/ml/
Vision API              https://cloud.google.com/vision/
Translation API         https://cloud.google.com/translate/
Speech API              https://cloud.google.com/speech/
Video Intelligence API  https://cloud.google.com/video-intelligence

Module review

Match the use case on the left with the product on the right

Use cases:
A. Decoupling producers and consumers of data in large organizations and complex systems
B. Scalable, fault-tolerant multi-step processing of data

Products:
1. Cloud Dataflow
2. Cloud Pub/Sub

Resources (1 of 2)

Cloud Pub/Sub   https://cloud.google.com/pubsub/
Cloud Dataflow  https://cloud.google.com/dataflow/

Processing media using Cloud Pub/Sub and Compute Engine
https://cloud.google.com/solutions/media-processing-pub-sub-compute-engine

Resources (2 of 2)

Reverse Geocoding of Geolocation Telemetry in the Cloud Using the Maps API
https://cloud.google.com/solutions/reverse-geocoding-geolocation-telemetry-cloud-maps-ap

Using Cloud Pub/Sub for Long-running Tasks
https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks

Summary

Google Cloud provides a way to take advantage of Google’s investments in infrastructure and data processing innovation

[Timeline from 2002 to 2018. Products shown: Cloud Storage, Datastore, Pub/Sub, Cloud Spanner, Dataproc, Bigtable, BigQuery, Dataflow, ML Engine, AutoML.]

An Evolving Cloud

1st Wave: Your kit, someone else’s building. Yours to manage.
2nd Wave: Standard virtual kit, for rent. Still yours to manage.
3rd Wave: Invest your energy in great apps.

Typical Big Data Processing

● Programming
● Monitoring
● Performance tuning
● Resource provisioning
● Utilization improvements
● Handling growing scale
● Deployment & configuration
● Reliability

Big Data with Google: Focus on insight, not infrastructure.

● Programming

Recap slides:
● Choosing where to store data on GCP
● Example architecture for data analytics
● ML on Google Cloud Platform
● General process of developing and deploying a machine learning model

In summary, GCP offers you ways to...

● Spend less on ops and administration: We’ve “automated out” the complexity of building and maintaining data and analytics systems.
● Incorporate real-time data into apps and architectures: To get the most out of data and secure competitive advantage.
● Apply machine learning broadly and easily: We make it simple and practical to incorporate machine learning models within custom applications.
● Create citizen data scientists: Transform your organization into a truly data driven company. Putting tools into hands of domain experts.

#GoogleCloudOnBoard
