
AWS Certified Solutions Architect Professional Study Guide

Domain 7.0: Scalability and Elasticity (15%)

7.1 Demonstrate the ability to design a loosely coupled system

Amazon CloudFront is a web service (CDN) that speeds up distribution of your static and dynamic web content, for example, .html, .css, .php, image, and media files, to end users. CloudFront delivers your content through a worldwide network of edge locations. When an end user requests content that you're serving with CloudFront, the user is routed to the edge location that provides the lowest latency, so content is delivered with the best possible performance. If the content is already in that edge location, CloudFront delivers it immediately. If the content is not currently in that edge location, CloudFront retrieves it from an Amazon S3 bucket or an HTTP server (for example, a web server) that you have identified as the source for the definitive version of your content.

CloudFront has two aspects: origin and distribution. You create a distribution and link it to an origin, such as an S3 bucket, an EC2 instance, or an existing website.

There are two types of distributions: web and RTMP.

Geo restrictions can be used to whitelist or blacklist traffic from specific countries, blocking access to the distribution.

The GET, HEAD, PUT, POST, PATCH, DELETE, and OPTIONS HTTP methods are supported.

Allowed methods are what CloudFront will pass on to the origin server. If you do not need to modify content, consider not allowing PUT, POST, PATCH, and DELETE to ensure users cannot modify content.

CloudFront does not cache responses to POST, PUT, DELETE, and PATCH requests; you can POST content to an edge location and it is then sent on to the origin server.

SSL can be used to provide HTTPS. You can either use CloudFront's own certificate or your own.

To support older browsers, you need a dedicated-IP SSL certificate per edge location, which can be very expensive.

SNI (Server Name Indication) allows custom SSL certificates to be served from a single shared IP address by adding all hostnames behind the certificate. This relies on the SNI extension in newer browsers.

Up to 100 CNAME aliases are supported per distribution, and wildcard CNAMEs can be used.

Use invalidation requests to forcibly remove content from edge locations. You need to do this through an API call or from the console, or set a TTL on the content.
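As a minimal sketch, an invalidation can be issued with boto3 (the distribution ID and path here are hypothetical):

```python
import time

import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate all objects under /images/ at every edge location.
cloudfront.create_invalidation(
    DistributionId="E2EXAMPLE123",  # hypothetical distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```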

Alias records can be used to map a friendly name to a CloudFront URL (Route 53 supports this). They support zone apex entries (a name without www, such as example.com). DNS records for the same name must have the same routing type (simple, weighted, latency, etc.) or you will get an error in the console.

Alias records can then have evaluate target health set to yes so that existing health checks are used to ensure the underlying resources are up before sending traffic onwards. If a health check for the underlying resource does not exist, the evaluate target health setting has no effect.

AWS doesn't charge for mapping alias records to CloudFront distributions.

CloudFront supports dynamic web content by forwarding cookies on to the origin server.

The forward query strings option passes the whole URL, including query string parameters, to the origin if configured in CloudFront, but only for a web server or application, as S3 does not support this feature.

Cookie values can then be logged in CloudFront access logs.

CloudFront can be used to proxy upload requests back to the origin to speed up data transfers.

Use a zero value TTL for dynamic content

Different URL patterns can send traffic to different origins

Whitelist certain HTTP headers, such as CloudFront-Viewer-Country, so that locale details can be passed through to the web server for custom content.

Device detection can serve different content based on the User-Agent string in the request header.

Invalidating objects removes them from CloudFront edge caches. A faster and less expensive method is to use versioned object or directory names.

Enable access logs in CloudFront and then send them to an S3 bucket. EMR can be used to analyse the logs.

Signed URLs can be used to provide time-limited access or access to private content on CloudFront. Signed cookies can be used to limit secure access to certain parts of the site. Typical use cases are signed URLs for a marketing e-mail and signed cookies for website streaming or whole-site authentication.
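A minimal sketch of generating a signed URL with botocore's CloudFrontSigner (the key pair ID, key file, and URL are hypothetical):

```python
import datetime

import rsa  # third-party package used here for RSA signing
from botocore.signers import CloudFrontSigner


def rsa_signer(message):
    # Sign the policy with the private key of a CloudFront key pair.
    with open("cloudfront_private_key.pem", "rb") as f:  # hypothetical key file
        private_key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, private_key, "SHA-1")


signer = CloudFrontSigner("APKAEXAMPLEKEYID", rsa_signer)  # hypothetical key pair ID

# The URL is valid until the given expiry date.
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/private/video.mp4",
    date_less_than=datetime.datetime(2030, 1, 1),
)
print(url)
```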

The Cache-Control max-age header will be sent to the browser to control how long the content stays in the local browser cache, which can help improve delivery, especially of static items.

If-Modified-Since allows the browser to send a request for content only if it is newer than the modification date specified in the request. If the content has not changed, it is pulled from the browser cache.

Set a low TTL for dynamic content, as most content can be cached even if it's only for a few seconds. CloudFront can also present stale data if the TTL is long.

Popular Objects report and cache statistics can help you tune
CloudFront behaviour

Only forward cookies that are used to vary or tailor user-based content.

Use Smooth Streaming on a web distribution for live streaming using Microsoft technology.

RTMP is true media streaming; progressive download delivers content in chunks to, say, a mobile device. RTMP is Flash only.

Supports existing WAF policies

You can create custom error response pages

Two ElastiCache engines are available: Redis and Memcached. The exam will give scenarios and you must select the most appropriate.

As a rule of thumb, simple caching is done by Memcached and complex caching is done by Redis.

Only Redis is multi-AZ and has backup and restore, persistence capabilities, sorting, publisher/subscriber, and failover.

Redis can be used as a persistent key store or as a caching engine.

Redis has backup and restore and automatic failover, and is best used for frequently changing, complex data.

Redis doesn't need a database behind it like Memcached does.

Leaderboards are a good use case for Redis.

Redis can be configured to use an Append Only File (AOF) that will repopulate the cache in case all nodes are lost and the cache is cleared. This is disabled by default. AOF is like a replay log.

Redis has a primary node and read-only nodes. If the primary fails, a read-only node is promoted to primary. Writes are done to the primary node; reads are done from read replicas (asynchronous replication).
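A minimal sketch of standing up such a group with boto3 (the group ID and node type are hypothetical choices):

```python
import boto3

elasticache = boto3.client("elasticache")

# One primary plus two read replicas, with automatic failover enabled.
elasticache.create_replication_group(
    ReplicationGroupId="example-redis",  # hypothetical ID
    ReplicationGroupDescription="Primary plus two read replicas",
    Engine="redis",
    CacheNodeType="cache.t3.micro",
    NumCacheClusters=3,  # 1 primary + 2 replicas
    AutomaticFailoverEnabled=True,
)
```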

Redis snapshots are used to increase the size of nodes. This is not the same as EC2 snapshots: the snapshot creates a new node based on the snapshot, and the size is picked when launching.

Redis can be configured to back up automatically daily within a window, or you can take manual snapshots. Automatic snapshots have retention limits; manual ones don't.

Memcached can scale horizontally, is multi-threaded, and supports sharding.

Memcached uses lazy loading: if an app doesn't get a hit from the cache, it requests the data from the DB and then puts it into the cache. Write-through updates the cache when the database is updated. A sketch of the lazy-loading pattern follows below.
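A minimal sketch of the lazy-loading pattern, assuming hypothetical cache and db clients:

```python
CACHE_TTL_SECONDS = 300  # expire entries so stale data ages out


def get_user(user_id, cache, db):
    """Lazy loading: check the cache first, fall back to the database."""
    key = f"user:{user_id}"
    user = cache.get(key)  # cache miss returns None
    if user is None:
        user = db.fetch_user(user_id)  # hypothetical DB accessor
        cache.set(key, user, CACHE_TTL_SECONDS)  # populate cache on miss
    return user
```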

TTL can be used to expire out stale or unread data from the cache

Memcached does not maintain its own data persistence; the database does this. Scale by adding more nodes to a cluster.

Vertically scaling Memcached nodes requires standing up a new cluster of the required instance sizes/types. All instance types in a cluster are the same type.

There is a single endpoint for all Memcached nodes.

Put Memcached nodes in different AZs.

Memcached nodes are empty when first provisioned; bear this in mind when scaling out, as it will affect cache performance while the nodes warm up.

For low-latency applications, place Memcached clusters in the same AZ as the application stack. This means more configuration and management, but better performance.

When deciding between Memcached and Redis, here are a few questions to consider:

Is object caching your primary goal, for example to offload your database? If so, use Memcached.

Are you interested in as simple a caching model as possible? If so, use Memcached.

Are you planning on running large cache nodes, and require multithreaded performance with utilization of multiple cores? If so, use Memcached.

Do you want the ability to scale your cache horizontally as you grow? If so, use Memcached.

Does your app need to atomically increment or decrement counters? If so, use either Redis or Memcached.

Are you looking for more advanced data types, such as lists, hashes, and sets? If so, use Redis.

Does sorting and ranking datasets in memory help you, such as with leaderboards? If so, use Redis.

Are publish and subscribe (pub/sub) capabilities of use to your application? If so, use Redis.

Is persistence of your key store important? If so, use Redis.

Do you want to run in multiple AWS Availability Zones (Multi-AZ) with failover? If so, use Redis.

Amazon Kinesis is a managed service that scales elastically for real-time processing of streaming data at massive scale. The service collects large streams of data records that can then be consumed in real time by multiple data-processing applications running on Amazon EC2 instances.

You'll create data-processing applications, known as Amazon Kinesis Streams applications. A typical Amazon Kinesis Streams application reads data from an Amazon Kinesis stream as data records. These applications can use the Amazon Kinesis Client Library, and they can run on Amazon EC2 instances. The processed records can be sent to dashboards, used to generate alerts, dynamically change pricing and advertising strategies, or send data to a variety of other AWS services.

The PutRecord command is used to put data into a stream.
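A minimal sketch of putting a single record with boto3 (the stream name and payload are hypothetical):

```python
import json

import boto3

kinesis = boto3.client("kinesis")

# Put one record; the partition key determines which shard receives it.
kinesis.put_record(
    StreamName="example-stream",  # hypothetical stream
    Data=json.dumps({"event": "click"}).encode("utf-8"),
    PartitionKey="user-123",
)
```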

Data is stored in Kinesis for 24 hours, but this can go up to 7 days

You can use Streams for rapid and continuous data intake and
aggregation. The type of data used includes IT infrastructure log data,
application logs, social media, market data feeds, and web clickstream
data. Because the response time for the data intake and processing is in
real time, the processing is typically lightweight

The following are typical scenarios for using Streams:

Accelerated log and data feed intake and processing

Real-time metrics and reporting

Real-time data analytics

Complex stream processing


An Amazon Kinesis stream is an ordered sequence of data records.
Each record in the stream has a sequence number that is assigned by
Streams. The data records in the stream are distributed into shards

A data record is the unit of data stored in an Amazon Kinesis stream. Data records are composed of a sequence number, partition key, and data blob, which is an immutable sequence of bytes. Streams does not inspect, interpret, or change the data in the blob in any way. A data blob can be up to 1 MB.

Retention period is the length of time data records are accessible after they are added to the stream. A stream's retention period is set to a default of 24 hours after creation. You can increase the retention period up to 168 hours (7 days) using the IncreaseStreamRetentionPeriod operation.

A partition key is used to group data by shard within a stream

Each data record has a unique sequence number. The sequence number is assigned by Streams after you write to the stream with client.putRecords or client.putRecord.

In summary, a record has three things:

Sequence number

Partition key

Data BLOB

Producers put records into Amazon Kinesis Streams. For example, a web server sending log data to a stream is a producer.

Consumers get records from Amazon Kinesis Streams and process them. These consumers are known as Amazon Kinesis Streams applications.

An Amazon Kinesis Streams application is a consumer of a stream that commonly runs on a fleet of EC2 instances.

A shard is a uniquely identified group of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity.

Once a stream is created, you can add data to it in the form of records. A record is a data structure that contains the data to be processed in the form of a data blob. After you store the data in the record, Streams does not inspect, interpret, or change the data in any way. Each record also has an associated sequence number and partition key.

There are two different operations in the Streams API that add data to a stream: PutRecords and PutRecord. The PutRecords operation sends multiple records to your stream per HTTP request, and the singular PutRecord operation sends records to your stream one at a time (a separate HTTP request is required for each record). You should prefer PutRecords for most applications because it achieves higher throughput per data producer.
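A minimal batching sketch with boto3; the response's FailedRecordCount indicates entries that need retrying (the stream name and payloads are hypothetical):

```python
import json

import boto3

kinesis = boto3.client("kinesis")

# Batch up to 500 records per PutRecords call for higher throughput.
records = [
    {
        "Data": json.dumps({"seq": i}).encode("utf-8"),
        "PartitionKey": f"key-{i % 4}",  # spread records across shards
    }
    for i in range(100)
]

response = kinesis.put_records(StreamName="example-stream", Records=records)
if response["FailedRecordCount"] > 0:
    # Entries with an ErrorCode in response["Records"] should be retried.
    print("records to retry:", response["FailedRecordCount"])
```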

An Amazon Kinesis Streams producer is any application that puts user data records into an Amazon Kinesis stream (also called data ingestion). The Amazon Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to an Amazon Kinesis stream.

You can monitor the KPL with Amazon CloudWatch

The agent is a stand-alone Java software application that offers an easier way to collect and ingest data into Streams. The agent continuously monitors a set of log files and sends new data records to your Amazon Kinesis stream. By default, records within each file are determined by a new line, but the agent can also be configured to handle multi-line records. The agent handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner. It also emits CloudWatch metrics to help you better monitor and troubleshoot the streaming process.

You can install the agent on Linux-based server environments such as web servers, front ends, log servers, and database servers. After installing, configure the agent by specifying the log files to monitor and the Amazon Kinesis stream names. After it is configured, the agent durably collects data from the log files and reliably submits the data to the Amazon Kinesis stream.

SNS is the Simple Notification Service: a publisher creates a topic and then subscribers get updates sent to the topic. This can be push to Android, iOS, etc.
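A minimal topic publish/subscribe sketch with boto3 (the topic name and subscriber endpoint are hypothetical):

```python
import boto3

sns = boto3.client("sns")

# Publisher creates a topic; subscribers attach to it.
topic = sns.create_topic(Name="order-events")  # hypothetical topic
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="ops@example.com",  # hypothetical subscriber
)

# Every confirmed subscriber receives the message.
sns.publish(
    TopicArn=topic["TopicArn"],
    Subject="New order",
    Message="Order 42 placed",
)
```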

Use SNS to send push notifications to desktops, Amazon Device Messaging, Apple Push Notification Service for iOS and OS X, Baidu, Google Cloud Messaging for Android, Microsoft Push for Windows Phone, and Windows Push Notification Services.

Steps to create mobile push (sketched in code after the list):

Request credentials from mobile platforms

Request token from mobile platforms

Create platform application object

Publish message to mobile endpoint
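A minimal sketch of those steps with boto3, assuming GCM credentials and a device token obtained from the mobile platform (all identifiers are hypothetical):

```python
import boto3

sns = boto3.client("sns")

# Create the platform application object using the platform credentials.
app = sns.create_platform_application(
    Name="example-android-app",
    Platform="GCM",
    Attributes={"PlatformCredential": "EXAMPLE_API_KEY"},  # hypothetical credential
)

# Register the device token as an endpoint.
endpoint = sns.create_platform_endpoint(
    PlatformApplicationArn=app["PlatformApplicationArn"],
    Token="device-registration-token",  # token from the mobile platform
)

# Publish a message directly to the mobile endpoint.
sns.publish(TargetArn=endpoint["EndpointArn"], Message="Hello from SNS")
```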

Grid computing vs cluster computing

Grid computing is generally loosely coupled, often used with spot instances, and tends to grow and shrink as required. It can use different regions and instance types.

Distributed workloads

Designed for resilience (auto scaling) and horizontal scaling rather than vertical scaling

Cluster computing has two or more instances working together in low-latency, high-throughput environments

Uses the same instance types

GPU instances do not support SR-IOV networking


Elastic Transcoder encodes media files and uses a pipeline with a source and destination bucket, a job, and a preset (what media type, watermarks, etc.). Presets are templates and may be altered to provide custom settings. Pipelines can only have one source and one destination bucket.
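A minimal job-submission sketch with boto3 (the pipeline ID, object keys, and preset ID are hypothetical):

```python
import boto3

transcoder = boto3.client("elastictranscoder")

# Submit a job to an existing pipeline; the preset defines the output format.
response = transcoder.create_job(
    PipelineId="1111111111111-abcde1",  # hypothetical pipeline ID
    Input={"Key": "raw/video.mov"},  # object in the pipeline's input bucket
    Outputs=[{
        "Key": "encoded/video.mp4",  # written to the pipeline's output bucket
        "PresetId": "1351620000001-000010",  # hypothetical preset ID
    }],
)
print(response["Job"]["Id"])
```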

Elastic Transcoder integrates with SNS for job status updates and alerts.
