
An HTTP 404 status code means communication with the service was successful, but the request received a "not found" response. MissingSecurityHeader is an error returned when an S3 API call is missing required security information, which prevents the request from being executed successfully; the HTTP response code for MissingSecurityHeader is 400 Bad Request. Error code list:
https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
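
As a minimal sketch (not from the course), this is how those error codes could be distinguished with boto3; the bucket and key names are placeholders.

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    try:
        s3.get_object(Bucket="my-example-bucket", Key="missing-key.txt")
    except ClientError as err:
        code = err.response["Error"]["Code"]                          # e.g. "NoSuchKey"
        status = err.response["ResponseMetadata"]["HTTPStatusCode"]   # e.g. 404 or 400
        print(f"S3 call failed: {code} (HTTP {status})")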

MyClothes.com allows people to buy clothes online, and there is a shopping cart as you navigate MyClothes.com. We have hundreds of users at the same time, so all these users are navigating the website and we wanna be able to scale, maintain horizontal scalability, and keep our application web tier as stateless as possible. So even though there is shopping cart state, we want to be able to scale our web application as easily as possible. That means users should not lose their shopping cart while navigating our website, that would be really bad.
And maybe also have their details such as address, etcetera,
in a database that we can store effectively
and make accessible from anywhere.
So let's see how we can proceed.
You'll see it's going to be yet another fun
but challenging discussion.
Okay, so this is our application and I'm going to go fast.
Here's the kind of architecture we've seen
in the previous lecture.
So we have our user, Route 53, Multi AZ ELB,
Auto Scaling group three AZ, very basic.
So our application is accessing our ELB and our ELB says
"Alright, you're gonna talk to this instance."
And you create a shopping cart,
and then the next request is going to go
not to the same instance but to another instance,
so now the shopping cart is lost
and the user says "Oh, there must just be a little bug,
I'm going to try again."
So he adds something into the shopping cart
and it gets redirected to the third instance
which doesn't have their shopping cart.
So basically the user is going crazy and says, "Wait, I'm losing my shopping cart every time I do something. This is really weird, MyClothes.com is a bad website, I don't wanna shop on it," and so we will lose money.
So how do we fix this?
Well, we can introduce Stickiness or Session Affinity
and that's an ELB feature so we enable ELB Stickiness and now our user talks to
our first instance, adds something into the shopping cart and then the second
request goes to the same instance because of Stickiness and the third request also
goes to the same instance and actually every request will go to the same instance
because of Stickiness.
This works really well but if an ec2 instance
gets terminated for some reason,
then we still lose our shopping cart.
But there is definitely some kind of improvement here, thanks to Stickiness and
Session Affinity.
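
As a rough sketch of how that could be enabled outside the console (my own illustration, assuming an Application Load Balancer target group; the ARN is a placeholder), stickiness is just a target group attribute:

    import boto3

    elbv2 = boto3.client("elbv2")

    elbv2.modify_target_group_attributes(
        TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/my-tg/placeholder",
        Attributes=[
            {"Key": "stickiness.enabled", "Value": "true"},
            {"Key": "stickiness.type", "Value": "lb_cookie"},
            # how long the ALB keeps sending one client to the same target
            {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
        ],
    )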
So now, let's look at a completely different approach and introduce user cookies. So basically, instead of having the ec2 instances store the content of the shopping cart, let's say that the user is the one storing the shopping cart content, and so every time they connect to the load balancer, they basically say
"By the way, in my shopping cart I have all these things."
and that's done through web cookies.
So now if it talks to the first server,
the second server or the third server,
each server will know what the shopping cart content is
because the user is the one sending
the shopping cart content directly into our ec2 instances.
So it's pretty cool right?
We achieved statelessness because now each ec2 instance doesn't need to know what happened before. The user will tell us what happened before, but the HTTP requests are getting heavier.
So because we sent the shopping cart content in web cookies
we're sending more and more data every time
we add something into our shopping cart.
Additionally, there is some level of security risk
because the cookies can be altered by attackers, and so our user may end up with a modified shopping cart all of a sudden. So, when you have this kind of architecture, make sure that your ec2 instances validate the content of the user cookies.
And then, the cookies overall, they can only be so big. They can only be less than
4KB total
so there's only a little information you can store in the cookies. You cannot store big
data sets, okay?
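
Here is a minimal sketch (my own, not from the course) of the cookie validation mentioned above: the cart is signed with an HMAC so a tampered cookie is rejected server-side. The secret handling is deliberately simplified.

    import hashlib
    import hmac
    import json

    SECRET = b"keep-me-in-a-secrets-manager-not-in-code"  # placeholder secret

    def sign_cart(cart: dict) -> str:
        payload = json.dumps(cart, separators=(",", ":"))
        sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return f"{payload}|{sig}"          # cookie value, must stay under ~4KB

    def read_cart(cookie_value: str) -> dict:
        payload, _, sig = cookie_value.rpartition("|")
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("cart cookie was modified")
        return json.loads(payload)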
So this is the idea. So, this works really well. This is actually a pattern that many
web application frameworks use but what if we do something else?
Let's introduce the concept of a server session. So now, instead of sending the whole shopping cart in web cookies, we're just going to send a session ID that is unique to the user. So we're gonna send this, and in the background we're gonna have maybe an ElastiCache cluster, and what will happen is that when we send a session ID, we're gonna talk to an ec2 instance and say we're going to add this thing to the cart, and so the ec2 instance will add the cart content into ElastiCache, and the key to retrieve this cart content is going to be the session ID.
So when our user basically does the second request
with the session ID and it goes to another ec2 instance, that other ec2 instance is
able using that session ID to look up the content of the cart from ElastiCache
and retrieve that session data. And then, for the last request, the same pattern. The really cool thing with ElastiCache, remember, is sub-millisecond performance, so all these things happen really quickly and that's really great. An alternative, by the way, for storing session data, which we haven't seen yet, is DynamoDB.
But I'm just putting it out here
just in case you know what DynamoDB is. So, it's a really cool pattern here.
It's more secure because now ElastiCache is a source of truth and no attackers can
change what's in ElastiCache.
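
A minimal sketch of that session store, assuming the redis-py client and an ElastiCache Redis endpoint (the hostname and key names are placeholders):

    import redis

    r = redis.Redis(host="my-cluster.xxxxxx.0001.use1.cache.amazonaws.com", port=6379)

    def add_to_cart(session_id: str, item: str, qty: int) -> None:
        r.hset(f"cart:{session_id}", item, qty)   # any instance can write the cart
        r.expire(f"cart:{session_id}", 3600)      # drop abandoned carts after an hour

    def get_cart(session_id: str) -> dict:
        # any other instance can rebuild the cart from just the session ID
        return r.hgetall(f"cart:{session_id}")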
So we have a much more secure type of pattern and it's very common. So now,
okay we have ElastiCache, we figured this out. We wanna store user data in the
database, we wanna store the user address. So again, we're gonna talk to our ec2
instance and this time, it's going to talk to an RDS instance. And RDS, it's going to
be great because it's for long term storage and so we can store
and retrieve user data such as address, name, etcetera directly by talking to RDS.
And each of our instances can talk to RDS and we effectively get, again,
some kind of Multi AZ stateless solution. So our traffic is going great. Our website is
doing amazing, and now we have more and more users, and we realize that most of what they do is navigate the website.
They do reads, they get product information, all that kind of stuff. So how do we
scale reads? Well we can use an RDS Master which takes the writes but we can
also have RDS Read Replicas with some replication happening.
And so anytime we read stuff, we can read from the Read Replica and we can have
up to five Read Replicas in RDS.
And it will allow us to scale the reads of our RDS database. There's an alternative pattern called Lazy Loading (Cache-Aside) where we use the cache, and the way it works is that our user talks to an ec2 instance. It looks in the cache and asks,
"Do you have this information?"
If it doesn't have it then it's going to read from RDS and put it back into ElastiCache
so just this information is cached. And so the other ec2 instances, they're doing the same thing, but this time when they talk to ElastiCache,
they will have the information and they get a cache hit and so, they directly get the
response right away because it's been cached. And so this pattern allows us to do
less traffic on RDS. Basically, decrease the CPU usage on RDS
and improve performance at the same time. But we need to do cache maintenance
now and it's a bit more difficult and again
this has to be done application side.
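
A small sketch of that application-side caching logic, assuming redis-py again; query_rds() is a hypothetical stand-in for whatever database call the application makes:

    import json
    import redis

    cache = redis.Redis(host="my-cluster.xxxxxx.cache.amazonaws.com", port=6379)

    def get_product(product_id: str) -> dict:
        key = f"product:{product_id}"
        cached = cache.get(key)
        if cached is not None:                       # cache hit: no RDS traffic at all
            return json.loads(cached)

        product = query_rds(product_id)              # cache miss: read from RDS (hypothetical helper)
        cache.setex(key, 300, json.dumps(product))   # cache it for 5 minutes
        return product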
So pretty awesome now, we have our application, it's scalable, it has many many
reads but we wanna survive disasters,
we don't wanna be stricken by disasters. So how do we do it? Our user talks to our
Route 53 but now we have a Multi AZ ELB. And by the way, Route 53 is already
highly available.
You don't need to do anything. As for the load balancer, we're going to make it Multi AZ. Our auto scaling group is Multi AZ, and then RDS,
there's a Multi AZ feature.
The other instance is going to be a standby replica that can just take over whenever there's a disaster. And ElastiCache also has a Multi AZ feature if you use Redis.
So really cool.
Now we basically have a Multi AZ application all across the board, and we know for sure that we can survive an availability zone in AWS going down.
Now for security groups, we wanna be super secure.
So maybe we'll open HTTP, HTTPS traffic from anywhere on the ELB side.
For the ec2 instance side, we just wanna restrict traffic
coming from the load balancer, and maybe for our ElastiCache, we just wanna restrict traffic coming from the ec2 security group, and for RDS, same thing: we want to restrict traffic coming directly from the ec2 security group.
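
A minimal sketch of one of those referencing rules with boto3 (the security group IDs are placeholders): the EC2 tier only accepts HTTP whose source is the load balancer's security group, not an IP range.

    import boto3

    ec2 = boto3.client("ec2")

    ec2.authorize_security_group_ingress(
        GroupId="sg-ec2-instances-placeholder",
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            # reference the ALB security group instead of a CIDR block
            "UserIdGroupPairs": [{"GroupId": "sg-load-balancer-placeholder"}],
        }],
    )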
So, that's it! So now, let's just talk about this architecture for web application.
So we have discussed ELB sticky sessions, storing cookies on the web client to make our web app stateless, or maybe using a session ID and a session cache with ElastiCache, and as an alternative, we can use DynamoDB. We can also use ElastiCache to cache data from RDS in case of reads, and we can use Multi AZ to survive disasters. RDS, we can use it for storing user data, so a more durable type of data. Read replicas can be used for scaling reads, or we can also use ElastiCache, and then we have Multi AZ for disaster recovery. And on top of it, tight security with security groups referencing each other. So this is a more complicated application, a three-tier one, because there's the client tier, the web tier, and the database tier.
But this is a very common architecture overall. And yes, it may start to increase in
cost but it is okay, at least we know the trade-offs we're making. If we want Multi AZ,
yes for sure we have to pay more. If we want to scale the reads, yes for sure we'll
have to pay more as well. But it gives us some good trade-offs
in architecture decisions that we have to make.

So, for this to work, WordPress will store the pictures somewhere on some drive, and then basically all your instances must be able to access that data as well.
And so our user data, the blog content and everything should be stored in a MySQL
database and we want this to scale globally
so let's see how we can achieve this. So the first thing we have to do is to create a layer that has RDS, so we are now very familiar with this kind of architecture: RDS in the back end, it's Multi AZ, and it's going to be accessed by all of these EC2 instances. But what if I just wanna go big and really scale up? Maybe I want to replace this layer with Aurora MySQL, and I can have Multi AZ, read replicas, even global databases if I wanted to. Right here we're just basically getting fewer operations by using Aurora; it's just a choice I'm making as a solutions architect, you don't have to make that choice, but I like Aurora, I like the fact that it scales better, and I like the fact that it is easier to operate. Okay, excellent, so now let's talk about
storing images
so let's go back to the very simple solution architecture where we have one EC2 instance with one EBS volume attached to it, so it's in one AZ. Our user goes through the load balancer and wants to send an image, and that image makes it all the way through to EBS, so the image is stored on EBS. So now it works really well: we only have one EC2 instance, so the image goes straight to the EBS Volume and we're happy. If we wanted to read that image, same thing, the image can be read from the EBS Volume and sent back to the user, so very good, right?
The problem arrives when we start scaling
so now we have two EC2 instances in two different AZs, and each of these EC2 instances has its own EBS Volume. So what happens is that if I send an image right here from this instance, it gets stored on that EBS Volume. If I want to read that image, maybe the read goes the same way and yes, I can read it. Or, very commonly, the read goes to the other instance at the bottom, where there is no image present because it's not the same EBS Volume, and so here I won't be able to access my image, and that's really, really bad. So
the problem with EBS Volumes is that it works really well when you have one
instance but when you start scaling across multiple AZ or multiple instances
then it starts to become problematic. For this, we have seen how to store data in a shared way: basically, we can use EFS. So let's use the exact same architecture, but now we are storing the files in EFS, a Network File System drive, so EFS is NFS. EFS basically creates ENIs (Elastic Network Interfaces), and it creates these ENIs in each AZ, and these ENIs can be used by all our EC2 instances to access our EFS drive. The really cool thing here is that the storage is shared between all the instances, so if we send an image to the M5 instance, it goes through the ENI to EFS, so the image is stored in EFS. Now if you wanna read the image,
it goes all the way to the bottom and through the ENI and it's going to read on EFS
and yes EFS has that image available so we can send it back and so this is a very
common way of scaling website storage across many different EC2 instances to
allow them all to have access to the same files regardless of their availability zone
and how many instances we have. So that's it that's that little subtlety for WordPress
but I wanted to discuss EBS vs EFS. So we talked about Aurora Database
to basically have less operations and have multi AZ and read replicas and we've
talked about storing data in EBS which works great when we're in a single instance
application but it doesn't work really great when we have many, and so maybe we
can use EFS then to have a distributed application across multi AZ and that kind of
stuff now the costing aspect of it is that EBS is cheaper than EFS but we do get a lot
of advantages by using EFS, especially in this kind of use case. So again, it's up to you as a solutions architect to really understand the trade-offs of what you're doing, why you're doing it, and the cost implications of what you're doing.
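
As a sketch of the ENI-per-AZ idea (my own illustration, with placeholder IDs), you create one EFS mount target per subnet so every instance, in every AZ, can reach the same file system:

    import boto3

    efs = boto3.client("efs")

    for subnet_id in ["subnet-az-a-placeholder", "subnet-az-b-placeholder", "subnet-az-c-placeholder"]:
        efs.create_mount_target(
            FileSystemId="fs-12345678",                # placeholder file system ID
            SubnetId=subnet_id,
            SecurityGroups=["sg-allow-nfs-from-ec2"],  # NFS (port 2049) allowed from the EC2 security group
        )

    # Each instance then mounts the same file system, for example with the EFS mount helper:
    #   sudo mount -t efs fs-12345678:/ /var/www/html/wp-content/uploads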
you're really minimizing risk over time and you save cost from disasters, and you really don't want to have a risk or a security issue in your company.
Now how do we design strong security? Well we need to have a strong identity
foundation.
So we want to centralize how we manage user accounts. We want to rely on least
privilege and maybe IAM is going to be one of these services to help us do that.
We want to enable traceability, that means we need to look at all the logs, all the
metrics and store them and automatically respond and take action, every time
something looks really weird. We need to apply security at all layers, okay?
You need to secure every single layer, so that if one fails, maybe the next one will take over. So edge network, VPC, subnet, load balancer, every EC2 instance you have, the OS, patching it, the application, making sure it's up to date, all these things. You need to automate security best practices, okay?
Security is not something you do manually, it's mostly done well, when it's
automated. You need to protect data in transit and at rest. That means always
enable encryption, always do SSL, always use tokenization and do access control.
And you need to keep people away from data. So why is someone requesting data?
Isn't that a risk when you allow someone to access data, do they really need it or is
there a way to automate the need for that direct access in that manual processing of
data?
And then you need to prepare for security events. So security events must happen
some day in every company, I think, and so run response simulations, use tools to
automate the speed of detection, investigation and recovery, okay?
So in terms of AWS Services, what does that mean? Well the first one is going to be
identity and access management. So we know what that means, that means IAM,
STS for generating temporary credentials, maybe Multi-Factor Authentication token
and AWS Organizations to manage multiple AWS accounts centrally. Then detective controls. So how do we detect when stuff goes wrong?
AWS Config for compliance,
CloudTrail to look at API calls that look suspicious. CloudWatch to look at metrics
and things that may go completely out of norm. And then we get infrastructure
protection. So how do we protect our own Cloud? Well CloudFront is going to be a
really great first line of defense against DDoS attacks.
Amazon VPC to secure your network and making sure you set the right ACLs.
And then Shield, we haven't seen this, but it's basically a way to protect your AWS account from DDoS. WAF, which is a Web Application Firewall, and Inspector, to basically look at the security of our EC2 instances.
Now we haven't seen all these services, they are more for the SysOps exam, but
again I just wanna give you here an overview of all the AWS Services that can help
you achieve full security.For data protection, well we know there is KMS
to encrypt all the data at rest. Then there is S3, which has a tons of encryption
mechanism, we have SSE S3, we have SSE KMS,SSEC or client setting encryption.
On top of it we get bucket policies and all that stuff. And then every, you know, every
managed service has some kind of data protection. So Load Balancer can enable to
expose a HTPS end point. EBS volumes can be encrypted at rest, RDS instances
also can be encrypted at rest and they have SSL capability. So all these things are
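
As a small sketch of those S3 options (bucket, key and KMS key ARN are placeholders), server-side encryption is just a parameter on the upload:

    import boto3

    s3 = boto3.client("s3")

    # SSE-S3: S3 manages the encryption keys
    s3.put_object(Bucket="my-bucket", Key="report.csv", Body=b"data",
                  ServerSideEncryption="AES256")

    # SSE-KMS: encrypt with a KMS key you control
    s3.put_object(Bucket="my-bucket", Key="report.csv", Body=b"data",
                  ServerSideEncryption="aws:kms",
                  SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/placeholder")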
All these things are here to protect your data. Incident response: what happens when there's a problem?
Well IAM is going to be your first good line of defense: if there is an account being compromised, just delete that account or give it zero privileges. CloudFormation will be great, for example,
if someone deletes your entire infrastructure, how do you get back into a running
state? Well CloudFormation is the answer.
Then, for example, how do we automate all this incident response? How do we automate the fact that if someone deletes a resource, maybe we should get an alert? CloudWatch Events could be a great way of doing that.
So that's it, again just to show you the synergy.
Reliability is the ability of a system to recover from infrastructure or service
disruptions, dynamically acquire computing resources to meet demand, and mitigate
disruptions such as misconfiguration. So it's about making sure your application
runs no matter what. The design principles are simple. We need to test recovery
procedures
so you need to use automation to simulate different failures or to recreate scenarios
that led to failures before. We need to automatically recover from failure. That
means that you need to anticipate and remediate failures before they occur.
Then scale horizontally in case you need to have increased system availability, or
increased load. And then stop guessing capacity. So basically that means that if you think, oh, I need four instances for my application, that probably isn't going to work in the long term.
Use auto scaling wherever you can
to make sure you have the right capacity at any time. And then in terms of
automation, you need to basically change everything through automation, and this is
to ensure that your application will be reliable or you can roll back, or whatever. In
terms of AWS Services, what do we have?
Well the foundations of reliability is going to be IAM, again making sure that no one
has too many rights to basically wreak havoc on your account. Amazon VPC, this is
a really strong foundation for networking.
And Service Limits, making sure that you do set appropriate service limits. Not too
high, and not too low, just the right amount of service limits, and you monitor them
over time. Such as if your application has been growing, and growing, and growing,
and you're about to reach that service limit. You don't want to get any service
disruptions, so you would contact AWS, and increase that service limit over time.
Trusted Advisor is also great, about how we can basically look at these service
limits, or look at other things, and get strong foundations over time.
Change management, so how do we manage change overall? Well, Auto Scaling is
a great way. Basically if my application gets more popular over time and I have set
up auto scaling then I don't need to change anything, which is great.
CloudWatch is also a great way of looking at your metrics, for your databases, for your application, making sure everything looks reliable over time, and if the CPU utilization starts to ramp up, maybe do something about it. CloudTrail, so we can track our API calls, and Config, again. Failure Management, so how do we manage failures?
Well, we'll see this in the disaster recovery explanation in this section, but you can use backups all along the way to basically make sure
that your application can be recovered if something really really bad happens.
CloudFormation to recreate your whole infrastructure at once, S3, for example, to
back up all your data or, you know, S3 Glacier if we're talking about archives that you don't need to touch very often. Finally, maybe you want to use a reliable, highly available global DNS system, so Route 53 could be one of them. And in case of any failures, maybe you want to change Route 53 to just point to a new application stack somewhere else and really make sure your application has some kind of disaster recovery mechanism.
Performance efficiency.
So what is it?
It includes the ability to use computing resources efficiently to meet system
requirements, and to maintain that efficiency as demand changes and technologies
evolve. So it's all about adapting and providing the best performance.
The design principles are simple. First, you need to use advanced technologies,
okay? You need to democratize them, and basically,
as the services become available, maybe they can be helpful for your product development, so track them. You need to be able to go global in minutes, so if you need to deploy in multiple regions it shouldn't take days, it should take minutes, maybe using CloudFormation. Use serverless infrastructure. So that's the
golden state. So that means you don't manage any servers, and everything scales
for you, which is really awesome.
Experiment more often.
Maybe you have something working really well today, but you think it won't scale to
10 times the load. Experiment maybe try serverless architectures. See if that works
for you. Basically, give it a go. And mechanical sympathy.
So be aware of all the AWS services, and that's really, really hard. Reading some
blogs is also the right way of doing it. But you still need to be on top of the game.
Because when new changes happen, they can really change your solution architecture dramatically. So, for AWS services for performance efficiency:
Selection: Auto Scaling, Lambda, EBS, S3, RDS. You have so many choices of technology that scale in slightly different patterns, so choose the right one for you. For example, Lambda is for serverless, Auto Scaling is going to be more for EC2, EBS is when you know you need to have a disk, but you can sort of manage performance over time using gp2 or io1. S3 if you want to scale globally; RDS, maybe you wanna use it to provision a database, and maybe you want to migrate to Aurora.
So how do we know we are performing
really well and as expected?
Then CloudWatch, with CloudWatch Alarms, CloudWatch metrics, CloudWatch dashboards: all these things can help you understand better how things work. AWS Lambda as well: making sure that you don't throttle, and that your Lambda functions run in minimal time, all that kind of stuff.
And tradeoffs.
So how do we make sure that we are doing the right performance decision? So RDS
maybe versus Aurora.
ElastiCache if you want to improve read performance, or maybe using Snowball. Snowball, for example, will let us move a lot of data very fast, but it will take maybe a week for the data to arrive. So the tradeoff is: do we want the data right away in the cloud, using all our network capacity, or do we wanna move that data in a truck and get it in a week from now?
So always Tradeoffs, right?
With ElastiCache, always a Tradeoff as well. Do I want to have possibly outdated,
stale data in a cache but really improve performance? Or do I wanna get the latest
and not use ElastiCache?
CloudFront same thing. It does cache stuff around the edges. So if you use
CloudFront, yes, you go global in minutes, but you have the possibility of everything
being cached for one day on people's laptops.
So when you release an update to your websites, maybe it will take time for people
to get the new stuff. So think about all these things.

That is cost optimization.


It's the ability to run systems to deliver business value, but at the lowest price point possible, which makes a lot of sense.
Now, design principles: adopt a consumption model, so pay only for what you use. For example, AWS Lambda is one of these services: if you don't use AWS Lambda, you don't pay for it, whereas with RDS, if you don't use your database, you still pay for it because you've
provisioned your database. So it's a really interesting trade off here.
Measure overall efficiency, use CloudWatch, are you using your resources
effectively? Then we have a small ad for AWS, but the idea is that if you move to the
cloud then you stop spending money on data center operations
because AWS has the infrastructure for you, and they just allow you to focus on your
applications, your systems, and you need to analyze and attribute expenditure, so
that means if you don't use tags on your AWS resources, you're going to have a lot of
trouble figuring out which application is costing you a lot of money,
so using tags ensures that you are able to track the cost of each application and
optimize them over time and get a ROI based on how much money you generate
from your business. Finally, use managed and application level services to
reduce the cost of ownership. Because managed services operate at cloud scale, they can offer a much lower cost per transaction or service, and that's really something you have to remember about the cloud: it operates at cloud scale. I've read news about teams of only three engineers managing an application on AWS that serves five million people. I mean, imagine that: three people managing a global application on AWS for five million people,
just because they are able to leverage the cloud and operate at cloud scale. So in
terms of cost optimization, what do we get? Making sure we know what costs us
something, so Budgets, Cost and Usage Reports, Cost Explorer and for example
Reserved Instance Reporting, making sure that if we do reserve instances, we're actually using them and not just paying for unused reserved instances.
Cost-Effective Resources, are we using the right stuff, for example can we use Spot
Instances, they are considerably cheaper, yes they do have some trade offs but can
we use them? Can we use cost-effective resources?
Or if we know we'll be using an EC2 instance for over a year, maybe three years, because we provisioned a database on it, can we use reserved instances?

That'll be a great way of saving money. AWS Glacier, so are we basically putting our
archives in the lowest price point possible and Glacier is the lowest price point
possible. Are we matching supply and demand? So are we not over provisioning. So
again, auto scaling or maybe AWS Lambda if you're using serverless infrastructure
and are we optimizing over time,
so getting information from trusted advisor, or again looking at our cost and usage
report, or even reading the news blog. So let me just share with you a small story.
There was this ELB feature: it allowed HTTP and HTTPS traffic going in, but you couldn't do a redirect of HTTP to HTTPS before, so you had to spin up an application that was doing the redirect behind the scenes, and that application
was costing me a little bit of money, but then reading the news blog they said now
you can straight from the ELB configure redirect of HTTP to HTTPS and that was
great, it saved me a bit of money every month just for that one feature, so reading
the news does allow you to optimize your costs and make sure you have the right
price point. Last story is for example,
if you're running an application on DynamoDB, but it's really inactive, it's a really slow application, or you don't need a lot of operations, maybe you're way better off using the on-demand feature of DynamoDB instead of provisioned capacity that charges you for RCUs and WCUs and so on.
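
A tiny sketch of that switch (the table name is a placeholder); on-demand billing is just a table setting:

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.update_table(
        TableName="my-low-traffic-table",
        BillingMode="PAY_PER_REQUEST",   # pay per request instead of per provisioned RCU/WCU
    )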

So what is a disaster?
Well it's any event that has a negative impact on a company's business continuity or
finances, and so disaster recovery is about preparing and recovering from these
disasters.
So what kind of disaster recovery can we do on AWS or on general?
Well we can do on-premise to on-premise. That means we have a first data
center, maybe in California, another data center, maybe in Seattle, and so this is
traditional disaster recovery and it's actually very, very expensive.
Or we can start using the cloud and do on-premise as a main data center and then
if we have any disaster, use the cloud. So this is called a hybrid recovery.
Or if you're just all in the cloud then you can do AWS Cloud Region A to Cloud
Region B, and that would be a full cloud type of disaster recovery.
Now before we do the disaster recovery, we need to define two key terms, and you
need to understand them from an exam perspective. The first one is called
RPO, recovery point objective, and the second one is called
RTO, recovery time objective.
So what is RPO and RTO?
The first one is the RPO, recovery point objective. This is basically how often you run backups, and how far back in time you can recover.
And when a disaster strikes, basically, the time between the RPO and the disaster
is going to be a data loss. For example, if you back up data every hour and a
disaster strikes then you can go back in time for an hour and so you'll have lost one
hour of data. So the RPO, sometimes it can be an hour, sometimes it can be maybe
one minute.
It really depends on our requirements,
but the RPO is: how much data loss are you willing to accept in case a disaster happens?
The RTO, on the other hand, is when you recover from your disaster, and the time between the disaster and the RTO is the amount of downtime your application has.
So sometimes it's okay to have 24 hours of downtime, I don't think it is.
Sometimes it's not okay and maybe you need just one minute of downtime, okay.
So basically optimizing for the RPO and the RTO does drive some solution
architecture decisions, and obviously the smaller you want these things to be,
usually the higher the cost. So let's talk about disaster recovery strategies.
The first one is backup and restore. Second one is pilot light,
third one is warm standby, and fourth one is hot site or multi site approach.
So if we basically rank them, they all have a different RTO. Backup and restore will have the highest RTO. Pilot light, then warm standby, then multi site: all these things cost more money, but they get you a faster RTO.
That means you have less downtime overall. So let's look at all of these one by one in detail to really understand from an architectural standpoint what they mean.
Backup and restore has a high RPO. That means that you have a corporate data
center, for example, and here is your AWS Cloud and you have an S3 bucket. And
so if you want to back up your data over time, maybe we can use AWS Storage Gateway and have some lifecycle policy put data into Glacier for cost optimization purposes, or maybe once a week you're sending a ton of data into Glacier using AWS Snowball. So here, you know, if you use Snowball, your RPO is gonna be about
one week because if your data center burns or whatever and you lose all your data
then you've lost one week of data because you send that Snowball device once a
week. If you're using the AWS Cloud instead, maybe EBS volumes, Redshift and RDS: if you schedule regular snapshots and you back them up,
then your RPO is going to be maybe 24 hours or one hour based on how frequently
you do create these snapshots. And then when you have a disaster strike you and
you need to basically restore all your data then you can use AMIs to recreate EC2
instances and spin up your applications or you can restore straight from a snapshot
and recreate your Amazon RDS database or your EBS volume or your Redshift,
whatever you want. And so that can take a lot of time as well to restore this data and
so you get a high RTO as well.
But the reason we do this is actually it's quite cheap to do backup and restore. We
don't manage infrastructure in the middle, we just recreate infrastructure when we
need it, when we have a disaster and so the only cost we have is the cost of storing
these backups.
So it gives you an idea. Backup and restore: very easy, not too expensive, and you get a high RPO and a high RTO.
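
As a minimal sketch of one backup-and-restore building block (volume ID and regions are placeholders), you can snapshot an EBS volume and copy the snapshot to a second region for DR:

    import boto3

    ec2_primary = boto3.client("ec2", region_name="us-east-1")
    ec2_dr = boto3.client("ec2", region_name="us-west-2")

    snap = ec2_primary.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup",
    )
    ec2_primary.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # copy_snapshot is called in the destination (DR) region
    ec2_dr.copy_snapshot(
        SourceRegion="us-east-1",
        SourceSnapshotId=snap["SnapshotId"],
        Description="DR copy of nightly backup",
    )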
The second one is going to be pilot light.
So here with pilot light, a small version of the app is always running in the cloud, and
so usually that's going to be your critical core, and this is what is called pilot light. So
it's very similar to backup and restore, but this time it's faster because your critical
systems, they're already up and running and so when you do recover, you just need
to add on all the other systems that are not as critical.
So let's have an example.
This is your data center, it has a server and a database, and this is the AWS Cloud. Maybe you're going to do continuous data replication from your critical database into
RDS which is going to be running at any time so you get an RDS database ready to
go running. But your EC2 instances, they're not critical just yet. What's really
important is your data, and so they're not running, but in case you have a disaster
happening, Route 53 will allow you to fail over from the server in your data center, recreate that EC2 instance in the cloud and make it up and running, but your RDS
database is already ready. So here what do we get?
Well we get a lower RPO, we get a lower RTO and we still manage costs. We still
have to have RDS running, but just the RDS database is running, the rest is not, and your EC2 instances are only brought up, only created, when you do disaster recovery. So pilot light is a very popular choice.
Remember, it's only for your critical core systems.
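
A sketch of the Route 53 failover routing used in pilot light and warm standby (hosted zone ID, health check ID and DNS names are placeholders): a primary record points at the data center, a secondary record at the AWS side.

    import boto3

    r53 = boto3.client("route53")

    r53.change_resource_record_sets(
        HostedZoneId="ZPLACEHOLDER",
        ChangeBatch={"Changes": [
            # primary: on-premises, only used while its health check passes
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "app.example.com", "Type": "CNAME", "TTL": 60,
                "SetIdentifier": "on-prem", "Failover": "PRIMARY",
                "HealthCheckId": "healthcheck-id-placeholder",
                "ResourceRecords": [{"Value": "dc.example.com"}]}},
            # secondary: the AWS stack, served when the primary is unhealthy
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": "app.example.com", "Type": "CNAME", "TTL": 60,
                "SetIdentifier": "aws", "Failover": "SECONDARY",
                "ResourceRecords": [{"Value": "my-alb-123.us-east-1.elb.amazonaws.com"}]}},
        ]},
    )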
Warm standby is when you have a full system up and running but at a minimum
size so it's ready to go, but upon disaster, we can scale it to production load.
So let's have a look. We have our corporate data center.
Maybe it's a bit more complicated this time. We have a reverse proxy, an app
server, and a master database, and currently our Route 53 is pointing the DNS
to our corporate data center. And in the cloud, we'll still have our data replication to
an RDS Slave database that is running.
And maybe we'll have an EC2 auto scaling group, but running at minimum capacity
that's currently talking to our corporate data center database. And maybe we'll have
an ELB as well, ready to go. And so if a disaster strikes you, because we have a
warm standby,
we can use Route 53 to fail over to the ELB and we can use the failover to also
change where our application is getting its data from. Maybe it's getting its data from the RDS Slave now, and so we've effectively failed over to the standby, and then maybe using auto scaling, our application will scale pretty quickly.
So this is a more costly thing to do now because we already have an ELB and EC2
Auto Scaling running at any time, but again, you can decrease your RPO and your
RTO doing that. And then finally we get the multi site/hot site approach. It's very low
RTO, we're talking minutes or seconds but it's also very expensive. But you get two
full production scale running on AWS and on premise. So that means you have your on-premise data center, full production scale, and you have your AWS environment,
full production scale with some data replication happening. And so here what
happens is that because you have a hot site that's already running, your Route 53
can route requests to both your corporate data center and the AWS Cloud, and it's called an active-active type of setup. And so the idea here is that the failover can
happen. Your EC2 can failover to your RDS Slave database if need be, but you get
full production scale running on AWS and On Premise, and so this costs a lot of
money, but at the same time, you're ready to fail over, you're ready and you're
running into a multi DC type of infrastructure which is quite cool. Finally, if you
wanted to go all cloud, you know it would be the same kind of architecture.
It will be a multi region so maybe we could use Aurora here because we're really in
the cloud, so we have a master database in a region, and then we have an Aurora Global database that's been replicated to another region as a Slave, and so both of these regions are working for me,
and when I want to failover, you know,
I will be ready to go full production scale again
in another region if I need to. So this gives you an idea of all the strategies you can
have for disaster recovery. It's really up to you to select the disaster recovery strategy you need, but the exam will ask you, based on some scenarios: what do you recommend? Do you recommend backup and restore? Pilot light? Do you recommend multi site or a hot site? Warm standby? All that kind of stuff.

Okay, so finally, disaster recovery tips, and this is more like real-life stuff. For backups, you can use EBS Snapshots, RDS automated snapshots and backups, et cetera, and you can push all these snapshots regularly to S3, S3 IA, or Glacier. You can implement a Lifecycle Policy, and you can use Cross Region Replication if you wanted to make sure these backups are in different regions. And if you want to move your data from on-premise to the cloud, Snowball or Storage Gateway would be great technologies.

For high availability, using Route 53 to migrate DNS from one region to another region is really, really helpful and easy to implement. We can also use technology to have Multi-AZ implemented, such as RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3: all these things are highly available by default if you enable the right options. If you're talking about the high availability of your network, maybe you've implemented Direct Connect to connect from your corporate data center to AWS, but what if the connection goes down for whatever reason? Maybe you can use a Site to Site VPN as a recovery option for your network.

In terms of replication, you can use RDS Cross Region Replication, Aurora and Global Databases. Maybe you can use database replication software to replicate your on-premise database to RDS, or maybe you can use Storage Gateway as well.

In terms of automation, so how do we recover from disasters? I think you would know already: CloudFormation / Elastic Beanstalk can help recreate whole new environments in the cloud very quickly. Or maybe, if you use CloudWatch, we can recover or reboot our EC2 instances when CloudWatch alarms fail. AWS Lambda can also be great for custom automation; Lambda functions are great for REST APIs, but they can also be used to automate your entire AWS infrastructure. And so overall, if you can manage to automate your whole disaster recovery, then you are really, really well set for success.

And then finally, chaos testing. So how do we know how to recover from a disaster? You create disasters, and an example that's, I think, widely quoted now in the AWS world is Netflix: they run everything on AWS, and they have created something called a simian army, and they randomly terminate EC2 instances, for example. They do so much more, but basically they just take an application server and terminate it randomly. In production, okay? Not in dev or test, in production. They want to make sure that their infrastructure is capable of surviving failures, and so that's why they're running a bunch of chaos monkeys that just terminate stuff randomly, just to make sure that their infrastructure is rock solid and can survive any type of failure.
So that's it for this section on disaster recovery.

What is NOT an advantage of using Typed Parameters?


It ensures your CloudFormation stack will not fail

Even if all the parameters are valid, your CloudFormation stack may still fail. Maybe the combination of parameters is not valid (subnets not belonging to the selected VPC, for example). You cannot have constraints or sub-parameters (yet). All in all, CloudFormation templates can fail even when using Typed Parameters; they just greatly reduce the risk of errors.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-
instance.html
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-
ref.html
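
As a hypothetical illustration (resource names and the AMI ID are placeholders, and the template is embedded as a Python string only so it can be passed to boto3), a typed parameter looks like this; note that validate_template only checks the template itself, so the invalid-combination failures described above still only surface at stack creation time.

    import textwrap
    import boto3

    TEMPLATE = """
    AWSTemplateFormatVersion: '2010-09-09'
    Parameters:
      MySubnet:
        Type: AWS::EC2::Subnet::Id   # typed: the value must be an existing subnet ID
    Resources:
      MyInstance:
        Type: AWS::EC2::Instance
        Properties:
          ImageId: ami-placeholder
          SubnetId: !Ref MySubnet
    """

    cfn = boto3.client("cloudformation")
    cfn.validate_template(TemplateBody=textwrap.dedent(TEMPLATE))  # syntax check only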

When is a mapping a better choice than a parameter?

Mappings have to be completely exhaustive and thus they reduce template flexibility. Parameters are perfect for frequently changing values, while mappings are great for stable values that won't change over time but can be determined ahead of time.

What are pseudo parameters?

They're pre-populated AWS parameters that you can reference in your template, to
figure out the region, the stack name etc

AWS::AccountId, AWS::NotificationARNs, AWS::NoValue, AWS::Region, AWS::StackId, and AWS::StackName are all valid pseudo parameters.
Can any output be referenced cross-stack? NO.
ANS: You need to EXPORT the output value before being able to use it in another stack.
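
As a hypothetical sketch of those export/import mechanics (the template fragments and the export name are placeholders, shown as Python strings): one stack exports the value, another imports it with Fn::ImportValue, and list_exports shows everything currently exported in the region.

    import boto3

    EXPORTING_FRAGMENT = """
    Outputs:
      VpcIdOutput:
        Value: !Ref MyVpc              # MyVpc is a resource defined elsewhere in that stack
        Export:
          Name: shared-vpc-id          # must be exported to be usable in another stack
    """

    IMPORTING_FRAGMENT = """
    Resources:
      AppSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupDescription: app tier
          VpcId: !ImportValue shared-vpc-id   # references the exported value
    """

    cfn = boto3.client("cloudformation")
    print(cfn.list_exports()["Exports"])       # every exported value in this region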
