
Architecture: Web Application on Google App Engine

Developers leverage Google App Engine to simplify development and deployment of Web Applications. These applications use the autoscaling
compute power of App Engine as well as the integrated features like distributed in-memory cache, task queues and datastore, to create robust
applications quickly and easily.

App Engine is Google's PaaS platform, a robust development environment for applications written in Java, Python, PHP, and Go. The SDK for App
Engine supports development and deployment of the application to the cloud. App Engine supports multiple application versions which enables easy
rollout of new application features as well as traffic splitting to support A/B testing.
Integrated within App Engine are the Memcache and Task Queue services. Memcache is an in-memory cache shared across the App Engine
instances. This provides extremely high speed access to information cached by the web server (e.g. authentication or account information).
Task Queues provide a mechanism to offload longer running tasks to backend servers, freeing the front end servers to service new user requests.
Finally, App Engine features a built-in load balancer (provided by the Google Load Balancer) which provides transparent Layer 3 and Layer 7 load
balancing to applications.
Google's Cloud DNS service can be used to manage your DNS zones.

How to Leverage App Engine to Create Scalable and Cost-Effective Applications

Building a Scalable Website
Imagine you were asked to build a website that tens of thousands, or even millions, of people will be accessing, sometimes all at the same time. You
would be challenged to provide a fast, reliable service, and it would be essential for the website to scale appropriately to demand.
Does your team have sufficient experience to build a scalable and reliable cluster of web servers? Can you easily and accurately estimate the cost of
developing, operating, and maintaining your servers? Do you have domain expertise to hire the best people for the job?

Google App Engine offers excellent solutions for these challenges. In this document, we will look at how App Engine handles incoming user requests
and how it scales your application as traffic increases and decreases. You will learn how to configure App Engine's scaling behavior to find the optimal
balance between performance and cost, so that you can successfully harness the power and flexibility of Google App Engine for your projects.

Life of a Request in the App Engine Architecture


To begin, you need to understand how user requests are handled inside App Engine, how they are delivered to your application instance, and how the
response is returned to the user. Understanding this overall flow will help you determine how to optimize your application. Figure 1 shows how service
requests and responses flow through App Engine's internal architecture.

Figure 1: How user requests are routed to application instances

A user request is routed to the geographically closest Google data center, which may be in the United States, the European Union, or the Asia-Pacific
region. In that data center, an HTTP server known as the Google front end receives the request and routes it through the Google-owned fiber
backbone to an App Engine data center that runs your application.
The App Engine architecture includes the following components:

App Engine front end servers are responsible for load balancing and failover of App Engine applications. As shown in Figure 1, the App
Engine front end receives user requests from the Google front end and dispatches each request either to an app server for dynamic content or
to a static server for static content.

App servers are containers for App Engine applications. An app server creates application instances and distributes requests based on
traffic load. The application instances contain your code: the request handlers that implement your application.
The app server runtime environment includes APIs to access the full suite of App Engine services, allowing you to easily build scalable and
highly available web applications.

Static servers are dedicated to serving static files for App Engine applications; they are optimized to provide the content rapidly with minimal
latency. (To further reduce latency, Google may move high-volume public content to an edge cache.)

The app master is the conductor of the whole App Engine orchestra. When you deploy a new version of an App Engine application, the app
master uploads your program code to app servers and your static content to static servers.

The App Engine front end and the app servers work together, tracking the available application instances as your application uses more or fewer
instances.

App Engine is a Container Technology


App Engine is a container-based platform as a service (PaaS) offering. An application instance, running in an app server, is the container that isolates
resources between applications. Each application instance is guaranteed to receive dedicated server resources, such as CPU time and memory, as
well as strict security isolation.
Application instances are created and managed like Linux processes, so they can be started quickly and consume minimal system resources. In
contrast, the unit of allocation and management in a typical infrastructure as a service (IaaS) offering is a hypervisor-based virtual machine (VM).
Figure 2 illustrates the differences in the amount of resources that need to be initialized for a new VM instance compared to a new App Engine
instance.

Figure 2: Comparing VMs to application instances


The most significant difference between VMs and App Engine is in how the operating system overhead is managed. Each VM deployment hosts its
own image of an operating system, which requires memory resources in the host machine. The developer is typically responsible for creating, storing,
and managing that image. This may include applying security patches, installing device drivers, and performing administrative and maintenance tasks.

It can take tens of seconds to boot up an operating system each time a new VM is added to handle more requests. In contrast, an application instance
can be created in a few seconds.
Application instances operate against high-level APIs, eliminating the need to communicate through layers of code with virtual device drivers.
In summary, application instances are the computing units that App Engine uses to scale your application. Compared to VMs, application instances
are fast to initialize, lightweight, memory efficient, and cost effective.
Overall, App Engine scales easily because its lightweight-container architecture can spin up application instances far more quickly than VMs can be
started.
The next section describes how you can design your application to take advantage of that architecture.

Optimizing your App Engine Application


When you design and deploy your App Engine application, strive to minimize the following two related factors:

The response time to user requests

The overall cost of application instances running inside the app servers

To reduce the response time, App Engine automatically increases the number of instances based on the current load and developer-specified
configuration parameters. However, additional instances cost more. To minimize additional cost, you need to understand how and when instances are
created or deleted by App Engine in response to changes in the traffic load.
The following sections explain what an instance is and how to configure its parameters to balance responsiveness and cost effectiveness for your
application.

Best Practices for Optimizing Scalability


Web applications are typically partitioned into two types of processing: immediate, real-time, interactive processing for a user request, and longer-term
processing, such as complex database updates, batch processing, or integration with other, slower systems.
App Engine recognizes this distinction by providing frontend instance classes (F1, F2, F4) for low-latency interactive responses and backend instance
classes (B1, B2, B4, B8) for high-latency background processing.
Code modules deployed to a frontend instance receive requests from clients and process them quickly, typically in the range of tens of milliseconds up
to a few seconds. App Engine requires that frontend instances respond to each request within 60 seconds. If your application logic cannot fully
process a request in that time, you can use task queues to defer the processing.
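The defer-to-a-queue pattern described above can be sketched in plain Python. This is only an illustration of the idea, using a stdlib queue and a worker thread to stand in for App Engine's Task Queue service; the handler name and payload are hypothetical.

```python
import queue
import threading

# Stand-in for a task queue: the frontend enqueues work and returns at once.
task_queue = queue.Queue()
results = []

def worker():
    # Backend loop: pull deferred tasks and do the slow processing.
    while True:
        payload = task_queue.get()
        if payload is None:
            break
        results.append(payload.upper())  # placeholder for the slow work
        task_queue.task_done()

def handle_request(data):
    # Frontend handler: defer the heavy lifting, respond immediately.
    task_queue.put(data)
    return "202 Accepted"

t = threading.Thread(target=worker)
t.start()
status = handle_request("resize-image-42")
task_queue.join()          # on App Engine the task runs later, on a backend
task_queue.put(None)
t.join()
print(status, results)     # 202 Accepted ['RESIZE-IMAGE-42']
```

The key point is that the frontend handler's response time stays in the tens of milliseconds regardless of how long the deferred work takes.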
Code modules deployed to a backend instance do not have a limit on processing time and can be used to process tasks from a task queue, do long-running computations, run MapReduce jobs, or implement other data-processing pipelines.
There are three types of instance scaling available. For modules on backend instance classes, you use either:

Manual scaling: App Engine creates the number of instances that you specify in a configuration file. The number of instances you configure
depends on the desired throughput, the speed of the instances, and the size of your dataset, balanced against cost considerations. Manually
scaled instances run continuously, so complex initializations and other in-memory data are preserved across requests.

Basic scaling: App Engine creates instances to handle requests and releases them when they become idle. Basic scaling is ideal
and cost effective if your workload is intermittent or driven by user activity.

For modules on frontend instance classes, you use automatic scaling: App Engine adjusts the number of instances based on the request rate,
response latencies, and other application metrics. You can control the scaling of your instances to meet your performance requirements.
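As a sketch, the three scaling types map onto module configuration roughly as follows. The field names follow the App Engine modules documentation; the module names and values are illustrative, not recommendations.

```yaml
# Frontend module: automatic scaling (illustrative values)
module: default
instance_class: F2
automatic_scaling:
  min_idle_instances: 5
  max_pending_latency: 30ms

# Backend module: manual scaling
# module: workers
# instance_class: B4
# manual_scaling:
#   instances: 5

# Backend module: basic scaling
# module: batch
# instance_class: B8
# basic_scaling:
#   max_instances: 10
#   idle_timeout: 10m
```

Each module is configured in its own file; the commented-out blocks show the equivalent stanzas for the two backend scaling types.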
Overall, the dynamic scalability and cost effectiveness of an App Engine application is primarily controlled by the design and configuration of the
frontend instances. Follow these important best practices to optimize their scalability:

1. Design for reduced latency and more queries per second (QPS).
2. Optimize idle instances and pending latency.
The rest of this paper will focus on applying these practices to automatic-scaling frontend instance classes. Refer to the App Engine Modules
documentation (Java, Python, Go, PHP) for information on managing backend instance classes.

Less Latency, More QPS


Queries per second (QPS) characterizes the capacity of an instance. QPS is defined as the number of HTTP requests one instance can process in
one second.¹ For example, peak traffic in the Open for Questions application was about 700 QPS, and when App Engine hosted the Royal Wedding
website, the worldwide media coverage generated around 32,000 QPS at its peak.
To handle more QPS, add more instances, as expressed by this formula:
Total QPS = Average QPS × Number of Instances
You can find the total number of instances and average QPS for your application on the Instances page of the App Engine Admin Console, as
shown in Figure 3.

Figure 3: Instances page of the Admin Console


In this example, the average QPS is 10.479. Because there is a total of seven instances running for this application, the site is processing 73.353 QPS
in total.²
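The relationship between per-instance QPS and total throughput is simple arithmetic; the snippet below just reproduces the Admin Console example above (a sketch, not an App Engine API).

```python
def total_qps(average_qps, num_instances):
    # Total QPS = Average QPS x Number of Instances
    return average_qps * num_instances

# Admin Console example: 7 instances averaging 10.479 QPS each
print(round(total_qps(10.479, 7), 3))  # 73.353
```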
As a best practice, strive to minimize the time required to process each request. You can check on the total processing time for requests on the Logs
page of the Admin Console (Figure 4).

Figure 4: Latency shown in the log


The example request in Figure 4 shows a latency of 402 ms. This time is important for two reasons:

A slow response directly impacts the user experience.

Your instance is busy during that time, so App Engine may create another instance to handle additional requests. If you can optimize your code
to execute in 200 ms, the user experience is improved, and your QPS may be doubled without running extra instances.
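For a single-threaded instance, per-instance QPS is bounded by the inverse of request latency, so halving latency roughly doubles capacity. A back-of-the-envelope sketch (assuming one request at a time per instance):

```python
import math

def instance_qps(latency_seconds):
    # A single-threaded instance serves at most one request per latency period.
    return 1.0 / latency_seconds

def instances_needed(target_qps, latency_seconds):
    return math.ceil(target_qps / instance_qps(latency_seconds))

# At 402 ms per request, one instance sustains ~2.5 QPS;
# optimizing to 200 ms roughly doubles that.
print(instances_needed(100, 0.402))  # 41 instances for 100 QPS
print(instances_needed(100, 0.200))  # 20 instances for the same load
```

Real instances can overlap work (threads, async I/O), so treat this as a lower-bound intuition rather than a sizing formula.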

Appstats is a powerful tool you can use to understand, optimize, and improve your application's QPS. As shown in Figure 5, it shows the number of
RPC calls that are invoked inside each request, the duration of each RPC call (such as Datastore or Memcache access), and the contribution of each
RPC call to the overall latency of the request. This information gives you hints for finding bottlenecks in your application.

Figure 5: Example timeline from Appstats

If the Appstats graphs indicate that your application's bottleneck is in CPU-intensive tasks, rather than waiting for RPC calls to return, you could try
a higher CPU class for the frontend instances to reduce the latency. While this increases the CPU cost of each instance, the number of instances
required to support the load will decrease, and the user experience improves without a major shift in the total cost.
You can also increase the QPS by letting App Engine assign multiple requests to each instance simultaneously. By default, one instance can run only
one thread to prevent unexpected behavior or errors caused by concurrent processing. If your application code is thread-safe and implements proper
concurrency control, you can increase the QPS of each instance without additional cost by specifying the threadsafe element in the configuration
file.
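In the Python runtime, for example, concurrent requests are enabled with a single configuration element (shown here for an `app.yaml`; your handlers must actually be thread-safe before you set it):

```yaml
runtime: python27
api_version: 1
threadsafe: true  # allow App Engine to send concurrent requests to one instance
```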

Optimize Idle Instances and Pending Latency


While QPS represents the total throughput of your application, other parameters, such as idle instances and pending latency, determine the elasticity
of your application's scalability. Configure these parameters on the Application Settings page of the Admin Console³ (Figure 6).

Figure 6: Idle instances and pending latency parameters

Idle Instances
Idle instances help your site handle a sudden influx of requests. Usually, requests are handled by existing, active, available application instances. If a
request arrives and there are no available application instances, App Engine may need to activate an application instance to handle that request
(called a loading request). A loading request takes longer to respond, because it must wait while the new instance is initialized.
Idle instances (also called resident instances) represent the number of instances that App Engine keeps loaded and initialized, even when the
application is not receiving any requests. The default is to have zero idle instances, which means requests will be delayed every time your application
scales up to more instances.
You can adjust the minimum and maximum number of idle instances independently with sliders in the Admin Console.
We recommend that you maintain idle instances if you do not want requests to wait for instance creation and initialization. For example, if you specify
a minimum of ten idle instances, your application will be able to service a burst of requests immediately on those ten instances. We recommend that
you allocate idle instances carefully because they will always be resident and incur some cost.
You can also set an upper limit to the number of idle instances. This parameter is designed to control how gradually App Engine reduces the number
of idle instances as load levels return to normal after a spike. This helps your application maintain steady performance through fluctuations in request
load, but it also raises the number of idle instances (and consequently running cost) during periods of heavy load. Lowering the maximum number of
idle instances can reduce cost.

Pending Latency
Pending latency is the time that a request spends in a pending queue for an app server. You can set minimum and maximum values for this
parameter.
When an App Engine front end receives a request from a user and no instance is available to service that request, the request is added to a pending
queue until an instance becomes available. App Engine tracks how long requests are held in this queue. If requests are held for too long, App Engine
creates another instance to distribute the load. Figure 7 shows how instances are added or deleted based on traffic volume.

Figure 7: Busy instances, pending queue and pending latency


The minimum pending latency is the expected and acceptable latency for the pending queue. App Engine will always wait the specified minimum
pending latency for an instance to become available. Once the minimum is reached, App Engine applies heuristics to determine whether and when to
start an additional instance.⁴ (Waiting for an existing instance to become available may be faster, and it is certainly cheaper, than starting a new one.)
The maximum pending latency is the threshold of unacceptable latency. If a request is still pending when the specified maximum latency is reached,
App Engine immediately starts a new instance to serve it. For example, if you set the maximum pending latency to one second, App Engine will
create a new instance if a request has been waiting in the pending queue for more than one second. Adding more instances results in increased
throughput but incurs more cost.
Note: If you have specified a minimum number of idle instances, the pending latency parameters will have little or no effect (unless there is a
sustained traffic spike that grows to exhaust the idle instances faster than they can be initialized).
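The decision flow described above can be caricatured in a few lines of Python. This is only a sketch of the documented behavior; App Engine's actual heuristics are internal and unpublished, and the prediction input is hypothetical.

```python
def scaling_decision(wait_ms, min_pending_ms, max_pending_ms,
                     predicted_instance_free_ms):
    """Decide what to do with a request sitting in the pending queue."""
    if wait_ms < min_pending_ms:
        # Always wait at least the minimum pending latency.
        return "wait"
    if wait_ms >= max_pending_ms:
        # Unacceptable latency: start a new instance immediately.
        return "start_new_instance"
    # Between min and max: heuristically prefer an existing instance that is
    # predicted to free up before the maximum pending latency is reached.
    if predicted_instance_free_ms <= max_pending_ms - wait_ms:
        return "wait"
    return "start_new_instance"

print(scaling_decision(50, 100, 1000, 0))     # wait (below minimum)
print(scaling_decision(1200, 100, 1000, 0))   # start_new_instance (over max)
print(scaling_decision(300, 100, 1000, 200))  # wait (instance frees in time)
```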

Best Practices and Anti-Patterns


Table 1 describes what it means to set the minimum and maximum values on idle instances and pending latency. Based on this matrix, you can
optimize these parameters for your requirements.
Table 1: Semantics of idle instances and pending latency

|               | Idle Instances Minimum⁵ | Idle Instances Maximum | Pending Latency Minimum | Pending Latency Maximum⁶ |
| ------------- | ----------------------- | ---------------------- | ----------------------- | ------------------------ |
| Specifies     | Minimum number of resident instances | Maximum number of resident instances | Time to hold requests in the pending queue | Time to wait before creating new instances |
| Low settings  | Fewer instances before a spike; lower cost | Fewer instances after a spike; lower cost | More instance creation; higher cost | More instance creation; higher cost |
| High settings | More instances before a spike; higher cost | More instances after a spike; higher cost | Slower response; lower cost | Slower response; lower cost |

For example, if you expect high traffic to your site because you have scheduled an event or expect major media coverage related to a product release,
you could increase the minimum number of idle instances and decrease the maximum pending latency shortly before and during the event to
smoothly handle traffic spikes.
Known anti-patterns are setting the minimum and maximum idle instances close to each other and specifying a very small gap between the minimum
and maximum pending latency. Either of these may cause unexpected scaling behavior in your application.
We recommend the following configurations:

Best performance: Increase the value for the minimum number of idle instances and lower the maximum pending latency, while leaving the
other settings on automatic.

Lowest cost: Keep the maximum number of idle instances low and increase the minimum pending latency, while leaving the other settings on
automatic.

We also recommend that you conduct a load test of your application before trying out the recommended settings. This will help you choose the best
values for idle instances and pending latency.

Minimizing Loading Request Time


It is also important to reduce the time it takes for a loading request to complete. Loading requests take longer than normal requests and result in a
poor experience for the users who happen to trigger them. In extreme cases, the user request may time out.
Do the following to minimize the time required for loading requests:

Load only the minimum amount of code required for startup.

Access the disk as little as possible.

Load code from a zip or jar file, which is faster than loading from many separate files.

If you cannot decrease the time required for a loading request to complete, you may need to have more idle instances to ensure responsiveness when
the load increases. Reducing the loading request time increases the elasticity of your application and lowers the cost.

Special Notes

One of the biggest advantages of Google App Engine is that lightweight application instances can be added within a few seconds. This enables highly
elastic scaling which adapts to sudden increases in traffic volume. To benefit from this power, you have to understand how requests are distributed to
application instances, how to maximize the QPS of your application by increasing the throughput per instance, and how to control elasticity. By
following best practices, you can build web applications that scale smoothly when traffic increases rapidly. In addition, following best practices helps
you tune your application for an optimal balance of cost and performance.
1. The term QPS is Google's terminology for requests per second. It includes all HTTP requests to the servers and is not restricted to
search queries.
2. For the mathematically precise: QPS is computed over the past 60 seconds. The seven instances handled 4401 requests in 60 seconds, for
4401 / 60 ≈ 73.35 QPS, so the average is 73.35 / 7 ≈ 10.479 QPS. For the first instance, 13.133 QPS implies that the instance processed
13.133 × 60 ≈ 788 requests.
3. If you convert your application to use modules, this graphical interface is replaced by parameter settings in the per-module configuration
files.
4. App Engine knows what requests are outstanding, how long those requests are likely to take (from past statistics), and how loaded the various
app servers are. This means it can predict whether an instance will be available in time to service a request before the maximum pending
latency is reached.
5. The minimum idle instances setting is available in the Console only for a paid app.

Application hierarchy
At the highest level, an App Engine application is made up of one or more modules. Each module consists of source code and configuration files. The
files used by a module represent a version of the module. When you deploy a module, you always deploy a specific version of the module. For this
reason, whenever we speak of a module, it usually means a version of a module.
You can deploy multiple versions of the same module, to account for alternative implementations or progressive upgrades as time goes on.
Every module and version must have a name. A name can contain numbers, letters, and hyphens. It cannot be longer than 63 characters and cannot
start or end with a hyphen.
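Those naming rules translate directly into a regular expression. A quick validator, offered as an illustration rather than an official API:

```python
import re

# Letters, numbers, and hyphens; at most 63 characters;
# may not start or end with a hyphen.
_NAME_RE = re.compile(r'^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$')

def is_valid_name(name):
    return bool(_NAME_RE.match(name))

print(is_valid_name("my-module-v1"))  # True
print(is_valid_name("-bad"))          # False (leading hyphen)
print(is_valid_name("a" * 64))        # False (longer than 63 characters)
```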
While running, a particular module/version will have one or more instances. Each instance runs its own separate executable. The number of instances
running at any time depends on the module's scaling type and the volume of incoming requests.

Stateful services (such as Memcache, Datastore, and Task Queues) are shared by all the modules in an application. Every module, version, and
instance has its own unique URI (for example, v1.my-module.my-app.appspot.com). Incoming user requests are routed to an instance of a
particular module/version according to URL addressing conventions and an optional customized dispatch file.

Note: After April 2013, Google does not issue SSL certificates for double-wildcard domains hosted at appspot.com (i.e., *.*.appspot.com). If you
rely on such URLs for HTTPS access to your application, change any application logic to use "-dot-" instead of ".". For example, to access version
v1 of application myapp use https://v1-dot-myapp.appspot.com. The certificate will not match if you
use https://v1.myapp.appspot.com, and an error occurs for any User-Agent that expects the URL and certificate to match exactly.
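A small helper can rewrite a dotted appspot.com host into the `-dot-` form that matches the wildcard certificate. This is a sketch: it handles only `appspot.com` hosts and leaves custom domains untouched.

```python
SUFFIX = ".appspot.com"

def https_safe_host(host):
    """Replace '.' with '-dot-' in the subdomain part of an appspot.com host."""
    if not host.endswith(SUFFIX):
        return host  # custom domains are unaffected
    subdomain = host[:-len(SUFFIX)]
    return subdomain.replace(".", "-dot-") + SUFFIX

print(https_safe_host("v1.myapp.appspot.com"))
# v1-dot-myapp.appspot.com
print(https_safe_host("v1.my-module.my-app.appspot.com"))
# v1-dot-my-module-dot-my-app.appspot.com
```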

Instance scaling and class


While an application is running, incoming requests are routed to an existing or new instance of the appropriate module/version. The scaling type of a
module/version controls how instances are created. There are three scaling types: manual, basic, and automatic.
Manual Scaling
A module with manual scaling runs continuously, allowing you to perform complex initialization and rely on the state of its memory over time.
Basic Scaling
A module with basic scaling will create an instance when the application receives a request. The instance will be turned down when the app
becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.
Automatic Scaling
Automatic scaling is the scaling policy that App Engine has used since its inception. It is based on request rate, response latencies, and other
application metrics. Previously users could use the Admin Console to configure the automatic scaling parameters (instance class, idle
instances and pending latency) for an application's frontend versions only. These settings now apply to every version of every module that has
automatic scaling.
Each scaling type offers a selection of instance classes with different amounts of CPU and memory. The following tables list the features of the three
types of scaling and the service levels and costs of the various instance classes:

Scaling Types

Deadlines
- Automatic scaling: 60-second deadline for HTTP requests; 10-minute deadline for tasks.
- Manual scaling: Requests can run indefinitely. A manually-scaled instance can choose to handle `/_ah/start` and execute a program or script for many hours without returning an HTTP response code. Tasks can run up to 24 hours.
- Basic scaling: Same as manual scaling.

Background Threads
- Automatic scaling: Not allowed.
- Manual scaling: Allowed.
- Basic scaling: Allowed.

CPU/Memory
- Automatic scaling: Configurable by selecting an F1, F2, F4, or F4_1G instance class.
- Manual scaling: Configurable by selecting a B1, B2, B4, B4_1G, or B8 instance class.
- Basic scaling: Configurable by selecting a B1, B2, B4, B4_1G, or B8 instance class.

Residence
- Automatic scaling: Instances are evicted from memory based on usage patterns.
- Manual scaling: Instances remain in memory, and state is preserved across requests. When instances are restarted, an `/_ah/stop` request appears in the logs. If there is a registered stop callback method, it has 30 seconds to complete before shutdown occurs.
- Basic scaling: Instances are evicted based on the `idle-timeout` parameter. If an instance has been idle (i.e., has not received a request) for more than `idle-timeout`, the instance is evicted.

Startup and Shutdown
- Automatic scaling: Instances are created on demand to handle requests and automatically turned down when idle.
- Manual scaling: Instances are sent a start request automatically by App Engine in the form of an empty GET request to `/_ah/start`. An instance that is stopped with `appcfg stop` (or via the Admin Console UI) has 30 seconds to finish handling requests before it is forcibly terminated.
- Basic scaling: Instances are created on demand to handle requests and automatically turned down when idle, based on the `idle-timeout` configuration parameter. As with manual scaling, an instance that is stopped with `appcfg stop` (or via the Admin Console UI) has 30 seconds to finish handling requests before it is forcibly terminated.

Instance Addressability
- Automatic scaling: Instances are anonymous.
- Manual scaling: Instances are addressable at URLs of the form `http://instance.version.module.app_id.appspot.com`. If you have set up a wildcard subdomain mapping for a custom domain, you can also address a module or any of its instances via a URL of the form `http://module.domain.com` or `http://instance.module.domain.com`. You can reliably cache state in each instance and retrieve it in subsequent requests.
- Basic scaling: Same as manual scaling.

Scaling
- Automatic scaling: App Engine scales the number of instances automatically in response to processing volume. This scaling factors in the `automatic_scaling` settings that are provided on a per-version basis in the configuration file uploaded with the module version.
- Manual scaling: You configure the number of instances of each module version in that module's configuration file. The number of instances usually corresponds to the size of a dataset being held in memory or the desired throughput for offline work.
- Basic scaling: A basic scaling module version is configured with a maximum number of instances using the `basic_scaling` setting's `max_instances` parameter. The number of live instances scales with the processing volume.

Free Daily Usage Quota
- Automatic scaling: 28 instance-hours.
- Manual scaling: 8 instance-hours.
- Basic scaling: 8 instance-hours.

Instance classes
Instances are priced based on an hourly rate determined by the instance class.
| Instance Class | Memory Limit | CPU Limit | Cost per Hour per Instance |
| -------------- | ------------ | --------- | -------------------------- |
| B1             | 128 MB       | 600 MHz   | $0.05                      |
| B2             | 256 MB       | 1.2 GHz   | $0.10                      |
| B4             | 512 MB       | 2.4 GHz   | $0.20                      |
| B4_1G          | 1024 MB      | 2.4 GHz   | $0.30                      |
| B8             | 1024 MB      | 4.8 GHz   | $0.40                      |
| F1             | 128 MB       | 600 MHz   | $0.05                      |
| F2             | 256 MB       | 1.2 GHz   | $0.10                      |
| F4             | 512 MB       | 2.4 GHz   | $0.20                      |
| F4_1G          | 1024 MB      | 2.4 GHz   | $0.30                      |

Manual and basic scaling instances are billed at hourly rates based on uptime. Billing begins when an instance starts and ends fifteen minutes after a
manual instance shuts down, or fifteen minutes after a basic instance has finished processing its last request. Runtime overhead is counted against
the instance memory limit; it is higher for Java than for other languages.
Important: When you are billed for instance hours, you will not see any instance classes in your billing line items. Instead, you will see the appropriate
multiple of instance hours. For example, if you use an F4 instance for one hour, you do not see "F4" listed, but you will see billing for four instance
hours at the F1 rate.
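The billing multiple works out as shown below, using the hourly rates from the pricing table above; the multiplier is simply the class rate divided by the F1/B1 rate of $0.05 (a sketch, not a billing API).

```python
F1_RATE_CENTS = 5  # $0.05 per instance-hour

# Hourly rates in cents, from the pricing table above.
CLASS_RATE_CENTS = {"F1": 5, "F2": 10, "F4": 20, "F4_1G": 30,
                    "B1": 5, "B2": 10, "B4": 20, "B4_1G": 30, "B8": 40}

def billed_instance_hours(instance_class, hours):
    # An F4 hour is billed as four instance-hours at the F1 rate, and so on.
    multiplier = CLASS_RATE_CENTS[instance_class] // F1_RATE_CENTS
    return hours * multiplier

hours = billed_instance_hours("F4", 1)
print(hours)                        # 4
print(hours * F1_RATE_CENTS / 100)  # 0.2 dollars
```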
