
Linux/Apache on ARM Processors

In The Case for Low-Cost, Low-Power Servers, I made the argument that the
right measures of server efficiency are work done per dollar and work done
per joule. Purchasing servers on single-dimensional metrics like performance,
power, or even cost alone makes no sense at all. Single-dimensional
purchasing leads to micro-optimizations that push one dimension to the
detriment of others. Blade servers have been one of my favorite examples of
optimizing the wrong metric (Why Blade Servers Aren't the Answer to All
Questions). Blades often accept increased cost to achieve server density. But
density doesn't improve work done per dollar, nor does it produce better work
done per joule. In fact, density often takes work done per joule in the wrong
direction by driving higher power consumption due to the challenge of
cooling higher power densities.
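
To make the two metrics concrete, here is a minimal Python sketch. The server names and every number in it are hypothetical, chosen only to show how the ratios are computed; they are not measurements of any real system. Work per dollar here uses purchase cost only, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    requests_per_sec: float  # sustained throughput (hypothetical)
    cost_dollars: float      # purchase cost (hypothetical)
    power_watts: float       # average draw under load (hypothetical)

    def work_per_dollar(self) -> float:
        # Throughput bought per dollar of server cost.
        return self.requests_per_sec / self.cost_dollars

    def work_per_joule(self) -> float:
        # One watt is one joule per second, so (requests/sec) / watts
        # gives requests per joule.
        return self.requests_per_sec / self.power_watts


# Two made-up machines: a fast, expensive one and a slow, cheap one.
big = Server("high-end", requests_per_sec=10_000, cost_dollars=5_000, power_watts=400)
small = Server("low-power", requests_per_sec=2_000, cost_dollars=500, power_watts=20)

for s in (big, small):
    print(f"{s.name}: {s.work_per_dollar():.1f} req/s per dollar, "
          f"{s.work_per_joule():.1f} req per joule")
```

With these assumed numbers the high-end server wins on raw performance, yet the low-power server does more work per dollar and per joule, which is the comparison a single-dimensional metric would miss.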
There is no question that selling in high volume drives price reductions, so
client and embedded parts have the potential to be the best
price/performing components. And, as focused as the server industry has
been on power of late, the best work is still in the embedded systems world,
where cell phone designers would sell their souls for a few more amp-hours
if they could have them without extra size or extra weight. Nobody focuses on
power as much as embedded systems designers, and many of the tricks
arriving in the server world showed up years ago in embedded devices.
A very common processor used in cell phone applications is the ARM. The
ARM business model is somewhat unusual in that ARM sells a processor
design, and the design is then taken and customized by many teams
including Texas Instruments, Samsung, and Marvell. These processors find
their way into cell phones, printers, networking gear, low-end Storage Area
Networks, Network Attached Storage devices, and other embedded
applications. The processors produce respectable performance, great
price/performance, and absolutely amazing power/performance.
Could this processor architecture be used in server applications? The first
and most obvious pushback is that it's a different instruction set
architecture, but server software stacks really are not that complex. If you
can run Linux and Apache, some web workloads can be hosted. There are
many Linux ports to ARM -- the software will run. The next challenge, and
this one is the hard one: does the workload partition into sufficiently fine
slices to be hosted on servers built using low-end processors? Memory size
limitations are particularly hard to work around in that ARM designs have the
entire system on the chip, including the memory controller, and none I've
seen address more than 2GB. But, for those workloads that do scale
sufficiently finely, ARM can work.
I've been interested in seeing this done for a couple of years and have been
watching ARM processors scale up for quite some time. Well, we now have
an example: a web site hosted on 7 servers, each running the following:

Single 1.2 GHz ARM processor, Marvell MV78100
1 disk
1.5 GB DDR2 with ECC!
Debian Linux
Nginx web proxy/load balancer
Apache web server

Note that, unlike Intel Atom based servers, this ARM-based solution has the
full ECC memory support we want in server applications (actually you really
want ECC in all applications from embedded through client to servers).
Clearly this solution won't run many server workloads, but it's a step in the
right direction. The problems I have had when scaling systems down to
embedded processors have been dominated by two issues: 1) some
workloads don't scale down to sufficiently small slices (what I like to call bad
software but, as someone who spent much of his career working on database
engines, I probably should know better), and 2) surrounding component and
packaging overhead. Basically, as you scale down the processor expense,
other server costs begin to dominate. For example, if you halve the processor
cost and also the throughput, it's potentially a step backwards since all the
other components in the server didn't also halve in cost. So, in this example,
you would get half the throughput at something more than half the cost.
Generally not good. But what's interesting are those cases where it's
nonlinear in the other direction: cut the cost to N% with throughput at M%,
where M is much more than N. As these system on a chip (SOC) server
solutions improve, this is going to be more common.
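
The arithmetic behind that argument can be sketched in a few lines of Python. All dollar amounts and throughput figures below are hypothetical, picked only to illustrate the two cases described above (fixed overhead dominating, and the nonlinear case where M is much more than N).

```python
def work_per_dollar(throughput: float, cpu_cost: float, other_cost: float) -> float:
    """Throughput divided by total server cost (processor plus everything else)."""
    return throughput / (cpu_cost + other_cost)


# Hypothetical baseline: $400 processor plus $600 of other components.
baseline = work_per_dollar(throughput=1000, cpu_cost=400, other_cost=600)

# Halve the processor cost and the throughput; the other components
# do not get cheaper, so total cost only drops from $1000 to $800.
halved = work_per_dollar(throughput=500, cpu_cost=200, other_cost=600)

# Nonlinear case: total server cost falls to N = 25% of baseline ($250)
# while throughput only falls to M = 50% of baseline.
soc = work_per_dollar(throughput=500, cpu_cost=50, other_cost=200)

print(f"baseline: {baseline:.3f}")
print(f"halved:   {halved:.3f}")
print(f"soc:      {soc:.3f}")
```

Under these assumptions, halving only the processor lowers work done per dollar, while the integrated case, where the surrounding costs shrink along with the processor, raises it.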
It's not always a win, based upon the discussion above, but it is a win for
some workloads today. And, if we can get multi-core versions of ARM, it'll be
a clear win for many more workloads. The Marvell MV78200 actually
is a two-core SOC, but it's not cache coherent, which isn't a useful
configuration in most server applications.
The ARM is a clear win on work done per dollar and work done per joule for
some workloads. If a 4-core, cache-coherent version were available with a
reasonable memory controller, we would have a very nice server processor
with record-breaking power consumption numbers. Thanks for the great work,
ARM and Marvell. I'm looking forward to tracking this work closely and I love
the direction it's taking. Keep pushing.
James Hamilton
