
FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT

DevOps Toolchain
for Beginners
eMag Issue 23 - February 2015

ARTICLE: Orchestrating Your Delivery Pipelines with Jenkins
REVIEW: The LogStash Book, Log Management Made Easy
ARTICLE: Getting Started with Monitoring using Graphite

Orchestrating Your Delivery Pipelines with Jenkins

Andrew Phillips and Jenkins creator Kohsuke Kawaguchi review state-of-the-art


plugins and solutions in the Jenkins ecosystem for achieving efficient, reproducible
and secure delivery pipelines.

Chef and Cookbook Development Flow
Infrastructure as Code is a tenet of the DevOps community. But treating infrastructure as code is a tall order. Development practices have also evolved rapidly, and nowadays that means continuous integration, automated tests and more. We'll give a brief introduction to Chef, a well-known IT automation tool, and use it to illustrate the state of the art.

Docker: Using Linux Containers to Support Portable Application Deployment
Docker is an open-source tool for running applications inside a Linux container, a kind of lightweight virtual machine. Beyond running applications, it also offers tools to distribute containerized applications through the Docker index - or your own hosted Docker registry - simplifying the process of deploying complex applications.

Introduction to Puppet

In this article, Susannah Axelrod gives an overview of both Puppet, the language, and Puppet, the platform, discussing the main concepts around them. Susannah also writes about how to start an Infrastructure as Code initiative, as well as sharing additional learning resources for those who want to know Puppet in depth.

The LogStash Book, Log Management Made Easy
James Turnbull makes a compelling case for using Logstash for centralizing logging by explaining the implementation details of Logstash within the context of a logging project. The book targets both small companies and large enterprises through a two-sided case: the low barrier to entry and the scaling capabilities.

Getting Started with Monitoring using Graphite
Setting up a new monitoring system might seem daunting at first. Franklin guides us through the first steps and explains the architecture and inner workings of a Graphite-based monitoring system. Key takeaways are understanding time-series data and configuration, datapoint formats, aggregation methods and retention.



MANUEL
PAIS

is InfoQ's DevOps Lead Editor and an enthusiast of Continuous Delivery and Agile practices. Manuel Pais tweets @manupaisable.

A LETTER FROM THE EDITOR

Culture, collaboration and sharing are keywords for enabling DevOps in any organization. Adopting tools doesn't magically create a DevOps culture, but designing and sharing the right toolchain for the organization can bring about important benefits. Time to deliver is reduced and, perhaps more importantly, becomes predictable. Automation saves time in the long run, time which can be used for forging a DevOps culture and improving collaboration.
Furthermore, a clearly laid out toolchain illustrates the flow of work from inception to operations, thus improving visibility of the work to be done and promoting continuous feedback. Typically such a toolchain requires infrastructure, provisioning and configuration-management tools for testing and deployment, but also build/delivery pipelines to move code from source control all the way to running in production. And let's not forget the need for some monitoring love!
This eMag aims at providing an overview of an example set of tools that would constitute a typical toolchain. These are popular tools today, but you should look at them as illustrations of the kind of tasks and workflows you might need to perform in your organization as you move along a DevOps path.
The crucial part is understanding your own journey and your system requirements, and getting all the teams to share a workflow that is conducive to continuous delivery and feedback.
In the continuous-integration tool space, Jenkins is one of the leading tools, to a large extent due to its ever-growing plugin ecosystem. However, this also makes it hard to figure out how to put in place a robust delivery pipeline with so many plugins. Jenkins creator Kohsuke Kawaguchi and Andrew Phillips from XebiaLabs come to the rescue and illustrate some good practices with an example pipeline from code commits to production deployment.
Virtualization has become a standard way to deal with provisioning, and especially with scaling systems according to demand in a repeatable fashion. Docker brought some novelty into the virtualization market as a lightweight alternative to virtual machines (faster, more portable deployments) and is quickly gaining traction. Zef Hemel describes the challenges companies face in deploying complex systems today and how Docker can help solve this problem, in an easy-to-follow introduction.
Infrastructure as code is another pillar in coping with complex deployments and all but the most trivial infrastructure. Manually updating hundreds or even dozens of machines is unrealistic, and scripting can only go so far before it becomes a maintenance nightmare and a source of more problems. Configuration-management tools provide coding languages that allow the specification of machine state and of jobs to be done during deployment or maintenance. Such tools have evolved in recent years and now feature rich ecosystems that support multiple types of testing, static validation and dependency management, for instance.
In this eMag we have included introductory articles for two of the main configuration-management tools in the market today.
João Miranda describes the process of developing a simple cookbook (a desired state for one or more machines) using Chef and the surrounding ecosystem, explaining the fundamental Chef concepts along the way.
Susannah Axelrod from Puppet Labs explains the fundamental concepts in Puppet and provides useful advice on taking the first steps in adopting the practice of infrastructure as code.
The last third of the eMag focuses on monitoring your applications and making sense of your logs.
Logstash is a popular solution for centralized log management and searching. InfoQ's editor Aslan Brooke reviewed James Turnbull's The Logstash Book. The book digs into the tool's nuts and bolts, but the review gives an overview of the use cases and architecture in a friendly introduction.
Finally, Franklin Angulo describes the architecture and practical usage of the Graphite stack for time-series data storage and visualization of relevant application and business metrics.


ANDREW
PHILLIPS

is VP of products for XebiaLabs, providers of application delivery automation solutions. Andrew is an expert in cloud, service delivery, and automation, and has been part of the shift to more automated application-delivery platforms. In his spare time as a developer, he worked on Multiverse, the open-source STM implementation, contributes to Apache jclouds, the leading cloud library, and co-maintains the Scala Puzzlers site.

KOHSUKE
KAWAGUCHI

Kawaguchi is CloudBees' CTO and the creator of Jenkins. He is a well-respected developer and popular speaker at industry and Jenkins community events. He's often asked to speak about his experience and approach in creating Jenkins, a CI platform that has become a widely adopted and successful community-driven open-source project.

ORCHESTRATING
YOUR DELIVERY
PIPELINES WITH
JENKINS
In a previous article, we covered useful
preparatory steps for implementing
continuous delivery (CD), including
defining pipeline phases, preconditions
and required approvals, owners and
access control requirements, resource
requirements such as number of
concurrent build machines, identifying
which phases can run in parallel, and
more.



Here, we will discuss how
to put a number of these
recommendations into practice
through setting up a delivery
pipeline in Jenkins. Many of
the steps carry over to other
continuous integration (CI) and
orchestration tools, and there
are analogous extensions or core
features for many of the plugins
we will introduce.
We are focusing on Jenkins
because it is the most widely
used CI server. If you are using
different CI servers or services,
it should be relatively easy to
experiment with the steps we
will cover in a sandbox Jenkins
installation before carrying them
over to your own CI environment.

Prerequisites
Before diving into Jenkins, we need to discuss two important prerequisites. Our pipeline, or at least the part of the pipeline that we are looking to implement here (going all the way to production may not be the most sensible initial goal), is:
Predictable and standardized, i.e. the steps and phases we want to run each time the pipeline is triggered are the same.
Largely automated. We will cover ways to handle manual approvals to bless a certain build, but that is about it.
If the current release process does not display these characteristics, i.e. every release ends up a little different or it still requires many manual steps (reviewing test plans, preparing target environments), building a pipeline via a CI tool or generic automation orchestrator may not be the most appropriate step at this point.
It is probably advisable to first increase the level of standardization and automation, and to look at tools such as XL Release in the release coordination or CD release management categories to help with that.

The steps
We will cover the following topics to build our delivery pipeline:
1. Ensuring reproducible builds.
2. Sharing build artifacts throughout the pipeline.
3. Choosing the right granularity for each job.
4. Parallelizing and joining jobs.
5. Gates and approvals.
6. Visualizing the pipeline.
7. Organizing and securing jobs.

Our sample project


In order to make our scenarios and approaches more tangible, we'll base this discussion on a sample development project. Let's assume we're working on the server-side component of a mobile app for Android and iOS. The delivery process for our application is as follows:
1. Whenever a code change
is committed, we build
the code and, if successful,
package the current version
as a candidate version for
release (Basic Build and
Package).
2. Now that we know that the
code compiles and passes
our unit tests, we trigger
a code-quality build that
performs static analysis to
verify code quality (Static
Code-Quality Analysis).
3. The static analysis can take
some time, so in parallel we

deploy the candidate version


to two functional testing
environments, one for the
Android app and one for the
iOS app, in preparation for
testing (Deploy to Android
Func Test Env and Deploy
to iOS Func Test Env). We
use two test environments
so we can easily identify
differences in how the back
end behaves when talking to
either version of the app.
4. When both deployments
have completed, we trigger
functional tests, with the iOS
and Android apps talking
to their respective back end
(Func Tests).
5. If the functional tests pass, we
deploy our release candidate
in parallel to a regression
test and a performance test
(Deploy to Regr Test Env
and Deploy to Perf Test Env).
The completion of each
deployment triggers the
appropriate tests (Regr Test
and Perf Test).
6. If the regression and
performance
tests
and
our static code analysis
successfully complete, we
make the candidate available
for business approval and
notify the business owner.
7. The business owner can
approve, in a manual step,
the candidate build.
8. Approval
triggers
an
automated deployment to
production (Deploy to Prod).
Schematically, our delivery
pipeline looks like this (Figure 1).

Figure 1: Our sample project's delivery pipeline.


We do not intend this to be interpreted as a good, bad, or recommended pipeline structure. The pipeline that works best for you will not be a direct copy of this example, but will depend on your own applications and process.

Ensuring reproducible builds
One of the key principles of our pipeline is that we produce a single set of build artifacts to pass through the various pipeline stages for testing, verification, and, ultimately, release. We want to be sure that this is a reliable process and that this initial build is carried out in a reproducible way that does not somehow depend on the local dependency cache of the slave we happen to be building on, for example. In our project, we've taken steps to achieve this:

Use clean repositories local to the workspace
We've configured the build system to use a clean repository local to the build job's workspace, rather than one that is shared by all builds on that slave. This ensures that the build does not happen to succeed because of an old dependency that is no longer available in your standard repositories but was published to that slave's repo at some point. Consider regularly clearing your build job's workspace (most SCM plugins have a clean build option and, for things like partial cleanup, the Workspace Cleanup plugin can help) or at least wiping its local repo. For Maven builds, the location of the build repository can easily be configured via the main Jenkins settings, and overridden per job where necessary.
Use clean slaves based on a
known template
We can take this a step further by
running our builds on clean slaves
created on demand and initialized
to a known, reproducible state
where possible. Plugins such as
the Amazon EC2 plugin, Docker
plugin, or jclouds plugin can be
used for this purpose, and some
hosted services such as DEV@cloud provide this functionality.
Spinning up build slaves on
demand also has the substantial
advantage of helping to avoid
long build-queue times if you
have only a limited pool of
slaves and a growing number of
pipeline runs.

Use a central, shared repository for build dependencies
We're using a centralized artifact repository across all our projects, rather than allowing each project to decide from where to download build dependencies. This ensures that two projects that reference the same dependency will get the identical binary, and allows us to enforce dependency policies (such as banning certain dependencies) in a central location. If you are using a build system that supports Maven-based dependency management, a Maven proxy such as Nexus or Artifactory is ideal.

Sharing build artifacts throughout the pipeline
Once we have built the candidate artifact in our initial build job, we need to find a way to ensure that all the subsequent builds in our pipeline use this exact artifact.

Retrieve build artifacts from upstream jobs
Jenkins provides a couple of ways to share artifacts produced by an upstream job with subsequent downstream jobs. We are using the Copy Artifact plugin, which allows us to retrieve build artifacts from another job with a convenient build step. We're copying from a fixed build (i.e. specified by build number or build parameter), which is preferable to referring to a variable upstream build (such as the "Last successful build" option). In the latter case, we cannot be sure that we will be referencing the artifacts that triggered this pipeline run, rather than those produced by a subsequent commit.

Figure 2: Copying pipeline artifacts using the Copy Artifact plugin.

Figure 3: Passing the unique pipeline identifier to downstream builds.

Figure 4: The pipeline-version-environment option of the Delivery Pipeline plugin.

Alternatives
If you want to also access the artifact outside Jenkins, you can save the candidate artifact as a build artifact of the initial job, then use the Jenkins APIs to download it (e.g. using wget or cURL) in downstream jobs; see the sketch after this list.
If you want to treat candidate artifacts as build dependencies, the Jenkins Maven Repository Server plugin makes build artifacts available via a Maven repo-compliant interface, which can be used by Maven, Gradle, Ant, and other build tools to retrieve artifacts. It also provides additional options for referencing artifacts via the SHA1 ID of the Git commit that produced the artifacts (especially useful if the Git commit ID is your unique build identifier), as well as for accessing artifacts of a chain of linked builds.
If you already maintain a definitive software library outside Jenkins, you can create a setup similar to that offered by the Maven Repo Server plugin with an external Maven repo. In that case, you would publish the artifacts to the repo using a Maven identifier that includes the build number, commit ID, or whatever you consider a stable, unique identifier.
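As an illustration only (the job name, artifact path, and parameter name below are hypothetical), the wget/cURL approach could be a one-line shell step in a downstream job that fetches the pinned artifact through Jenkins' HTTP artifact API, using the upstream build number passed down as a parameter:

curl -fsS -o app.war \
  "$JENKINS_URL/job/basic-build-and-package/$UPSTREAM_BUILD_NUMBER/artifact/target/app.war"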
Identify the correct upstream build throughout the pipeline
Whichever alternative we choose, we need to pass a stable identifier to downstream builds so we can pick the right candidate artifact for our pipeline run. In our pipeline, we have parameterized most of the downstream builds and use the Parameterized Trigger plugin to pass the identifier.
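For example (the parameter name is our own choice, not mandated by the plugin), the trigger configuration's "Predefined parameters" field can hand the upstream build number to each downstream job with a single line:

UPSTREAM_BUILD_NUMBER=${BUILD_NUMBER}

The downstream jobs can then use this value both to copy the right artifact and, as in the sketch above, to build the artifact URL.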


Figure 5: Fingerprinting build artifacts using the Copy Artifact plugin and via a post-build action.
Alternatives
We can also use the Delivery Pipeline plugin (we will meet it later), which optionally creates an environment variable that is available in all downstream jobs.

Use fingerprints to track artifact usage
However you end up passing the stable pipeline identifier to downstream pipeline phases, setting all jobs in the pipeline to use fingerprints is almost always a good idea. Jenkins fingerprints artifacts by storing their MD5 checksums and using these to track use of an artifact across jobs. It allows us to check, at the end of a pipeline run, which artifacts have been used in which builds and so to verify that our pipeline has indeed consistently been testing and releasing the correct artifact. Jenkins provides a post-build task that allows us to explicitly record fingerprints for files in the workspace. Certain plugins, such as the Copy Artifact plugin, automatically fingerprint artifacts when copying them from an upstream build, in which case we can omit the post-build step.

Choosing the right granularity for each job
This may seem obvious, but choosing the correct granularity for each job, i.e. how to distribute all the steps in our pipeline across multiple jobs, will help us make our pipeline more efficient and allow us to identify bottlenecks more easily. As a rough rule of thumb, every stage in your pipeline can be represented by a separate job or, in the case of multi-dimensional tests, a matrix job. This is why, for instance, we have not combined build and deployment to the test environments or added deployment to the regression test environment as single jobs in our pipeline. If, for instance, we had merged Deploy to Regr Test and Regr Test into one multi-stage job that fails ten times, we would need to analyze the failures to figure out if the deployment or the tests themselves are the real problem.
The flipside of avoiding multi-stage jobs is, of course, that we need to manage and visualize more jobs: ten, in our relatively simple example.

Parallelizing and joining jobs
Especially when we run multi-platform tests, but also if we're building artifacts for different target platforms, we want to make our pipeline as efficient as possible by running builds in parallel. In our case, we want to parallelize our functional tests for Android and iOS, as well as run the performance and regression tests in parallel. We're using a couple of Jenkins mechanisms for this.

Run parallel instances of the same job with different parameters
For the functional tests, which are variants of the same build (same steps, but different configuration parameters), we're using a standard multi-configuration project (often called a matrix build). If we needed to handle potentially spurious failures for some of the matrix builds, we could also add the Matrix Reloaded plugin.

Figure 6: Tracking the usage of build artifacts via fingerprints.

Run different jobs in parallel
For the deployments to the two functional test environments, where we need to run different jobs, we're using the standard option of simple build triggers to kick off multiple downstream jobs in parallel once the upstream job (Basic Build and Package, in our case) completes.

Alternatives
If you want to coordinate sets of parallel jobs, you might also consider the Multijob plugin, which adds a new project type that allows multiple jobs to run in parallel. It can also orchestrate multiple pipeline phases.

Join parallel sections of the build pipeline
Joining means waiting until all the parallel builds have completed before continuing to the downstream phases, which the matrix job type handles automatically. In our example, we have configured Func Tests to trigger the downstream builds, Deploy to Regr Test Env and Deploy to Perf Test Env, on success, and Func Tests will only trigger them if both the Android and iOS builds in the matrix successfully complete.
For the deployment to the two functional test environments, where we simply trigger multiple jobs to run in parallel, we face the diamond problem: how to rejoin the parallel jobs Deploy to Android Func Test Env and Deploy to iOS Func Test Env to trigger one subsequent job, Func Tests. Here, we're using the Join plugin, which we've configured in the job at the top of the diamond to trigger the job below it once the parallel deployment jobs have completed successfully. We do not need to explicitly specify the deployment jobs; the plugin kicks off the Func Tests job once all direct downstream jobs have finished. The Join plugin also supports the passing of build parameters, which we need to identify the build artifacts for this pipeline run.

Figure 7: Func Tests in our sample pipeline is a multi-configuration (matrix) project.

Figure 8: Triggering Func Tests in our sample pipeline by using the Join plugin to wait for the directly downstream jobs Deploy to Android Func Test Env and Deploy to iOS Func Test Env to complete.

Handle more complex job graphs
If you have more complicated job graphs, you may also want to have a look at the Build Flow plugin, which allows you to define job graphs, including parallel sections and joins, programmatically.
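A Build Flow definition is a short Groovy script. As a rough sketch only (using the job names from our example; this is not a configuration taken from the article), the diamond-shaped section of our pipeline could be expressed along these lines:

// hypothetical Build Flow DSL sketch for the diamond in our sample pipeline
build("Basic Build and Package")
parallel (
    { build("Deploy to Android Func Test Env") },
    { build("Deploy to iOS Func Test Env") }
)
build("Func Tests")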

Gates and approvals
As the pipeline stages get closer to the QA and production environments, many organizations require some form of sign-off or approval before tasks can be carried out. We require a manual sign-off from the business owner before kicking off the Deploy to Prod job, for instance.
As previously noted, Jenkins and other CI tools and generic orchestrators do not offer comprehensive support for manual pipeline tasks, but there are a couple of options to handle approvals.
Support approvals based on multiple conditions
We're using the Promoted Builds plugin, which offers manual approval (and a corresponding e-mail notification to the approver) as one of a number of possible ways to promote a build. It also supports a variety of actions on promotion, including triggering downstream jobs.
Alternatives
A simple option is to ensure that the gated downstream job can only be triggered manually and can only be executed by a limited number of approvers. In this case, triggering a build constitutes approval. This pattern can also be automated, for example by using the ScriptTrigger plugin to search for an approval in an external system. However, this breaks the approach of using parameterized triggers to pass on required information, such as the unique artifact ID. If we adopt this pattern, we need to find another way to ensure that the appropriate information is passed, for example by prompting the approver to enter the appropriate parameters manually or by having the trigger script retrieve them from the approval record (e.g. a JIRA ticket).
If you want to ensure only that a task is manually triggered but do not need to track multiple conditions, you might want to look at the Build Pipeline plugin, which provides a post-build step to manually execute downstream projects. This step also allows parameters, such as our build identifier, to be passed to the manually triggered downstream job.

Visualizing the pipeline


A clear, highly accessible
visualization of our build
pipelines is important for a
successful CD implementation,
not just to ensure the team is
always aware of the current
pipeline state but also to
simplify communication with the
business and other stakeholders.



Figure 9: The Basic Build and Package job triggers a production deployment after manual approval by the business owner and confirmation
that all downstream jobs have successfully completed.


Figure 10: The Build Pipeline plugin's post-build step and manual trigger in the pipeline view.

Use standard views
Views are standard Jenkins features we're using to collect the jobs that constitute our pipeline in one overview. The Multijob plugin, which we briefly mentioned above, provides a similar list-style view. A drawback of both alternatives, however, is that these views show the currently executing builds for each job in the pipeline, which may be working on different release candidates. For example, the Perf Tests and Regr Tests jobs may be testing one particular candidate version while the Basic Build and Package job is already busy with the next commit.
Specialized pipeline views
With a CD perspective, however, we want to see all the builds that make up a particular instance of a pipeline run, i.e. all the builds related to one candidate version of the application. The Build Pipeline plugin and the Delivery Pipeline plugin both support this kind of view. Note that both plugins fail to capture the link to the Deploy to Prod job, which is not an immediate downstream build but is triggered by the Promoted Builds plugin.

Figure 11: A standard list view showing active jobs working on different release candidates.

Organizing and securing jobs

Handle many jobs
Even if each of our pipelines only consists of a handful of jobs, once we start setting up pipelines for multiple projects or versions, we'll soon have many Jenkins jobs to manage. With our example having ten phases per pipeline, we'd quickly be looking at hundreds of jobs to manage! Creating one or multiple views per pipeline is an obvious approach, but it still leaves us with an incredibly large All jobs view in Jenkins that is not fun to navigate and manage (in fact, it starts to get so big that you may want to consider replacing it entirely). It generally also requires us to adopt job-naming conventions along the lines of myProject-myVersion-pipelinePhase, so that all jobs for a pipeline are listed together and to let us use regular expressions when defining views rather than having to select individual pipeline jobs for a view.

Configure access control
This approach poses challenges when we start to implement access-control policies for our pipelines. We need to ensure that different phases of the pipeline have different access-control policies (in our example, developers are not authorized to trigger the QA jobs or the deployment to production), and setting these policies on each job individually is maintenance-intensive and prone to error.
In our example, we're using the CloudBees Folders plugin in combination with the Matrix Authorization Strategy plugin. The combination allows for both convenient job organization and efficient access-control configuration. We've organized our pipeline jobs in three folders, MyProject/1 Developer Jobs, MyProject/2 QA Jobs, and MyProject/3 Business Owner Jobs, and put each pipeline job in the appropriate folder. Folders are compatible with standard list views, so we can keep our existing MyProject Jobs view. We can define access-control policies at the folder level, which is much more convenient than having to secure individual jobs.

Figure 12: Build Pipeline and Delivery Pipeline plugin views of our sample pipeline.

Alternatives
If you want to apply permissions based on job name, consider the Role Strategy plugin, which allows you to define different roles for different parts of a pipeline. One drawback is that the jobs to which a role definition applies are determined by a regular expression. This can lead to additional complexity in the job-naming scheme (myProject-myVersion-owningGroup-pipelinePhase, anyone?) and may break if jobs are renamed.
Good practice: Version your Jenkins configuration
A good Jenkins practice in pretty much all circumstances is to assign versions to job configurations. This allows us to easily track any changes and revert to earlier configurations if necessary. We're using both the JobConfigHistory plugin (which provides a nice diff view) and the SCM Sync Configuration plugin (which stores the configuration off-disk in a repository). Depending on your needs, typically one or the other will suffice.


Conclusion
Setting up secure, efficient CD pipelines in Jenkins that are easy to use and manage can quickly become challenging. We've discussed important prerequisites, made a number of recommendations, and introduced a set of freely available plugins that can make the process a lot easier. Hopefully, you're now in a better position to identify whether Jenkins is the right orchestrator for your current process, to painlessly build pipelines, to make life better for your teams, and to deliver business value to your customers.

Figure 13: The CloudBees Folders plugin in action, with folder-level security configured using the
Matrix Authorization Strategy plugin.

Figure 14: The JobConfigHistory plugin's diff view and the configuration settings for the SCM Sync Configuration plugin.



Docker: Using Linux Containers to Support


Portable Application Deployment

Zef Hemel is a team lead at STX Next. Previously, he was a product manager and developer advocate at LogicBlox, and the VP of engineering at Cloud9 IDE, which develops a browser-based IDE. Zef is a native of the Web and has been developing Web applications since the '90s. He's a strong proponent of declarative programming environments.

Docker is an open-source tool to run applications inside of a Linux container, a kind of lightweight virtual machine. It also offers tools to distribute containerized applications through your own hosted Docker registry, simplifying the process of deploying complex applications.
Companies face challenges in deploying complex systems, and Docker can be a valuable tool in solving these problems and others.
The deployment challenges
Deployment of server applications is getting increasingly complicated. No longer can you install server applications by copying a few Perl scripts into the right directory. Today, software can have many types of requirements:
dependencies on installed software and libraries (depends on Python >= 2.6.3 with Django 1.2);
dependencies on running services (requires a MySQL 5.5 database and a RabbitMQ queue);
dependencies on a specific operating system (built and tested on 64-bit Ubuntu Linux 12.04);
resource requirements:
-- a minimum amount of available memory (requires 1 GB of available memory);
-- the ability to bind to specific ports (binds to ports 80 and 443).
For example, consider the deployment of a relatively simple application: WordPress. A typical WordPress installation requires:
Apache 2,
PHP 5,
MySQL,
the WordPress source code,
a WordPress MySQL database, with WordPress configured to use this database,
Apache configured to load the PHP module, to enable support for URL rewriting and .htaccess files, and with the DocumentRoot pointing to the WordPress sources.
While deploying and running a system like this on your server, you may run into some problems and challenges:
1. Isolation: If you are already hosting an existing site on this server that runs only on nginx, you're in a bit of a pickle. Both nginx and Apache try to listen on port 80. Running both is possible, but requires tweaking configurations (changing the port to listen to), setting up reverse proxies, etc. Similar conflicts can occur at the library level. If you also run an ancient application that still depends on PHP 4, you have a problem, since WordPress no longer supports PHP 4 and it's difficult to run PHP 4 and 5 simultaneously. Since applications running on the same server are not isolated (in this case at the levels of file system and network), they may conflict.
2. Security: WordPress does not have the best track record in security. It would be nice to sandbox WordPress so that once hacked, it won't impact other running applications.
3. Upgrades and downgrades: Upgrading an application typically involves overwriting existing files. What happens during an upgrade window? Is the system down? What if the upgrade fails, or turns out to be faulty? How do we roll back to a previous version quickly?

4. Snapshotting and backing up: Once everything is set up successfully, it would be nice to take a snapshot of the system that you can back up or replicate on a different server.
5. Reproducibility: It's good practice to automate deployment and to test a new version of a system on a test infrastructure before pushing it to production. This usually works through a tool like Chef or Puppet that automatically installs a bunch of packages on the server and, when everything works, runs that same deployment script on the production system. This will work 99% of the time. That 1% of the time, the package repository has been updated between deploying to testing and to production with newer, possibly incompatible versions of a package you depend on. As a result, your production setup is different than testing, possibly breaking your production system. Without taking control of every little aspect of your deployment (e.g. hosting your own APT or YUM repositories), consistently reproducing the exact same system in multiple setups (e.g. testing, staging, production) is hard.
6. Constrain resources: What if your WordPress goes CPU crazy and starts to take up all your CPU cycles, completely blocking other applications from doing any work? What if it uses up all available memory? Or generates logs like crazy, clogging up the disk? It would be convenient to be able to limit the resources available to the application, such as CPU, memory, and disk space.

7. Ease of installation: There may be Debian or CentOS packages or Chef recipes that automatically execute all the complicated steps of a WordPress installation. However, these recipes are tricky to get rock-solid because they need to take into account the many possible configurations of the target system. In any case, these recipes only work on clean systems. You will probably have to replace some packages or Chef recipes with your own. This makes installing complex systems not something you try during a lunch break.
8. Ease of removal: Software should be easily and cleanly removable without leaving traces behind. However, as deploying an application typically requires tweaking existing configuration files and putting state (MySQL database data, logs) left and right, removing an application completely is not that easy.
How do we solve these issues?

Virtual machines!
When we decide to run each individual application on a separate virtual machine (VM), for instance on Amazon EC2, most of our problems go away:
1. Isolation: Install one application per VM and applications are perfectly isolated, unless they hack into each other's firewall.
2. Security: Since we have complete isolation, if the WordPress server gets hacked, the rest of the infrastructure is not affected, unless you litter SSH keys around or reuse the same passwords everywhere, but you wouldn't do that, would you?




3. Upgrades and downgrades: Do what Netflix does and simply deploy a new version to a new VM, then point your load balancer from the old VM to the VM with the new version. Note that this doesn't work well with applications that locally store state that you need to keep.
4. Snapshotting and backing up: You can take a snapshot of an EBS disk with a click of a button (or API call). Your snapshots are backed up to Amazon S3.
5. Reproducibility: Prepare your system just the way you like and then create an AMI. You can now instantiate as many instances of this AMI as you like. It's fully reproducible.
6. Constrain resources: A VM is allocated a certain share of CPU cycles, available memory, and disk space that it cannot exceed (without your paying more for it).
7. Ease of installation: An increasing number of applications are available as EC2 appliances and can be instantiated with the click of a button from the AWS Marketplace. It takes a few minutes to boot, but that's about it.
8. Ease of removal: Don't need an application? Destroy the VM. Clean and easy.
Perfect! Except you now have a new problem: it's expensive, in two ways:
Money: Can you really afford to boot up an EC2 instance for every application you need? And can you predict the instance size you will need? If you need more resources later, you need to stop the VM to upgrade it. You can buy what you think you need but risk overpaying for resources you don't end up using. (Solaris zones, like Joyent uses, can be resized dynamically.)
Time: Many operations on virtual machines are typically slow: booting takes minutes, snapshotting can take minutes, creating an image takes minutes. The world keeps turning and we don't have that kind of time!
Can we do better?
Enter Docker.
The people of dotCloud, a public platform-as-a-service provider, launched Docker in early 2013. From a technical perspective, Docker is plumbing (primarily written in Go) to make two existing technologies easier to use:
1. LXC, for Linux containers, allows individual processes to run at a higher level of isolation than regular Unix processes. The term for this is containerization; a process is said to run in a container. Containers support isolation at the level of:
file system: a container can only access its own sandboxed file system (chroot-like), unless something is specifically mounted into the container's file system;
user namespace: a container has its own user database (i.e. the container's root does not equal the host's root account);
process namespace: within the container, only the processes that are part of that container are visible (i.e. a very clean ps aux output);
network namespace: a container gets its own virtual network device and virtual IP (so it can bind to whatever port it likes without taking up its host's ports).
2. aufs, the advanced multilayered unification file system, can create union, copy-on-write file systems.
While Docker can be installed on any Linux system with aufs support and kernel version 3.14 and up, conceptually it does not depend on these technologies and may in the future also work with similar technologies, such as Solaris zones, BSD jails, or a ZFS file system.
So, why is Docker interesting?
It's very lightweight. Booting up a VM takes up a significant amount of memory, but booting up a Docker container has very little CPU and memory overhead and is very fast. It's almost comparable to starting a regular process. Not only is running a container fast; building an image and snapshotting the file system are as well.
It works in established virtualized environments. You can run Docker inside an EC2 instance, a Rackspace VM, or VirtualBox. In fact, the preferred way to use it on Mac OS and Windows is using Vagrant.
Docker containers are portable to any operating system that runs Docker. Whether it's Ubuntu or CentOS, if Docker runs, your container runs.
So, let's get back to our list of deployment and operation problems and see how Docker scores:
1. Isolation: Docker isolates applications at the file-system and networking levels. It feels a lot like running real virtual machines in that sense.
2. Security: Docker containers are more secure than regular process isolation.
3. Upgrades and downgrades: Boot up the new version of an application first, then switch your load balancer from the old port to the new, just like for Amazon EC2 VMs.
4. Snapshotting and backing up: Docker supports committing and tagging of images, which, unlike snapshotting on Amazon EC2, is instantaneous.
5. Reproducibility: Prepare your system just the way you like it (either by logging in and apt-get-ing all software or by using a Dockerfile) and then commit your changes to an image. You can now instantiate as many instances of it as you like or transfer this image to another machine to reproduce the same setup.
6. Constrain resources: Docker currently supports limiting CPU usage to a certain share of CPU cycles. You can also limit memory usage (see the sketch after this list). It does not yet support restricting disk usage.
7. Ease of installation: Docker has the Docker Index, a repository with off-the-shelf Docker images you can instantiate with a single command. For instance, to use my Clojure REPL image, run docker run -t -i zefhemel/clojure-repl to automatically fetch the image and run it.
8. Ease of removal: Don't need an application? Destroy the container.
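As an aside (the flag values here are arbitrary examples, not taken from the article), such limits are set when starting a container, with -c for relative CPU shares and -m for a memory cap:

docker run -c 512 -m 512m -t -i ubuntu /bin/bash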

How to use it
Let's assume you have Docker installed. To run Bash in an Ubuntu container, use:

docker run -t -i ubuntu /bin/bash

Docker will use the ubuntu image you've already downloaded or download one itself, then run /bin/bash in an Ubuntu container. Inside this container you can now do pretty much all your typical Ubuntu stuff, for instance installing new packages. Let's install hello.

$ docker run -t -i ubuntu /bin/bash
root@78b96377e546:/# apt-get install hello
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
  hello
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 26.1 kB of archives.
After this operation, 102 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ precise/main hello amd64 2.7-2 [26.1 kB]
Fetched 26.1 kB in 0s (390 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package hello.
(Reading database ... 7545 files and directories currently installed.)
Unpacking hello (from .../archives/hello_2.7-2_amd64.deb) ...
Setting up hello (2.7-2) ...
root@78b96377e546:/# hello
Hello, world!

Now, exit and rerun the same Docker command.

root@78b96377e546:/# exit
exit
$ docker run -t -i ubuntu /bin/bash
root@e5e9cde16021:/# hello
bash: hello: command not found

Where did your beautiful hello command go? You just started a new container, based on the clean Ubuntu image. To continue from your previous container, you need to commit it to a repository. Exit this container and identify the container that you launched.

$ docker ps -a
ID            IMAGE         COMMAND    CREATED             STATUS    PORTS
e5e9cde16021  ubuntu:12.04  /bin/bash  About a minute ago  Exit 127
78b96377e546  ubuntu:12.04  /bin/bash  2 minutes ago       Exit 0

The docker ps command lists currently running containers; docker ps -a also shows containers that have already exited. Each container has a unique ID, which is more or less analogous to a Git commit hash. The command also lists the image the container was based on, the command it ran, when it was created, its current status, and the ports it exposed and how these map to the host's ports.
The top container in the output was the newer one you launched without hello in it. You want to keep and reuse the first container, so commit it and create a new container from there.

$ docker commit 78b96377e546 zefhemel/ubuntu
356e4d516681
$ docker run -t -i zefhemel/ubuntu /bin/bash
root@0d7898bbf8cd:/# hello
Hello, world!

These commands commit the container (based on its ID) to a repository. A repository, analogous to a Git repository, consists of one or more tagged images. If you don't supply a tag name (as above), the command will name it latest. To see all locally installed images, run a docker images command.
Docker comes with a few base images (e.g. ubuntu and centos) and you can create your own images as well. User repositories follow a GitHub-like naming model with your Docker username followed by a slash and the repository name.
This is one way to create a Docker image (the hacky way, if you will). The cleaner way uses a Dockerfile.


Building images with a Dockerfile
A Dockerfile is a simple text file consisting of instructions for building an image from a base image. I have a few of them on GitHub. Here's a simple one for installing and running an SSH server:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo root:root | chpasswd
EXPOSE 22

This should be almost self-explanatory. The FROM command defines the base image to start from. This can be one of the official ones, but could also be the zefhemel/ubuntu that we just created. The RUN commands are commands to be run to configure the image. In this case, we're updating the APT package repository, installing the openssh-server, creating a directory, and then setting a poor password for our root account. The EXPOSE command exposes port 22 (the SSH port) to the outside world.
Let's see how to build and instantiate this Dockerfile. The first step is to build an image. In the directory containing the Dockerfile, run:

$ docker build -t zefhemel/ssh .

This will create a zefhemel/ssh repository with our new SSH image. If this was successful, you can instantiate it with:

$ docker run -d zefhemel/ssh /usr/sbin/sshd -D

This is different than the earlier command. The -d runs the container in the background, and instead of running Bash, we now run the sshd daemon (in foreground mode, which is what the -D is for).
Let's see what it did by checking your running containers:

$ docker ps
ID            IMAGE                COMMAND            CREATED        STATUS        PORTS
23ee5acf5c91  zefhemel/ssh:latest  /usr/sbin/sshd -D  3 seconds ago  Up 2 seconds  49154->22

You can see that your container is up. The interesting bit is under the PORTS header. Since you exposed port 22, this port is now mapped to a port on your host system (49154 in this case). Let's see if it works.

$ ssh root@localhost -p 49154
The authenticity of host '[localhost]:49154 ([127.0.0.1]:49154)' can't be established.
ECDSA key fingerprint is f3:cc:c1:0b:e9:e4:49:f2:98:9a:af:3b:30:59:77:35.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:49154' (ECDSA) to the list of known hosts.
root@localhost's password: <I typed in root here>
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.8.0-27-generic x86_64)

 * Documentation: https://help.ubuntu.com/

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@23ee5acf5c91:~#

Success once more! An SSH server is now running and you are able to log in to it. Exit from SSH and kill the container before somebody from the outside figures out your password and hacks into it.

$ docker kill 23ee5acf5c91

Our container's port 22 was mapped to host port 49154, and that's random. To map it to a specific port on the host, pass the -p flag to the run command.

docker run -p 2222:22 -d zefhemel/ssh /usr/sbin/sshd -D

Now your port will be exposed on port 2222, if it's available. You can make your image slightly more user-friendly by adding the following line at the end of the Dockerfile:

CMD /usr/sbin/sshd -D

CMD signifies that a command isn't to be run when building the image, but when instantiating it. When no extra arguments are passed, it will execute the /usr/sbin/sshd -D command. Now, just run:

docker run -p 2222:22 -d zefhemel/ssh

You'll get the same result as before. To publish your newly created marvel, simply run a docker push command.

docker push zefhemel/ssh

After logging in, everyone can use it by using that previous docker run command.

Let's circle back to our WordPress example. How would you use Docker to run WordPress in a container? In order to build a WordPress image, you'd create a Dockerfile that:
1. installs Apache, PHP 5, and MySQL;
2. downloads WordPress and extracts it somewhere on the file system;
3. creates a MySQL database;
4. updates the WordPress configuration file to point to the MySQL database;
5. makes WordPress the DocumentRoot for Apache;
6. starts MySQL and Apache (e.g. using Supervisor).
Luckily, several people have already done this; for instance, John Fink's GitHub repository contains everything you need to build such a WordPress image.
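As a very rough sketch only (package names assume an Ubuntu 12.04 base; the database setup and wp-config.php generation are elided, and the supervisord.conf is assumed to exist next to the Dockerfile), the skeleton of such a Dockerfile might look like this:

FROM ubuntu:12.04
RUN apt-get update
RUN apt-get install -y apache2 php5 php5-mysql mysql-server supervisor wget
# download and unpack WordPress under /var/www
RUN wget -q http://wordpress.org/latest.tar.gz -O /tmp/wordpress.tar.gz && \
    tar xzf /tmp/wordpress.tar.gz -C /var/www/
# (additional steps would create the MySQL database, generate wp-config.php,
#  and point Apache's DocumentRoot at /var/www/wordpress)
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
EXPOSE 80
# run Supervisor in the foreground so it keeps the container alive
CMD ["/usr/bin/supervisord", "-n"]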

Docker use cases
Besides deploying complex applications easily in a reliable and reproducible way, Docker has many more uses. Interesting Docker uses and projects include:
Continuous integration and deployment: build software inside of a Docker container to ensure the isolation of builds. Built software images can automatically be pushed to a private Docker repository and deployed to testing or production environments.
Dokku: a simple platform-as-a-service built in fewer than 100 lines of Bash.
Flynn and Deis: two open-source platform-as-a-service projects using Docker.
Running a desktop environment in a container.
A project that brings Docker to its logical conclusion is CoreOS, a lightweight Linux distribution in which all applications are installed and run using Docker, managed by systemd.

What Docker is not
While Docker helps in deploying systems reliably, it is not a full-blown software-deployment system by itself. It operates at the level of applications running inside containers. Which container to install on which server and how to start them are factors outside Docker's scope.
Similarly, orchestrating applications that run across multiple containers, possibly on multiple physical servers or VMs, is beyond the scope of Docker. For containers to communicate, they need some type of discovery mechanism to figure out the IPs and ports through which other applications are available. This resembles service discovery across regular virtual machines. You can use a tool like etcd or any other service-discovery mechanism for this.


Conclusion
While you could do everything described in this article before Docker with raw LXC, cgroups, and aufs, it was never this easy or simple. This is what Docker brings to the table: a simple way to package complex applications into containers that can be easily versioned and reliably distributed. As a result, it gives lightweight Linux containers about the same flexibility and power as authentic virtual machines, but at lower cost and in a more portable way. A Docker image created with Docker running in a Vagrant VirtualBox VM on a MacBook Pro will run great on Amazon EC2, Rackspace Cloud, or on physical hardware, and vice versa.
Docker is available for free from the Web site. A good place to start is the interactive tutorial.



Chef and Cookbook Development Flow

João Miranda (@jhosm) started his career in 2000, at the height of the dotcom bubble. That enlightening experience led him to conclude that agile practices are the best way to respond to the business needs of almost all organizations. He currently is a principal software engineer at OutSystems, a PaaS provider, where he helps to remove all friction that may hinder the development teams' fast pace.

Infrastructure as code is a tenet of the DevOps community. It might even be called revolutionary if you can remember the days when virtual machines were a novel thing and physical hardware was the norm. But treating infrastructure as code is a tall order. Development practices have also evolved rapidly, and nowadays that means continuous integration (even delivery!), automated tests, code coverage, and more. How far can we go with the infrastructure-as-code aphorism? Pretty far, actually. We'll use Chef, a well-known IT automation tool, to illustrate the state of the art.
We'll start by going through a quick overview of Chef's main concepts. Our sample cookbook will be
1. statically validated with RuboCop and Foodcritic,
2. unit-tested with ChefSpec,
3. integration-tested with Test Kitchen and Serverspec.

Chef for beginners


If you already know the basics of Chef and are in a hurry, you can jump to the "Cookbook development process" section.

Chef uses an internal DSL (domain-specific language) in Ruby. This has powerful implications. An embedded DSL means you get all the power of a real programming language: powerful abstractions that let you do (virtually) whatever you need; a standard library; thousands of open-source packages; a strong body of knowledge and conventions; and a large community. On the other hand, all that power brings complexity, which might be warranted or not depending on your use cases.


Chef has a steep learning curve, with many concepts and tools to be learned. I'll introduce you to some of its concepts. I assume you're working in a Linux environment, but we'll also cover Windows.
Earlier this year, Chef released the Chef development kit (Chef DK), which greatly simplifies setting up your development environment. I'll start there, so download Chef DK if you want to follow along. Chef DK includes:
chef, a command-line tool, still in its early stages, that aims to streamline the Chef development workflow;
Berkshelf, a cookbook dependency manager;
Foodcritic, a cookbook-linting tool;
ChefSpec, a unit-testing tool;
Test Kitchen, an integration-testing tool.
The kit also includes a host of other Chef tools: chef-client, Ohai, Knife, and Chef Zero. The chef-client agent runs inside a node (i.e., a machine or server) and, given a run_list (a set of cookbooks), configures it. Ohai's main purpose is to gather the attributes (e.g. memory, CPU, platform information) of a node and feed them to chef-client. Knife is a command-line tool that interacts with Chef. Its name should be prefixed with "Swiss Army". If you're curious, type knife -h at the terminal. Finally, Chef Zero is an in-memory Chef server mainly used for testing.
You might notice that I did not mention the real Chef server. That is a whole other topic and an article in itself, so we'll ignore it.
We'll use VirtualBox as the virtual-machine host environment and Vagrant as its driver. Again, if you want to follow along, get them now.
With our development environment set up, it's time to create our first cookbook.

Creating our cookbook

Let's use chef to generate our first cookbook, called my_first_cookbook:

$ chef generate cookbook my_first_cookbook

You'll notice that chef uses Chef's own recipes to generate your repository skeleton (see below).

Your cookbook will have the following structure:

my_first_cookbook
  recipes
    default.rb
  .gitignore
  .kitchen.yml
  Berksfile
  chefignore
  metadata.rb
  README.md

Let's go through each item:

my_first_cookbook/ - Contains the my_first_cookbook cookbook.
my_first_cookbook/recipes - Contains the cookbook's recipes.
my_first_cookbook/recipes/default.rb - The default recipe. It can be seen as the cookbook's entry point (similarly to main() in Java or C#).
my_first_cookbook/.gitignore - chef assumes you'll store your cookbook in Git, so it produces .gitignore to ignore files that shouldn't be under version control.
my_first_cookbook/.kitchen.yml - Test Kitchen configuration file.
my_first_cookbook/Berksfile - Berkshelf's configuration file. It mainly informs Berkshelf of the cookbook's dependencies, which can be specified directly in this file or indirectly through metadata.rb, as we'll see. It also tells Berkshelf where it should look for those dependencies, usually at Chef Supermarket, the cookbook community site.
my_first_cookbook/chefignore - In the same vein as .gitignore, it tells Chef which files to ignore when uploading the cookbook to a Chef server or when sharing it with Chef Supermarket.
my_first_cookbook/metadata.rb - Meta information about your cookbook, such as name, contacts, or description. It can also state the cookbook's dependencies.
my_first_cookbook/README.md - Documentation entry point for the repo.

Compiling Cookbooks...
Recipe: code_generator::cookbook
  * directory[/Users/joaomiranda/Dev/chef-test/my_first_cookbook] action create
    - create new directory /Users/joaomiranda/Dev/chef-test/my_first_cookbook
  * template[/Users/joaomiranda/Dev/chef-test/my_first_cookbook/metadata.rb] action create_if_missing
    - create new file /Users/joaomiranda/Dev/chef-test/my_first_cookbook/metadata.rb
    - update content in file /Users/joaomiranda/Dev/chef-test/my_first_cookbook/metadata.rb from none to 760bcb
      (diff output suppressed by config)
[...]





That's a lot of stuff to wrap our heads around! Let's discuss some of it in more detail, starting with the cookbook. According to Chef's docs, a cookbook is "the fundamental unit of configuration and policy distribution." For instance, if you need to install nginx on your node, you'll use a cookbook to do that. There are about 1,800 community-provided cookbooks at Chef Supermarket.

A cookbook may contain many different types of artifacts. The most common are recipes and attributes, which we'll talk about later. It might also include libraries of custom Ruby code, templates for files to be created/configured on nodes, definitions of reusable resource collections, custom resources and providers, or files to be transferred to the nodes under configuration.

Before writing our first recipe, we have an important task: to describe our cookbook in metadata.rb. Make sure you set the name of your cookbook and its version. You can add many different pieces of information, but I'd like to highlight that if your cookbook depends on other cookbooks, you are strongly urged to state those dependencies through the use of the depends keyword.
name             'my_first_cookbook'
maintainer       'João Miranda'
maintainer_email 'joao.hugo.miranda@gmail.com'
license          'MIT'
description      'A simple cookbook to illustrate some infrastructure as code concepts'
version          '0.1.0'
depends          'windows', '~> 1.34'

The above is a sample metadata.rb file. Note how the cookbook depends on the windows cookbook.

Recipes

The next step is to create a recipe. According to Chef's docs, a recipe is "the most fundamental configuration element within the organization." That's not exactly helpful. An example will come to the rescue. For the purposes of this article, we'll use the "hello world" of configuration-management tools: we'll install a Web server and publish an HTML page.

If you're on Red Hat Enterprise Linux (RHEL) or CentOS, place the following inside my_first_cookbook/recipes/default.rb:


package 'httpd'

service 'httpd' do
  action [:enable, :start]
end

file '/var/www/html/index.html' do
  content "<html>
  <body>
    <h1>#{node['index_message']}</h1>
  </body>
</html>"
end

Replace httpd with apache2 in the previous file if you're on Ubuntu.

If you're on Windows, use the following instead:
["IIS-WebServerRole", "IIS-WebServer"].each do |feature|
  windows_feature feature do
    action :install
  end
end

service "w3svc" do
  action [:start, :enable]
end

file 'c:\inetpub\wwwroot\Default.htm' do
  content "<html>
  <body>
    <h1>#{node['index_message']}</h1>
  </body>
</html>"
end
Those tiny recipes allow us to touch on several concepts in one swoop.

A crucial property of recipes (and resources) is that they should be idempotent. We should be able to run a recipe any number of times and always get one of two results: either the node is in its specified state and stays that way, or the node's state has drifted and converges to the desired state. Idempotency is a concept that all tools like Chef provide.

You might have noticed that the second and third steps in the blocks of code above are common to both Linux and Windows, except for the service name and the file paths. Recipes are written in a declarative style and try to abstract away the underlying OS-specific algorithms that converge the node to the desired state. As you've seen, there are some differences that have to be accounted for, but Chef does a good job considering how different operating systems can be.
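To make the idempotency point concrete, here is a small contrast that is not part of the original article (paths and content are purely illustrative). A shell append changes the node on every run, while the equivalent Chef resource only acts when the node has drifted from the declared state:

# Non-idempotent shell approach: each run appends another line.
#   echo 'ServerName example.local' >> /etc/httpd/conf/httpd.conf
#
# Idempotent Chef resource: the file is only rewritten when its content
# differs from what is declared.
file '/etc/httpd/conf.d/servername.conf' do
  content "ServerName example.local\n"
end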
A recipe's execution order is determined by reading the recipe top to bottom. Execution order is a contentious theme in the configuration-tools community. Some tools, such as Puppet, favor explicit dependencies, where each configuration step declares what other steps need to be executed beforehand. This is similar to stating a task's dependencies in build tools such as Make or Rake. Others, like Chef and Ansible, favor implicit ordering. In Chef's case, for instance, order of execution is determined by the order in which resources are placed in the recipe file.

Resources

So, what are the recipes doing? First of all, they are making sure that the Web server is installed:

package 'httpd'

In Windows, this looks like:

["IIS-WebServerRole", "IIS-WebServer"].each do |feature|
  windows_feature feature do
    action :install
  end
end

Both package and windows_feature are resources. A resource describes a desired state you want the node to be in. The important point is that we are describing, or declaring, that desired state, but we are not explaining how to get there. The Linux package resource says that we want the httpd package installed. In Windows, windows_feature asks for installation of a Windows role or feature. Notice how we're using a standard Ruby array to enumerate the windows_feature resource twice.

The second step declares that we need a service (httpd or w3svc) enabled and started. The actions, as specified by action, vary from resource to resource.

service 'httpd' do
  action [:start, :enable]
end

The third step locally creates a file on the node. We are using the content attribute to specify the file content. A resource can have any number of attributes, which vary from resource to resource. When the file's content needs to be dynamically generated, you're better served with templates.

file '/var/www/html/index.html' do
  content "<html>
  <body>
    <h1>#{node['index_message']}</h1>
  </body>
</html>"
end

Attributes

This third step also introduces something we haven't seen before: node['index_message']. What we're doing here is referencing a node's attributes. Every node has a set of attributes that describes it. Yes, Chef uses the same word to describe two different concepts: there are a resource's attributes and a node's attributes. They might seem similar at first; both describe properties of something. But node attributes are one of the pillars of a cookbook.

Node attributes are so important that several cookbook patterns rely on them. Node attributes allow for reusable cookbooks, because they make them configurable and flexible. Usually, a cookbook defines default values for the attributes it uses. These default values are placed in Ruby files inside the cookbook's attributes folder. This folder is not created when the cookbook is generated, so you have to create it manually. Then you can create a Ruby file, e.g. default.rb, and define attributes like this:

default['index_message'] = 'Hello World!'

Attributes can then be overridden in a number of ways. They can be defined in several places: the nodes themselves, attribute files, recipes, environments, and roles. Ohai gathers a host of node attributes automatically: kernel data, platform data, and fully qualified domain names (FQDN), among many others. Environment (i.e. dev, QA, production) attributes are useful for specifying data such as connection strings and settings that change from environment to environment. Roles can also have attributes, but even Chef co-founder Adam Jacob discourages (see the comments) this option.

You can define many types of attributes in many places. You can also override attribute values. You have a lot of power in your hands. All this power can make it hard to understand how Chef finds the actual attribute value during a Chef run, so make sure you understand the rules of attribute precedence.
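As a rough, hypothetical illustration of that precedence (not taken from the article), a value set at the override level in a recipe beats the cookbook default defined in an attributes file:

# attributes/default.rb - cookbook default, the lowest of the levels shown here
default['index_message'] = 'Hello World!'

# recipes/default.rb - an override-level assignment made during the Chef run wins
node.override['index_message'] = 'Hello from an override!'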

Providers

Given that resources abstract away the "how to", which piece of Chef's machinery is responsible for putting a node in its desired state? This is where providers come in. Each resource has one or more providers. A provider knows how to translate the resource definition into executable steps on a specific platform. For instance, the service resource has providers for Debian, Red Hat, and Windows, among others. It's outside the scope of this article to explain how to create your own custom resources and providers, called lightweight resource providers (LWRPs), but if you're interested in learning more, Chef's site has an article that shows how simple the process is.
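To give a flavor of how small an LWRP can be, here is a minimal, hypothetical sketch (the resource name, message, and file path are invented for illustration; consult that article for the real details):

# resources/greeting.rb - declares the resource's interface
actions :create
default_action :create
attribute :message, kind_of: String, default: 'Hello'

# providers/greeting.rb - implements the action using plain Chef resources
action :create do
  file '/tmp/greeting.txt' do
    content new_resource.message
  end
end

A recipe in this cookbook would then use it as my_first_cookbook_greeting 'greet' do ... end.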



Cookbook development

What we have learned so far lets us write recipes and thus configure nodes. We could stop there, but we're treating infrastructure as code. We need a development process that allows us to grow while maintaining quality code. Let's see how we can do that with Chef and its ecosystem.

Modern development practices include a build process, linting tools, unit testing, and integration testing. We'll use Rake to define our build process. It's a simple one, with only four tasks, for RuboCop, Foodcritic, ChefSpec, and Test Kitchen. The Rakefile, which should be at the cookbook's root directory (like metadata.rb), looks like this:

require 'rspec/core/rake_task'
require 'rubocop/rake_task'
require 'foodcritic'
require 'kitchen'

# Style tests. Rubocop and Foodcritic
namespace :style do
  desc 'Run Ruby style checks'
  RuboCop::RakeTask.new(:ruby)

  desc 'Run Chef style checks'
  FoodCritic::Rake::LintTask.new(:chef) do |t|
    t.options = {
      fail_tags: ['any']
    }
  end
end

desc 'Run all style checks'
task style: ['style:ruby', 'style:chef']

desc 'Run ChefSpec examples'
RSpec::Core::RakeTask.new(:unit) do |t|
  t.pattern = './**/unit/**/*_spec.rb'
end

desc 'Run Test Kitchen'
task :integration do
  Kitchen.logger = Kitchen.default_file_logger
  Kitchen::Config.new.instances.each do |instance|
    instance.test(:always)
  end
end

# Default
task default: %w(style unit)

task full: %w(style unit integration)

Is our cookbook a good Ruby citizen?

The build process starts by running two static-analysis tools: RuboCop and Foodcritic.

RuboCop inspects your Ruby code for compliance with the community Ruby style guide. Within Chef's context, recipes, resources, providers, attributes, and libraries are Ruby code, so all should be good Ruby citizens. If you are new to Ruby, RuboCop helps you get up to speed faster, teaching you the way (some) things are done in Ruby.

To see RuboCop in action, let's assume we are checking our Windows recipe. If we execute chef exec rake at the cookbook's root directory, RuboCop will break the build and provide this information (you might get additional messages):

Inspecting 1 file
C

Offenses:

s.rb:1:2: C: Prefer single-quoted strings when you don't need string interpolation or special symbols.
["IIS-WebServerRole", "IIS-WebServer"].each do |feature|
 ^^^^^^^^^^^^^^^^^^^
s.rb:1:23: C: Prefer single-quoted strings when you don't need string interpolation or special symbols.
["IIS-WebServerRole", "IIS-WebServer"].each do |feature|
                      ^^^^^^^^^^^^^^^

1 file inspected, 2 offenses detected

Tools like RuboCop can reveal a huge number of violations, especially for codebases that did not use them from the start. RuboCop is configurable: you can switch specific style checks on or off any way you want. You can even tell RuboCop to generate a baseline configuration based on your existing codebase so you do not get overwhelmed with violations.

Your team can also follow some specific guidelines, and in that case you can write your own style checks, called custom cops, and plug them into RuboCop.
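Configuration lives in a .rubocop.yml file at the cookbook's root; the cop names and settings below are a hypothetical sketch rather than a recommended set:

# .rubocop.yml
AllCops:
  Exclude:
    - 'test/**/*'

Metrics/LineLength:
  Max: 120

Style/StringLiterals:
  EnforcedStyle: single_quotes

The baseline configuration mentioned above can be produced with RuboCop's --auto-gen-config flag.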

Are we writing good recipes?

Once you fix all issues found by RuboCop, Foodcritic will check your recipe. Foodcritic has the same kind of role as RuboCop, but while the latter focuses on generic Ruby code issues, the former targets recipe-authoring practices.

Let's temporarily rename metadata.rb to metadata.rb_ and execute chef exec rake again. We should get something like this:


[...]
FC031: Cookbook without metadata file: /Users/joaomiranda/Dev/chef-test/my_first_cookbook/metadata.rb:1
FC045: Consider setting cookbook name in metadata: /Users/joaomiranda/Dev/chef-test/my_first_cookbook/metadata.rb:1

Foodcritic is telling us that we are violating rules FC031 and FC045. Why does Foodcritic enforce these rules? Well, one of Foodcritic's great features is that it clearly explains each of its rules. For instance, Foodcritic's docs say the following about rule FC031:

"FC031: Cookbook without metadata file. Chef cookbooks normally include a metadata.rb file which can be used to express a wide range of metadata about a cookbook. This warning is shown when a directory appears to contain a cookbook, but does not include the expected metadata.rb file at the top-level."

As with RuboCop, Foodcritic is also configurable. You can turn each rule on or off and create your own rules. Etsy published its own Foodcritic rules, for instance.
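For example, if you decide a rule does not apply to your cookbook, Foodcritic can be told to skip it from the command line (a hypothetical invocation, shown for illustration only):

# Lint the current cookbook but ignore rule FC031
foodcritic -t ~FC031 .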
Static-analysis tools are a great addition to your toolbox. They can help you find some errors early, and we all know how important fast feedback loops are. These tools also help newcomers learn a given language or tool. But I would say that their most important contribution is the consistency they promote. As is often said, code is read many more times, and by many more people, than it is written. If we promote consistency, all code becomes easier to read, as readers do not have to grapple with each coder's style. Readers can instead focus on understanding the big picture.

It should be clear that static-analysis tools do not have much to say about the larger design and structure of our code. They may give some hints, but this is the realm of the creative mind.

Fast feedback with ChefSpec

Static-analysis tools, as their name implies, cannot do dynamic analysis. It's time to turn our attention to unit testing. The Chef tool of choice is ChefSpec. ChefSpec is a Chef unit-testing framework built on top of RSpec, meaning it follows the behavior-driven-development school of thought. According to ChefSpec's excellent docs: "ChefSpec runs your cookbook(s) locally with Chef Solo without actually converging a node. This has two primary benefits: It's really fast! Your tests can vary node attributes, operating systems, and search results to assert behavior under varying conditions."

These are important properties! We can test our code without actually using virtual machines or cloud providers. ChefSpec achieves this by using mock providers behind the scenes so that the configurations are not applied to any possible node.

Let's write two simple tests to show ChefSpec in action. We'll start by creating two files that are not strictly needed to run ChefSpec but which help us fine-tune the environment.

At the root directory of the cookbook, create a file named .rspec with the following content:

--default-path ./test/unit
--color
--format documentation
--require spec_helper

This file sets some options that are passed to RSpec when it executes. It saves us from having to type them whenever we want to run RSpec. The options we've set:
assume the default path to look for examples (tests) is ./test/unit (we'll understand why in a minute);
colorize the output;
print RSpec's execution output in a format that also serves as documentation;
automatically require the spec_helper.rb file.

This last item brings us to the second file we must create, spec_helper.rb. Use this file to write code that is needed for all examples (a.k.a. tests). Put it inside my_first_cookbook\test\unit, with the following content:

require 'chefspec'
ChefSpec::Coverage.start!
require 'chefspec/berkshelf'

The spec_helper.rb:
requires ChefSpec so that we can use it with RSpec;
enables resource coverage so we can see if we're touching every resource when the tests are executed;
tells ChefSpec that we are using Berkshelf so that it can find the cookbook's dependencies and activate any matchers that it might find.
Finally, let's create a test for the following resource:

service 'httpd' do
  action [:start, :enable]
end

Create a default_spec.rb file inside my_first_cookbook\test\unit with this content:

describe 'my_first_cookbook::default' do
  let(:chef_run) { ChefSpec::SoloRunner.converge(described_recipe) }

  subject { chef_run }

  it { is_expected.to enable_service('httpd') }
  it { is_expected.to start_service('httpd') }
end

It looks remarkably like English, doesn't it? We are describing the default recipe of my_first_cookbook. We are simulating a node's convergence by faking a chef_run, i.e., faking Chef's execution of a recipe on a node. We are also telling ChefSpec that the subject of our test is the chef_run. We close the description by telling ChefSpec that we expect the chef_run to enable and to start the httpd service upon convergence.

It is important to note that enable_service and start_service are matchers defined by ChefSpec. They are the ones that allow us to assert facts about the recipe's execution. As always, we can define our own custom matchers, but ChefSpec already includes the most common ones.

If we execute chef exec rake, we'll get this output (see Output 1 below). You'll notice that at the start of the output we have English-like sentences. They are directly derived from the tests and can be seen as a specification of what the recipe is supposed to do.

Due to the way Chef works internally, it is not possible to use regular code-coverage tools, as ChefSpec author Seth Vargo explains. So ChefSpec provides something a bit less exhaustive: resource coverage. We see from the output that the recipe contains three resources but the tests only touched one. How much coverage is enough? Try to reach at least 80%.

[...]
my_first_cookbook::default
  should enable service httpd
  should start service httpd

Finished in 0.0803 seconds (files took 6.42 seconds to load)
2 examples, 0 failures

ChefSpec Coverage report generated...

  Total Resources:   3
  Touched Resources: 1
  Touch Coverage:    33.33%

Untouched Resources:

  package[httpd]                   my_first_cookbook/recipes/default.rb:1
  file[/var/www/html/index.html]   my_first_cookbook/recipes/default.rb:7

Output 1

Our cookbook in the real world

It's time to exercise our recipe in the real world and do some integration testing. This kind of testing should not reveal many surprises if two conditions are met. First, we know how the resources that our recipes use behave on the platform we're targeting. Second, we've written a good set of ChefSpec tests, meaning they cover all the different configuration scenarios that our recipes are supposed to handle. With integration testing, running times slow by an order of magnitude, so the more we can do at the unit-testing level, the better. But integration testing is where the rubber hits the road. Integration testing allows us to exercise our recipes against real (okay, most likely virtual) machines.

Integration testing significantly increases the complexity of our infrastructure. Which operating systems do we have to support? Do we support Linux and Windows? Where are our nodes? In the cloud (AWS, DigitalOcean, Azure)? Are they on-site (e.g. managed by VMware's vSphere)? How many server roles do we have? It quickly gets complicated.

Fortunately, clever people have already grappled with this problem. According to Test Kitchen's authors, the product "...is a test harness tool to execute your configured code on one or more platforms in isolation. A driver plugin architecture is used which lets you run your code on various cloud providers and virtualization technologies. Many testing frameworks are already supported out of the box including Bats, shUnit2, RSpec, Serverspec, with others being created weekly."

So, Test Kitchen is our friend. The idea behind it is simple and easier to understand with an example. Our cookbook root directory contains a .kitchen.yml file that looks like this:
---
driver:
  name: vagrant

provisioner:
  name: chef_zero

platforms:
  - name: ubuntu-12.04
  - name: centos-6.4

suites:
  - name: default
    run_list:
      - recipe[my_first_cookbook::default]
    attributes:
This simple file touches on (almost) all the concepts Test Kitchen relies on. It contains a list of platforms, the list of machines where we'll run our tests. Platforms usually map to a bare-bones machine - you're testing their configuration processes, after all - but they can be any kind of machine with any configuration. There is also a list of suites, each specifying a Chef run-list with (optional) attribute definitions. A driver (in our case, Vagrant) manages the platforms. Finally, the provisioner (Chef Zero in our case) applies each suite to each platform, unless we have explicitly excluded it from the suite.

We can treat Test Kitchen as an orchestrator. Notice how we haven't mentioned anything about tests, which might seem a bit weird. We'll get to that in due time.
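The empty attributes key at the end of the suite is where suite-level node attributes would go. A hypothetical variation (not part of the article's example) could override our cookbook's attribute like this:

suites:
  - name: default
    run_list:
      - recipe[my_first_cookbook::default]
    attributes:
      index_message: 'Hello from Test Kitchen!'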

Test Kitchen defines a state machine to control its execution. It starts by creating a platform instance, asking the driver to create a virtual machine. It then tells the provisioner to converge the node. After the node has converged, Test Kitchen looks for tests, runs any it finds, and puts the instance into the verified state. The cycle closes when Test Kitchen destroys the instance. Given that this cycle can be slow, Test Kitchen helps when things go wrong by reverting to the last good state when one of the steps fails. For instance, if a convergence succeeds but a test fails, making the verify phase fail, then the instance is kept in the converged state.
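That state machine maps onto Test Kitchen's subcommands, which you can also drive one step at a time. A rough sketch of a manual session (instance names follow the suite-platform naming convention):

$ kitchen create default-centos-64     # driver creates the VM
$ kitchen converge default-centos-64   # provisioner runs Chef on it
$ kitchen verify default-centos-64     # run any tests it finds
$ kitchen destroy default-centos-64    # tear the instance down
$ kitchen test default-centos-64       # or run the whole cycle in one go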
So, if we go to the command line and type chef exec rake full, we will eventually run Test Kitchen:

[...]
-----> Cleaning up any prior instances of <default-ubuntu-1204>
-----> Destroying <default-ubuntu-1204>...
       Finished destroying <default-ubuntu-1204> (0m0.00s).
-----> Testing <default-ubuntu-1204>
-----> Creating <default-ubuntu-1204>...
       Bringing machine 'default' up with 'virtualbox' provider...
       ==> default: Box 'opscode-ubuntu-12.04' could not be found. Attempting to find and install...
           default: Box Provider: virtualbox
           default: Box Version: >= 0
       ==> default: Adding box 'opscode-ubuntu-12.04' (v0) for provider: virtualbox
           default: Downloading: https://opscode-vm-bento.s3.amazonaws.com/vagrant/virtualbox/opscode_ubuntu-12.04_chef-provisionerless.box

[...]

       - my_first_cookbook
       Compiling Cookbooks...
       Converging 3 resources
       Recipe: my_first_cookbook::default
         * package[httpd] action install[2014-11-17T18:50:19+00:00] INFO: Processing package[httpd] action install (my_first_cookbook::default line 1)

       ================================================================================
       Error executing action `install` on resource 'package[httpd]'
       ================================================================================

[...]



A couple of interesting things just happened. First, Test Kitchen told Vagrant to launch a new machine, defaulting to a "box", which you can think of as a virtual-machine template, provided by Chef in this line: Downloading: https://opscode-vm-bento.s3.amazonaws.com/vagrant/virtualbox/opscode_ubuntu-12.04_chef-provisionerless.box. This box corresponds to the ubuntu-12.04 platform we stated earlier in .kitchen.yml. You can specify your own boxes, of course.

Second, we got an error! When we check our Test Kitchen instances, we see that default-ubuntu-1204 is in the Created state, because the convergence step failed.

$ kitchen list
Instance             Driver   Provisioner  Last Action
default-ubuntu-1204  Vagrant  ChefZero     Created
default-centos-64    Vagrant  ChefZero     <Not Created>

We could log in to the instance by doing kitchen login and inspect the machine configuration to find out what went wrong. But the error occurred on the Ubuntu platform and our recipe does not support Ubuntu. We don't want to support it, so let's remove the "- name: ubuntu-12.04" line from .kitchen.yml.

Let's execute Rake again. This time everything should run smoothly (see Code 1 below).
Although a successful Chef run tells us a lot, especially when we have the ChefSpec safety net, we can add an additional layer of testing. Test Kitchen does not provide a testing framework, so it is unable to execute automated tests by itself. Test Kitchen relies on existing test frameworks, such as Bats, shUnit2, RSpec, and Serverspec. We'll use Serverspec to write a simple test.

Serverspec, like ChefSpec, is built on top of RSpec, but their mechanics are completely different. While ChefSpec has an intimate knowledge of Chef and its inner workings, Serverspec has no idea that Chef even exists. Serverspec just makes assertions about a machine's state. Is this package installed? Is that service enabled? Serverspec has no idea how that package was installed or the service enabled. For all it cares, those operations could have been performed manually!

Let's create a simple test. Create a file named default_spec.rb inside my_first_cookbook\test\integration\default\serverspec with the following content:

describe package('httpd') do
  it { should be_installed }
end
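Serverspec ships with many more resource types than package. As a hypothetical extension of this test (not in the original article), we could also assert that the service is running and listening on port 80:

describe service('httpd') do
  it { should be_enabled }
  it { should be_running }
end

describe port(80) do
  it { should be_listening }
end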

The directory structure follows some specific conventions:
test\integration - Test Kitchen looks for tests here.
default - This is the exact name of the suite we want to test.
serverspec - This tells Test Kitchen to use Serverspec as its test framework.
If we execute chef exec rake full again, Test Kitchen will find our test and execute it.

[...]
-----> Verifying <default-centos-64>...
       Removing /tmp/busser/suites/serverspec
       Uploading /tmp/busser/suites/serverspec/default_spec.rb (mode=0644)
-----> Running serverspec test suite
[...]
       Package "httpd"
         should be installed

       Finished in 0.0623 seconds (files took 0.3629 seconds to load)
       1 example, 0 failures
       Finished verifying <default-centos-64> (0m1.54s).
-----> Kitchen is finished. (0m3.95s)
The test succeeds because Serverspec asserted that the package is indeed installed on that instance. The ChefSpec equivalent would only assert that the package Chef resource had been touched.

When do we write Serverspec tests? When do we write ChefSpec tests? That's material for a whole new article. I'd suggest that the test pyramid could be applied to infrastructure testing as well, so you should have a larger number of ChefSpec tests. Actually, before writing integration tests with Serverspec or a similar framework, ask if your ChefSpec tests and successful Chef runs already cover your validation needs.

We've seen Test Kitchen work on Linux. What about Windows? Unfortunately, Test Kitchen does not officially support Windows at the moment, but there is hope! Salim Afiune is working on bringing that support, and Matt Wrock wrote an article that shows how you can indeed use Test Kitchen with Windows today. There are some rough edges that pop up in medium to large tests, but they can be overcome.


[...]
-----> Cleaning up any prior instances of <default-centos-64>
-----> Destroying <default-centos-64>...
       Finished destroying <default-centos-64> (0m0.00s).
-----> Testing <default-centos-64>
-----> Creating <default-centos-64>...
       Bringing machine 'default' up with 'virtualbox' provider...
       ==> default: Box 'opscode-centos-6.4' could not be found. Attempting to find and install...
           default: Box Provider: virtualbox
           default: Box Version: >= 0
       ==> default: Adding box 'opscode-centos-6.4' (v0) for provider: virtualbox
           default: Downloading: https://opscode-vm-bento.s3.amazonaws.com/vagrant/virtualbox/opscode_centos-6.4_chef-provisionerless.box

[...]

       Installing Chef
       installing with rpm...
       warning: /tmp/install.sh.2579/chef-11.16.4-1.el6.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 83ef826a: NOKEY
       Preparing...      ########################################### [100%]
          1:chef         ########################################### [100%]
       Thank you for installing Chef!

[...]

       [2014-11-21T00:47:59+00:00] INFO: Run List is [recipe[my_first_cookbook::default]]
       [2014-11-21T00:47:59+00:00] INFO: Starting Chef Run for default-centos-64
       [2014-11-21T00:47:59+00:00] INFO: Loading cookbooks [my_first_cookbook@0.1.0]
       Synchronizing Cookbooks:
         - my_first_cookbook
       Compiling Cookbooks...
       Converging 3 resources
       Recipe: my_first_cookbook::default
         * package[httpd] action install[2014-11-21T00:47:59+00:00] INFO: Processing package[httpd] action install (my_first_cookbook::default line 1)
       [2014-11-21T00:48:30+00:00] INFO: package[httpd] installing httpd-2.2.15-39.el6.centos from base repository

         - install version 2.2.15-39.el6.centos of package httpd

[...]

       [2014-11-21T00:48:50+00:00] INFO: Chef Run complete in 51.251318527 seconds

       Running handlers:
       [2014-11-21T00:48:50+00:00] INFO: Running report handlers
       Running handlers complete
       [2014-11-21T00:48:50+00:00] INFO: Report handlers complete
       Chef Client finished, 4/4 resources updated in 58.012060901 seconds
       Finished converging <default-centos-64> (2m4.75s).
Code 1

Wrapping it up
We know the basic concepts of Chef. We know how to harness tools to help our Chef development process. What's next? Actually, quite a lot. Chef is a (very) large product. I hope this gave you a step up. Start small. Make sure you fully understand Chef's concepts. Create some cookbooks. Reuse some cookbooks. You'll be a master chef in no time.

READ ONLINE ON InfoQ

Introduction to Puppet

Susannah Axelrod joined Puppet Labs in 2013 from Huron Consulting, where she was director of product management. Prior to Huron, Susannah held product leadership roles at Thomson Reuters, Sage Software, Intuit, and Intel. She loves figuring out what customers need and working to solve their problems. Susannah received her BA from the University of Chicago and her MBA from the Wharton School at the University of Pennsylvania.

Every IT professional has suffered the frustration of code that breaks in production. Experienced developers pour hours, days, and weeks into creating applications, only to have to patch them repeatedly after release. QA engineers are certain they've hit targets for high performance and low risk on their test systems. And ops follows every deployment checklist to the letter, only to find themselves working late night after night, trying to keep these applications running (or limping along) in production.

Meanwhile, executives wring their hands and fret about all the money being spent with such mediocre results. Why does it take so long for us to release features, and even bug fixes? Customers are defecting. Competitors' technology is way ahead, and Wall Street is taking notice.

IT organizations in situations like the above are often strictly siloed. Dev, ops, and testers are managed separately, have different metrics and goals, may work in different buildings, and sometimes have never even met each other. These teams are likely working on different technology stacks with distinct configurations. The application code may stay consistent but nothing else does. What works on a dev's laptop or in the QA environment often doesn't work when deployed to production. Worst of all, no one understands the root causes of their problems.

The founder of Puppet Labs, Luke Kanies, was one of those ops folks stuck working late nights in the data center. His dissatisfaction with the status quo led him to write the software that became Puppet.

But wait: we were just talking about organizational problems. How can software solve cultural issues and enforce collaboration? The answer is, it can't. At least, not by itself. Puppet is a great infrastructure-management platform that any system administrator can use to get work done more efficiently, even from within a siloed ops team. However, for an organization that's ready to lift collaboration to the next level, Puppet supplies the powerful glue of a shared codebase that unifies different teams.

Bear with me for a bit as we walk through how Puppet works, and discuss how it helps teams at all stages of enhancing collaboration around software development and release, an evolution that's often referred to as DevOps.

What is Puppet?

Puppet really refers to two different things: the language in which code is written and the platform that manages infrastructure.

Puppet: the language

Puppet is a simple modeling language used to write code that automates management of infrastructure. Puppet allows you to simply describe the end state into which you want to get your systems (we call them nodes). Contrast that with procedural scripts: to write one, you need to know what it will take to get a specific system to a specific state, and to be able to write those steps out correctly. With Puppet, you don't need to know or specify the steps required to get to the end state, and you aren't at risk of getting a bad result because you got the order wrong or made a slight scripting error.

Also, unlike procedural scripts, Puppet's language works across different platforms. By abstracting state away from implementation, Puppet allows you to focus on the parts of the system you care about, leaving implementation details like command names, arguments, and file formats to Puppet itself. For example, you can use Puppet to manage all your users the same way, whether a user is stored in NetInfo or /etc/passwd.
This concept of abstraction is key to Puppet's utility. It allows anyone who's comfortable with any kind of code to manage systems at a level appropriate for their role. That means teams can collaborate better, and people can manage resources that would normally be outside their ken, promoting shared responsibility among teams.

Another advantage of the modeling language is that Puppet is repeatable. Unlike scripts, which you can't continue to run without changing the system, you can run Puppet over and over again, and if the system is already in its desired state, Puppet will leave it in that state.

Resources

The foundation of the Puppet language is its declaration of resources. Each resource describes a component of a system, such as a service that must be running or a package that must be installed. Some other examples of resources:
A user account;
A specific file;
A directory of files;
Any software package;
Any running service.

It's helpful to think of resources as building blocks that can be combined to model the desired state of the systems you manage. This leads us naturally to Puppet's further definitions, which allow you to combine things in an economical way, economy being one of Puppet's key attributes.
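As a small, hypothetical sketch of what declaring a single resource looks like (the path and content are invented for illustration):

file { '/etc/motd':
  ensure  => file,
  content => "Managed by Puppet\n",
}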

Types and providers

Puppet groups similar kinds of resources into types. For example, users fall into one type, files into another, and services into another. Once you have correctly described a resource type, you simply declare the desired state for that resource; instead of saying, "Run this command that starts XYZ service," you simply say, "Ensure XYZ is running."

Providers implement resource types on a specific kind of system, using the system's own tools. The division between types and providers allows a single resource type (such as package) to manage packages on many different systems. For example, your package resource could manage yum on Red Hat systems, dpkg and APT on Debian systems, and ports on BSD systems.

Providers are less commonly declared by admins, and only if they want to change the system defaults. Providers are written into Puppet precisely so you don't have to know how to manage each operating system or platform running on your infrastructure. Again, it's Puppet abstracting away details you shouldn't have to worry about. If you do need to write a provider, these are often simple Ruby wrappers around shell commands, so they are usually short and easy to create.

Types and providers enable Puppet to function across all major platforms and allow Puppet to grow and evolve to support additional platforms beyond compute servers, such as networking and storage devices.

The example below demonstrates the simplicity of the Puppet language by showing how a new user and group are added with a shell script, contrasted with the identical action in Puppet. In the Puppet example, user and group are types, and Puppet automatically discovers the appropriate provider for your platform. The platform-specific, procedural scripts are much harder both to write and to understand.
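The original example was shown as an image that is not reproduced here, so the following is a rough reconstruction in the same spirit (the user and group names are illustrative). The shell version is platform-specific and not repeatable as written; the Puppet version simply declares the desired state:

# Shell (Linux-specific):
#   groupadd sysadmins
#   useradd -g sysadmins -s /bin/bash jdoe
#
# Puppet (platform-independent and repeatable):
group { 'sysadmins':
  ensure => present,
}

user { 'jdoe':
  ensure  => present,
  gid     => 'sysadmins',
  shell   => '/bin/bash',
  require => Group['sysadmins'],
}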

Classes, manifests, and modules

Every other part of the Puppet language exists to add flexibility and convenience to how resources are declared. Classes are Puppet's way of extricating chunks of code, combining resources into larger units of configuration. A class could include all the Puppet code needed to install and configure NTP, for example. Classes can be created in one place and invoked in another.

Different sets of classes are applied to nodes that serve different roles. We call this node classification, and it's a powerful capability that allows you to manage your nodes based on their capabilities rather than their names. It's the "cattle, not pets" approach to managing machines that is favored in fast-moving organizations.

Puppet language files are called manifests. The simplest Puppet deployment is a lone manifest file with a few resources. Giving the basic Puppet code in the above example the filename user-present.pp would make it a manifest.
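Putting the last two ideas together, a class saved in a manifest might look like the following hypothetical sketch (an illustration only, not the NTP module published on the Puppet Forge):

# modules/ntp/manifests/init.pp
class ntp {
  package { 'ntp':
    ensure => installed,
  }

  service { 'ntpd':
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }
}

# Invoked elsewhere, e.g. in a node definition:
include ntp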
Modules are a collection of classes, resource types, files, and templates, all organized around a particular purpose and arranged in a specific, predictable structure. There are modules available for all kinds of purposes, from completely configuring an Apache instance to setting up a Rails application, and many, many more. Including the implementation of sophisticated features in modules allows admins to have much smaller, more readable manifests that simply use the modules.

One huge benefit of Puppet modules is that they are reusable. You can use modules written by other people, and Puppet has a large, active community of people who freely share modules they've written. That's in addition to the modules written by Puppet Labs employees. Altogether, you'll find more than 3,000 modules available for free on the Puppet Forge. Many of these were created for some of the most common tasks sysadmins are responsible for, so they'll save you a lot of time. For example, you can manage everything from simple server building blocks (NTP, SSH) to sophisticated solutions (SQL Server, F5).

Classes, manifests, and modules are all just code. They can, and should, as we'll discuss later, be checked into version control, just like any other code your organization needs.
Puppet: the platform

The language alone does not make up Puppet. People need to deploy Puppet code across infrastructure, periodically update code with configuration changes, remediate unintended changes, and inspect their systems to ensure everything is working as intended. To meet these needs, most customers run Puppet in a master-agent structure, comprised of a number of components. Customers can run one or more Puppet masters, depending on their needs. An agent is installed on each node, which then establishes a secure, signed connection with the master.

The master-agent structure is used to deploy Puppet code to nodes and to maintain the configuration of those nodes over time. Before configuring a node, Puppet compiles manifests into a catalog. Catalogs are static documents that define resources and the relationships between them. A given catalog applies to a single node, according to its job and the context in which it will do its job. A catalog defines how a node will function, and is used by Puppet to check whether a node is correctly configured and to apply a new configuration if needed.


Each node-based agent checks in periodically with a master server during each regular Puppet run. Puppet can then do any of the following:
remediate any configurations that have drifted from what they should be;
report on the state of nodes without making any changes;
apply any desired configuration changes, using Puppet's orchestration tooling;
collect data from nodes and events, and store it for retrieval.

Puppet Labs's commercial solution, Puppet Enterprise, adds customer support and a variety of advanced, mission-critical capabilities: sophisticated node-management capabilities; role-based access control; operational metrics and a reporting console.

Putting it all together

Now you have a basic understanding of how Puppet works, but you may still be wondering how it can help your organization fix its deeper problems and enable people to collaborate more easily.

It all boils down to this: when you use Puppet, you are modeling your infrastructure as code. You can treat Puppet, and, by extension, your infrastructure's configuration, just like any other code. Puppet code is easily stored and reused. It can be shared with others on the ops team and with people on other teams who need to manage machines. Dev and ops can use the same manifests to manage systems from the laptop dev environment all the way to production, so there are fewer nasty surprises when code is released into production. That can yield big improvements in deployment quality, especially for some organizations we've seen.

Treating configuration as code also makes it possible for sysadmins to give devs the ability to turn on their own testing environments, so devs no longer see sysadmins as standing in their way. You can even hand Puppet code to auditors, many of whom accept Puppet manifests as proof of compliance. All of this improves efficiencies, and people's tempers, too.
Perhaps most important of all, you can check Puppet code into a shared version-control tool. This gives you a controlled, historical record of your infrastructure. You can adopt the same peer-review practices in ops that software developers use, so ops teams can continually improve configuration code, updating and testing until you are secure enough to commit configurations to production.

Because Puppet has the ability to run in simulation, or no-op, mode, you can also review the impact of changes before you make them. This helps make deployments much less stressful, since you can roll back if needed.
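A quick sketch of what that looks like in practice, using the agent's standard flags:

# Contact the master once, report what would change, but change nothing:
$ puppet agent --test --noop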
By using Puppet with version control and the practices outlined above, many of our customers achieve the holy grail of continuous delivery, delivering code more frequently into production with fewer errors. When you deploy applications in smaller increments, early and frequent customer feedback tells you whether or not you are headed down the right road. This saves you from delivering a big wad of code after six to 12 months of development, only to discover it doesn't fit user needs or simply doesn't please them.

Our customers evolve the configuration of dev, test, and production environments in step with application code from developers. This allows devs to work in an extremely realistic environment, often identical to production. Applications no longer break in production due to unknown configuration differences between dev and test. Devs and QA get to deploy more good software, ops no longer burns the midnight oil, and executives are finally, well, if not happy, at least satisfied enough to shift their focus to concerns other than IT efficiency!

Taking the first step

Most organizations we see admittedly are pretty far from an advanced level of continuous collaboration, let alone continuous delivery. The nice thing about Puppet is that it grows and scales as your team and infrastructure grow and scale. You may not be ready yet to roll out company-wide DevOps practices, and that's okay. Many customers successfully use Puppet as a configuration-management tool in conservative, compliance-oriented industries such as banking and government. These organizations may have little need to adopt continuous delivery but, nonetheless, storing and versioning infrastructure as code vastly improves their change control and security practices.

We recommend you start by automating one thing that will make your job easier. For instance, many admins start by automating management of NTP, DNS, SSH, firewalls, or users and groups, all things that are completely routine and that suck up a lot of time.


After gaining experience with Puppet, many people move up the stack, writing more complex modules to manage services like Tomcat monitoring or their JBoss application servers. Others adopt and adapt Puppet Forge modules. When you're ready to dive in further, you can make sure all the machines in the data center and in the cloud are equipped to do the jobs they're supposed to do, that they're actually doing those jobs, and that the overall system is functioning properly to run the applications that serve your business.

It's important to remember that you don't have to wade into infrastructure as code all by yourself. Others have solved these problems before you, so make good use of their work! We already mentioned the thousands of modules available on the Puppet Forge. You can also rely on the Puppet community, which numbers in the tens of thousands. Subscribe to Puppet Users on Google Groups and check out Puppet Ask, and get to know the engaged and responsive people there. Attend a Puppet Camp or a meeting of a Puppet User Group in your area to meet people in person. You can use Puppet Labs' learning resources, both free and paid, and there's always our YouTube channel and our official documentation, too.

This is just a taste of what you can find in the Puppet ecosystem. We look forward to seeing you and helping you learn how Puppet can make your infrastructure, your business, and your work life run so much better.

READ ONLINE ON InfoQ

The LogStash Book, Log Management Made Easy

James Turnbull is the author of six technical books about open source software and a long-time member of the open source community. James authored the first (and second!) books about Puppet and works for Puppet Labs running Operations and Professional Services. James speaks regularly at conferences including OSCON, Linux.conf.au, FOSDEM, OpenSourceBridge, DevOpsDays, and a number of others. He is a past president of Linux Australia, a former committee member of Linux Victoria, was Treasurer for Linux.conf.au 2008, and serves on the program committee of Linux.conf.au and OSCON.

James Turnbull makes a compelling case for using LogStash for centralizing logging by explaining the implementation details of LogStash within the context of a logging project. The LogStash Book targets both small companies and large enterprises through a two-sided case: both the low barrier to entry and the scaling capabilities. James talked about the book on Hangops: "It's designed for people who have never seen LogStash before, sysadmins, developers, devops, operations people. I expect you to know a little about unix or linux." He continued, "Additionally it assumes you have no prior knowledge of LogStash."
The Problem of Oversimplifying Log Management

James comes from a system administrator and security background. He explains how computing environments have evolved log management in ways that do not scale.

He shares that it generally falls apart through an evolutionary process, starting with when logs become most important to people, that is to say when trouble strikes. At that time, new administrators will start examining the logs with the classical tools: cat, tail, sed, awk, perl, and grep. This practice helps develop a good skill set around useful tools; however, it does not scale beyond a few hosts and log file types. Upon realizing the scalability issue, teams will evolve into using centralized logging with tools such as rsyslog and syslog-ng.

While this starts to handle the scale issue, James shares that it doesn't really solve the problem of log management, because now there is an overwhelming number of different log event types, different formats, different time zones, and basically a lack of easily understandable context. Finally, a team may retrofit their computing environment with logging technology that can handle large amounts of storage, search, filtering, and the like. In the end, unfortunately, this approach includes a lot of waste and has a relatively high cost. LogStash saves the day by satisfying the need for a low barrier to entry, like the classical system administrator tools, but it is fully architected to scale to large, web-scale deployments.

LogStash Architecture Overview

LogStash provides an architecture for collecting, parsing, and storing logs. In addition, one of the main cross-cutting use cases for a LogStash implementation is the viewing/searching of the managed log events. Kibana was a natural fit because it provides a user-friendly search interface that integrates with Elasticsearch, the storage engine for LogStash. After Logstash was bought by Elasticsearch, the company bundled the three tools and announced them as the ELK Stack. The following is an out-of-the-box Kibana screenshot in an ELK setup (Image 1).

Beyond the viewing of logs, there is an architecture of components that manages the flow of logs from disparate servers through a broker and ultimately into storage. James takes readers through an exploration of each component in the LogStash setup, which uses Redis, an open source key-value store, to queue logs in preparation for indexing. It also uses Elasticsearch for storage of logs and as a back end for the viewing system. The following diagram from the book shows the distinct architecture component types, including shipper, broker, indexer, and viewer (where Kibana is the web interface in the ELK stack) (Image 2).

In the book, James drills into the three primary functions within a LogStash instance: getting input events, filtering event data, and outputting events. These three functions of LogStash are performed based on configuration information stored in an easy-to-understand .conf file. The .conf file has sections for the three different types of plugins LogStash uses: input, filter, and output. Each LogStash instance is customized to meet the requirements of its role in the overall architecture. For example, this configuration for a shipper contains one input and two outputs:

input {
  redis {
    host => "10.0.0.1"
    type => "redis-input"
    data_type => "list"
    key => "logstash"
  }
}
output {
  stdout {
    debug => true
  }
  elasticsearch {
    cluster => "logstash"
  }
}


LogStash Components: Shipper, Broker, Indexer

The book covers the three LogStash plugin types in the context of their usage in shippers and indexers. James shows how to use the following input plugins with LogStash: file, stdin, syslog, lumberjack, and redis. For environments where LogStash can't be installed, there are other options for sending events that integrate with LogStash: syslog, Lumberjack, Beaver, and Woodchuck.

There is overlap between input and output plugins in LogStash; for example, there are both input and output redis plugins. In addition to the main two outputs covered, redis and elasticsearch, James also includes outputs that integrate with other systems, including Nagios, email alerts, instant messages, and StatsD/Graphite. The filters covered in the book include grok, date, grep, and multiline. James shows how the filter plugins can enable efficient processing of postfix logs and java application logs. In some cases the logs can be filtered before LogStash uses them as input; for example, Apache logging has a custom format capability that allows for logging in a JSON format that LogStash can easily process without an internal filter plugin. The broker, which we have specified as Redis, is for managing event flow; LogStash supports the following other queue technologies in this role: AMQP and ZeroMQ. The Indexer instance of LogStash performs the routing to search/storage.

Image 2

Image 3

COMPUTING ENVIRONMENTS HAVE EVOLVED LOG MANAGEMENT IN WAYS THAT DO NOT SCALE. SCALING WITH LOGSTASH ACCOMPLISHES THREE MAIN GOALS: RESILIENCY, PERFORMANCE, AND INTEGRITY.
Aslan Brooke

Scaling LogStash

Scaling LogStash accomplishes three main goals: resiliency, performance, and integrity. The following diagram from the book illustrates the scaling of Redis, LogStash, and Elasticsearch (Image 3).

LogStash does not depend on Redis to manage failover itself. Instead, LogStash sends events to one of two Redis instances it has configured. Then, if the selected Redis instance becomes unavailable, LogStash will begin sending events to another configured Redis instance. As an indexer, LogStash is easily scaled by creating multiple instances that continually pull from all available brokers and output to Elasticsearch. Within this design, events only make it to one broker, so there should be no duplicates being passed through the LogStash indexer into Elasticsearch. Elasticsearch easily clusters itself when you install multiple instances and set the configurations to have common settings. It uses multicast, unicast, or an EC2 plugin to cluster itself based on configuration settings in each individual instance. As long as the network allows the instances to communicate, they will cluster themselves and begin dividing the data up among the cluster nodes. The divisions in the data are made automatically to provide resiliency and performance.

Logging Use Cases

James Turnbull described to InfoQ the main use cases for logging data in an enterprise setting:
The best use cases for logging are trouble-shooting and monitoring. The log data from your applications is often the best source of information when you have a problem in your infrastructure. They also represent an excellent source of data for monitoring the state and events in your infrastructure and for building metrics that demonstrate how your applications are performing.
This being said, different teams in enterprise organizations care about different aspects of those logging use cases. For example, operations teams focus on the trouble-shooting and performance data logs can provide. Application developers are keenly interested in using log output to help find and fix bugs. Security teams focus on identifying vulnerabilities and security incidents that log data might highlight.


Monitoring with Graphite

Franklin Angulo oversees the teams that build and maintain the large-scale back-end engine at the core of Squarespace, a website-building platform based in New York City. Franklin is a seasoned professional with experience leading complex, large-scale, multidisciplinary engineering projects. Before joining Squarespace, he was a senior software engineer at Amazon.com, working on route-planning optimizations, shipping-rate shopping, and capacity-planning algorithms for global inbound logistics and the Amazon Locker program.

Graphite stores and graphs numeric time-series data collected by other tools. This article intends to guide you through setting up a monitoring system using a Graphite stack.
First and foremost, you need hardware on which to run the Graphite stack. For simplicity, I will use Amazon EC2 hosts, but feel free to use any computer in your office or at home. The Amazon EC2 specifications are:
operating system: Red Hat Enterprise Linux (RHEL) 6.5;
instance type: m3.xlarge;
elastic block store (EBS) volume: 250 GB;
Python: version 2.6.6.
Graphite is composed of multiple back-end and front-end components. The back-end components store the numeric time-series data. The front-end components retrieve the metric data and optionally render graphs. Let's focus first on the back-end components: Carbon and Whisper.
Metrics can be published to a load balancer or directly to a Carbon process. The Carbon process interacts with the Whisper database library to store the time-series data to the file system.
Carbon refers to a series of daemons that make up the storage back end of a Graphite installation. The daemons listen for time-series data using an event-driven networking engine called Twisted. The Twisted framework permits Carbon daemons to
handle a large number of clients and a large amount
of traffic with low overhead.
To install Carbon, run the following commands (assuming a RHEL operating system):
# sudo yum groupinstall "Development Tools"
# sudo yum install python-devel
# sudo yum install git
# sudo easy_install pip
# sudo pip install twisted
# cd /tmp
# git clone https://github.com/graphite-project/carbon.git
# cd /tmp/carbon
# sudo python setup.py install

The /opt/graphite directory should now have the carbon libraries and configuration files.
# ls -l /opt/graphite
drwxr-xr-x. 2 root root 4096 May 18 23:56 bin
drwxr-xr-x. 2 root root 4096 May 18 23:56 conf
drwxr-xr-x. 4 root root 4096 May 18 23:56 lib
drwxr-xr-x. 6 root root 4096 May 18 23:56 storage
Inside the bin folder, you'll find the three different types of Carbon daemons.
Cache: accepts metrics over various protocols and writes them to disk as efficiently as possible; caches metric values in RAM as they are received and flushes them to disk on a specified interval using the underlying Whisper library.
Relay: serves to replicate and shard incoming metrics.
Aggregator: runs in front of a cache to buffer metrics over time before reporting them to Whisper.
Whisper is a database library for storing time-series data that is then retrieved and manipulated by applications using the create, update, and fetch operations. To install Whisper, run the following commands:
# cd /tmp
# git clone https://github.com/graphite-project/whisper.git
# cd /tmp/whisper
# sudo python setup.py install
The Whisper scripts should now be in place (Code 1)

Start a Carbon cache process

The Carbon installation comes with sensible defaults for port numbers and many other configuration parameters. Copy the existing sample configuration files (Code 2).


# ls -l /usr/bin/whisper*
-rwxr-xr-x. 1 root root 1711 May 19 00:00 /usr/bin/whisper-create.py
-rwxr-xr-x. 1 root root 2902 May 19 00:00 /usr/bin/whisper-dump.py
-rwxr-xr-x. 1 root root 1779 May 19 00:00 /usr/bin/whisper-fetch.py
-rwxr-xr-x. 1 root root 1121 May 19 00:00 /usr/bin/whisper-info.py
-rwxr-xr-x. 1 root root  674 May 19 00:00 /usr/bin/whisper-merge.py
-rwxr-xr-x. 1 root root 5982 May 19 00:00 /usr/bin/whisper-resize.py
-rwxr-xr-x. 1 root root 1060 May 19 00:00 /usr/bin/whisper-set-aggregation-method.py
-rwxr-xr-x. 1 root root  969 May 19 00:00 /usr/bin/whisper-update.py

Code 1
# cd /opt/graphite/conf
# cp aggregation-rules.conf.example aggregation-rules.conf
# cp blacklist.conf.example blacklist.conf
# cp carbon.conf.example carbon.conf
# cp carbon.amqp.conf.example carbon.amqp.conf
# cp relay-rules.conf.example relay-rules.conf
# cp rewrite-rules.conf.example rewrite-rules.conf
# cp storage-schemas.conf.example storage-schemas.conf
# cp storage-aggregation.conf.example storage-aggregation.conf
# cp whitelist.conf.example whitelist.conf
# vi carbon.conf

Code 2
Under the cache section, the line receiver port has a default value and it is used to accept incoming metrics through the plaintext protocol (see below).
[cache]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003

Start a carbon-cache process by running the following command:
# cd /opt/graphite/bin
# ./carbon-cache.py start
Starting carbon-cache (instance a)
The process should now be listening on port 2003. (Code 3)

# ps -efla | grep carbon-cache
1 S root 2674 1 0 80 0 - 75916 ep_pol 00:18 ? 00:00:03 /usr/bin/python ./carbon-cache.py start
# netstat -nap | grep 2003
tcp        0      0 0.0.0.0:2003    0.0.0.0:*    LISTEN    2674/python

Code 3

Publish metrics

A metric is any measurable quantity that can vary over time, for example:
number of requests per second;
request processing time;
CPU usage.
sudo yum install nc
echo carbon.agents.graphite-tutorial.metricsReceived 28198 `date +%s` | nc localhost 2003
echo carbon.agents.graphite-tutorial.creates 8 `date +%s` | nc localhost 2003
echo PRODUCTION.host.graphite-tutorial.responseTime.p95 0.10 `date +%s` | nc localhost 2003

Code 4

# tail -f /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log
19/05/2014 10:42:44 :: creating database file /opt/graphite/storage/whisper/carbon/agents/graphite-tutorial/metricsReceived.wsp (archive=[(60, 129600)] xff=0.5 agg=average)
19/05/2014 10:42:53 :: creating database file /opt/graphite/storage/whisper/carbon/agents/graphite-tutorial/creates.wsp (archive=[(60, 129600)] xff=0.5 agg=average)
19/05/2014 10:42:57 :: creating database file /opt/graphite/storage/whisper/PRODUCTION/host/graphite-tutorial/responseTime/p95.wsp (archive=[(60, 1440)] xff=0.5 agg=average)

Code 5

A data point is a tuple with a metric name, a measured value, and a specific point in time (usually a timestamp).
Client applications publish metrics by sending data points to a Carbon process. The application establishes a TCP connection on the Carbon process's port and sends data points in a simple plaintext format. In our example, the port is 2003. The TCP connection may remain open and can be reused as many times as necessary. The Carbon process listens for incoming data but does not send any response back to the client.
The data point format is defined as:
a single line of text per data point;
a dotted metric name at position 0;
a value at position 1;
a Unix Epoch timestamp at position 2;
spaces for the position separators.
For example, here are some valid data points:
The number of metrics received by the carbon-cache process every minute:
carbon.agents.graphite-tutorial.metricsReceived 28198 1400509108
The number of metrics created by the carbon-cache process every minute:
carbon.agents.graphite-tutorial.creates 8 1400509110
The p95 response times for a sample server endpoint over a minute:
PRODUCTION.host.graphite-tutorial.responseTime.p95 0.10 1400509112
Client applications have multiple ways to publish metrics:
using the plaintext protocol with tools such as the netcat (nc) command;
using the pickle protocol;
using the Advanced Message Queuing Protocol (AMQP);
using libraries such as the Dropwizard Metrics library.
For simplicity, in this tutorial we'll use the plaintext protocol through the netcat command. To publish the sample data points listed above, run the following commands (Code 4).
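If netcat is not available, the plaintext protocol is simple enough to speak directly. The short Python sketch below is an illustration, not part of the original tutorial; it assumes the carbon-cache configured above is listening on localhost:2003.

import socket
import time

CARBON_HOST, CARBON_PORT = "localhost", 2003  # line receiver configured earlier

def send_metric(name, value, timestamp=None):
    """Send one data point using Carbon's plaintext protocol: 'name value timestamp\n'."""
    timestamp = int(timestamp or time.time())
    line = "%s %s %d\n" % (name, value, timestamp)
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode("utf-8"))  # Carbon does not send a response back

send_metric("PRODUCTION.host.graphite-tutorial.responseTime.p95", 0.10)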
The carbon-cache log files will contain information about the new metrics received and where the information was stored (Code 5).
Carbon interacts with Whisper to store the time-series data in the file system. Navigate the file system to make sure the data files have been created:
# ls -l /opt/graphite/storage/whisper/carbon/agents/graphite-tutorial/
total 3040
-rw-r--r--. 1 root root 1555228 May 19 10:42 creates.wsp
-rw-r--r--. 1 root root 1555228 May 19 10:42 metricsReceived.wsp
# ls -l /opt/graphite/storage/whisper/PRODUCTION/host/graphite-tutorial/responseTime/
total 20
-rw-r--r--. 1 root root 17308 May 19 10:42 p95.wsp

Finally, you can retrieve metadata information about the Whisper file that was created for the metric using the whisper-info script (Code 6).



# whisper-info.py /opt/graphite/storage/whisper/PRODUCTION/host/graphite-tutorial/responseTime/p95.wsp
maxRetention: 86400
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 17308

Archive 0
retention: 86400
secondsPerPoint: 60
points: 1440
size: 17280
offset: 28
Code 6
The whisper-dump script is a more complete script that outputs the original data for all storage retention periods along with the metadata information about the Whisper file:
# whisper-dump.py /opt/graphite/storage/whisper/PRODUCTION/host/graphite-tutorial/responseTime/p95.wsp
Meta data:
  aggregation method: average
  max retention: 86400
  xFilesFactor: 0.5

Archive 0 info:
  offset: 28
  seconds per point: 60
  points: 1440
  retention: 86400
  size: 17280

Archive 0 data:
0: 1400609220, 0.1000000000000000055511151231257827
1: 0, 0
2: 0, 0
3: 0, 0
4: 0, 0
5: 0, 0
...
1437: 0, 0
1438: 0, 0
1439: 0, 0
Aggregation method, max retention, xFilesFactor, and all of the other attributes of the Whisper file are important to understand. We'll be covering these in more detail in the next section.
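The same metadata can also be read programmatically. A minimal sketch, assuming the whisper Python package installed earlier and the p95.wsp file created above:

import whisper  # installed earlier from graphite-project/whisper

WSP = "/opt/graphite/storage/whisper/PRODUCTION/host/graphite-tutorial/responseTime/p95.wsp"

info = whisper.info(WSP)  # roughly the same data whisper-info.py prints
print(info["aggregationMethod"], info["maxRetention"], info["xFilesFactor"])
for archive in info["archives"]:
    print(archive["secondsPerPoint"], archive["points"], archive["retention"])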

Whisper storage schemas and aggregations

There might be some confusion when developers and system administrators start publishing data points and get unexpected results.

Meta data:
  aggregation method: average
  max retention: 604800
  xFilesFactor: 0.5

Archive 0 info:
  offset: 52
  seconds per point: 10
  points: 2160
  retention: 21600
  size: 25920

Archive 1 info:
  offset: 25972
  seconds per point: 60
  points: 1440
  retention: 86400
  size: 17280

Archive 2 info:
  offset: 43252
  seconds per point: 600
  points: 1008
  retention: 604800
  size: 12096

Code 7

Why are our data points getting averaged?
We've been publishing data points intermittently, so why are there no data points?
We've been publishing data points for many days, so why are we only getting data for one day?
You first need to understand how data is stored in the Whisper files. When a Whisper file is created, it has a fixed size that will never change. Within the Whisper file are potentially multiple buckets that you need to define in the configuration files, for data points at different resolutions. For example:
Bucket A: data points with 10-second resolution.
Bucket B: data points with 60-second resolution.
Bucket C: data points with 10-minute resolution.
Each bucket also has a retention attribute that indicates how long the bucket should retain data points:
Bucket A: data points with 10-second resolution retained for 6 hours.
Bucket B: data points with 60-second resolution retained for 1 day.
Bucket C: data points with 10-minute resolution retained for 7 days.
Given these two pieces of information, Whisper performs some simple math to figure out how many points it will need to keep in each bucket:
Bucket A: 6 hours x 60 min/hour x 6 data points/min = 2,160 points.
Bucket B: 1 day x 24 hours/day x 60 min/hour x 1 data point/min = 1,440 points.
Bucket C: 7 days x 24 hours/day x 6 data points/hour = 1,008 points.
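The arithmetic is easy to check; here is a tiny Python sketch of the same calculation:

# points per bucket = retention (seconds) / resolution (seconds per point)
buckets = {
    "A": (10, 6 * 60 * 60),        # 10-second resolution kept for 6 hours
    "B": (60, 24 * 60 * 60),       # 60-second resolution kept for 1 day
    "C": (600, 7 * 24 * 60 * 60),  # 10-minute resolution kept for 7 days
}
for name, (resolution, retention) in buckets.items():
    print(name, retention // resolution)  # A: 2160, B: 1440, C: 1008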


A Whisper file with this storage configuration will have a size of 56 kB. Running it through the whisper-dump.py script will produce the following output. Note that an archive corresponds to a bucket and the seconds per point and points attributes match our computations above. (Code 7)
Aggregations come into play when data from
a high-precision bucket moves to a bucket with less
precision. Lets use Bucket A and B from our previous
example.
Bucket A: 10-second resolution retained for 6
hours (higher precision).
Bucket B: 60-second resolution retained for 1 day
(lower precision).
We might have an application publish data points
every 10 seconds. Any data points published more
recently than six hours ago will be found in Bucket
A. A query for data points published earlier than six
hours ago will find them in Bucket B.
The lower precision value is divided by the
higher precision value to determine the number of
data points that will need to be aggregated.
60 seconds (Bucket B) / 10 seconds (Bucket A) =
6 data points to aggregate.
Note that Whisper needs the lower precision value to
be cleanly divisible by the higher precision value (i.e.
the division must result in a whole number) or the
aggregation might not be accurate.
To aggregate the data, Whisper reads six
10-second data points from Bucket A and applies
a function to them to come up with the single
60-second data point that will be stored in Bucket B.
There are five options for the aggregation function:
average, sum, max, min, and last. The choice of
aggregation function depends on the data points
you're dealing with. Ninety-fifth percentile values,
for example, should probably be aggregated with
the max function. For counters, on the other hand,
the sum function would be more appropriate.
When aggregating data points, Whisper also
handles the concept of an xFilesFactor, which
represents the ratio of data points a bucket must
contain to be aggregated accurately. In our previous
example, Whisper determined that it needed to
aggregate six 10-second data points. It's possible,
for example, that only four data points represent
data while the other two are null due to networking
issues, application restarts, etc.
A Whisper file with an xFilesFactor of 0.5 will
only aggregate data points if at least 50% of the
data points are present. If more than 50% of the data
points are null, Whisper will create a null aggregation.
In the previous paragraph, we have four out of six
data points: 66%. With an xFilesFactor of 0.5, the
aggregation function will be applied to the non-null
data points to create the aggregated value.
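As a rough sketch of that rule (not the actual Whisper implementation), the aggregation of one lower-precision point could be expressed like this:

def aggregate(points, method="average", xff=0.5):
    """Return one aggregated value, or None if too few points are present."""
    known = [p for p in points if p is not None]
    if not points or len(known) / len(points) < xff:
        return None  # below the xFilesFactor threshold: store a null aggregation
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "max":
        return max(known)
    if method == "min":
        return min(known)
    if method == "last":
        return known[-1]
    raise ValueError("unknown aggregation method: %s" % method)

# four of six 10-second points present (66%), xFilesFactor 0.5 -> aggregated
print(aggregate([1.0, 2.0, None, 4.0, None, 3.0]))  # 2.5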

You may set the xFilesFactor to any value between 0 and 1. A value of 0 indicates that the aggregation should be computed even if there is only one data point. A value of 1 indicates that the aggregation should be computed only if all data points are present.
The configuration files that control how Whisper files are created are:
/opt/graphite/conf/storage-schemas.conf
/opt/graphite/conf/storage-aggregation.conf

Default storage schemas and aggregation

The storage-schemas configuration file is composed of multiple entries containing a pattern against which to match metric names and a retention definition. By default, there are two entries: carbon and everything else.
The carbon entry matches metric names that start with the carbon string. Carbon daemons emit their own internal metrics every 60 seconds by default (we can change the interval). For example, a carbon-cache process will emit a metric for the number of metric files it creates every minute. The retention definition indicates that data points reported every 60 seconds would be retained for 90 days.
[carbon]
pattern = ^carbon\.
retentions = 60s:90d
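As a back-of-the-envelope check (assuming Whisper's on-disk layout of 12 bytes per data point plus a small header), this retention is what produced the roughly 1.5 MB carbon .wsp files listed earlier:

POINT_SIZE = 12         # bytes per (timestamp, value) pair in a Whisper archive
METADATA_SIZE = 16      # file header
ARCHIVE_INFO_SIZE = 12  # per-archive header

points = 90 * 24 * 60 * 60 // 60  # 90 days at one point every 60s = 129600 points
size = METADATA_SIZE + ARCHIVE_INFO_SIZE + points * POINT_SIZE
print(points, size)  # 129600 1555228 -- matches the creates.wsp size shown above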

The everything else entry captures any metric that is not carbon-related by specifying a pattern with an asterisk. The retention definition indicates that data points reported every 60 seconds will be retained for one day.
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
The storage-aggregation configuration file is also composed of multiple entries, which contain:
a pattern against which to match metric names;
a value for xFilesFactor;
an aggregation function.
By default, there are four entries:
Metrics ending in .min: use the min aggregation function; at least 10% of data points should be present to aggregate.
Metrics ending in .max: use the max aggregation function; at least 10% of data points should be present to aggregate.
Metrics ending in .count: use the sum aggregation function; aggregate if there is at least one data point.
Any other metrics: use the average aggregation function; at least 50% of data points should be present to aggregate.

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average

The default storage schemas and storage aggregations work well for testing, but for real production metrics you might want to modify the configuration files.

Modify storage schemas and aggregation

First, let's modify the carbon entry. We'd like to keep the metrics reported by Carbon every 60 seconds for 180 days (six months). After 180 days, I'd like to roll the metrics to a precision of 10 minutes and keep those for another 180 days.
[carbon]
pattern = ^carbon\.
retentions = 1min:180d,10min:180d

At Squarespace, I use the Dropwizard framework to build RESTful Web services. I have many of these services running in staging and production environments and they all use Dropwizard's Metrics library to publish application and business metrics every 10 seconds. Here, we'd like to keep the 10-second data for three days. After three days, the data should be aggregated to one-minute data and kept for 180 days (six months). Finally, after six months, the data should be aggregated to 10-minute data and kept for 180 days. Note that if our metrics library published data points at a different rate, our retention definition would need to change to match it.
[production_staging]
pattern = ^(PRODUCTION|STAGING).*
retentions = 10s:3d,1min:180d,10min:180d

Metrics that are not carbon, production, or staging metrics are probably just test metrics. We'll keep those for only one day and assume that they will be published every minute.
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

We'll keep the default storage aggregation entries but add a couple more for metrics ending in ratio, m1_rate, and p95. Note that any new entries should be added before the default entry.
[ratio]
pattern = \.ratio$
xFilesFactor = 0.1
aggregationMethod = average

[m1_rate]
pattern = \.m1_rate$
xFilesFactor = 0.1
aggregationMethod = sum

[p95]
pattern = \.p95$
xFilesFactor = 0.1
aggregationMethod = max

At this point, we have configured our Graphite back end to match the data-point publishing rates of our application and fully understand how the data points are stored in the file system. In the next section, we'll attempt to visualize the data.

The Graphite webapp

Now that we have the back-end components up and running and storing numeric time-series data in the formats that we have specified, it's time to take a look at the front-end components of Graphite. Specifically, we need a way to query and visualize the information that is stored.
The Graphite web application is a Django application that runs under Apache/mod_wsgi, according to the GitHub readme file. In general, it provides:
a URL-based API endpoint to retrieve raw data and generate graphs;
a user interface to navigate metrics and build and save dashboards.
The installation of graphite-web is a maze. I have installed it multiple times - in RHEL, CentOS, Ubuntu, and Mac OS X - and every time the steps have been different. Treat it as a game, enjoy it, and you'll know that you've found the way through the maze when all the required dependencies have been installed.
Here are installation instructions for RHEL 6.5:
# cd /tmp
# git clone https://github.com/graphite-project/graphite-web.git
# cd /tmp/graphite-web
# python check-dependencies.py
[REQUIRED] Unable to import the django module, do you have Django installed for python 2.6.6?
[REQUIRED] Unable to import the pyparsing module, do you have pyparsing module installed for python 2.6.6?
[REQUIRED] Unable to import the tagging module, do you have django-tagging installed for python 2.6.6?
[OPTIONAL] Unable to import the memcache module, do you have python-memcached installed for python 2.6.6? This feature is not required but greatly improves performance.
[OPTIONAL] Unable to import the txamqp module, this is required if you want to use AMQP as an input to Carbon. Note that txamqp requires python 2.5 or greater.
[OPTIONAL] Unable to import the python-rrdtool module, this is required for reading RRD.
3 optional dependencies not met. Please consider the optional items before proceeding.
3 necessary dependencies not met. Graphite will not function until these dependencies are fulfilled.

The goal is to install at least all of the required dependencies. You should install the optional dependencies if you're planning on using the AMQP functionality or the caching functionality using Memcache.
# sudo yum install cairo-devel
# sudo yum install pycairo-devel
# sudo pip install django
# sudo pip install pyparsing
# sudo pip install django-tagging
# sudo pip install python-memcached
# sudo pip install txamqp
# sudo pip install pytz
# cd /tmp/graphite-web
# python check-dependencies.py
[OPTIONAL] Unable to import the python-rrdtool module, this is required for reading RRD.
1 optional dependencies not met. Please consider the optional items before proceeding.
All necessary dependencies are met.

We've installed enough packages to meet the required dependencies. We can now install graphite-web.
# cd /tmp/graphite-web
# sudo python setup.py install
# ls -l /opt/graphite/webapp/
total 12
drwxr-xr-x.  6 root root 4096 May 23 14:33 content
drwxr-xr-x. 15 root root 4096 May 23 14:33 graphite
-rw-r--r--.  1 root root  280 May 23 14:33 graphite_web-0.10.0_alpha-py2.6.egg-info

The setup script moves the graphite-web application files to the proper location under /opt/graphite/webapp.

Initialize the database

The graphite-web application maintains an internal database in which it stores user information and dashboards. Initialize the database by running the following:
# cd /opt/graphite
# export PYTHONPATH=$PYTHONPATH:`pwd`/webapp
# django-admin.py syncdb --settings=graphite.settings
You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (leave blank to use 'root'): feangulo
Email address: feangulo@yaipan.com
Password:
Password (again):
Error: Blank passwords aren't allowed.
Password:
Password (again):
Superuser created successfully.
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

The following creates a new database and stores it in the /opt/graphite/storage directory:
# ls -l /opt/graphite/storage/graphite.db
-rw-r--r--. 1 root root 74752 May 23 14:46 /opt/graphite/storage/graphite.db



Graphite webapp settings

The configuration file containing the graphite-webapp settings is located in the /opt/graphite/webapp/graphite folder. Let's copy the sample configuration file.
# cd /opt/graphite/webapp/graphite
# cp local_settings.py.example local_settings.py
Now let's make some customizations to our settings:
# vi /opt/graphite/webapp/graphite/local_settings.py
#########################
# General Configuration #
#########################
TIME_ZONE = 'UTC'
##########################
# Database Configuration #
##########################
DATABASES = {
    'default': {
        'NAME': '/opt/graphite/storage/graphite.db',
        'ENGINE': 'django.db.backends.sqlite3',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': ''
    }
}
By following previous instructions, you should only have one carbon-cache process running on port 2003 with a query port on 7002. These are the defaults expected by the graphite-webapp, so you have no other changes to make to the configuration file.
# ps -efla | grep carbon-cache
1 S root 14101 1 0 80 0 - 75955 ep_pol May20 ? 00:00:26 /usr/bin/python ./carbon-cache.py start
# netstat -nap | grep 2003
tcp        0      0 0.0.0.0:2003    0.0.0.0:*    LISTEN    14101/python
# netstat -nap | grep 7002
tcp        0      0 0.0.0.0:7002    0.0.0.0:*    LISTEN    14101/python
However, you could explicitly specify which carbon-cache process to read from in the settings file:
# vi /opt/graphite/webapp/graphite/local_settings.py
#########################
# Cluster Configuration #
#########################
CARBONLINK_HOSTS = ["127.0.0.1:7002:a"]


This means that you have a carbon-cache process running locally, with the query port set to 7002 and the name set to 'a'. If you look at the Carbon configuration file, you should see something like this:
# vi /opt/graphite/conf/carbon.conf
[cache]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002

Where did the 'a' come from? That's the default name assigned. To define more caches, you'd need to create additional named sections in the configuration file.
[cache:b]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2004
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7003

The Graphite webapp comes with dashboard and graph template defaults. Copy the sample configuration files:
# cd /opt/graphite/conf
# cp dashboard.conf.example dashboard.conf
# cp graphTemplates.conf.example graphTemplates.conf

Let's modify the dashboard configuration file to have larger graph tiles.
# vi /opt/graphite/conf/dashboard.conf
[ui]
default_graph_width = 500
default_graph_height = 400
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750

And let's modify the default graph template to have a black background and a white foreground. We'll also choose a smaller font.
# vi /opt/graphite/conf/graphTemplates.conf
[default]
background = black
foreground = white
minorLine = grey
majorLine = rose


Run the Web application

We are finally ready to run the Web application. We're going to run it on port 8085, but we may set the port to any value we'd like. Run the following commands:
# cd /opt/graphite
# PYTHONPATH=`pwd`/storage/whisper ./bin/run-graphite-devel-server.py --port=8085 --libs=`pwd`/webapp /opt/graphite 1>/opt/graphite/storage/log/webapp/process.log 2>&1 &
# tail -f /opt/graphite/storage/log/webapp/process.log

Open a Web browser and point it to http://your-ip:8085. Make sure that the Graphite webapp loads. If you're tailing the process.log file, you should be able to see any resources that are loaded and any queries that are made from the Web application. (Image 1)
In a previous section, we had published a couple of metrics to the carbon-cache using the netcat command. Specifically, we had published the following:
carbon.agents.graphite-tutorial.metricsReceived
carbon.agents.graphite-tutorial.creates
PRODUCTION.host.graphite-tutorial.responseTime.p95

The Web application displays metrics as a tree. If we


navigate the metric tree in the left panel, we should
be able to see all of these metrics.

You may click on any metric and it will be graphed (it


shows the past 24 hours by default) in the panel on
the right. To change the date range to query, use the
buttons in the panel above the graph.
The default view is great for quickly browsing and
visualizing metrics, but to build a dashboard, point
your browser to http://your-ip:8085/dashboard. The
top portion of the page is another way to navigate
your metrics. You can either click on the options to
navigate or start typing to get suggestions. If you
click on a metric, the corresponding graph tile will
appear in the bottom section. As you keep clicking
on new metrics, additional tiles appear in the panel
below thereby creating a dashboard.
At times, you might want to display multiple
metrics in a single graph. To do this, drag and drop

Image 1


Image 2

a tile on top of another one and the metrics will


be graphed together. You may also change the
position of the tiles in the layout by dragging
them around. (Image 2)
The user interface looks simple, but it lets
you perform powerful operations on your metric
data. If you click on one of the graph tiles, you get
a dialogue that displays the list of metrics being
graphed, which you may directly edit. You have
multiple menus in the dialogue for applying
functions to the data, changing aspects of the
visualization, and many other operations. (Image
3)
You may also configure and save your
dashboard, load other dashboards, change the
date range of the current dashboard, and share
a dashboard, among other things, using the
top-most menu. By far my favorite thing is the
Dashboard -> Edit Dashboard feature. It saves
me a lot of time when I need to create or modify
dashboards. (Image 4)
To illustrate, lets build a dashboard
to monitor the carbon-cache process. As
mentioned, Carbon processes report internal
metrics. I dont like to build dashboards manually,
preferring to use the Edit Dashboard feature.


Image 3

Image 4
Let's build a dashboard to monitor the carbon-cache process. This dashboard will monitor all carbon-cache
processes that we have running. Notice the use of the asterisk (*) in the metric name to match all values following
the carbon.agents prefix.
We specify the following in the Edit Dashboard window.
[
  {
    "target": [
      "aliasByNode(carbon.agents.*.metricsReceived,2)"
    ],
    "title": "Carbon Caches - Metrics Received"
  },
  {
    "target": [
      "aliasByNode(carbon.agents.*.creates,2)"
    ],
    "title": "Carbon Caches - Create Operations"
  },
  {
    "target": [
      "aliasByNode(carbon.agents.*.cpuUsage,2)"
    ],
    "title": "Carbon Caches - CPU Usage"
  },
  {
    "target": [
      "aliasByNode(carbon.agents.*.memUsage,2)"
    ],
    "title": "Carbon Caches - Memory Usage"
  }
]

Image 5

Update the dashboard definition and you should now see something like this (Image 5).
Changing content in the Edit Dashboard dialogue updates the dashboard in the browser but does not save it to Graphite's internal database of dashboards. You need to save the dashboard to share it or open it later.
To look up the dashboard, open the Finder.
On a production Graphite installation, the Graphite Caches dashboard would look more like this (Image 6).

It's all about the API

Graphite has some drawbacks, like any other tool. It doesn't scale well and its storage mechanism isn't the most optimal, but Graphite's API is a beauty. Having a user interface is nice, but most important is that whatever you can do through the UI, you can also accomplish via graphite-web API requests. Users are able to request custom graphs by building a simple URL. The parameters are specified in the query string of the HTTP GET request.

Image 6

By default, a PNG image is returned as the response, but the user may also indicate the required format of the response - for example, JSON data.
Sample request #1
Metric: CPU usage of all carbon-cache processes.
Graph dimensions: 500x300.
Time range: 12 hours ago to 5 minutes ago.
Response format: PNG image (default).
http://your-ip:8085/render?target=carbon.agents.*.cpuUsage&width=500&height=300&from=-12h&until=-5min

Sample request #2
Metric: CPU usage of all carbon-cache processes.
Graph dimensions: 500x300.
Time range: 12 hours ago to 5 minutes ago.
Response format: JSON data.
http://your-ip:8085/render?target=carbon.agents.*.cpuUsage&width=500&height=300&from=-12h&until=-5min&format=json
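The JSON response is easy to consume from scripts. A minimal sketch, assuming the webapp from this tutorial is reachable at your-ip:8085 and the requests package is installed:

import requests  # pip install requests

params = {
    "target": "carbon.agents.*.cpuUsage",
    "from": "-12h",
    "until": "-5min",
    "format": "json",
}
response = requests.get("http://your-ip:8085/render", params=params, timeout=10)
for series in response.json():
    # each series carries a "target" name and a list of [value, timestamp] pairs
    values = [v for v, ts in series["datapoints"] if v is not None]
    if values:
        print(series["target"], max(values))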



Graphite's API supports a wide variety of display options as well as data-manipulation functions that follow a simple functional syntax. Functions can be nested, allowing for complex expressions and calculations. View the online documentation to peruse all of the available functions at http://graphite.readthedocs.org/en/latest/functions.html
Let's say you have an application that runs on hundreds of servers, each of which publishes its individual p95 response times every 10 seconds. Using functions provided by the API, you could massage the metrics and build an informative graph:
averageSeries: computes the average of all the values in the set. Lets us see the mean p95 latency.
scale: multiplies a value by a constant. Latencies are reported in milliseconds, but we want to display them in seconds.
alias: changes the name of the metric when displaying. Instead of the metric's full name, we want only "avg p95" in the graph legend.
The argument passed as part of the metric query to the API would be:
alias(scale(averageSeries(PRODUCTION.host.*.requests.p95),0.001),"avg p95")
The API would return the following graph:

Congratulations! You have installed and configured Carbon, Whisper, and the Graphite webapp. You've published metrics, navigated metrics, and built a dashboard. You can now build your own awesome dashboards for your business and application metrics.
This was an introductory article on Graphite. For advanced topics, see:
Stress Testing Carbon Caches
Carbon Aggregators
Graphite Querying Statistics on an ELK Stack


PREVIOUS ISSUES

21 - Continuous Delivery Stories
Reaping the benefits of continuous delivery is hard work! Culture, processes, or technical barriers can challenge or even break such endeavors. With this eMag we wanted to share stories from leading practitioners who've been there and report from the trenches. Their examples are both inspiring and eye opening to the challenges ahead.

22 - Web APIs: From Start to Finish
Designing, implementing, and maintaining APIs for the Web is more than a challenge; for many companies, it is an imperative. This eMag contains a collection of articles and interviews from late 2014 with some of the leading practitioners and theorists in the Web API field. The material here takes the reader on a journey from determining the business case for APIs to a design methodology, meeting implementation challenges, and taking the long view on maintaining public APIs on the Web over time.

20 - Infrastructure Configuration Management Tools
Infrastructure configuration management tools are one of the technical pillars of DevOps. They enable infrastructure-as-code, the ability to automate your infrastructure provisioning.

19 - Automation in the Cloud and Management at Scale
In this eMag, we curated a series of articles that look at automation in the cloud and management at scale. We spoke with leading practitioners who have practical, hands-on experience building efficient scalable solutions that run successfully in the cloud.
