
Junos® Networking Technologies

DAY ONE: SCALING BEYOND A SINGLE JUNIPER SRX IN THE DATA CENTER

When you can no longer upgrade to a higher capacity firewall, or add additional Service Processing Cards (SPCs), the only other choice is to scale horizontally. Here's a proof of concept for doing just that.

By Douglas Hanks Jr.


DAY ONE: SCALING BEYOND A SINGLE JUNIPER SRX IN THE DATA CENTER

The traditional approach of putting dedicated firewalls within a given physical loca-
tion in order to provide security services is indeed capable of scaling, but it comes at a
cost. Furthermore, within large-scale data center networks, the traditional approach to
securing data using firewall clusters isn’t often suitable because the data has grown to
proportions that no single firewall cluster is capable of handling.

Day One: Scaling Beyond a Single Juniper SRX in the Data Center elegantly addresses the
problem and provides unique insight into how to provide security to outbound traffic at
levels that can scale to meet the needs of even the largest networks. Follow along with
this proof of concept and get the configuration for doing so at the end.

“Scaling network security infrastructure can be a very challenging endeavor. This book cites potential solutions to these challenges and offers an elegant architecture, one that allows large scale and the ability to add capacity rapidly with minimal effort.”
Daniel Sullivan, Senior Security Engineer, Zynga

IT’S DAY ONE AND YOU HAVE A JOB TO DO, SO LEARN HOW TO:
- Understand the concept of scaling traffic beyond a single Juniper SRX firewall.
- Articulate the difference between ECMP and Filter-Based Forwarding (FBF).
- Understand the use cases that drive the requirements for ECMP or FBF.
- Perform per-flow load balancing in the master instance while preserving the per-prefix load balancing within routing instances.
- Configure static routes and qualified next-hops that use BFD for liveness detection.
- Understand how hash calculations in the Forwarding Information Base (FIB) can impact your network.

Juniper Networks Books are singularly focused on network productivity and efficiency. Peruse the
complete library at www.juniper.net/books.

Published by Juniper Networks Books

ISBN 978-1936779468


Junos® Networking Technologies

Day One: Scaling Beyond a Single Juniper SRX in the Data Center

By Douglas Hanks Jr.

Chapter 1: The Challenge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Chapter 2: The Test Bed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 3: Equal Cost Multi-Path (ECMP) Routing. . . . . . . . . . . . . . . . . . . . . . 53

Chapter 4: Filter-Based Forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Chapter 5: Proof of Concept. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Appendix: Device Configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87



© 2012 by Juniper Networks, Inc. All rights reserved.

Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. Junose is a trademark of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.

Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice. Products made or sold by Juniper Networks or components thereof might be covered by one or more of the following patents that are owned by or licensed to Juniper Networks: U.S. Patent Nos. 5,473,599, 5,905,725, 5,909,440, 6,192,051, 6,333,650, 6,359,479, 6,406,312, 6,429,706, 6,459,579, 6,493,347, 6,538,518, 6,538,899, 6,552,918, 6,567,902, 6,578,186, and 6,590,785.

Published by Juniper Networks Books
Author: Douglas Hanks Jr.
Technical Reviewers: Stefan Fouant, Juniper Networks; Dathen Allen, Juniper Networks; Daniel Sullivan, Zynga; Artur Makutunowicz <artur@makutunowicz.net>; Justin Smith, Armature Systems
Editor in Chief: Patrick Ames
Editor and Proofer: Nancy Koerbel
J-Net Community Manager: Julie Wider
ISBN: 978-1-936779-46-8 (print)
ISBN: 978-1-936779-47-5 (ebook)
Printed in the USA by Vervante Corporation.
Version History: v1 April 2012
2 3 4 5 6 7 8 9 10 #7100153-en

About the Author
Douglas Richard Hanks Jr. is a Senior Systems Engineer with Juniper Networks. He is certified in Juniper Networks as JNCIE-ENT #213 and JNCIE-SP #875. Douglas' interests are network engineering and architecture for both Enterprise and Service Provider routing and switching.

Author's Acknowledgments
Thanks, Dad. This book is for you.

This book is available in a variety of formats at: www.juniper.net/dayone.

Send your suggestions, comments, and critiques by email to dayone@juniper.net.

Welcome to Day One


This book is part of a growing library of Day One books produced and
published by Juniper Networks Books.
Day One books were conceived to help you get just the information that
you need on day one. The series covers Junos OS and Juniper Networks
networking essentials with straightforward explanations, step-by-step
instructions, and practical examples that are easy to follow.
The Day One library also includes a slightly larger and longer suite of
This Week books, whose concepts and test bed examples are more
similar to a weeklong seminar.
You can obtain either series in multiple formats:
- Download a free PDF edition at http://www.juniper.net/dayone.
- Get the ebook edition for iPhones and iPads from the iTunes Store. Search for Juniper Networks Books.
- Get the ebook edition for any device that runs the Kindle app (Android, Kindle, iPad, PC, or Mac) by opening the Kindle app on your device and going to the Kindle Store. Search for Juniper Networks Books.
- Purchase the paper edition at either Vervante Corporation (www.vervante.com) or Amazon (www.amazon.com) for between $12-$28, depending on page length.
- Note that Nook, iPad, and various Android apps can also view PDF files.
- If your device or ebook app uses .epub files, but isn't an Apple product, open iTunes and download the .epub file from the iTunes Store. You can then drag and drop the file out of iTunes onto your desktop and sync with your .epub device.

What You Need to Know Before Reading This Book


The reader is expected to have extensive previous hands-on experience
working with the Junos operating system and network devices. The
majority of this book deals with actual Junos configuration.
It's beneficial, though not required, for the reader to hold JNCIS certifications in the Enterprise and Security tracks from Juniper Networks.
Topics in this book build on the basics found in the JNCIS material and
extend into the intermediate to expert level of configuration.
This book makes use of the Juniper MX, EX, and SRX devices.
Knowledge of routing, switching, and firewalls will be required.

NOTE To perform stateful traffic load generation, this book leveraged IXIA
IxLoad hardware and software; all measurements and reports were
generated using this tool.

After Reading This Book, You’ll Be Able To...


„„ Understand the concept of scaling traffic beyond a single Juniper
SRX firewall.
„„ Articulate the difference between ECMP and Filter-based
Forwarding (FBF).
„„ Understand the use cases that drive the requirements for ECMP
or FBF.
„„ Perform per-flow load balancing in the master instance while
preserving the per-prefix load balancing within routing instances.
„„ Configure static routes and qualified next-hops that use BFD for
liveness detection.
„„ Understand how hash calculations in the Forwarding Informa-
tion Base (FIB) can impact your network.

Why Scale the SRX Beyond a Single Box?


If you're in a scenario where you can no longer upgrade to a higher capacity firewall, or add additional Service Processing Cards (SPCs), the only other choice is to stop scaling vertically and begin scaling horizontally. Although the flagship Juniper SRX5800 can support over 150Gbps of traffic, that figure is measured with large packets. Most deployments instead measure throughput with Internet MIX (IMIX), a mixture of packet sizes of 64, 570, and 1518 bytes in a ratio of 7:4:1.
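Given that ratio, the weighted-average IMIX packet size is straightforward to compute. A quick sketch, using only the sizes and ratio cited above:

```python
# IMIX mixture cited above: 64-, 570-, and 1518-byte packets in a 7:4:1 ratio.
sizes = [64, 570, 1518]
weights = [7, 4, 1]

# Weighted-average packet size across one 12-packet cycle of the mixture.
avg_bytes = sum(s * w for s, w in zip(sizes, weights)) / sum(weights)
print(avg_bytes)  # ≈ 353.8 bytes
```

Small packets dominate the packet count, which is why IMIX throughput numbers are so much lower than large-packet throughput numbers.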
As of Junos 10.4, the Juniper SRX5800 supports about 47.5Gbps of IMIX firewall processing when using two line cards and ten SPCs. Because the Juniper SRX represents a finite amount of processing power, each operation has a different set of performance characteristics. It's common to have a scenario where the traffic doesn't exceed the maximum 47.5Gbps of IMIX firewalling, but instead first hits the ceiling of maximum IDP throughput or maximum number of sessions.
Use cases such as ingress web traffic that exceeds the performance characteristics of a single Juniper SRX can easily be handled by load balancers and additional Virtual IPs (VIPs) that split the traffic into smaller chunks handled by multiple firewalls. However, other use cases, such as egress Internet traffic, cannot use the load balancer method to split up traffic.
Such a problem requires another solution to split the stateful traffic
into smaller chunks, and the two methods to split stateful traffic are
Equal Cost Multi-Path (ECMP) routing and Filter-Based Forwarding
(FBF).
This book covers both the ECMP and the FBF solutions. As with
anything in life, each solution has its own set of benefits and draw-
backs. There are particular use cases that are more suited to the use of
ECMP, while other use cases will require the use of FBF.
But there are also a couple of beneficial side effects to using multiple
SRXs. The first is that you no longer have to put all your eggs in one
basket when providing firewall services. Each additional firewall
creates a smaller maintenance domain and reduces risk.
For example, if you had to provide firewall services for 100Gbps of
IMIX traffic with four SRXs, each SRX would handle about 25Gbps
or 25% of the total traffic. If one of the SRXs were to fail, only a subset
of the traffic would be impacted and need to be redirected to another
firewall.
Another benefit to using multiple SRXs is that you’re able to tightly
control what traffic flows through which firewall. For example, you
can split all egress HTTP traffic across all four standalone firewalls,
but a special subset of egress traffic can be directed off to a dedicated
SRX cluster for maximum redundancy. In effect, you can create and
control Service Level Agreements (SLA) with a pool of firewalls that
have specific functions for performance or redundancy.
Douglas Hanks Jr., April 2012
Foreword

With the promise of cloud computing and a move to centralize resources, Data Center networks are growing at an incredible rate. In order to deal with the needs of today and continue to future-proof the network such that it is capable of scaling to the needs of tomorrow, a different way of thinking is required, and different models need to be used to provide scalability. At the center of these new models is virtualization.
The promise of virtualization needs to be realized not just in servers
but also in the surrounding network and security gear. For far too long
the discussion surrounding virtualization has revolved around separat-
ing server functionality from the underlying physical hardware. The
idea behind this separation is to decouple services from discrete
physical devices and locations. But it also goes beyond just server
applications to the applications that are provided to the network itself,
such as security services.
The traditional approach of putting dedicated firewalls within a given
physical location in order to provide security services is indeed capable
of scaling, but it comes at a cost: it is exorbitantly expensive and often
results in wasted resources. Furthermore, when talking about
large-scale Data Center networks, the traditional approach to securing
data using firewall clusters isn’t often suitable because the data has
grown to proportions that no single firewall cluster is capable of
handling. As such, new and interesting ways of load-balancing the
traffic across a large number of firewall devices are being looked at by
data center architects to scale the network to previously unheard of
proportions.
While much literature exists throughout the industry today concerning
methods for scaling traffic destined into a data center, very little exists
that discusses how to load balance the traffic in the opposite direction.
However, in a growing number of business cases, the traffic may
actually originate inside the data center and be destined for hosts that
exist outside the Data Center. The traditional method of using load-
balancers or clustered firewalls wasn’t designed for this type of model
where the majority of the traffic is going out.
This Day One book elegantly addresses this situation and provides
unique insight into how to provide security to outbound traffic at
levels that can scale to meet the needs of even the largest organizations.
Two different approaches are outlined, giving the network architect
multiple options for load-balancing traffic while addressing the
concerns regarding physical placement of such devices at the same
time. The approaches outlined in this guide give network architects
new options to address the virtualization of security services, showing
how the strict placement of firewall devices is no longer required to
achieve security at scale.
Furthermore, what makes this Day One book so invaluable is that it is backed up by proven research: not only has Doug covered the theoretical aspects of these different design approaches and provided the required configurations, he backs it all up with testing using traffic generators to gauge latency and other performance metrics under steady-state and failure scenarios.
For those who are working in large scale data center networks, this
Day One book will prove to be an invaluable asset covering aspects
that have been largely ignored by much of the literature today. It's a must-have for network architects and designers responsible for building out large-scale data center networks. Doug's expertise and his
clear writing elucidate a complex subject and distill it in a way that is
easy to digest and understand.
Stefan Fouant
April 2012, Ashburn, Virginia

Stefan Fouant is a technical trainer at Juniper Networks and has helped hundreds of
engineers earn their certifications. He is JNCIE-SEC, JNCIE-SP, JNCIE-ER, and JNCI.
Chapter 1

The Challenge

The Challenge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Dealing with Traffic at Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Scaling ingress Internet traffic is a well-understood problem that has been solved many times over. When dealing with ingress traffic that's destined to your network, it's much easier to control the flow with traditional methods such as DNS round-robin, load balancing, and scaling the number of VIPs.

In the example shown in Figure 1.1, Traffic0 represents ingress Internet traffic that's destined to your network.

Figure 1.1 Illustration of Demultiplexing Ingress Internet Traffic

As shown, the first stage of demultiplexing is to break up the traffic with DNS. Suppose the traffic was destined for example.com; the DNS round-robin would contain N VIPs and return an incremented value to each client until it reached the final VIP, then it would start over again from the first VIP. For the sake of clarity, Figure 1.1 is simplified and does not include the client DNS lookups, but instead focuses on how the traffic is demultiplexed at different stages.

NOTE There are many different types of Stage 1 / DNS solutions on the
market such as managed DNS, geographically-aware DNS, anycast,
and many others. To keep things simple, let’s just use DNS round-robin
in this example.

The second stage of demultiplexing typically happens with load balancers. Now that the traffic has been broken down into multiple VIPs, there is a load balancer configured with the destination address of each VIP. The load balancer will then have N pools defined to handle the traffic destined for each VIP.

Let's assume that Traffic0 represents 100Gbps of IMIX ingress Internet traffic. The DNS round-robin has a pool of ten VIPs that it will return uniformly to clients. Each VIP is handled by a load balancer that is configured with a pool of ten web servers. Assuming uniform distribution, each web server will only have to handle 1Gbps of traffic.
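The two-stage demultiplexing arithmetic above can be sanity-checked in a few lines. A quick sketch; the figures all come from this example:

```python
ingress_gbps = 100        # Traffic0: total IMIX ingress Internet traffic
vips = 10                 # stage 1: DNS round-robin pool of VIPs
servers_per_vip = 10      # stage 2: web servers behind each load balancer

per_vip_gbps = ingress_gbps / vips                # 10 Gbps handled per VIP
per_server_gbps = per_vip_gbps / servers_per_vip  # 1 Gbps per web server
print(per_server_gbps)  # 1.0
```

Each demultiplexing stage divides the load by the fan-out of that stage, which is why two modest stages of ten reduce 100Gbps to a per-server trickle.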

Although Traffic0 represents 100Gbps of IMIX ingress Internet traffic, because the destination address is known in advance it's very easy to break up the traffic into smaller pieces that are handled by a stateful firewall.

The Challenge
The more interesting challenge is providing high-scale firewall services for egress Internet traffic, where the destination of the traffic isn't known in advance. Traditional methods such as DNS round-robin and load balancing do not apply to this type of traffic pattern, which requires a different approach.
The first problem is that the rate of Traffic0 exceeds the capacity of a
single Juniper SRX. What’s required is a method to take a large stream
of traffic and break it out deterministically into multiple outputs that
are mapped to a particular Juniper SRX. This can be accomplished
with demultiplexing and multiplexing as shown in Figure 1.2.

Figure 1.2 Illustration of a Demultiplexer and Multiplexer

The second problem is demultiplexing a large amount of traffic that has a destination that isn't known in advance. In addition, the demultiplexing algorithm used on egress and return traffic needs to be invertible. For example, in Figure 1.2 assume that Traffic0 contained Flow0 through FlowN. If Flow0 was mapped to Output2 when egressing the network, the same needs to be true when handling the return traffic associated with Flow0. The demultiplexer handling the return traffic needs to ensure that Flow0 is mapped to Output2.
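One way to picture an invertible demultiplexer is a hash over the flow's address pair that is insensitive to direction. The sketch below is illustrative only, not the hash Junos uses: sorting the two addresses before hashing guarantees that a flow and its return traffic map to the same output, and thus the same firewall.

```python
import hashlib

def output_for_flow(addr_a: str, addr_b: str, n_outputs: int) -> int:
    # Sort the address pair so the key is identical in both directions;
    # forward and return traffic of a flow then hash to the same output.
    key = "|".join(sorted((addr_a, addr_b)))
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_outputs

# Egress direction: Flow0 from an internal host to an Internet destination.
fwd = output_for_flow("10.0.1.5", "198.51.100.7", 4)
# Return traffic: source and destination swapped; the same output is chosen.
ret = output_for_flow("198.51.100.7", "10.0.1.5", 4)
print(fwd == ret)  # True
```

The addresses and the SHA-256 choice are arbitrary; the point is only the symmetry property that the following chapters demand of any demultiplexer.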

Dealing with Traffic at Scale


Perhaps the first question to ask when taking a look at 100Gbps of
IMIX egress Internet traffic is, where does it come from? The answer is
just a matter of scale. Figure 1.3 illustrates a sample data center with
20,000 servers. The math works out to only 5Mbps per server to
reach the 100Gbps of egress Internet traffic.

Figure 1.3 Data Center with 20,000 Servers and 100Gbps IMIX Egress Internet Traffic

When working at large scale, the details matter and have a large impact on the overall aggregate. For example, if each server grew from 5Mbps to only 10Mbps, that would result in an aggregate of 200Gbps of traffic.
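The aggregate arithmetic is easy to verify. A quick sketch; the 20,000-server count comes from Figure 1.3:

```python
servers = 20_000          # data center size from Figure 1.3
per_server_mbps = 5       # modest per-server egress rate

# Aggregate egress: 20,000 servers x 5 Mbps = 100 Gbps.
aggregate_gbps = servers * per_server_mbps / 1_000
# Doubling the per-server rate to 10 Mbps doubles the aggregate.
doubled_gbps = servers * (2 * per_server_mbps) / 1_000
print(aggregate_gbps, doubled_gbps)  # 100.0 200.0
```

At this scale even a few megabits of growth per server moves the aggregate by tens of gigabits, which is exactly why the details matter.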

Traditional Firewall Placement


It’s common to place a firewall at logical boundaries in a network
where the traffic has already been reduced. For example, in Figure 1.3
it would make sense to place a firewall inside of each POD handling
5,000 hosts. Assuming uniform distribution, each POD would then
have to handle 25Gbps of egress Internet traffic.

There are a couple of drawbacks to using this approach. The first is that each POD requires a dedicated firewall, and as the network grows, it will require a firewall for every POD. The second drawback is that this approach assumes uniform distribution, which is not always the case. For example, it's common to have a subset of applications that require more bandwidth than other applications. This could lead to POD-3 requiring 70Gbps of traffic while POD-1, POD-2, and POD-4 only require 10Gbps each. That would create interesting challenges when trying to size a firewall for each POD. While it's simple enough to measure the bandwidth at any point in time, it's difficult to predict what the bandwidth requirements will be next month, next quarter, or next year.

Another drawback to implementing a firewall per POD is that this approach results in many firewalls to manage.

An Alternative Approach
One alternative approach would be to break the ratio of one firewall per POD, and instead tie the number of firewalls to the amount of aggregate egress traffic. For example, let's assume that one SRX5800 can provide stateful processing of 47.5Gbps of IMIX traffic. If the amount of egress Internet traffic is 100Gbps, this would only require three SRX5800 firewalls, which would provide a pool of firewall resources capable of over 140Gbps, as in Figure 1.4. In this specific example, you can reduce the total number of firewalls by 25%.
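The sizing above reduces to a ceiling division. A quick sketch using the chapter's own figures:

```python
import math

imix_per_srx_gbps = 47.5   # stateful IMIX throughput of one SRX5800
egress_gbps = 100          # aggregate egress Internet traffic

# Pooled design: size the firewall count to the aggregate, not to PODs.
pooled_firewalls = math.ceil(egress_gbps / imix_per_srx_gbps)   # 3
pool_capacity_gbps = pooled_firewalls * imix_per_srx_gbps       # 142.5

# Per-POD design from Figure 1.3: one firewall for each of four PODs.
per_pod_firewalls = 4
savings = 1 - pooled_firewalls / per_pod_firewalls              # 0.25
print(pooled_firewalls, pool_capacity_gbps, savings)  # 3 142.5 0.25
```

Because the pool is sized against the aggregate, its spare capacity (42.5Gbps here) is shared by every POD instead of being stranded in whichever POD happens to be underutilized.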

Figure 1.4 An Alternative Approach to Scale the Firewall Performance to the Total
Bandwidth Aggregate

This alternative approach grants the network the opportunity to treat the firewalling services as a pool of resources, which has many distinct advantages, such as:
- Eliminating being tied to scaling the number of firewalls per POD.
- Granting the ability to scale the firewall performance directly to the total amount of traffic needing firewall services.
- Reducing the CapEx and OpEx by requiring less hardware and using it more efficiently.
- Distributing the egress Internet traffic uniformly across all firewalls as an aggregate.
- Creating specialized firewall pools that are optimized for performance or redundancy. Traffic can then be steered to different firewall pools depending on the level of performance or redundancy required.
The alternative approach requires placing the pool of firewalls higher
into the network design and away from the PODs. Figure 1.5 illus-
trates the new position of the firewalls in the data center.
Here, firewalls are no longer required for each POD, but are instead
pooled together as a collective resource that’s available to all PODs.
There’s no limit to the number of types of firewall pools. In our sample
shown in Figure 1.5, there are two firewall pools. The first pool is
optimized for performance while the other pool is optimized for
redundancy. The CORE is responsible for directing traffic into the pool
of firewalls based on need. For example, there could be VOIP traffic in
POD-1 that requires firewall performance, while POD-2 and POD-3
terminate remote access SSL VPN traffic that requires firewall redun-
dancy.
The two solutions reviewed in this book, that allow the CORE to
direct traffic to firewall pools based on need, are ECMP and FBF.

Summary
When there's a requirement to provide firewall services to traffic at
large scale, you must consider the behavior of the traffic and what's
known in advance. When looking at ingress traffic the destination
network is always known. Using the destination address it’s possible to
use traditional methods such as DNS round-robin and load balancers
to break the traffic down into manageable streams and apply firewall
services.
Egress Internet traffic has different characteristics. Generally the
destination network isn’t known in advance, because the number of
routable addresses in the Internet is very large. However, what is
known in advance is the source address, because the egress Internet
traffic in a data center is originated locally.

Figure 1.5 Firewall Placement in the Alternative Approach

Armed with this information, it becomes apparent that methods used to break down ingress Internet traffic cannot be used when dealing with egress Internet traffic. The traditional approach is to place a firewall in every server POD and provide localized firewall services. There are a few drawbacks to this method, as it ties the number of firewalls directly to the number of PODs and increases the CapEx and OpEx. It becomes increasingly hard to manage the firewalls in each POD, because each POD will require different bandwidth today versus tomorrow. If the traffic is not uniform across the entire data center, some PODs will require less bandwidth than others. This inherently creates inefficiency in the architecture because the cost to secure 1Gbps of traffic varies from POD to POD.
For example, let's assume that POD-1 requires 50Gbps of traffic while POD-2 requires 80Gbps of traffic. Given that each SRX5800 can support about 47.5Gbps of IMIX traffic when providing security policy, both POD-1 and POD-2 will require two firewalls each. To make the math easy, let's assume that each SRX5800 costs $100,000. POD-1 only has 50Gbps of traffic, which costs $4,000 per 1Gbps to secure. However, POD-2 requires 80Gbps of traffic, which costs about $2,500 per 1Gbps to secure. Between POD-1 and POD-2 the average cost per 1Gbps is $3,250.
An alternative approach would be to remove the firewalls from the
PODs and instead pool the firewalls together higher in the network
architecture. This creates a common pool of firewall services that can
be consumed by any POD. Using the same numbers as before, POD-1 and POD-2 require a combined 130Gbps of egress traffic, which needs only three SRX5800 firewalls. That works out to about $2,300 to secure 1Gbps of traffic, compared to the previous $3,250 on a per-POD basis.
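The cost comparison can be reproduced directly. A quick sketch; the $100,000 unit price is the chapter's simplifying assumption:

```python
srx_cost = 100_000   # assumed price of one SRX5800 (from the example)

# Per-POD design: two firewalls in each POD, regardless of its load.
pod1_per_gbps = 2 * srx_cost / 50    # POD-1: 50 Gbps -> $4,000 per Gbps
pod2_per_gbps = 2 * srx_cost / 80    # POD-2: 80 Gbps -> $2,500 per Gbps
avg_per_gbps = (pod1_per_gbps + pod2_per_gbps) / 2   # $3,250 per Gbps

# Pooled design: three firewalls shared across the combined 130 Gbps.
pooled_per_gbps = 3 * srx_cost / 130                 # ~$2,308 per Gbps
print(avg_per_gbps, round(pooled_per_gbps))  # 3250.0 2308
```

The pooled design is cheaper per gigabit because rounding up to whole firewalls happens once for the aggregate rather than once per POD.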
The centralized architecture is more efficient because the distribution is
uniform. And, as the traffic requirements increase, it will increase
evenly across all firewalls in the pool. There are, however, some
drawbacks to a centralized approach, and the details and caveats will
be openly discussed at length in subsequent chapters.
Let’s get the test bed working, so we can see for ourselves.
Chapter 2

The Test Bed

Physical Topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Layer 2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Layer 3 Topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

IS-IS Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

BGP Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Traffic Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Return Traffic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

This chapter introduces the test bed used to verify the centralized firewall architecture explained in Chapter 1. Its goals are to verify that the architecture is functional, to determine what conditions will trigger a failure, and to observe how traffic is impacted during a failure scenario. Several components are needed:
- Border router
- Core switch
- Aggregation switch
- Firewalls
- A stateful testing and load device

NOTE The number of components in the book’s topology have been scaled
back since, as noted above, the only goal is to test the functionality and
failure conditions of the centralized firewall architecture.

Physical Topology
The physical topology is comprised of nine devices: (4) SRX5800s, (2)
EX4500s, (1) EX8208, (1) MX240, and (1) IXIA chassis running
IxLoad for testing. All devices are connected with 2x10GE connections
running IEEE 802.3ad. Figure 2.1 shows the actual physical topology
used for testing in this book.
To keep the topology simple, all redundant devices have been removed, except where absolutely necessary to demonstrate the functionality of the centralized firewall architecture. There is a single border router, an aggregation switch, and a core switch. However, the focus of the testing revolves around the firewall pool, so (4) SRX5800s have been included in the topology.

Figure 2.1 The Physical Topology

Aggregation
Every device connects into the EX4500-VC, which is acting as the
aggregation switch. This switch isn’t required, but it provides a nice
mechanism to simplify the topology and provide 10GE port density.

Figure 2.2 Aggregation Switch: Juniper EX4500-VC

One of the core requirements when scaling a network horizontally is to provide connectivity between the devices being scaled. An easy way to meet this requirement is to place the devices into the same broadcast domain.

NOTE This book does not debate whether Layer 2 or Layer 3 is a better
mechanism for horizontal connectivity, and leaves it as an exercise for
the reader.

The aggregation switch in this topology is simply providing Layer 2 bridging; there's no routing and no protocol being used. In a way, it's forcing a hub-and-spoke physical topology.

Border
The role of the border router is to provide network connectivity to
upstream transit providers; in the physical topology this is the MX240.
It’s connected into the EX4500-VC via 2x10GE ports using IEEE
802.3ad.

Figure 2.3 Border Router: Juniper MX240



There are also 2x10GE ports that are connected to the IXIA; however
these are configured as Layer 2 access ports and use an irb interface to
peer with IXIA.

Firewalls
This book uses (4) SRX5800s in the physical topology. No clustering
will be used in this topology and the firewalls will be operated in
standalone mode. Each firewall is connected into the aggregation
switch via IEEE 802.3ad and IEEE 802.1Q.

Figure 2.4 Firewall: Juniper SRX5800

The purpose of IEEE 802.1Q is to enable each firewall to sit on two networks: TRUST and UNTRUST. These networks will be explained in more detail later in this chapter.

Core
The core switch in the topology is represented by the Juniper EX8200.
Its role is to provide both Layer 2 and Layer 3 services. It plays a
critical role in the demultiplexing of egress traffic and the decision
process of how traffic is mapped to firewalls.

Figure 2.5 Core Switch: Juniper EX8200

Two 10GE ports are connected via IEEE 802.3ad to the aggregation
switch. These ports are configured as a Layer 2 trunk port and peer
with other devices via an irb interface.

Test Device
The testing device is an IXIA chassis running the IxLoad software.
Although the topology shows two separate testing devices, in reality it
was a single chassis with multiple ports connecting into the topology;
two ports on the top acting as an HTTP Server and two ports on the
bottom acting as an HTTP Client.

Layer 2 Topology
Four major VLANs are used in this book’s topology: IXIA, TRUST,
UNTRUST, and DC. Each VLAN represents a logical separation in the
network and partitions each device by function and responsibility. The
TRUST and UNTRUST VLANs are specifically designed to work well
with the Juniper SRX security zone architecture. Figure 2.6 illustrates
the Layer 2 topology.

IXIA VLAN
The IXIA VLAN only exists on two ports on the MX240. This allows
the IXIA device to have two physical ports connected into the MX240
and to use the same network subnet on each physical port.
Let’s take a look at the interface configuration on the MX240 for the
IXIA VLAN:
interfaces {
    xe-0/2/0 {
        encapsulation ethernet-bridge;
        unit 0 {
            family bridge;
        }
    }
    xe-0/3/0 {
        encapsulation ethernet-bridge;
        unit 0 {
            family bridge;
        }
    }
    irb {
        unit 100 {
            family inet {
                address 10.7.7.1/24;
            }
        }
    }
}
Chapter 2: The Test Bed 23

Figure 2.6 Layer 2 Topology



Interfaces xe-0/2/0 and xe-0/3/0 are configured as vanilla access ports.
This allows the IXIA Server to have two IP addresses on the 10.7.7/24
network to accept test traffic.

NOTE If you’re new to bridging on the Juniper MX, be sure to check out a
forthcoming book to be published by O’Reilly Media in Q3 of 2012:
Juniper MX Series: A Practical Guide to Trio Technologies.

Now, let’s take a look at the VLAN definition:


bridge-domains {
    IXIA {
        vlan-id 100;
        interface xe-0/2/0.0;
        interface xe-0/3/0.0;
        routing-interface irb.100;
    }
}

The IXIA VLAN is configured with a vlan-id of 100 and includes the
xe-0/2/0 and xe-0/3/0 interfaces. The routed interface is assigned to
irb.100, which has the IP address 10.7.7.1/24.

TRUST VLAN
The TRUST VLAN is defined on both the EX8200 and EX4500-VC.
Each port on the EX4500-VC that connects to the firewalls and the
EX8200 is configured as either an access or a trunk port. The general
idea is that devices behind the firewalls are trusted and anything
beyond the firewalls towards the MX240 is untrusted.

Figure 2.7 Logical Illustration of the Juniper SRX Firewall Sitting Between the UNTRUST
and TRUST VLANs

Let’s take a quick look at the VLAN configuration on the EX4500-VC


to confirm:
vlans {
UNTRUST {
vlan-id 300;
}
}

That’s about as simple as you can get when it comes to defining a


VLAN. No bells or whistles here, just vanilla bridging.
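
The VLAN's member ports aren't shown here. A hedged sketch of what an
EX4500-VC access port in the TRUST VLAN might look like (the interface
name xe-0/0/10 is hypothetical; the real port assignments are in the
Appendix):

```
interfaces {
    xe-0/0/10 {
        unit 0 {
            family ethernet-switching {
                port-mode access;
                vlan {
                    members TRUST;
                }
            }
        }
    }
}
```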

UNTRUST VLAN
The UNTRUST VLAN is also defined on the EX4500-VC and extends
to the other arm of the firewalls and finally to the MX240. Because the
firewalls are running IEEE 802.1Q, they can have two logical inter-
faces each on the TRUST and UNTRUST VLANs. Figure 2.7 illus-
trates that as data passes through the firewalls, it will flow from
TRUST to UNTRUST.

DC VLAN
The final VLAN is DC and is defined only on the EX8200. It represents
the rest of the data center that would exist in a real production envi-
ronment. The IXIA HTTP Client is connected into the DC VLAN to
source stateful HTTP traffic, which will ultimately flow through the
firewalls and out the MX240 to the IXIA HTTP Server.
Let’s view the VLAN configuration on the EX8200:
vlans {
    DC {
        vlan-id 1000;
        l3-interface vlan.1000;
    }
}
interfaces {
    vlan {
        unit 1000 {
            family inet {
                address 192.168.1.1/24;
            }
            family iso;
        }
    }
}

The DC VLAN is configured with a vlan-id of 1000 and a routed
interface of vlan.1000; this interface serves as the default gateway of
the IXIA Client.

Layer 3 Topology
There are four major networks defined in this topology that build on
top of the Layer 2 VLAN structure: IXIA Server, Untrust, Trust, and
Data Center.

Figure 2.8 Layer 3 Topology



As mentioned previously, the aggregation switch, the EX4500-VC, is a
simple bridge and doesn't participate in routing: it merely filters,
forwards, and floods between all the connected devices.

10.7.7/24
The 10.7.7/24 network sits between the IXIA Server and the MX240.
The MX240 has an IP address of 10.7.7.1/24 while the IXIA Server
uses two IP addresses, 10.7.7.2/24 and 10.7.7.3/24. Let’s take a look at
the interface and bridge domain configuration for the MX240:
interfaces {
    xe-0/2/0 {
        encapsulation ethernet-bridge;
        unit 0 {
            family bridge;
        }
    }
    xe-0/3/0 {
        encapsulation ethernet-bridge;
        unit 0 {
            family bridge;
        }
    }
    irb {
        unit 100 {
            family inet {
                address 10.7.7.1/24;
            }
        }
    }
}
bridge-domains {
    IXIA {
        vlan-id 100;
        interface xe-0/2/0.0;
        interface xe-0/3/0.0;
        routing-interface irb.100;
    }
}

Both interfaces xe-0/2/0 and xe-0/3/0 are configured as standard
access ports belonging to the bridge domain IXIA. The interface irb.100
is assigned as the routed interface for this bridge domain and provides
connectivity up to the IXIA Server.

Figure 2.9 IXIA Server and MX240 Connectivity

In effect, the MX240 is acting like a switch with a routed interface on
10.7.7.1/24, while the IXIA Server has an IP address assigned to each
port. In later chapters, the IXIA Server IP addresses will be used as
the destination addresses during the firewall tests.

10.3/24
In Junos, 10.3/24 is valid shorthand for the network 10.3.0.0/24. This
network maps directly to the UNTRUST VLAN. All of the firewalls
and the MX240 use this network for reachability. The MX240 is
assigned the IP address 10.3.0.1/24, while the firewalls are assigned
10.3.0.11/24 through 10.3.0.14/24, respectively. Let’s take a look at
the ae0 interface on the first firewall SRX-1:
interfaces {
    ae0 {
        vlan-tagging;
        aggregated-ether-options {
            lacp {
                periodic fast;
            }
        }
        unit 200 {
            vlan-id 200;
            family inet {
                address 10.2.0.11/24;
            }
            family iso;
        }
        unit 300 {
            vlan-id 300;
            family inet {
                address 10.3.0.11/24;
            }
            family iso;
        }
    }
}

The SRX-1 firewall is configured for IEEE 802.1Q, effectively creating
two logical interfaces from ae0: ae0.200 and ae0.300. Each logical
interface is configured for IPv4 and ISO, which are required for IS-IS to
operate.

Figure 2.10 The SRX-1’s Two Logical Interfaces

Although there is only a single ae0 interface, IEEE 802.1Q makes it
possible to create two logical interfaces and provide logical separation.
This book will demonstrate that as traffic flows through the firewalls, it
will ingress from TRUST and egress from UNTRUST.

10.2/24
This network is very similar to 10.3/24, but provides connectivity, yet
again, for all of the devices in the TRUST VLAN, including the
firewalls since they’re using IEEE 802.1Q and have two logical inter-
faces. The firewall IP addresses are 10.2.0.11/24 through 10.2.0.14/24,
respectively. Since the EX8200 is part of the TRUST VLAN it partici-
pates in this network with the IP address of 10.2.0.10/24.
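
The EX8200's interface on this network isn't shown until later in the
chapter; a hedged sketch consistent with the addressing above (assuming
the TRUST VLAN uses vlan-id 200 with RVI vlan.200, as the IS-IS and BFD
sections later suggest):

```
interfaces {
    vlan {
        unit 200 {
            family inet {
                address 10.2.0.10/24;
            }
            family iso;
        }
    }
}
```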

192.168.1/24
This network represents the data center where the test traffic will be
originated. Only the EX8200 and IXIA Client sit on this network. The
EX8200 has a Layer 3 interface with an IP address of 192.168.1.1/24,
which acts as a default gateway for the IXIA Client.

Figure 2.11 EX8200 and IXIA Client IP Addressing



Let’s take a look at the EX8200 interface and VLAN configuration:


interfaces {
    xe-0/0/2 {
        unit 0 {
            family ethernet-switching {
                port-mode access;
                vlan {
                    members DC;
                }
            }
        }
    }
    xe-0/0/3 {
        unit 0 {
            family ethernet-switching {
                port-mode access;
                vlan {
                    members DC;
                }
            }
        }
    }
    vlan {
        unit 1000 {
            family inet {
                address 192.168.1.1/24;
            }
            family iso;
        }
    }
}
vlans {
    DC {
        vlan-id 1000;
        l3-interface vlan.1000;
    }
}

Similar in function to the MX240, the EX8200 acts as a switch toward
the IXIA Client, with a routed interface (RVI) on the VLAN DC. The
interface vlan.1000 is configured as 192.168.1.1/24, which serves as the
gateway for the IXIA Client.

Loopback Addresses
Each device has its own loopback address that’s used for reachability
and in some cases for IBGP peering, as listed in Table 2.1.

Table 2.1 Device Loopback Address Assignments

Device Loopback Address

MX240 10.3.255.10/32

SRX-1 10.3.255.11/32

SRX-2 10.3.255.12/32

SRX-3 10.3.255.13/32

SRX-4 10.3.255.14/32

EX8200 10.2.255.10/32

The loopback addresses follow the same pattern as the network address
assignments. For example, SRX-1 has a loopback address of
10.3.255.11/32: the first two octets, 10.3, are shared with the devices
in the UNTRUST VLAN, and the last octet, .11, matches the last octet of
its 10.3.0.11/24 interface address. This makes the loopback addresses
easy to remember without having to return to this chapter as a
reference.

IS-IS Configuration
The topology this book has elected to use is IS-IS as the Interior
Gateway Protocol (IGP). OSPF has been beaten to death and it’s
always a good idea to mix it up.
To keep things simple, all devices share the same IS-IS area 49.0000 as
shown in Figure 2-12. To further reduce complexity, all of the interface
adjacencies are Level 2 only. Notice that only devices with a direct
connection form an adjacency with each other. For example, the MX240
and EX8200 only have IS-IS adjacencies with the firewalls; the
firewalls, however, have an IS-IS adjacency with every other device.
Although there are some devices that do not have a full mesh of IS-IS
adjacencies, each device has a complete route table with connectivity
to all networks and loopback addresses in this network.

Figure 2.12 IS-IS Topology and Area



Let’s take a look at the IS-IS configuration on SRX-1:


interfaces {
    lo0 {
        unit 0 {
            family inet {
                address 10.3.255.11/32;
            }
            family iso {
                address 49.0000.1111.1111.1111.00;
            }
        }
    }
}
protocols {
    isis {
        apply-groups isis-bfd;
        level 1 disable;
        interface ae0.200;
        interface ae0.300;
        interface lo0.0 {
            passive;
        }
    }
}

Simple enough. This is a basic IS-IS configuration, placing the firewall
into area 49.0000 and forcing all of the interfaces to Level 2 only.
Both the interfaces ae0.200 and ae0.300 are placed into the IS-IS
configuration so that adjacencies to the MX240 and EX8200 can be
established. Also, the loopback interface is included into the IS-IS
configuration so that SRX-1 can be reached via 10.3.255.11/32.
Let’s verify that IS-IS is working correctly with the show isis adja-
cency command:

jnpr@SRX-1> show isis adjacency 
Interface             System         L State        Hold (secs) SNPA
ae0.200               SRX-2          2  Up                   20  0:1f:12:f1:ff:c0
ae0.200               SRX-3          2  Up                   24  0:1f:12:f6:ef:c0
ae0.200               SRX-4          2  Up                   23  0:1f:12:fa:f:c0
ae0.200               EX8208-SW1-RE0 2  Up                    6  0:22:83:6a:32:1
ae0.300               MX240-RE0      2  Up                    8  0:1f:12:b7:77:c0
ae0.300               SRX-2          2  Up                   22  0:1f:12:f1:ff:c0
ae0.300               SRX-3          2  Up                   25  0:1f:12:f6:ef:c0
ae0.300               SRX-4          2  Up                   19  0:1f:12:fa:f:c0

The output is outstanding: each interface is showing its respective
neighbors with a State of Up. (A great thing about IS-IS is that it
includes a feature that makes it easy for humans to identify the
neighboring router. Notice that the second column, System, shows
the neighboring router's hostname.)

Let’s verify that IS-IS is properly installing prefixes into the routing
table with the show route protocol isis command:
jnpr@SRX-1> show route protocol isis

inet.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.2.255.10/32     *[IS-IS/18] 01:03:33, metric 10
                    > to 10.2.0.253 via ae0.200
10.3.255.10/32     *[IS-IS/18] 01:03:33, metric 10
                    > to 10.3.0.1 via ae0.300
10.3.255.12/32     *[IS-IS/18] 01:03:33, metric 10
                    > to 10.3.0.12 via ae0.300
                      to 10.2.0.12 via ae0.200
10.3.255.13/32     *[IS-IS/18] 01:03:33, metric 10
                      to 10.3.0.13 via ae0.300
                    > to 10.2.0.13 via ae0.200
10.3.255.14/32     *[IS-IS/18] 01:03:33, metric 10
                    > to 10.3.0.14 via ae0.300
                      to 10.2.0.14 via ae0.200
192.168.1.0/24     *[IS-IS/18] 01:03:33, metric 20
                    > to 10.2.0.253 via ae0.200

Each of the devices configured to use IS-IS is showing up in the route
table of SRX-1 and is reachable via its respective loopback address.
There's no need to review the IS-IS configuration on every device
because they're nearly identical aside from the interface names;
however, the Appendix at the end of this book lists all of the device
configurations.
The one exception is that the MX240 has IS-IS specifically configured
so as not to include the interface irb.100. This effectively keeps the
10.7.7/24 network isolated between the MX240 and IXIA Server. Any
devices further into the topology have no knowledge of this network.
(The traffic flow and test cases are described later in this chapter.) The
10.7.7/24 network will be used as a destination address on the IXIA
Client, but it's interesting to note that no devices besides the MX240
and IXIA Server know about this network. In order for the IXIA Client
to have connectivity to the 10.7.7/24 network, BGP will be used.

Bidirectional Forwarding Detection


Using the default IS-IS timers is a bit slow; one method to speed up the
detection of a dead peer is Bidirectional Forwarding Detection (BFD).
BFD is a very simple hello protocol whose only purpose in life is to
detect the liveness of peers. One benefit of BFD is that it's very
lightweight and can support sub-second hellos and dead peer detection.
The other benefit is that protocols such as OSPF, IS-IS, BGP, and even
static routes can consume the services of BFD.

In this book, each device participating in IS-IS will use BFD for liveness
detection. For sub-second detection, an interval of 300ms will be used
with a multiplier of 3, meaning that if three hellos are missed, the
neighbor is considered down.
Let’s take a look at the BFD configuration of EX8200:
protocols {
    isis {
        level 1 disable;
        interface lo0.0 {
            passive;
        }
        interface vlan.200 {
            bfd-liveness-detection {
                minimum-interval 300;
                multiplier 3;
            }
        }
        interface vlan.1000;
    }
}

BFD is applied on a per-interface basis. In this case, the EX8200 only
has a single interface, vlan.200, that connects it to the four Juniper
SRX firewalls. The show bfd session command displays the number of
neighbors detected and the current status of each.
jnpr@EX8208-SW1-RE0> show bfd session 
                                                  Detect   Transmit
Address                  State     Interface      Time     Interval  Multiplier
10.2.0.11                Up        vlan.200       0.900     0.300        3   
10.2.0.12                Up        vlan.200       0.900     0.300        3   
10.2.0.13                Up        vlan.200       0.900     0.300        3   
10.2.0.14                Up        vlan.200       0.900     0.300        3   

4 sessions, 12 clients
Cumulative transmit rate 13.3 pps, cumulative receive rate 13.3 pps

The EX8200 is able to see all four firewalls on the interface vlan.200.
Multiplying the multiplier by the minimum-interval yields the Detect
Time of 0.900 seconds.
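
The figures in this output can be sanity-checked with a little arithmetic;
a minimal sketch, using only values taken from the configuration and
output above:

```python
# BFD parameters from the EX8200 configuration above.
minimum_interval_ms = 300   # minimum-interval 300
multiplier = 3              # multiplier 3
sessions = 4                # one session per SRX firewall

# Detect Time column: interval * multiplier.
detect_time_ms = minimum_interval_ms * multiplier
print(detect_time_ms)   # 900 ms, i.e. the 0.900 seconds shown

# Cumulative transmit rate: each session sends one hello per interval.
pps = round(sessions * 1000 / minimum_interval_ms, 1)
print(pps)              # 13.3 pps, matching the output
```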
BFD becomes especially important when two devices are peering across
a bridge, as the EX8200 and the Juniper SRX firewalls are. Recall that
all devices are physically cabled to the EX4500-VC. Imagine if SRX-1
had an interface failure. From the vantage point of the EX8200,
everything would still appear operational, because its own interface,
which is connected to the EX4500-VC, is still up. In this example, BFD
would notify IS-IS that SRX-1 is down within 900ms.
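
The SRX IS-IS configuration shown earlier in this chapter referenced
apply-groups isis-bfd without listing the group itself. A hedged sketch
of what such a group might contain, reusing the same 300 ms interval and
multiplier of 3 (the group body is an assumption; the actual definition
is in the Appendix):

```
groups {
    isis-bfd {
        protocols {
            isis {
                interface <*> {
                    bfd-liveness-detection {
                        minimum-interval 300;
                        multiplier 3;
                    }
                }
            }
        }
    }
}
```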

BGP Configuration
At this point you should have a very good understanding of how the
topology is connected and how devices are able to communicate. Let’s
move on to the real meat of the topology and take a look at BGP. BGP is
the glue that brings everything together in terms of being able to route
through the topology between the IXIA Client and IXIA Server.
Traffic will be sourced from the IXIA Client, which is sitting on the
network 192.168.1/24, and the packets will be destined to 10.7.7.2 and
10.7.7.3. The only problem is that the next hop router for the IXIA
Client has no idea how to reach 10.7.7.2 or 10.7.7.3. This is where BGP
comes in – a default route will be originated from the MX240 and
propagated throughout the topology.

Figure 2.13 BGP Within the Topology

At a high level there are two autonomous system numbers (ASNs):
1234 and 4567. ASN 1234 is limited to the EX8200, while ASN 4567
includes all of the firewalls and the MX240.

NOTE To solve the IBGP split horizon problem, EBGP is used between the
EX8200 and the firewalls instead of configuring a BGP route reflector.
Creating a full mesh is another option, but an IBGP connection between
the EX8200 and MX240 would defeat the purpose of the firewalls.

MX240
The advertisement of prefixes throughout the network is very simple. It
all begins at the MX240 with a default route:
routing-options {
    static {
        route 0.0.0.0/0 discard;
    }
    autonomous-system 4567;
}

But a static default route that discards all packets isn’t enough. There
needs to be a policy statement that exports this default route to the
firewalls via IBGP:
policy-options {
    policy-statement export-default {
        term 1 {
            from {
                protocol static;
                route-filter 0.0.0.0/0 exact;
            }
            then {
                accept;
            }
        }
        term 2 {
            then reject;
        }
    }
}

This policy will find the static route 0/0, accept it, and reject all other
prefixes. The next step is to configure BGP on the MX240 and refer-
ence the policy-statement export-default:
protocols {
    bgp {
        group SRX {
            type internal;
            local-address 10.3.255.10;
            export export-default;
            multipath;
            neighbor 10.3.255.11;
            neighbor 10.3.255.12;
            neighbor 10.3.255.13;
            neighbor 10.3.255.14;
        }
    }
}

This BGP group is responsible for peering with the firewalls SRX-1
through SRX-4. Because all of the devices share the same ASN, the
peering type is IBGP. It’s considered best practice to use loopback
addressing when configuring IBGP – the MX240 will use its loopback
address 10.3.255.10 as the local-address and use the loopback address
of each firewall as the neighbor address.
Now that a 0/0 static route exists in the RIB, and there’s a policy to find
0/0 and accept the prefix, it needs to be applied to the BGP configura-
tion so the MX240 can advertise it to the firewalls. This is done with
the export export-default configuration.
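
To confirm the default route is actually being sent, the advertisement
toward a firewall can be inspected from the MX240 (output omitted here;
10.3.255.11 is SRX-1's loopback):

```
jnpr@MX240-RE0> show route advertising-protocol bgp 10.3.255.11
```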

Firewalls
The firewalls have the most interesting BGP configuration. Since the
entire point of this case study is to force traffic through the firewalls,
two different BGP groups on the SRX are needed: TRUST and UNTRUST.
protocols {
    bgp {
        group UNTRUST {
            type internal;
            local-address 10.3.255.11;
            neighbor 10.3.255.10;
        }
        group TRUST {
            type external;
            multihop;
            local-address 10.3.255.11;
            peer-as 1234;
            neighbor 10.2.255.10;
        }
    }
}

UNTRUST
The BGP group UNTRUST is used to peer to the MX240 via IBGP. No
policy, import or export is required. According to best practices, the
loopback addresses are used to establish connectivity between the
firewalls and MX240.

TRUST
When peering with the EX8200, EBGP is used, since the EX8200 is in
a different ASN. Since loopback peering was used with the MX240, it
was used again when peering with the EX8200, although it’s EBGP.
When using loopback peering with EBGP, the option multihop is
required because the default time to live (TTL) for EBGP is 1.
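
In this topology the bare multihop statement is enough, since the peers
are effectively one routed hop apart. Junos also accepts an explicit TTL
under multihop; a hedged sketch, not used in this book:

```
protocols {
    bgp {
        group TRUST {
            multihop {
                ttl 5;
            }
        }
    }
}
```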

Default Route
Note that neither the BGP group UNTRUST or TRUST uses an import
or export policy. The firewalls are just using the default BGP rules
when advertising and accepting prefixes. Since the MX240 is advertis-
ing a 0/0 route to the firewalls, the firewalls advertise the 0/0 route to
the EX8200 in return. Let’s verify this behavior:
jnpr@SRX-1> show bgp summary
Groups: 2 Peers: 2 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0                 1          1          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/
Dwn State|#Active/Received/Accepted/Damped...
10.2.255.10           1234        148        159       0       1     1:03:05 0/0/0/0 0/0/0/0
10.3.255.10           4567        149        156       0       1     1:02:57 1/1/1/0 0/0/0/0

The firewall is receiving a single prefix from the MX240 (10.3.255.10).
Let's take a look at the RIB:

jnpr@SRX-1> show route 0/0 exact

inet.0: 15 destinations, 15 routes (15 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 01:03:04, localpref 100, from 10.3.255.10
                      AS path: I
                    > to 10.3.0.1 via ae0.300

As expected, the 0/0 route is in the RIB; it originated from
10.3.255.10 with the next hop pointing towards the MX240.

EX8200
The EX8200 peers with all four firewalls via EBGP using loopback
peering. Recall that the EX8200 ASN is 1234 and the ASN of the four
firewalls is 4567. Because of the loopback peering with EBGP, the
multihop option needs to be used to increase the default TTL of 1:
protocols {
    bgp {
        group SRX {
            type external;
            multihop;
            local-address 10.2.255.10;
            peer-as 4567;
            multipath;
            neighbor 10.3.255.11;
            neighbor 10.3.255.12;
            neighbor 10.3.255.13;
            neighbor 10.3.255.14;
        }
    }
}

Let’s take a look at the show bgp summary command to verify that
everything has come up properly:
jnpr@EX8208-SW1-RE0> show bgp summary 
Groups: 1 Peers: 4 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0                 4          4          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/
Dwn State|#Active/Received/Accepted/Damped...
10.3.255.11           4567         139        137       0      13     1:01:09 Establ
  inet.0: 1/1/1/0
10.3.255.12           4567       10216      10187       0       7  3d 4:52:42 Establ
  inet.0: 1/1/1/0
10.3.255.13           4567       10221      10186       0      13  3d 4:52:37 Establ
  inet.0: 1/1/1/0
10.3.255.14           4567        1386       1379       0      19    10:23:47 Establ
  inet.0: 1/1/1/0

All of the EBGP connections to the firewalls are Established, each
receiving a single IPv4 prefix. Recall that the MX240 is originating a
0/0 BGP prefix that is advertised to all four firewalls. Since the default
BGP rules allow all BGP prefixes to be advertised to EBGP peers, the
EX8200 should see the 0/0 prefix with four next hops:
jnpr@EX8208-SW1-RE0> show route 0/0 exact 

inet.0: 16 destinations, 19 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 3d 04:52:49, localpref 100, from 10.3.255.12
                      AS path: 4567 I
                    > to 10.2.0.11 via vlan.200
                      to 10.2.0.12 via vlan.200
                      to 10.2.0.13 via vlan.200
                      to 10.2.0.14 via vlan.200
                    [BGP/170] 01:01:08, localpref 100, from 10.3.255.11
                      AS path: 4567 I
                    > to 10.2.0.11 via vlan.200
                    [BGP/170] 3d 04:52:44, localpref 100, from 10.3.255.13
                      AS path: 4567 I
                    > to 10.2.0.13 via vlan.200
                    [BGP/170] 10:23:54, localpref 100, from 10.3.255.14
                      AS path: 4567 I
                    > to 10.2.0.14 via vlan.200

The 0/0 prefix is being advertised by all four firewalls: 10.3.255.11
through 10.3.255.14. An interesting thing to note is that there are
several different entries in the RIB for the 0/0 prefix.


The first prefix is denoted with the "*", which indicates that this prefix
is in both the RIB and FIB; it originated from 10.3.255.12 and has four
next hops. The other 0/0 prefixes originated from 10.3.255.11,
10.3.255.13, and 10.3.255.14 and have only a single next hop each.
The 0/0 prefix originated from 10.3.255.12 is in both the RIB and FIB
because it's the best BGP prefix for 0/0. In this example, the firewall
SRX-2 happened to have the longest-established BGP session, thus
SRX-2 is considered more stable.
jnpr@EX8208-SW1-RE0> show bgp summary 
Groups: 1 Peers: 4 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0                 4          4          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/
Dwn State|#Active/Received/Accepted/Damped...
10.3.255.11           4567         139        137       0      13     1:01:09 Establ
  inet.0: 1/1/1/0
10.3.255.12           4567       10216      10187       0       7  3d 4:52:42 Establ
  inet.0: 1/1/1/0
10.3.255.13           4567       10221      10186       0      13  3d 4:52:37 Establ
  inet.0: 1/1/1/0
10.3.255.14           4567        1386       1379       0      19    10:23:47 Establ
  inet.0: 1/1/1/0

Note that SRX-2 (10.3.255.12) has been Up for 3 days, 4 hours, 52
minutes, and 42 seconds, verifying that SRX-2 connected via BGP first.
The four next hops associated with 0/0 appear under the path from
10.3.255.12 because of the multipath option in the BGP configuration
on the EX8200. The multipath option instructs BGP to truncate the
best-path selection algorithm at the IGP metric comparison: if multiple
paths for a prefix are equal through that step, all of them are installed
into the RIB rather than continuing through the remaining tie-breakers.
If multipath weren't enabled, there would be only a single next hop, via
the 10.3.255.12 firewall (which just happened to be the oldest BGP
connection), and all of the other firewalls would sit in the RIB unused.
The multipath option allows all four firewalls to have an active entry in
the RIB, with next hops pointing equally to each firewall.
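
The truncated best-path behavior can be sketched in a few lines of Python
(a hedged illustration, not Junos source; the session ages mirror the
show bgp summary output above):

```python
# Candidate 0/0 paths on the EX8200; all are equal through the IGP metric step.
paths = [
    {"peer": "10.3.255.11", "igp_metric": 10, "age_s": 3_669},
    {"peer": "10.3.255.12", "igp_metric": 10, "age_s": 276_762},  # oldest session
    {"peer": "10.3.255.13", "igp_metric": 10, "age_s": 276_757},
    {"peer": "10.3.255.14", "igp_metric": 10, "age_s": 37_427},
]

def bgp_select(paths, multipath):
    best = min(p["igp_metric"] for p in paths)
    survivors = [p for p in paths if p["igp_metric"] == best]
    if multipath:
        # multipath: stop here and install every surviving path.
        return survivors
    # Otherwise keep running tie-breakers; the oldest session wins.
    return [max(survivors, key=lambda p: p["age_s"])]

print(len(bgp_select(paths, multipath=True)))          # 4 usable next hops
print(bgp_select(paths, multipath=False)[0]["peer"])   # only SRX-2's path
```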
However, in Junos the default load balancing algorithm is per-prefix.
In this specific example there's only a single prefix of 0/0, so the default
load balancing method wouldn't work very well. What's needed is to
change the load balancing algorithm to per-flow, so that traffic using
the 0/0 prefix selects a next hop based on attributes in the frame and
segment of the packet. The EX8200 has a hard-coded per-flow
algorithm that takes into account the source MAC address, destination
MAC address, source IP address, destination IP address, and both the
source and destination TCP or UDP port numbers.
In order to change the default load balancing algorithm, the traffic to
be load balanced first needs to be matched by a policy that changes the
action:
routing-options {
    autonomous-system 1234;
    forwarding-table {
        export lb;
    }
}
policy-options {
    policy-statement lb {
        then {
            load-balance per-packet;
        }
    }
}

The policy statement lb simply matches all traffic and changes the
load-balance option to be per-packet.

NOTE Keep in mind that when changing the load-balance option to
per-packet, the behavior isn't really per-packet but per-flow; Junos
displays per-packet simply for historical reasons.

And the second part of changing the default load balancing algorithm
is to apply this policy to the FIB. To do so, set the forwarding-table
export to reference the lb policy.
Before committing this configuration, let’s compare the FIB before and
after the change.

BEFORE
jnpr@EX8208-SW1-RE0> show route forwarding-table destination 0/0 
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 131078     1
                                                 indr 131074     2
                              0:1f:12:f2:7f:c0   ucst  1332      7 vlan.200
                                                 indr 131076     2
default            perm     0                    rjct    36      1
0.0.0.0/32         perm     0                    dscd    34      1

AFTER
jnpr@EX8208-SW1-RE0> show route forwarding-table destination 0/0 
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 131078     1
                                                 indr 131074     2
                              0:1f:12:f2:7f:c0   ucst  1332      7 vlan.200
                                                 indr 131076     2
                              0:1f:12:f1:ff:c0   ucst  1334      7 vlan.200
                                                 indr 131075     2
                              0:1f:12:f6:ef:c0   ucst  1338      7 vlan.200
                                                 indr 131070     2
                              0:1f:12:fa:f:c0    ucst  1324      7 vlan.200
default            perm     0                    rjct    36      1
0.0.0.0/32         perm     0                    dscd    34      1

Note that before the FIB export change, the next hop in the FIB for the
destination address 0/0 was a single MAC address ending in 7f:c0.
After the FIB export change was applied, the next hop for 0/0 has
changed. There are now four next hops pointing to four different
MAC addresses ending in 7f:c0, ff:c0, ef:c0, and 0f:c0.
Now, any traffic that’s taking the 0/0 route in the RIB will be hashed
per-flow in the FIB and have close to uniform distribution across all
four next hops.
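
The per-flow behavior can be illustrated with a toy hash (a hedged
sketch; the EX8200's actual hardware hash is not public, so a stable
digest over the flow's 5-tuple stands in for it here):

```python
import hashlib

NEXT_HOPS = ["SRX-1", "SRX-2", "SRX-3", "SRX-4"]

def pick_next_hop(src_ip, dst_ip, proto, sport, dport):
    # Hash the 5-tuple to one of the four ECMP next hops.
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return NEXT_HOPS[hashlib.sha256(key).digest()[0] % len(NEXT_HOPS)]

# Packets of the same flow always land on the same firewall...
flow = ("192.168.1.100", "10.7.7.2", "tcp", 40001, 80)
assert pick_next_hop(*flow) == pick_next_hop(*flow)

# ...while many distinct flows spread across all four firewalls.
hops = {pick_next_hop("192.168.1.100", "10.7.7.2", "tcp", p, 80)
        for p in range(1024, 3072)}
print(sorted(hops))
```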

Default Route
As the EX8200 serves as the default gateway for the IXIA Client, all
traffic sourced from the IXIA Client needs to have a valid route on the
EX8200. The IXIA Client has been configured to source traffic from
192.168.1/24 and send it to 10.7.7.2 and 10.7.7.3. Let's take a look at
the EX8200 and make sure that there is a valid route:
jnpr@EX8208-SW1-RE0> show route 10.7.7.2 

inet.0: 16 destinations, 19 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 3d 04:52:49, localpref 100, from 10.3.255.12
                      AS path: 4567 I
                    > to 10.2.0.11 via vlan.200
                      to 10.2.0.12 via vlan.200
                      to 10.2.0.13 via vlan.200
                      to 10.2.0.14 via vlan.200

Perfect; the traffic destined for 10.7.7.2 is hitting the default route on
the EX8200, which in turn will be forwarded uniformly to the SRX
firewalls, and ultimately up to the MX240 to reach the final destina-
tion of the IXIA Server.

Traffic Flow
Up till now, this chapter has covered the physical and logical topology
of the case study. Layered on top of the topology are the routing
protocols IS-IS and BGP to enable reachability within the topology and
allow the IXIA Client to reach the IXIA Server. Let’s take a moment
and review an example traffic flow sourced at the IXIA Client and
destined to the IXIA Server as depicted in Figure 2.14.

Figure 2.14 Traffic Flow from IXIA Client to IXIA Server

The traffic in Figure 2.14 flows from the left to the right. Let’s walk
through the entire process to fully understand how each device routes
the packet:
1. IXIA Client generates a packet with a source of 192.168.1.100 and
a destination of 10.7.7.2.
2. IXIA Client forwards the packet to its default gateway 192.168.1.1.
3. EX8200 receives the packet.
4. EX8200 performs a route lookup for 10.7.7.2 and matches its 0/0
route.
5. EX8200 chooses one of the four next hops for 0/0 and in this
example forwards the packet to SRX-1 via the next hop 10.2.0.11.
6. SRX-1 receives the packet (this example will skip the security
processing for now).
7. SRX-1 performs a route lookup for 10.7.7.2 and matches its 0/0
route.
8. SRX-1 forwards the packet to the MX240 via the next hop 10.3.0.1.
9. MX240 receives the packet.
10. MX240 performs a route lookup for 10.7.7.2.
11. MX240 has a direct interface with 10.7.7.1/24 and forwards it out
of this interface.
12. IXIA Server receives the packet.
Of course, this is only half of the picture. What’s shown in Figure 2.14
is only the egress traffic destined to the IXIA Server. What’s missing is
the return traffic that’s destined to the IXIA Client.

Return Traffic
The previous sections in this chapter provided a glimpse into how the
IXIA Client is able to send traffic out to the IXIA Server. Let’s go back
to the root of the challenge, and focus on how to demultiplex a large
stream of traffic statefully and distribute flows uniformly to multiple
SRX Series firewalls.

Figure 2.15 Demultiplexing and Multiplexing Traffic

At what point in the architecture is the traffic subject to demultiplexing? Figure 2.15 should serve as a reminder. In this use case, the IXIA
Client is the device generating Traffic0 and the demux is the EX8200.
Output0 through Output3 represent flows destined to SRX-1 through
SRX-4; in other words Traffic0 is broken down on a per flow basis and
pushed through one of the firewalls based on the hash algorithm.
Finally the MX240 serves as the mux; as the traffic flows through the
four Juniper SRX firewalls, it ultimately converges onto the MX240
and is then routed to the IXIA Server.
Using the EX8200 to demux the traffic on a per flow basis is very easy.
Simply create a policy to match all traffic and apply the load-balance
per-packet option, then reference the policy in the forwarding-table
export section under routing-options. The EX8200 will examine both
the Ethernet frame and TCP / UDP segments in order to determine
which next hop to choose. Since all packets within a given TCP or UDP flow carry the exact same values, ECMP is guaranteed to demux the traffic on a per-flow basis.
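As a sketch, the EX8200 configuration just described might look like the following (the policy name ECMP is an assumption; any name works):

```
policy-options {
    policy-statement ECMP {
        then {
            /* despite the name, modern Junos load balances per flow */
            load-balance per-packet;
        }
    }
}
routing-options {
    forwarding-table {
        /* apply the policy to the FIB so multiple next hops are installed */
        export ECMP;
    }
}
```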
The interesting challenge is how to make the demux algorithm invertible for both egress and return traffic. How can the architecture
guarantee that a flow can be forwarded to the same firewall from
which it originally came?

Network Address Translation


The answer is very simple and elegant: source network address translation (SNAT). This is a common firewall service that is applied to egress
Internet traffic and works perfectly to ensure that return traffic will
always go back to the firewall from which it came.
Each SRX is assigned a unique /24 network to use as a SNAT pool. As
traffic flows through each SRX and is sourced from 192.168/16, the
firewall will apply a SNAT policy.
Let’s take a look at the SRX-1 SNAT configuration:
security {
nat {
source {
pool test-snat {
address {
20.20.31.0/24;
}
}
rule-set rs1 {
from zone TRUST;
to zone UNTRUST;
rule r1 {
match {

source-address 192.168.0.0/16;
}
then {
source-nat {
pool {
test-snat;
}
}
}
}
}
}
}
policies {
from-zone TRUST to-zone UNTRUST {
policy permit-all {
match {
source-address any;
destination-address any;
application any;
}
then {
permit;
count;
}
}
}
}
}

When applying SNAT to traffic, there are several items that need to be
configured. To permit the traffic, a policy needs to be created to allow
traffic from the TRUST zone to the UNTRUST zone. The next step is
to define the SNAT pool. In this example SRX-1 creates a pool called
test-snat, which contains the address 20.20.31/24. This is true for the
other firewalls SRX-2 through SRX-4 having addresses 20.20.32/24
through 20.20.34/24 respectively. The last step is to create a NAT rule
to match traffic from the TRUST zone going to the UNTRUST zone
with a source address matching 192.168/16.
The last piece of routing information needed is some static routes on
the MX240. Since SRX-1 through SRX-4 will be using a SNAT pool of
20.20.31/24 through 20.20.34/24, respectively, the MX240 needs to
know how to reach these prefixes:
routing-options {
static {
route 20.20.31.0/24 next-hop 10.3.0.11;
route 20.20.32.0/24 next-hop 10.3.0.12;
route 20.20.33.0/24 next-hop 10.3.0.13;
route 20.20.34.0/24 next-hop 10.3.0.14;
}
}

Armed with this new information, let’s walk through the entire flow of
egress and ingress traffic from IXIA Client to IXIA Server as shown in
Figure 2.16.

Figure 2.16 Egress and Ingress Traffic

There’s no need to rehash steps 1 through 6, as they were already reviewed in the previous section, so let’s start with step 7.
7. SRX accepts the traffic with policy permit-all and creates a session
for this flow in the session table. The SNAT rule-set rs1 matches the
traffic as it’s sourced from the TRUST zone and destined to the
UNTRUST zone; in addition the rule r1 matches the source address as
it’s part of the 192.168/16 network. The packet is then subject to
SNAT and the source address is changed from 192.168.1.100 to
20.20.31.1. SRX-1 performs a route lookup for 10.7.7.2 and matches
its 0/0 route.
8. SRX-1 forwards the packet to 10.3.255.10.
9. MX240 receives the packet.
10. MX240 performs a route lookup for 10.7.7.2.
11. MX240 has a direct interface with 10.7.7.1/24 and forwards it out
of this interface.
12. IXIA Server receives the packet.

13. IXIA Server processes the packet and responds. IXIA Server
performs a route lookup for 20.20.31.1 and sends the packet out its
default gateway of 10.7.7.1.
14. IXIA Server forwards the packet to the MX240.
15. MX240 receives the packet.
16. MX240 performs a route lookup for 20.20.31.1 and finds a static
route for 20.20.31/24 pointing to 10.3.0.11.
17. MX240 forwards the packet to 10.3.0.11.
18. SRX-1 receives the packet.
19. SRX-1 identifies the packet as part of an existing session as
explained in step 7. SRX-1 also identifies the packet as part of a SNAT
rule. SRX-1 reverts the destination address to the original source
address of 192.168.1.100 as described in step 7. SRX-1 performs a
route lookup for 192.168.1.100 and sees an IS-IS route for
192.168.1/24 pointing to the EX8200.
20. SRX-1 forwards the packet to 10.3.255.10.
21. EX8200 receives the packet.
22. EX8200 performs a route lookup for 192.168.1.100 and finds a
directly connected network of 192.168.1/24.
23. EX8200 forwards the packet to IXIA Client.
24. IXIA Client receives the packet.
This may seem like a lot of steps, but this example has been exaggerated to illustrate each function on each device through the packet’s
entire life.
Using a combination of ECMP for egress demux and SNAT to ensure
an invertible demux on the return traffic is the key to this architecture.
It’s all about breaking down a complex problem into simple building
blocks.

NOTE The author realizes that a destination address of 10.7.7/24 isn’t routable on the Internet, and that an address range outside of RFC 1918 would have been more realistic. Pedantry aside, using a private address doesn’t impact the functionality of the case study.

Summary
From cabling and connecting the devices to setting up VLANs and
configuring routing protocols, this chapter has explained in detail how
the test bed is configured. Sometimes it’s easy to get so caught up in the
details you can’t see the forest for the trees. Let’s take a step back from
the implementation details and review the goals we set at the beginning
of the chapter.
When the amount of traffic exceeds the capacity of a single firewall,
one must break down the traffic into smaller chunks that are able to be
serviced by a single firewall. The traffic is then sent to its final destination. The next step is to handle the return traffic in the same fashion as
the original egress traffic. Because the egress traffic is subject to a
unique SNAT, the return traffic is always guaranteed to go back to the
firewall from which it came, as shown in Figure 2.17.
There are two flows in Figure 2.17: flow0 and flow1. In this example,
flow0 is mapped to SRX-1 via the FIB load balancing by the EX8200.
Because the SRX-1 has a unique SNAT pool, the return traffic is
guaranteed to be routed back to SRX-1. The same is true for flow1,
except that in this example, the EX8200 FIB has load balanced it to
SRX-4 – thus it’s subject to the unique SNAT pool on SRX-4. The
return traffic for flow1 is then destined back to SRX-4, making the
entire conversation stateful.
ECMP and SNAT are a powerful combination. Such simple tools can
be used to solve complex problems. However ECMP isn’t as simple as
it appears. The next chapter takes a deep dive into ECMP and focuses
on failure conditions.

Figure 2.17 Demux / Mux Applied to the Topology


Chapter 3

Equal Cost Multi-Path (ECMP) Routing

How Does ECMP Work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

ECMP Drawbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

ECMP Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

ECMP Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

There are two components in ECMP: the RIB and FIB. Both the RIB
and FIB work in various combinations to provide different types of
behavior in ECMP. As a quick refresher, let’s review the difference
between the RIB and FIB.

How Does ECMP Work?


The RIB lives in the control plane and is maintained by the Junos
kernel. The sole purpose of the RIB is providing information so a
subset of the route table can be pushed down into the FIB. For example, consider the following route table:
jnpr@EX8208-SW1-RE0> show route 0/0 exact 

inet.0: 16 destinations, 19 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[BGP/170] 3d 04:52:49, localpref 100, from 10.3.255.12
                      AS path: 14824 I
                    > to 10.2.0.11 via vlan.200
                      to 10.2.0.12 via vlan.200
                      to 10.2.0.13 via vlan.200
                      to 10.2.0.14 via vlan.200
                    [BGP/170] 01:01:08, localpref 100, from 10.3.255.11
                      AS path: 14824 I
                    > to 10.2.0.11 via vlan.200
                    [BGP/170] 3d 04:52:44, localpref 100, from 10.3.255.13
                      AS path: 14824 I
                    > to 10.2.0.13 via vlan.200
                    [BGP/170] 10:23:54, localpref 100, from 10.3.255.14
                      AS path: 14824 I
                    > to 10.2.0.14 via vlan.200
                    [Static/200] 3d 04:43:27
                    > to 10.2.0.11 via vlan.200

The prefix 0/0 has five entries in the RIB: there’s an entry for each of
the EBGP neighbors, 10.3.255.11 through 10.3.255.14. The last entry
for 0/0 is a static route with a preference of 200.
The job of the RIB is to determine which of the available destinations is
considered the best. The RIB has to follow a set of rules, such as:
- Match the longest prefix.
- Prefer the lowest preference.
- If an IGP, prefer the lowest metric.
- If BGP, there’s an entire laundry list of items that need to be compared to find the best path.
Once the RIB has identified the best destination for a given prefix, it is
pushed down to the FIB. Thus, the FIB only contains the bare essentials

that are required to forward packets. Let’s review the FIB for the prefix
0/0 and compare it to the RIB:
jnpr@EX8208-SW1-RE0> show route forwarding-table destination 0/0 
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 131078     1
                                                 indr 131074     2
                              0:1f:12:f2:7f:c0   ucst  1332     7 vlan.200
                                                 indr 131076     2
                              0:1f:12:f1:ff:c0   ucst  1334     7 vlan.200
                                                 indr 131075     2
                              0:1f:12:f6:ef:c0   ucst  1338     7 vlan.200
                                                 indr 131070     2
                              0:1f:12:fa:f:c0    ucst  1324     7 vlan.200
default            perm     0                    rjct    36     1
0.0.0.0/32         perm     0                    dscd    34     1

In this example, the FIB for the prefix 0/0 only has a single entry, the one the RIB considered the best destination, but that entry has multiple next
hops. This information is installed into each of the ASICs on the router
or switch’s line cards and is used exclusively for forwarding traffic at
line-rate. Think of the RIB as the brains, and the FIB as the brawn.

Load Balancing
Junos supports two different types of load balancing in the FIB:
per-prefix and per-flow. By default, Junos uses per-prefix load balancing. If a given set of prefixes share the same set of next hops, each
prefix will increment the next hop, until the last next hop in the set is
reached, then it starts from the beginning again as shown in Figure 3.1.

Figure 3.1 Per-Prefix Load Balancing

In Figure 3.1, given that Prefix 1 through Prefix 4 have the destinations
of next-hop 1 through next-hop 2, each prefix will increment next

hops. Here Prefix 1 has a destination of next-hop 1. The next entry, Prefix 2, has the destination of next-hop 2. Since next-hop 2 is the last
available next hop, Prefix 3 will start over again with a destination of
next-hop 1, and so on.
The previous chapter illustrated how to create a policy-statement to
modify how the FIB selects the next hop. By using the load-balance
per-packet, it causes the FIB to install multiple next hops instead of
using the default per-prefix method.

Figure 3.2 Per-Packet Load Balancing

When load-balance per-packet is in effect, the FIB changes the method with which it calculates the next hop. Now, each prefix has two
available next hops instead of one. As traffic passes through the FIB, it
performs a hash function and selects a deterministic next hop based off
the attributes of the given packet. From the perspective of the RIB,
nothing has changed. Prefix 1 through Prefix 4 have always had valid
destinations of next-hop 1 through next-hop 2, but it’s the FIB’s
responsibility when it comes to load balancing traffic across the next
hops for these given prefixes.

DID YOU KNOW? One of the most asked questions about Junos is: Why does Junos have
an option called per-packet if it’s really per-flow? The answer is that
with Juniper’s first router – the M40 – the per-packet option actually load balanced the traffic across multiple next hops on a per-packet basis.
Obviously, this was a problem, as packets tended to arrive out of order
and cause oscillation in the network. It was decided to change this
behavior to be per-flow instead of per-packet. The unfortunate (and
fortunate) part is that the configuration option was left unchanged so
that it wouldn’t break previous customer configurations. Juniper has
had bigger fish to fry since then, but hopefully one day this per-packet
option can be deprecated and replaced with the correctly named
per-flow option instead.

ECMP Drawbacks
ECMP is great at providing uniform traffic distribution across a set of
next hops and has been used with great success in switches and routers
for a very long time. The hash algorithm used by the FIB to determine a
next hop is deterministic per packet, so it’s very good at providing
per-flow distribution. However, the Achilles’ heel of ECMP is that
when the attributes that feed the hash function change, the output of
the algorithm changes as well. This causes a change in how next hops
are calculated and can vary from a previous calculation. Consider
Figure 3.3.

Figure 3.3 Vanilla Hash Function

In Figure 3.3 there are four packets that represent four different flows
as well as four next hops. Let’s assume that each packet is mapped to a
specific next hop, given the current state of the hash function.
Let’s also assume that each next hop represents a different Juniper SRX
firewall. Now imagine that something occurred to cause the link to go
down; this could be the result of a maintenance window or a failure. Since the link is down, it’s no longer a valid next hop with regard to the hash function, as shown in Figure 3.4.
Now that next-hop 3 is no longer available, the attributes that feed the
hash algorithm are different, and the next hop calculation has changed.
Assume that the same packets arrive at the hash algorithm again.
Previously, when there were four available next hops, packet 1 was
mapped to next-hop 2, but now that the hash is being calculated
differently due to the change in next hops, the new next hop for packet
1 is next-hop 4.

Figure 3.4 Vanilla Hash Function with a Next Hop Removed

This may appear to make sense visually, but how does this work
mathematically? Let’s take a look at a very simple hash algorithm:
hash = mod(p,n)
Here p identifies the packet (or flow), and n equals the number of next hops.
To illustrate this formula, let’s graph ten packets using the same
formula, but three different numbers of next hops:
In Figure 3.5 each shape represents the output of the hash function
given a different bucket size. In this example, the bucket size is analogous to the number of next hops. The X axis represents the packet
number and the Y axis represents the bucket number. For example,
using the function mod(5, 3), packet 5 would be placed into bucket 2,
however, when using the function mod(5, 7), packet 5 would be placed
into bucket 5. The output is listed in Table 3.1.
Table 3.1 Output of: hash=mod(p,n) where n = 3, 4, and 7

packet   1  2  3  4  5  6  7  8  9  10
mod(3)   1  2  0  1  2  0  1  2  0  1
mod(4)   1  2  3  0  1  2  3  0  1  2
mod(7)   1  2  3  4  5  6  0  1  2  3
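The remapping effect captured in Table 3.1 can be sketched in a few lines of code, using the toy mod hash from the text (not the hash that Juniper ASICs actually implement):

```python
# Toy ECMP hash from the text: hash = mod(p, n), where p identifies the
# packet/flow and n is the number of available next hops. This is NOT
# Juniper's real hash; it only shows why a change in n remaps flows.
def next_hop(flow_key: int, num_next_hops: int) -> int:
    return flow_key % num_next_hops

flows = range(1, 11)  # the ten packets from Table 3.1

with_four = [next_hop(f, 4) for f in flows]   # four firewalls available
with_three = [next_hop(f, 3) for f in flows]  # one firewall removed

# Flows whose next hop changed would arrive mid-stream at a firewall
# with no matching session entry and be discarded.
remapped = sum(1 for a, b in zip(with_four, with_three) if a != b)
print(remapped)  # 8 of the 10 flows land on a different next hop
```

Eight of ten flows move, which matches the intuition that removing one of n next hops disturbs far more than 1/n of the traffic.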

The end result is that any topology change in the network can impact
how traffic flows through the FIB. Since firewalls are a stateful device
and keep track of traffic flows, imagine if there were four firewalls,
each with its own session table, then one firewall had a failure, changing the number of next hops to three. Since the hash will be calculated

Figure 3.5 Illustration of: hash=mod(packet,n) where n = 3, 4, and 7

differently, it’s possible that previous flows going to SRX-1 can now be
mapped to SRX-3. What happens in this case is that SRX-3 will receive
the flow midstream, look in its session table and see that there’s no
existing session for the incoming packet, and discard the packet.

NOTE The mod function was used here to provide a simple illustration of
hashing and how the number of next hops can change the output of the
function. Please rest assured that Juniper ASICs use a much more advanced hashing algorithm. ; )

In summary, ECMP works very well as long as there are no topology changes in the network. When the number of next hops changes, the
output of the FIB hash algorithm will change. What this means is that
when:
- a new firewall is added
- a firewall is removed

- a firewall is rebooted
- a link is disabled for maintenance
... or any other action that would cause the number of firewall next
hops to change on the EX8200 is undertaken, it’s extremely probable
that a subset of flows will be mapped to a new next hop. The end result
causes a subset of the traffic to be discarded by the new firewall, as it
will not accept traffic without first having a session in its session table.
ECMP is only suitable for traffic that is short-lived and is able to
recover from errors by initializing new connections.

ECMP Testing
Armed with the knowledge of how the FIB hashing algorithm can
impact traffic, the next logical step is to test how a topology change
impacts real traffic. The goal here is to generate stateful traffic sourced
from an IXIA Client destined to the IXIA Server and measure the
impact of a FIB change.
It’s expected that during a topology change any existing concurrent
connections would be dropped. This is because existing flows would be
remapped to different next hops and intercepted by different firewalls
that have no knowledge of the ingress packet.

IXIA Test Configuration


For the ECMP testing, the values in Table 3.2 will be used.

Table 3.2 IXIA Test Configuration for ECMP

Key                        Value
Test Duration              300 seconds (5 minutes)
Source Address Range       192.168.1.2 to 192.168.1.252
Destination Address Range  10.7.7.2 to 10.7.7.3
Protocol                   TCP
Application                HTTP
Packet Sizes               IMIX
CPS                        50,000

Test Objective
When the test was started the CPS ramped up to 50,000 within about
10 seconds. Each Juniper SRX firewall received about 12,500 CPS as the traffic was distributed uniformly across all four next hops.
The EX8200 was configured with ECMP and per-packet (read:
per-flow) FIB load balancing.
Around 160 seconds into the test the author disabled the interface ae0
on SRX-1, causing the FIB on the EX8200 to go from four next hops
down to three. The CPS instantly dropped down to around 38,000 and
quickly ramped back up to 50,000 within 10 seconds, as shown in
Figure 3.6.

Figure 3.6 ECMP Testing Connection Rate

What was the real impact, though? Let’s take a look at the active
number of HTTP sessions shown in Figure 3.7.

Figure 3.7 HTTP Transactions That are Active for ECMP Test

Unfortunately, the number of active HTTP sessions in Figure 3.7 is a bit too spiky. There are anywhere from 1 to 60 active HTTP sessions at
any given time with no apparent pattern. One of the test configurations
used is a traffic pattern of IMIX, which would help explain the spikes
in the number of active HTTP requests, so let’s look at the number of
active sessions at the time the failure occurred. There were approximately 12 active HTTP connections at the time of failure.
Let’s next take a look at the cumulative totals for HTTP to see if there
are additional clues to the true impact of the FIB change shown in
Figure 3.8.

Figure 3.8 Cumulative HTTP Totals for ECMP Testing

Because the four Juniper SRX firewalls have been configured to discard
traffic that isn’t already in the session table, what we can expect to see
is a flurry of TCP Retries from the IXIA Client during the EX8200 FIB
change.
As predicted, there are a large number of TCP Retries on the IXIA
Client. Because the IXIA Client wasn’t receiving a TCP ACK from the
IXIA Server, it kept attempting to retry until the socket was timed out.
The real impact during the exact moment of the EX8200 FIB change
was 19 sockets. This can be verified by looking at the number of IXIA
Client TCP RST packets sent. The IXIA Client never received TCP
ACKs from the IXIA Server, so it eventually timed out, and instead of
closing the socket with a TCP FIN, it sent a TCP RST packet. Notice
that the graph shows about 12 active HTTP connections at the time of
failure, but the report indicated that 19 active HTTP connections were
dropped. The graph isn’t 100% accurate because it averages the data,
but what can be concluded is that there were at least 12 active HTTP
sessions across all four firewalls according to the graph, and the
detailed report confirms that exactly 19 were impacted because of the

failure. This clearly shows that all firewalls are impacted equally
during a failure scenario using ECMP, because there’s only a single
failure domain in this architecture.

Figure 3.9 TCP Connection Totals for Client and Server – ECMP Testing

Over the five minute duration of the IXIA test, nearly 15,000,000
packets were sent and received, as shown in Figure 3.9. Drilling down
into the details of the number of TCP SYN packets sent and received,
we can see there is a delta of 274 TCP SYN packets missing. Given that
the IXIA Client was sending out TCP SYN packets at a rate of 50,000
per second, it’s calculated that during the EX8200 FIB change roughly 5ms of traffic was dropped on the disabled interface going to SRX-1.
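That figure can be sanity-checked with simple arithmetic; the 274 missing SYNs and the 50,000 CPS rate are the numbers reported above:

```python
# 274 TCP SYNs went missing while the IXIA Client generated 50,000
# connections per second; the loss window is the ratio of the two.
missing_syns = 274
syn_rate_per_second = 50_000

loss_window_ms = missing_syns / syn_rate_per_second * 1000
print(round(loss_window_ms, 2))  # -> 5.48, roughly the 5ms cited
```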

NOTE The EX8200 FIB change caused 5ms worth of traffic to be dropped in
this specific topology and test configuration. This number is specific to
this case study and will differ in your network.

ECMP Conclusions
Using ECMP as a demux is an efficient method to split traffic into
chunks based off TCP and UDP flows. The hashing algorithm allows
the FIB to forward traffic to multiple next hops at line-rate without the
overhead of a state table. The caveat is that when the number of FIB
next hops changes for a given prefix, so does the algorithm to calculate
the next hop.

During an EX8200 FIB change the real impact on stateful traffic – according to the data above – is as follows:
- As the next hop for SRX-1 was being removed from the EX8200 FIB, there was 5ms of traffic loss for this particular next hop.
- Nineteen TCP sockets were timed out and closed.
Based off the test results and behavior during a failure scenario, it’s
recommended that ECMP be used with short lived sessions that are
able to reestablish a new session in the event of a failure.
Chapter 4

Filter-Based Forwarding

A Different Approach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Scaling with FBF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

FBF Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

FBF Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

If the effect of a FIB change using ECMP causes too much of an impact
in your network, the alternative is to reduce the size of the failure
domain; this can be accomplished with Filter-Based Forwarding (FBF).
FBF is a method to match traffic with firewall rules and push the traffic
into a different routing instance. This is similar to policy-based routing
(PBR), but FBF offers a huge advantage because it can direct traffic into an entire routing instance and not just to a single next hop.

A Different Approach
Traditional PBR will match traffic and simply change the next hop.
This works well enough until the said next hop doesn’t exist due to a
failure in the network. PBR is a very rigid method to move a subset of
traffic to a specific next hop. It provides nothing more and nothing less,
as you see in Figure 4.1.

Figure 4.1 Policy-Based Routing

PBR is too rigid to handle failure scenarios. Figure 4.1 illustrates that a
prefix is simply mapped to a next hop, and if that next hop were to
become unreachable, the PBR for Packet 1 would simply discard the
traffic.
FBF has the advantage of moving the traffic into a completely different
routing instance. Imagine that the routing instance SRX.inet.0, shown
in Figure 4.2, is running a dynamic routing protocol or has multiple
static default routes and is able to recover from a simple next hop
failure, whereas traditional PBR could not.

Figure 4.2 Filter-Based Forwarding

FBF is able to handle failure gracefully. Figure 4.2 illustrates that Packet 1 is moved into the routing instance SRX.inet.0. Within this
routing instance are two static routes: each static route has its own
unique next hop. In this example, if next-hop 1 became unavailable,

BFD could detect the loss of forwarding, and any traffic entering the
routing instance SRX.inet.0 would be forwarded to next-hop 2
instead.
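A static default route inside such a routing instance can attach BFD directly to its primary qualified next hop. A minimal sketch, with illustrative timer values and addresses borrowed from the test bed:

```
routing-options {
    static {
        route 0.0.0.0/0 {
            qualified-next-hop 10.2.0.11 {
                /* BFD detects loss of forwarding on the primary path */
                bfd-liveness-detection {
                    minimum-interval 300;   /* milliseconds; illustrative */
                    multiplier 3;
                }
            }
            qualified-next-hop 10.2.0.12 {
                metric 6;                   /* floating backup next hop */
            }
        }
    }
}
```

When BFD declares 10.2.0.11 down, the higher-metric qualified next hop takes over without waiting for an interface or protocol timeout.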

Scaling with FBF


In the previous ECMP testing, it was illustrated that any change in the
number of next hops could cause temporary traffic loss. This is
because each next hop was mapped to a unique firewall. A change in
the FIB would cause the flows to be shifted across different next hops.
If a firewall received a packet not in the session table, it would immediately drop it.
An effective method to reduce the impact of a FIB change is to use FBF
to specify which next hop traffic should use. Let’s assume that the
source traffic was coming from a single /24 and in keeping with the
original test bed there are four Juniper SRX firewalls. One alternative
is to create four terms in a firewall filter that would match on the
source address of the traffic and map each /26 to a different firewall as
shown in Figure 4.3.

Figure 4.3 Multiple Maintenance Domains with Filter-Based Forwarding

In Figure 4.3, the first term would match for 192.168.1/26 and move
the traffic into the SRX-1.inet.0 routing instance and continue all the
way through 192.168.1.192/26, which would move the traffic into the
SRX-4.inet.0 routing instance.
This method increases the number of failure domains from one to four,
which results in containing failures to where they happened. In Figure
4.3 each SRX represents a single failure domain for a total of four. The
more failure domains, the better, as other parts of the network can
operate without being impacted.
For example, if SRX-1 had a failure, the traffic being matched in Term
1 would continue to be mapped to the SRX-1.inet.0 routing instance,

but a floating default route would intercept the traffic and point it to
SRX-2. During such a failure scenario, traffic being matched by Term
2, Term 3, and Term 4 would be unaffected by the failure of SRX-1.
As shown in Figure 4.4, the decision point revolves around stability versus performance. The more next hops there are in ECMP, the better the performance and the more uniform the distribution; the fewer next hops there are in ECMP, the better the stability during a failure scenario.

Figure 4.4 Decision Between Stability and Performance

The final conclusion, and recommendation, is to evaluate the type of traffic being inspected. If the business requires more performance and
is able to recover from a failure scenario, then a pure ECMP method
would make the most sense. On the flip side, if the business requirements mandate high availability during a failure scenario, then a pure
FBF approach is recommended.
When using firewall filters to match traffic and move packets into a
different routing instance, it’s also possible to create a hybrid model,
like that shown in Figure 4.5.

Figure 4.5 Hybrid FBF and ECMP

Because each routing instance has its own instance of a FIB, it’s
possible to implement an architecture where both performance and
stability can co-exist. In Figure 4.5, both Term 1 and Term 2 use
ECMP to both SRX-1 and SRX-2. This offers more uniform distribu-
tion and performance, but the caveat is that they share the same failure

domain. Term 3 and Term 4, by contrast, are mapped to their own firewalls, SRX-3 and SRX-4 respectively, providing more stability during a failure scenario; the caveat is that the distribution isn’t as uniform as ECMP.

FBF Configuration
The configuration of FBF is straightforward. The only difficulty is
deciding how to break up the traffic into different failure domains and
how to respond during a failure.
In the test bed, the IXIA Client is configured to use source addresses of
192.168.1.2 through 192.168.1.252. The most logical way to segment
this traffic is by matching on the four /26s within 192.168.1/24. Let’s
take a look at such a firewall filter:
firewall {
    family inet {
        filter distribute-default {
            term SRX-1 {
                from {
                    source-address {
                        192.168.1.0/26;
                    }
                }
                then {
                    routing-instance SRX-1;
                }
            }
            term SRX-2 {
                from {
                    source-address {
                        192.168.1.64/26;
                    }
                }
                then {
                    routing-instance SRX-2;
                }
            }
            term SRX-3 {
                from {
                    source-address {
                        192.168.1.128/26;
                    }
                }
                then {
                    routing-instance SRX-3;
                }
            }
            term SRX-4 {
                from {
                    source-address {
                        192.168.1.192/26;
                    }
                }
                then {
                    routing-instance SRX-4;
                }
            }
        }
    }
}
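As a quick sanity check of these /26 boundaries, here's a small sketch (mine, not from the book, using Python's standard ipaddress module) that shows which term, and therefore which routing instance, a given client source address would hit:

```python
import ipaddress

# The four /26 subnets from the filter terms above, in order; each maps
# to the routing instance named after the firewall carrying its traffic.
TERMS = {
    "SRX-1": ipaddress.ip_network("192.168.1.0/26"),
    "SRX-2": ipaddress.ip_network("192.168.1.64/26"),
    "SRX-3": ipaddress.ip_network("192.168.1.128/26"),
    "SRX-4": ipaddress.ip_network("192.168.1.192/26"),
}

def routing_instance(source):
    """Return the routing instance a source address is pushed into,
    or None when no term matches (the filter's implicit discard)."""
    addr = ipaddress.ip_address(source)
    for instance, subnet in TERMS.items():
        if addr in subnet:
            return instance
    return None

print(routing_instance("192.168.1.2"))    # SRX-1 (first IXIA client address)
print(routing_instance("192.168.1.67"))   # SRX-2
print(routing_instance("192.168.1.252"))  # SRX-4 (last IXIA client address)
```

Note that a source address outside 192.168.1.0/24 falls through all four terms and hits the filter's implicit discard.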

This firewall filter effectively breaks out each /26 network into its own
routing instance. Breaking out the traffic into different routing instanc-
es is easy enough, but how do you create a routing instance and define
the next hops?
The first step is to create the routing instances for each Juniper SRX in
the topology: SRX-1, SRX-2, SRX-3, and SRX-4.
routing-instances {
    SRX-1 {
        instance-type virtual-router;
        routing-options {
            static {
                route 0.0.0.0/0 {
                    qualified-next-hop 10.2.0.11;
                    qualified-next-hop 10.2.0.12 {
                        metric 6;
                    }
                }
            }
        }
    }
}

This routing instance isn't enough, however, because the static route
0/0 has two next hops, 10.2.0.11 and 10.2.0.12, which are reachable
via vlan.200. The problem is that vlan.200 isn't present inside the
routing table SRX-1.inet.0. In order for the vlan.200 interface route
to be present in both the master routing table inet.0 and SRX-1.inet.0,
a RIB group needs to be created:
routing-options {
    interface-routes {
        rib-group inet SRX;
    }
    rib-groups {
        SRX {
            import-rib [ inet.0 SRX-1.inet.0 SRX-2.inet.0 SRX-3.inet.0 SRX-4.inet.0 ];
        }
    }
}

The RIB group SRX contains SRX-1.inet.0 through SRX-4.inet.0. To
copy the interface routes into each routing instance, the interface-
routes command is used and references the SRX RIB group. Now each
of the routing instances contains a copy of the interface routes from
inet.0.
Creating a default route in the SRX-1.inet.0 routing instance will catch
all traffic and force it through a specific next hop. By default, a static
route has a preference of 5. The second qualified-next-hop of 10.2.0.12
acts as a backup default route in case of a failure. This can be verified
by taking a look at the RIB and FIB:
jnpr@EX8208-SW1-RE0> show route table SRX-1.inet.0 0/0 exact

SRX-1.inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0          *[Static/5] 06:41:39
                    > to 10.2.0.11 via vlan.200
                    [Static/6] 06:41:39
                      to 10.2.0.12 via vlan.200

The next step is to ensure that failover is sub-second. Recall that the
test bed topology has a simple Layer 2 switch between the EX8200 and
the Juniper SRX firewalls. If there were a link failure on SRX-1, it
wouldn't be seen by the EX8200. Another method is required to detect
a data plane failure. Chapter 2 introduced BFD as a method to detect
data plane failures on top of the IS-IS routing protocol. Since BFD is
agnostic to its client, Junos also supports using BFD with static routes.
Within each routing instance, BFD is configured for each next hop to
provide sub-second failover:
routing-instances {
    SRX-1 {
        instance-type virtual-router;
        routing-options {
            static {
                route 0.0.0.0/0 {
                    qualified-next-hop 10.2.0.11 {
                        bfd-liveness-detection {
                            minimum-interval 300;
                            multiplier 3;
                        }
                    }
                    qualified-next-hop 10.2.0.12 {
                        metric 6;
                        bfd-liveness-detection {
                            minimum-interval 300;
                            multiplier 3;
                        }
                    }
                }
            }
        }
    }
}
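As a side note (my arithmetic, not the book's), the worst-case detection time these timers imply is the negotiated interval multiplied by the multiplier, which is what keeps failover sub-second:

```python
# BFD declares a neighbor down after `multiplier` consecutive missed
# hellos, so worst-case detection time = interval * multiplier.
minimum_interval_ms = 300
multiplier = 3

detect_time_ms = minimum_interval_ms * multiplier
print(detect_time_ms)          # 900 ms
assert detect_time_ms < 1000   # comfortably sub-second
```

This 900 ms figure matches the 0.900 Detect Time shown in the show bfd session output.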

With both static routes riding on top of BFD, let's verify with show bfd
session extensive that BFD is up and lists the static route as a client:

jnpr@EX8208-SW1-RE0> show bfd session extensive 
                                                  Detect   Transmit
Address                  State     Interface      Time     Interval  Multiplier
10.2.0.11                Up        vlan.200       0.900     0.300        3   
 Client Static, TX interval 0.300, RX interval 0.300
 Client ISIS L2, TX interval 0.300, RX interval 0.300
 Session up time 01:01:54, previous down time 05:23:44
 Local diagnostic NbrSignal, remote diagnostic None
 Remote state Up, version 1‑ Replicated 
 Min async interval 0.300, min slow interval 1.000
 Adaptive async TX interval 0.300, RX interval 0.300
 Local min TX interval 0.300, minimum RX interval 0.300, multiplier 3
 Remote min TX interval 0.300, min RX interval 0.300, multiplier 3
 Local discriminator 4, remote discriminator 9
 Echo mode disabled/inactive

This verifies that BFD is up to 10.2.0.11 and has two clients: Static
and IS-IS. In the event of a failure, BFD will signal both IS-IS and
the static route that the next hop is no longer available. This will result
in the SRX-1.inet.0 routing instance beginning to use the backup
default route pointing towards SRX-2.
By combining FBF, routing instances, qualified next hops, and BFD, it’s
possible to create a very robust architecture that’s able to effectively
split up traffic across multiple next hops and quickly isolate and
recover from failures.
The alternative architecture is now coming full circle. Figure 4.6
illustrates, at a high level, how ingress packets are evaluated by a
firewall filter and moved into the appropriate routing instance.

Figure 4.6 Filter-Based Forwarding and Bidirectional Forwarding Detection

You can see that once the packet enters the new routing instance it’s
under the control of the RIB and FIB of that particular routing instance.
Each routing instance is mapped to two different firewalls. For example,
the routing instance SRX-1.inet.0 is mapped to both firewalls SRX-1
and SRX-2. In addition to being mapped to both firewalls, each routing
instance provides two default routes with different preferences. Each
routing instance has a preferred firewall as the default route for all
traffic. For example, all traffic entering routing instance SRX-1.inet.0
would use the next hop for SRX-1. In the event of a failure that caused
SRX-1 to become unreachable, BFD would detect the forwarding error
and signal the static route for SRX-1 to be removed, thus leaving only
the backup default route for SRX-2 available.
Let’s take a closer look at the life of a packet when flowing through this
new architecture, as shown in Figure 4.7.

Figure 4.7 Flow Chart of a Packet using FBF and Primary / Secondary Default Routes

Figure 4.7 begins with the packet on the top left. It will be subject to a
firewall filter with four terms. Each term looks at the source address to
see if it matches a specific /26; if there’s no match it simply discards the
packet. Let’s assume that the source address of the packet is
192.168.1.67 with a destination address of 10.7.7.2. The second term
in the firewall filter matches this packet and pushes it into the
SRX-2.inet.0 routing instance. Once the packet is inside of the routing
instance it will need to choose a route. Inside of each routing instance
are two default routes. The first default route has a default preference
of 5, while the second default route has a preference of 6. Using this
method always guarantees that traffic will prefer the first default route
as it has a lower preference. Also keep in mind that each default route
is a client to BFD and is monitoring each of the next hops. Let’s assume
there was a problem with SRX-1. BFD would detect the loss of hellos
and declare that the next hop to SRX-1 is down. The first default route
would be removed from the SRX-1.inet.0 route table and the only
remaining default route left would be pointing to SRX-2. Since the first
default route has been removed, the packet takes the second default
route with a preference of 6 and is forwarded to SRX-2.
The firewall filter, routing instances, and default routes have been
adjusted so that the traffic is split evenly across the 192.168.1/24
network into four different networks on the /26 boundary. Each /26
network has its own routing instance, default route, and firewall. This
configuration is a perfect example of tipping the scale towards stability,
as ECMP is not used.
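The primary/backup behavior described above can be sketched as a toy model (mine, not Junos internals): the route with the lowest preference wins, and when BFD withdraws the primary, the backup is all that remains:

```python
# Toy model of the SRX-1.inet.0 default routes: (preference, next hop).
# Preference 5 is the Junos static-route default; 6 marks the backup.
routes = [(5, "10.2.0.11"), (6, "10.2.0.12")]  # primary SRX-1, backup SRX-2

def active_next_hop(candidates):
    """Lowest preference wins, mirroring Junos route selection."""
    return min(candidates)[1]

print(active_next_hop(routes))  # 10.2.0.11 (SRX-1)

# BFD detects the SRX-1 data plane failure, so its static route is withdrawn:
routes_after_failure = [r for r in routes if r[1] != "10.2.0.11"]
print(active_next_hop(routes_after_failure))  # 10.2.0.12 (SRX-2)
```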

FBF Testing
FBF introduces new options in how traffic is mapped to multiple
firewalls. Although FBF can be used to create a hybrid model of
performance and stability, it makes more sense to test the latter in this
chapter.
During a topology change it would be expected that only concurrent
flows in the specific routing instance would be impacted, while other
traffic in unrelated routing instances would continue forwarding traffic
without impact.

IXIA Test Configuration


For the FBF testing, Table 4.1 contains the pertinent values.
Table 4.1 IXIA Test Configuration for FBF

Key                          Value
Test Duration                300 seconds (5 minutes)
Source Address Range         192.168.1.2 to 192.168.1.252
Destination Address Range    10.7.7.2 to 10.7.7.3
Protocol                     TCP
Application                  HTTP
Packet Sizes                 IMIX
CPS                          50,000

Test Objective
When the test was started, the CPS ramped up to 50,000 within about
10 seconds. Each Juniper SRX firewall received about 12,500 CPS as
the traffic was distributed across all four next hops based on each
packet's source address. The EX8200 was configured with FBF and
per-prefix FIB load balancing, which results in only a single next hop
per prefix in the FIB at any given time.
Once again, the author shut down the interface ae0 on SRX-1 at
around 160 seconds. The CPS dropped to about 42,000 for about five
seconds and quickly recovered, as shown in Figure 4.8.

Figure 4.8 FBF TCP Connections Per Second

During this failure scenario it was noted that BFD detected the change
and removed the primary default route from the SRX-1.inet.0 RIB, so
traffic with source addresses in both 192.168.1.0/26 and 192.168.1.64/26
was now mapped to the firewall SRX-2. This meant that SRX-2 was
doing double duty for the remaining duration of the test, since the
interface ae0 on SRX-1 remained disabled.
Let’s review the cumulative report for the FBF in Figure 4.9 to deter-
mine the real impact to the network.
The concurrent HTTP connections that were connected to SRX-1
timed out during the failure scenario. There were 135 IXIA Client TCP
retries because the traffic for these particular sockets was remapped to
SRX-2. SRX-2 had no knowledge of the active sessions on SRX-1 and
simply discarded the packets. Because the IXIA Client wasn't receiving
TCP ACKs, it retried until it timed out the connection. Once the
connection timed out, the IXIA Client sent a TCP RST packet to the
IXIA Server to kill the connection.

Figure 4.9 FBF Cumulative HTTP Totals

In this particular test the routing instance SRX-1.inet.0 happened to
have 12 active HTTP connections when the failure occurred. When
comparing the FBF failure results with ECMP, it's clear that FBF comes
out ahead: ECMP lost 19 HTTP connections while FBF lost only 12.
But simply comparing the numbers isn't enough; you must fully
understand the big picture.
It would seem logical that FBF would show a much lower value, but
keep in mind that the number of active HTTP sessions varies wildly
because the traffic type is defined as IMIX. Let's take a look at the
approximate number of active connections when the failure occurred,
shown in Figure 4.10 and Figure 4.11.

Figure 4.10 Active HTTP Transactions


There were about 42 active HTTP connections when the failure
occurred. Assuming uniform distribution between the four firewalls,
each firewall would have had, on average, 10.5 connections at the time
of the failure. This clearly shows that the failure of SRX-1 resulted in
the failure of only about 25% of the active HTTP connections, whereas
ECMP had a 100% failure of the active HTTP connections.
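The percentages work out as follows (my arithmetic based on the figures above):

```python
active_connections = 42  # approximate active HTTP connections at failure
firewalls = 4

per_firewall = active_connections / firewalls
print(per_firewall)      # 10.5 connections per firewall, on average

# Under FBF, only the failed firewall's share of connections is lost;
# under ECMP, a FIB rehash disturbs every flow.
fbf_loss_pct = 100 * per_firewall / active_connections
print(fbf_loss_pct)      # 25.0
```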

The benefit of using FBF was that the SRX-1 failure scenario was
localized to only the SRX-1.inet.0 routing instance. Traffic flowing
through the other three routing instances wasn’t impacted and contin-
ued to forward traffic normally.

Figure 4.11 FBF TCP Connections


Let’s calculate the outage window during the failure scenario of
SRX-1. The delta between the number of TCP SYN packets sent and
received is 94. Given that the IXIA Client was sending TCP SYN
packets at a rate of 50,000 a second, the result is that the EX8200 only
dropped less than 2ms of traffic in the SRX-1.inet.0 routing instance.
In the previous ECMP test it was discovered there was a 5ms outage;
FBF fared much better than ECMP in this regard.
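The outage math (my calculation from the reported counters, assuming a constant SYN rate):

```python
syn_delta = 94     # TCP SYNs sent minus SYNs received during failover
syn_rate = 50_000  # SYNs per second from the IXIA Client

outage_ms = 1000 * syn_delta / syn_rate
print(outage_ms)   # 1.88 ms -- under the 2 ms cited, vs the 5 ms ECMP outage
assert outage_ms < 2
```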

FBF Testing Conclusions

FBF grants the network operator the ability to create as many failure
domains as there are firewalls. The more failure domains there are in a
network, the more stable it is. The only caveat to using a pure FBF
approach is that the distribution of traffic is a manual process and
tends to be a bit coarse when compared to ECMP.
The observation is that FBF handles failure scenarios much better than
ECMP. In the event of a failure, it’s isolated to the failure domain from
which it came. Other failure domains will not see an impact and will
continue to forward traffic as normal. The last observation is that the
EX8200 FIB outage is smaller when using FBF because the failure
domain is much smaller when compared to ECMP.
Chapter 5

Proof of Concept

Junos FIB

Traffic Distribution

Firewall Clustering

Summary

One of the challenges of creating a hybrid model is the ability to treat
subsets of traffic differently. Sometimes it's desirable to expose a subset
of traffic to ECMP and another subset of traffic to a simple FIB with a
primary and secondary next hop.
Junos FIB
Junos supports the virtualization of the FIB via routing instances. It’s
possible to have the master instance subject traffic to ECMP, while
routing instances subject traffic to a single next hop.
jnpr@EX8208-SW1-RE0> show route forwarding-table destination 0/0 | no-more    
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0                    ulst 131072     1
                                                 indr 131074     2
                              0:1f:12:f2:7f:c0   ucst  1332     8 vlan.200
                                                 indr 131076     2
                              0:1f:12:f1:ff:c0   ucst  1334     6 vlan.200
                                                 indr 131075     2
                              0:1f:12:f6:ef:c0   ucst  1338     7 vlan.200
                                                 indr 131070     2
                              0:1f:12:fa:f:c0    ucst  1324     7 vlan.200
default            perm     0                    rjct    36     1
0.0.0.0/32         perm     0                    dscd    34     1

Routing table: __master.anon__.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            perm     0                    rjct  1285     1
0.0.0.0/32         perm     0                    dscd  1283     1

Routing table: SRX-1.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0 0:1f:12:f2:7f:c0   ucst  1332     8 vlan.200
default            perm     0                    rjct  1372     1
0.0.0.0/32         perm     0                    dscd  1370     1

Routing table: SRX-2.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0 0:1f:12:f1:ff:c0   ucst  1334     8 vlan.200
default            perm     0                    rjct  1381     1
0.0.0.0/32         perm     0                    dscd  1379     1

Routing table: SRX-3.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0 0:1f:12:f6:ef:c0    ucst  1338     7 vlan.200
default            perm     0                    rjct  1390     1
0.0.0.0/32         perm     0                    dscd  1388     1

Routing table: SRX-4.inet
Internet:
Destination        Type RtRef Next hop           Type Index NhRef Netif
default            user     0 0:1f:12:fa:f:c0   ucst  1324     7 vlan.200
default            perm     0                    rjct  1399     1
0.0.0.0/32         perm     0                    dscd  1397     1

The default.inet table represents the master instance and demonstrates
that the FIB entry for the prefix 0/0 contains four next hops. In
comparison, each of the routing instances SRX-1 through SRX-4
contains only a single next hop. It's possible to influence how the FIB
calculates the next hop using a policy-statement applied to the
forwarding-table:
routing-options {
    forwarding-table {
        export lb;
    }
}
policy-options {
    policy-statement lb {
        term SRX-1 {
            from instance SRX-1;
            then accept;
        }
        term SRX-2 {
            from instance SRX-2;
            then accept;
        }
        term SRX-3 {
            from instance SRX-3;
            then accept;
        }
        term SRX-4 {
            from instance SRX-4;
            then accept;
        }
        term master {
            from instance master;
            then {
                load-balance per-packet;
            }
        }
    }
}

The key here is to match on the routing instance and apply a separate
action for each instance. In this example, the routing instances SRX-1
through SRX-4 have an action of then accept. This simply applies the
default FIB policy, which is per-prefix, forcing the FIB to install a
single next hop. The last term in the policy-statement matches the
master routing instance with from instance master; the action for this
term is load-balance per-packet, which forces the FIB to perform
per-flow hashing and allows multiple next hops.
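Per-flow hashing can be sketched as hashing a flow's 5-tuple and taking the result modulo the number of next hops. This is a simplified model of the idea, not the EX8200's actual hash, and the next-hop addresses for SRX-3 and SRX-4 are made up for illustration:

```python
import zlib

# Four equal-cost next hops; 10.2.0.11/.12 appear in the book, the
# other two are assumed for this sketch.
NEXT_HOPS = ["10.2.0.11", "10.2.0.12", "10.2.0.13", "10.2.0.14"]

def pick_next_hop(src, dst, proto, sport, dport):
    """Deterministic per-flow choice: same 5-tuple, same next hop."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]

flow = ("192.168.1.67", "10.7.7.2", "tcp", 40001, 80)
# Every packet of this flow maps to the same firewall, so a stateful
# session isn't split across SRX nodes even with four next hops in play.
assert pick_next_hop(*flow) == pick_next_hop(*flow)
print(pick_next_hop(*flow))
```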

The power of Junos FIB virtualization allows you to create a powerful
and flexible architecture for traffic distribution. There's no reason to
sacrifice performance for stability or vice versa; it's completely possible
to create a hybrid architecture that provides a customized mixture of
both performance and stability.

Traffic Distribution
One of the key components to using the FBF architecture is the
creation and maintenance of a firewall filter that’s used to match traffic
and place the packet into a specific routing instance. The method
demonstrated during the test was to simply match on the packet's
source address and network mask. This approach is very coarse and
assumes that the traffic conforms to variable length subnet masking
(VLSM) and that the networks are sequential.
An alternative method that's better equipped to deal with any range of
source addresses is to match on the actual bits of the source IP address.
The only caveat is that this alternative method requires that the
number of next hops be a power of two (2^n).
If there were four next hops, a good method would be to match on the
last two bits of the source IP address. The four values would be 00, 01,
10, and 11. Junos firewall filters support noncontiguous address
matching. Let’s take a look at how such a firewall filter would be
created:
firewall {
    family ethernet-switching {
        filter count-dist {
            term 1 {
                from {
                    source-address {
                        0.0.0.0/0.0.0.3;
                    }
                }
                then count c1;
            }
            term 2 {
                from {
                    source-address {
                        0.0.0.1/0.0.0.3;
                    }
                }
                then count c2;         
            }
            term 3 {
                from {
                    source-address {
                        0.0.0.2/0.0.0.3;
                    }
                }
                then count c3;
            }
            term 4 {
                from {
                    source-address {
                        0.0.0.3/0.0.0.3;
                    }
                }
                then count c4;
            }
            term else {
                then {
                    discard;
                    log;
                    count c-else;
                }
            }
        }
    }
}

The trick is to use the lesser-known address/prefix-mask method when
specifying a source IP address. This method assumes there are four
next hops, as only two bits are in scope, as shown in Table 5.1.
Table 5.1 Matrix of Addresses, Prefixes, and the Last Two Bits Matched

Address    Prefix     Last Two Bits Matched
0.0.0.0    0.0.0.3    00
0.0.0.1    0.0.0.3    01
0.0.0.2    0.0.0.3    10
0.0.0.3    0.0.0.3    11

The huge benefit to using this method is that it will match any source
address because the first 30 bits are irrelevant. For example,
0.0.0.0/0.0.0.3 would match 10.100.4.4 and 192.168.6.188 because
the last two bits of the IP address end in binary 00. Let’s review the bit
positions of an IP packet:
    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 123  |  Protocol = 1 |        header checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     data      |                                                
   +-+-+-+-+-+-+-+-+
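To see the two-bit match concretely, here's a small sketch (my illustration, not from the book) that masks the last octet the same way 0.0.0.3 does:

```python
def last_two_bits(ip):
    """Value of the last two bits of an IPv4 address (mask 0.0.0.3)."""
    return int(ip.split(".")[3]) & 0b11

# Both examples from the text end in binary 00, so both would hit the
# 0.0.0.0/0.0.0.3 term despite having nothing else in common.
print(last_two_bits("10.100.4.4"))     # 0 (4   = 0b00000100)
print(last_two_bits("192.168.6.188"))  # 0 (188 = 0b10111100)
print(last_two_bits("0.0.0.1"))        # 1 -> the 0.0.0.1/0.0.0.3 term
print(last_two_bits("0.0.0.2"))        # 2 -> the 0.0.0.2/0.0.0.3 term
```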

Using the Junos noncontiguous method of matching allows you to
match any bit in an IP address. The only caveat to this method is that it
requires 2^n next hops, as shown in Table 5.2.
Table 5.2 Matrix of 2^n and Number of Next Hops

n (bits)    Number of Next Hops = 2^n
1           2
2           4
3           8
4           16
5           32

If there were eight next hops, how would the firewall filter be modi-
fied? The last three bits of the IP address must be evaluated. Table 5.3
shows the address and prefix required to match the last three bits of an
address.
Table 5.3 Matrix of Addresses, Prefixes, and the Last Three Bits Matched

Address    Prefix     Last Three Bits Matched
0.0.0.0    0.0.0.7    000
0.0.0.1    0.0.0.7    001
0.0.0.2    0.0.0.7    010
0.0.0.3    0.0.0.7    011
0.0.0.4    0.0.0.7    100
0.0.0.5    0.0.0.7    101
0.0.0.6    0.0.0.7    110
0.0.0.7    0.0.0.7    111
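Generating the address/mask pairs for any n is mechanical. Here's a sketch (mine, under the assumption that the next-hop count is 2^n) of how Table 5.3's rows could be produced:

```python
def fbf_match_terms(n_bits):
    """Return (address, mask) pairs matching the last n bits of the
    final octet -- one filter term per next hop, 2**n next hops total."""
    mask = (1 << n_bits) - 1  # e.g. 3 bits -> 7 -> mask 0.0.0.7
    return [(f"0.0.0.{i}", f"0.0.0.{mask}") for i in range(1 << n_bits)]

for addr, mask in fbf_match_terms(3):
    print(f"{addr}/{mask}")
# 0.0.0.0/0.0.0.7 through 0.0.0.7/0.0.0.7, matching Table 5.3
```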

Keep in mind this alternative method assumes there's adequate diver-
sity in the last octet of the source IP address. If you have very specific
requirements that only provide diversity in the third octet of the IP
address, the firewall filter must be modified to account for this.

NOTE Table 5.4 is a corner case example showing the flexibility of Junos
firewall filters to accept a mask of 0.0.3.0. In nearly all cases it's better
to use the mask of 0.0.0.3.

Table 5.4 Matrix of Addresses, Prefixes, and Bits 22 and 23

Address    Prefix     Bits 22 and 23
0.0.0.0    0.0.3.0    00
0.0.1.0    0.0.3.0    01
0.0.2.0    0.0.3.0    10
0.0.3.0    0.0.3.0    11

Assuming there’s a use case that has more diversity in the third octet of
the source IP address, Table 5.4 will serve as the reference for matching
the last two bits within the scope of the third octet, which would be
bits 22 and 23. In this example, 0.0.0.0/0.0.3.0 would match both
172.16.56.17 and 10.244.244.17, as the bits 22 and 23 are binary 00.
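Checking that claim (my arithmetic): with mask 0.0.3.0 the match is against the last two bits of the third octet, which is bits 22 and 23 of the address:

```python
def third_octet_low_bits(ip):
    """Last two bits of the third octet (mask 0.0.3.0, i.e. bits 22-23)."""
    return int(ip.split(".")[2]) & 0b11

print(third_octet_low_bits("172.16.56.17"))   # 0 (56  = 0b00111000)
print(third_octet_low_bits("10.244.244.17"))  # 0 (244 = 0b11110100)
```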

Firewall Clustering
Another decision point in the performance versus stability question is
the use of Juniper SRX clustering. When two firewalls are configured
to act as a single cluster, the benefit is that if one firewall has a failure,
the other firewall takes over without dropping any current sessions.
The drawback is that designing an architecture that leverages firewall
clusters requires twice the capital investment and decreases perfor-
mance roughly ten percent. For example, a standalone SRX5800 with
ten SPCs is able to provide roughly 47Gbps of firewall throughput
when using IMIX traffic. Firewall clustering would increase the capital
investment to two SRX5800 chassis and 20 SPCs and decrease the
performance to roughly 42Gbps of firewall throughput when IMIX
traffic is used. However, the benefit is that the cluster is able to recover
from a failure without dropping sessions.
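The quoted throughput numbers back up the "roughly ten percent" figure (my arithmetic):

```python
standalone_gbps = 47  # SRX5800 with ten SPCs, IMIX traffic
clustered_gbps = 42   # same hardware doubled and clustered

drop_pct = 100 * (standalone_gbps - clustered_gbps) / standalone_gbps
print(round(drop_pct, 1))  # 10.6 -- roughly the ten percent cited
```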
A good method to provide horizontal scaling with increased stability is
to use FBF combined with firewall clustering. This architecture would
increase the number of failure domains in the FIB so that a failure is
limited to the firewall from which it came. The firewall clustering also
decreases the likelihood of a FIB change, as the number of next hops in
the FIB wouldn’t change if a firewall node were to fail.
When considering the use of firewall clusters, the technical impact is
very small in terms of performance. The real question goes back to the
business. Is the extra investment of capital worth the extra stability? In
many cases the answer will be yes, but in other scenarios where
performance is most important or only a certain level of stability is
required, firewall clustering wouldn’t be necessary.

Summary
Scaling traffic beyond a single Juniper SRX requires careful thought
and consideration. The traffic characteristics need to be analyzed, as
they are the first contributing factors in the firewall performance. Is the
traffic:

- High or low CPS?
- High or low throughput?
- Short-lived or long-lived sessions?
- Does the traffic consume a lot of sessions?
- How many packets per second?

The other limiting factor is what type of firewall services will be
applied to the traffic:

- Firewall cluster?
- Intrusion Detection and Prevention (IDP)?
- IPsec?
- AppSecure?
Once the traffic characteristics have been identified, the next step is to
understand the business requirements of the traffic. Which traffic
requires performance and which traffic requires stability? Once these
two questions have been answered, the technical implementation is
trivial.
Once the baseline configuration and topology have been created, the
firewalls need to be monitored via SNMP to gather data such as
throughput, SPU CPU usage, and other metrics. Collecting this data
and storing it over time is critical. Being able to view the performance
characteristics of the firewalls over time provides the operational staff
with a method to plan for future upgrades. The general rule of thumb
is that when the firewalls reach over 50% SPU usage, it's time to add
more firewalls to the pool for additional capacity. It's also possible
that a particular firewall in the pool could be handling more traffic
than its peers, which requires that the firewall filter be adjusted to
move traffic over to a firewall with more capacity.
The following Appendix contains all the device configurations used in
this proof of concept.
Appendix

Device Configurations

MX240

EX4500-VC

SRX-1

SRX-2

SRX-3

SRX-4

EX8200-ECMP

EX8200-FBF

MX240

chassis {
aggregated-devices {
ethernet {
device-count 1;
}
}
}
interfaces {
xe-0/0/0 {
gigether-options {
802.3ad ae0;
}
}
xe-0/1/0 {
gigether-options {
802.3ad ae0;
}
}
xe-0/2/0 {
encapsulation ethernet-bridge;
unit 0 {
family bridge;
}
}
xe-0/3/0 {
encapsulation ethernet-bridge;
unit 0 {
family bridge;
}
}
ae0 {
aggregated-ether-options {
minimum-links 1;
link-speed 10g;
lacp {
active;
}
}
unit 0 {
family inet {
address 10.3.0.1/24;
}
family iso;
}
}
irb {
unit 100 {
family inet {
address 10.7.7.1/24;
}
Appendix: Device Configurations 89

}
}
lo0 {
unit 0 {
family inet {
address 10.3.255.10/32;
}
family iso {
address 49.0000.0010.0003.0255.0010.00;
}
}
}
}
routing-options {
static {
route 0.0.0.0/0 discard;
route 20.20.31.0/24 next-hop 10.3.0.11;
route 20.20.32.0/24 next-hop 10.3.0.12;
route 20.20.33.0/24 next-hop 10.3.0.13;
route 20.20.34.0/24 next-hop 10.3.0.14;
}
autonomous-system 4567;
}
protocols {
bgp {
group SRX {
type internal;
local-address 10.3.255.10;
export export-default;
multipath;
neighbor 10.3.255.11;
neighbor 10.3.255.12;
neighbor 10.3.255.13;
neighbor 10.3.255.14;
}
group IXIA {
type external;
peer-as 1234;
neighbor 10.7.7.2;
}
}
isis {
level 1 disable;
interface ae0.0 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
level 2 priority 127;
level 1 disable;
}
interface lo0.0 {
passive;
}
90 Day One: Scaling Beyond a Single Juniper SRX in the Data Center

}
}
policy-options {
policy-statement export-default {
term 1 {
from {
protocol static;
route-filter 0.0.0.0/0 exact;
}
then {
next-hop self;
accept;
}
}
term 2 {
then reject;
}
}
bridge-domains {
IXIA {
vlan-id 100;
interface xe-0/2/0.0;
interface xe-0/3/0.0;
routing-interface irb.100;
}
}
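
As a quick sanity check (hostnames and addresses assumed from the configuration above; output omitted and will vary per lab), the MX240's IBGP sessions, the default route it advertises to each SRX, and the static routes toward the per-SRX NAT pools can be verified from operational mode:

user@MX240> show bgp summary
user@MX240> show route advertising-protocol bgp 10.3.255.11
user@MX240> show route 20.20.31.0/24 exact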

EX4500-VC

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-0/0/0 {
ether-options {
802.3ad ae0;
}
}
xe-0/0/1 {
ether-options {
802.3ad ae1;
}
}
xe-0/0/2 {
ether-options {
802.3ad ae2;
}
}
xe-0/0/3 {
ether-options {
802.3ad ae3;
}
}
xe-0/0/4 {
ether-options {
802.3ad ae4;
}
}
xe-0/0/5 {
ether-options {
802.3ad ae5;
}
}
xe-1/0/0 {
ether-options {
802.3ad ae0;
}
}
xe-1/0/1 {
ether-options {
802.3ad ae1;
}
}
xe-1/0/2 {
ether-options {
802.3ad ae2;
}
}
xe-1/0/3 {
ether-options {
802.3ad ae3;
}
}
xe-1/0/4 {
ether-options {
802.3ad ae4;
}
}
xe-1/0/5 {
ether-options {
802.3ad ae5;
}
}
ae0 {
description SRX-1;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members [ TRUST UNTRUST ];
}
}
}
}
ae1 {
description SRX-2;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members [ TRUST UNTRUST ];
}
}
}
}
ae2 {
description SRX-3;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members [ TRUST UNTRUST ];
}
}
}
}
ae3 {
description SRX-4;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members [ TRUST UNTRUST ];
}
}
}
}
ae4 {
description MX240;
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members UNTRUST;
}
}
}
}
ae5 {
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members [ TRUST UNTRUST ];
}
}
}
}
vlan {
unit 200 {
family inet {
address 10.2.0.254/24;
}
}
unit 300 {
family inet {
address 10.3.0.254/24;
}
}
}
}
vlans {
TRUST {
vlan-id 200;
l3-interface vlan.200;
}
UNTRUST {
vlan-id 300;
l3-interface vlan.300;
}
}
virtual-chassis {
preprovisioned;
no-split-detection;
member 0 {
role routing-engine;
serial-number XX;
}
member 1 {
role routing-engine;
serial-number XX;
}
}
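
Before checking the Layer 3 devices, the switch layer itself can be spot-checked. These are generic operational commands, not output from the book's lab (the user@EX4500-VC> prompt is assumed); they confirm Virtual Chassis membership, the LACP state of ae0 through ae5, and the TRUST/UNTRUST VLAN membership:

user@EX4500-VC> show virtual-chassis status
user@EX4500-VC> show lacp interfaces
user@EX4500-VC> show vlans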

SRX-1

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-10/2/0 {
gigether-options {
802.3ad ae0;
}
}
xe-10/3/0 {
gigether-options {
802.3ad ae0;
}
}
ae0 {
vlan-tagging;
aggregated-ether-options {
lacp {
periodic fast;
}
}
unit 200 {
vlan-id 200;
family inet {
address 10.2.0.11/24;
}
family iso;
}
unit 300 {
vlan-id 300;
family inet {
address 10.3.0.11/24;
}
family iso;
}
}
lo0 {
unit 0 {
family inet {
address 10.3.255.11/32;
}
family iso {
address 49.0000.1111.1111.1111.00;
}
}
}
}
routing-options {
autonomous-system 4567;
}
protocols {
bgp {
group UNTRUST {
type internal;
local-address 10.3.255.11;
neighbor 10.3.255.10;
}
group TRUST {
type external;
multihop;
local-address 10.3.255.11;
peer-as 1234;
neighbor 10.2.255.10;
}
}
isis {
apply-groups isis-bfd;
level 1 disable;
interface ae0.200;
interface ae0.300;
interface lo0.0 {
passive;
}
}
}
security {
address-book {
global {
address 0/0 0.0.0.0/0;
address 192.168/16 192.168.0.0/16;
address 20/8 20.0.0.0/8;
address 1/24 1.0.0.0/24;
address 20.20/16 20.20.0.0/16;
}
}
nat {
source {
pool test-snat {
address {
20.20.31.0/24;
}
}
rule-set rs1 {
from zone TRUST;
to zone UNTRUST;
rule r1 {
match {
source-address 192.168.0.0/16;
}
then {
source-nat {
pool {
test-snat;
}
}
}
}
}
}
}
policies {
from-zone TRUST to-zone UNTRUST {
policy permit-all {
match {
source-address any;
destination-address any;
application any;
}
then {
permit;
count;
}
}
}
}
zones {
security-zone TRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.200;
}
}
security-zone UNTRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.300;
}
}
}
}
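
Once traffic is flowing, SRX-1's source NAT behavior can be spot-checked from operational mode. This is a hedged sketch using standard SRX commands (prompt assumed; each SRX uses its own pool, 20.20.31.0/24 on SRX-1, so output differs per device and is omitted):

user@SRX-1> show security nat source pool all
user@SRX-1> show security nat source rule all
user@SRX-1> show security flow session source-prefix 192.168.0.0/16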

SRX-2

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-10/2/0 {
gigether-options {
802.3ad ae0;
}
}
xe-10/3/0 {
gigether-options {
802.3ad ae0;
}
}
ae0 {
vlan-tagging;
aggregated-ether-options {
lacp {
periodic fast;
}
}
unit 200 {
vlan-id 200;
family inet {
address 10.2.0.12/24;
}
family iso;
}
unit 300 {
vlan-id 300;
family inet {
address 10.3.0.12/24;
}
family iso;
}
}
lo0 {
unit 0 {
family inet {
address 10.3.255.12/32;
}
family iso {
address 49.0000.2222.2222.2222.00;
}
}
}
}
routing-options {
autonomous-system 4567;
}
protocols {
bgp {
group UNTRUST {
type internal;
local-address 10.3.255.12;
neighbor 10.3.255.10;
}
group TRUST {
type external;
multihop;
local-address 10.3.255.12;
peer-as 1234;
neighbor 10.2.255.10;
}
}
isis {
apply-groups isis-bfd;
level 1 disable;
interface ae0.200;
interface ae0.300;
interface lo0.0 {
passive;
}
}
}
security {
address-book {
global {
address 0/0 0.0.0.0/0;
address 192.168/16 192.168.0.0/16;
address 20/8 20.0.0.0/8;
address 1/24 1.0.0.0/24;
address 20.20/16 20.20.0.0/16;
}
}
nat {
source {
pool test-snat {
address {
20.20.32.0/24;
}
}
rule-set rs1 {
from zone TRUST;
to zone UNTRUST;
rule r1 {
match {
source-address 192.168.0.0/16;
}
then {
source-nat {
pool {
test-snat;
}
}
}
}
}
}
}
policies {
from-zone TRUST to-zone UNTRUST {
policy permit-all {
match {
source-address any;
destination-address any;
application any;
}
then {
permit;
count;
}
}
}
}
zones {
security-zone TRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.200;
}
}
security-zone UNTRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.300;
}
}
}
}

SRX-3

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-10/2/0 {
gigether-options {
802.3ad ae0;
}
}
xe-10/3/0 {
gigether-options {
802.3ad ae0;
}
}
ae0 {
vlan-tagging;
aggregated-ether-options {
lacp {
periodic fast;
}
}
unit 200 {
vlan-id 200;
family inet {
address 10.2.0.13/24;
}
family iso;
}
unit 300 {
vlan-id 300;
family inet {
address 10.3.0.13/24;
}
family iso;
}
}
lo0 {
unit 0 {
family inet {
address 10.3.255.13/32;
}
family iso {
address 49.0000.3333.3333.3333.00;
}
}
}
}
routing-options {
autonomous-system 4567;
}
protocols {
bgp {
group UNTRUST {
type internal;
local-address 10.3.255.13;
neighbor 10.3.255.10;
}
group TRUST {
type external;
multihop;
local-address 10.3.255.13;
peer-as 1234;
neighbor 10.2.255.10;
}
}
isis {
apply-groups isis-bfd;
level 1 disable;
interface ae0.200;
interface ae0.300;
interface lo0.0 {
passive;
}
}
}
security {
address-book {
global {
address 0/0 0.0.0.0/0;
address 192.168/16 192.168.0.0/16;
address 20/8 20.0.0.0/8;
address 1/24 1.0.0.0/24;
address 20.20/16 20.20.0.0/16;
}
}
nat {
source {
pool test-snat {
address {
20.20.33.0/24;
}
}
rule-set rs1 {
from zone TRUST;
to zone UNTRUST;
rule r1 {
match {
source-address 192.168.0.0/16;
}
then {
source-nat {
pool {
test-snat;
}
}
}
}
}
}
}
policies {
from-zone TRUST to-zone UNTRUST {
policy permit-all {
match {
source-address any;
destination-address any;
application any;
}
then {
permit;
count;
}
}
}
}
zones {
security-zone TRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.200;
}
}
security-zone UNTRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.300;
}
}
}
}

SRX-4

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-10/2/0 {
gigether-options {
802.3ad ae0;
}
}
xe-10/3/0 {
gigether-options {
802.3ad ae0;
}
}
ae0 {
vlan-tagging;
aggregated-ether-options {
lacp {
periodic fast;
}
}
unit 200 {
vlan-id 200;
family inet {
address 10.2.0.14/24;
}
family iso;
}
unit 300 {
vlan-id 300;
family inet {
address 10.3.0.14/24;
}
family iso;
}
}
lo0 {
unit 0 {
family inet {
address 10.3.255.14/32;
}
family iso {
address 49.0000.4444.4444.4444.00;
}
}
}
}
routing-options {
autonomous-system 4567;
}
protocols {
bgp {
group UNTRUST {
type internal;
local-address 10.3.255.14;
neighbor 10.3.255.10;
}
group TRUST {
type external;
multihop;
local-address 10.3.255.14;
peer-as 1234;
neighbor 10.2.255.10;
}
}
isis {
apply-groups isis-bfd;
level 1 disable;
interface ae0.200;
interface ae0.300;
interface lo0.0 {
passive;
}
}
}
security {
address-book {
global {
address 0/0 0.0.0.0/0;
address 192.168/16 192.168.0.0/16;
address 20/8 20.0.0.0/8;
address 1/24 1.0.0.0/24;
address 20.20/16 20.20.0.0/16;
}
}
nat {
source {
pool test-snat {
address {
20.20.34.0/24;
}
}
rule-set rs1 {
from zone TRUST;
to zone UNTRUST;
rule r1 {
match {
source-address 192.168.0.0/16;
}
then {
source-nat {
pool {
test-snat;
}
}
}
}
}
}
}
policies {
from-zone TRUST to-zone UNTRUST {
policy permit-all {
match {
source-address any;
destination-address any;
application any;
}
then {
permit;
count;
}
}
}
}
zones {
security-zone TRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.200;
}
}
security-zone UNTRUST {
tcp-rst;
host-inbound-traffic {
system-services {
ping;
}
protocols {
bgp;
bfd;
}
}
interfaces {
ae0.300;
}
}
}
}

EX8200-ECMP

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-0/0/0 {
ether-options {
802.3ad ae0;
}
}
xe-0/0/1 {
ether-options {
802.3ad ae0;
}
}
xe-0/0/2 {
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members DC;
}
}
}
}
xe-0/0/3 {
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members DC;
}
}
}
}
ae0 {
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members TRUST;
}
}
}
}
lo0 {
unit 0 {
family inet {
address 10.2.255.10/32;
}
family iso {
address 49.0000.7777.7777.7777.00;
}
}
}
vlan {
unit 200 {
family inet {
address 10.2.0.253/24;
}
family iso;
}
unit 1000 {
family inet {
address 192.168.1.1/24;
}
family iso;
}
}
}
routing-options {
autonomous-system 1234;
forwarding-table {
export lb;
}
}
protocols {
bgp {
group SRX {
type external;
multihop;
local-address 10.2.255.10;
peer-as 4567;
multipath;
neighbor 10.3.255.11;
neighbor 10.3.255.12;
neighbor 10.3.255.13;
neighbor 10.3.255.14;
}
}
isis {
level 1 disable;
interface lo0.0 {
passive;
}
interface vlan.200 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
interface vlan.1000;
}
}
policy-options {
policy-statement lb {
then {
load-balance per-packet;
}
}
}
vlans {
DC {
vlan-id 1000;
l3-interface vlan.1000;
}
TRUST {
vlan-id 200;
l3-interface vlan.200;
}
}
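
With this configuration, the EX8200 should hold one active default route with four BGP next hops, and the lb export policy should push all four SRX next hops into the forwarding table for per-flow balancing. A hedged verification sketch, using standard operational commands (prompt assumed; output omitted):

user@EX8200-ECMP> show route 0.0.0.0/0 exact detail
user@EX8200-ECMP> show route forwarding-table destination 0.0.0.0/0
user@EX8200-ECMP> show bfd session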

EX8200-FBF

chassis {
aggregated-devices {
ethernet {
device-count 10;
}
}
}
interfaces {
xe-0/0/0 {
ether-options {
802.3ad ae0;
}
}
xe-0/0/1 {
ether-options {
802.3ad ae0;
}
}
xe-0/0/2 {
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members DC;
}
}
}
}
xe-0/0/3 {
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members DC;
}
}
}
}
ae0 {
aggregated-ether-options {
lacp {
active;
periodic fast;
}
}
unit 0 {
family ethernet-switching {
port-mode trunk;
vlan {
members TRUST;
}
}
}
}
lo0 {
unit 0 {
family inet {
address 10.2.255.10/32;
}
family iso {
address 49.0000.7777.7777.7777.00;
}
}
}
vlan {
unit 200 {
family inet {
address 10.2.0.253/24;
}
family iso;
}
unit 1000 {
family inet {
filter {
input distribute-default;
}
address 192.168.1.1/24;
}
family iso;
}
}
}
routing-options {
interface-routes {
rib-group inet SRX;
}
rib-groups {
SRX {
import-rib [ inet.0 SRX-1.inet.0 SRX-2.inet.0 SRX-3.inet.0 SRX-4.inet.0 ];
}
}
autonomous-system 1234;
forwarding-table {
export lb;
}
}
protocols {
bgp {
group SRX {
type external;
multihop;
local-address 10.2.255.10;
peer-as 4567;
multipath;
neighbor 10.3.255.11;
neighbor 10.3.255.12;
neighbor 10.3.255.13;
neighbor 10.3.255.14;
}
}
isis {
level 1 disable;
interface lo0.0 {
passive;
}
interface vlan.200 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
interface vlan.1000;
}
}
policy-options {
policy-statement lb {
term SRX-1 {
from instance SRX-1;
then accept;
}
term SRX-2 {
from instance SRX-2;
then accept;
}
term SRX-3 {
from instance SRX-3;
then accept;
}
term SRX-4 {
from instance SRX-4;
then accept;
}
term master {
from instance master;
then {
load-balance per-packet;
}
}
}
}
firewall {
family inet {
filter distribute-default {
term SRX-1 {
from {
source-address {
192.168.1.0/26;
}
}
then {
count SRX-1;
routing-instance SRX-1;
}
}
term SRX-2 {
from {
source-address {
192.168.1.64/26;
}
}
then {
count SRX-2;
routing-instance SRX-2;
}
}
term SRX-3 {
from {
source-address {
192.168.1.128/26;
}
}
then {
count SRX-3;
routing-instance SRX-3;
}
}
term SRX-4 {
from {
source-address {
192.168.1.192/26;
}
}
then {
count SRX-4;
routing-instance SRX-4;
}
}
term else {
then {
count SRX-discard;
discard;
}
}
}
}
}
routing-instances {
SRX-1 {
instance-type virtual-router;
routing-options {
static {
route 0.0.0.0/0 {
qualified-next-hop 10.2.0.11 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
qualified-next-hop 10.2.0.12 {
metric 6;
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
}
}
}
}
SRX-2 {
instance-type virtual-router;
routing-options {
static {
route 0.0.0.0/0 {
qualified-next-hop 10.2.0.12 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
qualified-next-hop 10.2.0.11 {
metric 6;
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
}
}
}
}
SRX-3 {
instance-type virtual-router;
routing-options {
static {
route 0.0.0.0/0 {
qualified-next-hop 10.2.0.13 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
qualified-next-hop 10.2.0.14 {
metric 6;
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
}
}
}
}
SRX-4 {
instance-type virtual-router;
routing-options {
static {
route 0.0.0.0/0 {
qualified-next-hop 10.2.0.14 {
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
qualified-next-hop 10.2.0.13 {
metric 6;
bfd-liveness-detection {
minimum-interval 300;
multiplier 3;
}
}
}
}
}
}
}
vlans {
DC {
vlan-id 1000;
l3-interface vlan.1000;
}
TRUST {
vlan-id 200;
l3-interface vlan.200;
}
}
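
For the FBF variant, each filter term should count traffic from its own /26, and each virtual-router instance should hold its own BFD-protected default route. Commands along these lines (prompt assumed; output omitted, and counter values depend on offered load) confirm both:

user@EX8200-FBF> show firewall filter distribute-default
user@EX8200-FBF> show route table SRX-1.inet.0 0.0.0.0/0 exact
user@EX8200-FBF> show bfd session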

What to Do Next & Where to Go

http://www.juniper.net/books
The books in the Juniper Networks Technical Library may assist you in
understanding and implementing network efficiency.

http://www.juniper.net/dayone
The Day One book series is available for free download in PDF
format. Select titles also feature a Copy and Paste edition for direct
placement of Junos configurations.

http://forums.juniper.net/jnet
The Juniper-sponsored J-Net Communities forum is dedicated to
sharing information, best practices, and questions about Juniper
products, technologies, and solutions. Register to participate in this
free forum.

http://www.juniper.net/techpubs/
Juniper Networks technical documentation includes everything you
need to understand and configure all aspects of Junos, including
MPLS. The documentation set is both comprehensive and thoroughly
reviewed by Juniper engineering.

http://www.juniper.net/training/fasttrack
Take courses online, on location, or at one of the partner training
centers around the world. The Juniper Networks Technical
Certification Program (JNTCP) allows you to earn certifications by
demonstrating competence in configuration and troubleshooting of
Juniper products. If you want the fast track to earning your
certifications in enterprise routing, switching, or security, use
the available online courses, student guides, and lab guides.