
DATA CENTER NETWORK

CONNECTIVITY WITH IBM SERVERS


Network infrastructure scenario designs
and configurations

by Meiji Wang, Mohini Singh Dukes, George Rainovic,
Jitender Miglani and Vijay Kamisetty
Juniper Networks Validated Solutions

Data Center Network Connectivity


with IBM Servers

Preface
Chapter 1: Introduction
Chapter 2: Design Considerations
Chapter 3: Implementation Overview
Chapter 4: Connecting IBM Servers in the Data Center Network
Chapter 5: Configuring Spanning Tree Protocols
Chapter 6: Supporting Multicast Traffic
Chapter 7: Understanding Network CoS and Latency
Chapter 8: Configuring High Availability
Appendix A: Configuring TCP/IP Networking in Servers
Appendix B: LAG Test Results
Appendix C: Acronyms
Appendix D: References


ii Data Center Network Connectivity with IBM Servers

© 2010 by Juniper Networks, Inc. All rights reserved.

Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are
registered trademarks of Juniper Networks, Inc. in the United States and other
countries. Junos-e is a trademark of Juniper Networks, Inc. All other trademarks,
service marks, registered trademarks, or registered service marks are the property
of their respective owners.

Juniper Networks assumes no responsibility for any inaccuracies in this document.
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise
this publication without notice. Products made or sold by Juniper Networks or
components thereof might be covered by one or more of the following patents that
are owned by or licensed to Juniper Networks: U.S. Patent Nos. 5,473,599,
5,905,725, 5,909,440, 6,192,051, 6,333,650, 6,359,479, 6,406,312, 6,429,706,
6,459,579, 6,493,347, 6,538,518, 6,538,899, 6,552,918, 6,567,902, 6,578,186, and
6,590,785.

Printed in the USA by Vervante Corporation.

Version History: v1 June 2010

Key Contributors

Chandra Shekhar Pandey is a Juniper Networks Director of Solutions Engineering.
He is responsible for service provider, enterprise and OEM partners’ solutions
engineering and validation. Chandra has more than 18 years of networking
experience, designing ASICs, architecting systems and designing solutions to
address customer’s challenges in the service providers, MSO and enterprise
market. He holds a bachelor’s degree in Electronics Engineering from K.N.I.T,
Sultanpur, India and a MBA in High Tech and Finance from Northeastern University,
Boston, MA.

Louise Apichell is a Juniper Networks Senior Technical Writing Specialist in the
Solutions Marketing Group. She assisted as a content developer, chief editor and
project manager in organizing, writing and editing this book. Louise specializes in
writing and editing all types of technical collateral, such as white papers,
application notes, implementation guides, reference architectures and solution
briefs.

Ravinder Singh is a Juniper Networks Director of Solution Architecture and
Technical Marketing in the Solutions Marketing Group. He is responsible for
creating technical knowledge bases and has significant experience working with
sales engineers and channels to support Juniper’s Cloud/Data Center Solutions for
the enterprise, service providers and key OEM alliances. Prior to this role, Ravinder
was responsible for Enterprise Solutions Architecture and Engineering where his
team delivered several Enterprise Solutions including Adaptive Threat
Management, Distributed Enterprise and Juniper Simplified Data Center solutions.
Ravinder holds a bachelor’s and master’s degree in Electronics and a master’s of
business degree in IT Management and Marketing.

Mike Barker is a Juniper Networks Technical Marketing Director, Solutions
Engineering and Architectures. In this role, he focuses on developing architectures
and validating multi-product solutions that create business value for enterprise and
Service Provider customers. Prior to this role, Mike served in various Consulting and
Systems Engineering roles for Federal, Enterprise and Service Provider markets at
Juniper Networks, Acorn Packet Solutions and Arbor Networks. Earlier in his career,
Mike held Network Engineering positions at Cable & Wireless, Stanford Telecom
and the USAF. Mr. Barker holds a Bachelors of Science Degree in Business
Management from Mount Olive College and a MBA from Mount St. Mary’s
University.

Karen Joice is a Juniper Networks Marketing Specialist who provided the technical
illustrations for this book. Karen has been a graphic artist and marketing
professional for more than 15 years, specializing in technical illustrations, Flash,
and Web design, with expertise in print production.

You can purchase a printed copy of this book, or download a free PDF version of
this book, at: juniper.net/books.

About the Authors


Meiji Wang is a Juniper Networks Solutions Architect for data center applications
and cloud computing. He specializes in application development, data center
infrastructure optimization, cloud computing, Software as a Service (SaaS), and
data center networking. He has authored three books focusing on databases,
e-business web usage and, most recently, a data center network design Redbook
written in partnership with the IBM team: IBM Redbooks | IBM j-type Data Center
Networking Introduction.

Mohini Singh Dukes is a Juniper Networks Staff Solutions Design Engineer in the
Solutions Engineering Group. She designs, implements and validates a wide range
of solutions in the mobile, Carrier Ethernet, data center interconnectivity and
security, business and residential services. Specializing in mobile networking
solutions including backhaul, packet backbone and security, she has authored a
number of whitepapers, application notes and implementation and design guides
based on solution validation efforts. She has also published a series of blogs on
energy efficient networking.

George Rainovic is a Juniper Networks Solutions Staff Engineer. He specializes in
designing technical solutions for data center networking and Video CDN, and in
testing IBM J-Type Ethernet switches and routers. George has more than 15 years
of networking and IT experience, designing, deploying and supporting networks
for network service providers and business enterprise customers. He holds a
bachelor’s degree in Electrical Engineering from the University of Novi Sad, Serbia.

Jitender K. Miglani is a Juniper Networks Solutions Engineer for data center intra
and inter connectivity solutions. As part of Juniper’s OEM relationship with IBM,
Jitender assists in qualifying Juniper’s EX, MX and SRX Series Platforms with IBM
Open System Platforms (Power P5/P6, Blade Center and x3500). Jitender has
development and engineering experience in various voice and data networking
products, and with small/medium/large enterprise and carrier grade customers.
Jitender holds a bachelor’s in Computer Science from the Regional Engineering
College, Kurukshetra, India.

Vijay K. Kamisetty is a Juniper Networks Solutions Engineer. He specializes in
technical solutions for IPTV-Multiplay, HD-Video Conference, mobile backhaul,
application level security in the data center, development of managed services, and
validation of adaptive clock recovery. He assists in qualifying Juniper EX and MX
platforms with IBM Power P5 and x3500 platforms. He holds a bachelor’s degree
in Computer Science from JNTU Hyderabad, India.

Authors’ Acknowledgments
The authors would like to take this opportunity to thank Patrick Ames, whose
direction and guidance were indispensable. To Nathan Alger, Lionel Ruggeri, and
Zach Gibbs, who provided valuable technical feedback several times during the
development of this booklet, your assistance was greatly appreciated. Thanks
also to Cathy Gadecki for helping in the formative stages of the booklet. There are
certainly others who helped in many different ways and we thank you all.

And Special Thanks to our Reviewers...

Juniper Networks
Marc Bernstein

Venkata Achanta

Charles Goldberg

Scott Sneddon

John Bartlomiejczyk

Allen Kluender

Fraser Street

Robert Yee

Niraj Brahmbhatt

Paul Parker-Johnson

Travis O’Hare

Scott Robohn

Ting Zou

Krishnan Manjeri

IBM
Rakesh Sharma

Casimer DeCusatis

Preface

ENTERPRISES DEPEND MORE THAN EVER BEFORE on their data center
infrastructure efficiency and business application performance to improve
employee productivity, reduce operational costs and increase revenue. To achieve
these objectives, virtualization, simplification and consolidation are three of
the most crucial initiatives for the enterprise. These initiatives not only demand
high-performance server and network technologies, but also require smooth
integration between the two to achieve optimal performance. Hence,
successful integration of servers and simplified networking infrastructure is
pivotal.

This guide provides enterprise architects, sales engineers, IT developers, system
administrators and other technical professionals with guidance on how to design and
implement a high-performance data center using Juniper Networks infrastructure
and IBM Open Systems. With a step-by-step approach, readers can gain a
thorough understanding of design considerations, recommended designs,
technical details and sample configurations that exemplify simplified data center
network design. This approach is based on testing performed using Juniper
Networks devices and IBM servers in Juniper Networks solution labs.

The IBM Open System Servers solution – including IBM Power systems,
System x, and Blade Center Systems – comprises the foundation for a dynamic
infrastructure. IBM server platforms help consolidate applications and servers
and virtualize system resources while improving overall performance,
availability and energy efficiency, providing a more flexible, dynamic IT
infrastructure.

Juniper Networks offers a unique best-in-class data center infrastructure solution
based on open standards. It optimizes performance and enables consolidation,
which in turn increases network scalability and resilience, simplifies operations,
and streamlines management while lowering overall Total Cost of Ownership
(TCO). The solution also automates network infrastructure management, making
existing infrastructure easily adaptable and flexible, especially for third-party
application deployment.

Key topics discussed in this book focus on the following routing and switching
solutions in Juniper’s simplified two-tier data center network architecture with IBM
open systems.

• Best practices for integrating Juniper Networks EX and MX Series switches and
routers with IBM Open Systems.
• Configuration details for various spanning tree protocols such as Spanning Tree
Protocol (STP), Multiple Spanning Tree Protocol (MSTP), Rapid Spanning Tree
Protocol (RSTP), and VLAN Spanning Tree Protocol (VSTP); deployment
scenarios such as RSTP/MSTP and VLAN Spanning Tree Protocol/Per-VLAN
Spanning Tree (VSTP/PVST) with Juniper EX and MX Series (switches and
routers) connecting to IBM Blade Center.
• Details for Layer 2 and Layer 3 multicast scenarios with Protocol Independent
Multicast (PIM) and Internet Group Management Protocol (IGMP) snooping.
Scenarios include video streaming client running on IBM servers with PIM
implemented on network access and core/aggregation tiers along with IGMP
snooping at the access layer.
• Low latency network design and techniques such as Class of Service (CoS) for
improving data center network performance.
• Methods for increasing data center resiliency and high availability.
Configuration details for protocols and features such as Virtual Router
Redundancy Protocol (VRRP), Redundant Trunk Group (RTG), Link Aggregation
Group (LAG), Routing Engine redundancy, Virtual Chassis, Nonstop Bridging
(NSB), Nonstop Routing (NSR), Graceful Restart (GR) and In-Service Software
Upgrade (ISSU).

Juniper Networks realizes that the scope of data center network design
encompasses many facets, for example servers, storage and security. Therefore, to
narrow the scope of this book, we have focused on network connectivity
implementation details based on Juniper EX and MX Series switches and routers and
IBM Open Systems. However, as new relevant technologies and best practices
evolve, we will continue to revise this book to include additional topics.

Please make sure to send us your feedback with any new or relevant ideas that you
would like to see in future revisions of this book, or in other Validated Solutions
books, at: solutions-engineering@juniper.net.
Chapter 1

Introduction

Trends

Challenges

IBM and Juniper Networks Data Center Solution

IBM and Juniper Networks

TODAY’S DATA CENTER ARCHITECTS and designers do not have the luxury
of simply adding more and more devices to meet the network’s constant
demands: higher bandwidth, increased speed, more rack space, tighter security,
more storage, interoperability among many types of devices and applications,
and an increasingly diverse and remote user base.

This chapter discusses in detail the data center trends and challenges now facing
network designers. Juniper Networks and IBM directly address these trends and
challenges with a data center solution that will improve data center efficiency by
simplifying the network infrastructure, by reducing recurring maintenance and
software costs, and by streamlining daily management and maintenance tasks.

Trends
Although there are several types of data centers supporting a wide range of
applications, such as financial services, web portals, content providers, and IT
back-office operations, they all share certain trends, such as:

More Data Than Ever Before


Since the dawn of the computer age, many companies have struggled to store their
electronic records. That struggle can be greater than ever today, as regulatory
requirements can force some companies to save even more records than before.
The growth of the Internet may compound the problem; as businesses move online,
they need to store enormous amounts of data such as customer account
information and order histories. The total capacity of shipped storage systems is
soaring by more than 50 percent a year, according to market researcher IDC. The
only thing that is growing faster than the volume of data itself is the amount of data
that must be transferred between data centers and users. Numerous large
enterprises are consolidating their geographically distributed data centers into
mega data centers to take advantage of cost benefits and economies of scale,
increased reliability, and to exploit the latest virtualization technologies. According
to research conducted by Nemertes, more than 50 percent of companies
consolidated their dispersed data centers into fewer but larger data centers in the
last 12 months, with even more planning to consolidate in the upcoming 12 months.

Server Growth
Servers are continuing to grow at a high annual rate of 11 percent, while storage is
growing at an even higher rate of 22 percent; both are placing tremendous
strain on the data center’s power and cooling capacity. According to Gartner, OS and
application instability is increasing server sprawl, with utilization rates of 20
percent, leading to increased adoption of server virtualization technologies.
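
To put these growth rates in perspective, the short sketch below compounds the annual figures quoted above (11 percent for servers, 22 percent for storage) and estimates how long each takes to double. The function names and the five-year horizon are illustrative assumptions, not figures from the cited research, and the sketch assumes the rates stay constant from year to year.

```python
import math


def compound(base: float, annual_rate: float, years: int) -> float:
    """Return the size after `years` of growth at `annual_rate` (0.11 = 11 percent)."""
    return base * (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Return the number of years needed to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Rates quoted in the text; the five-year horizon is an assumption.
    for label, rate in (("servers", 0.11), ("storage", 0.22)):
        factor = compound(1.0, rate, 5)
        print(f"{label}: x{factor:.2f} after 5 years, "
              f"doubles in about {doubling_time(rate):.1f} years")
```

Even under this simplified model, the gap between the two rates widens quickly, which is why power and cooling capacity tends to be planned against the faster-growing resource.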

Evolution of Cloud Services


Cloud computing is a style of computing in which dynamically scalable and often
virtualized resources are provided as a service over the Internet. Large enterprises
are adopting cloud-computing methodology into their mega data centers. Smaller
businesses that cannot afford to keep up with the cost and complexity of
maintaining their privately owned data centers may look to outsource those
functions to cloud-hosting providers.

Challenges
Today’s major data center challenges include scale and virtualization, complexity
and cost, interconnectivity for business continuity, and security:

Scale and Virtualization


With the evolution of mega data centers and cloud-computing architectures,
tremendous strain is being placed on current network architectures. Scaling
networking and security functions can quickly become a limiting factor to the
success of growing data centers as they strive to meet stringent performance and
high-availability requirements. However, simply adding more equipment does not
always satisfy the appetite of hungry mega data centers. If the network and
security architecture does not enable application workload mobility and quick
responses to variable capacity requirements to support multi-tenancy within
servers (as required in a cloud environment) then the full value of data center
virtualization cannot be realized.

Complexity and Cost


Many data centers have become overly complex, inefficient and costly. Networking
architectures have stagnated for over a decade, resulting in network device sprawl
and increasingly chaotic network infrastructures designed largely to work around
low-performance and low-density devices. The ensuing capital expenses, rack
space, power consumption and management overhead all add to the overall cost,
not to mention the environmental impact. Unfortunately, instead of containing
costs and reallocating the savings into enhancing and accelerating business
practices, the IT budget is all too often diverted into sustaining and expanding
already unwieldy data center operations. Emerging applications that use
Service Oriented Architecture (SOA) and Web services are increasingly
computational and network intensive; however, the network is not efficient. Gartner
(2007) asserts that 50 percent of the Ethernet switch ports within the data center
are used for switch interconnectivity.

Interconnectivity for Business Continuity


As data centers expand, they can easily outgrow a single location. When this occurs,
enterprises may have to open new centers and transparently interconnect these
locations so they can interoperate and appear as one large data center. Enterprises
with geographically distributed data centers may want to “virtually” consolidate
them into a single, logical data center in order to take advantage of the latest
technology.

Security
The shared infrastructure in the data center or cloud should support multiple
customers, each with multiple hosted applications, provide complete, granular and
virtualized security that is easy to configure and understand, and support all major
operating systems on a plethora of mobile and desktop devices. In addition, a
shared infrastructure should integrate seamlessly with existing identity systems,
check host posture before allowing access to the cloud, and make all of this
accessible for thousands of users, while protecting against sophisticated
application attacks, Distributed Denial of Service (DDoS) attacks and hackers.

Today, a data center infrastructure solution requires a dynamic infrastructure, a
high-performance network and a comprehensive network management system.

IBM and Juniper Networks Data Center Solution


The IBM Servers solution, including IBM Power systems, System x and BladeCenter
Systems, comprises the foundation for a dynamic infrastructure.

IBM Power System


The IBM Power™ Systems family of servers includes proven server platforms that
help consolidate applications and servers and virtualize system resources while
improving overall performance, availability and energy efficiency, providing a
more flexible, dynamic IT infrastructure. A single physical Power server can run up
to 254 independent servers, each with its own processor, memory and I/O resources.
Processor resources can be assigned at a granularity of 1/100th of a core.
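
The partitioning limits described above can be illustrated with a small validation sketch. It assumes a hypothetical helper, `validate_partitions`, that takes the number of physical cores and a list of per-partition processor entitlements. Only the 254-partition ceiling and the 1/100th-of-a-core granularity come from the text; this is not an IBM tool or API, and other platform rules (such as minimum entitlements) are not modeled.

```python
from decimal import Decimal

MAX_PARTITIONS = 254            # per-server limit quoted in the text
GRANULARITY = Decimal("0.01")   # 1/100th of a core


def validate_partitions(physical_cores: int, entitlements: list[str]) -> Decimal:
    """Check hypothetical per-partition core entitlements; return the cores left over.

    Entitlements are passed as decimal strings (for example "0.25") so that
    hundredths of a core are represented exactly rather than as binary floats.
    """
    if len(entitlements) > MAX_PARTITIONS:
        raise ValueError(f"at most {MAX_PARTITIONS} partitions per server")
    total = Decimal(0)
    for raw in entitlements:
        cores = Decimal(raw)
        if cores <= 0 or cores % GRANULARITY != 0:
            raise ValueError(f"{raw}: must be a positive multiple of {GRANULARITY}")
        total += cores
    if total > physical_cores:
        raise ValueError("entitlements exceed the physical cores available")
    return Decimal(physical_cores) - total


if __name__ == "__main__":
    # Three partitions on an assumed 8-core server.
    print(validate_partitions(8, ["0.25", "1.50", "2"]))
```

Using `Decimal` rather than `float` is a deliberate choice here: values such as 0.01 have no exact binary representation, so a float-based multiple check could reject valid entitlements.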

IBM System x
The IBM System x3850 X5 server is the fifth generation of the Enterprise
X-Architecture, delivering innovation with enhanced reliability and availability
features to enable optimal performance for databases, enterprise applications and
virtualized environments. According to a recent IBM Redbooks paper, a single IBM
System x3850 X5 host server can support up to 384 virtual machines. For details,
please refer to High density virtualization using the IBM System x3850 X5 at
www.redbooks.ibm.com/technotes/tips0770.pdf.

IBM BladeCenter
The BladeCenter is built on IBM X-Architecture to run multiple business-critical
applications with simplification, cost reduction and improved productivity.
Compared to first generation Xeon-based blade servers, IBM BladeCenter HS22
blade servers can help improve the economics of your data center with:

• Up to 11 times faster performance


• Up to 90 percent reduction in energy costs alone
• Up to 95 percent IT footprint reduction
• Up to 65 percent less in connectivity costs
• Up to 84 percent fewer cables
For detailed benefits concerning the IBM BladeCenter, please refer to
www-03.ibm.com/systems/migratetoibm/systems/bladecenter/.

Juniper Networks Products for a High Performance Network Infrastructure Solution


Juniper Networks data center infrastructure solutions provide operational simplicity,
agility and efficiency to simplify the network with the following key technologies:
• Virtual Chassis technology, combined with wire-rate 10-Gigabit Ethernet
performance in the Juniper Networks EX Series Ethernet Switches, reduces the
number of networking devices and interconnections. This effectively eliminates
the need for an aggregation tier—contributing to a significant reduction of
capital equipment cost and network operational costs, improved application
performance, and faster time to deploy new servers and applications.
• Dynamic Services Architecture in the Juniper Networks SRX Series Services
Gateways consolidates security appliances with distinct functions into a highly
integrated, multifunction platform that results in simpler network designs,
improved application performance, and a reduction of space, power, and
cooling requirements.
• Network virtualization with MPLS in the Juniper Networks MX Series 3D
Universal Edge Routers and the Juniper Networks M Series Multiservice Edge
Routers enables network segmentation across data centers and to remote
offices for applications and departments without the need to build separate or
overlay networks.
• Juniper Networks Junos® operating system operates across the network
infrastructure, providing one operating system, enhanced through a single
release train, and developed upon a common modular architecture—giving
enterprises a “1-1-1” advantage.
• J-Care Technical Services provide automated incident management and
proactive analysis assistance through the Advanced Insight Solutions
technology resident in Junos OS.

MX Series 3D Universal Edge Routers


The Juniper Networks MX Series 3D Universal Edge Routers are a family of high-
performance Ethernet routers with powerful switching features designed for
enterprise and service provider networks. The MX Series provides unmatched
flexibility and reliability to support advanced services and applications. It addresses
a wide range of deployments, architectures, port densities and interfaces. High-
performance enterprise networks typically deploy MX Series routers in high-density
Ethernet LAN and data center aggregation, and the data center core.

The MX Series provides carrier grade reliability, density, performance, capacity and
scale for enterprise networks with mission critical applications. High availability
features such as nonstop routing (NSR), fast reroute, and unified in-service software
upgrade (ISSU) ensure that the network is always up and running. The MX Series
delivers significant operational efficiencies enabled by Junos OS, and supports a
collapsed architecture requiring less power, cooling and space consumption. The
MX Series also provides open APIs for easily customized applications and services.

The MX Series enables enterprise networks to profit from the tremendous growth
of Ethernet transport with the confidence that the platforms they install now will
have the performance and service flexibility to meet the challenges of their evolving
requirements.

The MX Series 3D Universal Edge Routers include the MX80 and MX80-48T,
MX240, MX480 and MX960. Their common key features include:

• 256K multicast groups
• 1M MAC addresses and IPv4 routes
• 6K L3VPN and 4K VPLS instances
• Broadband services router
• IPsec
• Session border controller
• Video quality monitoring

As a member of the MX Series, the MX960 is a high-density Layer 2 and Layer 3
Ethernet platform with up to 2.6 Tbps of switching and routing capacity, and it
supports the industry’s first 16-port 10GbE card. It is optimized for emerging
Ethernet network architectures and services that require high availability,
advanced QoS, and performance and scalability that support mission critical
networks. The MX960
platform is ideal where SCB and Routing Engine redundancy are required. All major
components are field-replaceable, increasing system serviceability and reliability,
and decreasing mean time to repair (MTTR). Enterprise customers typically
deploy the MX960 or MX480 in their data center core.

NOTE We deployed the MX480 in this handbook. However, the configurations and
discussions pertaining to the MX480 also apply to the entire MX product line.

EX Series Ethernet Switches


As members of the EX Series Ethernet Switches, the EX4200 Ethernet switches
with Virtual Chassis technology and the EX8200 modular chassis switches are
commonly deployed in the enterprise data center. We used the EX4200 and
EX8200 for most of our deployment scenarios.

EX4200 Ethernet Switches with Virtual Chassis Technology


The EX4200 line of Ethernet switches with Virtual Chassis technology combines
the HA and carrier-class reliability of modular systems with the economics and
flexibility of stackable platforms, delivering a high-performance, scalable solution
for data center, campus, and branch office environments.

The EX4200 Ethernet switches with Virtual Chassis technology have the following
major features:

• Deliver high availability, performance and manageability of chassis-based


switches in a compact, power-efficient form factor.
• Offer the same connectivity, Power over Ethernet (PoE) and Junos OS options
as the EX3200 switches, with an additional 24-port fiber-based platform for
Gigabit aggregation deployments.
• Enable up to 10 EX4200 switches (with Virtual Chassis technology) to be
interconnected as a single logical device supporting up to 480 ports.
• Provide redundant, hot-swappable, load-sharing power supplies that reduce
mean time to repair (MTTR), while Graceful Routing Engine Switchover (GRES)
ensures hitless forwarding in the unlikely event of a switch failure.
• Run the same modular fault-tolerant Junos OS as other EX Series switches and
all Juniper routers.

EX8200 Modular Chassis Switches


The EX8200 Modular chassis switches have the following major features:

• High-performance 8-slot (EX8208) and 16-slot (EX8216) switches support


data center and campus LAN core and aggregation layer deployments.
• Scalable switch fabric delivers up to 320 Gbps per slot.
• 48-port 10/100/1000BASE-T and 100BASE-FX/1000BASE-X line cards support up to
384 (EX8208) or 768 (EX8216) GbE ports per chassis.
• 8-port 10GBASE-X line cards with SFP+ interfaces deliver up to 64 (EX8208)
or 128 (EX8216) 10-GbE ports per chassis.
• Carrier-class architecture includes redundant internal Routing Engines, switch
fabrics, and power and cooling, all ensuring uninterrupted forwarding and
maximum availability.
• Run the same modular fault-tolerant Junos OS as other EX Series switches and
all Juniper routers.
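
The port counts quoted for the EX Series in this section follow from simple multiplication, which the sketch below reproduces. The `Platform` class and `max_ports` function are hypothetical names used only for illustration. The sketch also assumes that every EX8200 slot holds the same line card and that every EX4200 Virtual Chassis member is a 48-port switch, which is the case that yields the quoted maximums.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """A chassis (or Virtual Chassis) described by how many units it can hold."""
    name: str
    units: int  # line-card slots, or Virtual Chassis members


def max_ports(platform: Platform, ports_per_unit: int) -> int:
    """Return the port count when every unit carries `ports_per_unit` ports."""
    return platform.units * ports_per_unit


# Slot and member counts as quoted in the text.
EX8208 = Platform("EX8208", 8)
EX8216 = Platform("EX8216", 16)
EX4200_VC = Platform("EX4200 Virtual Chassis", 10)

if __name__ == "__main__":
    for platform in (EX8208, EX8216):
        print(f"{platform.name}: {max_ports(platform, 48)} GbE ports, "
              f"{max_ports(platform, 8)} 10-GbE ports")
    print(f"{EX4200_VC.name}: {max_ports(EX4200_VC, 48)} ports")
```

Mixed configurations, such as a chassis that combines 48-port GbE cards with 8-port 10-GbE cards, would need a per-slot sum rather than a single multiplication.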

Juniper Networks high-performance data center network architecture reduces cost
and complexity by requiring fewer tiers of switching, consolidating security
services, and providing a common operating system and one extensible model for
network management. As shown in Figure 1.1, Junos OS runs on many data center
network switching, routing and security platforms, including Juniper Networks EX
Series, Juniper Networks MX Series, and Juniper Networks SRX Series, as well as
IBM j-type data center network products, for which Juniper Networks is the original
equipment manufacturer (OEM) of the EX and MX Series. For details concerning
product mapping between IBM and Juniper Networks products, see Table 1.1 at the
end of this chapter or visit the website, IBM and Junos in the Data Center: A
Partnership Made for Now, at https://simplifymydatacenter.com/ibm.

Figure 1.1 Junos Operating System Runs on the Entire Data Center Network: Security, Routers, and
Switching Platforms

[Figure content: platforms grouped as SECURITY (SRX100, SRX210, SRX240, SRX650, SRX3000 Line, SRX5000 Line), ROUTERS (J Series, M Series, MX Series, T Series) and SWITCHES (EX2200 Line, EX3200 Line, EX4200 Line, EX8208, EX8216), shown together with Junos Space, Junos Pulse, NSM and NSMXpress.]
Figure 1.2 Data Center and Cloud Architecture

[Figure content, panel labels: REMOTE/CLOUD USER, TELEWORKER, PUBLIC CLOUD, LARGE BRANCH, SMALL BRANCH, HEADQUARTERS, WAN NETWORK (SSL VPN, IPsec VPN, MPLS/VPLS), ENTERPRISE OWNED DATA CENTER – LOCATION 1 and ENTERPRISE OWNED DATA CENTER – LOCATION 2. Each data center is drawn with SECURITY, NETWORK, MANAGEMENT, SERVER and STORAGE areas.]
16 Data Center Network Connectivity with IBM Servers

IBM and Juniper Networks Data Center and Cloud Architecture


As shown in Figure 1.2 (divided into two sections on pages 14-15), the sample data
center and cloud architecture deploys IBM servers, IBM software and Juniper
Networks data center network products. Juniper is the OEM for IBM j-type e-series
switches and m-series routers (EX Series and MX Series). For details concerning
product mapping between IBM and Juniper Networks products, see Table 1.1.

IBM Tivoli and Juniper Networks Junos Space for Comprehensive Network
Management Solution
Managing the data center network often requires many tools from different
vendors, as the typical network infrastructure often is a complex meshed network
deployment. This type of network deployment combines different network
topologies and often includes devices from multiple vendors and network
technologies for delivery. IBM Tivoli products and Juniper Networks Junos Space
together can manage data center networks effectively and comprehensively. The
tools include:

• IBM Systems Director
• Tivoli Netcool/OMNIbus
• IBM Tivoli Provisioning Manager
• Junos Space Network Application Platform
• Juniper Networks Junos Space Ethernet Activator
• Juniper Networks Junos Space Security Designer
• Juniper Networks Junos Space Route Insight Manager
• Juniper Networks Junos Space Service Now

MORE For the latest IBM and Juniper Networks data center solution, visit
http://www.juniper.net/us/en/company/partners/global/ibm/#dynamic.

IBM and Juniper Networks


The collaboration between IBM and Juniper Networks began more than a decade ago. In
November of 1997, IBM provided custom Application Specific Integrated Circuits
(ASICs) for Juniper Networks’ new class of Internet backbone devices as part of a
strategic technology relationship between the two companies.

Since 2007, the two companies have been working together on joint technology
solutions, standards development, network management and managed
security services. IBM specifically included Juniper Networks switching, routing, and
security products in its data center network portfolio, with IBM playing an
invaluable role as systems integrator.

Most recently, the two companies collaborated on a global technology
demonstration highlighting how enterprises can seamlessly extend their private
data center clouds. The demonstration between Silicon Valley and Shanghai
showed a use case where customers could take advantage of remote servers in a
secure public cloud to ensure that high priority applications are given preference
over lower priority ones when computing resources become constrained. IBM and
Juniper are installing these advanced networking capabilities into IBM’s nine
worldwide Cloud Labs for customer engagements. Once these capabilities are
installed, IBM and Juniper will be able to seamlessly move client-computing
workloads between private and publicly managed cloud environments, enabling
customers to deliver reliably on service-level agreements (SLAs).

In July of 2009, Juniper and IBM continued to broaden their strategic relationship by
entering into an OEM agreement that enables IBM to provide Juniper’s Ethernet
networking products and support within IBM’s data center portfolio. The addition of
Juniper’s products to IBM’s data center networking portfolio provides customers
with a best-in-class networking solution and accelerates the shared vision of both
companies for advancing the economics of networking and the data center by
reducing costs, improving services and managing risk.

IBM j-type Data Center Products and Juniper Networks Products Cross Reference
The IBM j-type e-series Ethernet switches and m-series Ethernet routers use
Juniper Networks technology. Table 1.1 shows the mapping of IBM switches and
routers to their corresponding Juniper Networks models. For further product
information, please visit the website, IBM and Junos in the Data
Center: A Partnership Made For Now, at https://simplifymydatacenter.com/ibm.

Table 1.1 Mapping of IBM j-type Data Center Network Products to Juniper Networks Products

IBM Description                                 IBM Machine Type and Model   Juniper Networks Model
IBM j-type e-series Ethernet Switch J48E        4273-E48                     EX4200
IBM j-type e-series Ethernet Switch J08E        4274-E08                     EX8208
IBM j-type e-series Ethernet Switch J16E        4274-E16                     EX8216
IBM j-type m-series Ethernet Router J02M        4274-M02                     MX240
IBM j-type m-series Ethernet Router J06M        4274-M06                     MX480
IBM j-type m-series Ethernet Router J11M        4274-M11                     MX960
IBM j-type s-series Ethernet Appliance J34S     4274-S34                     SRX3400
IBM j-type s-series Ethernet Appliance J36S     4274-S36                     SRX3600
IBM j-type s-series Ethernet Appliance J56S     4274-S56                     SRX5600
IBM j-type s-series Ethernet Appliance J58S     4274-S58                     SRX5800


Chapter 2

Design Considerations

Network Reference Architecture

Design Considerations

Two-Tier Network Deployment

THIS CHAPTER FOCUSES ON Juniper Networks data center network reference
architecture. It presents technical considerations for designing a modern-day data
center network that must support consolidated and centralized server and
storage infrastructure, as well as enterprise applications.

Network Reference Architecture


The data center network is realigning itself to meet new global demands by
providing better efficiency, higher performance and new capabilities. Today’s data
center network can:

• Maximize efficiency gains from technologies such as server virtualization.
• Provide required components with improved capabilities – security,
performance acceleration, high density and resilient switching, and high
performance routing.
• Use virtualization capabilities such as MPLS and virtual private LAN service
(VPLS) to enable a flexible, high-performance data center backbone network
between data centers.

These evolving demands call for a new network reference architecture, which can
sustain application performance, meet the demands of customer growth, reinforce
security compliance, reduce operational costs and adopt innovative technologies.

As shown in Figure 2.1, Juniper Networks data center network reference architecture
consists of the following four tiers:

• Edge Services Tier – provides all WAN services at the edge of data center
networks; connects to the WAN services in other locations, including other data
centers, campuses, headquarters, branches, carrier service providers, managed
service providers and even cloud service providers.
• Core Network Tier – acts as the data center network backbone, which
interconnects other tiers within the data center and can connect to the core
network tier in other data centers.
• Network Services Tier – provides centralized network security and application
services, including firewall, Intrusion Detection and Prevention (IDP) and server
load balancing.
• Applications and Data Services Tier – connects mainly servers and storage in
the LAN environment and acts as an uplink to the core network tier.
The subsequent sections in this chapter explain each network tier in detail.
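
The four tiers and their interconnections can be summarized in a small data structure. The following Python sketch is illustrative only: the dictionary layout and the helper names (`neighbors`, `is_symmetric`) are assumptions made for this example and do not correspond to any Juniper Networks or IBM tool. The adjacency is a simplified reading of the tier descriptions above, with the core network tier interconnecting the other three.

```python
# Illustrative model of the four-tier reference architecture.
# Tier names come from the text; the adjacency is a simplified assumption.
TIERS = {
    "edge_services": {
        "role": "WAN services at the edge of the data center",
        "connects_to": ["core_network"],
    },
    "core_network": {
        "role": "data center network backbone",
        "connects_to": [
            "edge_services",
            "network_services",
            "applications_and_data_services",
        ],
    },
    "network_services": {
        "role": "centralized network security and application services",
        "connects_to": ["core_network"],
    },
    "applications_and_data_services": {
        "role": "server and storage connectivity in the LAN",
        "connects_to": ["core_network"],
    },
}


def neighbors(tier):
    """Return the tiers directly connected to the given tier, sorted by name."""
    return sorted(TIERS[tier]["connects_to"])


def is_symmetric():
    """Check that every connection is recorded in both directions."""
    return all(
        a in TIERS[b]["connects_to"]
        for a in TIERS
        for b in TIERS[a]["connects_to"]
    )


if __name__ == "__main__":
    for name, info in TIERS.items():
        print(f"{name}: {info['role']} -> {', '.join(neighbors(name))}")
```

In this model every path between the edge, services and access tiers passes through the core network tier, which is why the core is described as the backbone.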

Figure 2.1 Juniper Networks Data Center Network Reference Architecture


Edge Services Tier


The edge services tier is responsible for all connectivity and network level security
aspects to connect the data center to the outside world, including other data
centers, campuses, headquarters, branches, carrier service providers, managed
service providers, or even cloud service providers. Typically, routers and
firewall/VPN devices reside in this tier. The data center is likely to connect over
various leased lines to partners, branch offices and the Internet. When connecting all
of these networks, it is important to plan for the following:

• Internet routing isolation, for example separating the exterior routing protocols
from the interior routing protocols.
• Network Address Translation (NAT) to convert your private IP addresses to
public Internet routable IP addresses.
• IPsec VPN tunnel termination for partner, branch and employee connections.
• Border security to enforce stateful firewall policies and content inspection.
• Quality of Service (QoS).
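
To make the NAT item above concrete, the following Python sketch shows a static source translation from private (RFC 1918) addresses to public addresses. It is a conceptual illustration, not a device configuration: the mapping table, the function names and all addresses are hypothetical, and the public addresses are taken from the 203.0.113.0/24 documentation range.

```python
import ipaddress

# Private address blocks defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Hypothetical static NAT table: private server address -> public address.
STATIC_NAT = {
    ipaddress.ip_address("10.1.10.5"): ipaddress.ip_address("203.0.113.5"),
    ipaddress.ip_address("10.1.10.6"): ipaddress.ip_address("203.0.113.6"),
}


def is_rfc1918(ip):
    """Return True if the address falls in one of the RFC 1918 blocks."""
    return any(ip in net for net in RFC1918_NETWORKS)


def translate_source(address):
    """Translate a private source address; pass other addresses through."""
    ip = ipaddress.ip_address(address)
    if not is_rfc1918(ip):
        return str(ip)
    if ip not in STATIC_NAT:
        raise LookupError(f"no NAT mapping configured for {ip}")
    return str(STATIC_NAT[ip])


if __name__ == "__main__":
    print(translate_source("10.1.10.5"))
```

A production edge device would typically also support dynamic pools and port translation; this sketch covers only the one-to-one case.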

Core Network Tier


The core network acts as the backbone of the data center network, which
interconnects other tiers within data centers and can connect to the core network
tier in other data centers as well. It connects the network services tier and
aggregates uplink connections for the applications and data services tier. This tier
consolidates the functionality of the core and aggregation tiers in a traditional
three-tier network architecture, thereby significantly reducing the number of
devices.

Combining the traditional three-tier core and aggregation tiers into a single
consolidated core provides other benefits such as:

• Significant power savings
• Reduced facilities system footprint
• Simplified device management
• Tighter security control
• Reduced number of system failure points

Network Services Tier


The network services tier provides centralized network security and application
services, including firewall, IDP, server load balancing, SSL offload, HTTP cache,
TCP multiplex, and global server load balancing (GSLB). This tier typically connects
directly to the core network tier, resulting in low latency and high throughput.

This tier is responsible for handling service policies for any network, server and/or
application. Because network service is centralized, it must provide service to all
servers and applications within the data center; it should apply a network-specific
policy to a particular network or apply an application-specific policy to a set of
servers associated with particular applications. For example, a security service, such
as traffic SYN checking/sequence number checking, must apply to any server that is
exposed to public networks.

The network services tier requires:

• High performance devices, for example, high performance firewalls to process
traffic associated with large numbers of endpoints, such as networks, servers
and applications.
• Virtualization capabilities, such as virtual instances to secure many
simultaneous logical services.

Applications and Data Services Tier


The applications and data services tier (also known as the access tier) is primarily
responsible for connecting servers and storage in the LAN environment and acts as
an uplink to the core network tier. It includes the access tier in a data network and
storage network. This tier supports interoperability with server connections and
high throughput network interconnections. When the number of servers increases,
the network topology remains agile and can scale seamlessly. Based on different
business objectives and IT requirements, the application and data services tier can
have many networks, including:

• External applications networks, which can have multiple external networks that
serve separate network segments. These typically include applications such as
the public Web, public mail transfer agent (MTA), Domain Name System (DNS)
services and remote access and potential file services that are available
through unfiltered access.
• Internal applications networks, which can have multiple internal networks
serving different levels of internal access from campus or branch locations.
These networks typically connect internal applications such as finance and
human resources systems. Also residing in the internal network are partner
applications and/or any specific applications that are exposed to partners, such
as inventory systems and manufacturing information.
• Infrastructure services networks, which provide secure infrastructure network
connections between servers and their supporting infrastructure services, such
as Lightweight Directory Access Protocol (LDAP), databases, file sharing,
content management and middleware servers. Out of Band Management is
also a part of this network.
• Storage networks, which provide remote storage to servers using different
standards, such as Fibre Channel, InfiniBand or Internet Small Computer
System Interface (iSCSI). Many mission critical application servers typically use
a Host Bus Adapter (HBA) to connect to a remote storage system, ensuring fast
access to data. However, large numbers of servers use iSCSI to access remote
storage systems by using the TCP/IP network for simplicity and cost efficiency.

Design Considerations
The following key design considerations are critical attributes for designing today’s
data center network architecture:

• High availability and disaster recovery
• Security
• Simplicity
• Performance
• Innovation

NOTE The design considerations discussed in this handbook are not necessarily specific
to Juniper Networks solutions and can be applied universally to any data center
network design, regardless of vendor selection.

High Availability and Disaster Recovery


From the perspective of a data center network designer, high availability and
disaster recovery are key requirements and must be considered not only in light of
what is happening within the data center, but also from across multiple data
centers. Network high availability should be deployed by using a combination of link
redundancy (both external and internal connectivity) and critical device
redundancy to ensure network operations and business continuity. In addition,
using site redundancy (multiple data centers) is critical to meeting disaster recovery
and regulatory compliance objectives. Moreover, devices and systems deployed
within the confines of the data center should support component-level high
availability, such as redundant power supplies, fans and routing engines. Another
important consideration is the software/firmware running on these devices, which
should be based on a modular architecture that provides features such as in-service
software upgrade (ISSU) in the MX Series to prevent software failures and upgrade
events from impacting the entire device. Software upgrades should only impact a
particular module, thereby ensuring system availability.
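
The benefit of link and device redundancy can be estimated with basic availability arithmetic. The Python sketch below is a simplified illustration that assumes component failures are independent; the function names and the 99.9 percent availability figure are hypothetical and are not measured values for any product in this handbook.

```python
def parallel_availability(a, n):
    """Availability of n redundant components when any one of them suffices.

    Assumes independent failures: the group is down only if all n are down.
    """
    return 1.0 - (1.0 - a) ** n


def series_availability(*parts):
    """Availability of a chain in which every element must be up."""
    result = 1.0
    for p in parts:
        result *= p
    return result


if __name__ == "__main__":
    single_link = 0.999  # hypothetical availability of one link
    redundant_pair = parallel_availability(single_link, 2)
    # A path through a redundant link pair followed by a redundant device pair.
    path = series_availability(redundant_pair, parallel_availability(0.999, 2))
    print(f"single link:    {single_link:.6f}")
    print(f"redundant pair: {redundant_pair:.6f}")
    print(f"end-to-end:     {path:.6f}")
```

Under these assumptions, duplicating a component multiplies its unavailability by itself, while chaining elements in series multiplies their availabilities. This is why redundancy is applied at each stage of the path rather than at a single point.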

Security
The critical resources in any enterprise location are typically the applications
themselves, and the servers and supporting systems such as storage and
databases. Financial, human resources, and manufacturing applications with
supporting data typically represent a company’s most critical assets and, if
compromised, can create a potential disaster for even the most stable enterprise.
The core network security layers must protect these business critical resources
from unauthorized user access and attacks, including application-level attacks.

The security design must employ layers of protection from the network edge
through the core to the various endpoints, an approach known as defense in depth. A
layered security solution protects critical network resources that reside on the
network. If one layer fails, the next layer will stop the attack and/or limit the
damages that can occur. This level of security allows IT departments to apply the
appropriate level of resource protection to the various network entry points based
upon their different security, performance and management requirements.

Layers of security that should be deployed at the data center include the following:

• DoS protection at the edge
• Firewalls to tightly control who and what gets in and out of the network
• VPN to provide secure remote access
• Intrusion Prevention System (IPS) solutions to prevent a more generic set of
application layer attacks
Further, application-layer firewalls and gateways also play a key role in protecting
specific application traffic such as XML.

For further details, refer to the National Institute of Standards and Technology (NIST)
recommended best practices, as described in Guide to General Server Security:
Recommendations of the National Institute of Standards and Technology at
http://csrc.nist.gov/publications/nistpubs/800-123/SP800-123.pdf.

Policy-based networking is a powerful concept that enables devices in the network
to be managed efficiently, especially within virtualized configurations, and can
provide granular levels of network access control. The policy and control
capabilities should allow organizations to centralize policy management while
offering distributed enforcement at the same time. The network policy and control
solution should provide appropriate levels of access control, policy creation and
management, and network and service management, ensuring secure and
reliable networks for all applications. In addition, the data center network
infrastructure should integrate easily into a customer’s existing management
frameworks and third-party tools, such as Tivoli, and provide best-in-class
centralized management, monitoring and reporting services for network services
and the infrastructure.

Simplicity
Simplicity can be achieved by adopting new architectural designs, new
technologies, and network operating systems.

The two-tier network architecture is a new design that allows network
administrators to simplify the data center infrastructure. Traditionally, data center
networks were constructed using a three-tier design approach, resulting in access,
aggregation and core layers. A large number of devices must be deployed,
configured and managed within each of these tiers, increasing cost and complexity.

This is primarily because of scalability requirements, performance limitations and
key feature deficiencies in traditional switches and routers. Juniper Networks
products support a data center network design that requires fewer devices,
interconnections and network tiers. Moreover, the design also enables the following
key benefits:

• Reduced latency due to fewer device hops
• Simplified device management
• Significant power, cooling and space savings
• Fewer system failure points
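
The latency benefit of fewer device hops can be illustrated by comparing server-to-server paths in the two designs. The Python sketch below is a simplified model: the paths assume two servers attached to different access switches, and the per-hop latency value is a hypothetical placeholder rather than a measured figure for any device.

```python
# Devices traversed between two servers on different access switches.
# These paths are simplified assumptions for illustration.
THREE_TIER_PATH = ["access", "aggregation", "core", "aggregation", "access"]
TWO_TIER_PATH = ["access", "core", "access"]


def path_latency_us(path, per_hop_us):
    """Estimate path latency as hop count times a uniform per-hop latency."""
    return len(path) * per_hop_us


if __name__ == "__main__":
    per_hop_us = 10.0  # hypothetical per-device latency in microseconds
    for name, path in (("three-tier", THREE_TIER_PATH), ("two-tier", TWO_TIER_PATH)):
        print(f"{name}: {len(path)} hops, ~{path_latency_us(path, per_hop_us)} us")
```

With a uniform per-hop cost, removing the aggregation layer removes two of the five devices on this path. Actual latency depends on the specific platforms, link speeds and traffic load.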

Figure 2.2 shows data center network design trends from a traditional data center
network, to a network consisting of a virtualized access tier and collapsed
aggregate and core tiers, to a network with improved network virtualization on
the WAN.

Traditional Data Center Network Design
• Multiple L2/L3 switches at aggregation
• Multiple L2 access switches to be managed
• Multiple layers in the network

Virtualized Access, Consolidated Core/Aggregation
• Up to 10 EX4200 Ethernet Switches can be managed as single device with Virtual
Chassis technology
• High-performance L2/L3 collapsed core/aggregation with EX8208 and EX8216
Ethernet Switches reduce number of devices

Integrated WAN Interface with MPLS-Enabled Core/Aggregation
• Collapsed aggregation and core layer
• MPLS capable core with MX240, MX480 and MX960 Ethernet Routers
• WAN interface available on MX240, MX480 and MX960 Ethernet Routers

Figure 2.2 Data Center Network Design Trends

Converged I/O technology is a new technology that simplifies the data center
infrastructure by supporting flexible storage and data access on the same network
interfaces on the server side, and by consolidating storage area networks (SANs)
and LANs into a single logical infrastructure. This simplification and consolidation
makes it possible to allocate dynamically any resource – including routing,
switching, security services, storage systems, appliances and servers – without
compromising performance.

Keeping in mind that network devices are complex, designing an efficient hardware
platform is not, by itself, sufficient in achieving an effective, cost-efficient and
operationally tenable product. Software in the control plane plays a critical role in
the development of features and in ensuring device usability. Because Junos is a
proven modular software network operating system that runs across different
platforms, implementing Junos is one of the best approaches to simplifying the
daily operations of the data center network.

In a recent study titled, The Total Economic Impact™ of Juniper Networks Junos
Network Operating System, Forrester Consulting reported a 41 percent reduction in
overall network operational costs based on dollar savings across specific task
categories, including planned events, reduction in frequency and duration of
unplanned network events, the sum of planned and unplanned events, the time
needed to resolve unplanned network events, and the “adding infrastructure” task.

As the foundation of any high performance network, Junos exhibits the following
key attributes as illustrated in Figure 2.3:

• One operating system with a single source base and a single consistent feature
implementation.
• One software release train extended through a highly disciplined and firmly
scheduled development process.
• One common modular software architecture that stretches across many
different Junos hardware platforms, including MX Series, EX Series, and SRX
Series.

Figure 2.3 Junos: A 1-1-1 Advantage

Performance
To address performance requirements related to server virtualization, centralization
and data center consolidation, the data center network should boost the
performance of all application traffic, whether local or remote. The data center
should offer LAN-like user experience levels for all enterprise users irrespective of
their physical location. To accomplish this, the data center network should optimize
applications, servers, storage and network performance.

WAN optimization techniques that include data compression, TCP and application
protocol acceleration, bandwidth allocation, and traffic prioritization improve
the performance of network traffic. In addition, these techniques can be applied to data
replication, and to backup and restoration between data centers and remote sites,
including disaster recovery sites.
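
As a simple illustration of the data compression technique mentioned above, the following Python sketch compresses a repetitive payload with the standard `zlib` module and reports the resulting ratio. It is not a model of any WAN optimization product: the sample payload and the `compression_ratio` helper are hypothetical, and real traffic will compress more or less depending on its content.

```python
import zlib


def compression_ratio(payload, level=6):
    """Return original size divided by compressed size for the payload."""
    compressed = zlib.compress(payload, level)
    return len(payload) / len(compressed)


# Hypothetical, highly repetitive application traffic (for example, repeated
# HTTP request headers), which is the kind of data that compresses well.
SAMPLE = (
    b"GET /inventory/item?id=1001 HTTP/1.1\r\n"
    b"Host: app.example.com\r\n\r\n"
) * 200

if __name__ == "__main__":
    print(f"original bytes:   {len(SAMPLE)}")
    print(f"compressed bytes: {len(zlib.compress(SAMPLE))}")
    print(f"ratio:            {compression_ratio(SAMPLE):.1f}x")
```

Fewer bytes on the WAN link means less serialization delay and more usable bandwidth for the same traffic, which is the effect that compression contributes to WAN optimization.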

Within the data center, Application Front Ends (AFEs) and load balancing solutions
boost the performance of both client-server and Web-based applications, as well
as speeding Web page downloads. In addition, designers must consider offloading
CPU-intensive functions, such as TCP connection processing and HTTP
compression, from backend applications and Web servers.

Beyond application acceleration, critical infrastructure components such as
routers, switches, firewalls, remote access platforms and other security devices
should be built on non-blocking modular architecture, so that they have the
performance characteristics necessary to handle the higher volumes of mixed
traffic types associated with centralization and consolidation. Designers also
should account for remote users.

Juniper Networks’ innovative silicon chipset and virtualization technologies
deliver a unique high performance data center solution.
• Junos Trio represents Juniper’s fourth generation of purpose-built silicon and is
the industry’s first “network instruction set” – a new silicon architecture unlike
traditional ASICs and network processing units (NPUs). The new architecture
leverages customized “network instructions” that are designed into silicon to
maximize performance and functionality, while working closely with Junos
software to ensure programmability of network resources. The new Junos One
family thus combines the performance benefits of ASICs and the flexibility of
network processors to break the standard trade-offs between the two.
Built in 65-nanometer technology, Junos Trio includes four chips with a total of 1.5
billion transistors and 320 simultaneous processes, yielding total router throughput
up to 2.6 terabits per second and up to 2.3 million subscribers per rack – far
exceeding the performance and scale possible through off-the-shelf silicon. Junos
Trio includes advanced forwarding, queuing, scheduling, synchronization and
end-to-end resiliency features, helping customers provide service-level guarantees
for voice, video and data delivery. Junos Trio also incorporates significant power
efficiency features to enable more environmentally conscious data center and
service provider networks.

Junos Trio chipset with revolutionary 3D Scaling technology enables networks to
scale dynamically for more bandwidth, subscribers and services – all at the same
time without compromise. Junos Trio also yields breakthroughs for delivering rich
business, residential and mobile services at massive scale – all while using half as
much power per gigabit. The new chipset includes more than 30 patent-pending
innovations in silicon architecture, packet processing, QoS and energy efficiency.
• The Juniper Networks data center network architecture employs a mix of
virtualization technologies – such as Virtual Chassis technology with VLANs
and MPLS-based advanced traffic engineering, VPN enhanced security, QoS,
VPLS, and other virtualization services. These virtualization technologies
address many of the challenges introduced by server, storage and application
virtualization. For example, Virtual Chassis supports low-latency server live
migration from server to server in completely different racks within a data
center, and from server to server between data centers in a flat Layer 2 network,
when these data centers are within reasonably close proximity. Virtual Chassis
with MPLS allows the Layer 2 domain to extend across data centers to support
live migration from server to server when data centers are distributed over
significant distances.
Juniper Networks virtualization technologies support low latency, throughput, QoS
and high availability required by server and storage virtualization. MPLS-based
virtualization addresses these requirements with advanced traffic engineering to
provide bandwidth guarantees, label switching and intelligent path selection for
optimized low latency and fast reroute for extreme high availability across the WAN.
MPLS-based VPNs enhance security with QoS to efficiently meet application and
user performance needs.

These virtualization technologies serve to improve efficiencies and performance
with greater agility while simplifying operations. For example, acquisitions and new
networks can be folded quickly into the existing MPLS-based infrastructure without
reconfiguring the network to avoid IP address conflicts. This approach creates a
highly flexible and efficient data center WAN.
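
The reason overlapping IP addresses do not conflict in an MPLS-based VPN is that each VPN has its own routing table. The Python sketch below models that idea with one route table per VPN routing and forwarding (VRF) instance. It is a conceptual illustration only: the `VrfTable` class, the VRF names and the next-hop labels are hypothetical, and no MPLS label handling is modeled.

```python
import ipaddress


class VrfTable:
    """Toy per-VRF route tables: the same prefix can exist in several VRFs."""

    def __init__(self):
        self._routes = {}  # VRF name -> {network: next hop}

    def add_route(self, vrf, prefix, next_hop):
        """Install a route for the prefix in the named VRF only."""
        self._routes.setdefault(vrf, {})[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, vrf, address):
        """Longest-prefix match within one VRF; None if nothing matches."""
        ip = ipaddress.ip_address(address)
        matches = [
            (net, hop)
            for net, hop in self._routes.get(vrf, {}).items()
            if ip in net
        ]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    table = VrfTable()
    # Both networks use 10.0.0.0/24, as can happen after an acquisition.
    table.add_route("existing-corp", "10.0.0.0/24", "pe-router-a")
    table.add_route("acquired-corp", "10.0.0.0/24", "pe-router-b")
    print(table.lookup("existing-corp", "10.0.0.25"))
    print(table.lookup("acquired-corp", "10.0.0.25"))
```

Because lookups are scoped to a VRF, the same destination address resolves to a different next hop in each VPN, so neither network needs to be renumbered.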

Innovation
Innovation, for example green initiatives, influences data center design. A green
data center is a repository for the storage, management and dissemination of data
in which the mechanical, lighting, electrical and computer systems provide
maximum energy efficiency with minimum environmental impact. As older data
center facilities are upgraded and newer data centers are built, it is important to
ensure that the data center network infrastructure is highly energy and space
efficient.

Network designers should consider power, space and cooling requirements for all
network components, and they should compare different architectures and
systems so that they can ascertain the environmental and cost impacts across the
entire data center. In some environments, it might be more efficient to implement
high-end, highly scalable systems that can replace a large number of smaller
components, thereby promoting energy and space efficiency.

Green initiatives that track resource usage, carbon emissions and efficient
utilization of resources, such as power and cooling, are important factors when
designing a data center. Among the many energy-efficient Juniper devices, the
MX960 is presented in Table 2.1 to demonstrate its effects on reductions in energy
consumption and footprint within the data center.

Table 2.1 Juniper Networks MX960 Power Efficiency Analysis

Characteristics                    Juniper Networks Core MX960 2x Chassis
Line-rate 10 GigE (ports)          96
Throughput per chassis (Mpps)      720
Output current (Amps)              187.84
Output Power (Watts)               9020.00
Heat Dissipation (BTU/Hr)          36074.33
Chassis Required (rack space)      2 chassis
Rack space (racks)                 2/3rds of a single rack
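
The figures in Table 2.1 can be normalized per port to make comparisons with other platforms easier. The Python sketch below performs that arithmetic. It assumes that the port count, output power and heat dissipation values are all totals for the same two-chassis configuration, which the table does not state explicitly; the variable names are chosen for this example.

```python
# Values copied from Table 2.1 (assumed to be totals for the 2x chassis setup).
PORTS_10GE = 96
OUTPUT_POWER_W = 9020.00
HEAT_BTU_PER_HR = 36074.33

# Per-port normalization.
watts_per_port = OUTPUT_POWER_W / PORTS_10GE
btu_per_hr_per_port = HEAT_BTU_PER_HR / PORTS_10GE

if __name__ == "__main__":
    print(f"Output power per 10 GigE port:     {watts_per_port:.2f} W")
    print(f"Heat dissipation per 10 GigE port: {btu_per_hr_per_port:.2f} BTU/hr")
```

Under that assumption, the configuration works out to roughly 94 watts and 376 BTU/hr per line-rate 10 GigE port.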

Two-Tier Network Deployment


In this handbook, we deploy a two-tier network architecture in a data center
network, as shown in Figure 2.4. These two tiers consist of the core network and
access network tiers. They correspond to the core network tier and the applications
and data services tier of the Juniper Networks data center reference architecture.
For further details concerning Juniper’s data center reference
architecture, refer to the Enterprise Data Center Network Reference Architecture –
Using a High Performance Network Backbone to Meet the Requirements of the
Modern Enterprise Data Center at
www.juniper.net/us/en/local/pdf/reference-architectures/8030001-en.pdf.

NOTE For a detailed discussion of two-tier network deployment, see
Chapter 3: Implementation Overview. The two-tier network architecture
defined in this handbook does not include a storage network.

Figure 2.4 Sample Two-Tier Data Center Network Deployment


Core Network Tier


The core network tier commonly uses Juniper Networks MX Series Ethernet
Services Routers or Juniper Networks EX8200 line of Ethernet switches, such as
MX960, MX480, EX8216 and EX8208. Deciding on a particular device depends
on various factors, including functional requirements in the core network tier,
budgetary constraints or phased deployment considerations. The following
represent several customer scenarios:
• Extend the Layer 2 broadcast domain across a geographically dispersed
data center so that all the servers associated with the Layer 2 domain appear
on the same Ethernet LAN. Then the enterprise can leverage many existing
provisioning and data migration tools to manage worldwide-distributed servers
effectively. The MX960 and MX480 are ideal devices for building an MPLS
backbone in the enterprise core network tier and for leveraging VPLS to create
an extended Layer 2 broadcast domain between data centers. In the core
network tier, also known as the consolidated core layer, two MX Series routers
connect to two SRX Series platforms, which have many virtual security services
that can be configured into independent security zones. The MX Series routers
connect to top-of-the-rack Juniper Networks EX Series Ethernet Switches in
the access layer, which in turn aggregate the servers in the data center.
• Consolidate a traditional three-tier network infrastructure to support traffic-
intensive applications and multi-tier business applications to lower latency,
support data and video and integrate security. The MX960 and SRX5800 are
ideal products to provide a consolidated solution, as illustrated in Figure 2.5.

Figure 2.5 Integrating Security Service with Core Network Tier
(WAN edge: M Series routers. Consolidated core layer: two MX960s with VRFs
mapped to Firewall, IPS and NAT security zones on the SRX5800. Access layer:
EX4200 virtual chassis carrying departmental VLANs such as HR, Finance and
Guest. Notes in the figure:
• Mapping of VLANs to security zones.
• Map VRFs on core to routing instances on SRX Series.
• Establish adjacency between VRFs on core.
• Traffic between networks runs through SRX Series by default, or filtered on
MX Series.)



Two MX960 routers are shown to indicate high availability between these devices,
providing end-to-end network virtualization for applications by mapping Virtual
Routing and Forwarding (VRF) in the MX Series to security zones in the SRX. In Figure
2.5 for example, the VRF #1 is mapped to security zones Firewall #1, NAT #1, and IPS
#1, and VRF #2 is mapped to Firewall #2 and NAT #2.
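
The VRF-to-zone mapping described for Figure 2.5 can be written down as a small lookup table. The following is a minimal sketch, not device configuration: the dictionary layout and the function names are illustrative assumptions, and only the VRF and zone names come from the text. The sketch treats a zone shared by two VRFs as something to flag, which is an assumption about the intended isolation.

```python
# Mapping taken from the Figure 2.5 description; the layout is illustrative.
VRF_TO_ZONES = {
    "VRF #1": ["Firewall #1", "NAT #1", "IPS #1"],
    "VRF #2": ["Firewall #2", "NAT #2"],
}


def zones_for(vrf: str) -> list:
    """Return the SRX security zones that a given MX Series VRF maps to."""
    return VRF_TO_ZONES.get(vrf, [])


def shared_zones(mapping: dict) -> set:
    """Return zones listed under more than one VRF (flagged by this sketch)."""
    seen, shared = set(), set()
    for zones in mapping.values():
        for zone in zones:
            if zone in seen:
                shared.add(zone)
            seen.add(zone)
    return shared


print(zones_for("VRF #1"))
print(shared_zones(VRF_TO_ZONES))
```

Keeping the mapping in one table makes it easy to check that each VRF reaches only its own set of security services before the design is configured on the devices.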

For details concerning network virtualization on the MX Series, refer to the Juniper
Networks white paper, Extending The Virtualization Advantage With Network
Virtualization – Virtualization Techniques in Juniper Networks MX Series 3D
Universal Edge Routers at www.juniper.net/us/en/local/pdf/whitepapers/2000342-en.pdf.

Access Tier
We typically deploy the EX4200 Ethernet Switch with Virtual Chassis Technology as
a top-of-rack virtual chassis in the access tier.

The EX4200, together with server virtualization technology, supports high availability
and high maintainability – two key requirements for mission critical, online
applications.

Figure 2.6 Deploying PowerVM Using Dual VIOS and Dual Top-Of-Rack Virtual Chassis
(Racks 1 through 7 each hold Power 570 servers with a primary and a secondary
VIOS. Server uplinks use LAG plus backup to TOR Virtual Chassis 1 and TOR
Virtual Chassis 2, each built from EX4200s, and each virtual chassis has a LAG
uplink.)

As illustrated in Figure 2.6:

• The Power 570 Servers are deployed with dual Virtual I/O Servers (VIOS): the
primary VIOS runs in active mode while the secondary VIOS runs in standby
mode. The primary VIOS connects to one top-of-rack virtual chassis while the
secondary one connects to another top-of-rack virtual chassis.

• The typical bandwidth between the PowerVM’s VIOS and the top-of-rack
virtual chassis switch is 4 Gbps, realized as 4 x 1 Gbps ports in the NIC combined
in a LAG. The bandwidth can scale up to 8 Gbps by aggregating eight ports in a
LAG interface.
• The two Hardware Management Consoles (HMCs) connect to two different
top-of-rack virtual chassis, for example HMC 1 and HMC 2.
Besides preventing a single point of failure (SPOF), this approach also provides a
highly available maintenance architecture for the network: when a VIOS or virtual
chassis instance requires maintenance, operators can upgrade the standby VIOS or
virtual chassis while the environment runs business as usual, then switch the
environment to the upgraded version without disrupting application service.

For connecting a larger number of servers, it is straightforward to duplicate the
top-of-rack virtual chassis deployment at the access layer. Figure 2.7 shows a
top-of-rack virtual chassis with seven EX4200s connected to a group of 56 Power
570 systems.

To connect an additional 56 Power 570 systems, an additional top-of-rack virtual
chassis is deployed at the access layer. As a result, the access layer can connect a
large number of Power 570 systems.
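
The scaling rule above is simple arithmetic, sketched here for convenience. The ratios (seven EX4200s and 56 Power 570 systems per virtual chassis, 4480 client partitions per group of 56 systems) come from the text and Figure 2.7; the function name is illustrative, and the sketch assumes one virtual chassis per server group without modeling the dual virtual chassis redundancy of Figure 2.6.

```python
import math

# Ratios taken from the text and Figure 2.7.
SWITCHES_PER_VC = 7        # EX4200 members in one top-of-rack virtual chassis
SERVERS_PER_VC = 56        # Power 570 systems attached to one virtual chassis
PARTITIONS_PER_VC = 4480   # client partitions shown per group of 56 systems


def access_layer_plan(total_servers: int) -> dict:
    """Estimate the virtual chassis and EX4200 count for a number of servers."""
    chassis = math.ceil(total_servers / SERVERS_PER_VC)
    return {
        "virtual_chassis": chassis,
        "ex4200_switches": chassis * SWITCHES_PER_VC,
        "servers_per_switch": SERVERS_PER_VC // SWITCHES_PER_VC,
        "partitions_per_server": PARTITIONS_PER_VC // SERVERS_PER_VC,
    }


print(access_layer_plan(112))
```

For 112 servers the sketch yields two virtual chassis and 14 switches, with eight servers per EX4200 and 80 client partitions per server, matching the two groups shown in Figure 2.7.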

After addressing all the connectivity issues, we must not lose sight of the
importance of performance in the other network layers and network security
because we are operating the data center network as one secured network.

Figure 2.7 Top-Of-Rack Virtual Chassis with Seven EX4200s Connected to Power 570 Systems
(Core layer: two EX8200s. Access layer: two EX4200 virtual chassis. Server layer:
two groups of 56 IBM Power 570 systems, each group hosting 4480 client
partitions.)

The EX4200 top-of-rack virtual chassis supports different types of physical
connections. The EX4200 provides 48 1000BASE-T ports and two ports for 10
Gbps XFP transceivers through its XFP uplink module. An XFP port can uplink to
other network devices or connect to the IBM Power Systems, depending on user
requirements. Table 2.2 lists three typical 10 Gbps connections used in a Power
System and the XFP uplink module required for each EX4200 connection.

MORE For further details concerning IBM PowerVM and EX4200 top-of-rack virtual
chassis scalability, refer to Implementing IBM PowerVM Virtual Machines on Juniper
Networks Data Center Networks at www.juniper.net/us/en/local/pdf/
implementation-guides/8010049-en.pdf.

Table 2.2 Physical Connectivity Between IBM Power 570 and EX4200

IBM Power 570              XFP Uplink Module                            Cable

10 Gbps Ethernet-LR        XFP Uplink Module                            SMF
PCI-Express Adapter        XFP LR 10 Gbps Optical Transceiver Module

10 Gbps Ethernet-LR        XFP Uplink Module                            SMF
PCI-X 2.0 DDR Adapter      XFP LR 10 Gbps Optical Transceiver Module

Logical Host Ethernet      XFP Uplink Module                            SMF
Adapter (lp-hea)           XFP SR 10 Gbps Optical Transceiver Module

Chapter 3

Implementation Overview

Implementation Network Topology Overview
Server and Network Connections
Spanning Tree Protocols
Multicast
Performance
High Availability
Implementation Scenarios

THIS CHAPTER SERVES AS a reference to the later chapters in this handbook
by presenting an overview of the next generation intra-data center network
implementation scenarios. The implementation scenarios summarized in this
chapter address the requirements discussed in Chapter 2. The network topology
of this reference data center is also covered in this chapter.

Chapters 4 through 8 focus on the technical aspects of the implementation,
primarily server connectivity, STP, multicast, performance, and high
availability.

Implementation Network Topology Overview


This chapter presents the implementation of a two-tier data center network
topology. This topology is common to all scenarios described in the later chapters.
Please note that the setup diagrams for each individual scenario can be different
despite common overall network topology.

As shown in Figure 3.1, the implementation network topology is a two-tier data
center network architecture.

Figure 3.1 Implementation Network Topology Overview
(WAN edge and core with a multicast streaming source. Core/aggregation tier:
two MX480s and an EX8200 running PIM, interconnected with LAG/VRRP. Access
tier: EX4200 Virtual Chassis 1 (ToR1), EX4200 Virtual Chassis 2 (ToR2) and three
individual EX4200s (ToR3), attached to the core with VRRP/RTG/STP. Servers in
VLAN A and VLAN B connect with LAG and servers in VLAN C with STP; each
server group is a multicast receiver acting as an IGMP host. Links are 10 GE,
1 GE or Virtual Chassis links.)
Legend: VC - EX4200 Virtual Chassis; RTG - Redundant Trunk Group; LAG - Link
Aggregation Group; VRRP - Virtual Router Redundancy Protocol; STP - Spanning
Tree Protocol.



NOTE Each individual implementation can differ based on network design and
requirements.

The topology described here consists of the following tiers and servers:

• Core/aggregation tier consisting of EX8200s or MX480s.
• Access tier composed of EX4200s. These access switches can be deployed
either individually or configured to form a virtual chassis. Either of these
options can be implemented as top-of-rack switches to meet different
Ethernet port density requirements. In the topology under discussion:
-- Three EX4200 switches form a virtual chassis (VC1), functioning as a
top-of-rack switch (ToR1).
-- Two EX4200 switches form a virtual chassis (VC2), functioning as a
top-of-rack switch (ToR2).
-- The EX4200-1, EX4200-2, and EX4200-3 are three individual access switches,
functioning as top-of-rack switches (ToR3).
• Servers where the IBM BladeCenter, IBM x3500 and IBM PowerVM reside
for all scenarios presented. For ease of configuration, one server type is
used for each scenario.
Servers are segmented into different VLANs, for example VLAN A, B, and C, as
shown in Figure 3.1. The physical network topology consists of the following
connections:

• The servers connect to the access tier through multiple 1GbE links with Link
Aggregation (LAG) to prevent single point of failure (SPOF) in the physical
link and improve bandwidth.
• The access switches connect to the core layer with multiple 10GbE links.
• At the core tier, the MX480s and EX8200s interconnect to each other using
redundant 10GbE links. These devices connect to the WAN edge tier, which
interconnects the different data centers and connects to external networks.

NOTE Choosing different connection configurations is based on network design and
requirements. Redundant physical links are extremely important for achieving
network high availability.

Server and Network Connections


Chapter 4 discusses the IBM System p, PowerVM, and Juniper Networks MX and EX
Series network configurations. Typically, these network configurations are required
for any implementation scenario.

For the IBM System p and PowerVM, we discuss their production networks and
management networks. We also discuss key PowerVM server virtualization
concepts, including Shared Ethernet Adapter (SEA) and Virtual I/O Server (VIOS).

For the Juniper Networks MX and EX Series, we discuss the Junos operating system,
which runs on both the MX and EX Series platforms. In addition, we discuss the
jumbo frame Maximum Transmission Unit (MTU) setting.

Spanning Tree Protocols


Spanning Tree Protocol is enabled on the connections between the access switches
and the servers, and on the connection between the access and core/aggregation
devices. For server-to-access switch connections, STP is configured on the switch
side so that the links to the servers are designated as “edge ports.” There are no
other bridges attached to edge ports. Administrators can configure RSTP, MSTP or
VSTP between the access and aggregation/core devices.

NOTE Both the MX Series and EX Series devices support all spanning tree protocols.

Spanning Tree Protocols, such as RSTP, MSTP and VSTP, prevent loops in Layer
2-based access and aggregation layers. MSTP and VSTP are enhancements over
RSTP. MSTP is useful when it is necessary to divide a Layer 2 network into multiple,
logical spanning tree instances. For example, it is possible to have two MSTP
instances that are mutually exclusive from each other while maintaining a single
broadcast domain. Thus, MSTP provides better control throughout the network
by dividing it into smaller regions. MSTP is preferred when different devices must
fulfill the role of the root bridge. Thus, the role of the root bridge is spread across
multiple devices.

The tradeoff for implementing MSTP is increased administrative overhead and
network complexity. A higher number of root devices increases the latency
during the root bridge election.

NOTE When using MSTP, it is important to distribute the root bridge functionality across
an optimal number of devices without increasing the latency time during root
bridge election.
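
To illustrate how MSTP spreads the root bridge role across devices, the following minimal sketch elects a root per spanning tree instance using the standard rule that the lowest bridge identifier (priority first, then MAC address) wins. The device names, MAC addresses and priority values are made up for illustration, and the sketch ignores BPDU exchange and timers entirely.

```python
from dataclasses import dataclass, field

DEFAULT_PRIORITY = 32768   # standard default bridge priority


@dataclass
class Bridge:
    name: str
    mac: str                                       # made-up addresses
    priority: dict = field(default_factory=dict)   # MSTP instance -> priority


def root_bridge(bridges, instance):
    """Return the bridge with the lowest (priority, MAC) for one instance."""
    return min(
        bridges,
        key=lambda b: (b.priority.get(instance, DEFAULT_PRIORITY), b.mac.lower()),
    )


bridges = [
    Bridge("core-1", "00:1f:12:00:00:01", {1: 4096, 2: 8192}),
    Bridge("core-2", "00:1f:12:00:00:02", {1: 8192, 2: 4096}),
    Bridge("access-vc1", "00:1f:12:00:00:10"),
]
for instance in (1, 2):
    print(instance, root_bridge(bridges, instance).name)
```

Giving each core device the lowest priority in a different instance makes core-1 the root for instance 1 and core-2 the root for instance 2, which is the kind of distribution the note above refers to.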

VSTP can be compared to Cisco’s PVST+ protocol. VSTP is implemented when
spanning tree is enabled across multiple VLANs. However, VSTP does not scale
well and cannot be used for a large number of VLANs. See Chapter 5 for a detailed
discussion of spanning tree protocols.

Multicast
The multicast protocol optimizes the delivery of video streaming and improves
network infrastructure and overall efficiency. In Chapter 6, we present multicast
implementation scenarios, including Protocol Independent Multicast (PIM) and
IGMP snooping.

In these scenarios, the video streaming client runs on IBM servers. PIM is
implemented on the core/aggregation tiers, while IGMP snooping is implemented
on the access tier.
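
The effect of IGMP snooping on an access switch can be pictured with a small model: the switch records which ports have receivers for each group and forwards a multicast stream only to those ports and to the ports facing the multicast router. This is a simplified sketch under stated assumptions; the class, the port names and the group address are hypothetical, and query handling and timers are left out.

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy model of an access switch with IGMP snooping enabled."""

    def __init__(self, router_ports):
        self.router_ports = set(router_ports)   # ports facing the PIM router
        self.members = defaultdict(set)         # group -> ports with receivers

    def igmp_report(self, port, group):
        """A host on this port joined the group."""
        self.members[group].add(port)

    def igmp_leave(self, port, group):
        """A host on this port left the group."""
        self.members[group].discard(port)

    def egress_ports(self, ingress_port, group):
        """Ports that receive a multicast frame arriving on ingress_port."""
        ports = self.members.get(group, set()) | self.router_ports
        return sorted(ports - {ingress_port})


switch = SnoopingSwitch(router_ports=["xe-0/1/0"])
switch.igmp_report("ge-0/0/1", "239.1.1.1")
print(switch.egress_ports("xe-0/1/0", "239.1.1.1"))
```

Without snooping, the same frame would be flooded to every port in the VLAN; with it, only the port with a joined receiver is served.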

Performance
In Chapter 7 two methods for improving data center network performance are
covered in detail:
• Using CoS to manage traffic.
• Considering latency characteristics when designing networks using Juniper
Networks data center network products.

Using CoS to Manage Traffic


Configuring CoS on the different devices within the data center enables SLAs for
different voice, video and other critical services. Traffic can be prioritized using
different forwarding classes. Prioritization between streams assigned to a particular
forwarding class can be achieved using a combination of Behavior Aggregate (BA)
and Multifield (MF) classifiers.
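
The difference between the two classifier types can be sketched as follows: a BA classifier looks only at the CoS marking already carried in the packet (here the DSCP value), while an MF classifier can match on several header fields. The DSCP code points used are the standard values for EF, AF41 and CS6; the forwarding class assignment, the UDP port and the packet dictionary layout are illustrative assumptions, not a recommended policy.

```python
# DSCP code points are standard values (EF=46, AF41=34, CS6=48); the mapping
# to forwarding classes below is illustrative only.
BA_CLASSIFIER = {
    46: "expedited-forwarding",
    34: "assured-forwarding",
    48: "network-control",
}


def ba_classify(dscp: int) -> str:
    """Behavior Aggregate: classify on the DSCP marking alone."""
    return BA_CLASSIFIER.get(dscp, "best-effort")


def mf_classify(packet: dict) -> str:
    """Multifield: match on several header fields before falling back to BA."""
    # Hypothetical rule: treat UDP traffic to port 5004 as a video stream.
    if packet.get("protocol") == "udp" and packet.get("dst_port") == 5004:
        return "expedited-forwarding"
    return ba_classify(packet.get("dscp", 0))


print(ba_classify(46))
print(mf_classify({"protocol": "udp", "dst_port": 5004, "dscp": 0}))
```

In this sketch the MF rule promotes an unmarked video stream that the BA classifier alone would have placed in the best-effort class.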

Latency
Evolution of Web services and SOA has been critical to the integration of
applications that use web standards such as HTML. This tight integration of
applications with web services has increased east-west traffic (server-to-server
traffic) within the data center by roughly 30 to 75 percent.

As a result, latency between servers must be reduced. Reduced latency can be
achieved by:

• Consolidating the number of devices and thus the tiers within the data center.
• Extending the consolidation between tiers using techniques such as virtual
chassis. With virtual chassis, multiple access layer switches can be grouped
logically to form one single switch. This reduces the latency to a few
microseconds because the traffic from the server does not need to be
forwarded through multiple devices to the aggregation layer.
In the latency implementation scenario, we primarily focus on how to configure the
MX480 for measuring Layer 2 and Layer 3 latency.

High Availability
High availability can provide continuous service availability when implementing
redundancy, stateful recovery from a failure, and proactive fault prediction. High
availability minimizes failure recovery time.

Junos OS provides several high availability features to improve user experience and
to reduce network downtime and maintenance. For example, features such as
virtual chassis (supported on EX4200), Non Stop Routing/Bridging (NSR/NSB,
both supported on MX Series), GRES, GR and Routing Engine Redundancy can help
increase availability at the device level. The Virtual Router Redundancy Protocol
(VRRP), Redundant Trunk Group (RTG) and LAG features control the flow of traffic
over chosen devices and links. The ISSU feature on the MX Series reduces network
downtime for a Junos OS software upgrade. For further details concerning a variety
of high availability features, see Chapter 8: Configuring High Availability.

Each high availability feature can address certain technical challenges but may not
address all the challenges that today’s customers experience. To meet network
design requirements, customers can implement one or many high availability
features. In the following section, we discuss high availability features by comparing
their characteristics and limitations within the following groups:

• GRES, GR versus NSR/NSB
• Routing Engine Switchover
• Virtual Chassis
• VRRP

Comparing GRES and GR to NSR/NSB


Table 3.1 provides an overview of the GRES, GR and NSR/NSB high availability
features available in Junos.

Table 3.1 High Availability Features in Junos OS

GRES
Functions:
• Provides uninterrupted traffic forwarding.
• Maintains kernel state between REs and PFE.
Implementation considerations:
• Incapable of providing router redundancy by itself. Works with GR protocol
extensions.
• Network churn and processing not proportional to effective change.

GR (protocol extensions)
Functions:
• Allows a failure of a neighboring router not to disrupt adjacencies or traffic
forwarding for a certain time interval.
• Enables adjoining peers to recognize RE switchover as a transitional event. This
prevents them from starting the process of reconverging network paths.
• Neighbors are required to support graceful restart.
Implementation considerations:
• Network topology changes can interfere with graceful restart.
• GR can cause blackholing if RE failure occurs due to a different cause.

NSR/NSB
Functions:
• RE switchover is transparent to network peer.
• No peer participation required.
• No drop in adjacencies or session.
• Minimal impact on convergence.
• Allows switchover to occur at any point, even when routing convergence is in
progress.
Implementation considerations:
• Unsupported protocols must be refreshed using the normal recovery mechanisms
inherent in each protocol.

Nonstop active routing/bridging and graceful restart are two different mechanisms
for maintaining high availability when a router restarts.

A router undergoing a graceful restart relies on its neighbors to restore its routing
protocol information. Graceful restart requires a restart process where the
neighbors have to exit a wait interval and start providing routing information to the
restarting router.

NSR/NSB does not require a route restart. Both primary and backup Routing
Engines exchange updates with neighbors. Routing information exchange
continues seamlessly with the neighbors when the primary Routing Engine fails
because the backup takes over.

NOTE NSR cannot be enabled when the router is configured for graceful restart.
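
The constraints stated in this section can be captured as a simple pre-deployment check. This is a sketch only: the function and parameter names are hypothetical, and it encodes just two rules, the note above and the fact that nonstop active routing is used together with graceful Routing Engine switchover.

```python
def check_ha_options(gres: bool, graceful_restart: bool, nsr: bool) -> list:
    """Return a list of problems with a chosen combination of HA features."""
    problems = []
    if nsr and graceful_restart:
        problems.append("NSR cannot be enabled together with graceful restart")
    if nsr and not gres:
        problems.append("NSR is used in combination with GRES")
    return problems


print(check_ha_options(gres=True, graceful_restart=True, nsr=True))
print(check_ha_options(gres=True, graceful_restart=False, nsr=True))
```

The first call reports the conflict between NSR and graceful restart; the second, with GRES and NSR only, returns an empty list.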

Routing Engine Switchover


Because Routing Engine switchover works well with other high availability features,
including graceful restart and NSR, many implementation options are possible.
Table 3.2 summarizes the feature behavior and process flow of these options. The
dual (redundant) Routing Engines option means that the Routing Engine
switchover is disabled. We also use the dual Routing Engine only option as a
baseline to compare other options with high availability enabled, such as the
graceful routing engine switchover enabled option.

Table 3.2 Routing Engine Switchover Implementation Options Summary

Dual Routing Engines only (no high availability features enabled)
Feature behavior: Routing convergence takes place and traffic resumes when the
switchover to the new primary Routing Engine is complete.
Process flow:
• All physical interfaces are taken offline.
• Packet Forwarding Engines restart.
• Backup Routing Engine restarts the routing protocol process (rpd).
• The new primary Routing Engine discovers all hardware and interfaces.
• The switchover takes several minutes and all of the router’s adjacencies are
aware of the physical (interface alarms) and routing (topology) change.

Graceful Routing Engine switchover enabled
Feature behavior: Interface and kernel information preserved during switchover.
The switchover is faster because the Packet Forwarding Engines are not restarted.
Process flow:
• The new primary Routing Engine restarts the routing protocol process (rpd).
• All adjacencies are aware of the router’s change in state.

Graceful Routing Engine switchover and nonstop active routing enabled
Feature behavior: Traffic is not interrupted during the switchover. Interface,
kernel and routing protocol information is preserved.
Process flow:
• Unsupported protocols must be refreshed using the normal recovery mechanisms
inherent in each protocol.

Graceful Routing Engine switchover and graceful restart enabled
Feature behavior: Traffic is not interrupted during the switchover. Interface and
kernel information is preserved. Graceful restart protocol extensions quickly
collect and restore routing information from the neighboring routers.
Process flow:
• Neighbors are required to support graceful restart and a wait interval is required.
• The routing protocol process (rpd) restarts. For certain protocols, a significant
change in the network can cause graceful restart to stop.

Virtual Chassis
Between 2 and 10 EX4200 switches can be connected and configured to form a
single virtual chassis that acts as a single logical device to the rest of the network. A
virtual chassis typically is deployed in the access tier. It provides high availability to
the connections between the servers and access switches. The servers can be
connected to different member switches of the virtual chassis to prevent SPOF.
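
The two rules in this paragraph, a virtual chassis of 2 to 10 EX4200 members and servers attached to different members, lend themselves to a quick design check. The sketch below is illustrative: the function names, member names and server names are hypothetical.

```python
MIN_MEMBERS, MAX_MEMBERS = 2, 10   # EX4200 virtual chassis size from the text


def validate_virtual_chassis(members):
    """Raise ValueError if the member count is outside the supported range."""
    count = len(set(members))
    if not MIN_MEMBERS <= count <= MAX_MEMBERS:
        raise ValueError(f"{count} members; expected {MIN_MEMBERS} to {MAX_MEMBERS}")


def servers_with_spof(links):
    """links maps each server to the member switches its NICs connect to."""
    return sorted(s for s, members in links.items() if len(set(members)) < 2)


validate_virtual_chassis(["member0", "member1", "member2"])
links = {
    "srv1": ["member0", "member1"],   # spread across two members
    "srv2": ["member0", "member0"],   # both links on one member
}
print(servers_with_spof(links))
```

Here srv2 is reported because both of its links terminate on the same member switch, so the loss of that member would disconnect it.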

Virtual Router Redundancy Protocol


The Virtual Router Redundancy Protocol (VRRP), described in IETF RFC 3768,
is a redundancy protocol that increases the availability of a default gateway
in a static routing environment. VRRP enables hosts on a LAN to use redundant
routers on that LAN without requiring more than the static configuration of a single
default route on the hosts. The VRRP routers share the IP address corresponding to
the default route configured in the hosts.

At any time, one of the VRRP routers is the master (active) and the others are
backups. If the master fails, one of the backup routers becomes the new master
router, thus always providing a virtual default router and allowing traffic on the LAN
to be routed without relying on a single router.

Junos OS provides two tracking capabilities to enhance VRRP operations:

• Track the logical interfaces and switch to a VRRP backup router when a tracked
interface goes down.
• Track the reachability to the primary router. An automatic failover to the backup
occurs if the route to the given primary no longer exists in the routing table.
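
The interaction between master election and tracking can be sketched in a few lines. In VRRP the router with the highest priority becomes master, with the higher IP address breaking ties. The model below assumes that a failed tracked object lowers the router's priority by a configured cost; the class, the addresses, the interface name and the cost values are hypothetical, and advertisement timers and preemption are not modeled.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class VrrpRouter:
    name: str
    address: str                      # interface address, used as tie-breaker
    priority: int = 100               # configured VRRP priority
    # tracked object -> (is_up, priority_cost); names and costs are made up
    tracked: dict = field(default_factory=dict)

    def effective_priority(self) -> int:
        """Configured priority minus the cost of every tracked object that is down."""
        cost = sum(c for up, c in self.tracked.values() if not up)
        return max(1, self.priority - cost)


def elect_master(routers):
    """Highest effective priority wins; the higher IP address breaks ties."""
    return max(
        routers,
        key=lambda r: (r.effective_priority(), int(ipaddress.ip_address(r.address))),
    )


core1 = VrrpRouter("MX480-1", "10.0.0.2", 200, {"xe-0/0/0.0": (True, 150)})
core2 = VrrpRouter("MX480-2", "10.0.0.3", 100)
print(elect_master([core1, core2]).name)      # MX480-1 while its uplink is up

core1.tracked["xe-0/0/0.0"] = (False, 150)     # tracked interface goes down
print(elect_master([core1, core2]).name)      # MX480-2 takes over
```

When the tracked interface fails, the effective priority of MX480-1 drops from 200 to 50, below the backup's 100, so mastership moves to MX480-2.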

Implementation Scenarios
Table 3.3 summarizes the implementation scenarios presented in this handbook. It
provides mapping between each scenario, network tier, and devices. Using this
table as a reference, you can map the corresponding chapter to each particular
implementation scenario.

Table 3.3 Implementation Scenarios Summary

Implementation Scenarios     Chapter     Deployment                 Device Support

Spanning Tree                Chapter 5   Access-Aggregation/Core    EX4200
(MSTP/RSTP/VSTP)                         Aggregation-Aggregation    EX8200
                                         Aggregation-Core           MX Series

PIM                          Chapter 6   Access                     EX4200, EX8200, MX Series

IGMP snooping                Chapter 6   Access                     EX4200, EX8200, MX Series

CoS                          Chapter 7   Access                     EX4200
                                         Aggregation/Core           EX8200, MX Series

Virtual Chassis              Chapter 8   Access                     EX4200

VRRP                         Chapter 8   Access                     EX4200
                                         Aggregation/Core           EX8200, MX Series

ISSU                         Chapter 8   Aggregation/Core           MX Series only

RTG                          Chapter 8   Access                     EX Series only
                                         Aggregation

Routing Engine Redundancy    Chapter 8   Aggregation/Core           MX Series, EX8200

Non-Stop Routing             Chapter 8   Aggregation/Core           MX Series only

GR                           Chapter 8   Access                     EX4200
                                         Aggregation/Core           EX8200, MX Series

RTG                          Chapter 8   Access-Aggregation         EX Series only

LAG                          Chapter 8   Access-Server              EX4200
                                         Aggregation/Core           EX8200, MX Series

Table 3.4 functions as a reference aid to help our customers thoroughly understand
how Juniper Networks products and features, which are available in Junos 9.6, can
be implemented into their networks. This table summarizes implementation
scenarios and their supported products that are defined in detail later in this guide.

Table 3.4 Mapping of Implementation Scenarios to Juniper Networks Supported Products

Implementation Scenarios      EX4200   EX8200   MX480

High Availability
NSR/NSB                       –        –        Yes
GRES + GR                     –        Yes      Yes
Virtual Chassis               Yes      –        –
VRRP                          Yes      Yes      Yes
RTG                           Yes      Yes      –
LAG                           Yes      Yes      Yes
ISSU                          –        –        Yes
Routing Engine Redundancy     Yes      Yes      Yes

Spanning Tree Protocol
STP/RSTP, MSTP, VSTP          Yes      Yes      Yes

Performance
CoS                           Yes      Yes      Yes

Multicast
PIM                           Yes      Yes      Yes
IGMP                          Yes      Yes      Yes



Chapter 4

Connecting IBM Servers in the Data Center Network

IBM System p and PowerVM Production Networks
IBM System p and PowerVM Management Networks
Configuring IBM System p Servers and PowerVM
IBM PowerVM Network Deployment
Junos Operating System Overview
Configuring Network Devices

THIS CHAPTER DISCUSSES the IBM System p and PowerVM network
configuration and the Juniper Networks MX and EX Series network configuration.
The IBM System p server is based on POWER processors, such as POWER5,
POWER6 and the recently announced POWER7. In addition to the System p
server, IBM offers PowerVM, which is a new brand for the system virtualization
powered by POWER processors, and which includes elements such as
Micro-Partitioning, logical partitioning (LPAR), Virtual I/O Server (VIOS) and
hypervisor. Both System p servers and PowerVM typically are deployed in the data
center and support mission critical applications.

IBM System p and PowerVM Production Networks


As illustrated in Figure 4.1, the POWER Hypervisor is the foundation of virtual
machine implementation in the IBM PowerVM system. Combined with features
designed into IBM’s power processors, the POWER Hypervisor enables dedicated-
processor partitions, Micro-Partitioning, virtual processors, an IEEE VLAN
compatible virtual switch, virtual Ethernet adapters, virtual SCSI adapters, and
virtual consoles within the individual server. The POWER Hypervisor is a firmware
layer positioned between the hosted operating systems and the server hardware. It
is automatically installed and activated, regardless of system configuration. The
POWER Hypervisor does not require specific or dedicated processor resources
assigned to it.

Figure 4.1 IBM Power Systems Virtualization Overview
(Standalone servers compared with an IBM Power 570 system, in which client
partitions (virtual machines), each running an OS and applications, connect
through virtual Ethernet adapters and the Hypervisor to Shared Ethernet Adapters
in the Virtual I/O Server (VIOS). VLAN 100 and VLAN 200 extend from the
Hypervisor to the EX4200.)

The VIOS, also called the Hosting Partition, is a special-purpose LPAR in the server,
which provides virtual I/O resources to client partitions. The VIOS owns the
resources, such as physical network interfaces and storage connections. The
network or storage resources, reachable through the VIOS, can be shared by client
partitions running on the machine, enabling administrators to minimize the number
of physical servers deployed in their network.

In PowerVM, client partitions can communicate among each other on the same
server without requiring access to physical Ethernet adapters. Physical Ethernet
adapters are required to allow communication between applications running in the
client partitions and external networks. A Shared Ethernet Adapter (SEA) in VIOS
bridges the physical Ethernet adapters from the server to the virtual Ethernet
adapters functioning within the server.

Because the SEA functions at Layer 2, the original MAC address and VLAN tags of
the frames associated with the client partitions (virtual machines) are visible to
other systems in the network. For further details, refer to IBM’s white paper Virtual
Networking on AIX 5L at www.ibm.com/servers/aix/whitepapers/aix_vn.pdf.

In PowerVM, the physical Network Interface Card (NIC) typically is allocated to
the VIOS for improved utilization, whereas in the IBM System p, the physical NIC
is allocated exclusively to an LPAR.

IBM System p and PowerVM Management Networks


The System p servers and PowerVM in the data center can be managed by the HMC,
which is a set of applications running on a dedicated IBM X Series server that
provides a CLI-based and web-based server management interface. The HMC
typically has a monitor, keyboard and mouse connected for local access. However,
the management network, which connects the HMC and its managed servers, is
critical to remote access, an essential operational task in today’s data center.

Figure 4.2 IBM Power Systems Management Networks Overview
(A management client reaches the HMC over the out-of-band management
network. The HMC runs httpd, sshd, an X-Window server, dhcpd and RMC, and has
one NIC on each network. The HMC private management network connects the
HMC to the FSP of servers SRV 1 and SRV 2, each hosting a VIOS LPAR and LPARs
1 through 3 with their NICs.)

As illustrated in Figure 4.2, IBM Power Systems management networks require two
networks:

• Out-of-Band management network.
• HMC private management network.

The out-of-band management network connects HMC and client networks so that
a client’s request for access can be routed to the HMC. An HMC private management
network is dedicated for communication between the HMC and its managed
servers. The network uses a selected range of non-routable IP addresses, and the
Dynamic Host Configuration Protocol (DHCP) server is available in the HMC for IP
allocation. Each p server connects to the private management network through its
Flexible Service Processor (FSP) ports.

Through the HMC private management network, the HMC manages servers in the
following steps:

1. Connects the p server’s FSP port to the HMC private management network so
that HMC and the server are in the same broadcast domain, and HMC runs
DHCP server (dhcpd).
2. Powers on the server. The server’s FSP runs the DHCP client and requests a new
IP address. FSP gets the IP address, which is allocated from HMC.
3. HMC communicates with the server and updates its managed server list with
this new server.
4. HMC performs operations on the server, for example powers on/off the server,
creates LPAR, sets shared adapters (Host Ethernet and Host Channel) and
configures virtual resources.
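
The address allocation in steps 1 through 3 can be pictured with a toy model of the DHCP server running on the HMC. This is a sketch under stated assumptions: the class name, the private address range and the MAC address are hypothetical, and the real DHCP message exchange is reduced to a single method call.

```python
import ipaddress


class HmcDhcp:
    """Toy model of the HMC handing out addresses to FSP ports."""

    def __init__(self, cidr="192.168.128.0/24"):   # hypothetical private range
        self.pool = iter(ipaddress.ip_network(cidr).hosts())
        self.hmc_address = next(self.pool)          # HMC keeps the first host
        self.leases = {}                            # FSP MAC -> leased address
        self.managed = []                           # managed server list

    def fsp_power_on(self, fsp_mac, server_name):
        """Step 2 and 3: lease an address and register the new server."""
        if fsp_mac not in self.leases:
            self.leases[fsp_mac] = next(self.pool)
            self.managed.append(server_name)
        return self.leases[fsp_mac]


hmc = HmcDhcp()
print(hmc.fsp_power_on("00:09:6b:00:00:01", "SRV 1"))
print(hmc.managed)
```

Because the FSP and the HMC sit in the same broadcast domain, no relay is involved; once the lease exists, the server appears in the managed server list and step 4 operations can follow.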

Configuring IBM System p Servers and PowerVM


In this section, we discuss IBM’s System p servers and PowerVM network
configuration, including NIC, virtual Ethernet Adapter and virtual Ethernet switch
configurations, SEA in VIOS, and Host Ethernet Adapter.

Network Interface Card


As illustrated in Figure 4.3, the NIC can be allocated exclusively to an LPAR through
the HMC. Within the LPAR, system administrators then configure the NIC operation
parameters, such as auto-negotiation, speed, duplex, flow control and support for
jumbo frames.

[Figure 4.3 diagram: server SRV 1 with a VIOS and LPARs 1 to 3, three NICs, and an FSP connected to the HMC.]

Figure 4.3 IBM Power Systems Management Networks Overview

To allocate (or remove) the NIC on the LPAR, perform the following steps:

1. Select LPAR.
2. Select: Configuration >> Manage Profiles.
3. Select the profile that you want to change.
4. Select I/O tab.
5. Select NIC (physical I/O resource).
6. Click Add to add the NIC (or Remove to remove the NIC).
7. Select OK to save changes, then click Close.

NOTE The NIC can be allocated to multiple profiles. Because NIC allocation is
exclusive while a profile is running, only one active profile can use the NIC at a time.
If the NIC is already used by one active LPAR and you attempt to activate another
LPAR that requires the same NIC, the activation process is aborted.
Adding or removing the NIC requires reactivating the LPAR profile.

Configuring Virtual Ethernet Adapter and Virtual Ethernet Switch


As illustrated in Figure 4.4, the POWER Hypervisor implements an IEEE 802.1Q
VLAN style virtual Ethernet switch. Similar to a physical Ethernet switch, it provides
virtual ports that support IEEE 802.1Q VLAN tagged or untagged Ethernet
frames.

Similar to a physical Ethernet adapter on the physical server, the virtual Ethernet
adapter on the partition provides network connectivity to the virtual Ethernet
switch. When you create a virtual Ethernet adapter on the partition from the HMC,
the corresponding virtual port is created on the virtual Ethernet switch, so there is
no need to explicitly attach a virtual Ethernet adapter to a virtual port.

The virtual Ethernet adapters and the virtual Ethernet switch form a virtual network
among the client partitions running on the same physical server, so that the
partitions can communicate with each other. For a client partition to reach the
physical network outside of the physical server, the VIOS is required. As shown in
Figure 4.4, three LPARs and the VIOS connect to two virtual Ethernet switches
through virtual Ethernet adapters, and the VIOS also connects to the physical NIC.
LPAR2 and LPAR3 can communicate with each other; LPAR1, LPAR2 and the VIOS can
communicate with each other and further access the external physical network
through the physical NIC.

[Figure 4.4 diagram: server SRV 1 with a VIOS and LPARs 1 to 3 attached to virtual switches, a physical NIC, and an FSP connected to the HMC.]

Figure 4.4 Configuring Virtual Ethernet Switches and Virtual Ethernet Adapters

This section provides steps for the following:

• Creating a virtual Ethernet switch


• Removing a virtual Ethernet switch
• Creating a virtual Ethernet adapter
• Removing a virtual Ethernet adapter
• Changing virtual Ethernet adapter properties

To Create a Virtual Ethernet Switch

1. Select server (Systems Management >> Servers >> select server).


2. Select Configuration >> Virtual Resources >> Virtual Network Management.
3. Select Action >> Create VSwitch.
4. Enter a name for the VSwitch then select OK.
5. Click Close to close the dialog.

To Remove a Virtual Ethernet Switch

1. Select server (Systems Management >> Servers >> select server).


2. Select Configuration >> Virtual Resources >> Virtual Network Management.
3. Select Action >> Remove VSwitch.
4. Click Close to close the dialog.

To Create a Virtual Ethernet Adapter

1. Select server (Systems Management >> Servers >> select server).


2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select Virtual Adapters tab.
6. Select Actions >> Create >> Ethernet Adapter (see Figure 4.5).

Figure 4.5 Virtual Ethernet Adapter Properties Window



7. In the Virtual Ethernet Adapter Properties window (as shown in Figure 4.5),
enter the following:
a. Adapter ID: the default value is displayed.
b. VSwitch: the virtual Ethernet switch that this adapter connects to.
c. VLAN ID: the VLAN ID for untagged frames; the VSwitch adds or removes
the VLAN header.
d. Select the checkbox This adapter is required for partition activation.
e. Select the checkbox IEEE 802.1q compatible adapter to control whether VLAN
tagged frames are allowed on this adapter.
f. Use Add, Remove, New VLAN ID and Additional VLANs to add or
remove the VLAN IDs that are allowed for VLAN tagged frames.
g. Select the checkbox Access external network only on LPARs used
for bridging traffic from the virtual Ethernet switch to some other NIC.
Typically this should be kept unchecked for regular LPARs and checked
for VIOS.
h. Click OK to save changes made in the profile and then select Close.

To Remove a Virtual Ethernet Adapter

1. Select server (Systems Management >> Servers >> select server).


2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select Virtual Adapters tab.
6. Select the Ethernet Adapter that you want to remove.
7. Select: Actions >> Delete.
8. Click OK to save changes made in the profile and then select Close.

To Change a Virtual Ethernet Adapter’s Properties

1. Select server (Systems Management >> Servers >> select server).


2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the Virtual Adapters tab.
6. Select the Ethernet Adapter that you want to change.
7. Select Actions >> Edit.
8. Enter the required information in the fields, as illustrated in Figure 4.5.
9. Click OK to save changes made in the profile and then select Close.

Shared Ethernet Adapter in VIOS


The SEA is a software-implemented Ethernet bridge that connects a virtual
Ethernet network to an external Ethernet network. With this connection, the SEA
becomes a logical device in VIOS, which typically connects two other devices: the
virtual Ethernet adapter on VIOS connects to the virtual Ethernet switch; the
physical NIC connects to the external Ethernet network.

NOTE Make sure that the Access External network option is checked when the virtual
Ethernet adapter is created on VIOS.

To create a SEA on VIOS, use the following command syntax:

mkvdev -sea <target_device> -vadapter <virtual_ethernet_adapters> -default <DefaultVirtualEthernetAdapter> -defaultid <SEADefaultPVID>

Table 4.1 lists and defines the parameters associated with this command.

Table 4.1 mkvdev Command Parameters and Description

Parameter | Description
target_device | The physical port that connects to the external network, on a NIC exclusively allocated to VIOS, LPAR or LHEA.
virtual_ethernet_adapters | One or more virtual Ethernet adapters that the SEA will bridge to target_device (typically only one adapter).
DefaultVirtualEthernetAdapter | The default virtual Ethernet adapter that will handle untagged frames (typically the same as the previous parameter).
SEADefaultPVID | The VID for the default virtual Ethernet adapter (typically has the value of 1).

The following sample command creates a SEA, as shown in Figure 4.6:

mkvdev -sea ent1 -vadapter ent2 -default ent2 -defaultid 1
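
As a sanity check on the parameter roles in Table 4.1, the command line can be assembled programmatically. The following is a minimal sketch only: the helper name build_mkvdev_sea is hypothetical, it merely formats a string and does not run anything on VIOS, and the comma used to join several virtual adapters is an assumption that should be checked against the VIOS documentation.

```python
def build_mkvdev_sea(target_device, virtual_adapters, default_adapter=None, default_pvid=1):
    """Assemble the mkvdev command line that creates a SEA (string only)."""
    if not virtual_adapters:
        raise ValueError("at least one virtual Ethernet adapter is required")
    # Table 4.1: the default adapter is typically the same as the bridged adapter.
    if default_adapter is None:
        default_adapter = virtual_adapters[0]
    if default_adapter not in virtual_adapters:
        raise ValueError("default adapter must be one of the bridged adapters")
    return " ".join([
        "mkvdev",
        "-sea", target_device,
        "-vadapter", ",".join(virtual_adapters),  # comma separator is an assumption
        "-default", default_adapter,
        "-defaultid", str(default_pvid),
    ])


# Physical port ent1 bridged to virtual adapter ent2, PVID 1.
print(build_mkvdev_sea("ent1", ["ent2"]))
```

Defaulting the -default argument to the first bridged adapter mirrors the "typically the same as the previous parameter" remark in Table 4.1.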

[Figure 4.6 diagram: in server SRV 1, the VIOS hosts the SEA (ent3), which bridges the physical NIC (ent1) and the virtual Ethernet adapter (ent2) attached to Virtual Switch 1; LPAR 1 and LPAR 2 attach to the same virtual switch, and the FSP connects to the HMC.]

ent1 – Ethernet Interface to NIC Assigned to VIOS LPAR
ent2 – Ethernet Interface to Virtual Switch
ent3 – Shared Ethernet Adapter (Logical Device)

Figure 4.6 Creating a Shared Ethernet Adapter in VIOS

Host Ethernet Adapter


The HEA, also called the Integrated Virtual Ethernet (IVE) adapter, is an integrated
high-speed Ethernet adapter with hardware-assisted virtualization, which is a standard
feature on every POWER6 processor-based server. The HEA provides a physical
high-speed connection (10G) to the external network and provides logical ports.
Figure 4.7 shows the logical HEAs (LHEAs) for LPARs.

[Figure 4.7 diagram: server SRV 1 with a VIOS and LPARs 1 to 3 attached to an HEA with two external ports (HEA Ext Port 1 and HEA Ext Port 2), and an FSP connected to the HMC.]

Figure 4.7 Host Ethernet Adapter Overview

Because HEA creates a virtual network for the client partitions and bridges the
virtual network to the physical network, it replaces the need for both the virtual
Ethernet and the Shared Ethernet Adapter. In addition, HEA enhances performance
and improves utilization for Ethernet because HEA eliminates the need to move
packets (using virtual Ethernet) between partitions and then through a SEA to the
physical Ethernet interface. For detailed information, refer to IBM’s Redpaper
Integrated Virtual Ethernet Adapter Technical Overview and Introduction at
www.redbooks.ibm.com/abstracts/redp4340.html.

HEA is configured through HMC. The following list includes some HEA
configuration rules:

• LPAR uses only one logical port to connect to HEA.


• HEA consists of one or two groups of logical ports.
• Each group of logical ports has 16 logical ports (16 or 32 total for HEA).
• Each group of logical ports can have one or two external ports assigned to it
(predefined).
• A logical port group consists of one or two Ethernet switch partitions, one for
each external port.
• LPAR can have only one logical port connected to an Ethernet switch partition.
This means that only one logical port can connect to the external port.
• Multi-Core Scaling (MCS) increases bandwidth between the LPAR and the NIC, but it
reduces the number of logical ports: for MCS=2, the number of logical ports is
16/2=8. For MCS to take effect, a server restart is required.
• Only one logical port in a port group can be set in promiscuous mode.
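
The MCS arithmetic in the rules above can be written out as a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, and only the 16-port group size and the MCS=2 case come from the text; applying the same division to other MCS values is an extrapolation.

```python
def logical_ports_per_group(mcs, ports_per_group=16):
    """Logical ports left in one HEA port group for a given MCS value."""
    if mcs < 1 or ports_per_group % mcs != 0:
        raise ValueError("MCS must be a positive divisor of the group size")
    return ports_per_group // mcs


for mcs in (1, 2):
    print(f"MCS={mcs}: {logical_ports_per_group(mcs)} logical ports per group")
```

With MCS=1 all 16 ports of a group remain available; with MCS=2 the group is reduced to 8, which is why some ports appear as Not Available in the HMC.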

In this section, we discuss the following HEA configurations:

• Configuring a HEA physical port
• Adding a LHEA logical port
• Removing a LHEA logical port.

Configuring a HEA Physical Port


To configure the HEA physical port, perform the following steps (refer to Figure 4.8
as a reference):

1. Select server (Systems Management >> Servers >> select server).


2. Select Hardware Information >> Adapters >> Host Ethernet.
3. Select adapter (port).
4. Click the Configure button.
5. Enter parameters for the following fields: Speed, Duplex, Maximum
receiving packet size (Jumbo frames), Pending Port Group Multi-Core Scaling
value, Flow control, Promiscuous LPAR.
6. Click OK to save your changes.

Figure 4.8 HEA Physical Port Configuration Overview Window

Adding a LHEA Logical Port


To add a LHEA, perform the following steps:

1. Select server (Systems Management >> Servers >> select server).


2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the tab Logical Host Ethernet Adapters (LHEA).
6. Select the external port that LHEA connects to.
7. Click Configure.
8. In the Logical port field, select one port from 1 to 16. If MCS is greater
than 1, some logical ports are identified as Not Available.
9. Select the checkbox Allow all VLAN IDs. Otherwise, enter the actual VLAN ID in
the VLAN to add field, as shown in Figure 4.9.
10. Click OK.

Figure 4.9 HEA Physical Port Configuration Overview

Removing a LHEA Logical Port


To remove the LHEA, perform the following steps:

1. Select server (Systems Management >> Servers >> select server).


2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the tab Logical Host Ethernet Adapters (LHEA).
6. Select the external port that LHEA connects to.
7. Click the Reset button.
8. Click OK to close the window.
9. Click OK to save changes and close the window.

IBM PowerVM Network Deployment


In this section, we discuss a typical IBM PowerVM network deployment. As
illustrated in Figure 4.10, two IBM System p6 servers are deployed in a data
center and three networks are required:

• HMC private network (192.168.128.0/17).
• Out-of-band management network (172.28.113.0/24).
• Production network (11.11.1.0/24). Typically, testing traffic is sent to interfaces on
the production network.
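
Before cabling, it can help to confirm that the three prefixes are well formed and do not overlap. The sketch below uses Python's standard ipaddress module; the dictionary keys and helper names are illustrative, the production prefix is taken as the /24 shown in Figure 4.10, and 11.11.1.1 is the interface address used in the configuration examples later in this chapter.

```python
import ipaddress

# Prefixes for the three networks of the deployment in Figure 4.10.
NETWORKS = {
    "hmc_private": "192.168.128.0/17",
    "oob_management": "172.28.113.0/24",
    "production": "11.11.1.0/24",
}


def parse_networks(prefixes):
    """Parse each prefix strictly; ValueError is raised if host bits are set."""
    return {name: ipaddress.ip_network(p, strict=True) for name, p in prefixes.items()}


def overlapping_pairs(parsed):
    """Return the name pairs of networks whose address ranges overlap."""
    names = sorted(parsed)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if parsed[a].overlaps(parsed[b])
    ]


parsed = parse_networks(NETWORKS)
print(overlapping_pairs(parsed))
print(ipaddress.ip_address("11.11.1.1") in parsed["production"])
```

Strict parsing rejects a prefix whose host bits are set, so a mistyped entry such as 11.11.1.0/8 is caught as a ValueError instead of being silently accepted.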

[Figure 4.10 diagram: the HMC (dhcpd, Web application, sshd, X-Window server, keyboard and monitor) connects through an Ethernet switch on the private network (192.168.128.0/17) to the FSPs of a p6 server (LPARs: VIOS, RHEL, SUSE, AIX 5.3, AIX 6.1; Host Ethernet Adapter) and a p5 server (LPARs: VIOS, RHEL, SUSE; Hypervisor). A second Ethernet switch carries the management network (172.28.113.0/24) to the management workstation (Web client, Telnet/SSH client). The NICs under test connect to the DUT on the production network (11.11.1.0/24).]

Figure 4.10 IBM Power Series Servers, LPARs, and Network Connections

HMC runs on a Linux server with two network interfaces: one connects to a private
network for all managed P5/P6 systems (on-board Ethernet adapter on servers,
controlled by FSP process); the other network interface connects to a management
network. In the management network, the management workstation accesses the
HMC Web interface through a Web browser.

There are two ways to set up communication with an LPAR (logical partition):

• Using a console window through HMC.
• Using Telnet/SSH over the management network. Each LPAR has one
dedicated Ethernet interface for connecting to the management network using
the first physical port on HEA (IVE) shared among LPARs.

Each LPAR must connect to the virtual Ethernet switch using the virtual Ethernet
adapter. You create a virtual Ethernet switch and a virtual Ethernet adapter using
the HMC. Virtual Ethernet adapters for VIOS LPARs must have the Access External
Network option enabled.

VIOS LPAR, which is a special version of AIX, performs the bridging between the
virtual Ethernet switch (implemented in Hypervisor) and the external port. For
bridging frames between the physical adapter on the NIC and the virtual Ethernet
adapter connected to the virtual Ethernet switch, another logical device (the SEA)
is created in VIOS.

As illustrated in Figure 4.11, the typical network deployment with the access switch
and LPAR (virtual machine) is as follows:

• The access switch connects to physical NIC, which is assigned to ent1 in VIOS.
• The ent3 (SEA) bridges ent1 (physical NIC) and ent2 (virtual Ethernet
adapters).
• The ent2 (virtual Ethernet adapter) is created and dedicated to LPAR which
runs Red Hat Enterprise Linux.
• The ent3 also supports multiple VLANs. Each VLAN will associate with one
logical Ethernet adapter, for example ent4.

[Figure 4.11 diagram: the Ethernet switch connects to the NIC (ent1) in the VIOS LPAR; the SEA (ent3) bridges ent1 and ent2, which attaches to Virtual Switch 1 together with the RHEL LPAR; ent4 is a per-VLAN logical adapter on the SEA.]

ent1 – Ethernet Interface to NIC Assigned to VIOS LPAR
ent2 – Ethernet Interface to Virtual Switch (Sw. in Hypervisor)
ent3 – Shared Ethernet Adapter (Logical Device)
ent4 – Logical Ethernet Adapter for One VLAN

Figure 4.11 Detailed Network Deployment with SEA

Junos Operating System Overview


As shown in Figure 4.12, the Junos OS includes two components: Routing Engine
and the Packet Forwarding Engine. These two components provide a separation of
control plane functions such as routing updates and system management from
packet data forwarding. Hence, products from Juniper Networks can deliver
superior performance and highly reliable Internet operation.

[Figure 4.12 diagram: in the Routing Engine, user-space processes (SNMP, routing protocol process, interface process, command-line interface (CLI), chassis process) and the routing tables sit above a kernel that holds the forwarding table; in the Packet Forwarding Engine, an embedded microkernel holds its own forwarding table, interface process and chassis process alongside the distributed ASICs.]

Figure 4.12 Junos OS Architecture

Routing Engines
The Routing Engine runs the Junos operating system, which includes the FreeBSD
kernel and the software processes. The primary operator processes include the
device control process (dcd), routing protocol process (rpd), chassis process
(chassisd), management process (mgd), traffic sampling process (sampled),
automatic protection switching process (apsd), simple network management
protocol process (snmpd) and system logging process (syslogd). The Routing
Engine installs directly into the control panel and interacts with the Packet
Forwarding Engine.

Packet Forwarding Engine


The Packet Forwarding Engine is designed to perform Layer 2 and Layer 3 switching,
route lookups and rapid forwarding of packets. The Packet Forwarding Engine
includes the backplane (or midplane), Flexible PIC Concentrator (FPC), Physical
Interface Cards (PICs) and the control board (switching/forwarding) and a CPU
that runs the microkernel.

The microkernel is a simple, cooperative, multitasking, real-time operating system
designed and built by Juniper Networks. The microkernel comprises fully
independent software processes, each with its own chunk of memory. These
processes communicate with one another, and the hardware in the router prevents
one process from affecting another. If a failure occurs, a snapshot is taken at the
point of failure so that engineers can analyze the core dump and resolve the
problem. The Switch Control Board (SCB) powers cards on and off, controls
clocking, resets and boots, and monitors and controls system functions, including
the fan speed, board power status, PDM status and control, and the system
front panel.

Interaction Between Routing Engine and Packet Forwarding Engine


The kernel on the Routing Engine communicates with the Packet Forwarding
Engine and synchronizes a copy of the forwarding table on the Packet Forwarding
Engine with that on the Routing Engine. Figure 4.12 shows the interaction between
the Routing Engine and Packet Forwarding Engine with respect to the forwarding
activity. The Routing Engine builds a master forwarding table based on its routing
table, and its kernel provides this forwarding table to the Packet Forwarding
Engine. From this point on, the Packet Forwarding Engine performs traffic forwarding.

The Routing Engine itself is never involved in the forwarding of packets. The ASICs in
the forwarding path only identify and send the Routing Engine any exception
packets or routing control packets for processing. There are security mechanisms in
place that prevent the Routing Engine (and control traffic) from being attacked
or overwhelmed by these packets. Packets sent to the control plane from the
forwarding plane are rate limited to protect the router from DoS attacks. The
control traffic is protected from excess exception packets using multiple queues
that provide a clean separation between the two. The packets are prioritized by the
packet-handling interface, which sends them to the correct queues for appropriate
handling.

Redundant functional components in network devices prevent SPOF and
increase availability and reliability. Juniper Networks devices are typically configured
with a single Routing Engine and Packet Forwarding Engine. To achieve high
availability and reliability, the user has two options:

• Create redundant Routing Engines and a single Packet Forwarding Engine, or
• Create redundant Routing Engines and redundant Packet Forwarding Engines.

Junos Processes
Junos processes run on the Routing Engine and maintain the routing tables, manage
the routing protocols used on the router, control the router interfaces, control some
chassis components, and act as the interface for system management and user
access to the router. Major processes are discussed in detail later in this section.

A Junos process is a UNIX process that runs nonstop in the background while the
machine is running. All of the processes operate through the Command Line
Interface (CLI). Each process is a piece of the software and has a specific function
or area to manage. The processes run in separate, protected address spaces.
The following sections briefly cover two major Junos processes: the routing protocol
process (rpd) and the management process (mgd).

Routing Protocol Process


The routing protocol process (rpd) provides the routing protocol intelligence to the
router, controlling the forwarding of packets. Sitting in the user space (versus the
kernel) of the Routing Engine, rpd is a mechanism for the Routing Engine to learn
routing information and construct the routing table, which stores route information.

This process starts all configured routing protocols and handles all routing
messages. It maintains one or more routing tables, which consolidate the routing
information learned from all routing protocols. From this routing information, the
rpd process determines the active routes to network destinations and installs these
routes into the Routing Engine’s forwarding table. Finally, the rpd process
implements the routing policy, which enables an operator to control the routing
information that is transferred between the routing protocols and the routing table.
Using a routing policy, operators can filter and limit the transfer of information as
well as set properties associated with specific routes.

NOTE The rpd process handles both unicast routing protocols, in which data travels to
one destination, and multicast routing protocols, in which data travels to many
destinations.

Management Process
Several databases connect to the management process (mgd). The config schema
database merges the packages /usr/lib/dd/libjkernel-dd.so,
/usr/lib/dd/libjroute-dd.so and /usr/lib/dd/libjdocs-dd at initialization time to make
/var/db/schema.db, which controls what the user interface (UI) is. The config database
holds /var/db/juniper.db.

The mgd works closely with CLI, allowing the CLI to communicate with all the other
processes. Mgd knows which process is required to execute commands (user
input).

When the user enters a command, the CLI communicates with mgd over a UNIX
domain socket using Junoscript, an XML-based remote procedure call (RPC)
protocol. The mgd is connected to all the processes, and each process has a UNIX
domain management socket.

If the command is legal, the socket opens and mgd sends the command to the
appropriate process. For example, the chassis process (chassisd) implements the
actions for the command show chassis hardware. The process sends its response
to mgd in XML form and mgd relays the response back to the CLI.

Mgd plays an important part in the commit check phase. When you edit a
configuration on the router, you must commit the change for it to take effect. Before
the change actually is made, mgd subjects the candidate configuration to a check
phase. The management process writes the new configuration into the config db
(juniper.db).

Junos Operating System Network Management


The Junos operating system network management features work in conjunction
with an operations support system (OSS) to manage the devices within the
network. The Junos OS can assist in performing the following management tasks:

• Fault management: includes device monitoring and detecting and fixing faults
• Configuration management
• Accounting management: collects statistics for accounting purposes
• Performance management: monitors and adjusts device performance
• Security management: controls device access and authenticates users

The following interfaces (APIs) typically are used to manage and monitor Juniper
Networks network devices:

• CLI
• J-Web
• SNMP
• NETCONF

In addition, Junos also supports other management interfaces to meet various
requirements from enterprise and carrier providers, including J-Flow, sFlow,
Ethernet OAM, TWAMP, etc.

MORE For detailed configuration information concerning the network management
interfaces, please refer to Junos Software Network Management Configuration
Guide Release 10.0 at
http://www.juniper.net/techpubs/en_US/junos10.0/information-products/topic-collections/config-guide-network-mgm/frameset.html.

Configuring Network Devices


Table 4.2 lists and describes the ways by which IBM servers can connect to Juniper
switches and routers in the data center.

Table 4.2 Methods for Connecting IBM Servers to Juniper Switches and Routers

Connection Types | Description
The network device acts as Layer 2 switch. | To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and IBM server’s NIC are in the same Layer 2 broadcast domain. Because the network device interfaces do not configure Layer 3 IP addresses, they do not provide routing functionality.
The network device acts as a switch with Layer 3 address. | To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and IBM server’s NIC are in the same Layer 2 broadcast domain. The network device interfaces configure Layer 3 IP addresses so that they can route traffic to other connected networks.
The network device acts as a router. | To the IBM servers, the network device appears as a Layer 3 router with a single Ethernet interface and IP address. The network device does not provide Layer 2 switching functionality.

In the next section, several different but typical methods for configuring the MX
Series routers and EX Series switches are presented.

Configuring MX Series 3D Universal Edge Routers


In an MX Series configuration, one physical interface can have multiple logical
interfaces. Each logical interface is defined as a unit under the physical
interface, followed by the logical interface ID number. Use the encapsulation and
vlan-tagging statements to configure the mapping of Ethernet traffic to logical
interfaces.

Configuring Layer 2 Switching


As illustrated in the following code, two Ethernet ports are in the same broadcast
domain: ge-5/1/5 interface is configured with untagged VLAN, while ge-5/1/7
interface is configured with tagged VLAN.

Ethernet interfaces in MX Series routers can support one or many VLANs. Each
Ethernet VLAN is mapped into one logical interface. If logical interfaces are used to
separate traffic to different VLANs, we recommend using the same numbers for
logical interface (unit) and VLAN ID. For instance, the logical interface and the VLAN
ID in the following sample use the same number (100):

interfaces ge-5/1/5 {
    unit 0 {
        family bridge;
    }
}
interfaces ge-5/1/7 {
    vlan-tagging;
    encapsulation flexible-ethernet-services;
    unit 100 {
        encapsulation vlan-bridge;
        family bridge;
    }
}
bridge-domains {
    Data01 {
        domain-type bridge;
        vlan-id 100;
        interface ge-5/1/5.0;
        interface ge-5/1/7.100;
    }
}
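
The recommendation to reuse the VLAN ID as the logical interface (unit) number on tagged interfaces is easy to check mechanically in an interface plan. The following is an illustrative sketch only: the function name and the tuple layout are hypothetical, and the second plan entry is invented to show a violation.

```python
def unit_vlan_mismatches(tagged_units):
    """Return tagged logical interfaces whose unit number differs from the VLAN ID.

    tagged_units: iterable of (physical_interface, unit, vlan_id) tuples.
    """
    return [
        f"{ifname}.{unit}"
        for ifname, unit, vlan_id in tagged_units
        if unit != vlan_id
    ]


plan = [
    ("ge-5/1/7", 100, 100),  # follows the recommendation (unit 100, VLAN 100)
    ("ge-5/1/7", 200, 300),  # hypothetical entry that breaks it
]
print(unit_vlan_mismatches(plan))
```

Untagged interfaces such as ge-5/1/5.0 are left out of the plan on purpose, because they use unit 0 regardless of the VLAN they belong to.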

Configuring Layer 2 Switching and Layer 3 Interface


As illustrated in the following code, two Ethernet ports are in the same broadcast
domain: ge-5/1/5 interface is configured with untagged VLAN, while ge-5/1/7
interface is configured with tagged VLAN.

In addition, IRB on the MX Series provides simultaneous support for Layer 2 bridging
and Layer 3 routing on the same interface, such as irb.100 so that the local packets
are able to route to another routed interface or to another bridging domain that has
a Layer 3 protocol configured.

interfaces ge-5/1/5 {
    unit 0 {
        family bridge;
    }
}
interfaces ge-5/1/7 {
    vlan-tagging;
    encapsulation flexible-ethernet-services;
    unit 100 {
        encapsulation vlan-bridge;
        family bridge;
    }
}
interfaces irb {
    unit 100 {
        family inet {
            address 11.11.1.1/24;
        }
    }
}
bridge-domains {
    Data01 {
        domain-type bridge;
        vlan-id 100;
        interface ge-5/1/5.0;
        interface ge-5/1/7.100;
        routing-interface irb.100;
    }
}

Configuring Layer 3 Routing


As illustrated in the following code, one Ethernet interface (ge-5/0/0) is
configured with a tagged VLAN and an IP address.

interfaces ge-5/0/0 {
    description "P6-1";
    vlan-tagging;
    unit 30 {
        description Data01;
        vlan-id 30;
        family inet {
            address 11.11.1.1/24;
        }
    }
}

Configuring EX Series 4200 and 8200 Ethernet Switches


In a typical EX Series configuration, one physical interface can have multiple logical
interfaces. Each logical interface is defined as a unit under the physical interface,
followed by a logical interface ID number. However, for Ethernet switching between
ports in the EX Series, the configuration on interfaces must include
family ethernet-switching under unit 0.

Define the configuration of Layer 2 broadcast (bridge) domains under the vlans stanza.
Interface membership in VLANs can be defined using one of the following two
methods:

• Under vlan x interface (preferred method).
• Under interface y unit 0 family ethernet-switching vlan members.

If the Ethernet port carries only untagged frames for one VLAN, port mode should
be defined as access (default). If the Ethernet port carries tagged frames, port
mode must be defined as trunk (case with two or more VLANs on one port).
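
The access/trunk rule above reduces to a two-way decision, sketched here as a small helper. The function name and its arguments are hypothetical; the sketch only encodes the rule as stated in the text and does not query or configure a switch.

```python
def ex_port_mode(carries_tagged_frames, vlan_count):
    """Pick the EX Series port mode from the kind of frames the port carries."""
    if vlan_count < 1:
        raise ValueError("a switching port must belong to at least one VLAN")
    if carries_tagged_frames:
        # Tagged frames (the case with two or more VLANs on one port).
        return "trunk"
    if vlan_count == 1:
        # Only untagged frames for one VLAN: access, which is also the default.
        return "access"
    raise ValueError("untagged frames on one port can belong to only one VLAN")


print(ex_port_mode(False, 1))
print(ex_port_mode(True, 2))
```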

Configuring Layer 2 Switching


As illustrated in the following code, two Ethernet ports are in the same broadcast
domain: ge-5/1/5 interface is configured with untagged VLAN, while ge-5/1/7
interface is configured with tagged VLAN.

The Ethernet interfaces in EX Series switches can support one or many VLANs. Each
VLAN is mapped into one logical interface. If logical interfaces are used to separate
traffic to different VLANs, we recommend using the same numbers for the logical
interface (unit) and VLAN ID. For example, the logical interface and the VLAN ID in
the following sample use the same number (100):

interfaces ge-5/1/5 {
    unit 0 {
        family ethernet-switching;
    }
}
interfaces ge-5/1/7 {
    unit 0 {
        family ethernet-switching {
            port-mode trunk;
        }
    }
}
vlans {
    Data01 {
        vlan-id 100;
        interface {
            ge-5/1/5.0;
            ge-5/1/7.0;
        }
    }
}

Configuring Layer 2 Switching and Layer 3 Interface


As illustrated in the following code, two Ethernet ports are in the same broadcast
domain: ge-5/1/5 interface is configured with untagged VLAN, while ge-5/1/7
interface is configured with tagged VLAN.

In addition, the EX Series Ethernet switch supports routed interfaces called Routed
VLAN Interfaces (RVI). RVIs are needed to route the traffic from one VLAN to
another. As opposed to IRB, which routes bridge domains, RVI routes VLANs. In the
following code, the RVI interface with IP address 11.11.1.1/24 is associated with the
VLAN 100 logical interface.

interfaces ge-5/1/5 {
    unit 0 {
        family ethernet-switching;
    }
}
interfaces ge-5/1/7 {
    unit 0 {
        family ethernet-switching {
            port-mode trunk;
        }
    }
}
interfaces vlan {
    unit 100 {
        family inet {
            address 11.11.1.1/24;
        }
    }
}
vlans {
    Data01 {
        vlan-id 100;
        interface {
            ge-5/1/5.0;
            ge-5/1/7.0;
        }
        l3-interface vlan.100;
    }
}

Configuring Layer 3 Routing


As illustrated in the following code, the Ethernet interface (ge-5/0/0) is configured
with a tagged VLAN and an IP address:

interfaces ge-5/0/0 {
    description "P6-1";
    vlan-tagging;
    unit 30 {
        description Data01;
        vlan-id 30;
        family inet {
            address 11.11.1.1/24;
        }
    }
}

MX and EX Series Ethernet Interface Setting


In general, the default value of the Ethernet interface setting is as follows:
• Auto-negotiation for the speed setting.
• Automatic for the link mode setting.
• Flow-control for the flow-control setting link-mode= automatic.
Because these default settings on the MX and EX Series work well in many use
cases, we recommend using them as a starting point and changing individual
settings only when necessary.
The Ethernet interface configuration stanzas on the MX and EX Series are different.
On the MX Series, the interface settings can be changed under the interface x
gigether-options stanza; on the EX Series, they can be changed under the
interface x ether-options stanza.
Under the configuration stanzas, the following settings are available:
• Link speed can be set to 10m, 100m, 1g or auto-negotiation.
• Link-mode can be set to automatic, full duplex or half-duplex.
• Flow-control can be set to flow-control or no-flow-control.
NOTE When one device is set to auto-negotiate link mode while the other device is set to
full-duplex link mode, the connection between the two devices does not work
properly because of a limitation of the IEEE 802.3 standard: the auto-negotiating
side cannot learn the peer's duplex setting and falls back to half duplex. We highly
recommend using the auto-negotiate link setting for Gigabit Ethernet.
NOTE The MX Series does not support half-duplex operation on 10/100/1000BaseT
interfaces.

MX and EX Series Support for Jumbo Frames (MTU)


The EX and MX Series can support frame sizes on Ethernet interfaces up to 9216
octets. In a Junos configuration, this parameter is called the Maximum Transmission
Unit (MTU). In Junos, the MTU includes Ethernet overhead such as the source address,
destination address, and VLAN tag. However, it does not include the preamble or
frame check sequence (FCS). The default Ethernet frame size in Junos is 1514
octets, while the default frame size on other vendors' devices can be 1500 octets.
It is important to understand that all devices in one broadcast domain must use the
same MTU size when jumbo frames are enabled. Otherwise, devices that do not support
jumbo frames could silently discard some frames. This creates intermittent network
problems, such as routers failing to establish OSPF neighbor adjacencies.
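The difference between the 1514-octet and 1500-octet defaults is the 14-octet Ethernet header. The following Python sketch only illustrates that arithmetic; the function names are hypothetical, and how a given platform accounts for VLAN tags should be checked against its own documentation.

```python
# Illustrative arithmetic only: converts between an IP-level MTU and a
# Junos-style media MTU that counts the Ethernet header but not the FCS.
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4          # one 802.1Q tag


def media_mtu(ip_mtu: int, vlan_tags: int = 0) -> int:
    """Media MTU needed to carry an IP packet of ip_mtu octets."""
    return ip_mtu + ETHERNET_HEADER + vlan_tags * VLAN_TAG


def max_ip_packet(mtu: int, vlan_tags: int = 0) -> int:
    """Largest IP packet that fits in a frame limited to mtu octets."""
    return mtu - ETHERNET_HEADER - vlan_tags * VLAN_TAG


print(media_mtu(1500))      # 1500-octet IP MTU expressed as a media MTU
print(max_ip_packet(9216))  # IP payload available at the 9216-octet maximum
```

When comparing settings across vendors, first convert both sides to the same convention; a 1500 on one device and a 1514 on another may describe the same frame.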
The EX and MX Series devices have different types of interfaces, such as the
physical and irb interfaces. Because MTU is associated with each interface type, the
MTU configuration syntax is different, as listed in Table 4.3.

Table 4.3 MTU Configuration Syntax

Juniper Network Devices        Interface Type         Command

MX Series Routers              Physical interface     set interfaces mtu <mtu>
MX Series Routers              IRB interface          set interfaces irb mtu <mtu>
EX Series Ethernet Switches    Physical interface     set interfaces mtu <mtu>
EX Series Ethernet Switches    VLAN interface         set interfaces vlan mtu <mtu>
EX Series Ethernet Switches    Interface VLAN unit    set interfaces vlan unit 100 family inet mtu <mtu>

Chapter 5

Configuring Spanning Tree Protocols

Spanning Tree Protocols

Configuring RSTP/MSTP

Configuring VSTP/PVST+/Rapid-PVST+

THIS CHAPTER FOCUSES on the different spanning tree protocols – STP,
RSTP, MSTP, and VSTP – that are used in Layer 2 networks to prevent loops.

Typically, STP is supported only on legacy equipment and has been replaced with
RSTP and other variants of Spanning Tree. Support for RSTP is mandatory on all
devices that are capable of spanning tree functionality. When interoperating with
legacy switches, an RSTP-capable switch automatically reverts to STP. We discuss
STP in this chapter to provide background on spanning tree functionality.

Spanning Tree Protocols


STP works on the concept of a switch elected as a root bridge that
connects in a mesh to other non-root switches. The active path of least cost is
selected between each of the non-root bridges and the root bridge. In addition, a
redundant path is identified and used when a failure occurs. All these bridges
exchange Bridge Protocol Data Units (BPDUs) that contain the bridge IDs and the
cost to reach the root bridge.

The root bridge is elected based on priority. The switch with the lowest bridge
priority value is elected as the root. On each non-root switch, the port that is
closest (in cost) to the root bridge becomes the Root Port (RP).

NOTE There can only be one RP on a switch. A root bridge cannot have an RP.

On each network segment, the port that has the least cost to the Root Bridge is
known as the Designated Port (DP). Ports that are not selected as RP or DP are
considered to be Blocked. An optimized active path based on bridge/port priority
and cost is chosen to forward data in the network. The BPDUs that provide the
information on the optimal path are referred to as "superior" BPDUs while those
that provide sub-optimal metrics are referred to as "inferior" BPDUs.

BPDUs mainly consist of the following fields that are used as the basis for
determining the optimal forwarding topology:

• Root Identifier - Identity of the bridge that the sending switch currently
considers to be the root bridge; a switch initially assumes that it is the root
itself.
• Root path cost - Cumulative cost of the path from the sending switch to the
root bridge, derived from the speed of the links along that path.
• Bridge Identifier - Identity used by the switch to send BPDUs.
• Port Identifier - Identity of the port from which the BPDU originated.
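To make the relationship between link speed and root path cost concrete, the following Python sketch assumes the 32-bit cost scale commonly used with RSTP, in which a 1-Gbps link costs 20,000 (the Port Cost value that appears in the show output later in this chapter). The function names are hypothetical, and the defaults on a particular device should be confirmed in its documentation.

```python
# Sketch under an assumption: per-link cost = 20,000,000 / speed in Mbps,
# which yields 20,000 for a 1-Gbps link.
def port_cost(speed_mbps: int) -> int:
    """Cost contributed by one link of the given speed."""
    return 20_000_000 // speed_mbps


def root_path_cost(link_speeds_mbps) -> int:
    """Root path cost is the sum of the costs of every link toward the root."""
    return sum(port_cost(speed) for speed in link_speeds_mbps)


print(port_cost(1000))               # one Gigabit Ethernet hop
print(root_path_cost([1000, 1000]))  # two Gigabit Ethernet hops to the root
```

A switch that hears the same root advertised over several ports prefers the port with the lowest resulting root path cost.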
Convergence of a spanning tree based network consists of a three-step process:

1. Root Bridge Election.
2. Root Port Election (on non-root switches).
3. Designated Port Election (on network segment).
Figure 5.1 shows three switches: one root and two non-root bridges. The ports on
the root bridge are the designated ports (DP). The ports with least cost to the root
bridge are the root ports (RP). All other interfaces running STP on the non-root
bridges are alternate ports (ALT).
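The first convergence step, root bridge election, can be sketched in a few lines of Python. This is a minimal illustration rather than a protocol implementation: the priorities mirror the sample network used later in this chapter, while the MAC addresses and all names in the code are hypothetical. It relies on the standard rule that a bridge ID compares on priority first and on MAC address second.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configured bridge priority (0K = 0, 8K = 8192, ...)
    mac: str       # hypothetical MAC address, for illustration only


def bridge_id(bridge: Bridge) -> tuple:
    """Bridge IDs compare on priority first, then on MAC address."""
    return (bridge.priority, int(bridge.mac, 16))


def elect_root(bridges) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


BRIDGES = [
    Bridge("EX4200-A", 0, "00aa00000001"),
    Bridge("MX480", 8192, "00aa00000002"),
    Bridge("EX8200", 16384, "00aa00000004"),
    Bridge("IBM BladeCenter", 16384, "00aa00000003"),
    Bridge("EX4200-B", 32768, "00aa00000005"),
]

root = elect_root(BRIDGES)
# If the root fails, the election repeats among the remaining bridges.
backup = elect_root([b for b in BRIDGES if b is not root])
print(root.name, backup.name)
```

When two bridges share a priority, as the EX8200 and the BladeCenter do here, the lower MAC address decides the comparison, which is why explicit priorities are the more predictable way to place the root.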

Rapid Spanning Tree Protocol


Rapid Spanning Tree Protocol (RSTP) is a later and enhanced version of STP that
provides faster convergence times. The faster times are possible because RSTP
uses protocol handshake messages unlike STP, which uses fixed timeouts. When
compared to STP, RSTP provides enhanced performance by:

• Generating and transmitting BPDUs from all nodes at the configured Hello
interval, irrespective of whether they receive any BPDUs from the RP. This
allows the nodes to monitor any loss of Hello messages and thus detect link
failures more quickly than STP.
• Expediting changes in topology by directly transitioning the port (either edge
port or a port connected to a point-to-point link) from a blocked to forwarding
state.
• Providing a distributed model where all bridges in the network actively
participate in network connectivity.

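[Figure 5.1 STP Network: a Root Bridge whose two ports are Designated Ports (DP),
each connected to the Root Port (RP) of Non-Root Bridge 1 and Non-Root Bridge 2.
On the link between the two non-root bridges, one end is an Alternate Port (ALT)
and the other is a Designated Port (DP).]

RP – Root Port
DP – Designated Port
ALT – Alternate Port

Figure 5.1 STP Network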

New interface types defined in RSTP are:

• Point to Point
• Edge
• Shared or Non-edge

Point to Point
A point-to-point (P2P) interface provides a direct connection between two
switches. Usually, a full duplex interface is set automatically to be P2P.

Edge
The edge interface is another enhancement in RSTP that helps reduce convergence
time when compared to STP. Ports connected to servers (there are no bridges
attached) are typically defined as edge ports. Any changes that are made to the
status of the edge port do not result in changes to the forwarding network topology
and thus are ignored by RSTP.

Shared or Non-edge
A Shared or Non-edge interface is an interface that is half-duplex or has more than
two bridges on the same LAN.

When compared to STP, RSTP introduces the concepts of port state, port role, and
interface type. The state and role of an RSTP-based port are independent. A port can
send or receive BPDUs or data based on its current state. The role of a port depends
on its position in the network. The role of a port can be determined by performing a
BPDU comparison during convergence.

Table 5.1 shows the mapping between RSTP port states and roles.

Table 5.1 Mapping between RSTP Port States and Roles

RSTP Role      RSTP State

Root           Forwarding
Designated     Forwarding
Alternate      Discarding
Backup         Discarding
Disabled       Discarding

The Alternate role in RSTP is analogous to the Blocked port in STP. Defining a port
as an edge port allows it to transition directly into the forwarding state,
eliminating the 30-second delay (listening plus learning) that occurs with STP.

Multiple Spanning Tree Protocol


Multiple Spanning Tree Protocol (MSTP) is an enhancement to RSTP. MSTP
supports the logical division of a Layer 2 network, or even a single switch, into
regions. A region here refers to a network, a single VLAN, or multiple VLANs. With
MSTP, separate spanning tree groups or instances can be configured for each
network, VLAN, or group of VLANs. There can be Multiple Spanning Tree Instances
(MSTI) for each region. MSTP can thus control the spanning tree topology within
each region. On the other hand, the Common and Internal Spanning Tree (CIST) is a
separate instance that is common across all regions. It controls the topology
between the different regions.

Each MSTI has a spanning tree associated with it. RSTP based spanning tree tables
are maintained per MSTI. Using CIST to distribute this information over the
common instance minimizes the exchange of spanning tree related packets and
thus network traffic between regions.

MSTP is compatible with STP and makes use of RSTP for convergence algorithms.

[Figure 5.2 MSTP Example: three instances, MSTI-A (VLAN 501), MSTI-B (VLANs 990,
991), and MSTI-C (VLANs 100, 200, 300), each exchanging BPDUs internal to its
own instance, with the CIST carrying BPDUs between instances.]

Figure 5.2 MSTP Example

Figure 5.2 shows the three MSTIs: A, B, and C. Each of these instances consists of
either one or more VLANs. BPDUs specific to the particular instance are exchanged
within each of the MSTIs. The CIST handles all BPDU information which is required
to maintain the topology across the regions. CIST is the instance that is common to
all regions.

With MSTP, bridge priorities and related configurations can be applied on a
per-instance basis. Thus, the root bridge in one instance does not necessarily have
to be the root bridge in a different instance.

VLAN Spanning Tree Protocol


With VLAN Spanning Tree Protocol (VSTP), each VLAN has a spanning tree
associated with it. The main problem with this approach is scalability: the
processing resources consumed increase proportionally with the number of VLANs.

When configuring VSTP, the bridge priorities and the rest of the spanning tree
configuration can be applied on a per VLAN basis.

NOTE When configuring VSTP, please pay close attention to the following:

• When using virtual switches, VSTP cannot be configured on virtual switch
bridge domains that contain ports with either VLAN ranges or mappings.
• VSTP can only be enabled for a VLAN ID that is associated with a bridge
domain or VPLS routing instance. All logical interfaces assigned to the VLAN
must have the same VLAN ID.
• VSTP is compatible with the Cisco PVST implementation.

Table 5.2 lists the support existing on different platforms for the different spanning
tree protocols.

Table 5.2 Overview of Spanning Tree Protocols and Platforms

Protocols               IBM BladeCenter (Cisco ESM)     EX4200    EX8200    MX Series

STP                     Configuration not supported,    STP       STP       Configuration not supported,
                        works with MSTP/PVST                                works with RSTP
                        (backwards compatible)                              (backwards compatible)
RSTP                    Configuration not supported,    RSTP      RSTP      RSTP
                        works with MSTP/PVST
                        (backwards compatible)
MSTP                    MSTP                            MSTP      MSTP      MSTP
PVST+ (Cisco)/          PVST+                           VSTP      VSTP      VSTP
VSTP (Juniper)
Rapid-PVST+ (Cisco)/    Rapid-PVST+                     VSTP      VSTP      VSTP
VSTP (Juniper)

Configuring RSTP/MSTP
Figure 5.3 shows a sample MSTP network that can be used to configure and verify
RSTP/MSTP functionality. The switches and the IBM BladeCenter in this network
connect in a mesh and are assigned these bridge priorities:

• EX4200-A – 0K (lowest bridge priority number)
• MX480 – 8K
• EX8200 – 16K
• IBM BladeCenter (Cisco ESM) – 16K
• EX4200-B – 32K
We configure EX4200-A as the root bridge. Two MSTP instances, MSTI-1 and
MSTI-2, correspond to VLANs 1122 and 71, respectively. Either one, or both, of these
VLANs are configured on links between the switches on this spanning tree network.
Table 5.3 shows the association between the links, VLANs and MSTI instances.

Table 5.3 Association between Links, VLANs and MSTI Instances

Links between Switches         MSTI Instance      VLAN ID

EX4200-B – EX8200              MSTI-1             1122
EX4200-B – IBM BladeCenter     MSTI-1             1122
EX4200-B – MX480               MSTI-1, MSTI-2     1122, 71
EX4200-A – IBM BladeCenter     MSTI-2             71
EX4200-A – MX480               MSTI-1, MSTI-2     1122, 71
EX4200-A – EX8200              MSTI-1             1122
MX480 – IBM BladeCenter        MSTI-1, MSTI-2     1122, 71



[Figure 5.3 Spanning Tree MSTP/RSTP: test topology connecting the EX4200 at
172.28.113.175 (priority 0K), the EX4200 at 172.28.113.180 (priority 32K), the
EX8200 (priority 16K), the MX480 (priority 8K), and the IBM BladeCenter (priority
16K, trunk ports 17 through 20) over links that carry VLAN 71, VLAN 1122, or both.
Blades 6 through 10 (11.22.1.6 through 11.22.1.10) attach to the switches through
their eth 1 interfaces; the internal eth 0 interface of each blade connects through
the BladeCenter trunk ports.]

*Server connections simulated to each DUT via eth 1 interface connected to BladeCenter Pass-Through Module

Figure 5.3 Spanning Tree MSTP/RSTP

Another instance, MSTI-0 (which constitutes the CIST), is created by default to
exchange the overall spanning tree information for all MSTIs between the switches.
The blade servers connect to each of the switches as hosts/servers. The switch ports
on the different switches that connect to these BladeCenter servers are defined as
edge ports and are assigned IP addresses. The selection of the root bridge is
controlled by explicit configuration. That is, a bridge can be prevented from being
elected as a root bridge by enabling root protection.

Configuration Snippets
The following code pertains to the EX4200-A (RSTP/MSTP):

// Enable RSTP by assigning bridge priorities.
// Set priorities on interfaces to calculate the least cost path.
// Enable root protection so that the interface is blocked for the RSTP
// instance that receives superior BPDUs. Also, define the port to be an edge
// port.
rstp {
    bridge-priority 4k;
    interface ge-0/0/0.0 {
        priority 240;
    }
    interface ge-0/0/7.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/9.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/20.0 {
        priority 240;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
}
// Enable MSTP by assigning bridge priorities.
// Set priorities on interfaces.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. An operator can configure a bridge not to be elected as a
// root bridge by enabling root protection. Root protection increases user
// control over the placement of the root bridge in the network. Also, define
// the port to be an edge port.
// Define MSTI-1, provide a bridge priority for the instance. Associate a VLAN
// with the instance.
// Define MSTI-2, provide a bridge priority for the instance. Associate a VLAN
// and interface with the instance.
chandra@EX-175-CSR# show protocols mstp
configuration-name MSTP;
bridge-priority 8k;
interface ge-0/0/0.0 {
    priority 240;
}
interface ge-0/0/7.0 {
    priority 240;
    edge;
    no-root-port;
}
interface ge-0/0/9.0 {
    priority 240;
    edge;
    no-root-port;
}
interface ge-0/0/20.0 {
    priority 224;
}
interface ge-0/0/21.0 {
    priority 192;
}
interface ge-0/0/23.0 {
    priority 224;
}
msti 1 {
    bridge-priority 8k;
    vlan 1122;
}
msti 2 {
    bridge-priority 8k;
    vlan 71;
    interface ge-0/0/23.0 {
        priority 224;
    }
}

The following code snippet pertains to the EX4200-B:

// Enable RSTP by assigning bridge priorities.
// Set priorities on interfaces.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
rstp {
    bridge-priority 4k;
    interface ge-0/0/10.0 {
        priority 240;
    }
    interface ge-0/0/12.0 {
        priority 240;
    }
    interface ge-0/0/14.0 {
        priority 240;
    }
    interface ge-0/0/15.0 {
        priority 240;
        edge;
        no-root-port;
    }
}
// Assign bridge priorities.
// Set priorities on interfaces.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
chandra@SPLAB-EX-180> show configuration protocols mstp
configuration-name MSTP;
bridge-priority 0;
interface ge-0/0/10.0 {
    priority 240;
}
interface ge-0/0/11.0 {
    priority 240;
}
interface ge-0/0/12.0 {
    priority 224;
}
interface ge-0/0/13.0 {
    priority 224;
}
interface ge-0/0/14.0 {
    priority 192;
}
interface ge-0/0/15.0 {
    priority 240;
    edge;
    no-root-port;
}
// Define MSTI-1, provide a bridge priority for the instance. Associate a VLAN
// with the instance.
msti 1 {
    bridge-priority 0;
    vlan 1122;
}
// Define MSTI-2, provide a bridge priority for the instance. Associate a VLAN
// and interface with the instance.
msti 2 {
    bridge-priority 0;
    vlan 71;
    interface ge-0/0/13.0 {
        priority 224;
    }
}

The following code snippet pertains to the MX480:

rstp {
    bridge-priority 40k;
    interface ge-5/1/1 {
        priority 240;
    }
    interface ge-5/1/2 {
        priority 240;
    }
    interface ge-5/2/2 {
        priority 240;
    }
    interface ge-5/3/3 {
        priority 240;
    }
    interface ge-5/3/4 {
        priority 240;
        edge;
        no-root-port;
    }
}
chandra@HE-RE-0-MX480# show protocols mstp
bridge-priority 8k;
interface ge-5/1/1 {
    priority 224;
}
interface ge-5/1/2 {
    priority 192;
}
interface ge-5/2/2 {
    priority 192;
}
interface ge-5/3/3 {
    priority 224;
}
interface ge-5/3/4 {
    priority 240;
    edge;
    no-root-port;
}
msti 1 {
    bridge-priority 4k;
    vlan 1122;
}
msti 2 {
    bridge-priority 4k;
    vlan 71;
    interface ge-5/1/1 {
        priority 224;
    }
}

Verification
Based on the sample network, administrators can verify the RSTP/MSTP
configuration by issuing show commands to confirm that two MSTI instances and
one common instance (MSTI-0) are present on each switch. The following CLI
sample shows these three instances and the VLANs associated with each of them:

chandra@SPLAB-EX-180> show spanning-tree mstp configuration
MSTP information
Context identifier     : 0
Region name            : MSTP
Revision               : 0
Configuration digest   : 0xeef3ba72b1e4404425b44520425d3d9e
MSTI    Member VLANs
0       0-70,72-1121,1123-4094
1       1122
2       71
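The Member VLANs column follows directly from the configuration: VLANs 1122 and 71 are assigned to MSTI-1 and MSTI-2, and every remaining VLAN stays in instance 0. The following Python sketch reproduces that bookkeeping; the function names are hypothetical, and the 0 to 4094 VLAN space is taken from the output above.

```python
def format_ranges(vlans) -> str:
    """Collapse VLAN IDs into 'a-b,c' notation, as in the show output."""
    parts = []
    start = prev = None
    for vlan in sorted(vlans):
        if prev is not None and vlan == prev + 1:
            prev = vlan  # still inside the current run
            continue
        if start is not None:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = vlan
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def msti_members(msti_to_vlans, vlan_space=range(0, 4095)) -> dict:
    """VLANs not assigned to any MSTI remain members of instance 0."""
    assigned = set().union(*msti_to_vlans.values())
    members = {0: format_ranges(set(vlan_space) - assigned)}
    for msti, vlans in sorted(msti_to_vlans.items()):
        members[msti] = format_ranges(vlans)
    return members


for msti, vlans in msti_members({1: {1122}, 2: {71}}).items():
    print(msti, vlans)
```

Comparing this listing across switches is a quick consistency check, since the VLAN-to-instance mapping has to match on every switch in a region.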

Each of these instances should have an RP (ROOT), a BP (ALT), and a DP (DESG) of
its own:

chandra@SPLAB-EX-180> show spanning-tree interface

Spanning tree interface parameters for instance 0
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/10.0  240:523  240:513     0.0019e2544040       20000  FWD    ROOT
ge-0/0/11.0  240:524  240:524     32768.0019e2544ec0   20000  FWD    DESG
ge-0/0/12.0  224:525  224:525     32768.0019e2544ec0   20000  FWD    DESG
ge-0/0/13.0  224:526  224:526     32768.0019e2544ec0   20000  FWD    DESG
ge-0/0/14.0  192:527  192:213     8192.001db5a167d1    20000  BLK    ALT
ge-0/0/15.0  240:528  240:528     32768.0019e2544ec0   20000  FWD    DESG
ge-0/0/36.0  128:549  128:549     32768.0019e2544ec0   20000  FWD    DESG
ge-0/0/46.0  128:559  128:559     32768.0019e2544ec0   20000  FWD    DESG

Spanning tree interface parameters for instance 1
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/10.0  128:523  128:513     1.0019e2544040       20000  FWD    ROOT
ge-0/0/12.0  128:525  128:525     32769.0019e2544ec0   20000  FWD    DESG
ge-0/0/14.0  128:527  192:213     4097.001db5a167d1    20000  BLK    ALT
ge-0/0/15.0  128:528  128:528     32769.0019e2544ec0   20000  FWD    DESG

Spanning tree interface parameters for instance 2
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/10.0  128:523  128:513     2.0019e2544040       20000  FWD    ROOT
ge-0/0/13.0  224:526  224:526     16386.0019e2544ec0   20000  FWD    DESG
ge-0/0/14.0  128:527  192:213     4098.001db5a167d1    20000  BLK    ALT
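The designated bridge IDs in this output are not exact multiples of 4096 (for example 32769, 4097, and 16386). The values are consistent with the extended system ID convention, in which the upper 4 bits of the 16-bit priority field carry the configured priority and the lower 12 bits carry the instance or VLAN number. The source does not state this, so the following Python sketch rests on that assumption; the function names are hypothetical.

```python
def split_priority(value: int) -> tuple:
    """Split a 16-bit priority field into (priority, system ID extension)."""
    return value & 0xF000, value & 0x0FFF


def parse_bridge_id(text: str) -> tuple:
    """Parse 'priority.mac' as printed in the Designated bridge ID column."""
    field, mac = text.split(".")
    priority, extension = split_priority(int(field))
    return priority, extension, mac


for bridge in ("32769.0019e2544ec0", "4097.001db5a167d1", "16386.0019e2544ec0"):
    print(bridge, "->", parse_bridge_id(bridge))
```

Under this reading, 32769 is priority 32768 in instance 1, and 16386 is priority 16384 in instance 2. The VSTP output later in this chapter fits the same pattern, with the VLAN ID in the lower bits (17506 = 16384 + 1122).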

The following CLI output shows the MSTI-0 information on the Root Bridge. All
ports are in the forwarding state.

chandra@EX-175-CSR> show spanning-tree interface

Spanning tree interface parameters for instance 0
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/0.0   240:513  240:513     12288.0019e2544040   20000  FWD    DESG
ge-0/0/7.0   240:520  240:520     12288.0019e2544040   20000  FWD    DESG
ge-0/0/9.0   240:522  240:522     12288.0019e2544040   20000  FWD    DESG
ge-0/0/20.0  240:533  240:533     12288.0019e2544040   20000  FWD    DESG
ge-0/0/21.0  240:534  240:534     12288.0019e2544040   20000  FWD    DESG
ge-0/0/24.0  128:537  128:537     12288.0019e2544040   20000  FWD    DESG
ge-0/0/25.0  128:538  128:538     12288.0019e2544040   20000  FWD    DESG

1. Check that only the information from instance MSTI-0 (but not MSTI-1 and
MSTI-2) is available on all switches.
2. Confirm that there is only one direct path to any other interface within each
MSTI instance on a switch. All other redundant paths should be designated as
Blocked. Use the show spanning-tree interface command for this purpose.
3. Verify that a change in priority on any MSTI instance on a switch is propagated
through the entire mesh using the show spanning-tree interface command.

Configuring VSTP/PVST+/Rapid-PVST+
Figure 5.4 depicts a sample network consisting of a mesh of EX8200/4200 and
MX480 devices with the Cisco ESM switch. PVST+ and VSTP must be enabled on
the Cisco and Juniper devices, respectively, for interoperability. Two VLANs, 1122
and 71, are created on all devices; VSTP is enabled for both of these VLANs.

[Figure 5.4 Spanning Tree – VSTP/(PVST+, Rapid-PVST+): Cisco (PVST+/Rapid-PVST+)
and Juniper (VSTP) test topology connecting the MX480, the EX8200, two EX4200
switches, and the IBM BladeCenter (trunk ports 17, 18, and 19) over links that
carry VLAN 71, VLAN 1122, or both. Per-VLAN bridge priorities are labeled bc_ext
and bc_int on each device. Blades 6 through 10 (11.22.1.6 through 11.22.1.10)
attach through their eth 1 interfaces; the internal eth 0 interface of each blade
connects through the BladeCenter trunk ports.]

*Server connections simulated to each DUT via eth 1 interface connected to BladeCenter's
Pass-Through Module for blade slots [6, 7, 8, 9, 10]

Figure 5.4 Spanning Tree – VSTP/(PVST+, Rapid-PVST+)



Table 5.4 lists the bridge priorities for each of the VLANs.

Table 5.4 VSTP Bridge Priorities

VLAN ID    Bridge Priority

71         EX4200-A – 8K
           EX4200-B – 4K
           EX8200 – 12K
           MX480 – 16K

1122       EX4200-A – 16K
           EX4200-B – 32K
           EX8200 – 24K
           MX480 – 16K

Verification
Based on the sample setup as shown in Figure 5.4, verify interoperability of the
VSTP configuration with Cisco PVST+ by performing the following steps.

1. Verify that each of the switches with VSTP/PVST+ enabled has two spanning
trees corresponding to the two VLANs. Each VLAN has its own RP (ROOT), BP
(ALT) and DP (DESG). Use the show spanning-tree interface command.
chandra@SPLAB-EX-180> show spanning-tree interface

Spanning tree interface parameters for VLAN 1122
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/10.0  128:523  128:513     17506.0019e2544040   20000  FWD    ROOT
ge-0/0/12.0  224:525  224:525     33890.0019e2544ec0   20000  FWD    DESG
ge-0/0/14.0  240:527  240:213     17506.001db5a167d0   20000  BLK    ALT
ge-0/0/15.0  240:528  240:528     33890.0019e2544ec0   20000  FWD    DESG

Spanning tree interface parameters for VLAN 71
Interface    Port ID  Designated  Designated           Port   State  Role
                      port ID     bridge ID            Cost
ge-0/0/10.0  128:523  128:523     4167.0019e2544ec0    20000  FWD    DESG
ge-0/0/13.0  224:526  224:526     4167.0019e2544ec0    20000  FWD    DESG
ge-0/0/14.0  240:527  240:527     4167.0019e2544ec0    20000  FWD    DESG

2. Confirm that there is only one direct active path per VLAN instance to any
other non-root bridge. All redundant paths should be identified as Blocked. Use
the output of the show spanning-tree interface command for this purpose.
3. Reboot the root bridge. The device with the next lowest priority value should
step up as the root for the particular VLAN. This information must be updated in
the VLAN table on all devices.
4. Verify that the original root bridge becomes the primary (active) root again
after the reboot. This information should be updated on all devices in the mesh.

NOTE Any change in bridge priorities on either of the VSTP VLANs must be propagated
through the mesh.

Configuration Snippets
The following code pertains to the EX4200-A.

chandra@EX-175-CSR> show configuration protocols vstp
// Define a "VLAN bc-external", assign bridge and interface priorities.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
vlan bc-external {
    bridge-priority 16k;
    interface ge-0/0/7.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/20.0 {
        priority 224;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
}
// Define a "VLAN bc-internal", assign bridge and interface priorities.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
vlan bc-internal {
    bridge-priority 8k;
    interface ge-0/0/9.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
    interface ge-0/0/23.0 {
        priority 224;
    }
}

The following code pertains to the MX480.

// Define VLAN71, assign bridge and interface priorities.
// Define VLAN1122, assign bridge and interface priorities.
chandra@HE-RE-0-MX480> show configuration protocols vstp
vlan 71 {
    bridge-priority 16k;
    interface ge-5/1/1 {
        priority 240;
    }
    interface ge-5/1/2 {
        priority 240;
    }
    interface ge-5/2/2 {
        priority 240;
    }
    interface ge-5/3/3 {
        priority 240;
    }
}
vlan 1122 {
    bridge-priority 16k;
    interface ge-5/1/1 {
        priority 240;
    }
    interface ge-5/1/2 {
        priority 240;
    }
    interface ge-5/2/2 {
        priority 240;
    }
    interface ge-5/3/3 {
        priority 240;
    }
    interface ge-5/3/4 {
        priority 240;
    }
}

Chapter 6

Supporting Multicast Traffic

Internet Group Management Protocol Overview

Configuring Protocol Independent Multicast

IGMP Snooping

Configuring IGMP Snooping

IPv4 SENDS IP DATAGRAMS to a single destination or a group of interested
receivers by using three fundamental types of addresses:
• Unicast – sends a packet to a single destination.
• Broadcast – sends a datagram to an entire subnetwork.
• Multicast – sends a datagram to a set of hosts that can be on different
sub-networks and can be configured as members of a multicast group.

Internet Group Management Protocol Overview


A multicast datagram is delivered to destination group members with the same
best-effort reliability as a standard unicast IP datagram. This means that multicast
datagrams are not guaranteed to reach all members of a group or to arrive in the
same order in which they were transmitted. The only difference between a multicast
IP packet and a unicast IP packet is the presence of a group address in the IP header
destination address field.

NOTE According to RFC 3171, IP addresses 224.0.0.0 through 239.255.255.255 are
designated as multicast addresses in IPv4. Individual hosts can join or leave a
multicast group at any time. There are no restrictions on the physical location or on
the number of members in a multicast group. A host can be a member of more than
one multicast group at any time and does not have to belong to a group to send
packets to members of a group.
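The range 224.0.0.0 through 239.255.255.255 is the 224.0.0.0/4 prefix, so testing whether an address is a multicast group address is a single prefix comparison. The following Python sketch uses the standard ipaddress module; the helper name is hypothetical.

```python
import ipaddress

# 224.0.0.0 through 239.255.255.255 expressed as one prefix.
MULTICAST_V4 = ipaddress.ip_network("224.0.0.0/4")


def is_ipv4_multicast(address: str) -> bool:
    """Return True if the address falls inside the IPv4 multicast range."""
    return ipaddress.ip_address(address) in MULTICAST_V4


for candidate in ("225.1.1.1", "239.255.255.255", "10.0.0.2", "240.0.0.0"):
    print(candidate, is_ipv4_multicast(candidate))
```

A check like this is handy when validating group addresses before they are placed in a static group or snooping configuration.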

Routers use a group membership protocol to learn about the presence of group
members on directly attached subnetworks. When a host joins a multicast group, it
transmits a group membership protocol message to the group, and sets its IP
process and network interface card to receive frames addressed to the multicast
group.

Junos software supports IP multicast routing with many protocols, such as:

• Internet Group Management Protocol (IGMP), versions 1, 2 and 3.
• Multicast Listener Discovery (MLD), versions 1 and 2.
• Distance Vector Multicast Routing Protocol (DVMRP).
• Protocol Independent Multicast (PIM).
• Multicast Source Discovery Protocol (MSDP).
• Session Announcement Protocol (SAP) and Session Description
Protocol (SDP).
For details concerning the IP multicast feature and how to configure it using Junos
OS v10.0, please refer to the IP Multicast Operational Mode Commands Guide at
https://www.kr.juniper.net/techpubs/en_US/junos10.0/information-products/topic-collections/swcmdref-protocols/chap-ip-multicast-op-mode-cmds.html#chap-ip-multicast-op-mode-cmds.

Implementing an IP multicast network requires a number of building blocks. Figure
6.1 shows a typical end-to-end video streaming service with IP multicasting. Both
the client computer and adjacent network switches use IGMP to connect the client
to a local multicast router. Between the local and remote multicast routers, we use
Protocol Independent Multicast (PIM) to direct multicast traffic from the video
server to many multicast clients.

The Internet Group Management Protocol (IGMP) manages the membership of
hosts and routers in multicast groups. IP hosts use IGMP to report their multicast
group memberships to any adjacent multicast routers. For each of their attached
physical networks, multicast routers use IGMP to learn which groups have
members.

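[Figure 6.1 IP Multicasting Network Deployment: a video server sends UDP/RTP
multicast traffic through Router 1 (multicast router) and the local multicast
router, which run PIM between them, to an L2 switch with IGMP snooping and on to
the video client (laptop) on the LAN. IGMP runs between the local multicast
router, the L2 switch, and the video client.]

Figure 6.1 IP Multicasting Network Deployment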

In addition to managing group membership, IGMP is used as the transport for
several related multicast protocols, such as DVMRP and PIMv1. IGMP has three
versions that are supported by hosts and routers:

IGMPv1 – The original protocol defined in RFC 1112. An explicit join message is sent
to the router, but a timeout is used to determine when hosts leave a group.

IGMPv2 – Defined in RFC 2236. Among other features, IGMPv2 adds an explicit
leave message to complement the join message so that routers can more easily
determine when a group has no listeners.

IGMPv3 – Defined in RFC 3376. IGMPv3 supports the ability to specify which
sources can send to a multicast group. This type of multicast group is called a
source-specific multicast (SSM) group, and its multicast addresses are in the 232/8
range. IGMPv3 is also backwards compatible with IGMPv1 and IGMPv2.

For SSM mode, we can configure the multicast source address so that the source
can send the traffic to the multicast group. In this example, we create group 225.1.1.1
and accept IP address 10.0.0.2 as the only source.

user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2 source-count 3
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2 source-count 3 source-increment 0.0.0.2
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 exclude source 10.0.0.2

NOTE The SSM configuration requires that the IGMP version on the interface be set to
IGMPv3.

IGMP Static Group Membership


We can create IGMP static group membership to forward multicast traffic without a
receiver host. The following examples show some of the options available when
creating static groups:

user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 group-count 3
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 group-count 3 group-increment 0.0.0.2

When we enable IGMP static group membership, data is forwarded to an interface
without that interface receiving membership reports from downstream hosts.
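
To see which group addresses a group-count and group-increment pair produces, the expansion can be sketched in Python. This is an illustrative sketch only: the function expand_static_groups is hypothetical, and the default increment of 0.0.0.1 is an assumption rather than something stated in this chapter.

```python
import ipaddress
from typing import List


def expand_static_groups(base: str, count: int = 1, increment: str = "0.0.0.1") -> List[str]:
    """List the group addresses created from a base group, a count, and an increment.

    The 0.0.0.1 default increment is an assumption made for this sketch.
    """
    start = int(ipaddress.IPv4Address(base))
    step = int(ipaddress.IPv4Address(increment))
    return [str(ipaddress.IPv4Address(start + i * step)) for i in range(count)]


# Mirrors the third command above: group-count 3 group-increment 0.0.0.2
print(expand_static_groups("225.1.1.1", count=3, increment="0.0.0.2"))
# ['225.1.1.1', '225.1.1.3', '225.1.1.5']
```

The same arithmetic applies to source-count and source-increment when static sources are configured.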

NOTE When we configure static IGMP group entries on point-to-point links that connect
routers to a rendezvous point (RP), the static IGMP group entries do not generate
join messages toward the RP.

Various Multicast Routing Protocols


Multicast routing protocols enable a collection of multicast routers to build (join)
distribution trees when a host on a directly attached subnet, typically a LAN, wants
to receive traffic from a certain multicast group. Five multicast routing protocols
can be used to achieve this: DVMRP, Multicast Open Shortest Path First (MOSPF),
Core Based Trees (CBT), PIM sparse mode, and PIM dense mode. Table 6.1
summarizes the differences among the five multicast routing protocols.

Table 6.1 Multicast Routing Protocols Summary

Multicast Routing    Dense   Sparse   Implicit   Explicit   (S,G)   (*,G) Shared
Protocols            Mode    Mode     Join       Join       SBT     Tree
DVMRP                Yes     No       Yes        No         Yes     No
MOSPF                Yes     No       No         Yes        Yes     No
PIM dense mode       Yes     No       Yes        No         Yes     No
PIM sparse mode      No      Yes      No         Yes        Yes     Yes
CBT                  No      Yes      No         Yes        No      Yes

Because PIM Sparse Mode and PIM Dense Mode are the most widely deployed
techniques, they were used in this reference design.

Protocol Independent Multicast


The predominant multicast routing protocol used on the Internet today is Protocol
Independent Multicast (PIM). PIM has two versions, v1 and v2. The main difference
between PIMv1 and PIMv2 is the packet format. PIMv1 messages use Internet Group
Management Protocol (IGMP) packets, whereas PIMv2 has its own IP protocol
number (103) and packet structure.

In addition, it is important to select the appropriate mode. Although PIM provides
four modes (sparse mode, dense mode, sparse-dense mode, and source-specific
mode), users typically use one of two basic modes: sparse mode or dense mode.

PIM dense mode requires only a multicast source and a series of multicast-enabled
routers that run PIM dense mode to allow receivers to obtain multicast content.
Dense mode ensures that the traffic reaches its prescribed destination by
periodically flooding the network with multicast traffic, and relies on prune
messages to ensure that subnets with no interested receivers for a particular
multicast group stop receiving packets.

PIM sparse mode requires establishing special routers called rendezvous points
(RPs) in the network core. An RP is the point where upstream join messages from
interested receivers meet downstream traffic from the source of the multicast
group content. A network can have many RPs, but PIM sparse mode allows only
one RP to be active for any multicast group.

A multicast router typically has two kinds of IGMP interfaces: upstream and
downstream. We must configure PIM on the upstream IGMP interfaces to enable
multicast routing, to perform reverse path forwarding for multicast data packets
so that the multicast forwarding table is populated for the upstream interfaces,
and, in the case of PIM sparse mode, to distribute IGMP group memberships into
the multicast routing domain.

Only one “pseudo PIM interface” is required to represent all IGMP downstream
(IGMP-only) interfaces on the router. Therefore, PIM is generally not required on all
IGMP downstream interfaces, which reduces the consumption of router resources
such as memory.

IGMP and Nonstop Active Routing


Nonstop active routing (NSR) configurations include passive support for IGMP in
association with PIM. The
primary Routing Engine uses IGMP to determine its PIM multicast state, and this
IGMP-derived information is replicated on the backup Routing Engine. IGMP on the
new primary Routing Engine (after failover) relearns the state information quickly
through the IGMP operation. In the interim, the new primary Routing Engine retains
the IGMP-derived PIM state as received by the replication process from the original
primary Routing Engine. This state information times out unless refreshed by IGMP
on the new primary Routing Engine. Additional IGMP configuration is not required.

Filtering Unwanted IGMP Reports at the IGMP Interface Level


The group-policy statement enables the router to filter unwanted IGMP reports at
the interface level. When this statement is enabled on a router running IGMP version
2 (IGMPv2) or version 3 (IGMPv3), after the router receives an IGMP report, the
router compares the group against the specified group policy and performs the
action configured in that policy. For example, the router rejects the report if the
policy doesn’t match the defined address or network.

To enable IGMP report filtering for an interface, include the following group-policy
statement:

protocols {
    igmp {
        interface ge-1/1/1.0 {
            group-policy reject_policy;
        }
    }
}

policy-options {
    // IGMPv2 policy: match on the group address only
    policy-statement reject_policy {
        from {
            route-filter 192.1.1.1/32 exact;
        }
        then reject;
    }
    // IGMPv3 policy: match on the group address and the source address
    policy-statement reject_policy {
        from {
            route-filter 192.1.1.1/32 exact;
            source-address-filter 10.1.0.0/16 orlonger;
        }
        then reject;
    }
}
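
The two match types used in the prefix filters above, exact and orlonger, can be illustrated with a short Python sketch. This is an approximation for explanation only, not the Junos policy engine: the functions filter_matches and report_rejected are hypothetical, and the sketch models only the two example policies (it does not model what happens to reports that match no term).

```python
import ipaddress


def filter_matches(candidate, prefix, match_type):
    """Approximate the 'exact' and 'orlonger' match types for a prefix filter."""
    cand = ipaddress.ip_network(candidate, strict=False)
    filt = ipaddress.ip_network(prefix, strict=False)
    if match_type == "exact":
        # Same network address and same prefix length.
        return cand == filt
    if match_type == "orlonger":
        # Equal to the filter prefix or more specific than it.
        return cand.subnet_of(filt)
    raise ValueError("unsupported match type: " + match_type)


def report_rejected(group, source=None):
    """Apply the example policies: group-only (IGMPv2) or group plus source (IGMPv3)."""
    group_hit = filter_matches(group, "192.1.1.1/32", "exact")
    if source is None:
        return group_hit
    return group_hit and filter_matches(source, "10.1.0.0/16", "orlonger")


print(report_rejected("192.1.1.1"))               # group matches: rejected
print(report_rejected("192.1.1.1", "10.1.2.3"))   # group and source match: rejected
print(report_rejected("192.1.1.1", "10.2.2.3"))   # source outside 10.1.0.0/16: not rejected
```

In the IGMPv3 case both conditions must hold before the reject action applies, which is why a report from a source outside 10.1.0.0/16 is not rejected by this term.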

IGMP Configuration Command Hierarchy

To configure the Internet Group Management Protocol (IGMP), include the
following igmp statement:

igmp {
    accounting;                          // Accounting purposes
    interface interface-name {
        disable;
        (accounting | no-accounting);    // Individual interface-specific accounting
        group-policy [ policy-names ];
        immediate-leave;                 // See Note 1 at end of code snippet.
        oif-map map-name;
        promiscuous-mode;                // See Note 2 at end of code snippet.
        ssm-map ssm-map-name;
        static {
            group multicast-group-address {
                exclude;
                group-count number;
                group-increment increment;
                source ip-address {
                    source-count number;
                    source-increment increment;
                }
            }
        }
        version version;                 // See Note 3 at end of code snippet.
    }
    query-interval seconds;
    query-last-member-interval seconds;  // Default 1 second
    query-response-interval seconds;     // Default 10 seconds
    robust-count number;                 // See Note 4 at end of code snippet.
    traceoptions {                       // Tracing purposes
        file filename <files number> <size size> <world-readable | no-world-readable>;
        flag flag <flag-modifier> <disable>;
        // Flag can be: [leave (for IGMPv2 only) | mtrace | packets | query | report]
    }
}

NOTE 1 Use this statement only on IGMP version 2 (IGMPv2) interfaces to which one IGMP
host is connected. If more than one IGMP host is connected to a LAN through the
same interface, and one host sends a leave group message, the router removes all
hosts on the interface from the multicast group. The router loses contact with the
hosts that must remain in the multicast group until they send join requests in
response to the router’s next general group membership query.

NOTE 2 By default, IGMP interfaces accept IGMP messages only from the same
subnetwork. The promiscuous-mode statement enables the router to accept IGMP
messages from different sub-networks.

NOTE 3 By default, the router runs IGMPv2. If a source address is specified in a multicast
group that is configured statically, the IGMP version must be set to IGMPv3.
Otherwise, the source will be ignored and only the group will be added. The join will
be treated as an IGMPv2 group join.

When we reconfigure the router from IGMPv1 to IGMPv2, the router will continue to
use IGMPv1 for up to 6 minutes and will then use IGMPv2.

NOTE 4 The robustness variable provides fine-tuning to allow for expected packet loss on a
subnetwork. The value of the robustness variable is used in calculating the
following IGMP message intervals:

Group member interval = (robustness variable x query-interval) + (1 x query-response-interval)
Other querier present interval = (robustness variable x query-interval) + (0.5 x query-response-interval)
Last-member query count = robustness variable

By default, the robustness variable is set to 2. Increase this value if you expect a
subnetwork to lose packets.
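
The interval formulas in Note 4 can be worked through numerically. The following Python sketch is illustrative only: the function igmp_intervals is hypothetical, and the 125-second query interval used as a default is an assumption (the text above states only the defaults of 2 for the robustness variable and 10 seconds for the query response interval).

```python
def igmp_intervals(robust_count=2, query_interval=125.0, query_response_interval=10.0):
    """Compute the IGMP intervals from Note 4, in seconds.

    query_interval=125.0 is an assumed default for this sketch.
    """
    return {
        "group_member_interval":
            robust_count * query_interval + 1 * query_response_interval,
        "other_querier_present_interval":
            robust_count * query_interval + 0.5 * query_response_interval,
        "last_member_query_count": robust_count,
    }


print(igmp_intervals())
print(igmp_intervals(robust_count=3))
```

With the values assumed here, raising the robustness variable from 2 to 3 lengthens the group member interval from 260 to 385 seconds, so stale memberships take longer to age out in exchange for tolerating more lost packets.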

Configuring Protocol Independent Multicast


This section focuses on configuring PIM-Sparse Mode on the MX Ethernet router
and EX Ethernet switch series with various routing protocols based on the following
scenarios.

• Scenario 1: Configure PIM on the MX480 and EX4200 with OSPF.
• Scenario 2: Configure PIM on the EX8200 and EX4200 with RIP.

In each scenario, we used an IGMP server as the source of the multicast streams
and used the VideoLAN (VLC) media player as the IGMP client, which makes a
request to join the multicast group.

Scenario 1: Configuring PIM on the MX480 and EX4200 with OSPF


As illustrated in Figure 6.2, the MX480 and EX4200 are the multicast routers, which
interoperate using the OSPF routing protocol. PIM is configured on both routers and
is configured only on upstream interfaces to enable multicast routing. The multicast
client runs on the IBM Blade Server, which connects to the access switch, in this
case the EX4200.

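[Figure: the IGMP multicast source (streaming) connects to ge-5/2/6 on the MX480
multicast router (lo0.0 8.8.8.8). MX480 ge-5/2/5 connects to ge-0/0/44 on the
EX4200 multicast router (lo0.0 6.6.6.6), with PIM and OSPF running between them.
EX4200 ge-0/0/2 (VLAN 1119) connects through the BNT pass-through module to
Eth1 on the IGMP multicast client.]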

Figure 6.2 Configuring PIM on MX480 and EX4200 with OSPF

Configuring the MX480


chandra@HE-RE-1-MX480# show ge-5/2/5
unit 0 {
    family inet {
        address 22.11.5.5/24;
    }
}
{master}[edit interfaces]
chandra@HE-RE-1-MX480# show lo0
unit 0 {
    family inet {
        address 8.8.8.8/32;
    }
}
chandra@HE-RE-1-MX480# show protocols igmp
interface all {
    promiscuous-mode;
}
interface ge-5/2/6.0 {
    static {
        group 239.168.1.1 {
            group-count 10;
            source 10.10.10.254;
        }
    }
}
interface ge-5/2/5.0 {
    static {
        group 239.168.1.4;
    }
}
{master}[edit]
chandra@HE-RE-1-MX480# show protocols pim
rp {
    local {
        address 8.8.8.8;
    }
}
interface all {
    mode sparse;
}
interface fxp0.0 {
    disable;
}
chandra@HE-RE-1-MX480# show protocols ospf
area 0.0.0.0 {
    interface ge-5/2/5.0;
    interface lo0.0 {
        passive;
    }
    interface fxp0.0 {
        disable;
    }
}
chandra@HE-RE-1-MX480# show routing-options
router-id 8.8.8.8;

Configuring the EX4200


chandra@EX-175-CSR# show interfaces ge-0/0/2
unit 0 {
    family ethernet-switching;
}
chandra@EX-175-CSR# show interfaces ge-0/0/44
unit 0 {
    family inet {
        address 22.11.5.44/24;
    }
}
chandra@EX-175-CSR# show interfaces vlan
unit 1119 {
    family inet {
        address 10.10.9.100/24;
    }
}
chandra@EX-175-CSR# show protocols igmp
interface me0.0 {
    disable;
}
interface vlan.1119 {
    immediate-leave;
}
interface ge-0/0/6.1119;
interface all;
chandra@EX-175-CSR# show protocols pim
rp {
    static {
        address 8.8.8.8;
    }
}
interface vlan.1119;
interface me0.0 {
    disable;
}
interface all {
    mode sparse;
}
chandra@EX-175-CSR# show interfaces lo0
unit 0 {
    family inet {
        address 6.6.6.6/32;
    }
}
chandra@EX-175-CSR# show protocols ospf
area 0.0.0.0 {
    interface ge-0/0/44.0;
    interface lo0.0 {
        passive;
    }
    interface me0.0 {
        disable;
    }
}
chandra@EX-175-CSR# show routing-options
router-id 6.6.6.6;

Validating the MX480 Configuration


chandra@HE-RE-1-MX480> show route |grep PIM
224.0.0.2/32 *[PIM/0] 06:21:14
MultiRecv
224.0.0.13/32 *[PIM/0] 06:21:14
MultiRecv
239.168.1.1,10.10.10.254/32*[PIM/105] 01:28:54
Multicast (IPv4)
239.168.1.2,10.10.10.254/32*[PIM/105] 01:23:33
Multicast (IPv4)
239.168.1.3,10.10.10.254/32*[PIM/105] 01:23:33
Multicast (IPv4)
239.168.1.4,10.10.10.254/32*[PIM/105] 01:23:33
Multicast (IPv4)

chandra@HE-RE-1-MX480> show pim neighbors
Instance: PIM.master
B = Bidirectional Capable, G = Generation Identifier,
H = Hello Option Holdtime, L = Hello Option LAN Prune Delay,
P = Hello Option DR Priority
Interface           IP V Mode        Option      Uptime Neighbor addr
ge-5/2/5.0           4 2             HPLG      01:14:14 22.11.5.44
chandra@HE-RE-1-MX480> show pim join
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.168.1.1

Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: Local

Group: 239.168.1.1
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-5/2/6.0

Group: 239.168.1.2
Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: Local

Group: 239.168.1.2
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-5/2/6.0

chandra@HE-RE-1-MX480> show pim source


Instance: PIM.master Family: INET

Source 8.8.8.8
Prefix 8.8.8.8/32
Upstream interface Local
Upstream neighbor Local

Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-5/2/6.0
Upstream neighbor 10.10.10.2

Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-5/2/6.0
Upstream neighbor Direct

Validating the EX4200 Configuration


chandra@EX-175-CSR# run show pim join
Instance: PIM.master Family: INET
R = Rendezvous Point Tree, S = Sparse, W = Wildcard

Group: 239.168.1.1
Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: ge-0/0/44.0
Group: 239.168.1.1
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-0/0/44.0
chandra@EX-175-CSR# run show pim neighbors
Instance: PIM.master
B = Bidirectional Capable, G = Generation Identifier,
H = Hello Option Holdtime, L = Hello Option LAN Prune Delay,
P = Hello Option DR Priority
Interface           IP V Mode        Option      Uptime Neighbor addr
ge-0/0/44.0          4 2             HPLG      01:06:07 22.11.5.5
chandra@EX-175-CSR# run show pim source
Instance: PIM.master Family: INET

Source 8.8.8.8
Prefix 8.8.8.8/32
Upstream interface ge-0/0/44.0
Upstream neighbor 22.11.5.5

Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-0/0/44.0
Upstream neighbor 22.11.5.5

Scenario 2: Configuring PIM on the EX8200 and EX4200 with RIP


As illustrated in Figure 6.3, the EX8200 and EX4200 are the multicast routers with
RIP enabled. PIM is configured in both routers, and is configured only on upstream
interfaces to enable multicast routing. The multicasting client runs on the IBM
PowerVM, which connects to the EX4200 access switch.

[Figure: the IGMP multicast source (streaming) connects to ge-1/0/20 on the
EX8200 multicast router (lo0.0 9.9.9.9). EX8200 ge-1/0/26 connects to ge-0/0/17
on the EX4200 multicast router (lo0.0 6.6.6.6), with PIM and RIP running between
them. EX4200 ge-0/0/2 (VLAN 2211) connects to the IGMP multicast client on
IBM PowerVM through the NICs/HEA (Host Ethernet Adapter), the SEA (Shared
Ethernet Adapter), and the virtual network.]

Figure 6.3 Configuring PIM on EX8200 and EX4200 with RIP

Configuring the EX4200

chandra@EX-175-CSR# show interfaces ge-0/0/2
unit 0 {
    family ethernet-switching;
}
chandra@EX-175-CSR# show interfaces ge-0/0/17
unit 0 {
    family inet {
        address 22.11.2.17/24;
    }
}
chandra@EX-175-CSR# show interfaces vlan
unit 2211 {
    family inet {
        address 10.10.9.200/24;
    }
}
chandra@EX-175-CSR# show protocols igmp
interface me0.0 {
    disable;
}
interface vlan.2211 {
    immediate-leave;
}
interface ge-0/0/2.2211;
interface all;
chandra@EX-175-CSR# show protocols pim
rp {
    static {
        address 9.9.9.9;
    }
}
interface vlan.2211;
interface me0.0 {
    disable;
}
interface all {
    mode sparse;
}
chandra@EX-175-CSR# show interfaces lo0
unit 0 {
    family inet {
        address 6.6.6.6/32;
    }
}

chandra@EX-175-CSR# show protocols rip
send broadcast;
receive both;
group jweb-rip {
    export jweb-policy-rip-direct;
    neighbor ge-0/0/2.0;
    neighbor lo0.0;
    neighbor vlan.2211;
}
chandra@EX-175-CSR# show policy-options
policy-statement jweb-policy-rip-direct {
    term 1 {
        from {
            protocol [ direct rip ];
            interface [ ge-0/0/2.0 ge-0/0/17.0 ];
        }
        then accept;
    }
    term 2 {
        then accept;
    }
}

Configuring the EX8200


chandra@SPLAB-8200-1-re0# show protocols rip
send broadcast;
receive both;
group jweb-rip {
    export jweb-policy-rip-direct;
    neighbor ge-1/0/26.0;
    neighbor lo0.0;
}
chandra@SPLAB-8200-1-re0# show policy-options
policy-statement jweb-policy-rip-direct {
    term 1 {
        from {
            protocol [ direct rip ];
            interface [ ge-1/0/26.0 ];
        }
        then accept;
    }
    term 2 {
        then accept;
    }
}

IGMP Snooping
An access switch usually learns unicast MAC addresses by checking the source
address field of the frames it receives. However, a multicast MAC address can never
be the source address of a frame, so the switch cannot learn which ports lead to
multicast receivers. As a result, the switch floods multicast traffic on the VLAN,
consuming significant amounts of bandwidth.

IGMP snooping regulates multicast traffic on a VLAN to avoid flooding. When IGMP
snooping is enabled, the switch intercepts IGMP packets and uses the content of
the packets to build a multicast cache table. The cache table is a database of
multicast groups and their corresponding member ports and is used to regulate
multicast traffic on the VLAN.

When the switch receives multicast packets, it uses the cache table to selectively
forward the packets only to the ports that are members of the destination multicast
group.
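
The difference between flooding and snooping-based forwarding can be sketched with a toy cache table in Python. This is a simplified illustration, not how the switch is implemented: the class SnoopingTable, its method names, and the port names are hypothetical, and details such as multicast-router ports and membership timeouts are left out.

```python
from collections import defaultdict


class SnoopingTable:
    """Toy multicast cache: (vlan, group) -> set of member ports."""

    def __init__(self):
        self.vlan_ports = defaultdict(set)  # vlan -> all ports in the VLAN
        self.members = defaultdict(set)     # (vlan, group) -> ports that reported membership

    def add_port(self, vlan, port):
        self.vlan_ports[vlan].add(port)

    def report(self, vlan, group, port):
        """Record an IGMP membership report (join) seen on a port."""
        self.members[(vlan, group)].add(port)

    def leave(self, vlan, group, port):
        """Record an IGMP leave seen on a port."""
        self.members[(vlan, group)].discard(port)

    def flood_ports(self, vlan, ingress):
        """Without snooping: every port in the VLAN except the ingress port."""
        return sorted(self.vlan_ports[vlan] - {ingress})

    def snooped_ports(self, vlan, group, ingress):
        """With snooping: only the ports that are members of the group."""
        return sorted(self.members.get((vlan, group), set()) - {ingress})


table = SnoopingTable()
for port in ("uplink", "host1", "host2"):
    table.add_port(1, port)
table.report(1, "239.168.1.1", "host1")

print(table.flood_ports(1, "uplink"))                   # ['host1', 'host2']
print(table.snooped_ports(1, "239.168.1.1", "uplink"))  # ['host1']
```

In this sketch host2 receives the stream only in the flooding case, which is the bandwidth that snooping saves on the VLAN.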

As illustrated in Figure 6.4, the EX4200 access switch connects four hosts and
segments their data traffic into two VLANs: host1 and host2 belong to VLAN 1,
and host3 and host4 belong to VLAN 2. Hosts in the same VLAN can make different
choices about whether to subscribe to a multicast group. For instance, host1 has
subscribed to multicast group 1, while host2 is not interested in multicast group 1
traffic; host3 has subscribed to multicast group 2, while host4 is not interested in
multicast group 2 traffic. The EX4200 IGMP snooping feature accommodates these
choices so that host1 receives multicast group 1 traffic and host2 does not, and
host3 receives multicast group 2 traffic and host4 does not.

[Figure: an EX4200 with a trunk link connects Host 1 (VLAN 1, subscribes to
group 1), Host 2 (VLAN 1), Host 3 (VLAN 2, subscribes to group 2), and Host 4
(VLAN 2). The legend distinguishes VLAN 1, VLAN 2, multicast group 1 traffic,
and multicast group 2 traffic.]

Figure 6.4 IGMP Traffic Flow with IGMP Snooping Enabled

Hosts can join multicast groups in two ways:

• By sending an unsolicited IGMP join message to a multicast router that
specifies the IP multicast group that the host is attempting to join.
• By sending an IGMP join message in response to a general query from a
multicast router.

A multicast router continues to forward multicast traffic to a VLAN if at least one
host on that VLAN responds to the periodic general IGMP queries. To leave a
multicast group, a host can either not respond to the periodic general IGMP queries,
which results in a silent leave, or send a group-specific IGMPv2 leave message.

IGMP Snooping in EX Series Ethernet Switches


In the EX Series Ethernet switches, IGMP snooping works with both Layer 2
interfaces and routed VLAN interfaces (RVIs) to regulate multicast traffic in a
switched network. Switches use Layer 2 interfaces to send traffic to hosts that are
part of the same broadcast domain and use an RVI to route traffic from one
broadcast domain to another.

When an EX Series switch receives a multicast packet, the Packet Forwarding
Engines in the switch perform an IP multicast lookup on the multicast packet to
determine how to forward the packet to its local ports. From the results of the IP
multicast lookup, each Packet Forwarding Engine extracts a list of Layer 3 interfaces
(which can include VLAN interfaces) that have ports local to the Packet Forwarding
Engine. If an RVI is part of this list, the switch provides a bridge multicast group ID for
each RVI to the Packet Forwarding Engine.

Figure 6.5 shows how multicast traffic is forwarded on a multilayer switch. The
multicast traffic arrives through the xe-0/1/0.0 interface. A multicast group is
formed by the Layer 3 interface ge-0/0/2.0, vlan.0 and vlan.1. The ge-2/0/0.0
interface is a common trunk interface that belongs to both vlan.0 and vlan.1. The
letter R next to an interface name in Figure 6.5 indicates that a multicast receiver
host is associated with that interface.

[Figure: multicast traffic enters the EX4200 Series switch on xe-0/1/0.0 and is
forwarded to ge-0/0/2.0 (R), VLAN 0 (v100), and VLAN 1 (v200); ge-0/0/3.0 is not
marked (R). In each VLAN, ge-0/0/0.0 and the trunk ge-2/0/0.0 are marked (R),
and ge-1/0/0.0 is not. The legend distinguishes multicast traffic from
non-multicast traffic; (R) marks an interface receiving multicast traffic.]

Figure 6.5 IGMP Traffic Flow with Routed VLAN Interfaces



IGMP Snooping Configuration Command


The IGMP snooping feature is available on the MX Ethernet router and the EX
Ethernet switch series. However, the configuration command hierarchy is different
on these two devices.

In the EX Ethernet switch series, IGMP snooping is configured at the [edit
protocols] hierarchy level in Junos CLI, and the detailed configuration stanza is as
follows:

igmp-snooping {
    vlan (vlan-id | vlan-number) {
        disable {
            interface interface-name
        }
        immediate-leave;
        interface interface-name {
            multicast-router-interface;
            static {
                group ip-address;
            }
        }
        query-interval seconds;
        query-last-member-interval seconds;
        query-response-interval seconds;
        robust-count number;
    }
}

NOTE By default, IGMP snooping is not enabled. Statements configured at the VLAN level
apply only to that particular VLAN.

With the MX Ethernet Router Series in the Junos CLI, we can configure a Layer 2
broadcast domain as a bridge domain, so IGMP snooping is configured at the
[bridge-domains] configuration hierarchy. The detailed configuration stanza is
as follows:

multicast-snooping-options {
    flood-groups [ ip-addresses ];
    forwarding-cache {
        threshold suppress value <reuse value>;
    }
    graceful-restart <restart-duration seconds>;
    ignore-stp-topology-change;
}

Configuring IGMP Snooping


This section focuses on configuring IGMP snooping on the MX Ethernet Router and
EX Ethernet switch series with various IGMP client platforms in the following
scenarios.
• Scenario 1: MX480, EX Series and IBM Blade Center.
• Scenario 2: MX480 and IBM x3500 Server.
In each scenario, we used an IGMP server as the source of multicast streams and
used the VideoLAN (VLC) media player as the IGMP client, which requests to join
the multicast group.

Scenario 1: MX480, EX Series and IBM Blade Center


As illustrated in Figure 6.6, the IGMP multicast source generates the IGMP Group 2
flow: from the MX480 to the EX8200, and then on to the IGMP client, which runs on
the IBM Blade Center.

Two interfaces (ge-5/2/3 and ge-5/2/6) in the MX480 are configured as Layer 2
switch ports by using a bridge domain, which is associated with VLAN 1117. The
ge-5/2/6 interface is configured as a multicast-router interface and connects to
the multicast source; interface ge-5/2/3 is a Layer 2 interface with a static
membership in multicast group 239.168.1.3. This configuration allows the interface
to receive and then forward the multicast packets to their target.

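[Figure: labels shown are the IGMP multicast source (streaming) on MX480
ge-5/2/6; the MX480 multicast router with ge-5/2/3, ge-5/2/4, and ge-5/2/5;
the EX8200 with ge-1/0/20 and ge-1/0/22; the EX4200 with ge-0/0/44, ge-0/0/2,
and ge-0/0/6; VLANs 1117, 1119, and 2211; and the IBM Blade Center with the
MM1, Cisco ESM 1, and BNT pass-through modules (Eth0, Eth1, SoL, trunk port 17,
up to 14 GigE links).]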

Figure 6.6 MX480, EX8200, EX4200 and IBM Blade Center – IGMP Traffic Flow with IGMP Snooping

Configuring the MX480


chandra@HE-RE-1-MX480> show configuration bridge-domains 1117
domain-type bridge;
vlan-id 1117;
interface ge-5/2/3.0;
interface ge-5/2/6.0;
protocols {
    igmp-snooping {
        interface ge-5/2/3.0 {
            static {
                group 239.168.1.3;
            }
        }
        interface ge-5/2/6.0 {
            multicast-router-interface;
        }
    }
}

Configuring the EX4200


{master:0}
chandra@EX-175-CSR> show configuration protocols igmp-snooping
vlan IGMP {
    interface ge-0/0/2.0 {
        static {
            group 239.168.1.1;
        }
    }
    interface ge-0/0/17.0 {
        static {
            group 239.168.1.1;
        }
        multicast-router-interface;
    }
}
chandra@EX-175-CSR> show configuration vlans 2211
vlan-id 2211;
interface {
    ge-0/0/2.0;
    ge-0/0/17.0;
}

Configuring the EX8200


chandra@SPLAB-8200-1-re0> show configuration protocols igmp-snooping
vlan 1117 {
    interface ge-1/0/18.0 {
        static {
            group 239.168.1.3;
        }
        multicast-router-interface;
    }
    interface ge-1/0/22.0 {
        static {
            group 239.168.1.3;
        }
    }
}

Validating IGMP Snooping


laka-bay1#show ip igmp snooping group
Vlan Group Version Port List
--------------------------------------------
2211 239.168.1.1 v2 Gi0/17

laka-bay1#show ip igmp snooping mrouter


Vlan ports
---- -----
2211 Gi0/19(dynamic)

laka-bay1#show ip igmp snooping querier


Vlan IP Address IGMP Version Port
-----------------------------------------------
2211 11.22.3.24 v2 Gi0/19

chandra@HE-RE-1-MX480> show igmp snooping statistics


Bridge: bc-igmp
IGMP Message type Received Sent Rx errors
. . .
Membership Query 0 9 0
V1 Membership Report 0 0 0
DVMRP 0 0 0
. . .
Group Leave 1 4 0
. . .
V3 Membership Report 43 56 0

. . .

chandra@HE-RE-1-MX480> show igmp snooping membership detail


Instance: default-switch

Bridge-Domain: bc-igmp
Learning-Domain: default
Interface: ge-5/2/6.0
Interface: ge-5/2/5.0
Group: 239.168.1.2
Group mode: Exclude
Source: 0.0.0.0
Last reported by: 10.10.10.1
Group timeout: 76 Type: Dynamic

chandra@EX-175-CSR> show igmp-snooping membership


VLAN: IGMP
239.168.1.2 *
Interfaces: ge-0/0/2.0, ge-0/0/44.0

chandra@EX-175-CSR> show igmp-snooping membership detail


VLAN: IGMP Tag: 2211 (Index: 10)
Router interfaces:
ge-0/0/44.0 static Uptime: 00:31:59
Group: 239.168.1.2
Receiver count: 1, Flags: <V2-hosts Static>
ge-0/0/2.0 Uptime: 00:39:34
ge-0/0/44.0 Uptime: 00:39:34

chandra@EX-175-CSR> show igmp-snooping statistics


Bad length: 0 Bad checksum: 0 Invalid interface: 0

Not local: 0 Receive unknown: 0 Timed out: 2


IGMP Type Received Transmitted Receive Errors
Queries: 156 12 0
Reports: 121 121 0
Leaves: 2 2 0
Other: 0 0 0

Scenario 2: MX480 and IBM x3500 Server


In this scenario, the IGMP group traffic flow is generated from the IGMP source and
sent to the MX480; it then continues to the client, which runs on the IBM x3500
Series Platform.

As shown in Figure 6.7, two interfaces in the MX480 (ge-5/2/4 and ge-5/2/6) are
configured as Layer 2 switch ports by using a bridge domain, which is associated
with VLAN 1118. The ge-5/2/6 interface, which is configured as a
multicast-router interface, connects to the multicast source; interface ge-5/2/4
is a Layer 2 interface with a static membership in multicast group 239.168.1.4
and is set up to receive and forward multicast packets to their respective servers.

[Figure: the IGMP multicast source (streaming) connects to ge-5/2/6 on the MX480
multicast router. MX480 ge-5/2/4 (ip = 239.168.1.4) connects to the IGMP client
on the IBM x3500 server (IP = 10.10.9.1, GW = 10.10.9.1).]

Figure 6.7 MX480 and IBM x3500 IGMP Traffic Flow with IGMP Snooping

chandra@HE-RE-1-MX480> show configuration bridge-domains 1118
domain-type bridge;
vlan-id 1118;
interface ge-5/2/4.0;
interface ge-5/2/6.0;
protocols {
    igmp-snooping {
        interface ge-5/2/6.0 {
            multicast-router-interface;
        }
        interface ge-5/2/4.0 {
            static {
                group 239.168.1.4;
            }
        }
    }
}

Chapter 7

Understanding Network CoS and Latency

Class of Service

Configuring CoS

Latency

AN APPLICATION’S PERFORMANCE directly relies on network performance.
Network performance typically refers to bandwidth because bandwidth is the
primary measure of computer network speed and represents the overall capacity
of a connection. Greater capacity typically generates improved performance.
However, network bandwidth is not the only factor that contributes to network
performance.

The performance of an application relies on different network characteristics.
Some real-time applications, such as voice and video, are extremely sensitive to
latency, jitter, and packet loss, while some non-real-time applications, such as web
applications (HTTP), email, File Transfer Protocol (FTP), and Telnet, do not
place any specific reliability requirements on the network, and a “best effort”
policy works well in transmitting these traffic types.

In today’s converged network, including data/voice converged networks and
data/storage converged networks, and in cloud-ready data centers with server
virtualization, different types of applications are transmitted throughout the same
network. To ensure application performance for all types of applications, additional
provisions are required within the network to minimize latency and packet loss.

This chapter covers two techniques for improving data center network
performance:

• Using class of service (CoS) to manage packet loss.
• Considering latency characteristics when designing networks using Juniper
Networks data center network products.

Class of Service
Typically, when a network experiences congestion and delay, some packets will be
dropped. However, as an aid in preventing dropped packets, Junos CoS allows an
administrator to divide traffic into classes and offers various levels of throughput
and packet loss when congestion and delay occur. This allows packet loss to occur
only when specific rules are configured on the system.

In designing CoS applications, we must consider service needs, and we must
thoroughly plan and design the CoS configuration to ensure consistency across all
routers in a CoS domain. We must also consider all the routers and other networking
equipment in the CoS domain to ensure interoperability among different types of
equipment. However, before we proceed with implementing CoS in Junos, we
should understand the CoS components and the packet flow through the CoS process.

Junos CoS Process


Figure 7.1 shows a typical CoS process: the general flow of a packet as it passes
through CoS in a router that implements QoS.

[Figure: Ingress processing blocks: IFL-based classifier, BA classifiers, MF
classifiers, policing, packet forwarding. Egress processing blocks: rewrite,
packet dropping, queueing and shaping, policing, MF classifiers. The legend
distinguishes packet classification/marking from packet queueing/shaping.]

Figure 7.1 CoS Processing Model


Chapter 7: Understanding Network CoS and Latency 107

The following is a list of the key steps in the QoS process, together with the
corresponding configuration commands for the process.

1. Classifying: This step examines packet header fields (for example, EXP bits,
IEEE 802.1p bits, or DSCP bits) to separate incoming traffic.
One or more classifiers must be assigned to a physical or logical interface to
separate the traffic flows.
The classifier configuration is at the [edit class-of-service interfaces]
hierarchy level in Junos CLI.
In addition, the classifier statement further defines how to assign the packet to
a forwarding class with a loss priority. The configuration is at the [edit class-
of-service classifiers] hierarchy level in Junos CLI. For details concerning
packet loss priority and forwarding class, see Defining Loss Priorities and
Defining Forwarding Classes on page 109 of this handbook.
Furthermore, each forwarding class can be assigned to a queue. The
configuration is at the [edit class-of-service forwarding-classes] hierarchy
level.
2. Policing: This step meters traffic. It changes the forwarding class and loss
priority if a traffic flow exceeds its pre-defined service level.
3. Scheduling: This step manages all attributes of queuing, such as transmission
rate, buffer depth, priority, and Random Early Detection (RED) profile.
A scheduler map is assigned to the physical or logical interface. The
configuration is at the [edit class-of-service interfaces] hierarchy level in
Junos CLI.
In addition, the scheduler statement defines how traffic is treated in the output
queue—for example, the transmit rate, buffer size, priority, and drop profile. The
configuration is at the [edit class-of-service schedulers] hierarchy level.
Finally, the scheduler-maps statement assigns a scheduler to each forwarding
class. The configuration is at the [edit class-of-service scheduler-maps]
hierarchy level.
4. Packet Dropping: This step applies a drop profile to avoid TCP global
synchronization and to protect high-priority traffic from being dropped.
The drop-profile defines how aggressively to drop packets that are using a
particular scheduler. The configuration is at the [edit class-of-service
drop-profiles] hierarchy level.

5. Rewrite Marker: This step rewrites the packet CoS fields (for example, EXP or
DSCP bits) according to the forwarding class and loss priority of the packet.
The rewrite rule takes effect as the packet leaves a logical interface that has a
rewrite rule. The configuration is at the [edit class-of-service rewrite-rules]
hierarchy level in Junos CLI.

Junos CoS Implementation Best Practices


Best practices include the following:

• Selecting the appropriate classifier.


• Using code-point aliases.
• Defining loss priorities.
• Defining forwarding classes.
• Defining comprehensive schedulers.
• Defining policers for traffic classes.

Selecting the Appropriate Classifier


Selecting the appropriate classifier is key in distinguishing traffic. Table 7.1 lists
classifier comparisons between Juniper Networks MX Series and EX Series.

Table 7.1 Packet Classifiers Comparison Between MX Series and EX Series

Packet Classifiers   MX960 Series &   EX8200 Series &   Function
                     MX480 Series     EX4200 Series
dscp                 Yes              Yes               Handles incoming IPv4 packets.
dscp-ipv6            Yes              –                 Handles incoming IPv6 packets.
exp                  Yes              –                 Handles MPLS packets using Layer 2 headers.
ieee-802.1           Yes              Yes               Handles Layer 2 CoS.
ieee-802.1ad         Yes              –                 Handles IEEE-802.1ad (DEI) classifier.
inet-precedence      Yes              Yes               Handles incoming IPv4 packets. IP precedence
                                                        mapping requires only the upper three bits of
                                                        the DSCP field.

Using Code-Point Aliases


A code-point alias assigns a name to a pattern of code-point bits. We can use this
name instead of the bit pattern when configuring other CoS components, such as
classifiers, drop-profile maps, and rewrite rules. For example:
ieee-802.1 { be 000; af12 101; af11 100; be1 001; ef 010; }
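As a minimal sketch of how an alias feeds classification, the following Python models the alias example above together with the classifier and queue assignments used in the sample EX4200 configuration later in this chapter. The function name, the dictionary layout, and the fallback for unmatched bit patterns are assumptions made for illustration; they do not describe Junos internals.

```python
# Alias name -> IEEE 802.1p bit pattern, copied from the alias example above.
CODE_POINT_ALIASES = {
    "be": "000", "be1": "001", "ef": "010", "af11": "100", "af12": "101",
}

# Alias -> (forwarding class, loss priority), as in the sample DOTP-CLASSIFIER.
CLASSIFIER = {
    "ef": ("CONVERSATIONAL", "low"),
    "af12": ("INTERACTIVE", "low"),
    "af11": ("STREAMING", "low"),
    "be": ("BACKGROUND", "high"),
}

# Forwarding class -> queue number, as in the sample forwarding-classes stanza.
QUEUES = {"BACKGROUND": 0, "STREAMING": 1, "INTERACTIVE": 2, "CONVERSATIONAL": 3}


def classify(bits):
    """Return (forwarding class, loss priority, queue) for an 802.1p bit pattern.

    Patterns without a classifier entry fall back to BACKGROUND/high here;
    that default is an assumption of this sketch.
    """
    for alias, pattern in CODE_POINT_ALIASES.items():
        if pattern == bits and alias in CLASSIFIER:
            forwarding_class, loss_priority = CLASSIFIER[alias]
            return forwarding_class, loss_priority, QUEUES[forwarding_class]
    return "BACKGROUND", "high", QUEUES["BACKGROUND"]


print(classify("010"))
```

Keeping the bit patterns in one alias table means the classifier and rewrite rules can refer to names such as ef or af12 and never repeat the bits.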

Defining Loss Priorities


Loss priority affects the scheduling of a packet without affecting the packet’s
relative ordering. An administrator can use the packet loss priority (PLP) bit as part
of a congestion control strategy and can use the loss priority setting to identify
packets that have experienced congestion. Typically, an administrator will mark
packets exceeding a specified service level with a high loss priority and set the loss
priority by configuring a classifier or a policer. The loss priority is used later in the
work flow to select one of the drop profiles used by random early detection (RED).
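The way loss priority selects a drop profile can be sketched in Python as follows. The profile points are hypothetical values chosen only for illustration, and the two-point linear interpolation is a simplification rather than the Junos RED implementation.

```python
import random

# Hypothetical drop profiles: (queue fill %, drop probability %) end points.
# The numbers are illustrative only and do not come from this handbook.
DROP_PROFILES = {
    "low": [(70, 0), (100, 40)],    # low loss priority: drop late and gently
    "high": [(30, 0), (100, 100)],  # high loss priority: drop early and hard
}


def drop_probability(loss_priority, fill_percent):
    """Linearly interpolate the drop probability (0-100) for a queue fill level."""
    (x0, y0), (x1, y1) = DROP_PROFILES[loss_priority]
    if fill_percent <= x0:
        return float(y0)
    if fill_percent >= x1:
        return float(y1)
    return y0 + (y1 - y0) * (fill_percent - x0) / (x1 - x0)


def should_drop(loss_priority, fill_percent, rng=random):
    """Randomly decide whether to drop one arriving packet."""
    return rng.random() * 100 < drop_probability(loss_priority, fill_percent)


print(drop_probability("high", 65), drop_probability("low", 65))
```

With these example profiles, a queue that is 65% full already drops half of the high loss priority packets while low loss priority packets still pass untouched.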

Defining Forwarding Classes


The forwarding class affects the forwarding, scheduling, and marking policies
applied to packets as they move through a router. Table 7.2 summarizes the
mapping between queues and different forwarding classes for both the MX and EX
Series.

Table 7.2 Forwarding Classes for MX480, EX4200 and EX8200 Series

Forwarding Class        MX Series Queue   EX Series Queue
Voice (EF)              Q3                Q5
Video (AF)              Q2                Q4
Data (BE)               Q0                Q0
Network Control (NC)    –                 Q7

The forwarding class plus the loss priority defines the per-hop behavior. If the
use case requires associating forwarding classes with next hops, note that the
forwarding policy options are available only on the MX Series.

Defining Comprehensive Schedulers


An individual router interface has multiple queues assigned to store packets.
The router determines which queue to service based on a particular method of
scheduling. This process often involves a determination of which type of packet
should be transmitted before another type of packet. Junos schedulers allow an
administrator to define the priority, bandwidth, delay buffer size, rate control status,
and RED drop profiles to be applied to a particular queue for packet transmission.

Defining Policers for Traffic Classes


Policers allow an administrator to limit traffic of a certain class to a specified
bandwidth and burst size. Packets exceeding the policer limits can be discarded
or can be assigned to a different forwarding class, a different loss priority, or both.
Junos applies policers through firewall filters that can be associated with input or
output interfaces.
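The bandwidth and burst-size behavior described above can be sketched as a single-rate token bucket. This is a simplified model under stated assumptions: the class and parameter names (Policer, bandwidth_bps, burst_bytes) are hypothetical, and the only exceed action shown is re-marking to high loss priority.

```python
class Policer:
    """Single-rate token bucket; a simplified model, not the Junos implementation."""

    def __init__(self, bandwidth_bps, burst_bytes):
        self.rate_bytes_per_s = bandwidth_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last_time = 0.0

    def police(self, now, packet_bytes):
        """Return 'conform' or 'exceed' for a packet arriving at time `now` (seconds)."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "conform"
        return "exceed"


def apply_action(verdict, forwarding_class, loss_priority):
    """Exceeding packets keep their class but are re-marked to high loss priority."""
    if verdict == "exceed":
        return forwarding_class, "high"
    return forwarding_class, loss_priority


policer = Policer(bandwidth_bps=8000, burst_bytes=1500)  # 1000 bytes/s, 1500-byte burst
print([policer.police(t, 1000) for t in (0.0, 0.0, 0.5)])
```

Re-marking instead of discarding leaves the final drop decision to the drop profile at the egress queue, which matches the option of assigning a different loss priority described above.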

Table 7.3 compares the CoS configuration statements supported on the Juniper
Networks MX480, EX8200, and EX4200.

Table 7.3 Comparison of CoS Configuration Statements

Field                     Description                                                   MX480    EX8200   EX4200
                                                                                        Series   Series   Series
classifiers               Classify incoming packets based on code point value           Yes      Yes      Yes
code-point-aliases        Mapping of code point aliases to bit strings                  Yes      Yes      Yes
drop-profiles             Random Early Drop (RED) data point map                        Yes      Yes      Yes
fabric                    Define CoS parameters of switch fabric                        Yes      Yes      -
forwarding-classes        One or more mappings of forwarding class to queue number      Yes      Yes      Yes
forwarding-policy         Class-of-service forwarding policy                            Yes      -        -
fragmentation-maps        Mapping of forwarding class to fragmentation options          Yes      -        -
host-outbound-traffic     Classify and mark host traffic to forwarding engine           Yes      -        -
interfaces                Apply class-of-service options to interfaces                  Yes      Yes      -
multi-destination         Multicast class of service                                    -        Yes      -
restricted-queues         Map forwarding classes to restricted queues                   Yes      -        -
rewrite-rules             Write code point value of outgoing packets                    Yes      Yes      Yes
routing-instances         Apply CoS options to routing instances with VRF table label   Yes      -        -
scheduler-maps            Mapping of forwarding classes to packet schedulers            Yes      Yes      Yes
schedulers                Packet schedulers                                             Yes      Yes      Yes
traffic-control-profiles  Traffic shaping and scheduling profiles                       Yes      -        -
translation-table         Translation table                                             Yes      -        -
tri-color                 Enable tricolor marking                                       Yes      -        -



Configuring CoS
In this section, we demonstrate a sample scenario for configuring CoS on the EX4200.
Two blade servers connect to two different interfaces and simulate production traffic
by issuing a ping command. The test device (N2X) generates significant network
traffic, classified as background traffic, through the EX4200 to one of the blade
servers. This background traffic competes with the production traffic and congests
the egress port, causing packet loss in the production traffic. Because the EX4200 is
central to network traffic aggregation in this scenario, it is reasonable to apply a
CoS packet loss policy on the EX4200 to ensure that no packet loss occurs in the
production traffic.

NOTE The configuration scenario and snippet are also applicable to MX Series Ethernet
Routers.

Configuration Description
As illustrated in Figure 7.2, the EX4200 is the device under test (DUT), which
interconnects the IBM blade servers and the Agilent N2X traffic generator.

[Diagram: N2X ports ge-203/1 (11.22.1.100) and ge-304/4 (11.22.1.200) connect to
EX4200 ports ge-0/0/24 and ge-0/0/25 and carry the background traffic. EX4200 port
ge-0/0/7 connects to the eth1 interface of the 7th blade (11.22.1.7), and port
ge-0/0/9 connects to the eth1 interface of the 9th blade (11.22.1.9), both through
the IBM BladeCenter Pass-Through Module. Production traffic flows between the blades.]

Figure 7.2 EX4200 CoS Validation Scenario

The test includes the following steps:

1. The N2X generates network traffic as background traffic onto the EX4200
through two ingress GigE ports (ge-0/0/24 and ge-0/0/25).
2. The EX4200 forwards the background traffic to a single egress GigE port
(ge-0/0/9).
3. At the same time, the blade server uses the ping command to generate
production traffic onto the EX4200 through a different interface (ge-0/0/7).
4. The EX4200 also forwards the production traffic to the same egress port
(ge-0/0/9). From a packet loss policy perspective, the production traffic is low
loss priority, while the background traffic is high.

To verify the status of packets on the ingress and egress ports, we use the show
interfaces queue <ge-0/x/y> command to confirm that only high loss priority
packets from the BACKGROUND forwarding class are being tail dropped.

NOTE The configuration used in this setup was sufficient to achieve confirmation on CoS
functionality (in simplest form). Other detailed configuration options are available
and can be enabled as needed. Refer to the CoS command Hierarchy Levels in the
Junos Software CLI User Guide at www.juniper.net/techpubs/software/junos/
junos95/swref-hierarchy/hierarchy-summary-configuration-statement-class-
of-service.html#hierarchy-summary-configuration-statement-class-of-service.

The following steps summarize the setup configuration process.

1. Configure the setup as illustrated in Figure 7.2 and as shown in the CoS
configuration code snippet.
2. Create some simple flows on the N2X to send traffic from each N2X port to port
ge-0/0/9.
3. Send the traffic at 50% from each port to 11.22.1.9. (If two ports are not
available, one port can be used to send 100% of the traffic.)
4. Configure the DUT to apply CoS-based processing to ingress traffic: traffic from
source 11.22.1.7 arriving on interface ge-0/0/7 is placed in a high forwarding class
with a low probability of being dropped, and traffic arriving on interfaces
ge-0/0/24 and ge-0/0/25 is given a high probability of being dropped.
5. Start the ping from 11.22.1.7 to 11.22.1.9.
6. Tune the line-rate parameter of the N2X traffic sent toward ge-0/0/9.
7. Observe the egress and ingress interface statistics to confirm that the ping
traffic is tagged with the higher forwarding class and is not dropped, while traffic
coming from ports ge-0/0/24 and ge-0/0/25 is dropped on ingress.
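The arithmetic behind this test can be sketched with a small strict-priority model in Python. It assumes that the 50% load is relative to a 1 Gbps port, uses an illustrative 1 Mbps for the ping stream, and ignores buffering and frame overhead; the function name is hypothetical.

```python
def serve_strict_priority(offered_mbps, capacity_mbps):
    """Serve classes in the given order (highest priority first).

    offered_mbps: list of (class name, offered load in Mbps), highest priority first.
    Returns {class name: (forwarded Mbps, dropped Mbps)}.
    """
    remaining = capacity_mbps
    result = {}
    for name, load in offered_mbps:
        forwarded = min(load, remaining)
        result[name] = (forwarded, load - forwarded)
        remaining -= forwarded
    return result


# Two N2X ports at 50% of 1 Gbps each, plus a small ping stream (1 Mbps is an
# illustrative figure, not a measured one), all leaving through ge-0/0/9.
offered = [("CONVERSATIONAL", 1), ("BACKGROUND", 500 + 500)]
print(serve_strict_priority(offered, capacity_mbps=1000))
```

Because the background load alone already fills the egress port, any production traffic pushes the total above capacity, and the priority order decides which class absorbs the loss.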

CoS Configuration Snippet


chandra@EX> show configuration class-of-service
classifiers {
    ieee-802.1 DOTP-CLASSIFIER { //define the type of classifier
        forwarding-class CONVERSATIONAL {
            //Assign Expedited forwarding to CONVERSATIONAL forwarding-class
            loss-priority low code-points ef;
        }
        forwarding-class INTERACTIVE {
            loss-priority low code-points af12;
        }
        forwarding-class STREAMING {
            loss-priority low code-points af11;
        }
        forwarding-class BACKGROUND {
            loss-priority high code-points be;
        }
    }
}
code-point-aliases {
    ieee-802.1 { //associate the code point aliases
        be 000; af12 101; af11 100; be1 001; ef 010;
    }
}
forwarding-classes { //assign the four queues to the forwarding classes
    queue 0 BACKGROUND;
    queue 3 CONVERSATIONAL;
    queue 2 INTERACTIVE;
    queue 1 STREAMING;
}
interfaces {
    ge-0/0/9 {
        //associate the scheduler map, rewrite rules and classifier with the interface
        scheduler-map SCHED-MAP;
        unit 0 {
            classifiers {
                ieee-802.1 DOTP-CLASSIFIER;
            }
            rewrite-rules {
                ieee-802.1 DOTP-RW;
            }
        }
    }
}
rewrite-rules {
    //define the rewrite rules for each of the forwarding classes. Set the code
    //points to be used in each case
    ieee-802.1 DOTP-RW {
        forwarding-class CONVERSATIONAL {
            loss-priority low code-point ef;
        }
        forwarding-class INTERACTIVE {
            loss-priority low code-point af12;
        }
        forwarding-class STREAMING {
            loss-priority low code-point af11;
        }
        forwarding-class BACKGROUND {
            loss-priority high code-point be;
        }
    }
}
scheduler-maps {
    //define the scheduler maps for each forwarding class
    SCHED-MAP {
        forwarding-class BACKGROUND scheduler BACK-SCHED;
        forwarding-class CONVERSATIONAL scheduler CONV-SCHED;
        forwarding-class INTERACTIVE scheduler INTERACT-SCHED;
        forwarding-class STREAMING scheduler STREAMING-SCHED;
    }
}
schedulers {
    //Specify the scheduler properties for each forwarding class. Priorities
    //assigned here define how the scheduler handles the traffic.
    CONV-SCHED {
        transmit-rate remainder;
        buffer-size percent 80;
        priority strict-high;
    }
    INTERACT-SCHED;
    STREAMING-SCHED {
        transmit-rate percent 20;
    }
    BACK-SCHED {
        transmit-rate remainder;
        priority low;
    }
}
chandra@EX> show configuration firewall
family ethernet-switching {
    //Configure a multifield classifier for better granularity. CONVERSATIONAL
    //class gets higher priority than BACKGROUND
    filter HIGH {
        term 1 {
            from {
                source-address {
                    11.22.1.7/32;
                }
            }
            then {
                accept;
                forwarding-class CONVERSATIONAL;
                loss-priority low;
            }
        }
        term 2 {
            then { accept; count all; }
        }
    }
    filter LOW {
        term 1 {
            from {
                source-address {
                    11.22.1.100/32;
                    11.22.1.101/32;
                }
            }
            then {
                accept;
                forwarding-class BACKGROUND;
                loss-priority high;
            }
        }
        term 2 {
            then { accept; count all; }
        }
    }
}
chandra@EX> show configuration interfaces ge-0/0/24
unit 0 {
    family ethernet-switching {
        //Assign the firewall filter to the interface
        port-mode access;
        filter {
            input LOW; output LOW;
        }
    }
}
chandra@EX> show configuration interfaces ge-0/0/25
unit 0 {
    family ethernet-switching {
        port-mode access;
        filter {
            input LOW; output LOW;
        }
    }
}
chandra@EX> show configuration interfaces ge-0/0/7
unit 0 {
    family ethernet-switching {
        port-mode access;
        filter {
            input HIGH; output HIGH;
        }
    }
}
chandra@EX> show configuration interfaces ge-0/0/9
unit 0 {
    family ethernet-switching {
        port-mode access;
    }
}

Latency
Network latency is critical to business. Today, the competitiveness in the global
financial markets is measured in microseconds. High performance computing and
financial trading demand an ultra low-latency network infrastructure. Voice and
video traffic is time-sensitive and typically requires low latency.

Because network latency in a TCP/IP network can be measured at different layers
(Layer 2 or Layer 3) and for different types of traffic (unicast or multicast), the
term often refers to one of the following: Layer 2 unicast, Layer 3 unicast, Layer 2
multicast, or Layer 3 multicast latency.

For Ethernet, latency is often measured at several frame sizes: 64, 128, 256, 512,
1024, 1280, and 1518 bytes.

The simulated traffic throughput is a critical factor in the accuracy of test results.
For a 1 Gbps full-duplex interface, both the transmitting (TX) throughput and the
receiving (RX) throughput of the simulated traffic should reach 1 Gbps, and the ratio
of received to transmitted throughput must be at least 99%.
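To make the throughput requirement concrete, the following Python sketch computes the theoretical line-rate frame rate for each frame size and checks a received-to-transmitted ratio. It assumes standard Ethernet framing overhead (8-byte preamble plus 12-byte inter-frame gap) and expresses the ratio in frames; the function names are hypothetical.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap
FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]


def max_frames_per_second(frame_bytes, link_bps=1_000_000_000):
    """Theoretical maximum frame rate on an Ethernet link at line rate."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def throughput_ok(tx_frames, rx_frames, min_ratio=0.99):
    """True when received/transmitted frames meet the required ratio."""
    return tx_frames > 0 and rx_frames / tx_frames >= min_ratio


for size in FRAME_SIZES:
    print(size, round(max_frames_per_second(size)))
```

Small frames produce far more frames per second than large ones at the same bit rate, which is one reason latency is reported per frame size.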

Measuring network latency often requires sophisticated test appliances, such as
those from Agilent (N2X), Spirent Communications, and IXIA.

NetworkWorld validated Juniper Networks EX4200 performance, including Layer 2
unicast, Layer 3 unicast, Layer 2 multicast, and Layer 3 multicast latency.
For detailed test results, please refer to
www.networkworld.com/reviews/2008/071408-test-juniper-switch.html.

In this section, we discuss the concept of measuring device latency and
demonstrate the sample configuration for measuring Layer 2 and Layer 3 unicast
latency on the MX480.

Measuring Latency
IETF RFC 2544 defines performance test criteria for measuring the latency of a DUT.
As shown in Figure 7.3, the ideal way to test DUT latency is to use a tester
with both transmitting and receiving ports. The tester connects to the DUT with two
connections: the transmitting port of the tester connects to the receiving port of the
DUT, and the sending port of the DUT connects to the receiving port of the tester.
The same setup also applies to measuring the latency of multiple DUTs.

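[Diagram: a tester whose transmitting and receiving ports connect through a single
DUT, and a second arrangement in which they connect through DUT 1 and DUT 2.]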

Figure 7.3 Measuring Latency
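As a rough sketch of what the tester computes in this setup, the Python below derives per-frame latency from transmit and receive timestamps and summarizes it. The timestamps are illustrative values rather than measurements from this handbook, and the function name is hypothetical.

```python
from statistics import mean


def latency_stats(samples):
    """samples: list of (tx_time_us, rx_time_us) pairs for frames that returned.

    Returns (min, average, max) latency in microseconds.
    """
    latencies = [rx - tx for tx, rx in samples]
    return min(latencies), mean(latencies), max(latencies)


# Illustrative timestamps only; not measurements from this handbook.
samples = [(0.0, 12.5), (100.0, 111.0), (200.0, 214.0)]
print(latency_stats(samples))
```

Because the same tester stamps both ends of the path, no clock synchronization between separate devices is needed.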

Figure 7.4 illustrates two latency test scenarios. In one scenario, we measured the
latency of the MX480; in the other, we measured the end-to-end latency of the MX480
and Cisco’s ESM. We used Agilent’s N2X, with transmitting port ge-2/3/1 and
receiving port ge-3/4/4, as the tester.

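[Diagram: N2X port ge-2/3/1 (11.22.1.2) connects to MX480 port ge-5/3/5 (11.22.1.1).
MX480 port ge-5/3/7 (11.22.2.1) connects either directly back to N2X port ge-3/4/4
(11.22.2.2) for the device latency test, or through Port 18 and Port 20 of the Cisco
ESM in the IBM BladeCenter for the end-to-end latency test.]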

Figure 7.4 Latency Setup



In the first test scenario, the N2X and MX480 connections, represented by the
dashed line, are made from the sending port (ge-2/3/1) of the N2X to the receiving
port (ge-5/3/5) of the MX480 and from the sending port (ge-5/3/7) of the
MX480 back to the receiving port (ge-3/4/4) of the tester.

In the second test scenario, the connection among the N2X, MX480 and Cisco’s ESM
(represented by the solid line in Figure 7.4) occurs in the following order:

• Connection from the sending ports of the N2X to the receiving ports of the
MX480
• Connection from the sending port of the MX480 to the receiving port (Port 18)
of Cisco’s ESM
• Connection from the sending port (Port 20) of Cisco’s ESM to the receiving port
of the N2X.

Configuration for Measuring Layer 2 Latency


To measure the Layer 2 latency, all participating ports on the DUTs must be
configured with the same VLAN, that is, the same Layer 2 broadcast domain. Here
is a sample Layer 2 configuration on the MX480:

ge-5/3/5 {
    //Define a VLAN tagged interface and Ethernet-bridge encapsulation
    vlan-tagging;
    encapsulation ethernet-bridge;
}

unit 1122 {
    //Define a logical unit, vlan-id and a vlan-bridge type encapsulation
    encapsulation vlan-bridge;
    vlan-id 1122;
    ge-0/0/35.0;
}

bc-ext {
    //Define a bridge domain and assign VLAN id and interface.
    domain-type bridge;
    vlan-id 1122;
    interface ge-5/3/5.1122;
    interface ge-5/3/7.1122;
}

Configuration for Measuring Layer 3 Latency


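To measure the Layer 3 latency, each participating port on the DUT must be
configured with an IP address in the same subnet as the tester port it connects to,
so that the DUT routes the test traffic between the two subnets.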

Configuring the MX480


ge-5/3/5 {
    unit 0 {
        family inet {
            address 11.22.1.1/24;
        }
    }
}

ge-5/3/7 {
    unit 0 {
        family inet {
            address 11.22.2.1/24;
        }
    }
}

Chapter 8

Configuring High Availability

Routing Engine Redundancy
Graceful Routing Engine Switchover
Virtual Chassis
Nonstop Active Routing
Nonstop Bridging
Graceful Restart
In-Service Software Upgrade
Virtual Router Redundancy Protocol
Link Aggregation
Redundant Trunk Group

IMPLEMENTING HIGH AVAILABILITY (HA) is critical when designing a network.
Operators can implement high availability using one or more of the several methods
described in Chapter 3: Implementation Overview.

This chapter covers the following software-based high availability features that
operators can enable in the data center:

• Routing Engine Redundancy
• Graceful Routing Engine Switchover (GRES)
• Virtual Chassis
• Nonstop Active Routing (NSR)
• Nonstop Bridging (NSB)
• Graceful Restart (GR)
• In-Service Software Upgrade (ISSU)
• Virtual Router Redundancy Protocol (VRRP)
• Link Aggregation (LAG)
• Redundant Trunk Group (RTG)
Enabling either one or a combination of the features listed increases the reliability of
the network.

This chapter first introduces Junos OS based features such as Routing Engine
redundancy, GRES, GR, NSR, NSB and ISSU that are critical to implementing high
availability in the data center. Reliability features such as VRRP, RTG and LAG are
implemented over these key high availability elements.

Routing Engine Redundancy


Routing Engine redundancy occurs when two physical Routing Engines reside on the
same device. One of the Routing Engines functions as the primary, while the other
serves as a backup. When the primary Routing Engine fails, the backup Routing
Engine automatically becomes the primary Routing Engine, thus increasing the
availability of the device. (Routing Engine Redundancy with respect to the scope of
this handbook is available only on the MX Series and EX8200 platforms.)

Any one of the following failures can trigger a switchover from the primary to the
backup Routing Engine:

• Hardware failure – This can be a hard disk error or a loss of power on the primary
Routing Engine.
• Software failure – This can be a kernel crash or a CPU lock. These failures cause
a loss of keepalives from the primary to the backup Routing Engine.
• Software process failure – Specific software processes that fail at least four
times within the span of 30 seconds on the primary Routing Engine.
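The software process failure trigger is a count over a time window, which can be sketched in Python as follows. The class name and the exact boundary handling of the 30-second window are assumptions of this sketch; it illustrates the rule stated above and is not the Junos implementation.

```python
from collections import deque


class ProcessFailureMonitor:
    """Flags a switchover when a process fails `threshold` times within `window_s` seconds."""

    def __init__(self, threshold=4, window_s=30.0):
        self.threshold = threshold
        self.window_s = window_s
        self.failures = deque()

    def record_failure(self, now):
        """Record a failure at time `now` (seconds); return True if a switchover is due."""
        self.failures.append(now)
        # Discard failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        return len(self.failures) >= self.threshold


monitor = ProcessFailureMonitor()
print([monitor.record_failure(t) for t in (0, 10, 20, 29)])
```

Failures that are spread further apart than the window never accumulate, so an occasional process restart does not trigger a switchover.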

NOTE To revert to the original primary Routing Engine after it recovers from a
failure, operators must perform a manual switchover.

Configuration Hierarchy for Routing Engine Redundancy


The following redundancy statements define the Routing Engine roles and the
failover mechanism and are available at the [edit chassis] hierarchy level:

redundancy {
    graceful-switchover;
    keepalive-time seconds;
    routing-engine slot-number (master | backup | disabled);
}

1. Configure automatic failover from the active to the backup Routing Engine, without
any interruption to packet forwarding, at the [edit chassis redundancy] hierarchy
level. The triggers are either the detection of a hard disk error or a loss of
keepalives from the primary Routing Engine:
[edit chassis redundancy]
failover on-disk-failure;
failover on-loss-of-keepalives;

2. Specify the threshold time interval for loss of keepalives after which the backup
Routing Engine takes over from the primary Routing Engine. By default, the failover
occurs after 300 seconds when graceful Routing Engine switchover is not configured.
[edit chassis redundancy]
keepalive-time seconds;

3. Configure automatic switchover to the backup Routing Engine following a
software process failure by including the failover other-routing-engine
statement at the [edit system processes process-name] hierarchy level:
[edit system processes]
<process-name> failover other-routing-engine;

4. The Routing Engine mastership can be manually switched using the following
CLI commands:
request chassis routing-engine master acquire   (on the backup Routing Engine)
request chassis routing-engine master release   (on the primary Routing Engine)
request chassis routing-engine master switch    (on either the primary or the backup
                                                 Routing Engine)

Graceful Routing Engine Switchover


Junos OS provides a separation between the control and forwarding planes. Graceful
Routing Engine switchover (GRES) leverages this separation to provide a switchover
between the Routing Engines without disrupting traffic flow. Configuring graceful
Routing Engine switchover on a router enables the interface information and
kernel state to be synchronized on both Routing Engines. As a result, the same
routing and forwarding states are preserved on both Routing Engines. Any routing
changes occurring on the primary Routing Engine are replicated in the kernel of the
backup Routing Engine. Although graceful Routing Engine switchover synchronizes
the kernel state, it does not preserve the control plane.

It is important to note that graceful Routing Engine switchover only offers Routing
Engine redundancy, not router level redundancy. Traffic flows through the router
for a short interval during the Routing Engine switchover. However, the traffic is
dropped as soon as any of the routing protocol timers expire and the neighbor
relationship with the upstream router ends. To avoid this situation, operators must
apply graceful Routing Engine switchover in conjunction with Graceful Restart (GR)
protocol extensions.

NOTE Although graceful Routing Engine switchover is available on many other platforms,
with respect to the scope of this handbook, graceful Routing Engine switchover is
available only on the MX Series and EX8200 platforms.

Figure 8.1 shows a primary and backup Routing Engine exchanging keepalive
messages.

[Diagram: the master Routing Engine and the backup Routing Engine exchange
keepalive messages.]

Figure 8.1 Primary and Backup Routing Engines

For details concerning GR, see the Graceful Restart section on page 126.

Configuring Graceful Routing Engine Switchover


1. Graceful Routing Engine switchover can be configured at the [edit chassis
redundancy] hierarchy level:
[edit chassis redundancy]
graceful-switchover;

2. The operational show system switchover command can be used to check the
graceful Routing Engine switchover status on the backup Routing Engine:
{backup}
chandra@HE-Routing Engine-1-MX480-194> show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
state: Steady State

Virtual Chassis
Routing Engines are built into the EX Series chassis. In this case, Routing Engine
redundancy can be achieved by connecting and configuring two (or up to ten)
EX switches as a part of a virtual chassis. This virtual chassis operates as a single
network entity and consists of designated primary and backup switches. Routing
Engines on each of these two switches then become the master and backup
Routing Engines of the virtual chassis, respectively. The rest of the switches of
the virtual chassis assume the role of line cards. The master Routing Engine on
the primary switch manages all the other switches that are members of the
virtual chassis and has full control of the configuration and processes. It receives
and transmits routing information, builds and maintains routing tables, and
communicates with interfaces and the forwarding components of the member
switches.

The backup switch acts as the backup Routing Engine of the virtual chassis and
takes over as the master when the primary Routing Engine fails. The virtual chassis
uses GRES and NSR to recover from control plane failures. Operators can physically
connect individual chassis using either virtual chassis extension cables or 10G/1G
Ethernet links.

Using graceful Routing Engine switchover on a virtual chassis enables the interface
and kernel states to be synchronized between the primary and backup Routing
Engines. This allows the switchover between primary and backup Routing Engine
to occur with minimal disruption to traffic. The graceful Routing Engine switchover
behavior on the virtual chassis is similar to the description in the Graceful Routing
Engine Switchover section on page 121.

When graceful Routing Engine switchover is not enabled and a Routing Engine failover
occurs, the line card switches of the virtual chassis reinitialize to the boot-up
state before connecting to the backup switch that takes over as the master. Enabling
graceful Routing Engine switchover eliminates the need for the line card switches to
reinitialize their state. Instead, they resynchronize their state with the new master
Routing Engine, thus ensuring minimal disruption to traffic.

Some of the resiliency features of a virtual chassis include the following:

• An all-or-nothing software upgrade that either succeeds on all of the switches
belonging to the virtual chassis or on none of them.
• A virtual chassis fast failover, a hardware mechanism that automatically
reroutes traffic and reduces traffic loss when a link failure occurs.
• A virtual chassis split and merge that causes the virtual chassis configuration
to split into two separate virtual chassis when member switches fail or are
removed.
Figure 8.2 shows a virtual chassis that consists of three EX4200 switches: EX-6,
EX-7 and EX-8. A virtual chassis cable connects the switches to each other, ensuring
that the failure of one link does not cause a virtual chassis split.
[Diagram: three interconnected EX4200 switches: EX-6 (line card), EX-7 (backup),
and EX-8 (primary).]

Figure 8.2 Virtual Chassis Example Consisting of Three EX4200s



Virtual Chassis Configuration Snippet


// Define members of a virtual chassis.
virtual-chassis {
    member 1 {
        mastership-priority 130;
    }
    member 2 {
        mastership-priority 130;
    }
}
// Define a management interface and address for the VC.
interfaces {
    vme {
        unit 0 {
            family inet {
                address 172.28.113.236/24;
            }
        }
    }
}

The show virtual-chassis CLI command provides a status of a virtual chassis that
has a master and backup switch and line card. There are three EX4200 switches
connected and configured to form a virtual chassis. Each switch has a member ID
and sees the other two switches as its neighbors when the virtual chassis is fully
functioning. The master and backup switches are assigned the same priority (130)
to ensure a non-revertive behavior after the master recovers.

show virtual-chassis
Virtual Chassis ID: 555c.afba.0405
                                             Mastership            Neighbor List
Member ID  Status  Serial No     Model        priority   Role      ID  Interface
0 (FPC 0)  Prsnt   BQ0208376936  ex4200-48p        128   Linecard   1  vcp-0
                                                                    2  vcp-1
1 (FPC 1)  Prsnt   BQ0208376979  ex4200-48p        130   Backup     2  vcp-0
                                                                    0  vcp-1
2 (FPC 2)  Prsnt   BQ0208376919  ex4200-48p        130   Master*    0  vcp-0
                                                                    1  vcp-1
Member ID for next new member: 0 (FPC 0)
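The non-revertive behavior that results from equal mastership priorities can be sketched in Python. This is a simplified model: the function name is hypothetical, and the lowest-member-ID rule stands in for the tie-breaking rules that Junos applies, which this sketch does not reproduce.

```python
def elect_master(members, current_master=None):
    """members: {member_id: mastership_priority} for the members that are up.

    The highest priority wins. On a tie the current master keeps the role;
    otherwise the lowest member ID is used as a stand-in tie-breaker
    (an assumption of this sketch).
    """
    top = max(members.values())
    candidates = sorted(m for m, prio in members.items() if prio == top)
    if current_master in candidates:
        return current_master
    return candidates[0]


priorities = {0: 128, 1: 130, 2: 130}
master = elect_master(priorities, current_master=2)        # member 2 is master
master = elect_master({0: 128, 1: 130})                    # member 2 fails
master = elect_master(priorities, current_master=master)   # member 2 returns
print(master)
```

If member 2 had a higher priority than member 1, it would reclaim mastership on recovery and cause a second switchover, which equal priorities avoid.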

Use the following operational CLI commands to define the 10G/1G Ethernet ports that
are used only for virtual chassis inter-member connectivity.

request virtual-chassis vc-port set pic-slot 1 port 0
request virtual-chassis vc-port set pic-slot 1 port 1

Nonstop Active Routing


Nonstop Active Routing (NSR) preserves kernel and interface information in
a manner similar to graceful Routing Engine switchover. However, compared
to graceful Routing Engine switchover, NSR goes a step further and saves the
routing protocol information on the backup Routing Engine. It also preserves the
protocol connection information in the kernel. Any switchover between the Routing
Engines is dynamic, is transparent to the peers, and occurs without any disruption
to protocol peering. For these reasons, NSR is beneficial in cases where the peer
routers do not support graceful restart.

Juniper Networks recommends enabling NSR in conjunction with graceful Routing
Engine switchover because this maintains the forwarding plane information during
the switchover.

State information for a protocol that is not supported by NSR is the primary Routing
Engine. State information must be refreshed using the normal recovery mechanism
inherent to the protocol.

• Automatic route distinguishers for multicast can be enabled simultaneously
with NSR.
• It is not necessary to start the primary and backup Routing Engines at the same
time.
• A backup Routing Engine that is activated at any time automatically synchronizes
with the primary Routing Engine.

For further details, refer to the Junos High Availability Guide for the latest Junos
software version at
www.juniper.net/techpubs/en_US/junos10.1/information-products/topic-collections/swconfig-high-availability/noframes-collapsedTOC.html.

Configuring Nonstop Active Routing


1. Enable graceful Routing Engine switchover under the chassis stanza.
[edit chassis redundancy]
graceful-switchover;

2. Enable nonstop active routing under the routing-options stanza.
[edit routing-options]
nonstop-routing;

3. When NSR is enabled, configuration changes must be synchronized on both
Routing Engines.
[edit system]
commit synchronize;

4. To force a switchover to the backup Routing Engine when the routing
protocol process (rpd) fails three times consecutively, in rapid intervals,
include the following statement.
[edit system processes]
routing failover other-routing-engine;

5. Use the following command so that configuration changes are synchronized
between the Routing Engines.
[edit system]
commit synchronize

6. Use the following operational command to verify that NSR is enabled and
active.
show task replication

Nonstop Bridging
Nonstop Bridging (NSB) enables a switchover between the primary and backup
Routing Engines without losing Layer 2 Control Protocol (L2CP) information. NSB is
similar to NSR in that it preserves interface and kernel information. The difference is
that NSB saves the Layer 2 control information by running a Layer 2 Control Protocol
process (l2cpd) on the backup Routing Engine. For NSB to function, operators must
enable Graceful Routing Engine switchover.

The following Layer 2 control protocols support NSB:

• Spanning Tree Protocol (STP)


• Rapid STP (RSTP)
• Multiple STP (MSTP)

Configuring Nonstop Bridging


1. Enable graceful Routing Engine switchover under the chassis stanza.
[edit chassis redundancy]
graceful-switchover;

2. Explicitly enable NSB.
[edit protocols layer2-control]
nonstop-bridging;

3. Ensure synchronization between the Routing Engines whenever the configuration
is changed.
[edit system]
commit synchronize

NOTE It is not necessary to start the primary and backup Routing Engines at the same
time. When NSB is enabled, a backup Routing Engine that is brought online at any
time automatically synchronizes with the primary Routing Engine.

Graceful Restart
A service disruption forces the routing protocols on a router to recalculate peering
relationships, protocol-specific information, and routing databases. Disruptions due
to an unprotected restart of a router can cause route flapping, longer protocol
reconvergence times, or forwarding delays, ultimately resulting in dropped packets.
Graceful Restart (GR) alleviates this situation by acting as an extension to
the routing protocols.

A router with GR extensions can operate in either a “restarting” or a “helper” role.
These extensions provide the neighboring routers with the status of a router when a
failure occurs. When a failure occurs on a router, the GR extensions signal the
neighboring routers that a restart is occurring. This prevents the neighbors
from sending out network updates to the router for the duration of the graceful
restart wait interval. A router with GR enabled must negotiate the GR support with
its neighbors at the start of a routing session. The primary advantages of GR are
uninterrupted packet forwarding and temporary suppression of all routing
protocol updates.

NOTE A helper router undergoing Routing Engine switchover drops the GR wait state
that it may be in and propagates the adjacency’s state change to the network. GR
support is available for routing/MPLS related protocols and Layer 2 or Layer 3 VPNs.

MORE See Table-B.3 in Appendix B of this handbook for a list of GR protocols supported on
the MX and EX Series platforms.

Configuring Graceful Restart


1. Enable GR either at global or at specific protocol levels. When configuring on a
global level, operators must use the routing-options hierarchy. The restart
duration specifies the duration of the GR period.

NOTE The GR helper mode is enabled by default even when GR itself is not enabled.
If necessary, the GR helper mode can be disabled on a per-protocol basis. If GR is
enabled globally, it can still be disabled for individual protocols where required.

[edit routing-options]
graceful-restart {
    restart-duration seconds;
}

2. GR can be enabled for static routes under the routing-options hierarchy.
[edit routing-options]
graceful-restart;

In-Service Software Upgrade


In-service software upgrade (ISSU) facilitates software upgrades of Juniper devices
in environments where there is a high concentration of users and business critical
applications. Operators can use ISSU to upgrade the software from one JUNOS
release to another without any disruption to the control plane. Any disruption to
traffic during the upgrade is minimal.

ISSU runs only on platforms that support dual Routing Engines and requires that
graceful Routing Engine switchover and NSR be enabled. Graceful Routing Engine
switchover is required because a switch from the primary to the backup Routing
Engine must happen without any packet forwarding loss. NSR, together with graceful
Routing Engine switchover, maintains routing protocol and control information
during the switchover between the Routing Engines.

NOTE Similar to regular upgrades, Telnet sessions, SNMP, and CLI access can be
interrupted briefly when ISSU is being performed.

If BFD is enabled, the detection and transmission session timers increase
temporarily during the ISSU activity. The timers revert to their original values once
the ISSU activity is complete.

When attempting to perform an ISSU, the following conditions must be met:

• The primary and backup Routing Engines must be running the same software
version.
• The status of the PICs cannot be changed during the ISSU process. For
example, the PICs cannot be brought online/offline.
• The network must be in a steady, stable state.

An ISSU can be performed in one of the following ways:

• Upgrading and rebooting both Routing Engines automatically – Both Routing
Engines are upgraded to the newer version of software and then rebooted
automatically.
• Upgrading both Routing Engines and then manually rebooting the new backup
Routing Engine – The original backup Routing Engine is rebooted first after the
upgrade to become the new primary Routing Engine. Following this, the original
primary Routing Engine must be rebooted manually for the new software to
take effect. The original primary Routing Engine then becomes the backup
Routing Engine.
• Upgrading and rebooting only one Routing Engine – In this case, the original
backup Routing Engine is upgraded and rebooted and becomes the new
primary Routing Engine. The former primary Routing Engine must be upgraded
and rebooted manually.

MORE For more details when performing an ISSU using the above-listed methods, see
Appendix A of this handbook.

Verifying Conditions and Tasks Prior to ISSU Operation


1. Verify that the primary and backup Routing Engines are running the same
software version using the show version invoke-on all-routing-engines CLI
command:
{master}
chandra@MX480-131-0> show version invoke-on all-routing-engines
re0:
--------------------------------------------------------------------------
Hostname: MX480-131-0
Model: mx480
JUNOS Base OS boot [10.0R1.8]
JUNOS Base OS Software Suite [10.0R1.8]
JUNOS Kernel Software Suite [10.0R1.8]
JUNOS Crypto Software Suite [10.0R1.8]
JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R1.8]
JUNOS Packet Forwarding Engine Support (MX Common) [10.0R1.8]
JUNOS Online Documentation [10.0R1.8]
JUNOS Voice Services Container package [10.0R1.8]
JUNOS Border Gateway Function package [10.0R1.8]
JUNOS Services AACL Container package [10.0R1.8]
JUNOS Services LL-PDF Container package [10.0R1.8]
JUNOS Services Stateful Firewall [10.0R1.8]
JUNOS AppId Services [10.0R1.8]
JUNOS IDP Services [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]
re1:
--------------------------------------------------------------------------
Hostname: MX480-131-1
Model: mx480
JUNOS Base OS boot [10.0R1.8]
JUNOS Base OS Software Suite [10.0R1.8]
JUNOS Kernel Software Suite [10.0R1.8]
JUNOS Crypto Software Suite [10.0R1.8]
JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R1.8]
JUNOS Packet Forwarding Engine Support (MX Common) [10.0R1.8]
JUNOS Online Documentation [10.0R1.8]
JUNOS Voice Services Container package [10.0R1.8]
JUNOS Border Gateway Function package [10.0R1.8]
JUNOS Services AACL Container package [10.0R1.8]
JUNOS Services LL-PDF Container package [10.0R1.8]
JUNOS Services Stateful Firewall [10.0R1.8]
JUNOS AppId Services [10.0R1.8]
JUNOS IDP Services [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]

2. Verify that graceful Routing Engine switchover and NSR are enabled using the
show system switchover and show task replication commands.

3. BFD timer negotiation can be disabled explicitly during the ISSU activity using
the [edit protocols bfd] hierarchy:
[edit protocols bfd]
no-issu-timer-negotiation;

4. Perform a software backup on each Routing Engine using the request system
snapshot CLI command:
{master}
chandra@MX480-131-0> request system snapshot
Verifying compatibility of destination media partitions...
Running newfs (899MB) on hard-disk media / partition (ad2s1a)...
Running newfs (99MB) on hard-disk media /config partition (ad2s1e)...
Copying ‘/dev/ad0s1a’ to ‘/dev/ad2s1a’ .. (this may take a few minutes)
Copying ‘/dev/ad0s1e’ to ‘/dev/ad2s1e’ .. (this may take a few minutes)
The following filesystems were archived: / /config
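
The version comparison in step 1 can also be scripted once the command output has been captured as text. The following Python sketch is illustrative only and is not part of the validated procedure: the function names versions_by_re and same_version are hypothetical, and the parser assumes the output layout shown in step 1 (an "reN:" line followed by package lines that end in a bracketed version).

```python
import re


def versions_by_re(output):
    """Map each Routing Engine label (re0, re1, ...) to the set of
    bracketed version strings found in its section of the output."""
    versions = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(re\d+):", line)
        if header:
            current = header.group(1)
            versions[current] = set()
            continue
        match = re.search(r"\[([^\]]+)\]\s*$", line)
        if match and current is not None:
            versions[current].add(match.group(1))
    return versions


def same_version(output):
    """True when at least two Routing Engines are listed, each reports
    exactly one version, and that version is the same on all of them."""
    per_re = versions_by_re(output)
    if len(per_re) < 2:
        return False
    distinct = set()
    for found in per_re.values():
        if len(found) != 1:
            return False
        distinct |= found
    return len(distinct) == 1


# Shortened sample in the same layout as the output in step 1.
SAMPLE = """re0:
Hostname: MX480-131-0
JUNOS Base OS boot [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]
re1:
Hostname: MX480-131-1
JUNOS Base OS boot [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]
"""

print(same_version(SAMPLE))
```

A result of False means that the ISSU precondition on matching software versions is not met, or that the output could not be parsed in the assumed layout.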

Verifying a Unified ISSU


Execute the show chassis in-service-upgrade command on the primary Routing
Engine to verify the status of FPCs and their corresponding PICs after the most
recent ISSU activity.

Virtual Router Redundancy Protocol


Virtual Router Redundancy Protocol (VRRP) is a protocol that runs on routing
devices that are connected to the same broadcast domain. VRRP configuration
assigns these devices to a group. The grouping eliminates the possibility of a single
point of failure and thus provides high availability of network connectivity to the
hosts on the broadcast domain. Routers participating in VRRP share a virtual IP
address and a virtual MAC address. The shared virtual IP address is the default
gateway configured on the hosts. For example, hosts on a broadcast domain
can use a single default route to reach multiple redundant routers belonging to the
VRRP group on that broadcast domain.

One of the routers is elected dynamically as a default primary of the group and
is active at a given time. All the other participating routing devices perform a
backup role. Operators can assign priorities to devices manually, forcing them
to act as primary and backup devices. The VRRP primary sends out multicast
advertisements to the backup devices at regular intervals (default interval is
1 second). When the backup devices do not receive an advertisement for a
configured period, the device with the next highest priority becomes the new
primary. This occurs dynamically, thus enabling an automatic transition with
minimal traffic loss. This VRRP action eliminates the dependence on achieving
connectivity using a single routing platform that can result in a single point of
failure. In addition, the change between the primary and backup roles occurs with
minimum VRRP messaging and no intervention on the host side.
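
The election and takeover behavior described above can be sketched in a few lines of Python. This is a simplified model, not Junos code: the function names and the router names and priorities are hypothetical, tie-breaking between equal priorities is not modeled, and the timer formula is the one defined in RFC 3768 (Master_Down_Interval = 3 x Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256).

```python
def master_down_interval(priority, advertise_interval=1.0):
    """Seconds a backup waits without hearing an advertisement before it
    takes over, per RFC 3768."""
    skew_time = (256 - priority) / 256
    return 3 * advertise_interval + skew_time


def elect_primary(priorities, failed=()):
    """Return the name of the reachable router with the highest priority,
    or None when no router in the group is reachable."""
    candidates = {
        name: priority
        for name, priority in priorities.items()
        if name not in failed
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical two-router group.
group = {"router-a": 200, "router-b": 100}

print(elect_primary(group))                       # router-a
print(elect_primary(group, failed={"router-a"}))  # router-b
print(master_down_interval(100))                  # 3.609375
```

Because the skew time shrinks as the priority grows, the backup with the highest priority times out first, which is why it becomes the new primary.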

Figure 8.3 shows a set of hosts connected to three EX switches: EX4200-0,
EX8200-1 and EX8200-2 on the same broadcast domain. EX4200-0 is configured
as a Layer 2 switch only, without any routing functionality. EX8200-1 and EX8200-2
are configured to have their respective IP addresses on the broadcast domain and
are configured to be VRRP members with a virtual address of 172.1.1.10/16.
EX8200-1 is set to be the primary, while EX8200-2 is the backup. The default
gateway on each of the hosts is set to be the virtual address.

Traffic from the hosts is sent to hosts on other networks through EX8200-1 because
it is the primary. When the hosts lose connectivity to EX8200-1 either due to a node
or link failure, EX8200-2 becomes the primary. The hosts start sending the traffic
through EX8200-2. This is possible because the hosts forward the traffic to the
gateway that owns virtual IP address 172.1.1.10, and IP packets are encapsulated in
Ethernet frames destined to a virtual MAC address.
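
The virtual MAC address mentioned above is derived from the VRRP group identifier. The sketch below assumes the IPv4 format defined in RFC 3768 (00-00-5E-00-01-{VRID}); the function name is hypothetical, and the range check follows the RFC range of 1 to 255.

```python
def vrrp_virtual_mac(group_id):
    """Return the IPv4 VRRP virtual MAC address for a group identifier,
    using the RFC 3768 format 00-00-5E-00-01-{VRID}."""
    if not 1 <= group_id <= 255:
        raise ValueError("VRRP group ID must be in the range 1-255")
    return "00:00:5e:00:01:{:02x}".format(group_id)


print(vrrp_virtual_mac(1))  # 00:00:5e:00:01:01
```

Because the address depends only on the group identifier, the hosts keep using the same destination MAC address when the primary role moves from one router to the other.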

Junos provides a solution that prevents re-learning of ARP information on the
backup router when the primary router fails. This solution increases performance
when a large number of hosts exist on the LAN.

[Diagram: hosts attach to EX4200-0, which connects to EX8200-1 and EX8200-2.
Virtual address: 172.1.1.10/16. Default gateway on each host is set to 172.1.1.10.]

Figure 8.3 VRRP

MORE For VRRP configuration details, refer to the Junos High Availability Guide at
www.juniper.net/techpubs/software/junos/junos90/swconfig-high-availability/high-availability-overview.html.

VRRP Configuration Diagram


Figure 8.4 shows a sample VRRP network scenario. In this scenario, two EX4200
devices (EX4200-A and EX4200-B) are configured as part of a VRRP group.

NOTE Although this VRRP sample scenario uses EX4200 devices, it is possible to
configure other combinations of VRRP groups consisting of devices such as:

• EX8200 – EX4200
• EX8200 – MX480
• MX480 – MX480
• EX8200 – EX8200

Figure 8.4 shows devices EX8200-A and EX8200-B, MX480-A and MX480-B to
illustrate the choices of different platforms when configuring VRRP in the network.

[Diagram. Labels recoverable from the original figure:
VRRP configuration options: MX480-A/MX480-B, MX480-A/EX4200-B,
MX480-A/EX8200-B, EX8200-A/EX8200-B, EX8200-A/EX4200-B.
MX480: ge-5/3/5 (11.22.5.1/24), ge-5/3/9 (11.22.3.1/24), ge-5/3/7 (11.22.2.1/24).
EX4200-A: ge-0/0/16 (11.22.3.16/24), ge-0/0/11 (11.22.1.11/24).
EX4200-B: ge-0/0/36 (11.22.2.36/24), ge-0/0/31 (11.22.1.31/24).
Virtual Router Group IP address: 11.22.1.1.
Paths: Primary Path; Backup Path (when VRRP's primary interface fails).
Blade chassis modules: Cisco ESM 1 (Trunk Port 18, Trunk Port 19), BNT,
Pass-Through, MM1, MM2, Eth0, Eth1, SoL.
IBM Blade Center connected via Cisco ESM; IBM Power5/Power6 and IBM x3500 servers.]

Figure 8.4 VRRP Test Network

The virtual address assigned to the EX4200 group discussed here is 11.22.1.1. The
two devices and the IBM Blade servers physically connect on the same broadcast
domain. EX4200-A is elected as the primary and so the path between the servers
to EX4200-A through the Cisco ESM is the primary preferred path. The link between
the Cisco ESM and EX4200-B is the backup path.

NOTE Cisco’s ESM included in the IBM Blade Center is a Layer 2 switch that does not
support VRRP, but it serves as an access layer switch connected to routers
that use VRRP. Other switch modules for the IBM Blade Center support Layer 3
functionality but are outside the scope of this book.

Configuring VRRP
To configure VRRP on the sample network, perform the following steps:

1. Create two trunk ports on Cisco’s ESM. Assign an internal eth0 port on Blade[x]
to the same network as VRRP, for example 11.22.1.x.
2. Add a router with a Layer 3 address that is reachable from the 11.22.1.x network
on the blade center. In this case, the MX480 acts as a Layer 3 router that
connects to both EX4200-A and EX4200-B through the 11.22.3.x and 11.22.2.x
networks, respectively.
3. This Layer 3 MX480 router also terminates the 11.22.5.x network via interface
ge-5/3/5 with family inet address 11.22.5.1.
4. Verify that this address is reachable from the blade server by configuring the
default gateway to be either 11.22.1.11 (ge-0/0/11) or 11.22.1.31 (ge-0/0/31).
5. Configure VRRP between the two interfaces ge-0/0/11 (EX4200-A) and
ge-0/0/31 (EX4200-B). The virtual address of the VRRP group is
11.22.1.1, with ge-0/0/11 on EX4200-A set to have a higher priority.

Verify operation on the sample network by performing the following steps:

1. Reconfigure the default route on 11.22.1.60 (blade server) to 11.22.1.1 (the VRRP
virtual address).
2. Confirm that 11.22.5.1 is reachable from 11.22.1.60 and vice versa. Perform a
traceroute to ensure that the next hop is 11.22.1.11 on EX4200-A.
3. Either lower the priority on EX4200-A or administratively disable the interface
ge-0/0/11 to simulate an outage of EX4200-A.
4. Confirm that pings from 11.22.1.60 to 11.22.5.1 are still working but use the
backup path to EX4200-B.
5. Perform a traceroute to confirm that the backup path is being used.

NOTE The traceroute command can be used for confirmation in both directions – to and
from the BladeCenter.
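
Before the configuration is committed, the address plan in the steps above can be sanity-checked offline. The following Python sketch is an illustration only: the function name check_vrrp_addressing is hypothetical, and the check is limited to confirming that the virtual address lies in the subnet of every member interface.

```python
import ipaddress


def check_vrrp_addressing(virtual_address, member_interfaces):
    """Return a list of problems; an empty list means the virtual address
    falls inside the subnet of every member interface."""
    vip = ipaddress.ip_address(virtual_address)
    problems = []
    for name, cidr in member_interfaces.items():
        network = ipaddress.ip_interface(cidr).network
        if vip not in network:
            problems.append(f"{name}: {vip} is not in {network}")
    return problems


# Interface addresses taken from the sample network.
members = {
    "EX4200-A ge-0/0/11": "11.22.1.11/24",
    "EX4200-B ge-0/0/31": "11.22.1.31/24",
}

print(check_vrrp_addressing("11.22.1.1", members))
print(check_vrrp_addressing("11.22.5.1", members))
```

The first call returns an empty list for the sample addressing; the second reports both interfaces because 11.22.5.1 belongs to the network behind the MX480.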

VRRP Configuration Snippet


The VRRP configuration snippet shows the minimum configuration required on the
EX Series to enable a VRRP group.

// Configure the interface ge-0/0/31 on EX4200-B with an IP address of
// 11.22.1.31/24 on the logical unit 0.
// Define a VRRP group with a virtual IP of 11.22.1.1 and priority of 243.
show configuration interfaces ge-0/0/31
unit 0 {
    family inet {
        address 11.22.1.31/24 {
            vrrp-group 1 {
                virtual-address 11.22.1.1;
                priority 243;
                preempt {
                    hold-time 0;
                }
                accept-data;
            }
        }
    }
}

// Interface ge-0/0/36 to MX480 with an IP of 11.22.2.36/24
show configuration interfaces ge-0/0/36
unit 0 {
    family inet {
        address 11.22.2.36/24;
    }
}

// Configure the interface ge-0/0/11 on EX4200-A with an IP address of
// 11.22.1.11/24 on the logical unit 0.
// Define a VRRP group with a virtual IP of 11.22.1.1 and priority of 240.
show configuration interfaces ge-0/0/11
unit 0 {
    family inet {
        address 11.22.1.11/24 {
            vrrp-group 1 {
                virtual-address 11.22.1.1;
                priority 240;
                preempt {
                    hold-time 0;
                }
                accept-data;
            }
        }
    }
}

VRRP Configuration Hierarchy for IPv4

This section shows that VRRP statements can be included at the interface
hierarchy level.

[edit interfaces interface-name unit <unit-number> family inet address address]
vrrp-group group-id {
    (accept-data | no-accept-data);
    advertise-interval seconds;
    authentication-key key;
    authentication-type authentication;
    fast-interval milliseconds;
    (preempt | no-preempt) {
        hold-time seconds;
    }
    priority number;
    track {
        interface interface-name {
            priority-cost priority;
            bandwidth-threshold bits-per-second {
                priority-cost priority;
            }
        }
        priority-hold-time seconds;
        route prefix routing-instance instance-name {
            priority-cost priority;
        }
    }
    virtual-address [ addresses ];
}

Configuring VRRP for IPv6 (MX Series Platform Only)


As mentioned earlier, operators can configure VRRP for IPv6 on the MX platform. To
configure VRRP for IPv6, include the following statements at this hierarchy level:

[edit interfaces interface-name unit <unit-number> family inet6 address address]
vrrp-inet6-group group-id {
    (accept-data | no-accept-data);
    fast-interval milliseconds;
    inet6-advertise-interval seconds;
    (preempt | no-preempt) {
        hold-time seconds;
    }
    priority number;
    track {
        interface interface-name {
            priority-cost priority;
            bandwidth-threshold bits-per-second {
                priority-cost priority;
            }
        }
        priority-hold-time seconds;
        route prefix routing-instance instance-name {
            priority-cost priority;
        }
    }
    virtual-inet6-address [ addresses ];
    virtual-link-local-address ipv6-address;
}

Link Aggregation
Link Aggregation (LAG) is a feature that aggregates two or more physical Ethernet
links into one logical link to obtain higher bandwidth and to provide redundancy.
The added capacity and link redundancy improve both performance and
availability.

Traffic is balanced across all links that are members of an aggregated bundle. The
failure of a member link does not cause traffic disruption. Instead, because there are
multiple member links, traffic continues over active links.

LAG is defined by the IEEE 802.3ad standard and can be used in conjunction with
the Link Aggregation Control Protocol (LACP). Using LACP, multiple physical ports
can be bundled together to form a logical channel. Enabling LACP on two peers that
participate in a LAG group enables them to exchange LACP packets and negotiate
the automatic bundling of links.

NOTE LAG can be enabled on interfaces spread across multiple chassis; this is known as
Multichassis LAG (MC-LAG). This means that the member links of a bundle can be
configured between multiple chassis instead of only two chassis.

Currently, MC-LAG is supported only on the MX Series platforms.



Some points to note with respect to LAG:

• Ethernet links between two points support LAG.
• A maximum of 16 Ethernet interfaces can be included within a LAG on the MX
Series Platforms. The LAG can consist of interfaces that reside on different
Flexible PIC Concentrator (FPC) cards in the same MX chassis. However, these
interface links must be of the same type.
• The EX Series Platforms support a maximum of 8 Ethernet interfaces in a LAG.
In the case of an EX4200-based virtual chassis, the interfaces that belong to a LAG
can be on different switch members of the virtual chassis.

Link Aggregation Configuration Diagram


Figure 8.5 shows a sample link aggregation and load balancing setup. In this
configuration, LAG is enabled on the interfaces between the MX480 and Cisco’s
ESM switch on the IBM Blade Center, thus bundling the physical connections into
one logical link.
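[Diagram: LAG – EX and MX Series. Labels recoverable from the original figure:
MX480 (DUT) interfaces ge-5/0/5 and ge-5/0/1 (Aggregated Ethernet); Cisco ESM in
the IBM Blade Center, ports 17 and 18 (Etherchannel) and Trunk Port 20; N2X
traffic generator ports ge-304/4 and ge-201/1.]

Figure 8.5 LAG and Load Balancing Setup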

NOTE The EX8200 or any of the MX Series devices can be used instead of the MX480
shown in Figure 8.5.

Link Aggregation Configuration Hierarchy


This section describes the different steps involved in configuring and verifying LAG
on the test network. A physical interface can be associated with an aggregated
Ethernet interface on the EX and MX Series Platforms. Enable the aggregated link
as follows:

1. At the [edit chassis] hierarchy level, configure the maximum number of
aggregated devices available on the system:
aggregated-devices {
    ethernet {
        device-count X;
    }
}

NOTE Here X refers to the number of aggregated interfaces (0-127).

2. At the [edit interfaces interface-name] hierarchy level, include the 802.3ad
statement:
[edit interfaces interface-name (fastether-options | gigether-options)]
802.3ad aeX;

3. A statement defining aeX must also be included at the [edit interfaces]
hierarchy level.

4. Some of the physical properties that apply specifically to aggregated Ethernet
interfaces can also be configured:
chandra@HE-RE-1-MX480> show configuration interfaces aeX
aggregated-ether-options {
    minimum-links 1;
    link-speed 1g;
    lacp {
        active;
        periodic fast;
    }
}
unit 0 {
    family bridge {
        interface-mode trunk;
        vlan-id-list 1122;
    }
}

An aggregated Ethernet interface can be deleted from the configuration by issuing the
delete interfaces aeX command at the [edit] hierarchy level in configuration
mode.

[edit]
user@host# delete interfaces aeX

NOTE When an aggregated Ethernet interface is deleted from the configuration, Junos
removes the configuration statements related to aeX and sets this interface to the
DOWN state. However, the aggregated Ethernet interface is not deleted until the
chassis aggregated-devices ethernet device-count configuration statement is
deleted.

Forwarding Options in LAG (MX480 only)

By default, the hash-key algorithm uses the interface as the parameter for generating
hash keys for load distribution. Forwarding options must be configured to achieve
load balancing based on source and destination IP addresses, source and destination
MAC addresses, or any other combination of Layer 3 or Layer 4 parameters.

NOTE Although EX Series Platforms can also perform hash-key based load balancing as of
release 9.6R1.13, they do not have the flexibility to configure the criteria for hashing.

hash-key {
    family multiservice {
        source-mac;
        destination-mac;
        payload {
            ip {
                layer-3 {
                    [source-ip-only | destination-ip-only];
                }
                layer-4;
            }
        }
        symmetric-hash;
    }
}
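
The effect of hash-based load distribution can be illustrated with a short Python sketch. This is a conceptual model only: CRC32 stands in for the hash function, which is an assumption made for illustration and not the algorithm used by the Packet Forwarding Engine, and the function name pick_member_link is hypothetical. The interface names are the two member links labeled in Figure 8.5.

```python
import zlib


def pick_member_link(src_ip, dst_ip, active_links):
    """Choose a member link for a flow from a hash of its source and
    destination addresses, so one flow always uses the same link."""
    if not active_links:
        raise ValueError("the bundle has no active member links")
    flow_key = f"{src_ip}>{dst_ip}".encode("ascii")
    return active_links[zlib.crc32(flow_key) % len(active_links)]


links = ["ge-5/0/1", "ge-5/0/5"]
flow = ("11.22.1.60", "11.22.5.1")

# The same flow is always mapped to the same member link.
first_choice = pick_member_link(*flow, links)
print(first_choice == pick_member_link(*flow, links))

# After a member link fails, the flow is mapped onto the remaining link.
print(pick_member_link(*flow, ["ge-5/0/5"]))
```

Hashing on more fields (for example, Layer 4 ports in addition to IP addresses) produces more distinct keys, which generally spreads flows more evenly across the member links.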

Link Aggregation Configuration Description


// Specify the number of aggregated devices
aggregated-devices {
    ethernet {
        device-count X;
    }
}
// Specify the aeX interface properties such as minimum number of links,
// speed and LACP options.
aggregated-ether-options {
    minimum-links 1;
    link-speed 1g;
    lacp {
        active;
        periodic fast;
    }
}
// Define a logical unit that is a bridge type trunk interface and vlan-id.
unit 0 {
    family bridge {
        interface-mode trunk;
        vlan-id-list 1122;
    }
}
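
The role of the minimum-links setting can be expressed as a simple rule: the aggregated interface is considered up only while at least that many member links are up. The Python sketch below models that rule; the function name bundle_is_up and the link states are hypothetical.

```python
def bundle_is_up(member_link_up, minimum_links=1):
    """Return True when the number of member links that are up meets the
    configured minimum-links value."""
    active = sum(1 for is_up in member_link_up.values() if is_up)
    return active >= minimum_links


# Two-member bundle with one failed link.
ae_members = {"ge-5/0/1": True, "ge-5/0/5": False}

print(bundle_is_up(ae_members, minimum_links=1))  # True
print(bundle_is_up(ae_members, minimum_links=2))  # False
```

With minimum-links 1, as in the configuration above, the bundle keeps forwarding on the surviving link; a higher value trades that resilience for a guaranteed minimum bandwidth.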

Link Failover Scenarios - LAG with LACP and NSR


Link failover between members of a LAG on the MX480 can occur in conjunction with
different combinations of LACP and NSR. For each LACP/NSR combination, several
failure scenarios are possible: Routing Engine, FPC, or switch fabric failover, and
system upgrade with and without ISSU.

The different LACP/NSR combinations on the MX480 include the following:

• LACP Enabled, NSR Enabled
• LACP Enabled, NSR Disabled
• LACP Disabled, NSR Enabled
• LACP Disabled, NSR Disabled

Table B.3 and Table B.4 in Appendix B of this handbook provide detailed LAG testing
results based on the scenarios listed above.

The salient test results, listed in Appendix B, are as follows:

• Enabling LACP provided seamless recovery from Routing Engine failover on the
MX480. The Routing Engine took approximately 20 seconds to recover from a
failure with LACP disabled, as opposed to no disruption when it was enabled.
• FPCs with only one LAG interface recovered more quickly (in 1.5 seconds) than
FPCs with two interfaces (approximately 55 seconds).
• The switch fabric recovered immediately after a failure in all the scenarios.
• A similar validation was performed using the EX4200 instead of the MX480.
In this case, enabling or disabling LACP did not make a difference. The
following scenarios were validated:
-- Routing Engine Failover
-- FPC Failover (two LAG links and an interface to the traffic generator)
-- Switch Fabric Failover
-- System Upgrade (without ISSU or graceful Routing Engine switchover)
-- System Upgrade (without ISSU, with graceful Routing Engine switchover)

MORE Table C.1 and Table C.4 in Appendix C of this handbook provide detailed LAG test
results using the EX4200 and MX480.

Redundant Trunk Group


Redundant Trunk Group (RTG) is a Layer 2 redundancy mechanism, similar in
purpose to STP, that is available on the EX Series switches. RTG eliminates the need
for spanning tree. In its simplest form, RTG is implemented on a switch that is dual
homed to network devices. Enabling RTG makes one of the links active and the
other a backup; traffic is forwarded over the active link. The backup link takes over
traffic forwarding when the active link fails, thus reducing the convergence time.
There is, however, a distinction between how data and control traffic are handled
by the backup link. Layer 2 control traffic, for example LLDP session messages,
is permitted over the backup link, while data traffic is blocked. This behavior is
the same whether the switch is a standalone switch or a virtual chassis.

Figure 8.6 shows an EX Series switch that has links to Switch1 and Switch2,
respectively. RTG is configured on the EX Series switch so that the link to Switch1
is active and performs traffic forwarding. The link to Switch2 is the backup link and
starts forwarding traffic when the active link fails.
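
The active/backup behavior described above can be modeled in a few lines of Python. The class below is a conceptual sketch, not Junos code: the class and method names are hypothetical, the interface names ae1 and ae2 are placeholders, and the model assumes that forwarding stays on the new active link after the failed link recovers.

```python
class RedundantTrunkGroup:
    """Two-link RTG model: data is forwarded only on the active link,
    while Layer 2 control traffic is permitted on any link that is up."""

    def __init__(self, active, backup):
        self.state = {active: True, backup: True}
        self.active = active

    def set_link_state(self, link, is_up):
        self.state[link] = is_up
        if not self.state.get(self.active, False):
            # The active link is down: move forwarding to a link that is
            # up, or to None when every link in the group is down.
            self.active = next(
                (name for name, up in self.state.items() if up), None
            )

    def permits(self, link, traffic):
        if not self.state.get(link, False):
            return False
        if traffic == "control":
            return True
        return link == self.active


rtg = RedundantTrunkGroup(active="ae1", backup="ae2")
print(rtg.permits("ae2", "data"))     # False: backup blocks data traffic
print(rtg.permits("ae2", "control"))  # True: control traffic is allowed

rtg.set_link_state("ae1", False)      # active link fails
print(rtg.active)                     # ae2
```

Because the decision uses only the local link state, no protocol exchange with the neighboring switches is needed for the backup link to take over.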

NOTE Given the multi-chassis scenario, it is better to use RTG instead of MC-LAG.

[Diagram: an EX Series switch with an active link to Switch 1 and a backup link to
Switch 2.]

Figure 8.6 RTG-based Homing to Two Switches

Figure 8.7 shows an EX Series switch that has two links to Switch1. RTG is configured
on the EX Series switch so that one of the links to Switch1 is active and performs
traffic forwarding while the other link acts as the backup. The backup link starts
forwarding traffic to Switch1 when the active link fails.

NOTE In this scenario, it may be more efficient in terms of bandwidth and availability to
use LAG instead of RTG. LAG provides better use of bandwidth and faster recovery
because there is no flushing and relearning of MAC addresses.

[Diagram: an EX Series switch with an active link and a backup link, both to
Switch 1.]

Figure 8.7 RTG-homing to Single Switch

Based on these two scenarios, RTG can be used to control the flow of traffic over
links from a single switch to multiple destination switches while providing link
redundancy.

RTG is enabled on a physical interface and, in that respect, is similar to STP.
However, RTG and STP are mutually exclusive on a physical port. Junos does not
permit the same interface to be a part of both RTG and STP simultaneously. The
significance of RTG is local and not network wide, since decisions are made locally
on the switch.

Typically, RTG is implemented on an access switch or on a virtual chassis
that is connected to two or more devices that do not operate as a virtual chassis
or multi-chassis system and do not use STP. It is configured between the access and
core layers in a two-tier data center architecture or between the access and
aggregation layers in a three-tier model. There can be a maximum of 16 RTGs in a
standalone switch or in a virtual chassis.

Both RTG active and backup links must be members of the same VLANs.

NOTE Junos does not allow the configuration to take effect if there is a mismatch of VLAN
IDs between the links belonging to an RTG.

Figure 8.8 shows a sample two-tier architecture with RTG and LAG enabled
between the access-core layers and access-to-server layers. The core consists
of two MX Series devices: MX480-A and MX480-B. Two EX4200 based virtual
chassis (EX4200 VC-A, EX4200 VC-B) and EX8200s-A and B form the access
layer. There are connections from each of the access layer devices to MX480-A
and B, respectively.

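[Diagram. Labels recoverable from the original figure: MX480-A and MX480-B;
aggregated interfaces ae1, ae2, ae3 and ae4; RTG on the access uplinks; virtual
chassis members EX4200 (EX-1), (EX-2), (EX-3) and EX4200 (EX-4), (EX-5), (EX-6).]

Figure 8.8 RTG and LAG in 2-Tier Model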

We enable LAG and RTG on these links to ensure redundancy and control traffic flow.

We enable LAG on the access devices for links between the following devices:

• A-ae1 (EX4200 VC-A -> MX480-A)
• A-ae2 (EX4200 VC-A -> MX480-B)
• B-ae1 (EX4200 VC-B -> MX480-A)
• B-ae2 (EX4200 VC-B -> MX480-B)
• EX-A-ae1 (EX8200-A -> MX480-A)
• EX-A-ae2 (EX8200-A -> MX480-B)
• EX-B-ae1 (EX8200-B -> MX480-A)
• EX-B-ae2 (EX8200-B -> MX480-B)

In addition, we configure LAG on the EX8200-A and EX8200-B to provide
aggregation on links to the IBM Power VMServers.

We enable RTG on the EX4200 VC-A and B so that links AL-A and AL-B to
MX480-A are active and are used to forward traffic. The set of backup links RL-A
and RL-B from the virtual chassis to MX480-B takes over traffic forwarding
when the active link(s) fail.

Configuration Details
To configure a redundant trunk link, an RTG must first be created. As stated earlier,
RTG can be configured on an access switch that has two links: a primary (active)
link and a secondary (backup) link. The secondary link automatically starts
forwarding data traffic when the active link fails.

Execute the following commands to configure RTG and to disable RSTP on the EX
switches.

• Define RTG on the LAG interface ae1:

set ethernet-switching-options redundant-trunk-group group DC_RTG interface ae1

• Define RTG on the LAG interface ae2:

set ethernet-switching-options redundant-trunk-group group DC_RTG interface ae2

• Disable RSTP on interface ae1, which is a member of the RTG:

set protocols rstp interface ae1 disable

• Disable RSTP on interface ae2, which is a member of the RTG:

set protocols rstp interface ae2 disable
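
When the same RTG pattern is rolled out to several access switches, the four commands above can be generated from a group name and its two member interfaces. The Python sketch below only formats the command strings shown in this section; the function name rtg_commands and its parameters are hypothetical, and the output still has to be pasted into, or pushed to, the switch by whatever means the operator already uses.

```python
def rtg_commands(group, interfaces, disable_rstp=True):
    """Build the Junos set commands for one redundant trunk group.

    group        -- RTG name, for example "DC_RTG"
    interfaces   -- exactly two member interfaces, for example ["ae1", "ae2"]
    disable_rstp -- also emit the RSTP disable command for each member
    """
    if len(interfaces) != 2:
        raise ValueError("an RTG needs exactly two member links")
    commands = [
        f"set ethernet-switching-options redundant-trunk-group "
        f"group {group} interface {ifname}"
        for ifname in interfaces
    ]
    if disable_rstp:
        # RSTP is disabled on each RTG member, as in the commands above.
        commands += [
            f"set protocols rstp interface {ifname} disable"
            for ifname in interfaces
        ]
    return commands


if __name__ == "__main__":
    for line in rtg_commands("DC_RTG", ["ae1", "ae2"]):
        print(line)
```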

Appendices

Appendix A: Configuring TCP/IP Networking in Servers

Appendix B: LAG Test Results

Appendix C: Acronyms

Appendix D: References



Appendix A: Configuring TCP/IP Networking in Servers


Server network configuration includes tasks such as enabling the interface,
setting an IP address and routing information, creating a logical interface, and
optimizing Ethernet port settings, including speed, duplex, flow control, MTU
(jumbo frames), and VLAN ID.

The engineering testers used many network configuration commands in
different OSs, including RHEL, SUSE, AIX, and Windows. This appendix lists
the common network configuration commands with their associated OS as a
convenient reference.

Table A.1 lists tasks that are associated with system-dependent commands.
A command that works on one platform may not work on another. For
example, the lsdev command only works on the AIX platform.

Table A.1 Network Interface Configuration Tasks on Different Server Platforms

Interfaces                 Server Platform   Configuration Tasks
-------------------------  ----------------  -------------------------------------------------------------
Physical NIC               IBM System p      Uses HMC to allocate the physical NIC to the partition.
                                             The adapter configuration in the partition depends on the OS,
                                             including RHEL, SUSE, and AIX.

Virtual Ethernet Adapter   IBM PowerVM       Uses HMC to allocate the virtual Ethernet Adapter to each
                                             partition. The adapter configuration in the partition depends
                                             on the OS, including RHEL, SUSE, and AIX.

Host Ethernet Adapter      IBM PowerVM       Uses HMC to allocate the virtual Ethernet Adapter to each
(HEA)                                        partition. The adapter configuration in the partition depends
                                             on the OS, including RHEL, SUSE, and AIX.

Logical Host Ethernet      IBM PowerVM       Uses HMC to allocate the virtual Ethernet Adapter to each
Adapter (LHEA)                               partition. The adapter configuration in the partition depends
                                             on the OS, including RHEL, SUSE, and AIX.

Shared Ethernet Adapter    IBM PowerVM       Uses HMC to allocate the interface to VIOS.
(SEA)                                        Uses VIOS commands to configure SEA.

Interfaces in the          IBM Blade Center  Uses Blade Center Management Module (GUI) to allocate the
Ethernet Pass-Thru Module                    interface to the blade server. Interface configuration in the
                                             blade server depends on the OS, including RHEL, SUSE, AIX,
                                             and Windows.

Physical NIC               IBM x3500         The physical NIC configuration depends on the OS, including
                                             RHEL, SUSE, AIX, and Windows.

NOTE Some of these commands will change IP address settings immediately, while some
of them require a restart of network service.

NOTE Not all tools save changes in the configuration database, so the
changes may not be preserved after a server reboot.

Configuring Red Hat Enterprise Linux Network


In Red Hat Enterprise Linux (RHEL), the configuration files for network interfaces
and the scripts to activate and deactivate them are located in the
/etc/sysconfig/network-scripts/ directory:

• The file /etc/sysconfig/network specifies routing and host information for all
network interfaces.
• The file /etc/sysconfig/network-scripts/ifcfg-<interface-name> holds the
configuration of one network interface.

For each network interface on a Red Hat Linux system, there is a corresponding
interface configuration script. Each of these files provides information specific to a
particular network interface. The following is a sample ifcfg-eth0 file for a system
using a fixed IP address:

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.0.1.0
NETMASK=255.255.255.0
IPADDR=10.0.1.27
USERCTL=no
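
Because an ifcfg file is a flat list of KEY=value pairs, a quick consistency check is easy to script. The Python sketch below parses the sample above and confirms that IPADDR falls inside the network described by NETWORK and NETMASK. The helper names parse_ifcfg and address_in_network are assumptions for illustration; they are not part of RHEL, and the parser deliberately ignores shell features such as variable expansion.

```python
import ipaddress

SAMPLE_IFCFG = """\
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.0.1.0
NETMASK=255.255.255.0
IPADDR=10.0.1.27
USERCTL=no
"""


def parse_ifcfg(text):
    """Return the KEY=value pairs of an ifcfg file as a dict (simplified parser)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings


def address_in_network(settings):
    """True if IPADDR lies inside NETWORK/NETMASK.

    ipaddress.ip_network raises ValueError if NETWORK has host bits set,
    which is itself a useful signal of a typo in the file.
    """
    network = ipaddress.ip_network(f"{settings['NETWORK']}/{settings['NETMASK']}")
    return ipaddress.ip_address(settings["IPADDR"]) in network


if __name__ == "__main__":
    cfg = parse_ifcfg(SAMPLE_IFCFG)
    print(cfg["DEVICE"], address_in_network(cfg))
```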

In addition, several other commands can be helpful, as listed in Table A.2.

Table A.2 Additional Commands

Commands    Description
----------  ---------------------------------------------------------------------------
ethtool     Queries and changes settings of an Ethernet device, such as auto-negotiation,
            speed, link mode, and flow control.

kudzu       Detects and configures new or changed hardware on a system.

ifconfig    Queries and changes settings of an Ethernet interface. Changes made via
            ifconfig take effect immediately but are not saved in the configuration
            database.

The following is a sample ifconfig command that configures the eth0.5 interface
with a fixed IP address.

# ifconfig eth0.5 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255 up

The vconfig command adds or removes a VLAN interface. When vconfig adds a VLAN
interface, a new logical interface is formed from its base interface name and the
VLAN ID. Below is a sample vconfig command to add a VLAN 5 interface on the eth0
interface:

# vconfig add eth0 5

The eth0.5 interface configuration file is
/etc/sysconfig/network-scripts/ifcfg-eth0.5
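
The naming rule described above (base interface name, a dot, then the VLAN ID) can be expressed in a few lines. This Python sketch is only an illustration of the convention: the function vlan_interface is hypothetical, and the 1 to 4094 range check reflects the usable IEEE 802.1Q VLAN ID range rather than anything stated in this appendix.

```python
SCRIPTS_DIR = "/etc/sysconfig/network-scripts"  # RHEL location named in the text


def vlan_interface(base, vlan_id):
    """Return (logical interface name, ifcfg file path) for a VLAN on a base NIC."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    name = f"{base}.{vlan_id}"
    return name, f"{SCRIPTS_DIR}/ifcfg-{name}"


if __name__ == "__main__":
    print(vlan_interface("eth0", 5))
```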

• service network restart restarts networking.
• system-config-network launches a GUI-based network administration tool for
configuring the interface.
• route allows operators to inspect the routing table or to add a static route.
A static route added by the route command does not persist after a system
reboot or network service restart.
• netstat allows operators to check network configuration and activity. For
instance, netstat -i shows interface statistics; netstat -r shows
routing table information.
• ping allows operators to check network connectivity.
• traceroute allows operators to trace the route that packets take across an IP
network to a given host.

For further details concerning these commands, refer to the Red Hat Linux Reference
Guide at www.redhat.com/docs/manuals/linux/RHL-9-Manual/pdf/rhl-rg-en-9.pdf.

Configuring SUSE Linux Enterprise Network


Table A.3 lists and defines commonly used SUSE Linux network configuration
commands.

Table A.3 SUSE Linux Enterprise Network Configuration Commands

Commands           Description
-----------------  ------------------------------------------------------------------------
ifconfig           Configures network interface parameters.

rcnetwork restart  Restarts network service.

netstat            Prints network connections, routing tables, interface statistics,
                   and protocol statistics.

ping               Checks network connectivity.

traceroute         Traces the route that packets take across an IP network on their way
                   to a given host.

For further details concerning the SUSE Linux network configuration commands,
refer to Novell’s Command Line Utilities at
www.novell.com/documentation/oes/tcpipenu/?page=/documentation/oes/tcpipenu/data/ajn67vf.html.

Configuring AIX Network


AIX network configuration can be performed using smitty, a system management
tool with a cursor-based text interface. Table A.4 lists and defines smitty and
other AIX commands used for network configuration.

Table A.4 Smitty Commands and Definitions

Commands        Definition
--------------  ---------------------------------------------------------------------------
lscfg           Displays configuration, diagnostic, and vital product data (VPD) information
                about the system and its resources.

lsslot          Displays dynamically reconfigurable slots, such as hot plug slots, and their
                characteristics.

lsdev           Displays devices in the system and their characteristics.

rmdev           Removes devices from the configuration database.

cfgmgr          Configures devices and optionally installs device software by running the
                programs specified in the Configuration Rules object class.

lsattr          Displays attribute characteristics and possible values of attributes for
                devices in the system.

smitty          Provides a cursor-based text interface to perform system management. In
                addition to a hierarchy of menus, smitty allows FastPath to take users
                directly to a dialog, bypassing the interactive menus.

smitty chgenet  Configures an adapter, determines a network adapter hardware address, sets
                an alternative hardware address, or enables jumbo frames.

smit mktcpip    Sets the required values for starting TCP/IP on a host, including setting
                the host name, setting the IP address of the interface in the configuration
                database, setting the subnet mask, or adding a static route.

ifconfig        Configures or displays network interface parameters for a TCP/IP network.

netstat         Displays network status, including the number of packets received,
                transmitted, and dropped, and the routes and their status.

entstat         Shows Ethernet device driver and device statistics. For example, the
                command entstat ent0 displays the device generic statistics for ent0.

ping            Checks network connectivity.

traceroute      Traces the route that packets take across an IP network to a given host.

For details concerning the commands listed above, refer to
publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp.

Configuring Virtual I/O Server Network


Virtual I/O Server (VIOS) network configuration is used in POWER5, POWER6 and
POWER7 systems. Table A.5 lists and defines some of the more commonly used
VIOS network configuration commands.

Table A.5 VIOS Commands and Definitions

Commands        Definitions
--------------  ---------------------------------------------------------------------------
mkvdev          Creates a mapping between a virtual adapter and a physical resource. For
                example, the following command creates a SEA that links physical ent0 to
                virtual ent2:
                mkvdev -sea ent0 -vadapter ent2 -default ent1 -defaultid 1

lsmap           Lists the mapping between virtual adapters and physical resources. For
                example, use the following lsmap command to list all virtual adapters
                attached to vhost1:
                lsmap -vadapter vhost1

chdev           Changes an attribute on the device. For instance, use the following chdev
                command to enable jumbo frames on the ent0 device:
                chdev -dev ent0 -attr jumbo_frames=yes

chtcpip         Changes the VIOS TCP/IP settings and parameters. For example, use the
                following command to change the current network address and mask to the
                new setting:
                chtcpip -interface en0 -inetaddr 9.1.1.1 -netmask 255.255.255.0

lstcpip         Displays the VIOS TCP/IP settings and parameters. For example, use the
                following command to list the current routing table:
                lstcpip -routetable

oem_setup_env   Initiates the OEM installation and setup environment so that users can
                install and set up software in the traditional way. For example, the
                oem_setup_env command can place a user in a non-restricted UNIX root shell
                so that the user can run the AIX commands to install and set up software
                and use most of the AIX network commands, including lsdev, rmdev, chdev,
                netstat, entstat, ping, and traceroute.

For further details concerning VIOS network commands, refer to
publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphcg/iphcg_network_commands.htm.

Configuring Windows 2003 Network


Typically, Windows 2003 network configuration is performed by the network
applet in the GUI-based control panel. NIC vendors also might provide a web GUI to
configure the NIC setting, including frame size. Table A.6 lists and defines some of
the more commonly used Windows 2003 commands for network configuration.

Table A.6 Windows 2003 Network Commands

Commands  Definitions
--------  -----------------------------------------------------------------------------
ipconfig  Command line utility to display the TCP/IP configuration of network adapters.

route     Command line utility to add or remove a static route. You can make the change
          persistent by using the -p option when adding routes.

ping      Checks network connectivity.

tracert   Traces the route that packets take across an IP network on their way to a
          given host.

For details concerning Windows 2003 network commands, refer to
Windows 2003 product help at
technet.microsoft.com/en-us/library/cc780339%28WS.10%29.aspx.

Appendix B: LAG Test Results


Table B.1 lists detailed LAG test results for the MX480.

NOTE The following values listed in Table B.1 represent approximations in seconds.

Table B.1 MX480 Link Aggregation Failover Scenarios

LAG Failover Scenarios  Routing Engine     FPC Failover   FPC Failover        Switch     System Upgrade        System Upgrade
LACP        NSR         Failover           (FPC with one  (FPC with 2 links   Fabric     without ISSU          with ISSU
                        (Graceful Restart  link of LAG)   of LAG and          Failover                         (NSR must be
                        Enabled)                          interface to                                         enabled)
                                                          traffic generator)
----------  ----------  -----------------  -------------  ------------------  ---------  --------------------  --------------
Enabled     Enabled     0                  1.5            53 (53, 53)         Immediate  0 (upgrade backup     0
                                                                                         first, and then
                                                                                         upgrade the primary)

Enabled     Disabled    0                  1.5            ~52 (51, 52, 53)    Immediate  –                     –

Disabled    Enabled     ~20                10             ~63 (57, 63, 64)    Immediate  ~20 (upgrade backup   ~20 *
                                                                                         first, and then
                                                                                         upgrade the primary)

Disabled    Disabled    ~20                10             ~63 (63, 64)        Immediate  –                     –

NOTE The following values listed in Table B.2 represent approximations in seconds.

Table B.2 EX8200 Link Aggregation Failover Scenarios

LAG Failover Scenarios      Routing Engine  FPC Failover          Switch Fabric  System Upgrade      System Upgrade
                            Failover        (FPC with LAG and     Failover       (without ISSU/      (without ISSU/
                                            interface to traffic)                without GRES)       with GRES)
--------------------------  --------------  --------------------  -------------  ------------------  --------------
LACP Enabled/Disabled       0               ~84 (82, 86)          Immediate      527                 152
(Does not matter if LACP
is enabled/disabled)

NOTE Refer to Table B.2 when reviewing the following system upgrade steps.

Steps associated with system upgrade (without ISSU/without GRES):

1. Break GRES between the primary and the backup device.
2. Upgrade the backup device.
3. Upgrade the primary device. (Observe the outage, in approximate seconds.)
4. Re-establish the GRES between the primary and the backup device.

Steps associated with system upgrade (without ISSU/with GRES):

1. Break GRES between the primary and the backup device.
2. Upgrade the backup device.
3. Re-establish the GRES between the primary and the backup device.
4. Reverse the roles between the primary and backup devices. (The primary device
becomes the backup and the backup device becomes the primary.) Ignore the
warning about version mismatch.
5. Break GRES between the primary device and the backup device.
6. Upgrade the backup device.
7. Re-establish the GRES between the primary and the backup device.

Methods for Performing Unified ISSU


The three methods for performing a unified ISSU are the following:

• Upgrading and Rebooting Both Routing Engines Automatically.
• Upgrading Both Routing Engines and Manually Rebooting the New Backup
Routing Engine.
• Upgrading and Rebooting Only One Routing Engine.

Method 1: Upgrading and Rebooting Both Routing Engines Automatically


This method uses the following reboot command:

request system software in-service-upgrade package-name reboot

1. Download the software package from the Juniper Networks Support Web site.
2. Copy the package to the /var/tmp directory on the router:
user@host> file copy ftp://username:prompt@ftp.hostname.net/filename /var/tmp/filename

3. Verify the current software version on both Routing Engines, using the show
version invoke-on all-routing-engines command:
{backup}
user@host> show version invoke-on all-routing-engines

4. Issue the request system software in-service-upgrade package-name reboot
command on the master Routing Engine:
{master}
user@host> request system software in-service-upgrade /var/tmp/jinstall-9.0-20080114.2-domestic-signed.tgz reboot
ISSU: Validating Image
PIC 0/3 will be offlined (In-Service-Upgrade not supported)
Do you want to continue with these actions being taken ? [yes,no] (no)
yes
ISSU: Preparing Backup RE
Pushing bundle to re1
Checking compatibility with configuration
. . .
ISSU: Old Master Upgrade Done
ISSU: IDLE
Shutdown NOW!
. . .
*** FINAL System shutdown message from root@host ***
System going down IMMEDIATELY
Connection to host closed.

5. Log in to the router once the new master (formerly backup Routing Engine) is
online. Verify that both Routing Engines have been upgraded:
{backup}
user@host> show version invoke-on all-routing-engines

6. To make the backup Routing Engine (former master Routing Engine) the
primary Routing Engine, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the master routing engine ? [yes,no] (no) yes
Resolving mastership...
Complete. The local routing engine becomes the master.
{master}
user@host>

7. Issue the request system snapshot command on each of the Routing Engines
to back up the system software to the router’s hard disk.

Method 2: Upgrading Both Routing Engines and Manually Rebooting the New
Backup Routing Engine
1. Issue the request system software in-service-upgrade command.
2. Perform steps 1 through 4 as described in Method 1.
3. Issue the show version invoke-on all-routing-engines command to verify
that the new backup Routing Engine (former master) is still running the
previous software image, while the new primary Routing Engine (former
backup) is running the new software image:
{backup}
user@host> show version invoke-on all-routing-engines

4. At this point, a choice between installing newer software or retaining the old
version can be made. To retain the older version, execute the request system
software delete install command.

5. To ensure that a newer version of software is activated, reboot the new backup
Routing Engine, by issuing the following:
{backup}
user@host> request system reboot
Reboot the system ? [yes,no] (no) yes
Shutdown NOW!
. . .
System going down IMMEDIATELY
Connection to host closed by remote host.

6. Log in to the new backup Routing Engine and verify that both Routing Engines
have been upgraded:
{backup}
user@host> show version invoke-on all-routing-engines

7. To make the new backup the primary, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the master routing engine ? [yes,no] (no) yes

8. Issue the request system snapshot command on each of the Routing Engines
to back up the system software to the router’s hard disk.

Method 3: Upgrading and Rebooting Only One Routing Engine


Use the request system software in-service-upgrade package-name no-old-master-upgrade
command on the master Routing Engine.

1. Request an ISSU upgrade:
{master}
user@host> request system software in-service-upgrade /var/tmp/jinstall-9.0-20080116.2-domestic-signed.tgz no-old-master-upgrade

2. To install the new software version on the new backup Routing Engine, issue
the request system software add command.
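
The three methods differ only in the option that follows the package name on the in-service-upgrade command. As a compact summary, the Python sketch below maps each method number to the command line given for it in this appendix. The function issu_command and the METHOD_OPTIONS table are hypothetical conveniences for this example; Method 2 is rendered with no trailing option because the text gives the command that way.

```python
# Trailing option per method, as given in the text of this appendix.
METHOD_OPTIONS = {
    1: "reboot",                 # upgrade and reboot both Routing Engines automatically
    2: "",                       # upgrade both, reboot the new backup manually
    3: "no-old-master-upgrade",  # upgrade and reboot only one Routing Engine
}


def issu_command(package, method):
    """Return the unified ISSU CLI command string for the given method number."""
    if method not in METHOD_OPTIONS:
        raise ValueError(f"unknown ISSU method: {method}")
    parts = ["request system software in-service-upgrade", package]
    option = METHOD_OPTIONS[method]
    if option:
        parts.append(option)
    return " ".join(parts)


if __name__ == "__main__":
    pkg = "/var/tmp/jinstall-9.0-20080114.2-domestic-signed.tgz"
    for number in sorted(METHOD_OPTIONS):
        print(number, issu_command(pkg, number))
```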

Troubleshooting Unified ISSU

NOTE The following Unified ISSU steps relate only to the Junos 9.6 release.

Perform the following steps if the ISSU procedure stops progressing.

1. Execute the request system software abort in-service-upgrade command on
the master Routing Engine.
2. To verify that the upgrade has been aborted, check the existing router session
for the following message: ISSU: aborted!

Appendix C: Acronyms
A
AFE: Application Front Ends

apsd: automatic protection switching process

B
BPDU: Bridge Protocol Data Unit

BSR: Bootstrap Router

C
CBT: Core Based Tree

CIST: Common and Internal Spanning Tree

CLI: Command Line Interface

CoS: class of service

D
dcd: device control process

DDoS: Distributed Denial of Service

DHCP: Dynamic Host Configuration Protocol

DNS: Domain Name System

DSCP: Differentiated Services Code Point

DUT: Device Under Test

DVMRP: Distance Vector Multicast Routing Protocol

E
ESM: Ethernet Switch Module, Embedded Syslog Manager

F
FC: Fibre Channel

FCS: frame check sequence

FPC: Flexible PIC Concentrator

FSP: Flexible Service Processor

G
GRES: Graceful Routing Engine Switchover

GSLB: global server load balancing

H
HBA: Host Bus Adapter

HEA: Host Ethernet Adapter

HMC: Hardware Management Console

I

IDP: Intrusion Detection and Prevention

IGMP: Internet Group Management Protocol

iSCSI: Internet Small Computer System Interface

ISSU: In-Service Software Upgrade

IVE: Instant Virtual Extranet

IVM: Integrated Virtualization Manager

L
LAG: Link Aggregation Group

LDAP: Lightweight Directory Access Protocol

LPAR: Logical Partitions

LHEA: Logical Host Ethernet Adapter

M
MAC: Media Access Control

MCS: Multi Core Scaling

mgd: management process

MLD: Multicast Listener Discovery

MM: Management Module

MOSPF: Multicast Open Shortest Path First

MSTI: Multiple Spanning Tree Instance

MSDP: Multicast Source Discovery Protocol

MSTP: Multiple Spanning Tree Protocol

MTA: mail transfer agent

MTTR: mean time to repair

MTU: Maximum Transmission Unit

N
NAT: Network Address Translation

NIC: Network Interface Card

NIST: National Institute of Standards and Technology

NPU: network processing unit

NSB: Nonstop Bridging

NSR: nonstop active routing

O

OEM: Original Equipment Manufacturer

OSS: operation support systems

P
PDM: Power Distribution Module

PIC: Physical Interface Card

PIM: Protocol Independent Multicast

PLP: packet loss priority

PM: Pass-through Module

PoE: Power over Ethernet

PVST: Per-VLAN Spanning Tree

Q
QoS: Quality of Service

R
RED: random early detection

ROI: return on investment

RP: rendezvous point

RPC: remote procedure call

rpd: routing protocol process

RTG: Redundant Trunk Group

RSTP: Rapid Spanning Tree Protocol

RVI: routed VLAN interface

S
SAN: storage area network

SAP: Session Announcement Protocol

SCB: Switch Control Board

SDP: Session Description Protocol

SEA: Shared Ethernet Adapter

SMT: Simultaneous Multithreading

SNMP: Simple Network Management Protocol

snmpd: simple network management protocol process

SOA: Service Oriented Architecture

SOL: Serial over LAN

SPOF: single point of failure



STP: Spanning Tree Protocol

SSH: Secure Shell

SSL: Secure Sockets Layer

SSM: source-specific multicast

syslogd: system logging process

T
TWAMP: Two-Way Active Measurement Protocol

V
VID: VLAN Identifier (IEEE 802.1q)

VIOS: Virtual I/O Server

VLAN: Virtual LAN

VLC: VideoLAN

VPLS: virtual private LAN service

VRF: Virtual Routing and Forwarding

VRRP: Virtual Router Redundancy Protocol

VSTP: Virtual Spanning Tree Protocol

W
WPAR: Workload based Partitioning

Appendix D: References
• www.juniper.net/techpubs/software/junos/junos90/swconfig-high-availability/swconfig-high-availability.pdf
The Junos High Availability Configuration Guide, Release 9.0 presents an
overview of high availability concepts and techniques. By understanding the
redundancy features of Juniper Networks routing platforms and the Junos
software, a network administrator can enhance the reliability of a network and
deliver highly available services to customers.
• IEEE 802.3ad link aggregation standard
• STP - IEEE 802.1D 1998 specification
• RSTP - IEEE 802.1D-2004 specification
• MSTP - IEEE 802.1Q-2003 specification
• www.nettedautomation.com/standardization/IEEE_802/standards_802/Summary_1999_11.html
Provides access to the IEEE 802 Organization website with links to all 802
standards.
• RFC 3768, Virtual Router Redundancy Protocol
• https://datatracker.ietf.org/wg/vrrp/
Provides access to all RFCs associated with the Virtual Router Redundancy
Protocol (VRRP).
• RFC 2338, Virtual Router Redundancy Protocol for IPv6
• https://datatracker.ietf.org/doc/draft-ietf-vrrp-ipv6-spec/
Provides access to the abstract that defines VRRP for IPv6.
Data Center Network Connectivity with IBM Servers

Data Center Network Connectivity Handbook


This handbook serves as an easy-to-use reference tool for implementing a two-tier data center
network by deploying IBM open systems as the server platform with Juniper Networks routing and
switching solutions.

“A must-read, practical guide for IT professionals, network architects
and engineers, who wish to design and implement a high performance
Data Center infrastructure. This book provides a step-by-step approach,
with validated solution scenarios for integrating IBM open system
servers and Juniper Networks data center network, including technical
concepts and sample configurations.”

- Scott Stevens, VP Technology,
Worldwide Systems Engineering,
Juniper Networks

“This book is a valuable resource for anyone interested in designing
network infrastructure for next generation data centers...it provides
clear, easy to understand descriptions of the unique requirements
for data communication in an IBM open systems environment. Highly
recommended!”

- Dr. Casimer DeCusatis,
IBM Distinguished Engineer

7100125-001-EN June 2010