Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Mohini Singh Dukes is a Juniper Networks Staff Solutions Design Engineer in the
Solutions Engineering Group. She designs, implements, and validates a wide range
of solutions across mobile, Carrier Ethernet, data center interconnectivity and
security, and business and residential services. Specializing in mobile networking
solutions including backhaul, packet backbone, and security, she has authored a
number of whitepapers, application notes, and implementation and design guides
based on solution validation efforts. She has also published a series of blogs on
energy-efficient networking.
Jitender K. Miglani is a Juniper Networks Solutions Engineer for data center intra-
and inter-connectivity solutions. As part of Juniper's OEM relationship with IBM,
Jitender assists in qualifying Juniper's EX, MX, and SRX Series platforms with IBM
Open System platforms (Power P5/P6, BladeCenter, and x3500). Jitender has
development and engineering experience in various voice and data networking
products, and with small/medium/large enterprise and carrier-grade customers.
Jitender holds a bachelor's degree in Computer Science from the Regional Engineering
College, Kurukshetra, India.
Authors' Acknowledgments
The authors would like to take this opportunity to thank Patrick Ames, whose
direction and guidance was indispensable. To Nathan Alger, Lionel Ruggeri, and
Zach Gibbs, who provided valuable technical feedback several times during the
development of this booklet, your assistance was greatly appreciated. Thanks
also to Cathy Gadecki for helping in the formative stages of the booklet. There are
certainly others who helped in many different ways, and we thank you all.
Juniper Networks
Marc Bernstein
Venkata Achanta
Charles Goldberg
Scott Sneddon
John Bartlomiejczyk
Allen Kluender
Fraser Street
Robert Yee
Niraj Brahmbhatt
Paul Parker-Johnson
Travis O’Hare
Scott Robohn
Ting Zou
Krishnan Manjeri
IBM
Rakesh Sharma
Casimer DeCusatis
Preface
The IBM Open System Servers solution – including IBM Power Systems,
System x, and BladeCenter systems – forms the foundation for a dynamic
infrastructure. IBM server platforms help consolidate applications and servers,
and virtualize system resources while improving overall performance,
availability, and energy efficiency, providing a more flexible, dynamic IT
infrastructure.
Key topics discussed in this book focus on the following routing and switching
solutions in Juniper's simplified two-tier data center network architecture with IBM
open systems:
• Best practices for integrating Juniper Networks EX and MX Series switches and
routers with IBM Open Systems.
• Configuration details for various spanning tree protocols such as Spanning Tree
Protocol (STP), Multiple Spanning Tree Protocol (MSTP), Rapid Spanning Tree
Protocol (RSTP), and Virtual Spanning Tree Protocol (VSTP); deployment
Please make sure to send us your feedback with any new or relevant ideas that you
would like to see in future revisions of this book, or in other Validated Solutions
books, at: solutions-engineering@juniper.net.
Chapter 1
Introduction
Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Challenges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
TODAY'S DATA CENTER ARCHITECTS and designers do not have the luxury
of simply adding more and more devices to meet networking's constant and
continuous demands: higher bandwidth requirements, increased speed, rack
space, tighter security, storage, interoperability among many types of devices
and applications, and ever more diverse and remote users.
This chapter discusses in detail the data center trends and challenges now facing
network designers. Juniper Networks and IBM directly address these trends and
challenges with a data center solution that will improve data center efficiency by
simplifying the network infrastructure, by reducing recurring maintenance and
software costs, and by streamlining daily management and maintenance tasks.
Trends
Although there are several types of data centers for supporting a wide range of
applications such as financial services, web portals, content providers, and IT
back-office operations, they all share certain trends, such as:
Server Growth
Servers are continuing to grow at a high annual rate of 11 percent, while storage is
growing at an even higher rate of 22 percent; both trends place tremendous
strain on the data center's power and cooling capacity. According to Gartner, OS and
application instability is increasing server sprawl, with utilization rates of only 20
percent, leading to increased adoption of server virtualization technologies.
Challenges
Today's major data center challenges include scale and virtualization, complexity
and cost, interconnectivity for business continuity, and security:
Security
The shared infrastructure in the data center or cloud should support multiple
customers, each with multiple hosted applications; provide complete, granular, and
virtualized security that is easy to configure and understand; and support all major
operating systems on a plethora of mobile and desktop devices. In addition, a
shared infrastructure should integrate seamlessly with existing identity systems,
check host posture before allowing access to the cloud, and make all of this
accessible to thousands of users, while protecting against sophisticated
application attacks, Distributed Denial of Service (DDoS) attacks, and hackers.
IBM System x
The IBM System x3850 X5 server is the fifth generation of the Enterprise
X-Architecture, delivering innovation with enhanced reliability and availability
features to enable optimal performance for databases, enterprise applications, and
virtualized environments. According to a recent IBM Redbooks paper, a single IBM
System x3850 X5 host server can support up to 384 virtual machines. For details, please refer to High
Density Virtualization Using the IBM System x3850 X5 at www.redbooks.ibm.com/
technotes/tips0770.pdf.
IBM BladeCenter
The BladeCenter is built on IBM X-Architecture to run multiple business-critical
applications with simplification, cost reduction and improved productivity.
Compared to first generation Xeon-based blade servers, IBM BladeCenter HS22
blade servers can help improve the economics of your data center with:
The MX Series provides carrier-grade reliability, density, performance, capacity, and
scale for enterprise networks with mission critical applications. High availability
features such as nonstop routing (NSR), fast reroute, and unified in-service software
upgrade (ISSU) ensure that the network is always up and running. The MX Series
delivers significant operational efficiencies enabled by Junos OS, and supports a
collapsed architecture requiring less power, cooling, and space. The
MX Series also provides open APIs for easily customized applications and services.
The MX Series enables enterprise networks to profit from the tremendous growth
of Ethernet transport with the confidence that the platforms they install now will
have the performance and service flexibility to meet the challenges of their evolving
requirements.
The MX Series 3D Universal Edge Routers include the MX80 and MX80-48T,
MX240, MX480, and MX960. Their common key features include:
architectures and services that require high availability, advanced QoS, and
performance and scalability that support mission critical networks. The MX960
platform is ideal where SCB and Routing Engine redundancy are required. All major
components are field-replaceable, increasing system serviceability and reliability
and decreasing mean time to repair (MTTR). Enterprise customers typically
deploy the MX960 or MX480 in their data center core.
NOTE We deployed the MX480 in this handbook. However, the configurations and
discussions pertaining to the MX480 also apply to the entire MX product line.
The EX4200 Ethernet switches with virtual chassis technology have the following
major features:
[Figure 1.1 residue: platform labels including T Series, MX Series, the EX4200 line, EX8208, EX8216, SRX240, SRX650, the SRX3000 line, the SRX5000 line, Junos Space, Junos Pulse, NSM, and NSMXpress.]
Figure 1.1 Junos Operating System Runs on the Entire Data Center Network: Security, Routers, and
Switching Platforms
[Figure residue: end-to-end solution topology. Remote/cloud users (SSL VPN, public cloud), teleworkers (SRX100), a large branch (kiosk, EX4200 Virtual Chassis, SRX650, WXC2600), and headquarters (WXC2600, EX4200 Virtual Chassis, EX8208/EX8216, Tivoli Storage Manager Fastrack) connect over an MPLS/VPLS WAN to the data center. The data center uses EX4200 Virtual Chassis at the access tier; EX8208/EX8216 and MX240/MX480/MX960 at the core; SRX5600/SRX5800 firewalls with IC6500 Unified Access Control, SA6500 secure access, and an SBR Appliance. IBM System z, System p, System x, and BladeCenter servers attach over Ethernet; storage uses FC (SAN), iSCSI, and NFS/CIFS (NAS, file systems) over Fibre Channel and Ethernet. IBM Tivoli tools (NetView, Netcool, Federated Identity Manager, Network Manager, Provisioning Manager, Access Manager) manage the environment.]
IBM Tivoli and Juniper Networks Junos Space for Comprehensive Network
Management Solution
Managing the data center network often requires many tools from different
vendors, because the typical network infrastructure is a complex, meshed
deployment that combines different network topologies and often includes
devices from multiple vendors and network technologies. Together, IBM Tivoli
products and Juniper Networks Junos Space can manage data center networks
effectively and comprehensively. The tools include:
MORE For the latest IBM and Juniper Networks data center solution, visit http://www.
juniper.net/us/en/company/partners/global/ibm/#dynamic.
Since 2007, the two companies have been working together on joint technology
solutions, standards development and network management and managed
security services. IBM specifically included Juniper Networks switching, routing, and
security products into their data center network portfolio, with IBM playing an
invaluable role as systems integrator.
secure public cloud to ensure that high priority applications are given preference
over lower priority ones when computing resources become constrained. IBM and
Juniper are installing these advanced networking capabilities into IBM's nine
worldwide Cloud Labs for customer engagements. Once installed, these
capabilities will allow IBM and Juniper to seamlessly move client-computing
workloads between private and publicly managed cloud environments, enabling
customers to deliver reliably on service-level agreements (SLAs).
In July of 2009, Juniper and IBM continued to broaden their strategic relationship by
entering into an OEM agreement that enables IBM to provide Juniper’s Ethernet
networking products and support within IBM’s data center portfolio. The addition of
Juniper’s products to IBM’s data center networking portfolio provides customers
with a best-in-class networking solution and accelerates the shared vision of both
companies for advancing the economics of networking and the data center by
reducing costs, improving services and managing risk.
IBM j-type Data Center Products and Juniper Networks Products Cross Reference
The IBM j-type e-series Ethernet switches and m-series Ethernet routers use
Juniper Networks technology. Table 1.1 shows the mapping of IBM switches and
routers to their corresponding Juniper Networks models. For further product
information, please visit IBM and Junos in the Data Center: A Partnership Made
For Now at https://simplifymydatacenter.com/ibm.
Table 1.1 Mapping of IBM j-type Data Center Network Products to Juniper Networks Products
Chapter 2
Design Considerations
Design Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
As shown in Figure 2.1, Juniper Networks data center network reference architecture
consists of the following four tiers:
• Edge Services Tier – provides all WAN services at the edge of data center
networks; connects to the WAN services in other locations, including other data
centers, campuses, headquarters, branches, carrier service providers, managed
service providers and even cloud service providers.
• Core Network Tier – acts as the data center network backbone, which
interconnects other tiers within the data center and can connect to the core
network tier in other data centers.
• Network Services Tier – provides centralized network security and application
services, including firewall, Intrusion Detection and Prevention (IDP) and server
load balancing.
• Applications and Data Services Tier – connects mainly servers and storage in
the LAN environment and acts as an uplink to the core network tier.
The subsequent sections in this chapter explain each network tier in detail.
[Figure 2.1 residue: the Edge Services tier includes M Series routers, WX Series/WXC Series server and WAN acceleration, and SRX Series security, Internet access, and VPN termination gateways; a services layer includes IDP Series intrusion detection and prevention and SA Series secure access (SSL); the core tier includes MX Series/EX8200 core aggregation routers and an SRX Series core firewall; the Applications and Data Services tier includes EX4200 switches.]
• Internet routing isolation, for example separating the exterior routing protocols
from the interior routing protocols.
• Network Address Translation (NAT) to convert your private IP addresses to
public Internet routable IP addresses.
• IPSec VPN tunnel termination for partner, branch and employee connections.
• Border security to enforce stateful firewall policies and content inspection.
• Quality of Service (QoS).
Combining the traditional three-tier core and aggregation tiers into a single
consolidated core provides other benefits such as:
This tier is responsible for handling service policies for any network, server, and/or
application. Because network services are centralized, this tier must provide service
to all servers and applications within the data center; it should apply a network-specific
policy to a particular network, or an application-specific policy to a set of applications.
• External applications networks, which can have multiple external networks that
serve separate network segments. These typically include applications such as
the public Web, public mail transfer agent (MTA), Domain Name System (DNS)
services and remote access and potential file services that are available
through unfiltered access.
• Internal applications networks, which can have multiple internal networks
serving different levels of internal access from campus or branch locations.
These networks typically connect internal applications such as finance and
human resources systems. Also residing in the internal network are partner
applications and/or any specific applications that are exposed to partners, such
as inventory systems and manufacturing information.
• Infrastructure services networks, which provide secure infrastructure network
connections between servers and their supporting infrastructure services, such
as Lightweight Directory Access Protocol (LDAP), databases, file sharing,
content management and middleware servers. Out of Band Management is
also a part of this network.
• Storage networks, which provide remote storage to servers using different
standards, such as Fibre Channel, InfiniBand or Internet Small Computer
System Interface (iSCSI). Many mission critical application servers typically use
a Host Bus Adapter (HBA) to connect to a remote storage system, ensuring fast
access to data. However, large numbers of servers use iSCSI to access remote
storage systems over the TCP/IP network for simplicity and cost efficiency.
Design Considerations
The following key design considerations are critical attributes for designing today’s
data center network architecture:
NOTE The design considerations discussed in this handbook are not necessarily specific
to Juniper Networks solutions and can be applied universally to any data center
network design, regardless of vendor selection.
Security
The critical resources in any enterprise location are typically the applications
themselves, and the servers and supporting systems such as storage and
databases. Financial, human resources, and manufacturing applications with
supporting data typically represent a company’s most critical assets and, if
compromised, can create a potential disaster for even the most stable enterprise.
The core network security layers must protect these business critical resources
from unauthorized user access and attacks, including application-level attacks.
The security design must employ layers of protection from the network edge
through the core to the various endpoints, following a defense-in-depth approach. A
layered security solution protects critical network resources that reside on the
network: if one layer fails, the next layer stops the attack and/or limits the
damage that can occur. This level of security allows IT departments to apply the
appropriate level of resource protection to the various network entry points based
upon their different security, performance, and management requirements.
Layers of security that should be deployed at the data center include the following:
For further details, refer to the recommended best practices of the National
Institute of Standards and Technology (NIST), as described in the Guide to
General Server Security (Special Publication 800-123) at
http://csrc.nist.gov/publications/nistpubs/800-123/SP800-123.pdf.
Simplicity
Simplicity can be achieved by adopting new architectural designs, new
technologies, and network operating systems.
Figure 2.2 shows data center network design trends from a traditional data center
network, to a network consisting of a virtualized access tier and collapsed
aggregate and core tiers, to a network with improved network virtualization on
the WAN.
[Figure 2.2 residue: a Tier 2 aggregation layer of two MX480s collapsed with EX8208 switches connecting to the servers.]
Converged I/O is a new technology that simplifies the data center
infrastructure by supporting flexible storage and data access on the same network
interfaces on the server side, and by consolidating storage area networks (SANs)
and LANs into a single logical infrastructure. This simplification and consolidation
makes it possible to dynamically allocate any resource – including routing,
switching, security services, storage systems, appliances, and servers – without
compromising performance.
Keeping in mind that network devices are complex, designing an efficient hardware
platform is not, by itself, sufficient to achieve an effective, cost-efficient, and
operationally tenable product. Software in the control plane plays a critical role in
the development of features and in ensuring device usability. Because Junos is a
proven, modular network operating system that runs across different
platforms, implementing Junos is one of the best approaches to simplifying the
daily operations of the data center network.
In a recent study titled The Total Economic Impact™ of Juniper Networks Junos
Network Operating System, Forrester Consulting reported a 41 percent reduction in
overall network operational costs, based on dollar savings across specific tasks.
As the foundation of any high performance network, Junos exhibits the following
key attributes as illustrated in Figure 2.3:
• One operating system with a single source base and a single consistent feature
implementation.
• One software release train extended through a highly disciplined and firmly
scheduled development process.
• One common modular software architecture that stretches across many
different Junos hardware platforms, including the MX Series, EX Series, and
SRX Series.
[Figure 2.3 residue: Junos – one OS, one modular architecture, one API, and frequent releases, spanning security, routing, switching, and management.]
Performance
To address performance requirements related to server virtualization, centralization
and data center consolidation, the data center network should boost the
performance of all application traffic, whether local or remote. The data center
should offer LAN-like user experience levels for all enterprise users irrespective of
their physical location. To accomplish this, the data center network should optimize
applications, servers, storage and network performance.
WAN optimization techniques that include data compression, TCP and application
protocol acceleration, bandwidth allocation, and traffic prioritization improve
network traffic performance. In addition, these techniques can be applied to data
replication, and to backup and restoration between data centers and remote sites,
including disaster recovery sites.
Within the data center, Application Front Ends (AFEs) and load balancing solutions
boost the performance of both client-server and Web-based applications, as well
as speeding Web page downloads. In addition, designers must consider offloading
CPU-intensive functions, such as TCP connection processing and HTTP
compression, from backend applications and Web servers.
center, and from server to server between data centers in a flat Layer 2 network,
when these data centers are within reasonably close proximity. Virtual Chassis
with MPLS allows the Layer 2 domain to extend across data centers to support
live migration from server to server when data centers are distributed over
significant distances.
Juniper Networks virtualization technologies support low latency, throughput, QoS
and high availability required by server and storage virtualization. MPLS-based
virtualization addresses these requirements with advanced traffic engineering to
provide bandwidth guarantees, label switching and intelligent path selection for
optimized low latency and fast reroute for extreme high availability across the WAN.
MPLS-based VPNs enhance security with QoS to efficiently meet application and
user performance needs.
Innovation, such as green initiatives, influences data center design. A green
data center is a repository for the storage, management and dissemination of data
data center is a repository for the storage, management and dissemination of data
in which the mechanical, lighting, electrical and computer systems provide
maximum energy efficiency with minimum environmental impact. As older data
center facilities are upgraded and newer data centers are built, it is important to
ensure that the data center network infrastructure is highly energy and space
efficient.
Network designers should consider power, space and cooling requirements for all
network components, and they should compare different architectures and
systems so that they can ascertain the environmental and cost impacts across the
entire data center. In some environments, it might be more efficient to implement
high-end, highly scalable systems that can replace a large number of smaller
components, thereby promoting energy and space efficiency.
Green initiatives that track resource usage, carbon emissions, and efficient
utilization of resources such as power and cooling are important factors when
designing a data center. Among the many energy-efficient Juniper devices, the
MX960 is presented in Table 2.1 to demonstrate its effect on reducing energy
consumption and footprint within the data center.
[Figure residue: a core tier of two MX480s above EX4200 Virtual Chassis; servers with NICs/HEA (Host Ethernet Adapter) and management modules MM1/MM2 attach through virtual switches.]
Two MX960 routers are shown to indicate high availability between these devices,
providing end-to-end network virtualization for applications by mapping Virtual
Routing and Forwarding (VRF) in the MX Series to security zones in the SRX. In Figure
2.5 for example, the VRF #1 is mapped to security zones Firewall #1, NAT #1, and IPS
#1, and VRF #2 is mapped to Firewall #2 and NAT #2.
Access Tier
We typically deploy the EX4200 Ethernet Switch with Virtual Chassis Technology as
a top-of-rack virtual chassis in the access tier.
The EX4200, together with server virtualization technology, supports high availability
and high maintainability – two key requirements for mission critical, online
applications.
[Figure residue: servers uplink to top-of-rack Virtual Chassis via LAG plus backup links; primary and secondary VIOS connect to different Virtual Chassis; the Virtual Chassis uplink is a LAG.]
• The Power 570 Servers are deployed with dual Virtual I/O Servers (VIOS): the
primary VIOS runs in active mode while the secondary VIOS runs in standby
mode. The primary VIOS connects to one top-of-rack virtual chassis while the
secondary one connects to another top-of-rack virtual chassis.
• The typical bandwidth between the PowerVM’s VIOS and the top-of-rack
virtual chassis switch is 4Gbps, realized as 4 x 1Gbps ports in the NIC combined
in a LAG. The bandwidth can scale up to 8 Gbps by aggregating eight ports in a
LAG interface.
• The two Hardware Management Consoles (HMCs) connect to two different
top-of-rack virtual chassis, for example HMC 1 and HMC 2.
Besides preventing a single point of failure (SPOF), this approach also provides a
highly available maintenance architecture for the network: when a VIOS or virtual
chassis instance requires maintenance, operators can upgrade the standby VIOS or
virtual chassis while the environment runs business as usual, and then switch the
environment to the upgraded version without disrupting application service.
After addressing all the connectivity issues, we must not lose sight of the
importance of performance in the other network layers and network security
because we are operating the data center network as one secured network.
[Figure 2.7 residue: a core layer of two EX8200s; an access layer of EX4200 Virtual Chassis; a server layer of 56 IBM Power 570 systems with 4480 client partitions on each side.]
Figure 2.7 Top-Of-Rack Virtual Chassis with Seven EX4200s Connected to Power 570 Systems
MORE For further details concerning IBM PowerVM and EX4200 top-of-rack virtual
chassis scalability, refer to Implementing IBM PowerVM Virtual Machines on Juniper
Networks Data Center Networks at www.juniper.net/us/en/local/pdf/
implementation-guides/8010049-en.pdf.
Chapter 3
Implementation Overview
Multicast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
High Availability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Implementation Scenarios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
[Figure residue: multicast implementation topology. A streaming source attaches at the WAN edge and core/aggregation tier (LAG/VRRP, PIM); EX4200 Virtual Chassis 1 and Virtual Chassis 2 connect top-of-rack switches ToR1, ToR2, and ToR3 via LAG and STP to multicast receivers (IGMP hosts).]
NOTE Each individual implementation can differ based on network design and
requirements.
The topology described here consists of the following tiers and servers:
• The servers connect to the access tier through multiple 1GbE links with Link
Aggregation (LAG) to prevent a single point of failure (SPOF) on the physical
link and to improve bandwidth (a configuration sketch follows this list).
• The access switches connect to the core layer with multiple 10GbE links.
• At the core tier, the MX480s and EX8200s interconnect to each other using
redundant 10GbE links. These devices connect to the WAN edge tier, which
interconnects the different data centers and connects to external networks.
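The following Junos snippet sketches the kind of server-facing LAG just described. It is a minimal illustration rather than the validated configuration: the interface names ge-0/0/0, ge-0/0/1, and ae0 are placeholders, and LACP on the server side is assumed.

chassis {
    aggregated-devices {
        ethernet {
            device-count 2;    # number of aggregated interfaces on the switch
        }
    }
}
interfaces {
    ge-0/0/0 {
        ether-options {
            802.3ad ae0;    # add this member port to LAG ae0
        }
    }
    ge-0/0/1 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;    # actively negotiate LACP with the server
            }
        }
        unit 0 {
            family ethernet-switching;
        }
    }
}

On MX Series routers, the per-port statement is gigether-options rather than ether-options.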
For IBM System p and PowerVM, we discuss their production networks and
management networks. We also discuss key PowerVM server virtualization
concepts, including the Shared Ethernet Adapter (SEA) and the Virtual I/O Server (VIOS).
For the Juniper Networks MX and EX Series, we discuss the Junos operating system,
which runs on both the MX and EX Series platforms. In addition, we discuss the
jumbo frame Maximum Transmission Unit (MTU) setting.
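Jumbo frames are enabled by raising the interface MTU. A minimal sketch follows; the interface name and the 9216-byte value are illustrative assumptions, not the validated settings.

interfaces ge-0/0/0 {
    mtu 9216;    # jumbo frame MTU on the physical interface
}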
NOTE Both the MX Series and EX Series devices support all spanning tree protocols.
Spanning Tree Protocols, such as RSTP, MSTP, and VSTP, prevent loops in Layer
2-based access and aggregation layers. MSTP and VSTP are enhancements of
RSTP. MSTP is useful when it is necessary to divide a Layer 2 network into multiple
logical spanning tree instances. For example, it is possible to have two mutually
exclusive MSTP instances while maintaining a single broadcast domain. MSTP thus
provides better control throughout the network by dividing it into smaller regions,
and it is preferred when different devices must fulfill the role of the root bridge,
spreading that role across multiple devices.
NOTE When using MSTP, it is important to distribute the root bridge functionality across
an optimal number of devices without increasing the latency time during root
bridge election.
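To make that root-bridge distribution concrete, the following hedged MSTP sketch assigns different bridge priorities to two instances; the region name, VLAN numbers, and priority values are hypothetical. A peer switch would invert the priorities so that each switch is root for one instance.

protocols {
    mstp {
        configuration-name dc-region-1;    # must match on all switches in the region
        revision-level 1;
        msti 1 {
            vlan [ 100 200 ];
            bridge-priority 4k;     # low value: preferred root for MSTI 1
        }
        msti 2 {
            vlan 300;
            bridge-priority 16k;    # higher value: another switch wins root for MSTI 2
        }
    }
}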
Multicast
Multicast optimizes the delivery of video streaming and improves overall
network efficiency. In Chapter 6, we present multicast
implementation scenarios, including Protocol Independent Multicast (PIM) and
IGMP snooping.
In these scenarios, the video streaming client runs on IBM servers. PIM is
implemented on the core/aggregation tiers, while IGMP snooping is implemented
on the access tier.
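A hedged sketch of that division of labor follows; the rendezvous point address is a placeholder. PIM sparse mode runs on the core/aggregation device, and IGMP snooping runs on the access switch.

# On the core/aggregation tier (MX Series or EX8200):
protocols {
    pim {
        rp {
            local {
                address 10.255.0.1;    # this router acts as the rendezvous point
            }
        }
        interface all {
            mode sparse;
        }
    }
}
# On the access tier (EX4200):
protocols {
    igmp-snooping {
        vlan all;    # constrain multicast to ports with interested IGMP receivers
    }
}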
Performance
In Chapter 7, two methods for improving data center network performance are
covered in detail:
• Using CoS to manage traffic (see the sketch after this list).
• Considering latency characteristics when designing networks using Juniper
Networks data center network products.
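As a small, hypothetical illustration of the CoS method (the classifier, scheduler, and map names, the DSCP code point, and the 30 percent rate are assumptions, not values from the validated setup):

class-of-service {
    classifiers {
        dscp prod-classifier {
            forwarding-class expedited-forwarding {
                loss-priority low code-points ef;    # classify EF-marked traffic
            }
        }
    }
    schedulers {
        ef-sched {
            transmit-rate percent 30;    # reserve bandwidth for this class
            priority high;
        }
    }
    scheduler-maps {
        prod-map {
            forwarding-class expedited-forwarding scheduler ef-sched;
        }
    }
}

The classifier and scheduler map would then be applied to specific interfaces under the class-of-service interfaces stanza.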
Latency
The evolution of Web services and SOA has been critical to the integration of
applications that use standard protocols such as HTTP. This tight integration of
applications with web services has generated an increase of roughly 30 to 75
percent in east-west (server-to-server) traffic within the data center.
• Consolidating the number of devices and thus the tiers within the data center.
• Extending the consolidation between tiers using techniques such as virtual
chassis. With virtual chassis, multiple access layer switches can be grouped
logically to form a single switch. This reduces latency to a few
microseconds because traffic from the server does not need to be
forwarded through multiple devices to the aggregation layer.
In the latency implementation scenario, we primarily focus on how to configure the
MX480 for measuring Layer 2 and Layer 3 latency.
High Availability
High availability can provide continuous service availability when implementing
redundancy, stateful recovery from a failure, and proactive fault prediction. High
availability minimizes failure recovery time.
Junos OS provides several high availability features to improve user experience and
to reduce network downtime and maintenance. For example, features such as
virtual chassis (supported on EX4200), Non Stop Routing/Bridging (NSR/NSB,
both supported on MX Series), GRES, GR and Routing Engine Redundancy can help
increase availability at the device level. The Virtual Routing Redundancy Protocol
(VRRP), Redundant Trunk Group (RTG) and LAG features control the flow of traffic
over chosen devices and links. The ISSU feature on the MX Series reduces network
downtime for a Junos OS software upgrade. For further details concerning a variety
of high availability features, see Chapter 8: Configuring High Availability.
Each high availability feature can address certain technical challenges but may not
address all the challenges that today’s customers experience. To meet network
design requirements, customers can implement one or many high availability
features. In the following section, we discuss high availability features by comparing
their characteristics and limitations within the following groups:
• Neighbors are required to support graceful restart.
• GR can cause blackholing if a Routing Engine failure occurs due to a different cause.
Nonstop active routing/bridging and graceful restart are two different mechanisms
for maintaining high availability when a router restarts.
A router undergoing a graceful restart relies on its neighbors to restore its routing
protocol information. Graceful restart requires a restart process where the
neighbors have to exit a wait interval and start providing routing information to the
restarting router.
NSR/NSB does not require a restart process. Both the primary and backup Routing
Engines exchange updates with neighbors, so routing information exchange
continues seamlessly with the neighbors when the primary Routing Engine fails
and the backup takes over.
NOTE NSR cannot be enabled when the router is configured for graceful restart.
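A hedged sketch of enabling GRES together with NSR and NSB on a dual Routing Engine system follows; per the NOTE above, graceful restart is omitted.

system {
    commit synchronize;    # keep both Routing Engines' configurations in sync
}
chassis {
    redundancy {
        graceful-switchover;    # GRES: preserve interface and kernel state
    }
}
routing-options {
    nonstop-routing;    # NSR: maintain routing protocol state on the backup RE
}
protocols {
    layer2-control {
        nonstop-bridging;    # NSB: maintain Layer 2 protocol state
    }
}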
Graceful Routing Engine switchover enabled: Interface and kernel information is preserved during switchover, and the switchover is faster because the Packet Forwarding Engines are not restarted. Considerations: the new primary Routing Engine restarts the routing protocol process (rpd), and all adjacencies are aware of the router's change in state.

Graceful Routing Engine switchover and nonstop active routing enabled: Traffic is not interrupted during the switchover; interface, kernel, and routing protocol information is preserved. Considerations: unsupported protocols must be refreshed using the normal recovery mechanisms inherent in each protocol.

Graceful Routing Engine switchover and graceful restart enabled: Traffic is not interrupted during the switchover; interface and kernel information is preserved, and graceful restart protocol extensions quickly collect and restore routing information from the neighboring routers. Considerations: neighbors are required to support graceful restart and a wait interval is required; the routing protocol process (rpd) restarts, and for certain protocols a significant change in the network can cause graceful restart to stop.
Virtual Chassis
Between 2 and 10 EX4200 switches can be connected and configured to form a
single virtual chassis that acts as a single logical device to the rest of the network. A
virtual chassis typically is deployed in the access tier. It provides high availability to
the connections between the servers and access switches. The servers can be
connected to different member switches of the virtual chassis to prevent SPOF.
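A minimal two-member virtual chassis sketch follows; the member numbers and priority values are illustrative.

virtual-chassis {
    member 0 {
        mastership-priority 255;
    }
    member 1 {
        mastership-priority 255;    # equal high priority avoids master preemption
    }
}

Setting both members to the same high mastership priority is a common way to prevent a recovered member from preempting the active master.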
At any time, one of the VRRP routers is the master (active) and the others are
backups. If the master fails, one of the backup routers becomes the new master
router, thus always providing a virtual default router and allowing traffic on the LAN
to be routed without relying on a single router.
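For example, a hedged VRRP sketch for one of the two routers (the addresses, group number, and priority are placeholders); servers would use the virtual address 11.11.1.254 as their default gateway:

interfaces ge-0/0/1 {
    unit 0 {
        family inet {
            address 11.11.1.2/24 {
                vrrp-group 1 {
                    virtual-address 11.11.1.254;    # shared gateway address
                    priority 200;                   # higher priority wins mastership
                    preempt;                        # reclaim mastership after recovery
                }
            }
        }
    }
}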
Implementation Scenarios
Table 3.3 summarizes the implementation scenarios presented in this handbook. It
provides mapping between each scenario, network tier, and devices. Using this
table as a reference, you can map the corresponding chapter to each particular
implementation scenario.
Spanning Tree (MSTP/RSTP/VSTP) – Chapter 5:
Access-Aggregation/Core: EX4200; Aggregation-Aggregation: EX8200; Aggregation-Core: MX Series
PIM – Chapter 6:
Access: EX4200; Aggregation/Core: EX8200, MX Series
IGMP snooping – Chapter 6:
Access: EX4200, EX8200, MX Series
CoS – Chapter 7:
Access: EX4200; Aggregation/Core: EX8200, MX Series
VRRP – Chapter 8:
Access: EX4200; Aggregation/Core: EX8200, MX Series
RTG – Chapter 8:
Access-Aggregation: EX Series only
Routing Engine Redundancy – Chapter 8:
Aggregation/Core: MX Series, EX8200
GR – Chapter 8:
Access: EX4200; Aggregation/Core: EX8200, MX Series
LAG – Chapter 8:
Access-Server: EX4200; Aggregation/Core: EX8200, MX Series
Table 3.4 functions as a reference aid to help our customers thoroughly understand
how Juniper Networks products and features, which are available in Junos 9.6, can
be implemented into their networks. This table summarizes implementation
scenarios and their supported products that are defined in detail later in this guide.
High Availability: NSR/NSB and ISSU are supported on the MX Series only.
Performance: CoS is supported on the EX4200, EX8200, and MX Series.
Multicast: PIM is supported on the EX4200, EX8200, and MX Series.
Chapter 4
Connecting IBM Servers in the Data Center Network
[Figure residue: a hypervisor carrying VLAN 100 and VLAN 200 connects to an EX4200 carrying the same VLANs.]
The VIOS, also called the Hosting Partition, is a special-purpose LPAR in the server,
which provides virtual I/O resources to client partitions. The VIOS owns the
resources, such as physical network interfaces and storage connections. The
network or storage resources, reachable through the VIOS, can be shared by client
partitions running on the machine, enabling administrators to minimize the number
of physical servers deployed in their network.
In PowerVM, client partitions can communicate among each other on the same
server without requiring access to physical Ethernet adapters. Physical Ethernet
adapters are required to allow communication between applications running in the
client partitions and external networks. A Shared Ethernet Adapter (SEA) in VIOS
bridges the physical Ethernet adapters from the server to the virtual Ethernet
adapters functioning within the server.
Because the SEA functions at Layer 2, the original MAC address and VLAN tags of
the frames associated with the client partitions (virtual machines) are visible to
other systems in the network. For further details, refer to IBM’s white paper Virtual
Networking on AIX 5L at www.ibm.com/servers/aix/whitepapers/aix_vn.pdf.
[Figure 4.2 residue: a management client reaches the HMC over the out-of-band management network; the HMC private management network connects the HMC to the FSPs of servers SRV 1 and SRV 2, each hosting a VIOS and LPARs 1-3 with their NICs.]
As illustrated in Figure 4.2, IBM Power Systems management requires two
networks: an out-of-band management network and an HMC private management network.
The out-of-band management network connects HMC and client networks so that
a client’s request for access can be routed to the HMC. A HMC private management
network is dedicated for communication between the HMC and its managed
servers. The network uses a selected range of non-routable IP addresses, and the
Dynamic Host Configuration Protocol (DHCP) server is available in the HMC for IP
allocation. Each p server connects to the private management network through its
Flexible Service Processor (FSP) ports.
Through the HMC private management network, the HMC manages servers in the
following steps:
1. Connects the p server’s FSP port to the HMC private management network so
that HMC and the server are in the same broadcast domain, and HMC runs
DHCP server (dhcpd).
2. Powers on the server. The server’s FSP runs the DHCP client and requests a new
IP address. FSP gets the IP address, which is allocated from HMC.
3. HMC communicates with the server and updates its managed server list with
this new server.
4. HMC performs operations on the server, for example powers on/off the server,
creates LPAR, sets shared adapters (Host Ethernet and Host Channel) and
configures virtual resources.
To allocate (or remove) the NIC on the LPAR, perform the following steps:
1. Select LPAR.
2. Select: Configuration >> Manage Profiles.
3. Select the profile that you want to change.
4. Select the I/O tab.
5. Select NIC (physical I/O resource).
6. Click Add to add the NIC (or Remove to remove the NIC).
7. Select OK to save changes, then click Close.
NOTE The NIC can be allocated to multiple profiles. Because NIC allocation is
exclusive during profile runtime, only one profile can activate and use the NIC. If
the NIC is already used by one active LPAR and you attempt to activate another
LPAR that requires the same NIC adapter, the activation process is aborted.
Adding or removing the NIC requires reactivating the LPAR profile.
Similar to a physical Ethernet adapter on the physical server, the virtual Ethernet
adapter on the partition provides network connectivity to the virtual Ethernet
switch. When you create a virtual Ethernet adapter on the partition from the HMC,
the corresponding virtual port is created on the virtual Ethernet switch and there is
no need to attach explicitly a virtual Ethernet adapter to a virtual port.
The virtual Ethernet adapter and virtual Ethernet switch form a virtual network
among the client partitions so that partitions running on the same physical server
can communicate with each other. The VIOS is required for client partitions to
access the physical network outside the physical server. As shown in Figure 4.4,
three LPARs and the VIOS connect to two virtual Ethernet switches through virtual
Ethernet adapters. The VIOS also connects to the physical NIC, so that LPAR2 and
LPAR3 can communicate with each other, while LPAR1, LPAR2, and the VIOS can
communicate with each other and access the external physical network
through the physical NIC.
[Figure 4.4 residue: server SRV 1 hosting LPARs 1-3 and the VIOS, with the FSP connecting to the HMC.]
7. In the Virtual Ethernet Adapter Properties window (as shown in Figure 4.5),
enter the following:
a. Adapter ID (a default value displays).
b. VSwitch, the virtual Ethernet switch that this adapter connects to.
c. VLAN ID, the VLAN ID for untagged frames; the VSwitch will add/remove
the VLAN header.
d. Select the checkbox This adapter is required for partition activation.
e. Select the checkbox IEEE 802.1q compatible adapter to control whether
VLAN-tagged frames are allowed on this adapter.
f. Use Add, Remove, New VLAN ID, and Additional VLANs to add or
remove VLAN IDs that are allowed for VLAN-tagged frames.
g. Select the checkbox Access external network only on LPARs used
for bridging traffic from the virtual Ethernet switch to some other NIC.
Typically this should be kept unchecked for regular LPARs and checked
for VIOS.
h. Click OK to save changes made in the profile and then select Close.
To Remove a Virtual Ethernet Adapter
NOTE Make sure that the Access External network option is checked when the virtual
Ethernet adapter is created on VIOS.
Table 4.1 lists and defines the parameters associated with this command.

target_device: the physical port that connects to the external network, on a NIC exclusively allocated to the VIOS, an LPAR, or LHEA.
virtual_ethernet_adapters: one or more virtual Ethernet adapters that the SEA will bridge to target_device (typically only one adapter).
DefaultVirtualEthernetAdapter: the default virtual Ethernet adapter that will handle untagged frames (typically the same as the previous parameter).
SEADefaultPVID: the VID for the default virtual Ethernet adapter (typically has the value of 1).
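These parameters correspond to the VIOS mkvdev command used to create the SEA. A minimal sketch, assuming ent0 is the physical adapter and ent2 is the virtual Ethernet adapter created earlier (the device names are illustrative):

$ mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1

Here ent0 is the target_device, ent2 serves as both the bridged virtual Ethernet adapter and the default adapter for untagged frames, and the default PVID is 1.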
[Figure residue: server SRV 1 with LPARs 1-2, the VIOS, and a NIC; a second figure shows SRV 1 with LPARs 1-3, the VIOS, the FSP connected to the HMC, and an HEA with external ports HEA Ext Port 1 and HEA Ext Port 2.]
Because HEA creates a virtual network for the client partitions and bridges the
virtual network to the physical network, it replaces the need for both the virtual
Ethernet and the Shared Ethernet Adapter. In addition, HEA enhances performance
and improves utilization for Ethernet because HEA eliminates the need to move
packets (using virtual Ethernet) between partitions and then through a SEA to the
physical Ethernet interface. For detailed information, refer to IBM’s Redpaper
Integrated Virtual Ethernet Adapter Technical Overview and Introduction at
www.redbooks.ibm.com/abstracts/redp4340.html.
HEA is configured through HMC. The following list includes some HEA
configuration rules:
[Figure 4.10 residue: a p6 server (VIOS, RHEL, SUSE, AIX 5.3, and AIX 6.1 LPARs; FSP; Host Ethernet Adapter and the NIC under test) and a p5 server (VIOS, RHEL, and SUSE LPARs; FSP) connect through Ethernet switches to a private network (192.168.128.0/17) with keyboard and monitor, a management network (172.28.113.0/24) with a management workstation (Web client, Telnet/SSH client), and a production network (11.11.1.0/24) with the DUT.]
Figure 4.10 IBM Power Series Servers, LPARs, and Network Connections
HMC runs on a Linux server with two network interfaces: one connects to a private
network for all managed P5/P6 systems (the on-board Ethernet adapter on the
servers, controlled by the FSP); the other connects to a management network,
where the management workstation accesses the HMC Web interface through a
Web browser.
There are two ways to set up communication with LPARs (logical partitions):
The VIOS LPAR, which is a special version of AIX, performs the bridging between the
virtual Ethernet switch (implemented in the Hypervisor) and the external port. To
bridge frames between the physical adapter on the NIC and the virtual Ethernet
adapter connected to the virtual Ethernet switch, another logical device (the SEA)
is created in the VIOS.
As illustrated in Figure 4.11, the typical network deployment with the access switch
and LPAR (virtual machine) is as follows:
• The access switch connects to physical NIC, which is assigned to ent1 in VIOS.
• The ent3 (SEA) bridges ent1 (physical NIC) and ent2 (virtual Ethernet
adapters).
• The ent2 (virtual Ethernet adapter) is created and dedicated to LPAR which
runs Red Hat Enterprise Linux.
• The ent3 also supports multiple VLANs. Each VLAN is associated with one
logical Ethernet adapter, for example ent4.
[Figure 4.11 residue: the Ethernet switch connects to the physical NIC (ent1) in the VIOS; the SEA (ent3) bridges ent1 and the virtual Ethernet adapter ent2 on Virtual Switch 1; ent4 is a logical Ethernet adapter for an additional VLAN; the LPAR runs RHEL.]
[Figure residue: Junos software architecture. The Routing Engine runs the Junos kernel and user processes (including SNMP and routing) and maintains the forwarding table; the Packet Forwarding Engine runs an embedded microkernel.]
Routing Engines
The Routing Engine runs the Junos operating system, which includes the FreeBSD
kernel and the software processes. The primary processes include the
device control process (dcd), routing protocol process (rpd), chassis process
(chassisd), management process (mgd), traffic sampling process (sampled),
automatic protection switching process (apsd), simple network management
protocol process (snmpd), and system logging process (syslogd). The Routing
Engine installs directly into the control plane and interacts with the Packet
Forwarding Engine.
The Routing Engine itself is never involved in the forwarding of packets. The ASICs in
the forwarding path identify and send to the Routing Engine only exception
packets and routing control packets for processing. Security mechanisms are in
place to prevent the Routing Engine (and control traffic) from being attacked
or overwhelmed by these packets. Packets sent to the control plane from the
forwarding plane are rate limited to protect the router from DoS attacks. The
control traffic is protected from excess exception packets using multiple queues
that provide a clean separation between the two. The packets are prioritized by the
packet-handling interface, which sends them to the correct queues for appropriate
handling.
Redundant functional components in network devices prevent SPOF and
increase availability and reliability. Juniper Networks devices are typically configured
with a single Routing Engine and Packet Forwarding Engine. To achieve high
availability and reliability, the user has two options:
Junos Processes
Junos processes run on the Routing Engine and maintain the routing tables, manage
the routing protocols used on the router, control the router interfaces, control some
chassis components, and act as the interface for system management and user
access to the router. Major processes are discussed in detail later in this section.
A Junos process is a UNIX process that runs nonstop in the background while the
machine is running. All of the processes operate through the Command Line
Interface (CLI). Each process is a piece of the software with a specific function
or area to manage. The processes run in separate, protected address spaces.
The following sections briefly cover two major Junos processes: the routing protocol
process (rpd) and the management process (mgd).
Routing Protocol Process
This process starts all configured routing protocols and handles all routing
messages. It maintains one or more routing tables, which consolidate the routing
information learned from all routing protocols. From this routing information, the
rpd process determines the active routes to network destinations and installs these
routes into the Routing Engine's forwarding table. Finally, the rpd process
implements the routing policy, which enables an operator to control the routing
information that is transferred between the routing protocols and the routing table.
Using a routing policy, operators can filter and limit the transfer of information as
well as set properties associated with specific routes.
NOTE RPD handles both unicast and multicast routing protocols, which deliver
data to a single destination and to many destinations, respectively.
Management Process
Several databases connect to the management process (mgd). The config schema
database merges the packages /usr/lib/dd/libjkernel-dd.so, /usr/lib/dd/
libjroute-dd.so and /usr/lib/dd/libjdocs-dd at initialization time to create /var/
db/schema.db, which defines the user interface (UI). The config database
holds /var/db/juniper.db.
The mgd works closely with the CLI, allowing the CLI to communicate with all the
other processes. Mgd knows which process is required to execute user commands.
When the user enters a command, the CLI communicates with mgd over a UNIX
domain socket using Junoscript, an XML-based remote procedure call (RPC)
protocol. The mgd is connected to all the processes, and each process has a UNIX
domain management socket.
If the command is legal, the socket opens and mgd sends the command to the
appropriate process. For example, the chassis process (chassisd) implements the
actions for the command show chassis hardware. The process sends its response
to mgd in XML form and mgd relays the response back to the CLI.
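As a hedged illustration of that exchange, the show chassis hardware command travels from the CLI to mgd as an XML RPC along these lines (reply contents abbreviated):

<rpc>
    <get-chassis-inventory/>
</rpc>
<!-- mgd routes the request to chassisd; the inventory returns inside an rpc-reply element that the CLI renders as text -->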
Mgd plays an important part in the commit check phase. When you edit a
configuration on the router, you must commit the change for it to take effect. Before
the change actually is made, mgd subjects the candidate configuration to a check
phase. The management process writes the new configuration into the config db
(juniper.db).
• Fault management: includes device monitoring and detecting and fixing faults
• Configuration management
• Accounting management: collects statistics for accounting purposes
• Performance management: monitors and adjusts device performance
• Security management: controls device access and authenticates users
The following interfaces (APIs) typically are used to manage and monitor Juniper
Networks network devices:
• CLI
• J-Web
• SNMP
• NETCONF
In addition, Junos also supports other management interfaces to meet various
requirements from enterprise and carrier providers, including J-Flow, sFlow,
Ethernet OAM, TWAMP, etc.
Table 4.2 Methods for Connecting IBM Servers to Juniper Switches and Routers

The network device acts as a Layer 2 switch: To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and the IBM server's NIC are in the same Layer 2 broadcast domain. Because the network device interfaces are not configured with Layer 3 IP addresses, they do not provide routing functionality.

The network device acts as a switch with a Layer 3 address: To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and the IBM server's NIC are in the same Layer 2 broadcast domain. The network device interfaces are configured with Layer 3 IP addresses so that they can route traffic to other connected networks.

The network device acts as a router: To the IBM servers, the network device appears as a Layer 3 router with a single Ethernet interface and IP address. The network device does not provide Layer 2 switching functionality.
In the next section, several different but typical methods for configuring the MX
Series routers and EX Series switches are presented.
Ethernet interfaces in MX Series routers can support one or many VLANs. Each
Ethernet VLAN is mapped into one logical interface. If logical interfaces are used to
separate traffic to different VLANs, we recommend using the same numbers for
logical interface (unit) and VLAN ID. For instance, the logical interface and the VLAN
ID in the following sample use the same number (100):
interfaces ge-5/1/5 {
unit 0 {
family bridge;
}
}
interfaces ge-5/1/7 {
vlan-tagging;
encapsulation flexible-ethernet-services;
unit 100 {
encapsulation vlan-bridge;
family bridge;
}
}
bridge-domains {
Data01 {
domain-type bridge;
vlan-id 100;
interface ge-5/1/5.0;
interface ge-5/1/7.100;
}
}
In addition, IRB on the MX Series provides simultaneous support for Layer 2 bridging
and Layer 3 routing on the same interface, such as irb.100, so that local packets
can be routed to another routed interface or to another bridging domain that has
a Layer 3 protocol configured.
interfaces ge-5/1/5 {
unit 0 {
family bridge;
}
}
interfaces ge-5/1/7 {
vlan-tagging;
encapsulation flexible-ethernet-services;
unit 100 {
encapsulation vlan-bridge;
family bridge;
}
}
interfaces irb {
unit 100 {
family inet {
address 11.11.1.1/24;
}
}
}
bridge-domains {
Data01 {
domain-type bridge;
vlan-id 100;
interface ge-5/1/5.0;
interface ge-5/1/7.100;
routing-interface irb.100;
}
}
Define the configuration of Layer 2 broadcast (bridge) domains under the vlans stanza.
Interface membership in VLANs can be defined using one of the following two
methods:
The Ethernet interfaces in EX Series switches can support one or many VLANs. Each
VLAN is mapped into one logical interface. If logical interfaces are used to separate
traffic to different VLANs, we recommend using the same numbers for the logical
interface (unit) and VLAN ID. For example, the logical interface and the VLAN ID in
the following sample use the same number (100):
interfaces ge-5/1/5 {
unit 0 {
family ethernet-switching;
}
}
interfaces ge-5/1/7 {
unit 0 {
family ethernet-switching {
port-mode trunk;
}
}
}
vlans {
Data01 {
vlan-id 100;
interface {
ge-5/1/5.0;
ge-5/1/7.0;
}
}
}
In addition, the EX Series Ethernet switches support routed interfaces called Routed
VLAN Interfaces (RVI). RVIs are needed to route traffic from one VLAN to
another. As opposed to IRB, which routes bridge domains, RVI routes VLANs. In the
following code, the RVI interface with IP address 11.11.1.1/24 is associated with VLAN
100 logical interface.
interfaces ge-5/1/5 {
unit 0 {
family ethernet-switching;
}
}
interfaces ge-5/1/7 {
unit 0 {
family ethernet-switching {
port-mode trunk;
}
}
}
interfaces vlan {
unit 100 {
family inet {
address 11.11.1.1/24;
}
}
}
vlans {
Data01 {
vlan-id 100;
interface {
ge-5/1/5.0;
ge-5/1/7.0;
}
l3-interface vlan.100;
}
}
When the network device acts as a router, the VLAN is terminated directly on a
Layer 3 logical interface, as in the following sample:
interfaces ge-5/0/0 {
description "P6-1";
vlan-tagging;
unit 30 {
description Data01;
vlan-id 30;
family inet {
address 11.11.1.1/24;
}
}
}
Chapter 5
Configuring Spanning Tree Protocols
Configuring RSTP/MSTP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Configuring VSTP/PVST+/Rapid-PVST+. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Typically, STP is supported only on legacy equipment and has been replaced by
RSTP and other variants of Spanning Tree. Support for RSTP is mandatory on all
devices that are capable of spanning tree functionality. When interoperating with
legacy switches, an RSTP-capable switch automatically reverts to STP. We discuss
STP in this chapter to provide background on spanning tree functionality.
The root bridge is elected based on priority: the switch assigned the lowest priority
value is elected as the root. The ports on a switch that are closest (in cost) to the
root bridge become the Root Ports (RP).
NOTE There can only be one RP on a switch. A root bridge cannot have an RP.
The ports that have a least cost to the Root Bridge in the network are known as the
Designated Ports (DP). Ports that are not selected as RP or DP are considered to
be Blocked. An optimized active path based on bridge/port priority and cost is
chosen to forward data in the network. The BPDUs that provide the information on
the optimal path are referred to as “superior” BPDUs while those that provide
sub-optimal metrics are referred to as “inferior” BPDUs.
BPDUs mainly consist of fields – the root bridge ID, the root path cost, the sender bridge ID, and the port ID – that are used as the basis for determining the optimal forwarding topology. RSTP improves convergence over STP in the following ways:
• Generating and transmitting BPDUs from all nodes at the configured Hello interval, irrespective of whether they receive any BPDUs from the RP. This allows the nodes to monitor any loss of Hello messages and thus detect link failures more quickly than STP.
• Expediting changes in topology by directly transitioning a port (either an edge port or a port connected to a point-to-point link) from the blocked to the forwarding state.
• Providing a distributed model in which all bridges in the network actively participate in network connectivity.
Figure: Spanning tree topology – the root bridge's ports are designated ports (DP); the downstream switches have root ports (RP), designated ports, and an alternate port (ALT). (RP – Root Port, DP – Designated Port, ALT – Alternate Port)
RSTP classifies interfaces into the following link types:
• Point to Point
• Edge
• Shared or Non-edge
Point to Point
A point-to-point (P2P) interface provides a direct connection between two
switches. Usually, a full duplex interface is set automatically to be P2P.
Edge
The edge interface is another enhancement in RSTP that helps reduce convergence
time when compared to STP. Ports connected to servers (there are no bridges
attached) are typically defined as edge ports. Any changes that are made to the
status of the edge port do not result in changes to the forwarding network topology
and thus are ignored by RSTP.
Shared or Non-edge
A Shared or Non-edge interface is an interface that is half-duplex or has more than
two bridges on the same LAN.
Compared to STP, RSTP separates the concepts of port state and port role; the state and role of an RSTP port are independent. A port can send or receive BPDUs or data based on its current state, while the role of a port depends on its position in the network. The role of a port can be determined by performing a BPDU comparison during convergence.
Table 5.1 shows the mapping between RSTP port states and roles.
Port Role     Port State
Root          Forwarding
Designated    Forwarding
Alternate     Discarding
Backup        Discarding
Disabled      Discarding
The Alternate role in RSTP is analogous to the Blocked port in STP. Defining an edge
port allows a port to transition into a forwarding state, eliminating the 30-second
delay that occurs with STP.
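The edge and no-root-port statements appear in the configuration snippets later in this chapter. As a minimal sketch, a server-facing edge port can be configured under the [edit protocols rstp] hierarchy as follows (the interface name is illustrative):

protocols {
    rstp {
        interface ge-0/0/15.0 {
            edge;          // server-facing port; transitions straight to forwarding
            no-root-port;  // prevent this port from ever becoming the root port
        }
    }
}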
Each Multiple Spanning Tree Instance (MSTI) has a spanning tree associated with it, and RSTP-based spanning tree tables are maintained per MSTI. Using the Common and Internal Spanning Tree (CIST) to distribute this information over the common instance minimizes the exchange of spanning-tree-related packets, and thus network traffic, between regions.
MSTP is compatible with STP and makes use of RSTP for its convergence algorithms.
Figure 5.2 Three MSTIs – MSTI-A (VLAN 501), MSTI-B (VLANs 990 and 991), and MSTI-C (VLANs 100, 200, and 300). BPDUs internal to an instance stay within it; the CIST carries BPDUs between instances.
Figure 5.2 shows the three MSTIs: A, B, and C. Each of these instances consists of
either one or more VLANs. BPDUs specific to the particular instance are exchanged
within each of the MSTIs. The CIST handles all BPDU information which is required
to maintain the topology across the regions. CIST is the instance that is common to
all regions.
With MSTP, bridge priorities and related configurations can be applied on a per-instance basis. Thus, the root bridge of one instance is not necessarily the root bridge of another instance.
When configuring VSTP, the bridge priorities and the rest of the spanning tree
configuration can be applied on a per VLAN basis.
NOTE When configuring VSTP, pay close attention to the platform support and interoperability summarized in the following table:

Protocols                              EX4200   EX8200   MX Series                                                                  IBM BladeCenter (Cisco ESM)
STP                                    STP      STP      Configuration not supported; STP works with RSTP (backwards compatible)   Configuration not supported; STP works with MSTP/PVST (backwards compatible)
RSTP                                   RSTP     RSTP     RSTP                                                                       Configuration not supported; RSTP works with MSTP/PVST (backwards compatible)
MSTP                                   MSTP     MSTP     MSTP                                                                       MSTP
PVST+ (Cisco) / VSTP (Juniper)         VSTP     VSTP     VSTP                                                                       PVST+
Rapid-PVST+ (Cisco) / VSTP (Juniper)   VSTP     VSTP     VSTP                                                                       Rapid-PVST+
Configuring RSTP/MSTP
Figure 5.3 shows a sample MSTP network that can be used to configure and verify RSTP/MSTP functionality. The switches and the IBM BladeCenter connect in a full mesh and are assigned the priorities shown in the figure:
Figure 5.3 Sample MSTP network: the MX480, EX8200, and two EX4200 switches (172.28.113.175 and 172.28.113.180) are meshed, with bridge priorities of 8K, 16K, 0K, and 32K assigned across the devices and VLANs 71 and 1122 trunked between them; IBM BladeCenter blades 6 through 10 (11.22.1.6 through 11.22.1.10) attach through trunk ports 17, 18, and 19 on the BladeCenter switch modules. Server connections are simulated to each DUT via the eth 1 interface connected to the BladeCenter Pass-Through Module.
Configuration Snippets
The following excerpt, from the [edit protocols mstp] hierarchy, pertains to the EX4200-A (RSTP/MSTP):
priority 224;
}
interface ge-0/0/21.0 {
priority 192;
}
interface ge-0/0/23.0 {
priority 224;
}
msti 1 {
bridge-priority 8k;
vlan 1122;
}
msti 2 {
bridge-priority 8k;
vlan 71;
interface ge-0/0/23.0 {
priority 224;
}
}
The following excerpt, also at the [edit protocols mstp] hierarchy, defines per-interface priorities and the two MSTIs on a second switch in the mesh:
interface ge-0/0/13.0 {
priority 224;
}
interface ge-0/0/14.0 {
priority 192;
}
interface ge-0/0/15.0 {
priority 240;
edge;
no-root-port;
}
// Define MSTI-1, provide a bridge priority for the instance. Associate a VLAN
// with the instance.
msti 1 {
bridge-priority 0;
vlan 1122;
}
// Define MSTI-2, provide a bridge priority for the instance. Associate a VLAN
// and interface with the instance.
msti 2 {
bridge-priority 0;
vlan 71;
interface ge-0/0/13.0 {
priority 224;
}
}
The following RSTP stanza pertains to the MX480:
rstp {
bridge-priority 40k;
interface ge-5/1/1 {
priority 240;
}
interface ge-5/1/2 {
priority 240;
}
interface ge-5/2/2 {
priority 240;
}
interface ge-5/3/3 {
priority 240;
}
interface ge-5/3/4 {
priority 240;
edge;
no-root-port;
}
}
chandra@HE-RE-0-MX480# show protocols mstp
bridge-priority 8k;
interface ge-5/1/1 {
priority 224;
}
interface ge-5/1/2 {
priority 192;
}
interface ge-5/2/2 {
priority 192;
}
interface ge-5/3/3 {
priority 224;
}
interface ge-5/3/4 {
priority 240;
edge;
no-root-port;
}
msti 1 {
bridge-priority 4k;
vlan 1122;
}
msti 2 {
bridge-priority 4k;
vlan 71;
interface ge-5/1/1 {
priority 224;
}
}
Verification
Based on the sample network, administrators can verify the RSTP/MSTP configuration by issuing show commands to confirm that two MSTI instances and one common MSTI-0 instance are present on each switch, along with the VLANs associated with each of them. Each instance should have its own RP (ROOT), BP (ALT), and DP (DESG). On the root bridge, all MSTI-0 ports should be in the forwarding state.
1. Check that only the information from instance MSTI-0 (but not MSTI-1 and
MSTI-2) is available on all switches.
2. Confirm that there is only one direct path to any other interface within each
MSTI instance on a switch. All other redundant paths should be designated as
Blocked. Use the show spanning-tree interface command for this purpose.
3. Verify that a change in priority on any MSTI instance on a switch is propagated
through the entire mesh using the show spanning-tree interface command.
Configuring VSTP/PVST+/Rapid-PVST+
Figure 5.4 depicts a sample network consisting of a mesh of EX8200/4200 and MX480 devices with the Cisco ESM switch. For interoperability, VSTP must be enabled on the Juniper devices and PVST+ on the Cisco device. Two VLANs, 1122 and 71, are created on all devices; VSTP is enabled for both of these VLANs.
Figure 5.4 Sample VSTP/PVST+ network: the MX480, EX8200, and two EX4200 switches (including 172.28.113.175) are meshed with the Cisco ESM in the IBM BladeCenter over trunk ports 17 and 18, with VLANs 71 and 1122 trunked throughout; per-VLAN bridge priorities (bc_ext and bc_int) are assigned per device, and blades 6 through 10 (11.22.1.6 through 11.22.1.10) attach over eth 1. Server connections are simulated to each DUT via the eth 1 interface connected to the BladeCenter's Pass-Through Module for blade slots 6 through 10.
Table 5.4 lists the bridge priorities for each of the VLANs.
VLAN    EX4200-A   EX4200-B   EX8200   MX480
71      8K         4K         12K      16K
1122    16K        32K        24K      16K
Verification
Based on the sample setup as shown in Figure 5.4, verify interoperability of the
VSTP configuration with Cisco PVST+ by performing the following steps.
1. Verify that each of the switches with VSTP/PVST+ enabled has two spanning trees corresponding to the two VLANs. Each VLAN has its own RP (ROOT), BP (ALT), and DP (DESG). Use the show spanning-tree interface command.
chandra@SPLAB-EX-180> show spanning-tree interface
Spanning tree interface parameters for VLAN 1122
Interface Port ID Designated Designated Port State Role
port ID bridge ID Cost
ge-0/0/10.0 128:523 128:513 17506.0019e2544040 20000 FWD ROOT
ge-0/0/12.0 224:525 224:525 33890.0019e2544ec0 20000 FWD DESG
ge-0/0/14.0 240:527 240:213 17506.001db5a167d0 20000 BLK ALT
ge-0/0/15.0 240:528 240:528 33890.0019e2544ec0 20000 FWD DESG
Spanning tree interface parameters for VLAN 71
Interface Port ID Designated Designated Port State Role
port ID bridge ID Cost
ge-0/0/10.0 128:523 128:523 4167.0019e2544ec0 20000 FWD DESG
ge-0/0/13.0 224:526 224:526 4167.0019e2544ec0 20000 FWD DESG
ge-0/0/14.0 240:527 240:527 4167.0019e2544ec0 20000 FWD DESG
2. Confirm that there is only one direct active path per VLAN instance to any other non-root bridge. All redundant paths should be identified as Blocked. Use the output of the show spanning-tree interface command for this purpose.
3. Reboot the root bridge; the device with the next-lowest priority value should step up as the root for the particular VLAN. This information must be updated in the VLAN table on all devices.
4. Verify that the original root bridge becomes the primary (active) again after the reboot. This information should be updated on all devices in the mesh.
NOTE Any change in bridge priorities in any of the VSTP instances must be propagated through the mesh.
Configuration Snippets
The following code, at the [edit protocols vstp] hierarchy, pertains to the MX480.
vlan 1122 {
bridge-priority 16k;
interface ge-5/1/1 {
priority 240;
}
interface ge-5/1/2 {
priority 240;
}
interface ge-5/2/2 {
priority 240;
}
interface ge-5/3/3 {
priority 240;
}
interface ge-5/3/4 {
priority 240;
}
}
Chapter 6
Configuring the Internet Group Management Protocol
IGMP Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Routers use a group membership protocol to learn about the presence of group
members on directly attached subnetworks. When a host joins a multicast group, it
transmits a group membership protocol message to the group, and sets its IP
process and network interface card to receive frames addressed to the multicast
group.
Junos software supports IP multicast routing with many protocols, including IGMP, PIM, DVMRP, and MSDP.
Figure: A multicast router delivers UDP/RTP multicast traffic over the LAN to a video client through a Layer 2 switch with IGMP snooping.
IGMP manages the membership of hosts and routers in multicast groups. IP hosts
use IGMP to report their multicast group memberships to any neighboring multicast
routers. In addition, IGMP is used as the transport for several related multicast
protocols, such as DVMRP and PIMv1. IGMP has three versions that are supported
by hosts and routers:
IGMPv1 – The original protocol defined in RFC 1112. An explicit join message is sent
to the router, but a timeout is used to determine when hosts leave a group.
IGMPv2 – Defined in RFC 2236. Among other features, IGMPv2 adds an explicit
leave message to the join message so that routers can easily determine when a
group has no listeners.
IGMPv3 – Defined in RFC 3376. IGMPv3 supports the ability to specify which sources can send to a multicast group. This type of multicast group is called a source-specific multicast (SSM) group, and its multicast address range is 232/8. IGMPv3 is also backwards compatible with IGMPv1 and IGMPv2.
For SSM mode, we can configure the multicast source address so that the source
can send the traffic to the multicast group. In this example, we create group 225.1.1.1
and accept IP address 10.0.0.2 as the only source.
NOTE The SSM configuration requires that the IGMP version on the interface be set to
IGMPv3.
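A minimal sketch of such an SSM configuration, using the group and source addresses named above (the interface name ge-0/0/0.0 is illustrative):

protocols {
    igmp {
        interface ge-0/0/0.0 {
            version 3;                 // SSM requires IGMPv3 on the interface
            static {
                group 225.1.1.1 {
                    source 10.0.0.2;   // accept this address as the only source
                }
            }
        }
    }
}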
NOTE When we configure static IGMP group entries on point-to-point links that connect
routers to a rendezvous point (RP), the static IGMP group entries do not generate
join messages toward the RP.
Because PIM Sparse Mode and PIM Dense Mode are the most widely deployed
techniques, they were used in this reference design.
PIM dense mode requires only a multicast source and a series of multicast-enabled routers running PIM dense mode to allow receivers to obtain multicast content. Dense mode ensures that traffic reaches its prescribed destinations by periodically flooding the network with multicast traffic, and relies on prune messages to ensure that subnets where all receivers are uninterested in a particular multicast group stop receiving its packets.
PIM sparse mode requires establishing special routers called rendezvous points (RPs) in the network core. The RP is the point where upstream join messages from interested receivers meet downstream traffic from the source of the multicast group content. A network can have many RPs, but PIM sparse mode allows only one RP to be active for any multicast group.
A multicast router typically has two kinds of IGMP interfaces: upstream and downstream. We must configure PIM on the upstream IGMP interfaces to enable multicast routing, to perform reverse path forwarding for multicast data packets, to populate the multicast forwarding table for the upstream interfaces, and, in the case of PIM sparse mode, to distribute IGMP group memberships into the multicast routing domain.
Only one “pseudo PIM interface” is required to represent all IGMP downstream
(IGMP-only) interfaces on the router. Therefore, PIM is generally not required on all
IGMP downstream interfaces, reducing the amount of router resources, such as
memory.
To enable IGMP report filtering for an interface, include the following group-policy
statement:
protocols {
igmp {
interface ge-1/1/1.0 {
group-policy reject_policy;
}
}
}
policy-options {
    // The two reject_policy definitions below are alternatives: the first
    // applies to IGMPv2, the second to IGMPv3.
    policy-statement reject_policy { // IGMPv2 policy
        from {
            router-filter 192.1.1.1/32 exact;
        }
        then reject;
    }
    policy-statement reject_policy { // IGMPv3 policy
        from {
            router-filter 192.1.1.1/32 exact;
            source-address-filter 10.1.0.0/16 orlonger;
        }
        then reject;
    }
}
igmp {
    accounting; // accounting purposes
    interface interface-name {
        disable;
        (accounting | no-accounting); // individual interface-specific accounting
        group-policy [ policy-names ];
        immediate-leave; // see Note 1 at end of code snippet.
        oif-map map-name;
        promiscuous-mode; // see Note 2 at end of code snippet.
        ssm-map ssm-map-name;
        static {
            group multicast-group-address {
                exclude;
                group-count number;
                group-increment increment;
                source ip-address {
                    source-count number;
                    source-increment increment;
                }
            }
        }
        version version; // see Note 3 at end of code snippet.
    }
    query-interval seconds;
    query-last-member-interval seconds; // default 1 second
    query-response-interval seconds; // default 10 seconds
    robust-count number; // see Note 4 at end of code snippet.
}
NOTE 1 Use this statement only on IGMP version 2 (IGMPv2) interfaces to which one IGMP
host is connected. If more than one IGMP host is connected to a LAN through the
same interface, and one host sends a leave group message, the router removes all
hosts on the interface from the multicast group. The router loses contact with the
hosts that must remain in the multicast group until they send join requests in
response to the router’s next general group membership query.
NOTE 2 By default, IGMP interfaces accept IGMP messages only from the same
subnetwork. The promiscuous-mode statement enables the router to accept IGMP
messages from different sub-networks.
NOTE 3 By default, the router runs IGMPv2. If a source address is specified in a multicast
group that is configured statically, the IGMP version must be set to IGMPv3.
Otherwise, the source will be ignored and only the group will be added. The join will
be treated as an IGMPv2 group join.
When we reconfigure the router from IGMPv1 to IGMPv2, the router will continue to
use IGMPv1 for up to 6 minutes and will then use IGMPv2.
NOTE 4 The robustness variable provides fine-tuning to allow for expected packet loss on a subnetwork. Its value is used in calculating the following IGMP message intervals:
• Group member interval = (robustness variable x query-interval) + (1 x query-response-interval)
• Other querier present interval = (robustness variable x query-interval) + (0.5 x query-response-interval)
• Last-member query count = robustness variable
By default, the robustness variable is set to 2. Increase this value if you expect a subnetwork to lose packets.
Figure: The MX480 multicast router (lo0.0 address 8.8.8.8) receives multicast traffic on ge-5/2/6 and delivers it on ge-5/2/5 (VLAN 1119) to an IGMP multicast client reached through the BNT Pass-Through Module (Eth1).
interface ge-5/2/5.0 {
    static {
        group 239.168.1.4;
    }
}
{master}[edit]
chandra@HE-RE-1-MX480# show protocols pim
rp {
local {
address 8.8.8.8;
}
}
interface all {
mode sparse;
}
interface fxp0.0 {
disable;
}
chandra@HE-RE-1-MX480# show protocols ospf
area 0.0.0.0 {
interface ge-5/2/5.0;
interface lo0.0 {
passive;
}
interface fxp0.0 {
disable;
}
}
chandra@HE-RE-1-MX480# show routing-options
router-id 8.8.8.8;
rp {
static {
address 8.8.8.8;
}
}
interface vlan.1119;
interface me0.0 {
disable;
}
interface all {
mode sparse;
}
chandra@EX-175-CSR# show interfaces lo0
unit 0 {
family inet {
address 6.6.6.6/32;
}
}
chandra@EX-175-CSR# show protocols ospf
area 0.0.0.0 {
interface ge-0/0/44.0;
interface lo0.0 {
passive;
}
interface me0.0 {
disable;
}
}
chandra@EX-175-CSR# show routing-options
router-id 6.6.6.6;
Group: 239.168.1.1
Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: Local
Group: 239.168.1.1
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-5/2/6.0
Group: 239.168.1.2
Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: Local
Group: 239.168.1.2
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-5/2/6.0
Source 8.8.8.8
Prefix 8.8.8.8/32
Upstream interface Local
Upstream neighbor Local
Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-5/2/6.0
Upstream neighbor 10.10.10.2
Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-5/2/6.0
Upstream neighbor Direct
Group: 239.168.1.1
Source: *
RP: 8.8.8.8
Flags: sparse,rptree,wildcard
Upstream interface: ge-0/0/44.0
Group: 239.168.1.1
Source: 10.10.10.254
Flags: sparse,spt
Upstream interface: ge-0/0/44.0
chandra@EX-175-CSR# run show pim neighbors
Instance: PIM.master
B = Bidirectional Capable, G = Generation Identifier,
Source 8.8.8.8
Prefix 8.8.8.8/32
Upstream interface ge-0/0/44.0
Upstream neighbor 22.11.5.5
Source 10.10.10.254
Prefix 10.10.10.0/24
Upstream interface ge-0/0/44.0
Upstream neighbor 22.11.5.5
Figure: The EX8200 multicast router (lo0.0 address 9.9.9.9, interfaces ge-1/0/20 and ge-1/0/26) runs PIM with RIP toward the EX4200 multicast router (lo0.0 address 6.6.6.6, interfaces ge-0/0/17 and ge-0/0/2); VLAN 2211 carries the traffic to an IGMP multicast client on IBM PowerVM NICs/HEA (Host Ethernet Adapter) through the SEA (Shared Ethernet Adapter) virtual network.
immediate-leave;
}
interface ge-0/0/2.2211;
interface all;
chandra@EX-175-CSR# show protocols pim
rp {
static {
address 9.9.9.9;
}
}
interface vlan.2211;
interface me0.0 {
disable;
}
interface all {
mode sparse;
}
chandra@EX-175-CSR# show interfaces lo0
unit 0 {
family inet {
address 6.6.6.6/32;
}
}
IGMP Snooping
An access switch usually learns unicast MAC addresses by checking the source
address field of the frames it receives. However, a multicast MAC address can never
be the source address for a packet. As a result, the switch floods multicast traffic on
the VLAN, consuming significant amounts of bandwidth.
IGMP snooping regulates multicast traffic on a VLAN to avoid flooding. When IGMP
snooping is enabled, the switch intercepts IGMP packets and uses the content of
the packets to build a multicast cache table. The cache table is a database of
multicast groups and their corresponding member ports and is used to regulate
multicast traffic on the VLAN.
When the switch receives multicast packets, it uses the cache table to selectively
forward the packets only to the ports that are members of the destination multicast
group.
As illustrated in Figure 6.4, the access switch EX4200 connects four hosts and segments their data traffic with two VLANs: host1 and host2 belong to VLAN1, and host3 and host4 belong to VLAN2. Hosts in the same VLAN may differ in whether they subscribe to a given multicast group. For instance, host1 has subscribed to multicast group 1, while host2 is not interested in group 1 traffic; host3 has subscribed to multicast group 2, while host4 is not interested in group 2 traffic. The EX4200 IGMP snooping feature accommodates this, so that host1 receives group 1 traffic and host2 does not, and host3 receives group 2 traffic and host4 does not.
Figure 6.4 IGMP snooping on the EX4200: host 1 and host 2 are in VLAN 1, host 3 and host 4 in VLAN 2, with a trunk carrying both VLANs. Host 1 subscribes to multicast group 1 and host 3 to multicast group 2, so group 1 traffic is forwarded only to host 1 and group 2 traffic only to host 3.
Figure 6.5 shows how multicast traffic is forwarded on a multilayer switch. The
multicast traffic arrives through the xe-0/1/0.0 interface. A multicast group is
formed by the Layer 3 interface ge-0/0/2.0, vlan.0 and vlan.1. The ge-2/0/0.0
interface is a common trunk interface that belongs to both vlan.0 and vlan.1. The
letter R next to an interface name in Figure 6.5 indicates that a multicast receiver
host is associated with that interface.
Figure 6.5 Multicast forwarding on a multilayer EX4200 switch: multicast traffic arriving on xe-0/1/0.0 is forwarded to the members of the multicast group across VLAN 0 (v100) and VLAN 1 (v200), while non-multicast traffic follows its normal path; (R) marks interfaces with attached multicast receivers.
igmp-snooping {
    vlan (vlan-name | vlan-id) {
        disable {
            interface interface-name;
        }
        immediate-leave;
        interface interface-name {
            multicast-router-interface;
            static {
                group ip-address;
            }
        }
        query-interval seconds;
        query-last-member-interval seconds;
        query-response-interval seconds;
        robust-count number;
    }
}
NOTE By default, IGMP snooping is not enabled. Statements configured at the VLAN level
apply only to that particular VLAN.
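For example, a minimal sketch that enables IGMP snooping on a single VLAN of an EX switch and marks the uplink toward the multicast router (the VLAN and interface names are illustrative):

protocols {
    igmp-snooping {
        vlan Data01 {
            interface ge-0/0/7.0 {
                multicast-router-interface;  // port facing the multicast router
            }
        }
    }
}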
On the MX Series, a Layer 2 broadcast domain is configured as a bridge domain, so IGMP snooping is configured at the [edit bridge-domains] hierarchy. Global snooping behavior is controlled through the multicast-snooping-options stanza, detailed as follows:
multicast-snooping-options {
flood-groups [ ip-addresses ];
forwarding-cache {
threshold suppress value <reuse value>;
}
graceful-restart <restart-duration seconds>;
ignore-stp-topology-change;
}
Two interfaces (ge-5/2/3 and ge-5/2/6) on the MX480 are configured as Layer 2 interfaces in a bridge domain associated with VLAN 1117. The ge-5/2/6 interface is configured as the multicast-router interface and connects to the multicast source; ge-5/2/3 is a Layer 2 interface with a static multicast group (239.168.1.3). This configuration allows the interface to receive and then forward the multicast packets to their target.
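A minimal sketch of this bridge-domain configuration, assuming a bridge domain named bd1117 (the name is illustrative; statement placement follows the hierarchy described above):

bridge-domains {
    bd1117 {
        domain-type bridge;
        vlan-id 1117;
        interface ge-5/2/3.0;
        interface ge-5/2/6.0;
        protocols {
            igmp-snooping {
                interface ge-5/2/6.0 {
                    multicast-router-interface;  // faces the multicast source
                }
                interface ge-5/2/3.0 {
                    static {
                        group 239.168.1.3;       // static Layer 2 group membership
                    }
                }
            }
        }
    }
}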
Figure 6.6 MX480, EX8200, EX4200 and IBM Blade Center – IGMP Traffic Flow with IGMP Snooping. The multicast router feeds the MX480 on ge-5/2/6; the MX480, EX8200, and EX4200 interconnect on VLANs 1117, 1119, and 2211, reaching the IBM BladeCenter through the Cisco ESM (trunk port 17) and the BNT Pass-Through Module (up to 14 GigE links).
chandra@EX-175-CSR> show configuration vlans 2211
vlan-id 2211;
interface {
ge-0/0/2.0;
ge-0/0/17.0;
}
. . .
Bridge-Domain: bc-igmp
Learning-Domain: default
Interface: ge-5/2/6.0
Interface: ge-5/2/5.0
Group: 239.168.1.2
Group mode: Exclude
Source: 0.0.0.0
Last reported by: 10.10.10.1
Group timeout: 76 Type: Dynamic
As shown in Figure 6.7, two interfaces on the MX480 (ge-5/2/4 and ge-5/2/6) are configured as Layer 2 interfaces in a bridge domain associated with VLAN 1118. The ge-5/2/6 interface, configured as the multicast-router interface, connects to the multicast source; ge-5/2/4 is a Layer 2 interface with a static multicast group (239.168.1.4) and is set up to receive and forward multicast packets to their respective servers.
Figure 6.7 MX480 and IBM x3500 IGMP Traffic Flow with IGMP Snooping. A streaming IGMP multicast source feeds the MX480 multicast router on ge-5/2/6; ge-5/2/4 (static group 239.168.1.4) forwards the stream to the IGMP client on the IBM x3500 server (IP 10.10.9.1).
Chapter 7
Understanding Network CoS and Latency
Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
This chapter covers two topics central to data center network performance: class of service (CoS) and latency.
Class of Service
Typically, when a network experiences congestion and delay, some packets are dropped. As an aid in preventing dropped packets, Junos CoS allows an administrator to divide traffic into classes and offer various levels of throughput and packet loss when congestion and delay occur, so that packet loss occurs only according to rules configured on the system.
Figure: CoS processing – packet classification and marking at ingress; packet queuing and shaping at egress.
The following is a list of the key steps in the QoS process, together with the
corresponding configuration commands for the process.
1. Classifying: This step examines packet header CoS fields (for example, EXP bits, IEEE 802.1p bits, or DSCP bits) to separate incoming traffic.
One or more classifiers must be assigned to a physical or logical interface to separate the traffic flows. The classifier configuration is at the [edit class-of-service interfaces] hierarchy level in the Junos CLI.
In addition, the classifier statement further defines how to assign the packet to
a forwarding class with a loss priority. The configuration is at the [edit class-
of-service classifiers] hierarchy level in Junos CLI. For details concerning
packet loss priority and forwarding class, see Defining Loss Priorities and
Defining Forwarding Classes on page 109 of this handbook.
Furthermore, each forwarding class can be assigned to a queue. The
configuration is at the [edit class-of-service forwarding-classes] hierarchy
level.
2. Policing: This step meters traffic. It changes the forwarding class and loss
priority if a traffic flow exceeds its pre-defined service level.
3. Scheduling: This step manages all attributes of queuing, such as transmission
rate, buffer depth, priority, and Random Early Detection (RED) profile.
A schedule map will be assigned to the physical or logical interface. The
configuration is at the [edit class-of-service interfaces] hierarchy level in
Junos CLI.
In addition, the scheduler statement defines how traffic is treated in the output
queue—for example, the transmit rate, buffer size, priority, and drop profile. The
configuration is at the [edit class-of-service schedulers] hierarchy level.
Finally, the scheduler-maps statement assigns a scheduler to each forwarding
class. The configuration is at the [edit class-of-service scheduler-maps]
hierarchy level.
4. Packet Dropping: This step manages drop-profile to avoid TCP
synchronization and protect high priority traffic from being dropped.
The drop-profile defines how aggressively to drop packets that are using a
particular scheduler. The configuration is at the [edit class-of-service
drop-profiles] hierarchy level.
5. Rewrite Marker: This step rewrites the packet CoS fields (for example, EXP or
DSCP bits) according to the forwarding class and loss priority of the packet.
The rewrite rule takes effect as the packet leaves a logical interface that has a
rewrite rule. The configuration is at the [edit class-of-service rewrite-rules]
hierarchy level in Junos CLI.
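As a brief illustration of the drop-profile and scheduler hierarchies named in these steps, consider the following sketch; the profile name, scheduler name, and threshold values are illustrative, not taken from the validated configuration:

class-of-service {
    drop-profiles {
        BG-DROP {
            // hypothetical RED profile: start dropping at 80% queue fill,
            // with 95% drop probability
            fill-level 80 drop-probability 95;
        }
    }
    schedulers {
        EXAMPLE-SCHED {
            // apply the profile to high-loss-priority traffic in this queue
            drop-profile-map loss-priority high protocol any drop-profile BG-DROP;
        }
    }
}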
Table 7.2 Forwarding Classes for MX480, EX4200 and EX8200 Series (excerpt)

Forwarding Class   MX480   EX4200/EX8200
Video (AF)         Q2      Q4
Data (BE)          Q0      Q0
The forwarding class plus the loss priority defines the per-hop behavior. If a use case requires associating forwarding classes with next hops, the forwarding-policy options are available only on the MX Series.
Table 7.3 compares the CoS configuration statements as they pertain to the Juniper Networks MX480, EX8200, and EX4200.
Statement                  Description                                            MX480   EX8200   EX4200
classifiers                Classify incoming packets based on code point value    Yes     Yes      Yes
code-point-aliases         Mapping of code point aliases to bit strings           Yes     Yes      Yes
drop-profiles              Random Early Drop (RED) data point map                 Yes     Yes      Yes
multi-destination          Multicast class of service                             -       Yes      -
restricted-queues          Map forwarding classes to restricted queues            Yes     -        -
rewrite-rules              Write code point value of outgoing packets             Yes     Yes      Yes
traffic-control-profiles   Traffic shaping and scheduling profiles                Yes     -        -
translation-table          Translation table                                      Yes     -        -
Configuring CoS
In this section, we demonstrate a sample configuration scenario for configuring CoS on the EX4200. Two blade servers connect to two different interfaces and simulate production traffic by issuing a ping command; the test device (N2X) generates significant network traffic, classified as background traffic, through the EX4200 to one of the blade servers. This background traffic contends with the production traffic, causing packet loss in the production traffic. Because the EX4200 is central to network traffic aggregation in this scenario, it is reasonable to apply a CoS packet loss policy on the EX4200 to ensure that no packet loss occurs in the production traffic.
NOTE The configuration scenario and snippet is also applicable to MX Series Ethernet
Routers.
Configuration Description
As illustrated in Figure 7.2, the EX4200 is the DUT, which interconnects IBM blade
servers, and the Agilent Traffic Generator N2X.
Figure 7.2 CoS test setup: the Agilent N2X injects background traffic from ports ge-203/1 (11.22.1.100) and ge-304/4 (11.22.1.200) into EX4200 ports ge-0/0/24 and ge-0/0/25. Production traffic enters on ge-0/0/7 from the IBM BladeCenter Pass-Through Module (eth 1 on the 7th blade, 11.22.1.7), and both traffic types exit on ge-0/0/9 toward 11.22.1.9 (eth 1 on the 9th blade).
1. The N2X generates network traffic as background traffic onto the EX4200
through two ingress GigE ports (ge-0/0/24 and ge-0/0/25).
2. The EX4200 forwards the background traffic to a single egress GigE port
(ge-0/0/9).
3. At the same time, the blade server uses the ping command to generate
production traffic onto the EX4200 through a different interface (ge-0/0/7).
4. The EX4200 also forwards the production traffic to the same egress port
(ge-0/0/9). From a packet loss policy perspective, the production traffic is low
loss priority, while the background traffic is high.
NOTE The configuration used in this setup was sufficient to achieve confirmation on CoS
functionality (in simplest form). Other detailed configuration options are available
and can be enabled as needed. Refer to the CoS command Hierarchy Levels in the
Junos Software CLI User Guide at www.juniper.net/techpubs/software/junos/
junos95/swref-hierarchy/hierarchy-summary-configuration-statement-class-
of-service.html#hierarchy-summary-configuration-statement-class-of-service.
1. Configure the setup as illustrated in Figure 7.2 and by reviewing the CoS configuration code snippet.
2. Create some simple flows on the N2X to send from each port to port ge-0/0/9.
3. Send the traffic at 50% from each port to 11.22.1.9 (in the absence of two ports, one port could be used to send 100% traffic).
4. Configure the DUT to perform CoS-based processing on ingress traffic: traffic from source 11.22.1.7 arriving on interface ge-0/0/7 is classified into a high forwarding class with a low probability of being dropped, while traffic arriving on interfaces ge-0/0/24 and ge-0/0/25 has a high probability of being dropped.
5. Start the ping from 11.22.1.7 to 11.22.1.9.
6. Tune the N2X line-rate parameter for the traffic arriving at ge-0/0/9.
7. Observe the egress and ingress interface statistics to confirm that the ping traffic is tagged with the higher forwarding class and does not get dropped, while traffic coming from ports ge-0/0/24 and ge-0/0/25 gets dropped on ingress.
code-point-aliases {
ieee-802.1 { //associate the code point aliases
be 000; af12 101; af11 100; be1 001; ef 010;
}
}
forwarding-classes { //assign the four queues to the forwarding classes
queue 0 BACKGROUND;
queue 3 CONVERSATIONAL;
queue 2 INTERACTIVE;
queue 1 STREAMING;
}
interfaces {
ge-0/0/9 {
//associate the scheduler map, rewrite rules and classifier with the interface
scheduler-map SCHED-MAP;
unit 0 {
classifiers {
ieee-802.1 DOTP-CLASSIFIER;
}
rewrite-rules {
ieee-802.1 DOTP-RW;
}
}
}
}
rewrite-rules {
//define the rewrite rules for each of the forwarding classes. Set the code
points to be used in each case
ieee-802.1 DOTP-RW {
forwarding-class CONVERSATIONAL {
loss-priority low code-point ef;
}
forwarding-class INTERACTIVE {
loss-priority low code-point af12;
}
forwarding-class STREAMING {
loss-priority low code-point af11;
}
forwarding-class BACKGROUND {
loss-priority high code-point be;
}
}
}
scheduler-maps {
//define the scheduler maps for each forwarding class
SCHED-MAP {
forwarding-class BACKGROUND scheduler BACK-SCHED;
forwarding-class CONVERSATIONAL scheduler CONV-SCHED;
forwarding-class INTERACTIVE scheduler INTERACT-SCHED;
forwarding-class STREAMING scheduler STREAMING-SCHED;
}
}
schedulers {
//Specify the scheduler properties for each forwarding class. Priorities
//assigned here define how the scheduler handles the traffic.
CONV-SCHED {
transmit-rate remainder;
buffer-size percent 80;
priority strict-high;
}
INTERACT-SCHED;
STREAMING-SCHED {
transmit-rate percent 20;
}
BACK-SCHED {
transmit-rate remainder;
priority low;
}
}
chandra@EX> show configuration firewall
family ethernet-switching {
//Configure a multifield classifier for better granularity. CONVERSATIONAL
//class gets higher priority than BACKGROUND
filter HIGH {
term 1 {
from {
source-address {
11.22.1.7/32;
}
}
then {
accept;
forwarding-class CONVERSATIONAL;
loss-priority low;
}
}
term 2 {
then { accept; count all; }
}
}
filter LOW {
term 1 {
from {
source-address {
11.22.1.100/32;
11.22.1.101/32;
}
}
then {
accept;
forwarding-class BACKGROUND;
loss-priority high;
}
}
term 2 {
then { accept; count all; }
}
}
}
chandra@EX > show configuration interfaces ge-0/0/24
unit 0 {
family ethernet-switching {
//Assign the firewall filter to the interface
port-mode access;
filter {
input LOW; output LOW;
}
}
}
chandra@EX> show configuration interfaces ge-0/0/25
unit 0 {
family ethernet-switching {
port-mode access;
filter {
input LOW; output LOW;
}
}
}
chandra@EX> show configuration interfaces ge-0/0/7
unit 0 {
family ethernet-switching {
port-mode access;
filter {
input HIGH; output HIGH;
}
}
}
chandra@EX> show configuration interfaces ge-0/0/9
unit 0 {
family ethernet-switching {
port-mode access;
}
}
Latency
Network latency is critical to business. Today, the competitiveness in the global
financial markets is measured in microseconds. High performance computing and
financial trading demand an ultra low-latency network infrastructure. Voice and
video traffic is time-sensitive and typically requires low latency.
Often, latency is measured at various frame sizes – 64, 128, 256, 512, 1024, 1280, and 1518 bytes for Ethernet.
The simulated traffic throughput is a critical factor in the accuracy of test results. For a 1 Gbps full-duplex interface, the transmitting (TX) and receiving (RX) throughput of the simulated traffic must approach 1 Gbps, and the TX/RX throughput ratio must be at least 99%.
Measuring Latency
The IETF standard RFC 2544 defines performance test criteria for measuring latency of the DUT. As shown in Figure 7.3, the ideal way to test DUT latency is to use a tester with both transmitting and receiving ports. The tester connects to the DUT with two connections: the transmitting port of the tester connects to the receiving port of the DUT, and the sending port of the DUT connects to the receiving port of the tester. The same setup also applies to measuring the latency of multiple DUTs, as shown in Figure 7.3.
Figure 7.3 Latency test setup: a tester with transmitting and receiving ports connected in a loop through one or more DUTs (DUT 1, DUT 2).
Figure 7.4 illustrates two latency test scenarios. We measured the latency of the
MX480 in one scenario; we measured the end-to-end latency of MX480 and
Cisco’s ESM in another scenario. We used Agilent’s N2X with transmitting port (ge-
2/3/1) and receiving port (ge-3/4/4) as a tester.
Figure 7.4 Latency test scenarios: the N2X transmits from ge-2/3/1 (11.22.1.2) into MX480 port ge-5/3/5 (11.22.1.1) and receives on ge-3/4/4 (11.22.2.2). The device-latency path (dashed) stays within the MX480; the end-to-end path (solid) leaves the MX480 on ge-5/3/7 (11.22.2.1) and passes through the Cisco ESM in the IBM BladeCenter (ports 18 and 20) back to the N2X.
In the first test scenario, the N2X and MX480 connections, represented by the dashed line, are made from the sending port (ge-2/3/1) of the N2X to the receiving port (ge-5/3/5) of the MX480 and from the sending port (ge-5/3/6) of the MX480 back to the receiving port (ge-3/4/4) of the tester.
In the second test scenario, the connection among the N2X, MX480, and Cisco's ESM (represented by the solid line in Figure 7.4) occurs in the following order:
• Connection from the sending ports of the N2X to the receiving ports of the
MX480
• Connection from the sending port of the MX480 to the receiving port (Port 18)
of Cisco’s ESM
• Connection from the sending port (Port 20) of Cisco’s ESM to the receiving port
of the N2X.
ge-5/3/5 {
    //Define a VLAN-tagged interface and Ethernet-bridge encapsulation
    vlan-tagging;
    encapsulation ethernet-bridge;
    unit 1122 {
        //Define a logical unit, vlan-id and a vlan-bridge type encapsulation
        encapsulation vlan-bridge;
        vlan-id 1122;
    }
}
bc-ext {
//Define a bridge domain and assign VLAN id and interface.
domain-type bridge;
vlan-id 1122;
interface ge-5/3/5.1122;
interface ge-5/3/7.1122;
}
ge-5/3/7 {
unit 0 {
family inet {
address 11.22.2.1/24;
}
}
}
Chapter 8
Configuring High Availability
This chapter covers software-based high availability features that operators can enable in the data center. It first introduces Junos OS features such as Routing Engine redundancy, GRES, GR, NSR, NSB, and ISSU that are critical to implementing high availability in the data center. Reliability features such as VRRP, RTG, and LAG are implemented on top of these key high availability elements.
Any one of the following failures can trigger a switchover from the primary to the
backup Routing Engine:
• Hardware failure – This can be a hard disk error or a loss of power on the primary
Routing Engine.
• Software failure – This can be a kernel crash or a CPU lock. These failures cause
a loss of keepalives from the primary to the backup Routing Engine.
• Software process failure – Specific software processes that fail at least four
times within the span of 30 seconds on the primary Routing Engine.
NOTE To revert to the original primary Routing Engine after failure recovery, operators must perform a manual switchover.
1. Configure Routing Engine redundancy at the [edit chassis] hierarchy level:
redundancy {
graceful-switchover;
keepalive-time seconds;
routing-engine slot-number (master | backup | disabled);
}
2. Specify the threshold time interval for loss of keepalives, after which the backup Routing Engine takes over from the primary Routing Engine. By default, failover occurs after 300 seconds when graceful Routing Engine switchover is not configured.
[edit chassis redundancy]
keepalive-time seconds;
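For instance, a minimal sketch with an assumed 10-second keepalive threshold:

chassis {
    redundancy {
        keepalive-time 10;  // assumed threshold; the default is 300 seconds
                            // when graceful switchover is not configured
    }
}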
4. The Routing Engine mastership can be manually switched using the following
CLI commands:
request chassis routing-engine master acquire on backup Routing Engine
request chassis routing-engine master release on primary Routing Engine
request chassis routing-engine master switch on either primary or
backup Routing Engines
It is important to note that graceful Routing Engine switchover only offers Routing
Engine redundancy, not router level redundancy. Traffic flows through the router
for a short interval during the Routing Engine switchover. However, the traffic is
dropped as soon as any of the routing protocol timers expire and the neighbor
relationship with the upstream router ends. To avoid this situation, operators must
apply graceful Routing Engine switchover in conjunction with Graceful Restart (GR)
protocol extensions.
NOTE Although graceful Routing Engine switchover is available on many other platforms,
with respect to the scope of this handbook, graceful Routing Engine switchover is
available only on the MX Series and EX8200 platforms.
Figure 8.1 shows a primary and backup Routing Engine exchanging keepalive
messages.
Figure 8.1 Master and backup Routing Engines exchanging keepalive messages.
For details concerning GR, see the Graceful Restart section on page 126.
2. The operational show system switchover command can be used to check the
graceful Routing Engine switchover status on the backup Routing Engine:
{backup}
chandra@HE-Routing Engine-1-MX480-194> show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
state: Steady State
Virtual Chassis
Routing Engines are built into the EX Series chassis. In this case, Routing Engine
redundancy can be achieved by connecting and configuring two (or up to ten)
EX switches as a part of a virtual chassis. This virtual chassis operates as a single
network entity and consists of designated primary and backup switches. Routing
Engines on each of these two switches then become the master and backup
Routing Engines of the virtual chassis, respectively. The rest of the switches of
the virtual chassis assume the role of line cards. The master Routing Engine on
the primary switch manages all the other switches that are members of the
virtual chassis and has full control of the configuration and processes. It receives
and transmits routing information, builds and maintains routing tables, and
communicates with interfaces and the forwarding components of the member
switches.
The backup switch acts as the backup Routing Engine of the virtual chassis and
takes over as the master when the primary Routing Engine fails. The virtual chassis
uses GRES and NSR to recover from control plane failures. Operators can physically
connect individual chassis using either virtual chassis extension cables or 10G/1G
Ethernet links.
Using graceful Routing Engine switchover on a virtual chassis enables the interface
and kernel states to be synchronized between the primary and backup Routing
Engines. This allows the switchover between primary and backup Routing Engine
to occur with minimal disruption to traffic. The graceful Routing Engine switchover
behavior on the virtual chassis is similar to the description in the Graceful Routing
Engine Switchover section on page 121.
When graceful Routing Engine switchover is not enabled, the line card switches of
the virtual chassis initialize to the boot up state before connecting to the backup
that takes over as the master when Routing Engine failover occurs. Enabling
graceful Routing Engine switchover eliminates the need for the line card switches to
re-initialize their state. Instead, they resynchronize their state with the new master
Routing Engine thus ensuring minimal disruption to traffic.
Figure: A virtual chassis of three EX4200 switches – EX-7 (primary), EX-6 (backup), and EX-8 (line card role).
The show virtual-chassis CLI command provides a status of a virtual chassis that
has a master and backup switch and line card. There are three EX4200 switches
connected and configured to form a virtual chassis. Each switch has a member ID
and sees the other two switches as its neighbors when the virtual chassis is fully
functioning. The master and backup switches are assigned the same priority (130)
to ensure a non-revertive behavior after the master recovers.
show virtual-chassis
Virtual Chassis ID: 555c.afba.0405
Mastership Neighbor List
Member ID Status Serial No Model priority Role ID Interface
0 (FPC 0) Prsnt BQ0208376936 ex4200-48p 128 Linecard 1 vcp-0
2 vcp-1
1 (FPC 1) Prsnt BQ0208376979 ex4200-48p 130 Backup 2 vcp-0
0 vcp-1
2 (FPC 2) Prsnt BQ0208376919 ex4200-48p 130 Master* 0 vcp-0
1 vcp-1
Member ID for next new member: 0 (FPC 0)
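A sketch of the corresponding non-revertive priority assignment, using the member numbers shown in the output above:

virtual-chassis {
    member 1 {
        mastership-priority 130;   // backup
    }
    member 2 {
        mastership-priority 130;   // master; equal priorities prevent preemption
    }
}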
Use the request virtual-chassis vc-port set operational CLI command to define the 10G/1G Ethernet ports that are used only for virtual chassis inter-member connectivity.
State information for a protocol that is not supported by NSR is not replicated to the backup Routing Engine; it must be refreshed using the normal recovery mechanism inherent to the protocol.
4. A switchover to the backup Routing Engine must occur when the routing protocol process (rpd) fails three times consecutively, in rapid intervals. For this to occur, the following statement must be included:
[edit system processes]
routing failover other-routing-engine;
Nonstop Bridging
Nonstop Bridging (NSB) enables a switchover between the primary and backup
Routing Engines without losing Layer 2 Control Protocol (L2CP) information. NSB is
similar to NSR in that it preserves interface and kernel information. The difference is
that NSB saves the Layer 2 control information by running a Layer 2 Control Protocol
process (l2cpd) on the backup Routing Engine. For NSB to function, operators must
enable Graceful Routing Engine switchover.
NOTE It is not necessary to start the primary and backup Routing Engines at the same time. A backup Routing Engine brought online at any time automatically synchronizes with the primary Routing Engine when NSB is enabled.
Graceful Restart
A service disruption forces the routing protocols on a router to recalculate peering relationships, protocol-specific information, and routing databases. Disruptions due to an unprotected restart of a router can cause route flapping, greater protocol reconvergence times, or forwarding delays, ultimately resulting in dropped packets. Graceful Restart (GR) alleviates this situation, acting as an extension to the routing protocols.
NOTE A helper router undergoing Routing Engine switchover drops the GR wait state
that it may be in and propagates the adjacency’s state change to the network. GR
support is available for routing/MPLS related protocols and Layer 2 or Layer 3 VPNs.
MORE See Table-B.3 in Appendix B of this handbook for a list of GR protocols supported on
the MX and EX Series platforms.
NOTE The GR helper mode is enabled by default even though GR may not be enabled.
If necessary, the GR helper mode can be disabled on a per-protocol basis. If GR is
enabled globally, it can be disabled only if required for each individual protocol.
[edit routing-options]
graceful-restart {
    restart-duration seconds;
}
ISSU runs only on platforms that support dual Routing Engines and requires that
graceful Routing Engine switchover and NSR be enabled. Graceful Routing Engine
switchover is required because a switch from the primary to the backup Routing
Engine must happen without any packet forwarding loss. The NSR with graceful
Routing Engine switchover maintains routing protocol and control information
during the switchover between the Routing Engines.
NOTE Similar to regular upgrades, Telnet sessions, SNMP, and CLI access can be
interrupted briefly when ISSU is being performed.
• The primary and backup Routing Engines must be running the same software
version.
• The status of the PICs cannot be changed during the ISSU process. For
example, the PICs cannot be brought online/offline.
• The network must be in a steady, stable state.
MORE For more details when performing an ISSU using the above-listed methods, see
Appendix A of this handbook.
2. Verify that graceful Routing Engine switchover and NSR are enabled using the
show system switchover and show task replication commands.
3. BFD timer negotiation can be disabled explicitly during the ISSU activity using
the [edit protocols bfd] hierarchy:
[edit protocols bfd]
no-issu-timer-negotiation;
4. Perform a software backup on each Routing Engine using the request system
snapshot CLI command:
{master}
chandra@MX480-131-0> request system snapshot
Verifying compatibility of destination media partitions...
Running newfs (899MB) on hard-disk media / partition (ad2s1a)...
Running newfs (99MB) on hard-disk media /config partition (ad2s1e)...
Copying ‘/dev/ad0s1a’ to ‘/dev/ad2s1a’ .. (this may take a few minutes)
Copying ‘/dev/ad0s1e’ to ‘/dev/ad2s1e’ .. (this may take a few minutes)
The following filesystems were archived: / /config
One of the routers is elected dynamically as a default primary of the group and
is active at a given time. All the other participating routing devices perform a
backup role. Operators can assign priorities to devices manually, forcing them
to act as primary and backup devices. The VRRP primary sends out multicast
advertisements to the backup devices at regular intervals (default interval is
1 second). When the backup devices do not receive an advertisement for a
configured period, the device with the next highest priority becomes the new
primary. This occurs dynamically, thus enabling an automatic transition with
minimal traffic loss. This VRRP action eliminates the dependence on achieving
connectivity using a single routing platform that can result in a single point of
failure. In addition, the change between the primary and backup roles occurs with
minimum VRRP messaging and no intervention on the host side.
Traffic from the hosts is sent to hosts on other networks through EX8200-1 because
it is the primary. When the hosts lose connectivity to EX8200-1 either due to a node
or link failure, EX8200-2 becomes the primary. The hosts start sending the traffic
through EX8200-2. This is possible because the hosts forward the traffic to the
gateway that owns virtual IP address 172.1.1.10, and IP packets are encapsulated in
Ethernet frames destined to a virtual MAC address.
Figure: VRRP pair EX8200-1 and EX8200-2 sharing virtual address 172.1.1.10/16, reached through EX4200-0; the default gateway on each host is set to 172.1.1.10.
MORE For VRRP configuration details, refer to the Junos High Availability Guide at
www.juniper.net/techpubs/software/junos/junos90/swconfig-high-
availability/high-availability-overview.html.
NOTE Although this VRRP sample scenario uses EX4200 devices, it is possible to
configure other combinations of VRRP groups consisting of devices such as:
• EX8200 – EX4200
• EX8200 – MX480
• MX480 – MX480
• EX8200 – EX8200
Figure 8.4 shows devices EX8200-A and EX8200-B, MX480-A and MX480-B to
illustrate the choices of different platforms when configuring VRRP in the network.
Figure 8.4 EX8200-A and EX8200-B connecting to the IBM BladeCenter through its Cisco ESM and BNT Pass-Through modules (management modules MM1 and MM2; blade interfaces Eth0, Eth1, and SoL).
The virtual address assigned to the EX4200 group discussed here is 11.22.1.1. The two devices and the IBM blade servers physically connect on the same broadcast domain. EX4200-A is elected as the primary, so the path from the servers through the Cisco ESM to EX4200-A is the preferred path. The link between the Cisco ESM and EX4200-B is the backup path.
NOTE Cisco's ESM included in the IBM BladeCenter is a Layer 2 switch that does not support VRRP, but it serves as an access-layer switch connected to routers that use VRRP. Other switch modules for the IBM BladeCenter support Layer 3 functionality but are out of the scope of this book.
Configuring VRRP
To configure VRRP on the sample network perform the following steps:
1. Create two trunk ports on Cisco’s ESM. Assign an internal eth0 port on Blade[x]
to same network as VRRP, for example 11.22.1.x.
2. Add a router with a Layer 3 address that is reachable from the 11.22.1.x network on the blade center. In this case, the MX480 acts as a Layer 3 router that connects to both EX4200-A and EX4200-B through the 11.22.2.x and 11.22.3.x networks, respectively.
3. This Layer 3 MX480 router also terminates the 11.22.5.X network via interface
ge-5/3/5 with family inet address 11.22.5.1.
4. Verify that this address is reachable from the blade server by configuring the
default gateway to be either 11.22.1.11(ge-0/0/11) or 11.22.1.31 (ge-0/0/31).
5. Configure VRRP between the two interfaces ge-0/0/11 (EX4200-A) and
ge-0/0/31 (EX4200-B). The default virtual address (known as vrrp-id) is
11.22.1.1 with ge-0/0/11 on EX4200-A set to have a higher priority.
Verify operation on the sample network by performing the following steps.
1. Reconfigure the default route on 11.22.1.60 (blade server) to 11.22.1.1 (vrrp router id).
2. Confirm that 11.22.5.1 is reachable from 11.22.1.60 and vice versa. Perform a traceroute to ensure that the next hop is 11.22.1.11 on EX4200-A.
3. Either lower the priority on EX4200-A or administratively disable the interface
ge-0/0/11 to simulate an outage of EX4200-A.
4. Confirm that pings from 11.22.1.60 to 11.22.5.1 are still working but use the
backup path to EX4200-B.
5. Perform a traceroute to confirm that the backup path is being used.
NOTE The traceroute command can be used for confirmation in both directions – to and
from the BladeCenter.
This section shows that VRRP statements can be included at the interface
hierarchy level.
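As a minimal sketch, VRRP for the sample network could be configured on EX4200-A as follows; the group number and priority value are illustrative, while the addresses are those used in the steps above:

interfaces {
    ge-0/0/11 {
        unit 0 {
            family inet {
                address 11.22.1.11/24 {
                    vrrp-group 1 {
                        virtual-address 11.22.1.1;  // shared gateway address
                        priority 200;               // set higher than EX4200-B
                                                    // so EX4200-A is primary
                    }
                }
            }
        }
    }
}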
Link Aggregation
Link Aggregation (LAG) is a feature that aggregates two or more physical Ethernet
links into one logical link to obtain higher bandwidth and to provide redundancy.
LAG provides high link availability and capacity which results in improved
performance and availability.
Traffic is balanced across all links that are members of an aggregated bundle. The
failure of a member link does not cause traffic disruption. Instead, because there are
multiple member links, traffic continues over active links.
LAG is defined in the IEEE 802.3ad standard and can be used in conjunction with the Link Aggregation Control Protocol (LACP). Using LACP, multiple physical ports can be bundled together to form a logical channel. Enabling LACP on two peers that participate in a LAG group enables them to exchange LACP packets and negotiate the automatic bundling of links.
NOTE LAG can be enabled on interfaces spread across multiple chassis; this is known as
Multichassis LAG (MC-LAG). This means that the member links of a bundle can be
configured between multiple chassis instead of only two chassis.
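As a minimal sketch, the aggregated Ethernet bundle toward the Cisco EtherChannel in Figure 8.5 could be configured on the MX480 as follows (the bundle name ae0 is illustrative):

chassis {
    aggregated-devices {
        ethernet {
            device-count 1;    // number of aeX interfaces to create
        }
    }
}
interfaces {
    ge-5/0/1 {
        gigether-options {
            802.3ad ae0;       // add this member link to the bundle
        }
    }
    ge-5/0/5 {
        gigether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                active;        // initiate LACP negotiation with the peer
            }
        }
    }
}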
Figure 8.5 LAG test setup: the MX480 (DUT) connects to the Cisco ESM over an aggregated Ethernet bundle (ge-5/0/1 and ge-5/0/5 to ESM ports 17 and 18, configured as an EtherChannel on the Cisco side); N2X ports ge-304/4 and ge-201/1 attach to the setup, with trunk port 20 on the ESM.
NOTE The EX8200 or any of the MX Series devices can be used instead of the MX480, as
shown in Figure 8.5.
[edit]
user@host# delete interfaces aeX
NOTE When an aggregated Ethernet interface is deleted from the configuration, Junos
removes the configuration statements related to aeX and sets this interface to the
DOWN state. However, the aggregated Ethernet interface is not deleted until the
chassis aggregated-devices ethernet device-count configuration statement is
deleted.
NOTE Although EX Series Platforms can also perform hash-key based load balancing as of
release 9.6R1.13, they do not have the flexibility to configure the criteria for hashing.
hash-key {
family multiservice {
source-mac;
destination-mac;
payload {
ip {
layer-3 {
[source-ip-only | destination-ip-only];
}
layer-4;
}
}
symmetric-hash;
}
}
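For example, a sketch of selecting source-IP-only Layer 3 hashing together with
Layer 4 hashing using set commands:
[edit forwarding-options]
user@host# set hash-key family multiservice payload ip layer-3 source-ip-only
user@host# set hash-key family multiservice payload ip layer-4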
• Enabling LACP provided seamless recovery from Routing Engine failover on the
MX480: with LACP disabled, the Routing Engine took approximately 20 seconds to
recover from a failure, whereas with LACP enabled there was no disruption.
• FPCs with only one LAG interface recovered more quickly (in 1.5 seconds) than
FPCs with two interfaces (approximately 55 seconds).
• The switch fabric recovered immediately after a failure in all the scenarios.
• A similar validation was performed using the EX4200 instead of the MX480.
In this case, enabling or disabling LACP made no difference. The following
scenarios were validated:
-- Routing Engine Failover
-- FPC Failover (two LAG links and an interface to the traffic generator)
-- Switch Fabric Failover
-- System Upgrade (without ISSU or graceful Routing Engine switchover)
-- System Upgrade (without ISSU, with graceful Routing Engine switchover)
MORE Table B.1 and Table B.2 in Appendix B of this handbook provide detailed LAG test
results for the MX480 and EX4200.
Redundant Trunk Group
Figure 8.6 shows an EX Series switch with one link each to Switch1 and Switch2.
RTG is configured on the EX Series switch so that the link to Switch1 is active
and forwards traffic. The link to Switch2 is the backup link and begins
forwarding traffic when the active link fails.
NOTE In this multi-chassis scenario, it is preferable to use RTG instead of MC-LAG.
Figure 8.6 RTG on an EX Series switch: the active link connects to Switch1 and
the backup link to Switch2.
Figure 8.7 shows an EX Series switch that has two links to Switch1. RTG is configured
on the EX Series switch so that one of the links to Switch1 is active and performs
traffic forwarding while the other link acts as the backup. The backup link starts
forwarding traffic to Switch1 when the active link fails.
NOTE In this scenario, it may be more efficient in terms of bandwidth and availability to
use LAG instead of RTG. LAG provides better use of bandwidth and faster recovery
because there is no flushing and relearning of MAC addresses.
Figure 8.7 RTG on an EX Series switch: both the active and backup links connect
to Switch1.
Based on these two scenarios, RTG can be used to control the flow of traffic over
links from a single switch to one or more destination switches while providing
link redundancy.
Both the RTG active and backup links must be members of the same VLANs.
NOTE Junos does not allow the configuration to take effect if there is a mismatch of
VLAN IDs between the links belonging to an RTG.
Figure 8.8 shows a sample two-tier architecture with RTG and LAG enabled
between the access and core layers and between the access and server layers.
The core consists of two MX Series devices: MX480-A and MX480-B. Two
EX4200-based Virtual Chassis (EX4200 VC-A and EX4200 VC-B) and two EX8200s
(EX8200-A and EX8200-B) form the access layer. Each access layer device
connects to both MX480-A and MX480-B.
Figure 8.8 Two-tier topology: aggregated Ethernet links ae1 through ae4 connect
the access devices to MX480-A and MX480-B, with RTG configured on the access side.
We enable LAG and RTG on these links to provide redundancy and to control the
flow of traffic. LAG is enabled on the links between the access devices and the
core MX480s. We enable RTG on EX4200 VC-A and VC-B so that the links AL-A and
AL-B to MX480-A are active and forward traffic. The backup links RL-A and RL-B
from the Virtual Chassis to MX480-B take over traffic forwarding when the
active links fail.
Configuration Details
To configure a redundant trunk link, an RTG must first be created. As stated
earlier, an RTG is configured on the access switch that has two links: a primary
(active) link and a secondary (backup) link. The secondary link automatically
starts forwarding data traffic when the active link fails.
Execute commands like the following to configure the RTG and to disable RSTP on
the EX switches (spanning tree must be disabled on RTG interfaces).
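A minimal sketch, assuming an RTG named rtg0 with ge-0/0/1.0 as the primary
(active) link and ge-0/0/2.0 as the backup; the group and interface names are
illustrative:
[edit]
user@host# set ethernet-switching-options redundant-trunk-group group rtg0 interface ge-0/0/1.0 primary
user@host# set ethernet-switching-options redundant-trunk-group group rtg0 interface ge-0/0/2.0
user@host# set protocols rstp interface ge-0/0/1 disable
user@host# set protocols rstp interface ge-0/0/2 disable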
Appendices
Table A.1 lists tasks that are associated with system-dependent commands. A
command that works on one platform may not work on another; for example, the
lsdev command works only on the AIX platform.
Logical Host Ethernet Adapter (LHEA), IBM PowerVM: Uses the HMC to allocate the
virtual Ethernet adapter to each partition. The adapter configuration in the
partition depends on the OS, including RHEL, SUSE, and AIX.
NOTE Some of these commands change IP address settings immediately, while others
require a restart of the network service.
NOTE Not all tools save changes to the configuration database, which means the
changes may not be preserved after a server reboot.
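On RHEL, for example, a fixed IPv4 address is typically defined in the interface
configuration file, conventionally /etc/sysconfig/network-scripts/ifcfg-eth0: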
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETWORK=10.0.1.0
NETMASK=255.255.255.0
IPADDR=10.0.1.27
USERCTL=no
In addition, several other commands can be helpful, as listed in Table A.2.
ethtool: Queries and changes settings of an Ethernet device, such as
auto-negotiation, speed, link mode, and flow control.
ifconfig: Queries and changes settings of an Ethernet interface. Changes made via
ifconfig take effect immediately, but they are not saved in the configuration
database.
The following is a sample ifconfig command to configure the eth0 interface with a
fixed IP address (a sketch reusing the addresses from the file shown above):
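# Assign a fixed address to eth0; the change takes effect immediately
# but is not preserved across a reboot.
ifconfig eth0 10.0.1.27 netmask 255.255.255.0 up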
vconfig: Adds or removes a VLAN interface. When vconfig adds a VLAN interface,
a new logical interface is formed from the base interface name and the VLAN ID.
Below is a sample vconfig command to add a VLAN 5 interface on the eth0
interface (a sketch; the resulting logical interface is eth0.5):
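# Add VLAN 5 on top of eth0; the new logical interface appears as eth0.5.
vconfig add eth0 5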
netstat: Prints network connections, routing tables, interface statistics, and
protocol statistics.
traceroute: Tracks the route packets take from an IP network on their way to a
given host.
For further details concerning the SUSE Linux network configuration commands,
refer to Novell’s Command Line Utilities at www.novell.com/documentation/oes/
tcpipenu/?page=/documentation/oes/tcpipenu/data/ajn67vf.html.
lscfg: Displays configuration, diagnostic, and vital product data (VPD)
information about the system and its resources.
lslot: Displays dynamically reconfigurable slots, such as hot-plug slots, and
their characteristics.
cfgmgr: Configures devices and optionally installs device software by running the
programs specified in the Configuration Rules object class.
lsattr: Displays attribute characteristics and possible values of attributes for
devices in the system.
smitty: Provides a cursor-based text interface for system management. In addition
to a hierarchy of menus, smitty supports FastPaths that take users directly to a
dialog, bypassing the interactive menus.
smitty chgenet: Configures an adapter, determines a network adapter hardware
address, sets an alternative hardware address, or enables jumbo frames.
smit mktcpip: Sets the required values for starting TCP/IP on a host, including
setting the host name, setting the IP address of the interface in the
configuration database, setting the subnetwork mask, and adding a static route.
netstat: Displays network status, including the number of packets received,
transmitted, and dropped, and the routes and their status.
entstat: Shows Ethernet device driver and device statistics. For example, the
command entstat ent0 displays the generic device statistics for ent0.
mkvdev: Creates a mapping between a virtual adapter and a physical resource. For
example, the following command creates a SEA that links physical ent0 to virtual
ent2:
mkvdev -sea ent0 -vadapter ent2 -default ent1 -defaultid 1
lsmap: Lists the mappings between virtual adapters and physical resources. For
example, use the following lsmap command to list all virtual adapters attached to
vhost1:
lsmap -vadapter vhost1
chdev: Changes an attribute on a device. For example, use the following chdev
command to enable jumbo frames on the ent0 device:
chdev -dev ent0 -attr jumbo_frame=yes
chtcpip: Changes the VIOS TCP/IP settings and parameters. For example, use the
following command to change the current network address and mask to a new
setting:
chtcpip -interface en0 -inetaddr 9.1.1.1 -netmask 255.255.255.0
lstcpip: Displays the VIOS TCP/IP settings and parameters. For example, use the
following command to list the current routing table:
lstcpip -routetable
oem_setup_env: Initiates the OEM installation and setup environment so that users
can install and set up software in the traditional way. For example, the
oem_setup_env command places the user in a non-restricted UNIX root shell, where
the user can run AIX commands to install and set up software and use most of the
AIX network commands, including lsdev, rmdev, chdev, netstat, entstat, ping, and
traceroute.
ipconfig: Command-line utility that displays the TCP/IP network adapter
configuration.
route: Command-line utility that adds or removes a static route. You can make a
route persistent by using the -p option when adding it.
ping: Checks network connectivity.
tracert: Tracks the route packets take from an IP network on their way to a given
host.
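For example, a sketch of adding a persistent static route on Windows (the
addresses are illustrative):
route -p add 10.0.2.0 mask 255.255.255.0 10.0.1.1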
NOTE The values listed in Table B.1 are approximations, in seconds.
[Table B.1: MX480 LAG failover results with LACP enabled versus disabled
(approximate seconds). Recorded values include ~20, ~63 (individual runs of 57,
63, and 64), and 10; switch fabric failover was immediate, and a system upgrade
with NSR (upgrading the backup Routing Engine first, then the primary) took ~20.]
NOTE The values listed in Table B.2 are approximations, in seconds.
Table B.2: EX4200 LAG failover results (approximate seconds). Enabling or
disabling LACP made no difference to these values.
Routing Engine failover: 0
FPC failover (FPC with the LAG links and the interface to the traffic
generator): ~84 (individual runs of 82 and 86)
Switch fabric failover: immediate
System upgrade (without ISSU, without GRES): 527
System upgrade (without ISSU, with GRES): 152
NOTE Refer to Table B.2 when reviewing the following system upgrade steps.
1. Download the software package from the Juniper Networks Support Web site.
2. Copy the package to the /var/tmp directory on the router:
user@host> file copy ftp://username:prompt@ftp.hostname.net/filename /var/tmp/filename
3. Verify the current software version on both Routing Engines, using the show
version invoke-on all-routing-engines command:
{backup}
user@host> show version invoke-on all-routing-engines
5. Log in to the router once the new master (formerly backup Routing Engine) is
online. Verify that both Routing Engines have been upgraded:
{backup}
user@host> show version invoke-on all-routing-engines
6. To make the backup Routing Engine (former master Routing Engine) the
primary Routing Engine, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the primary routing engine ? [yes,no] (no) yes
Resolving mastership...
Complete. The local routing engine becomes the master.
{master}
user@host>
7. Issue the request system snapshot command on each of the Routing Engines
to back up the system software to the router’s hard disk.
Method 2: Upgrading Both Routing Engines and Manually Rebooting the New
Backup Routing Engine
1. Issue the request system software in-service-upgrade command.
2. Perform steps 1 through 4 as described in Method 1.
3. Issue the show version invoke-on all-routing-engines command to verify
that the new backup Routing Engine (former master) is still running the
previous software image, while the new primary Routing Engine (former
backup) is running the new software image:
{backup}
user@host> show version
4. At this point, a choice between installing newer software or retaining the old
version can be made. To retain the older version, execute the request system
software delete install command.
5. To ensure that the newer version of the software is activated, reboot the new
backup Routing Engine by issuing the following command:
{backup}
user@host> request system reboot
Reboot the system ? [yes,no] (no) yes
Shutdown NOW!
. . .
System going down IMMEDIATELY
Connection to host closed by remote host.
6. Log in to the new backup Routing Engine and verify that both Routing Engines
have been upgraded:
{backup}
user@host> show version invoke-on all-routing-engines
7. To make the new backup the primary, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the master routing engine ? [yes,no] (no) yes
8. Issue the request system snapshot command on each of the Routing Engines
to back up the system software to the router’s hard disk.
2. To install the new software version on the new backup Routing Engine, issue
the request system software add command.
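For example (a sketch; filename stands for the package copied to /var/tmp
earlier):
{backup}
user@host> request system software add /var/tmp/filename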
NOTE The following Unified ISSU steps relate only to the Junos 9.6 release.
Appendix C: Acronyms
A
AFE: Application Front Ends
B
BPDU: Bridge Protocol Data Unit
C
CBT: Core Based Tree
D
dcd: device control process
E
ESM: Ethernet Switch Module, Embedded Syslog Manager
F
FC: Fibre Channel
G
GRES: Graceful Routing Engine Switchover
H
HBA: Host Bus Adapter
I
L
LAG: Link Aggregation
M
MAC: Media Access Control
N
NAT: Network Address Translation
O
P
PDM: Power Distribution Module
Q
QoS: Quality of Service
R
RED: random early detection
S
SAN: storage area network
T
TWAMP: Two-Way Active Measurement Protocol
V
VID: VLAN Identifier (IEEE 802.1q)
VLC: VideoLAN
W
WPAR: Workload Partition
Appendix D: References
• www.juniper.net/techpubs/software/junos/junos90/swconfig-high-availability/
swconfig-high-availability.pdf
The Junos High Availability Configuration Guide, Release 9.0, presents an
overview of high availability concepts and techniques. By understanding the
redundancy features of Juniper Networks routing platforms and the Junos
software, a network administrator can enhance the reliability of a network and
deliver highly available services to customers.
• IEEE 802.3ad link aggregation standard
• STP - IEEE 802.1D 1998 specification
• RSTP - IEEE 802.1D-2004 specification
• MSTP - IEEE 802.1Q-2003 specification
• www.nettedautomation.com/standardization/IEEE_802/standards_802/
Summary_1999_11.html
Provides access to the IEEE 802 Organization website with links to all 802
standards.
• RFC 3768, Virtual Router Redundancy Protocol
• https://datatracker.ietf.org/wg/vrrp/
Provides access to all RFCs associated with the Virtual Router Redundancy
Protocol (VRRP).
• draft-ietf-vrrp-ipv6-spec, Virtual Router Redundancy Protocol for IPv6
• https://datatracker.ietf.org/doc/draft-ietf-vrrp-ipv6-spec/
Provides access to the draft that defines VRRP for IPv6.