Published: 2015-04-20
Juniper Networks, Junos, Steel-Belted Radius, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United
States and other countries. The Juniper Networks Logo, the Junos logo, and JunosE are trademarks of Juniper Networks, Inc. All other
trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.
Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify,
transfer, or otherwise revise this publication without notice.
The information in this document is current as of the date on the title page.
Juniper Networks hardware and software products are Year 2000 compliant. Junos OS has no known time-related limitations through the
year 2038. However, the NTP application is known to have some difficulty in the year 2036.
The Juniper Networks product that is the subject of this technical documentation consists of (or is intended for use with) Juniper Networks
software. Use of such software is subject to the terms and conditions of the End User License Agreement (“EULA”) posted at
http://www.juniper.net/support/eula.html. By downloading, installing or using such software, you agree to the terms and conditions of
that EULA.
Part 1 Design
Design Considerations
Design Scope
Design Topology Diagram
Design Highlights
Chapter 2 Solution Design
Compute
Virtual Machines
Servers
Hypervisor Switching
Blade Switching
Network
Access and Aggregation
Core Switching
Edge Routing and WAN
Storage
Applications
Compute Management
Network Management
Network Services
Business-Critical Applications
High Availability
Hardware Redundancy
Software Redundancy
Class of Service
Security
Application Security
Perimeter Security
Secure Remote Access
Network Management
Out-of-Band Management
Network Director
Security Director
Performance and Scale
Summary of Key Design Elements
Benefits
Chapter 3 High Level Testing and Validation Overview
Overview
Key Characteristics of Implementation
POD1 (QFX3000-M QFabric) Configuration
POD2 (QFX3000-M QFabric) Configuration
Core Switch (EX9214) Implementation
Edge Firewall (SRX3600) Implementation
Edge routers (MX240) Implementation
Compute (IBM Flex chassis) Implementation
OOB-Mgmt (EX4300-VC) Implementation
Hardware and Software Requirements
Part 7 Index
Index
Part 1 Design
Figure 9: Virtualized IT Data Center Ecosystem
Figure 10: Virtualized IT Data Center Topology
Chapter 2 Solution Design
Figure 11: Virtual Machine Design
Figure 12: Server Design
Figure 13: VMware Distributed Virtual Switch
Figure 14: VMware Network I/O Control Design
Figure 15: Sample Blade Switch, Rear View
Figure 16: Juniper Networks QFabric Systems Enable a Flat Data Center Network
Figure 17: Core Switching
Figure 18: Core Switching Design
Figure 19: Edge Routing
Figure 20: Edge Routing Design
Figure 21: Storage Lossless Ethernet Design
Figure 22: Storage Design
Figure 23: Virtualized IT Data Center Solution Software Stack
Figure 24: MC-LAG – ICCP and ICL Design
Figure 25: VRRP and MC-LAG – Active/Active Option
Figure 26: MC-LAG – MAC Address Synchronization Option
Figure 27: MC-LAG – Traffic Forwarding Rules
Figure 28: MC-LAG – ICCP Down
Figure 29: MC-LAG – ICL Down
Figure 30: MC-LAG – Peer Down
Figure 31: Class of Service – Classification and Queuing
Figure 32: Class of Service – Buffer and Transmit Design
Figure 33: Physical Security Compared to Virtual Network Security
Figure 63: POD1 Topology with the IBM Pure Flex Chassis + 40Gbps CNA Module
Figure 64: POD 2 Topology Using the IBM Pure Flex System Chassis with the 10-Gbps CNA I/O Module
Chapter 10 Virtualization
Figure 65: VMware vSphere Client Manages vCenter Server Which in Turn Manages Virtual Machines in the Data Center
Figure 66: VMware vSphere Distributed Switch Topology
Figure 67: VMware vSphere Distributed Switch Topology
Figure 68: Log In to vCenter Server
Figure 69: vCenter Web Client
Figure 70: Click Networking
Figure 71: Click Related Objects
Figure 72: Click Uplink Ports and Select a Port
Figure 73: Enable LACP Mode
Figure 74: Infra Cluster Hosts Detail
Figure 75: POD1 Cluster Hosts Detail
Figure 76: POD2 Cluster Hosts Detail
Figure 77: INFRA Cluster VMs
Figure 78: POD1 Cluster
Figure 79: POD2 Cluster
Figure 80: Port Groups
Figure 81: Port Group and NIC Teaming Example
Figure 82: Configure Teaming and Failover
Figure 83: POD1 PG-STORAGE-108 Created for iSCSI
Figure 84: VMware Fault Tolerance
Figure 85: VMware Fault Tolerance on POD1
Figure 86: VMware vMotion Enables Virtual Machine Mobility
Figure 87: VMware vMotion Configured in the Test Lab
Chapter 11 EMC Storage Overview
Figure 88: EMC FAST Cache Configuration (Select System, then Properties in the Drop-Down)
Figure 89: EMC FAST Cache Configuration
Figure 90: Pool 1 - Exchange-DB
Figure 91: Selected Storage Pool Properties
Figure 92: Storage Pool Disks Properties
Figure 93: Storage Pool Properties, Advanced Tab
Figure 94: VM-Pool Selected
Figure 95: VM-Pool Properties
Figure 96: VM-Pool Disk Membership
Figure 97: Exchange-DB-LUN Properties
Figure 98: LUN Created for All ESX Hosts
Figure 99: The Selected Pool Was Created for MS Exchange Logs
Figure 100: Exchange Logs the LUN Created
Figure 101: Example Storage Group Properties Window
Figure 102: LUN Added to Storage Group
Figure 103: ESXi Hosts Added to Storage Group
Figure 104: Add LUNs to Storage Group
Part 1 Design
Table 4: MetaFabric 1.0 Solution Design Highlights
Table 11: Summary of Key Design Elements – Virtualized IT Data Center Solution
Chapter 2 Solution Design
Table 5: Comparison of Pass-Through Blade Servers and Oversubscribed Blade Servers
Table 6: Core Switch Hardware - Comparison of the EX9200 and EX8200 Switches
Table 7: Core Switch Forwarding - Comparison of MC-LAG and Virtual Chassis
Table 8: Comparison of Storage Protocols
Table 9: Application Security Options
Table 10: Data Center Remote Access Options
Chapter 3 High Level Testing and Validation Overview
Table 12: Hardware and Software deployed in solution testing
Table 13: Software deployed in MetaFabric 1.0 test bed
Table 14: Networks and VLANs Deployed in the Test Lab
Table 15: Applications Tested in the MetaFabric 1.0 Solution
Table 16: MC-LAG Configuration Parameters
Table 17: IRB, IP Address Mapping
If the information in the latest release notes differs from the information in the
documentation, follow the product Release Notes.
Juniper Networks Books publishes books by Juniper Networks engineers and subject
matter experts. These books go beyond the technical documentation to explore the
nuances of network architecture, deployment, and administration. The current list can
be viewed at http://www.juniper.net/books.
Documentation Conventions
Caution: Indicates a situation that might result in loss of data or hardware damage.
Laser warning: Alerts you to the risk of personal injury from a laser.
Table 2 on page xviii defines the text and syntax conventions used in this guide.
Convention: Bold text like this
Description: Represents text that you type.
Example: To enter configuration mode, type the configure command:
user@host> configure
Convention: Fixed-width text like this
Description: Represents output that appears on the terminal screen.
Example:
user@host> show chassis alarms
No alarms currently active
Convention: Italic text like this
Description:
• Introduces or emphasizes important new terms.
• Identifies guide names.
• Identifies RFC and Internet draft titles.
Examples:
• A policy term is a named structure that defines match conditions and actions.
• Junos OS CLI User Guide
• RFC 1997, BGP Communities Attribute
Convention: Italic text like this
Description: Represents variables (options for which you substitute a value) in commands or
configuration statements.
Example: Configure the machine’s domain name:
[edit]
root@# set system domain-name domain-name
Convention: Text like this
Description: Represents names of configuration statements, commands, files, and directories;
configuration hierarchy levels; or labels on routing platform components.
Examples:
• To configure a stub area, include the stub statement at the [edit protocols ospf area area-id]
hierarchy level.
• The console port is labeled CONSOLE.
Convention: < > (angle brackets)
Description: Encloses optional keywords or variables.
Example: stub <default-metric metric>;
Convention: # (pound sign)
Description: Indicates a comment specified on the same line as the configuration statement
to which it applies.
Example: rsvp { # Required for dynamic MPLS only
Convention: [ ] (square brackets)
Description: Encloses a variable for which you can substitute one or more values.
Example: community name members [ community-ids ]
GUI Conventions
Convention: Bold text like this
Description: Represents graphical user interface (GUI) items you click or select.
Examples:
• In the Logical Interfaces box, select All Interfaces.
• To cancel the configuration, click Cancel.
Convention: > (bold right angle bracket)
Description: Separates levels in a hierarchy of menu selections.
Example: In the configuration editor hierarchy, select Protocols>Ospf.
Documentation Feedback
• Online feedback rating system—On any page at the Juniper Networks Technical
Documentation site at http://www.juniper.net/techpubs/index.html, simply click the
stars to rate the content, and use the pop-up form to provide us with information about
your experience. Alternatively, you can use the online feedback form at
https://www.juniper.net/cgi-bin/docbugreport/.
Technical product support is available through the Juniper Networks Technical Assistance
Center (JTAC). If you are a customer with an active J-Care or JNASC support contract,
or are covered under warranty, and need post-sales technical support, you can access
our tools and resources online or open a case with JTAC.
• JTAC hours of operation—The JTAC centers have resources available 24 hours a day,
7 days a week, 365 days a year.
• Find solutions and answer questions using our Knowledge Base: http://kb.juniper.net/
To verify service entitlement by product serial number, use our Serial Number Entitlement
(SNE) Tool: https://tools.juniper.net/SerialNumberEntitlementSearch/
Overview
Cloud, mobility, and big data are driving business change and IT transformation. Enterprise
businesses and service providers across all industries are constantly looking for a
competitive advantage, and reliance on applications and the data center has never
been greater (Figure 1 on page 23).
Traditional networks are physically complex, difficult to manage, and not suited for the
dynamic application environments prevalent in today’s data centers. Because of mergers,
acquisitions, and industry consolidation, most businesses are dealing with data centers
that are distributed across multiple sites and clouds, which adds even more complexity.
The data center is also highly dynamic: the network is constantly asked to
do more, become more agile, and support new applications while ensuring integration
with legacy applications. Consequently, this dynamic environment requires more frequent
refresh cycles.
1. Impedes time to value—Network complexity gets in the way of delivering data center
agility.
The growing popularity and adoption of switching fabrics, new protocols, automation,
orchestration, security technologies, and software-defined networks (SDNs) are strong
indicators of the need for a more agile network in the data center. Juniper Networks has
applied its networking expertise to the problems of today’s data centers to develop and
deliver the MetaFabric™ architecture—a combination of switching, routing, security,
software, orchestration, and SDN—all working in conjunction with an open technology
ecosystem to accelerate the deployment and delivery of applications for enterprises and
service providers.
With legacy data center networks, you needed to create separate physical and virtual
resources at your on-premises data center, your managed service provider, your hosted
service provider, and your cloud provider. All of these resources required separate
provisioning and management (Figure 2 on page 24).
Now, implementing a MetaFabric architecture allows you to combine physical and virtual
resources across boundaries to provision and manage your data center efficiently and
holistically (Figure 3 on page 24).
The goal of the MetaFabric architecture is to allow you to connect any physical network,
with any combination of storage, servers, or hypervisors, to any virtual network, and with
any orchestration software (Figure 4 on page 25). Such an open ecosystem ensures that
you can add new equipment, features, and technologies over time to take advantage of
the latest trends as they emerge.
The MetaFabric architecture addresses the problems common in today’s data center by
delivering a network and security architecture that accelerates time to value, while
simultaneously increasing value over time. The MetaFabric 1.0 virtualized IT data center
solution described in this guide is the first implementation of the MetaFabric architecture.
Future solutions and guides are planned, including a larger scale virtualized IT data center,
IT as a service (ITaaS), and a massively scalable cloud data center.
Domain
This guide addresses the needs that enterprise companies have for an efficient and
integrated data center. It discusses the design and implementation aspects for a complete
suite of compute resources, network infrastructure, and storage components that you
need to implement and support a virtualized environment within your data center. This
guide also discusses the key customer requirements provided by the solution, such as
business-critical applications (such as Microsoft Exchange and SharePoint), high
availability, class of service, security, and network management.
Goals
The primary goal of this solution is to enable data center operators to design and
implement an IT data center that supports a virtualized environment for large enterprise
customers. The data center scales up to 2,000 servers and 20,000 virtual machines
(VMs) that run business-critical applications.
The MetaFabric 1.0 solution provides a simple, open, and smart architecture and solves
several challenges experienced in traditional data centers:
• Cost—The cost of managing a complex data center can be high. The solution is to
create an open data center to drive operational efficiencies and reduce cost.
• Simple—This solution uses two QFabric systems. Each QFabric system acts like a
single, very large switch and only requires one management IP address for 16 racks of
equipment. In effect, management tasks are reduced by over 90%.
• Smart—In this solution, smart workload mobility with automated orchestration and
template-based provisioning is provided by using Network Director.
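The "over 90%" figure in the Simple item can be illustrated with a short calculation. This is a minimal sketch, assuming a hypothetical baseline of one independently managed switch per rack (the guide does not state the baseline); the function name is illustrative only.

```python
def management_reduction(points_before: int, points_after: int) -> float:
    """Fraction of management points removed by consolidation."""
    return 1 - points_after / points_before

# Assumption: a traditional design manages one switch per rack (16 racks),
# while one QFabric system exposes a single management IP address.
racks = 16
reduction = management_reduction(points_before=racks, points_after=1)
print(f"Management points reduced by {reduction:.2%}")  # 93.75%
```

Under that assumption, going from 16 management points to 1 removes 15 of 16, which is consistent with the "over 90%" statement.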
The features in a simple, open, and smart architecture in your data center include:
• Integrated solution—By designing a data center with integration in mind, you can blend
heterogeneous equipment and software from multiple vendors into a comprehensive
system. This enables your network to interact efficiently with compute and storage
components that work well together.
• Network visibility—By designing a data center to provide VM visibility, you can connect
the dots between the virtual and physical components in your system. You will know
how your VMs are connected to switches and understand the vMotion history of a VM.
• Scale and virtualization—The solution scales to 20,000 VMs and can support either a
100 percent virtualized compute environment or a mixed physical and virtual
environment.
• Peace of mind—Knowing that a solution has been tested and validated reduces the
anxiety of implementing a new IT project. This solution provides peace of mind because
it has been thoroughly tested by the Juniper Networks Solutions Validation team.
• Reduce deployment time—Integrating products from multiple vendors takes time and
effort, resulting in lost productivity caused by interoperability issues. This solution
eliminates such issues because the interoperability and integration have already been
verified by the Juniper Networks Solutions Validation team.
Audience
This MetaFabric 1.0 solution is designed for enterprise IT departments that wish to build
a complete end-to-end data center that contains compute, storage, and network
components optimized for a virtualized environment. The enterprise IT data center
segment represents the majority of Fortune 500 companies.
The primary audience for this guide includes the following technical staff members:
• Data center engineers—Responsible for working with architects, planners, and operation
engineers to design and implement the solution.
Juniper Networks solution validation labs subject all solutions to extensive testing using
both simulation and live network elements to ensure comprehensive validation. Customer
use cases, common domain examples, and field experience are combined to generate
prescriptive configurations and architectures to inform customer and partner
implementations of Juniper Networks solutions. A solution-based approach enables
partners and customers to reduce time to certify and verify new designs by providing
tested, prescriptive configurations to use as a baseline. Juniper Networks solution
validation provides the peace of mind and confidence that the solution behaves as
described in a real-world production environment.
This guide is intended to be the first in a series of guides that enable our customers to
build effective data centers to meet specific business goals.
To provide flexibility to your implementation of the virtualized IT data center, there are
several sizes of the MetaFabric 1.0 solution. As seen in Figure 6 on page 28, you can start
with a small implementation and grow your data center network into a large one over
time. The reference architecture tested and documented in this guide uses the large
topology option with two QFX3000-M QFabric points of delivery (PODs) instead of six.
The small option shown in Figure 6 on page 28 uses two QFX3600 switches for
aggregation and six QFX3500 switches for access. Two 40-Gigabit Ethernet ports on
the QFX3500 switch are used as uplinks, while the other two are each split into four
10-Gigabit Ethernet server ports. As a result, each QFX3500 switch has 56 server ports and
implements 7:1 oversubscription (560 Gbps of server-facing capacity over 80 Gbps of
uplink capacity). The medium option is a single QFX3000-M QFabric
system with 64 network ports and 768 server ports, resulting in 3:1 oversubscription. The
large option uses 7:1 oversubscription and consists of 6 QFX3000-M QFabric systems.
NOTE: A fourth option not shown in the diagram would be to replace the 6
QFX3000-M QFabric systems with one QFX3000-G QFabric system to build
a data center containing 6144 ports.
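The oversubscription ratios quoted for the small and medium options can be checked with a short calculation. This is a sketch under stated assumptions: server-facing ports are taken to run at 10 Gbps and uplink or network ports at 40 Gbps, and the 56 ports per QFX3500 are treated as server-facing. The function name is illustrative and not part of any Juniper tool.

```python
from fractions import Fraction

def oversubscription(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth."""
    return Fraction(server_ports * server_gbps, uplink_ports * uplink_gbps)

# Small option, per QFX3500 (assumed): 56 x 10 Gbps server ports, 2 x 40 Gbps uplinks.
small = oversubscription(56, 10, 2, 40)

# Medium option, one QFX3000-M (assumed speeds): 768 x 10 Gbps server ports,
# 64 x 40 Gbps network ports.
medium = oversubscription(768, 10, 64, 40)

print(f"small:  {small.numerator}:{small.denominator}")    # 7:1
print(f"medium: {medium.numerator}:{medium.denominator}")  # 3:1

# Server ports across six QFX3000-M systems, at 768 server ports each.
print("six QFX3000-M systems:", 6 * 768, "server ports")   # 4608
```

Using `Fraction` keeps the ratio exact, so 560/80 reduces to 7:1 and 7680/2560 to 3:1 without floating-point rounding.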
The different sizing options offer different port densities to meet the growing
needs of the data center. The predefined configuration and provisioning options that
cover the small, medium, and large deployment scenarios are shown in Table 3 on page 29.
Solution Overview
This MetaFabric 1.0 solution identifies the key components necessary to accomplish the
specified goals. These components include compute, network, and storage requirements,
as well as considerations for business-critical applications, high availability, class of
service, security, and network management (Figure 7 on page 30). As a result of these
requirements and considerations, it is critical that all components are configured,
integrated, and tested end-to-end to guarantee service-level agreements (SLAs) to
support the business.
The following sections describe the general requirements you need to include in a
virtualized IT data center.
• Compute
• Network
• Storage
• Applications
• High Availability
• Class of Service
• Security
• Network Management
Compute
Because this solution is focused on a virtualized IT environment, naturally many of the
requirements are driven by virtualization itself. Compute resource management involves
the provisioning and maintenance of virtual servers and resources that must be centrally
managed. The requirements for compute resources within a virtualized IT data center
include:
• Location independence for VMs—An administrator must be able to place the VMs on
any available compute resource and move them to any other server as needed, even
between PODs.
• VM visibility—An administrator must be able to view where the virtual machines are
located in the data center and generate reports on VM movement.
• Fault tolerance—If VMs fail, there should be ways for the administrator to recover the
VMs or move them to another compute resource.
• Centralized virtual switch management—Keeping the management for VMs and virtual
switches in one place alleviates the hassle of logging into multiple devices to manage
dispersed virtual equipment.
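The location independence and VM visibility requirements above amount to tracking where each VM currently runs and recording every move. The following is a minimal sketch of that bookkeeping, not the vCenter or Network Director API; the class, method, VM, and host names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VMInventory:
    """Tracks the current host of each VM and a log of placements."""
    location: dict = field(default_factory=dict)  # VM name -> current host
    history: list = field(default_factory=list)   # (vm, from_host, to_host)

    def place(self, vm: str, host: str) -> None:
        """Place a VM on a host; moving between PODs is allowed."""
        previous = self.location.get(vm)
        self.location[vm] = host
        self.history.append((vm, previous, host))

    def moves(self, vm: str) -> list:
        """Return (from_host, to_host) pairs for a VM, excluding first placement."""
        return [(src, dst) for name, src, dst in self.history
                if name == vm and src is not None]

inventory = VMInventory()
inventory.place("vm-exchange-01", "pod1-host-03")  # initial placement
inventory.place("vm-exchange-01", "pod2-host-07")  # move to another POD
print(inventory.location["vm-exchange-01"])
print(inventory.moves("vm-exchange-01"))
```

Keeping the move log separate from the current-location map means a movement report can be generated without affecting placement lookups.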
Network
The network acts as the glue that binds together the data center services, compute, and
storage resources. To support application and storage traffic, you need to consider what
is required at the access and aggregation switching levels, core switching, and edge router
tiers of your data center. These are the areas that Juniper Networks understands best,
so we can help you in selecting the correct networking equipment to support your
implementation of the virtualized IT data center.
• 1-Gigabit, 10-Gigabit, and 40-Gigabit Ethernet Ports—This requirement covers the most
common interface types in the data center.
• Converged data and storage—Sending data and storage traffic over a single network
avoids the cost of building, operating, and maintaining separate networks for
data and storage.
• Load balancing—Distributing traffic over multiple paths makes efficient use of
bandwidth and resources and prevents unnecessary
bottlenecks.
• Network segmentation—Breaking the network into different portions lowers the amount
of traffic congestion, and improves security, reliability, and performance.
• Traffic isolation and separation—By carefully planning traffic flows, you can keep
East-to-West and North-to-South data center traffic separate from each other and
prevent traffic from traveling across unnecessary hops to reach its destination. This
allows most traffic to flow locally, which reduces latency and improves application
performance.
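As an illustration of the load-balancing requirement above, the following minimal Python sketch spreads flows over several links by hashing each flow's five-tuple. The function name pick_link, the link names, and the use of SHA-256 are assumptions made for this example only; switching hardware uses its own vendor-specific hash functions.

```python
import hashlib

def pick_link(flow, links):
    """Map a flow (5-tuple) onto one of the available links.

    Hashing the whole tuple keeps every packet of a flow on the same
    link (no reordering) while spreading different flows across links.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(links)
    return links[index]

# Hypothetical link names and flow values, for illustration only.
links = ["uplink-0", "uplink-1", "uplink-2", "uplink-3"]
flow = ("10.0.1.10", "10.0.2.20", 6, 49152, 3260)  # src, dst, proto, sport, dport
print(pick_link(flow, links))
```

Because the choice depends only on the flow identifiers, repeated calls for the same flow always return the same link, while a large population of flows tends to spread across all links.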
Generally speaking, you need to determine which Layer 2 and Layer 3 hardware and
software protocols meet your needs to provide a solid foundation for the traffic that
flows through your data center.
Storage
There are two primary types of storage: local storage and shared storage. Local storage
is generally directly attached to a server or endpoint. Shared storage is a shared resource
in the data center that provides storage services to a set of endpoints. The MetaFabric
1.0 solution focuses primarily on shared storage as it is the foundation for all of the
endpoint storage within a data center. Shared storage can be broken down into six primary
roles: controller, front end, back end, disk shelves, RAID groups, and storage pools.
Although there are many different types of shared storage that vary per vendor, the
architectural building blocks remain the same. Each role performs a specific
function in delivering shared storage to a set of endpoints.
• Boot from shared storage—The advantages of this requirement include easier server
maintenance, more robust storage (such as more disks, more capacity, and faster
storage processors), and easier upgrade options.
• Multiple protocol storage—The storage device must be able to support multiple types
of storage protocols, such as Internet Small Computer System Interface (iSCSI),
Network File System (NFS), and Fibre Channel over Ethernet (FCoE). This provides
flexibility to the administrator to integrate different types of storage as needed.
Applications
For your applications, you need to consider the user experience and plan your
implementation accordingly. Business-critical applications provide the main reason for
the existence of the data center. The other data center components (such as compute,
network, and storage) serve to ensure that these applications are hosted securely in a
manner that can provide a high-quality user experience. Web services, e-mail, database,
and collaboration tools are housed in the data center; these tools form the basis for
business efficiency and must deliver application performance at scale. As such, the data
center architecture should focus on delivering a high-quality user experience through
coordinated operation across all tiers of the data center.
For example, can the Web, application, and database tiers communicate properly with
each other? If you plan to allow VM motion to occur only within an access and aggregation
POD, you can include Layer 3 integrated routing and bridging (IRB) within the access and
aggregation layer. However, if you choose to move VMs from one POD to another, you
need to configure the IRB interface at the core layer to allow the VM to reach the Web,
application, and database servers that are still located in the original POD. Factoring in
such design aspects ahead of time saves the data center administrator headaches
in the months and years to come.
High Availability
Keeping your equipment up and running so that traffic can continue to flow through the
data center is a must to ensure that applications run smoothly for your customers. You
should strive to build a robust infrastructure that can withstand outages, failover, and
software upgrades without impacting your end users. High availability should include
both hardware and software components, along with verification. Key considerations
for high availability in a virtualized IT data center include:
• Hardware redundancy—At least two redundant devices should be placed at each layer
of the data center to ensure resiliency for traffic. If one device fails, the other device
should still be able to forward data and storage packets to their destinations. The data
center requires redundant network connectivity and the ability for traffic to use all
available bandwidth.
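The effect of placing two redundant devices at each layer can be estimated with a short calculation. The sketch below is a simplification that assumes devices fail independently; the per-device availability of 0.999 and the three-layer path are hypothetical values chosen for illustration, not figures from this solution.

```python
def parallel_availability(device_availability, device_count=2):
    """Availability of a layer that stays up while any one device is up.

    Assumes devices fail independently, which real deployments only
    approximate (shared power, software bugs, and so on).
    """
    return 1 - (1 - device_availability) ** device_count

def serial_availability(layer_availabilities):
    """Availability of a path that needs every layer to be up."""
    result = 1.0
    for availability in layer_availabilities:
        result *= availability
    return result

single = 0.999  # hypothetical per-device availability
layer = parallel_availability(single)
path = serial_availability([layer] * 3)  # for example: access, core, edge
print(f"one device: {single:.6f}, redundant layer: {layer:.6f}, path: {path:.6f}")
```

Under these assumptions, pairing devices at a layer moves that layer's unavailability from one part in a thousand to roughly one part in a million, which is why redundancy is specified at every layer rather than only at the edge.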
Class of Service
Because of the storage requirements in the virtualized IT data center, you must include
lossless Ethernet transport in your design to meet the needs for converged storage in the
solution. Also, you must consider the varying levels of class of service necessary to support
end-to-end business-critical applications, virtualization control, network control, and
best-effort traffic.
Security
Another important task is to secure your data center environment from both external
and internal threats. Because this solution contains both physical and virtual components,
you must secure both the applications and traffic that flow through the heart of the data
center (often across VMs) as well as the perimeter of the data center (consisting primarily
of physical hardware, such as an edge firewall). You must also provide secure remote
access to the administrators who are managing the data center.
Network Management
The final challenge is connecting the dots between physical and virtual networking;
bridging this gap enables the data center engineer to quickly troubleshoot and resolve
issues. For network management in a virtualized IT data center, you need to consider
management of fault, configuration, accounting, performance, and security (FCAPS) in
your network (Figure 8 on page 34).
For more information about FCAPS (the ISO model for network management), see
ISO/IEC 10040.
• Virtual and physical—You must be able to manage all types of components in the data
center network, regardless of whether they are hardware-based or virtualized.
• Fault—Errors in the network must be isolated and managed in the most efficient way
possible. You should be able to recognize, isolate, correct, and log faults that occur in
your network.
• Accounting—You must be able to gather network usage statistics, and establish users,
passwords, and permissions.
Design
• Design Considerations on page 37
• Design Scope on page 38
• Design Topology Diagram on page 39
• Design Highlights on page 40
• Solution Design on page 41
• Summary of Key Design Elements
• Benefits
Design Considerations
• Compute
• Virtual machines
• Servers
• Hypervisor switch
• Blade switch
• Network
• Access
• Aggregation
• Core switching
• Edge routing
• WAN
• Storage
The design must also include careful planning of other architectural considerations:
• Applications
• High availability
• Class of service
• Security
• Network management
In general, the design for the solution must satisfy the following high-level requirements:
• The entire data center must have end-to-end convergence for application traffic of
under one second from the point of view of the application.
• Compute nodes must be able to use all available network links for forwarding.
• The out-of-band (OOB) management network must be able to survive the failure of
the data plane within a POD.
Design Scope
This MetaFabric 1.0 solution covers the areas shown in Figure 9 on page 38. Juniper
Networks supplies products that appear in the blue portions of the diagram, while open
ecosystem partner products appear in the black portion. The ecosystem partners for this
solution include IBM (Compute), EMC (Storage), F5 Networks (Services), and VMware
(Virtualization).
Figure 10 on page 39 shows the general layout of the hardware components included in
the MetaFabric 1.0 solution architecture.
Design Highlights
Table 4 on page 40 shows the key features of the MetaFabric 1.0 solution and how they
are implemented with hardware and software from Juniper Networks and our third-party
ecosystem partners.
Compute and virtualization    IBM Flex System servers, VMware vSphere 5.1, vCenter
High availability             Nonstop software upgrade, in-service software upgrade,
                              SRX JSRP cluster, MC-LAG Active/Active with VRRP
Solution Design
This section explains the compute resources, network infrastructure, and storage
components required to implement the MetaFabric 1.0 solution. It also discusses the
software applications, high availability, class of service, security, and network management
components of this solution.
The purpose of the data center is to host business-critical applications for the enterprise.
Each role in the data center is designed and configured to ensure the highest quality user
experience possible. All of the functional roles within the data center exist to support the
applications in the data center.
• Compute on page 41
• Network on page 46
• Storage on page 55
• Applications on page 57
• High Availability on page 59
• Class of Service on page 67
• Security on page 68
• Network Management on page 73
• Performance and Scale on page 77
Compute
In the compute area, you need to select the physical and virtual components that will
host your business-critical applications, network management, and security services.
This includes careful selection of VMs, servers, hypervisor switches, and blade switches.
Virtual Machines
A virtual machine (VM) is a virtual computer that is made up of a guest operating system
and applications. A hypervisor is software that runs on a physical server, emulating
physical hardware for VMs. The VM operates on the emulated hardware of the hypervisor
and behaves as if it were running on dedicated, physical hardware. This layer of
abstraction presents a consistent view to the operating system; regardless of
changes to the hardware, the operating system sees the same set of logical hardware.
This enables operators to make changes to the physical environment without causing
issues on the servers hosted in the virtual environment, as seen in Figure 11 on page 42.
Virtualization also enables flexibility that is not possible on physical servers. Operating
systems can be migrated from one set of physical hardware to another with very little
effort. Complete environments, including the operating system and installed applications,
can be cloned in a virtual environment. Cloning enables complete backups of the environment
and, in some cases, the re-creation of identical servers on different physical hardware
for redundancy or mobility purposes. These clones can be activated upon primary VM
failure and enable an easy level of redundancy to exist at the data center application
layer. An extension to the benefit of cloning is that new operating systems can be created
from these clones very quickly, enabling faster service rollouts and faster time to revenue
for new services.
Servers
The server in the virtualized IT data center is simply the physical compute resource that
hosts the VMs. The server offers processing power, storage, memory, and I/O services
to the VMs. The hypervisor is installed directly on top of the servers without any sort of
host operating system, becoming a bare-metal operating system that provides a
framework for virtualization in the data center.
Because the server hosts the revenue-generating portion of the data center (the VMs
and resident applications), redundancy is essential at this layer. A virtualized IT data
center server must support full hardware redundancy, management redundancy, the
ability to upgrade software while the server is in service, hot swapping of power supplies,
cooling, and other components, and the ability to combine multiple server or blade chassis
into a single, logical management plane.
The server chassis must be able to provide transport between the physical hardware and
virtual components, connect to hosts through 10-Gigabit Ethernet ports, use 10-Gigabit
Ethernet or 40-Gigabit Ethernet interfaces to access the POD, consolidate storage, data,
and management functions, provide class of service, reduce the need for physical cables,
and provide active/active forwarding.
As seen in Figure 12 on page 43, this solution includes 40-Gigabit Ethernet connections
between QFabric system redundant server Node groups and IBM Flex servers that host
up to 14 blade servers. Other supported connection types include 10-Gigabit Ethernet
oversubscribed ports and 10-Gigabit Ethernet pass-through ports. The solution also has
two built-in switches per Flex server and uses MC-LAG to keep traffic flowing through
the data center.
Hypervisor Switching
The hypervisor switch is the first hop from the application servers in the MetaFabric 1.0
architecture. Virtual machines connect to a distributed virtual switch (dvSwitch) which
is responsible for mapping a set of physical network cards (pNICs) across a set of physical
hosts into a single logical switch that can be centrally managed by a virtualization
orchestration tool such as VMware vCenter (Figure 13 on page 43). The dvSwitch enables
inter-VM traffic on the same switching domain to pass between the VMs locally without
leaving the blade server or virtual environment. The dvSwitch also acts like a Virtual
Chassis, connects multiple ESXi hosts simultaneously, and offers port group functionality
(similar to a VLAN) to provide access between VMs.
The hypervisor switch is a critical piece of the MetaFabric 1.0 architecture. As such, it
should support functions that enable class of service and SLA attainment. Support for
IEEE 802.1p is required to support class of service. Support for link aggregation of parallel
links (IEEE 802.3ad) is also required to ensure redundant connection of VMs. As in the
other switching roles, support for SLA attainment is also a necessity at this layer. The
hypervisor switch should support SNMPv3, flow accounting and statistics, remote port
mirroring, and centralized management and reporting to ensure that SLAs can be
measured and verified.
To complete the configuration for the hypervisor switch, provide class of service on flows
for IP storage, vMotion, management, fault tolerance, and VM traffic. As shown in
Figure 14 on page 44, this solution implements the following allocations for network
input/output (I/O) control shares: IP storage (33.3 percent), vMotion (33.3 percent),
management (8.3 percent), fault tolerance (8.3 percent), and VM traffic (16.6 percent).
These categories have been maximized for server-level traffic.
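The percentages above follow from relative shares in a 4:4:1:1:2 ratio. The sketch below shows the arithmetic; the share values (100, 100, 25, 25, 50) and the function name share_percentages are assumptions chosen to reproduce that ratio, since the text states only the resulting percentages.

```python
def share_percentages(shares):
    """Convert relative shares into the percentage of bandwidth each
    traffic type receives when every type is competing for the link."""
    total = sum(shares.values())
    return {name: 100 * value / total for name, value in shares.items()}

# Hypothetical share values chosen to reproduce the 4:4:1:1:2 ratio.
shares = {
    "ip_storage": 100,
    "vmotion": 100,
    "management": 25,
    "fault_tolerance": 25,
    "vm_traffic": 50,
}
for name, pct in share_percentages(shares).items():
    print(f"{name}: {pct:.1f}%")
```

Shares matter only under contention: a traffic type with a small share can still use idle bandwidth when the other types are quiet. The exact value for VM traffic is one sixth of the total, which the text appears to truncate to 16.6 percent.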
Blade Switching
The virtualized IT data center features virtual appliances that are often hosted on blade
servers, or servers that support multiple interchangeable processing blades that give the
blade server the ability to host large numbers of VMs. The blade server includes power
and cooling modules as well as input/output (I/O) modules that enable Ethernet
connection into the blade server (Figure 15 on page 45). Blade switching is performed
between the physical Ethernet port on the I/O module and the internal Ethernet port on
the blade. In some blade servers, a 1:1 subscription model (one physical port connects to
one blade) is used; this is called pass-through switching, with one external Ethernet port
connecting directly to a specific blade via an internal Ethernet port. The pass-through
model offers the benefit of allowing full line bandwidth to each blade server without
oversubscription. The downside to this approach is often a lack of flexibility in VM mobility
and provisioning as VLAN interfaces need to be moved on the physical switch and the
blade switch when a move is required.
Another mode of blade switch operation is where the blade switch enables
oversubscription to the blade servers. In this type of blade server, there may be only 4
external ports that connect internally to 12 separate blade servers. This would result in
3:1 oversubscription (three internal ports to every one external port). The benefit to this
mode of operation is that it minimizes the number of connected interfaces and access
switch cabling per blade server, even though the performance of oversubscribed links
and their connected VMs can degrade as a result. While this architecture is designed for
data centers that utilize blade servers, the design works just as well in data centers that
do not utilize blade servers to host VMs.
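The oversubscription arithmetic described above can be checked with a few lines of Python. This is a minimal sketch that assumes every port runs at the same speed; the function names and the 10-Gbps port speed in the last call are illustrative assumptions.

```python
from fractions import Fraction

def oversubscription(internal_ports, external_ports):
    """Return the internal:external port ratio reduced to lowest terms."""
    ratio = Fraction(internal_ports, external_ports)
    return f"{ratio.numerator}:{ratio.denominator}"

def worst_case_per_blade_gbps(internal_ports, external_ports, port_speed_gbps):
    """Bandwidth each blade gets if all blades transmit at once."""
    return external_ports * port_speed_gbps / internal_ports

print(oversubscription(12, 4))    # 12 blades behind 4 external ports
print(oversubscription(14, 14))   # pass-through: one external port per blade
print(worst_case_per_blade_gbps(12, 4, 10))
```

The second helper shows the cost of the cabling savings: with 12 blades sharing 4 external ports, each blade is left with one third of a port's bandwidth when all blades transmit simultaneously.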
Table 5 on page 46 shows that both pass-through blade servers and oversubscribed
blade servers are acceptable choices for this solution in your data center network. In
some cases, you might need the faster speed provided by the 40-Gigabit Ethernet
connections to support newer equipment, while in others you would prefer the line-rate
performance offered by a pass-through switch. As a result, all three blade server types
are supported in this design.
To provide support for compute and virtualization in the virtualized IT data center, this
solution uses:
• An IBM Flex System server with multiple ESXi hosts supporting all the VMs
running business-critical applications (SharePoint, Exchange, and MediaWiki).
This design for the compute and virtualization segment of the data center meets the
requirements of this solution for workload mobility and migration for VMs, location
independence for VMs, VM visibility, high availability, fault tolerance, and centralized
virtual switch management.
Network
The network is often the main focus of the data center as it is built to pass traffic to, from,
and between application servers hosted in the data center. Given the criticality of this
architectural role, and the various tiers within the data center switching block, it is further
broken up into access switching, aggregation switching, core switching, edge routing,
and WAN connectivity. Each segment within the data center switching role has unique
design considerations that relate back to business criticality, SLA requirements,
redundancy, and performance. It is within the data center switching architectural roles
that the network must be carefully designed to ensure that your data center equipment
purchases maximize network scale and performance while minimizing costs.
The aggregation switch acts as a multiplexing point between the access and the core of
the data center. The aggregation architectural role serves to combine a large number of
smaller interfaces from the access into high bandwidth trunk ports that can be more
easily consumed by the core switch. Redundancy should be a priority in the design of the
aggregation role as all Layer 2 flows between the data center and the core switch are
combined and forwarded by the data center aggregation switch role. At this layer, a
switching architecture that supports the combination of multiple switches into a single,
logical system with control and forwarding plane redundancy is recommended. This
switching architecture enables redundancy features such as MC-LAG, loop-free redundant
paths, and in-service software upgrades to enable data center administrators to
consistently meet and exceed SLAs.
One recommendation is to combine the access and aggregation layers of your network
by using a QFabric system. Not only does a QFabric system offer a single point of
provisioning, management, and troubleshooting for the network operator, it also collapses
switching tiers for any-to-any connectivity, provides lower latency, and enables all access
devices to be only one hop away from one another, as shown in Figure 16 on page 48.
Figure 16: Juniper Networks QFabric Systems Enable a Flat Data Center Network
To implement the access and aggregation switching portions of the virtualized IT data
center, this solution uses the QFX3000-M QFabric system. There are two QFabric systems
(POD1 and POD2) in this solution to provide performance and scale. The QFabric PODs
support 768 ports per POD and feature low port-to-port latency, a single point of
management per POD, and lossless Ethernet to support storage traffic. The use of
predefined POD configurations enables the enterprise to more effectively plan data
center rollouts by offering predictable growth and scale in the solution architecture. Key
configuration steps include:
• Configure the QFX3000-M QFabric systems with 3 redundant server Node groups
(RSNGs) connected to 2 IBM Flex System blade servers to deliver application traffic.
• The first IBM Flex System server uses a 40-Gigabit Ethernet converged network
adapter (CNA) connected to a QFabric system RSNG containing QFX3600 Node
devices (RSNG4).
• The second IBM Flex System server has 10-Gigabit Ethernet pass-through modules
connected to RSNG2 and RSNG3 on the second QFabric system.
• Connect the EMC VNX storage platform to the QFabric systems for storage access
using iSCSI and NFS.
• Connect the QFabric systems with the EX9214 core switch by way of a network Node
group containing 2 Node devices which use four 24-port LAGs configured as trunk
ports.
• Configure OSPF in the PODs (within the QFabric system network Node group) towards
the EX9214 core switch and place these connections in area 10 as a totally stubby area.
Core Switching
The core switch is often configured as a Layer 3 device that handles routing between
various Layer 2 domains in the data center. A robust implementation of the core switch
in the virtualized IT data center will support both Layer 2 and Layer 3 to enable a full
range of interoperability and service provisioning in a multitenant environment. Much like
in the edge role, the redundancy of core switching is critical as it too is a traffic congestion
point between the customer and the application. A properly designed data center includes
a fully redundant core switch layer that supports a wide range of interfaces (1-Gigabit,
10-Gigabit, 40-Gigabit, and 100-Gigabit Ethernet) with high density. The port density in
the core switching role is a critical factor as the data center core should be designed to
support future expansion without requiring new hardware (beyond line cards and interface
adapters). The core switch role should also support a wide array of SLA statistics
collection, and should be service-aware to support collection of service-chaining statistics.
The general location of the core switching function in this solution is shown in
Figure 17 on page 49.
Table 6 on page 50 shows some of the reasons for choosing an EX9200 switch over an
EX8200 switch to provide core switching capabilities in this solution. The EX9200 switch
provides a significantly larger number of 10-Gigabit Ethernet ports, support for 40-Gigabit
Ethernet ports, ability to host more analyzer sessions, firewall filters, and BFD connections,
and critical support for in-service software upgrade (ISSU) and MC-LAG. These reasons
make the EX9200 switch the superior choice in this solution.
Table 6: Core Switch Hardware - Comparison of the EX9200 and EX8200 Switches
Solution Requirement    EX8200    EX9200    Delta
40G                     No        Yes
MC-LAG                  No        Yes
Table 7 on page 50 shows some of the reasons for choosing MC-LAG as the forwarding
technology over Virtual Chassis in this solution. MC-LAG provides dual control planes, a
non-disruptive implementation, support for LACP, state replication across peers, and
support for ISSU without requiring dual Routing Engines.
Attribute          Virtual Chassis    MC-LAG
Control Planes     1                  2
Maximum Chassis    2                  2
ISSU               No                 Yes
To implement the core switching portion of the virtualized IT data center, this solution
uses two EX9214 switches with the following capabilities and configuration:
• Key features—240 Gbps line rate per slot for 10-Gigabit Ethernet, support for 40-Gigabit
Ethernet ports, 64 analyzer sessions, scalable to 256,000 firewall filters, and support
for bidirectional forwarding detection (BFD), in-service software upgrade (ISSU), and
MC-LAG groups.
• Configure Layer 2 MC-LAG active/active on the EX9214 towards the QFabric PODs,
the F5 load balancer, and the MX240 edge router (by way of the redundant Ethernet
link provided by the SRX3600 edge firewall) to provide path redundancy.
• Configure IRB and VRRP for all MC-LAG links for high availability.
• Configure IRB on the EX9214 and the QFabric PODs to terminate the Layer 2/Layer
3 boundary.
• Configure a static route on the core switches to direct traffic from the Internet to the
load balancers.
• Configure OSPF to advertise a default route to the totally stubby areas in the
QFabric PODs. Each QFabric POD has its own OSPF area. Also, configure the EX9214
core switches as area border routers (ABRs) that connect all three OSPF areas, and
designate backbone area 0 over aggregated link ae20 between the two core switches.
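The OSPF step above relies on the behavior of a totally stubby area: the ABR advertises only a default route into the area instead of individual inter-area and external prefixes. The following Python sketch models that filtering in a deliberately simplified way; the function name, the area-type strings, and the example prefixes are hypothetical and do not represent device configuration.

```python
from ipaddress import ip_network

DEFAULT_ROUTE = ip_network("0.0.0.0/0")

def routes_advertised_into_area(abr_routes, area_type):
    """Return the inter-area routes an ABR advertises into an area."""
    if area_type == "totally-stubby":
        # Only a default route is injected; other inter-area and
        # external prefixes are suppressed at the ABR.
        return [DEFAULT_ROUTE]
    return sorted(abr_routes)

# Hypothetical prefixes known to the core switches (ABRs).
backbone_routes = [ip_network("172.16.0.0/16"), ip_network("192.168.10.0/24")]
print(routes_advertised_into_area(backbone_routes, "totally-stubby"))
print(routes_advertised_into_area(backbone_routes, "normal"))
```

Keeping the POD routing tables down to a default route is what makes this design attractive: devices inside a POD only need to know how to reach the core switches.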
Edge Routing
The edge is the point in the network that aggregates all customer and Internet connections
into and out of the data center. Although high availability and redundancy are important
considerations throughout the data center, it is at the edge that they are the most vital;
the edge serves as a choke point for all data center traffic and a loss at this layer renders
the data center out of service. At the edge, full hardware redundancy should be
implemented using platforms that support control plane and forwarding plane
redundancy, link aggregation, MC-LAG, redundant uplinks, and the ability to upgrade the
software and platform while the data center is in service. This architectural role should
support a full range of protocols to ensure that the data center can support any
interconnect type that may be offered. Edge routers in the data center require support
for IPv4 and IPv6, as well as ISO and MPLS protocols. As the data center might be
multi-tenant, the widest array of routing protocols should also be supported, including
static routing, RIP, OSPF, OSPF-TE, OSPFv3, IS-IS, and BGP. With large scale multi-tenant
environments in mind, it is important to support Virtual Private LAN Service (VPLS)
through the support of bridge domains, overlapping VLAN IDs, integrated routing and
bridging (IRB), and IEEE 802.1ad (QinQ). The edge should support a complete set of MPLS
VPNs, including L3VPN, L2VPN (RFC 4905 and RFC 6624, or Martini and Kompella drafts,
respectively), and VPLS.
Network Address Translation (NAT) is another factor to consider when designing the
data center edge. It is likely that multiple customers serviced by the data center will have
overlapping private network address schemes. In environments where direct Internet
access to the data center is enabled, NAT is required to translate routable, public IP
addresses to the private IP addressing used in the data center. The edge must support
Basic NAT44, NAPT44, NAPT66, Twice NAT44, and NAPT-PT.
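To make the NAPT44 case concrete, the following toy Python model shows how many private address and port pairs can share one public IPv4 address. It is a sketch under stated assumptions: the class name Napt44, the starting port of 1024, and the example addresses are hypothetical, and real implementations also handle protocols, timeouts, and port exhaustion.

```python
import itertools

class Napt44:
    """Toy NAPT44 table: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out = {}   # (private_ip, private_port) -> public_port
        self._in = {}    # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Allocate (or reuse) a public port for a private endpoint."""
        key = (private_ip, private_port)
        if key not in self._out:
            public_port = next(self._ports)
            self._out[key] = public_port
            self._in[public_port] = key
        return self.public_ip, self._out[key]

    def translate_inbound(self, public_port):
        """Return the private endpoint for a public port, or None."""
        return self._in.get(public_port)

nat = Napt44("203.0.113.1")  # documentation-range address
print(nat.translate_outbound("10.0.0.5", 40000))
print(nat.translate_outbound("10.0.0.6", 40000))
print(nat.translate_inbound(1024))
```

Because each tenant would get its own translation table in this model, two tenants can use the same private addresses without conflict, which is the overlapping-address situation described above.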
Finally, as the edge is the ingress and egress point of the data center, the implementation
should support robust data collection to enable administrators to verify and prove strict
service-level agreements (SLAs) with their customers. The edge layer should support
collection of average traffic flows and statistics, and at a minimum should report
exact traffic statistics, including the exact number of bytes and packets
that were received, transmitted, queued, lost, or dropped, per application.
Figure 19 on page 53 shows the location of the edge routing function in this solution.
WAN
The WAN role provides transport between end users, enterprise remote sites, and the
data center. There are several different WAN topologies that can be used, depending on
the business requirements of the data center. A data center can simply connect directly
to the Internet, utilizing simple IP-based access directly to servers in the data center, or
it can use a secure tunneled approach based on generic routing encapsulation (GRE) or IP Security
(IPsec). Many data centers serve a wide base of customers and favor Multiprotocol Label
Switching (MPLS) interconnection via the service provider’s managed MPLS network,
allowing customers to connect directly into the data center via the carrier’s MPLS
backbone. Another approach to the WAN is to enable direct peering between customers
and the data center; this approach enables customers to bypass transit peering links by
establishing a direct connection (for example, via private leased line) into the data center.
Depending on the requirements of the business and the performance requirements of
the data center hosted applications, the choice of WAN interconnection is the first
decision that determines the performance and security of the data center applications. Choosing
a private peering or MPLS interconnect offers improved security and performance at a
higher expense. In cases where the hosted applications are not as sensitive to security
and performance, or where application protocols offer built-in security, a simple
Internet-connected data center can offer an appropriate level of security and performance at a
lower cost.
To implement the edge routing and WAN portions of the virtualized IT data center, this
solution uses MX240 Universal Edge routers. Because the MX240 router offers dual
Routing Engines and ISSU at a reasonable price point, it is the preferred option over the
smaller MX80 router. The key connection and configuration steps are:
• Connect the MX240 edge routers to the service provider networks to provide Internet
access to the data center.
• Configure the two edge routers to be EBGP peers with 2 service providers to provide
redundant Internet connections.
• Configure IBGP between the 2 edge routers and apply a next-hop self export policy.
• Configure BGP local preference on the primary service provider to offer a preferred exit
point to the Internet.
• Export a dynamic, condition-based, default route to the Internet into OSPF on both
edge routers toward the edge firewalls and core switches to provide Internet access
for the virtualized IT data center devices (Figure 20 on page 54).
• Enable Network Address Translation (NAT) to convert private IP addresses into public
IP addresses.
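The local preference step in the list above can be illustrated with a simplified route comparison. The sketch below models only two criteria of BGP route selection (highest local preference, then shortest AS path); the route dictionaries, provider names, preference values, and AS numbers (taken from the range reserved for documentation) are hypothetical.

```python
def best_exit(routes):
    """Pick the preferred route: highest local preference first,
    then shortest AS path (a simplified slice of BGP best-path selection)."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# Hypothetical default routes learned from two service providers.
routes = [
    {"via": "provider-1", "local_pref": 200, "as_path": [64500, 64510]},
    {"via": "provider-2", "local_pref": 100, "as_path": [64501]},
]
print(best_exit(routes)["via"])
```

In this example the route through provider-1 wins despite its longer AS path, because local preference is compared first. If the primary provider's route is withdrawn, the remaining route through provider-2 becomes the exit point, which is the redundancy the two EBGP peerings are meant to provide.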
This design for the network segment of the data center meets the requirements of this
solution for 1-Gigabit, 10-Gigabit, and 40-Gigabit Ethernet ports, converged data and
storage, load balancing, quality of experience, network segmentation, traffic isolation
and separation, and time synchronization.
Storage
The storage role of the MetaFabric 1.0 architecture is to provide centralized file and block
data storage so that all hosts inside of the data center can access it. The data storage
can be local to a VM, such as a database that resides within a hosted application, or
shared, such as a MySQL database that can reside on a storage array to serve multiple
different applications. The MetaFabric 1.0 architecture requires the use of shared storage
to enable compute virtualization and VM mobility.
One of the key goals of the virtualized IT data center is to converge both data and storage
onto the same network infrastructure to reduce the overall cost and make operations
and troubleshooting easier. There are several different options when converging storage
traffic: FCoE, NFS, and iSCSI. One of the most recent trends in building a green-field data
center is to use IP storage and intentionally choose not to integrate legacy Fibre Channel
networks. Additionally, because iSCSI has better performance, lower read-write response
times, lower cost, and full application support, iSCSI is the better storage network
choice over NFS. Storage traffic is also highly sensitive to latency and packet drops, so it is
critical that the network infrastructure provide a lossless Ethernet service to correctly
prioritize all storage traffic. As a result, this solution uses both iSCSI and NFS for storage,
and provides a lossless Ethernet service to guarantee the delivery of storage traffic.
Table 8 on page 55 shows a comparison of FCoE, NFS, and iSCSI. Because NFS and iSCSI
meet the same requirements provided by FCoE, plus the ability to scale to 10-Gigabit
Ethernet and beyond, the NFS and iSCSI storage protocols are the preferred choice for
the MetaFabric 1.0 solution.
Figure 21 on page 56 shows the path of storage traffic as it travels through the data center
and highlights the benefit of priority queuing to provide lossless Ethernet transport for
storage traffic. By configuring Priority Flow Control (PFC), the storage device can monitor
storage traffic in the storage VLAN and notify the server when traffic congestion occurs.
The server can pause sending additional storage traffic until after the storage device has
cleared the congested receive buffers. However, other queues are not affected and
uncongested traffic continues flowing without interruption.
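The per-priority pause behavior described above can be illustrated with a small simulation. This is a minimal sketch and not the switch or storage array implementation: the class name, the queue thresholds, and the priority numbers used in any example are hypothetical, and real PFC signals a pause with IEEE 802.1Qbb frames rather than function return values.

```python
from dataclasses import dataclass, field


@dataclass
class PfcReceiver:
    """Toy model of a receiver (for example, the storage device) with one buffer per priority."""

    xoff_threshold: int = 8  # hypothetical: pause a priority when its queue holds this many frames
    xon_threshold: int = 2   # hypothetical: resume once the queue drains to this level
    queues: dict = field(default_factory=lambda: {p: 0 for p in range(8)})
    paused: set = field(default_factory=set)

    def receive(self, priority: int) -> bool:
        """Accept one frame; return False if this priority is currently paused."""
        if priority in self.paused:
            return False  # the sender must hold frames for this priority only
        self.queues[priority] += 1
        if self.queues[priority] >= self.xoff_threshold:
            self.paused.add(priority)  # congestion: pause this priority, not the whole link
        return True

    def drain(self, priority: int, frames: int) -> None:
        """Remove frames from a queue and lift the pause once the queue is low enough."""
        self.queues[priority] = max(0, self.queues[priority] - frames)
        if self.queues[priority] <= self.xon_threshold:
            self.paused.discard(priority)
```

Filling one priority's queue pauses only that priority; frames sent on any other priority are still accepted, which mirrors the statement that uncongested queues keep flowing.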
9. The QFabric system transmits the traffic to the servers and VMs.
To implement the storage portion of the virtualized IT data center, this solution uses EMC
VNX5500 unified storage with a single storage array. This storage is connected to the
QFabric PODs, which in turn connect to the servers and VMs, as seen in
Figure 22 on page 57. The design assumes that the data center architect wishes to save
on cost initially by sharing a single storage array with multiple QFabric PODs. However,
the design can evolve to allocating one storage array per QFabric POD, as usage and
demand warrant such expansion.
This solution also implements Data Center Bridging (DCB) to enable full support of
storage traffic. Within DCB, support for priority-based flow control (PFC), enhanced
transmission selection (ETS), and Data Center Bridging Capability Exchange (DCBX)
enables storage traffic to pass properly between all servers and storage devices within
a data center segment and to deliver a lossless Ethernet environment.
This design for the storage segment of the virtualized IT data center meets the
requirements of this solution for scale, lossless Ethernet, the ability to boot from shared
storage, and support for multiple protocol storage.
Applications
Applications in the virtualized IT data center are built as Virtual Machines (VMs) and are
hosted on servers, or physical compute resources that reside on the blade server. This
design for applications meets the requirements of this solution for business-critical
applications and high performance.
The MetaFabric 1.0 solution supports a complete software stack that covers four major
application categories: compute management, network management, network services,
and business-critical applications (Figure 23 on page 58). These applications run on top
of IBM servers and VMware vSphere 5.1.
Compute Management
VMware vCenter is a virtualization management platform that offers centralized control
and visibility into compute, storage, and networking resources. Data center operators
use the de facto, industry-standard vCenter on a daily basis to manage and provision
VMs. VMware vCloud Director allows the data center manager to create an in-house
cloud service and partition the virtualization environment into segments that can be
administered by separate business units or administrative entities. The pool of resources
can now be partitioned into virtual data centers which can offer their own independent
virtualization services. Use of vCenter and vCloud Director offers the first element of
software application support for the MetaFabric 1.0 solution.
Network Management
The MetaFabric 1.0 solution uses Junos Space Management Applications to provide
network provisioning, orchestration, and inventory management. The applications include
Network Director for management of wired and wireless data center networks, and
Security Director for security policy administration.
Network Services
Network load balancing is a common network service. There are two methods to provide
network load balancing: virtual and hardware-based. The virtual load balancer operates
in the hypervisor as a VM. One of the benefits of a virtual load balancer is rapid provisioning
of additional load-balancing power. Another benefit is that the administration of the
virtual load balancer can be delegated to another administrative entity without impacting
other applications and traffic.
However, the drawback to a virtual load balancer is that its performance is limited by
the compute resources that are available. Hardware load balancers offer
much more performance in traffic throughput and SSL encryption and decryption with
dedicated security hardware.
The MetaFabric 1.0 solution uses the local traffic manager (LTM) from F5 Networks.
Business-Critical Applications
Software applications are made of multiple server tiers; the most common are Web,
application, and database servers. Each server has its own discrete set of responsibilities.
The Web tier handles the interaction with the users and the application. The application
tier handles all of the application logic and programming. The database tier handles all
of the data storage and application inventory.
The following software applications were tested as part of the MetaFabric 1.0 solution:
• Microsoft SharePoint
The SharePoint application requires three tiers: Web, application, and database. The
Web tier uses Microsoft IIS to handle Web tracking and interaction with end users. The
application tier uses Microsoft SharePoint and Active Directory to provide the file
sharing and content management software. Finally, the database tier uses Microsoft
SQL Server to store and organize the application data.
• Microsoft Exchange
The Exchange application requires two tiers: a Web tier, and a second tier that combines
the application and the database into a single tier.
• MediaWiki Application
The MediaWiki application requires two tiers: a combined Web and application tier,
and a database tier. Apache httpd is combined with the hypertext preprocessor (PHP)
to render and present the application, while the data is stored on the database tier
with MySQL.
High Availability
This design meets the high availability requirements of hardware redundancy and software
redundancy.
Hardware Redundancy
To provide hardware redundancy in the virtualized IT data center, this solution uses:
• Redundant server hardware—Two IBM 3750 standalone servers and two IBM Pure Flex
System Chassis
Software Redundancy
To provide software redundancy in the virtualized IT data center, this solution uses:
• In-service software upgrade (for the core switches and edge routers)—Enables the
network operating system to be upgraded without downtime.
• Nonstop active routing—Keeps the Layer 3 protocol state synchronized between the
master and backup Routing Engines.
• Nonstop bridging—Keeps the Layer 2 protocol state synchronized between the master
and backup Routing Engines.
To allow all the links to forward traffic without using Spanning Tree Protocol (STP), you
can configure MC-LAG on edge routers and core switches. The edge routers use MC-LAG
toward the edge firewalls, and the core switches use MC-LAG toward each QFabric POD,
application load balancer (F5), and out-of-band (OOB) management switch.
For this solution, MC-LAG Active/Active is preferred because it provides link-level and
node-level protection for Layer 2 networks and Layer 2/Layer 3 combined hybrid
environments.
• Both core switches have active aggregated Ethernet member interfaces and forward
the traffic. If one of the core switches fails, the other core switch will forward the traffic.
Traffic is load balanced by default, so link-level efficiency is 100 percent.
• The Active/Active method has faster convergence than the Active/Standby method.
Fast convergence occurs because information is exchanged between the routers during
operations. After a failure, the remaining operational core switch does not need to
relearn any routes and continues to forward the traffic.
• Routing protocols (such as OSPF) can be used over MC-LAG/IRB interfaces for Layer
3 termination.
• If you configure Layer 3 protocols in the core, you can use an integrated routing and
bridging (IRB) interface to offer a hybrid Layer 2 and Layer 3 environment at the core
switch.
1. ICCP
• ICCP is a control plane protocol for MC-LAG. It uses TCP as a transport protocol
and Bidirectional Forwarding Detection (BFD) for fast convergence. When you
configure ICCP, you must also configure BFD.
• ICCP synchronizes configurations and operational states between the two MC-LAG
peers.
• ICCP also synchronizes MAC address and ARP entries learned from one MC-LAG
node and shares them with the other peer.
• Peering with the ICCP peer loopback IP address is recommended to avoid any
direct link failure between MC-LAG peers. As long as the logical connection between
the peers remains up, ICCP stays up.
• Although you can configure ICCP on either a single link or an aggregated bundle
link, an aggregated Ethernet LAG is preferred.
• You can also configure ICCP and ICL links on a single aggregated Ethernet bundle
under multiple logical interfaces using flexible VLAN tagging supported on MX Series
platforms.
2. ICL-PL
• ICL is a special Layer 2 link, used only in Active/Active mode, between the MC-LAG peers.
• If the traffic receiver is single homed to one of the MC-LAG nodes (N1), ICL is used
to forward the packets received by way of the MC-LAG interface to the other MC-LAG
node (N2).
Redundancy group ID—ICCP uses a redundancy group to associate multiple chassis that
perform similar redundancy functions. A redundancy group establishes a communication
channel so that applications on ICCP peers can reach each other. A redundancy group
ID is similar to a mesh group identifier.
Service ID—A new service ID object for bridge domains overrides any global switch options
configuration for the bridge domain. The service ID is unique across the entire network
for a given service to allow correct synchronization. For example, a service ID synchronizes
applications like IGMP, ARP, and MAC address learning for a given service across the core
switches. (Note: Both MC-LAG peers must share the same service ID for a given bridge
domain.)
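The requirement that both MC-LAG peers share identifiers can be expressed as a simple consistency check. The sketch below is illustrative only: the dictionary layout and the function name are hypothetical and do not correspond to any Junos data model. It also compares the redundancy group ID, on the assumption that both peers are placed in the same redundancy group.

```python
def mclag_mismatches(peer_a: dict, peer_b: dict) -> list:
    """Return human-readable mismatches between the settings of two MC-LAG peers."""
    problems = []
    # Assumption: both peers belong to the same redundancy group.
    if peer_a["redundancy_group_id"] != peer_b["redundancy_group_id"]:
        problems.append("redundancy group ID differs")
    # Both peers must use the same service ID for a given bridge domain.
    domains = set(peer_a["service_ids"]) | set(peer_b["service_ids"])
    for bd in sorted(domains):
        a = peer_a["service_ids"].get(bd)
        b = peer_b["service_ids"].get(bd)
        if a != b:
            problems.append(f"bridge domain {bd}: service ID {a} vs {b}")
    return problems
```

An empty result means the two peers agree on every bridge domain; a bridge domain that exists on only one peer is reported with `None` for the missing side.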
MC-LAG Active/Active is a Layer 2 logical link. IRB interfaces are used to create integrated
Layer 2 and Layer 3 links. As a result, you have two design options when assigning IP
addresses across MC-LAG peers:
• Option 1: VRRP MC-LAG Active/Active provides common virtual IP and MAC addresses
and unique physical IP and MAC addresses. Both address types are needed if you
configure routing protocols on MC-LAG Active/Active interfaces. The VRRP data
forwarding logic has been modified in Junos OS if you configure both MC-LAG
Active/Active and VRRP. When configured simultaneously, both the MC-LAG and VRRP
peers forward traffic and load-balance the traffic between them, as shown in
Figure 25 on page 63.
Data packets received by the backup VRRP peer on the MC-LAG member link are
forwarded to the core link without sending them to the master VRRP peer.
• You configure the same IP address on the IRB interfaces of both nodes.
• The peer with the higher IRB MAC address learns the peer’s MAC address through
ICCP and installs the peer MAC address as its own MAC address.
We recommend Option 1 as the preferred method for the MetaFabric 1.0 solution for the
following reasons:
• The solution requires OSPF as the routing protocol between the QFabric PODs and
the core switches on the MC-LAG IRB interfaces and only Option 1 supports routing
protocols.
• Layer 3 extends to the QFabric PODs for some VLANs for hybrid Layer 2/Layer 3
connectivity to the core.
As shown in Figure 27 on page 64, the following forwarding rules apply to MC-LAG
Active/Active:
• Traffic received on N1 from MCAE1 could be flooded to the ICL link to reach N2. When
it reaches N2, it must not be flooded back to MCAE1.
• Traffic received on SH1 could be flooded to MCAE1 and ICL by way of N1. When N2
receives SH1 traffic across the ICL link, it must not be again flooded to MCAE1. N2 also
receives the SH1 traffic by way of the MC-AE link.
• When receiving a packet from the ICL link, the MC-LAG peers forward the traffic to
all local SH links. If the corresponding MCAE link on the peer is down, the receiving
peer also forwards the traffic to its MCAE links.
NOTE: ICCP is used to signal MCAE link state between the peers.
• When N2 receives traffic from the ICL link and the N1 core link is up, the traffic should
not be forwarded to the N2 core link.
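The rules for a frame that arrives over the ICL link can be summarized as a small decision function. This is a sketch under stated assumptions, not the forwarding implementation: the function and parameter names are hypothetical, link names such as SH2 are placeholders, and forwarding to the local core link when the peer core link is down is an inference from the last rule rather than something the text states directly.

```python
def egress_for_icl_frame(sh_links, mcae_links, peer_mcae_up, peer_core_up):
    """Return the local ports on which a node floods a frame received from the ICL link.

    sh_links:      names of local single-homed links
    mcae_links:    names of local MC-AE links
    peer_mcae_up:  dict mapping each MC-AE name to the peer's link state (signaled by ICCP)
    peer_core_up:  True if the peer's core link is up
    """
    out = list(sh_links)  # always forward to all local single-homed links
    for link in mcae_links:
        if not peer_mcae_up[link]:
            out.append(link)  # the peer cannot deliver on this MC-AE, so this node must
    if not peer_core_up:
        out.append("core")  # assumption: use the local core link only when the peer's is down
    return out
```

With all peer links up, the frame is delivered only to single-homed links, so nothing is flooded back to the MC-AE it may have arrived on at the peer.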
Here are the actions that happen when the ICCP link is down and the ICL link is up:
• By default, if the ICCP link fails, as shown in Figure 28 on page 65, the peer defaults to
its own local LACP system ID and the links for only one peer (whichever one negotiates
with the customer edge [CE] router first) are attached to the bundle. Until LACP
converges with a new system ID, there will be minimal traffic impact.
• One peer stays active, while the other enters standby mode (but this is
nondeterministic).
• The access switch selects a core switch and establishes LACP peering.
• With the force-icl-down statement, the ICL link shuts down when the ICCP link fails.
• By configuring these statements, traffic impact is minimized during an ICCP link failure.
Here are the actions that happen when the ICCP link is up and the ICL link is down:
• This configuration ensures a loop-free topology because it does not forward duplicate
packets in the Layer 2 network.
Active MC-LAG node down with ICCP loopback peering with prefer-status-control-active
on both peers:
Here are the actions that happen when both MC-LAG peers are configured with the
prefer-status-control-active statement and the active peer goes down:
• When you configure MC-LAG Active/Active between SW1/SW2 and the QFabric POD,
SW1 becomes active and SW2 becomes standby. During an ICCP failure event, if SW1
has the prefer-status-control-active statement and it fails, SW2 is not aware of the
ICCP or SW1 failures. As a result, SW2 mcae-id switches to the default LACP system
ID, which causes the MC-LAG link to go down and up, and results in long traffic
reconvergence times.
• Configure backup-liveness-detection on both the active and standby peers. BFD helps
to detect peer failures and enable sub-second reconvergence.
The design for high availability in the MetaFabric 1.0 solution meets the requirements for
hardware redundancy and software redundancy.
Class of Service
Key design elements for class of service in this solution include network control (OSPF,
BGP, and BFD), virtualization control (high availability, fault tolerance), storage (iSCSI
and NAS), business-critical applications (Exchange, SharePoint, MediaWiki, and vMotion)
and best-effort traffic. As seen in Figure 31 on page 67, incoming packets are sorted,
assigned to queues based on traffic type, and transmitted based on the importance of
the traffic. For example, iSCSI lossless Ethernet traffic has the largest queue and highest
priority, followed by critical traffic (fault tolerance and high availability), business-critical
application traffic (including vMotion), and bulk best-effort traffic with the lowest priority.
As seen in Figure 32 on page 68, the following percentages are allocated for class of
service in this solution: network control (5 percent), virtualization control (5 percent),
storage (60 percent), business-critical applications (25 percent) and best-effort traffic
(5 percent). These categories have been maximized for network-level traffic, as the
network supports multiple servers and switches. As a result, storage traffic and application
traffic are the most critical traffic types in the network, and these allocations have been
verified by our testing.
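To make the allocation concrete, the stated percentages can be applied to a link of a given speed. This is a minimal sketch that treats each percentage as a share of link capacity; the function name and the 10-Gigabit example link are assumptions for illustration, not part of the tested configuration.

```python
# Percentages as stated for the class-of-service design in this solution.
COS_SHARES = {
    "network control": 5,
    "virtualization control": 5,
    "storage": 60,
    "business-critical applications": 25,
    "best effort": 5,
}


def share_in_gbps(link_gbps: float) -> dict:
    """Split a link's capacity across the traffic classes according to COS_SHARES."""
    if sum(COS_SHARES.values()) != 100:
        raise ValueError("shares must total 100 percent")
    return {name: link_gbps * pct / 100 for name, pct in COS_SHARES.items()}
```

For a hypothetical 10-Gigabit link, `share_in_gbps(10)` assigns 6.0 Gbps to storage and 2.5 Gbps to business-critical applications, with 0.5 Gbps for each of the remaining classes.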
To provide class of service in the virtualized IT data center and meet the design
requirements, this solution uses:
Security
Security is a vital component of any network architecture and the virtualized IT data
center is no exception. There are various areas within the data center where security is
essential. At the perimeter, security is focused on securing the edge of the data center
from external threats and on providing a secure gateway to the Internet. Remote access
is another area where security is vital in the data center. Operators will often require
remote access to the data center to perform maintenance or new service activations.
This remote access must be secured and monitored to ensure that only authorized users
are permitted access. Robust authentication, authorization and accounting (AAA)
mechanisms should be in place to ensure that only authorized operators are allowed.
Given that the data center is a cost and revenue center that can house the critical data
and applications of many different enterprises, multi-factor authentication is an absolute
necessity to properly secure remote access.
Software application security in the virtualized IT data center is security that is provided
between VMs. A great deal of inter-VM communication occurs in the data center and
controlling this interactivity is a crucial security concern. If a server is supposed to access
a database residing on another server, or on a storage array, a virtual security appliance
should be configured to limit the communication between those resources to allow only
those protocols that are necessary for operation. Limiting the communication between
resources prevents security breaches in the data center and might be a requirement
depending on the regulatory requirements of the hosted applications (HIPAA, for instance,
can dictate security safeguards that must exist between patient and business data). As
discussed in the Virtual Machine section, security in the virtual network, or between VMs,
differs from security that can be implemented on a physical network. In a physical network,
a hardware firewall can connect to different subnets, security zones, or servers and
provide security between those devices (Figure 33 on page 69). In the virtual network,
the physical firewall does not have the ability to see traffic between the VMs. In these
cases, a virtual hypervisor security appliance should be installed to enable security
between VMs.
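The idea of limiting inter-VM communication to only the protocols required for operation amounts to a default-deny policy with explicit allow rules. The following sketch shows that logic in isolation; the group names, the single rule, and the function name are hypothetical, and it is not a representation of how any hypervisor security appliance stores or evaluates policy.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    src_group: str
    dst_group: str
    protocol: str
    port: int


# Hypothetical policy: the Web tier may reach the database tier on the MySQL port only.
ALLOWED = {Rule("web", "database", "tcp", 3306)}


def is_permitted(src_group: str, dst_group: str, protocol: str, port: int) -> bool:
    """Default deny: traffic passes only if an explicit rule matches exactly."""
    return Rule(src_group, dst_group, protocol, port) in ALLOWED
```

Under this policy a Web server can query the database on TCP port 3306, but an SSH attempt between the same two VMs, or any connection initiated in the reverse direction, is dropped.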
Application Security
When securing VMs, you need a comprehensive virtualization security solution that
implements hypervisor security with full introspection; includes a high-performance,
hypervisor-based stateful firewall; uses an integrated intrusion detection system (IDS);
provides virtualization-specific antivirus protection; and offers unrivaled scalability for
managing multitenant cloud data center security. The Juniper Networks Firefly Host
(formerly vGW) offers all these features and enables the operator to monitor software,
patches, and files installed on a VM from a central location. Firefly Host is designed to
be centrally managed from a single-pane view, giving administrators a comprehensive
view of virtual network security and VM inventory.
Table 9 on page 70 shows the relative merits of three application security design options:
vSRX, SRX, and Firefly Host. Because other choices lack intrusion detection and prevention,
quarantine capabilities, and mission-critical line-rate performance and scalability, Firefly
Host is the preferred choice for this solution. Additionally, Firefly Host is integrated into
all VMs and provides every endpoint with its own virtual firewall.
Quarantine: vSRX No, SRX No, Firefly Host Yes
To provide application security in the virtualized IT data center, this solution uses the
Juniper Networks Firefly Host to provide VM-to-VM application security. Firefly Host
integrates with VMware vCenter for comprehensive VM security and management.
In Figure 34 on page 70, the following sequence occurs for VM-to-VM traffic:
7. The traffic matches the security policy and permits the traffic to continue to the
destination.
Perimeter Security
Edge firewalls handle security functions such as Network Address Translation (NAT),
intrusion detection and prevention (IDP), security policy enforcement, and virtual private
network (VPN) services. As shown in Figure 35 on page 71, there are four locations where
you could provide security services for the physical devices in your data center:
This solution implements option 3, which uses a stateful firewall to protect traffic flows
travelling between the edge routers and core switches. Anything below the POD level is
protected by the Firefly Host application.
To provide perimeter security in the virtualized IT data center, this solution uses the
SRX3600 Services Gateway as an edge firewall. This firewall offers up to 55 Gbps of
firewall performance, which can easily support the VM traffic generated by this solution.
The key configuration tasks include:
• Place redundant Ethernet group reth1 (configured toward the edge routers) in the
non-trust zone.
• Place reth0 (configured toward the core switches) in the trust zone.
• Configure a security policy for traffic coming from the non-trust zone to allow only
access to data center applications.
• Configure Source Network Address Translation (SNAT) so that the application servers,
which use private addresses, can access the Internet.
• Configure Destination Network Address Translation (DNAT) for remote access to the
data center by translating the Pulse gateway internal IP address to an
Internet-accessible IP address.
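The two NAT tasks in the list above translate addresses in opposite directions, which the following sketch illustrates. It is a simplified model and not firewall configuration: every address is hypothetical (private and documentation ranges), the names are invented for the example, and port translation and session state are ignored.

```python
import ipaddress

# All addresses below are hypothetical placeholders.
PRIVATE_SERVERS = ipaddress.ip_network("10.0.0.0/8")  # application servers (private addresses)
SNAT_PUBLIC_IP = "192.0.2.10"                         # public address used for outbound traffic
DNAT_MAP = {"192.0.2.20": "10.1.1.20"}                # public address -> Pulse gateway internal address


def translate_outbound(src_ip: str, dst_ip: str) -> tuple:
    """Source NAT: rewrite a private source address on the way to the Internet."""
    if ipaddress.ip_address(src_ip) in PRIVATE_SERVERS:
        return (SNAT_PUBLIC_IP, dst_ip)
    return (src_ip, dst_ip)


def translate_inbound(src_ip: str, dst_ip: str) -> tuple:
    """Destination NAT: rewrite a public destination to the internal address."""
    return (src_ip, DNAT_MAP.get(dst_ip, dst_ip))
```

Outbound traffic from a server keeps its destination but leaves with the public source address, while inbound remote access traffic keeps its source and is redirected to the internal gateway address.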
The secure remote access application must be accessible through the Internet; capable
of providing encryption, RBAC, and two-factor authentication services; able to access a
virtualized environment; and scale to 10,000 users.
Table 10 on page 72 shows a comparison of the MAG gateway and the Junos Pulse
gateway options. For this solution, the Junos Pulse gateway is superior because it offers
all the capabilities of the MAG gateway as well as being a virtualized application.
Virtualized: MAG gateway No, Junos Pulse gateway Yes
To provide secure remote access to and from the virtualized IT data center, this solution
uses the Juniper Networks SA Series SSL VPN Appliances as remote access systems and
the Junos Pulse gateway.
As shown in Figure 36 on page 73, the remote access flow in the virtualized IT data center
happens as follows:
This design for security in the MetaFabric 1.0 solution meets the requirements for perimeter
security, application security, and secure remote access.
Network Management
It is the combination of these tiers that provides complete orchestration in the data center
and enables operators to turn up new services quickly, and change or troubleshoot existing
services using a single-pane view of the data center. The user interface is responsible for
interacting with the data center operator. This is the interface from which the data center
single-pane view is presented. From the user interface, an operator can view, modify,
delete, or add network elements and new services. The user interface acts as a single
role-based access control (RBAC) policy enforcement point, allowing an operator
seamless access to all authorized devices while protecting other resources from
unapproved access. The application programming interface (API) enables single-pane
management by providing a common interface and language to other applications,
support tools, and devices in the data center network (REST API is an example commonly
used in network management). The API enables the single-pane view by abstracting all
support elements and presenting them through a single network management interface
– the user interface.
The network management platform should have the capability to support specialized
applications. Applications in the network management space are specifically designed
to solve a specific problem in the management of the data center environment. A single
application on the network management platform can be responsible for configuring
and monitoring the security elements in the data center, while another application is
designed to manage the physical and virtual switching components in the data center.
Again, the abstraction of all of these applications into a single-pane view is essential to
data center operations to ensure simplicity and a common management point in the
data center.
The next tier of data center network management is the global network view. Simply
put, this is the tier where a complete view of the data center and its resources can be
assembled and viewed. This layer should support topology discovery, the automatic
discovery of not only devices, but how those devices are interconnected to one another.
The global network view should also support path computation (the link distance between
network elements as well as the set of established paths between those network
elements). The resource virtualization tier of network management enables management
of the various endpoints in the data center and acts as an abstraction layer that allows
the operator to manage endpoints that require different protocols such as OpenFlow or
Device Management Interface (DMI).
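Path computation over a discovered topology can be sketched with a breadth-first search that returns a minimum-hop path. This is an illustration only: the adjacency table is hypothetical, with node names loosely following the roles in this guide, and it does not describe how any network management product computes paths.

```python
from collections import deque

# Hypothetical adjacency, as a topology discovery step might report it.
TOPOLOGY = {
    "edge-r1": ["edge-fw", "edge-r2"],
    "edge-r2": ["edge-fw", "edge-r1"],
    "edge-fw": ["edge-r1", "edge-r2", "core-sw1", "core-sw2"],
    "core-sw1": ["edge-fw", "core-sw2", "pod1", "pod2"],
    "core-sw2": ["edge-fw", "core-sw1", "pod1", "pod2"],
    "pod1": ["core-sw1", "core-sw2"],
    "pod2": ["core-sw1", "core-sw2"],
}


def shortest_path(graph: dict, src: str, dst: str):
    """Breadth-first search; return the first minimum-hop path found, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

The link distance between two elements is then the number of hops in the returned path, that is, `len(path) - 1`.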
The common data services tier of network management enables the various applications
and interfaces on the network management system to share relevant information between
the layers. An application that manages a set of endpoints might require network topology
details in order to map and potentially push changes to those network devices. This
requires that the applications within the network management system share data; this
is enabled by the common data services layer.
Managed devices in the network management role are simply the endpoints that are
managed by the network management system. These devices include physical and virtual
switches, routers, VMs, blade servers, and security appliances, to name a few. The
managed devices and the orchestration of services between those devices is the prime
purpose of the network management system. Network management should be the
answer to the question, “How does a data center operator easily stand up and maintain
services within the data center?” The network management system orchestrates the
implementation and operation of the managed devices in the data center.
Finally, integration adapters are required within a complete network management system.
As every device in the data center might not be manageable by a single network
management system, other appliances or services might be required to manage the
entire data center. The integration and coordination of these various network management
tools is the purpose of this layer. Some data center elements such as Virtual Machines
might require VMware ESXi server to manage the VMs and hypervisor switch, while
another network management appliance monitors environmental and performance
conditions on the host server. A third system might be responsible for configuring and
monitoring the network connections between the blade servers and the rest of the data
center. Integration adapters enable each of these components to talk to one another
and, in many cases, allow a single network management system to control the entire
network management footprint from a single pane of glass.
Out-of-Band Management
The requirements for out-of-band management include:
• Administration of the compute, network, and storage segments of the data center.
• Separation of the control plane from the data plane so the management network
remains accessible.
Some of the key elements of this design are seen in Figure 38 on page 76.
To provide out-of-band management in the virtualized IT data center, this solution uses
two pairs of EX4300 switches configured as a Virtual Chassis (Figure 39 on page 77).
The key connection and configuration steps include:
• Connect all OOB network devices to the EX4300 Virtual Chassis (100-Megabit Fast
Ethernet and 1-Gigabit Ethernet).
• Configure the EX4300 Virtual Chassis OOB management system in OSPF area 2.
• Connect the two IBM 3750 standalone servers that host the management VMs (vCenter,
Junos Space, Network Director 1.5, domain controller, and Junos Pulse gateway) to the
EX4300 Virtual Chassis.
• Create four VLANs to separate storage, compute, network, and management traffic
from each other.
• Manage and monitor the VMs on the test bed using VMware vSphere and Network
Director 1.5.
Network Director
To provide network configuration and provisioning in the virtualized IT data center, this
solution uses Juniper Networks Network Director. Network Director 1.5 is used to manage
network configuration, provisioning, and monitoring.
Security Director
To provide security policy configuration in the virtualized IT data center, this solution uses
Juniper Networks Security Director. Security Director is used to manage security policy
configuration and provisioning.
This design meets the network management requirements of managing both virtual and
physical components within the data center and handling the FCAPS considerations.
• The solution must support 20,000 virtual machines and scale up to 2,000 servers.
• The solution must offer less than 3 μs of latency between servers and 21 μs of latency
between PODs.
• Overview
• Key Characteristics of Implementation
Overview
Topology
The MetaFabric 1.0 solution was verified in the Juniper Networks solution validation labs
using the following set of design elements and features:
• Transport: MC-LAG Active/Active with VRRP and SRX JSRP, IRB and VLAN, Lossless
Ethernet
• OOB: EX4300-VC
• Scale and performance: SharePoint, Exchange, MediaWiki scale with Shenick Tester
• POD1 and POD2 are configured with the QFX3000-M QFabric system as an access
and aggregation switch.
• Three redundant server node groups (RSNG) connected to two IBM Flex blade servers
• IBM Flex-1 has 40-Gigabit CNA connected to an RSNG with QFX3600 nodes (RSNG4)
• EMC VNX storage is connected to the QFabric for storage access through iSCSI and
NFS
• QFX3000-M QFabric system is also configured with one network node group (NNG)
with two nodes connected to EX9214 core-switch using 4 x 24-port link aggregation
groups (LAGs) configured as trunk ports
• IBM Flex-2 has 10-Gigabit pass-thru modules connected to RSNG2 and RSNG3
• EMC VNX storage is connected to the QFabric for storage access through iSCSI and
NFS
• QFX3000-M QFabric system is also configured with one NNG with two
nodes connected to EX9214 core-switch using 4 x 32-port LAGs configured as trunk
ports
• IRB is configured on EX9214 and QFabric and QFX-VC to terminate the Layer 2/Layer
3 boundary
• A static route is configured to send traffic from the Internet to the load balancer (LB) by way of the core switch
• OSPF is configured to send only default routes to the NSSA areas toward POD1 and
POD2
• Security policy is configured for traffic from untrust zone to allow only access to DC
applications
• D-NAT is configured for remote access to the data center, translating the Pulse gateway
internal IP address to an Internet-accessible IP address
• MX240 pair configured as edge routers connected to service provider network for DC
Internet access
• Both edge-r1 and edge-r2 have EBGP peering with SP1 and SP2
• IBGP is configured between edge-r1 and edge-r2 with next-hop self export policy
• Condition-based (based on Internet route) default route injection into OSPF is configured
on both the edge-r1 and edge-r2 toward the firewall/core-switches for Internet access
to VDC devices
• IBM Flex server is configured with multiple ESXi hosts hosting all the VMs running
the business-critical applications (SharePoint, Exchange, MediaWiki, and WWW)
• All the network device OOB connections are plugged into the EX4300-VC (100 Mbps and
1 Gbps)
• Two IBM 3750 standalone servers are connected to the EX4300-VC hosting all the
management VMs (vCenter, Junos Space, ND1.5, domain controller, and Pulse gateway)
• VMware vSphere/Network Director 1.5 used to orchestrate the VMs on the test bed
QFX3000-M QFabric system | 13.1X50-D15 | VLANs, LAG, NNG, RSNG, RVI, OSPF
F5 VIPRION 4480 | 10.2.5 Build 591.0 | DSRM load balancing (direct server return mode)
IBM Flex | VMware ESXi 5.1 | Compute nodes 10-Gigabit and 40-Gigabit CNA and 10-Gigabit pass-thru
EMC-VNX | 7.1.56-5 |
Table 14: Networks and VLANs Deployed in the Test Lab (continued)
Network Subnets | Network | Gateway | VLAN-ID | Vlan-Name
LAN client VM and Traffic Generator address | 10.94.127.192/27 | 10.94.127.222 | | Address space for VMs on the external network for simulation of client traffic
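The subnet and gateway listed in the Table 14 row above can be sanity-checked with the Python standard library. This is only a quick verification sketch; the variable names are arbitrary, and the two address values are taken directly from the table.

```python
import ipaddress

# Values taken from the Table 14 row for the LAN client VM and traffic generator network.
subnet = ipaddress.ip_network("10.94.127.192/27")
gateway = ipaddress.ip_address("10.94.127.222")

hosts = list(subnet.hosts())  # usable host addresses, excluding network and broadcast

print(gateway in subnet)       # whether the gateway lies inside the subnet
print(hosts[0], hosts[-1])     # first and last usable addresses
print(len(hosts))              # number of usable addresses
```

A /27 provides 30 usable addresses, and in this case the gateway is the last usable address in the range.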
Applications tested as part of the solution were configured with address space shown
in Table 15 on page 86:
Multi-chassis LAG is used in the solution between core and aggregation or access to
enable always-up, loop-free, and load-balanced traffic between the switching roles in
the data center (Figure 41 on page 87).
The physical and logical configuration of the core-to-POD roles in the data center is
shown in Figure 42 on page 89. The connectivity between these layers features 24-link
AE bundles (4 per POD, for a total of 96 AE member interfaces between each POD and
the core). The local topology within each data center role is detailed in later sections
of this guide.
The configuration of integrated routing and bridging (IRB) interfaces within this segment
of the data center is outlined in Table 17 on page 90.
Network Configuration
Overview
Configuration of the solution starts with the configuration of the perimeter security;
integration between the edge, perimeter and the data center core; and then continues
with configuration of the access and aggregation roles in the data center (in this solution,
those roles are collapsed into the QFabric POD). Finally, the network must be configured
in the virtual switching role.
Configuring the Network Between the Data Center Edge and the Data Center Core
This configuration includes high availability elements, because the configuration and operation
of the solution rely heavily on Juniper Networks Virtual Chassis and
on multi-chassis link aggregation (MC-LAG) between each data center operational
role.
SRX chassis clustering provides high availability and redundancy by grouping two SRX
Series services gateways (must be the same model) into a cluster. The cluster consists
of a primary node and a secondary node. These nodes provide backup for each other in
the event of software, hardware, or network failures. Session state is synchronized
between the nodes in the SRX cluster to ensure that established sessions are maintained
during failover and reversion. The two nodes synchronize configuration, processes, and
services utilizing two Ethernet links: a control link is established to enable control plane
synchronization and a fabric link is established to enable data plane communication
(traversal of network traffic between cluster nodes).
Redundant Ethernet Trunk Group LAGs (RETH interfaces) can be established across
nodes in a chassis cluster (Figure 43 on page 92). Link aggregation allows a redundant
Ethernet interface (known as a RETH interface in the CLI) to add multiple child interfaces
from both nodes of an SRX cluster, creating a single, virtual interface over which upstream
and downstream devices can communicate. This solution features active/standby SRX
cluster configuration: all active links are located on one SRX, and all standby links are on
the other SRX. In an SRX active/backup cluster, only the LAG member links on the active node
forward data traffic. Link Aggregation Control Protocol (LACP) is enabled on the
redundant Ethernet interface just as on any aggregated Ethernet (AE) interface configured
on other routers or switches. The SRX RETH interface configuration includes all member
interfaces from both the active and backup nodes.
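As a rough sketch of how a RETH interface with LACP is typically built on an SRX chassis cluster: the child interface names and node priorities below are hypothetical, and only the reth-count value is taken from the steps later in this section. Child interface numbering on node 1 depends on the SRX platform.

```
[edit]
# Number of RETH interfaces available in the cluster
set chassis cluster reth-count 4
# Redundancy group 1: node 0 preferred as the active node (illustrative priorities)
set chassis cluster redundancy-group 1 node 0 priority 200
set chassis cluster redundancy-group 1 node 1 priority 100
# One child link from each cluster node (hypothetical interface names)
set interfaces xe-1/0/0 gigether-options redundant-parent reth1
set interfaces xe-13/0/0 gigether-options redundant-parent reth1
# Bind the RETH to the redundancy group and enable LACP on it
set interfaces reth1 redundant-ether-options redundancy-group 1
set interfaces reth1 redundant-ether-options lacp active
set interfaces reth1 redundant-ether-options lacp periodic fast
```

The upstream devices see the child links of each node as a separate LAG, which is why the edge routers in this solution carry two bundles (ae1 and ae3) toward the SRX cluster.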
Topology
The topology used in this section of the configuration is shown in Figure 44 on page 93.
To configure the network between the data center edge and the data center core, follow
these steps:
1. Configure the SRX reth1 interface and members toward VDC-edge-r1 and VDC-edge-r2.
set chassis cluster reth-count 4
2. Configure the MC-LAG bundle (ae1 and ae3) on VDC-edge-r1 toward the SRX.
NOTE: The hold-down timer is configured higher than the BFD timer (1 sec) for
better convergence.
set interfaces ae0 unit 0 description "ICCP Link between edge-r1 and edge-r2"
set interfaces ae0 unit 0 vlan-id 4000
set interfaces ae0 unit 0 family inet address 192.168.1.1/30
set interfaces ae0 unit 1 description "ICL Link to edge-r2-vlan-11"
set interfaces ae0 unit 1 encapsulation vlan-bridge
set interfaces ae0 unit 1 vlan-id 11
5. Configure the Inter-Chassis Control Protocol (ICCP) on VDC-edge-r1.
6. Configure the MC-LAG bundle (ae1 and ae3) on VDC-edge-r2 toward the SRX.
NOTE: The hold-down timer is configured higher than the BFD timer (1 sec) to
improve convergence if prefer-status-control-active is configured on
both MC-LAG nodes.
Verification
Purpose The following verification commands (with sample output) can be used to confirm that
the transport, clustering, and MC-LAG configurations were successful.
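The checks in this section map onto standard Junos OS operational commands. The list below is a hedged starting point rather than the exact command set used to capture the sample output; the bundle names ae1 and ae3 are those configured toward the SRX in this section.

```
# On the SRX cluster: node roles, redundancy group state, and RETH child link status
show chassis cluster status
show chassis cluster interfaces
# On the edge routers: MC-AE state and LACP state of the bundles toward the SRX
show interfaces mc-ae
show lacp interfaces ae1
show lacp interfaces ae3
```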
Results
1. This output shows an active state for the MC-LAG connections on the edge routers
and confirms that the two MC-LAG bundles are operational. In a failure state, the
typical error in this output is “Exchange error”, which indicates a misconfiguration
in the ICCP or MC-AE settings.
root@VDC-edge-r01-re0>
2. Verify the reth0 interface on the edge-firewall.
3. Verify that the edge router is selecting the active firewall node for traffic forwarding.
This selection is done based on the gratuitous ARP request sent by the active SRX
firewall.
b. Check the forwarding table to see if the next hop and interface are chosen correctly
Active firewall node (Node 2 is active)
root@VDC-edge-r01-re0> show route forwarding-table destination 192.168.26.3
NOTE: When a failover occurs, the secondary node must announce to the
peer device that it is now the owner of the MAC address associated with the
RETH interface (the RETH MAC is shared between nodes). It does this
using gratuitous ARP, an ARP message that is broadcast without a preceding
ARP request. Once a gratuitous ARP is sent, the local switch updates its
MAC table to map the new MAC/port pairing. By default, the SRX sends
four gratuitous ARPs per RETH on a failover. These are sent from the
control plane and through the data plane.
4. Verify both LAGs on the edge router (ae1 and ae3). Note that even though both
LACP LAGs appear in an “up” state, only the LAG toward the active cluster firewall
node forwards traffic; the LAG toward the standby node remains up and ready to take over in
case of failure.
To allow all the links to forward traffic without being blocked by spanning-tree,
multi-chassis link aggregation (MC-LAG) is configured on the edge routers and core
switches. The edge routers use MC-LAG toward the edge firewalls, and the core switches
use MC-LAG toward each POD switch, application load balancer (F5), and OOB
management switch. MC-LAG is a feature that supports aggregated Ethernet (AE) LAG
bundles spread across more than one device. LACP is used for dynamic configuration
and monitoring of the links.
• Do not mix Layer 2 next-generation CLI syntax (L2NG, or family ethernet-switching)
and non-L2NG (flexible-ethernet-services) syntax on the same interface.
• Static ARP is required for integrated routing and bridging (IRB)-to-IRB connectivity
across the ICL.
• Load balancing between MC-LAG peers is 100 percent local bias by default.
• Load balancing within the local peer is the same as normal LAG hashing.
• The two options for the Layer 3 gateway are Virtual Router Redundancy Protocol
(VRRP) and irb-mac-sync.
• In a VRRP-based Layer 3 solution, even the VRRP backup node forwards traffic.
• Prefer separate link aggregation group (LAG) links for ICL and ICCP.
• ICCP peering between loopback IP addresses is preferred, so that the session can use all
available links through the interior gateway protocol (IGP).
• Configure MC-AE with prefer-status-control-active on both provider edge routers (PEs)
to avoid LACP system ID flaps during an active node (SW) reboot.
• prefer-status-control-active is configured on both MC-LAG nodes for all MC-AE interfaces. With
this configuration, the LACP system ID is retained during ICCP and ICL failures, which improves
convergence.
• Loopback IP peering is configured for ICCP. The ICCP peer can still be reached
over the routing protocols if the direct ICCP link fails.
• A 1-second BFD timer is configured for ICCP and all IRB/VRRP interfaces.
• A hold-down timer of more than 1 second is configured on the ICL links to prevent the ICL
from coming up before ICCP during failure events.
Configuring the Network Between the Data Center Core and the Data Center PODs
To configure the network between the data center core and the data center PODs, follow
these steps:
[edit]
set chassis node-group NW-NG-0 aggregated-devices ethernet device-count 10
[edit]
set interfaces NW-NG-0:ae0 description POD1-to-core-MC-LAG-UPLINK
set interfaces NW-NG-0:ae0 aggregated-ether-options minimum-links 1
set interfaces NW-NG-0:ae0 aggregated-ether-options link-speed 10g
set interfaces NW-NG-0:ae0 aggregated-ether-options lacp active
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching port-mode trunk
c. Enable all the application VLANs on the POD switches toward the core-switches.
[edit]
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members MGMT
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Infra
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Tera-VM
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Vmotion
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members VM-FT
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Remote-Access
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Core-transport-1
d. Configure the member links connected to Core-sw1 and Core-sw2 under the AE
bundle.
[edit]
set interfaces interface-range MC-LAG-ae0-members member "n0:xe-0/0/[0-11]"
set interfaces interface-range MC-LAG-ae0-members member "n1:xe-0/0/[0-11]"
set interfaces interface-range MC-LAG-ae0-members description "MC-LAG to Core-sw ae0"
set interfaces interface-range MC-LAG-ae0-members ether-options 802.3ad NW-NG-0:ae0
[edit]
set chassis node-group NW-NG-0 aggregated-devices ethernet device-count 10
[edit]
set interfaces NW-NG-0:ae0 description POD1-to-core-MC-LAG-UPLINK
set interfaces NW-NG-0:ae0 aggregated-ether-options minimum-links 1
set interfaces NW-NG-0:ae0 aggregated-ether-options link-speed 10g
set interfaces NW-NG-0:ae0 aggregated-ether-options lacp active
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching port-mode trunk
g. Enable all the application VLANs on the POD switches toward the core switches.
[edit]
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members MGMT
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Infra
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Tera-VM
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members Vmotion
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching vlan members VM-FT
h. Configure the member links connected to Core-sw1 and Core-sw2 under the AE
bundle.
[edit]
set interfaces interface-range MC-LAG-ae0-members member "n0:xe-0/0/[0-11]"
set interfaces interface-range MC-LAG-ae0-members member "n1:xe-0/0/[0-11]"
set interfaces interface-range MC-LAG-ae0-members description "MC-LAG to Core-sw ae0"
set interfaces interface-range MC-LAG-ae0-members ether-options 802.3ad NW-NG-0:ae0
[edit]
set chassis aggregated-devices ethernet device-count 30
b. Specify the members to be included within the aggregated Ethernet bundle ae0.
[edit]
set interfaces NW-NG-0:ae0 description POD1-to-core-MC-LAG-UPLINK
set interfaces NW-NG-0:ae0 aggregated-ether-options minimum-links 1
set interfaces NW-NG-0:ae0 aggregated-ether-options link-speed 10g
set interfaces NW-NG-0:ae0 aggregated-ether-options lacp active
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching port-mode trunk
c. Configure LACP parameters with static system-id and admin-key on the aggregated
Ethernet bundle.
[edit]
set interfaces ae0 description "MC-LAG to VDC-pod1-sw1-nng-ae0"
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-priority 100
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:01
set interfaces ae0 aggregated-ether-options lacp admin-key 1
[edit]
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set interfaces ae0 aggregated-ether-options mc-ae init-delay-time 520
set interfaces ae0 aggregated-ether-options mc-ae events iccp-peer-down force-icl-down
[edit]
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members MGMT
[edit]
set interfaces ae20 vlan-tagging
set interfaces ae20 aggregated-ether-options lacp active
set interfaces ae20 unit 0 description "ICCP link between core-sw1 and core-sw2"
set interfaces ae20 unit 0 vlan-id 4000
set interfaces ae20 unit 0 family inet address 192.168.2.1/30
[edit]
set interfaces ae9 aggregated-ether-options lacp active
set interfaces ae9 aggregated-ether-options lacp periodic fast
set interfaces ae9 unit 0 description "ICL Link for all VLANS"
set interfaces ae9 unit 0 family ethernet-switching interface-mode trunk
h. Configure the ICL link members with a hold-time value higher than the configured
BFD timer (1 s) to achieve zero-loss convergence during recovery of failed devices.
[edit]
set interfaces xe-0/3/6 hold-time up 100
set interfaces xe-0/3/6 hold-time down 3000
set interfaces xe-0/3/7 hold-time up 100
set interfaces xe-0/3/7 hold-time down 3000
set interfaces xe-1/3/6 hold-time up 100
set interfaces xe-1/3/6 hold-time down 3000
set interfaces xe-1/3/7 hold-time up 100
set interfaces xe-1/3/7 hold-time down 3000
set interfaces xe-3/3/6 hold-time up 100
set interfaces xe-3/3/6 hold-time down 3000
set interfaces xe-3/3/7 hold-time up 100
set interfaces xe-3/3/7 hold-time down 3000
set interfaces xe-5/3/6 hold-time up 100
set interfaces xe-5/3/6 hold-time down 3000
set interfaces xe-5/3/7 hold-time up 100
set interfaces xe-5/3/7 hold-time down 3000
[edit]
set interfaces ae9 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae9 unit 0 family ethernet-switching vlan members Infra
set interfaces ae9 unit 0 family ethernet-switching vlan members SQL
set interfaces ae9 unit 0 family ethernet-switching vlan members SharePoint
set interfaces ae9 unit 0 family ethernet-switching vlan members Firewall
set interfaces ae9 unit 0 family ethernet-switching vlan members MGMT
j. Configure IRB on both core-sw1 and core-sw2 and enable VRRP on the IRB interfaces.
[edit]
set interfaces irb unit 50 family inet address 192.168.50.1/24 arp 192.168.50.2 l2-interface ae9.0
set interfaces irb unit 50 family inet address 192.168.50.1/24 arp 192.168.50.2 mac 4c:96:14:68:83:f0
set interfaces irb unit 50 family inet address 192.168.50.1/24 arp 192.168.50.2 publish
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 virtual-address 192.168.50.254
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 priority 125
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 fast-interval 100
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 preempt
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 accept-data
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 authentication-type md5
set interfaces irb unit 50 family inet address 192.168.50.1/24 vrrp-group 1 authentication-key "$9$Asx6uRSKvLN-weK4aUDkq"
[edit]
set vlans POD1-Transport-1 vlan-id 50
set vlans POD1-Transport-1 l3-interface irb.50
set vlans POD1-Transport-1 domain-type bridge
[edit]
set protocols iccp local-ip-addr 192.168.168.4
set protocols iccp peer 192.168.168.5 redundancy-group-id-list 1
set protocols iccp peer 192.168.168.5 backup-liveness-detection backup-peer-ip 192.168.168.5
set protocols iccp peer 192.168.168.5 liveness-detection minimum-interval 500
set protocols iccp peer 192.168.168.5 liveness-detection multiplier 2
[edit]
set switch-options service-id 1
[edit]
set chassis aggregated-devices ethernet device-count 30
b. Specify the members to be included within the aggregated Ethernet bundle ae0.
[edit]
set interfaces interface-range POD1-MC-LAG-ae0-members member "xe-0/0/[0-7]"
set interfaces interface-range POD1-MC-LAG-ae0-members member "xe-0/1/[0-3]"
set interfaces interface-range POD1-MC-LAG-ae0-members description "MC-LAG to POD1 ae0"
set interfaces interface-range POD1-MC-LAG-ae0-members ether-options 802.3ad ae0
c. Configure LACP parameters with static system-id and admin-key on the aggregated
Ethernet bundle.
[edit]
set interfaces ae0 description "MC-LAG to VDC-pod1-sw1-nng-ae0"
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-priority 100
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:01
set interfaces ae0 aggregated-ether-options lacp admin-key 1
[edit]
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members MGMT
set interfaces ae0 unit 0 family ethernet-switching vlan members Infra
set interfaces ae0 unit 0 family ethernet-switching vlan members Tera-VM
set interfaces ae0 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces ae0 unit 0 family ethernet-switching vlan members Vmotion
set interfaces ae0 unit 0 family ethernet-switching vlan members VM-FT
set interfaces ae0 unit 0 family ethernet-switching vlan members Remote-Access
set interfaces ae0 unit 0 family ethernet-switching vlan members POD1-Transport-1
[edit]
set interfaces NW-NG-0:ae0 description POD1-to-core-MC-LAG-UPLINK
set interfaces NW-NG-0:ae0 aggregated-ether-options minimum-links 1
set interfaces NW-NG-0:ae0 aggregated-ether-options link-speed 10g
set interfaces NW-NG-0:ae0 aggregated-ether-options lacp active
set interfaces NW-NG-0:ae0 unit 0 family ethernet-switching port-mode trunk
g. Configure the AE bundle (ae9) connecting the core switches as a Layer 2 link.
This bundle functions as the multi-chassis protection link between the core switches.
[edit]
set interfaces ae9 aggregated-ether-options lacp active
set interfaces ae9 aggregated-ether-options lacp periodic fast
set interfaces ae9 unit 0 description "ICL Link for all VLANS"
set interfaces ae9 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae9 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae9 unit 0 family ethernet-switching vlan members Infra
set interfaces ae9 unit 0 family ethernet-switching vlan members SQL
set interfaces ae9 unit 0 family ethernet-switching vlan members SharePoint
set interfaces ae9 unit 0 family ethernet-switching vlan members Firewall
set interfaces ae9 unit 0 family ethernet-switching vlan members MGMT
set interfaces ae9 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces ae9 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces ae9 unit 0 family ethernet-switching vlan members Tera-VM
set interfaces ae9 unit 0 family ethernet-switching vlan members Load-balancer-Ext
set interfaces ae9 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces ae9 unit 0 family ethernet-switching vlan members Vmotion
set interfaces ae9 unit 0 family ethernet-switching vlan members VM-FT
set interfaces ae9 unit 0 family ethernet-switching vlan members Remote-Access
h. Configure the ICL link members with a hold-time value higher than the configured
BFD timer (1 s) to achieve zero-loss convergence during recovery of failed devices.
[edit]
set interfaces xe-0/3/6 hold-time up 100
set interfaces xe-0/3/6 hold-time down 3000
set interfaces xe-0/3/7 hold-time up 100
set interfaces xe-0/3/7 hold-time down 3000
set interfaces xe-1/3/6 hold-time up 100
set interfaces xe-1/3/6 hold-time down 3000
set interfaces xe-1/3/7 hold-time up 100
set interfaces xe-1/3/7 hold-time down 3000
set interfaces xe-3/3/6 hold-time up 100
set interfaces xe-3/3/6 hold-time down 3000
set interfaces xe-3/3/7 hold-time up 100
set interfaces xe-3/3/7 hold-time down 3000
set interfaces xe-5/3/6 hold-time up 100
set interfaces xe-5/3/6 hold-time down 3000
set interfaces xe-5/3/7 hold-time up 100
set interfaces xe-5/3/7 hold-time down 3000
i. Configure IRB on both core-sw1 and core-sw2 and enable VRRP.
[edit]
set interfaces irb unit 50 family inet address 192.168.50.2/24 arp 192.168.50.1 l2-interface ae9.0
set interfaces irb unit 50 family inet address 192.168.50.2/24 arp 192.168.50.1 mac 4c:96:14:6b:db:f0
set interfaces irb unit 50 family inet address 192.168.50.2/24 arp 192.168.50.1 publish
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 virtual-address 192.168.50.254
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 priority 250
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 fast-interval 100
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 preempt
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 accept-data
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 authentication-type md5
set interfaces irb unit 50 family inet address 192.168.50.2/24 vrrp-group 1 authentication-key "$9$Asx6uRSKvLN-weK4aUDkq"
[edit]
set interfaces ae0 description "MC-LAG to VDC-pod1-sw1-nng-ae0"
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-priority 100
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:01
set interfaces ae0 aggregated-ether-options lacp admin-key 1
[edit]
set vlans POD1-Transport-1 vlan-id 50
set vlans POD1-Transport-1 l3-interface irb.50
set vlans POD1-Transport-1 domain-type bridge
NOTE: The ELS CLI doesn’t need an MC-AE interface ICL under the
bridge configuration. It picks the appropriate MC-AE interface from the
VLAN association on the configured links.
[edit]
set protocols iccp local-ip-addr 192.168.168.5
set protocols iccp peer 192.168.168.4 redundancy-group-id-list 1
set protocols iccp peer 192.168.168.4 backup-liveness-detection backup-peer-ip 192.168.168.4
set protocols iccp peer 192.168.168.4 liveness-detection minimum-interval 500
set protocols iccp peer 192.168.168.4 liveness-detection multiplier 2
[edit]
set switch-options service-id 1
Verification
Purpose The following verification commands (with sample output) can
be used to confirm that MC-LAG active/active between the core and PODs has been
configured correctly.
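The checks that follow correspond to standard Junos OS operational commands on the core switches. The list below is a hedged starting point, not necessarily the exact commands used to capture the sample output; mc-ae-id 1 and ae8 are taken from the configuration and output in this section.

```
# ICCP session state toward the MC-LAG peer
show iccp
# MC-AE state for one bundle (mc-ae-id 1 is the value configured on ae0)
show interfaces mc-ae id 1
# Queue counters, LACP state, and MC-AE details for one member bundle
show interfaces ae8 extensive
# BFD sessions and VRRP roles on the IRB interfaces
show bfd session
show vrrp summary
```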
Results
2. Verify that the ICL is configured with all VLANs and that the ICL is up. Verify that all
VLANs configured on the MC-AE interfaces are properly allowed over the ICL link.
        lacp {
            active;
            periodic fast;
        }
    }
    unit 0 {
        description "ICL Link for all VLANS";
        family ethernet-switching {
            interface-mode trunk;
            vlan {
                members [ Exchange Infra SQL SharePoint Firewall Compute-MGMT Wikimedia
                Exchange-Cluster Tera-VM Load-balancer-Ext Security-Mgmt Vmotion VM-FT
                Remote-Access OOB-Transport Load-balancer-Ext-Tera-VM
                TrafficGenerator-502 TrafficGenerator-503 TrafficGenerator-504
                POD1-Transport-1 POD1-Transport-2 POD1-Transport-3 POD1-Transport-4
                POD2-Transport-1 POD2-Transport-2 ];
            }
        }
    }
0
    1 expedited-fo                  0                    0                    0
    2 assured-forw                  0                    0                    0
    3 network-cont                  0                    0                    0
Egress queues: 8 supported, 4 in use
Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 best-effort            5568474              5568474                    0
    1 expedited-fo                 0                    0                    0
    2 assured-forw                 0                    0                    0
    3 network-cont           1352698              1352698                    0
Queue number: Mapped forwarding classes
0 best-effort
1 expedited-forwarding
2 assured-forwarding
3 network-control
Logical interface ae8.0 (Index 356) (SNMP ifIndex 540) (Generation 165)
key
xe-3/1/0.0    Actor     100   00:00:00:00:00:09   127   86   9
xe-3/1/0.0    Partner   127   88:e0:f3:1f:f0:a0   127    1   1
xe-3/1/1.0    Actor     100   00:00:00:00:00:09   127   87   9
xe-3/1/1.0    Partner   127   88:e0:f3:1f:f0:a0   127    4   1
LACP Statistics: LACP Rx LACP Tx Unknown Rx Illegal Rx
xe-3/1/0.0 485041 485547 0 0
xe-3/1/1.0 485095 485495 0 0
Marker Statistics: Marker Rx Resp Tx Unknown Rx Illegal Rx
xe-3/1/0.0 0 0 0 0
xe-3/1/1.0 0 0 0 0
Protocol eth-switch, MTU: 9192, Generation: 199, Route table: 5
Flags: Trunk-Mode
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae8
Current State Machine's State: mcae active state
Local Status : active
Local State : up
Peer Status : active
Peer State : up
Logical Interface : ae8.0
Topology Type : bridge
Local State : up
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae10
Current State Machine's State: mcae active state
Local Status : active
Local State : up
Peer Status : active
Peer State : up
Logical Interface : ae10.0
Topology Type : bridge
Local State : up
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae11
Current State Machine's State: mcae active state
Local Status : active
Local State : up
Peer Status : active
Peer State : up
Logical Interface : ae11.0
Topology Type : bridge
Local State : up
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae12
Current State Machine's State: mcae active state
Local Status : active
Local State : up
Peer Status : active
Peer State : up
Logical Interface : ae12.0
Topology Type : bridge
Local State : up
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae13
Current State Machine's State: mcae active state
Local Status : active
Local State : up
Peer Status : active
Peer State : up
Logical Interface : ae13.0
Topology Type : bridge
Local State : up
Peer State : up
Peer Ip/MCP/State : 192.168.168.5 ae9.0 up
Member Link : ae14
Current State Machine's State: mcae active state
Local Status : active
Local State : up
4. Verify that the ICL and MC-AE interfaces are in the same broadcast domain.
5. Verify that BFD is configured on all IRB interfaces with an “Up” status. Also verify that
appropriate timers are configured. (A 6-second timer is supported on EX9200 for
MC-LAG Active/Active.)
                                          Detect   Transmit
Address          State     Interface      Time     Interval   Multiplier
192.168.2.2      Up        ae20.0         6.000    2.000      3
                                         vip   192.168.25.254
irb.15    up   1   backup   Active      lcl   192.168.15.1
                                         vip   192.168.15.254
irb.16    up   1   backup   Active      lcl   192.168.16.1
                                         vip   192.168.16.254
irb.20    up   1   backup   Active      lcl   192.168.20.1
                                         vip   192.168.20.254
irb.50    up   1   backup   Active      lcl   192.168.50.1
                                         vip   192.168.50.254
irb.51    up   1   backup   Active      lcl   192.168.51.1
                                         vip   192.168.51.254
irb.52    up   1   backup   Active      lcl   192.168.52.1
                                         vip   192.168.52.254
irb.53    up   1   backup   Active      lcl   192.168.53.1
                                         vip   192.168.53.254
irb.54    up   1   backup   Active      lcl   192.168.54.1
                                         vip   192.168.54.254
irb.55    up   1   backup   Active      lcl   192.168.55.1
                                         vip   192.168.55.254
irb.101   up   1   backup   Active      lcl   172.16.1.252
                                         vip   172.16.1.254
                                         vip   172.16.6.254
                                         vip   172.16.7.254
                                         vip   172.16.9.254
                                         vip   10.30.3.254
                                         vip   192.168.25.254
irb.15    up   1   master   Active      lcl   192.168.15.2
                                         vip   192.168.15.254
                                         vip   192.168.16.254
irb.20    up   1   master   Active      lcl   192.168.20.2
                                         vip   192.168.20.254
irb.50    up   1   master   Active      lcl   192.168.50.2
                                         vip   192.168.50.254
irb.51    up   1   master   Active      lcl   192.168.51.2
                                         vip   192.168.51.254
irb.52    up   1   master   Active      lcl   192.168.52.2
                                         vip   192.168.52.254
irb.53    up   1   master   Active      lcl   192.168.53.2
                                         vip   192.168.53.254
irb.54    up   1   master   Active      lcl   192.168.54.2
                                         vip   192.168.54.254
irb.55    up   1   master   Active      lcl   192.168.55.2
                                         vip   192.168.55.254
irb.101   up   1   master   Active      lcl   172.16.1.253
                                         vip   172.16.1.254
                                         vip   172.16.6.254
                                         vip   172.16.7.254
                                         vip   172.16.9.254
                                         vip   10.30.3.254
Routing Configuration
Overview
The MetaFabric 1.0 solution features routing configured between the edge and the service
provider, as well as routing of traffic from the edge and core roles. BGP is implemented
at the edge. OSPF is used as the IGP in the solution.
Topology
The routing topology and configuration are illustrated in Figure 45 on page 124.
• BGP implementation
• Configuring VDC-Edge-R1
• Configuring VDC-Edge-R2
• OSPF implementation
• Configuration examples
In VDC 1.0, Internet connectivity for the data center is achieved by establishing EBGP
peering with multiple service providers. As shown in Figure 45 on page 124, EBGP is
configured from Edge-R1 to SP1 and Edge-R2 to SP2. Internet routes are simulated using
testing tools connected to SP1 and SP2. SP1 and SP2 advertise the same Internet routes
to the edge routers.
The next element of routing in the solution is IBGP peering between Edge-R1 and Edge-R2,
with an export policy that sets "next-hop self". BGP local preference
is configured to prefer SP1.
• The edge routers must advertise the public address space of the data center's business-critical
applications (SharePoint, Exchange, and Wikimedia) to the Internet so that
Internet users can access the data center resources. To support redundancy, each edge
router advertises the same prefix to the Internet.
• Application server Internet access is provided using source NAT on the edge firewall;
the traffic is then forwarded to the edge routers for Internet access through the service provider networks.
• Remote access users connecting from the Internet use the Junos Pulse gateway public
IP address for the VPN connection. The service IP address of the SA appliance VM hosting the
Pulse gateway is advertised to the Internet using an export policy.
To configure BGP between the edge and the service provider, follow these steps:
[edit]
set interfaces xe-0/0/20 ether-options 802.3ad ae1
set interfaces xe-0/0/22 ether-options 802.3ad ae1
[edit]
set protocols bgp group EDGE-R1 local-address 10.94.127.229
set protocols bgp group EDGE-R1 neighbor 10.94.127.230 peer-as 64512
2. Configure VDC-Edge-R1.
[edit]
set routing-options autonomous-system 64512
set protocols bgp group SP1 export Export-VDC-Subnets
set protocols bgp group SP1 neighbor 10.94.127.229 peer-as 100
[edit]
set protocols bgp group EDGE-R2 local-address 192.168.168.1
set protocols bgp group EDGE-R2 neighbor 192.168.168.2 peer-as 64512
[edit]
set protocols bgp group EDGE-R2 export next-hop-self
d. Configure the BGP export policy to advertise the applications’ public prefix
[edit]
set policy-options policy-statement Export-VDC-Subnets term App-Server-VIP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term App-Server-VIP from route-filter 10.94.127.128/26 exact accept
[edit]
set policy-options policy-statement Export-VDC-Subnets term Secure-Acces-IP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term Secure-Acces-IP from route-filter 10.94.127.32/27 exact accept
[edit]
set policy-options policy-statement Export-VDC-Subnets term Server-Internet-NAT-IP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term Server-Internet-NAT-IP from route-filter 10.94.127.0/27 exact accept
[edit]
set policy-options policy-statement next-hop-self term 1 then local-preference 200
set policy-options policy-statement next-hop-self term 1 then next-hop self
set policy-options policy-statement next-hop-self term 1 then accept
3. Configure VDC-Edge-R2.
[edit]
set protocols bgp group T0-B6-Gateway neighbor 10.94.127.241 peer-as 300
set protocols bgp group EDGE-R2 export from-ospf
set protocols bgp group EDGE-R2 neighbor 10.94.127.246 peer-as 64512
[edit]
set protocols bgp group EDGE-R1 local-address 192.168.168.2
set protocols bgp group EDGE-R1 export next-hop-self
set protocols bgp group EDGE-R1 neighbor 192.168.168.1 peer-as 64512
[edit]
set policy-options policy-statement Export-VDC-Subnets term App-Server-VIP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term App-Server-VIP from route-filter 10.94.127.128/26 exact accept
set policy-options policy-statement Export-VDC-Subnets term Secure-Acces-IP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term Secure-Acces-IP from route-filter 10.94.127.32/27 exact accept
set policy-options policy-statement Export-VDC-Subnets term Server-Internet-NAT-IP from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term Server-Internet-NAT-IP from route-filter 10.94.127.0/27 exact accept
set policy-options policy-statement Export-VDC-Subnets term Tera-VM-Server from route-filter 10.20.127.0/24 exact accept
set policy-options policy-statement Export-VDC-Subnets term TrafficGenerator from protocol ospf
set policy-options policy-statement Export-VDC-Subnets term TrafficGenerator from route-filter 10.30.2.0/24 exact accept
set policy-options policy-statement Export-VDC-Subnets term TrafficGenerator from route-filter 10.30.3.0/24 exact accept
set policy-options policy-statement Export-VDC-Subnets term TrafficGenerator from route-filter 10.30.4.0/24 exact accept
[edit]
set policy-options policy-statement next-hop-self term 1 from protocol bgp
set policy-options policy-statement next-hop-self term 1 then local-preference 100
set policy-options policy-statement next-hop-self term 1 then next-hop self
set policy-options policy-statement next-hop-self term 1 then accept
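The Export-VDC-Subnets terms use the exact match type, so a route is advertised only when both its network address and its prefix length equal a route-filter entry. The following is a minimal Python sketch of that matching rule, using three of the prefixes from the policy. The function name matches_exact and the candidate routes are illustrative assumptions, and the sketch ignores the additional "from protocol ospf" condition.

```python
import ipaddress

# Route-filter entries copied from the Export-VDC-Subnets policy.
EXPORT_FILTERS = [
    ipaddress.ip_network("10.94.127.128/26"),  # term App-Server-VIP
    ipaddress.ip_network("10.94.127.32/27"),   # term Secure-Acces-IP
    ipaddress.ip_network("10.94.127.0/27"),    # term Server-Internet-NAT-IP
]


def matches_exact(route: str) -> bool:
    """Return True when the route equals a filter entry in address and length."""
    candidate = ipaddress.ip_network(route)
    return candidate in EXPORT_FILTERS


# Hypothetical candidate routes used only for illustration.
print(matches_exact("10.94.127.128/26"))  # identical prefix
print(matches_exact("10.94.127.128/27"))  # more specific prefix inside the /26
print(matches_exact("10.94.127.0/24"))    # less specific covering prefix
```

Only the first candidate matches. The more specific and the covering prefixes are not advertised by these terms.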
Verification
Purpose
The following verification commands (with sample output) can be used to confirm the BGP
configuration.
Results
1. Verify on Edge-R1 that an EBGP session with SP1 exists. Also verify the iBGP session
with Edge-R2.
State|#Active/Received/Accepted/Damped...
10.94.127.229 100 46637 46471 0 0 2w0d12h
7/16/16/0
0/0/0/0
192.168.168.2 64512 46210 47057 0 0 2w0d12h
0/0/0/0
0/0/0/0
2. Verify on Edge-R2 that EBGP peering with SP2 and iBGP peering with Edge-R1 are
successful.
State|#Active/Received/Accepted/Damped...
10.94.127.245 200 46378 46671 0 0 2w0d12h
0/16/16/0
0/0/0/0
192.168.168.1 64512 47062 46215 0 0 2w0d12h
8/59/59/0
0/0/0/0
3. Verify the routing table on Edge-R1 by showing routes received via BGP.
4. Verify the routing table on Edge-R2 by showing routes received via BGP.
5. Verify that only the configured prefixes on Edge-R1 are advertised to the service
providers as per the configured policy.
6. Verify that only the configured prefixes on Edge-R2 are advertised to the service
provider as per the configured export policy.
The MetaFabric 1.0 solution uses OSPF as the IGP because of widespread familiarity with
the protocol. Both edge routers and core switches are configured with Layer 2 MC-LAG
(Active/Active).
Layer 3 connectivity for the edge firewalls and POD switches is enabled using a bridge
domain and IRB interfaces. OSPF is also enabled on POD switches on the IRB toward
the core switch for hybrid Layer 2 and Layer 3 connectivity to the core.
OSPF was configured in the MetaFabric 1.0 solution validation lab using the following
design considerations and configuration elements:
• Three OSPF areas are configured so that failures stay localized within area boundaries.
• Each core switch and edge router is configured with an OSPF priority of 255 or 254 so
that the core switches and edge routers always become the designated router and backup
designated router for that bridge domain.
• All IRB and VRRP addresses are advertised into OSPF as passive, so that no adjacencies
form with servers or other devices while the Layer 3 connected routes are still
advertised into OSPF.
• Conditional default aggregate routes from the edge routers are redistributed toward
the core and POD switches for Internet connectivity.
• Loop-free alternate (LFA) is configured on all OSPF links to improve convergence.
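The effect of the 255 and 254 priorities on designated router election can be sketched in Python. This is a simplified model that assumes all routers on the segment start the election at the same time; it does not model the non-preemptive behavior of a running OSPF segment. The device names, the assignment of 255 to the core switch, and the router IDs are hypothetical.

```python
import ipaddress


def elect_dr_bdr(routers):
    """Pick (DR, BDR) from a list of (name, priority, router_id) tuples.

    Highest priority wins; the highest router ID breaks ties.
    Routers with priority 0 are never eligible.
    """
    eligible = [r for r in routers if r[1] > 0]
    ranked = sorted(
        eligible,
        key=lambda r: (r[1], int(ipaddress.ip_address(r[2]))),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr


# Hypothetical segment: names and router IDs are assumptions for illustration.
segment = [
    ("core-sw", 255, "10.0.0.1"),
    ("edge-r", 254, "10.0.0.2"),
    ("pod-sw", 128, "10.0.0.3"),  # 128 is the Junos OS default OSPF priority
]
print(elect_dr_bdr(segment))  # ('core-sw', 'edge-r')
```

Because every other device keeps a lower priority, the two explicitly configured devices take the DR and BDR roles regardless of router ID.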
Figure 46: OSPF Area Configuration Between Edge and Core (Including
Out-of-Band Management)
The POD switches are connected to the core switches using LAG. The PODs are configured
as OSPF areas 10 and 11 (Figure 47 on page 131).
1. On the edge routers, verify that the route to 12.12.12.12 has been learned via EBGP
(from any service provider using the policy). Create a gendefault policy that matches
the service provider route 12.12.12.12/32.
[edit]
set policy-options policy-statement gendefault term upstreamroutes from route-filter 12.12.12.12/32 exact
set policy-options policy-statement gendefault term upstreamroutes from protocol bgp
set policy-options policy-statement gendefault term upstreamroutes then accept
set policy-options policy-statement gendefault term deny then reject
[edit]
set routing-options generate route 0.0.0.0/0 policy gendefault
[edit]
set policy-options policy-statement export-default-route term 1 from protocol aggregate
set policy-options policy-statement export-default-route term 1 from route-filter 0.0.0.0/0 exact
set policy-options policy-statement export-default-route term 1 then external type 1
set policy-options policy-statement export-default-route term 1 then accept
[edit]
set protocols ospf export export-default-route
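The conditional default route works because the generated 0.0.0.0/0 route exists only while at least one contributing route is accepted by the gendefault policy. The following minimal Python sketch models that condition. The function names and the sample routing tables are illustrative assumptions, not Junos OS internals.

```python
import ipaddress

# Contributing prefix taken from the gendefault policy.
CONTRIBUTOR = ipaddress.ip_network("12.12.12.12/32")


def gendefault_accepts(route: dict) -> bool:
    """Mirror the upstreamroutes term: BGP-learned 12.12.12.12/32 only."""
    return (
        route["protocol"] == "bgp"
        and ipaddress.ip_network(route["prefix"]) == CONTRIBUTOR
    )


def default_route_generated(routing_table: list) -> bool:
    """The generated 0.0.0.0/0 is present only if a contributor is accepted."""
    return any(gendefault_accepts(route) for route in routing_table)


# Hypothetical routing tables for illustration.
upstream_ok = [
    {"prefix": "12.12.12.12/32", "protocol": "bgp"},
    {"prefix": "10.94.127.0/27", "protocol": "ospf"},
]
upstream_lost = [
    {"prefix": "12.12.12.12/32", "protocol": "static"},
    {"prefix": "10.94.127.0/27", "protocol": "ospf"},
]
print(default_route_generated(upstream_ok))    # True
print(default_route_generated(upstream_lost))  # False
```

When the EBGP route disappears, the generated default is withdrawn, so the export-default-route policy has nothing to advertise into OSPF from that edge router.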
• Calculating a shortest path to the destination and choosing a next hop to circumvent
the failed link. Typically a link-state protocol such as OSPF or IS-IS is used as the
IGP. Both protocols must rerun the Dijkstra algorithm to recalculate a loop-free
topology. This is known as the shortest-path-first (SPF) calculation.
• Informing the neighbors of the link failure so that other routers can calculate an
updated loop-free topology as well.
After the routing protocol process (rpd) calculates the new next hops for all affected
prefixes, this next-hop information must be pushed down to the Packet Forwarding Engine.
Once the Packet Forwarding Engine is updated, traffic is rerouted and flows again. At this
stage, local repair is achieved, because this node has done all calculations and
rerouting. Other nodes in the network might not yet have received the updates regarding
the link failure. Once all nodes in the network have received the IGP updates regarding
the failure and updated their next hops, global convergence is reached.
The goal of LFA is to preprogram backup paths for the known prefixes into the Packet
Forwarding Engine in a loop-free manner. This speeds up convergence because forwarding
does not have to wait for local repair to start or for global convergence to complete.
LFA changes the event flow described in Figure 48 on page 133. Rerouting begins much
earlier, immediately after the link failure is detected. There is no need to wait for
the routing protocol process (rpd) to complete the SPF calculation or for other routers
in the network to converge (global convergence). Because traffic can be rerouted as soon
as the Packet Forwarding Engine is aware of the link error, LFA shows better failover
time in the test bed. For LFA to work, the Packet Forwarding Engine must have a backup
next hop preinstalled. This backup next hop is called the loop-free alternate (LFA), and
it is used at once if the primary next hop goes down. LFA does not need any additional
protocol. It is self-contained and does not rely on any helper node to work properly; as
such, rollout can be done in small steps. The backup next hop is elected by running
multiple SPF calculations, each with a different neighbor as the root of the tree. Upon
link failure, the backup next hop can be selected immediately because it provides
loop-free forwarding for a given destination node.
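The election of a backup next hop can be sketched in Python using the basic loop-free condition from RFC 5286: a neighbor N of the computing router S is a loop-free alternate for destination D when dist(N, D) < dist(N, S) + dist(S, D). The sketch below runs one SPF per neighbor, as described above. The topologies, node names, and metrics are hypothetical, and the check covers link protection only, not the stricter node-protection condition.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: return {node: cost} of the shortest paths from source."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist


def loop_free_alternates(graph, s, primary, dest):
    """Neighbors of s, other than the primary next hop, that satisfy
    dist(N, D) < dist(N, S) + dist(S, D)."""
    dist_s = shortest_distances(graph, s)
    alternates = []
    for n in graph[s]:
        if n == primary:
            continue
        dist_n = shortest_distances(graph, n)  # SPF rooted at the neighbor
        if dist_n[dest] < dist_n[s] + dist_s[dest]:
            alternates.append(n)
    return alternates


# Hypothetical topologies with symmetric metrics; S reaches D through A.
looped = {
    "S": {"A": 1, "B": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 5},  # B's best path to D goes back through S
    "D": {"A": 1, "B": 5},
}
protected = {
    "S": {"A": 1, "B": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2},  # B reaches D directly without using S
    "D": {"A": 1, "B": 2},
}
print(loop_free_alternates(looped, "S", "A", "D"))     # []
print(loop_free_alternates(protected, "S", "A", "D"))  # ['B']
```

In the first topology B would send the traffic straight back to S, so no backup next hop can be installed. In the second, B qualifies and can be preinstalled in the forwarding plane.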
[edit]
set protocols ospf area 0.0.0.1 interface irb.0 node-link-protection
2. Configure per-packet load balancing (PPLB) so that the Packet Forwarding Engine
retains the LFA backup next hops.
[edit]
set policy-options policy-statement pplb then load-balance per-packet
set policy-options policy-statement pplb then accept
OSPF configuration in the data center is covered in the following section. The following
nodes and solution elements are configured below:
• Firewall configuration
a. Configure and apply the export policy, OSPF reference bandwidth, LFA, and priority.
[edit]
set protocols ospf export export-default-route
set protocols ospf reference-bandwidth 1000g
set protocols ospf area 0.0.0.1 interface irb.0 node-link-protection
set protocols ospf area 0.0.0.1 interface irb.0 priority 254
[edit]
set protocols ospf area 0.0.0.1 interface irb.0 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.1 interface ae0.0 node-link-protection
set protocols ospf area 0.0.0.1 interface ae0.0 priority 254
set protocols ospf area 0.0.0.1 interface ae0.0 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.1 interface ae0.0 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.1 interface lo0.0
c. Configure the conditional policy that generates the OSPF default route based on the BGP route.
[edit]
set policy-options policy-statement export-default-route term 1 from protocol aggregate
set policy-options policy-statement export-default-route term 1 from route-filter 0.0.0.0/0 exact
set policy-options policy-statement export-default-route term 1 then external type 1
set policy-options policy-statement export-default-route term 1 then accept
set policy-options policy-statement gendefault term upstreamroutes from protocol bgp
set policy-options policy-statement gendefault term upstreamroutes from route-filter 12.12.12.12/32 exact
set policy-options policy-statement gendefault term upstreamroutes then accept
set policy-options policy-statement gendefault term deny then reject
set routing-options generate route 0.0.0.0/0 policy gendefault
d. Configure PPLB.
[edit]
set policy-options policy-statement pplb then load-balance per-packet
set policy-options policy-statement pplb then accept
set routing-options forwarding-table export pplb
[edit]
set protocols ospf export OSPF-Export-SA
set protocols ospf reference-bandwidth 1000g
set protocols ospf area 0.0.0.1 interface reth1.0 node-link-protection
set protocols ospf area 0.0.0.1 interface reth1.0 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.1 interface reth1.0 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.1 interface reth0.0 node-link-protection
[edit]
set protocols ospf export export-ospf
set protocols ospf reference-bandwidth 1000g
set protocols ospf area 0.0.0.10 stub default-metric 100
set protocols ospf area 0.0.0.10 stub no-summaries
set protocols ospf area 0.0.0.10 interface fxp0.0 disable
set protocols ospf area 0.0.0.10 interface irb.50 node-link-protection
set protocols ospf area 0.0.0.10 interface irb.50 priority 255
set protocols ospf area 0.0.0.10 interface irb.50 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface irb.50 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface irb.51 node-link-protection
set protocols ospf area 0.0.0.10 interface irb.51 priority 255
set protocols ospf area 0.0.0.10 interface irb.51 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface irb.51 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface irb.52 node-link-protection
set protocols ospf area 0.0.0.10 interface irb.52 priority 255
set protocols ospf area 0.0.0.10 interface irb.52 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface irb.52 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface irb.53 priority 255
set protocols ospf area 0.0.0.10 interface irb.53 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface irb.53 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols ospf area 0.0.0.0 interface irb.107 passive
set protocols ospf area 0.0.0.0 interface irb.20 node-link-protection
set protocols ospf area 0.0.0.0 interface irb.20 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.0 interface irb.20 bfd-liveness-detection multiplier 2
[edit]
set protocols ospf reference-bandwidth 1000g
set protocols ospf area 0.0.0.10 stub no-summaries
set protocols ospf area 0.0.0.10 interface vlan.104 passive
set protocols ospf area 0.0.0.10 interface vlan.108 passive
set protocols ospf area 0.0.0.10 interface vlan.103 passive
set protocols ospf area 0.0.0.10 interface vlan.502 passive
set protocols ospf area 0.0.0.10 interface vlan.503
set protocols ospf area 0.0.0.10 interface vlan.50 node-link-protection
set protocols ospf area 0.0.0.10 interface vlan.50 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface vlan.50 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface vlan.51 node-link-protection
set protocols ospf area 0.0.0.10 interface vlan.51 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface vlan.51 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface vlan.52 node-link-protection
set protocols ospf area 0.0.0.10 interface vlan.52 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface vlan.52 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface vlan.53 node-link-protection
set protocols ospf area 0.0.0.10 interface vlan.53 bfd-liveness-detection minimum-interval 500
set protocols ospf area 0.0.0.10 interface vlan.53 bfd-liveness-detection multiplier 2
set protocols ospf area 0.0.0.10 interface vlan.501 passive
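Two values recur throughout the OSPF configuration above: the BFD timers (minimum-interval 500, multiplier 2) and reference-bandwidth 1000g. The following Python sketch works through the arithmetic behind them. It assumes that both BFD peers use the same interval and multiplier and that the OSPF cost is the reference bandwidth divided by the interface bandwidth with a floor of 1. The function names and the interface speeds are illustrative.

```python
def bfd_detection_time_ms(minimum_interval_ms: int, multiplier: int) -> int:
    """Time without BFD packets before the session is declared down."""
    return minimum_interval_ms * multiplier


def ospf_cost(reference_bandwidth_bps: int, interface_bandwidth_bps: int) -> int:
    """Interface cost derived from the reference bandwidth, never below 1."""
    return max(1, reference_bandwidth_bps // interface_bandwidth_bps)


GBPS = 10**9
REFERENCE = 1000 * GBPS  # reference-bandwidth 1000g

print(bfd_detection_time_ms(500, 2))     # 1000 ms
print(ospf_cost(REFERENCE, 1 * GBPS))    # 1000 for a 1-Gbps link
print(ospf_cost(REFERENCE, 10 * GBPS))   # 100 for a 10-Gbps link
print(ospf_cost(REFERENCE, 40 * GBPS))   # 25 for a 40-Gbps link
```

With these settings a failed neighbor is detected after roughly one second, and the raised reference bandwidth keeps 1-Gbps, 10-Gbps, and 40-Gbps links at distinct costs.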
Verification
Purpose
The following verification commands (with sample output) can be used to confirm the OSPF
configuration in the data center.
Results
1. Verify that all OSPF sessions are up (command outputs provided for all configured
routers).
Node Coverage:
Route Coverage:
root@VDC-edge-r2-re0%cli
{master}
Node Coverage:
Route Coverage:
Node Coverage:
0.0.0.11 2 2 100.00%
0.0.0.2 0 2 0.00%
Route Coverage:
High Availability
The MetaFabric 1.0 solution is designed with both hardware and software redundancy
throughout the data center.
Hardware Redundancy
The following hardware redundancy options are configured in the VDC 1.0 solution:
• Node-level physical redundancy, featuring edge routers, redundant core switches, POD
switches, and an SRX firewall cluster
Software Redundancy
The following software redundancy features are configured in the MetaFabric 1.0 solution:
QFabric-M Configuration
• Link/node-level redundancy using multichassis LAGs on edge router and core switches
• Redundant server node groups (RSNG) on POD1 and POD2 (QFX3000-M QFabric
system). This is configured on the PODs using the following configuration commands:
set fabric resources node-group RSNG2 node-device n2
set fabric resources node-group RSNG2 node-device n3
• OSPF LFA feature to enable backup next hop during failure events
NOTE: When NSR is enabled, graceful protocol restart is not supported. NSR
is not currently supported on the QFX3000-M QFabric system.
The core switches (EX9200) and edge routers (MX240) feature software redundancy
configured as shown here:
• Graceful Routing Engine switchover (GRES) on Routing Engine hardware failure. This
is configured on MX Series platforms using the following commands:
set groups global chassis redundancy routing-engine 0 master
set groups global chassis redundancy routing-engine 1 backup
set groups global chassis redundancy failover on-loss-of-keepalives
set groups global chassis redundancy graceful-switchover
• Nonstop software upgrade (NSSU) is supported on the QFX3000-M QFabric system
and MX240
NOTE: ISSU is supported only with the presence of 1-Gbps line cards
available in the chassis (EX9200).
• Nonstop active routing (NSR) is supported. This is configured using the following
command:
set groups global routing-options nonstop-routing
• Graceful protocol restart is also supported at the core and edge. Configuration of this
feature is performed using this command:
set groups global routing-options graceful-restart
The edge firewalls (SRX3600) feature the following software redundancy configurations:
• Edge firewall (SRX3600) chassis cluster configuration is performed using the following
commands:
set groups global protocols layer2-control nonstop-bridging
set chassis cluster reth-count 4
set chassis cluster redundancy-group 0 node 0 priority 129
set chassis cluster redundancy-group 0 node 1 priority 128
set chassis cluster redundancy-group 1 node 0 priority 129
set chassis cluster redundancy-group 1 node 1 priority 128
• Fabric links between the SRX chassis are configured using the following commands:
set interfaces fab0 fabric-options member-interfaces ge-5/0/15
set interfaces fab1 fabric-options member-interfaces ge-18/0/15
Verification
The following verification commands (with sample output) can be used to confirm the
configuration and function of high availability features.
Results
1. Verify that all protocol sessions are up on the backup Routing Engine. This
command output verifies that NSR is configured properly on the EX9200:
2. Verify that NSR is configured properly. This is done by confirming that all OSPF
sessions are in a “Full” state on the backup Routing Engine. The command below was
run on the MX240:
3. Verify that GRES is configured properly. This is done by confirming that the backup
Routing Engine is ready for switchover. The command below was run on the MX240:
Graceful switchover: On
Configuration database: Ready
Class-of-Service Configuration
Class-of-Service Overview
Requirements
End-to-end class of service is required in a data center environment to ensure a
high-quality experience for users of business-critical applications. The most important
or high-priority applications and services should always have priority over other traffic
types. The solution also requires configuration and support for DCB and the provisioning
of lossless Ethernet, so that storage arrays communicate over a high-priority, lossless
medium without blocking or interrupting non-storage traffic.
End-to-end traffic and CoS in this solution have been classified into the following five
forwarding classes (sorted from highest priority to lowest priority, top to bottom):
• Network control
• Virtualization control
• Storage (lossless)
• Business applications
• Best effort
The solution queues (buffers and transmit rates) are shown in Table 20 on page 149.
Configuring Class-of-Service
Topology
Figure 49 on page 151 shows the topology of the data center PODs and the connections
to the compute and storage farms in the testing lab. This MetaFabric 1.0 solution uses
iSCSI and Network File System (NFS) as storage protocols. An EMC VNX is configured as
the storage array. All the VM hard drives are mounted on the EMC storage using the iSCSI
transport. The NFS partition is used for file storage. Storage traffic requires lossless
transport end-to-end when traversing the Ethernet network. Incoming traffic destined
for the data center applications is classified at the edge router to Queue 111 and forwarded
to the VDC network through the vdc-perimeter firewall.
1. Configure DCBX and LLDP on vdc-pod1-sw1 and vdc-pod2-sw1 for the server-facing
and storage-facing interfaces. DCBX and LLDP are needed to exchange lossless
Ethernet capabilities with the peer. The important parameters are:
1. Lossless-queue number
[edit]
set protocols dcbx interface all
set protocols lldp interface all
[edit]
set class-of-service forwarding-classes class BestEffort queue-num 0
set class-of-service forwarding-classes class Business_Applications queue-num 1
set class-of-service forwarding-classes class no-loss queue-num 4
set class-of-service forwarding-classes class no-loss no-loss
set class-of-service forwarding-classes class VM_Control queue-num 6
set class-of-service forwarding-classes class Network_Control queue-num 7
3. Configure 802.1p classifiers for the iSCSI traffic and assign it code point 4 (binary 100).
[edit]
set class-of-service classifiers ieee-802.1 802.1-classifier forwarding-class BestEffort loss-priority low code-points 000
4. Configure forwarding class-sets for iSCSI and Ethernet traffic (representing different
forwarding classes).
[edit]
set class-of-service forwarding-class-sets no-loss class no-loss
set class-of-service forwarding-class-sets VDC-Lan class BestEffort
set class-of-service forwarding-class-sets VDC-Lan class Business_Applications
set class-of-service forwarding-class-sets VDC-Lan class Network_Control
set class-of-service forwarding-class-sets VDC-Lan class VM_Control
5. Configure the congestion notification profile to enable PFC for code point 100, which
corresponds to queue 4 (the no-loss queue). This configuration is mandatory because it
enables lossless behavior on queue 4 by enforcing priority-based flow control for that queue.
[edit]
set class-of-service congestion-notification-profile cnp input ieee-802.1 code-point 100 pfc
6. Configure the transmit rate and priority for each scheduler to allow for bandwidth
sharing.
7. Configure scheduler maps for iSCSI and Ethernet traffic. This configuration binds each
scheduler to its forwarding class.
[edit]
set class-of-service scheduler-maps VDC-Lan forwarding-class BestEffort scheduler BestEffort
set class-of-service scheduler-maps VDC-Lan forwarding-class Business_Applications scheduler Business_Applications
set class-of-service scheduler-maps VDC-Lan forwarding-class VM_Control scheduler VM_Control
set class-of-service scheduler-maps VDC-Lan forwarding-class Network_Control scheduler Network_Control
set class-of-service scheduler-maps no-loss forwarding-class no-loss scheduler no-loss
8. Configure traffic control profiles. These profiles bind the scheduler maps.
[edit]
9. Apply the classifier, forwarding class set, and CNP profile to the server-facing and
storage-facing interfaces. In the verification lab, these ports are all facing
TrafficGenerator test equipment.
[edit]
set class-of-service interfaces RSNG2:ae0 forwarding-class-set no-loss output-traffic-control-profile no-loss
set class-of-service interfaces RSNG2:ae0 forwarding-class-set VDC-Lan output-traffic-control-profile VDC-Lan
set class-of-service interfaces RSNG2:ae0 congestion-notification-profile cnp
set class-of-service interfaces RSNG2:ae0 unit * classifiers ieee-802.1 802.1-classifier
set class-of-service interfaces RSNG2:ae0 unit * rewrite-rules ieee-802.1 802.1-rewrite
[edit]
set class-of-service interfaces n1:xe-0/0/14 forwarding-class-set VDC-Lan output-traffic-control-profile VDC-Lan
set class-of-service interfaces n1:xe-0/0/14 forwarding-class-set no-loss output-traffic-control-profile no-loss
set class-of-service interfaces n1:xe-0/0/14 congestion-notification-profile cnp
set class-of-service interfaces n1:xe-0/0/14 unit * classifiers ieee-802.1 802.1-classifier
set class-of-service interfaces n1:xe-0/0/15 forwarding-class-set VDC-Lan output-traffic-control-profile VDC-Lan
set class-of-service interfaces n1:xe-0/0/15 forwarding-class-set no-loss output-traffic-control-profile no-loss
set class-of-service interfaces n1:xe-0/0/15 congestion-notification-profile cnp
set class-of-service interfaces n1:xe-0/0/15 unit * classifiers ieee-802.1 802.1-classifier
set class-of-service interfaces n1:xe-0/0/16 forwarding-class-set VDC-Lan output-traffic-control-profile VDC-Lan
set class-of-service interfaces n1:xe-0/0/16 forwarding-class-set no-loss output-traffic-control-profile no-loss
set class-of-service interfaces n1:xe-0/0/16 congestion-notification-profile cnp
set class-of-service interfaces n1:xe-0/0/16 unit * classifiers ieee-802.1 802.1-classifier
set class-of-service interfaces n1:xe-0/0/16 unit * rewrite-rules ieee-802.1 802.1-rewrite
[edit]
copy class-of-service interfaces RSNG2:ae0 to RSNG2:ae1
copy class-of-service interfaces RSNG2:ae0 to RSNG2:ae2
copy class-of-service interfaces RSNG2:ae0 to RSNG2:ae4
copy class-of-service interfaces RSNG2:ae0 to RSNG3:ae3
copy class-of-service interfaces RSNG2:ae0 to RSNG3:ae4
copy class-of-service interfaces RSNG2:ae0 to RSNG3:ae5
copy class-of-service interfaces RSNG2:ae0 to RSNG4:ae0
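The congestion notification profile identifies the lossless traffic by its IEEE 802.1p code point, written as three bits. The following Python sketch converts the code point used in the cnp profile to its decimal priority and compares it with the queue number assigned to the no-loss forwarding class. The dictionary copies the queue numbers from the forwarding-class configuration in this section. The function name is an assumption, and the priority and queue numbers match here only because this configuration chose them that way.

```python
# Queue numbers copied from the forwarding-classes configuration.
FORWARDING_CLASS_QUEUES = {
    "BestEffort": 0,
    "Business_Applications": 1,
    "no-loss": 4,
    "VM_Control": 6,
    "Network_Control": 7,
}


def code_point_to_priority(code_point: str) -> int:
    """Convert a 3-bit IEEE 802.1p code point such as '100' to decimal."""
    if len(code_point) != 3 or set(code_point) - {"0", "1"}:
        raise ValueError(f"not a 3-bit code point: {code_point!r}")
    return int(code_point, 2)


pfc_priority = code_point_to_priority("100")  # code point in the cnp profile
print(pfc_priority)                                         # 4
print(pfc_priority == FORWARDING_CLASS_QUEUES["no-loss"])   # True
```

Code point 100 is priority 4, which lines up with queue 4 of the no-loss forwarding class and with the lossless queue checked in the verification steps.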
Verification
The following verification commands (with sample output) can be used to confirm that
the DCB and lossless Ethernet configurations are operating as expected.
Results
3. Verify that packets in other queues are dropped during congestion. Also verify that
queue 4 (lossless queue) is showing no drops.
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 17848847 422311 pps
Bytes : 2284652416 432446792 bps
Tail-dropped packets : Not Available
Total-dropped packets: 71401120 1689149 pps
Total-dropped bytes : 9139343360 1729689096 bps
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 89238496 2111415 pps
Bytes : 11422527488 2162089504 bps
Tail-dropped packets : Not Available
Total-dropped packets: 0 0 pps
Total-dropped bytes : 0 0 bps
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 0 0 pps
Bytes : 0 0 bps
Tail-dropped packets : Not Available
Total-dropped packets: 0 0 pps
Total-dropped bytes : 0 0 bps
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 8700524783 5067522 pps
Bytes : 1113667172224 5189143528 bps
Tail-dropped packets : Not Available
Total-dropped packets: 0 0 pps
Total-dropped bytes : 0 0 bps
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 17848844 422310 pps
Bytes : 2284652032 432446128 bps
Tail-dropped packets : Not Available
Total-dropped packets: 71401096 1689144 pps
Total-dropped bytes : 9139340288 1729683800 bps
Queued:
Packets : 0 0 pps
Bytes : 0 0 bps
Transmitted:
Packets : 17849477 422311 pps
Bytes : 2284699608 432446792 bps
Tail-dropped packets : Not Available
Total-dropped packets: 71401145 1689147 pps
Total-dropped bytes : 9139345327 1729686776 bps
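The counters above can be turned into a drop ratio to make the comparison between congested and lossless queues explicit. The following is a minimal Python sketch that uses packet counters copied from two of the sample blocks, one that shows drops and one that shows none. The sample output does not preserve the queue labels, so the sketch does not assume which queue number each block belongs to. The function name is illustrative.

```python
def drop_ratio(transmitted_packets: int, dropped_packets: int) -> float:
    """Fraction of offered packets that were dropped by the queue."""
    offered = transmitted_packets + dropped_packets
    return dropped_packets / offered if offered else 0.0


# Packet counters copied from the sample output above.
congested = drop_ratio(transmitted_packets=17848847, dropped_packets=71401120)
lossless = drop_ratio(transmitted_packets=8700524783, dropped_packets=0)

print(f"{congested:.1%}")  # 80.0%
print(f"{lossless:.1%}")   # 0.0%
```

A queue that drops about 80 percent of its offered load next to a heavily used queue with zero drops is the pattern expected when PFC protects the lossless class during congestion.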
Security Configuration
• Perimeter Security on page 159
• Host Security on page 177
Perimeter Security
Overview
The Juniper Networks SRX3600 is deployed in this solution as the edge firewall. It sits
between the edge routers and core switches and provides perimeter security for the
virtualized data center network. The SRX3600 is configured in active/backup chassis
cluster mode. Active/backup high availability is the most common type of high availability
firewall deployment and consists of two firewall members of a cluster, one of which
actively provides routing, firewall, NAT, VPN, and security services, along with
maintaining control of the chassis cluster. In case of chassis cluster failover, the
backup firewall becomes the active firewall and the active firewall becomes the backup.
To configure chassis clustering, you must first configure the cluster-id and node ID on
each cluster member as shown in the following steps:
Control port configuration: Once the chassis members have rebooted, the SRX3600
uses two designated, labeled ports as control ports.
To configure the data fabric, you must configure two fabric interfaces (one on each
chassis) as shown in the following steps. These interfaces are connected to each other
to form the fabric link.
To configure chassis clustering groups, including the host name, backup-router, and
interface addressing, follow these steps:
You must also define which device has priority for the control plane (in JSRP, the higher
priority is preferred), as well as which device is preferred to be active for the data
plane. Remember that in active/passive mode the control plane can be active on a different
chassis than the data plane. There is nothing wrong with this from a technical standpoint,
but many administrators prefer having both the control plane and data plane active on the
same chassis member.
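The priority rule can be sketched in Python using the redundancy-group priorities from the chassis cluster configuration shown earlier (129 for node 0 and 128 for node 1 in both groups). The sketch only identifies the preferred node per redundancy group. It assumes both nodes are healthy and does not model preemption, interface monitoring, or manual failover. The function name is illustrative.

```python
# Priorities copied from the chassis cluster configuration.
REDUNDANCY_GROUP_PRIORITIES = {
    0: {"node0": 129, "node1": 128},  # redundancy group 0: control plane
    1: {"node0": 129, "node1": 128},  # redundancy group 1: data plane (reth)
}


def preferred_primary(node_priorities: dict) -> str:
    """Return the node with the highest priority in a redundancy group."""
    return max(node_priorities, key=node_priorities.get)


preferred = {
    group: preferred_primary(nodes)
    for group, nodes in REDUNDANCY_GROUP_PRIORITIES.items()
}
print(preferred)  # {0: 'node0', 1: 'node0'}
```

Using the same priorities in both groups keeps the control plane and data plane preferred on the same chassis member, which matches the administrator preference described above.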
NOTE: Redundant Ethernet interface LAGs are configured toward the edge
routers and core switches.
To configure redundant data interfaces on the chassis cluster, follow these steps:
1. Configure redundant Ethernet LAG interface reth0 toward core switches used as the
trust interface.
2. Configure member links for the reth0 from node 0. (Once the chassis cluster is
configured, everything can be configured from the primary node as the cluster behaves
as a single, physical chassis.)
[edit]
set interfaces xe-3/0/0 gigether-options redundant-parent reth0
set interfaces xe-3/0/1 gigether-options redundant-parent reth0
set interfaces xe-4/0/0 gigether-options redundant-parent reth0
set interfaces xe-4/0/1 gigether-options redundant-parent reth0
….
set interfaces xe-16/0/0 gigether-options redundant-parent reth0
set interfaces xe-16/0/1 gigether-options redundant-parent reth0
set interfaces xe-17/0/0 gigether-options redundant-parent reth0
set interfaces xe-17/0/1 gigether-options redundant-parent reth0
3. Configure redundant Ethernet LAG interface reth1 toward edge routers used as the
untrust interface.
[edit]
set interfaces reth1 description "Untrust Zone toward Edge-routers"
set interfaces reth1 vlan-tagging
set interfaces reth1 redundant-ether-options redundancy-group 1
set interfaces reth1 redundant-ether-options minimum-links 1
set interfaces reth1 redundant-ether-options lacp active
set interfaces reth1 redundant-ether-options lacp periodic fast
set interfaces reth1 unit 0 vlan-id 11
set interfaces reth1 unit 0 family inet address 192.168.26.3/24
set interfaces reth1 unit 0 family inet address 10.94.127.30/27
4. Configure redundant member links for reth1 (can be done from node0).
[edit]
set interfaces xe-1/0/0 gigether-options redundant-parent reth1
set interfaces xe-1/0/1 gigether-options redundant-parent reth1
set interfaces xe-2/0/0 gigether-options redundant-parent reth1
set interfaces xe-2/0/1 gigether-options redundant-parent reth1
….
set interfaces xe-14/0/0 gigether-options redundant-parent reth1
set interfaces xe-14/0/1 gigether-options redundant-parent reth1
set interfaces xe-15/0/0 gigether-options redundant-parent reth1
set interfaces xe-15/0/1 gigether-options redundant-parent reth1
[edit]
set security zones functional-zone management host-inbound-traffic system-services ssh
set security zones functional-zone management host-inbound-traffic system-services https
set security zones functional-zone management host-inbound-traffic protocols all
set security zones security-zone untrust address-book address TVM-Client-Subnet 10.10.0.0/16
set security zones security-zone untrust address-book address TrafficGenerator-External 10.40.0.0/16
2. Configure the outbound security policy for traffic sourced from the trust zone (reth0)
and destined for the untrust zone (reth1).
[edit]
set security policies from-zone trust to-zone untrust policy Internet-access match source-address any
set security policies from-zone trust to-zone untrust policy Internet-access match destination-address any
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-http
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-https
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-http-ext
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-ntp
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-dns-udp
set security policies from-zone trust to-zone untrust policy Internet-access match application ICMP
set security policies from-zone trust to-zone untrust policy Internet-access then permit
3. Configure the inbound security policies for traffic sourced from the untrust zone (reth1)
and destined for the trust zone (reth0).
[edit]
set security policies from-zone untrust to-zone trust policy remote-access match source-address any
set security policies from-zone untrust to-zone trust policy remote-access match destination-address SA-server1
set security policies from-zone untrust to-zone trust policy remote-access match application junos-https
set security policies from-zone untrust to-zone trust policy remote-access then permit
set security policies from-zone untrust to-zone trust policy Exchange-Access match source-address any
set security policies from-zone untrust to-zone trust policy Exchange-Access match destination-address Exchange-Server
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-imap
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-pop3
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-ms-rpc-msexchange
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-http
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-https
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-http-ext
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-ms-rpc-msexchange-directory-nsp
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-ms-rpc-msexchange-directory-rfr
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-ms-rpc-msexchange-info-store
set security policies from-zone untrust to-zone trust policy Exchange-Access match application Exchange
set security policies from-zone untrust to-zone trust policy Exchange-Access match application junos-smtp
set security policies from-zone untrust to-zone trust policy Exchange-Access then permit
set security policies from-zone untrust to-zone trust policy MediaWiki-Access match source-address any
set security policies from-zone untrust to-zone trust policy MediaWiki-Access match destination-address MediaWiki-Server
set security policies from-zone untrust to-zone trust policy MediaWiki-Access match application junos-http
set security policies from-zone untrust to-zone trust policy MediaWiki-Access match
application junos-https
set security policies from-zone untrust to-zone trust policy MediaWiki-Access match
application junos-http-ext
set security policies from-zone untrust to-zone trust policy MediaWiki-Access then
permit
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
source-address any
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
destination-address SP-Server
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
application junos-http
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
application junos-https
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
application junos-http-ext
set security policies from-zone untrust to-zone trust policy SharePoint-Access match
application SharePoint
set security policies from-zone untrust to-zone trust policy SharePoint-Access then
permit
set security policies from-zone untrust to-zone trust policy ICMP-allow match
source-address any
set security policies from-zone untrust to-zone trust policy ICMP-allow match
destination-address any
set security policies from-zone untrust to-zone trust policy ICMP-allow match application
ICMP
set security policies from-zone untrust to-zone trust policy ICMP-allow then permit
Verification
The following verification commands (with sample output) can be used to confirm that
the SRX chassis cluster is configured properly.
Results
Cluster ID: 1
Node Priority Status Preempt Manual failover
To configure source NAT on the SRX chassis cluster, follow these steps:
[edit]
set security nat source pool public-pool address 10.94.127.1/32 to 10.94.127.10/32
[edit]
set security nat source rule-set Internet-access from zone trust
set security nat source rule-set Internet-access to zone untrust
set security nat source rule-set Internet-access rule datacenter match source-address 172.16.0.0/16
set security nat source rule-set Internet-access rule datacenter match destination-address 0.0.0.0/0
set security nat source rule-set Internet-access rule datacenter then source-nat pool public-pool
3. Configure proxy ARP on the outbound NAT interface (reth1, or untrust, in this example).
[edit]
set security nat proxy-arp interface reth1.0 address 10.94.127.1/32 to 10.94.127.10/32
2. Destination NAT rule set SA with rule SA-rule1 to match packets received with destination
IP address 10.94.127.33/32. For matching packets, the destination address is translated
to the address in the dst-nat-SA-pool1 pool.
3. Proxy ARP for the address 10.94.127.33/32 on interface reth1.0. This allows the Juniper
Networks security device to respond to ARP requests received on the interface for
that address.
4. Security policies to permit traffic from the untrust zone to the translated destination
IP address in the trust zone.
To configure destination NAT on the SRX chassis cluster, follow these steps:
[edit]
set security nat destination pool dst-nat-SA-pool1 address 10.94.63.24/32
[edit]
set security nat destination rule-set SA from zone untrust
set security nat destination rule-set SA rule SA-rule1 match destination-address 10.94.127.33/32
set security nat destination rule-set SA rule SA-rule1 then destination-nat pool dst-nat-SA-pool1
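The destination NAT rule above can be pictured as a simple lookup: a packet arriving from the untrust zone whose destination matches 10.94.127.33/32 has that destination rewritten to the pool address 10.94.63.24. The following Python sketch is only a conceptual model of that behavior, not a description of how the SRX implements it; the function and variable names are hypothetical, and the addresses are copied from the configuration above.

```python
import ipaddress

# Values copied from the destination NAT configuration above.
DNAT_RULES = [
    {
        "name": "SA-rule1",
        "from_zone": "untrust",
        "match_destination": ipaddress.ip_network("10.94.127.33/32"),
        "pool_address": ipaddress.ip_address("10.94.63.24"),
    },
]

def translate_destination(from_zone, destination):
    """Return the translated destination, or the original if no rule matches."""
    dst = ipaddress.ip_address(destination)
    for rule in DNAT_RULES:
        if rule["from_zone"] == from_zone and dst in rule["match_destination"]:
            return str(rule["pool_address"])
    return str(dst)

print(translate_destination("untrust", "10.94.127.33"))
```

Traffic from any other zone, or to any other destination address, is returned unchanged by this model, which mirrors the fact that the rule set is scoped to the untrust zone and a single /32 address.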
Verification
The following verification commands (with sample output) can be used to confirm that
the NAT is configured properly.
Results
node0:
--------------------------------------------------------------------------
Total pools: 1
node1:
--------------------------------------------------------------------------
Total pools: 1
node0:
--------------------------------------------------------------------------
Total port number usage for port translation pool: 645120
Maximum port number for port translation pool: 268435456
Total pools: 1
Pool Address Routing PAT Total
Total rules: 1
Rule name Rule set From To Action
datacenter Internet-access trust untrust
public-pool
node1:
--------------------------------------------------------------------------
Total port number usage for port translation pool: 645120
Maximum port number for port translation pool: 268435456
Total pools: 1
Pool Address Routing PAT Total
Total rules: 1
Rule name Rule set From To Action
datacenter Internet-access trust untrust
public-pool
3. Verify source NAT rules, match conditions, actions, and rule order.
node0:
--------------------------------------------------------------------------
Total rules: 1
Total referenced IPv4/IPv6 ip-prefixes: 2/0
node1:
--------------------------------------------------------------------------
Total rules: 1
Total referenced IPv4/IPv6 ip-prefixes: 2/0
source NAT rule: datacenter Rule-set: Internet-access
Rule-Id : 1
Rule position : 1
From zone : trust
To zone : untrust
Match
Source addresses : 172.16.0.0 - 172.16.255.255
Destination addresses : 0.0.0.0 - 255.255.255.255
Destination port : 0 - 0
Action : public-pool
Persistent NAT type : N/A
Persistent NAT mapping type : address-port-mapping
Inactivity timeout : 0
Max session number : 0
Translation hits : 25621
{primary:node1}
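The port-usage figure in the sample output above can be sanity-checked against the size of public-pool. The sketch below assumes that each pool address contributes ports 1024 through 65535 for port translation; that range is an assumption, not something stated in this guide. Under that assumption, the 10 addresses in the pool account for the 645120 ports reported. The function name is illustrative.

```python
import ipaddress

def pat_port_capacity(first, last, low_port=1024, high_port=65535):
    """Return (address_count, total_ports) for a source NAT pool range.

    low_port and high_port are assumed defaults, not values from this guide.
    """
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end < start:
        raise ValueError("last address precedes first address")
    addresses = end - start + 1
    ports_per_address = high_port - low_port + 1
    return addresses, addresses * ports_per_address

addresses, ports = pat_port_capacity("10.94.127.1", "10.94.127.10")
print(addresses, ports)  # prints: 10 645120
```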
The Junos OS intrusion detection and prevention (IDP) policy enables you to selectively
enforce various attack detection and prevention techniques on network traffic passing
through an IDP-enabled device. It allows you to define policy rules that match a section
of traffic based on zone, network, and application, and then take active or passive
preventive actions on that traffic. An IDP policy defines how your device handles
network traffic.
A policy is made up of rulebases, and each rulebase is composed of a set of rules. You
define rule parameters, such as traffic match conditions, action, and logging requirements,
and then add the rules to rulebases. After you create an IDP policy by adding rules to one
or more rulebases, you can select that policy to be the active policy on your device. Junos
OS allows you to configure multiple IDP policies, but a device can have only one active
IDP policy at a time. You can install the same IDP policy on multiple devices, or you can
install a unique IDP policy on each device in your network. A single policy can contain
only one instance of any type of rulebase.
For transit traffic to pass through IDP inspection, you configure a security policy and
enable IDP application services on all traffic that you want to inspect. Security policies
contain rules defining the types of traffic permitted on the network and how the traffic
is treated inside the network. Enabling IDP in a security policy directs traffic that matches
the specified criteria to be checked against the IDP rulebases.
NOTE: The action in the security policy must be permit. You cannot
enable IDP for traffic that the device denies or rejects.
To install and configure IDP on the SRX chassis cluster, follow these steps:
Install an IDP license to enable IDP signature updates. To download and use the
predefined attack signatures in a policy, the IDP license must be installed. If you are using
only custom signatures, you do not need an IDP license. Once your license file is purchased
and available, install the license from the Junos OS CLI.
License usage:
                   Licenses    Licenses    Licenses    Expiry
  Feature name     used        installed   needed
  idp-sig          1           1           0           2014-12-15 16:00:00 PST
  appid-sig        0           2           0           2014-12-15 16:00:00 PST
  logical-system   1           1           0           permanent
Licenses installed:
License identifier: JUNOS466166
License version: 2
Valid for device: AB0813AA0014
Features:
idp-sig - IDP Signature
date-based, 2013-12-16 16:00:00 PST - 2014-12-15 16:00:00 PST
3. Download and install the signature database. After the IDP license is installed, the
IDP signature database can be downloaded and installed by performing the following
steps:
• Make sure the device has the necessary configuration for connectivity to the Internet.
A name server must be configured.
4. Verify the version of the signature database in the Signature DB server. Look for
“successfully retrieved”. In this example, the version in the server is 2345.
node0:
--------------------------------------------------------------------------
Successfully retrieved from(https://services.netscreen.com/cgi-bin/index.cgi).
Version info:2345(Detector=12.6.140140207, Templates=2345)
{primary:node0}
node0:
--------------------------------------------------------------------------
Will be processed in async mode. Check the status using the status checking
CLI
{primary:node0}
root@vdc-edge-fw01-n0> request security idp security-package download status
node0:
--------------------------------------------------------------------------
In progress:platforms.xml.gz 100 % 250 Bytes/ 250 Bytes
{primary:node0}
root@vdc-edge-fw01-n0> request security idp security-package download status
node0:
--------------------------------------------------------------------------
Done;Successfully downloaded
from(https://services.netscreen.com/cgi-bin/index.cgi)
and synchronized to backup.
Version info:2345(Wed Feb 12 19:13:53 2014 UTC, Detector=12.6.140140207)
{primary:node0}
node0:
--------------------------------------------------------------------------
In progress:platforms.xml.gz 100 % 250 Bytes/ 250 Bytes
{primary:node0}
root@vdc-edge-fw01-n0> request security idp security-package download status
node0:
--------------------------------------------------------------------------
Done;Successfully downloaded
from(https://services.netscreen.com/cgi-bin/index.cgi)
and synchronized to backup.
Version info:2345(Wed Feb 12 19:13:53 2014 UTC, Detector=12.6.140140207)
node0:
--------------------------------------------------------------------------
In progress:Updating with new attack or detector for existing running policy...
node1:
--------------------------------------------------------------------------
Done;Attack DB update : successful - [UpdateNumber=2345,ExportDate=Wed Feb 12
19:13:53 2014 UTC,Detector=12.6.140140207]
Updating control-plane with new detector : successful
Updating data-plane with new attack or detector : successful
(The last known good detector link has been updated with the new detector)
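Because the security package download is processed asynchronously, the status command is usually repeated until it reports Done. The following Python sketch shows one way such status text could be parsed when the check is scripted. The function name and the returned keys are hypothetical, and the sample strings are copied from the output shown above; how the text is collected from the device is outside the scope of this sketch.

```python
import re

def parse_download_status(output):
    """Extract completion state, version, and detector from status text."""
    done = "Done;" in output
    match = re.search(r"Version info:(\d+)\(.*?Detector=([\d.]+)", output, re.DOTALL)
    version = int(match.group(1)) if match else None
    detector = match.group(2) if match else None
    return {"done": done, "version": version, "detector": detector}

in_progress = "In progress:platforms.xml.gz 100 % 250 Bytes/ 250 Bytes"
finished = (
    "Done;Successfully downloaded\n"
    "from(https://services.netscreen.com/cgi-bin/index.cgi)\n"
    "and synchronized to backup.\n"
    "Version info:2345(Wed Feb 12 19:13:53 2014 UTC, Detector=12.6.140140207)"
)

print(parse_download_status(in_progress))
print(parse_download_status(finished))
```

A script built on this would stop polling once the done flag is set and could compare the parsed version with the version reported by the Signature DB server.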
9. Once the security policy is configured and the action is set to “permit”, enable IDP
under “application-services”. This redirects traffic that matches the security policy to
the IDP service for inspection. The following example enables IDP on the Internet-access
security policy, which applies to traffic flowing from the trust zone to the untrust zone.
set security policies from-zone trust to-zone untrust policy Internet-access match source-address any
set security policies from-zone trust to-zone untrust policy Internet-access match destination-address any
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-http
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-https
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-http-ext
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-ntp
set security policies from-zone trust to-zone untrust policy Internet-access match application junos-dns-udp
set security policies from-zone trust to-zone untrust policy Internet-access match application ICMP
set security policies from-zone trust to-zone untrust policy Internet-access then permit application-services idp
10. Enable IDP for inbound traffic (flowing from the untrust security zone to the trust
security zone). Once IDP is enabled in a security policy, the IDP policy should be
activated, monitored for effectiveness, and tuned. The command used to activate
the IDP policy in this example is:
[edit]
set security idp active-policy HTTP-inspection
NOTE: There can be only one active IDP policy. The active IDP policy can
be applied to multiple rules.
11. The following display set configuration shows a complete policy called
HTTP-inspection on the perimeter firewall. In this example, two rules are created. The
R1 rule is from the trust security zone to the untrust security zone. The R2 rule monitors
traffic from the untrust security zone to the trust security zone. The IDP rulebase is
configured to match Web-based attacks. Finally, the policy is activated as shown in
Step 10 using the command set security idp active-policy HTTP-inspection.
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match from-zone trust
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match source-address any
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match to-zone untrust
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match destination-address any
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match application default
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match attacks predefined-attack-groups "Critical - HTTP"
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 match attacks predefined-attack-groups "Major - HTTP"
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 then action drop-connection
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 then notification log-attacks
set security idp idp-policy HTTP-inspection rulebase-ips rule R1 then severity critical
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match from-zone untrust
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match source-address any
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match to-zone trust
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match destination-address any
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match application default
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match attacks predefined-attack-groups "Critical - HTTP"
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 match attacks predefined-attack-groups "Major - HTTP"
set security idp idp-policy HTTP-inspection rulebase-ips rule R2 then action drop-connection
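Rules R1 and R2 share the same structure and differ mainly in their zone direction, so the command set lends itself to templating when many rules are needed. The following Python sketch generates the match and action commands for one rule; the helper name and its parameters are hypothetical, and only the command strings themselves are taken from the configuration above. It deliberately leaves out the notification and severity lines that R1 carries.

```python
def idp_rule_commands(policy, rule, from_zone, to_zone, attack_groups,
                      action="drop-connection"):
    """Build set commands for one rulebase-ips rule, in the order shown above."""
    base = f"set security idp idp-policy {policy} rulebase-ips rule {rule}"
    commands = [
        f"{base} match from-zone {from_zone}",
        f"{base} match source-address any",
        f"{base} match to-zone {to_zone}",
        f"{base} match destination-address any",
        f"{base} match application default",
    ]
    for group in attack_groups:
        commands.append(f'{base} match attacks predefined-attack-groups "{group}"')
    commands.append(f"{base} then action {action}")
    return commands

# Reproduce rule R2 from the example policy.
for line in idp_rule_commands("HTTP-inspection", "R2", "untrust", "trust",
                              ["Critical - HTTP", "Major - HTTP"]):
    print(line)
```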
Verification
The following verification commands (with sample output) can be used to confirm that
IDP is configured and working properly.
Results
1. The show security idp status command output verifies that IDP is configured and
running.
node0:
--------------------------------------------------------------------------
State of IDP: Default, Up since: 2014-02-10 19:51:58 PST (3d 23:29 ago)
Packet Statistics:
[ICMP: 0] [TCP: 146727] [UDP: 523] [Other: 0]
Flow Statistics:
ICMP: [Current: 0] [Max: 42 @ 2014-02-13 09:14:29 PST]
TCP: [Current: 2] [Max: 48 @ 2014-02-12 12:31:09 PST]
UDP: [Current: 0] [Max: 30 @ 2014-02-12 06:00:33 PST]
Other: [Current: 0] [Max: 0 @ 2014-02-10 19:51:58 PST]
Session Statistics:
[ICMP: 0] [TCP: 1] [UDP: 0] [Other: 0]
node1:
--------------------------------------------------------------------------
State of IDP: Default, Up since: 2014-02-11 10:40:32 PST (3d 08:40 ago)
Packet Statistics:
[ICMP: 0] [TCP: 0] [UDP: 0] [Other: 0]
Flow Statistics:
Session Statistics:
[ICMP: 0] [TCP: 0] [UDP: 0] [Other: 0]
2. Verify that the IDP attack table is configured and running on the primary node.
node0:
--------------------------------------------------------------------------
node1:
--------------------------------------------------------------------------
{primary:node1}
3. Verify that the IDP application statistics are incrementing based on the configured
IDP rule set. (Output is truncated to show the relevant packet counter on node1.)
node0:
--------------------------------------------------------------------------
IDP applications:
node1:
--------------------------------------------------------------------------
IDP applications:
Host Security
Overview
Juniper Networks Firefly Host is a virtualized firewall that runs on VMware ESX/ESXi to
secure intra-virtual machine (VM) and inter-VM traffic. Juniper Firefly Host has three
main components:
• Firefly Host kernel module: Virtualized network traffic is secured and analyzed against
the security policy for all VMs on the ESX/ESXi host in the Firefly Host kernel module
installed on that host. All connections are processed and the firewall security policy is
enforced in the Firefly Host kernel module.
Firefly Host protects the VMs as well as the hypervisor. When the Firefly Host Security VM
is installed on a VMware ESX/ESXi host (Figure 50 on page 178), it inserts the Firefly Host
kernel module into the host’s hypervisor between the virtual network interface card (NIC)
and the virtual switch (vSwitch) or distributed virtual switch (DvSW).
Firefly Host supports vMotion, enabling mobility of both the VM and the Firefly Host. In
cases where a VM is moved to a different ESX/ESXi host, the security policy assigned
to that VM moves along with the virtual machine. Because Firefly Host supports
vMotion, this VM mobility does not require any additional configuration.
Firefly Host Security Design VM also manages the Security Virtual Machines (SVMs),
defining security policies, configuring antivirus, IDS, and so on. To secure ESX/ESXi hosts
and VMs, you must first deploy an SVM on each ESX/ESXi host. As soon as an SVM is
deployed on an ESX/ESXi host, that host is secured and the Firefly Host kernel module is
inserted into the ESX/ESXi hypervisor.
Firefly Host Security Design VM can be managed through a Web GUI that enables you
to define firewall security policy for all the VMs, similar to how you configure a physical
SRX firewall. Traffic can be controlled between two VMs running on one ESX/ESXi host,
and between VMs running on multiple ESX/ESXi hosts.
Firefly Host Security Design VM pushes the firewall security policy to the SVM kernel
module. When traffic enters through a physical network adapter on an ESX/ESXi host,
it travels to the virtual switch or distributed switch first, then passes through the Firefly
Host kernel module before being forwarded to the appropriate VM. Because the security
policy resides in the kernel module, traffic to or from the VM is allowed or denied there
according to that policy.
When you install SVM on the hosts, all the VMs are unsecured by default. Before defining
security policies, you must secure the VM environment.
Step-by-Step Procedure
1. The first step in configuration is to log in to the Firefly Host and select the VMs that
should be secured. The example below contains several ESXi hosts under Unsecured
Network and Secured Network. On the left side (under Unsecured Network), the
Win2012-Exch02 VM is not secured. On the right side (under Secured Network), the
Win2012-Exch06 VM is secured. To secure or unsecure a VM, select or deselect the
check box in front of the VM and click Secure or Unsecure in the Settings tab. You also
need to secure the port group when securing a VM (this is done similarly by selecting
Secure in the Settings tab for a dvPort Group).
2. Configure a group for one set of applications. The example below shows an application
name (MediaWiki) that represents a single group. Additional application groups can
be created using Add Smart Group under Security Settings, Group tab in Firefly Host.
In the Firefly Host, define the group so that vi.notes contains the keyword MediaWiki.
The group then detects all VMs that have the keyword MediaWiki in their annotation.
Before defining security policies, it is a good idea to survey the existing VM environment
to obtain a list of the applications hosted in the data center. Creating Smart Groups
up front saves time during security policy configuration.
3. Once groups are defined in Firefly Host, an additional step is required on the vCenter
Server. On the MediaWiki VM Summary tab in vCenter Server, add the same keyword that
you used for vi.notes in Step 2 to the Annotations field. This is required to enable the
Firefly Host to properly detect the virtual machine. In the example below, the MediaWiki
group in the Firefly Host detects all VMs that are properly annotated with the tag
MediaWiki.
Figure 53: The Annotation Allows Firefly Host to Detect Related VMs
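The relationship between the Smart Group definition in Step 2 and the vCenter annotation in Step 3 amounts to a keyword match on each VM's annotation text. The following Python sketch illustrates the idea only. It assumes a plain, case-sensitive substring test, which may not be exactly how Firefly Host evaluates vi.notes, and the inventory entries and annotation strings are made up for illustration.

```python
def smart_group_members(vms, keyword):
    """Return the names of VMs whose annotation contains the keyword."""
    return [vm["name"] for vm in vms if keyword in vm.get("annotation", "")]

# Hypothetical inventory; annotation text is illustrative only.
inventory = [
    {"name": "MediaWiki-01", "annotation": "MediaWiki front end"},
    {"name": "Win2012-Exch02", "annotation": "Exchange mailbox server"},
    {"name": "unannotated-vm", "annotation": ""},
]

print(smart_group_members(inventory, "MediaWiki"))
```

A VM with an empty or mismatched annotation never joins the group in this model, which is why the annotation step on the vCenter Server is required.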
4. Next, define security policies in the Firewall area of the Firefly Host. Also define an
initial Global rule under Global Policy in Policy Group. This rule applies to all
VMs in the environment, enabling security even if an application group isn’t properly
created. To create specific rules, navigate to Policy Groups in the left pane. You will
notice that the policy groups contain both Inbound and Outbound rules. An Inbound rule
applies to traffic coming into the VM, and an Outbound rule applies to traffic originating
from the VM. Below is an example rule that allows HTTP, HTTPS, and ICMP inbound
to the MediaWiki application VM.
Verification
Many network administrators are required to monitor security status in the data center.
The administrators must be able to see details on allowed traffic, as well as blocked or
anomalous traffic. This information is found in the Logs window. Logging can be enabled
or disabled on a per-policy basis. You can also enable logging for all the policies. Keep
in mind that enabling logging for all policies can have an effect on CPU utilization
and can introduce network congestion or packet drops. Because of this, we do not
recommend enabling logging for all policies.
To see policy logs, you need to enable logging per policy. Once logging is enabled, the
logs can be filtered by source IP address, destination IP address, or protocol. This filtering
is performed in the advanced view of the logging screen.
For more information on Firefly Host configuration, troubleshooting, and best practices,
see the Firefly Host Administration Guide at:
Juniper Networks Firefly Host - Installation and Administration Guide for VMware
This section covers the configuration of services in the virtualized IT data center. The
areas covered in this chapter include:
• Compute
• Compute overview
• Hardware overview
• Compute configuration
• Configure management
• Configure switching
• Virtualization
• Virtualization overview
• Configure virtualization
• Load balancing
• Overview
• Configuration
• Applications
• Microsoft Exchange
Overview
The Juniper MetaFabric 1.0 solution is designed around optimizing network, security,
virtualization, mobility, and visibility in the data center environment. To that end, all of
the data center network, security, and resiliency features are designed to support the
hosting of applications in such a way that provides the highest quality user experience.
This section covers the configuration of the physical compute hosts that reside in the
data center.
Requirements
• Solution must provide redundant network connectivity using all available bandwidth.
• Solution must support virtual network identification using Link Layer Discovery Protocol
(LLDP).
• Solution must provide physical and virtual visibility and reporting of VM movements.
The IBM Flex Chassis was selected as the physical compute host. High-level
implementation details include:
• The IBM Flex server is configured with multiple ESXi hosts hosting all the VMs running the
business-critical applications (SharePoint, Exchange, MediaWiki, and WWW).
Topology
The topology used in the data center compute, virtualization, and storage design is shown
in Figure 55 on page 186.
To configure the IBM System x3750 M4 in the OOB role, follow these steps:
1. Configure two LAG (ae11 and ae12) interfaces for each IBM system on the management
switch.
[edit]
set interfaces ge-1/0/44 ether-options 802.3ad ae11
set interfaces ge-1/0/45 ether-options 802.3ad ae11
set interfaces ge-1/0/46 ether-options 802.3ad ae12
set interfaces ge-1/0/47 ether-options 802.3ad ae12
set interfaces ae11 description "connection to POD1 Standalone server"
set interfaces ae11 aggregated-ether-options minimum-links 1
set interfaces ae11 unit 0 family ethernet-switching vlan members Compute-VLAN
set interfaces ae12 description "connection to POD2 standalone server"
set interfaces ae12 aggregated-ether-options minimum-links 1
set interfaces ae12 unit 0 family ethernet-switching vlan members Compute-VLAN
set vlans Compute-VLAN vlan-id 800
2. Configure LAG on the IBM system. This configuration step is performed as part of the
virtualization configuration section.
NOTE: Each server has four 10-Gigabit Ethernet NIC ports connected to
the QFX3000-M QFabric system as data ports for all VM traffic. One
system is connected to each POD for redundancy purposes. The IBM
System x3750 is connected to POD1 using 4 x 10-Gigabit Ethernet. A second
IBM System x3750 connects to POD2 using 4 x 10-Gigabit Ethernet. The
use of LAG provides switching redundancy in case of a POD failure.
3. Configure POD1 to connect to the IBM System x3750 server. Four ports of data traffic
are configured as a LAG and carry several VLANs that are required for the Infra Cluster.
[edit]
set interfaces interface-range POD1-Standalone-server member n2:xe-0/0/8
set interfaces interface-range POD1-Standalone-server member n3:xe-0/0/8
set interfaces interface-range POD1-Standalone-server member n3:xe-0/0/9
set interfaces interface-range POD1-Standalone-server member n2:xe-0/0/9
set interfaces interface-range POD1-Standalone-server ether-options 802.3ad RSNG2:ae0
set interfaces RSNG2:ae0 description POD1-Standalone-server
set interfaces RSNG2:ae0 unit 0 family ethernet-switching port-mode trunk
[edit]
set interfaces interface-range IBM-Standalone member "n3:xe-0/0/[26-27]"
set interfaces interface-range IBM-Standalone member "n5:xe-0/0/[26-27]"
set interfaces interface-range IBM-Standalone ether-options 802.3ad RSNG3:ae1
set interfaces RSNG3:ae1 description POD2-IBM-Standalone
set interfaces RSNG3:ae1 unit 0 family ethernet-switching port-mode trunk
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members MGMT
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Storage-POD2
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Infra
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members SQL
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members SharePoint
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Exchange
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Tera-VM
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Vmotion
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members VM-FT
set interfaces RSNG3:ae1 unit 0 family ethernet-switching vlan members Remote-Access
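The trunk configuration above repeats one command per VLAN, so it can be generated from a list when the same VLAN set must be applied to several aggregated interfaces. The following Python sketch produces the trunk commands for RSNG3:ae1; the helper name is hypothetical, while the interface name and VLAN names are copied from the configuration above.

```python
def trunk_vlan_commands(interface, vlans, unit=0):
    """Build the port-mode and vlan members commands for a trunk interface."""
    prefix = f"set interfaces {interface} unit {unit} family ethernet-switching"
    commands = [f"{prefix} port-mode trunk"]
    commands += [f"{prefix} vlan members {vlan}" for vlan in vlans]
    return commands

POD2_VLANS = [
    "MGMT", "Storage-POD2", "Infra", "SQL", "SharePoint", "Exchange-cluster",
    "Exchange", "Wikimedia", "Tera-VM", "Security-Mgmt", "Vmotion", "VM-FT",
    "Remote-Access",
]

for line in trunk_vlan_commands("RSNG3:ae1", POD2_VLANS):
    print(line)
```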
The MetaFabric 1.0 solution utilizes a second set of compute hardware as well. The IBM
Flex System Enterprise Chassis is a 10U next-generation server platform that features
integrated chassis management. It is a compact, high-density, high-performance, and
scalable rack-mount system. It supports up to 14 one-bay compute nodes that share
common resources, such as power, cooling, management, and I/O resources within a
single Enterprise chassis. The IBM Flex System can also support up to seven 2-bay
compute nodes or three 4-bay compute nodes when the shelves are removed. You can
mix and match 1-bay, 2-bay, and 4-bay compute nodes to meet specific hardware needs.
• Fourteen 1-bay compute node bays (can also support seven 2-bay or three 4-bay
compute nodes with the shelves removed).
• Six 2500W power modules that provide N+N or N+1 redundant power. Optionally, the
chassis can be ordered through the configure-to-order (CTO) process with six 2100W
power supplies for N+1 redundant power.
• A wide variety of networking solutions that include Ethernet, Fibre Channel, FCoE, and
InfiniBand.
• Two IBM Chassis Management Modules (CMMs). The CMM provides single-chassis
management support.
The following components can be installed into the rear of the chassis
(Figure 58 on page 190):
• Up to two CMMs.
• Up to six fan modules that consist of four 80-mm fan modules and two 40-mm fan
modules.
The IBM Flex System includes a Chassis Management Module (CMM). The CMM provides
a single point of chassis management as well as the network path for remote keyboard,
video, and mouse (KVM) capability for compute nodes within the chassis. The IBM Flex
System chassis can accommodate one or two CMMs. The first is installed into CMM Bay
1, the second into CMM bay 2. Installing two CMMs provides control redundancy for the
IBM Flex System.
• Power control
• Fan management
• Switch management
• Diagnostics
• Network management
• USB connection: Can be used for insertion of a USB media key for tasks such as firmware
updates.
The switch offers full Layer 2/3 switching and FCoE Full Fabric and Fibre Channel NPV
Gateway operations to deliver a converged and integrated solution, and it is installed
within the I/O module bays of the IBM Flex System Enterprise Chassis. The switch can
help you migrate to a 10-Gb or 40-Gb converged Ethernet infrastructure and offers
virtualization features.
The CN4093 switch is initially licensed for fourteen 10-GbE internal ports, two external
10-GbE SFP+ ports, and six external Omni Ports enabled.
• 00D5823 is the part number for the physical device, which comes with 14 internal
10-GbE ports enabled (one to each node bay), two external 10-GbE SFP+ ports that
are enabled to connect to a top-of-rack switch or other devices, and six Omni Ports
enabled to connect to either Ethernet or Fibre Channel networking infrastructure,
depending on the SFP+ cable or transceiver used.
• 00D5845 (Upgrade 1) can be applied on the base switch when you need more uplink
bandwidth with two 40-GbE QSFP+ ports that can be converted into 4x 10-GbE SFP+
DAC links with the optional break-out cables. This upgrade also enables 14 more internal
ports, for a total of 28 ports, to provide more bandwidth to the compute nodes using
4-port expansion cards.
• 00D5847 (Upgrade 2) can be applied on the base switch when you need more external
Omni Ports on the switch or if you want more internal bandwidth to the node bays.
The upgrade enables the remaining 6 external Omni Ports, plus 14 more internal 10-Gb
ports, for a total of 28 internal ports, to provide more bandwidth to the compute nodes
using four-port expansion cards.
• Fourteen more internal ports and two external 40-GbE QSFP+ uplink ports with
Upgrade 1.
• Fourteen more internal ports and six more external Omni Ports with Upgrade 2.
• Upgrade 1 and Upgrade 2 can be applied on the switch independently from each other
or in combination for full feature capability.
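The license arithmetic described above can be summarized in a short Python sketch. It is illustrative only: the names PortCounts, BASE, UPGRADE_1, UPGRADE_2, and enabled_ports are hypothetical, and the sketch assumes that the two upgrades simply add their port counts when they are applied in combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortCounts:
    internal_10g: int        # internal 10-GbE ports toward the node bays
    external_sfp_plus: int   # external 10-GbE SFP+ ports
    external_omni: int       # external Omni Ports
    external_qsfp_plus: int  # external 40-GbE QSFP+ ports

    def __add__(self, other):
        return PortCounts(
            self.internal_10g + other.internal_10g,
            self.external_sfp_plus + other.external_sfp_plus,
            self.external_omni + other.external_omni,
            self.external_qsfp_plus + other.external_qsfp_plus,
        )


# Port counts taken from the license descriptions in this section.
BASE = PortCounts(14, 2, 6, 0)
UPGRADE_1 = PortCounts(14, 0, 0, 2)
UPGRADE_2 = PortCounts(14, 0, 6, 0)


def enabled_ports(upgrade_1=False, upgrade_2=False):
    """Return the enabled port counts for a given license combination.

    Assumption: upgrades are purely additive when combined.
    """
    total = BASE
    if upgrade_1:
        total = total + UPGRADE_1
    if upgrade_2:
        total = total + UPGRADE_2
    return total
```

For example, enabled_ports(upgrade_1=True) reports 28 internal ports and two QSFP+ ports, which matches the Upgrade 1 description.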
The CNA module has a management and console port. There are two different
command-line interface (CLI) modes on IBM/BNT network devices: IBMNOS mode and
ISCLI (Industry Standard CLI) mode. The first time you start the CN4093, it boots into
the IBM Networking OS CLI. To access the ISCLI, enter the following command and reset
the CN4093.
The switch retains your CLI selection, even when you reset the configuration to factory
defaults, because the CLI boot mode is not part of the configuration settings. If you downgrade
the switch software to an earlier release, it boots into the menu-based CLI. However, the
switch retains the CLI boot mode, and will restore your CLI choice.
The second modular switching option deployed as part of the solution is the IBM Flex
System EN4091 10 Gb Ethernet Pass-thru Module (Figure 60 on page 192). The EN4091
10-Gb Ethernet Pass-thru Module offers a one-for-one connection between a single node
bay and an I/O module uplink. It has no management interface, and can support both
1-Gbps and 10-Gbps dual-port adapters that are installed in the compute nodes. If
quad-port adapters are installed in the compute nodes, only the first two ports have
access to the Pass-thru module ports.
The necessary 1-GbE or 10-GbE module (SFP, SFP+, or DAC) must also be installed in
the external ports of the pass-thru module. This configuration supports the speed (1 Gb
or 10 Gb) and medium (fiber-optic or copper) for adapter ports on the compute nodes.
Figure 60: IBM Flex System EN4091 10Gb Ethernet Pass-thru Module
The EN4091 10Gb Ethernet Pass-thru Module has the following specifications:
• Internal ports - 14 internal full-duplex Ethernet ports that can operate at 1-Gb or 10-Gb
speeds.
• External ports - 14 ports for 1-Gb or 10-Gb Ethernet SFP+ transceivers (support for
1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+
DAC.
• Unmanaged device that has no internal Ethernet management port. However, it is able
to provide its VPD to the secure management network in the Chassis Management
Module.
• Allows direct connection from the 10-Gb Ethernet adapters that are installed in compute
nodes in a chassis to an externally located top-of-rack switch or other external device.
NOTE: The EN4091 10-Gb Ethernet Pass-thru Module has only 14 internal
ports. As a result, only two ports on each compute node are enabled, one for
each of the two modules that are installed in the chassis. If four-port adapters
are installed in the compute nodes, ports 3 and 4 on those adapters are not
enabled.
• Half-width node: Occupies one chassis bay, half the width of the chassis (approximately
215 mm or 8.5 in.). An example is the IBM Flex System x220 Compute Node.
• Full-width node: Occupies two chassis bays side-by-side, the full width of the chassis
(approximately 435 mm or 17 in.). An example is the IBM Flex System p460 Compute
Node.
The solution lab utilized the IBM Flex System x220 compute node (Figure 61 on page 194).
The IBM Flex System x220 Compute Node, machine type 7906, is the next generation
cost-optimized compute node that is designed for less demanding workloads and
low-density virtualization. The x220 is efficient and equipped with flexible configuration
options and advanced management to run a broad range of workloads. The IBM Flex
System x220 Compute Node is a high-availability, scalable compute node that is
optimized to support the next-generation microprocessor technology. With a balance
of cost and system features, the x220 is an ideal platform for general business workloads.
The x220 is a half-width compute node and requires that the chassis shelf be installed in
the IBM Flex System Enterprise Chassis. The IBM Flex System x220 Compute Node
features the Intel Xeon E5-2400 series processors. The Xeon E5-2400 series processor
has models with either 4, 6, or 8 cores per processor with up to 16 threads per socket.
The x220 supports LP DDR3 memory LRDIMMs, RDIMMs, and UDIMMs. The x220 server
has two 2.5-inch hot-swap drive bays accessible from the front of the blade server. On
standard models, the two 2.5-inch drive bays are connected to a ServeRAID C105 onboard
SATA controller with software RAID capabilities.
The applications that are installed on the compute nodes can run natively on a dedicated
physical server or they can be virtualized (in a virtual machine that is managed by a
hypervisor layer). All the compute nodes use the VMware ESXi 5.1 operating system
as a baseline for virtualization, and all the enterprise applications run as virtual
machines on top of ESXi 5.1.
This solution implementation utilizes two compute PODs. Two IBM Pure Flex Systems
are connected to the QFX3000-M QFabric system POD1, and two Flex Systems are
connected to POD2 (also utilizing QFX3000-M).
The POD1 and POD2 topologies use similar hardware to run the virtual servers.
The IBM Pure Flex pass-thru chassis has four 10-Gb Ethernet I/O modules. Each I/O module
has 14 internal 10-Gb Ethernet ports, one for each compute node, which means that each
compute node has four network adapters on the physical connection. Each module also has
14 external network ports, which are internally linked with the 14 compute nodes through
the backplane. The 14 compute nodes have connectivity to all four I/O modules, and each
compute node has four network ports. The four network ports are connected to different
nodes of the RSNG in the QFX3000-M QFabric system, which gives full redundancy. A LAG
is also configured between the servers and the access switches. This configuration
ensures utilization of all the links while providing full redundancy.
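The port layout described above can be sketched in Python. This is a minimal illustration, not vendor tooling: the function name passthru_ports_for_node is hypothetical, and the sketch assumes that the external port number on each pass-thru module equals the node bay number (the text states only that the mapping is one-for-one).

```python
def passthru_ports_for_node(node_bay, io_modules=4, ports_per_module=14):
    """List the (module, external port) pairs that serve one compute node.

    Assumption: external port N on every module maps to node bay N.
    """
    if not 1 <= node_bay <= ports_per_module:
        raise ValueError(f"node bay must be 1..{ports_per_module}, got {node_bay}")
    return [(module, node_bay) for module in range(1, io_modules + 1)]


# Each node gets one uplink per I/O module, so four uplinks per node.
uplinks_for_bay_1 = passthru_ports_for_node(1)
```

With four modules and 14 bays, the chassis presents 4 x 14 = 56 external uplinks in total, four per compute node.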
The next section shows a sample configuration for connection from the QFX3000-M
(POD1) and the QFX3000-M (POD2) to the two pass-thru chassis compute nodes.
1. Configure POD 1 (QFabric QFX3000-M). Please note that you must use an MTU setting
of 9192. Enabling jumbo frames in the data center generally enables better
performance.
[edit]
set interfaces interface-range IBM-FLEX-2-CN1-passthrough member "n2:xe-0/0/[30-31]"
set interfaces interface-range IBM-FLEX-2-CN1-passthrough member "n3:xe-0/0/[30-31]"
set interfaces interface-range IBM-FLEX-2-CN1-passthrough ether-options 802.3ad RSNG2:ae1
set interfaces interface-range IBM-FLEX-2-CN2-passthrough member "n3:xe-0/0/[32-33]"
set interfaces interface-range IBM-FLEX-2-CN2-passthrough member "n2:xe-0/0/[32-33]"
set interfaces interface-range IBM-FLEX-2-CN2-passthrough ether-options 802.3ad RSNG2:ae2
set interfaces RSNG2:ae1 description "IBM Flex-2 Passthrough-CN1"
set interfaces RSNG2:ae1 unit 0 family ethernet-switching port-mode trunk
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members MGMT
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Infra
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Exchange
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members SQL
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Storage-POD1
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members SharePoint
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Vmotion
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members VM-FT
set interfaces RSNG2:ae1 unit 0 family ethernet-switching vlan members Remote-Access
set interfaces RSNG2:ae2 description "IBM Flex-2 Passthrough-CN2"
set interfaces RSNG2:ae2 unit 0 family ethernet-switching port-mode trunk
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members MGMT
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Infra
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Exchange
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members SQL
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Storage-POD1
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members SharePoint
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Vmotion
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members VM-FT
set interfaces RSNG2:ae2 unit 0 family ethernet-switching vlan members Remote-Access
set vlans Exchange vlan-id 104
set vlans Exchange l3-interface vlan.104
set vlans Exchange-Cluster vlan-id 109
set vlans Infra vlan-id 101
set vlans MGMT vlan-id 800
set vlans Remote-Access vlan-id 810
set vlans SQL vlan-id 105
set vlans Security-Mgmt vlan-id 801
set vlans SharePoint vlan-id 102
set vlans Storage-POD1 vlan-id 108
set vlans Storage-POD1 l3-interface vlan.108
set vlans VM-FT vlan-id 107
set vlans Vmotion vlan-id 106
set vlans Wikimedia vlan-id 103
set vlans Wikimedia l3-interface vlan.103
2. Configure POD 2 (QFabric QFX3000-M). Please note the use of an MTU setting of
9192. Enabling jumbo frames in the data center generally enables better performance.
set groups Jumbo-MTU interfaces <*ae*> mtu 9192
set interfaces interface-range IBM-FLEX-2-CN1-passthrough member "n3:xe-0/0/[34-35]"
set interfaces interface-range IBM-FLEX-2-CN1-passthrough member "n5:xe-0/0/[34-35]"
set interfaces interface-range IBM-FLEX-2-CN1-passthrough ether-options 802.3ad RSNG3:ae0
set interfaces interface-range IBM-FLEX-2-CN2-passthrough member "n1:xe-0/0/[38-39]"
set interfaces interface-range IBM-FLEX-2-CN2-passthrough member "n2:xe-0/0/[38-39]"
set interfaces interface-range IBM-FLEX-2-CN2-passthrough ether-options 802.3ad RSNG2:ae1
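The trunk configuration in step 1 repeats one statement per VLAN for each aggregated interface. As a hedged illustration, the following Python sketch generates the same style of set commands from a list of VLAN names. The helper name trunk_commands is hypothetical; the command strings follow the statements shown in this section and are not checked against any other Junos release.

```python
def trunk_commands(ae_interface, description, vlan_names):
    """Build the per-interface trunk statements in the style used above."""
    prefix = f"set interfaces {ae_interface}"
    family = f"{prefix} unit 0 family ethernet-switching"
    commands = [
        f'{prefix} description "{description}"',
        f"{family} port-mode trunk",
    ]
    # One "vlan members" statement per VLAN carried on the trunk.
    commands += [f"{family} vlan members {name}" for name in vlan_names]
    return commands


if __name__ == "__main__":
    vlans = ["MGMT", "Infra", "Exchange", "Wikimedia", "SQL"]
    for line in trunk_commands("RSNG2:ae1", "IBM Flex-2 Passthrough-CN1", vlans):
        print(line)
```

Generating the statements from one VLAN list keeps the ae1 and ae2 trunks identical, which is what the listing in step 1 does by hand.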
The MetaFabric 1.0 solution also utilizes the 40-Gb Ethernet CNA I/O module (in POD1).
A short overview of the operation and configuration of this module is required.
Figure 63 on page 198 shows the POD1 network topology utilizing the IBM Pure Flex System
Chassis with the 40-Gb Ethernet CNA I/O module.
Figure 63: POD1 Topology with the IBM Pure Flex Chassis + 40Gbps CNA
Module
Figure 63 on page 198 shows an example for IBM Pure Flex System Compute Node 1. All
compute nodes in an IBM Pure Flex System that use the 10-Gb CNA or 40-Gb CNA modules
have a similar physical appearance and connectivity. The I/O module switch is integrated
into the IBM Pure Flex System chassis, so only the EXT ports are physically visible; the
INT ports are not visible from outside the chassis. The INT ports connect to the
backplane of the CNA Fabric Switch I/O module. The EXT ports connect to external
switches, in this case the QFX3000-M QFabric system. An Ethernet LAG is also configured
between the QFX3000-M QFabric system in POD1 and the compute nodes in POD1. EXT ports
3 and 7 from each I/O module are connected to nodes 6 and 7 of the QFX3000-M QFabric
system. The RSNG4:ae0 LAG is created between the I/O module switch and the QFX3000-M
QFabric system.
Without a license, the CNA module has one network port (the INTA port), which is
internally linked with an external port (EXT) through the chassis backplane. As shown
in the example, Compute Node 1 sees only one network port (INTA). The INTA port is
visible only to VMware ESXi, which runs on the compute node. EXT ports are connected to
external switches, where physical cables connect to another layer of switches. After you
install the advanced license for the 40-Gb CNA Fabric Switch I/O Module, an additional
internal port is activated, and you see two ports in each I/O module for the compute
node.
For instance, Compute Node 1 has two ports (INTA1 and INTB1) on each 40-Gb CNA
Fabric Switch I/O Module. Because there are two CNA Fabric Switch I/O Modules, Compute
Node 1 has four internal network ports. Both modules use the same naming convention,
so INTA1 and INTB1 exist on both I/O Module 1 and I/O Module 2. Once an expanded
license is installed, external ports EXT3, EXT4, EXT5, and EXT6 become a single EXT3
40-Gb port, and EXT7, EXT8, EXT9, and EXT10 become a single EXT7 40-Gb port on each
CNA Fabric Switch I/O module.
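The port naming described above can be captured in a small Python sketch. The names MERGED_40G_PORTS, logical_port, and internal_ports_for_node are hypothetical, and the "moduleN:" prefix is only a labeling convention for this example, not switch syntax.

```python
# With the expanded license, four 10-Gb external ports appear as one 40-Gb port.
MERGED_40G_PORTS = {
    "EXT3": ["EXT3", "EXT4", "EXT5", "EXT6"],
    "EXT7": ["EXT7", "EXT8", "EXT9", "EXT10"],
}


def logical_port(physical_port):
    """Return the 40-Gb port name that a physical EXT port is presented as."""
    for logical, members in MERGED_40G_PORTS.items():
        if physical_port in members:
            return logical
    return physical_port  # ports outside the merged groups keep their name


def internal_ports_for_node(node, modules=2):
    """List the licensed internal ports for one compute node across modules."""
    return [
        f"module{m}:{side}{node}"
        for m in range(1, modules + 1)
        for side in ("INTA", "INTB")
    ]
```

For Compute Node 1 the second helper returns four entries, two per I/O module, which matches the four internal network ports described in the text.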
NOTE: Simply creating RSNG between the CNA Fabric Switch and
QFX3000-M QFabric system is not an effective configuration. This
configuration can cause intermittent packet loss because both I/O modules
from the CNA Fabric switch work independently. Intermittent packet loss will
happen if you configure LAG/RSNG only on the QFX3000-M QFabric system
switch. To resolve this issue, LAG must also be configured on the CNA Fabric
Switch I/O module.
Configuration of LAG on the CNA Fabric Switches is covered below. In this solution
example, the EXT1 and EXT2 ports of CNA Fabric Switch I/O Modules 1 and 2 are
cross-connected. This link is referred to as an ISL on the CNA Fabric Switches, and it
is the main configuration required for LAG to work efficiently on the internal and
external sides. LAG is configured on the INT ports and the EXT ports using LACP, as a
trunk port carrying traffic for multiple application VLANs.
1. Configure the Fabric Switch I/O Module 1 on the IBM Pure Flex System 40-Gbps CNA.
exit
!
interface port INTA5
tagging
exit
!
interface port INTB1
tagging
exit
!
interface port INTB2
tagging
exit
!
interface port INTB3
tagging
exit
!
interface port INTB4
tagging
exit
!
interface port INTB5
tagging
exit
!
interface port EXT1
tagging
exit
!
interface port EXT2
tagging
exit
!
interface port EXT3
tagging
exit
!
interface port EXT7
tagging
exit
!
interface port INTA1
lacp mode active
lacp key 1001
!
interface port INTA2
lacp mode active
lacp key 1002
!
interface port INTA3
lacp mode active
lacp key 1003
!
interface port INTA4
lacp mode active
vlan 103
enable
name "WM"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 104
enable
name "EXCHANGE"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 105
enable
name "SQL"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 106
enable
name " Vmotion"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 107
enable
name "VM-FT"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 108
enable
name "Storage-iSCSI"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 109
enable
name "Exchange DAG"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 800
enable
name "VDC Mgmt"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 801
enable
name "Security-MGMT"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
!
vlan 4094
enable
name "VLAN 4094"
member EXT1-EXT2
!
!
vlag enable
vlag tier-id 10
vlag isl vlan 4094
vlag isl adminkey 200
vlag adminkey 1000 enable
2. Configure QFX3000-M QFabric System connectivity to the IBM Pure Flex System
40Gb CNA I/O Module.
[edit]
set chassis node-group RSNG4 node-device n6 pic 1 xle port-range 4 15
set chassis node-group RSNG4 node-device n7 pic 1 xle port-range 4 15
set chassis node-group RSNG4 aggregated-devices ethernet device-count 10
set interfaces interface-range IBM-FLEX-1-40G-IO-1-2-VLAG member n6:xle-0/1/6
set interfaces interface-range IBM-FLEX-1-40G-IO-1-2-VLAG member n7:xle-0/1/6
set interfaces interface-range IBM-FLEX-1-40G-IO-1-2-VLAG member n6:xle-0/1/8
set interfaces interface-range IBM-FLEX-1-40G-IO-1-2-VLAG member n7:xle-0/1/8
set interfaces interface-range IBM-FLEX-1-40G-IO-1-2-VLAG ether-options 802.3ad RSNG4:ae0
set interfaces RSNG4:ae0 description "40G CNA to IBM-FLEX-1-IO-1"
set interfaces RSNG4:ae0 mtu 9192
set interfaces RSNG4:ae0 aggregated-ether-options lacp active
set interfaces RSNG4:ae0 unit 0 family ethernet-switching port-mode trunk
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members MGMT
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Infra
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Exchange
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members SQL
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Storage-POD1
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members SharePoint
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Vmotion
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members VM-FT
set interfaces RSNG4:ae0 unit 0 family ethernet-switching vlan members Remote-Access
NOTE: In this configuration, two node devices (N6 and N7) are part of
Node group RSNG4. Four XLE ports are configured in a LAG as RSNG4:ae0
with the LACP active protocol. RSNG4:ae0 is configured as a trunk carrying
multiple VLANs.
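The VLAN stanzas in the CNA Fabric Switch configuration in step 1 all follow the same pattern. As a rough sketch, the following Python helper renders one stanza in that pattern. The function name vlan_stanza is hypothetical, and the output mirrors only the lines shown in this document; it is not validated against the switch CLI.

```python
def vlan_stanza(vlan_id, name, members):
    """Render one VLAN stanza in the style of the listing in step 1."""
    return "\n".join([
        f"vlan {vlan_id}",
        "enable",
        f'name "{name}"',
        f"member {','.join(members)}",
        "!",
    ])


if __name__ == "__main__":
    ports = ["INTA1-INTA5", "INTB1-INTB5", "EXT1-EXT3", "EXT7"]
    for vlan_id, name in [(103, "WM"), (104, "EXCHANGE"), (105, "SQL")]:
        print(vlan_stanza(vlan_id, name, ports))
```

Keeping the member list in one variable makes it harder for an application VLAN to be left off a port, which matters because the same VLANs must be trunked on the QFabric side.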
The MetaFabric 1.0 solution also utilizes the IBM Pure Flex System Chassis with the 10-Gb
Ethernet CNA I/O Module (in POD2). A short overview of the operation and configuration
of this module is required. Figure 64 on page 204 shows the POD2 network topology utilizing
the IBM Pure Flex System Chassis with the 10-Gb Ethernet CNA I/O Module.
Figure 64: POD 2 Topology Using the IBM Pure Flex System Chassis with
the 10-Gbps CNA I/O Module
EXT ports 1 and 2 of I/O Module 1 are connected to EXT ports 1 and 2 of I/O Module 2,
respectively. This creates an interswitch link (ISL) between the two I/O modules, which
enables both I/O modules to act as a single switch. EXT ports 11, 12, and 16 are
connected to the QFX3000-M QFabric PODs. POD2 also has an RSNG node group that is
connected to servers. Figure 64 on page 204 shows an example of three RSNG node groups
in a QFX3000-M QFabric system connected to an IBM Pure Flex System chassis with a 10-Gb
CNA I/O Module. This configuration was used only for Compute Node 1. Configuration
details for the compute node connected to POD2 follow.
1. Configure the CNA Fabric Switch I/O module on the IBM Pure Flex System 10-Gb CNA
I/O Module.
tagging
exit
!
interface port INTA4
tagging
exit
!
interface port INTA5
tagging
exit
!
interface port EXT1
tagging
pvid 4094
exit
!
interface port EXT2
tagging
pvid 4094
exit
!
interface port EXT11
tagging
exit
!
interface port EXT12
tagging
exit
!
interface port EXT13
tagging
exit
!
interface port EXT14
tagging
exit
!
interface port EXT15
tagging
exit
!
interface port EXT16
tagging
exit
!
vlan 101
enable
name "Infra"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 102
enable
name "SharePoint"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 103
enable
name "WikiMedia"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 104
enable
name "Exchange"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 105
enable
name "SQL"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 106
enable
name " Vmotion"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 107
enable
name "FT"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 108
enable
name "Storage-iSCSI"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 800
enable
name "VDC Mgmt"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 801
enable
name "Security-Mgmt"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
!
vlan 4094
enable
name "VLAN 4094"
member EXT1-EXT2
!
interface port INTA1
lacp mode active
lacp key 1001
!
interface port INTA2
lacp mode active
lacp key 1002
!
interface port INTA3
lacp mode active
lacp key 1003
!
2. Configure QFX3000-M QFabric System connectivity to the IBM Pure Flex System
10-Gb CNA I/O Module.
[edit]
set interfaces interface-range IBM-FLEX-1-10G-CNA-IO-1-2-VLAG member "n1:xe-0/0/[24-27]"
set interfaces interface-range IBM-FLEX-1-10G-CNA-IO-1-2-VLAG member "n2:xe-0/0/[30-31]"
set interfaces interface-range IBM-FLEX-1-10G-CNA-IO-1-2-VLAG ether-options 802.3ad RSNG2:ae0
set interfaces RSNG2:ae0 description IBM-FLEX-1-10G-CNA
set interfaces RSNG2:ae0 unit 0 family ethernet-switching port-mode trunk
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members MGMT
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Storage-POD2
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Infra
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members SQL
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members SharePoint
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Exchange
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Wikimedia
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Security-Mgmt
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Vmotion
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members VM-FT
set interfaces RSNG2:ae0 unit 0 family ethernet-switching vlan members Remote-Access
set vlans Exchange vlan-id 104
set vlans Exchange-cluster vlan-id 109
set vlans Infra vlan-id 101
set vlans MGMT vlan-id 800
set vlans Remote-Access vlan-id 810
set vlans SQL vlan-id 105
set vlans SQL l3-interface vlan.105
set vlans Security-Mgmt vlan-id 801
set vlans SharePoint vlan-id 102
set vlans SharePoint l3-interface vlan.102
set vlans Storage-POD2 vlan-id 208
set vlans Storage-POD2 l3-interface vlan.208
set vlans VM-FT vlan-id 107
set vlans Vmotion vlan-id 106
set vlans Wikimedia vlan-id 103
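When reviewing a configuration such as the one in step 2, it can help to extract the VLAN name-to-ID mapping so that it can be compared with the VLAN IDs configured on the I/O module. The following Python sketch shows one way to do that. The function name parse_vlan_ids is hypothetical, and the parser assumes one "set vlans NAME vlan-id ID" statement per line, as in the listing above.

```python
import re

# Matches only the vlan-id statements; l3-interface lines are ignored.
VLAN_ID_LINE = re.compile(r"^set vlans (\S+) vlan-id (\d+)$")


def parse_vlan_ids(config_text):
    """Return a {vlan_name: vlan_id} dict from Junos set-style statements."""
    vlan_ids = {}
    for line in config_text.splitlines():
        match = VLAN_ID_LINE.match(line.strip())
        if match:
            vlan_ids[match.group(1)] = int(match.group(2))
    return vlan_ids


if __name__ == "__main__":
    sample = """
    set vlans SQL vlan-id 105
    set vlans SQL l3-interface vlan.105
    set vlans Storage-POD2 vlan-id 208
    """
    print(parse_vlan_ids(sample))
```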
Verification
The following verification commands (with sample output) can be used to confirm the
configuration of compute and compute switching resources in the data center.
Results
                               EXT1-EXT3 EXT7
103   WM             ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
104   EXCHANGE       ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
105   SQL            ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
106   Vmotion        ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
107   VM-FT          ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
108   Storage-iSCSI  ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
109   Exchange DAG   ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
800   VDC Mgmt       ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
801   Security-MGMT  ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
900   Tera-VM        ena  dis  INTA1-INTA5 INTB1-INTB5
                               EXT1-EXT3 EXT7
4094  VLAN 4094      ena  dis  EXT1 EXT2
4095  Mgmt VLAN      ena  ena  EXTM MGT1
Virtualization
Virtualization Overview
In the MetaFabric 1.0 solution, all compute nodes are installed into a virtual environment
featuring the VMware ESXi 5.1 operating system. VMware ESXi provides the foundation
for building a reliable data center. VMware ESXi 5.1 is the latest hypervisor architecture
from VMware. ESXi, the vSphere Client, and vCenter are components of vSphere. The ESXi
server is the virtualization server and the most important part of vSphere: all the
virtual machines (guest operating systems) are installed on the ESXi server.
To install, manage, and access the virtual servers that run on the ESXi server, you
need another part of the vSphere suite: the vSphere Client or vCenter. The vSphere
Client runs on a client machine and allows administrators to connect to ESXi servers,
access or manage virtual machines, and perform management tasks.
The VMware vCenter Server is similar in purpose to the vSphere Client, but it is a
server with more capabilities. The vCenter Server is installed on a Windows or Linux
server. In this solution, the vCenter Server is installed on a Windows 2008 server that
is running as a virtual machine (VM). The vCenter Server is a centralized management
application that lets you manage virtual machines and ESXi hosts centrally. The vSphere
Client is used to access the vCenter Server and ultimately manage ESXi servers
(Figure 65 on page 212). The vCenter Server is required for enterprise features such as
vMotion, VMware High Availability, VMware Update Manager, and VMware Distributed
Resource Scheduler (DRS). For example, you can easily clone an existing virtual machine
by using the vCenter Server. vCenter is another important part of the vSphere package.
Figure 65: VMware vSphere Client Manages vCenter Server Which in Turn
Manages Virtual Machines in the Data Center
In Figure 66 on page 213, all the compute nodes are part of a data center and the VMware
HA Cluster is configured on compute nodes. All compute nodes are running ESXi 5.1 OS,
which is a host operating system to all the data center VMs running business-critical
applications. With vSphere Client, you can also access ESXi hosts or the vCenter Server.
The vSphere Client is used to access the vCenter Server and manage VMware enterprise
features.
A vSphere Distributed Switch (VDS) functions as a single virtual switch across all
associated hosts (Figure 66 on page 213). This enables you to set network configurations
that span across all member hosts, allowing virtual machines to maintain a consistent
network configuration as they migrate across multiple hosts. Each vSphere Distributed
Switch is a network hub that virtual machines can use. A vSphere Distributed Switch can
forward traffic internally between virtual machines or link to an external network by
connecting to physical Ethernet adapters, also known as uplink adapters. Each vSphere
Distributed Switch can also have one or more dvPort groups assigned to it. dvPort groups
group multiple ports under a common configuration and provide a stable anchor point
for virtual machines connecting to labeled networks. Each dvPort group is identified by
a network label, which is unique to the current data center. VLANs enable a single physical
LAN segment to be further segmented so that groups of ports are isolated from one
another as if they were on physically different segments. The VLAN standard is IEEE 802.1Q. A VLAN
ID, which restricts port group traffic to a logical Ethernet segment within the physical
network, is optional.
VMware vSphere distributed switches can be divided into two logical areas of operation:
the data plane and the management plane. The data plane implements packet switching,
filtering, and tagging. The management plane is the control structure used by the operator
to configure data plane functionality from the vCenter Server. The VDS eases this
management burden by treating the network as an aggregated resource. Individual
host-level virtual switches are abstracted into one large VDS spanning multiple hosts at
the data center level. In this design, the data plane remains local to each VDS but the
management plane is centralized.
With the distributed switch feature, VMware vSphere supports provisioning, administering,
and monitoring of virtual networking across multiple hosts, including the following
functionalities:
• Central control of the virtual switch port configuration, port group naming, filter settings,
and so on.
• Link Aggregation Control Protocol (LACP) that negotiates and automatically configures
link aggregation between vSphere hosts and access layer switches.
• Distributed virtual port groups (DVPortgroups) — Port groups that specify port
configuration options for each member port. DVportgroups is a set of DV ports.
Configuration is inherited from dvSwitch to dvPortgroup.
• Private VLANs (PVLANs) — PVLAN support enables broader compatibility with existing
networking environments using the technology.
Figure 67 on page 214 shows an illustration of two compute nodes running ESXi 5.1 OS
with multiple VMs deployed on the ESXi hosts. Notice that two physical compute nodes
are running VMs in this topology, and the vSphere distributed switch (VDS) is virtually
extended across all ESXi hosts managed by the vCenter server. The configuration of VDS
is centralized to the vCenter Server.
A LAG bundle is configured between the access switches and ESXi hosts. As mentioned
in the compute node section, an RSNG configuration is required on the QFX3000-M
QFabric systems.
ESXi 5.1 supports LACP for the LAG. LACP can be enabled only by connecting to the
vCenter Server Web GUI.
NOTE: Link Aggregation Control Protocol (LACP) can only be configured via
the vSphere Web Client.
Configuring LACP
NOTE: All port groups using the Uplink Port Group enabled with LACP must
have the load-balancing policy set to IP hash load balancing, network failure
detection policy set to link status only, and all uplinks set to active.
2. Select vCenter under the Home radio button from the left tab.
4. Locate an Uplink Port Group in the vSphere Web Client. To locate an uplink port group:
b. Click Uplink Port Groups and select an uplink port group from the list.
5. Select the dvSwitch-DVUplinks and click settings from the Actions tab.
6. Click Edit.
7. In the LACP section, use the drop-down box to enable or disable LACP.
8. When you enable LACP, a Mode drop-down menu appears with these options:
• Active — The port is in an active negotiating state, in which the port initiates
negotiations with remote ports by sending LACP packets.
• Passive — The port is in a passive negotiating state, in which the port responds to
LACP packets it receives but does not initiate LACP negotiation.
Set this option to passive (disable) or active (enable). The default setting is passive.
9. Click OK.
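The two LACP modes described in step 8 imply that a LAG negotiates only when at least one end is active, because a passive port never initiates negotiation. The following Python sketch expresses that rule. The function name lacp_will_negotiate is hypothetical, and the sketch models only the mode combination, not timers or link state.

```python
VALID_MODES = ("active", "passive")


def lacp_will_negotiate(local_mode, remote_mode):
    """Return True if LACP negotiation can start for this pair of modes.

    A passive port only responds, so two passive ends never start talking.
    """
    for mode in (local_mode, remote_mode):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown LACP mode: {mode!r}")
    return "active" in (local_mode, remote_mode)
```

If the access switch side of the LAG is set to active, the uplink port group negotiates in either mode; if the switch side were passive, the uplink port group would need to be set to active.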
VMware clusters enable the management of multiple host systems as a single, logical
entity, combining standalone hosts into a single virtual device with pooled resources and
higher availability. VMware clusters aggregate the hardware resources of individual ESX
Server hosts but manage the resources as if they resided on a single host. Now, when
you power on a virtual machine, it can be given resources from anywhere in the cluster,
rather than from a specific physical ESXi host.
VMware high availability (HA) allows virtual machines running on specific hosts to be
restarted automatically using other host resources in the cluster in the case of host failure.
VMware HA continuously monitors all ESX Server hosts in a cluster and detects failures.
The VMware HA agent placed on each host maintains a heartbeat with the other hosts
in the cluster. Each server sends heartbeats to the other servers in the cluster at 5-second
intervals. If any server loses its heartbeat for three consecutive heartbeat intervals, VMware
HA initiates the failover action of restarting all affected virtual machines on other hosts.
VMware HA also monitors whether sufficient resources are available in the cluster at all
times in order to be able to restart virtual machines on different physical host machines
in the event of host failure. Safe restart of virtual machines is made possible by the locking
technology in the ESX Server storage stack, which allows multiple ESX Server hosts to
have simultaneous access to the same virtual machine files.
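The heartbeat timing described above (5-second intervals, failover after three consecutive missed intervals) works out to a detection time of about 15 seconds. The following Python sketch shows the arithmetic. The names are hypothetical, and treating exactly three elapsed intervals as a failure (the >= comparison) is an assumption of this sketch, not a statement about VMware HA internals.

```python
HEARTBEAT_INTERVAL_S = 5      # heartbeat interval stated in the text
MISSED_INTERVALS_LIMIT = 3    # consecutive missed intervals before failover


def failure_detection_time_s(interval_s=HEARTBEAT_INTERVAL_S,
                             missed_limit=MISSED_INTERVALS_LIMIT):
    """Seconds without a heartbeat before a host is treated as failed."""
    return interval_s * missed_limit


def should_fail_over(last_heartbeat_s, now_s):
    """Return True once the silence reaches the detection time.

    Assumption: the boundary case (exactly three intervals) counts as failed.
    """
    return (now_s - last_heartbeat_s) >= failure_detection_time_s()
```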
VMware Distributed Resource Scheduler (DRS) automatically provides initial virtual machine
placement and makes automatic resource relocation and optimization decisions as hosts
are added to or removed from the cluster. DRS also optimizes based on virtual machine
load, managing resources in events where the load on individual virtual machines goes
up or down. VMware DRS also makes cluster-wide resource pools possible.
The MetaFabric 1.0 solution utilized VMware clusters in both POD1 and POD2. Below are
overview screenshots that illustrate the use of clusters in the solution.
The MetaFabric 1.0 solution test bed contains three clusters: Infra (Figure 74 on page 219),
POD1 (Figure 75 on page 219), and POD2 (Figure 76 on page 219). All clusters are configured
with HA and DRS.
The Infra cluster (Figure 77 on page 220) is running all VMs required to support the data
center infrastructure. The Infra cluster is hosted on two standalone servers (IBM System
x3750 M4). The VMs hosted on the Infra cluster are:
The POD1 cluster (Figure 78 on page 221) hosts the VMs that run all enterprise
business-critical applications in the test bed. POD1 is hosted on one IBM Flex pass-thru
chassis and one 40-Gb CNA module chassis. POD1 contains the following
applications/VMs:
• MediaWiki Server
The POD2 cluster (Figure 79 on page 221) hosts the VMs that run all enterprise
business-critical applications in the test bed. POD2 has one IBM Flex pass-thru chassis
and one 10-Gb CNA module chassis. POD2 contains the following applications/VMs:
VMware Enhanced vMotion Compatibility (EVC) configures a cluster and its hosts to
maximize vMotion compatibility. Once enabled, EVC will ensure that only hosts that are
compatible with those in the cluster can be added to the cluster. This solution uses the
Intel Sandy Bridge Generation option for enhanced vMotion compatibility that supports
the baseline feature set.
In the MetaFabric 1.0 solution, EVC is configured as directed in the link provided. A short
overview of the configuration follows.
Each ESXi host in the POD hosts multiple VMs and is part of a different port group. VMs
running on the PODs include Microsoft Exchange, MediaWiki, Microsoft SharePoint,
MySQL database, and Firefly Host (VM security). Because traffic is flowing to and from
many different VMs, multiple port groups are defined on the distributed switch:
• Infra = PG-INFRA-101
• SharePoint = PG-SP-102
• MediaWiki = PG-WM-103
• Exchange = PG-XCHG-104
• vMotion = PG-vMotion-106
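The numeric suffix of each port group name listed above appears to match the VLAN ID used for that function (for example, 104 for Exchange). A minimal sketch of recovering the VLAN ID from a port group name, assuming that convention holds for every port group; the helper name is hypothetical:

```python
import re

# Port groups as listed in this section.
PORT_GROUPS = {
    "Infra": "PG-INFRA-101",
    "SharePoint": "PG-SP-102",
    "MediaWiki": "PG-WM-103",
    "Exchange": "PG-XCHG-104",
    "vMotion": "PG-vMotion-106",
}


def vlan_id_from_port_group(name: str) -> int:
    """Extract the trailing number of a port group name as its VLAN ID."""
    match = re.search(r"-(\d+)$", name)
    if match is None:
        raise ValueError(f"no VLAN suffix in port group name: {name!r}")
    return int(match.group(1))


# Map each function to the VLAN ID implied by its port group name.
VLAN_BY_FUNCTION = {fn: vlan_id_from_port_group(pg)
                    for fn, pg in PORT_GROUPS.items()}
```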
These port groups are configured as shown in Figure 80 on page 223. In this scenario, a
port group naming convention was used to ease identification and mapping of a VM and
its function (for example, Exchange, SharePoint) to a VLAN ID. For instance, one VM is
NIC teaming is also deployed in the solution. NIC teaming is a configuration of multiple
uplink adapters that connect to a single switch to form a team. A NIC team can either
share the load of traffic between physical and virtual networks among some or all of its
members, or provide passive failover in the event of a hardware failure or a network
outage. All the port groups (PG) except for iSCSI protocol storage groups are configured
with a NIC teaming policy for failover and redundancy. All the compute nodes have four
active adapters as dvUplink in the NIC teaming policy. This configuration enables load
balancing and resiliency. The IBM Pure Flex System with a 10-Gb CNA card has two
network adapters on each ESXi host. Consequently, that system has only two dvUplink
adapters per ESXi host. Figure 81 on page 224 is an example of one port group configuration.
Other port groups are configured similarly (with the exception of the storage port
group).
NOTE: An exception to the use of NIC teaming is an iSCSI port group. The
iSCSI protocol doesn’t support multi-channeling or bundling (LAG). When
deploying iSCSI, instead of configuring four active dvUplinks, a single dvUplink
should be used. In this solution, QFX3000-M QFabric POD1 uses one port
group (PG-storage-108) and QFX3000-M QFabric POD2 uses another port
group (PG-storage-208). These port groups are connected to the storage
array utilizing the iSCSI protocol. Figure 82 on page 224 shows the iSCSI port
group (PG-storage-108). Port group storage 208 is configured in the same
way.
The VMkernel TCP/IP networking stack supports iSCSI, NFS, vMotion, and fault tolerance
logging. The VMkernel port enables these services on the ESX server. Virtual machines
run their own system TCP/IP stacks and connect to the VMkernel at the Ethernet level
through standard and distributed switches. In ESXi, the VMkernel networking interface
provides network connectivity for the ESXi host and handles vMotion and IP storage.
Moving a virtual machine from one host to another is called migration. VMware vMotion
enables the migration of active virtual machines with no down time.
Management of iSCSI, vMotion, and fault tolerance is enabled by the creation of four
virtual kernel adapters. These adapters are bound to their respective distributed port
group. For more information on creating and binding virtual kernel adapters to distributed
port groups, see:
http://pubs.vmware.com/vsphere-51/index.jsp#com.vmware.vsphere.networking.doc/GUID-59DFD949-A860-4605-A668-F63054204654.html
• Change the port group policy for the iSCSI VMkernel adapter.
NOTE: The ESXi host must have permission to access the storage array. This
is discussed further in the storage section of this guide.
For information about configuring and mounting of iSCSI storage connection to a vSwitch
(either vSwitch or distributed switch), see:
Figure 83 on page 226 shows an example of an ESXi host deployed in POD1. The port
groups PG-Storage-108 and PG-Storage-208 dvPG have been created for POD1 and
POD2, respectively. (The example shows PG-Storage-108.) VMkernel is configured to
use the 172.16.8.0/24 subnet for hosts in POD1 and the 172.20.8.0/24 subnet for hosts in
POD2 to bind with the respective storage port group to access the EMC storage.
As mentioned earlier, the iSCSI protocol does not support multichannel (LAG) but does
support multipath, so only one physical interface is bound to the storage port
group. To achieve multipath, a separate storage port group and network subnet are required
to access the EMC storage over a backup link.
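The subnet-to-port-group binding described above can be checked with Python's standard ipaddress module. This is a minimal sketch; the helper name and the sample addresses in the usage note are hypothetical, while the subnets and port group names come from the text.

```python
import ipaddress
from typing import Optional

# Storage subnets and port groups as described for POD1 and POD2.
STORAGE_SUBNETS = {
    "PG-Storage-108": ipaddress.ip_network("172.16.8.0/24"),  # POD1
    "PG-Storage-208": ipaddress.ip_network("172.20.8.0/24"),  # POD2
}


def storage_port_group_for(vmk_ip: str) -> Optional[str]:
    """Return the storage port group whose subnet contains the VMkernel IP."""
    addr = ipaddress.ip_address(vmk_ip)
    for port_group, subnet in STORAGE_SUBNETS.items():
        if addr in subnet:
            return port_group
    return None  # address is not in either storage subnet
```

For example, a VMkernel adapter addressed 172.16.8.21 would map to PG-Storage-108, and an address outside both subnets returns None.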
VMware vSphere fault tolerance provides continuous availability for virtual machines by
creating and maintaining a secondary VM that is identical to, and continuously available
to replace, the primary VM in the event of a failure. The feature is enabled on a per-virtual-machine
basis. The secondary virtual machine resides on a different host in the cluster and runs in
virtual lockstep with the primary virtual machine. When a failure is detected, the second
virtual machine takes the place of the first one with the least possible interruption of
service. Because the secondary VM is in virtual lockstep with the primary VM, it can take
over execution at any point without interruption, thereby providing fault tolerant protection.
The primary and secondary VMs continuously exchange heartbeats. This exchange allows
the virtual machine pair to monitor the status of one another to ensure that fault tolerance
is continually maintained. A transparent failover occurs if the host running the primary
VM fails, in which case the secondary VM is immediately activated to replace the primary
VM. A new secondary VM is started and fault tolerance redundancy is reestablished
within a few seconds. If the host running the secondary VM fails, it is also immediately
replaced. In either case, users experience no interruption in service and no loss of data.
VMware vSphere HA must be enabled before you can power on fault tolerant virtual
machines or add a host to a cluster that already supports fault tolerant virtual machines.
Only virtual machines with a single vCPU are compatible with fault tolerance.
The MetaFabric 1.0 solution test bed features VMware fault tolerance
(Figure 85 on page 227). This was tested as part of the solution on the port group “PG-Fault
tolerance-107”. VMkernel is bound to this port group. Once fault tolerance is enabled on
a VM, a secondary VM is automatically created.
The VMware vMotion feature, part of VirtualCenter, allows you to migrate running virtual
machines from one physical machine to another with no perceivable impact to the end
user (Figure 86 on page 228). You can use vMotion to upgrade and repair servers without
any downtime or disruptions and also to optimize resource pools dynamically, resulting
in an improvement in the overall efficiency of a data center. To ensure successful migration
and subsequent functioning of the virtual machine, you must respect certain compatibility
constraints. Complete virtualization of all components of a machine, such as CPU, BIOS,
storage disks, networking, and memory, allows the entire state of a virtual machine to
be captured by a set of data files. Therefore, moving a virtual machine from one host to
another is nothing but data transfer between two hosts.
VMware vMotion benefits data center administrators in critical situations, such as:
• Optimizing hardware resources: vMotion lets you move virtual machines away from
failing or underperforming hosts.
• Datastore compatibility: The source and destination hosts must use shared storage.
You can implement this shared storage using a SAN or iSCSI. The shared storage can
use VMFS or shared NAS. Disks of all virtual machines using VMFS must be available
to both source and target hosts.
• CPU compatibility: The source and destination hosts must have compatible sets of
CPUs.
VMware vMotion is configured on all MetaFabric 1.0 hosts (Figure 87 on page 229). VMware
vMotion uses a separate port group called PG-vMotion-106, and VMkernel is bound
to this port group. Network and storage configuration is common to all hosts, which is a
requirement for vMotion. Once vMotion configuration is completed, active VMs can be moved
to any available host where resources are free. DRS can also trigger vMotion if one of the ESX
hosts shows high resource utilization (CPU, memory). You can also manually trigger
vMotion if the need arises to move a VM within the data center.
The MetaFabric 1.0 solution features EMC VNX series storage controllers. The EMC VNX
series implements a modular architecture that integrates hardware components for
Block, File, and Object with concurrent support for native NAS, Internet Small Computer
System Interface (iSCSI), Fibre Channel, and Fibre Channel over Ethernet (FCoE)
protocols. The VNX series is based on Intel Xeon-based PCI Express 2.0 processors and
delivers File (NAS) functionality via two to eight Data Movers and Block (iSCSI, FCoE,
and FC) storage via dual storage processors using a full 6-Gb/s SAS disk drive topology.
Configuring EMC storage for control station can be done by the vendor only. Once you
configure management for EMC, it is accessed using the EMC Unisphere tool (via HTTPS).
To create a FAST Cache, click the FAST Cache tab in the Storage System Properties
window to view FAST Cache information (Figure 88 on page 232). If the FAST Cache has
not been created on the storage system, the Create button at the bottom of the dialog
box is enabled. The Destroy button is enabled when the FAST Cache has been created.
FAST Cache is enabled on EMC VNX5500 storage.
Figure 88: EMC FAST Cache Configuration (Select System, then Properties
in the Drop-Down)
FAST Cache can be created in certain configurations, depending on the storage system
model, and number and size of flash drives installed in the storage system. These criteria
are used to present you with the available options for your configuration. For example,
if an insufficient number of flash drives are available, Unisphere displays an error message
and FAST Cache cannot be created. The number of flash drives can also be manually
selected. The bottom portion of the screen shows the flash drives that will be used for
creating FAST Cache. You can choose the drives manually by selecting the Manual option.
If the LUN is created in a RAID group, you can enable or disable FAST Cache at the LUN
level. It is enabled by default if the FAST Cache enabler is installed on the storage system.
In this example (Figure 89 on page 233), FAST Cache is enabled with four disks.
3. Go to Storage > Storage Configuration > Storage Pools. In the Pools tab, click Create.
Figure 90 on page 234 shows an example of the EMC Unisphere management tool. The
storage pool selected is Pool 1 – Exchange-DB.
Figure 91 on page 234 shows the properties of the selected storage pool (Pool 1 –
Exchange-DB). This screen shows the physical and virtual capacity of the storage pool.
The top tabs also enable viewing and modification of disk assignment, advanced
properties, and storage tiering.
Figure 92 on page 235 shows the contents of the Disks tab (under storage pool properties).
From this screen, additional disks can be added to the storage pool. This tab also displays
the physical and operational properties of each disk assigned to the storage pool.
In the storage pool properties Advanced tab (Figure 93 on page 235), you can set the alert
threshold for the storage pool (the percentage utilization that will trigger an alarm) and
enable or disable FAST Cache.
Now that we have viewed the settings of an individual application storage pool, let’s look
at combining multiple application storage pools into an aggregated storage pool.
Figure 94 on page 236 shows the aggregated storage pool.
Figure 95 on page 236 shows the properties of the selected storage pool.
Figure 96 on page 237 shows disk membership of the aggregated storage pool.
• Select Pool.
• For Pool LUNs, only RAID 6, RAID 5, and RAID 1/0 are valid. RAID 5 is the default
RAID type.
If available, the software populates the storage pool for the new LUN with a list of pools
that have the specified RAID type, or displays the name of the selected pool. The Capacity
section displays information about the selected pool. If there are no pools with the
specified RAID type, click New to create a new one.
1. In LUN Properties, select the Thin check box if you are creating a thin LUN.
3. If you want to create more than one LUN, select a number in Number of LUNs to create.
For multiple LUNs, the software assigns sequential IDs to the LUNs as they are available.
For example, if you want to create five LUNs starting with LUN ID 11, the LUN IDs might
be 11, 12, 15, 17, and 18.
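The example above implies that IDs already in use are skipped. A minimal sketch of that behavior, assuming (hypothetically) that LUN IDs 13, 14, and 16 are already taken; the function name is illustrative and is not part of Unisphere:

```python
def next_available_lun_ids(start_id: int, count: int, in_use: set) -> list:
    """Return `count` LUN IDs in ascending order, skipping IDs in use."""
    ids = []
    candidate = start_id
    while len(ids) < count:
        if candidate not in in_use:
            ids.append(candidate)
        candidate += 1
    return ids


# Assumed pre-existing LUN IDs, chosen to reproduce the example in the text.
already_used = {13, 14, 16}
assigned = next_available_lun_ids(11, 5, already_used)
```

With those assumed IDs in use, `assigned` is [11, 12, 15, 17, 18], matching the example.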
1. In LUN Name, either specify a name or select to automatically assign LUN IDs as LUN
names. Choose one of the following:
a. Click Apply to create the LUN with the default advanced properties, or
a. Select a default owner (SP A or SP B) for the new LUN or accept the default value
of Auto.
3. Click Apply to create the LUN, and then click Cancel to close the dialog box. An icon
for the LUN is added to the LUNs view window.
The LUN created for Exchange DB is a single LUN in the Exchange storage Pool
(Figure 97 on page 239).
Figure 98 on page 240 shows the LUN assigned to all the ESX hosts. This LUN is created
so that ESX hosts can access this LUN and mount it as a datastore.
A pool was also created for use as a storage destination for Microsoft Exchange logs
(Figure 99 on page 240).
Figure 99: The Selected Pool Was Created for MS Exchange Logs
Once the pool is created, the LUN can be created (Figure 100 on page 241).
3. Create storage group, then click OK to save changes and close the dialog box. You
can also click Apply to apply the changes without closing the dialog box.
NOTE: Once you enable Storage Groups for a storage system, any host
currently connected to the storage system will no longer be able to access
data on the storage system. To the host, it will appear as if the LUNs were
removed. In order for the host to access the storage data, you must add
LUNs to the Storage Group and then connect the host to the Storage Group.
Figure 101 on page 242 shows the properties screen of the ESX-StorageGroup created
in the MetaFabric test lab.
Once the storage group is created, LUNs can be added to the storage group
(Figure 102 on page 243).
4. You must also add any ESXi hosts to the storage group if those hosts need to access
any data housed on the storage group (Figure 103 on page 244).
5. If LUNs are already created, they can be added directly to the storage group by
navigating to the Storage tab, selecting the LUN, and clicking the Add to storage group
button (Figure 104 on page 244).
The prior storage sections have moved you to a point where logical disks now exist
on the storage array. These disks are not formatted and are unusable by the operating
systems until they have been formatted and mounted. These operations are covered
in the next sections.
• Linux clients
• Windows systems configured with third-party applications that provide NFS client
services
When a VNX is configured as an NFS server, file systems are mounted on a Data Mover
and a path to that file system from the Data Mover is exported. Exported file systems
are then available across the network and can be mounted by remote users.
NFS pools are created from the storage pool section of EMC Unisphere. In this example,
we are going to use a storage pool called NFS Pool (Figure 105 on page 245).
You will recall that a LUN must be created on top of the storage pool. The LUN is used
as a logical disk identifier that enables the use of the storage (Figure 106 on page 246).
Next, you must define the NFS pool to enable ESXi hosts to access the storage. To do
this, follow these steps:
2. Go to Storage.
Figure 107 on page 247 shows the NFS pool properties once the pool has been created for
server access.
Once the NFS pool is created, you must export the pool in order to make the file system
or directories available to NFS clients. To do this:
2. From the Choose Data Mover list, select a Data Mover from which to export the file
system.
3. From the File System list, select the file system or checkpoint that contains the
directory to export.
4. To export a subdirectory, add the rest of the path to the string in the field.
Figure 108 on page 248 shows the configuration of NFS export. Note that access to hosts
is assigned on a per-subnet basis and can be assigned as read-only, read/write, root
access, or operator access permissions.
Once you export NFS, the directories contained in the NFS will be available for mounting
on the application servers.
3. Select the Snapshot Configuration Wizard from the right side (Figure 109 on page 248).
4. Click Next, then select the server host where the LUN is configured and mounted
(Figure 110 on page 249).
5. Click Next, then select the target VNX storage system (Figure 111 on page 249), then
click Next.
6. Select the source LUN (the LUN you wish to snapshot). In this example,
SharePoint-SQL-DB is used as the source LUN and added to the list
(Figure 112 on page 250). Once you have selected the source LUN, click Next.
7. Finally, you can select the default settings or uncheck the Accept Snapshot overhead
values check box and modify the default settings (Figure 113 on page 251). Once you
have the settings you want, click Next.
8. Choose to create the Snapshot LUN or choose to not create the Snapshot LUN at the
current time. For this example, we created the Snapshot LUN (Figure 114 on page 252).
Click Next.
10. Click Finish. Configuration of snapshot LUN is complete (Figure 116 on page 254).
Load Balancing
Overview
This section provides the implementation details of the F5 load balancer deployed in the
MetaFabric 1.0 solution test lab. This section explains the following topics:
• Load-balancer topology
• Redundancy
• Traffic flow
Topology
The topology used in testing the F5 load-balancing element of the MetaFabric 1.0 solution
is shown in Figure 117 on page 256. The MetaFabric 1.0 data center solution uses F5 to
load-balance traffic between servers. The solution testing featured two VIPRION C4480
hardware chassis; each chassis is configured with one B4300 blade for 10-Gigabit
connectivity. The VIPRION chassis are running this software package: BIG-IP 10.2.4 Build
591.0 Hotfix HF2 image.
In this solution, two VIPRION systems are connected to the core switches using LAG. The
F5 systems are configured with virtual IPs (VIPs) and server pools to provide
load-balancing services to SharePoint, Wikimedia, and Exchange traffic. SharePoint,
Wiki, and Exchange servers are connected to POD switches on VLANs 102, 103, and 104,
respectively. DSR mode is configured on the F5 so that return traffic from the servers
bypasses the F5 for all the VIPs.
Configuring Redundancy
Each VIPRION system is configured as a cluster in this topology, although they can also
be configured as single devices. In this solution, the two VIPRION systems are configured
as two clusters (one cluster per chassis), deployed in active/standby mode for
redundancy. This means that one cluster is active and processing network traffic, while
the other cluster is up and available to process traffic, but is in a standby state. If the
active cluster becomes unavailable, the standby cluster automatically becomes active,
and begins processing network traffic.
For redundancy, a dedicated failover link is configured between the two VIPRION systems
as a LAG interface. Interfaces 1/1.3 and 1/1.4 in the LAG are configured as failover links on
both systems. The following steps are required to configure redundancy (failover):
3. Create a self IP address and associate the self IP with the failover VLAN.
4. Define a unicast entry specifying the local and remote self IP addresses.
5. Define a multicast entry using the management interface for each VIPRION system.
For more information on redundancy and failover configuration of the F5 Load Balancer,
see:
http://support.f5.com/kb/en-us/solutions/public/11000/900/sol11939.html
http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/VIPRION_configuration_guide_961/clustered_systems_redundant.html
Each LAG on the F5 system has two member links. One member link connects to Core Switch
1 and the other connects to Core Switch 2. MC-LAG is configured on the core switches, so
to the F5 it appears that the LAG connects to a single system. LACP is used as the
control protocol for creating the LAG between the F5 and the core switches.
• Create a LAG named External on both F5 systems, and assign interfaces 1/1.5 and 1/1.6
as members of that LAG.
• Create VLAN 15, name it External on both F5 systems, and assign the VLAN to
the External LAG.
• Create a LAG named core-sw on both F5 systems, and assign interfaces 1/1.1 and 1/1.2
as members of that LAG.
• VLANs 102, 103, and 104 are named Core-Access, Wikimedia-Access, and
Exchange-Access, respectively. These VLANs are created on both F5 systems
and assigned to the core-sw LAG. The SharePoint, Wikimedia, and Exchange
servers are located in VLANs 102, 103, and 104, respectively.
• Create self IP addresses of 172.16.2.25, 172.16.3.25, and 172.16.4.25 for VLANs 102, 103,
and 104, respectively.
Internal connections to the servers are configured as a Layer 2 connection through the
POD switches (that are connected to the core switches).
NOTE: For external connections, static routes are advertised from the core
switch for the VIPs configured on the F5 so that clients on the Internet can send
requests to the VIP for specific services such as Exchange, Wikimedia, and
SharePoint. For the static route, we configured the floating IP address created
for the External VLAN (192.168.15.5) as the next hop to reach the VIPs. When the
active cluster fails, the new active cluster takes over the configured floating
IP address, sends a gratuitous ARP for it, and begins receiving traffic.
NOTE: In this solution testing, nPath routing (DSR, or Direct Server Return)
is used to bypass the F5 for return path traffic from servers, routing traffic
directly to the destination from the application servers.
It is recommended to use the nPath template in the F5 configuration GUI (Template and
wizards window) to configure VIP and server pools in DSR mode. The following link
provides greater detail regarding configuration of nPath in F5 systems:
http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm_implementations_guide_10_1/sol_npath.html
In this solution, configure three VIP addresses 10.94.127.180, .181, and .182, and assign
server pools to these VIPs using the nPath template to service SharePoint, Exchange,
and Wikimedia services.
The following tasks need to be completed in order to configure the BIG-IP system to use
nPath routing:
• Define a virtual server with port and address translation disabled and assign the custom
Fast L4 profile to it.
• Set the default route on your servers to the router’s internal IP address.
• 10.94.127.181:993 (IMAP4)
• 10.94.127.181:995 (POP3)
The following illustrations and steps explain the creation of nPath routing for IMAP4 for
Exchange. These examples can be used to guide creation of nPath for other services by
substituting the VIP address and port number.
To configure nPath routing for IMAP4 for Exchange, follow these steps:
1. Using the nPath template create a VIP, server pool, and monitor. This template creates
a Fast L4 profile and assigns it to the VIP address.
2. The above window shows the configuration of nPath routing using the configuration
template.
a. Assign a unique prefix name (my_nPath_IMAP) for the F5 system to name the
server pool, monitor, and other objects.
b. To create the IMAP4 VIP as part of the Exchange service, specify the VIP address
10.94.127.181, TCP port 993.
c. This template gives a choice for creating a new server pool or using an existing
pool. By default, it creates a new server pool using the prefix name shown
(my_nPath_IMAP_pool). Add servers one at a time with the IP address and port
number, as shown.
d. This template gives a choice for creating a new monitor or using an existing monitor.
By default, it creates a new monitor using the prefix name shown
(my_nPath_IMAP_TCP_pool). By default, it uses TCP monitoring and the user can
change the monitoring type. The default interval for the health check is 30 seconds,
and the timeout value is 91 seconds.
e. Click Finished to create nPath routing for an IMAP service with VIP 10.94.127.181:993.
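The default monitor values quoted in step d (30-second interval, 91-second timeout) are consistent with the commonly cited F5 guideline that the timeout should be three times the interval plus one second. Whether the template derives its defaults this way is an assumption; the sketch below only illustrates the arithmetic, and the function names are hypothetical.

```python
def monitor_timeout(interval_s: int) -> int:
    """Timeout per the '3 x interval + 1' guideline (assumed here)."""
    return 3 * interval_s + 1


def full_intervals_within_timeout(interval_s: int) -> int:
    """Number of complete health-check intervals that fit in the timeout."""
    return monitor_timeout(interval_s) // interval_s


default_timeout = monitor_timeout(30)
```

For the 30-second interval this yields the 91-second timeout stated above, which allows three full check intervals to elapse before a pool member is considered down.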
NOTE: The default TCP monitor, with no Send string or Receive string
configured, tests a service by establishing a TCP connection with the pool
member on the configured service port and then immediately closes the
connection without sending any data on the connection. This causes some
services such as telnet and ssh to log a connection error, and fills up the
server logs with unnecessary errors. To eliminate the extraneous logging,
you can configure the TCP monitor to send just enough data to the service,
or just use the tcp_half_open monitor. Depending on your monitoring
requirements, you might also be able to monitor a service that expects
empty connections, such as tcp_echo (by using the default tcp_echo
monitor).
NOTE: Each server has four 10-Gb NIC ports connected to the
QFX3000-M QFabric PODs as a data port for all VM traffic. Each system
is connected to each POD for redundancy purposes. The IBM System 3750
is connected to POD1 using 4 x 10-Gigabit Ethernet. A second IBM System
3750 connects to POD2 using 4 x 10-Gigabit Ethernet. The use of a LAG
provides switching redundancy in case of a POD failure.
3. Create or verify other objects (Pool, Profile, or Monitor) using the template created
with the VIP. As you can see, nPath routing created Fast L4 Profile as needed.
4. Verify the VIP by selecting Virtual Servers under the Local Traffic tab. The VIP in this
example is named my_nPath_IMAP_virtual_server with an assigned IP address of
10.94.127.181 and TCP port 993. As per the nPath requirement, the Performance (Layer 4)
profile, also known as the Fast L4 profile, is assigned to this VIP.
Note that the SNAT pool is disabled for this VIP as per the nPath requirement. When
traffic enters the F5 system for this VIP, the F5 does not perform SNAT; it simply
forwards the traffic to the server as is without modifying the source or destination
address. This VIP simply performs load balancing and sends the traffic to the
destination server with the client IP address as the source address and the VIP address
(10.94.127.181) as its destination.
Traffic to and from the load balancers flows in the following manner:
1. As shown in Figure 121 on page 262, three VIPs are created in the F5 system for
SharePoint, Wikimedia, and Exchange. Assume that the client sends the request to
the Wikimedia server. In this case, the source IP address is the client’s IP address,
the destination IP address is the VIP address (10.94.127.182), and the destination
port is 80.
As described in the previous section, the core switch advertises the VIP address in the
network. As a result, the edge router knows the route to reach the VIP address of
10.94.127.182.
2. This packet arrives on the active F5 via the External LAG.
3. Because of the nPath configuration, the Wikimedia VIP address load-balances the
traffic and sends it to one of the servers as is without modifying the source or the
destination address. The F5 system reaches the Wikimedia servers by way of a Layer
2 connection on VLAN 103. An internal LAG connection is a trunk port carrying VLANs
102, 103, and 104 to reach all the servers.
4. The Wikimedia server receives the traffic on the loopback address (configured with
VIP IP 10.94.127.182) and processes it.
5. The Wikimedia server sends the reply back to the client by way of a router,
bypassing the F5 system.
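The flow above can be modeled in a few lines to make the address handling explicit. This is a sketch under stated assumptions: the pool member addresses, the client address, and the round-robin selection are hypothetical (the text does not state the load-balancing method); only the VIP and port come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


VIP = "10.94.127.182"                  # Wikimedia VIP from the text
POOL = ["172.16.3.11", "172.16.3.12"]  # hypothetical servers on VLAN 103


def f5_forward(pkt: Packet, pool: list, counter: int):
    """Choose a pool member (round-robin assumed); IP header is unchanged."""
    member = pool[counter % len(pool)]
    return member, pkt  # delivered at Layer 2 to the member, no NAT or SNAT


def server_reply(pkt: Packet) -> Packet:
    """Server answers from the VIP on its loopback, directly to the client."""
    return Packet(src=pkt.dst, src_port=pkt.dst_port,
                  dst=pkt.src, dst_port=pkt.src_port)


request = Packet(src="198.51.100.7", src_port=40000, dst=VIP, dst_port=80)
member, forwarded = f5_forward(request, POOL, counter=0)
reply = server_reply(forwarded)
```

Because the forwarded packet is identical to the request, the server sees the real client address, and the reply carries the VIP as its source without passing back through the F5.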
Applications
Overview
The Juniper MetaFabric 1.0 solution featured several business-critical applications in the
test and verification lab. This section will cover implementation details for the following
applications:
• Microsoft Exchange
This section describes the design, planning, and instructions for deploying a highly
available Microsoft Exchange Server 2012 cluster for client access service and mailbox
database server using VMware high availability (HA). It covers configuration guidelines
for the VMware vSphere HA cluster parameters and best-practice
recommendations. This guide does not cover a full installation of Microsoft Exchange
Server. This section covers the following topics:
This deployment example assumes that VMware HA is configured on ESXi hosts using
vCenter Server. Virtual machines that are running on an ESXi host at the time of complete
failure will be automatically restarted on other hosts in the cluster.
• All hosts in a vSphere HA-enabled cluster must have access to the same shared storage
location used by the VM on the cluster. This includes any Fibre Channel, FCoE, iSCSI,
and NFS datastores used by the VM. In this solution, we are using iSCSI and NFS
datastores.
• Define VLANs.
3. Configure the VLAN and gateway address on QFabric switch (POD1) as this application
is located only in POD1.
[edit]
set vlans Exchange vlan-id 104
set vlans Exchange l3-interface vlan.104
set interfaces vlan unit 104 family inet address 172.16.4.254/24
set protocols ospf area 0.0.0.10 interface vlan.104 passive
set vlans Exchange-Cluster vlan-id 109
vlan 104
enable
name "EXCHANGE"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
vlan 109
enable
name "Exchange DAG"
member INTA1-INTA5,INTB1-INTB5,EXT1-EXT3,EXT7
vlan 109
enable
name "Exchange DAG"
member INTA1-INTA5,EXT1-EXT2,EXT11-EXT16
NOTE: This configuration is not required if you are using IBM Flex System
Pass-thru modules. This configuration is an example of the 40-Gb CNA
I/O Module (module 1 and 2).
[edit]
set interfaces ae1 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae2 description "IBM Standalone server"
set interfaces ae2 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae2 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae3 description IBM-FLEX-10-CNA-VLAG-BNT
set interfaces ae3 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae3 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae4 description IBM-FLEX-2-CN-1-Passthrough
set interfaces ae4 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae4 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae5 description IBM-FLEX-2-CN-2
set interfaces ae5 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae5 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae6 description IBM-FLEX-2-CN-3
set interfaces ae6 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae6 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae7 description IBM-FLEX-2-CN-4
set interfaces ae7 unit 0 family ethernet-switching vlan members Exchange-cluster
set interfaces ae7 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae8 description IBM-FLEX-2-CN5
set interfaces ae8 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae8 unit 0 family ethernet-switching vlan members Exchange-cluster
set vlans Exchange vlan-id 104
set vlans Exchange-cluster vlan-id 109
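Because each VLAN name must be both defined with set vlans and referenced as an interface member, a quick consistency check can catch a name that is spelled or capitalized differently in the two places. The following is a minimal Python sketch and not part of the tested solution. The function name undefined_vlan_members and its regular expressions are assumptions made for illustration, and it recognizes only the two set-command forms used in this section. It compares names case-sensitively, on the assumption that the device does the same.

```python
import re

# Patterns for the two statement forms used in this section (illustrative only).
VLAN_DEF = re.compile(r"set vlans (\S+) vlan-id \d+$")
VLAN_MEMBER = re.compile(
    r"set interfaces \S+ unit \d+ family ethernet-switching vlan members (\S+)$"
)


def undefined_vlan_members(config_lines):
    """Return VLAN names referenced by interfaces but never defined.

    Names are compared case-sensitively, so "Exchange-Cluster" and
    "Exchange-cluster" count as two different VLANs.
    """
    defined = set()
    referenced = set()
    for line in config_lines:
        line = line.strip()
        match = VLAN_DEF.match(line)
        if match:
            defined.add(match.group(1))
            continue
        match = VLAN_MEMBER.match(line)
        if match:
            referenced.add(match.group(1))
    return sorted(referenced - defined)


sample = [
    "set interfaces ae2 unit 0 family ethernet-switching vlan members Exchange",
    "set interfaces ae2 unit 0 family ethernet-switching vlan members Exchange-cluster",
    "set vlans Exchange vlan-id 104",
    "set vlans Exchange-cluster vlan-id 109",
]
print(undefined_vlan_members(sample))  # prints []
```

An empty list means that every referenced VLAN name has a matching definition. Any name that is returned should be checked for a spelling or capitalization mismatch before the configuration is committed.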
6. Allow the same VLANs and configure the Layer 3 gateway for Exchange-Cluster on both core
switches (Core1 and Core2).
[edit]
set interfaces ae1 description "MC-LAG to vdc-pod1-sw1-nng-ae1"
set interfaces ae1 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae2 description "MC-LAG to vdc-pod1-sw1-nng-ae2"
set interfaces ae2 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces ae4 description "MC-LAG to vdc-pod2-sw1-ae0"
set interfaces ae4 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces ae5 description "MC-LAG to vdc-pod2-sw1-ae1"
set interfaces ae5 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae9 unit 0 description "ICL Link for all VLANS"
set interfaces ae9 unit 0 family ethernet-switching vlan members Exchange
set interfaces ae9 unit 0 family ethernet-switching vlan members Exchange-Cluster
set interfaces ae10 description Layer2-internal-link-MC-LAG-core-sw-to-LB2-standby
set interfaces ae10 unit 0 family ethernet-switching vlan members Exchange
set interfaces irb unit 109 description Exchange-Cluster
set interfaces irb unit 109 family inet address 172.16.9.252/24 arp 172.16.9.253 l2-interface ae9.0
set interfaces irb unit 109 family inet address 172.16.9.252/24 arp 172.16.9.253 mac 4c:96:14:68:83:f0
set interfaces irb unit 109 family inet address 172.16.9.252/24 arp 172.16.9.253 publish
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 virtual-address 172.16.9.254
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 priority 125
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 preempt
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 accept-data
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 authentication-type md5
set interfaces irb unit 109 family inet address 172.16.9.252/24 vrrp-group 1 authentication-key "$9$Asx6uRSKvLN-weK4aUDkq"
set protocols ospf area 0.0.0.0 interface irb.109 passive
set vlans Exchange vlan-id 104
set vlans Exchange-Cluster vlan-id 109
set vlans Exchange-Cluster l3-interface irb.109
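The irb.109 gateway configuration works only if the VRRP virtual address and the static ARP entry fall inside the subnet of the interface address. The following minimal Python sketch checks that with the addresses from the configuration above. The helper name same_subnet is an assumption made for illustration and is not part of any Juniper tool.

```python
import ipaddress


def same_subnet(interface_cidr, *addresses):
    """Return True if every address lies inside the interface's subnet."""
    network = ipaddress.ip_interface(interface_cidr).network
    return all(ipaddress.ip_address(addr) in network for addr in addresses)


# Interface address, VRRP virtual address, and static ARP peer from irb.109.
print(same_subnet("172.16.9.252/24", "172.16.9.254", "172.16.9.253"))  # prints True
```

The same check can be run against the addressing plan for each VLAN before the configuration is pushed to both core switches.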
9. Click Next, then Finish. Once the Exchange port group is created, you can edit the port
group by right-clicking it and then modifying the teaming policy.
10. Repeat Steps 1 through 9 to create the port groups for Exchange-Cluster,
Storage-108, and Storage-208. An example of the PG-Storage-108 port group follows.
NOTE: The storage port group using the iSCSI protocol doesn’t support
port bonding (LAG). In the case of iSCSI, there is only one active uplink.
To create storage via iSCSI protocol for connection to the Exchange VM, follow these
steps:
3. Navigate to Storage > Storage Configuration > Storage Pools. In the Pools tab, click
Create.
4. Provide a name for the storage pool (in this example, Pool 1 – Exchange-DB).
6. Create and allocate LUN to the storage pool. Select the VNX system using the
Unisphere tool.
a. Select Pool.
b. Select a RAID type for the LUN: For Pool LUNs, only RAID 6, RAID 5, and RAID 1/0
are valid. RAID 5 is the default RAID type.
c. If available, the software populates the storage pool for the new LUN with a list of
pools that have the specified RAID type, or displays the name of the selected pool.
The Capacity section displays information about the selected pool. If there are no
pools with the specified RAID type, click New to create a new one.
9. In LUN Properties, select the Thin checkbox if you are creating a thin LUN.
10. Assign a User Capacity and ID to the LUN you want to create.
11. To create more than one LUN, select a number in Number of LUNs to create. For
multiple LUNs, the software assigns sequential IDs to the LUNs as they are available.
For example, to create five LUNs starting with LUN ID 11, the LUN IDs might be 11, 12,
15, 17, and 18.
12. In LUN Name, either specify a name or select automatically assign LUN IDs as LUN
Names.
a. Click Apply to create the LUN with the default advanced properties, or
a. Select a default owner (SP A or SP B) for the new LUN or accept the default value
of Auto.
15. Click Apply to create the LUN, and then click Cancel to close the dialog box. An icon
for the LUN is added to the LUN view window. Below is an example of the Exchange
LUN that was created.
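The LUN ID assignment described in step 11 can be illustrated with a short Python sketch: the software takes the next available IDs, so any ID already in use is skipped. The function name assign_lun_ids and the set of IDs assumed to be in use (13, 14, and 16) are hypothetical. They were chosen so that the result matches the example in step 11, and the sketch does not describe how Unisphere is implemented.

```python
def assign_lun_ids(start_id, count, ids_in_use):
    """Pick `count` LUN IDs, starting at `start_id` and skipping IDs in use."""
    assigned = []
    candidate = start_id
    while len(assigned) < count:
        if candidate not in ids_in_use:
            assigned.append(candidate)
        candidate += 1
    return assigned


# Assumption: IDs 13, 14, and 16 already belong to other LUNs.
print(assign_lun_ids(11, 5, {13, 14, 16}))  # prints [11, 12, 15, 17, 18]
```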
2. Select Hosts > Storage Group. (Once you enable Storage Groups for a storage system,
any host currently connected to the storage system will no longer be able to access
data on the storage system. To the host, it will appear as if the LUNs were removed.
In order for the host to access the storage data, you must add LUNs to the Storage
Group and then connect the host to the Storage Group.)
3. Click OK to save changes and close the dialog box, or click Apply to save changes
without closing the dialog box. Figure 131 on page 276 shows the storage group that
was created. Any new LUNs added will be added to this storage group.
Figure 132 on page 277 shows the LUNs tab of the storage group properties. You can
see all LUNs that have been added to the storage group.
4. From the Hosts tab (Storage Group Properties), you can select hosts to add to the
storage pool (which hosts are able to access the pool).
Once the storage group is created, LUNs can be added directly to the storage pool
from the Storage > LUNs screen.
network. ESXi uses VMkernel ports for system management and IP storage. VMkernel
IP storage interfaces provide access to one or more EMC VNX iSCSI network portals or
NFS servers.
To configure VMkernel:
7. Select the port group (created in previous steps for POD1). Click Next.
8. Configure IP address settings for the VMkernel virtual adapter, click Next, and then
click Finish.
9. Before configuring the VM, make sure that the EMC VNX storage is reachable. You
can do this from the ESXi server shell using ping or vmkping.
~ # esxcfg-vmknic -l
~ # ping 172.16.8.1
~ # ping 172.16.8.2
10. From vCenter, click Storage Adapters. If the iSCSI software adapter is not installed,
click Add and install the adapter.
11. Once installed, right-click the iSCSI software adapter and select Properties. You should
see that the software adapter is enabled.
12. Click the Network Configuration tab to verify that the storage is configured and
connected.
13. Click the Dynamic Discovery tab and click Add. Enter the IP address and port of your EMC VNX
storage.
14. Click OK and Close. When prompted to rescan the HBA, click Yes. You see a LUN
presented on the server.
15. From the vSphere client, select the Exchange-Logs server and Add Storage.
17. Select the Disk/LUN you want to mount. Verify that you are mounting the proper LUN
using the LUN ID, capacity, and path ID. Click Next.
18. Select VMFS 5.0, which is supported in ESXi 5.1. VMFS 5.0 also supports datastores
larger than 2 TB. Click Next.
19. Notice that the hard disk is blank under Current Disk Layout. Click Next.
21. Select the maximum capacity for this datastore. (The maximum capacity is the default
option.) Click Next and Finish. Click Properties on the created datastore to see
output similar to the following.
2. Select the cluster to create a new VM. Click Create a new virtual machine.
6. Select an operating system (Windows 2012 64-bit was used in this scenario). Click
Next.
7. Exchange CAS requires only one NIC. The Exchange mailbox requires two NICs. You
can add a new NIC here or wait until the VM is created to add another NIC. For now,
leave the default and click Next.
8. Select the virtual disk size for the operating system. (This will be the C:\ drive in the
OS.) Click Finish.
9. The current example is used to create a new VM that can be modified based on your
requirements. For instance, an Exchange mailbox requires additional disks and an
additional network adapter for use in Exchange clusters. An example of a modified
VM is shown below.
Figure 158: Virtual Machine with Additional Disks and Network Adapters
10. Once you have provisioned all of the VM resources, you can start installation by
mounting the installation ISO as a CD. In this case you would first install and update
Microsoft Windows Server 2012. Once the operating system is installed, the Exchange
installation (and all the dependencies, such as AD integration) can be performed.
Overview
The MetaFabric 1.0 solution lab employed Junos Space along with Network Director (a
Junos Space application) in the provisioning and testing of the solution. Specifically,
Junos Space Release 13.1R1 was used along with Network Director 1.5 (with Virtual View
application) for implementation and management of the network. Virtual machine (VM)
orchestration was also controlled by the Junos Space implementation.
Junos Space is installed on a VM on the IBM 3750 standalone server and serves the
out-of-band (OOB) management role in the data center topology. Two IBM 3750 servers
are configured in a single ESX cluster for redundancy and fault tolerance. If one
ESXi node fails, the Junos Space VM can be moved to the other ESXi host using vMotion.
Network Director 1.5 includes support for the QFX3000-M system to provision and monitor
the entire solution. Network Director Virtual View is enabled for orchestration to enable
tracking of VM movement within the data center test topology.
Security Director is also a component of the management and orchestration of the data
center solution. Security Director is an application installed on Junos Space. It is used to
configure and operate the perimeter security elements of the solution.
Topology
Figure 159 on page 298 illustrates the topology of the out-of-band (OOB) management
network tested in the MetaFabric 1.0 lab.
1. Install Junos Space Release 13.1R1 (or later) into the VM environment. (In this test lab,
this VM was hosted on the IBM standalone ESXi cluster.)
2. Download and install Network Director 1.5 onto Junos Space. This should automatically
install as a Space application.
3. Configure the network devices to be managed with the proper SNMP community,
trap groups, and NETCONF.
4. From the logical view of Network Director (Figure 160 on page 299), select Discover
Devices, and enter the IP address (or range) to enable device discovery. The IP
addresses used in this lab were configured as the OOB management addresses on
each network node.
5. Select the virtual-view on the ND GUI and enter the IP address of the VMware vCenter
and authentication details.
ND will pull all the data from the vCenter to the virtual view to enable viewing of all
provisioned virtual machines.
1. Enable Orchestration in ND Virtual View. This must be enabled so that VLAN and
port changes performed by vMotion will be tracked and configured by Network Director
on the physical network.
• ND will configure the groups config on all devices with the necessary VLANs once
orchestration is enabled.
• During a vMotion event, ND automatically assigns the new port to the VLAN on the
destination switch.
1. Select Device common settings under Build > Logical View and configure the following
parameters:
c. Login banner
d. Logs
e. STP
f. DCBX/LLDP
2. Once the wizard is completed with all the parameters, the selected devices will be in
Pending Deployment mode.
1. Select CoS from the Profile and Configuration Management drop-down menu (found
in Logical View) and click Add from the list of device families. Then select the Data
Center Switching device family.
2. Select Hierarchical Port Switching (ELS) for QFabric-QFX devices if you want to
configure PFC/ETS CoS. The default CoS-Profile is displayed. Modify the default
traffic settings parameters, if needed.
3. Enable the PFC code point and queue for NO-LOSS behavior. In this example, Queue
3 is chosen as the no-loss queue.
Figure 168: Enable PFC Code-point and Queue for NO-LOSS Behavior
4. This CoS profile will be referenced while creating port configuration in the next steps.
1. Select a VLAN and enter the VLAN-ID and VLAN-Name (ND-Test-VLAN in this
example).
2. Click Next to go to Advanced Settings and configure Layer 2 filters and MAC address
move limits.
1. In Network Director, navigate to Build Mode > Device Management. Select Setup QFabric.
2. Configure aliases for the node and interconnect devices by clicking the default aliases
shown in the GUI as NODE-1 and NODE-2.
3. Configure node devices in a redundant server node group (RSNG), using the aliases
configured in Step 2 (NODE-1 and NODE-2).
1. Click Port to manage the port profile and select Data Center Switching Non ELS.
a. VLAN name.
b. Service type (select server port if the port will be connected to a server).
4. Assign the port profile to the physical interface. Click Assign to list the available ports.
5. Select vdc-pod1-sw1 (QFabric QFX3000-M) and select RSNG2 to select the port
connection.
7. The physical port is now added to the port profiles (assignments) list.
9. Deploy the port profile to the vdc-core-sw1. Go to Deploy Mode and check pending
configuration deployments.
10. Deploy the changes to the device using the Deploy Now option.
Setting Up a QFabric System Using Network Director – Create Link Aggregation Groups
This section covers the configuration of LAGs by using Network Director.
NOTE: MC-LAG is only supported for QFX standalone systems. (This feature
was not supported in the MetaFabric 1.0 test bed.)
1. Select Build > Device Management > Manage Port Groups. Select Add new port group.
2. Select a device from the left pane, then click Select Devices to select the LAG member
links.
Selected links are shown below, which will belong to a LAG (port group).
2. Browse and select the image you would like to upload to the ND repository. Select
Upload.
3. Deploy the downloaded image to the physical devices. Select one of the following
three options:
Figure 191: Stage Image to Device for Install or for Later Installation
4. Click Next to select the devices for the upgrade and select the downloaded image
per device family.
1. Select Monitor > Select Devices > vdc-pod1-sw1. This opens up the dashboard to enable
viewing of the QFabric System status.
4. Click Run Fabric Analyzer (in the right pane) to get the fabric internal link status.
Overview
Junos Space Security Director is one of the Junos Space management applications and
helps organizations improve the reach, ease, and accuracy of security policy administration
with a scalable, GUI-based management tool. It automates security provisioning through
one centralized Web-based interface to help administrators manage all phases of the
security policy life cycle more quickly and intuitively, from policy creation to remediation.
Security Director provides the following management efficiencies:
1. Scale security policy across multiple Juniper Networks SRX Series Services Gateways,
or manage multiple logical system (LSYS) instances on a single SRX Series device.
3. Define and enforce policies for controlling usage of specific applications such as
Facebook, instant messaging, and embedded social networking widgets through
included AppFW management.
4. Reuse security policies within Junos Space Security Director for improved security
enforcement accuracy, consistency, and compliance.
5. Build the infrastructure for further management innovation across the network through
open and secure Junos Space Network Management Platform integration.
When you finish creating and verifying your security configurations, you can publish these
configurations and keep them ready to be pushed to the security devices. Security Director
helps you deploy all the security configurations to the devices all at once by providing a
single interface that is intuitive. You can select all security devices that you are using on
the network and push all security configurations to these devices.
The Security Director application is divided into seven workspaces, which include Object
Builder, Firewall Policy, NAT Policy, VPN, Downloads, IPS Management, and Security
Director Devices.
• Object Builder—Workspace to create objects used for firewall policy, NAT policy, and
VPN configurations.
• Firewall Policy— Workspace to create and publish firewall policies on supported devices.
To discover network devices, Junos Space uses the SSH and SNMP protocols. Device
authentication is handled through administrator login SSH v2 credentials and SNMP
v1/v2c settings, which are part of the device discovery configuration. You can specify a
single IP address, a DNS hostname, an IP range, or an IP subnet to discover devices on a
network. During discovery, Junos Space connects to the physical device and retrieves the
active configuration and the status information of the device. To connect with and
configure devices, Junos Space uses the Juniper Networks Device Management Interface
(DMI), which is an extension to the NETCONF network configuration protocol. When
discovery succeeds, Junos Space creates an object in the Junos Space database to
represent the physical device and maintains a connection between the object and the
physical device so their information is linked.
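The discovery targets described above (a single IP address, an IP range, or an IP subnet) all resolve to a list of addresses to probe. The following minimal Python sketch shows that expansion. The function name expand_target and the dash-separated range syntax are assumptions made for illustration, not the Junos Space input format. DNS hostnames are left out because they would first need name resolution.

```python
import ipaddress


def expand_target(target):
    """Expand a discovery target into a list of IPv4 address strings.

    Accepts a single address ("10.0.0.1"), a subnet ("10.0.0.0/30"), or a
    dash-separated range ("10.0.0.1-10.0.0.3"). The range syntax is an
    assumption made for this sketch.
    """
    if "-" in target:
        first, last = (
            ipaddress.IPv4Address(part.strip()) for part in target.split("-", 1)
        )
        if last < first:
            raise ValueError("range end precedes range start")
        return [
            str(ipaddress.IPv4Address(value))
            for value in range(int(first), int(last) + 1)
        ]
    if "/" in target:
        # hosts() leaves out the network and broadcast addresses.
        network = ipaddress.ip_network(target, strict=False)
        return [str(host) for host in network.hosts()]
    return [str(ipaddress.IPv4Address(target))]


print(expand_target("10.0.0.0/30"))        # prints ['10.0.0.1', '10.0.0.2']
print(expand_target("10.0.0.1-10.0.0.3"))  # prints ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```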
Once you have added the device, you might get a mismatched DMI version
(Figure 197 on page 319). A DMI version mismatch requires that the DMI be updated to ensure
that the management schema is compatible between Junos Space and the managed
devices.
Once devices are discovered, navigate to the Security Director’s Device pane. Here, you
can see the status of security devices on the network. To sync the device settings with
Junos Space and Security Director, right-click a security device and click Update. This
will import the configuration from the security device and sync the configuration with
the Security Director database.
NOTE: To create zones on Juniper Networks security devices, use the Network
Management Platform. The version of Security Director that was current at test
completion (13.1P1.14) does not support zone creation.
This might be addressed in future versions of Security Director.
• Application signatures
• Extranet devices
• NAT pools
• Policy profiles
• VPN profiles
• Variables
Figure 200 on page 322 is an example of address object creation using Security Director.
Custom services, NAT pools, devices, and so on can be created in a similar fashion. To
create an object, follow these basic steps:
2. Go to Addresses.
3. Click on the Plus sign to create a new address on the right side.
NOTE: New address objects can also be created under Firewall Policy.
• All Devices—Predefined firewall policy that is available with Security Director. You can
add pre-rules and post-rules. When all the device policy configuration information is
updated on the devices, the rules are updated in the following order:
• Group pre-rules
• Device-specific rules
• Group post-rules
An All Devices policy enables rules to be enforced globally on all the devices managed
by Security Director.
• Group—Type of firewall policy that is shared with multiple devices. This type of policy
is used when you want to update a specific firewall policy configuration to a large set
of devices. You can create group pre-rules, group post-rules, and device rules for a
group policy. When a group firewall policy is updated on the devices, the rules are
updated in the following order:
• Group pre-rules
• Device-specific rules
• Group post-rules
• Device Policy—Type of firewall policy that is created per device. This type of policy is
used when you want to push a unique firewall policy configuration per device. You can
create device rules for a device firewall policy.
• Global Policy—Global Policy Rules are enforced regardless of ingress or egress zones;
they are enforced on any device transit. Any objects defined in the Global Policy Rules
must be defined in the global address book.
The basic settings of a firewall policy are obtained from the policy profile in Security
Director. The basic settings include log options, firewall authentication schemes, and
traffic redirection options.
All device pre-rules and post-rules are applicable to all security devices. Once pre-rules
are published, these rules are applied to all managed security devices. Security Policy
post-rules are published to the security device and can be used to overwrite device-specific
post-rules already deployed on the security device.
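The rule ordering described above (group pre-rules, then device-specific rules, then group post-rules) can be sketched as a simple list concatenation followed by a first-match lookup. This is a minimal Python illustration and not Security Director code. The rule dictionaries, the function names, and the default action when nothing matches are assumptions made for this example, as is the premise that the device evaluates rules in the same order in which they are listed.

```python
def effective_rules(group_pre, device_rules, group_post):
    """Concatenate the rule lists in the documented order."""
    return list(group_pre) + list(device_rules) + list(group_post)


def first_match(rules, packet):
    """Return (name, action) of the first rule whose match fields equal the packet's."""
    for rule in rules:
        if all(packet.get(key) == value for key, value in rule["match"].items()):
            return rule["name"], rule["action"]
    return None, "deny"  # assumed default when no rule matches


pre = [{"name": "block-telnet", "match": {"app": "telnet"}, "action": "deny"}]
device = [{"name": "Test-1", "match": {"zone": "trust", "app": "http"}, "action": "permit"}]
post = [{"name": "catch-all", "match": {}, "action": "deny"}]

rules = effective_rules(pre, device, post)
print(first_match(rules, {"zone": "trust", "app": "http"}))    # prints ('Test-1', 'permit')
print(first_match(rules, {"zone": "trust", "app": "telnet"}))  # prints ('block-telnet', 'deny')
```

In this model a pre-rule always takes precedence over a device-specific rule, and a post-rule applies only to traffic that no earlier rule matched.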
The general steps that must be followed to create a new security policy are:
4. Click on Create.
The new policy created is displayed in the right pane. An option is also included to save
the policy and validate the configuration (ensure that the configuration does not contain
errors). After policy creation, you need to publish or publish and update the policy to the
security device.
• Publish policy will push the policy to the Junos Space database. This will also validate
the configuration.
• Publish and update will push the policy to the security device. This is often preferred
as a means of provisioning multiple devices during short maintenance windows as this
feature publishes the policy to the Junos Space database, validates that the
configuration has no errors, and then updates the managed security devices with
the new configuration.
Figure 201 on page 324 shows an example of policy creation after importing a configuration
from a managed security device. The steps that were followed in this example are:
2. The right pane shows a list of policy names and all existing policies.
4. Click on the plus (+) sign to add a new rule or the minus (-) sign to delete an existing rule. In
this example, we are adding a new rule (Test-1). Click the + sign.
5. While modifying the address, select an address from the existing address book or click
on the plus (+) sign to add a new address.
6. Once you create the rule, use the up or down arrow to move the rules up or down.
• Source NAT—Translates the source IP address of a packet leaving the trust zone
(outbound traffic). It translates the traffic originating from the device in the trust zone.
Using source NAT, an internal device can access the network by using the IP addresses
specified in the NAT policy.
Junos Space Security Director provides you with a workflow where you can create and
apply NAT policies on devices in a network. To create a NAT policy:
2. Click Create NAT Policy from the left pane. You can create a group policy or a device
policy.
3. To create a group policy, enter the name of the group policy, a description, and the assigned
devices for which policies have been configured.
You can also search for the devices by entering the device name, device IP address, or
device tag in the Search field in the Select Devices section. The above steps can also be
used to create device NAT policies. The Validate button will check the NAT policies for
errors. If any errors are found during the validation, a red warning icon is shown for the
policy or policies containing errors. In the case of NAT policies, incomplete rules and
duplicate rule names are flagged as errors during validation. Note that an existing
policy must be locked before any changes can be attempted. NAT policies can also be
rearranged (moved up or down) using the arrow keys.
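The two validation errors named above, incomplete rules and duplicate rule names, can be illustrated with a minimal Python sketch. The field names in REQUIRED_FIELDS and the function validate_nat_rules are hypothetical. They stand in for whatever Security Director checks internally and are not its actual schema.

```python
from collections import Counter

# Hypothetical set of fields that a complete NAT rule must carry.
REQUIRED_FIELDS = ("name", "source", "destination", "translation")


def validate_nat_rules(rules):
    """Return a list of error strings for duplicate names and incomplete rules."""
    errors = []
    names = Counter(rule.get("name") for rule in rules if rule.get("name"))
    for name, count in sorted(names.items()):
        if count > 1:
            errors.append(f"duplicate rule name: {name}")
    for index, rule in enumerate(rules):
        missing = [field for field in REQUIRED_FIELDS if not rule.get(field)]
        if missing:
            errors.append(f"rule {index} incomplete: missing {', '.join(missing)}")
    return errors


nat_rules = [
    {"name": "r1", "source": "172.16.4.0/24", "destination": "any", "translation": "pool-1"},
    {"name": "r1", "source": "172.16.9.0/24", "destination": "any", "translation": "pool-1"},
    {"name": "r2", "source": "172.16.4.0/24", "destination": "any"},
]
for error in validate_nat_rules(nat_rules):
    print(error)
# prints:
# duplicate rule name: r1
# rule 2 incomplete: missing translation
```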
2. Right-click the policy on the right side that you want to publish and click Assign devices.
3. Select the Schedule at a later time check box if you want to schedule and publish the
configuration later.
4. Click Next.
5. To preview the configuration changes that will be pushed to the device, click View
Configuration in XML format > XML configuration tab.
6. Click Close.
8. Click Publish and Update if you want to publish and update the devices with the
configuration.
You can view any job under Jobs for the status. Figure 203 on page 326 shows two NAT
policies configured on the selected firewall.
Typical jobs in Junos Space Network Management Platform include device discovery,
deploying services, pre-staging devices, and performing functional and configuration
audits. Jobs can be scheduled to occur immediately or in the future. For all jobs scheduled
in Junos Space Network Management Platform, you can view job status from the Jobs
workspace. Junos Space Network Management Platform maintains a history of job status
for all scheduled jobs. When a job is scheduled from a workspace, Junos Space Network
Management Platform assigns a job ID that serves to identify the job (along with the job
type) in the Manage Jobs inventory page.
You can perform the following tasks from the Jobs workspace:
• View statistics about average execution times for jobs, types of jobs that are run, and
success rate
• Cancel a scheduled job or in-progress job (when the job has stalled and is preventing
other jobs from starting).
• Archive old jobs and purge them from the Junos Space Network Management Platform
database.
Administrators can sort and filter on audit logs to determine which users performed what
actions on what objects at what time. For example, an Audit Log administrator can use
audit log filtering to track the user accounts that were added on a specific date, track
configuration changes across a particular type of device, view services that were
provisioned on specific devices, or monitor user login/logout activity over time. To use
the audit log service to monitor user requests and track changes initiated by users, you
must be assigned the Audit Log Administrator role.
Over time, the Audit Log Administrator will archive a large volume of Junos Space Network
Management Platform log entries. Such log entries might or might not be reviewed, but
they must be retained for a period of time. The Archive Purge feature helps you manage
your Junos Space Network Management Platform log volume, allowing you to archive
log files and then purge those log files from the Junos Space Network Management
Platform database. For each Archive Purge operation, the archived log files are saved in
a single file, in CSV format. The audit logs can be saved to a local server (the server that
functions as the active node in the Junos Space Network Management Platform fabric),
or a remote network host or media. When you archive data to a local server, the archived
log files are saved to the default directory /var/lib/mysql/archive.
The Audit Logs Export feature enables you to download audit logs in CSV format so that
you can view the audit logs in a separate application or save them on another machine
for further use, without purging them from the system.
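The difference between Archive Purge and export can be sketched in a few lines of Python: an archive-and-purge pass writes the selected entries to a single CSV file and removes them from the live log, whereas an export would write the CSV and leave the log untouched. The entry fields (timestamp, user, action), the cutoff-date selection, and the function name archive_and_purge are assumptions made for illustration. The sketch does not reflect the Junos Space database schema.

```python
import csv
import io
from datetime import datetime


def archive_and_purge(entries, cutoff, out):
    """Write entries older than `cutoff` to `out` as CSV and return the entries kept."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "user", "action"])
    kept = []
    for entry in entries:
        if entry["timestamp"] < cutoff:
            writer.writerow(
                [entry["timestamp"].isoformat(), entry["user"], entry["action"]]
            )
        else:
            kept.append(entry)
    return kept


audit_log = [
    {"timestamp": datetime(2014, 1, 5, 9, 0), "user": "admin", "action": "login"},
    {"timestamp": datetime(2015, 3, 1, 14, 30), "user": "ops", "action": "publish policy"},
]
archive = io.StringIO()  # stands in for the archive file
remaining = archive_and_purge(audit_log, datetime(2015, 1, 1), archive)
print(len(remaining))  # prints 1
```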
This section provides a high-level overview of test results and a summary of target and
actual scale and performance vectors achieved during the solution testing.
At a high level, the solution met or exceeded each of the following solution scale and
performance goals:
General
• Solution must allow compute nodes to use all available network links for forwarding.
• Solution must support orchestration and movement of virtual resources between the
PODs.
• Solution must support an out-of-band (OOB) management network that can survive
the failure of the data plane within a POD.
Security
• Solution must provide redundant network connectivity using all available bandwidth.
• Solution must provide physical and virtual visibility and reporting of VM movements.
Application Scaling
Network Management
• Solution must support monitoring and provisioning using Junos Space and Network
Director 1.5.
There were some exceptions to the solution requirements that were discovered during
testing. An overview of these exceptions can be found in the “Known Issues” section of
this chapter.
Scale
Known Issues
There were several persistent issues in testing that caused the test results to fall outside
of the solution requirements. These issues include:
• This is a limitation of the storage array and the iSCSI protocol. iSCSI does support
multi-path in the event of failure.
• In-service software upgrade (ISSU) is not supported by the MX Series if the chassis is
populated with all 10-Gbps line cards. A 1-Gbps line card is required in order to start
ISSU.
• Graceful Routing Engine switchover (GRES) requires a 6-second BFD timer to work
properly on both the MX and EX Series. This is a known issue on MX Series (and now
on EX/QFabric).
• Junos Firefly Host does not support Application Layer Gateways (ALGs).
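For the GRES item above, it can help to see how a BFD detection time is derived. In asynchronous mode, the detection time is the negotiated interval multiplied by the detection multiplier. The following minimal Python sketch assumes that the 6-second figure refers to the detection time. The interval and multiplier pairs are example values and not the values used in the test bed.

```python
def bfd_detection_time_ms(min_interval_ms, multiplier):
    """Detection time is the negotiated interval times the detection multiplier."""
    return min_interval_ms * multiplier


# Two example combinations that yield a 6-second detection time.
print(bfd_detection_time_ms(2000, 3))  # prints 6000
print(bfd_detection_time_ms(1500, 4))  # prints 6000
```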