© 2016 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information. Page 1 of 132
Contents
Overview ................................................................................................................................................................... 4
Cisco Services ...................................................................................................................................................... 4
Target Audience.................................................................................................................................................... 5
Prerequisites ......................................................................................................................................................... 5
Introduction .............................................................................................................................................................. 5
Deployment Models for Interconnecting Cisco ACI Fabrics .................................................................................. 6
Cisco ACI Stretched Fabric ............................................................................................................................... 6
Cisco ACI Dual-Fabric Design .......................................................................................................................... 7
Cisco ACI Dual-Fabric Design Overview ............................................................................................................... 8
Reference Topology .............................................................................................................................................. 8
Cisco Data Center Interconnect ............................................................................................................................ 9
vPC as DCI Transport ..................................................................................................................................... 12
OTV as DCI Transport .................................................................................................................................... 13
VXLAN as DCI Transport ................................................................................................................................ 15
Dual-Fabric Layer 2 and Layer 3 Connectivity .................................................................................................... 17
Layer 2 Reachability Across Sites................................................................................................................... 17
Layer 3 Reachability Across Sites................................................................................................................... 23
Policy Consistency Across Cisco ACI Fabrics .................................................................................................... 26
Policy Consistency for Layer 2 Communication .............................................................................................. 27
Policy Consistency for Layer 3 Communication .............................................................................................. 28
Hypervisor Integration ......................................................................................................................................... 30
L4-L7 Service Integration Models ....................................................................................................................... 31
Data Center Firewall Deployment ................................................................................................................... 31
Cisco ASA Deployment Models ...................................................................................................................... 31
Cisco ASA Active-Standby Deployment ..................................................................................................... 31
Cisco ASA Cluster Deployment .................................................................................................................. 33
Multitenancy Support .......................................................................................................................................... 35
Cisco UCS Director and Cisco ACI Dual-Fabric Design ..................................................................................... 37
Storage Considerations ...................................................................................................................................... 40
Cisco ACI Dual-Fabric: Deployment Details ........................................................................................................ 40
Validated Topology ............................................................................................................................................. 40
Cisco ACI Fabric ............................................................................................................................................. 42
Firewalls .......................................................................................................................................................... 42
Data Center Interconnect ................................................................................................................................ 44
WAN Connectivity ........................................................................................................................................... 46
Logical Traffic Flow ............................................................................................................................................. 47
Traffic from DC1 to the WAN .......................................................................................................................... 47
Traffic from the WAN to DC1 .......................................................................................................................... 50
Traffic from the WAN to Data Center for Stretched Subnets ........................................................................... 52
Routed Traffic from DC1 to DC2 ..................................................................................................................... 54
Dual-Fabric Layer 2 and Layer 3 Connectivity .................................................................................................... 55
Deploying Layer 2 Connectivity Between Sites ............................................................................................... 55
Deploying Layer 3 Connectivity Between Sites ............................................................................................... 59
Deploying Hypervisor Integration ........................................................................................................................ 68
Cisco ASA Cluster Integration in a Cisco ACI Dual-Fabric Design ..................................................................... 71
Cisco ASA Cluster Configuration: Admin Context ........................................................................................... 72
ASA Cluster Configuration: Tenant Context .................................................................................................... 77
WAN Integration Considerations ......................................................................................................................... 84
North-South Traffic Flows ............................................................................................................................... 84
Deploying VXLAN as a DCI Solution .................................................................................................................. 89
Testing and Results ............................................................................................................................................... 99
Traffic Generator: Emulated Device Configuration .............................................................................................. 99
Traffic Generator: Streams................................................................................................................................ 100
Testing Overview .............................................................................................................................................. 100
Results Summary .............................................................................................................................................. 101
Test Results: Worst Affected Flows Only .......................................................................................................... 101
Link from ACI Leaf 1 in DC1 to the local Nexus 9300 VXLAN DCI device .................................................... 102
Nexus 9300 VXLAN DCI device node failure ................................................................................................ 104
Peer link failure between the Nexus 9300 DCI devices ................................................................................ 106
Cisco ASA Cluster Member Failure (Slave Node in DC1) ............................................................................. 109
Cisco ASA Cluster Member Failure (Master Node)........................................................................................... 114
Cisco ASA Cluster Member Failure (Slave Node DC2) .................................................................................... 118
Customer edge router: link with ACI fabric failure ............................................................................................. 122
Customer Edge Router WAN Link Failure ........................................................................................................ 125
Cisco ACI Border Leaf Node Failure ................................................................................................................. 128
Cisco ACI Spine Node Failure .................................................................................................. 130
Conclusion ........................................................................................................................................................... 131
Demonstrations of the Cisco ACI Dual Fabric Design ....................................................................................... 132
For More Information ........................................................................................................................................... 132
Overview
In the past few years, a new requirement has emerged for enterprises and service providers: they must provide a
data center environment that is continuously available. Customers expect applications to always be available, even
if the entire data center experiences a failure.
Enterprises and service providers also commonly need to be able to place workloads in any data center where
computing capacity exists. And they often need to distribute members of the same cluster across multiple data
center locations to provide continuous availability in the event of a data center failure.
To achieve such a continuously available and highly flexible data center environment, enterprises and service
providers are seeking an active-active architecture.
When planning an active-active architecture, you need to consider both active-active data centers and active-active
applications. To have active-active applications, you must first have active-active data centers. When you have
both, you have the capability to deliver new service levels by providing a continuously available environment.
A continuously available, active-active, flexible environment provides several benefits to the business:
● Increased uptime: A fault in a single location does not affect the capability of the application to continue to
perform in another location.
● Disaster avoidance: Shift away from disaster recovery and prevent outages from affecting the business in
the first place.
● Easier maintenance: Taking down a site (or a part of the computing infrastructure at a site) for maintenance
should be easier, because virtual or container-based workloads can be migrated to other sites while the
business continues to deliver service nondisruptively during the migration and while the site is down.
● Flexible workload placement: All the computing resources on the sites are treated as a resource pool,
allowing automation, orchestration, and cloud management platforms to place workloads anywhere, more
fully utilizing resources. Affinity rules can be set up on the orchestration platforms so that the workloads are
co-located on the same site or forced to exist on different sites.
● Extremely low recovery time objective (RTO): A zero or nearly zero RTO reduces or eliminates
unacceptable impact on the business of any failure that occurs.
This document provides a guide to designing and deploying Cisco® Application Centric Infrastructure (Cisco ACI™)
in two data centers in an active-active architecture that delivers the benefits listed here.
The design presented in this document helps enterprises and service providers achieve a fully programmable,
software-defined multiple–data center infrastructure that reduces total cost of ownership (TCO), automates IT
tasks, and accelerates data center application deployments.
Cisco Services
Cisco Services offerings are available to assist with the planning, design, deployment, support, optimization, and
operation of the solution described in this document.
Effective design and deployment is essential to reduce risk, delays, and the total cost of adopting an active-active
architecture.
For an overview of Cisco Services for Cisco ACI, see Services for Cisco Application Centric Infrastructure and
Cisco Nexus 9000 Series Switches.
Target Audience
The target audience for this document includes network and systems engineers and cloud, data center, and other
solution architects who are involved in the design of active-active data centers.
Prerequisites
To best understand the design presented in this document, the reader should have a basic knowledge of Cisco ACI
and of how it is designed and operated at a single site.
For more information, see the Cisco ACI white papers available at Cisco.com.
Introduction
This document explains the use of two independent Cisco ACI fabrics deployed on two data centers that are
interconnected through Cisco Data Center Interconnect (DCI) technologies such as virtual port channel (vPC),
Virtual Extensible LAN (VXLAN), and Overlay Transport Virtualization (OTV) at Layer 2 and Layer 3 to support an
active-active dual-data center design.
The goal of such a design is to support an active-active architecture and deliver its benefits as described in the
“Overview” section of this document. The design discussed in this document enables interconnection of two Cisco
ACI fabrics, each managed with a separate and dedicated Cisco Application Policy Infrastructure Controller (APIC)
cluster.
Each site contains Cisco Nexus 9000 Series Switches used as leaf and spine switches in the Cisco ACI fabric. The
number and models of the Cisco Nexus 9000 Series Switches used at each site are independent from each other.
For example, if the primary site requires more leaf switches than the secondary site, you can deploy Cisco Nexus
9500 platform modular spine switches at the primary site and deploy Cisco Nexus 9336PQ Switches as fixed spine
switches at the other site (the Cisco Nexus 9500 platform supports more interfaces, and so supports more leaf
switches, than the Cisco Nexus 9336PQ).
The Cisco ACI fabrics deployed in each data center location can be interconnected using the following DCI
technologies:
● Virtual port channel: You should use vPC technology only to interconnect two Cisco ACI fabrics in a point-
to-point manner. It requires the use of dark fibers or dense wavelength-division multiplexing (DWDM)
circuits between the fabrics. Connecting more than two sites with vPC is usually not recommended.
● Virtual Extensible LAN and Overlay Transport Virtualization: VXLAN and OTV are multipoint technologies
that you can use to interconnect more than two Cisco ACI fabrics; however, the focus of this document is on
a design with just two sites. With VXLAN and OTV, IP connectivity is required between the sites, and
connections can cross multiple Layer 3 devices (for example, some WAN routers).
Each Cisco ACI fabric is managed and configured independently from any other Cisco ACI fabric. The APIC cluster
on each site serves as the central point of management and operations for each fabric.
When operating two (or more) independent Cisco ACI fabrics, you need to synchronize policy across the sites to
provide an active-active architecture and support flexible workload placement and virtual machine live migration
(using VMware vMotion or similar technology). The process for achieving automated policy synchronization is
described later in this document.
To provide a true active-active architecture, in addition to supporting it from a network perspective, the design also
needs to integrate Layer 4 through Layer 7 (L4-L7) services such as firewalls and server load balancers. The Cisco
Adaptive Security Appliance (ASA) supports multisite active-active firewall clustering with sites located hundreds of
miles (or kilometers) apart, and so this design uses ASA firewalls.
Note: Other firewall solutions with similar active-active capabilities can be used but are outside the scope of this
document.
Virtual machine manager (VMM) integration is performed on a per-site basis. The virtual machine controller, such
as VMware vCenter, VMware vShield, or Microsoft System Center Virtual Machine Manager (SCVMM), is
integrated with the APIC cluster on each site. For example, vCenter in Site 1 would be integrated with the APIC
cluster in Site 1. VMware vSphere Release 6.0 and later adds a new feature, called Cross vCenter vMotion, that
supports the live migration of virtual machines between vCenter server instances. The design presented in this
document supports this new feature as one of the technologies for providing an active-active architecture.
Storage synchronization between the sites is an important consideration in an active-active architecture. Although
detailed deployment of a storage solution is outside the scope of this document, some overall considerations are
presented in the “Storage Considerations” section of this document.
Cisco ACI Stretched Fabric
The stretched fabric is managed by a single APIC cluster, consisting of three APIC controllers, with two APIC
controllers deployed at one site and the third deployed at the other site. The use of a single APIC cluster stretched
across both sites, a shared endpoint database synchronized between spines at both sites, and a shared control
plane (IS-IS, COOP, and MP-BGP) defines and characterizes a Cisco ACI stretched fabric deployment.
Note: This document does not cover the Cisco ACI stretched fabric deployment model. For more information
about this model, please refer to the following document:
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html.
Cisco ACI Dual-Fabric Design
A dual-fabric design has an APIC cluster per site, and each cluster includes three (or more) APIC controllers. The
APIC controllers at one site have no direct relationship or communication with the others at other sites. The use of
an APIC cluster at each site, independent from other APIC clusters, with an independent endpoint database and
independent control plane (using IS-IS, COOP, and MP-BGP) per site, defines a Cisco ACI dual-fabric design.
Figure 3. Cisco ACI Dual-Fabric Design
This document focuses on the design and deployment of a Cisco ACI dual-fabric design.
Reference Topology
The validated Cisco ACI dual-fabric design consists of two Cisco ACI fabrics, one per site, interconnected through
one of the following DCI options: back-to-back vPC over dark fiber, back-to-back vPC over DWDM, or VXLAN or
OTV (Figure 4).
Each fabric is composed of Cisco Nexus 9000 Series spine and leaf switches, and each site has an APIC cluster
consisting of three or more APIC controllers. Between the sites, over the DCI links, Layer 2 is extended by
configuring a static endpoint group (EPG) binding that extends an EPG to the other site using the DCI technology.
At the remote site, a static binding using the same VLAN ID maps the incoming traffic to the correct EPG.
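The static binding described above is an object configured under the EPG on the APIC. As a sketch, the REST payload resembles the following; the tenant, application profile, node, interface, and VLAN values are illustrative assumptions, while the fvRsPathAtt object and its attributes follow the APIC object model:

```xml
<!-- POST to /api/mo/uni/tn-Tenant1/ap-App1/epg-Web.xml (illustrative DN) -->
<fvAEPg name="Web">
  <!-- Statically extend the EPG on VLAN 100 over the border leaf port toward the DCI device -->
  <fvRsPathAtt tDn="topology/pod-1/paths-101/pathep-[eth1/47]"
               encap="vlan-100" mode="regular"/>
</fvAEPg>
```

The remote fabric applies a mirror-image binding with the same VLAN ID (vlan-100 in this sketch), so traffic arriving over the DCI is classified into the matching EPG.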
For Layer 3 connectivity between the sites, Exterior BGP (eBGP) peering is established between the border leaf
switches. Each Cisco ACI fabric is configured with a unique autonomous system number (ASN). Over this eBGP
peering system, IP prefixes relative to subnets that are locally defined at each site are advertised.
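Conceptually, the per-fabric eBGP peering resembles the following NX-OS-style sketch. Note that on Cisco ACI this peering is actually defined through an L3Out on the border leaf switches in the APIC rather than through this CLI, and the ASNs and addresses shown are illustrative assumptions:

```
! Fabric 1 border leaf (conceptual view only; configured via an APIC L3Out, not CLI)
router bgp 65001
  router-id 10.0.0.1
  ! Peer with the Fabric 2 border leaf, which uses a different ASN (65002)
  neighbor 192.168.100.2 remote-as 65002
    address-family ipv4 unicast
```

Because each fabric uses a unique ASN, standard eBGP loop prevention applies, and only locally defined subnets are advertised from each site.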
For the perimeter firewall, to handle north-south communication (WAN to data center and data center to WAN), the
reference topology presented in this document deploys an active-active ASA cluster, with two ASA devices at each
site. The topology also has been validated using an active-standby firewall design with, for example, the active
ASA at Site 1 and the standby ASA at Site 2. In both cases, the firewalls are inserted without a service graph;
instead, IP routing is used between the Cisco ACI fabric and the firewalls through a Layer 3 outside (L3Out)
connection, with OSPF as the routing protocol.
The ASA cluster solution is better suited for an active-active architecture, because north-south communication is
through the local ASA nodes for IP subnets that are present at only one of the sites. When an ASA cluster is used,
the cluster-control-link (CCL) VLAN is extended through the DCI links. For traffic to subnets that exist at both sites,
if traffic entering through Site 1 needs to be sent to a host in Data Center 2, intracluster forwarding keeps flows
symmetrical for the IP subnets present at both sites.
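A minimal sketch of the corresponding ASA cluster bootstrap on one unit in DC1 follows. The group name, unit name, CCL addressing, and site ID are illustrative assumptions (the site-id command requires ASA Software Release 9.5(1) or later, as discussed later in this document):

```
interface Port-channel1
 description Cluster control link (CCL VLAN extended across the DCI)
!
cluster group DC-CLUSTER
 local-unit asa-dc1-1
 cluster-interface Port-channel1 ip 10.255.1.1 255.255.255.0
 site-id 1
 priority 1
 enable
```

Each cluster member uses a unique local-unit name and CCL address; extending the CCL VLAN over the DCI links is what allows members at both sites to join the same cluster.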
For inter-EPG filtering, Cisco ACI contracts were used during the validation process. A stateful firewall can also be
used for inter-EPG communication, but this option is beyond the scope of this document.
The ASA uses the Open Shortest Path First (OSPF) peering shown in the reference topology between the ASA
firewalls and the WAN edge routers to learn about the external networks and to advertise to the WAN edge devices
the subnets that exist in the Cisco ACI fabric. A detailed discussion of packet flow is provided in the section
“Logical Traffic Flow” later in this document.
Between the WAN edge routers and the WAN, the reference design uses eBGP because it provides demarcation
of the administrative domain and provides the option to manipulate routing policy.
The Cisco ACI dual-fabric design supports multitenancy. In the WAN edge routers, Virtual Routing and Forwarding
(VRF) provides logical isolation between the tenants, and within each VRF instance an OSPF neighborship is
established with the ASA firewall. In the ASA firewall, multiple contexts (virtual firewalls) are created, one per
tenant, so that the tenant separation is preserved. Tenant separation is maintained by creating multiple tenants in
the Cisco ACI fabric and extending Layer 3 connectivity to the firewall layer by using per-tenant (VRF) logical
connections (Layer 3 outside [L3Out] connections). Per-tenant eBGP sessions are also established between the
Cisco ACI fabrics, effectively creating multiple parallel eBGP sessions between the fabrics in a VRF-lite model
over the DCI extension.
Cisco Data Center Interconnect
Three DCI options are proposed (Figure 5):
● One very simple option, limited to dual-site deployments, uses vPC. In this case, the border leaf switches of
both fabrics are simply connected back to back using either dark fiber or DWDM connections.
● The second option uses the most popular DCI technology: OTV. It still uses vPC to connect to the fabric,
but it uses a Layer 3 routed connection over the core network.
● The third option is still emerging. It uses VXLAN technology to offer Layer 2 extension services across sites.
Both OTV and VXLAN allow you to interconnect more than two sites together. This document focuses on
interconnection of two sites, but technically you can connect more sites (if you have more than two sites, contact
your account team to be sure that you have full support).
Whatever technology is chosen for the interconnection, the DCI function must meet a set of requirements.
Remember that the aim of DCI is to allow transparency between sites with high availability: that is, to allow open
Layer 2 and Layer 3 extension while helping ensure that a failure in one data center is not propagated to another
data center.
To meet this goal, the main technical requirement is the capability to control Layer 2 broadcast, unknown unicast,
and multicast flooding at the data-plane level while helping ensure control-plane independence.
Layer 2 extension must be dual-homed for redundancy, but without allowing the creation of end-to-end Layer 2
loops that can lead to traffic storms that can overflow links and saturate the CPUs of switches and virtual
machines.
Thus, DCI deployments must also complement support for Layer 2 extension with storm control (Figure 6).
The storm-control rate limiter must be tuned to a value determined mainly by how much broadcast load the virtual
machine CPUs can tolerate. This value also depends on the servers used and on the ratio of physical to virtual servers, because
each virtual machine receives the broadcast flow. A good starting point is to rate limit broadcast, unknown unicast,
and multicast traffic to 100 Mbps, which is 1 percent of a 10-Gbps link.
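On a standalone Cisco NX-OS interface, that 1 percent starting point would look like the following sketch; on Cisco ACI border leaf ports the equivalent policy is applied through the APIC (see the Note later in this section), and the interface name here is an illustrative assumption:

```
interface port-channel11
  description vPC member toward the DCI layer
  ! level 1.00 = 1 percent of the link rate (100 Mbps on a 10-Gbps link);
  ! "unicast" in this context suppresses unknown unicast traffic
  storm-control broadcast level 1.00
  storm-control multicast level 1.00
  storm-control unicast level 1.00
```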
Cisco ACI border leaf nodes can rate-limit broadcast, unknown unicast, and multicast traffic at ingress from the DCI
vPC. This rate limiting is very specific, allowing a different limitation to be applied to different types of traffic.
● Broadcast traffic must be strictly limited, because it is the type of traffic that most heavily loads the CPUs.
● Unknown unicast traffic also must be strictly limited. Under normal conditions, the amount of this type of
traffic should be small in a network, because Address Resolution Protocol (ARP) exchanges cause remote
MAC addresses to be learned.
● Layer 2 multicast traffic is more difficult to limit because some applications will be using it. If possible, you
should limit the intersite multicast traffic because this traffic can access the CPUs of multiple virtual
machines and switches at the same time. However, you must verify the amount of legitimate multicast traffic
needed in each specific network environment to avoid degrading the applications that rely on it.
Note: For more information about how to configure storm control on the APIC, please refer to this link:
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_KB_Configuring_Traffic_Storm_Control
_in_APIC.html.
Another consideration is the speed of the DCI links. In a metropolitan environment, with short distances between
the sites, an active-active data center configuration, and live migration of virtual machines between the sites, the
amount of bandwidth can be high, and two or more 10-Gbps connections may be needed. In the case of disaster
recovery, with a long distance between the data centers, the amount of bandwidth is dictated mainly by the time
needed to replicate the data.
In addition, the Layer 2 transport must adapt to the core transport, which can be Multiprotocol Label Switching
(MPLS), IP, or DWDM, helping ensure fast convergence, flapping protection, and if possible, path diversity to allow
the heartbeats of server clusters to be transported on different physical paths.
Latency considerations are also part of the general DCI analysis. In the testing performed for the architecture
presented in this document, no latency was added between sites, but the basic assumption is that high latency is
supported because both Cisco ACI fabrics are totally independent from each other, and the only control planes
between them use either BGP or learning bridges (both supported over long distances).
Therefore, the latency considerations are not specific to the Cisco ACI dual-fabric model but instead are relative to
the other components of the solution. For example, in general virtual machine live migration is subject to limitations
introduced by the hypervisor. Traditionally, the maximum supported latency has been a round-trip time (RTT) of 10
milliseconds (ms), but this capability is evolving along with hypervisor-vendor recommendations (for example,
VMware supports RTT of up to 100 ms starting with vSphere Release 6.0).
An important consideration for applications is the location of the application data, with latency recommendations
based on storage replication. With asynchronous replication, there is no real limit, and so the limit depends on
disaster-recovery needs. Synchronous replication, however, has a strict limitation that depends on the deployed
technology. For example, EMC VPLEX and NetApp MetroCluster solutions support a maximum RTT latency limit of
10 ms.
Other types of clustering solutions deployed across data center sites may introduce similar limitations. For
example, server cluster extension is in general limited to 10 ms, but this limitation is evolving among cluster
vendors. Cluster extension considerations also apply to the deployment of active-active firewall solutions over
separate sites. Starting with Cisco ASA Software Release 9.5(1), ASA clusters are supported over two sites
deployed with 20 ms of RTT latency.
You must be sure to assess the impact that latency may have on applications deployed across data center sites.
You must especially be sure to assess the impact between the application tier and the database tier. In
deployments in which the latency between sites increases too much, the best approach usually is to deploy all
application tiers at the same site with local storage. In this case, you also will likely want applications to use
network services (such as firewalls and load balancers) that are locally deployed at the same site.
When planning DCI deployments, you also need to consider path optimization. The goal of this optimization is to
attract traffic from the WAN directly to the data center in which the requested resource is deployed. Two
technologies can be used for this optimization: Cisco Locator/ID Separation Protocol (LISP) and host-based
routing. Both methods can help ensure optimal inbound traffic delivery to endpoints that are part of IP subnets that
are extended across separate data center sites. Integration of these technologies with a Cisco ACI dual-fabric
design is not described in this document, and the access from the WAN analyzed in the “WAN Connectivity”
section of this document is traditional access with a firewall.
Figure 7. Cisco ACI Dual-Fabric Design with Back-to-Back vPC
You can use any number of links to form the back-to-back vPC, but for redundancy reasons, two is the minimum,
and this is the number validated in this document.
This dual-link vPC can use dark fiber. It can also use DWDM, but only if the DWDM transport offers high quality of
service. Because the transport in this case is ensured by Link Aggregation Control Protocol (LACP), you should not
rely on a link that offers only three 9s (99.9 percent) or less resiliency. In general, private DWDM with high
availability is good enough.
When using DWDM, you need to keep in mind that loss of signal is not reported. With DWDM, one side may stay
up while the other side is down. Cisco ACI allows you to configure Fast LACP to detect such a condition, and the
design reported in this document validates this capability to achieve fast convergence.
The core principles on which OTV operates are the use of a control protocol to advertise MAC address reachability
information (instead of using data-plane learning) and packet switching of IP encapsulated Layer 2 traffic for data
forwarding. OTV can be used to provide connectivity based on MAC address destinations while preserving most of
the characteristics of a Layer 3 interconnection.
Before MAC address reachability information can be exchanged, all OTV edge devices must become adjacent to
each other from an OTV perspective. This adjacency can be achieved in two ways, depending on the nature of the
transport network that interconnects the various sites. If the transport is multicast enabled, a specific multicast
group can be used to exchange control protocol messages between the OTV edge devices. If the transport is not
multicast enabled, an alternative deployment model is available starting from Cisco NX-OS Software Release
5.2(1). In this model, one OTV edge device (or more) can be configured as an adjacency server to which all other
edge devices register. In this way, the adjacency server can build a full list of the devices that belong to a given
overlay.
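The adjacency-server model can be sketched as follows; the class and method names are hypothetical and illustrate only the registration logic, not the actual NX-OS implementation:

```python
# Hedged sketch of the OTV adjacency-server model: edge devices register with
# the adjacency server, which maintains the full membership list of each
# overlay so that adjacencies can form without a multicast-enabled transport.
class AdjacencyServer:
    def __init__(self):
        self.overlays = {}  # overlay ID -> set of registered edge-device IPs

    def register(self, overlay_id: int, edge_ip: str) -> list:
        """Register an edge device and return the overlay's full member list."""
        members = self.overlays.setdefault(overlay_id, set())
        members.add(edge_ip)
        return sorted(members)
```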
An edge device forwards Layer 2 frames into and out of a site over the overlay interface. There is only one
authoritative edge device (AED) for all MAC unicast and multicast addresses for each given VLAN. The AED role is
negotiated, on a per-VLAN basis, among all the OTV edge devices that belong to the same site (that is, that are
characterized by the same site ID).
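The per-VLAN AED role assignment can be illustrated with a deterministic sketch. The modulo-based scheme below is an assumption used only to show how a consistent per-VLAN election spreads load across the site's edge devices; it is not the exact NX-OS algorithm:

```python
# Hedged sketch of per-VLAN AED election among OTV edge devices that share a
# site ID. Every device orders the member list identically, so each VLAN
# deterministically gets exactly one authoritative edge device.
def elect_aed(vlan_id: int, edge_devices: list) -> str:
    devices = sorted(edge_devices)           # same order on every device
    return devices[vlan_id % len(devices)]   # one AED per VLAN
```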
The internal interface facing the Cisco ACI fabric can be a vPC on the OTV edge device side. However, the
recommended attachment model uses independent port channels between each AED and the Cisco ACI fabric, as
shown in Figure 8.
Each OTV device defines a logical interface, called a join interface, that is used to encapsulate and decapsulate
Layer 2 Ethernet frames that need to be transported to remote sites.
OTV requires a site VLAN, which is assigned on each edge device that connects to the same overlay network.
OTV sends local hello messages on the site VLAN to detect other OTV edge devices in the site, and it uses the
site VLAN to determine the AED for the OTV-extended VLANs. Because OTV uses the IS-IS protocol for these
hello messages, the Cisco ACI fabric must run Software Release 11.1 or later. Previous releases
prevented the OTV devices from exchanging IS-IS hello messages through the fabric.
Note: An important benefit of the OTV site VLAN is the capability to detect a Layer 2 back door that may be
created between the two Cisco ACI fabrics. To support this capability, you should use the same site VLAN on both
Cisco ACI sites.
One of the main requirements of every LAN extension solution is Layer 2 connectivity between remote sites without
compromising the advantages of resiliency, stability, scalability, etc. obtained by interconnecting sites through a
routed transport infrastructure. OTV achieves this goal through four main functions:
● Spanning-tree isolation
● Unknown unicast traffic suppression
● ARP optimization
● Layer 2 broadcast policy control
OTV also offers a simple command-line interface (CLI), and it can easily be set up using a programming language
such as Python.
Because Ethernet frames are carried across the transport infrastructure after OTV encapsulation, you need to
consider the size of the maximum transmission unit (MTU).
As shown in Figure 9, OTV encapsulation increases the overall MTU size by 50 bytes. Consequently, you should
increase the MTU size of all the physical interfaces along the path between the source and destination endpoints to
account for those additional 50 bytes. An exception can be made when you are using the Cisco ASR 1000 Series
Aggregation Services Routers as the OTV platform, because these routers do support packet fragmentation.
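The MTU requirement can be expressed as simple arithmetic, using the 50-byte figure from this document:

```python
# OTV encapsulation overhead used in this document.
OTV_OVERHEAD_BYTES = 50

def required_transport_mtu(edge_mtu: int) -> int:
    """Minimum MTU every physical interface on the DCI path must support."""
    return edge_mtu + OTV_OVERHEAD_BYTES
```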
Note: The figure above shows the new OTV encapsulation available on the Cisco Nexus 7000 Series switches
starting from Cisco NX-OS Software Release 7.2 and on F3-Series line cards.
In summary, OTV is designed for DCI, and it is still considered the most mature and functionally robust solution for
extending multipoint Layer 2 connectivity over a generic IP network. In addition, it offers native functions that allow
a stronger DCI connection and increased independence of the fabrics.
VXLAN has a 24-bit virtual network identifier (VNI) field that theoretically allows up to 16 million unique Layer 2
segments in the same network. Although the current network software and hardware limitations reduce the usable
VNI scale in actual deployments, the VXLAN protocol by design lifts the 4096-VLAN limitation of the traditional
IEEE 802.1Q VLAN name space. It does so by decoupling Layer 2 domains from the network infrastructure. The
infrastructure is built as a Layer 3 fabric that doesn’t rely on Spanning Tree Protocol
for loop prevention or topology convergence. The Layer 2 domains reside on the overlay, with isolated broadcast
and failure domains.
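The difference between the two name spaces is straightforward to quantify:

```python
# 12-bit 802.1Q VLAN ID space versus the 24-bit VXLAN VNI space.
VLAN_ID_BITS = 12
VNI_BITS = 24

max_vlans = 2 ** VLAN_ID_BITS  # 4096 segments with traditional VLANs
max_vnis = 2 ** VNI_BITS       # 16,777,216 theoretical Layer 2 segments
```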
The VTEP is a switch (physical or virtual) that originates and terminates VXLAN tunnels. The VTEP encapsulates
the end-host Layer 2 frames within an IP header to send them across the IP transport network, and it decapsulates
VXLAN packets received from the underlay IP network to forward them to local end hosts. The communicating
workloads are unaware of the VXLAN function.
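The VTEP encapsulation described above prepends an 8-byte VXLAN header (per RFC 7348) ahead of the original Ethernet frame, inside outer UDP/IP headers (UDP port 4789). The following sketch builds and parses just that header:

```python
import struct

# Minimal sketch of the 8-byte VXLAN header: one flags byte (I flag set),
# three reserved bytes, a 3-byte VNI, and one trailing reserved byte.
def vxlan_header(vni: int) -> bytes:
    flags_word = 0x08 << 24                          # I flag: VNI is valid
    return struct.pack("!II", flags_word, vni << 8)  # VNI left of reserved byte

def parse_vni(header: bytes) -> int:
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8
```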
VXLAN is a multipoint technology and can allow the interconnection of multiple sites. In the solution proposed in
this document, a VXLAN standalone network simply offers Layer 2 extension services to the Cisco ACI fabrics.
This Layer 2 DCI function is used both to stretch Layer 2 broadcast domains (IP subnets) across sites and to
establish Layer 3 peering between Cisco ACI fabrics to support routed communication.
As shown in Figure 10, logical back-to-back vPC connections are used between the Cisco ACI border leaf nodes
and the local pair of VXLAN DCI devices. The two DCI devices are connected through a peer link and connect to the
fabric border leaf nodes using either two or four links. Each edge VLAN is then mapped to a VXLAN segment that
is transported using a single VNI (also called the VXLAN segment ID).
Figure 10. Using VXLAN as the DCI Option on Cisco Nexus 9000 in NX-OS Mode
The transport network between VTEPs can be a generic IP network. Unicast Layer 2 frames are encapsulated in
unicast Layer 3 VXLAN frames sent to the remote VTEP (both remote VXLAN devices advertise themselves in the
VXLAN network as a single anycast VTEP logical entity), and the packet is delivered to one of the remote DCI
nodes, with load balancing and backup. This backup is managed by the underlay routing protocol at the
convergence speed of this protocol. In the tests conducted for this document, BGP was used in conjunction with
Bidirectional Forwarding Detection (BFD) for fast convergence, but any other routing protocol, such as OSPF or IS-
IS, can also be used.
Layer 2 broadcast, unknown unicast, and multicast frames must be delivered across the VXLAN network. Two
options are available to transport this multidestination traffic:
● Use multicast in the underlay Layer 3 core network. This is the optimal choice when a high level of Layer 2
multicast traffic is expected across sites.
● Use head-end replication on the source VTEP to avoid any multicast requirement to the core transport
network. This is the option that is validated in this document.
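The head-end replication option can be sketched as follows: for each broadcast, unknown unicast, or multicast (BUM) frame, the ingress VTEP generates one unicast VXLAN copy per remote VTEP, so the core transport needs no multicast support (function names are illustrative):

```python
# Hedged sketch of head-end replication on the source VTEP.
def head_end_replicate(frame: bytes, local_vtep: str, vteps: list) -> list:
    """Return (destination VTEP, frame) pairs for one ingress BUM frame."""
    return [(vtep, frame) for vtep in vteps if vtep != local_vtep]
```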
VXLAN can also rate-limit broadcast, unknown unicast, and multicast traffic; as shown earlier in Figure 6, this
capability should be used in conjunction with the Cisco ACI storm-control capabilities.
VXLAN uses BGP with an Ethernet VPN (EVPN) address family to advertise learned hosts. The BGP design can
use edge-to-edge BGP peering, which is the best choice for a dual site, or it can use a route reflector if the network
is more complex, in which case Internal BGP (iBGP) can be used. VXLAN can provide Layer 2 and Layer 3 DCI
functions, both using BGP to advertise the connected MAC addresses, IP host addresses, or subnets. As previously
mentioned, in this document VXLAN is used as a pure Layer 2 DCI, and no Layer 3 option is used. The Layer 3
peering is established fabric to fabric over the VXLAN Layer 2 overlay in a dedicated VLAN.
One interesting VXLAN option is the capability to perform ARP suppression. Because VXLAN advertises both
Layer 2 MAC addresses and Layer 3 IP addresses at the same time, a VTEP can reply to an ARP request locally on
behalf of a remote host, without the need to flood the ARP request across the overlay.
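Conceptually, ARP suppression amounts to a local lookup in a table populated from BGP EVPN routes; only if the lookup fails is the request flooded as usual. A minimal sketch (names are hypothetical):

```python
# Hedged sketch of ARP suppression on a VTEP: return the target's MAC when it
# is known from EVPN routes, or None, meaning "flood the ARP request as usual".
def arp_suppress(evpn_table: dict, target_ip: str):
    return evpn_table.get(target_ip)  # IP -> MAC learned from BGP EVPN
```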
Note: ARP suppression in VXLAN fabrics to extend only Layer 2 (and not Layer 3) connectivity is not supported
at the time of this writing, so it was not configured in validating this design.
Within each fabric, Cisco ACI's use of VXLAN starts and terminates at the leaf layer and should not be confused
with the use of VXLAN as a DCI technology discussed in the previous section. When deploying a Cisco ACI dual-fabric
design, you thus must extend Layer 2 and Layer 3 reachability for endpoints connected to those separate fabrics
across the physical infrastructure that interconnects the sites.
In a Cisco ACI fabric, a Layer 2 broadcast domain is represented by a logical entity called a bridge domain. In
Cisco ACI, a bridge domain is a Layer 2 forwarding construct used to constrain broadcast and multicast traffic.
Endpoints are always part of a bridge domain (BD). You can, however, group these endpoints into subgroups,
called endpoint groups, or EPGs, defined within a bridge domain. Each EPG can belong to only a given bridge
domain, but multiple EPGs can be part of the same bridge domain, as shown in Figure 11.
Figure 11. Multiple EPGs Can Be Part of the Same Bridge Domain
This association of endpoints into separate EPGs allows you to isolate them (for security policy enforcement) even
when they belong to the same Layer 2 domain.
The validated solution discussed in this document used a network-centric approach in which EPGs were mapped
to bridge domains one to one, as shown in Figure 12.
Workloads belonging to the same EPG are allowed to communicate freely with each other. The information about
the security group for specific endpoints is carried within the Cisco ACI fabric in a specific field in the VXLAN
header. As a consequence, this information is lost when the packets are decapsulated by the Cisco ACI border leaf
nodes and sent to the DCI connection.
Layer 2 reachability thus must be extended between endpoints across separate Cisco ACI fabrics to help ensure
that EPGs are correctly mapped to bridge domains at each data center site. Because a separate APIC cluster is
deployed at each site, this mapping must be configured independently, and it must be configured consistently.
Traffic originating from an endpoint belonging to a given EPG leaves the Cisco ACI fabric through a VLAN hand-off
performed by the Cisco ACI border leaf nodes. This process occurs independent of the specific DCI technology
deployed to interconnect the sites. Mapping an EPG on each site to a common VLAN ID allows you to provide
Layer 2 adjacencies between endpoints that are part of those security groups, as shown in Figure 13. It also allows
each site to classify incoming packets into the correct EPG; in other words, the VLAN ID is used on ingress to
determine the EPG membership of traffic coming from the other site.
Figure 13. Static VLAN-to-EPG Mapping for Layer 2 Reachability Across Sites
A static VLAN-to-EPG mapping is defined on the border leaf nodes to help ensure that the VLAN=bridge
domain=EPG equation is kept consistent on both sites. The result is a logical end-to-end extension of the Layer 2
broadcast domain, which allows the two endpoints to become Layer 2 adjacent even when connecting to separate
Cisco ACI fabrics. This extension also allows support for live migration (or vMotion, to use the VMware
nomenclature) of endpoints across the sites.
Note: The example in Figure 13 shows the use of the same EPG and bridge domain names on both Cisco ACI
fabrics (EPG1 and BD1). This approach is recommended to simplify the design from an operational point of view.
However, note that the main requirement for achieving Layer 2 adjacency between the endpoints is that you map
the same VLAN tag to an EPG on each side and then verify that the endpoints are connected to those specific
EPGs (independent of the specific names they may have).
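A simple consistency check over the two APIC configurations captures this requirement; the dictionaries below are illustrative representations of each fabric's VLAN-to-EPG mappings on the border leaf nodes:

```python
# Hedged sketch: flag VLANs that are mapped to an EPG on only one fabric.
# EPG names may differ between sites; only the VLAN IDs must match.
def unmatched_vlans(fabric1_map: dict, fabric2_map: dict) -> set:
    """Each map is VLAN ID -> EPG name on that fabric's border leaf nodes."""
    return set(fabric1_map) ^ set(fabric2_map)  # symmetric difference
```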
From a traffic flow perspective, Layer 2 communication between endpoints connected to separate fabrics is
achieved as shown in Figure 14.
Traffic originating at EP1 is sent VXLAN-encapsulated across Cisco ACI Fabric 1, until it reaches the border leaf
nodes in Fabric 1. At that point, the traffic is de-encapsulated, and the VLAN hand-off to the DCI devices occurs.
The DCI devices in Fabric 1 perform the VLAN extension to the DCI devices in Fabric 2, which then forward the
VLAN-encapsulated traffic to the border leaf nodes of Site 2. After the border leaf nodes in Cisco ACI Fabric 2
receive the traffic, they classify it to the correct EPG based on the incoming VLAN ID, then they perform VXLAN re-
encapsulation, and the traffic is sent to the specific leaf node to which the destination is connected.
As a result of the traffic flow depicted in Figure 14, Cisco ACI Fabric 1 discovers EP2 as a local device connected
to the vPC local port on the border leaf nodes, and the opposite happens in Cisco ACI Fabric 2 (in which EP1 is
locally discovered on the border leaf nodes). This process is an important consideration for endpoint scalability,
because the mapping database in the spine devices of each fabric will need to maintain information about all the
local endpoints plus the endpoints in the remote fabric that belong to Layer 2 segments that are stretched across
sites. Keep in mind that the number of endpoints that can be stored in the spine node mapping database depends
mainly on the type of Cisco Nexus 9000 Series platforms deployed in that role, as shown in Figure 15.
Note: You can mix different types of platforms as spine nodes in the same Cisco ACI fabric. However, keep in mind
that the endpoint scalability value is determined by the least capable of the deployed switch models. For the
latest scalability numbers, please refer to the ACI Verified Scalability Guides available at
http://www.cisco.com/c/en/us/support/cloud-systems-management/application-policy-infrastructure-controller-
apic/tsd-products-support-series-home.html
To allow Layer 2 communication between endpoints deployed in separate Cisco ACI fabrics, several configurations
are needed:
● ARP flooding must be enabled in the bridge domains defined on the two Cisco ACI fabrics. This
configuration is required when EP1, for example, is trying to send an ARP request to EP2, but EP2 has not
yet been discovered by the border leaf nodes in Fabric 1. Enabling ARP flooding in Fabric 1 helps ensure
that the ARP request can be sent to Fabric 2 across the DCI connection, so that EP2 can receive it and
respond.
● Layer 2 unknown unicast flooding should be enabled in the bridge domains defined on the two Cisco ACI
fabrics. This configuration is needed in the event that the local border leaf nodes lose MAC address
information about remote endpoints that are part of the extended Layer 2 domain. A local endpoint (EP1)
may still have valid ARP information in its local cache and hence may still be creating data traffic directed to
the remote EP2 device. If you don’t enable unknown unicast flooding, the data traffic will be dropped at the
spine layer, because no information is available for the remote MAC address destination.
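These two bridge-domain settings can be expressed as a configuration check. The attribute names below follow the APIC fvBD object model (arpFlood and unkMacUcastAct) but should be verified against your APIC release:

```python
# Hedged sketch: required settings for a bridge domain stretched across
# fabrics, per the two requirements just listed. Attribute names follow the
# APIC fvBD object model and are an assumption to verify per release.
REQUIRED_BD_SETTINGS = {"arpFlood": "yes", "unkMacUcastAct": "flood"}

def bd_ready_for_stretch(bd_attrs: dict) -> bool:
    """True if a stretched bridge domain floods ARP and unknown unicast."""
    return all(bd_attrs.get(k) == v for k, v in REQUIRED_BD_SETTINGS.items())
```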
Note: For a more detailed description of the step-by-step packet flow, see the section “Cisco ACI Dual-Fabric:
Deployment Details.”
You also need to safely extend Layer 2 between data center sites to avoid the risk of creating end-to-end Layer 2
loops. Cisco ACI offers three main built-in functions to handle the creation of looped topologies, as shown in
Figure 16.
These functions, and their impact on the specific dual-fabric design discussed in this document, are as follows:
● Link-Layer Discovery Protocol (LLDP) loop protection: Every time two ports of different leaf nodes that
belong to the same Cisco ACI fabric are connected together, the exchange of LLDP packets causes the
connection to be disabled. The connection is disabled because no leaf-to-leaf connections are ever allowed
for leaf nodes that belong to the same fabric. Note that the connection is not disabled when you connect two
leaf nodes of different Cisco ACI fabrics, as is required when you deploy back-to-back vPC as a DCI
solution.
● Spanning-tree loop detection: From a spanning-tree point of view, the Cisco ACI fabric is considered to be
like a wire. Therefore, when a Layer 2 device connected on the south side of the fabric originates a
spanning-tree BPDU frame, the Cisco ACI leaf node receiving the BPDU will forward it to all the other leaf
nodes that have local ports and that are part of the same EPG. Assuming that the EPG-to-VLAN mapping is
consistent on all the leaf nodes, this behavior allows the Layer 2 devices connected to the fabric to detect
and block a Layer 2 loop (as shown in the example in Figure 16).
In the context of the dual-fabric design discussed here, the recommended approach is to use vPC
connections between the border leaf nodes and the DCI devices. If you use back-to-back vPC or VXLAN as
DCI technologies, a single vPC logical connection is used on the Cisco ACI border leaf nodes, so a Layer 2
loop cannot be created. If you instead use OTV for DCI, two separate vPC connections are established
between the border leaf nodes and the local OTV devices, as shown earlier in Figure 8.
OTV has embedded capabilities to prevent the creation of data-plane end-to-end Layer 2 loops, helping
ensure that only one local OTV device is handling the Layer 2 traffic (unicast, multicast, and broadcast)
associated with each extended VLAN segment. This feature is very important because spanning-tree
BPDUs are not forwarded across the OTV logical connection, so those loops cannot be detected at the
control-plane level. A Layer 2 spanning-tree loop could be created, however, if you mistakenly connect two
local OTV devices. As shown in Figure 17, the Cisco ACI spanning-tree loop detection mechanism will help
ensure that those BPDUs are looped back toward the OTV devices, allowing them to break the loop.
● Miscabling Protocol (MCP) loop detection: This function is the latest addition to the set of Cisco ACI loop
protections (available with Cisco ACI Software Release 11.1 and later). It detects Layer 2 loops
created southbound of the Cisco ACI fabric by sending MCP probes out from edge Layer 2 ports. Reception
of these probes (which are Layer 2 multicast frames) on a Layer 2 port on a different (or even on the same)
Cisco ACI leaf node indicates that a loop has been created south of the Cisco ACI fabric, and the interface
will be disabled. This mechanism can provide protection for the same scenarios as described for spanning-
tree loop detection. In addition, it can break Layer 2 loops that do not result in the generation of spanning-
tree BPDU frames. A typical example is the deployment of a firewall in transparent (or bridged) mode that
loops traffic between its interfaces because of a misconfiguration.
Note: MCP loop detection can be enabled at the interface level. It is disabled by default.
For the dual-fabric design, MCP can be used to detect local loops, as shown previously in Figure 17. MCP
cannot be used on Layer 2 ports that connect two separate Cisco ACI fabrics together, because a leaf
device in one Cisco ACI fabric always drops MCP frames that originate from a leaf device that belongs to a
separate Cisco ACI fabric (because of a unique identifier that is added in the MCP packet).
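The MCP behavior described above reduces to a simple decision based on the fabric identifier carried in the probe; the sketch below is conceptual, and the probe fields are hypothetical, not the actual MCP packet format:

```python
# Hedged sketch of MCP probe handling on a Cisco ACI leaf edge port: a probe
# carrying this fabric's own identifier means a loop exists south of the
# fabric (disable the port); probes from another fabric are silently dropped.
def handle_mcp_probe(local_fabric_id: str, probe_fabric_id: str) -> str:
    if probe_fabric_id == local_fabric_id:
        return "err-disable-port"   # Layer 2 loop detected south of the fabric
    return "drop"                   # probe originated in a different fabric
```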
Note that a tenant is only a logical container of VRF instances, bridge domains, and EPGs that is usually defined
for administrative purposes (the administrator of each tenant usually has rights to modify only constructs that apply
to the administrator’s dedicated environment).
Note: For more information about multitenancy support in the Cisco ACI fabric, see the section “Multitenancy
Support.”
For the specific dual-fabric design under discussion, the relationship between these logical constructs is shown in
Figure 19.
Figure 19. Tenants, Private Networks, Bridge Domains, and EPGs for Dual-Fabric Design
In this case, a single private network (VRF instance) is defined for each tenant. All the bridge domains are
associated with the same VRF instance. Also, a single IP subnet is defined in each bridge domain, leading to a
one-to-one mapping between IP subnets and Layer 2 broadcast domains (a common networking practice).
This design raises a question, however: If endpoints that are part of the same IP subnet can be deployed across
separate Cisco ACI fabrics, where is the default gateway used when traffic needs to be routed to endpoints that
belong to different IP subnets?
Cisco ACI uses the concept of an anycast gateway: that is, every Cisco ACI leaf node can function as the default
gateway for the locally connected devices. When you deploy a dual-fabric design, you will want to use the anycast
gateway function across the entire system independent of the specific fabric to which an endpoint connects.
Figure 20 shows this model.
Figure 20. Pervasive Default Gateway Used Across Separate Cisco ACI Fabrics
The goal is to help ensure that a given endpoint can always use the local default gateway function on the Cisco
ACI leaf node to which it is connected. To support this model, each Cisco ACI fabric must offer the same default
gateway, with the same IP address (common virtual IP address 100.1.1.1 in the example) and the same MAC
address (common virtual MAC address). The latter is specifically required to support live mobility of endpoints
across different Cisco ACI fabrics, because with this approach the moving virtual machine preserves in its local
cache the MAC and IP address information for the default gateway.
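This requirement lends itself to a simple validation across the two APIC configurations: every stretched bridge domain must expose the same gateway IP address and the same virtual MAC address on both fabrics. A hedged sketch, with illustrative data structures:

```python
# Hedged sketch: verify that bridge domains defined on both fabrics present
# an identical (gateway IP, virtual MAC) pair, so a migrating VM's cached
# default-gateway information remains valid after a move.
def gateways_consistent(fab1_gw: dict, fab2_gw: dict) -> bool:
    """Each dict: bridge-domain name -> (gateway IP, virtual MAC)."""
    shared = set(fab1_gw) & set(fab2_gw)
    return all(fab1_gw[bd] == fab2_gw[bd] for bd in shared)
```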
Note: The capability to have the default gateway active on multiple sites requires Cisco ACI Software Release
1.2(1i) or later.
When routing must be performed from an endpoint connected to a Cisco ACI fabric, two scenarios are possible:
● The destination is an internal IP subnet. In the specific dual-fabric design discussed here, an internal IP
subnet is either a subnet that is only locally defined in the same Cisco ACI fabric to which the endpoint
belongs, or it is an IP subnet stretched across Cisco ACI fabrics (using the Layer 2 DCI capabilities to
extend Layer 2 broadcast domains).
● The destination is an external IP subnet. In this case, the IP subnet is defined only in the remote Cisco ACI
fabric (and not stretched across sites) or is located in the WAN, and it is therefore considered an external
Layer 3 network domain.
For routed communication between two endpoints connected to different Cisco ACI fabrics, the main difference in
these two scenarios is in the way that routed traffic is sent to the destination endpoint:
● If the destination endpoint is connected to an internal IP subnet, routing to the destination IP subnet is
performed in Cisco ACI Fabric 1, and the Layer 2 connection that stretches the bridge domain between data
center sites sends the traffic to the destination, as discussed earlier in the section “Layer 2 Reachability
Across Sites.” Figure 21 shows this scenario.
● If the destination endpoint is connected to an external IP subnet, a Layer 3 routing path must be defined
between the two Cisco ACI fabrics. As shown in Figure 22, this approach helps ensure that traffic can be
routed across the DCI connection to reach the remote destination endpoint.
Figure 22. Routing to an Endpoint Connected to an External IP Subnet
You can create the Layer 3 peering between the two Cisco ACI fabrics shown in Figure 22 using a L3Out
logical function enabled on the border leaf nodes in each site. Note that when you run Cisco Nexus 9000
Series platforms in Cisco ACI mode, you can establish a dynamic Layer 3 peering over a vPC connection (a
function that at the time of this writing is not supported when you deploy Cisco Nexus 9000 Series
platforms running in NX-OS standalone mode). As a consequence, the validated Cisco ACI dual-fabric design
proposes the use of the same vPC logical interface as is used to bridge Layer 2 traffic for routed
communications with the remote site. With this approach, the vPC logical interface on the border leaf nodes
is associated with the L3Out interface, and a specific VLAN is carried across the DCI connection to allow
establishment of Layer 3 dynamic peering between the border leaf nodes.
Figure 23 shows the validated use of eBGP sessions between the border leaf nodes that belong to separate
Cisco ACI fabrics.
Figure 23. Use of eBGP Control Plane between Cisco ACI Fabrics
Note: The same design model shown in Figure 23 is applicable independent of the DCI technology used to
extend Layer 2 domains between fabrics (vPC back-to-back, OTV, Virtual Private LAN Service [VPLS], VXLAN,
etc.).
Application of security policies (called contracts) between endpoints connected to a Cisco ACI fabric is a two-step
process:
● First, the policy must be created and applied to (at least) two EPGs (one is called the provider of the policy,
the other is called the consumer).
● After the policy is in place, it can be applied to control communication between endpoints belonging to the
two different EPGs. For this mechanism to work, the endpoints must be classified and associated with the
correct EPG.
Association of an endpoint with a specific EPG is achieved by mapping the port to which the endpoint is connected
(which can be a physical or logical interface: for example, a vPC interface, a VLAN, or a VXLAN ID) to that EPG.
This mapping can be static (when classifying traffic from physical endpoints: bare-metal servers, routers, switches,
etc.) or dynamic (when creating a VMM domain as the result of establishing a relationship between the APIC and
the VMM; for more information about this topic, see the section “Deploying Hypervisor Integration”).
As previously discussed, in the Cisco ACI dual-fabric design communication can be established between endpoints
connected to separate Cisco ACI fabrics. Because independent APIC clusters manage those fabrics, you must
help ensure that Layer 2 and Layer 3 traffic is properly classified at the point of entrance into the fabric, as
discussed in the following two sections.
In the Cisco ACI dual-fabric design, one-to-one mapping is performed between those VLAN tags and the
corresponding EPGs. This approach allows traffic flows to be associated with the proper security group (EPG),
achieving a logical EPG extension across data center sites, as shown in Figure 24.
In the example in Figure 24, Web1 and App1 EPGs are defined on the APIC that manages Cisco ACI Fabric 1,
together with the contract (security policy) C1 that governs the communication between endpoints that belong to
those EPGs. Web2 and App2 are EPGs defined on the APIC that manages Cisco ACI Fabric 2, and a C2 contract
(security policy) is applied between them. In a scenario in which the two application tiers are stretched between the
two separate sites (with the goal of being able to freely move the workloads that belong to those tiers across data
centers), you must help ensure that endpoints that belong to the Web1 EPG in Fabric 1 are treated as part of
the same extended EPG as the endpoints that belong to the Web2 EPG defined on the APIC in Fabric 2.
An endpoint belonging to EPG Web1 and trying to communicate with an endpoint in the extended application EPG
(represented by the App1 EPG deployed in Fabric 1 and the App2 EPG deployed in Fabric 2) is subject to the C1
security policy defined on the APIC in Cisco ACI Fabric 1. This is the case independent of whether the application
endpoint is connected to Fabric 1 EPG App1 or Fabric 2 EPG App2, because routing between the web and
application IP subnets happens locally inside Cisco ACI Fabric 1, and traffic is then bridged across sites with the
VLAN tag (VLAN 301) associated with the application EPG.
Note in the example in Figure 24 that return traffic from the endpoint in EPG App2 to the endpoint in EPG Web1 is
subject to the C2 contract (security policy) because the application endpoint is connected to Fabric 2. For this
reason, the two policies must be configured consistently to avoid unexpected behavior depending on the Cisco ACI
fabric to which the endpoints are physically connected. Manual synchronization is possible but is an operationally
complex and tedious process. Use of an orchestrator that communicates with both APIC clusters to properly
configure the Cisco ACI parameters is the recommended approach, as discussed in the section “Cisco UCS
Director and Cisco ACI Dual-Fabric Design.”
Also, this asymmetric behavior across the two fabrics does not represent a problem, because Cisco ACI contracts
are stateless: there is no need to see both legs of the same communication to allow traffic, as would be the case
with an L4-L7 stateful firewall implementation.
Note that when you define a specific filter entry associated with a Cisco ACI contract, you can enable a Stateful
option. This option is used to program a reflective access control list (ACL) in the hardware to allow TCP packets
only if the acknowledgment (ACK) flag is set, and it does not perform a true stateful inspection. This behavior is
shown in Figure 25, in which a stateful Cisco ACI contract is configured between EPG A and EPG B to allow only
traffic destined for port 80.
Enabling the Stateful flag creates a second entry that permits traffic from an endpoint in EPG B to travel to an
endpoint in EPG A (sourced from port 80) only if the ACK flag is set in the TCP header. The asymmetric traffic path
shown in Figure 24 does not create problems even when this stateful behavior is enabled.
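The behavior of the Stateful flag described above can be illustrated with a small sketch. This is a simplified model written for this document, not actual Cisco ACI hardware programming: it captures only the two ACL entries the text describes (forward traffic to port 80, and return traffic from port 80 permitted only when the TCP ACK flag is set).

```python
# Simplified model of a stateful Cisco ACI contract between EPG A and EPG B
# allowing only traffic destined for TCP port 80 (illustrative, not the real
# hardware-programming logic).

def contract_permits(src_epg, dst_epg, dport, sport, tcp_ack):
    """Return True if the packet matches one of the two programmed entries."""
    # Forward entry: EPG A -> EPG B, destination port 80.
    if src_epg == "A" and dst_epg == "B" and dport == 80:
        return True
    # Reverse (reflective) entry created by the Stateful flag:
    # EPG B -> EPG A, sourced from port 80, only if the ACK flag is set.
    if src_epg == "B" and dst_epg == "A" and sport == 80 and tcp_ack:
        return True
    return False
```

Note that the reverse entry is a header check on the ACK bit, not true stateful inspection: a crafted packet with ACK set would also match.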
In the first scenario, in line with the approach taken in this document, you deploy separate EPGs in separate bridge
domains. In this case, endpoints that belong to different EPGs are also part of separate IP subnets, as shown in
Figure 26.
Figure 26. One EPG per Bridge Domain with a Unique IP Subnet
In this case, traffic can be classified into a specific security group (EPG) based simply on the IP subnet of the source endpoint. In the example in Figure 26, when endpoint 172.10.1.1, part of the Web1 EPG in Fabric 1,
sends traffic to Fabric 2, this traffic can be mapped to an Ext-Web1 EPG associated with the L3Out logical
construct defined on the border leaf nodes in Fabric 2. The external EPG associated with the L3Out interface is
used to model external Layer 3 networks that try to communicate with resources internal to the Cisco ACI fabric.
From a policy point of view, a specific security contract must always be provided between the external EPGs
associated with L3Out connections and the internal EPG to which the destination endpoint belongs (in this
example, App2 in Fabric 2). The main requirement here is that this contract must be consistent with the contract
used when a web endpoint tries to communicate with a local application endpoint (in other words, C1 and C2
should be consistent). This requirement helps ensure consistent policy enforcement of intrafabric and interfabric
communication between different application tiers.
Note that communication between an endpoint that is part of the Web1 EPG and an endpoint that is part of the App2 EPG (shown in Figure 26) is always subject to two contracts: the C1 policy controls communication in Cisco ACI Fabric 1 between the internal Web1 EPG and the external Ext-App2 EPG, while the C2 contract governs communication in Fabric 2 between the Ext-Web1 EPG and the internal App2 EPG.
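The subnet-based classification described for this first scenario amounts to a longest-prefix lookup of the source address against the external EPGs defined under the L3Out. A minimal sketch, using the endpoint address from the text and an otherwise hypothetical subnet-to-EPG table:

```python
import ipaddress

# Hypothetical mapping of remote IP subnets to external EPGs on the Fabric 2
# L3Out. The Web1 prefix follows the 172.10.1.1 example in the text; the
# second entry is purely illustrative.
EXTERNAL_EPGS = {
    ipaddress.ip_network("172.10.1.0/24"): "Ext-Web1",
    ipaddress.ip_network("172.10.2.0/24"): "Ext-App1",
}

def classify(src_ip):
    """Return the external EPG matching the source address, or None."""
    addr = ipaddress.ip_address(src_ip)
    for prefix, epg in EXTERNAL_EPGS.items():
        if addr in prefix:
            return epg
    return None
```

This per-subnet mapping is exactly what breaks down in the second scenario below, where multiple EPGs share one subnet.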
In the second scenario, multiple EPGs are associated with the same bridge domain. In the example shown in
Figure 27, in which a single IP subnet is associated with the bridge domain, endpoints that belong to different
EPGs are part of the same IP subnet.
In this case, you cannot map an IP prefix to an external EPG because two specific IP addresses that are part of
the same IP subnet prefix may refer to endpoints that belong to different EPGs. A more precise approach is thus
required, in which you use the specific host route information to map the Layer 3 traffic from Fabric 1 to the proper
external EPG defined in Fabric 2.
Note: This second scenario is not discussed in the rest of this paper.
Hypervisor Integration
To provide tight integration between physical infrastructure and virtual endpoints, Cisco ACI can integrate with
hypervisor management servers (VMware vCenter, Microsoft SCVMM, and OpenStack are available options at the
time of this writing). These hypervisor management stations are usually referred to as virtual machine managers, or
VMMs. You can create one (or more) VMM domains by establishing a relationship between the VMM and the APIC
controller.
For more detailed information about APIC integration with vCenter and SCVMM, please refer to the following links:
● http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-
x/virtualization/b_ACI_Virtualization_Guide/b_ACI_Virtualization_Guide_chapter_010.html
● http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-
infrastructure/guide-c07-735992.html
In the dual-fabric solution, separate APIC clusters are deployed to manage different Cisco ACI fabrics; hence,
different VMM domains are created in separate sites. Depending on the specific deployment use case, you may
want to allow endpoint mobility across data center sites, which requires moving workloads across VMM domains.
At the time of this writing, the only possible solution is to integrate the APIC with VMware vSphere Release 6.0,
because this release introduces support for live migration between VMware ESXi hosts managed by different
vCenter servers (Figure 28).
Cisco ACI Release 11.2 introduces support for integration with vCenter 6.0, so it is the minimum recommended
release needed to support live migration across the dual-fabric deployment. Note that Cisco ACI Release 11.2
supports live mobility only when the native VMware vSphere Distributed Switch (DVS) virtual switch is used.
Starting with the next Cisco ACI release, support will be extended to deployments using the Cisco Application
Virtual Switch (AVS) on top of vSphere.
Note: For more information about the use of AVS with vSphere and the integration with Cisco ACI, please refer
to the following document: http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-
centric-infrastructure/white-paper-c11-731999.html
● Managed integration using a service graph with a device package: In this case, the Cisco ACI fabric
provides both connectivity and configuration of the L4-L7 device using the device package. The device
package normally is provided by the vendor and consists of a packaged set of scripts that the APIC uses to
configure the device. It is similar to a device driver in a typical operating system.
● Unmanaged integration using a service graph only, without a device package required: In this model, the
Cisco ACI fabric provides only connectivity to and from the L4-L7 device. The configuration of the device is
left to the end user and can be performed either using native tools provided by the manufacturer of the
device or using an orchestration tool for the device. This approach does not require the use of device
packages, which allows you to integrate essentially all third-party network service functions, even when a
device package from the specific vendor is not available.
● No integration, with traffic to and from the L4-L7 appliance mapped manually: This deployment model
doesn’t use either a service graph or a device package. Traffic is mapped to and from the L4-L7 appliance
either on the basis of Layer 2 or using Layer 3 routing. In other words, the network service device is
connected to the Cisco ACI fabric as a generic physical resource.
● Internal firewalls: These are used for east-west traffic between security zones inside a data center. In many
scenarios, you may be able to use the contracts-based security natively provided by the Cisco ACI fabric.
● External (perimeter) firewalls: These protect north-south communication in and out of the data center. This
type of communication often requires stateful traffic inspection, so external firewalls are commonly deployed
using dedicated security appliances.
For the purposes of this document and the validation process, a Cisco ASA firewall is used for the perimeter, and
Cisco ACI fabric contracts are used to protect the east-west traffic.
The perimeter firewall often also handles other types of communication in the organization, and a dedicated team
often manages it, so the design discussed here uses the “no integration” deployment model. From the viewpoint of Cisco ACI, the firewall is an externally managed physical device.
different contexts as active on different appliances, providing load sharing (for example, you can have half of the
contexts active in one appliance, and half active in the other appliance), as shown in Figure 29.
The active ASA firewall (or better, the active context) peers with the Cisco ACI fabric on its internal interface. At the
same time, an OSPF peering is established between the outside interface and the WAN edge routers, using the
Cisco ACI fabric as pure Layer 2 transport for the establishment of this peering.
Assume that all these IP subnets are associated with the same tenant using ASA Context 1 for communication with
the external network domain. The following occurs:
● Traffic from subnet A (DC1 only) leaving the fabric and destined for subnet D in the WAN uses the local
ASA in DC1. This traffic uses the optimal forwarding path.
● Traffic from subnet B (DC2 only) destined for the WAN uses the Layer 3 DCI connection between the
fabrics to get to DC1, where the active ASA is located. This traffic uses a suboptimal forwarding path. A
suboptimal path is also used for the return traffic from subnet D to subnet B.
● Traffic from subnet C destined for the WAN:
◦ Traffic originating from DC1 uses the local ASA in DC1. This traffic uses the optimal forwarding path.
◦ Some devices currently located in DC2 will have to use the Layer 3 DCI between data centers to reach
the active ASA in DC1. This traffic uses a suboptimal forwarding path.
Note that the return traffic from subnet D to subnets A, B, and C follows a path that is symmetric to that in the
outbound direction, because the only entry point into the fabric is the active firewall deployed in Site 1.
The introduction of a second deployment model, the ASA cluster, is required if the goal is to improve the north-
south forwarding behavior.
● Scalability up to 16 nodes
● Simple management using the master node as a central point of management for the whole cluster
● State sharing using the cluster control link (CCL)
● High availability
ASA clustering differs from the traditional active-standby deployment model. In cluster mode, every member of the
cluster has the same configuration, is capable of forwarding every traffic flow, and can be active for all flows.
In the event of a failure, connectivity is maintained through the cluster because connection information is replicated to at least one other unit in the cluster. Each connection has a replica residing on a different cluster unit, and that replica takes over if a failure occurs.
The benefits of clustering over multiple data centers include the following:
● Some of the north-south traffic flows in a multisite data center can be asymmetrical or suboptimal (as
discussed in the previous section). Clustering features force asymmetrical flows to become symmetrical.
● Clustering provides transparent stateful live migration with automatic redirection of flows.
● Clustering provides consistent firewall configurations between data centers.
● New connections can be offloaded to other members of the cluster if the firewall is too busy.
● Clustering provides strong redundancy and disaster recovery in the event of an appliance or link failure.
The design discussed in this document uses two physical ASA 5585-X appliances in each data center, for a total of
four devices in the cluster. Within each data center, firewalls are attached to the Cisco ACI fabric using vPC
technology. Each firewall uses two vPCs: one for the data traffic, and one for the Cluster Control Protocol (CCP)
traffic: the CCL, as shown in Figure 30.
Figure 30. Cisco ASA Cluster Deployment Model
This design uses multiple VLANs to bring the traffic in and out of the firewall through the data port channel. VLANs
are also used to access the multiple contexts of the firewall.
The firewalls are running in routed mode. Within each site, each firewall peers on its inside interface with the local
Cisco ACI fabric using a dynamic routing protocol (OSPF). On the outside interface, the firewall peers with the local
WAN edge routers through the Cisco ACI fabric (the fabric performs only Layer 2 transport functions in this specific
case).
A dedicated VLAN is used to provide the CCL connectivity between cluster nodes within each site. This VLAN is
also extended through the DCI link to the other data center, allowing all four firewalls to communicate and build the
cluster. This connectivity uses the CCL vPC out of the ASA.
Figure 31 presents the same example as previously discussed for the active-standby ASA option, but showing the
traffic-path-optimization benefits of an ASA cluster deployment.
● Traffic from subnet A (DC1 only) leaving the fabric and traveling to subnet D in the WAN uses one of the
local ASA devices in DC1. This traffic uses the optimal forwarding path.
● Traffic from subnet B (DC2 only) leaving the fabric and traveling to the WAN uses one of the local ASA
devices in the data center. This traffic uses the optimal forwarding path.
● Traffic from stretched subnet C:
◦ Traffic originating from DC1 uses one of the local ASA devices in DC1. This traffic uses the optimal
forwarding path.
◦ Traffic for some devices currently located in DC2 uses one of the local ASA devices in DC2. This traffic uses the optimal forwarding path.
As you can see, the cluster mode uses the optimal forwarding path in all situations for outbound traffic. The same
is true for traffic from the WAN to the nonstretched IP subnets (A and B). For stretched subnet C, the WAN will
likely learn the IP prefix information from both sites, so traffic may be steered indifferently to Site 1 or Site 2. The
deployment of an ASA cluster helps ensure that this communication can succeed even when it results in
asymmetric behavior (for instance, outbound traffic to the WAN leaves from Site 1 and reenters through Site 2).
This approach works because the ASA cluster nodes can redirect flows through the CCL to the cluster node that
holds the state information for each specific flow.
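The flow-redirection idea described above can be sketched as follows. This is an illustrative model only: each flow has exactly one owner node that holds its state, and a cluster member that receives a packet for a flow it does not own forwards it over the CCL to the owner. The node names and the hash-based owner selection are assumptions for the sketch; the real ASA cluster uses its own flow-ownership and director mechanisms.

```python
# Illustrative model of ASA cluster flow ownership and CCL redirection
# (node names and owner-selection logic are assumptions, not ASA internals).

CLUSTER_NODES = ["asa-dc1-1", "asa-dc1-2", "asa-dc2-1", "asa-dc2-2"]

def flow_owner(five_tuple):
    """Deterministically pick the cluster node that owns a flow's state."""
    return CLUSTER_NODES[hash(five_tuple) % len(CLUSTER_NODES)]

def handle_packet(receiving_node, five_tuple):
    """Process locally if this node owns the flow; otherwise redirect."""
    owner = flow_owner(five_tuple)
    if receiving_node == owner:
        return "process locally"
    return f"redirect over CCL to {owner}"
```

This is why asymmetric entry (for instance, return traffic arriving at Site 2 for a flow owned in Site 1) still succeeds, at the cost of an extra CCL hop.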
Note: Several techniques (global site load balancer [GSLB], LISP, etc.) can be used to influence the inbound
traffic flows destined for stretched subnets to help ensure that they are delivered to the correct sites: the site to
which the specific destination endpoint is connected. With these techniques, you would not have to perform cluster
redirection within ASA devices, which is a suboptimal traffic behavior. These inbound path optimization options are
beyond the scope of this document.
Multitenancy Support
The Cisco ACI dual-fabric design discussed in this document fully supports multitenancy. This section provides an
overview of this support.
For simplicity, the example in Figure 32 shows one Cisco ACI fabric. However, the design and the multitenancy
considerations are the same for the other site (or other sites, if more than two are used).
Logical isolation across different tenants is maintained end to end as described here for traffic coming from the
WAN and going to the data center. The same logic applies for the return traffic from the data center to the WAN.
● In the WAN (for traffic between a WAN router and WAN edge router), multitenancy is achieved through well-
known WAN multitenant solutions, such as MPLS Layer 3 VPN and VRF-lite. Figure 32 shows a VRF-lite
approach between the WAN edge router and the WAN router. In this case, the routers are interconnected
through an IEEE 802.1Q trunk with multiple Layer 3 subinterfaces, with each interface associated with a
different VRF instance. There is one eBGP session per VRF instance between the routers for route
distribution.
● To preserve multitenancy between the WAN edge router and the firewall, the WAN edge router is
connected to the Cisco ACI fabric with a Layer 3 port-channel interface, and Layer 3 subinterfaces are
created and allocated to different VRF instances (VRF-lite approach). On the Cisco ACI fabric, on a per-tenant basis, an EPG is created and statically bound to the vPC to which the WAN edge router is connected. This same EPG is also statically bound to the vPC connected to the ASA firewall. Therefore,
the Cisco ACI fabric provides a transit Layer 2–only EPG and bridge domain on a per-tenant basis for the
Layer 3 subinterface on the WAN edge router to communicate with the firewall context associated with the
same tenant. Between the WAN edge router and the firewall context, OSPF is used as the routing protocol
for the WAN edge router to advertise WAN subnets to the firewall and for the firewall to advertise the tenant
subnets to the WAN.
● The ASA in this design is configured in multiple-context routed mode to provide multitenancy support, and it
runs OSPF within the context with the WAN edge router. The ASA provides the firewall services for traffic
entering and leaving the Cisco ACI fabric and is inserted in the data path through routing. When the traffic
leaves the ASA and travels toward the Cisco ACI fabric or toward the WAN edge router, the traffic is tagged
with the VLAN ID that was configured on the Cisco ACI fabric. This tagging helps ensure that the traffic is
forwarded by the fabric within the designated tenant.
● As previously mentioned in the section Layer 3 Reachability Across Sites, multitenancy is natively
supported in the Cisco ACI fabric itself. All configurations that dictate traffic forwarding in Cisco ACI are part
of a tenant. The application abstraction demands that EPGs always be part of an application network
profile, and the relationship between EPGs through contracts can span application profiles within the same
tenant and even between tenants.
Bridge domains and routing instances that move IP packets across the fabric provide the transport
infrastructure for the workloads defined in the EPGs. Within a tenant, you define one or more Layer 3
networks (VRF instances), one or more bridge domains per network, and EPGs to divide the bridge
domains.
● In the Cisco ACI dual-fabric design, each tenant is also configured with two L3Out logical connections for
routed connectivity to external networks. The external networks in this case are the WAN and the subnets
that exist exclusively in the remote data center. One of the L3Out interfaces provides the OSPF peering
between the fabric and the ASA firewall, and the fabric uses it to receive the WAN routes and to advertise
the internal (tenant) subnets to the ASA, which then advertises them to the WAN.
The second L3Out interface is configured on each tenant (and is called L3Out-DCI in Figure 32). This
interface is configured with eBGP and is used to establish BGP peering between the fabrics in the data
centers so that each fabric can advertise its local subnets to the other fabric. Here, a local subnet is a
subnet that, for example, exists only in Site 2 but still needs to be reachable from Site 1. This subnet is
considered an external subnet from the perspective of Site 1, and it is learned through the eBGP peering
established through the L3Out-DCI. This eBGP peering session is transported over the DCI links that interconnect the two sites, and one such session is established per tenant (VRF-lite configuration).
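The per-tenant object hierarchy described in the points above (tenant, VRF instances, bridge domains, EPGs, and the two L3Out connections) can be summarized as a simple data model. The names used here are illustrative assumptions, not the validated configuration:

```python
# Minimal sketch of the per-tenant configuration described in the text:
# a tenant owns VRFs, bridge domains containing EPGs, and two L3Outs
# (one toward the ASA firewall, one for the inter-fabric DCI eBGP peering).
from dataclasses import dataclass, field

@dataclass
class Tenant:
    name: str
    vrfs: list = field(default_factory=list)            # Layer 3 networks
    bridge_domains: dict = field(default_factory=dict)  # BD name -> EPG list
    l3outs: list = field(default_factory=list)

tenant = Tenant(
    name="Tenant-1",
    vrfs=["VRF-1"],
    bridge_domains={"BD-Web": ["Web1"], "BD-App": ["App1"]},
    l3outs=["L3Out-ASA", "L3Out-DCI"],  # firewall peering and DCI eBGP
)
```

In the dual-fabric design, this entire structure is mirrored (consistently) on the APIC cluster in each site.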
Cisco UCS Director uses a workflow orchestration engine with workflow tasks that support the computing,
networking, storage, and virtualization layers. Cisco UCS Director supports multitenancy, which enables policy-
based and shared use of the infrastructure.
Cisco UCS Director integrates with Cisco ACI by communicating with an APIC cluster. When Cisco UCS Director
establishes a connection with the APIC, it discovers all infrastructure elements in the Cisco ACI fabric.
To establish the connection from Cisco UCS Director to the APIC, you just provide the IP address of one of the
APIC controllers in the APIC cluster with a username and password. Cisco UCS Director will automatically discover
the IP address of other APIC nodes that are part of the same APIC cluster. If the IP address of the APIC that was
used to establish the connection goes down or is not reachable for 45 seconds, Cisco UCS Director tries to use
any reachable controller IP address to interact with the APIC cluster.
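The failover behavior just described amounts to falling back to any other discovered controller when the one used for the initial connection becomes unreachable. A minimal sketch, assuming reachability is given by a callback (the product's actual retry logic is internal to Cisco UCS Director):

```python
# Simplified sketch of APIC controller failover: try each discovered APIC
# address in turn and use the first reachable one. Addresses and the
# reachability callback are illustrative assumptions.

def pick_controller(discovered_apics, is_reachable):
    """Return the first reachable APIC address, or None if all are down."""
    for apic in discovered_apics:
        if is_reachable(apic):
            return apic
    return None
```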
After Cisco UCS Director has established a connection with the APIC, information about the Cisco ACI fabric
becomes available in Cisco UCS Director. A list of the information collected and displayed by Cisco UCS Director
is available in the document Cisco UCS Director Configuration Guide for Cisco ACI.
Cisco UCS Director can establish connections with one or more APIC clusters, including APIC clusters deployed at multiple sites, as in the Cisco ACI dual-fabric design described in this document.
After Cisco UCS Director has established connections with multiple APIC clusters, it can deploy multitier
applications in one or more data centers and create Cisco ACI objects (EPGs, bridge domains, subnets, etc.) in
multiple APIC clusters simultaneously.
To deploy a multitier application in Cisco ACI, Cisco UCS Director creates an application profile or uses an existing
one. An application profile defines the following:
● Cisco ACI network tiers for delivering application resources for the associated tenant profile
● A suitable resource group, which defines the capacity and quality of the Cisco UCS physical, virtual,
computing, and storage resources for each application component
● Cisco ACI network services that are required to deliver the appropriate service quality and security for the
application
To perform automated provisioning of the Cisco ACI configuration, Cisco UCS Director uses workflows that consist
of tasks. Cisco UCS Director comes preconfigured with more than 200 workflow tasks specific to Cisco ACI, out of
a total of more than 1800 network, computing, and storage tasks. You can drag and drop the tasks to create the
desired workflow (Figure 34).
If you need some specific Cisco ACI tasks that are not available in the Cisco UCS Director library, you can use an
open-source tool to automatically generate Cisco UCS Director custom tasks for Cisco ACI automation. This tool is
available at https://github.com/erjosito/request and is shown in Figure 35.
Figure 35. Tool for Creating Cisco UCS Director Custom Tasks
You can use this tool to automatically generate WFDX files containing custom tasks that can be imported into
Cisco UCS Director. You need to capture the Cisco ACI representational state transfer (REST) calls (for example,
with the API inspector or the Save As function) and paste them in the tool.
You can then use these custom tasks to build more complex workflows by combining them with the preconfigured
Cisco ACI tasks.
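A captured Cisco ACI REST call of the kind you would paste into the tool has a predictable shape: a URL addressing a managed object and a JSON body keyed by the object class. The sketch below builds (but does not send) a tenant-creation call; the APIC hostname and tenant name are placeholder assumptions, while the `/api/mo/uni.json` endpoint and `fvTenant` class follow the standard APIC REST object model.

```python
import json

# Build the URL and JSON body of an APIC REST call for creating a tenant,
# in the shape produced by the API inspector (nothing is sent on the wire).
# Hostname and tenant name are illustrative placeholders.

def tenant_create_call(apic_host, tenant_name):
    """Return (url, json_body) for a tenant-creation POST against the APIC."""
    url = f"https://{apic_host}/api/mo/uni.json"
    body = {"fvTenant": {"attributes": {"name": tenant_name}}}
    return url, json.dumps(body)
```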
In the Cisco ACI dual-fabric design discussed in this document, a single instance of Cisco UCS Director is
deployed, and it establishes connections with the APIC clusters in both data centers. After that, Cisco UCS
Director becomes the central and preferred platform for the provisioning of application network profiles, EPGs,
bridge domains, etc., as shown in Figure 36, because it keeps the configuration between the two APIC clusters
consistent. Cisco UCS Director discovers configuration changes performed directly in the APIC controllers as it
monitors the infrastructure for changes, and it reflects those changes in its object model. The configuration,
however, is not automatically synchronized with the other APIC cluster.
Figure 36. Cisco UCS Director Integration in a Cisco ACI Dual-Fabric Design
Cisco UCS Director includes several workflows for Cisco ACI provisioning, and these can easily be customized.
You can perform customization or create new workflows by yourself, or Cisco Advanced Services can assist with
the creation of the workflows.
A demonstration of the use of Cisco UCS Director with Cisco ACI to provision and deprovision a three-tier
application is available here.
Storage Considerations
The Cisco ACI dual-fabric design does not impose specific storage requirements. However, to achieve an active-
active architecture, you must consider the required storage architecture for your application.
For disaster recovery, asynchronous replication is usually acceptable. This approach allows data centers to be
physically located hundreds or even thousands of miles apart. If you use asynchronous replication, some amount
of data loss on application failover must be acceptable because not all data will be synchronized.
With synchronous replication, when disk I/O is performed by the application or by the file cache system on the
primary disk, the server waits for I/O acknowledgment from the local disk and from the remote disk before sending
an I/O acknowledgment to the application or to the file system cache. This mechanism allows failover without data
loss, but it limits the distance between the data centers. Usually, synchronous replication is limited to tens or
hundreds of miles and is commonly used within metropolitan areas.
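The distance limit follows from simple propagation arithmetic. As a back-of-the-envelope sketch (assuming light travels in fiber at roughly 5 microseconds per kilometer one way, and that every synchronous write waits one full round trip to the remote array):

```python
# Rough estimate of the write latency added by fiber propagation alone in a
# synchronous-replication setup. The 5 us/km figure is an approximation for
# light in optical fiber, not a vendor specification.

def replication_rtt_ms(distance_km, us_per_km=5.0):
    """Added per-write latency (ms) from round-trip fiber propagation."""
    return 2 * distance_km * us_per_km / 1000.0

# For example, sites 100 km apart add about 1 ms to every acknowledged write,
# before any switching, storage-controller, or protocol overhead.
```

Because this delay is incurred on every write, distances beyond the metropolitan range quickly degrade application performance, which is why asynchronous replication is used for longer distances.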
For live migration of virtual machines, the storage infrastructure should provide host access to shared storage.
Therefore, during a vMotion operation, the migrating virtual machine must be on storage that is accessible to both the source and target hypervisor hosts.
Depending on the hypervisor, live migration of virtual machines can be supported in environments without shared
storage. For example, the VMware vSphere 6.0 hypervisor can support this migration. In this case, you can use
vMotion to migrate virtual machines to different computing resources and storage devices simultaneously, which
means that you can migrate virtual machines across storage accessibility boundaries. This approach is useful for
performing cross-cluster and cross-data center migrations when the target cluster machines may not have access
to the source cluster’s storage. Note that vMotion migration in environments without shared storage take
considerably longer than such migration in environments with shared storage.
Note: Recommendations about specific storage products and solutions for an active-active data center
architecture are beyond the scope of this document.
Validated Topology
Figure 37 shows the overall validated topology, its components, the software versions used, and the physical
connectivity. Subsequent subsections provide details about each specific area.
Figure 37. Cisco ACI Dual-Fabric Validated Topology
Cisco ACI Fabric
Each data center has its own Cisco ACI fabric, with Cisco Nexus 9000 Series Switches used as the leaf and spine nodes and a cluster of three APIC controllers per site.
The leaf switches are connected to the spine switches in a full bipartite graph, or Clos architecture. There are no
links between the leaf nodes or between the spine nodes.
The APIC controllers are connected on any ports of the fabric leaf switches with 10 Gigabit Ethernet interfaces.
Each APIC has two 10-Gbps interfaces, and each interface should be connected to a different leaf switch. Each
APIC can be connected to a different pair or to the same pair of leaf switches. In the topology in Figure 37, all three
APICs are connected to the same pair of leaf switches (leaf 101 and leaf 102).
The Cisco ACI platform tested in this design guide uses these software releases:
Software releases later than the ones listed here also support the design discussed in this document.
Note: You should adjust the actual numbers and models of the leaf and spine switches and APIC controllers
based on the number of ports and bandwidth required in your specific deployment.
Firewalls
Two options for firewall configuration were validated: active-active failover and ASA cluster deployments.
For active-active failover, you divide the security contexts on the ASA into two failover groups. A failover group is
simply a logical group of one or more security contexts. One group is assigned to be active on the primary ASA,
and the other group is assigned to be active on the secondary ASA. When a failover occurs, it occurs at the failover
group level. In a Cisco ACI dual-fabric design using active-active failover, one ASA is located in data center 1, and
the other is located in data center 2, and the ASAs are configured in routed mode with multiple contexts.
For the physical connection between the ASA and Cisco ACI, the ASAs connect to the Cisco ACI leaf switches
using an EtherChannel with LACP on the ASA side and a vPC on the Cisco ACI leaf switches. Separate
EtherChannels are used for the data and failover links. ASA does not allow user data and the failover link to share
interfaces, even if different subinterfaces are configured for user data and failover. The failover link and stateful
failover link (also known as the state link) share the same link, because this is the best way to reduce the number
of physical interfaces used.
The second firewall design validated in the Cisco ACI dual-fabric design uses ASA clustering. ASA clustering lets
you group multiple ASAs together as a single logical device. A cluster provides all the convenience of a single
device (management, integration into a network, etc.) while achieving the increased throughput and redundancy of
multiple devices.
ASA clustering is the preferred firewall option for the Cisco ACI dual-fabric design. In an ASA cluster, all the
firewalls actively forward traffic, allowing better utilization of the resources. Also, the cluster automatically manages
any asymmetric traffic, redirecting the traffic back to the specific ASA node that owns the connection.
In the Cisco ACI dual-fabric design with ASA clustering, each data center has two Cisco ASA 5585-X firewalls. Any
Cisco ASA firewall that supports ASA clustering can support the validated design. The four ASA firewalls present in
the topology are optimally configured in an all-active ASA cluster as detailed in the section “Cisco ASA Cluster
Integration” later in this document.
The ASA software release tested in this design guide is Cisco ASA Software Release 9.5(1).
For the physical connectivity to establish the ASA cluster, each ASA has two EtherChannels. One is used for data
traffic, and the other is used for the cluster control link, or CCL. As shown in Figure 38, the EtherChannel member
links are connected to two different Cisco ACI leaf switches, and a vPC is configured on the Cisco ACI fabric.
EtherChannels and vPC have built-in redundancy. If one link fails, traffic is rebalanced between remaining links. If
all the links in the EtherChannel fail on a particular device but other devices are still active, the failed device is
removed from the cluster.
The recommended way to connect ASAs to the Cisco ACI fabric is to use EtherChannels with LACP on the ASA
connected to a vPC with LACP on the Cisco ACI fabric.
In the Cisco ACI dual-fabric design, the ASA cluster is configured in routed mode with multiple contexts using
individual interfaces. OSPF is used as the routing protocol for peering between the Cisco ACI fabric and the ASAs
and between the ASAs and the WAN edge routers using subinterfaces on the data EtherChannel to best utilize the
available interfaces.
For the operation of the ASA cluster, each CCL has an IP address on the same subnet. This subnet should be
isolated from all other traffic and should include only the ASA CCL interfaces. In the case of the dual-fabric design,
the CCL must be extended between the data centers. To achieve this extension, the CCL links of all the devices
are placed in the same EPG and bridge domain in the Cisco ACI fabrics, and this EPG and bridge domain are
extended over the DCI links.
To help ensure CCL operation, the round-trip time (RTT) between devices must be less than 20 milliseconds (ms). This maximum latency requirement allows cluster members to be installed at different sites. To check the latency, enter a ping command on the CCL between devices.
For intersite clustering, you need to size the DCI bandwidth appropriately. You should reserve bandwidth on the DCI for CCL traffic equivalent to the following calculation: (number of cluster members per site × CCL size per member) / 2. If the number of members differs at each site, use the larger number for your calculation. The minimum bandwidth for the DCI should not be less than the size of the CCL for one member. For example, for four cluster members total at two sites, with two members at each site and a 5-Gbps CCL per member, the bandwidth to reserve on the DCI for CCL traffic is (2 × 5 Gbps) / 2 = 5 Gbps.
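The sizing rule above can be checked with simple arithmetic; a minimal sketch, using the values from the example:

```python
def reserved_ccl_bandwidth_gbps(members_per_site, ccl_gbps_per_member):
    """DCI bandwidth to reserve for CCL traffic:
    (members at the larger site x CCL size per member) / 2,
    never less than the CCL size of one member."""
    reserved = members_per_site * ccl_gbps_per_member / 2
    return max(reserved, ccl_gbps_per_member)

# Example from the text: 4 members total, 2 per site, 5-Gbps CCL per member.
print(reserved_ccl_bandwidth_gbps(2, 5))  # 5.0 Gbps reserved on the DCI
```

With a single member per site, the floor applies and the full CCL size of one member must still be reserved.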
For intersite clustering, do not configure connection rebalancing. You do not want connections rebalanced to
cluster members at a different site. The cluster implementation does not differentiate between members at multiple
sites for incoming connections. Therefore, connection roles for a given connection may span sites. This behavior is
expected.
VXLAN and back-to-back vPC were validated as part of the Cisco ACI dual-fabric design. However, OTV is also
supported as a DCI option.
Note: At the time of this writing, OTV is still the recommended DCI solution because of its level of maturity and
the specific functions it offers in this environment.
Figure 39 shows the physical connectivity when VXLAN is used as the DCI option.
Figure 39. VXLAN as the DCI Option
A double-sided vPC is used between the Cisco ACI leaf switches and a pair of Cisco Nexus 9300 platform
switches used for DCI: that is, both the Cisco ACI leaf switches and the Cisco Nexus 9000 Series Switches running
in NX-OS mode run vPC. To provide increased resiliency and bandwidth, four links are bundled in the same vPC,
and LACP is used on the vPC on both sides.
The Cisco Nexus 9000 Series Switches running in NX-OS mode extend Layer 2 between the sites using VXLAN.
The software validated on those switches is Cisco NX-OS Release 7.0(3)I2(2a).
Between the Cisco Nexus 9000 Series Switches running in NX-OS mode, three links are recommended. Two links
are bundled in a port channel to form the vPC peer link. The third link (192.168.1.0/24 in the topology) is a Layer 3
routed link used to prevent a DCI network isolation scenario. In other words, if a Layer 3 uplink from the DCI
switches to the DCI Layer 3 network fails, then this link between the DCI switches is used as the alternative path to
protect the DCI switches from being isolated.
Note: Alternatively, you can create this Layer 3 peering between the Cisco Nexus 9000 Series Switches by
using a dedicated VLAN carried on the vPC peer link (with corresponding switch virtual interfaces [SVIs] defined on
the two switches).
To reach the other site, each DCI switch has a Layer 3 routed uplink to the WAN or DCI network that leads to the
remote site (in the topology in Figure 30, the DCI WAN network is represented by the DCI router). The use of two
DCI routers is recommended, with each DCI switch connected to a different DCI router so that there is no single
point of failure.
Because the traffic between the sites is VXLAN encapsulated using unicast, one or multiple Layer 3 hops between
the sites are supported.
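Because VXLAN is a MAC-in-UDP encapsulation (RFC 7348), the intersite traffic is ordinary unicast IP and can cross any number of routed hops. A minimal sketch of the 8-byte VXLAN header follows (standard VXLAN, not the Cisco ACI iVXLAN variant used inside the fabric; the VNID value is an arbitrary example):

```python
import struct

def vxlan_header(vnid):
    """Build the 8-byte VXLAN header defined in RFC 7348:
    flags byte 0x08 (valid-VNI bit), 3 reserved bytes,
    a 24-bit VNID, and 1 final reserved byte."""
    return struct.pack("!I", 0x08000000) + struct.pack("!I", vnid << 8)

hdr = vxlan_header(10014)
assert len(hdr) == 8
print(hdr.hex())  # flags + 24-bit VNID; carried over UDP port 4789
```

The encapsulated frame is then just a UDP datagram from one VTEP to the other, which is why one or multiple Layer 3 hops between the DCI switches are supported.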
WAN Connectivity
For connectivity from the Cisco ACI fabrics in the data centers to the WAN (remote sites), both data centers have a
local WAN exit point with one or more WAN edge routers.
For connectivity between the Cisco ACI fabrics and the WAN edge routers in each site, each WAN edge router
uses a local port channel to connect to both Cisco ACI border leaf nodes (LACP is used to allow dynamic bundling
of the physical links in the port channel).
The ports on the Cisco ACI fabric connected to the WAN edge router are statically bound to an EPG, hence providing Layer 2 services between the WAN edge routers and the ASA firewalls as well as connecting the WAN edge routers in the two data centers through the DCI links. Figure 40 shows the physical connection.
Figure 40. Cisco ACI Fabric Connection to the WAN Edge Router
Any router that supports LACP, Layer 3 port channels, OSPF, BGP, and optionally VRF (when multitenancy is
required) can be used as the WAN edge router in the Cisco ACI dual-fabric design.
From a logical perspective, when you connect the WAN edge router to the fabric, you are connecting a router to a
regular EPG port. In other words, the fabric is providing Layer 2 services between the WAN edge router and the
ASA firewall. No L3Out connection is established for this type of connectivity.
Note that when the WAN edge router is connected to a regular EPG port, Cisco Discovery Protocol and LLDP must
be disabled on the router or on the fabric port for the endpoint information to be learned, as shown in Figure 41.
Figure 41. Cisco Discovery Protocol and LLDP Required Configuration
The default gateway for the 10.100.14.0/24 subnet is located in the Cisco ACI fabric in DC1 (because this subnet is
defined only at this site). First consider the routing table in Leaf3, where the source Spirent port is located:
i06-9396-03# show ip route vrf TnT-14:TnT-14NET 192.14.99.0/24
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
From the output, you can see two paths to the destination subnet advertised through BGP (this is the MP-BGP
VPNv4 process used by the Cisco ACI border leaf nodes to advertise external IP prefixes to the fabric). The next-
hop IP addresses are the VTEP addresses of Leaf1 and Leaf2.
Following the path to the destination, here is the routing table on Leaf1:
i06-9396-01# show ip route vrf TnT-14:TnT-14NET 192.14.99.0/24
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
The next-hop IP addresses you see here belong to the two ASA devices in DC1 and are learned through OSPF.
On the basis of the hashing, the leaf selects either ASA1 or ASA2. The ASA device that is selected becomes the flow owner.
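The per-flow next-hop choice can be sketched as a deterministic hash over the flow 5-tuple (illustrative only; the actual hash computed in the leaf hardware is platform specific):

```python
import hashlib

def ecmp_select(next_hops, flow):
    """Pick one next hop deterministically from the flow 5-tuple,
    so that all packets of a given flow take the same path
    (per-flow load sharing across equal-cost paths)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

asas = ["ASA1", "ASA2"]
flow = ("10.100.14.101", "192.14.99.10", 6, 49152, 80)
choice = ecmp_select(asas, flow)
assert choice in asas
# The same flow always hashes to the same ASA, which becomes the flow owner:
assert ecmp_select(asas, flow) == choice
```

Different flows spread across both ASAs, while every packet of one flow lands on the same device, which is what makes the owner model work without per-packet state synchronization.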
Note: If there is any asymmetry in the return traffic, the ASA cluster will redirect the return flow through the CCL
back to the ASA node that owns the flow (the node that originally created state information for the outbound flow).
Following the path to the destination, here is the routing table on the first ASA node in DC1:
DC1-ASA-1_i05-5585-01/TnT-14/master# show route 192.14.99.0
Routing Table: TnT-14
Routing entry for 192.14.99.0 255.255.255.0
Known via "ospf 1", distance 110, metric 1
Tag Complete, Path Length == 1, AS 100, , type extern 2, forward metric 10
Last update from 192.14.20.1 on outside, 42:14:25 ago
Routing Descriptor Blocks:
* 192.14.20.1, from 192.14.1.1, 42:14:25 ago, via outside
Route metric is 1, traffic share count is 1
Route tag 3489661028
The next-hop IP address you see here belongs to the local WAN edge router in DC1, which communicates external
routing information to the ASA through OSPF.
Note: The test topology uses one customer edge router. However, for production environments you should
deploy a pair of routers to provide resiliency in the design.
Following the path to the destination, here is the routing table on the local WAN edge router:
le06-2911-01_DC1#show ip route vrf TnT-14 192.14.99.0
The next-hop IP address on the path to the destination is the IP address of the WAN router. Notice that this information is learned through eBGP.
The destination subnet is locally attached on port Gi0/1/0, which connects to the Spirent traffic generator.
This concludes the trace of traffic from DC1 to the WAN router. In a production deployment the WAN router will be
connected to a WAN network and will route the traffic to the remote sites.
Traffic from the WAN to DC1
For completeness, consider the routing information from the WAN back to DC1 (Figure 43).
Here is the routing information for the destination network in DC1 on the WAN router:
le06-2911-02_WAN#show ip route vrf TnT-14 10.100.14.0
The next hop points to the WAN edge router in DC1. The following shows the routing table on the WAN edge
router in DC1:
le06-2911-01_DC1#show ip route vrf TnT-14 10.100.14.0
Redistributing via bgp 100
Advertised by bgp 100 route-map OSPF-INTO-BGP
Last update from 192.14.20.102 on Port-channel10.1238, 1d21h ago
Routing Descriptor Blocks:
* 192.14.20.105, from 102.102.12.12, 1d21h ago, via Port-channel10.1238
Route metric is 20, traffic share count is 1
192.14.20.102, from 102.102.12.12, 1d21h ago, via Port-channel10.1238
Route metric is 20, traffic share count is 1
The destination network is reachable through two paths, represented by the ASA devices in DC1. On the basis of the hashing, the router selects one of the ASAs as the next hop to the destination.
Note: As previously mentioned (and as shown in the example in Figure 43), for stateful traffic such as TCP, if
the flow owner is, for example, ASA1 in DC1, but the return traffic is hashed to ASA2, the firewall will use the CCL
to send the traffic back to the flow owner (ASA1).
The destination network is reachable through two paths: Leaf1 and Leaf2 of the Cisco ACI fabric in DC1. On the basis of the hashing, one of these leaf nodes is selected as the next hop to the destination. Here is the routing table on Leaf1:
Because the destination subnet is behind an anycast gateway, the destination subnet is shown as directly
attached, and the VTEP information shown is Spine1. The Cisco ACI fabric handles the delivery of packets to the
destination leaf and port using the endpoint information database.
The second destination (192.14.31.12) is a path to Leaf2 that Leaf1 learns from the OSPF peering used to
communicate with the ASA devices in DC1 (this path is not the preferred one and is not used).
You can trace the destination endpoint (the IP address behind the Spirent tester attached to Leaf3) from Leaf1 by
entering the following commands:
i06-9396-01# show system internal epm endpoint ip 10.100.14.101
This command returns the details of the interface on which the destination endpoint is located (Tunnel4). After you
know the interface number, you can verify the destination VTEP:
i06-9396-01# show interface Tunnel4
Tunnel4 is up
MTU 9000 bytes, BW 0 Kbit
Transport protocol is in VRF "overlay-1"
Tunnel protocol/transport is ivxlan
Tunnel source 10.0.192.95/32 (lo0)
Tunnel destination 10.0.192.92
Last clearing of "show interface" counters never
Tx
0 packets output, 1 minute output rate 0 packets/sec
Rx
0 packets input, 1 minute input rate 0 packets/sec
If the return traffic for a flow is hashed to an ASA that does not own the flow, the traffic will be redirected to the ASA that owns the flow. A deterministic entry point for traffic destined for a stretched subnet may also be desirable. You can, for example, treat one of the data centers as primary for this type of traffic.
In the design validated, multipath is not enabled, and BGP is used to select the best path to the stretched subnet,
10.1.4.0/24.
Here is the output for the routing table on the WAN router:
Best-path selection is left at the default setting, and 192.14.10.1 (DC1) is selected based on the lowest neighbor IP
address:
Routed Traffic from DC1 to DC2
To analyze the traffic flows between IP subnets that exist only in DC1 (10.100.14.0/24) and IP subnets that exist
only in DC2 (10.200.14.0/24), start in DC1 Leaf3, where the traffic originates. Here is the routing information in the
Cisco ACI fabric in DC1 (the output from Leaf3):
i06-9396-03# show ip route vrf TnT-14:TnT-14NET 10.200.14.0
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
The next-hop information you see here consists of the VTEP IP addresses of Leaf1 and Leaf2 (border leaf nodes),
where the Layer 3 DCI is connected.
Examine the routing table on one of the border leaf nodes (Leaf1):
i06-9396-01# show ip route vrf TnT-14:TnT-14NET 10.200.14.0
IP Route Table for VRF "TnT-14:TnT-14NET"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
The next-hop IP addresses are those of Leaf1 and Leaf2 (border leaf nodes) in DC2. This information comes from the BGP peering that exists between the data centers.
The address 10.200.14.0/24 is local to DC2, so you can see that the subnet is locally attached.
● Deploy Layer 2 DCI technology across sites to extend Layer 2 network connectivity across the entire
system.
● Logically extend across sites the EPG to which the endpoints belong to classify intersite Layer 2 traffic and
properly apply security policies.
Note: As will be clarified in the discussion that follows, the name assigned to the EPG in each fabric
has only local significance (in the example in Figure 44, EP1 belongs to EPG Web1, and EP2 belongs to
EPG Web2, but they can be part of the same logical extended EPG).
This section presents the step-by-step procedure that allows successful intersite Layer 2 communication.
The first main step is to complete an ARP exchange between the endpoints connected to separate Cisco ACI
fabrics. For this discussion, consider the situation in which the two endpoints have just been connected to the
network and have not yet sent any packets (a probably unrealistic scenario, but useful to show how Cisco ACI can
handle this worst-case situation).
Figure 45 shows the sequence of steps required to send the ARP request between EP1 connected to Cisco ACI
Fabric 1 and EP2 connected to Cisco ACI Fabric 2.
1. EP1 generates a Layer 2 broadcast ARP request to resolve the MAC address of EP2 that is part of the same
IP subnet.
2. The local Cisco ACI leaf node receives the ARP request and uses it to learn the MAC and IP address
information of the locally connected EP1 (the Cisco ACI leaf node can look into the payload of the ARP
request to retrieve this information). Notice that the leaf node also classifies EP1 as part of the Web1 EPG
based on the IEEE 802.1Q tag used by the endpoint to send the ARP request to the leaf (alternatively, a
VXLAN tag can be used for the same purpose when you integrate the solution with AVS or Open vSwitch).
The MAC and IP information is then communicated to the spine nodes by using the COOP control plane, so
that it can be added to the spine hardware proxy database. Also, because EP2 is not known at this point, the
Cisco ACI leaf node encapsulates the ARP request and sends it to the anycast VTEP address that identifies
the proxy service on the spine nodes (each spine node can receive traffic sent to this address). The leaf node
inserts information about the source EPG Web1 in the VXLAN header.
3. The specific spine that receives the frame decapsulates the packet, and because EP2 is not known in the
database, it floods the ARP frame in the bridge domain. Note that for this flooding to occur, you must enable
ARP flooding in the bridge domain configuration, as shown in Figure 46.
Note: Unknown unicast flooding (achieved by disabling the Hardware Proxy option) is required in case an endpoint in Fabric 1 still has a valid local ARP cache entry for another endpoint in the same IP subnet, but for some reason the destination endpoint information is no longer present in the spine database. This scenario is probably rare, so you can instead remove Layer 2 flooding from the fabric by enabling the Hardware Proxy option.
4. All the Cisco ACI leaf nodes on which the bridge domain is actively instantiated receive the VXLAN
encapsulated frame. One of the border leaf nodes decapsulates the frame and floods it out of the local
interfaces in the same bridge domain (and EPG), including the logical vPC connection used for Layer 2 DCI
communication. For this flooding to occur, you must use the VLAN-to-EPG mapping configuration discussed in
the section “Layer 2 Reachability Across Sites” and shown in Figure 47.
The configuration in Figure 47 maps the Web1 EPG to which EP1 belongs to a path represented by the DCI
vPC connection on the border leaf nodes, specifying that VLAN tag 1220 is used when sending the traffic
outside the local fabric. This process logically extends the Web1 EPG to Cisco ACI Fabric 2.
5. One of the border leaf nodes in Fabric 2 receives the IEEE 802.1Q tagged frame from the Layer 2 DCI
connection. A configuration similar to the one shown in Figure 47 is applied to the local border leaf nodes, so
that packets tagged with VLAN 1220 are classified as part of the Web2 EPG representing the logical extension
of the EPG to which EP1 belongs in Fabric 1.
Note: As previously mentioned, the names assigned to the EPGs in Cisco ACI Fabric 1 and Fabric 2 have
only local significance. What is important is for the VLAN-to-EPG mapping to be consistent on both sides to
help ensure the logical extension of the EPG across sites.
The border leaf node receiving the packet learns the MAC and IP addresses of EP1. From the point of view of
Cisco ACI Fabric 2, EP1 appears to be locally connected to this border leaf node, so a COOP update is also
sent to the local spines to save this information in the hardware database. Because the border leaf node at this point does not yet know the location of EP2, the packet is VXLAN encapsulated and sent to the anycast VTEP
of the local spine nodes. As before, the local border leaf node inserts the Web2 EPG value in the VXLAN
header.
6. One of the spine nodes receives the frame, and because EP2 is not yet known in the spine proxy database, it
floods the ARP frame in the bridge domain, assuming that a setting similar to what is shown in Figure 46 is
also applied to the bridge domain in Fabric 2.
7. All the local Cisco ACI leaf nodes that have the specific bridge domain active receive the flooded ARP request
and learn EP1 MAC and IP address information. (The border leaf node in Fabric 2 is the next hop because it
encapsulated the packet received from the Layer 2 DCI connection before flooding it to the fabric.) The frame
is also flooded on all the local interfaces that are part of the bridge domain, so the frame also reaches the
intended destination EP2.
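The classification performed in steps 4 and 5 can be sketched as a per-fabric VLAN-to-EPG lookup (the table values are hypothetical, matching the example: the EPG names are local to each fabric, but the VLAN tag on the DCI link must be the same on both sides):

```python
# Per-fabric static VLAN-to-EPG mapping on the DCI vPC (illustrative).
fabric1_map = {1220: "Web1"}   # EPG name local to Cisco ACI Fabric 1
fabric2_map = {1220: "Web2"}   # EPG name local to Cisco ACI Fabric 2

def classify(fabric_map, vlan_tag):
    """Classify a frame arriving on the DCI link into a local EPG."""
    return fabric_map[vlan_tag]

# A frame leaves Fabric 1 tagged with VLAN 1220 and is classified
# into the locally named EPG on each side of the DCI:
assert classify(fabric1_map, 1220) == "Web1"
assert classify(fabric2_map, 1220) == "Web2"
# What must match across fabrics is the VLAN tag, not the EPG name:
assert set(fabric1_map) == set(fabric2_map)
```

This is the mechanical reason why the EPG names have only local significance: the extended EPG identity is carried between sites solely by the VLAN tag.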
Figure 48 shows the sequence of events that allow EP2 to reply to the ARP request generated by EP1.
Figure 48. ARP Reply Delivered to the Endpoint in Cisco ACI Fabric 1
8. EP2 sends the ARP reply destined for the MAC address of EP1 that generated the request.
9. The Cisco ACI leaf node to which EP2 is connected receives the ARP reply and uses it to learn EP2 MAC and
IP address information and communicate it to the local spine nodes through COOP. The leaf node then
encapsulates the packet and sends it to the local border leaf nodes. At this point, Leaf6 already knows that
EP1 and EP2 have been locally discovered in Cisco ACI Fabric 2 as belonging to the same Web2 EPG, so
from a policy perspective, traffic is allowed.
10. The receiving border leaf node decapsulates the traffic and learns the EP2 information associated with the
local leaf node to which EP2 is connected. The border leaf node then performs a Layer 2 lookup and forwards
the packet to the Layer 2 DCI connection to Cisco ACI Fabric 1 (again, policy allows this communication
because EP1 and EP2 are part of the same Web2 EPG).
11. One of the border leaf nodes in Cisco ACI Fabric 1 receives the packet, decapsulates it, and learns EP2 as a
local device connected to the Layer 2 DCI interface. This event triggers a COOP update destined for the local
spine nodes. The border leaf node classifies EP2 as part of the locally defined Web1 EPG; performs a Layer 2
lookup; and encapsulates the traffic to the Proxy A VTEP address identifying the local spines (since no specific information for EP1 has been learned yet).
12. The spine node receiving the frame decapsulates it and then re-encapsulates it toward Leaf1. After
decapsulating it and performing a Layer 2 lookup, the leaf node forwards the packet to the locally connected
EP1 device.
After the ARP exchange is completed, all the leaf nodes have the proper information to allow successful Layer 2
data-plane communication between EP1 and EP2: the second main step. As shown in Figure 49, VXLAN
encapsulation is used for communication within the fabric (at both sites) between the computing leaf node and the
border leaf node, whereas hand-off to a VLAN is performed to carry the traffic between sites through the deployed
Layer 2 DCI solution.
1. EP1 generates a data packet with the MAC and IP destination addresses of EP2.
2. The local leaf node receives the packet and classifies it as part of the Web1 EPG. The local leaf performs the
Layer 2 lookup for the EP2 MAC address, determines that the packet needs VXLAN encapsulation, and sends it to the local border leaf nodes. From a policy perspective, the leaf knows that EP2 belongs to the same Web1
EPG as EP1 (this information was previously learned on the data plane from the ARP reply).
3. One of the local border leaf nodes receives the packet, decapsulates it, and performs a Layer 2 lookup,
determining the need to forward the traffic through the DCI vPC connection to EP2.
4. One of the border leaf nodes in Fabric 2 receives the packet. The traffic is classified as part of the Web2 EPG
based on the static VLAN-to-EPG mapping previously described. Traffic destined for EP2 thus is allowed by
the policy, so after the Layer 2 lookup, the packet is VXLAN encapsulated and sent to Leaf6.
5. The leaf receives the packet, decapsulates it, performs a Layer 2 lookup, and forwards it locally to EP2.
As previously mentioned, the initial assumption for the design discussed in this document is that each EPG is
deployed as part of a separate bridge domain. The consequence is that endpoints that belong to separate EPGs
are deployed in different IP subnets. Therefore, communication between them is always routed.
To enable this routed communication between endpoints connected to different Cisco ACI fabrics, you must
perform the following two main steps:
1. Configure a consistent default gateway for bridge domains that are extended across sites.
2. Establish Layer 3 peering between the two Cisco ACI fabrics to allow routed communication between bridge
domains (IP subnets) that are not extended across sites.
As discussed in the introductory section of this document, the goal is to provide a distributed default gateway
function, so that independent of the site to which an endpoint is connected, a local default gateway on the
endpoint’s directly connected Cisco ACI leaf node is always available. Deploying a distributed default gateway
across independent Cisco ACI fabrics requires some configuration coordination to help ensure that a common
MAC address and IP address can be assigned to the gateway.
Cisco ACI Software Release 11.2 and later add a new option for the bridge domain configuration. This option
assigns a virtual MAC (vMAC) address for the fabric to use when replying to ARP requests for the default gateway
(Figure 50).
Figure 50. MAC and IP Address Configuration for an Extended Bridge Domain in Cisco ACI Fabric 1
Figure 50 shows the required bridge domain configuration for the IP and MAC addresses:
● Custom MAC Address: This is the physical MAC address associated with the SVI that is created on the
Cisco ACI leaf nodes to perform the default gateway functions. This MAC address is never used in ARP
reply messages sent to locally connected endpoints and should be unique per site.
● Virtual MAC Address: This is the MAC address value that is returned in the ARP replies sent from leaf switches to local endpoints; it is the destination MAC address that the endpoints use when sending traffic that needs to be routed by the fabric. A common value must be configured in the separate Cisco ACI fabrics for the bridge domains that must be extended across sites.
● Subnets: Two IP addresses are usually configured here. One (Primary IP Address = True) is a unique IP
address associated with the SVI deployed on the Cisco ACI leaf nodes on which this specific bridge domain
is deployed. The second IP address (Virtual IP = True) is the common (shared) default gateway IP address
consistently configured across fabrics and used by the connected endpoints.
The endpoints connected to this bridge domain thus are configured with the virtual IP address as the default
gateway, and when an ARP request for the gateway is sent, the leaf replies with the virtual MAC address
information. Thus, to enable transparent endpoint mobility across sites, you need to configure matching values in
the bridge domain deployed in Cisco ACI Fabric 2, as shown in Figure 51.
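The distributed gateway behavior can be sketched as follows (the addresses are hypothetical illustrations; each fabric answers gateway ARP requests with the shared virtual MAC while keeping a unique custom MAC per site):

```python
# Per-fabric bridge-domain gateway settings (illustrative values only).
GATEWAY_VIP = "10.1.4.1"           # Virtual IP, identical in both fabrics
VIRTUAL_MAC = "00:22:bd:f8:19:01"  # vMAC, must match across fabrics

fabrics = {
    "Fabric1": {"custom_mac": "00:22:bd:f8:19:ff", "svi_ip": "10.1.4.2"},
    "Fabric2": {"custom_mac": "00:22:bd:f8:19:02", "svi_ip": "10.1.4.3"},
}

def arp_reply(fabric, target_ip):
    """Leaf ARP behavior: requests for the gateway virtual IP are always
    answered with the shared virtual MAC, never the SVI custom MAC."""
    if target_ip == GATEWAY_VIP:
        return VIRTUAL_MAC                     # shared across sites
    if target_ip == fabrics[fabric]["svi_ip"]:
        return fabrics[fabric]["custom_mac"]   # unique per site
    return None

# An endpoint moving between sites keeps resolving the same gateway MAC:
assert arp_reply("Fabric1", GATEWAY_VIP) == arp_reply("Fabric2", GATEWAY_VIP)
```

Because the resolved gateway MAC and IP are identical at both sites, a migrated endpoint needs no re-ARP to keep routing traffic through its local leaf.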
Figure 51. MAC and IP Address Configuration for an Extended Bridge Domain in Cisco ACI Fabric 2
Note: The custom MAC address for all the Layer 3 interfaces deployed in the Cisco ACI fabric has a default
value of 00:22:BD:F8:19:FF (shown earlier in Figure 50). This setting also applies to Layer 3 interfaces deployed in
different Cisco ACI fabrics. As a consequence, when interconnecting Cisco ACI fabrics using a Layer 2 DCI
solution, you have to modify the MAC address on one side for all the bridge domains that must be extended across
sites to avoid confusing the Layer 2 DCI devices deployed between the Cisco ACI fabrics. (In the example in
Figure 51, the Custom MAC address was assigned the value 00:22:BD:F8:19:02.)
After the distributed default gateway is consistently deployed across sites, Layer 3 communication can successfully
begin. This section considers two specific scenarios:
● Layer 3 communication between endpoints that belong to bridge domains that are extended across Cisco
ACI fabrics
● Layer 3 communication between endpoints that are part of bridge domains that are locally defined on
separate Cisco ACI fabrics
Note: In contrast to the discussion of Layer 2 communication, the discussion here assumes that endpoints have
already been discovered (that is, that endpoints generated some ARP traffic that populated the tables on the leaf
nodes in both fabrics).
To establish Layer 3 peering between separate Cisco ACI fabrics, the solution discussed in this document
proposes the establishment of eBGP sessions between the two pairs of border leaf nodes deployed in each Cisco
ACI fabric (Figure 52).
As shown in Figure 52, a full mesh of eBGP sessions is established between the border leaf nodes that are part of
the different Cisco ACI fabrics. Although the focus of the testing described in this document is on IPv4
communication, the same control plane can be used to exchange IPv6 prefixes for the various tenants deployed
across the sites.
The different border leaf nodes peer with each other as four routers on the same VLAN segment extended across
sites through the Layer 2 DCI solution of choice. This behavior is possible because the Cisco Nexus 9000 Series in
Cisco ACI mode supports the establishment of dynamic routing peering over a vPC connection. (vPC is used by
each pair of border leaf nodes to connect to the DCI functional block.)
For the configuration, the Layer 3 peering between sites requires the definition of an L3Out connection. Figure 53
shows the L3Out (L3-out-BGP) configuration defined for specific tenant TnT-14 on the border leaf nodes in Cisco
ACI Fabric 1.
Figure 53. Tenant-Specific L3Out for Layer 3 Peering Between Cisco ACI Fabrics
The configuration of L3Out shown in Figure 53 specifies the Cisco ACI devices (node-101 and node-102) that are
used to establish the eBGP peering with the second Cisco ACI fabric. The interface associated with the L3Out
connection is the same vPC logical connection used for Layer 2 communication across sites. A specific VLAN
(1239) is used to tag traffic that is routed through the L3Out connection. The border leaf nodes in Cisco ACI Fabric
1 use SVIs in that Layer 2 segment to establish eBGP peering with the remote border leaf nodes (Figure 54).
Figure 54. L3Out Logical Interface Configuration
An important configuration related to the L3Out DCI connection is the creation of the external network (External
EPG) used to represent the external destinations that can be reached through the L3Out connection (Figure 55).
Figure 55. Configuration of the External Network Associated with L3Out DCI
As shown in Figure 55, the prefix 0.0.0.0/0 is associated with the configured external EPG, so Cisco ACI classifies all
traffic received on this L3Out connection (mostly sourced from the remote data center site) as part of this specific
external EPG. Specific security policies (contracts) can therefore be applied to all communication sent to internal
EPGs.
Figure 55 also shows the specific configurations associated with the external EPG:
● External Subnets for the External EPG: This configuration is used to classify traffic that is received through
this L3Out connection as part of this external EPG. In this specific example, all the traffic received (regardless
of the source IP address) will be considered part of the external EPG and subject to the configured security
policies (contracts) for communication with other EPGs defined in Cisco ACI Fabric 1.
● Export Route Control Subnet and Aggregate Export: This configuration helps ensure that all the external
prefixes learned for this tenant through different L3Out connections are advertised from this L3Out DCI
interface to the border leaf nodes in Cisco ACI Fabric 2. Without this setting, no external prefixes would be
advertised by default.
Note: In this design, this configuration applies specifically to the external prefixes learned from the
L3Out connection to the firewall nodes local to Fabric 1.
Figure 56 shows the flag settings required to create the configuration just described and associated with the
0.0.0.0/0 subnet.
Similar considerations apply to the creation of the L3Out DCI on the border leaf nodes in Cisco ACI Fabric 2. After
these configurations have been completed, routing traffic can flow across sites.
You should also enable Bidirectional Forwarding Detection (BFD) on the eBGP peering between border leaf nodes across
sites. BFD is a detection protocol designed to provide fast forwarding-path failure detection for all media
types, encapsulations, topologies, and routing protocols. It helps reduce the outage experienced under various
failure scenarios (both link and switch based). The configuration of BFD is performed in two steps:
1. On L3Out, set the BFD flag for each remote configured eBGP peer, as shown in Figure 57.
2. Create a BFD interface profile associated with the logical interface defined for L3Out (Figure 58).
Figure 58. Creating a BFD Interface Profile Associated with the L3Out Logical Interface
You can usually use the default interface profile, with the specific values shown in Figure 59.
As shown earlier in Figure 21 and Figure 22, there are two scenarios for routed communication across sites:
● Routing between endpoints that belong to IP subnets that are stretched across sites: In this case, the
routing always occurs locally in the Cisco ACI fabric in which the source endpoint is located. Then traffic is
sent to the destination in the second Cisco ACI fabric using the Layer 2 path across sites. This process is
similar to the process discussed earlier in the section “Deploying Layer 2 Connectivity Between Sites.”
● Routing between endpoints belonging to IP subnets that are not stretched between Cisco ACI fabrics: In
this case, the Layer 3 peering established across sites represents the Layer 3 path required to establish
communication. Figure 60 shows the steps required to enable this communication.
Note: For simplification, the discussion here assumes that EP1 and EP2 have already been discovered
within each Cisco ACI fabric.
Figure 60. Routing Between Endpoints in Nonstretched IP Subnets
● The IP subnet associated with the bridge domain to which the EPG belongs is marked for external
advertisement (Figure 62).
Figure 62. Configuring an IP Subnet to Be Advertised Externally
This configuration is required on both Cisco ACI fabrics to help ensure a successful exchange of IP prefixes
associated with IP subnets that are not stretched across sites.
3. One of the two local border leaf nodes receives the packet, decapsulates it, and performs a Layer 3 lookup,
finding a match for the IP subnet to which the destination belongs. The IP subnet advertisement is received
through the border leaf nodes deployed in the remote Cisco ACI fabric, so traffic is routed through the Layer 3
DCI infrastructure to one of the two next-hop border leaf nodes.
Note: Currently, host-route advertisement outside the Cisco ACI fabric is not supported, and only IP
prefixes for the endpoint subnets can be advertised out.
4. One of the border leaf nodes in Fabric 2 receives the packet and performs a Layer 3 lookup. Assuming that no
traffic was previously sent to EP2, no specific information will be programmed in the border leaf tables, so the
packet is sent to one of the spine nodes (encapsulated to the proxy VTEP address).
5. The receiving spine node decapsulates the traffic, finds the information for EP2 in the local database (because
the initial assumption was that the endpoints had already been discovered), and encapsulates the packet to
Leaf6.
6. Leaf6 decapsulates the packet and forwards it to EP2.
Figure 63 shows how traffic is then sent back to EP1.
Figure 63. Routing Between Endpoints in Nonstretched IP Subnets (Return Traffic)
7. EP2 sends a packet destined for the EP1 IP address. Leaf6 performs a Layer 3 lookup and encapsulates the traffic
to one of the local border leaf nodes, selected as the next hop.
8. One of the local border leaf nodes receives the packet, decapsulates it, and performs a Layer 3 lookup, finding
the information for the destination IP subnet learned through the eBGP session with the remote border leaf
nodes. The traffic is then routed through the Layer 3 DCI connection.
9. One of the border leaf nodes in Cisco ACI Fabric 1 receives the packet, performs a Layer 3 lookup, and
encapsulates it to Leaf1, following specific host-route information. (EP1-specific information was learned from
the previous traffic destined for Cisco ACI Fabric 2.)
10. Leaf1 receives the packet, decapsulates it, and sends it to the EP1 destination.
VMware vSphere Release 6.0 and later offers a solution that provides live mobility support even under those
circumstances, allowing live vMotion migration of workloads across ESXi servers managed by separate vCenter
Server instances. This capability is known as cross-vCenter vMotion.
Note: Cisco ACI also supports integration with Microsoft Hyper-V and with OpenStack, but at the time of this
writing these hypervisors do not support mobility across servers that are part of separate VMM domains.
Figure 64. Integrating Multiple VMM Domains
Two (or more) separate VMM domains are created in each Cisco ACI fabric by peering the local APIC with the
local vCenter server. A separate DVS is pushed to the ESXi computing clusters deployed in the separate sites.
As shown in Figure 65, each APIC controller establishes a relationship with a different vCenter server (DC1-VC6-1
in Site 1 and DC2-VC6-2 in Site 2). This leads to the creation of two separate DVSs (DVS-DC1-VC-vDS and DVS-
DC2-VC-vDS), which are then pushed to the separate clusters of ESXi hosts managed by each vCenter server.
Note that the names of the port groups in the example in Figure 65 are identical. This is the case because the
application profiles (and associated EPGs) created on the two independent APIC clusters were named the same.
This naming scheme is recommended for operational ease, but it is not required to enable mobility across the two
separate domains.
Figure 66 shows how the information for the separate VMM domains is displayed on the vSphere Web Client.
Figure 66. View of Separate VMM Domains in VMware vSphere 6.0
In Figure 66, you can see the two separate DVSs and the associated port groups.
As mentioned, the port groups are named the same because of the identical naming convention used for the EPGs
defined on the separate APIC clusters, but they are completely independent of each other. To verify this
independence, note that the VLAN tags associated with port groups are different because they are independently
negotiated between the APIC controllers and their local vCenter servers (Figure 67).
Figure 67. Identically Named Port Groups with Independent VLAN Tags
When migrating a virtual machine across ESXi servers managed by different vCenter servers, you must specify the
port group to which the virtual machine should be connected on the destination ESXi host, as shown in Figure 68.
Figure 68. Selecting the Virtual Machine Port Group on the Destination VMware ESXi Host
Note: The behavior discussed in this section differs from the behavior prior to vSphere 6.0. Prior to that release,
live migration was always performed in the context of the same port group of the same DVS and so required
consistent VLAN tagging and virtual machine attachment points across different ESXi hosts.
The design discussed in this document uses two physical ASA 5585-X appliances in each data center, for a total of
four devices in the cluster. Within each site, firewalls are attached to the Cisco ACI fabric using vPC technology.
Each firewall uses two vPCs: one for the data traffic and one for the cluster control protocol (CCP) traffic (the
cluster control link, or CCL). Figure 69 shows the setup.
Figure 69. Cisco ASA Node Connection to the Cisco ACI Fabric
The data VLANs are not extended between the sites. The ASA cluster is deployed in routed mode with multiple
contexts using individual interfaces, and it is inserted in the data path using IP routing for the north-south traffic.
Specifically, OSPF is used as the routing protocol between each unit and its local Cisco ACI fabric.
The CCL VLAN and its Layer 2 segment are extended between the sites. The Cisco ACI fabrics provide a Layer 2 bridge
domain on a dedicated vPC for the CCL VLAN, which is then extended through DCI to the other site. Figure 70
shows this solution.
You can use the sample here as a reference to construct the configuration for all units by allocating a unique IP
address to each unit for the cluster interface. The IP addresses of the cluster interface on all units must be in the
same subnet: for example, 1.1.1.0/24 is used in this design.
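As a reference, the following is a minimal bootstrap sketch for the first unit, based on the cluster name, unit name, port channel, and CCL address visible in the outputs later in this section; the priority value is an assumption, and only the CCL IP address changes per unit.

```
! Hedged sketch: cluster bootstrap for one unit (priority value assumed)
interface Port-channel11
 description CCL toward the Cisco ACI fabric
!
cluster group fw
 local-unit DC1-ASA-1_i05-5585-01
 cluster-interface Port-channel11 ip 1.1.1.1 255.255.255.0
 priority 1
 enable
```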
For the CCL, as shown earlier in Figure 70, the Cisco ACI fabric simply provides a Layer 2 service between the
vPCs on the fabric to connect to ASA Port-channel11 (the CCL on the ASA side). Then the CCL VLAN is extended
between the sites (Figure 71).
Figure 71. Static Binding to EPG for the Cisco ASA Failover VLAN
Figure 72. Logical Layer 2 Extension of the CCL Segment
After the configuration is applied, the ASA cluster is formed and becomes operational between the sites, as shown
here.
DC1-ASA-1_i05-5585-01/master# show cluster info
Cluster fw: On
Interface mode: individual
This is "DC1-ASA-1_i05-5585-01" in state MASTER
ID : 1
Version : 9.5(1)
Serial No.: JAD1928006U
CCL IP : 1.1.1.1
CCL MAC : 80e0.1d58.8608
Last join : 16:04:54 UTC Jan 16 2016
Last leave: N/A
Other members in the cluster:
Unit "DC2-ASA-2_E05-asa5585x-02" in state SLAVE
ID : 2
Version : 9.5(1)
Serial No.: JAD170600RE
CCL IP : 1.1.1.4
CCL MAC : acf2.c5f2.c5f0
Last join : 15:33:14 UTC Jan 16 2016
Last leave: N/A
Unit "DC2-ASA-1_i05-5585x-02" in state SLAVE
ID : 3
Version : 9.5(1)
Serial No.: JAD1928009Y
CCL IP : 1.1.1.3
CCL MAC : 54a2.747c.b668
Last join : 15:21:08 UTC Jan 16 2016
Last leave: 15:15:05 UTC Jan 16 2016
Unit "DC1-ASA-2_E05-5585x-01" in state SLAVE
ID : 4
Version : 9.5(1)
Serial No.: JAD170900KX
CCL IP : 1.1.1.2
CCL MAC : acf2.c5f2.c584
Last join : 16:49:46 UTC Feb 1 2016
Last leave: 16:36:46 UTC Feb 1 2016
Each unit in the ASA cluster has an inside and an outside interface, with each interface assigned a unique IP
address.
Each ASA unit forms an OSPF neighborship with the local Cisco ACI border leaf nodes (nodes deployed in the
same data center site) through the inside interface. Each unit also forms an OSPF neighborship with the local edge
router through the outside interface.
The following listing is the actual configuration for a tenant context (VRF instance), in this case named TnT-14.
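The full listing spans several pages; the following is a minimal sketch of its key per-context elements, reconstructed from the interface and OSPF outputs that follow. The pool names, OSPF process ID, and area value are assumptions; the main cluster IP addresses (.100) and per-unit pool ranges (.102 through .105) are taken from the `show interface summary` output below.

```
! Hedged sketch of the TnT-14 context (pool names, OSPF process ID, and area assumed)
ip local pool OUTSIDE-POOL 192.14.20.102-192.14.20.105
ip local pool INSIDE-POOL 192.14.31.102-192.14.31.105
!
interface outside
 nameif outside
 ip address 192.14.20.100 255.255.255.0 cluster-pool OUTSIDE-POOL
interface inside
 nameif inside
 ip address 192.14.31.100 255.255.255.0 cluster-pool INSIDE-POOL
!
router ospf 14
 network 192.14.20.0 255.255.255.0 area 0
 network 192.14.31.0 255.255.255.0 area 0
```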
With this configuration, each ASA unit receives a unique address on its inside and outside interfaces, as shown
here.
DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show interface summary
DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0602.0004, MTU 1500
IP address 192.14.20.102, virtual IP 192.14.20.100, subnet mask
255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0602.0002, MTU 1500
IP address 192.14.31.102, virtual IP 192.14.31.100, subnet mask
255.255.255.0
DC2-ASA-2_E05-asa5585x-02:********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0604.0004, MTU 1500
IP address 192.14.20.103, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0604.0002, MTU 1500
IP address 192.14.31.103, subnet mask 255.255.255.0
DC2-ASA-1_i05-5585x-02:***********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0606.0004, MTU 1500
IP address 192.14.20.104, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0606.0002, MTU 1500
IP address 192.14.31.104, subnet mask 255.255.255.0
DC1-ASA-2_E05-5585x-01:***********************************************
Interface outside "outside", is up, line protocol is up
MAC address a249.0608.0004, MTU 1500
IP address 192.14.20.105, subnet mask 255.255.255.0
Interface inside "inside", is up, line protocol is up
MAC address a249.0608.0002, MTU 1500
IP address 192.14.31.105, subnet mask 255.255.255.0
To form the OSPF neighborship between the ASA units and the Cisco ACI fabrics, the Cisco ACI fabric is
configured with an external routed network, also known as L3Out, within the associated tenant.
The screenshots shown in Figures 74 through 77 are from an APIC in Data Center 2. The same configuration is
applied on the APIC in Data Center 1 (using different IP addresses).
Figure 74. L3Out Connection Between the Cisco ACI Fabric and the Cisco ASA Nodes
Figure 76. L3Out: Defining Logical Interface Profiles
With this configuration, the ASA forms an OSPF neighborship with the Cisco ACI fabric on its inside interface. As
shown in the following output, each ASA has three OSPF neighbors on its inside interface. For example, ASA DC1-
ASA-1_i05-5585-01 forms an OSPF neighborship with the two local border leaf nodes (192.14.31.11 and
192.14.31.12) as well as with the other ASA (192.14.31.105) in the same data center.
Each ASA also forms an OSPF neighborship with the local WAN edge router on its outside interface (192.14.20.1
in Data Center 1), as well as with the other ASA in the same data center (192.14.20.105, as seen by
DC1-ASA-1_i05-5585-01).
The following output shows the OSPF neighbors for each ASA unit in the cluster.
DC1-ASA-1_i05-5585-01/TnT-14/master# cluster exec show ospf neighbor
DC1-ASA-1_i05-5585-01(LOCAL):*****************************************
Neighbor ID Pri State Dead Time Address Interface
101.101.11.11 1 2WAY/DROTHER 0:00:02 192.14.31.11 inside
102.102.12.12 1 FULL/BDR 0:00:02 192.14.31.12 inside
192.14.31.105 1 FULL/DR 0:00:02 192.14.31.105 inside
192.14.1.1 10 FULL/DR 0:00:02 192.14.20.1 outside
192.14.31.105 1 FULL/DROTHER 0:00:02 192.14.20.105 outside
DC2-ASA-2_E05-asa5585x-02:********************************************
Neighbor ID Pri State Dead Time Address Interface
192.14.31.104 1 2WAY/DROTHER 0:00:02 192.14.31.104 inside
201.201.11.11 1 FULL/DR 0:00:02 192.14.31.13 inside
202.202.12.12 1 FULL/BDR 0:00:02 192.14.31.14 inside
192.14.2.1 1 FULL/DROTHER 0:00:02 192.14.20.2 outside
192.14.31.104 1 FULL/DR 0:00:02 192.14.20.104 outside
DC2-ASA-1_i05-5585x-02:***********************************************
Neighbor ID Pri State Dead Time Address Interface
192.14.31.103 1 2WAY/DROTHER 0:00:02 192.14.31.103 inside
201.201.11.11 1 FULL/DR 0:00:02 192.14.31.13 inside
202.202.12.12 1 FULL/BDR 0:00:02 192.14.31.14 inside
192.14.2.1 1 FULL/DROTHER 0:00:02 192.14.20.2 outside
192.14.31.103 1 FULL/BDR 0:00:02 192.14.20.103 outside
DC1-ASA-2_E05-5585x-01:***********************************************
Neighbor ID Pri State Dead Time Address Interface
101.101.11.11 1 FULL/DROTHER 0:00:02 192.14.31.11 inside
102.102.12.12 1 FULL/BDR 0:00:02 192.14.31.12 inside
192.14.31.102 1 FULL/DROTHER 0:00:02 192.14.31.102 inside
192.14.1.1 10 FULL/DR 0:00:02 192.14.20.1 outside
192.14.31.102 1 FULL/BDR 0:00:02 192.14.20.102 outside
WAN Integration Considerations
The previous section discussed how the ASA firewall establishes connectivity with the Cisco ACI fabric and the
WAN edge routers. This section discusses how the WAN edge routers peer with the WAN devices to allow external
communication into the Cisco ACI fabric.
Figure 78 shows the routing protocol peering sessions established in the validated solution documented here.
The figure shows only one node per site, whether a firewall or a WAN edge router, but in reality you should
deploy redundant nodes at each site to help ensure full redundancy.
As previously discussed, eBGP peering between the Cisco ACI fabric in DC 1 and the Cisco ACI fabric in DC 2
is required to allow east-west communication between subnets that are localized to only one fabric (not
stretched across the entire system).
Peering between the firewall and the WAN edge router uses OSPF to meet convergence requirements.
Note that the WAN edge routers also have an eBGP peering between them. This peering is established over a
Layer 2 path offered by Cisco ACI through the DCI connection. This peering is optional: it is required only if one
of the sites could become isolated from the WAN in the event of a major service provider failure. If each site has
dual connections to a pair of service providers, this failure scenario becomes improbable.
Figure 79. Ingress and Egress Traffic Paths for Localized IP Subnets
As previously discussed, one of the main reasons to use a Cisco ACI dual-fabric design is the capability to stretch
IP subnets across separate sites. Traffic originating from an endpoint connected to those stretched subnets will
always follow the optimal path because of the existence of a local default gateway. Figure 80 shows this behavior.
For inbound traffic coming from the WAN, one option is to not perform any optimization and to let the WAN select
the best path to reach the stretched IP prefix, whether the destination workload is located in DC1 or DC2.
Note: At the time of this writing, Cisco ACI does not offer the capability to advertise host-route information
through L3Out.
If for operational reasons you want to ensure that inbound traffic for extended subnets always enters through DC1,
then you can advertise those routes in the WAN with a better metric through the WAN edge router connection in
Site 1. You can achieve this goal in several ways. The specific validated scenario prepends the autonomous
system path (AS-Path) because BGP is the routing protocol used to peer with the WAN, as discussed later in this
section.
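As an illustration, such a prepend could be applied outbound on the Site 2 WAN edge router so that the stretched prefixes appear with a longer AS-path from that site. This is a sketch only: the route-map and prefix-list names, the neighbor address, and the number of prepended hops are hypothetical; the stretched subnet 10.1.4.0/24 and autonomous system 200 are taken from the outputs later in this section.

```
! Hedged sketch (IOS): make Site 2 the less-preferred entry point for stretched subnets
ip prefix-list STRETCHED-SUBNETS seq 5 permit 10.1.4.0/24
!
route-map PREPEND-STRETCHED permit 10
 match ip address prefix-list STRETCHED-SUBNETS
 set as-path prepend 200 200
route-map PREPEND-STRETCHED permit 20
!
router bgp 200
 address-family ipv4 vrf TnT-14
  ! 192.14.99.1 is a hypothetical WAN peer address
  neighbor 192.14.99.1 route-map PREPEND-STRETCHED out
```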
Figure 81 shows the inbound traffic behavior after this configuration.
After traffic enters through Data Center 1, if the destination workload is local to DC 1, it is delivered directly.
If the destination workload is located in Data Center 2 but the connection state is present on the firewall in Data
Center 1, then the firewall function is performed in Data Center 1 and the traffic is then sent to Data Center 2 over
the Layer 2 DCI connection.
The right side of Figure 81 shows the case in which the traffic enters through Data Center 1 but the state for the
connection is owned by the firewall in Data Center 2. In this case, the firewall in Data Center 1 redirects the traffic
to the firewall in Data Center 2 over the ASA cluster CCL, which was previously extended over the DCI links.
In this validated design, to improve convergence after a failure, OSPF is used on the ASA firewalls, while BGP is
used on the WAN side. The WAN edge router interconnects the two routing protocols.
The main goal for the WAN is to help ensure reachability through backup paths to the pair of data centers. The
WAN also needs to help ensure some form of inbound traffic localization when the destination is a subnet stretched
across sites, and this goal can be difficult to achieve. Cisco ACI does not allow /32 host routing, so inbound
optimization based on advertisement of specific host routes is not possible.
However, such optimization can be performed in several ways. This section discusses the solution that was
validated.
All WAN edge routers require peering with each other. This peering helps ensure that if isolation occurs at the
WAN edge, traffic can still flow through the other site. The peering is implemented over a stretched VLAN that joins
all WAN edge routers; the corresponding EPG, bridge domain, and DCI VLAN are created to allow the DC1 WAN edge
routers to connect to the DC2 WAN edge routers.
In this document, each data center is in a different BGP autonomous system (AS). Cross-site peering between
WAN edges thus uses eBGP.
As previously mentioned, it is important to ensure optimized access to local subnets that are not stretched across
sites. You must differentiate the prefixes advertised by one OSPF instance from those advertised by the other. To do
so, an OSPF tag is used to control redistribution into BGP. The same tag is also used to prepend the AS-path to
control traffic coming back to the Cisco ACI dual sites. The Cisco ACI fabric sets this tag when prefixes learned
through the L3Out DCI eBGP peering are redistributed into OSPF. Therefore, only the subnets localized on the other
fabric carry the tag, and local subnets are not tagged. By prepending the tag value to the BGP AS-path, you help
ensure that this suboptimal path is used only in the event of a failure.
To better understand the BGP setup, refer to the BGP best-path selection algorithm described here:
http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13753-25.html.
This section now analyzes the routing announcement from the data center.
By default, the OSPF tag is set in NX-OS; you need to carry it into the BGP AS-path during redistribution.
router bgp 100
address-family ipv4 vrf TnT-14
redistribute ospf 14 route-map OSPF-INTO-BGP
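The OSPF-INTO-BGP route-map itself is not shown in full. The following is a sketch of its likely structure based on the behavior described above: the tag value 4294967295 is the default Cisco ACI route tag (visible as the prepended AS in the BGP output that follows), the community 100:100 is the local community discussed later in this section, and the sequence numbers are assumptions.

```
! Hedged sketch: prepend the ACI tag for remote-fabric prefixes,
! and mark all redistributed routes with the local community
route-map OSPF-INTO-BGP permit 10
 match tag 4294967295
 set as-path prepend 4294967295
 set community 100:100
route-map OSPF-INTO-BGP permit 20
 set community 100:100
```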
The resulting localization can then be observed, as seen here from the remote site:
le06-2911-02_WAN#sh ip bgp vpnv4 all
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
* 10.1.4.0/24 192.14.11.1 20 0 200 ?
*> 192.14.10.1 20 0 100 ?
* 10.100.14.0/24 192.14.11.1 1 0 200 4294967295 i
*> 192.14.10.1 20 0 100 ?
*> 10.200.14.0/24 192.14.11.1 20 0 200 ?
* 192.14.10.1 1 0 100 4294967295 i
The subnet 10.100.14.0/24 is deployed only on Cisco ACI Fabric 1. It is known from both WAN edges, but the
preferred path is the Site 1 WAN edge router, because it has the shortest autonomous system path.
The stretched subnet 10.1.4.0/24 can be known through Site 1 and through Site 2.
Here’s how a routing redistribution loop is avoided. Because the Cisco ACI fabrics have an eBGP peering between
them, creating an eBGP peering between the WAN edges would create a loop. Normally, BGP would avoid this loop
using the autonomous system path, but in the proposed design the ASA firewalls use OSPF, which leads the Cisco
ACI fabric and the WAN edge to redistribute between OSPF and BGP, breaking the autonomous system path rule. To
work around this mutual redistribution, a BGP community can be used to control advertisement between sites.
OSPF advertisements are redistributed with a local community, in this specific example 100:100, and the eBGP
peering between the WAN edge devices is configured to drop all received routing advertisements carrying such
community value.
le06-2911-03_DC2#sh run | sec bgp
router bgp 200
address-family ipv4 vrf TnT-14
neighbor 192.14.15.1 send-community both
neighbor 192.14.15.1 route-map BGP-IN-INTER_SITE in
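The BGP-IN-INTER_SITE route-map referenced above is not shown in the listing. The following is a sketch of its likely content, dropping inbound routes that carry the other site's local community (100:100 in this example); the community-list name and sequence numbers are assumptions.

```
! Hedged sketch: drop inter-site routes that were redistributed
! from OSPF at the other site (marked with community 100:100)
ip community-list standard REDIST-FROM-OSPF permit 100:100
!
route-map BGP-IN-INTER_SITE deny 10
 match community REDIST-FROM-OSPF
route-map BGP-IN-INTER_SITE permit 20
```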
Now look at the traffic egressing the data center. The Cisco ACI fabric peers using OSPF with the ASA, which
peers with the local WAN edge routers. The WAN edge router peers with the remote router, announcing the remote
addresses locally to the ASA.
The Cisco ACI fabric then has two L3Out exit paths from the data center: one through the local ASA and WAN edge
router, and one through the eBGP peering with the other site’s Cisco ACI fabric. By default, eBGP has a better
administrative distance than OSPF, so traffic would cross the interfabric link before reaching the WAN, which is
not what is wanted. You therefore need to change the administrative distance of eBGP in the Cisco ACI fabric to a
value higher than the OSPF administrative distance (a higher number means lower preference) to give preference to
the local OSPF routes through the ASA (Figure 82).
When traffic reaches the local WAN edge, it uses the WAN network rather than crossing to the other data center
site over the inter-WAN edge peering, because traffic exits through the shortest autonomous system path.
le06-2911-01_DC1#sh ip bgp vpnv4 all
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
* 192.14.99.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.10.2 0 4294967295 0 300 i
The VXLAN functions used in this DCI design are very simple. VXLAN is a powerful technology that provides Layer
2 and Layer 3 overlays and features such as the anycast gateway and ARP suppression. However, for the
Cisco ACI dual-fabric design discussed here, the only real requirement is Layer 2 VLAN extension (Layer 3
functions are not needed).
Several steps are needed to set up Layer 2 connectivity across sites through VXLAN:
1. Enable required features.
The solution requires MP-BGP EVPN for the control plane, VXLAN for the data-plane encapsulation, vPC (with
LACP enabled) to connect the Cisco ACI border leaf nodes to the local VXLAN devices (Nexus 9000 in NX-OS
mode), and BFD to control the underlay convergence across the VXLAN core. On the VXLAN leaf nodes in
both sites, you need to configure the following:
nv overlay evpn
feature bgp
feature vn-segment-vlan-based
feature lacp
feature vpc
feature lldp
feature bfd
feature nv overlay
2. Create the underlay network.
You need IP connectivity between sites. This connectivity requires a routing protocol, and the design depends
on the core routing. The design proposed in this document assumes that the core offers a BGP connection,
which may be the case if the core is an independent global network or an MPLS solution from a service
provider. Even with a simple fiber or DWDM network, you may still want to use eBGP between two different
autonomous systems to increase site independence for the control plane.
a. Configure the loopback interface for the underlay.
You should use a different loopback for the underlay and for the overlay. In the process of recovering from
a node-down event, the overlay loopback is kept in the down state until a specific delay timer expires
(180 seconds by default). Sharing a common loopback would hence also delay the reestablishment of routing
connectivity in the underlay network (in addition to other services, such as TACACS, which normally require
connectivity to the loopback interface). The following is the configuration of the loopback for the underlay
network; the configuration of the loopback for the overlay network is shown in the overlay section later in
this document.
interface loopback0
description Loopback for BGP peering
ip address 11.11.11.11/32
b. Configure DCI links and the core network (Figure 83).
In the testing performed, the core network is very simple. Each VXLAN DCI device has only two ports to
the core. One port (e1/45) is connected to the long-distance network (it could be either an IP and MPLS
network or a Fiber and DWDM network), and a second port (e1/46) is connected to the other DCI node at
the same site, as the backup path. In reality, this alternative port can be a VLAN on the vPC peer link; you
do not need to use a dedicated physical port.
The MTU must be increased on these ports to accommodate the VXLAN tunnel encapsulation of a Layer 2
frame, which adds 50 bytes of overhead. The Cisco Nexus 9000 Series does not support fragmentation, so
the core network must support this frame size.
Figure 83. VXLAN Frame Format
In this testing, a jumbo MTU size was configured for all the Layer 3 interfaces along the path.
interface Ethernet1/45
description Core underlay link
no switchport
mtu 9216
ip address 192.168.2.11/24
no shutdown
!
interface Ethernet1/46
description Core underlay backup path
no switchport
mtu 9216
ip address 192.168.1.11/24
no shutdown
You then must enable a routing protocol to help ensure connectivity and backup. In this testing, eBGP is
used to help ensure that separate autonomous systems can be deployed in different sites.
One recommended option is to enable BGP dampening. With this option enabled, if a long-distance link
starts to flap (a realistic case when you use a service provider DWDM connection), after several flaps the
affected routes are penalized and suppressed from the BGP routing table.
dampening
network 11.11.11.11/32
neighbor 192.168.1.12
remote-as 100
address-family ipv4 unicast
next-hop-self
neighbor 192.168.2.1
bfd
remote-as 300
update-source Ethernet1/45
address-family ipv4 unicast
next-hop-self
The DCI core design can have several options, depending on the use of Interior Gateway Protocol (IGP)
or BGP. The tested solution uses an eBGP design and so requires the Next-Hop-Self feature to help
ensure the reachability of the next hop on any path.
The underlay can easily be verified using ping and sh ip bgp. All peers should be up.
DC1-93-01_i05-9372-01# sh ip bgp
BGP routing table information for VRF default, address family IPv4 Unicast
BGP table version is 45, local router ID is 11.11.11.11
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
ii. Configure BFD.
The long-distance links are usually fragile, especially when they use DWDM, so you should enable
BFD to verify end-to-end connectivity with a lightweight protocol that does not overload the switch's CPU
and that triggers very fast failure detection.
You can tune BFD with hello intervals as fast as 50 ms, but the recommendation is to use 150-ms timers
with a multiplier of 3, which detects a link failure within 450 ms. That is fast enough to achieve
subsecond convergence after a DCI link failure.
bfd interval 150 min_rx 150 multiplier 3
bfd startup-timer 0
router bgp 100
neighbor 192.168.2.1 remote-as 300
bfd
Verify BFD neighborship as follows:
DC1-93-01_i05-9372-01# sh bfd neighbors
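Because the configuration above uses a 150-ms interval with a multiplier of 3, the detection time follows directly from the timers. A minimal sketch of that arithmetic (the helper function is illustrative, not a device API):

```python
# BFD declares a neighbor down after `multiplier` consecutive missed hellos,
# so the detection time is roughly interval * multiplier.
def bfd_detection_ms(interval_ms: int, multiplier: int) -> int:
    return interval_ms * multiplier

# "bfd interval 150 min_rx 150 multiplier 3" -> failure detected within 450 ms
print(bfd_detection_ms(150, 3))

# The most aggressive 50-ms timers would detect a failure within 150 ms,
# at the cost of more control-plane load.
print(bfd_detection_ms(50, 3))
```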
As mentioned in the underlay discussion, you should use a different loopback for the overlay. The overlay
loopback is automatically held down during node recovery, which would affect any other function associated
with it if the loopback were shared.
To help ensure backup and load balancing from the VXLAN core, use the anycast capability: configuring the
same IP address for a service on two nodes makes that address anycast. Here, the service is the VTEP, which
is identical on the pair of DCI nodes. Configure a secondary IP address on loopback 1 of both DCI nodes.
The two DCI nodes (vPC peers) must have exactly the same secondary loopback IP address. They both
advertise this anycast VTEP address into the underlay network so that the upstream devices learn the /32 route
from both vPC VTEPs and can load-share VXLAN unicast encapsulated traffic between them.
interface loopback1
ip address 11.11.11.12/32
ip address 11.11.12.12/32 secondary
The primary address shown above is not used for the VTEP; it is provisioned only to enable routing on the
interface. The recommendation, however, is to use another loopback, as discussed in the section "Create the
underlay network."
The system uses the VNI, also called the VXLAN segment ID, along with the VLAN ID to identify the Layer 2
segments in the VXLAN overlay network.
Each VLAN is associated with exactly one Layer 2 VNI. A VLAN can have either global significance, which
limits the number of VLANs to about 4000, or per-port significance, which allows up to 16 million segments.
Of course, there is a physical limit on the total number of active VNIs that a switch can handle; this value is
evolving and depends on the hardware used, the software loaded, and the verified scale recommendations.
At the time of this testing, the Cisco Nexus 9300 platform is limited to 1000 active Layer 2 VNIs. This number
is evolving with new software and hardware releases, so check the latest updates. The configuration used in
the testing reported here is limited to 1000 extended VLANs between the Cisco ACI fabrics and uses the
global definition of a VLAN.
The VNI is defined using 24 bits. In this testing, 30000 was added to the VLAN number to create the VNI (for example, VLAN 1001 maps to VNI 31001).
vlan 1,1001-2000
vlan 1001
vn-segment 31001
vlan 1002
vn-segment 31002
…
vlan 2000
vn-segment 32000
c. Configure the NVE tunnel interface.
Next create the NVE interface that is used as the VXLAN tunnel interface.
In this testing, because of the simplicity of the core DCI network, unicast ingress replication is used to
forward Layer 2 broadcast, unknown unicast, and multicast traffic. In a dual-site scenario, there is little
advantage in using multicast in the core, given that multidestination traffic needs to be sent only to a single
remote pair of VXLAN devices.
interface nve1
no shutdown
source-interface loopback1
host-reachability protocol bgp
member vni 31001
ingress-replication protocol bgp
member vni 31002
ingress-replication protocol bgp
…
member vni 32000
ingress-replication protocol bgp
The NVE interface relies on BGP for the host reachability advertisement and uses loopback1 as the VTEP source.
In addition, multidestination traffic is replicated to any VTEP learned through BGP that has this VNI defined.
DC1-93-02_i05-9372-02# sh nve vni ingress
Interface VNI Replication List Source Up Time
--------- -------- ----------------- ------- -------
For details on how to configure VXLAN with a multicast-enabled underlay, please refer to the VXLAN Network
with MP-BGP EVPN Control Plane Design Guide available at
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/guide-c07-734107.html
MP-BGP EVPN can transport Layer 2 information such as MAC addresses, and also Layer 3 information such
as host IP addresses (host routes) and IP subnets. For this purpose, it uses two forms of routing
advertisements:
● Type 2
◦ Used to announce host MAC and host IP address information for the endpoint directly connected to the
VXLAN fabric
◦ Extended community: router MAC address (for Layer 3 VNI) and sequence number
● Type 5
◦ Advertises IP subnet prefixes or host routes (associated, for example, with locally defined loopback
interfaces)
◦ Extended community: router MAC address, uniquely identifying each VTEP node
In the solution presented in this document, VXLAN is used only to extend a Layer 2 broadcast domain.
Therefore, BGP uses only Type-2 advertisements, carrying MAC address information without the IP
information populated.
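To make the Type-2 fields concrete, the sketch below decodes the bracketed NLRI string in the form that appears in the show bgp l2vpn evpn output later in this section. The field names follow the EVPN MAC/IP advertisement route; the parser itself is purely illustrative:

```python
# Sketch: decoding the bracketed EVPN Type-2 NLRI string as NX-OS displays it,
# e.g. [2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112.
import re

def parse_type2(nlri):
    fields = re.findall(r"\[([^\]]*)\]", nlri)
    route_type, esi, eth_tag, mac_len, mac, ip_len, ip = fields[:7]
    return {
        "route_type": int(route_type),  # 2 = MAC/IP advertisement route
        "esi": esi,                     # Ethernet segment identifier
        "ethernet_tag": int(eth_tag),
        "mac_length": int(mac_len),     # 48-bit MAC
        "mac": mac,
        "ip_length": int(ip_len),       # 0 -> no IP information attached
        "ip": ip,
    }

route = parse_type2("[2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112")
print(route["mac"])        # 0004.1402.0001
print(route["ip_length"])  # 0: a MAC-only advertisement, as used in this design
```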
First, each DCI node has to establish MP-BGP EVPN peering sessions with every remote DCI node, edge to
edge. Because each data center is in a different autonomous system, multihop MP-eBGP EVPN is used.
router bgp 65500
neighbor 21.21.21.21
remote-as 65501
update-source loopback0
ebgp-multihop 10
address-family l2vpn evpn
send-community both
route-map NEXT-HOP-UNCHANGED out
neighbor 22.22.22.22
remote-as 65501
update-source loopback0
ebgp-multihop 10
address-family l2vpn evpn
send-community both
route-map NEXT-HOP-UNCHANGED out
!
route-map NEXT-HOP-UNCHANGED permit 10
set ip next-hop unchanged
This peering must also be configured on the DCI nodes at all other sites.
Because the overlay peering is established edge to edge across the core network, the multihop MP-eBGP
session must allow enough hops (ebgp-multihop 10 in the configuration above).
By default, eBGP advertisement sets the local node's update-source address as the next hop for all
network layer reachability information (NLRI). In VXLAN, however, the next hop advertises the VTEP address,
which is not the same IP address as the loopback used for the MP-eBGP peering, so intermediate nodes must
not change the next hop when readvertising. Each eBGP speaker must therefore apply an outbound route map
that preserves the next hop for the EVPN address family, as in the configuration above.
sh l2route evpn mac all
Topology Mac Address Prod Next Hop (s)
----------- -------------- ------ ---------------
1181 0004.1401.0006 Local Po2
1181 0004.1402.0001 BGP 21.21.22.22
Here, the DCI node has detected two different MAC addresses. One is locally connected, learned from
the local Cisco ACI fabric, and the other was learned from the remote VTEP, which is connected to the second
Cisco ACI fabric.
sh bgp l2vpn evpn 0004.1402.0001
Advertised path-id 1
Path type: external, path is valid, is best path, no labeled nexthop
Imported from
21.21.21.21:33948:[2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112
AS-Path: 65501 , path sourced external to AS
21.21.22.22 (metric 0) from 21.21.21.21 (21.21.21.21)
Origin IGP, MED not set, localpref 100, weight 0
Received label 31001
Extcommunity: RT:65500:31001 SOO:21.21.22.22:0 ENCAP:8
In this BGP NLRI, the next hop is 21.21.22.22, which is the remote anycast VTEP, and the received label 31001 identifies the VNI.
The BGP advertisement for a MAC address appends the MAC address to a route distinguisher (RD), as
specified by the EVPN address-family standard, but VXLAN makes no use of this element. More interesting is
the use of the route target (RT). When a VTEP announces a MAC address (with its appended route
distinguisher), it attaches an extended community that acts as a discriminator, allowing the remote VTEP to
learn the MAC address as part of a specific Layer 2 VNI (and consequently as part of the local VLAN mapped to it).
You can generate this route target automatically, but automatic generation includes the autonomous system
number. Because the proposed solution uses a different autonomous system for each Cisco ACI fabric, you
cannot rely on automatic generation of the route target. Therefore, the recommended approach is to define the
route target explicitly and symmetrically on every VTEP.
evpn
vni 31001 l2
rd auto
route-target import 65500:31001
route-target export 65500:31001
…
vni 32000 l2
rd auto
route-target import 65500:32000
route-target export 65500:32000
sh bgp l2vpn evpn 0004.1402.0001
Advertised path-id 1
Path type: external, path is valid, is best path, no labeled nexthop
Imported from
21.21.21.21:33948:[2]:[0]:[0]:[48]:[0004.1402.0001]:[0]:[0.0.0.0]/112
AS-Path: 65501 , path sourced external to AS
21.21.22.22 (metric 0) from 21.21.21.21 (21.21.21.21)
Origin IGP, MED not set, localpref 100, weight 0
Received label 31001
Extcommunity: RT:100:31001 SOO:21.21.22.22:0 ENCAP:8
f. Example: Write a Python script to generate the extension for 1000 VLANs.
The VXLAN CLI requires a few lines of configuration for each VNI that you set up. First you need to associate
a VLAN with a VNI. Then you need to create an entry under the NVE interface and define a route target under
EVPN.
In the testing performed, the VXLAN DCI is set up to extend 1000 VLANs, which would otherwise require many
lines of configuration. Here is an example of a Python script, run on each of the four DCI switches, that populates
the VNIs automatically. You can also write a Python script that runs on a server to populate multiple switches from there.
python
from cli import *

i = 1900
while i < 2001:
    vni = "3%i" % i
    command = "conf ; vlan " + str(i) + " ; vn-segment " + str(vni) + " ; exit"
    print command
    cli(command)
    command = "conf ; int nve1 ; member vni " + str(vni) + " ; ingress-replication protocol bgp ; exit"
    print command
    cli(command)
    command = "conf ; evpn ; vni " + str(vni) + " l2 ; rd auto ; route-target import 100:" + str(vni) + " ; route-target export 100:" + str(vni) + " ; exit"
    print command
    cli(command)
    i = i + 1
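As an alternative to running the script on each switch, the server-side variant mentioned above can simply generate the per-VNI CLI as text and push it with your management tooling of choice. A minimal sketch, in which the function name is an assumption, and the 30000 VNI offset and 100: route-target prefix mirror the on-box script:

```python
# Generate the per-VNI configuration (VLAN mapping, NVE member, EVPN route
# targets) as plain text for a range of VLANs.
def vni_config(vlan_start, vlan_end, vni_offset=30000, asn=100):
    lines = []
    for vlan in range(vlan_start, vlan_end + 1):
        vni = vlan + vni_offset
        lines += [
            "vlan %d" % vlan,
            "  vn-segment %d" % vni,
            "interface nve1",
            "  member vni %d" % vni,
            "    ingress-replication protocol bgp",
            "evpn",
            "  vni %d l2" % vni,
            "    rd auto",
            "    route-target import %d:%d" % (asn, vni),
            "    route-target export %d:%d" % (asn, vni),
        ]
    return "\n".join(lines)

print(vni_config(1001, 1002))
```

The resulting text can then be applied to each DCI switch over whatever configuration channel you already use.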
4. Connect the DCI nodes to Cisco ACI using vPC.
a. Create vPC.
The vPC between the DCI nodes and the Cisco ACI fabric is a standard vPC. The implementation on the DCI
side is simple because the vPC is not associated with any SVI.
Nevertheless, special considerations apply to the recovery of a leaf node that is part of a vPC domain.
You need to meet the conditions described here.
Traffic originating from the Cisco ACI fabric should start using the vPC leg to the recovered leaf node only
when the node has fully reestablished connectivity (control plane and data plane) with the rest of the network.
If a node is brought up before the device can successfully establish routing adjacencies, traffic will be
temporarily black-holed (the outage may last several seconds). The recommended solution for this problem is
to configure delay restore for the vPC leg connections, as shown in the sample configuration here:
vpc domain 1
peer-switch
role priority 1000
peer-keepalive destination 10.50.138.4
delay restore 180
peer-gateway
auto-recovery
ip arp synchronize
vPC connections between the hosts and the recovering leaf node will be kept down for 180 seconds, providing
the time required for the switch to reestablish routing adjacencies and exchange route information with the
neighbor devices.
Traffic destined for the Cisco ACI fabric and received from the other data center will be steered to the
recovering device as soon as it starts advertising the anycast VTEP IP address to the underlay control plane.
This advertisement may lead to a problem that is the opposite of the one just discussed. If the advertisement of the
anycast VTEP address occurs before the vPC peer link and the vPC leg connection to the fabric are
recovered, traffic will be black-holed as well.
NX-OS Software Release 7.0(3)I2(2) and later offers an option natively on Cisco Nexus 9000 Series Switches
to eliminate this concern. The option keeps the loopback interface used as the VTEP in the down state for a
certain period of time to help ensure that the recovering node can reestablish connectivity with the vPC peer
before inbound traffic is received from the fabric core. As shown in the sample output here, the default hold-
down-time value is 180 seconds. This timer can be tuned from 0 to 1000 seconds.
sh nve interface nve 1 detail
Source Interface hold-down-time: 180
To create the vPC with the Cisco Nexus 9000 Series in standalone NX-OS mode, a vPC peer link (typically a
pair of physical links bundled into a port channel) is required.
interface port-channel1
switchport mode trunk
spanning-tree port type network
vpc peer-link
Then, a standard vPC connection is created to the fabric. Both two-link and four-link port channels are supported.
interface port-channel2
switchport mode trunk
switchport trunk allowed vlan 1000-1919
mtu 9216
vpc 2
b. Optionally, configure storm control on the VXLAN side.
As stated earlier, you need to control the amount of traffic that is forwarded from one data center to another.
Unicast traffic is not a particular challenge because it reaches only one virtual machine, but if the
destination MAC address is unknown, the traffic may be flooded to all the servers in the VNI/VLAN.
Unknown unicast traffic is rare, especially because data centers are no longer spanning-tree based, but
both Cisco ACI and the standalone VXLAN devices can control it. The DCI model requires you to drastically
rate-limit unknown unicast traffic.
Broadcast traffic is more sensitive. It reaches the virtual machine host CPU, so the limits must be drastic.
Measurements show that a standard virtual machine can handle about 100 Mbps of broadcast traffic, which
corresponds to a limit of 1 percent on a 10-Gbps link.
Multicast frames also reach the CPU, but the appropriate rate limit depends on the application's
requirements. Below is an example configuration that applies storm control on the DCI nodes to traffic
coming from the Cisco ACI fabric.
interface port-channel2
storm-control broadcast level 1.00
storm-control multicast level 1.00
storm-control unicast level 1.00
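The 1 percent level follows directly from the quoted measurement. A minimal sketch of the arithmetic, assuming the 100-Mbps per-host broadcast budget and 10-Gbps links described above:

```python
# Derive the storm-control level (percent of link bandwidth) from the rate a
# single host can absorb and the link speed.
def storm_control_level(host_limit_mbps, link_speed_mbps):
    """Percentage of link bandwidth to allow before suppressing the traffic."""
    return 100.0 * host_limit_mbps / link_speed_mbps

# 100 Mbps budget on a 10-Gbps link -> "storm-control broadcast level 1.00"
print(storm_control_level(100, 10000))  # 1.0
```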
Figure 84 provides a summary of the configuration steps needed to enable VXLAN as a Layer 2 DCI.
● Connected to Leaf3, port e1/4, emulating hosts: 10.1.4.100-105 belonging to the subnet stretched between
both data centers.
DC2 used the following tester ports and emulated endpoints:
Each stream was configured for a 10-Mbps load, with a fixed frame length of 1024 with a UDP header.
Testing Overview
Testing was divided into two areas:
● Failure testing for the Cisco ACI fabric components
◦ Leaf nodes
◦ Spine nodes
Results Summary
Table 1 summarizes the tests performed and the results.
Link from ACI Leaf 1 in DC1 to the local Nexus 9300 VXLAN DCI device
On Failure
On Recovery
Nexus 9300 VXLAN DCI device node failure
On Failure
On Recovery
Peer link failure between the Nexus 9300 DCI devices
The vPC secondary brings down its port channels because the peer link is down.
vPC domain id : 1
Peer status : peer link is down
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 inconsistency reason : Consistency Check Not Performed
vPC role : secondary
Number of vPCs configured : 1
Peer Gateway : Enabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore status : Timer is off.(timeout = 180s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
vPC status
----------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
-- ---- ------ ----------- ------ ------------
2 Po2 down success success -
On Failure
On Recovery
Cisco ASA Cluster Member Failure (Slave Node in DC1)
This test focused on failure of an ASA cluster member (ASA 2 in DC1).
Here is a summary of the connections across the cluster nodes. ASA 2 in DC1 is powered off. This unit was handling 13
connections:
DC2-ASA-2_E05-asa5585x-02:********************************************
12 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 20 most used
centralized connections: 5 in use, 14 most used
DC2-ASA-1_i05-5585x-02:***********************************************
23 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 27 most used
centralized connections: 5 in use, 35 most used
DC1-ASA-2_E05-5585x-01:***********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 21 most used
centralized connections: 5 in use, 18 most used
DC1-ASA-1_i05-5585-01/TnT-14/master#
On Failure
At this point, all connections that were handled by the unit that went down have been rebalanced across the other units
in the cluster:
DC2-ASA-2_E05-asa5585x-02:********************************************
15 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 16 in use, 24 most used
centralized connections: 7 in use, 18 most used
DC2-ASA-1_i05-5585x-02:***********************************************
26 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 12 in use, 28 most used
centralized connections: 7 in use, 36 most used
On Recovery
Cisco ASA Cluster Member Failure (Master Node)
This test focused on the failure of the ASA cluster master node (ASA 1 in DC1).
DC2-ASA-2_E05-asa5585x-02:********************************************
12 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 24 most used
centralized connections: 5 in use, 20 most used
DC2-ASA-1_i05-5585x-02:***********************************************
23 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 28 most used
centralized connections: 5 in use, 37 most used
DC1-ASA-2_E05-5585x-01:***********************************************
7 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 12 most used
On Failure
Worst case: 3.9 seconds
DC1-ASA-2_E05-5585x-01/slave>
Cluster unit DC1-ASA-2_E05-5585x-01 transitioned from SLAVE to MASTER
DC1-ASA-2_E05-5585x-01/master>
DC1-ASA-2_E05-5585x-01/master>
DC2-ASA-2_E05-asa5585x-02:********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 14 in use, 26 most used
centralized connections: 5 in use, 22 most used
DC2-ASA-1_i05-5585x-02:***********************************************
24 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 30 most used
centralized connections: 5 in use, 39 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#
On Recovery
The node rejoins the cluster as a slave:
DC1-ASA-2_E05-5585x-01/TnT-14/master#
Beginning configuration replication to Slave DC1-ASA-1_i05-5585-01
End Configuration Replication to slave.
FROM DC1-ASA-1_i05-5585-01:
Cluster unit DC1-ASA-1_i05-5585-01 transitioned from DISABLED to SLAVE
Traffic is not affected.
DC1-ASA-1_i05-5585-01:************************************************
9 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 9 most used
DC2-ASA-2_E05-asa5585x-02:********************************************
13 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 26 most used
centralized connections: 5 in use, 23 most used
DC2-ASA-1_i05-5585x-02:***********************************************
24 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 9 in use, 30 most used
centralized connections: 5 in use, 42 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#
Cisco ASA Cluster Member Failure (Slave Node DC2)
This test focused on failure of an ASA cluster slave node (ASA 1 in DC2).
DC1-ASA-1_i05-5585-01:************************************************
7 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 11 in use, 11 most used
centralized connections: 5 in use, 9 most used
DC2-ASA-2_E05-asa5585x-02:********************************************
16 in use, 20 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 10 in use, 26 most used
centralized connections: 7 in use, 24 most used
DC2-ASA-1_i05-5585x-02:***********************************************
27 in use, 39 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 9 in use, 30 most used
centralized connections: 7 in use, 44 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#
On Failure
On Recovery
The node rejoins the cluster:
DC1-ASA-2_E05-5585x-01/TnT-14/master#
DC1-ASA-1_i05-5585-01:************************************************
8 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 8 in use, 22 most used
centralized connections: 5 in use, 13 most used
DC2-ASA-2_E05-asa5585x-02:********************************************
29 in use, 37 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 4 in use, 26 most used
centralized connections: 5 in use, 31 most used
DC2-ASA-1_i05-5585x-02:***********************************************
7 in use, 18 most used
Cluster:
fwd connections: 0 in use, 0 most used
dir connections: 18 in use, 18 most used
centralized connections: 5 in use, 11 most used
DC1-ASA-2_E05-5585x-01/TnT-14/master#
Customer Edge Router Link to ACI Fabric Failure
This test focused on failure of the link between the local customer edge router in DC1 and ACI fabric 1. This router
uses a port-channel interface to connect to the fabric. Logically, the fabric provides connectivity from the customer
edge router to the outside interface of the local ASA in DC1.
<…snip>
One member of port-channel10 was failed by pulling the cable between the router and the ACI leaf switch.
On Failure
On Recovery
Worst case: 20 ms
Customer Edge Router WAN Link Failure
This test focused on failure of the link between the customer edge router in DC1 and the WAN router in DC1. In the
test environment, this router has a single link to the WAN. Logically, BGP is used to peer with the WAN router (next
hop 192.14.10.2). The fabric also provides BGP peering to the customer edge router in DC2 (next hop
192.14.15.2).
This was the BGP table on the router before the failure:
le06-2911-01_DC1#show ip bgp vpnv4 vrf TnT-14
BGP table version is 541, local router ID is 192.10.10.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale, m multipath, b backup-path, x best-external, f RT-Filter
Origin codes: i - IGP, e - EGP, ? - incomplete
On Failure
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 14:14 (default for vrf TnT-14)
*> 10.1.4.0/24 192.14.20.102 20 0 ?
*> 10.100.14.0/24 192.14.20.102 20 0 ?
*> 10.200.14.0/24 192.14.20.102 1 0 4294967295 i
*> 101.101.11.11/32 192.14.20.102 12 0 ?
*> 102.102.12.12/32 192.14.20.102 12 0 ?
*> 192.14.13.0 192.14.15.2 4294967290 0 200 300 i
*> 192.14.20.0 0.0.0.0 0 0 ?
*> 192.14.31.0 192.14.20.102 11 0 ?
*> 192.14.99.0 192.14.15.2 4294967290 0 200 300 i
The backup link over the Cisco ACI fabric to the customer edge router in DC2 is now used to reach the
subnets at remote sites, such as a branch office.
On Recovery
Cisco ACI Border Leaf Node Failure
This test focused on failure of the Cisco ACI border node (leaf 101).
On Failure
On Recovery
Cisco ACI Spine Node Failure
This test focused on failure of the Cisco ACI spine node (spine 101). Because each leaf can be reached through
two different spine nodes, the failure of one of the spine nodes has little impact on traffic.
On Failure
On Recovery
Conclusion
This document provides a guide to designing and deploying Cisco Application Centric Infrastructure in two data
centers using an active-active architecture. Enterprises and service providers require a data center environment
that is continuously available and that protects against single points of failure, including failure of an entire data
center. The solution described in this document addresses those needs by offering a software-defined
multiple-data-center infrastructure that reduces total cost of ownership (TCO), accelerates data center application deployment,
and supports business continuity.
The document defines the characteristics of a Cisco ACI dual-fabric deployment consisting of independent Cisco
Application Policy Infrastructure Controller clusters for each site and provides an overview of the design. For data
center interconnection for Layer 2 extension, two options are presented: one based on dark fiber that uses back-to-
back vPC, and one that uses VXLAN as a Layer 2 DCI over a Layer 3 core.
In Cisco ACI, in addition to considering connectivity, you need to consider Cisco ACI policy. The document
discussed policy design and application for Layer 2 and Layer 3 traffic between sites and between the WAN and
the Cisco ACI data centers.
Cisco ACI also supports integration with virtual machine managers and hypervisors, allowing the fabric to provide
network services to virtual machines. This document discussed how this integration works for a dual-fabric design
across two data centers, and how Cisco ACI supports cross-data center live migration when used in combination
with VMware vSphere Release 6.0 or later.
The document also discussed how security services are integrated in an active-active architecture dual-data center
design through the use of Cisco ASA firewalls in active-standby mode or, preferably, in ASA cluster mode.
Multitenancy is built into Cisco ACI. This document describes how to preserve and maintain multitenancy across data
centers and also for flows to the WAN. The document also discusses integration with the WAN on both sites and
the traffic flows between remote sites such as branch offices and the data center.
The design presented in this document has been validated by Cisco in a lab environment that replicates a real-world
customer setup, and detailed test results, including convergence times, are provided.
Ultimately, this document provides a reference guide to help you design and deploy Cisco ACI in two data centers
to meet the business needs of an always-available, multisite network infrastructure.
● Cisco ACI Multisite: Geo-Distributed Wireless LAN Controllers over Two Cisco ACI Fabrics: This
demonstration shows a geographically distributed redundant solution for a Cisco Wireless LAN Controller
deployed on top of the design described in this document.
● Cisco ASA Cluster over Two Cisco ACI Fabrics in a Dual–Data Center Design: This demonstration shows
the ASA cluster working as explained in this document.
● Cisco ACI Dual DC Innovations: Cisco ACI Across Two Data Centers, Intersite Toolkit, and Cross-vCenter
vMotion: This video demonstrates some of the Cisco ACI innovations introduced in Cisco ACI Release
1.2(1) that were used in the design presented in this document, including Cisco ACI integration with
VMware vSphere 6.0 and its support for cross-vCenter vMotion and for virtual IP address and virtual MAC
address capabilities for optimized forwarding.