Understanding Up/Down InfiniBand Routing Algorithm | Mellanox Interconnect Community 4/29/17, 14:51
https://community.mellanox.com/docs/DOC-2402

Understanding Up/Down InfiniBand Routing Algorithm

Version 12
Created by ophirmaor on Jan 13, 2016 4:56 PM. Last modified by ophirmaor on Feb 14,
2017 4:17 PM.

This post discusses the up/down InfiniBand routing algorithm.


This post is fairly basic. However, the reader should have a good understanding of networking and
familiarity with InfiniBand concepts.

References
Overview
Configuration

References
opensm(8) - Linux man page
Understanding the GUID Routing Order File (SM Configuration)
Understanding the Root GUID File (SM Configuration)
HowTo Prevent InfiniBand Credit Loops
VPI Gateway Considerations

Overview
Several InfiniBand routing engines may be configured on a network, such as Min Hop, Up Down,
Down Up, Fat Tree and more (see the opensm man page). Up/Down (UpDn) and Fat Tree are the most
commonly used InfiniBand routing algorithms for Clos/fat tree networks.

Note: This includes trees built using director switches and 1U switches. The two levels of physical
switch enclosure represent 3 tiers of switch ASICs because each director switch contains 2 tiers
of ASICs.

Like most IB routing algorithms, UpDn uses the shortest path(s) available between any two
endpoints. It can route any collection of IB-connected switches and HCAs. Most importantly and
unlike MinHop, UpDn guarantees credit-loop free routing in the fabric. UpDn begins with a list of
the switch ASICs that form the root or top level of the fabric. This list is set with the Subnet
Manager (SM) flag --root_guid_file. It is a simple text file with a line for each globally unique ID
(GUID) of a root ASIC. Although UpDn has an option to auto-discover the root ASICs, it is strongly
recommended that a root GUID list be supplied. The root GUID list must be updated if a root
switch ASIC is replaced or if the topology is expanded, and every SM must have an identical copy
of the GUID list.

To begin routing the fabric, the UpDn algorithm starts with the root switch ASICs, to which we will
refer as Distance 0 (zero). The algorithm then finds every switch ASIC that is one hop (one link)
away from the roots. These ASICs can be thought of as Distance 1, because they are one hop away
from the root switches. The algorithm then discovers all switch ASICs that are two hops from the
root switches; these can be thought of as Distance 2. The process continues until every switch ASIC
has been assigned a distance from the roots. The following diagram shows an example 3-tier
fabric with the distances assigned.
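
The distance assignment described above amounts to a breadth-first search started from all roots at once. The following is a minimal Python sketch of that idea, not the opensm implementation. The function name assign_distances, the adjacency-dict representation, and the switch names (R1, M1, L1, ...) are assumptions made for illustration.

```python
from collections import deque

def assign_distances(links, roots):
    """Return {switch: hop count to the nearest root} using multi-source BFS."""
    distance = {root: 0 for root in roots}   # roots are Distance 0
    queue = deque(roots)
    while queue:
        switch = queue.popleft()
        for neighbor in links.get(switch, ()):
            if neighbor not in distance:     # first visit gives the shortest distance
                distance[neighbor] = distance[switch] + 1
                queue.append(neighbor)
    return distance

# Hypothetical 3-tier fabric: two roots, two middle switches, two leaf switches.
links = {
    "R1": ["M1", "M2"], "R2": ["M1", "M2"],
    "M1": ["R1", "R2", "L1", "L2"], "M2": ["R1", "R2", "L1", "L2"],
    "L1": ["M1", "M2"], "L2": ["M1", "M2"],
}
distances = assign_distances(links, ["R1", "R2"])
print(distances)
```

In this hypothetical fabric the roots end up at Distance 0, the middle switches at Distance 1, and the leaf switches at Distance 2.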

This process generates a Breadth-First Spanning Tree (BFSP), which is analogous to the approach
of the Spanning Tree Protocol (STP) in Ethernet. Unlike STP, UpDn allows multiple roots,
and strives to provision as many paths as possible between each pair of end nodes. The UpDn
algorithm then finds all of the possible shortest paths between every pair of endpoints. Next,
UpDn discards any path that contains a hop from a Distance N ASIC to a Distance N+1 ASIC,
followed by a hop back to Distance N. That is, it discards any path that goes "down" (away from
the roots) and then "up" (toward the roots). Legal paths can go up, or down, or up and then down,
or stay at the same level, but never down and then up. By discarding these paths and not
provisioning them in the switches, UpDn guarantees that the routing contains no logical loops and
no credit loops, which could otherwise lead to traffic stoppage.
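
The down-then-up rule can be expressed as a small check over the root distances of the switches along a path. This is a sketch only: the function name is_legal_updn_path is hypothetical, and same-level hops are simply treated as neutral here, which is an assumption of this sketch rather than a statement about how opensm handles them.

```python
def is_legal_updn_path(distances):
    """distances: root distance of each switch along the path, in order."""
    gone_down = False
    for here, there in zip(distances, distances[1:]):
        if there > here:                    # hop away from the roots: "down"
            gone_down = True
        elif there < here and gone_down:    # "up" hop after a "down" hop
            return False
        # there == here: same-level hop, treated as neutral in this sketch
    return True

print(is_legal_updn_path([2, 1, 0, 1, 2]))  # up, up, down, down
print(is_legal_updn_path([1, 2, 1]))        # down, then up
```

The first path climbs to a root and descends, so it is accepted. The second descends and then climbs, so it is rejected.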

The following diagram shows examples of allowed and disallowed paths.

Note: The two potential paths between nodes E and F are both the same length (same number of
hops) but only one obeys the UpDn rule. The disallowed path contains a DnUp segment.

The credit loop-free property of UpDn (and Fat Tree) routed topologies is critical for reliable
network operation.
However, since some potential paths are discarded, there are cases where a pair of end nodes can
become disconnected and unable to communicate with each other.
When the calculate_missing_routes option is set to TRUE (the default value) in the opensm
configuration file, connectivity between all endpoints in the fabric is guaranteed in a credit loop-free
manner with UpDn and Fat Tree routing.

For example, consider a different fabric that has nodes connected above the leaf switches (nodes
G, H, and J). Nodes connected to L1 switches (A, B, C, etc.) have legal UpDn paths to nodes G, H,
and J. There is a legal UpDn path between nodes G and H. However, there is no legal path between
G and J, and these nodes will not be able to communicate with each other. Setting
calculate_missing_routes to TRUE will provide credit-loop free routing between all endpoints.
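
To see how nodes attached above the leaf level can lose connectivity, the sketch below searches for a legal path (one that never goes up after going down) between every pair of hosts. The topology is hypothetical and is not the one in the diagram: G and H hang off middle switches that share root R1, while J hangs off a middle switch that only reaches root R2. The function name has_legal_path and the convention that a host's rank is its switch's rank plus one are assumptions of this sketch.

```python
from collections import deque
from itertools import combinations

def has_legal_path(links, rank, hosts, src, dst):
    """Search for a path from src to dst that never goes up after going down."""
    start = (src, False)                    # (node, has the path gone down yet?)
    seen = {start}
    queue = deque([start])
    while queue:
        node, gone_down = queue.popleft()
        if node == dst:
            return True
        if node in hosts and node != src:   # hosts do not forward traffic
            continue
        for nxt in links[node]:
            going_up = rank[nxt] < rank[node]
            if going_up and gone_down:      # down hop followed by up hop: illegal
                continue
            state = (nxt, gone_down or rank[nxt] > rank[node])
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False

# Hypothetical fabric: roots R1, R2; middle switches M1, M2 (under R1), M3 (under R2).
edges = [("R1", "M1"), ("R1", "M2"), ("R2", "M3"),
         ("M1", "L1"), ("M2", "L1"), ("M3", "L1"),
         ("A", "L1"), ("G", "M1"), ("H", "M2"), ("J", "M3")]
links = {}
for a, b in edges:
    links.setdefault(a, []).append(b)
    links.setdefault(b, []).append(a)
rank = {"R1": 0, "R2": 0, "M1": 1, "M2": 1, "M3": 1,
        "L1": 2, "G": 2, "H": 2, "J": 2, "A": 3}
hosts = {"A", "G", "H", "J"}

unreachable = [pair for pair in combinations(sorted(hosts), 2)
               if not has_legal_path(links, rank, hosts, *pair)]
print(unreachable)
```

In this sketch A reaches every node, and G and H reach each other through R1. J can only be reached from G or H by going down to L1 and back up to M3, so those pairs are reported as unreachable.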

There may be cases where nodes do not need to communicate with each other (e.g. storage nodes
that do not communicate among themselves). However, this is rare. The best practice for a Clos-5
3-tier fabric is not to connect nodes to the L2 switches.

Note: The diagrams above apply equally well to two different cases: A fabric built from 3 tiers of
1U switches, and a fabric that uses two director switches with 1U switches below them. In the
latter case, nodes E, F, and G represent nodes cabled to the leaf modules of the director switches.

Scatter-Ports

When assigning logical paths to physical links, the UpDn algorithm tries to map the same number
of paths per link to maximize use of the available bandwidth. This balancing is done statically,
without knowledge of actual workloads and traffic patterns. Path balancing decisions are made
locally, at each switch, without assuming anything about the physical topology. The resulting path
assignments may not be optimal for typical Clos/Fat Tree workloads.

A routing option called scatter-ports is available for MinHop and UpDn routing engines. It
instructs the routing algorithm to randomize the local assignments of paths to links, which often
results in better link utilization. The scatter-ports option requires an integer argument, which is
the seed for the random number generator. It is recommended to use a prime number for the
seed; a seed of zero turns off randomization.
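
The difference between static balancing and scatter-ports can be illustrated with a toy port selector. This is only a sketch of the idea and does not reproduce opensm's actual selection logic. The function assign_ports, the LID list, and the port numbers are hypothetical.

```python
import random

def assign_ports(destinations, candidate_ports, seed=0):
    """Map each destination to one of several equal-cost ports.

    seed == 0: pick the least-loaded port, first listed port on ties.
    seed != 0: pick the least-loaded port, breaking ties at random.
    """
    rng = random.Random(seed) if seed else None
    load = {port: 0 for port in candidate_ports}
    table = {}
    for dest in destinations:
        lightest = min(load.values())
        choices = [p for p in candidate_ports if load[p] == lightest]
        port = rng.choice(choices) if rng else choices[0]
        table[dest] = port
        load[port] += 1
    return table

lids = list(range(1, 9))    # hypothetical destination LIDs
ports = [1, 2, 3, 4]        # hypothetical equal-cost uplinks
print(assign_ports(lids, ports))           # fixed, repeating port order
print(assign_ports(lids, ports, seed=7))   # same balance, scrambled order
```

Both calls place the same number of destinations on each port. The seeded call only changes which destination lands on which port, and it gives the same result each time the same seed is used.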

Note: The scatter-ports configuration is available only when the SM runs on a host (or UFM). It is
not supported when the SM runs on a switch.

Configuration

1. The routing engine algorithm is configured with the flag --routing_engine of the opensm
command. The supported engines are: minhop, updn, dnup, file, ftree, lash, dor, torus-2QoS,
dfsssp, sssp, pqft, chain.

In case you are using SM running on an InfiniBand switch, run the following command on the
MLNX-OS CLI:

switch (config) # ib sm routing-engines ftree updn minhop

In case of an issue in the fabric, it is better to fall back to updn than to minhop. If neither fat tree
nor updn can converge, the SM falls back to minhop.
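
The ordered list of routing engines behaves like a fallback chain. The following sketch shows only that ordering logic. The function pick_routing_engine and the can_route callback are hypothetical stand-ins for the SM's own decision on whether an engine can route the fabric.

```python
def pick_routing_engine(engines, can_route):
    """Return the first engine in the list that can route the fabric."""
    for engine in engines:
        if can_route(engine):
            return engine
    return None   # no engine in the list succeeded

engines = ["ftree", "updn", "minhop"]
# Hypothetical fabric where fat tree routing fails but the others succeed.
chosen = pick_routing_engine(engines, lambda engine: engine != "ftree")
print(chosen)
```

With updn listed before minhop, a fabric that fat tree cannot route still gets credit-loop free routing if updn succeeds.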

2. The list of roots for the UpDn routing algorithm is configured with the flag --root_guid_file of
the opensm command.

In case you are using SM running on an InfiniBand switch, use the following command to set the
list of root GUIDs.

switch (config) # ib sm root-guid <root-guid>

Doing that will force the routing algorithm to use those specific switches as root GUIDs.

How do I find the root GUIDs?

a. Run ibswitches on the network (from a switch or from the host) to get the list of switches and
their GUIDs. The GUID is the hexadecimal value that follows "Switch :" on each line (marked in red
in the original post).

mti-mar-sx21 [my-sm-cluster: master] (config fae) # ibswitches


Switch : 0xf45214030011e4f0 ports 36 "MF0;mti-mar-sx22:SX6036/U1" enhanced port 0 lid 2 lmc 0
Switch : 0x0002c903007fbbe0 ports 36 "MF0;mti-mar-sx21:SX6036/U1" enhanced port 0 lid 1 lmc 0
...

b. Identify which of these switches are spine switches in the cluster, and note their GUIDs.

c. Run the command on the switch:


Note: Separate the bytes of the GUID with ':', as in a MAC address.

switch (config) # ib sm root-guid 0x00:02:c9:03:00:7f:bb:e0


switch (config) # ib sm root-guid 0xf4:52:14:03:00:11:e4:f0

For example, if you have 18 spines and 36 leafs, it is recommended to run this command 18 times
on the SM switch, once for each spine GUID.
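
Steps a to c can be partly scripted. The sketch below parses ibswitches output in the format shown above and prints a root-guid command with the colon-separated GUID for each switch it finds. The helper names parse_ibswitches and colon_guid are hypothetical, and deciding which switches are spines (step b) is still left to the administrator.

```python
import re

# Matches lines such as: Switch : 0x0002c903007fbbe0 ports 36 "MF0;..." ...
SWITCH_LINE = re.compile(
    r'^Switch\s*:\s*(0x[0-9a-fA-F]{16})\s+ports\s+\d+\s+"([^"]*)"')

def parse_ibswitches(output):
    """Return [(guid, description)] for each switch line in ibswitches output."""
    return [m.groups() for m in map(SWITCH_LINE.match, output.splitlines()) if m]

def colon_guid(guid):
    """Convert 0x0002c903007fbbe0 to 0x00:02:c9:03:00:7f:bb:e0."""
    digits = guid[2:].lower()
    return "0x" + ":".join(digits[i:i + 2] for i in range(0, 16, 2))

sample = '''
Switch : 0xf45214030011e4f0 ports 36 "MF0;mti-mar-sx22:SX6036/U1" enhanced port 0 lid 2 lmc 0
Switch : 0x0002c903007fbbe0 ports 36 "MF0;mti-mar-sx21:SX6036/U1" enhanced port 0 lid 1 lmc 0
'''
for guid, name in parse_ibswitches(sample):
    print(f"{name}: ib sm root-guid {colon_guid(guid)}")
```

Only the part after the switch description is meant to be entered on the MLNX-OS CLI. The description is printed so that spine switches can be picked out by name.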
