
Advanced Switching Overview

Seth Zirin
Principal Engineer Intel Corporation
ASI-SIG FMS WG Chair

Joe Bennett
Principal Engineer Intel Corporation
ASI-SIG PI-8 WG Chair

Copyright 2004, PCI-SIG, All Rights Reserved

ASI Technical Introduction
PI-8 Technical Review

PCI-SIG Developers Conference

[Figure: PCI Express* Advanced Switching logo with Star, Dual Star, and Mesh topologies]

*Other names and brands may be claimed as the property of others

ASI Technical Introduction


Seth Zirin
Principal Engineer, Intel Corporation
ASI-SIG FMS Workgroup Chair

Agenda
  Introduction
  Core AS Architecture
  Protocol Interfaces
  Configuration Structures
  Software & Management

PCI Express* Layer Reuse


Advanced Switching adds: Any Protocol, Any Topology, Peer-to-Peer / Multicast, Quality of Service

Layer         PCI Express                          Advanced Switching
Software      PCI* Software                        AS Software
Model         PCI PnP Model (init, enum, conf)     AS Fabric Model (init, enum, conf)
Transaction   PCI Express Protocol                 AS Protocol
Data Link     Packet Based, Reliable Transport (reused by both)
Physical      Serial, Dual-Simplex, Point-to-Point, 2.5 Gbps Copper (reused by both)

Protocol Encapsulation
Core Management Encapsulations
  Device Configuration (PI-4)
  Events (PI-5)

Chained Encapsulations
  Multicast (PI-0)
  Flow Labeling (PI-1)
  SAR Functions (PI-2)

Optional Encapsulations
  PCI Express Tunneling (PI-8)
  Load/Store (SLS)
  Push-Pull Queuing/Messaging (SQ)
  Socket Data Transport (SDT)

Extensible Via Vendor/End-User Encapsulations
Hardware or Software Implementations

Path-Based Routing
[Figure: AS fabric of four switches (SW1-SW4) connecting endpoints EP1-EP4]

Source   Destination   Device Path       Turn List
EP1      EP2           SW1               2
EP1      EP3           SW1, SW2, SW4     0, 3, 0
EP1      EP3           SW1, SW3, SW4     1, 3, 1

Topology-Agnostic Fabric

[Figure: example AS fabrics in several topologies, built from AS switches (AS) and endpoints (EP), some including a fabric manager (FM)]

AS Endpoint Variations
[Figure: endpoint stack variations. Labels: PCI Device Drivers, Advanced Switching Device Drivers, Host Interface, Arbitration, PCI Express Transaction Layer, PI-8, SLS, SQ, Core PIs, Other PIs, AS Transaction Layer, Reliable Data Link Layer, Physical Layer]

Agenda
  Introduction
  Core AS Architecture
  Protocol Interfaces
  Configuration Structures
  Software & Management

Link Layer Enhancements


Modified PCI Express Link State Model
  Protected State for Fabric Access Security

Credit-Based Flow Control
  Single Credit Category for Headers & Payloads
  Single Credit Category for Completions & Writes
  Credit Denomination is 64 Bytes (Not 32 Bytes)

AS Leverages PHY & Link of PCI Express

AS Packet Framing
Frame | SEQ# | AS Header | Payload (0-2KB) | P-CRC | L-CRC | Frame

Transaction Layer: AS Header, Payload, P-CRC
Link Layer: SEQ#, L-CRC
PHY Layer: Frame

Transaction Layer Protocol


[Figure: AS packet = AS Header (PI Defines Format; contains the PI field) + AS Payload (PI Defines Format)]

Three General Classes of Protocol in AS
  Native Protocols
    Management, Congestion Control, Segmentation/Reassembly, etc.
  Encapsulated Protocols
    e.g., PCI Express, Ethernet, etc.
  Proprietary Protocols
    Vendor-Provided for Closed Systems

Protocol Interface (PI) Header Field
  Defines Payload Content & Format

Payload Interpreted Only by Endpoints
Multiple Simultaneous Encapsulations
  Per VC, Link, Endpoint, etc.

General AS Packet Header


[Figure: header bit layout (bit ruler 31-0). Fields shown: FECN, Credits Required, Turn Pool, TS, OO, Traffic Class, P, PCRC, Header CRC, Turn Pointer, PI]

FECN:  Forward Explicit Congestion Notification
TS:    Type Specific
OO:    Ordered-Only
PCRC:  Payload CRC
P:     Perishable (Discard Eligibility)
PI:    Protocol Interface
D:     Direction (Forward / Reverse)

Protocol Interface (PI) Chaining


PIs Can be Chained Together Within Single Packet
  Multicast
  Congestion/Flow ID
  SAR

Example: SAR of Ethernet Packet
  Original packet:  AS Header | Enet (PI-X) | Payload | CRC
  SAR segments:     AS Header | SAR (PI-2) | Enet (PI-X) | Payload | CRC
                    AS Header | SAR (PI-2) | Enet (PI-X) | Payload | CRC

Path Routing
[Figure: a packet whose AS header carries Turn Pointer, Turn Pool "100 0101", and Direction F enters Switch #1 (8 ports) and then Switch #2 (16 ports); the header is shown at ingress and egress]

Direction Bit: F = Forward, B = Backward

Turn Pool is Unique Signature
Simplifies Switches: No Unicast Lookup Tables or CAMs
Packets Easily Returned to Sender Instead of Dropped
Ideal for Redundancy with Extremely Fast Failover
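The per-switch turn lookup on the Path Routing slide can be sketched in a few lines. This is a minimal illustration, not the specification's algorithm: the turn width per switch (ceil(log2(ports))), the most-significant-bits-first consumption order, the egress formula `(ingress + turn + 1) mod ports`, and the names `turn_width` and `route` are all assumptions made for the example.

```python
def turn_width(num_ports: int) -> int:
    """Bits of turn pool one switch consumes (assumed: ceil(log2(num_ports)))."""
    return max(1, (num_ports - 1).bit_length())


def route(pool: int, pointer: int, backward: bool, ingress: int, num_ports: int):
    """Return (egress_port, new_pointer) for one switch hop.

    Assumed convention: forward packets consume turns from the most
    significant end of the pool; backward packets re-read the same turns
    in reverse order and undo them.
    """
    width = turn_width(num_ports)
    mask = (1 << width) - 1
    if not backward:
        pointer -= width                      # consume the next turn
        turn = (pool >> pointer) & mask
        egress = (ingress + turn + 1) % num_ports
    else:
        turn = (pool >> pointer) & mask       # turn used on the way out
        pointer += width
        egress = (ingress - turn - 1) % num_ports
    return egress, pointer


pool, pointer = 0b1000101, 7                  # "100 0101" from the slide, 7 valid bits
out1, pointer = route(pool, pointer, False, ingress=0, num_ports=8)
out2, pointer = route(pool, pointer, False, ingress=0, num_ports=16)

# Fault beyond switch #2: flip the direction bit and retrace the path.
back2, pointer = route(pool, pointer, True, ingress=out2, num_ports=16)
back1, pointer = route(pool, pointer, True, ingress=out1, num_ports=8)
print(out1, out2, back2, back1, pointer)
```

Because each switch only shifts and masks the pool, no per-destination table is needed, and the backward pass returns the packet to the ports it originally entered on.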

Fault Conditions
[Figure: a packet with Turn Pool "100 0101" cannot leave Switch #2 because of a bad link; its direction bit changes from F to B and it travels back through Switch #1]

(1) Direction Bit is Flipped
(2) Switch #2 Re-Injects Packet, Normally
(3) Packet Follows the Reverse Path Back to its Source
(4) Source Receives the Packet it Sent

Reliable Link Layer Detects Inability to Forward Packet
Packets Can be Automatically Routed Back to Source

Route Redundancy
[Figure: AS fabric of four switches (SW1-SW4) connecting endpoints EP1-EP4]

Role      Source   Destination   Device Path            Turn List
Primary   EP1      EP3           SW1, SW2, SW4          0, 3, 0
Backup    EP1      EP3           SW1, SW3, SW4          1, 3, 1
Backup    EP1      EP3           SW1, SW2, SW3, SW4     0, 4, 1, 1
Backup    EP1      EP3           SW1, SW3, SW2, SW4     1, 1, 4, 0
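A source endpoint that holds the primary and backup routes from the Route Redundancy slide can fail over by switching turn lists. The sketch below is one hypothetical selection policy (first route with no failed inter-switch link); the function `pick_route` and the representation of failed links are assumptions, while the route data is copied from the slide.

```python
# Routes from EP1 to EP3, in preference order, as listed on the slide.
ROUTES_EP1_TO_EP3 = [
    ("Primary", ["SW1", "SW2", "SW4"], [0, 3, 0]),
    ("Backup", ["SW1", "SW3", "SW4"], [1, 3, 1]),
    ("Backup", ["SW1", "SW2", "SW3", "SW4"], [0, 4, 1, 1]),
    ("Backup", ["SW1", "SW3", "SW2", "SW4"], [1, 1, 4, 0]),
]


def pick_route(routes, failed_links):
    """Return the first (role, path, turn_list) that avoids every failed link.

    failed_links is a set of (switch, switch) pairs; links are treated as
    bidirectional. Returns None when no listed route survives.
    """
    for role, path, turns in routes:
        hops = list(zip(path, path[1:]))
        blocked = any(h in failed_links or h[::-1] in failed_links for h in hops)
        if not blocked:
            return role, path, turns
    return None


print(pick_route(ROUTES_EP1_TO_EP3, set()))
print(pick_route(ROUTES_EP1_TO_EP3, {("SW2", "SW4")}))
```

Since the whole route is carried in the packet header, failover in this model is a local decision at the source and needs no switch reconfiguration.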

AS Quality of Service
Same TC/VC Mechanism as PCI Express
  Traffic Class (TC): Packet Tags for Traffic Differentiation
    3-bit Tag is Invariant Through the Fabric
  TCs are Mapped to VCs
    Cost/Performance Flexibility

AS Supports Deadlock-Free Encapsulation of PCI Express

[Figure: at packet ingress, the 3-bit Traffic Class in the AS header maps each packet to one of 1-8 VC queues (VC #0 ... VC #N, 0 ≤ N ≤ 7) at the output port]

Flexible Differentiation of Traffic

Virtual Channels (VC)


AS Defines Three VC Types

Unicast VCs with Bypass Capability (BVC)
  Required for Load/Store Protocols (e.g., PCI Express, SLS)
  Ordered Queue and Bypass Queue
  Architecture Support for 8 BVCs
  Minimum Packet Size of 192 Bytes

Optional Unicast VCs with No Bypass (OVC)
  Comms-Oriented, Ordered-Only Flows
  Ordered Queue
  Architecture Support for 8 OVCs
  Minimum Packet Size of 64 Bytes

Optional Multicast VCs (MVC)
  Also Ordered-Only Flows
  Multicast Queue
  Architecture Support for 8 MVCs
  Minimum Packet Size of 64 Bytes

Traffic Classes (TC)


TC Used to Group Flows of Traffic
  Enables Differentiated Service Through Fabric
Eight Traffic Classes per VC Type
TC Value Carried End-to-End in AS Header
Fixed TC to VC Mappings Within VC Type
  Independent Mappings per VC Type (Bypass, Ordered, Multicast)
  Mapping is Function of Active Number of VCs on Port

TC-VC Mapping Examples


Switch Link (4 VCs)
  TC[0:1] -> VC0, TC[2:3] -> VC1, TC[4:6] -> VC2, TC7 -> VC3

Switch Link (8 VCs)
  TC0 -> VC0, TC1 -> VC1, TC2 -> VC2, TC3 -> VC3, TC4 -> VC4, TC5 -> VC5, TC6 -> VC6, TC7 -> VC7

Endnodes (2 VCs)
  TC[0:6] -> VC0, TC[7] -> VC1

Endnode (1 VC)
  TC[0:7] -> VC0
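The fixed mappings in the examples above lend themselves to a small lookup table keyed by the number of active VCs. The table values below are taken from the slide; the names `TC_TO_VC` and `vc_for` are hypothetical, and only the 1, 2, 4, and 8 VC cases shown on the slide are covered.

```python
# TC -> VC mapping for each number of active VCs shown on the slide.
# Index into each list is the 3-bit traffic class (0-7).
TC_TO_VC = {
    1: [0, 0, 0, 0, 0, 0, 0, 0],   # TC[0:7] -> VC0
    2: [0, 0, 0, 0, 0, 0, 0, 1],   # TC[0:6] -> VC0, TC7 -> VC1
    4: [0, 0, 1, 1, 2, 2, 2, 3],   # TC[0:1], TC[2:3], TC[4:6], TC7
    8: [0, 1, 2, 3, 4, 5, 6, 7],   # one VC per TC
}


def vc_for(tc: int, active_vcs: int) -> int:
    """Return the VC a packet with traffic class tc uses on a port."""
    if not 0 <= tc <= 7:
        raise ValueError("traffic class is a 3-bit value")
    if active_vcs not in TC_TO_VC:
        raise ValueError("no mapping shown for this VC count")
    return TC_TO_VC[active_vcs][tc]


print([vc_for(tc, 4) for tc in range(8)])
```

The TC stays constant end to end while the VC is re-derived at every port, so a packet tagged TC5 may use VC5 on an 8-VC link and VC2 on a 4-VC link.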

AS Multicast
Maximum of 64K Multicast Groups, Minimum of 1
Switch Lookup Tables Specify Output Ports
  16-bit Multicast Group ID Field in Packet Header
  Software is Required for Setup, Supervision & Teardown
Endpoints Can Write, Listen, or Both
  Single or Multiple Writers or Listeners; Loopback Supported
Applications: Conferencing, Media Broadcast, Control, Management, Sync, Heartbeat, etc.

[Figure: an AS switch uses the Group ID from the AS packet header as an index into a multicast LUT that lists the output ports to which the packet is replicated]
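The lookup-and-replicate step described on the AS Multicast slide might look like the following in software. This is an illustrative model only: the class `MulticastSwitch`, its method names, and the `loopback` flag are assumptions, not register-level behavior from the specification.

```python
class MulticastSwitch:
    """Toy model of a switch multicast lookup table (LUT)."""

    MAX_GROUP_ID = 0xFFFF  # 16-bit Multicast Group ID, up to 64K groups

    def __init__(self):
        self.lut = {}  # group id -> set of output ports

    def join(self, group_id: int, port: int) -> None:
        """Fabric-management software adds an output port to a group."""
        if not 0 <= group_id <= self.MAX_GROUP_ID:
            raise ValueError("group id must fit in 16 bits")
        self.lut.setdefault(group_id, set()).add(port)

    def leave(self, group_id: int, port: int) -> None:
        """Fabric-management software removes an output port from a group."""
        self.lut.get(group_id, set()).discard(port)

    def replicate(self, group_id: int, ingress_port: int, loopback: bool = False):
        """Return the sorted output ports that receive a copy of the packet."""
        ports = self.lut.get(group_id, set())
        return sorted(p for p in ports if loopback or p != ingress_port)


sw = MulticastSwitch()
for port in (1, 2, 5):
    sw.join(0x0042, port)
print(sw.replicate(0x0042, ingress_port=2))
```

Keeping setup and teardown in `join` and `leave` mirrors the slide's point that software, not the data path, maintains group membership.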

AS Multicast Packet Header


[Figure: header bit layout (bit ruler 31-0). Fields shown: FECN, Credits Required, Turn Pool, Multicast Group Index, Origin Specific Data, Secondary PI, Traffic Class, P, PCRC, R, PI (0000000b), Header CRC, Turn Pointer, 00b]

FECN:  Forward Explicit Congestion Notification
PCRC:  Payload CRC
P:     Perishable (Discard Eligibility)
PI:    Protocol Interface
R:     Reflected

Turn Pool & Turn Pointer Built Along The Way

AS Congestion Management
Enables Over-Subscription
  Regulates Traffic Flows to Avoid Congesting Links & Components
  Builds on PCI Express Base (VCs, Credit-Based Flow Control)
Balances Performance & Cost
  Minimizes Rather than Eliminates Congestion
  Supports End-to-End CM via PIs and/or Upper-Layer Protocols
New Congestion Management Mechanisms
  Status-Based, Per-TC Link Flow Control (SBFC)
  Minimum Bandwidth Scheduler (Switch Egress Scheduling)
  Endpoint Source Injection Rate Limiting
Normalized Control Interfaces for Interoperability
  Vendor-Specific Implementation Options Possible

AS Congestion Management
[Figure: AS flow CM model. Ingress stacks (tunneled protocol over a tunneling PI) multiplex communication flows into the AS fabric as ingress-scheduled flows; each switch hop (AS Fabric, AS Trans, PCI Ex Link, PCI Ex Phy) is locally scheduled with local status feedback, VC-arbitrated flow, and VC FC credits; egress stacks de-multiplex the flows. End-to-end flow control feedback for tunneling flows is PEI defined & optional]

Credit-Based Flow Control


Per-VC Link Flow Control
  Credit-Based
  Same Mechanism as PCI Express
Nearest Neighbors Avoid Congesting Input Ports
  Data Never Sent to Depleted Input Buffer
Credit Exchanged Using DLLPs
  One BVC per DLLP
  Up to Two OVCs per DLLP
  Up to Two MVCs per DLLP
Credit Denomination
  64 Bytes
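With a 64-byte credit denomination, the transmitter's check before sending on a VC reduces to simple arithmetic. The sketch below is a simplified model: the class `VcTransmitter`, its methods, and the assumption that a packet's whole size is charged against a single credit pool are illustrative choices, not the DLLP-level protocol.

```python
CREDIT_BYTES = 64  # credit denomination stated on the slide


def credits_required(packet_bytes: int) -> int:
    """Number of 64-byte credits a packet of packet_bytes consumes (rounded up)."""
    if packet_bytes <= 0:
        raise ValueError("packet size must be positive")
    return -(-packet_bytes // CREDIT_BYTES)  # ceiling division


class VcTransmitter:
    """Tracks credits advertised by the neighbor's input buffer for one VC."""

    def __init__(self, initial_credits: int):
        self.available = initial_credits

    def try_send(self, packet_bytes: int) -> bool:
        """Send only if the receiver has room; never overrun a depleted buffer."""
        need = credits_required(packet_bytes)
        if need > self.available:
            return False
        self.available -= need
        return True

    def credit_update(self, returned: int) -> None:
        """Apply credits returned by the neighbor (carried in DLLPs on a real link)."""
        self.available += returned


tx = VcTransmitter(initial_credits=4)
print(tx.try_send(200), tx.try_send(100), tx.available)
```

In this model a 200-byte packet costs four credits, so the second send is refused until the neighbor returns credits.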

Status-Based Flow Control


Per-TC Link Flow Control
  Status-Based
  Allows Transmission When In/Out Path Through Next Switch for TC is Un-Congested
Optional Normative
Nearest Neighbors Avoid Congesting Output Ports
  Reported Per Output Port & TC Combination
  Explicit XON-XOFF
  Time-Based XOFF
Sent via New DLLP
Ordered-Only Traffic
  Bypass Traffic Indirectly

Switch Egress Link Scheduling


Link VC Transmit Scheduling
  Up to 7 BVC Queues, Fabric Management VC Queue, 8 OVC Queues & 4 MVC Queues
Fabric Management VC (#7) is Highest Priority
Class of Service Queue (CSQ) Scheduler
  VC Queues Serviced Based on Configured Weightings
  Constrained by CBFC & SBFC
Optional Normative
  Minimum Bandwidth Scheduler
  VC Arbitration Table Scheduler
Can be Vendor-Specific

Switch Egress Scheduling


[Figure: egress link architecture. Packets from the SEs (switching function) are placed in egress CSQs (bypassable unicast, ordered-only unicast, multicast), a Fabric Management Channel (FMC) queue (TC7, one queue) behind an FMC rate limiter, and a DLLP queue. A minimum-bandwidth inner CSQ scheduler and a strict-priority outer CSQ scheduler, constrained by optional SBFC feedback and link layer credit availability, select the TLPs and DLLPs sent to the egress link]

A CSQ for a BVC is Comprised of an Ordered and a Bypassable Sub-Queue
Within a BVC CSQ, the Scheduler Gives the Bypassable Queue Strict Priority Service Over the Ordered Queue (as Long as Bypassable Link Credit is Available)

Endpoint Injection Rate Limiting


Connection Queues Provide Flow Isolation
Multiple Granularity Options for CQs
  One CQ Per TC
  One CQ Per TC/Destination Pair
Each CQ is Associated with a Token Bucket
  Token Buckets Limit Packet Flow Rates
Enables Fine-Grained Rate Adaptation
Maximum of 64K CQs
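A token bucket per connection queue, as the slide describes, can be modeled in a few lines. This is a generic token-bucket sketch rather than the AS-defined register interface: the class `TokenBucket`, the millisecond time base, and the `(tc, destination)` dictionary key are assumptions for illustration.

```python
class TokenBucket:
    """Generic token bucket: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_ms: int, burst_bytes: int):
        self.rate = rate_bytes_per_ms
        self.burst = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last_ms = 0

    def allow(self, packet_bytes: int, now_ms: int) -> bool:
        """Refill for the elapsed time, then admit the packet if tokens suffice."""
        elapsed = now_ms - self.last_ms
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_ms = now_ms
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# One connection queue (and bucket) per TC/destination pair, the finer
# of the two granularities listed on the slide. Keys are hypothetical.
buckets = {(0, "EP3"): TokenBucket(rate_bytes_per_ms=1, burst_bytes=500)}

cq = buckets[(0, "EP3")]
results = [cq.allow(400, 0), cq.allow(400, 100), cq.allow(400, 400)]
print(results)
```

The middle packet is held back because only 200 bytes of tokens have accumulated, which is how the bucket bounds the injection rate of one flow without affecting other connection queues.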

Endpoint Queuing & Scheduling


[Figure: endpoint queuing and scheduling within AS transaction layer processing. Source side: an ingress header processor uses a mapping table (packet handle to TurnPool & TC) to select a connection queue; each connection queue has a token bucket and feeds per-TC queues (TC0-7); a source scheduler, driven by the CM state machine, the optional SBFC per-TC status table, and CBFC feedback (credit exhausted indicators), sends AS TLPs to PCI Express link layer and physical layer processing (1 or more lanes) to/from the switch element. Destination side: a destination scheduler and demux with reverse mapping deliver egress AS packets from per-TC queues]

Agenda
  Introduction
  Core AS Architecture
  Protocol Interfaces
  Configuration Structures
  Software & Management

Protocol Encapsulation Layer


Protocol Interface (PI) Headers Enable Transport of Multiple Protocols Through Same AS Fabric

Protocol Interface    Description
PI-0                  Multicast, Path Building
PI-1                  Flow Identification for Congestion Management
PI-2                  Segmentation and Re-Assembly (SAR)
PI-3                  Reserved for Future AS Fabric Management Interfaces
PI-4                  Device Management
PI-5                  Event Reporting
PI-6 & PI-7           Reserved for Future AS Fabric Management Interfaces
PI-8 to PI-95         ASI-SIG Defined PIs
PI-96 to PI-126       Vendor Defined PIs
PI-127                Invalid

PI-0 through PI-2 are the Chaining PIs

Several Protocol PIs are Defined by ASI-SIG
  PCI Express (PI-8), SDT (PI-9), SLS (PI-10), SQ (PI-11)
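The PI number ranges in the table above can be captured in a small classifier, for example when decoding captured headers. The ranges and descriptions come from the slide; the function name `classify_pi` is hypothetical.

```python
def classify_pi(pi: int) -> str:
    """Map a 7-bit Protocol Interface number to its description from the table."""
    if not 0 <= pi <= 127:
        raise ValueError("PI is a 7-bit field (0-127)")
    fixed = {
        0: "Multicast, Path Building",
        1: "Flow Identification for Congestion Management",
        2: "Segmentation and Re-Assembly (SAR)",
        4: "Device Management",
        5: "Event Reporting",
        127: "Invalid",
    }
    if pi in fixed:
        return fixed[pi]
    if pi in (3, 6, 7):
        return "Reserved for Future AS Fabric Management Interfaces"
    if 8 <= pi <= 95:
        return "ASI-SIG Defined PI"
    return "Vendor Defined PI"  # remaining range is 96-126


for pi in (0, 8, 11, 96, 127):
    print(pi, classify_pi(pi))
```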

PI-0 Multicast
[Figure: header bit layout (bit ruler 31-0). Fields shown: FECN, Credits Required, Turn Pool, Multicast Group Index, Origin Specific Data, Secondary PI, Traffic Class, P, PCRC, R, PI (0000000b), Header CRC, Turn Pointer, 00b]

FECN:  Forward Explicit Congestion Notification
PCRC:  Payload CRC
P:     Perishable (Discard Eligibility)
PI:    Protocol Interface
R:     Reflected

Turn Pool & Turn Pointer Built Along The Way

PI-2 Segmentation & Reassembly


SAR Required When PDU Size > Fabric MPS
PI-2 Defines Standardized Format
  Support for In-Order & Out-of-Order
  Applied to Other PIs via Chaining
  PDU Trailer Added for Integrity Check
An Endpoint Function, Switches do not SAR

[Figure: three ports on an AS switch fabric. No SAR: TDM and Ethernet packets pass unsegmented. In Order SAR: a 1518-byte Ethernet packet from Port 1 is carried as segments 1 of 4 through 4 of 4, interleaved with TDM traffic. Out of Order SAR: segments may arrive in a different order (e.g., 4 of 4, 2 of 4, 1 of 4)]
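Segmentation and out-of-order reassembly at the endpoints can be sketched as follows. This is not the PI-2 wire format: the `(index, total, chunk)` tuple, the function names, and the 384-byte MPS are assumptions chosen so that a 1518-byte PDU splits into four segments as in the figure, and the PDU trailer integrity check is omitted.

```python
def segment(pdu: bytes, mps: int):
    """Split a PDU into (index, total, chunk) segments of at most mps bytes."""
    if not pdu or mps <= 0:
        raise ValueError("need a non-empty PDU and a positive MPS")
    total = -(-len(pdu) // mps)  # ceiling division
    return [(i, total, pdu[i * mps:(i + 1) * mps]) for i in range(total)]


def reassemble(segments):
    """Rebuild the PDU from segments received in any order."""
    if not segments:
        raise ValueError("no segments received")
    ordered = sorted(segments, key=lambda s: s[0])
    total = ordered[0][1]
    if [s[0] for s in ordered] != list(range(total)):
        raise ValueError("missing or duplicate segment")
    return b"".join(chunk for _, _, chunk in ordered)


pdu = (bytes(range(256)) * 6)[:1518]      # stand-in for a 1518-byte Ethernet frame
segs = segment(pdu, mps=384)              # hypothetical fabric MPS
arrival = [segs[3], segs[1], segs[0], segs[2]]
print(len(segs), reassemble(arrival) == pdu)
```

Both functions run only at endpoints, matching the slide's note that switches do not perform SAR.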

PI-4 Device Management


Load/Store Oriented Configuration/Control
  Used to Read/Write Device Control Structures
Privileged Operation
Three Packet Types
  Read Request
  Read Completion
  Write Request
Supports Several Transfer Sizes
  Masked Bytes Within Single DWord (Any One, Two, or Three Bytes)
  Full DWords
  Blocks of Two to Eight DWords

PI-5 Events
Generated by Devices (Switches & Endpoints)
Report Exception Conditions or Request Attention
  e.g., Interrupts, Status Reports, Errors
Two Major Types
  Events with Configurable Fixed Destination
    Generally Sent to a Fabric Manager or Supervisor
  Return-to-Sender (RtS) Events
    Response to Sender about a Packet
    Contains Headers from Original Packet
    Always Backward Routed to Identify Where Generated
Short & Long Packet Formats

PI-5 Event Packet Formats


Short-Form PI-5 Events
  [Figure: bit layout (bit ruler 31-0). Fields, in order: 0, V, R, ETI, Physical Port #, Sub-Class Code, Class Code, Event Vector or Event-Specific Data]

Long-Form PI-5 Events
  [Figure: bit layout (bit ruler 31-0). Fields, in order: 1, V, R, ETI, Physical Port #, Sub-Class Code, Class Code, Event Vector or Event-Specific Data, followed by four Event Data words]

PCI Express Encapsulation (PI-8)


PCI Express Devices Communicate Transparently
  Tunneled via AS Fabric
  No Changes to PCI Express Devices or Drivers
Dynamic Binding of Subsystems to Controlling Agents
  Bus Number to Path for Configuration Accesses
  Local Memory Aperture to Path for 32/64-bit Memory Accesses
  Local I/O Aperture to Path for 32-bit I/O Accesses
New Native AS Software is Required to Set Bindings
  Can Reside Locally or Elsewhere Within AS Fabric
Bindings Cause Standard Hot Plug Events
Stay Tuned for Lots More Detail on PI-8 Later

PI-9 Socket Data Transfer (SDT)


Socket (Queue Pair) RDMA Model
  Connection-Oriented
  Supports Push & Pull (i.e., Write & Read)
  Efficient Direct Data Placement
Low-Overhead Inter-Processor Communication Channels
  Handles Define Bound Ends of Peer-to-Peer Connections
  Buffers are Published Locally to Handles using Descriptors
  Engines Exchange Data Between Bound Pairs of Handles
    Descriptors Define and Conceal Local Data Placement at Each End
    All Data Movement & Transfer Synchronization is Offloaded from CPU
Point-to-Multipoint Replication via Multicast
Clean Peer-to-Peer Transactions Between Address Domains
  No Apertures or Memory Addresses Exposed
Secure, Application-Level Interfaces
Up to 64K Connections Per Endpoint

RDMA Using PI-9 (SDT)


[Figure: RDMA over SDT. In source memory, application data is described by a descriptor list bound to an entry in a handle array; SDT engines move the data across the AS fabric into buffers described by the descriptor lists and handle array in destination memory]

PI-10 Simple Load/Store (SLS)


Native AS Load/Store Service
Aperture-Based with Local Address Translation
  Independent Address Domain Per Aperture Per Endpoint
  Up to 4K Apertures Per Endpoint
Simple Memory Read/Write Semantics
  Read, Posted, Acknowledged, Sequenced & Multicast Write
Fully Topology-Independent
  Any Endpoint to Any Endpoint
  Multiple Simultaneous Peers Per Aperture (Shared Device I/O)
Multiple Options for Robustness & Security
  Path Protection: Access for Specific Source Endpoints
  Range Protection: Access Within Range at Target Node
  Key Protection: Packet Contains Assigned Access Key
PCI Compatible

SLS Direct BAR Mapping


Multiple Independent Address Domains

[Figure: Agent A and Agent B each have a unique 32-bit (4GB) PCI address space containing local memory and BARs (BAR 0 ... BAR N, 1GB regions shown). A read or write to a PCI address location that falls within a BAR memory region is carried across the AS fabric to the other agent]

SLS Aperture Usage


[Figure: two endpoints connected by the AS fabric; each endpoint's memory is divided into apertures (Aperture 1 through Aperture 5 shown) that map to memory on other endpoints]

PI-11 Simple Queuing (SQ)


Datagram-Style Communication, with Push & Pull
  Supports Both Unicast & Multicast
  Supports Path Protection & Access Keys Like SLS
Up to 4K Push & 4K Pull Queues Per Endpoint

[Figure: SQ-enabled AS ports or bridges communicating across the AS fabric. PUSH: Enqueue is sent to the target's push queues and Enqueue ACK is returned. PULL: Dequeue Request is sent to the target's pull queues and Dequeue Response is returned]

Agenda
  Introduction
  Core AS Architecture
  Protocol Interfaces
  Configuration Structures
  Software & Management

Configuration Structures
Supported by All Endpoints & Switches
Rich Set of Standardized & Normalized Capabilities
  Streamlines Management of Diverse Constellation of Devices
  Designed to be Extensible
Local or Through-Fabric Access via PI-4
  Enables Remote Device Configuration & Control
Access Control Features
  Multiple Apertures for Partitioning
  Write & Read Privileges Per Aperture
  Fabric-Wide or Granted to Specific Devices
Includes Communication and Synchronization Features
  Scratchpads, Semaphores & Doorbells

Device Configuration Spaces


[Figure: device configuration spaces. Aperture 0, Region 1: Device Header at 00h-3Fh, PCI capability records from 40h to FFh. Aperture 0, Region 2: AS Capability 1, AS Capability 2, AS Capability 3, ... AS Capability N from 100h. Apertures 1-N: capability data (Capability 1 Data, Capability 2 Data, ... Capability N Data) starting at 00h]

Device Header
Offset      Bits 31-16        Bits 15-8              Bits 7-0
00h         Device ID         Vendor ID (bits 15-0)
04h         Reserved
08h         Class Code (bits 31-8)                   Revision ID
0Ch         Reserved
10h-28h     Reserved
2Ch         Subsystem ID      Subsystem Vendor ID (bits 15-0)
30h         Reserved
34h         Reserved                                 Capability Ptr
38h         Reserved
3Ch         Reserved
40h-FCh     Reserved, or PCI Revision 2.3 Capability Records

Capability Structure Chaining


AS Capability ID Header (offset 00h)
  Bits 31-20: Next Cap. Offset
  Bits 19-16: Ver.
  Bits 15-0:  AS Capability ID

[Figure: three capability structure layouts, each beginning with the AS Capability ID Header at 00h: (1) header followed at 04h by Local Write Flags and Local Read Flags, then capability structure content; (2) header followed by capability-specific data as the capability structure content; (3) header followed by capability-specific data and a Capability Table Pointer (bits 31-4) with an AP field (bits 3-0)]
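Walking the capability chain amounts to decoding each header dword and following the Next Cap. Offset field. The field positions below (bits 31-20, 19-16, 15-0) are read from the slide; treating an offset of 0 as the end of the chain, the callback `read_dword`, and the sample register values are assumptions for illustration.

```python
def parse_cap_header(dword: int) -> dict:
    """Decode an AS Capability ID Header using the bit ranges on the slide."""
    return {
        "next": (dword >> 20) & 0xFFF,    # bits 31-20: Next Cap. Offset
        "version": (dword >> 16) & 0xF,   # bits 19-16: Ver.
        "cap_id": dword & 0xFFFF,         # bits 15-0:  AS Capability ID
    }


def walk_capabilities(read_dword, first_offset: int):
    """Follow the chain from first_offset; stops at offset 0 (assumed terminator)."""
    caps, seen, offset = [], set(), first_offset
    while offset != 0 and offset not in seen:   # 'seen' guards against loops
        seen.add(offset)
        header = parse_cap_header(read_dword(offset))
        caps.append((offset, header["cap_id"], header["version"]))
        offset = header["next"]
    return caps


# Hypothetical configuration space: two chained capabilities.
space = {
    0x100: (0x140 << 20) | (1 << 16) | 0x0001,
    0x140: (0x000 << 20) | (2 << 16) | 0x0005,
}
print(walk_capabilities(space.__getitem__, 0x100))
```

On a remote device, `read_dword` would be backed by PI-4 read requests rather than a local dictionary.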

Protocol Interface Capability


[Figure: Protocol Interface capability layout. Offsets 00h-0Ch: AS Capability ID Header; SW Entry Size, # SW Entries, # HW Entries (with reserved bits R); Hardware PI Table Pointer with AP field (bits 3-0); Software PI Table Pointer with AP field (bits 3-0). The Hardware PI Table holds entries of Protocol Identifier (bits 31-8), R, and PI. The Software PI Table holds entries of Protocol Identifier, R, PI, and a Protocol Table Pointer with AP field]

Configuration Space Permission


[Figure: Configuration Space Permission capability layout. Offsets 00h-10h: AS Capability ID Header; Reserved and # of Entries; Local Write Flags and Local Read Flags; Global Write Flags and Global Read Flags; CSP Table Pointer with AP field (bits 3-0). Each CSP table entry holds an enable bit (E), reserved bits (R), Ingress Port, Turn Pool, Path Write Flags, and Path Read Flags]

Agenda
  Introduction
  Core AS Architecture
  Protocol Interfaces
  Configuration Structures
  Software & Management

AS Architectural Elements
Endpoints
  Touched by Software
  Can Host Software
  Might Manage Fabric

Switches
  Touched by Software

[Figure: AS switches interconnecting endpoints]

Fabric Initialization Overview


Phase 1: Post-Reset & Initialization (Hardware)
  Link Training, Node Initialization, Credit Exchange

Phase 2: Master Election (Hardware)
  Blind Broadcast (PI-0:0)
  Candidate Masters Negotiate for Ownership of Fabric or Sub-Fabric Spanning Trees
  All Devices Configured with Spanning-Tree Owner
  Many-to-Many Relationship of Fabrics to Masters

Phase 3: Discovery & Configuration (Software)
  Fabric Managers Discover Fabric Using PI-4 Reads
  Devices Configured Using PI-4 Writes (e.g., Permissions, Event Routing, etc.)
  Any Node Can Perform Independent Discovery (Based on Configuration of Permission)

[Figure: timeline of phases: Initialization, Master Election, Discovery & Configuration]

Fabric Initialization Overview


Phase 4: Fabric Management (Software)
  All Management Models Supported
    Centralized, Distributed, Hybrids
  Any Node Can Make Path Decisions
  Multicast Group Management
    Creation, Deletion, Join, Leave
  Fault Management
    HA, Redundancy, Fail-Over / Take-Over
  Policy Management & Enforcement
  Performance Management
    e.g., Load Balancing
  Congestion Management

[Figure: timeline of phases: Initialization, Master Election, Discovery & Configuration, Fabric Management]

Fabric Manager
Privileged Fabric Entity
  Selected via PI-0:0
  Known to All Switches and Endpoints
  Spanning Tree Owner (ST[0], ST[1])
  Fabric Representative for Attached Fabrics

Fabric-Wide Responsibility
  Discovery & Configuration via PI-4
  Multicast Group Management
  Supervision & Maintenance via PI-4/PI-5
  Failover & Redundancy Coordination
  Source for Fabric Support Services (e.g., Event & Topology Services)

[Figure: Fabric Owner (FMGR) attached to a fabric of AS switches and endpoints (EP)]

Fabric Discovery Process


Exploration via PI-4
  Read-Only
  Iterative
  Breadth (or Depth) First
  Limited by 31-bit Turnpool Reach

Knowledge Gained
  Identification of Devices
  Characteristics & Capabilities of Each Device
  Fabric Topology
    All Paths Between Device Pairs
  Required to Compute Routes Through Fabric

[Figure: Fabric Owner (FMGR) exploring AS switches and endpoints (EP) in numbered steps 1-6]
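The breadth-first exploration described above can be sketched against a toy topology. Everything here is illustrative: the `TOPOLOGY` dictionary, the `pi4_read` stand-in for a PI-4 read, and the per-switch turn cost of ceil(log2(ports)) bits are assumptions; only the read-only, iterative, breadth-first approach and the 31-bit turn pool limit come from the slide.

```python
from collections import deque

MAX_TURN_BITS = 31  # "Limited by 31-bit Turnpool Reach"

# Hypothetical fabric: device -> type and {port: neighbor}.
TOPOLOGY = {
    "FMGR": {"type": "endpoint", "ports": {0: "SW_A"}},
    "SW_A": {"type": "switch", "ports": {0: "FMGR", 1: "EP1", 2: "SW_B"}},
    "SW_B": {"type": "switch", "ports": {0: "SW_A", 1: "EP2", 2: "EP3"}},
    "EP1": {"type": "endpoint", "ports": {0: "SW_A"}},
    "EP2": {"type": "endpoint", "ports": {0: "SW_B"}},
    "EP3": {"type": "endpoint", "ports": {0: "SW_B"}},
}


def pi4_read(device: str) -> dict:
    """Stand-in for a read-only PI-4 query of a device's configuration."""
    return TOPOLOGY[device]


def discover(root: str) -> dict:
    """Breadth-first discovery; returns device -> list of switches on its path."""
    paths, bits_used = {root: []}, {root: 0}
    queue = deque([root])
    while queue:
        dev = queue.popleft()
        info = pi4_read(dev)
        is_switch = info["type"] == "switch"
        if dev != root and not is_switch:
            continue                      # endpoints do not forward packets
        cost = (len(info["ports"]) - 1).bit_length() if is_switch else 0
        if bits_used[dev] + cost > MAX_TURN_BITS:
            continue                      # beyond turn pool reach
        for port in sorted(info["ports"]):
            nbr = info["ports"][port]
            if nbr not in paths:
                paths[nbr] = paths[dev] + ([dev] if is_switch else [])
                bits_used[nbr] = bits_used[dev] + cost
                queue.append(nbr)
    return paths


for device, path in discover("FMGR").items():
    print(device, path)
```

This version records only the first path found to each device; a fabric manager that needs all paths between device pairs, as the slide notes, would keep exploring alternate routes instead of skipping already-seen neighbors.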

Host Endpoint Software Stack


[Figure: host endpoint software stack. Software: Platform Management Infrastructure, Fabric Management Functions, and Applications (including a TCP/IP Stack Interface) use Application Interfaces and AS Device Drivers over the exposed Advanced Switching Services APIs; PCI Device Drivers and an AS Portal Device Driver sit beside the Advanced Switching Portal Driver. Hardware lies below the drivers]

Core Software Structure


[Figure: core software structure, arranged along a spectrum of portability, with an API between each pair of layers: End-User Applications and End-User Device-Drivers (User API), Application APIs & Libraries, Fabric Management, Endpoint Management Services, Device-Independent AS / PI Device Drivers, Device / Platform-Specific Device Drivers]

ASI SIG AS Simulator


Components Include:
  Switches, Endpoints, Data Sources & Sinks, Co-Simulation Interface
Test Performance Corner Cases
Evaluate Alternate Implementations & Topologies

Software Simulator Example


Example trace of the announce packet after reset

Running...
Releasing reset to all devices
Generating PI0:0 Announce packet (960)
Owner EUI = 0x00000020 00000000
1.901200044: -- EP_0, Packet(960) -> fabric
1.901200044: -- Switch_0, PI0:0 Announce Packet(960) -> Update ST[0]
1.901200088: -- Switch_0[0,1], Packet(980) -> fabric
1.901200088: -- Switch_0[0,2], Packet(981) -> fabric
1.901200088: -- Switch_0[0,3], Packet(982) -> fabric
1.901200088: -- Switch_1, PI0:0 Announce Packet(981) -> Update ST[0]
1.901200132: -- Switch_1[0,1], Packet(983) -> fabric
1.901200132: -- Switch_1[0,2], Packet(984) -> fabric
1.901200132: -- Switch_1[0,3], Packet(985) -> fabric
1.901200132: -- Switch_2, PI0:0 Announce Packet(984) -> Update ST[0]
1.901200176: -- Switch_2[0,1], Packet(986) -> fabric
1.901200176: -- Switch_2[0,2], Packet(987) -> fabric
1.901200176: -- Switch_2[0,3], Packet(988) -> fabric
1.901200440: -- EP_1, PI0:0 Announce Packet(980) -> Update ST[0]
1.901200440: -- EP_7, PI0:0 Announce Packet(982) -> Update ST[0]
1.901200484: -- EP_2, PI0:0 Announce Packet(983) -> Update ST[0]
1.901200484: -- EP_6, PI0:0 Announce Packet(985) -> Update ST[0]
1.901200528: -- EP_3, PI0:0 Announce Packet(986) -> Update ST[0]
1.901200528: -- EP_4, PI0:0 Announce Packet(987) -> Update ST[0]
1.901200528: -- EP_5, PI0:0 Announce Packet(988) -> Update ST[0]

Please Return for the Second Half



PI-8 Technical Review


Joe Bennett
Principal Engineer, Intel Corporation
ASI-SIG PI-8 Workgroup Chair

Agenda
  Architecture Overview
  AS Registers for PI-8
  Tunneling Mechanism
  PCI Express Configuration / Hot Plug
  PCI Express Virtual Channels
  Power Management
  Reset / Training
  PCI Express Errors

PCI Express*-to-AS Bridge Model


[Figure: a processor board (root complex, PCI Express switch topology, host switch containing an Express-to-AS bridge) connects through an AS switch topology of AS nodes to I/O boards; each I/O board has an I/O switch (AS-to-Express bridge) leading to a PCI Express switch topology of PCIe* I/O devices]

Express-to-AS Bridge spawns virtual PCI Express ports
  Each virtual port connected through the AS fabric to an AS-to-Express bridge
    AS-to-Express bridge connects to other PCI Express device types
Express-to-AS bridge and AS-to-Express bridge bound by AS fabric through a set of binding registers

Challenge: PCI Express Software Transparency

To be software transparent:
AS-PCI Express bridges must be identified as a valid PCI Express component (root port, switch, endpoint, or bridge to PCI/PCI-X/etc.)
When devices are added to or removed from the AS fabric, PCI Express software must be notified via a hot plug event so that it can reconfigure the sub-tree

Solution: AS-to-PCI Express bridges are PCI Express switches
Each bridge has a full PCI configuration header, including PCI-PM, MSI(X), Subsystem ID, and the PCI Express capability
Optionally contains PCI Express Enhanced capabilities
Allows PCI Express software identification and hot plug with no PCI Express software implications


Express-to-AS Bridge: Host Switch

Upstream port connected to a root complex, through a root port, PCI Express* switch, or PCI Express* bridge
Downstream ports connected to one or more AS ports
In the limit, all 256 downstream ports may be connected to a single AS port

[Figure: PCI Express* to AS bridge. A PCI Express* link enters the upstream port (a PCI-PCI bridge), which connects over a virtual PCI bus to the downstream ports (1 minimum, 256 maximum), each a PCI-PCI bridge with a hot plug controller (HPC). Below the downstream ports sit the PI-8 formatter and the AS transaction layer, which connect to the Advanced Switching link.]


AS-to-Express Bridge: I/O Switch

Upstream port connected to the AS fabric
One or more downstream ports connected to PCI Express ports, for further connections to endpoints or other PCI Express switches

[Figure: AS to PCI Express* bridge. The Advanced Switching link enters the upstream port through the AS transaction layer and the PI-8 formatter. The upstream PCI-PCI bridge connects over a virtual PCI bus to the downstream ports (1 minimum, 256 maximum), each a PCI-PCI bridge with a hot plug controller (HPC) driving a PCI Express* link.]



Creating Virtual PCI Express Link


To connect a downstream port of a Host Switch to an I/O switch, a binding register is needed
It creates an AS path between the two components

Several bindings are needed in the host switch
One binding per downstream PCI Express port of the Host Switch
One for the upstream port of the IO Switch

Binding tables exist in AS configuration space and are configured by the AS fabric manager


Host Switch Capability


[Register layout: a header holding Capability ID = 0000h, # of Routes, and reserved fields, followed by one register set per port, Port 0 through Port N. Each register set holds flag bits (labeled "B E R R" in the figure), reserved bits, an Egress field (P0 Egress through PN Egress), the Turn Pointer, the Request Turn Pool, and the Check Turn Pool.]


Host Switch Capability Details


Host Switch is a one-to-many mapping
For each implemented downstream port, a binding register to an IO switch is needed

Binding registers map in incrementing fashion
Register set 0 maps to the lowest numbered device/function downstream port
Register set 255 maps to the highest numbered device/function downstream port
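
The incrementing mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `map_register_sets` and the representation of a downstream port as a `(device, function)` tuple are hypothetical and do not come from the PI-8 specification.

```python
MAX_DOWNSTREAM_PORTS = 256  # upper bound stated on the host switch slide


def map_register_sets(downstream_ports):
    """Assign binding register set indices to downstream ports.

    downstream_ports: iterable of (device, function) tuples
    (a hypothetical representation chosen for this sketch).
    Returns {register_set_index: (device, function)}.
    """
    # Lowest numbered device/function first, so register set 0 gets it.
    ordered = sorted(set(downstream_ports))
    if not 1 <= len(ordered) <= MAX_DOWNSTREAM_PORTS:
        raise ValueError("expected between 1 and 256 downstream ports")
    return dict(enumerate(ordered))


mapping = map_register_sets([(2, 0), (0, 1), (1, 0), (0, 0)])
```

With four implemented ports, register sets 0 through 3 are used and the remaining sets are simply absent from the mapping.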


Host Switch Capability Details


When generating request packets, the turn pointer and request turn pool are used
When checking request packets from an IO switch, the check turn pool is used as a protection check
The egress port dictates which of the ports (up to 4) requests and completions use


I/O Switch Capability


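[Register layout: Capability ID = 0001h and reserved fields, followed by a single register set holding flag bits (labeled "B R E B E B E" in the figure), reserved bits, an Egress field, the Turn Pointer, the Request Turn Pool, and the Check Turn Pool.]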


I/O Switch Capability Details


When generating request packets, the turn pointer and request turn pool are used
When checking request packets from a host switch, the check turn pool is used as a protection check
The egress port dictates which of the ports (up to 4) requests and completions use


Binding
Three methods of establishing Host Switch / IO Switch binding:
Hardware (predetermined configuration via pin strapping, SROM pre-load, etc.)
Third-party AS agent (CPU with AS-aware fabric management software)
AS-aware software running on the PCI Express CPU (using the AS portal being defined by the PI-8 Specification Team)


PCI Express* Packet Encapsulation and Extraction


[Figure: a packet (AS Header, PCI Express Header, Data) travels from the Source Bridge through the AS Fabric to the Destination Bridge]

Source Bridge encapsulates the PCI Express packet
AS switches route encapsulated packets based on the AS path specification
Destination Bridge extracts the original PCI Express packet

PI-8 AS Header
[Header layout, fields as drawn: Header CRC, D, Turn Pointer, Credits Required, TS, TC, PI, Turn Pool; the remaining bit positions are shown as 0]

Green fields: fixed for all PI-8 packets
PI (8h)
Perishable, Packet CRC, Ordered Only, and FECN must all be 0

Red fields: calculated from the PCI Express packet
TC (unmodified, unless TC=7). This is just the AS TC; the PCI Express TC remains unchanged in the PCI Express header
TS (set for reads and non-posted writes, cleared for posted writes)
D (cleared on requests, set on responses)

Yellow fields: taken from the binding table

Blue fields: calculated
Header CRC (from the constructed header)
Credits Required (based upon PCI Express packet length)
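
The field rules for a request packet can be sketched as follows. This is a minimal illustration, not the PI-8 formatter: `BindingEntry`, `request_header_fields`, and the dictionary keys are hypothetical names, the binding-table fields are assumed to be the turn pointer and request turn pool named on the capability slides, and the Header CRC and Credits Required calculations are left out because the slides do not give their formulas.

```python
from dataclasses import dataclass

PI_8 = 0x8  # PI value fixed for all PI-8 packets


@dataclass
class BindingEntry:
    """Hypothetical view of one binding register set."""
    turn_pointer: int
    request_turn_pool: int


def request_header_fields(binding, is_posted_write, pcie_tc):
    """Derive the AS header fields of a PI-8 request packet."""
    if not 0 <= pcie_tc <= 7:
        raise ValueError("PCI Express TC must be 0..7")
    if pcie_tc == 7:
        # The slide only says TC=7 is modified, not how.
        raise NotImplementedError("TC=7 remapping is not described here")
    return {
        # Fixed fields
        "PI": PI_8,
        "Perishable": 0,
        "PacketCRC": 0,
        "OrderedOnly": 0,
        "FECN": 0,
        # Fields calculated from the PCI Express packet
        "TC": pcie_tc,
        "TS": 0 if is_posted_write else 1,  # set for reads and non-posted writes
        "D": 0,                             # cleared on requests
        # Fields taken from the binding table (assumed mapping)
        "TurnPointer": binding.turn_pointer,
        "TurnPool": binding.request_turn_pool,
    }


entry = BindingEntry(turn_pointer=6, request_turn_pool=0b101101)
read_fields = request_header_fields(entry, is_posted_write=False, pcie_tc=3)
write_fields = request_header_fields(entry, is_posted_write=True, pcie_tc=0)
```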

Checks Performed
D bit in the AS header determines whether the packet is a request or a completion
AS events:
AS Malformed Packet (return-to-sender event)
Perishable bit or OO bit set (all packets)
TS field set for completion packets

AS Invalid Turn Pointer (return-to-sender event)
Turn pointer not 0 on request packets
Turn pointer does not match RPTR on completion packets

PI-8 Protection Event (sent to fabric manager)
Turn pool does not match CPOOL on request packets
Turn pool does not match RPOOL on completion packets
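
The receive-side checks can be sketched as one function. Several things here are assumptions rather than statements from the slides: the order in which the checks run, the reading of RPTR, RPOOL, and CPOOL as the request turn pointer, request turn pool, and check turn pool of the binding register, the dictionary field names, and the interpretation that the request-side Invalid Turn Pointer check tests the turn pointer against 0.

```python
from enum import Enum


class Event(Enum):
    ACCEPT = "accept"
    MALFORMED = "AS Malformed Packet"
    INVALID_TURN_POINTER = "AS Invalid Turn Pointer"
    PROTECTION = "PI-8 Protection Event"


def check_packet(hdr, rptr, rpool, cpool):
    """Classify a received PI-8 packet (check order is an assumption)."""
    is_completion = hdr["D"] == 1  # D set on responses, cleared on requests

    # Malformed: Perishable or OO set on any packet, TS set on a completion.
    if hdr["perishable"] or hdr["oo"]:
        return Event.MALFORMED
    if is_completion and hdr["ts"]:
        return Event.MALFORMED

    # Invalid turn pointer: 0 expected on requests, RPTR on completions.
    expected_pointer = rptr if is_completion else 0
    if hdr["turn_pointer"] != expected_pointer:
        return Event.INVALID_TURN_POINTER

    # Protection: CPOOL expected on requests, RPOOL on completions.
    expected_pool = rpool if is_completion else cpool
    if hdr["turn_pool"] != expected_pool:
        return Event.PROTECTION

    return Event.ACCEPT


request = {"D": 0, "perishable": 0, "oo": 0, "ts": 1,
           "turn_pointer": 0, "turn_pool": 0x2D}
result = check_packet(request, rptr=6, rpool=0x1B, cpool=0x2D)
```

The first two outcomes are returned to the sender and the third goes to the fabric manager, as listed above; the sketch only classifies the packet and leaves event delivery out.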


Transaction Layer CRC


AS header contains a transaction layer CRC similar to the PCI Express End-to-End CRC, called Packet CRC
Packet CRC does not adequately cover the packet between two PCI Express endpoints, because PCI Express links still exist on either side of the fabric
End-to-end CRC, if desired by PCI Express, should use the ECRC field
PI-8 specifies that the Packet CRC bit must be 0; a packet with the bit set results in an AS Malformed Packet Event


PCI Express Ordering


A path from the host switch to an IO switch must match the path from the IO switch back to the host switch
This ensures the fabric acts as a virtual PCI Express link

If the path from host-to-IO switch differs from that from IO-to-host switch, PCI Express* ordering rule violations become possible
Example:
Device writes to system memory, then updates an internal flag indicating the data was written
Host reads the flag
If the completion takes a different AS path from the write, it could be returned to the host before the write reaches host memory (because of switch link congestion, for example)
A subsequent read of memory results in stale data

Correct bindings are the responsibility of the AS fabric manager

PI-8: No Chaining
Multicast operations, such as replication of PCI Express* messages, handled by PCI Express* logic of the bridge
Example: reset

No PI-1
No usage model identified to reschedule traffic based upon PI-8 logic congestion.

No PI-2
PCI Express* MPS field must fit within the MPS field of the AS link the PI-8 bridge is attached to
The PCI Express* logic of the PI-8 bridge will therefore break larger packets into correct AS sizes as necessary, ensuring no SAR needed

Introduction
Link between host switch and I/O switch is virtual, i.e. there is no link or PHY layer
Detecting the presence of the link must therefore also be virtualized
No DLLPs

PI-8 sequences the connection via the binding enable bits in the PI-8 device PI structure
Items to be virtualized:
Mechanism to ensure the negotiated link speed and width are communicated
Mechanism to set a bit in the slot status register of the host switch to allow a hot plug event to be generated (either PME or interrupt)

Hot Add to AS Fabric


When a device is added to the AS fabric, the fabric manager is (optionally) notified via a PI-5 link up event, so that the device may be configured

Once a PI-8 capability is detected, it must be bound to an RC so that PCI Express may configure it
Process:
1. Program a route into the I/O switch back to the host switch
2. Set binding enable in the I/O switch
3. Program a route into an unused host switch port, giving a path to the I/O switch
4. Set binding enable in the host switch

Setting the binding enable in the host switch kicks off the hot plug hardware process
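
The four-step order can be sketched from the fabric manager's point of view. The `Pi8Port` class, its methods, and the route placeholders are hypothetical stand-ins for writes to the binding registers; only the ordering of the steps is taken from the slide.

```python
class Pi8Port:
    """Hypothetical stand-in for one PI-8 binding register set."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.route = None
        self.binding_enable = False

    def program_route(self, route):
        self.route = route
        self.log.append(f"{self.name}: route programmed")

    def set_binding_enable(self):
        if self.route is None:
            raise RuntimeError("route must be programmed before binding")
        self.binding_enable = True
        self.log.append(f"{self.name}: binding enable set")


def hot_add(io_switch, host_port, route_to_host, route_to_io):
    """Bind an I/O switch to an unused host switch port, I/O switch first."""
    io_switch.program_route(route_to_host)  # step 1
    io_switch.set_binding_enable()          # step 2
    host_port.program_route(route_to_io)    # step 3
    host_port.set_binding_enable()          # step 4: starts the hot plug process


log = []
io_switch = Pi8Port("io_switch", log)
host_port = Pi8Port("host_port", log)
hot_add(io_switch, host_port, route_to_host="route-A", route_to_io="route-B")
```

Enabling the I/O switch first means it is already able to answer when the host switch side is enabled and begins the hot plug sequence.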

Hot Add to AS Fabric


When the AS fabric manager sets the binding enable bit in the host switch, the host switch passes an AS message to the I/O switch
PI-8 specific PI-5 event
Sends the Max Link Width and Max Link Speed from its PCI Express Link Capabilities register

[Packet layout, fields as drawn: AS Header, then a PI-5 Header, with the fields 0, 0, 00, 0h, Physical Port #, Reserved, 11h, Max Link Width, 88h, Max Link Speed]


Hot Add to AS Fabric


When the I/O switch receives the PI-5 event, it compares the received max link width and speed to its own internal values
It updates its negotiated link width and speed to be the lowest common denominator of the two
Example:
I/O Switch Link Capabilities: MLS = 2.5 Gb/s, MLW = 8
Received event: MLS = gen 2, MLW = 4
I/O Switch Link Status: NLS = 2.5 Gb/s, NLW = 4
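
The "lowest common denominator" rule amounts to taking the minimum of each pair. In this small sketch the function name and the encoding of speeds as generation numbers (1 for 2.5 Gb/s, 2 for gen 2) are choices made for the example, not register encodings from the specification.

```python
GEN1 = 1  # 2.5 Gb/s
GEN2 = 2


def negotiate_link(local_mlw, local_mls, received_mlw, received_mls):
    """Return (negotiated_width, negotiated_speed): the lower of each pair."""
    return min(local_mlw, received_mlw), min(local_mls, received_mls)


# Values from the example above: local x8 at 2.5 Gb/s, received x4 at gen 2.
nlw, nls = negotiate_link(local_mlw=8, local_mls=GEN1,
                          received_mlw=4, received_mls=GEN2)
```

Because both sides apply the same rule to the same two sets of capabilities, they arrive at the same negotiated width and speed without any further exchange.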


Hot Add to AS Fabric


After updating its link status register, the I/O switch sends the same message back to the host switch
PI-5 event
Contains its copy of the Link Capabilities register

Upon receiving the message from the I/O switch, the host switch makes the same update in its link status register
After the host switch updates its link status register, the hot plug may occur
Host switch updates Presence Detect Status and Presence Detect Changed in its slot status register
If these events are enabled in PCI Express, a software event is signaled to the RC operating system

This event triggers PCI Express software to configure the new sub-tree

Hot Add to AS Fabric


[Flowchart: two parallel flows]
Host Switch: Start. Wait until Binding Enable is set, then send a PI-5 event to the IO switch. Wait for a PI-5 event; on timeout, send a PI-5 Timeout event to the FM. When the PI-5 event is received, update NLW/NLS. End.
I/O Switch: Start. Wait until Binding Enable is set. Wait for a PI-5 event; on timeout, send a PI-5 Timeout event to the FM. When the PI-5 event is received, update NLW/NLS and send a PI-5 event to the host switch. End.

Hot Add Error


A hot plug is only successful after the bridges have been bound and the max link width/speed PI-5 event has been exchanged
A timeout mechanism exists to let the AS fabric manager know the link did not negotiate
Called the Link Capabilities Timeout Event
Timer based: 10 ms to 50 ms

Event signaled to fabric manager
FM can choose to retry (clear and re-set binding enable) or choose another action
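
The timeout and retry behavior can be modeled without real timers. In this sketch each binding attempt is represented by the time, in milliseconds, at which the peer's PI-5 event arrives (`None` if it never arrives); the function name and this representation are hypothetical, and how many times to retry is a fabric manager policy that the slide leaves open.

```python
TIMER_MIN_MS, TIMER_MAX_MS = 10, 50  # range given on the slide


def hot_add_with_retry(arrivals_ms, timer_ms=50):
    """Simulate binding attempts until one completes or attempts run out.

    arrivals_ms: one entry per attempt, giving when the peer's PI-5 event
    arrives (None = never). Returns (bound, timeout_events_sent_to_fm).
    """
    if not TIMER_MIN_MS <= timer_ms <= TIMER_MAX_MS:
        raise ValueError("timer must be between 10 ms and 50 ms")
    timeout_events = 0
    for arrival in arrivals_ms:  # each attempt: FM clears and re-sets binding enable
        if arrival is not None and arrival < timer_ms:
            return True, timeout_events  # PI-5 exchange finished in time
        timeout_events += 1  # Link Capabilities Timeout Event signaled to FM
    return False, timeout_events


result = hot_add_with_retry([None, 3.5], timer_ms=20)
```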


Hot Remove from AS Fabric Host Switch


When the AS fabric manager clears binding enable, the host switch modifies the PCI Express* slot status register
Presence Detect is cleared
Presence Detect Change is set
If enabled via PCI Express software, a hot plug software event is signaled

When the AS fabric manager clears binding enable in the IO switch, the IO switch resets its PCI Express registers and PCI Express interface
This ensures the registers are in a default idle state, allowing the fabric manager to hot swap this IO switch into a new host switch
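
The slot status handling on the host switch, for both hot add and hot remove, can be sketched as a small state update. The `SlotStatus` class and `on_presence_change` function are hypothetical models of the two status bits named on the slides; they are not register definitions.

```python
from dataclasses import dataclass


@dataclass
class SlotStatus:
    """Model of the two slot status bits the host switch virtualizes."""
    presence_detect: bool = False
    presence_detect_changed: bool = False


def on_presence_change(status, present, hot_plug_event_enabled):
    """Update slot status; return True if a software event is signaled."""
    if status.presence_detect == present:
        return False  # nothing changed, nothing to signal
    status.presence_detect = present
    status.presence_detect_changed = True
    return hot_plug_event_enabled


status = SlotStatus()
added = on_presence_change(status, present=True, hot_plug_event_enabled=True)
status.presence_detect_changed = False  # software acknowledges the change
removed = on_presence_change(status, present=False, hot_plug_event_enabled=True)
```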

PCI Express / AS Software Ordering


If AS software comes up first
All connections will be made, and PCI Express software will see full trees

If PCI Express software comes up first
Configuration will stop at host switches, because unbound downstream ports result in Unsupported Request being returned to PCI Express software
Upon AS software binding, hot plug events will cause PCI Express software to re-enumerate the sub-trees


History
PCI Express virtual channels are not enabled by default, as they are in AS
PCI Express software must explicitly enable them

Concern: if virtual channels are not identified to PCI Express software, it may not enable TCs in the endpoints
Unknown variable; different OSes may act differently

Additionally, the intent was not to report more VCs than actually exist on the AS links
Otherwise PCI Express software may think it is getting differentiation in traffic that it is not actually getting

Mapping of PCI Express to AS Virtual Channels


AS can have any number of virtual channels
PCI Express may only have a power-of-2 virtual channel count

Limits what PCI Express may report


AS BVC Count    Max PCI Express VC Count
1               1
2, 3            2
4, 5, 6, 7      4
8               8
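
The table rounds the AS BVC count down to the nearest power of two. A short sketch of that rule follows; the function name is hypothetical, and inputs outside the 1 to 8 range covered by the table are rejected rather than guessed at.

```python
def max_pcie_vc_count(as_bvc_count):
    """Largest power of two not exceeding the AS BVC count (table range 1..8)."""
    if not 1 <= as_bvc_count <= 8:
        raise ValueError("the table only covers AS BVC counts 1 through 8")
    # bit_length() - 1 is the exponent of the highest set bit.
    return 1 << (as_bvc_count.bit_length() - 1)


table = {count: max_pcie_vc_count(count) for count in range(1, 9)}
```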


Power Management Categories


AS power management
Controls the state of the link and the AS device
Owned by AS software
PI-8 Specification does not address AS power management

PCI Express power management
Link: called Active State Power Management (ASPM); enabled by software, managed by hardware
Device (PCI Power Management): owned by PCI Express software


PCI Express Power Management: Link (ASPM)

Concept
When a link goes idle for some period of time, it enters a lower power state
L0s (low latency): counter value is specified
L1 (higher latency): implementation specific

On a switch, if all downstream ports are idle and in a low power state, the upstream port may go into a low power state

Mechanism involves DLLPs and TLPs for entry and exit


PCI Express Power Management: Link (ASPM)

PI-8 chose not to implement this functionality
AS links are not power managed by PCI Express software
The messaging required would be complex
New messages for DLLPs would be required for L0s and L1 entrance requests, and for the exit request


Implications of Not Implementing PCI Express ASPM


I/O Switch: none
PCI Express ports may still go into L0s/L1 based upon timers
Upstream port is AS, and cannot physically enter PCI Express L0s/L1

Host Switch
No downstream effects, since the downstream side is the AS fabric port
Upstream port does not enter L0s/L1
Therefore, the PCI Express subsystem above the host switch will not enter L0s/L1 via ASPM


PCI Express Power Management: Device PM

All PI-8 devices are required to implement the PCI-PM capability structure
Required by the PCI Express specification

Entering a lower power state has two effects
First, it puts the link into a lower power state
Second, it allows the device to enter a lower power state, i.e. gate clocks, shut down any internal PLLs, etc.
It does not mandate that the device do so


PCI Express Power Management: Device PM

As with ASPM, the AS links do not enter a low power state when a device is put into a low power state
Physical PCI Express links must enter a low power state as per the PCI Express specification

Bridges may opt to go into a low power state, but AS functionality must not be compromised


Reset
A bridge may be reset (with its accompanying link) by programming the secondary bus reset register in PCI configuration space
In PCI Express, a reset is recognized on the link via an electrical change
PI-8 creates PI-5 events to virtualize this


Reset: Host Switch

If it receives an electrical reset from PCI Express
Host switch creates PI-5 reset events for each bound downstream port
Host switch resets PCI configuration registers in the upstream port and each downstream port

If the secondary bus reset bit is set in the upstream port
Host switch creates PI-5 reset events for each bound downstream port
Host switch resets PCI configuration registers in each downstream port

If the secondary bus reset bit is set in a downstream port
Host switch creates a single PI-5 reset event for that downstream port (if it is bound)


Reset: I/O Switch

If it receives a PI-5 reset event
I/O switch resets PCI configuration registers in the upstream port and each downstream port
I/O switch electrically resets the downstream links

If the secondary bus reset bit is set in the upstream port
I/O switch resets PCI configuration registers in each downstream port
I/O switch electrically resets the downstream links

If the secondary bus reset bit is set in a downstream port
I/O switch electrically resets the single downstream link
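
The reset rules for the host switch and the I/O switch can be written as two lookup functions. The trigger names, function names, and return shapes are hypothetical; the sketch only restates which ports are affected in each case.

```python
def host_switch_reset_actions(trigger, downstream_ports, bound_ports, port=None):
    """Return (ports_sent_pi5_reset_event, ports_with_config_registers_reset)."""
    bound = [p for p in downstream_ports if p in bound_ports]
    if trigger == "electrical_reset":
        return bound, ["upstream"] + list(downstream_ports)
    if trigger == "upstream_secondary_bus_reset":
        return bound, list(downstream_ports)
    if trigger == "downstream_secondary_bus_reset":
        # A single PI-5 reset event, and only if that port is bound.
        return ([port] if port in bound_ports else []), []
    raise ValueError(f"unknown trigger: {trigger}")


def io_switch_reset_actions(trigger, downstream_ports, port=None):
    """Return (ports_with_config_registers_reset, links_electrically_reset)."""
    if trigger == "pi5_reset_event":
        return ["upstream"] + list(downstream_ports), list(downstream_ports)
    if trigger == "upstream_secondary_bus_reset":
        return list(downstream_ports), list(downstream_ports)
    if trigger == "downstream_secondary_bus_reset":
        return [], [port]
    raise ValueError(f"unknown trigger: {trigger}")


host = host_switch_reset_actions("upstream_secondary_bus_reset",
                                 downstream_ports=[0, 1, 2], bound_ports={0, 2})
io = io_switch_reset_actions("downstream_secondary_bus_reset",
                             downstream_ports=[0, 1], port=1)
```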

Training
A link is trained during initial bring up, and by PCI Express software (through the link control register)
Training of virtual links attached to AS links does not occur
When a link is bound (via the binding enable), it is automatically considered trained
If PCI Express software sets the start training bit, the training complete bit is automatically set, and no communication occurs with the other side


Introduction
PCI Express errors are TLPs encapsulated on AS
Completion codes
PCI Express messages

Errors must be logged in the appropriate P2P bridge of the host or I/O switch, i.e. in either the downstream or the upstream port, as appropriate


Want More Info on Advanced Switching?


Intel Developer Network for PCI Express Architecture
http://developer.intel.com/technology/pciexpress/devnet/comms.htm
Information on Intel Industry Enabling

ASI-SIG Web Site
http://www.asi-sig.org/join
Specification Documents and Working Groups

Join The ASI-SIG

Thank you for attending the PCI-SIG Developers Conference 2004.

For more information please go to www.pcisig.com

