The underlay hardware fabric is controlled by an application (called 'segmentrouting') in ONOS. It interacts with a number of other applications, such as vRouter, vOLT and multicast, to provide CORD services. Briefly, the CORD network architecture involves the following:
SDN-based leaf-spine fabric built with bare-metal (OCP-certified) hardware and open-source switch software. This is a pure OF/SDN solution: while we do use ARP, MAC, VLAN, IPv4, MPLS, VXLAN and other data-plane headers, the fabric uses none of the distributed protocols found in traditional networking. A non-exhaustive list of the control-plane protocols we do not use ‘inside the fabric’ includes: STP, MSTP, PVSTP, RSTP, LACP, MLAG, OSPF, IS-IS, TRILL, RSVP, LDP and BGP.
The fabric has the following characteristics:
L2 switching within a rack handled at leaf-switches (ToRs).
L3 forwarding across racks using ECMP hashing and MPLS segment routing.
vRouter integration for interfacing with upstream metro-router, providing reachability to publicly routable IP addresses.
VLAN cross-connect feature to switch QinQ packets between OLT I/O blades and vSG containers (R-CORD feature).
IPv4 multicast forwarding and pruning for IPTV streams (with vRouter integration) from upstream router to residential subscribers (R-CORD feature).
XOS integration via REST API calls to dynamically configure end-hosts and VLAN cross-connects at runtime.
Ability to use the fabric in single-rack or multi-rack configurations.
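The XOS integration mentioned above happens through ONOS's REST interface. As a minimal sketch, the snippet below builds a netcfg-style JSON payload for a VLAN cross-connect and shows how it could be POSTed to ONOS's network-configuration endpoint at runtime. The exact `xconnect` stanza shape, the device ID, and the credentials are illustrative assumptions, not the precise segmentrouting schema.

```python
import json
from urllib import request

# Default ONOS REST endpoint; host and port are deployment-specific.
ONOS_NETCFG_URL = "http://onos:8181/onos/v1/network/configuration"

def build_xconnect_config(device_id, vlan, ports):
    """Build a netcfg-style payload for a VLAN cross-connect.

    The stanza layout here is an illustrative sketch of how a cross-connect
    between OLT I/O blade and vSG ports might be described, not the exact
    segmentrouting schema.
    """
    return {
        "apps": {
            "org.onosproject.segmentrouting": {
                "xconnect": {
                    device_id: [{"vlan": vlan, "ports": ports}]
                }
            }
        }
    }

def push_config(cfg, url=ONOS_NETCFG_URL):
    """POST the config to ONOS (authentication elided for brevity)."""
    req = request.Request(url, data=json.dumps(cfg).encode(),
                          headers={"Content-Type": "application/json"},
                          method="POST")
    return request.urlopen(req)  # applies the config at runtime
```

XOS (or any orchestrator) would call `push_config` whenever a new vSG container is instantiated, so the fabric learns about end-hosts and cross-connects dynamically rather than from static configuration.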
The fabric forms the underlay network in an overlay/underlay architecture. The overlay (sometimes referred to as the outer fabric) is also SDN based, with the following characteristics:
Use of software switches (e.g., OvS with DPDK) with a custom-designed pipeline for service chaining.
Distributed load-balancing per service in each OvS.
VxLAN tunneling in OvS for overlay-based virtual networks.
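The VxLAN tunneling in the last bullet wraps each overlay frame in an 8-byte header defined by RFC 7348. A small sketch of packing and parsing that header clarifies what the software switches actually add to each packet (the VNI value used here is arbitrary):

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header from RFC 7348.

    Byte 0 carries the flags, with the I bit (0x08) set to mark a valid VNI.
    Bytes 1-3 are reserved, bytes 4-6 hold the 24-bit VNI, byte 7 is reserved.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!I", 0x08 << 24) + struct.pack("!I", vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Recover the 24-bit VNI from a VXLAN header."""
    return struct.unpack("!I", header[4:8])[0] >> 8
```

Each virtual network in the overlay maps to one VNI, giving 2^24 possible virtual networks over a single shared underlay.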
The biggest advantage of common SDN control over both the overlay infrastructure as well as the underlay fabric is that they can be orchestrated together and optimized to deliver the features and services that Central Offices require, with the agility and economies of datacenter operations.
For a full list of fabric features and development status, click here.
Fabric Hardware & Software
The switches in the fabric leverage software and hardware from a number of other open source projects, as shown in Figure 2.
Figure 2. Fabric Software and Hardware.
The leaf and spine switches are identical Accton 6712 (OCP-certified) switches with 32 40G ports. The only difference is in their usage. The leaf switches use 24 of the 32 ports to connect servers, access blades and metro equipment, reserving 8 of the ports to connect to spine switches. The spine switches use all of their ports to connect to the leaves. There is nothing sacred about this arrangement. In smaller deployments, more leaf ports can be used to connect servers, and in the extreme case, the spines can be eliminated altogether. Similarly, other deployments can use different switch models with 10G or 1G ports (e.g., Accton 5712 or 5710 switches). The only requirement is support for the switch software stack, which we describe next.
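The 24-down/8-up split above fixes the fabric's oversubscription ratio, which is worth making explicit since it is the main trade-off when re-purposing leaf ports in smaller deployments:

```python
def oversubscription(down_ports: int, up_ports: int, port_gbps: int = 40) -> float:
    """Ratio of server-facing to spine-facing bandwidth at a leaf switch."""
    return (down_ports * port_gbps) / (up_ports * port_gbps)

# Accton 6712 leaf as described: 24 x 40G down, 8 x 40G up -> 3:1
# oversubscription(24, 8) == 3.0
```

Using more of a leaf's 32 ports for servers raises this ratio; a deployment that needs non-blocking behavior would instead balance down and up ports at 16/16.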
A key aspect of this work is that both leaf and spine switches run exactly the same open source switch software stack. This stack was first proposed and integrated in the ONF Atrium project, and includes ONL and ONIE as the switch operating system and boot loader. It also includes Broadcom’s OF-DPA software, which opens up a number of key APIs from their proprietary SDK. OF-DPA presents an abstraction of the switch ASIC pipeline (forwarding tables and port groups) in OpenFlow terms, so that an OF 1.3 agent like Indigo can be layered on top of the API. With this layering, an external controller (like ONOS) can program all the forwarding tables in the switch ASIC, thereby leveraging the full capabilities of today’s modern ASICs. Figure 3 shows a simplified view of the OF-DPA pipeline.
Figure 3. Fabric Chip Pipeline (Broadcom’s OF-DPA).
Fabric Operation as an Underlay
For transporting VxLAN-encapsulated traffic between services in the overlay network, the fabric acts as an underlay. The fabric control application’s job is to effectively utilize the cross-sectional bandwidth provided by the leaf-spine architecture. For intra-rack traffic between servers in the same rack, the fabric does regular L2 bridging. For inter-rack traffic between servers in different racks, the fabric does L3 routing (with MPLS). In other words, the underlay fabric behaves like an IP/MPLS network that routes traffic between attached subnets, where each rack has its own subnet. Traffic within a subnet (and therefore within a rack) is L2 bridged.
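The bridge-or-route decision at a leaf switch reduces to a subnet membership check. A minimal sketch, using the per-rack /24 addressing described below (the subnet values are from the document; the function name is ours):

```python
import ipaddress

def forwarding_mode(dst_ip: str, rack_subnet: str) -> str:
    """Decide leaf (ToR) behavior for a packet: bridge within the rack's
    subnet, route (with an MPLS label push) toward any other rack."""
    if ipaddress.ip_address(dst_ip) in ipaddress.ip_network(rack_subnet):
        return "L2-bridge"           # same rack: regular L2 switching at ToR
    return "L3-route-with-MPLS"      # other rack: route, push SR label
```

In hardware this check is not a Python function, of course; it falls out of the OF-DPA pipeline, where the Termination MAC and routing tables catch traffic addressed beyond the local subnet.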
Figure 7. Fabric as an underlay.
With reference to Figure 7, each rack has its own subnet 10.200.x.0/24. Servers in the same rack are assigned (infrastructure) IP addresses in the same subnet (for example, 10.200.1.11 and 10.200.1.12 in rack 1). Traffic between these servers is bridged by the ToR (leaf switch) they are connected to (s101 in Figure 7). Traffic destined to another rack gets routed by the same ToR. Here we use concepts from MPLS Segment Routing: we attach an MPLS label that designates the ToR switch in the destination rack, and then hash the flows across multiple spines. The spines do only MPLS label lookups to route the traffic to the destination ToR. A packet walk-through is shown in Figure 8.
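The label-push-then-hash behavior just described can be sketched as follows. The switch names, label values, and the use of CRC32 as a stand-in for the ASIC's hardware hash are all illustrative assumptions:

```python
import zlib

# Hypothetical globally-significant SR labels, one per ToR (illustrative values).
TOR_LABELS = {"s101": 101, "s102": 102}
SPINES = ["s201", "s202", "s203", "s204"]

def forward_inter_rack(flow_5tuple: tuple, dst_tor: str) -> dict:
    """Leaf behavior for inter-rack traffic: push the destination ToR's
    MPLS label, then ECMP-hash the flow to one of the spine switches.

    zlib.crc32 stands in for the ASIC's hash over header fields; any
    deterministic per-flow hash keeps a flow's packets on one spine.
    """
    label = TOR_LABELS[dst_tor]
    spine = SPINES[zlib.crc32(repr(flow_5tuple).encode()) % len(SPINES)]
    return {"push_label": label, "next_hop": spine}
```

Because the hash is computed over the flow's header fields, all packets of a flow traverse the same spine (avoiding reordering), while distinct flows spread across all spines.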
The fabric design choices come out of a desire to have forwarding separation between edge and core (in this case the leaves and spines). Ideally, for a multi-purpose fabric, it is best if the core remains simple and the same for all use-cases, and only the input to the core changes on a per-use-case basis. In our case, this means that the core forwards traffic only on the basis of MPLS labels, whereas the input at the leaf switch can consist of IPv4, IPv6, multicast, MAC, VLAN, VLAN+MAC, QinQ, or even other MPLS labels.
Figure 8. Overlay and Underlay packet walk-through.
The fact that we use MPLS labels to achieve this edge-core (leaf-spine) separation is immaterial. We could have just as well used VLAN tags. However, in our experience with current ASICs, it is easier to use labels than tags, as the two are treated differently in current ASIC pipelines, with different rules to follow and associated limitations in usage.
Furthermore, Segment Routing is just a convenient way to use MPLS labels. SR gives us the concept of globally significant labels, which we assign to each leaf and spine switch. This leads to less label state in the network, compared to traditional MPLS networks where locally significant labels have to be swapped at each node. In addition, by default, SR requires the use of ECMP shortest paths, which fits well with the leaf-spine fabric, where leaf switches ECMP-hash traffic to all the spine switches. Finally, SR is a form of source routing: we can change the path traffic takes through the network simply by changing the label assignment at the source (leaf) switch; there is only one switch to ‘touch’ instead of the entire ‘path’ of switches. This is how we have performed elephant-flow traffic engineering in the past using external analytics.
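The one-switch-to-touch property can be made concrete with a small sketch. Here traffic engineering means rewriting the spine set a source leaf may hash a destination over; the data structure and switch names are illustrative, not the segmentrouting app's internal state:

```python
# Per-(source leaf, destination ToR) ECMP groups: which spines the leaf
# may hash matching traffic over. Default is all spines.
ecmp_groups = {
    ("s101", "s102"): ["s201", "s202", "s203", "s204"],
}

def steer_flows(groups: dict, src_leaf: str, dst_tor: str, spines: list) -> int:
    """Traffic-engineer a path by restricting the spine set at the source
    leaf only. Returns the number of switches whose state changed (always
    one), in contrast to hop-by-hop MPLS where every node on the path is
    reprogrammed."""
    groups[(src_leaf, dst_tor)] = list(spines)
    return 1
```

For example, pinning s101-to-s102 traffic to spine s203 (to move an elephant flow off a congested link) touches only s101's group table; the spines and the destination ToR are unchanged.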
In addition, an overlay/underlay architecture simplifies network design, as it introduces a separation of concerns - the underlay hardware fabric can be simple and multi-purpose without knowing anything about virtual networks, while the overlay network can be complex but implemented in software switches where more sophisticated functionality like service-chaining and distributed load-balancing can be achieved. Importantly, this architecture does not mean that we cannot have the overlay and underlay network interact with each other and gain visibility into each other. This is especially true in an SDN network where we have the same controller-based control plane for both overlays and underlays.
Atrium: A Complete SDN Distribution from ONF. https://github.com/onfsdn/atrium-docs/wiki
OpenFlow Data Plane Abstraction (OF-DPA). https://www.broadcom.com/collateral/pb/OF-DPA-PB100-R.pdf