SDN - Not What You Think It Is

Overview

Broadly speaking, SDN refers to the abstraction of the control plane from the data plane. The implications of this abstraction, both the benefits and the challenges, are profound, and represent the next iteration of what data center and transit networks will look like. Indeed, many of the largest networks today are built on and operated by SDN. CDNs like Akamai and hyperscalers like AWS could not function at their scale without it. But before we get into those details, it’s helpful to take a step back to understand what the ‘control’ and ‘data’ planes are, and what it means to pull the two apart.

 

The control plane is where your routing decisions happen. It’s the ‘brain’ of your network. Traditionally these decisions are made by routing engines (supervisors, in Cisco language). In chassis-based systems like Juniper’s MX or Cisco’s Nexus platforms, these are removable modules that slot into the chassis alongside the line cards. They have hard drives and system memory, and run an operating system like Junos, IOS, or NX-OS. They have kernels and APIs you can call, and they run daemons that handle things like chassis management, routing protocols, and key exchange for IKE and IPsec. When you connect to a router or switch via SSH, it’s the routing engine you’re talking to. On fixed platforms (sometimes lovingly referred to as ‘pizza boxes’), these are not separate modules that you can add or remove, but are instead built into the device and presented as a single unit. When a new route is learned, it is the control plane that learns it.

Think of the control plane as the management team of a factory, deciding what orders to issue (routes to take) to the workers.

The data plane is what actually moves traffic from one port to another. Routes learned by the control plane are programmed into the data plane. When traffic moves from one port to another, most of the time the control plane isn’t involved, unless it’s a special type of traffic, such as IKE or ICMP destined for the device itself. In a chassis-based system (like the examples above), the data plane lives on line cards that you can add or remove. There are many different models, supporting everything from copper connections at 1Gb and 10Gb to fiber connections at 1Gb, 10Gb, 100Gb, and 400Gb. In a fixed-form device, such as a Juniper QFX5110 or a Cisco Catalyst 3750, these are not removable. A hardware failure here usually necessitates replacing the entire device.

Think of the data plane as the workers on an assembly line. They don’t make decisions; their job is just to move things (packets) as quickly as possible.
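
To make the split concrete, here is a minimal Python sketch of a control plane that learns routes and programs them into a data plane, which then only performs longest-prefix-match lookups. The class names, prefixes, and port names are hypothetical and chosen for illustration; real devices do this in hardware, not in Python.

```python
import ipaddress


class DataPlane:
    """Forwarding table: performs lookups only and never learns routes itself."""

    def __init__(self):
        self.fib = {}  # maps ip_network -> egress port name

    def program(self, prefix, port):
        # Called by the control plane to install a forwarding entry.
        self.fib[ipaddress.ip_network(prefix)] = port

    def forward(self, destination):
        # Longest-prefix match: the most specific matching network wins.
        address = ipaddress.ip_address(destination)
        matches = [network for network in self.fib if address in network]
        if not matches:
            return None  # no route, so the packet would be dropped
        best = max(matches, key=lambda network: network.prefixlen)
        return self.fib[best]


class ControlPlane:
    """Learns routes and pushes the result down to the data plane."""

    def __init__(self, data_plane):
        self.rib = {}  # routing table kept by the 'management team'
        self.data_plane = data_plane

    def learn_route(self, prefix, port):
        self.rib[prefix] = port
        self.data_plane.program(prefix, port)


data_plane = DataPlane()
control_plane = ControlPlane(data_plane)
control_plane.learn_route("10.0.0.0/8", "port1")   # hypothetical routes
control_plane.learn_route("10.1.0.0/16", "port2")

specific = data_plane.forward("10.1.2.3")   # matches both; /16 is more specific
general = data_plane.forward("10.9.9.9")    # matches only the /8
unknown = data_plane.forward("192.0.2.1")   # matches nothing
```

Note that `forward` never consults the control plane. Once the entry is programmed, the 'workers' move packets on their own.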
 

Traditional Networking

Traditional networking follows a monolithic model, where the control and data planes are paired together in a single device. In a chassis, the routing engine is plugged into the same physical unit as the line cards that move the traffic from one port to another. In a fixed-configuration device, they exist in the same unit. You may have automation that periodically takes backups of your running configuration, or that assists in the provisioning of new devices for deployment. Routing decisions between devices can also be coordinated, such as when devices peered via BGP advertise or withdraw routes they’ve learned, or when MPLS traffic engineering builds new paths within the network. But this is not the same thing as abstracting the control and data planes. This is not SDN. Routing and policy decisions are still happening locally. Unless all of your devices are peered together (and even then you’re subject to convergence and signaling delays), the local routers at each site may not be aware of the status of links at remote sites, especially downstream.

 

diagram

Traditional networking has our factory managers and workers mingling together on the factory floor, which represents a more tactical approach.

Software Defined Networking

When we take a step back from the network and begin to understand the big picture, not only can we start to make more intelligent decisions about what actions to take, but we can change when, how, and where those decisions take place. This addresses the scalability, velocity, deployment, and operational management constraints imposed by traditional networking designs. Your disaggregated controller has a view of all links globally. We will likely still leave local IGP (interior gateway protocol, such as OSPF) decision-making in place, but a higher-level status is signaled to and understood by our disaggregated control plane. And beyond more intelligent and proactive routing decisions, SDN allows us to automate, through software, provisioning, scaling, operations, and other tasks that humans would normally have to undertake.

diagram

SDN enables us to move the factory managers from their previous position of mingling amongst workers on the floor to an elevated one, with a more global perspective looking down not only at one factory as a whole, but all of our factories at once, across the globe. This enables a strategic approach.

With SDN, planned or emergent changes on the network can occur before anyone even notices a problem, responding to events not just in real-time, but in advance.
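
As a rough illustration of what a global view buys you, the following Python sketch has a controller compute the cheapest path across all known links and then recompute it when one link is lost. The site names beyond Dallas and New York, the link costs, and the function name are assumptions made up for this example, and the path search is a plain Dijkstra implementation, not any vendor's algorithm.

```python
import heapq


def shortest_path(links, source, destination):
    """Return (total_cost, path) over bidirectional links, or None if unreachable.

    links maps (site_a, site_b) -> cost, as seen by a controller with a global view.
    """
    graph = {}
    for (site_a, site_b), cost in links.items():
        graph.setdefault(site_a, []).append((site_b, cost))
        graph.setdefault(site_b, []).append((site_a, cost))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == destination:
            return cost, path
        if site in visited:
            continue
        visited.add(site)
        for neighbor, link_cost in graph.get(site, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical topology and costs.
links = {
    ("dallas", "newyork"): 10,
    ("dallas", "chicago"): 7,
    ("chicago", "newyork"): 8,
}
before = shortest_path(links, "dallas", "newyork")

# The controller sees the direct link fail and recomputes immediately.
del links[("dallas", "newyork")]
after = shortest_path(links, "dallas", "newyork")
```

A router that only knows its own links could not make this decision by itself. The controller can, because it holds the whole map.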

Example

Suppose you know that during a busy time of the year, the demand for network capacity spikes higher than what you can sustain. Or perhaps your network sometimes sees unexpected demand, putting your operators in a reactive position. They’re not ahead of the demand, and your users see latency or packet loss as a result. Through SDN, we can automate the decision-making process on how to scale, when, and under what circumstances. As your alerting mechanisms start to notice increased bandwidth demands, traffic can be balanced or re-routed before anyone even notices congestion. In a planned scenario, you can schedule changes in your network for a specific time and date. You’ve made the decision beforehand, so your network adjusts automatically when necessary. The auto-scaling feature of EC2 instances in AWS is an example of this.

What are the actual tools that operators use to make these planned or emergent changes? Some vendors like Cisco Meraki or Silver Peak build this functionality into their software. Other networks take advantage of tools like Ansible or Python to make changes. Whatever the tool, at some point in your network (or in ‘the cloud’), there is a ‘brain’ that sits at the center of it, taking in data streams such as SNMP, syslog, NetFlow, or other kinds of metrics to understand the overall picture of the network. Changes can be executed by scripts (like bash), programming languages (like Python), or frameworks (like Ansible) based on received telemetry or events.
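
A telemetry-driven trigger of this kind can be sketched in a few lines of Python. The class name, the 80% threshold, the three-sample window, and the utilization figures below are all assumptions for illustration; in practice the samples would come from your collector rather than a hard-coded list.

```python
from collections import deque


class LinkMonitor:
    """Flags a link when its average utilization over a sliding window is too high."""

    def __init__(self, threshold=0.8, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def record(self, utilization):
        """Store one sample (0.0 to 1.0) and report whether action is needed."""
        self.samples.append(utilization)
        window_full = len(self.samples) == self.samples.maxlen
        average = sum(self.samples) / len(self.samples)
        # Wait for a full window so a single spike does not trigger a change.
        return window_full and average > self.threshold


monitor = LinkMonitor()
readings = [0.55, 0.70, 0.85, 0.90, 0.95]  # hypothetical utilization samples
decisions = [monitor.record(reading) for reading in readings]
```

Averaging over a window is one simple way to avoid reacting to a momentary burst. A production system would likely add hysteresis so the network does not flap between two states.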

Example

A core router sitting at your Dallas location may not be aware of a sudden spike in traffic or packet loss at your core router in New York, but your centralized management software that’s ingesting data from all of your sites will be. This may be exposed to you visually by tools such as SolarWinds or PRTG. The management software can then decide that it’s time to migrate traffic from one link to another. A sudden uptick in Netflix traffic from all the college kids returning home might start to choke one of your WAN links in New York, but through SDN, you can intelligently steer that specific kind of traffic to another link, keeping plenty of bandwidth available for your critical link to Dallas.

Because we’ve abstracted the control plane away from the data plane, as long as the control plane can still signal the data plane, it doesn’t really matter where your control plane lives. If you’re a Cisco Meraki user, this intelligence lives in their cloud. If you’re rolling your own solution, it may sit in a cluster of virtual machines on a local VMware cluster, or on a fleet of EC2 instances in AWS.
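
The steering decision in the New York example could look something like the sketch below. The function name, link names, traffic class labels, and the 70% threshold are hypothetical; the point is only that bulk traffic is moved off a busy primary link while critical traffic stays put.

```python
def choose_link(traffic_class, utilization, primary, bulk_classes, threshold=0.7):
    """Pick an egress link for a traffic class.

    utilization maps link name -> current load (0.0 to 1.0).
    Bulk traffic is steered to the least-loaded alternative once the primary
    link is busier than the threshold; everything else stays on the primary.
    """
    if traffic_class in bulk_classes and utilization[primary] > threshold:
        alternatives = {
            link: load for link, load in utilization.items() if link != primary
        }
        if alternatives:
            return min(alternatives, key=alternatives.get)
    return primary


# Hypothetical state as seen by the central controller.
utilization = {"wan1": 0.85, "wan2": 0.30}
bulk_classes = {"streaming"}

streaming_link = choose_link("streaming", utilization, "wan1", bulk_classes)
voice_link = choose_link("voice", utilization, "wan1", bulk_classes)
```

Here streaming traffic is moved to the quieter link while voice traffic keeps the primary, which frees headroom without touching the critical path.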

Example

Your network engineers can write Python scripts or Ansible playbooks that act on triggers defined on your collectors (the systems receiving various kinds of telemetry or metrics, such as syslog or SNMP; think SolarWinds or PRTG). These will often run on Linux servers of some sort and be executed from there, whether or not they need to pass instructions to local agents that live on your devices. An Ansible control node can run on a Linux instance in EC2 and, through an AWS VPN or AWS Direct Connect connection to your datacenter, push configuration changes to your routers and switches.
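
One way such a script might be organized is as a small dispatcher that maps collector events to handlers. This is a sketch only: the event fields, event type names, and action names are invented for illustration, and the handlers return a description of the intended action instead of touching any device.

```python
def on_link_congested(event):
    # Plan a rebalance for the affected link; nothing is pushed to a device here.
    return {"device": event["device"], "action": "rebalance", "link": event["link"]}


def on_device_unreachable(event):
    # Some events are better handed to a human than automated.
    return {"device": event["device"], "action": "open_ticket"}


# Hypothetical trigger names, as they might be defined on a collector.
HANDLERS = {
    "link_congested": on_link_congested,
    "device_unreachable": on_device_unreachable,
}


def dispatch(event):
    """Route a collector event to its handler; unknown events return None."""
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None
    return handler(event)


planned = dispatch({"type": "link_congested", "device": "nyc-core-1", "link": "wan1"})
ignored = dispatch({"type": "fan_speed_changed", "device": "nyc-core-1"})
```

Keeping the decision (what to do) separate from the execution (how it reaches the device) makes it easier to test the logic before it is allowed to change anything.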

 

What does SDN offer?

 

Reliability
Instead of relying on humans, who are slow and error-prone, to make changes, you can have changes applied exactly the way you defined them, when you need them, every single time.

Speed
When combined with telemetry collected from your devices, you can make necessary changes within your network before a human even knows something is wrong. You can also push necessary changes out to tens of thousands of devices in a reasonable amount of time. Would you want to log into the CLI of 150,000 switches and update a TACACS server address, or push new configurations to 5,000 devices you just unboxed? I wouldn’t.

Repeatability
Define a standard template that needs to get pushed, and your changes happen exactly the same way, every time. It’s as simple as that.
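
A minimal sketch of the template idea, using Python's standard `string.Template`. The device names and server address are made up, and the configuration lines are illustrative placeholders that should not be read as any vendor's exact CLI syntax.

```python
from string import Template

# Illustrative configuration lines; adapt to your platform's real syntax.
BASE_TEMPLATE = Template(
    "hostname $hostname\n"
    "tacacs-server host $tacacs_server\n"
)


def render_configs(devices, tacacs_server):
    """Render the same template for every device, varying only the inputs."""
    return {
        name: BASE_TEMPLATE.substitute(hostname=name, tacacs_server=tacacs_server)
        for name in devices
    }


devices = ["sw-dal-01", "sw-dal-02", "sw-nyc-01"]  # hypothetical inventory
configs = render_configs(devices, "192.0.2.10")
```

`substitute` raises a `KeyError` if a placeholder has no value, so a missing input fails loudly instead of silently producing a half-filled configuration.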

Intelligence
Get your managers off the factory floor, and give them a holistic view of your network. SDN allows the transition from a local, tactical perspective to a global, strategic one.

 

 


 

The Dangers of SDN

 

The power of SDN is that you can push a change to 10,000 devices across your global fleet all at once. The danger of SDN is that you can push a change to 10,000 devices across your global fleet all at once. Any time you introduce automation, you need to think very carefully through each granular step. Here are the things you need to keep in mind at all times:


What am I changing?

Why am I changing it?

When is it changing?

How do I know the change is safe?

What’s the risk of doing this change? What’s the risk of not doing it?

How can I make this change in a safe manner?

How can I guarantee my change is safe?

What do I do when my change breaks something?

How do I roll back, and how do I know when to roll back before disaster occurs?


There’s a whole design methodology around automation that is perhaps beyond the scope of this article, but when you’re ready for SDN, you do need to be thinking of these kinds of things before you jump right in.
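
One common way to act on several of the questions above is a staged rollout: change a single device first, check its health, then widen the batch, and roll everything back on the first failure. The Python sketch below shows the control flow only. The function names, batch sizes, and the callbacks `apply_change`, `is_healthy`, and `rollback` are assumptions that you would implement against your own tooling.

```python
def staged_rollout(devices, apply_change, is_healthy, rollback, batch_sizes=(1, 10, 100)):
    """Apply a change in growing batches and undo it everywhere on the first failure.

    Returns (succeeded, devices_that_were_changed).
    Once the listed batch sizes are used up, the remainder goes in one final batch.
    """
    changed = []
    remaining = list(devices)
    sizes = list(batch_sizes)
    while remaining:
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for device in batch:
            apply_change(device)
            changed.append(device)
        if not all(is_healthy(device) for device in batch):
            # Undo in reverse order so the most recent change is reverted first.
            for device in reversed(changed):
                rollback(device)
            return False, changed
    return True, changed


# Simulated fleet: 'sw-2' becomes unhealthy after the change.
devices = [f"sw-{index}" for index in range(5)]
applied = set()

succeeded, changed = staged_rollout(
    devices,
    apply_change=applied.add,
    is_healthy=lambda device: device != "sw-2",
    rollback=applied.discard,
    batch_sizes=(1, 2),
)
```

In this simulated run the failure is caught in the second batch, so the last two devices are never touched and the first three are restored. The blast radius is limited to the batch size you chose.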

 

 

Summary

SDN means something different for everyone because the needs of different networks are not all the same. For some, intelligently balancing traffic across multiple links is the issue. For others, integration of on-premises equipment with cloud providers has to come into consideration. Some networks can take advantage of SDN to speed provisioning of new devices, or to make dynamic configuration changes and keep those changes documented and up to date, with change control, approvals, and Git integration enabling continuous integration/continuous deployment. Whatever the case, it’s clear that SDN, whatever that means in practice for your network, is a new fact of life.

 
