Interoperability isn’t guaranteed in the open networking era

Over the last few years, open networking has changed the landscape of the networking and datacenter industries. Led by a vanguard of hyperscale Internet companies, open networking is now being adopted by smaller enterprises as well.

Customers are drawn to open networking by its flexibility and potential cost savings, but many operate under the false assumption that open solutions are inherently interoperable. They are not: an open solution and an interoperable solution are not the same thing. In fact, open networking has inherent characteristics that make interoperability testing more critical than ever.

Consider the following examples of interoperability problems discovered at a recent 10/40G and 25/100G Plugfest:

  • The electrical parameters of each port on a switch can be configured in the silicon. Amplitude and pre-emphasis on the transmitter can be tuned for a particular board layout. If that same silicon is used on a different board layout, perhaps with longer or shorter traces from the switching ASIC out to the front panel, the electrical parameters need to be adjusted. Switch vendors typically spend a great deal of time dialing in the best ‘recipe’ of electrical parameters for each board layout. Depending on the switching ASIC, this capability may be exposed to the NOS, which puts the responsibility for tuning on a third party: the NOS vendor. NOS vendors may not be familiar with the board layout of every switch, or may not have access to samples of the switch to do the tuning themselves. In that case they may fall back on a default set of parameters that has been ‘good enough’ in most situations. However, when a marginal optical module or cable is plugged into that port, the default tuning may not be able to bring up a link at all (a sketch of this fallback logic follows this list). No link, no network.
  • Certain combinations of whitebox hardware and software would not enable ports when direct attach copper cables (DAC cables) were used. The issue lay in what the NOS expected to read from the EEPROM of the DAC.
  • Certain combinations of whitebox hardware and software were observed to enable ports only if the expected brand of optical module was plugged in. This is the antithesis of disaggregated open networking, and reflects the ‘old way’ of operating: whitelisting so that only certain branded products work. To be fair, whitelisting will likely protect the user from running into these sorts of interoperability problems. However, branding a product as ‘open’ to a customer searching for the flexibility of a disaggregated network, then whitelisting important components, is misleading. It is much better to inform the customer of supported hardware via a public Hardware Compatibility List or Integrators List than to force them into a particular brand of module or cable in a bait and switch.
  • Some DAC vendors programmed the EEPROM of their 4-lane 100G DACs to identify as ‘QSFP+ or Later’, as advised by the SFF spec, in order to enable interoperability whether the DAC is used as four independent 25G links or as a single aggregated 100G link. However, some NOSes expected the EEPROM of these cables to identify as ‘QSFP28 or later’. Different companies read the spec differently, leading to an interoperability problem (a sketch of a more tolerant check follows this list). Reprogramming the EEPROM fixed the issue.
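
The tuning problem in the first bullet comes down to fallback behavior in the NOS. The sketch below illustrates that logic in Python; every board name, parameter name, and value is invented for illustration and does not come from any real NOS or ASIC SDK.

```python
# A minimal sketch of per-board SerDes tuning with a fallback default.
# All identifiers and values here are hypothetical.

# Recipes a NOS vendor has actually tuned and validated per board layout.
BOARD_TUNING = {
    "vendor_a_32x100g": {"tx_amplitude": 3, "tx_pre_emphasis": 7},
    "vendor_b_48x25g":  {"tx_amplitude": 2, "tx_pre_emphasis": 5},
}

# A 'good enough' default for boards the NOS vendor has never tuned.
DEFAULT_TUNING = {"tx_amplitude": 2, "tx_pre_emphasis": 4}

def tuning_for(board_id: str) -> dict:
    """Return the tuned recipe for a known board, else the generic default.

    The default often works with healthy optics, but a marginal module
    or an unusual trace length can push the signal out of margin: the
    port comes up tuned for the 'average' board and never links.
    """
    return BOARD_TUNING.get(board_id, DEFAULT_TUNING)

print(tuning_for("unknown_whitebox"))  # falls back to the untested default
```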

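The EEPROM mismatch in the last bullet is just as simple to express. Below is a sketch of a tolerant identifier check, assuming the NOS can already read the module’s identifier byte; the two constant values are the ‘QSFP+ or later’ and ‘QSFP28 or later’ codes defined in SFF-8024, while the function name is ours.

```python
# Identifier codes from the SFF-8024 transceiver management tables.
ID_QSFP_PLUS = 0x0D  # 'QSFP+ or later'
ID_QSFP28 = 0x11     # 'QSFP28 or later'

def accepts_100g_dac(identifier_byte: int) -> bool:
    """Accept a 100G DAC whether it identifies as QSFP+ or QSFP28.

    At the Plugfest, some NOSes required 0x11 and rejected cables
    programmed as 0x0D, even though both readings of the spec are
    defensible. Accepting either value sidesteps the mismatch without
    reprogramming the cable's EEPROM.
    """
    return identifier_byte in (ID_QSFP_PLUS, ID_QSFP28)

assert accepts_100g_dac(0x0D)  # DAC programmed as 'QSFP+ or Later'
assert accepts_100g_dac(0x11)  # DAC programmed as 'QSFP28 or later'
```
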
Most of these problems are not inherently complex. In fact, they are the types of interop problems that system integrators have been handling for years. In an open ecosystem, however, these problems can sneak into the field simply because no one has looked for them. When users find these problems in their deployments, it undermines the idea of open networking.

Combating this requires understanding what makes universal connectivity and interoperability so hard to guarantee in an open ecosystem. There are four primary issues, and understanding them will help us identify a solution:

  1. Access to equipment: Sampling open network hardware is notoriously difficult unless you want to buy thousands of units. Getting hardware samples to all NOS and module vendors so that interoperability can be verified before a product ships is a genuine challenge.
  2. Intra-box interop issues: Interop problems have traditionally arisen between Hardware Vendor A and Hardware Vendor B. Vendors A and B each provided a monolithic solution with their own software for configuring and managing their hardware, eliminating interop problems at the hardware/software interface. Further, they offered their own limited selection of connectivity options via optical modules and cables that they branded, so there were no problems inside the box, only between boxes from different vendors. Today it is much more complicated. Users buy cables and optics directly from the manufacturer. They buy third-party NOS software to run their datacenter and install it using the Open Network Install Environment (ONIE). This introduces interoperability problems inside the box.
  3. Exponentially large interop matrix: In an open solution, there are many variables involved in connecting a single server to a switch: server board, server BIOS, server operating system, NIC, NIC firmware, NIC driver, cable or optical module, switch, and network operating system. Traditionally, a system integrator would limit these variables to tested and trusted components. In an open ecosystem that is not possible: the user now plays the system integrator’s role, drawn by the flexibility on offer. That flexibility has its traps, though. With the nine variables just listed, even if each has only two possibilities, there are 2^9 = 512 combinations to test (see the enumeration sketch after this list). The interop matrix is too big.
  4. Integration resources gap: System integrators have spent many years building their names, reputations, resources and expertise. They have built international support organizations, solved key logistics problems around delivering replacement parts and components, and provide technical support for their entire product line. Many customers still gladly pay for these services, simply for the assurance of a well-regarded support organization. Customers who choose to roll their own solution by buying directly from manufacturers often have difficulty identifying and debugging interop problems, and then need to work with multiple suppliers to get updates and fixes. These users may not have the expertise, equipment, or time in-house to do this legwork.

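To make the arithmetic in point 3 concrete, the sketch below enumerates a hypothetical matrix in which each of the nine variables has exactly two options. All component names are placeholders.

```python
from itertools import product

# Nine interop variables from point 3, each with two hypothetical options.
matrix = {
    "server_board":   ["board_a", "board_b"],
    "server_bios":    ["bios_1", "bios_2"],
    "server_os":      ["os_x", "os_y"],
    "nic":            ["nic_a", "nic_b"],
    "nic_firmware":   ["fw_1", "fw_2"],
    "nic_driver":     ["drv_1", "drv_2"],
    "cable_or_optic": ["dac", "optic"],
    "switch":         ["switch_a", "switch_b"],
    "nos":            ["nos_1", "nos_2"],
}

combinations = list(product(*matrix.values()))
print(len(combinations))  # 2**9 = 512 configurations to test
```

Doubling the options for even three of these variables pushes the matrix past 4,000 combinations, which is why no single vendor can test it exhaustively.
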
In view of these complexities, it’s easy to see why interoperability and universal connectivity in the new open ecosystem are not a foregone conclusion. System integrators used to shield users from interop problems; when users buy directly from manufacturers, they must deal with those problems themselves. With a growing variety of software and connectivity options, the interop matrix has expanded, and users are discovering new types of interop problems. What can be done to ensure interoperability in the open ecosystem?