
The telecommunications industry’s continuous drive for higher performance has spurred innovations in processor architectures. The general trend has been to go parallel: add more cores to a single processor device and divide tasks among them. This has created a more complex environment for software engineers to master. But does this mean that programming next-generation network processors (NPUs) has to be difficult? Not necessarily.
The processing demands on modern NPUs are very high. In an application designed for 100 Gbps processing, the NPU must be able to handle about 150 million packets per second, which is the worst-case rate for back-to-back minimum-size (64-byte) Ethernet frames. In such an application, thousands of packets are typically being processed concurrently by the device, a degree of parallelism that is extreme by the standards of the IT industry. Another unique attribute of NPUs is the demand for extremely high table-memory lookup rates.
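The figure of roughly 150 million packets per second can be reproduced with a few lines of arithmetic. The sketch below assumes standard Ethernet framing: each frame occupies its own length plus an 8-byte preamble/start-of-frame delimiter and a 12-byte minimum inter-frame gap on the wire. It also applies Little's law (packets in flight = arrival rate × time in the device) to show why thousands of packets are in progress at once; the 10 µs latency used there is a hypothetical value chosen for illustration, not a figure from this article.

```python
# Per-frame wire overhead for Ethernet (IEEE 802.3): 7-byte preamble plus
# 1-byte start-of-frame delimiter, and a 12-byte minimum inter-frame gap.
PREAMBLE_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12
MIN_FRAME_BYTES = 64  # minimum Ethernet frame, including the 4-byte FCS


def max_packet_rate(line_rate_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets per second when the link carries back-to-back frames of one size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return line_rate_bps / wire_bits


def packets_in_flight(packet_rate: float, latency_s: float) -> float:
    """Little's law: average number of packets resident in the device."""
    return packet_rate * latency_s


if __name__ == "__main__":
    rate = max_packet_rate(100e9)
    print(f"{rate / 1e6:.1f} Mpps")  # prints "148.8 Mpps"
    # Hypothetical average time a packet spends inside the NPU: 10 microseconds.
    print(f"{packets_in_flight(rate, 10e-6):.0f} packets in flight")  # prints "1488 packets in flight"
```

A 64-byte frame therefore costs 84 bytes (672 bits) of link time, and 100 Gbps / 672 bits ≈ 148.8 million frames per second. Larger frames lower the packet rate, which is why minimum-size traffic is the usual dimensioning case.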
In packet processing, every network service (user traffic as well as control/OAM traffic) requires its own set of per-packet operations: classification, filtering, counting, metering, policing/shaping and forwarding. A network service may require hundreds or even thousands of operations per packet before the packet is finally forwarded to an outgoing interface or to the control CPU.
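To make the idea of a per-service chain of operations concrete, here is a minimal Python sketch. It assumes that a service is simply an ordered list of stage functions, each of which may drop the packet. The `Packet` fields, the stage order, the simplified token-bucket policer and the fallback to a `"control-cpu"` egress are all hypothetical choices for illustration; they do not describe any particular NPU's instruction set or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Packet:
    src: str
    dst: str
    length: int                    # bytes
    service: Optional[str] = None  # set by classification
    egress: Optional[str] = None   # set by forwarding


class TokenBucket:
    """Simplified single-rate policer: a packet conforms if enough tokens remain."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_t = 0.0

    def conforms(self, size: int, now: float) -> bool:
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last_t) * self.rate)
        self.last_t = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# A stage inspects or updates a packet; returning False drops it.
Stage = Callable[[Packet, float], bool]


def make_user_service(blocked: Set[str], policer: TokenBucket,
                      routes: Dict[str, str], counters: Dict[str, int]) -> List[Stage]:
    """Hypothetical service definition: an ordered chain of per-packet operations."""

    def classify(p: Packet, now: float) -> bool:
        p.service = "user"
        return True

    def filter_src(p: Packet, now: float) -> bool:
        return p.src not in blocked

    def count(p: Packet, now: float) -> bool:
        counters[p.service] = counters.get(p.service, 0) + 1
        return True

    def police(p: Packet, now: float) -> bool:
        return policer.conforms(p.length, now)

    def forward(p: Packet, now: float) -> bool:
        # Destinations without a route are punted to the control CPU.
        p.egress = routes.get(p.dst, "control-cpu")
        return True

    return [classify, filter_src, count, police, forward]


def process(p: Packet, stages: List[Stage], now: float) -> bool:
    """Run the stages in order; all() stops at the first stage that drops the packet."""
    return all(stage(p, now) for stage in stages)


if __name__ == "__main__":
    counters: Dict[str, int] = {}
    stages = make_user_service({"10.0.0.66"}, TokenBucket(1000, 1500),
                               {"10.1.0.1": "port1"}, counters)
    pkt = Packet("10.0.0.1", "10.1.0.1", 1000)
    print(process(pkt, stages, now=0.0), pkt.egress)  # prints "True port1"
```

A real service would chain far more stages, and each stage would typically involve one or more table lookups, which is where the lookup-rate demands mentioned above come from.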
With the networking industry’s unique set of performance requirements, next-generation NPUs are designed to solve very specialized problems. They don’t compete with general-purpose CPUs; instead, they offer a programmable alternative to in-house-developed, fixed-function ASICs.
There is currently a shift toward merchant silicon in high-end networking, driven mainly by the ability to shorten time-to-market and to focus research and development (R&D) spending on differentiation through software rather than through riskier ASIC designs.
As ASIC designs are phased out in favor of NPUs, some R&D managers raise concerns about the complexity of these devices. Are they difficult to program? Can high performance be combined with an intuitive uni-processor programming model?
In 2004, Larry Huston of Intel (then a major player in the NPU space) ended a paper with a statement that carries as much meaning today as it did six years ago²:
“The ideal scenario would have a programmer write an application as a single piece of software and the tools would automatically partition and map the application to the set of parallel resources. This may be a difficult goal, but any steps in that direction will improve the life of a developer.”
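One common way to approximate this ideal, at least conceptually, is a run-to-completion model: the programmer writes a single sequential packet handler, and a dispatcher spreads packets over many identical workers. The Python sketch below is purely illustrative and is not taken from Huston's paper or from any vendor toolchain. It assumes that a flow is identified by its source/destination pair and pins each flow to one worker by hashing, so per-flow state stays private to a worker and packet order within a flow is preserved. Python threads stand in for NPU cores here; under CPython's global interpreter lock they do not actually execute the handler in parallel.

```python
import queue
import threading
from typing import Dict, List, Tuple
from zlib import crc32

Flow = Tuple[str, str]


def handle(packet: Dict[str, str], flow_state: Dict[Flow, int]) -> Tuple[Flow, int]:
    """Written once, as if for a single processor: count packets per flow."""
    flow = (packet["src"], packet["dst"])
    flow_state[flow] = flow_state.get(flow, 0) + 1
    return flow, flow_state[flow]


def run_parallel(packets: List[Dict[str, str]],
                 n_workers: int = 4) -> List[List[Tuple[Flow, int]]]:
    """Dispatch packets to identical workers, pinning each flow to one worker."""
    inboxes: List[queue.Queue] = [queue.Queue() for _ in range(n_workers)]
    results: List[List[Tuple[Flow, int]]] = [[] for _ in range(n_workers)]

    def worker(i: int) -> None:
        state: Dict[Flow, int] = {}  # private to this worker, so no locks needed
        while True:
            pkt = inboxes[i].get()
            if pkt is None:  # sentinel: no more packets
                break
            results[i].append(handle(pkt, state))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for pkt in packets:
        # A deterministic hash of the flow key selects the worker.
        key = f"{pkt['src']}>{pkt['dst']}".encode()
        inboxes[crc32(key) % n_workers].put(pkt)
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    pkts = [{"src": f"10.0.0.{i % 3}", "dst": "10.1.0.1"} for i in range(9)]
    for i, out in enumerate(run_parallel(pkts)):
        print(f"worker {i}: {out}")
```

Pinning flows to workers trades load balance for lock-free state: a single very heavy flow can saturate one worker while others sit idle, which is the kind of partitioning decision the quote hopes tools could eventually make automatically.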