Xilinx And The Adaptive, Composable Datacentre

Xilinx is one of those companies that builds things that are all over the place but you rarely hear the name unless you’re involved in their specific niche. Xilinx makes programmable hardware, most famously Field Programmable Gate Arrays (FPGAs), that combine the speed of hardware with the flexibility of software.

The Xilinx model for composable, scale-out data movement in the datacentre. (Source: Xilinx)

Baking a computer algorithm into hardware (such as specific encryption algorithms, but also things like TCP/IP network processing) as an application-specific integrated circuit (ASIC) makes it run faster than the same algorithm running as software on a general-purpose CPU like an Intel x86 or Arm chip. But an ASIC can’t be changed once you’ve manufactured it; it will do that one job forever, so you lose the flexibility of software that can change, or a general-purpose CPU that can run essentially any program you like. It’s a tradeoff.

FPGAs (and similar programmable hardware) are a middle ground: you can reprogram the hardware to do a different job. They’re not as fast as an ASIC, and slower to reprogram than a CPU, but they are very useful in a range of applications.
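To make "reprogram the hardware" a bit more concrete: FPGA logic is largely built from small lookup tables (LUTs), and configuring the device amounts to loading new truth tables into them (plus the wiring between them, which this ignores). Here is a toy sketch in Python, not anything Xilinx ships. The `LookupTable` class and its method names are made up for illustration, and a real LUT usually has more than two inputs.

```python
class LookupTable:
    """Toy 2-input lookup table (LUT); hypothetical, for illustration only."""

    def __init__(self, truth_table):
        self.configure(truth_table)

    def configure(self, truth_table):
        # "Reprogramming": load a new 4-entry truth table, indexed by (a, b).
        if len(truth_table) != 4:
            raise ValueError("a 2-input LUT needs exactly 4 entries")
        self.truth_table = list(truth_table)

    def evaluate(self, a, b):
        # The two input bits form the index into the table.
        return self.truth_table[(a << 1) | b]


AND_TABLE = [0, 0, 0, 1]  # outputs for inputs 00, 01, 10, 11
XOR_TABLE = [0, 1, 1, 0]

inputs = [(a, b) for a in (0, 1) for b in (0, 1)]

lut = LookupTable(AND_TABLE)
and_outputs = [lut.evaluate(a, b) for a, b in inputs]

lut.configure(XOR_TABLE)  # same "hardware", different job
xor_outputs = [lut.evaluate(a, b) for a, b in inputs]
```

The same object behaves as an AND gate and then as an XOR gate, with nothing changed except the table contents. An ASIC is the version where the table is fixed at the factory.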

Of course, these days there are all kinds of specialised pieces of hardware. GPUs grew out of maths co-processors that could do things like floating point mathematics faster than a general-purpose CPU because there’s lots of maths involved in figuring out how imaginary armour on imaginary orc hordes looks when you’re pretending to ride into an imaginary battle.

There’s also a lot of fancy maths of a similar kind involved in machine learning, so hardware developed for computer games got used a lot for that (as well as for cryptocurrency mining, but the less said about Numberwang the better). The market then got big enough that people started building specialised Tensor Processing Units (TPUs) for even more niche maths.

And now we have all kinds of accelerator cards joining GPUs and TPUs: Data Processing Units (DPUs), programmable System-on-Chip (SoC) devices, and a new one from Xilinx called an Adaptive Compute Acceleration Platform (ACAP). We’ve come a long way from the simple CPU-RAM-IO model of the computers of my youth.

Adaptive, Composable Computing

Xilinx reckons we’re going to build datacentres out of composable mixtures of all of these technologies, dynamically reconfigurable in the same way that we spin up and tear down software and infrastructure in cloud-style systems. People have gotten comfortable with changing their mind a lot and are less happy buying computer hardware that does one thing forever. We crave variety.

But programming PLDs (programmable logic devices) isn’t for the faint of heart. Back when I was learning how to do it (when dinosaurs roamed the Earth), you had to use specialised languages like ABEL or Verilog and have a good understanding of computer hardware (uphill, both ways, in the snow). Happily, things have moved on since then, and people started to use “higher level languages” like C, but you still needed to have a reasonable understanding of hardware to write functioning software.

Developers apparently don’t want to know anything about the icky physical world and would prefer to retreat into an ivory tower made of pure thought, so now you can use even higher level languages like P4.

Xilinx thinks of FPGAs as now being at a similar level to the other kinds of compute, each best for particular kinds of calculations: CPUs for scalars, GPUs for vectors, TPUs for matrices, and FPGAs for dataflow between the others. All the other devices need huge amounts of data moved to and fro very quickly, and the current method (mostly) requires the CPU to get involved with that process.
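If the scalar/vector/matrix/dataflow split feels abstract, here is a rough sketch of the four shapes of work in plain Python. It only shows what each kind of computation looks like, not how any real CPU, GPU, TPU, or FPGA executes it, and all of the function names are mine rather than Xilinx’s.

```python
def scalar_op(x, y):
    # Scalar work: one value at a time, the CPU's home turf.
    return x * y + 1


def vector_op(xs, ys):
    # Vector work: the same operation applied across many elements.
    return [x + y for x, y in zip(xs, ys)]


def matrix_op(a, b):
    # Matrix work: a naive matrix multiply (rows of a times columns of b).
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dataflow(source, *stages):
    # Dataflow work: items stream through a chain of stages one by one,
    # without the whole data set being collected in between.
    stream = iter(source)
    for stage in stages:
        stream = map(stage, stream)
    return stream


scalar_result = scalar_op(2, 3)
vector_result = vector_op([1, 2], [3, 4])
matrix_result = matrix_op([[1, 2], [3, 4]], [[5, 6], [7, 8]])
stream_result = list(dataflow(range(4), lambda x: x + 1, lambda x: x * 2))
```

The `dataflow` function is the odd one out: it doesn’t do much maths itself, it just keeps data moving from one stage to the next, which is the role Xilinx is pitching for FPGAs.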

Using CXL to remove the CPU bottleneck for data transfer. (Source: Xilinx)

Xilinx thinks datacentre computing will move to a more peer-to-peer topology with data flowing directly from memory or storage to the kind of processor that needs it, or out of the node via a SmartNIC off to some other node. The CPU will no longer need to get involved, removing it as a bottleneck. Xilinx also mentioned Compute Express Link (CXL), which is coming up a lot in my research into this area.
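As a back-of-the-envelope illustration of the topology change, the sketch below models a node as a handful of named devices joined by links and finds the shortest route for data from storage to a GPU. The device names and link lists are invented for the example. It says nothing about how CXL works or about real bandwidth, only about who sits in the path.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over an undirected list of (a, b) links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route between src and dst


# Hypothetical node layouts: every device hangs off the CPU...
CPU_CENTRIC = [("storage", "cpu"), ("cpu", "gpu"), ("cpu", "smartnic")]
# ...versus the same node with direct device-to-device links added.
PEER_TO_PEER = CPU_CENTRIC + [
    ("storage", "gpu"),
    ("storage", "smartnic"),
    ("gpu", "smartnic"),
]

via_cpu = shortest_path(CPU_CENTRIC, "storage", "gpu")
direct = shortest_path(PEER_TO_PEER, "storage", "gpu")
```

In the first layout the only route is through `"cpu"`; in the second the data goes straight from `"storage"` to `"gpu"` and the CPU never appears in the path.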

It’s very early days for this kind of architecture in general enterprise datacentres, but it would make a huge amount of sense for public cloud providers. They have the need, the expertise, and the deep pockets to build this kind of architecture, so I expect Xilinx to sell into the global hyperscaler market first and then gradually expand out into other enterprises once the staff churn out of the cloud providers into the rest of the industry.

Unlike previous attempts at composable infrastructure, it feels like a lot of the required building blocks are falling into place and it might actually be real this time. There are still plenty of hurdles, and I don’t see this making it into mass adoption for a long time, if at all. But you never know. Maybe a TPU will become the new GPU for people who want really lifelike AI orcs to play against.

Stranger things have happened.
