Just before Tech Field Day 23, Micron announced it would be discontinuing production of 3D XPoint persistent memory (Intel’s Optane brand), and a bunch of people freaked out, treating it as somehow the end of Optane. That’s just silly, and if you want to understand why, go read this piece by Jim Handy.
The reality of the situation is much more boring: Micron sells a bunch of other things and wasn’t likely to get to the volumes it needed to justify the continued investment. Intel will likely take over production and move to its next-generation 3D XPoint fab, which was going to happen anyway.
We quickly dealt with this question during the event but we concentrated on far more interesting things like Compute Express Link (CXL), which for me is the most interesting thing to happen in datacentre infrastructure in a while.
What is Compute Express Link?
CXL is a standard for linking memory bus devices together: CPUs, GPUs, and memory (and a few other more exotic things like TPUs and DPUs). Think of it as I/O for bytes not blocks.
Right now, the memory bus connects things that live inside a server. There are some technologies like Remote Direct Memory Access (RDMA) that add a kind of shim layer to connect memory devices over a network, but they involve translating between memory methods and I/O methods like TCP/IP. For example, fellow TFD23 presenter Vcinity uses RDMA over IP to make its rather nifty tech work, but more on that another time. CXL helps to provide high-bandwidth, low-latency connectivity between devices on the memory bus outside of the physical server.
Now, there are still some physical limitations, like the speed of light, but skipping the shim/translation steps removes latency, as does a more direct physical connection between the memory buses of two servers. A technique like CXL is handy for creating low-latency, high-bandwidth connections between lots of compute and data so you can, for example, do machine learning and AI.
Indeed that’s what RDMA was created for: massively parallel compute clusters. There are other competing standards as well, such as Gen-Z and NVLink, though they’re not precisely direct competitors, more alternate ways of doing similar things.
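To put the speed-of-light limit mentioned above into rough numbers, here is a minimal sketch of one-way signal propagation delay over a cable. The velocity factor of 0.7 and the three distances are illustrative assumptions, not measurements of any CXL hardware, and the function name is made up for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def one_way_delay_ns(distance_m: float, velocity_factor: float = 0.7) -> float:
    """Return one-way propagation delay in nanoseconds.

    velocity_factor is the fraction of c at which the signal travels in
    the cable. The default of 0.7 is an assumed, illustrative value.
    """
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1e9


# Hypothetical cable runs, chosen only to show the scale of the effect.
for label, metres in [("within a rack", 2), ("across a row", 30), ("across a hall", 100)]:
    print(f"{label:>14}: {one_way_delay_ns(metres):7.1f} ns one way")
```

Local DRAM access is very roughly on the order of 100 ns, so distance alone starts to matter once cable runs reach tens of metres. That is before counting any switch or protocol overhead, which this sketch ignores entirely.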
Why is CXL Cool?
I find two things very interesting about CXL.
Firstly, CXL 2.0 added support for switching fabrics. This means end-resources don’t need to connect directly to each other but can have connectivity mediated through a networking layer. That means shared resources can be dynamically re-allocated using software, which increases efficiency but also makes the whole thing much more flexible.
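To make "dynamically re-allocated using software" a little more concrete, here is a toy model of a shared memory pool sitting behind a switch. It is a sketch of the bookkeeping idea only: the MemoryPool class, its methods, and the host names are all hypothetical, and none of this is a real CXL API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MemoryPool:
    """Toy model of pooled memory shared by several hosts (not a CXL API)."""

    capacity_gib: int
    allocations: Dict[str, int] = field(default_factory=dict)

    @property
    def free_gib(self) -> int:
        # Whatever is not currently assigned to a host is available to any host.
        return self.capacity_gib - sum(self.allocations.values())

    def allocate(self, host: str, size_gib: int) -> None:
        if size_gib <= 0:
            raise ValueError("size_gib must be positive")
        if size_gib > self.free_gib:
            raise MemoryError(f"only {self.free_gib} GiB free in the pool")
        self.allocations[host] = self.allocations.get(host, 0) + size_gib

    def release(self, host: str, size_gib: int) -> None:
        held = self.allocations.get(host, 0)
        if size_gib <= 0 or size_gib > held:
            raise ValueError(f"{host} holds {held} GiB")
        if size_gib == held:
            del self.allocations[host]
        else:
            self.allocations[host] = held - size_gib


pool = MemoryPool(capacity_gib=1024)
pool.allocate("host-a", 768)   # host-a runs a large job
pool.allocate("host-b", 128)

# host-a's job shrinks, so its memory goes back to the pool in software
# and host-b picks it up, with nobody opening a chassis to move DIMMs.
pool.release("host-a", 512)
pool.allocate("host-b", 512)

print(pool.allocations, "free:", pool.free_gib)
```

The efficiency gain comes from the pool being sized for the aggregate peak rather than every server being sized for its own worst case.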
Secondly, a broad cross-section of industry players supports CXL, including Micron, Intel, AMD, Arm Holdings, Nvidia, Cisco, Dell EMC, HPE, Huawei, Lenovo, Seagate, Western Digital, Broadcom, Mellanox, SK Hynix, and Xilinx. In fact, the only big name I can see that’s missing from the list of members is Samsung. Pretty much everyone else who makes CPUs, GPUs, memory, storage, and networking gear is on this list.
This tells me that CXL is likely to succeed as the de facto standard for memory bus interconnect between servers. Facebook, Google, and Microsoft are all part of the Consortium as well, so there’s a healthy set of customers who need lots of what CXL-compatible vendors will be selling. High-end enterprises tend to copy what the hyperscalers do, so that’ll extend the market nicely.
This gets us to a lovely ecosystem full of vendors and customers all using a single, common standard. By supporting a common standard, they can all concentrate on playing to their strengths, not trying to guess if Banyan VINES or Token Ring (or VHS or Betamax, or Blu-ray or HD DVD) will succeed as the dominant standard.
And that frees customers from having to guess what will win, and from delaying investment as a result. Who wants to drop $30 million on something that’ll be end-of-life before you finish turning it on? We can skip the painful shakeout phase where everyone tries to monopolise the brand of electricity available and go straight to the explosion in electrical appliances that makes our lives easier.
Back to Micron
It makes no sense for Micron to limit its ambitions to just Optane devices, and most of what Micron sells isn’t PMEM and never would be. Micron is much better off supporting a standard like CXL, which means it can sell its stuff to as many people as possible.
I wouldn’t be at all surprised to see Intel adding in support for CXL to PMEM at some point once the market gets large enough and they figure out there’s more money to be made selling lots of stuff at slightly lower margins than owning 100% of a small market.
Personally I’m looking forward to composable infrastructure becoming a reality. The server is an arbitrary container and the network and storage already exist outside of its confines, so why not move memory out as well?
The rise of Network Attached Memory is going to be really fun to watch.