The Vcinity presentation at Tech Field Day 23 was the most technically intriguing of them all. It also puzzled me, and raised a lot of questions that I don’t really feel I have good answers for yet. This is a good thing, I think, because it means there’s a lot to learn here and it’ll be fun to work through.
Data Has Mass
The core challenge Vcinity focuses on is the need to get data to the compute. There are a bunch of ways to do this, but the laws of physics create certain limitations. CPUs (and GPUs) can’t operate on data they don’t have, and data sitting on a drive or tape or piece of paper is inert and a bit useless unless you can compute with it.
There are only two ways to do this: move the data to the compute, or move the compute to the data.
Moving compute to the data requires moving physical hardware. Anyone who has tried to buy a graphics card in the past year understands how that can be challenging. Moving atoms is much slower than moving bits.
But moving bits still takes time. Data transfer rates are capped, at best, by the speed of light, and then further eroded by all the latency and bandwidth costs of data encoding, decoding, encryption, and the weird and wonderful technologies that make up wide area networks. Using a network to move data can be slower than moving atoms, hence the AWS Snow family.
Never underestimate the bandwidth of a station wagon full of mag tape.
And, let’s be clear, you’re not actually moving the data so much as creating copies of it. A move is a copy followed by a delete, unless you enjoy data loss. And copying data creates security and privacy risks.
Vcinity tries to solve the problem by saying “What if you could leave the data where it is, and give compute remote access to it without making lots of copies?”
How Vcinity Works
Stephen Wallo, Vcinity CTO, explained how Vcinity performs its magic in his presentation, so go watch the whole thing if you want to really get into the nitty-gritty.
Here’s my summary.
Vcinity provides a per-site endpoint (which can just be a virtual machine) that looks like a NAS to clients. Clients access files and folders on the endpoint as if it was a local file share sitting on the LAN, using their regular LAN protocols like SMB or NFS to do so. But behind the scenes, Vcinity is pulling the data from where it actually lives, somewhere on the other side of the WAN. Okay, cool, but how is this different from a caching appliance?
The Vcinity endpoint understands how filesystem data access works. When you’re working on a movie in Adobe Premiere (or manipulating a CAD file, or just copying data) the filesystem doesn’t load the entire file into memory. It reads in the chunks it needs. Modern operating systems have an entire virtual memory system and an I/O layer to load chunks of data in and out of memory.
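You can see the same chunked access pattern at the application level with a plain seek-and-read. This is just a toy illustration of partial file access (the file and chunk size are my own invention, nothing to do with Vcinity’s internals):

```python
import os
import tempfile

CHUNK = 64 * 1024  # read in 64 KiB chunks, like an I/O layer paging data in

# A 1 MiB scratch file standing in for a large remote asset.
path = os.path.join(tempfile.mkdtemp(), "asset.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1024 * 1024))

# An application rarely needs the whole file: here we read a single
# chunk from the middle, the way an editor scrubs into a video file.
with open(path, "rb") as f:
    f.seek(512 * 1024)      # jump straight to the region we need
    chunk = f.read(CHUNK)   # fetch only that chunk, not the full megabyte

print(len(chunk))  # 65536
```

An endpoint that understands this pattern only has to fetch the chunks the application actually touches, which is a much smaller job than shipping the whole file across the WAN.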
By understanding how this chunking mechanism works for different file access protocols, the Vcinity endpoint can request just the data it needs, and if it can pull the data back fast enough in a steady stream, you don’t really notice the initial time-to-first-byte delay of WAN compared to LAN. When you’re talking about megabytes of data, waiting 70ms for the first 9000 byte jumbo frame doesn’t matter so much if you can shrink the wait for the megabytes (or gigabytes) of data the application needs to do its thing.
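Some back-of-the-envelope arithmetic (my numbers, not Vcinity’s) shows why a one-off first-byte delay stops mattering once the stream itself is fast:

```python
def transfer_time(size_bytes, ttfb_s, throughput_bps):
    """Total time = one-off time-to-first-byte + streaming time."""
    return ttfb_s + (size_bytes * 8) / throughput_bps

GB = 1024 ** 3

# 1 GiB over a 10 Gb/s pipe with a 70 ms WAN first-byte delay:
wan = transfer_time(GB, 0.070, 10e9)
# Same file on the LAN with an (optimistic) 0.5 ms first byte:
lan = transfer_time(GB, 0.0005, 10e9)

print(f"WAN: {wan:.3f} s, LAN: {lan:.3f} s")
```

Both come out just under a second; the 70 ms head start the LAN gets is lost in the noise, provided the WAN stream can actually sustain that throughput, which is the hard part Vcinity is claiming to solve.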
By analogy, Vcinity takes the very turbulent data flow of your regular WAN and makes the data flow laminar and smooth. This gets you much greater throughput over the same pipes, just like smooth water flow is more efficient than turbulent flow. Regular WAN using TCP/IP is such a churning maelstrom of turbulent data that it’s not very hard to make it quite a lot better.
Okay, so how does Vcinity pull back data with enough bandwidth to overcome the perceived latency problem? Vcinity uses Remote Direct Memory Access (RDMA) (used a lot in high performance computing, supercomputer type things) as its core data fetch mechanism, and encapsulates the more efficient RDMA process inside IP packets so regular WAN devices can process them.
I’m not entirely clear on how encapsulating RDMA inside IP doesn’t add the problems of WAN latency back in again, but I assume it’s something to do with the RDMA method being a lot less chatty than regular TCP/IP, so you can do more efficient IP movement optimised for this kind of data transfer. A bit like how massively multiplayer games tend to use UDP and their own special data resilience techniques that are tuned for gaming. TCP is great, but it’s a general purpose thing that works well enough under a broad range of circumstances. It won’t be as good as something highly optimised for a specific use case, in the same way that a screwdriver makes a lousy paintbrush.
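I can at least sanity-check the “less chatty” intuition with arithmetic: a protocol that waits a full round trip per chunk pays the WAN latency thousands of times over, while a pipelined one pays it roughly once. These figures are illustrative, not measurements of RDMA, TCP tuning, or Vcinity:

```python
RTT = 0.070        # 70 ms WAN round trip
CHUNK = 64 * 1024  # 64 KiB per request
SIZE = 1024 ** 3   # 1 GiB to move
LINK = 10e9        # 10 Gb/s link

chunks = SIZE // CHUNK
wire_time = SIZE * 8 / LINK  # time the bits actually spend on the wire

# Chatty: request a chunk, wait a full round trip, request the next...
chatty = chunks * RTT + wire_time

# Pipelined: one round trip to get going, then keep the pipe full.
pipelined = RTT + wire_time

print(f"chatty: {chatty:.0f} s, pipelined: {pipelined:.2f} s")
```

The chatty version takes nearly twenty minutes; the pipelined one finishes in under a second. Whatever the exact mechanism, eliminating per-chunk round trips is where the big win lives.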
Vcinity can also do something akin to spread-spectrum Wi-Fi over WAN links. If you have multiple WAN links between sites, Vcinity can split up the traffic, send it over all the links at once, and reassemble it at the other end, with a unique encryption key per link.
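At its core the multi-link idea is striping: chop the stream into numbered pieces, send each down a different link, and reorder by sequence number at the far end. A toy sketch of that mechanism (none of this reflects Vcinity’s actual wire format):

```python
def stripe(data: bytes, links: int, chunk: int = 4):
    """Split data round-robin into per-link queues, tagging each piece
    with a sequence number so the far end can reorder them."""
    queues = [[] for _ in range(links)]
    for seq, i in enumerate(range(0, len(data), chunk)):
        queues[seq % links].append((seq, data[i:i + chunk]))
    return queues

def reassemble(queues):
    """Merge the per-link queues back into the original byte stream."""
    pieces = sorted(p for q in queues for p in q)  # order by sequence number
    return b"".join(piece for _, piece in pieces)

msg = b"leave the data where it lives"
assert reassemble(stripe(msg, links=3)) == msg
```

In practice each per-link queue would also be encrypted under its own key before hitting the wire, so tapping any single link yields only a shredded, encrypted fraction of the stream.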
Vcinity seems very cool and worth looking at if you want to provide remote access to centrally managed data. I’d want to dig into the specifics of my use cases to ensure it works well enough for what I’m trying to do, because the techniques here rest on assumptions about how applications access data. But it’s a credible enough approach that it will likely work well for what a lot of people are trying to do in enterprise.
I look forward to hearing more from Vcinity.