I had a lot of trouble figuring out what SolidFire’s architecture looks like conceptually. I had to read a bunch of different whitepapers and ‘reference architecture’ documents on their website that were pretty light on conceptual detail, but had plenty of configuration file examples and other gritty detail that I don’t care about at this stage. The rest was super-high level and didn’t give me what I was after: a nice, two-page description with a diagram of how to plug one of these things into a storage network.
What I’ve been able to work out is this:
SolidFire is a shared-nothing distributed storage cluster, with a minimum of 5 nodes and up to 100 nodes in the cluster. Each node is 1RU high and has 10 SSDs in it, from 300GB to 960GB depending on the model. The nodes replicate blocks for data protection, with 1 or more copies distributed around the cluster using SolidFire’s proprietary Helix data protection, so it seems a lot like GFS, Lustre or HDFS.
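To put those numbers in perspective, here is a rough sketch of the raw flash in the smallest and largest clusters. It assumes the 300GB and 960GB figures are per SSD, and that "1 copy" means one extra replica of every block; both are my reading of the material, not something SolidFire has confirmed. It ignores dedupe, compression and any metadata overhead, and the function names are made up for illustration.

```python
# Figures quoted above. Reading the sizes as per-SSD is an assumption.
SSDS_PER_NODE = 10


def raw_capacity_tb(nodes: int, ssd_gb: int) -> float:
    """Raw flash in the cluster, in decimal TB."""
    return nodes * SSDS_PER_NODE * ssd_gb / 1000


def capacity_after_replication_tb(nodes: int, ssd_gb: int, extra_copies: int = 1) -> float:
    """Capacity left once every block is stored (1 + extra_copies) times.

    Assumption: replication is the only overhead being counted here.
    """
    return raw_capacity_tb(nodes, ssd_gb) / (1 + extra_copies)


if __name__ == "__main__":
    for nodes, ssd_gb in [(5, 300), (100, 960)]:
        raw = raw_capacity_tb(nodes, ssd_gb)
        protected = capacity_after_replication_tb(nodes, ssd_gb)
        print(f"{nodes} nodes x {ssd_gb}GB SSDs: {raw} TB raw, {protected} TB with one extra copy")
```

So the range runs from about 15TB raw in a minimum cluster of the smallest nodes to about 960TB raw in a full cluster of the largest, halved if every block is stored twice.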
Each node has 2 x 10Gb SFP+ iSCSI ports. There also appears to be a Fibre Channel access node that doesn’t have SSDs in it (it has 4 x 10Gb iSCSI ports and 4 x 16Gb FC ports instead), and I assume it participates in the cluster to provide network access to the storage pool. I guess the extra iSCSI ports are there to avoid saturating the Ethernet side when serving 16Gb FC traffic, though the Ethernet is still 1.6x oversubscribed (64Gb of FC against 40Gb of Ethernet) if you run all the FC ports at line rate.
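The 1.6x figure is just the ratio of front-end FC bandwidth to back-end Ethernet bandwidth on the access node. A minimal sketch of the arithmetic, using the nominal port speeds (real usable throughput on both sides will be lower once encoding and protocol overhead are counted); the function name is mine:

```python
def oversubscription(front_ports: int, front_gbps: float,
                     back_ports: int, back_gbps: float) -> float:
    """Ratio of total front-end bandwidth to total back-end bandwidth."""
    return (front_ports * front_gbps) / (back_ports * back_gbps)


# FC access node as described above: 4 x 16Gb FC in front, 4 x 10Gb Ethernet behind.
ratio = oversubscription(front_ports=4, front_gbps=16, back_ports=4, back_gbps=10)
print(f"{ratio:.1f}x oversubscribed")

# With only 2 x 10Gb ports, like a storage node, the same FC load would be worse.
ratio_two_ports = oversubscription(front_ports=4, front_gbps=16, back_ports=2, back_gbps=10)
print(f"{ratio_two_ports:.1f}x with only two Ethernet ports")
```

With only two Ethernet ports the ratio would be 3.2x, which is presumably why the access node gets four.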
I’m not sure how this storage pool is carved up into LUNs that get presented to hosts. Is it just one dirty big pool of storage? Are there logical groupings of LUNs into volumes (for replication to remote clusters, or snapshots, for example) or is everything done at a LUN level? I would hope there are groupings, because databases hate it when their snapshots aren’t taken across a consistency group. SolidFire mention grouping of something for storage QoS, as well as capacity, but I’m not sure if it’s something more than just LUNs. SolidFire do integrate with VMware vVols, which is nice, but that doesn’t explain how the multi-tenancy works inside the SolidFire cluster itself.
I get the failover ability of a shared-nothing cluster from a data storage perspective (you’d need 2 simultaneous failures of nodes or disks holding the same data block replicas to lose data, and more if you have 2 or more data copies) but I’m not clear on how network port failover happens. How do the IPs for the iSCSI targets fail over between nodes? What about FC port WWNs? Do you have to configure multi-pathing for it to work? Do you design where the targets go, or does the system automatically figure it out for you?
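The data-loss reasoning above can be sketched like this. It assumes replicas of a block always live on distinct nodes, and that a block is only lost when every node holding a replica is down at the same time. The placement map and node names are invented for illustration, and this says nothing about how Helix actually places data.

```python
from __future__ import annotations


def min_failures_to_lose_data(extra_copies: int) -> int:
    """Fewest simultaneous failures that could lose a block.

    A block stored (1 + extra_copies) times survives until all of
    those replicas are gone.
    """
    return extra_copies + 1


def lost_blocks(placement: dict[str, set[str]], failed: set[str]) -> list[str]:
    """Blocks whose replica nodes are all in the failed set."""
    return sorted(block for block, nodes in placement.items() if nodes <= failed)


# Hypothetical placement: each block has one extra copy on a second node.
placement = {
    "block-a": {"node1", "node2"},
    "block-b": {"node2", "node3"},
    "block-c": {"node1", "node4"},
}

print(lost_blocks(placement, {"node2"}))           # single failure
print(lost_blocks(placement, {"node1", "node3"}))  # two failures, no shared block
print(lost_blocks(placement, {"node1", "node2"}))  # two failures holding the same block
```

Only the last case loses anything, because node1 and node2 between them hold every replica of block-a. Two failures are necessary but not sufficient: they have to land on nodes that share a block.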
Overall it seems like SolidFire may well have quite a nice offering here. Unlike Pure Storage, there’s no active/active HA pair of controllers mediating access to the whole cluster. If you go iSCSI, it looks like you could spread the LUN targets around a bunch of nodes and smooth out your I/O quite nicely. Similar to Pure and others, SolidFire has inline dedupe and compression. Pure boast $5-10/GB usable in their marketing materials, while SolidFire reckon they can come in at about $3/GB.
On the downside, shared-nothing scale-out systems can have issues when you have lots of nodes because of the amount of inter-node communication required to keep state synchronised across the cluster. It’ll be interesting to hear how SolidFire have addressed this issue. Using the same 2 x 10GbE ports for both inter-node comms and serving data is something that draws my attention as a possible bottleneck at scale, so it’d be interesting to see what data SolidFire have about that.
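To show why node count matters, here is a worst-case sketch that assumes every node has to exchange state with every other node (a full mesh). I don’t know that SolidFire works this way; many clustered systems avoid exactly this pattern, so treat it as an upper bound for illustration only.

```python
def node_pairs(nodes: int) -> int:
    """Number of distinct node-to-node relationships in a full mesh."""
    return nodes * (nodes - 1) // 2


for n in (5, 20, 50, 100):
    print(f"{n:>3} nodes: {node_pairs(n):>4} node pairs")
```

Going from 5 nodes to 100 is a 20x increase in nodes but takes the pair count from 10 to 4950, which is the kind of growth that makes the shared 2 x 10GbE ports worth asking about.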
Really, learning about SolidFire so far has raised more questions than it’s answered, so I look forward to hearing more of the details from their team. Hopefully I’ll be able to get them to whiteboard the architecture of a SolidFire cluster and how the LUNs and multi-tenancy work, or figure it out well enough that I can draw you a picture myself.