NVMe is here! A view from the “engine room”
It has to be big. And fast, real fast. But reliable. And expandable. Future-proofed. But cost-effective, mind. I can choose any six of those.
Do you have a boss like that? Well, if you’re an SSD Nodes customer, perhaps you do—and that might be why you chose us as your VPS provider. My boss is a really nice guy, and he’s learned a lot through bootstrapping SSD Nodes, but he can be a bit demanding at times. I’m talking about Matt, the founder of SSD Nodes.
My name is Daniel, and I’m the head of engineering at SSD Nodes. My job is to build, run, fix, and tend to the needs of the computer systems that serve our customers. I run a fleet of large, modern enterprise-grade servers. Our previous generation of host nodes is powered by Intel E5/Gold CPUs, and each node generally has 1TB of RAM and 8-16TB of RAID10 redundant SSD storage. All have at least one 10Gbit network connection. We have deployed only modern Intel Skylake CPUs for ages, unlike a competitor who began to offer them only a few months ago. And as you might guess from the company name, SSD Nodes has used only fast SSD storage since Matt founded the company in 2011.
That’s pretty good hardware, better than most in our industry. But not good enough for Matt. Not any more. He wants bigger, better, faster. (To be fair, he is only this demanding on behalf of his customers. You should see his MacBook Pro: let’s just say that I could virtualize it on our platform for $7.99/month and have room to spare.)
Earlier this year I was given the task of selecting hardware for our next-generation hosting platform. I was assisted by Ranvir, one of our engineers (you might have seen some of his posts on our blog). The new platform had certain additional requirements, many of which I can’t talk about yet (announcements expected later in 2018 or early 2019). But one new requirement that I can discuss is the one in the title: NVMe disk storage.
A good SSD drive might be able to read 550 megabytes of data per second (sequentially) and write 520 megabytes per second, assuming it is connected to a fast SAS or SATA disk interface. (We connect all our drives via at least a 6Gbps interface.) We install multiple SSD drives in a RAID10 configuration, “striping” data across them to at least double performance and to ensure our systems mirror your data. The SSD drive may be capable of around 10,000 input/output operations per second (IOPS) random read.
A key difference with NVMe drives is that they are connected more or less directly to the PCIe (Gen3 x4) bus, with no slow SAS or SATA port to get in the way. A good NVMe drive might deliver 3,000 megabytes per second (sequential) read/write and 600,000 IOPS (random read). That is a huge performance increase: roughly five to six times faster data transfer and up to 60 times better IOPS performance (depending on the drive and the read/write mix).
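If you want to check that arithmetic yourself, here is a minimal Python sketch. It uses only the ballpark figures quoted above (they are illustrative numbers, not benchmark results), and the names `sata_ssd`, `nvme`, and `speedup` are made up for this example.

```python
# Ballpark figures quoted in the text above; these are not measurements.
sata_ssd = {"seq_read_mb_s": 550, "random_read_iops": 10_000}
nvme = {"seq_read_mb_s": 3_000, "random_read_iops": 600_000}


def speedup(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


throughput_ratio = speedup(nvme["seq_read_mb_s"], sata_ssd["seq_read_mb_s"])
iops_ratio = speedup(nvme["random_read_iops"], sata_ssd["random_read_iops"])

print(f"Sequential read: {throughput_ratio:.1f}x faster")
print(f"Random read IOPS: {iops_ratio:.0f}x higher")
```

With these inputs the sequential read ratio comes out at about 5.5 and the random read IOPS ratio at 60. Real-world ratios will vary with the specific drives and the workload.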
We spent many weeks talking to vendors and asking them to put forward their best equipment to meet our needs. They knew we needed “a server.” They didn’t understand that we needed a virtualization host to support a bunch of our customers, each of whom would use it to run “a server.” Performance was crucial (pardon the pun) but so was reliability, reputation, and hot-swappability (in technical terms, that’s called a “surprise add/remove”—that can be tricky in the NVMe world).
Eventually, we insisted that vendors use Intel DC P4610 NVMe drives with a U.2 interface. (We needed the 7.68TB drives, as nothing on our platform is small.) They scored well on performance (3,200 megabytes per second read/write; 640,000 IOPS random read; 220,000 IOPS random write) and durability. Intel is a brand we know and trust, as we already use a lot of Intel SSDs in production, without a single failure.
These Intel NVMe drives cost around $4,000 each, and we still install them in pairs, because we always mirror your data. That’s despite the cost and despite the fact we’ve never seen an Intel drive fail.
Our new servers can accept four hot-swappable NVMe drives and, despite the expense, we plan to have spares on hand. If an Intel drive ever fails on us, for the first time in our seven-year history, we can swap it out immediately, with no impact on your service.
And because we store two copies of your data, we effectively double our read performance (shared by all customers on the host node)—the equivalent of 6,400 megabytes per second and 1,280,000 IOPS random read.
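For the curious, the doubling works out as in the short Python sketch below. It assumes an idealized mirror in which reads are balanced perfectly across both copies; in practice the gain depends on the RAID implementation and the workload. The function name `mirrored_read_capacity` is hypothetical and used only for illustration. Note that writes get no such boost, because every write has to land on both drives.

```python
# Per-drive figures for the Intel DC P4610 as quoted in the text above.
PER_DRIVE_READ_MB_S = 3_200
PER_DRIVE_READ_IOPS = 640_000


def mirrored_read_capacity(per_drive: int, copies: int = 2) -> int:
    """Best-case aggregate read capacity of a mirror.

    Assumes reads are spread evenly across all copies, which is an
    idealization; real arrays rarely balance this perfectly.
    """
    return per_drive * copies


total_read_mb_s = mirrored_read_capacity(PER_DRIVE_READ_MB_S)
total_read_iops = mirrored_read_capacity(PER_DRIVE_READ_IOPS)

print(f"Mirrored pair, sequential read: {total_read_mb_s:,} MB/s")
print(f"Mirrored pair, random read: {total_read_iops:,} IOPS")
```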
It just goes to show—just because we are half the price of our competitors, that doesn’t mean we are cheap! Matt doesn’t allow it.
(You might be surprised to learn that many VPS providers don’t mirror customer data to guard against a drive failure, even if they are using much cheaper SSD drives. Look at their specifications carefully—if they don’t explicitly say they use RAID10 or mirroring, chances are they don’t mirror your data at all!)