Every solution has a bottleneck. After setting up a 3-node VSAN cluster with this hardware, I/O performance did not meet initial expectations, especially write throughput, which came in at just over 200 MB/s.
There was a bottleneck somewhere, and several things could have been causing it:
- VMware storage drivers
  - Will speed increase if the lsi-mr3 driver is replaced by megaraid_perc9?
  - Is the VMware nvme driver an issue?
- Firmware
  - PERC H330
  - Intel P3700 SSD
  - Samsung 850 EVO
  - PowerEdge R630 backplane
- VSAN configuration
  - How much does the VSAN storage policy determine I/O performance?
To rule out a VSAN issue, I removed one of the hosts from the cluster, re-enabled RAID mode and initialized a new RAID-10 disk array with the 4 Samsung 850 EVO SSDs. The results were shockingly bad, at under 30 MB/s write.
Reverting the driver from the VMware-recommended lsi-mr3 for ESXi 6 to the megaraid_perc9 driver for ESXi 5.5 nearly doubled the speed, to over 50 MB/s, but that still wasn’t anything impressive.
At this point, it was pretty clear that the PERC card was the issue. To make sure the Samsung SSDs weren’t to blame, I ran the same test on another standalone server that used the same SSDs but had the PERC H730 instead of the H330. The results were dramatically better (note that there are 8 SSDs in this RAID 10 instead of 4).
Now that the PERC H330 was identified as the bottleneck, was there any way to improve write speed in the VSAN cluster without replacing hardware? The good news is yes: VSAN does a great job of using the Intel SSD cache drive to improve I/O.
To improve it even further, I created a new storage policy that set the number of disk stripes per object to 4 instead of the default 1. My guess here was that if data blocks were forced to stripe across all 3 physical hosts in the cluster, then guest reads and writes would almost always be handled by all three disk groups. The results were better than the initial VSAN test.
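To make the effect of the stripe setting concrete, here is a rough sketch of how many data components a single object ends up with. It assumes RAID-1 mirroring with one failure to tolerate, which this post does not actually state, and it ignores witness components. The function name `data_components` is made up for illustration.

```python
def data_components(stripe_width: int, failures_to_tolerate: int = 1) -> int:
    """Data components per object under mirroring (assumed, not from the post).

    Each of the (failures_to_tolerate + 1) replicas is split into
    `stripe_width` stripes. Witness components are not counted.
    """
    if stripe_width < 1 or failures_to_tolerate < 0:
        raise ValueError("stripe_width must be >= 1 and failures_to_tolerate >= 0")
    return (failures_to_tolerate + 1) * stripe_width


# Compare the default policy with the stripe-4 policy.
for width in (1, 4):
    print(f"stripe width {width}: {data_components(width)} data components")
```

Under those assumptions, a stripe width of 4 gives each object 8 data components instead of 2, so there are more pieces that can land on different capacity disks and hosts.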
In sum? VSAN does what it is supposed to do: use cache as much as possible to deliver maximum performance. That being said, caching isn’t everything. A single standalone host using the H730 with twice as many capacity SSDs but no high-performance caching SSD still significantly outperforms the 3-node VSAN cluster on the H330. Without VSAN, though, anything running on local storage handled by a PERC H330 would take a significant performance hit. With VSAN, these hosts can actually be used for a small, standard-workload VDI environment. Plotting all 4 results on a chart shows how much of a boost VSAN provides compared to using each host in RAID mode.
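For a rough sense of scale, the ratios can be worked out from the rounded figures quoted in the text. These are approximations ("just over 200", "under 30", "over 50"), not exact benchmark values, and the labels and `speedup` helper are only illustrative.

```python
# Approximate write throughput in MB/s, rounded from the figures in the text.
results_mb_s = {
    "VSAN, default policy": 200,     # "just over 200 MB/s"
    "RAID-10, lsi-mr3": 30,          # "under 30 MB/s"
    "RAID-10, megaraid_perc9": 50,   # "over 50 MB/s"
}


def speedup(baseline: str, candidate: str, results: dict) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return results[candidate] / results[baseline]


for raid_case in ("RAID-10, lsi-mr3", "RAID-10, megaraid_perc9"):
    ratio = speedup(raid_case, "VSAN, default policy", results_mb_s)
    print(f"VSAN vs {raid_case}: about {ratio:.1f}x")
```

With these rounded inputs, the initial VSAN result is roughly 4 to 7 times the RAID-mode write throughput on the same H330 hosts.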
The PERC H330 just can’t handle SSD workloads. It has passthrough and HBA modes, which work well with VSAN, but when it comes to actual disk tests in RAID mode, results are disappointing.
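A disk test along these lines can be approximated with a short script run inside a guest VM on the datastore in question. This is only a minimal sketch of a sequential write test, not the tool used for the results in this post; the function name, file path, and sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_s(path: str, total_mb: int = 256, block_kb: int = 1024) -> float:
    """Write `total_mb` of data to `path` in `block_kb` chunks and return MB/s.

    The file is fsync'ed before the timer stops so that data still sitting
    in the OS page cache is not counted as written.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024) / elapsed


if __name__ == "__main__":
    # Hypothetical usage: write a 64 MB scratch file in a temporary directory.
    with tempfile.TemporaryDirectory() as scratch:
        rate = sequential_write_mb_s(os.path.join(scratch, "write_test.bin"), total_mb=64)
        print(f"sequential write: {rate:.1f} MB/s")
```

Small test files mostly measure controller and drive caches, so a file well beyond the cache size gives a more honest number for sustained writes.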
Bottom line: use the best PERC card Dell offers when SSDs are in use.