VSAN – the hardware (part 2)

For this deployment, VSAN will be used to host a VMware Horizon View environment. The View deployment will have non-persistent VMs that rely on App Volumes, UEM, and folder redirection for layered user persistence.

After reading several books on VSAN, including the brand-new Essential Virtual SAN (VSAN): Administrator’s Guide to VMware Virtual SAN, Second Edition that covers the latest 6.2 release, I’ve decided on this hardware for VSAN:

Compute

  • Dell PowerEdge R630
  • 2 x Intel Xeon E5-2680 v3 (12 core)
  • 256 GB RAM
  • Chassis with 8 SATA/SAS drives
  • 3 PCI-e half-length, half-height slots

Network

  • QLogic 57800 (2 x 10GBaseT, 2 x 1GBaseT)

Disk/Controller

So what’s on the HCL?  The 13th gen server itself is on the HCL, so that’s a good start for processor, memory and chipset.  The Intel P3700 is also fully supported, a good thing since most of the I/O will hit this drive.

What’s not on the HCL?  The PERC card and the Samsung SSDs.  I spent a lot of time on forums, Google, and VMware expert blogs researching both components.  The tl;dr: slight risk but worth it... well, I’ll find out soon enough!

PERC H330…why?  Honestly, I would have gone with the H730, which is on the HCL, if I hadn’t spec’d these before VSAN came into the discussion.  The main reason the H330 isn’t on the HCL is its queue depth.  VMware recommends a minimum queue depth of 256, and the H330 lands at or slightly below that in tests.  However, each 850 EVO drive has a queue depth of 16, so the SATA protocol itself, rather than the PERC, is the limiting factor when it comes to queue depth.  Also, a recent firmware update for the H330 includes VSAN compatibility on the Dell side.
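The queue depth argument above can be sketched as back-of-the-envelope arithmetic. This is a rough model only: it uses the two figures quoted in the post (256 for the controller, 16 per drive) and assumes, purely for illustration, that all 8 chassis bays hold SATA SSDs. Real queueing also involves driver and adapter queues that this ignores, and the function names are hypothetical.

```python
# Figures quoted in the post: VMware's recommended minimum controller queue
# depth and the per-drive queue depth given for the 850 EVO.
CONTROLLER_MIN_QUEUE_DEPTH = 256
DRIVE_QUEUE_DEPTH = 16

# Assumption (not stated in the post): all 8 chassis bays hold a SATA SSD.
DRIVE_COUNT = 8


def max_outstanding_io(drive_count: int, drive_queue_depth: int) -> int:
    """Upper bound on commands the drives can hold in flight at once."""
    return drive_count * drive_queue_depth


def limiting_component(controller_depth: int, drives_depth: int) -> str:
    """Name whichever side caps outstanding I/O first."""
    return "drives" if drives_depth < controller_depth else "controller"


drives_depth = max_outstanding_io(DRIVE_COUNT, DRIVE_QUEUE_DEPTH)
print(drives_depth, limiting_component(CONTROLLER_MIN_QUEUE_DEPTH, drives_depth))
```

Under these assumptions the drives together accept fewer outstanding commands than the controller's recommended minimum, which is the sense in which SATA, not the PERC, would be the cap.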

Now for the controversial component on the list: the Samsung 850 EVO.  VSAN discussion threads generally disapprove of “consumer” SATA SSDs because of their small queue depth, lack of power protection, and reduced durability.  Let’s break each one down:

  1. Small queue depth: important, but it’s not everything.  I set up an existing VDI environment running on several 850 EVOs in RAID 10 and it flies with very low latency.  There’s this post featuring a user who had a nightmare scenario with 850 EVOs in a hybrid VSAN deployment where the EVOs were used as cache.  I had a similar issue with crazy high latency until I ran into this thread suggesting replacement of the ESXi storage driver, a fix that literally saved my VDI deployment.  That being said, that post was enough to convince me that the SATA SSDs alone probably wouldn’t be able to handle the caching tier.
  2. Drive reliability: I’ve deployed production VDI environments on Samsung 830, 840, and 850 SSDs.  Not a single drive failed, even with constant recomposition of linked clones.  The durability of the 830s and 840s may be questionable, but the 850 EVO is at a whole new level when it comes to durability according to this very in-depth study specifically on the drive.
  3. Power protection: okay, it’s a risk for sure, especially when silent VMFS corruption can happen.  See here and here.  Again, SSDs will fail; it’s just a matter of when.  Power loss is always a consideration.  However, all servers have redundant power supplies on different sources.  The risk is low but not zero.  This is why we back up!

Keep in mind, this is all theoretical right now.  Actual deployment and testing are the only way to verify that my assumptions are true.  As so many different posts point out, the recommendation is always to follow the HCL for every component.  There’s also a huge benefit to SSDs on the HCL.  The Intel one that will be used in this buildout absolutely crushes the 850 EVO in every test and removes bottlenecks caused by SATA and AHCI.  Then again, it’s also nearly three times the price and half the capacity of the 850 EVO.  In this setup, it’s about being fast, but reasonably fast.
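To put the price and capacity comparison in per-gigabyte terms, here is a minimal sketch. It uses only the two rough ratios from the paragraph above ("nearly three times the price", "half the capacity"), so the result is approximate; the function name is hypothetical.

```python
def relative_cost_per_gb(price_ratio: float, capacity_ratio: float) -> float:
    """Cost per GB of drive A relative to drive B.

    price_ratio    = price of A / price of B
    capacity_ratio = capacity of A / capacity of B
    """
    return price_ratio / capacity_ratio


# Intel drive vs. 850 EVO, using the post's rough ratios.
print(relative_cost_per_gb(price_ratio=3.0, capacity_ratio=0.5))
```

In other words, roughly three times the price for half the capacity works out to about six times the cost per gigabyte.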

6 thoughts on “VSAN – the hardware (part 2)”

  1. Great series on VSAN. I have just begun to experiment with VSAN on my single server at home. I’m surprised by your decision to use consumer SSD drives in your VSAN setup. At home I used the following disks:

    cache tier: Samsung 850 Pro PCIe NVMe 256GB SSD
    capacity tier: 4 x Samsung 840/850 Pro SATA SSDs

    After enabling VSAN with the above disks, the performance was appalling. I was getting roughly 20MB/s (80MB/s tops)! I tried changing the stripe width in the VSAN policy from 1 to 2/3/4 but this made no difference.

    After days of research it turned out my problem was due to me using consumer SATA SSD drives.

    Last week I ordered my first enterprise SSD for home use: the Samsung SM863 480GB SATA SSD. I can’t even begin to tell you how much better the performance is!! When I used this single drive in a datastore I was able to get between 350-450MB/s read/write speeds!

    The next test I did was to delete the above datastore and then enable VSAN as follows:

    cache tier: Samsung 850 Pro PCIe NVMe 256GB SSD
    capacity tier: 1 x Samsung SM863 480GB SATA SSD

    Speeds were much better with the SM863 in the capacity tier, but write speeds were still only about 200MB/s, so I’m assuming that the consumer PCIe drive is the problem here.

    I have another SM863 drive on order and it will be interesting to test how VSAN performs when using both SM863 drives in the cache and capacity tier.

    I am still blown away at the speeds I get with the SM863 drive! Things that took 20min (or longer) take under 2min on the SM863 Enterprise drive!

    Look forward to some more posts on VSAN as I plan on setting myself up a 3 or 4 node VSAN cluster at home for my studies/lab!

    1. First of all, thanks for your detailed comment! I’m looking into the SM863 drive now and am trying to figure out how much quicker it would be compared with the 850 EVO. It looks like the major benefits are improved random read and write I/O and better performance in a RAID config. Have you run any guest VM benchmark tests against the drive (PT8, HD Tune, ATTO, CrystalDiskMark) or used the built-in performance tests in VSAN 6.2? If so, I’d be interested in the results. I’m also looking at the PM1633, which is a bit higher in price than the SM863 but is SAS instead of SATA. I’ve read some comments on VMware forums about how SATA SSDs haven’t worked well in VSAN, but I’m not entirely convinced.

      I’m hoping to get some updated VSAN results posted soon now that I’ve finished testing the drives in different RAID configs. I’m hoping to set up a page that shows results from the built-in VSAN 6.2 performance tests. If you’re interested in comparing results, let me know!

    2. Hi Sean,

      How did you know how much capacity you needed for the cache tier SSD? We’re in an enterprise environment, and I need to order some for my 3 hosts so we can enable VSAN, but I’m not sure how much. I read something that says VMware recommends 10% of the total HDD capacity in each disk group. My disk group will have a total of 33TB capacity. A 3.3TB SSD (which will probably round up to 4TB) seems excessive. Any thoughts?

  2. Thanks for replying! I did look into SAS drives but they are just too expensive for home lab use. The SM863 was only a bit more expensive than an 850 Pro consumer drive, so it was a no-brainer for me, and the SM863 is on the VSAN HCL. Also, the endurance of the SM863 is about 3000TBW vs about 300TBW (I think) for the consumer 850 Pro.

    I did run ATTO on the SM863 drive when it was set up as a single-drive datastore, and I also copied a 5GB ISO between VMs when one of the VMs was running on the SM863 datastore. I didn’t know VSAN 6.2 had a built-in performance tester. That’s cool! Let me know what you want me to test and I will do it. My second SM863 should arrive in the next day or two and I will then set up both these drives in a new VSAN datastore (for cache and capacity).

    I can definitely say that my experience with consumer drives in ESXi has been awful. I tried using the onboard Intel SATA ports, the LSI2308 SAS ports (higher queue depth), and an IBM M5015 RAID card. Changing the controller didn’t help at all. Updating firmware and drivers didn’t help.

    The only thing that has made my datastore and VSAN perform well is changing the consumer drives for an enterprise one.

    I can’t wait to test VSAN with two SM863 drives. It’ll be my first VSAN datastore with only enterprise drives, so the performance testing will be interesting.

    Let me know what tests I should run and I’ll post the results here.

    Is there a reason you are even bothering with RAID and VSAN? Isn’t it better to run VSAN without RAID?

    Look forward to your next article!

  3. Sorry for the very delayed reply. I started making a Google Spreadsheet with results from the integrated vSAN performance test. The link is here: https://docs.google.com/spreadsheets/d/1deX_VT2fCERP6CgUmp6nDt4Rt3GQTK-fjDlA4tRkS5I/edit?usp=sharing

    The 4 tests that I ran were: “70/30 read/write mix, realistic, optimal flash cache usage”, “100% Read, optimal RC usage after warmup”, “100% Write, optimal WB usage”, and “Stress test”. I exported the results to CSV and then averaged the last four columns (IOPS, Throughput, Average Latency, Maximum Latency).
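A minimal sketch of how that averaging step could be scripted, assuming the export is a plain CSV with a header row whose last four columns are the metrics named above. The first column name and every value in the sample are made up for illustration and are not results from the spreadsheet; the helper name is hypothetical.

```python
import csv
import io
from statistics import mean


def average_last_columns(csv_text: str, count: int = 4) -> dict[str, float]:
    """Average the last `count` columns of a CSV export, keyed by header name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    names = header[-count:]
    # Negative indexing (i - count) walks the trailing columns left to right.
    return {
        name: mean(float(row[i - count]) for row in data)
        for i, name in enumerate(names)
    }


# Placeholder rows only; these numbers are NOT real test results.
SAMPLE = """Test,IOPS,Throughput,Average Latency,Maximum Latency
vm-1,100,10,1.0,5.0
vm-2,300,30,3.0,7.0
"""

print(average_last_columns(SAMPLE))
```

To use it on a real export, read the file's contents into a string and pass that in place of `SAMPLE`.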

    I kept the storage profile quite simple: single stripe and no CRC.

    If you want to add your test results to the Google doc, there should be an option to request write access to it…or post them as a comment if it’s easier.

    Thanks for helping with the testing!
