I’m currently deploying a newly developed SharePoint extranet for a client on to an environment they have provided. During the initial deployment phase we had some instances where the hypervisor environment stalled completely. My client’s IT support tracked this down to a bad configuration of the iSCSI interface used to connect the (virtual – I’m still not a fan of this; and let’s not talk about both nodes of a SQL HA cluster on the same hypervisor either) SQL cluster to the SAN for storage.
My client worked with their SAN and switch provider and changed some parameters which fixed this stalling issue; but how could I prove this before moving on with my install?
The answer I decided was to stress test the environment with SQLIOSIM. There’s a ton of information out there about SQLIOSIM which I’m not going to repeat, but I thought it would be good to post about some of the more salient points found during my time investigating I/O performance with this tool.
SQLIOSIM isn’t a tool to measure performance.
If you want to measure your SQL I/O performance (and yes, you do!) then use SQLIO. SQLIOSIM is a tool which exercises the I/O subsystem by simulating SQL load, and verifies this for correctness.
Errors are errors!
During our testing we received errors from SQLIOSIM which said that we had outstanding I/O requests which had been outstanding for more than 15 seconds. This is a problem; it’s not caused by saturated storage. On a well-performing system you can saturate your I/O channel without getting these errors. Getting these errors means you need to investigate your I/O path. Our client had to get their issues raised with tech support at the SAN provider in order to start resolving these issue.
Yes – these errors could affect SharePoint
My client was obviously pushing to get their extranet deployed, but we had to delay this until their environment was stable. When it comes to SQL I/O SharePoint is a pretty non-deterministic system. Think of all those timer jobs which may run, and the search crawls which can be running and accessing the database concurrently. Let alone the SQL I/O required to actually load our page and its webparts. The load-stress produced on the I/O subsystem by SQLIOSIM can be achieved in normal operations (one of our demonstration test cases would actually cause a full security crawl to be kicked off whilst a lot of site features were being activated; lots of I/O here).
Any diligent deployment of a SharePoint environment should include measurement of the raw SQL I/O performance to verify that it supports your capacity planning (use SQLIO for this), and also verify that the SQL I/O subsystem is stable, use SQLIOSIM for this, and remember to get all those errors corrected before you hand the environment back to your client.