Tuesday, December 19, 2017


More SQL Server Trace Flags

Another SQL Server Trace Flags resource (in addition to the Microsoft one).

There are a bunch of other SQL Server resources hosted there too (SQL Server Kit).

Saturday, December 16, 2017


SQLDiagCmd Updated

I’ve updated SQLDiagCmd, my standalone executable for running any or all of Glenn Berry’s excellent SQL Server DMV diagnostic scripts.

As well as being able to target multiple servers and multiple databases, it now also has the option to exclude specified queries from being executed (such as those that might take some time to execute on very large databases or busy server instances).

The source code is available on GitHub and you can download the executable directly from these links:



Thursday, December 14, 2017


A recursive C# function

I was searching through email today looking for a LINQPad snippet that a colleague, James Miles, wrote some time ago, one which we used to generate the scripts for a production SQL Server database + transaction log point in time restore after IT had a little SAN mishap!

In doing so, I came across this gem from James: Solving Puzzles in C#: Poker Hands, which is not just a great example of writing a recursive function but of problem solving in general. [Where I used to work, we often used to have a Friday puzzle where I tried to come up with or find puzzles that wouldn’t be easy to solve by brute force.  This was one of the many times I was thwarted by James and others!]

Tuesday, November 14, 2017


SQL Server: A more useful CXPacket Waits...

Starting with the upcoming SQL Server 2017 CU3 and SQL Server 2016 SP2 releases, CXPACKET waits are split into an actionable wait (CXPACKET) and a negligible wait (CXCONSUMER). 

(These wait types are already present in Azure SQL Database.)

Making parallelism waits actionable

Wednesday, November 01, 2017


SQL Server Trace Flags

Microsoft have published a useful list of all SQL Server trace flags in a single location: 
DBCC TRACEON - Trace Flags

In the past, some of these were poorly documented or hard to find.

Wednesday, October 25, 2017


Shared Memory Protocol is not Supported on SQL Server Failover Clusters

I was recently trying to work out why SSAS installed on the same server as SQL Server would not use shared memory for its processing connections. It may be obvious to some people, but an internet search turns up surprisingly few references: the Shared Memory Protocol is not supported on SQL Server Failover Clusters.

On a standard SQL Server instance, the Shared Memory protocol can be used when a client is running on the same computer as the SQL Server instance and the Shared Memory Protocol is enabled in SQL Server’s network protocols. (You can check the status of the enabled protocols using SQL Server Configuration Manager).

sys.dm_exec_connections will show you which net transport a client connection is using:

SELECT net_transport FROM sys.dm_exec_connections WHERE session_id = @@SPID;

You can force a client connection to use a specific protocol by prefixing the Server name in the connection string with one of these modifiers:

  • TCP = tcp:
  • Multiprotocol = rpc:
  • Shared Memory = lpc:

e.g. Force connection to use the TCP protocol:


In addition, you can force the client connection to use the Shared Memory protocol by using (local) as the server name. You can also use localhost or a period (.) e.g.:
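As a rough illustration of the prefixes above, the following Python sketch builds connection strings with and without a forced protocol. The server name (MyServer), database name (MyDatabase) and the helper function are hypothetical and only there for illustration; the prefixes are the ones listed above.

```python
from typing import Optional

# Protocol prefixes as listed above.
PROTOCOL_PREFIXES = {
    "tcp": "tcp:",
    "multiprotocol": "rpc:",
    "shared_memory": "lpc:",
}


def build_connection_string(server: str, database: str,
                            protocol: Optional[str] = None) -> str:
    """Return a connection string, optionally forcing a network protocol.

    Hypothetical helper: the prefix goes directly in front of the server name.
    """
    prefix = PROTOCOL_PREFIXES[protocol] if protocol else ""
    return f"Server={prefix}{server};Database={database};Integrated Security=SSPI;"


# Force TCP against a hypothetical server called MyServer
tcp_conn = build_connection_string("MyServer", "MyDatabase", "tcp")

# Use (local) as the server name, as described above
local_conn = build_connection_string("(local)", "MyDatabase")

print(tcp_conn)
print(local_conn)
```

You can then confirm which transport was actually used by running the sys.dm_exec_connections query shown earlier on that connection.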



Tuesday, October 17, 2017


SSMS 17.3 has XE Profiler built-in

New to SQL Server Management Studio (SSMS) 17.3 is the XE Profiler. This is Profiler-like functionality built into SSMS:

SSMS 17.3 has Profiler built-in


Just double-click either of the two entries to create a live trace window (built on the SSMS XE “Watch Live Data” functionality).  The event sessions that will be created are named:

  • Standard:  QuickSessionStandard
  • TSQL:        QuickSessionTSQL

Friday, October 13, 2017


SSAS: Turn Off Flight Recorder

A quick and easy SSAS optimisation: turn off flight recorder:

SQL Server Analysis Services Flight Recorder provides a mechanism to record server activity into a short-term log. Information captured by Flight Recorder can be helpful for troubleshooting specific issues, however the load placed on the server when capturing the snapshots and trace events can have a small impact on overall performance.  For optimal performance the flight recorder should be disabled unless attempting to capture diagnostic information relevant to troubleshooting a specific problem.



Wednesday, October 04, 2017


SQL Server: Do You Have a Poorly Performing Query you can't Explain?

If you are running a SQL Server version prior to SQL Server 2016, and you have a query whose plan just doesn't seem right and you can't explain it, try running it with trace flag 4199:

SELECT SomeColumn
FROM SomeTable
OPTION (QUERYTRACEON 4199);

It enables all the query optimiser hot fixes present in your applied SP and CU version.
Many DBAs enable this trace flag globally (at the instance level).
SQL Server 2016 will automatically enable all prior version query optimiser hot fixes (for databases running at compatibility level 130).

SQL Server query optimizer hotfix trace flag 4199 servicing model
SQL Server 2016: The Death of the Trace Flag

Tuesday, October 03, 2017


SQL Server 2017: Performance Improvements and Linux

Bob Ward's post has some interesting stuff in it: 

Sunday, September 17, 2017


SQL Server Connectivity Issues: Guided Walkthrough

It’s not uncommon to see questions on StackOverflow relating to SQL Server connectivity issues. Microsoft support have published the following guide to help troubleshoot connectivity issues:

Solving Connectivity errors to SQL Server

In addition to providing a checklist of items that you can go through, it provides step by step troubleshooting procedures for the following error messages:

  • A network-related or instance-specific error occurred while establishing a connection to SQL Server
  • No connection could be made because the target machine actively refused it
  • SQL Server does not exist or access denied
  • PivotTable Operation Failed: We cannot locate a server to load the workbook Data Model
  • Cannot generate SSPI context
  • Login failed for user
  • Timeout Expired
  • The timeout period elapsed prior to obtaining a connection from the pool

There are also troubleshooting guides for Always On and SQL Azure DB connectivity issues:

Troubleshooting Always On Issues

Troubleshooting connectivity issues with Microsoft Azure SQL Database

Tuesday, December 27, 2016


R: Evaluating a classifier using standard performance evaluation metrics

The Azure ML team have released a useful Custom R Evaluator script for computing standard classifier performance metrics. The module expects as input a dataset containing the actual and predicted class labels (i.e. the data behind a confusion matrix). The R code is available at GitHub.

Example output:

Actual  a  b  c
     a 27  2  5
     b  1 24  2
     c  1  5 33

                                     a         b         c
Accuracy                     0.8400000 0.8400000 0.8400000
Precision                    0.9310345 0.7741935 0.8250000
Recall                       0.7941176 0.8888889 0.8461538
F1                           0.8571429 0.8275862 0.8354430
MacroAvgPrecision            0.8434093 0.8434093 0.8434093
MacroAvgRecall               0.8430535 0.8430535 0.8430535
MacroAvgF1                   0.8400574 0.8400574 0.8400574
AvgAccuracy                  0.8933333 0.8933333 0.8933333
MicroAvgPrecision            0.8400000 0.8400000 0.8400000
MicroAvgRecall               0.8400000 0.8400000 0.8400000
MicroAvgF1                   0.8400000 0.8400000 0.8400000
MajorityClassAccuracy        0.3900000 0.3900000 0.3900000
MajorityClassPrecision       0.0000000 0.0000000 0.3900000
MajorityClassRecall          0.0000000 0.0000000 1.0000000
MajorityClassF1              0.0000000 0.0000000 0.5611511
Kappa                        0.7581986 0.7581986 0.7581986
RandomGuessAccuracy          0.3333333 0.3333333 0.3333333
RandomGuessPrecision         0.3400000 0.2700000 0.3900000
RandomGuessRecall            0.3333333 0.3333333 0.3333333
RandomGuessF1                0.3366337 0.2983425 0.3594470
RandomWeightedGuessAccuracy  0.3406000 0.3406000 0.3406000
RandomWeightedGuessPrecision 0.3400000 0.2700000 0.3900000
RandomWeightedGuessRecall    0.3400000 0.2700000 0.3900000
RandomWeightedGuessF1        0.3400000 0.2700000 0.3900000
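
To show where some of these numbers come from, here is a minimal Python sketch (not the Azure ML R script itself) that recomputes accuracy, per-class precision, recall and F1, and Cohen's kappa from the confusion matrix above. It assumes rows are actual classes and columns are predicted classes, which is consistent with the precision and recall values shown.

```python
# Confusion matrix from the example output.
# Assumption: rows = actual class, columns = predicted class.
labels = ["a", "b", "c"]
cm = [
    [27, 2, 5],
    [1, 24, 2],
    [1, 5, 33],
]
k = len(labels)

n = sum(sum(row) for row in cm)                                    # total observations
row_totals = [sum(row) for row in cm]                              # actual count per class
col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted count per class
correct = [cm[i][i] for i in range(k)]                             # diagonal

accuracy = sum(correct) / n
precision = [correct[i] / col_totals[i] for i in range(k)]
recall = [correct[i] / row_totals[i] for i in range(k)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]
macro_avg_precision = sum(precision) / k

# Cohen's kappa: agreement beyond what the class totals alone would give
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

for i, label in enumerate(labels):
    print(f"{label}: precision={precision[i]:.7f} recall={recall[i]:.7f} f1={f1[i]:.7f}")
print(f"accuracy={accuracy:.7f} kappa={kappa:.7f}")
```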

Wednesday, December 21, 2016


Editions and Supported Features for SQL Server 2016

Just posting this link so I can find it easily: Editions and Supported Features for SQL Server 2016

SQL Server 2016 SP1 onwards now supports Data Compression in Standard Edition (every edition in fact, including Express!)

Wednesday, November 30, 2016


Storage Benchmarking with diskspd plus a LINQPad Script for Generating diskspd Batch Scripts

It is always a good idea to measure a performance baseline when commissioning (or choosing) new storage hardware or a new server, particularly for SQL Server. It is not uncommon for SANs to be non-optimally configured, so knowing how close the storage’s performance comes to the vendor’s advertised numbers is important. You should also benchmark when you make any hardware/configuration changes to storage.

In the past, SQLIO was one of the commonly used tools to perform I/O testing, but SQLIO has now been superseded. diskspd.exe is Microsoft’s replacement for SQLIO, with a more comprehensive set of testing features and expanded output. Like SQLIO, diskspd is a command line tool, which means it can easily be scripted to perform reads and writes of various I/O block sizes, including random and sequential access patterns, to simulate different types of workloads.

Where can I download diskspd?

diskspd is a stand-alone executable with no dependencies required to run it. You can download diskspd from Microsoft TechNet – Diskspd, a Robust Storage Testing Tool.
Download the zip file and unzip it into an appropriate folder. Once unzipped you will see 3 subfolders with different executable targets: amd64fre (for 64-bit systems: the most common server target), x86fre (for 32-bit systems) and armfre (for ARM systems). The source code is hosted on GitHub here.

Analyzing I/O Performance: What Metrics should I measure?

The three main characteristics that are used to describe storage performance are latency, IOPS and sequential throughput (from Glenn Berry’s post: Analyzing I/O Performance for SQL Server):

Latency

Latency is the duration between issuing a request and receiving the response. The measurement begins when the operating system sends a request to the storage and ends when the storage completes the request. Reads are complete when the operating system receives the data; writes are complete when the drive signals the operating system that it has received the data.
For writes, the data may still be in a cache on the drive or disk controller, depending on your caching policy and hardware. Write-back caching is much faster than write-through caching, but it requires a battery backup for the disk controller. For SQL Server usage, you want to make sure you are using write-back caching rather than write-through caching if at all possible. You also want to make sure your hardware disk cache is actually enabled: some vendor disk management tools disable it by default.

IOPS (Input/Output Operations per Second)

The second metric is Input/Output Operations per Second (IOPS). A constant latency of 1ms means that a drive can process 1,000 IOs per second with a queue depth of 1. As more IOs are added to the queue, latency will increase. One of the key advantages of flash storage is that it can read/write to multiple NAND channels in parallel, along with the fact that there are no electro-mechanical moving parts to slow disk access down. IOPS actually equals queue depth divided by the latency, and IOPS by itself does not consider the transfer size for an individual disk transfer. You can translate IOPS to MB/sec and MB/sec to latency as long as you know the queue depth and transfer size.
The majority of storage vendors report their IOPS performance using a 4K block size, which is largely irrelevant for SQL Server workloads, since the majority of the time SQL Server reads data in 64K chunks (see: IOPS Are A Scam). To convert 4K-block IOPS into 64K-block IOPS, simply divide by 16; to convert IOPS into MB/s, multiply IOPS by the block transfer size.
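
The relationships in this section can be sketched in a few lines of Python. The function names and the 160,000 IOPS vendor figure are hypothetical, and the block size conversion assumes the storage delivers the same MB/s at both block sizes, which is only a rule of thumb.

```python
def iops_from_latency(queue_depth: int, latency_ms: float) -> float:
    """IOPS = queue depth / latency, with latency given in milliseconds."""
    return queue_depth * 1000.0 / latency_ms


def convert_iops(iops: float, from_block_bytes: int, to_block_bytes: int) -> float:
    """Re-express IOPS at a different block size, assuming constant throughput."""
    return iops * from_block_bytes / to_block_bytes


# A constant 1 ms latency at queue depth 1 gives 1,000 IOPS
iops_qd1 = iops_from_latency(queue_depth=1, latency_ms=1.0)

# Hypothetical vendor figure quoted at a 4K block size, restated for 64K blocks
vendor_4k_iops = 160_000
iops_64k = convert_iops(vendor_4k_iops, 4 * 1024, 64 * 1024)

print(iops_qd1)   # 1000.0
print(iops_64k)   # 10000.0
```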


Sequential Throughput

Sequential throughput is the rate that you can transfer data, typically measured in megabytes per second (MB/sec) or gigabytes per second (GB/sec). Your sequential throughput metric in MB/sec equals the IOPS times the transfer size. For example, 556 MB/sec equals 135,759 IOPS times a 4,096-byte transfer size, while 135,759 IOPS times an 8,192-byte transfer size would be 1,112 MB/sec of sequential throughput. Despite its everyday importance to SQL Server, sequential disk throughput often gets short-changed in enterprise storage, both by storage vendors and by storage administrators. It is also fairly common to see the magnetic disks in a direct attached storage (DAS) enclosure or a storage area network (SAN) device be so busy that they cannot deliver their full rated sequential throughput.
Sequential throughput is critical for many common database server activities, including full database backups and restores, index creation and rebuilds, and large data warehouse-type sequential read scans (when your data does not fit into the SQL Server buffer pool).
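
As a quick check of the arithmetic in the example above, this small Python sketch multiplies IOPS by the transfer size. It assumes decimal megabytes (1 MB = 1,000,000 bytes), which is what reproduces the 556 and 1,112 MB/sec figures; the function name is just for illustration.

```python
def throughput_mb_per_sec(iops: float, transfer_bytes: int) -> float:
    """Throughput = IOPS x transfer size, in decimal MB/sec (assumption)."""
    return iops * transfer_bytes / 1_000_000


mb_4k = throughput_mb_per_sec(135_759, 4096)
mb_8k = throughput_mb_per_sec(135_759, 8192)

print(round(mb_4k))  # 556
print(round(mb_8k))  # 1112
```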

How do I use diskspd?

WARNING: Ideally, you should perform DiskSpd testing when there is no other activity on the server and storage. You could be generating a large amount of disk IO, network traffic and/or CPU load when you run DiskSpd. If you’re in a shared environment, you might want to talk to your administrator(s) before running such a test. This could negatively impact anyone else using other VMs in the same host, other LUNs on the same SAN or other traffic on the same network.

Ensure the user used to run diskspd has been granted the ‘Perform volume maintenance tasks’ right: run secpol.msc -> Local Policies -> User Rights Assignment -> ‘Perform volume maintenance tasks’

NOTE: You should run diskspd from an elevated command prompt (by choosing “Run as Administrator”). This will ensure file creation is fast. Otherwise, diskspd will fall back to a slower method of creating files.

diskspd parameters

You can get a complete list of all the supported command line parameters and usage by entering the following at a command prompt:
> diskspd.exe
The most common parameters are:

Parameter           Description
-d                  Test duration in seconds. Aim for at least 60 seconds
-W                  Test warm up time in seconds
-b                  I/O block size (K/M/G). e.g. -b8K means an 8KB block size, -b64K means a 64KB block size: both are relevant for SQL Server
-o                  Number of outstanding I/Os (queue depth) per target, per worker thread
-t                  Worker threads per test file
-Su                 Disable software caching
-Sw                 Enable writethrough (no hardware write caching). Normally used together with -Su (-Suw) to replace the deprecated -h (or equivalently use -Sh)
-L                  Capture latency info
-r                  Random data access tests
-si                 Thread coordinated sequential data access tests
-w                  Write percentage. For example, -w10 means 10% writes, 90% reads
-Z<size>[K|M|G|b]   Workload test write source buffer size. Used to supply random data (entropy) for writes, which is a good idea for SQL Server testing and for testing de-duping behaviour on flash arrays
-c<size>[K|M|G|b]   Create workload file(s) of specified size

For example:
diskspd.exe -Suw -L -W5 -Z1G -d60 -c440G -t8 -o4 -b8K -r -w20 E:\iotest.dat > output.txt

This will run a 60 second random I/O test using a 440GB test file located on the E: drive, with a 20% write and 80% read ratio, using an 8K block size and a 5 second warm up. It will use eight worker threads, each with four outstanding I/Os and a write entropy buffer of 1GB, and save the results to a text file named output.txt. This set of parameters is representative of a SQL Server OLTP workload.
Note: The test file size (you can have multiple test files) should be larger than the SAN’s DRAM cache (and ideally not an exact multiple of it).

LINQPad Script

To automate the creation of a bunch of testing scenarios, rather than manually editing (which is tedious and error prone), I’ve written a simple C# LINQPad script:

  const string batchScriptFilename = @"c:\temp\diskspd.bat";

  // Flags used in every run (these do not vary)
  string disableHardwarecaching = "-Suw";       // -Suw: Disable both hardware and software buffering. SQL Server does this.
                                                // Su = disable software caching, Sw = enable writethrough (no hardware write caching)
  string captureLatency    = "-L";              // capture disk latency numbers
  string warmWorkLoad      = "-W5";             // Warm up time in seconds
  string entropyRandomData = "-Z1G";            // Used to supply random data (K/M/G) for writes, which is good for SQL Server testing.
  string testduration      = "-d120";           // Test duration in seconds NB: At least 60 seconds, 2-3 minutes is good
  string testFileSize      = "-c440G";          // Nothing smaller than the SAN's cache size (and not an exact multiple of it)
  string testFileFullPath  = @"E:\iotest.dat";  // Test file name (goes at the end of the command)
  string resultsFilename   = @"output.txt";     // File to output the text results

  // prefix results file name with date
  resultsFilename = DateTime.Now.Date.ToString("yyyyMMdd") + "_" + resultsFilename;

  // Lists of varying params to use
  var randomOrSequential = new List<string> { "-r", "-si" };                 // -r = Random, -si = Sequential
  var writepercentage = new List<string> { "-w0", "-w10", "-w25", "-w100" }; // -w0 means no writes: -w10 = 90%/10% reads/writes
  var blocksize = new List<string> { "-b8K", "-b64K", "-b512K", "-b2M" };    // 2M represents SQL Server read ahead, 512K backups
  var overlappedIOs = new List<string> { "-o2", "-o4", "-o8", "-o16"};       // This is queue depth
  var workerthreads = new List<string> { "-t4", "-t8", "-t16", "-t32" };     // Worker threads

  // Total run time = number of parameter combinations * (test duration + warm up)
  int runTimeSeconds = randomOrSequential.Count() * writepercentage.Count() * blocksize.Count() *
                       overlappedIOs.Count() * workerthreads.Count() *
                       (Int32.Parse(testduration.Substring(2)) + Int32.Parse(warmWorkLoad.Substring(2)));

  using (StreamWriter fs = new StreamWriter(batchScriptFilename))
  {
      fs.WriteLine("REM Expected run time: {0} Minutes == {1:0.0} Hours", runTimeSeconds / 60, runTimeSeconds / 3600.0);

      // The fixed part of every command line
      string cmd = string.Format("diskspd.exe {0} {1} {2} {3} {4} {5} ",
                                 disableHardwarecaching, captureLatency, warmWorkLoad,
                                 entropyRandomData, testduration, testFileSize);
      // Yes, LINQ could be used!
      for (int i1 = 0; i1 < writepercentage.Count(); i1++)
          for (int i2 = 0; i2 < randomOrSequential.Count(); i2++)
              for (int i3 = 0; i3 < blocksize.Count(); i3++)
                  for (int i4 = 0; i4 < overlappedIOs.Count(); i4++)
                      for (int i5 = 0; i5 < workerthreads.Count(); i5++)
                          // Argument order follows the batch script format: threads, queue depth, block size, access pattern, write %
                          fs.WriteLine(string.Format("{0} {1} {2} {3} {4} {5} {6} >> {7}",
                                                     cmd, workerthreads[i5], overlappedIOs[i4], blocksize[i3],
                                                     randomOrSequential[i2], writepercentage[i1],
                                                     testFileFullPath, resultsFilename));
  }


Short Test Batch Script

A batch script to perform an initial (relatively) quick test would look something like the following:
REM Expected run time: 98 Minutes == 1.6 Hours
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b8K -r -w0 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b64K -r -w0 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b8K -r -w20 E:\iotest.dat >> output.txt

diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o2 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o4 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o8 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t4 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t8 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt
diskspd.exe -Suw -L -W3 -Z1G -d120 -c440G  -t16 -o16 -b64K -r -w20 E:\iotest.dat >> output.txt


Interpreting the diskspd results

diskspd produces quite a bit of output per run. The first section is a recap of the parameters that were used in the command line:
Command Line: diskspd.exe -Suw -L -W5 -Z1G -d120 -c440G -t16 -o4 -b64K -r -w10 E:\iotest.dat

Input parameters:

    timespan:   1
    duration: 120s
    warm up time: 5s
    cool down time: 0s
    measuring latency
    random seed: 0
    path: 'E:\iotest.dat'
        think time: 0ms
        burst size: 0
        software cache disabled
        hardware write cache disabled, writethrough on
        write buffer size: 1073741824
        performing mix test (read/write ratio: 90/10)
        block size: 65536
        using random I/O (alignment: 65536)
        number of outstanding I/O operations: 4
        thread stride size: 0
        threads per file: 16
        using I/O Completion Ports
        IO priority: normal

This is a great improvement over SQLIO, which did not echo the run parameters or provide a readable summary of them, making it hard to decipher runs at a later date.
Next is a summary of CPU information. This information can help determine if your storage test is CPU bottlenecked:
actual test time:   120.00s
thread count:       16
proc count:     32

CPU |  Usage |  User  |  Kernel |  Idle
   0|  10.21%|   1.09%|    9.11%|  89.79%
   1|  10.31%|   1.09%|    9.22%|  89.69%
   2|  10.14%|   1.08%|    9.06%|  89.86%
   3|  18.26%|   0.94%|   17.32%|  81.74%
   4|   7.86%|   1.12%|    6.74%|  92.14%
   5|   7.79%|   0.91%|    6.87%|  92.21%
   6|   7.55%|   1.15%|    6.41%|  92.45%
   7|   7.71%|   1.13%|    6.58%|  92.29%
   8|   0.00%|   0.00%|    0.00%|   0.00%
avg.|   2.49%|   0.27%|    2.23%|  22.51%

The results for each thread should be very similar in most cases.
After the CPU summary is the I/O summary, split into total (read + write), followed by separate read and write statistics:
Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
     0 |     10107486208 |       154228 |      80.33 |    1285.23 |    3.109 |     3.640 | E:\iotest.dat (440GB)
     1 |     10038870016 |       153181 |      79.78 |    1276.50 |    3.130 |     4.082 | E:\iotest.dat (440GB)
     2 |     10062594048 |       153543 |      79.97 |    1279.52 |    3.123 |     4.048 | E:\iotest.dat (440GB)
     3 |     10012590080 |       152780 |      79.57 |    1273.16 |    3.138 |     3.954 | E:\iotest.dat (440GB)
     4 |     10169417728 |       155173 |      80.82 |    1293.10 |    3.090 |     3.909 | E:\iotest.dat (440GB)
     5 |     10148446208 |       154853 |      80.65 |    1290.44 |    3.096 |     4.159 | E:\iotest.dat (440GB)
     6 |     10158669824 |       155009 |      80.73 |    1291.74 |    3.093 |     4.024 | E:\iotest.dat (440GB)
     7 |     10205724672 |       155727 |      81.11 |    1297.72 |    3.079 |     3.901 | E:\iotest.dat (440GB)
     8 |     10096607232 |       154062 |      80.24 |    1283.85 |    3.112 |     3.896 | E:\iotest.dat (440GB)
     9 |     10057023488 |       153458 |      79.93 |    1278.81 |    3.124 |     4.187 | E:\iotest.dat (440GB)
    10 |     10092347392 |       153997 |      80.21 |    1283.30 |    3.113 |     3.951 | E:\iotest.dat (440GB)
    11 |      9996730368 |       152538 |      79.45 |    1271.15 |    3.143 |     3.894 | E:\iotest.dat (440GB)
    12 |     10157883392 |       154997 |      80.73 |    1291.64 |    3.093 |     4.040 | E:\iotest.dat (440GB)
    13 |     10157424640 |       154990 |      80.72 |    1291.58 |    3.093 |     3.934 | E:\iotest.dat (440GB)
    14 |     10177937408 |       155303 |      80.89 |    1294.19 |    3.087 |     3.978 | E:\iotest.dat (440GB)
    15 |     10223681536 |       156001 |      81.25 |    1300.00 |    3.073 |     3.642 | E:\iotest.dat (440GB)
total:      161863434240 |      2469840 |    1286.37 |   20581.94 |    3.106 |     3.955

Remember: The I/Os are recorded in whatever blocksize the test specified. In the case above, the I/Os are 64K I/Os.
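
As a sanity check, the totals row can be reproduced from the block size and test duration. The Python sketch below assumes diskspd's MB/s column uses 1,048,576-byte megabytes (that is what matches the figures above) and uses the nominal 120 second duration, so the results differ very slightly from the reported values. It also cross-checks the IOPS figure against the "queue depth divided by latency" relationship described earlier.

```python
# Figures taken from the "total:" row and the input parameters above
total_bytes = 161_863_434_240
total_ios = 2_469_840
block_size = 64 * 1024          # -b64K
duration_s = 120.0              # -d120 (nominal test time)
threads, outstanding_per_thread = 16, 4   # -t16 -o4
avg_latency_ms = 3.106

# Every I/O is one 64K block, so bytes = I/Os x block size
bytes_from_ios = total_ios * block_size

# Assumption: the MB/s column is in units of 1,048,576 bytes
mb_per_s = total_bytes / duration_s / (1024 * 1024)
iops = total_ios / duration_s

# IOPS estimated as total queue depth / average latency
iops_estimate = threads * outstanding_per_thread * 1000.0 / avg_latency_ms

print(bytes_from_ios == total_bytes)   # True
print(round(mb_per_s, 2), round(iops, 2), round(iops_estimate))
```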
Last, but not least are the latency measurements:
  %-ile |  Read (ms) | Write (ms) | Total (ms)
    min |      0.535 |      0.729 |      0.535
   25th |      2.531 |      3.446 |      2.565
   50th |      2.796 |      3.792 |      2.849
   75th |      3.088 |      4.227 |      3.211
   90th |      3.439 |      4.743 |      3.745
   95th |      3.763 |      5.179 |      4.169
   99th |      4.818 |      6.761 |      5.274
3-nines |     38.694 |     42.926 |     39.374
4-nines |    207.585 |    209.483 |    207.734
5-nines |    208.562 |    210.939 |    209.483
6-nines |    209.058 |    211.330 |    210.939
7-nines |    209.256 |    211.330 |    211.330
8-nines |    209.256 |    211.330 |    211.330
9-nines |    209.256 |    211.330 |    211.330
    max |    209.256 |    211.330 |    211.330

This last section shows the latency percentile distribution of the test results from the minimum to the maximum value in milliseconds, split into reads, writes and total latency. It’s essential to know how the storage will perform and respond under load, so this section should be examined carefully. The “n-nines” in the ‘%-ile’ column refers to the number of nines, where 3-nines means 99.9%, 4-nines means 99.99% etc. If you want to accurately measure the higher percentiles, you should run longer duration tests that generate a larger number of I/O operations.
What you want to look for in the latency results is the point at which the values make a large jump. In this test, 99% of the reads had a latency of 4.818 milliseconds or less, but if we go higher, 99.9% of the reads had a latency of 38.694 milliseconds or less.
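
One way to spot that jump automatically is to compare each percentile with the one before it. The Python sketch below uses the read latencies from the table above and reports the largest relative increase between consecutive rows, skipping the "min" row; using a ratio rather than an absolute difference is just one reasonable choice, not something diskspd does for you.

```python
# Read latency (ms) per percentile, copied from the table above
read_latency_ms = [
    ("min", 0.535), ("25th", 2.531), ("50th", 2.796), ("75th", 3.088),
    ("90th", 3.439), ("95th", 3.763), ("99th", 4.818), ("3-nines", 38.694),
    ("4-nines", 207.585), ("5-nines", 208.562), ("6-nines", 209.058),
    ("7-nines", 209.256), ("8-nines", 209.256), ("9-nines", 209.256),
    ("max", 209.256),
]

# Skip "min": the step from the minimum to the 25th percentile is not a tail effect
rows = read_latency_ms[1:]

# Ratio of each percentile's latency to the previous percentile's latency
jumps = [
    (prev_name, name, value / prev_value)
    for (prev_name, prev_value), (name, value) in zip(rows, rows[1:])
]

jump_from, jump_to, ratio = max(jumps, key=lambda j: j[2])
print(f"Largest jump: {jump_from} -> {jump_to} ({ratio:.1f}x)")
```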

