Analyzing Storage Performance issues with PerfMon

Perfmon is a built-in Windows utility that measures performance. It was introduced in Windows NT 3.1 and has shipped with every NT-based version of Windows since. I think every admin should become a little more familiar with it, as it is a really useful tool for troubleshooting performance.

My place of work has a big VMware environment, so I am used to looking at performance issues through ESXTOP, the vSphere Client performance charts or Task Manager. On a few occasions I set up a perfmon counter to examine memory, but since VMware offers a lot of the same counters through the performance charts, running perfmon is usually not required.

I recently had to get more familiar with perfmon as all of the SQL servers that we manage are physical. Perfmon and task manager are the only two ways that I can get a little bit more detail regarding the health of the server. I’m making this post for my own reference and for anyone else who wants to learn more about it.

Physical vs Logical Counters

In perfmon you will see the same counters for Physical and Logical disks, so it’s important to know when to use the right one. PhysicalDisk counters report on a disk device as Windows sees it, while LogicalDisk counters report on a volume such as C:. One example of a logical disk would be a RAID-5 configuration that consists of 6 disks, partitioned to be drive C:. This is a logical volume, and it is pretty typical of a server configuration. An example of a physical disk would be your laptop or desktop PC: you have one 2 TB disk, you installed Windows on it, and you partitioned the whole disk to be C:, so the physical and logical counters describe the same disk.

Perfmon Counters to Measure Disk Response Times:

  • Latency
    • \PhysicalDisk\Avg. Disk sec/Read
    • \PhysicalDisk\Avg. Disk sec/Write
    • \PhysicalDisk\Avg. Disk sec/Transfer
  • IOPS
    • \PhysicalDisk\Disk Reads/sec
    • \PhysicalDisk\Disk Writes/sec
    • \PhysicalDisk\Disk Transfers/sec
  • Throughput
    • \PhysicalDisk\Disk Read Bytes/sec
    • \PhysicalDisk\Disk Write Bytes/sec
    • \PhysicalDisk\Disk Bytes/sec
  • Fragmentation
    • \PhysicalDisk\Split IO/sec
  • Disk Queue Length
    • \PhysicalDisk\Avg. Disk Queue Length

The Short Version:

Latency: Avg. Disk sec/Transfer (Scale 1000) – Measures how long it takes read & write commands to complete. If disk latency stays at a constant 25 ms or higher you will start to experience performance issues.

IOPS: Disk Transfers/sec (Scale 1) – Measures the total number of read and write (input/output) operations per second (IOPS). As a rule of thumb, a single 7,200 RPM disk has about 100 IOPS available, a 10,000 RPM disk 150 IOPS, and a 15,000 RPM disk 200 IOPS. A RAID-5 logical volume made up of (6) 10,000 RPM disks has 900 IOPS available. If the application generates more IOPS than the disks can deliver, performance will suffer.

Throughput: Disk Bytes/sec (Scale 0.000001 to see MB/s) – Measures how much data is being sent through the channel (SCSI connection, network, bus). Depending on the connection you may reach a throughput limit, where you are sending so much data that it saturates the link. A Gigabit connection can handle file transfers up to about 120 MB/s. Is this being exhausted? Is your iSCSI connection going over a Gigabit network?

Fragmentation: Split IO/sec (Scale 1) – Shows whether the operating system has to perform more than one command for each I/O because the file is split too far apart on the disk. Split I/O is a good indicator of disk fragmentation; run Disk Defragmenter to reduce seek time.

Disk Queue Length: Avg. Disk Queue Length (Scale 1) – This indicates the length of the queue when transactions are waiting to be read from or written to the disk. This metric reports the number of requests that were waiting when the counter was measured, including the ones being serviced. The threshold is the number of spindles + 2. What this metric really indicates is that there are disk transactions in the queue.


Disk latency is the measure of how long it takes a storage command to complete. You can find the threshold numbers in the below chart that indicate whether the latency is acceptable. The latency thresholds are somewhat up for debate. If a user is working on a server and the latency reaches 25 ms, they may start to notice the slowness. On the other hand, if a standalone server is processing data with 30 ms latency, it may go unnoticed.

The disk latency counters report latency in seconds, but with millisecond accuracy. What this means is that when you look at the perfmon screenshot below and locate the fields (Latest, Average, Minimum, Maximum), you will find that the numbers are reported in seconds. For example, the average below is reported as 0.049 seconds, and at first glance I’m not sure if this is good or bad latency. The first question is how many milliseconds this is: 0.049 seconds translates to 49 ms, so now I know this is a little high according to the below threshold numbers. To get meaningful millisecond numbers to show up on the chart, you will have to adjust the scale of the counter to 1000 if it’s not set already.

The following perfmon counters measure physical disk latency

Disk Latency
\PhysicalDisk(*)\Avg. Disk sec/Transfer (Measures Read/Write)
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write

Disk Latency Thresholds
Excellent: below 10 ms
Good: below 15 ms
Ok: below 20 ms
Slow: below 25 ms
Performance issue: 25 ms and up (Some may argue 15 and up)
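
The conversion and the thresholds above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name classify_latency is made up for this example and is not part of perfmon.

```python
def classify_latency(avg_disk_sec: float) -> str:
    """Map an 'Avg. Disk sec/Transfer' sample (in seconds) to the labels above."""
    ms = avg_disk_sec * 1000  # perfmon reports seconds; multiply by 1000 for ms
    if ms < 10:
        return "Excellent"
    if ms < 15:
        return "Good"
    if ms < 20:
        return "Ok"
    if ms < 25:
        return "Slow"
    return "Performance issue"


# Values taken from the examples in this post.
for sample in (0.009, 0.049, 0.153, 0.238):
    print(f"{sample:.3f} s = {sample * 1000:.0f} ms -> {classify_latency(sample)}")
```

The cut-off values are the ones from the chart above; if you agree with the "15 ms and up" camp, adjust them to taste.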

Generating Latency

In the below screenshot I generated disk load with a program called “HeavyLoad”, a utility that does random writes to a temp file. I started the program and watched the latency rise. At the beginning I can see the disk latency start to climb close to 50 ms, then spike to around 130 ms. After that the program really starts to generate load on the disk and we peak at 230 ms. When it reached 230 ms latency I could barely stop the disk load program as it was lagging so much.

Latest: 0.153 seconds equals 153 ms
Average: 0.043 seconds equals 43 ms
Maximum: 0.238 seconds equals 238 ms
Another Example: 0.009 seconds equals 9 ms

We can also see the same information for Read and Write separately if we want to see what is contributing to the latency numbers.

Disk Latency for Avg. Disk sec/Write

Latency on a Production SQL Server

Here is what latency numbers look like on one of the production SQL servers. In this case, every morning around 10AM the client runs a report, which causes the SQL server to read from the disk and spikes the latency.

For about 15 minutes the latency goes up to around 70 ms, and then starting from 10:30AM the latency returns to the normal 20 to 30 ms. This is a little high, but the underlying storage is very busy: it is a NetApp array that is also hosting a lot of virtual machines. In this case, even though the latency spikes to where you would expect to see a performance issue, the reports get generated and the amount of time it takes to do that is acceptable.

10AM – Latency jumps to 70ms for around 10 minutes
10:35AM – Latency goes back to normal 25 – 30 ms


Disk Transfers/sec measures the number of read and write operations on the disk. This metric is important because hard disks can only execute a certain number of commands per second before they reach their limit. The below metrics capture the input/output operations on the disk.

\PhysicalDisk(*)\Disk Transfers/Sec (Measures Read/Write)
\PhysicalDisk(*)\Disk Reads/Sec
\PhysicalDisk(*)\Disk Writes/Sec

I read an interesting article, KB1031773, a while ago regarding disk performance and IOPS. The article mentioned that a single VM requires a minimum of 30 IOPS in order to run. This is probably one VM doing nothing or running very small tasks. I have run VMs before on 7,200 RPM disks, and 3 VMs on one disk seems about right before you start to see slowness.

Disk Speed (RPM) IOPS available
7,200 RPM 100
10,000 RPM 150
15,000 RPM 230

There is a formula used to calculate the total IOPS available on a set of disks. If you have a logical RAID volume made up of (6) 10,000 RPM disks, then you have 150 x 6 = 900 IOPS available. This means that you can power up 30 VMs and each one will have 30 IOPS. In reality, if you want the VMs to do any work, you may want to bring that down to something like 15 VMs.
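
The same arithmetic as a small Python sketch. The per-disk figures come from the table above; the function names are made up for this example. It deliberately ignores RAID write penalties and caching, so treat the result as a rough ceiling rather than a sizing tool.

```python
# Rule-of-thumb IOPS per disk, keyed by RPM (figures from the table above).
IOPS_PER_DISK = {7200: 100, 10000: 150, 15000: 230}


def raw_iops(disk_count: int, rpm: int) -> int:
    """Total raw IOPS for a set of identical disks (no RAID penalty applied)."""
    return disk_count * IOPS_PER_DISK[rpm]


def max_vms(total_iops: int, iops_per_vm: int = 30) -> int:
    """How many VMs fit if each one is budgeted iops_per_vm IOPS."""
    return total_iops // iops_per_vm


total = raw_iops(6, 10000)
print("Raw IOPS:", total)
print("VMs at 30 IOPS each:", max_vms(total))
# Assumption for illustration: doubling the per-VM budget to 60 IOPS
# is one way to arrive at the more realistic 15 VMs mentioned above.
print("VMs at 60 IOPS each:", max_vms(total, iops_per_vm=60))
```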

Create a Custom ‘IOPS’ Collector:
Start > Run > Perfmon.exe
Data Collector Sets > User Defined, right-click New > Data Collector Set
Name “IOPS”, Create manually (Advanced), Next
Create data logs > Performance counter, Next
Performance counters > Add the 3 IOPS counters,
for LAB adjust Sample Interval to 1 second,
for PROD adjust Sample Interval to 10 seconds (Average will be used)
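
Once the collector has run, the average can also be worked out outside of perfmon. The sketch below assumes the log has been saved as or converted to CSV (collectors log to binary .blg by default, and the relog utility can convert them), and that the first column is the timestamp with one column per counter after it. The server name SERVER01 and the sample values are made up for illustration only.

```python
import csv
import io
from statistics import mean

# Hypothetical three-sample log in the assumed perfmon CSV layout.
SAMPLE_LOG = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\SERVER01\PhysicalDisk(_Total)\Disk Transfers/sec"
"01/01/2024 10:00:00.000","90.0"
"01/01/2024 10:00:10.000","110.0"
"01/01/2024 10:00:20.000","94.0"
'''


def counter_averages(csv_text: str) -> dict:
    """Return {counter path: average value} for every counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header[1:]}  # skip the timestamp column
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(value))
            except ValueError:
                continue  # skip blank or non-numeric samples
    return {name: mean(values) for name, values in columns.items() if values}


for counter, average in counter_averages(SAMPLE_LOG).items():
    print(f"{counter}: {average:.1f}")
```

To use it on a real log, read the CSV file into a string and pass it to counter_averages instead of SAMPLE_LOG.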

Generating IOPS

In the below test I am running the ‘CrystalDiskMark’ and ‘HeavyLoad’ utilities at the same time in order to generate IO; I thought running both of them together would be a good test. I know IOMeter is what a lot of admins use, but I had no time for configuration and needed something quick.

The below test is on a Windows Server 2012 R2 VM running on a single 7,200 RPM disk. Even though I see some spikes in IOPS that go up into the 200s, the average reported by perfmon is 98 IOPS, which is very close to what a 7,200 RPM disk is capable of (max 100 IOPS for a 7,200 RPM disk). I was surprised to see that the disk is capable of reaching 200 IOPS, but from what I was able to find this is pretty common. When manufacturers rate disks for IOPS they use the worst-case scenario, where the tests are run with random reads/writes and the disk head has to travel across the entire platter to read the data, increasing access time. For sequential reads/writes, where the head doesn’t have to travel as far, it is common to see higher IOPS than the disk is rated for.

IOPS on a 7200 RPM disk (avg 98 IOPS)


I always hear people mix up the terminology between Throughput and Bandwidth, so here is a quick explanation. Bandwidth is how much data we can send through a channel, and Throughput is how much data we are actually sending down the channel. On a Gigabit network the theoretical limit is 125 MB/s (1,000 Mbit/s divided by 8), and in practice we can send about 120 MB/s worth of information before we saturate the link. If you have an iSCSI disk connected over a Gigabit network, then your maximum throughput might be 120 MB/s.

Metrics that measure disk throughput

Disk Throughput
\PhysicalDisk(*)\Disk Bytes/sec (Measures Read/Write)
\PhysicalDisk(*)\Disk Read Bytes/sec
\PhysicalDisk(*)\Disk Write Bytes/sec

Create a Custom ‘Throughput’ Collector:
Start > Run > Perfmon.exe
Data Collector Sets > User Defined, right-click New > Data Collector Set
Name “Throughput”, Create manually (Advanced), Next
Create data logs > Performance counter, Next
Performance counters > Add the 3 Throughput counters,
for LAB environment Sample Interval 1 second,
for PROD Sample Interval 10 seconds (Average will be used)
Start the User Defined collector
View details under Reports

Generating Throughput

To be able to see the details in MB/s you need to adjust the scale for the above counters to 0.000001. I ran the ‘CrystalDiskMark’ tool to generate load and I could see throughput jumping from 80 MB/s to 115 MB/s.
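
Here is the scale factor as a quick Python sketch, together with a comparison against a link limit. The 120 MB/s default is the practical Gigabit figure used in this post; the function names are made up for this example.

```python
GIGABIT_PRACTICAL_MB_S = 120  # practical Gigabit limit used in this post


def bytes_to_mb_per_sec(disk_bytes_per_sec: float) -> float:
    """Apply the same 0.000001 scale that is set on the perfmon counter."""
    return disk_bytes_per_sec * 0.000001


def link_utilization(disk_bytes_per_sec: float,
                     link_limit_mb_s: float = GIGABIT_PRACTICAL_MB_S) -> float:
    """Fraction of the link consumed by the measured throughput."""
    return bytes_to_mb_per_sec(disk_bytes_per_sec) / link_limit_mb_s


# Raw 'Disk Bytes/sec' values matching the 80 and 115 MB/s seen in the test.
for raw in (80_000_000, 115_000_000):
    print(f"{raw} B/s = {bytes_to_mb_per_sec(raw):.0f} MB/s, "
          f"{link_utilization(raw):.0%} of a Gigabit link")
```

The link comparison only makes sense when the disk actually sits behind a network link, such as an iSCSI disk over Gigabit.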

The below data was gathered on a 7,200 RPM disk. Although it is not shown on the graph, I could also view the counter for Read throughput, and it matched what I was seeing in the CrystalDiskMark software, which was around 80 MB/s.

Split I/O

Split IO/Sec reports the rate at which I/Os to the disk were split into multiple I/Os. A split I/O may result from requesting data of a size that is too large to fit into a single I/O, or from the disk being fragmented.

Split I/O
\PhysicalDisk\Split IO/sec

In other words, here is a simplified example. When you first create a file, let’s say it is 1 GB. Then you create other files that take up the sectors on the disk next to this file. Then you write more data to the original file, but the sectors next to it are taken, so that data needs to be written somewhere else, say at the end of the disk. So now you have 1 GB at the beginning of the disk and another 1 GB at the end of the disk. The OS has to execute two I/O commands to reach the data. Run Disk Defragmenter to lower the Split IO metric.

Split I/O

Disk Queue Length

This indicates the length of the queue when transactions are waiting to be read from or written to the disk. This metric reports the number of requests that were waiting when the counter was measured, including the ones being serviced. The threshold is the number of spindles + 2. If you have a RAID-0 made up of 5 disks (spindles), then anything above 7 indicates the disk I/O is queuing up.

Disk Queue Length
\PhysicalDisk(*)\Avg. Disk Queue Length
Avg. Disk Queue Length (Scale 1)
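
The spindles + 2 rule as a tiny Python sketch. The function names are made up for this example, and the rule itself is only the rule of thumb described above.

```python
def queue_length_threshold(spindles: int) -> int:
    """Rule of thumb from this post: number of spindles + 2."""
    return spindles + 2


def is_queuing(avg_disk_queue_length: float, spindles: int) -> bool:
    """True when the measured queue length is above the threshold."""
    return avg_disk_queue_length > queue_length_threshold(spindles)


# The 5-disk RAID-0 example from above: the threshold is 7.
print(queue_length_threshold(5))
print(is_queuing(8.0, spindles=5))
print(is_queuing(3.5, spindles=5))
```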
