Understanding Historical “CPU Ready” statistics in VMware vCenter server

One of the most confusing performance metrics in vCenter Server is the CPU “Ready” metric. Part of the reason is that this metric shows up as a “Summation in milliseconds”. In addition, the threshold changes depending on whether you are looking at “Real-time”, “Past Day”, “Past Week”, “Past Month” or “Past Year”. For example, a 2,000 millisecond summation for CPU “Ready” per vCPU in the Real-time view indicates an issue; the equivalent value in the “Past Day” view would be 30,000.

Ready: if the value per vCPU is above 10%, the VM will start to see performance degradation.

Quick reference guide (Ready value in milliseconds equivalent to 10% %RDY in ESXTop, per vCPU)
RealTime:    2,000
Past Day:    30,000
Past Week:   180,000
Past Month:  720,000
Past Year:   8,640,000


Why does the threshold change for different views in vCenter?

When you bring up the “RealTime” view for a CPU, what you are looking at is one sample taken every 20 seconds. In the “Past Day” view it is one sample every 5 minutes, in “Past Week” one every 30 minutes, and so on. Because the “Ready” summation is the ready time added up over one sample interval, the length of the interval changes the value that gets reported, and we need to apply some math to get an equivalent %RDY for the different views.


In ESXTop this metric is reported as %RDY, which is much simpler to interpret. The problem with ESXTop is that it only reports real-time stats. If a user complains that their application had a performance issue yesterday at 9 PM, you have to rely on the vCenter Server performance metrics to see whether %RDY was in fact high at that time and whether the VMware environment is to blame for the slowdown. There are a number of posts that discuss what this number should be, but the general consensus is that if a VM’s %RDY metric is above 10% the VM will start to see performance degradation.


Example:

When I look at the Past Day performance chart for one of the VMs, vCenter reports 15,000 milliseconds for the CPU “Ready” metric. I know that Past Day takes a sample of the CPU every 5 minutes, which is 300 seconds or 300,000 milliseconds. All I want to know is whether the value is good or bad, and finding out takes some math. For Past Day the math is (15,000/300,000)*100 = 5%. This number is OK for most applications and is below the 10% threshold per vCPU.
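The calculation above can be sketched as a small Python helper. This is only an illustration of the arithmetic in this post: the function name `ready_percent` and the `INTERVAL_SECONDS` table are made up for the example and are not part of any VMware tool or API.

```python
# Sample interval length in seconds for each vCenter chart view,
# as described in this post (names are hypothetical, for illustration only).
INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,       # 5 minutes
    "past_week": 1800,     # 30 minutes
    "past_month": 7200,    # 2 hours
    "past_year": 86400,    # 1 day
}


def ready_percent(ready_ms, view):
    """Convert a CPU Ready summation (milliseconds) into a percentage
    of the sample interval used by the given chart view."""
    interval_ms = INTERVAL_SECONDS[view] * 1000
    return ready_ms * 100 / interval_ms


# The example from the text: 15,000 ms reported in the Past Day view.
print(ready_percent(15000, "past_day"))  # 5.0
```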

If the VM has 2 vCPUs, the above would mean that each vCPU is reporting 2.5% %RDY on average, which is well below the 10% threshold per vCPU. So the threshold that would indicate an issue for a 2 vCPU VM would be 20% %RDY in ESXTop.
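As a rough sketch of the per-vCPU step, the VM-wide percentage can simply be divided by the number of vCPUs. This assumes the value read from the chart is the total for the whole VM and that ready time is spread evenly across vCPUs, which will not always be true; the function name and parameters are hypothetical.

```python
def ready_percent_per_vcpu(ready_ms, interval_seconds, num_vcpus):
    """Average %RDY per vCPU for a VM-wide CPU Ready summation.

    ready_ms         -- Ready summation for the whole VM, in milliseconds
    interval_seconds -- sample interval of the chart view (300 for Past Day)
    num_vcpus        -- number of vCPUs configured on the VM
    """
    vm_percent = ready_ms * 100 / (interval_seconds * 1000)
    # Assumption: ready time is spread evenly across the vCPUs.
    return vm_percent / num_vcpus


# 15,000 ms in the Past Day view on a 2 vCPU VM.
print(ready_percent_per_vcpu(15000, 300, 2))  # 2.5
```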

The statistics settings can be found in the vSphere client under Administration > vCenter Server Settings.

The Math:

Thresholds per vCPU (10%)
RealTime:      2000
Past Day:       30,000
Past Week:    180,000
Past Month:  720,000
Past Year:      8,640,000

vCenter “Ready” metric (10% RDY thresholds)
Realtime (20 sec) >> 20,000 milliseconds | Math–> (2,000/20,000)*100 = 10% RDY
Past Day (5 min) >> 300,000 milliseconds | Math–> (30,000/300,000)*100 = 10% RDY
Past Week (30 min) >> 1,800,000 milliseconds | Math–> (180,000/1,800,000)*100 = 10% RDY
Past Month (2 hr) >> 7,200,000 milliseconds | Math–> (720,000/7,200,000)*100 = 10% RDY
Past Year (1 day) >> 86,400,000 milliseconds | Math–> (8,640,000/86,400,000)*100 = 10% RDY
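The threshold list can also be derived rather than memorized. The sketch below works backwards from a target percentage to the millisecond value you would see in each view, using the sample intervals listed above. The names are hypothetical, and the 10% default is just the rule of thumb used in this post.

```python
# (view name, sample interval in seconds), as listed in this post.
INTERVALS = [
    ("RealTime", 20),
    ("Past Day", 300),
    ("Past Week", 1800),
    ("Past Month", 7200),
    ("Past Year", 86400),
]


def ready_threshold_ms(interval_seconds, percent=10):
    """Ready summation in milliseconds that corresponds to `percent`
    %RDY per vCPU for a sample interval of the given length."""
    return interval_seconds * 1000 * percent // 100


for name, seconds in INTERVALS:
    print(f"{name}: {ready_threshold_ms(seconds):,}")
```

Passing a different `percent` gives the equivalent list for a stricter or looser threshold, for example `ready_threshold_ms(300, percent=5)` for a 5% limit in the Past Day view.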

I will use Realtime as an example. If a VM has 2 vCPUs and vCenter is reporting 2,000 for the “Ready” metric for the whole VM, this means that each vCPU is reporting 5% RDY on average.

References:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2002181

 
