One of the most confusing performance metrics in vCenter Server is the CPU “Ready” metric. Part of the reason is that this metric is reported as a “Summation in milliseconds”. In addition, the threshold changes depending on whether you are looking at “Real-time”, “Past Day”, “Past Week”, “Past Month” or “Past Year”. For example, a 2,000 millisecond summation for CPU “Ready” per vCPU in the Real-time view indicates an issue; the equivalent value in the “Past Day” view would be 30,000.
Ready – If the value per vCPU is above 10%, the VM will start to see performance degradation.
Quick Reference guide (Ready value equivalent to 10% in ESXTop per vCPU)
RealTime: 2000
Past Day: 30,000
Past Week: 180,000
Past Month: 720,000
Past Year: 8,640,000
Why does the threshold change for different views in vCenter?
When you bring up the “RealTime” view for a CPU, you are looking at one sample taken every 20 seconds. In the “Past Day” view it is one sample every 5 minutes, in “Past Week” one sample every 30 minutes, and so on. The sample duration changes how the CPU “Ready” summation is reported, so we need to apply some math to get an equivalent %RDY for each view.
In ESXTop this metric is reported as %RDY, which is much simpler to interpret. The problem with ESXTop is that it only reports real-time stats. If a user complains that his application had a performance issue yesterday at 9PM, you have to rely on the vCenter Server performance metrics to see whether %RDY was in fact high at that time and whether we should blame the VMware environment for the slowdown. There are a number of posts that discuss what this number should be, but the general consensus is that if a VM’s %RDY is above 10% per vCPU, the VM will start to see performance degradation.
Example:
When I look at the Past Day performance charts for one of the VMs, vCenter reports 15,000 milliseconds for the CPU “Ready” metric. I know that the Past Day view takes a sample of the CPU every 5 minutes, which is 300 seconds or 300,000 milliseconds. All I want to know is whether the value is good or bad, and finding out takes some math. For Past Day the math is (15,000/300,000)*100 = 5%. This number is OK for most applications and is below the 10% threshold per vCPU.
If the VM has 2 vCPUs, the above would mean that each vCPU is reporting 2.5% RDY, which is well below the 10% threshold per vCPU. So the threshold that would indicate an issue for a 2 vCPU VM is 20% %RDY for the whole VM in ESXTop.
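The conversion in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic above: the helper name ready_percent and the view keys are made up for this sketch, the sample intervals are the ones quoted in this post, and it assumes the summation you read from the chart is the value for the whole VM.

```python
# Sample interval per vCenter chart view, in seconds (values quoted in this post).
SAMPLE_INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}


def ready_percent(summation_ms, view, vcpus=1):
    """Convert a CPU Ready summation (ms) into an average %RDY per vCPU.

    Hypothetical helper: assumes summation_ms is the total for the whole VM.
    """
    interval_ms = SAMPLE_INTERVAL_SECONDS[view] * 1000
    return summation_ms * 100.0 / (interval_ms * vcpus)


print(ready_percent(15000, "past_day"))           # 5.0 (the example above)
print(ready_percent(15000, "past_day", vcpus=2))  # 2.5 per vCPU on a 2 vCPU VM
```

Anything that comes back above 10 for a single vCPU would be over the rule-of-thumb threshold used in this post.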
The sample intervals for each view can be checked in the vSphere Client under Administration > vCenter Server Settings.
Thresholds per vCPU (10%)
RealTime: 2000
Past Day: 30,000
Past Week: 180,000
Past Month: 720,000
Past Year: 8,640,000
vCenter “Ready” metric (10% RDY thresholds)
Realtime   (20 sec) >> 20,000 milliseconds     | Math --> (2,000/20,000)*100 = 10% RDY
Past Day   (5 min)  >> 300,000 milliseconds    | Math --> (30,000/300,000)*100 = 10% RDY
Past Week  (30 min) >> 1,800,000 milliseconds  | Math --> (180,000/1,800,000)*100 = 10% RDY
Past Month (2 hr)   >> 7,200,000 milliseconds  | Math --> (720,000/7,200,000)*100 = 10% RDY
Past Year  (1 day)  >> 86,400,000 milliseconds | Math --> (8,640,000/86,400,000)*100 = 10% RDY
I will use Realtime as an example. If a VM has 2 vCPUs and vCenter is reporting 2,000 for the “Ready” metric for the whole VM, this means that each vCPU is reporting 5% RDY.
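Going the other way, the threshold values in the table can be derived from the sample interval instead of memorized. The sketch below is a hypothetical helper (ready_threshold_ms is not a VMware API, just a name chosen here) that applies the same 10% rule of thumb and scales it by the number of vCPUs when the summation is for the whole VM.

```python
def ready_threshold_ms(interval_seconds, threshold_percent=10, vcpus=1):
    """Return the Ready summation (ms) that equals threshold_percent %RDY per vCPU.

    Hypothetical helper: interval_seconds is the chart's sample interval,
    and the result is the whole-VM summation for a VM with `vcpus` vCPUs.
    """
    interval_ms = interval_seconds * 1000
    return interval_ms * threshold_percent * vcpus / 100


# Sample intervals in seconds, as listed in the table above.
views = [
    ("Realtime", 20),
    ("Past Day", 300),
    ("Past Week", 1800),
    ("Past Month", 7200),
    ("Past Year", 86400),
]

for name, seconds in views:
    print(name, ready_threshold_ms(seconds))
```

For a 2 vCPU VM in the Realtime view, ready_threshold_ms(20, vcpus=2) gives 4000.0, so a whole-VM reading of 2,000 is only half of that limit, which matches the 5% per vCPU figure above.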
References:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2002181