This reduces the overhead of the method instrumentation code primarily
by reducing the number of method calls. There are also some other small
optimisations such as not casting timing values to Floats (there's no
particular need for this), using Symbols for method call metric names,
and reducing the number of Hash lookups for instrumented methods.
The exact impact depends on the code being executed. For example, for a
method that's only called once the difference won't be very noticeable.
However, for methods that are called many times the difference can be
more significant.
For example, the loading time of a large commit
(nrclark/dummy_project@81ebdea5df)
was reduced from around 19 seconds to around 15 seconds using these
changes.
Process.clock_gettime allows getting the real time in nanoseconds as
well as allowing one to get a monotonic timestamp. This offers greater
accuracy without the overhead of having to allocate a Time instance. In
general using Time.now/Time.new is about 2x slower than using
Process.clock_gettime(). For example:
require 'benchmark/ips'
Benchmark.ips do |bench|
bench.report 'Time.now' do
Time.now.to_f
end
bench.report 'clock_gettime' do
Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
end
bench.compare!
end
Running this benchmark gives:
Calculating -------------------------------------
Time.now 108.052k i/100ms
clock_gettime 125.984k i/100ms
-------------------------------------------------
Time.now 2.343M (± 7.1%) i/s - 11.670M
clock_gettime 4.979M (± 0.8%) i/s - 24.945M
Comparison:
clock_gettime: 4979393.8 i/s
Time.now: 2342986.8 i/s - 2.13x slower
Another benefit of using Process.clock_gettime() is that we can simplify
the code a bit since it can give timestamps in nanoseconds out of the
box.
Previously we'd create a separate Metric instance for every method call
that would exceed the method call threshold. This is problematic because
it doesn't provide us with information to accurately get the _total_
execution time of a particular method. For example, if the method
"Foo#bar" was called 4 times with a runtime of ~10 milliseconds we'd end
up with 4 different Metric instances. If we were to then get the
average/95th percentile/etc of the timings this would be roughly 10
milliseconds. However, the _actual_ total time spent in this method
would be around 40 milliseconds.
To solve this problem we now create a single Metric instance per method.
This Metric instance contains the _total_ real/CPU time and the call
count for every instrumented method.