mirror of https://github.com/mperham/sidekiq.git synced 2022-11-09 13:52:34 -05:00

Better scheduling for large clusters, fixes #3889

Today we add 50% to the sleep time so that processes cluster around the target time:

50% --> target <-- 150%

This works well for small clusters, e.g. fewer than 10 processes.

The problem is that, for large clusters, no process will ever sleep less than 50% of (process_count * poll average), which breaks the average and can delay job scheduling by several minutes.

Instead, beyond 10 processes, don't add that 50% buffer.  Allow the processes to sleep anywhere within the timespan:

0% --> target <-- 200%

With many processes, the average sleep within a 200% time period should work out close enough to 100%.
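
A rough sketch of the two strategies, for illustration only. The method names `clustered_interval` and `spread_interval` are hypothetical and do not appear in Sidekiq; `average` stands in for (process_count * poll average). Both ranges are centered on the target, but only the second lets a process wake up early:

```ruby
# Hypothetical helpers, not part of Sidekiq.
# `average` plays the role of process_count * poll average.

# 50% --> target <-- 150%
def clustered_interval(average, rng = Random.new)
  average * rng.rand + average.to_f / 2
end

# 0% --> target <-- 200%
def spread_interval(average, rng = Random.new)
  average * rng.rand * 2
end

rng = Random.new(42)
average = 20 * 5 # e.g. 20 processes, 5 second poll average

clustered = Array.new(10_000) { clustered_interval(average, rng) }
spread    = Array.new(10_000) { spread_interval(average, rng) }

# Compare the shortest sleep and the mean sleep of each strategy.
puts "clustered: min=#{clustered.min.round(1)} mean=#{(clustered.sum / clustered.size).round(1)}"
puts "spread:    min=#{spread.min.round(1)} mean=#{(spread.sum / spread.size).round(1)}"
```

The clustered minimum can never drop below half of `average`, which is the floor described above; the spread minimum can approach zero while the mean stays near `average`.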
Mike Perham 2018-07-18 10:02:38 -07:00
parent 2483e9d56d
commit 8a589d68fe


@@ -97,9 +97,34 @@ module Sidekiq
         sleep 5
       end

-      # Calculates a random interval that is ±50% the desired average.
       def random_poll_interval
-        poll_interval_average * rand + poll_interval_average.to_f / 2
+        # We want one Sidekiq process to schedule jobs every N seconds. We have M processes
+        # and **don't** want to coordinate.
+        #
+        # So in N*M second timespan, we want each process to schedule once. The basic loop is:
+        #
+        # * sleep # a random amount within that N*M timespan
+        # * wake up, schedule
+        #
+        # There are pathological edge cases:
+        #
+        # Imagine a set of 4 processes, scheduling every 5 seconds, so N*M = 20. Each process
+        # decides to randomly sleep 18 seconds; now we've failed to meet that 5 second average.
+        # Thankfully each schedule cycle will sleep randomly so the next iteration could see each
+        # process sleep for 1 second, undercutting our average.
+        #
+        # So below 10 processes, we special case and ensure the processes sleep closer to the average.
+        # As we run more processes, the scheduling interval average should approach the desired
+        # amount.
+        #
+        if process_count < 10
+          # For small clusters, calculate a random interval that is ±50% the desired average.
+          poll_interval_average * rand + poll_interval_average.to_f / 2
+        else
+          # With 10+ processes, we should have enough randomness to get decent polling
+          # across the entire timespan
+          poll_interval_average * rand * 2
+        end
       end

       # We do our best to tune the poll interval to the size of the active Sidekiq
@@ -123,9 +148,13 @@ module Sidekiq
       # This minimizes a single point of failure by dispersing check-ins but without taxing
       # Redis if you run many Sidekiq processes.
       def scaled_poll_interval
+        process_count * Sidekiq.options[:average_scheduled_poll_interval]
+      end
+
+      def process_count
         pcount = Sidekiq::ProcessSet.new.size
         pcount = 1 if pcount == 0
-        pcount * Sidekiq.options[:average_scheduled_poll_interval]
+        pcount
       end

       def initial_wait