gitlab-org--gitlab-foss/lib/gitlab/storage_check/cli.rb
Bob Van Landuyt f1ae1e39ce Move the circuitbreaker check out in a separate process
Moving the check out of the general requests ensures that regular
requests are not slowed down.

To keep the process performing these checks small, the check is still
performed inside a unicorn, but it is triggered by a separate process
running on the same server.

Because the checks are now done outside normal requests, we can have a
simpler failure strategy:

The check is now performed in the background every
`circuitbreaker_check_interval`. Failures are logged in Redis. The
failures are reset when the check succeeds. Per check we will try
`circuitbreaker_access_retries` times within
`circuitbreaker_storage_timeout` seconds.

When the number of failures exceeds
`circuitbreaker_failure_count_threshold`, we will block access to the
storage.

After `failure_reset_time` of no checks, we will clear the stored
failures. This could happen when the process that performs the checks
is not running.
2017-12-08 09:11:39 +01:00
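
The failure strategy described in the commit message can be illustrated with a minimal sketch. It is not the GitLab implementation: the `FailureTracker` class and its method names are hypothetical, and an in-memory counter stands in for the Redis-backed storage. It only assumes the three rules stated above: a successful check resets the failures, access is blocked once the failure count exceeds the threshold, and stored failures are cleared after `failure_reset_time` seconds without checks.

```ruby
# Hypothetical in-memory stand-in for the Redis-backed failure tracking
# described in the commit message. Names are illustrative only.
class FailureTracker
  attr_reader :failure_count

  def initialize(failure_count_threshold:, failure_reset_time:)
    @failure_count_threshold = failure_count_threshold
    @failure_reset_time = failure_reset_time # seconds
    @failure_count = 0
    @last_check_at = nil
  end

  # Record the outcome of one background check.
  def record_check(success, now = Time.now)
    expire_stale_failures(now)
    @last_check_at = now
    if success
      @failure_count = 0 # a successful check resets the failures
    else
      @failure_count += 1
    end
  end

  # Access is blocked once the failures exceed the threshold.
  def blocked?(now = Time.now)
    expire_stale_failures(now)
    @failure_count > @failure_count_threshold
  end

  private

  # Clear stored failures when no check ran for failure_reset_time seconds,
  # for example because the checking process is not running.
  def expire_stale_failures(now)
    return if @last_check_at.nil?
    return if now - @last_check_at < @failure_reset_time

    @failure_count = 0
  end
end
```

With a threshold of 2, three consecutive failed checks block the storage; a later successful check, or a long enough gap without checks, unblocks it.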


require 'logger'

module Gitlab
  module StorageCheck
    # Command-line runner that calls GitLab in a loop and logs the result.
    class CLI
      def self.start!(args)
        runner = new(Gitlab::StorageCheck::OptionParser.parse!(args))
        runner.start_loop
      end

      attr_reader :logger, :options

      def initialize(options)
        @options = options
        @logger = Logger.new(STDOUT)
      end

      # Runs the check every `options.interval` seconds until interrupted.
      # A dry run only logs the configuration and returns.
      def start_loop
        logger.info "Checking #{options.target} every #{options.interval} seconds"

        if options.dryrun
          logger.info "Dryrun, exiting..."
          return
        end

        begin
          loop do
            response = GitlabCaller.new(options).call!
            log_response(response)
            update_settings(response)

            sleep options.interval
          end
        rescue Interrupt
          logger.info "Ending storage-check"
        end
      end

      # Adopts the interval reported by a valid response, if it provides one.
      def update_settings(response)
        previous_interval = options.interval

        if response.valid?
          options.interval = response.check_interval || previous_interval
        end

        if previous_interval != options.interval
          logger.info "Interval changed: #{options.interval} seconds"
        end
      end

      # Logs responsive shards at debug level; skipped and failing shards
      # are combined into a single warning.
      def log_response(response)
        unless response.valid?
          return logger.error("Invalid response checking nfs storage: #{response.http_response.inspect}")
        end

        if response.responsive_shards.any?
          logger.debug("Responsive shards: #{response.responsive_shards.join(', ')}")
        end

        warnings = []
        if response.skipped_shards.any?
          warnings << "Skipped shards: #{response.skipped_shards.join(', ')}"
        end
        if response.failing_shards.any?
          warnings << "Failing shards: #{response.failing_shards.join(', ')}"
        end

        logger.warn(warnings.join(' - ')) if warnings.any?
      end
    end
  end
end
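
The interval handling in `update_settings` can be tried in isolation with a small sketch. Everything here is a stand-in: `SketchOptions`, `FakeResponse`, `next_interval`, and the target URL are hypothetical and only mimic the `interval`, `valid?`, and `check_interval` members used above. The real option and response classes are not shown in this file.

```ruby
# Hypothetical stand-ins for the option and response objects used by CLI.
SketchOptions = Struct.new(:target, :interval)

FakeResponse = Struct.new(:ok, :check_interval) do
  def valid?
    ok
  end
end

# Mirrors the decision in update_settings: only a valid response may change
# the interval, and a missing check_interval keeps the previous value.
def next_interval(options, response)
  return options.interval unless response.valid?

  response.check_interval || options.interval
end

# The target URL is an assumption for illustration only.
options = SketchOptions.new('http://localhost:8080', 1)

options.interval = next_interval(options, FakeResponse.new(true, 5))
puts "Interval after a valid response: #{options.interval} seconds"

options.interval = next_interval(options, FakeResponse.new(false, 30))
puts "Interval after an invalid response: #{options.interval} seconds"
```

An invalid response leaves the interval untouched, so a failing endpoint cannot speed up or slow down the polling loop.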