Previously `ProjectCacheWorker` would be scheduled once per ref, which
would generate unnecessary I/O and load on Sidekiq, especially if many
tags or branches were pushed at once. `ProjectCacheWorker` would expire
three items:
1. Repository size: This only needs to be updated once per push.
2. Commit count: This only needs to be updated if the default branch
is updated.
3. Project method caches: This only needs to be updated if the default
branch changes, and then only if certain files change (e.g. README,
CHANGELOG, etc.).
Because the third item requires looking at the actual changes in the
commit deltas, we schedule one `ProjectCacheWorker` to handle the first
two cases, and schedule a separate `ProjectCacheWorker` for the third
case if it is needed. This brings the number of `ProjectCacheWorker`
jobs per push down from N (one per ref) to at most 2.
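The scheduling rule above can be sketched roughly as follows. This is a minimal illustration, not the actual change: `jobs_for_push`, `CACHED_FILE_PREFIXES`, and the `[project_id, files, statistics]` argument layout are all assumed names chosen for the example.

```ruby
# Hypothetical sketch: method, argument, and constant names are assumptions.
# File-name prefixes whose changes affect the project method caches.
CACHED_FILE_PREFIXES = %w[README CHANGELOG].freeze

# refs:          every ref updated by the push
# default_ref:   the project's default branch ref
# changed_paths: files changed on the default branch in this push
# Returns the argument lists of the jobs that would be scheduled.
def jobs_for_push(project_id, refs:, default_ref:, changed_paths: [])
  default_updated = refs.include?(default_ref)

  # Job 1: repository size once per push; commit count only when the
  # default branch is among the pushed refs.
  statistics = [:repository_size]
  statistics << :commit_count if default_updated
  jobs = [[project_id, [], statistics]]

  # Job 2: method caches, only when the default branch changed and one of
  # the relevant files was touched.
  if default_updated
    touched = changed_paths.select do |path|
      CACHED_FILE_PREFIXES.any? { |prefix| path.upcase.start_with?(prefix) }
    end
    jobs << [project_id, touched, []] unless touched.empty?
  end

  jobs
end
```

Under these assumptions, a push of 50 tags plus the default branch produces at most two jobs rather than 51.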
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/52046
Schedules a Namespace::AggregationSchedule worker if some of the project
statistics are refreshed.
The worker is only executed if the feature flag is enabled.
The ProjectCacheWorker refreshes the cache periodically, but it runs outside
the Rails context. Include the ActionView helpers so that the `content_tag`
method is available.
This adds counters for build artifacts and LFS objects, and moves
the preexisting repository_size and commit_count from the projects
table into a new project_statistics table.
The counters are displayed in the administration area for projects
and groups, and are also available through the API for admins (on */all)
and normal users (on */owned).
The statistics are updated through ProjectCacheWorker, which can now
do more granular updates with the new :statistics argument.
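As an illustration of what a granular update might look like, here is a minimal sketch. The class `ProjectStatisticsSketch`, its `refresh!(only:)` method, and the `lfs_objects_size` / `build_artifacts_size` column names are assumptions made for this example; only `repository_size` and `commit_count` are named in the text above.

```ruby
# Hypothetical stand-in for a statistics record. The class and its API are
# assumptions; the two size columns are assumed names for the new counters.
class ProjectStatisticsSketch
  COLUMNS = %i[repository_size commit_count lfs_objects_size build_artifacts_size].freeze

  attr_reader :values, :recalculated

  # sources maps each column to a callable that returns its fresh value.
  def initialize(sources)
    @sources = sources
    @values = Hash.new(0)
    @recalculated = []
  end

  # Recalculate only the requested columns; an empty list means all of them.
  def refresh!(only: [])
    targets = only.empty? ? COLUMNS : COLUMNS & only.map(&:to_sym)
    targets.each do |column|
      @values[column] = @sources.fetch(column).call
      @recalculated << column
    end
    self
  end
end
```

A worker that receives `["commit_count"]` as its statistics argument could then call `refresh!(only: ["commit_count"])` and leave the more expensive counters untouched.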
This refactors repository caching so it's possible to selectively
refresh certain caches, instead of just expiring and refreshing
everything.
To allow this, the various methods that were cached (e.g. "tag_count" and
"readme") now share a common pattern that makes expiring and refreshing
their data much easier.
In this new setup caches are refreshed as follows:
1. After a commit (but before running ProjectCacheWorker) we expire some
basic caches such as the commit count and repository size.
2. ProjectCacheWorker will recalculate the commit count, repository
size, then refresh a specific set of caches based on the list of
files changed in a push payload.
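Step 2 depends on mapping changed files to the caches they affect. A minimal sketch of such a mapping follows; the `FILE_TO_CACHES` table, the cache key names other than "readme", and `caches_to_refresh` are hypothetical, since the real list is not given here.

```ruby
# Hypothetical mapping from file-name patterns to the cached methods they
# affect. The patterns and most key names are assumptions.
FILE_TO_CACHES = {
  /\AREADME/i    => [:readme],
  /\ACHANGELOG/i => [:changelog],
  /\ALICENSE/i   => [:license_blob, :license_key]
}.freeze

# Returns the unique cache keys to refresh for the files changed in a push.
def caches_to_refresh(changed_files)
  changed_files.flat_map do |path|
    FILE_TO_CACHES.select { |pattern, _| pattern.match?(path) }.values.flatten
  end.uniq
end
```

A push that touches no matching file returns an empty list, so no method caches would be refreshed for it.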
This requires a number of changes to the various methods that may be
cached. For one, data should not be cached if the branch used or the
entire repository does not exist. To save each method from handling
this manually, it is taken care of in Repository#cache_method_output.
Some methods still manually check for the existence of a repository,
but this result is also cached.
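The idea of centralising the "do not cache when the repository is missing" check can be sketched as below. This is an assumption-laden illustration: `RepositorySketch`, the `fallback:` parameter, and the in-memory `store` are invented for the example, the real Repository#cache_method_output may work differently, and the missing-branch case is not modelled.

```ruby
# Hypothetical sketch; only the method name cache_method_output comes from
# the text above. Its signature and behaviour here are assumptions.
class RepositorySketch
  attr_reader :store

  def initialize(exists:, tags: [])
    @exists = exists
    @tags = tags
    @store = {} # stand-in for the real cache backend
  end

  def exists?
    @exists
  end

  # Caches the block's result under key, unless the repository is missing,
  # in which case fallback is returned and nothing is written to the cache.
  def cache_method_output(key, fallback: nil)
    return fallback unless exists?
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end

  # A cached method no longer needs its own existence check.
  def tag_count
    cache_method_output(:tag_count, fallback: 0) { @tags.size }
  end
end
```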
With selective flushing implemented, ProjectCacheWorker no longer uses an
exclusive lease for all of its work. Instead, this worker only uses a
lease to limit the number of times the repository size is updated, as
this is a fairly expensive operation.
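A rough sketch of gating only the expensive step behind a lease, while the cheaper refreshes always run, might look like this. `LeaseSketch`, `perform_sketch`, the lease key format, and the injectable clock are all assumptions for illustration; a real lease would live in a shared store so every worker process sees it.

```ruby
# In-memory stand-in for an exclusive lease (hypothetical).
class LeaseSketch
  def initialize(timeout:, clock: -> { Time.now.to_f })
    @timeout = timeout
    @clock = clock
    @expires_at = {}
  end

  # Takes the lease and returns true if nobody holds it, false otherwise.
  def try_obtain(key)
    now = @clock.call
    return false if @expires_at.fetch(key, 0) > now

    @expires_at[key] = now + @timeout
    true
  end
end

# Hypothetical worker body: cache refreshes always run, the repository
# size update runs only when the lease could be obtained.
def perform_sketch(project_id, lease, log)
  log << :refresh_caches
  log << :update_repository_size if lease.try_obtain("project_cache:#{project_id}")
  log
end
```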
This changes ProjectCacheWorker.perform_async so it only schedules a job
when no lease for the given project is present. This ensures we don't
end up scheduling hundreds of jobs when they won't be executed anyway.
This ensures ProjectCacheWorker jobs for a given project are performed
at most once per 15 minutes. This should reduce disk load a bit in cases
where multiple pushes happen in a short period (which would schedule
multiple ProjectCacheWorker jobs).
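The two behaviours described above (not scheduling a job while a lease is present, and performing at most one job per 15 minutes) can be sketched together. The class below is a hypothetical in-memory model, not the worker itself: `ProjectCacheSchedulerSketch`, `perform_next`, and the injectable clock are assumptions.

```ruby
# Hypothetical sketch; the queue and lease are plain in-memory objects.
class ProjectCacheSchedulerSketch
  LEASE_TIMEOUT = 15 * 60 # seconds, matching the 15 minutes described above

  attr_reader :queue, :performed

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @lease_expiry = {}
    @queue = []
    @performed = []
  end

  # Enqueue only when no lease is currently held for the project.
  def perform_async(project_id)
    return false if lease_held?(project_id)

    @queue << project_id
    true
  end

  # Run one queued job: take the lease, or skip if another job holds it.
  def perform_next
    project_id = @queue.shift
    return if project_id.nil? || lease_held?(project_id)

    @lease_expiry[project_id] = @clock.call + LEASE_TIMEOUT
    @performed << project_id
  end

  private

  def lease_held?(project_id)
    @lease_expiry.fetch(project_id, 0) > @clock.call
  end
end
```

In this model, jobs queued before the first one runs are still skipped at execution time, and later calls to `perform_async` are dropped until the lease expires.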