gitlab-org--gitlab-foss

Commit Graph

Author	SHA1	Message	Date
John Cai	6c35fb59b7	Add GitDeduplicationService for deduplication housekeeping GitDeduplicationService performs idempotent operations on deduplicated projects.	2019-05-21 13:34:31 -07:00
Jan Provaznik	d25239ee0b	Use git_garbage_collect_worker to run pack_refs PackRefs is not an expensive gitaly call - we want to call it more often (than as part of full `gc`) because it helps to keep the number of refs files small - too many refs files may be a problem for deployments with slow storage.	2019-05-02 21:41:05 +00:00
Zeger-Jan van de Weg	e03602e09d	Ensure pool participants are linked before GC In theory the case could happen that the initial linking of the pool fails and so do all the retries that Sidekiq performs. This could lead to data loss. To prevent that case, linking is done before Git's GC too. This makes sure that case doesn't happen.	2019-01-14 16:09:47 +01:00
Zeger-Jan van de Weg	896c0bdbfb	Allow public forks to be deduplicated When a project is forked, the new repository used to be a deep copy of everything stored on disk by leveraging `git clone`. This works well, and makes isolation between repositories easy. However, the clone is at the start 100% the same as the origin repository. And in the case of the objects in the object directory, this is almost always going to be a lot of duplication. Object Pools are a way to create a third repository that essentially only exists for its 'objects' subdirectory. This third repository's object directory will be set as alternate location for objects. This means that in the case an object is missing in the local repository, git will look in another location. This other location is the object pool repository. When Git performs garbage collection, it's smart enough to check the alternate location. When objects are duplicated, it will allow git to throw one copy away. This copy is on the local repository, where the pool remains as is. These pools have an origin location, which for now will always be a repository that itself is not a fork. When the root of a fork network is forked by a user, the fork still clones the full repository. Async, the pool repository will be created. Either one of these processes can be done earlier than the other. To handle this race condition, the Join ObjectPool operation is idempotent. Given it's idempotent, we can schedule it twice, with the same effect. To accommodate the holding of state two migrations have been added. 1. Added a state column to the pool_repositories table. This column is managed by the state machine, allowing for hooks on transitions. 2. pool_repositories now has a source_project_id. This column is convenient to have for multiple reasons: it has a unique index allowing the database to handle race conditions when creating a new record. Also, it's nice to know who the host is. As that's a short link to the fork network's root. Object pools are only available for public projects, which use hashed storage and when forking from the root of the fork network. (That is, the project being forked from itself isn't a fork) In this commit message I use both ObjectPool and Pool repositories, which are alike, but different from each other. ObjectPool refers to whatever is on the disk stored and managed by Gitaly. PoolRepository is the record in the database.	2018-12-07 19:18:37 +01:00
Stan Hu	0c1eebe24c	Fix ArgumentError in GitGarbageCollectWorker Sidekiq job When the Gitaly call failed, the exception handling failed because `method` is expected to have a parameter. Closes #49096	2018-07-10 15:11:10 -07:00
gfyoung	dfbe5ce435	Enable frozen string literals for app/workers/*.rb	2018-06-27 07:23:28 +00:00
Zeger-Jan van de Weg	0e2577229d	Move GC RPCs to mandatory Closes https://gitlab.com/gitlab-org/gitaly/issues/354	2018-06-13 16:36:43 +02:00
Kim Carlbäcker	cc9468e4fa	Move GC/Repack to OptOut	2018-06-06 14:28:03 +00:00
Jacob Vosmaer (GitLab)	c43e18fc49	Remove some easy cases of 'path_to_repo' use	2018-03-28 09:21:32 +00:00
Stan Hu	885998c220	Release libgit2 cache and open file descriptors after `git gc` run Relates to #21879	2018-03-03 22:21:50 -08:00
Mario de la Ossa	eaada9d706	use Gitlab::UserSettings directly as a singleton instead of including/extending it	2018-02-02 18:39:55 +00:00
Douwe Maan	0b15570e49	Add ApplicationWorker and make every worker include it	2017-12-05 11:59:39 +01:00
Tiago Botelho	39298575a8	Adds exclusive lease to Git garbage collect worker.	2017-09-07 18:52:04 +01:00
Kim "BKC" Carlbäcker	05f90b861f	Migrate GitGarbageCollectWorker to Gitaly	2017-07-28 17:49:22 +02:00
Jacob Vosmaer	6bcc52a536	Refine Git garbage collection	2016-11-04 14:30:11 +01:00
Yorick Peterse	97731760d7	Re-organize queues to use for Sidekiq Dumping too many jobs in the same queue (e.g. the "default" queue) is a dangerous setup. Jobs that take a long time to process can effectively block any other work from being performed given there are enough of these jobs. Furthermore it becomes harder to monitor the jobs as a single queue could contain jobs for different workers. In such a setup the only reliable way of getting counts per job is to iterate over all jobs in a queue, which is a rather time consuming process. By using separate queues for various workers we have better control over throughput, we can add weight to queues, and we can monitor queues better. Some workers still use the same queue whenever their work is related. For example, the various CI pipeline workers use the same "pipeline" queue. This commit includes a Rails migration that moves Sidekiq jobs from the old queues to the new ones. This migration also takes care of doing the inverse if ever needed. This does require downtime as otherwise new jobs could be scheduled in the old queues after this migration completes. This commit also includes an RSpec test that blacklists the use of the "default" queue and ensures cron workers use the "cronjob" queue. Fixes gitlab-org/gitlab-ce#23370	2016-10-21 18:17:07 +02:00
Stan Hu	0d4b1bb752	Refresh branch cache after `git gc` Possible workaround for #15392	2016-07-13 06:49:58 -07:00
Stan Hu	3dc6bf2b71	Expire the branch cache after `git gc` runs Due to a stale NFS cache, it's possible that a branch lookup fails while `git gc` is running and causes missing branches in merge requests. Possible workaround for #15392	2016-07-12 05:42:19 -07:00

18 Commits