20 Commits

Author SHA1 Message Date
Gabriel Mazetto 39229eed34 Hashed Storage is enabled by default on new installations
updated documentation for Geo
2019-06-17 20:35:22 +02:00
Jacob Vosmaer 45c5c2aad6 Update git object deduplication overview 2019-06-12 07:12:15 +00:00
Marcel Amirault f4a1dcbe2f Docs: Merge Misc EE doc/administration files and dirs to CE 2019-05-05 15:21:25 +00:00
Katrin Leinweber ddfd99b288 Complete "Repository storage" directions 2019-04-13 15:31:45 +00:00
Marcel Amirault 8b42fe3b91 Docs: Fix more anchors, mostly pipeline related 2019-03-27 04:17:02 +00:00
James Ramsay f9a968ac5e Add beta caution to hashed object pools 2019-03-20 14:01:54 +00:00
Gabriel Mazetto 823695ee37 Document Storage Rollback mechanism
Updated Rake-specific documentation to include storage rollback,
and improved migration and rollback instructions.
2019-03-15 04:34:33 +01:00
Evan Read 47fb1c5235 Remove consecutive blank lines from markdown files
For the sake of consistency, removes any extraneous
consecutive blank lines from the doc suite.
2019-02-18 09:36:13 +00:00
Evan Read d98560c1f5 Make unordered lists conform to styleguide
- Also makes other minor Markdown fixes that were near the main fixes.
2019-01-08 12:21:09 +10:00
Zeger-Jan van de Weg 896c0bdbfb Allow public forks to be deduplicated
When a project is forked, the new repository used to be a deep copy of everything
stored on disk, made by leveraging `git clone`. This works well, and makes isolation
between repositories easy. However, at the start the clone is 100% identical to the
origin repository, and for the objects in the object directory this almost always
means a lot of duplication.

Object Pools are a way to create a third repository that essentially only exists
for its 'objects' subdirectory. This third repository's object directory will be
set as the alternate location for objects. This means that when an object is
missing in the local repository, Git will look in another location. This other
location is the object pool repository.

When Git performs garbage collection, it's smart enough to check the
alternate location. When objects are duplicated, Git can throw one copy
away. The copy that is removed is the one in the local repository, while
the pool remains as is.

These pools have an origin location, which for now will always be a
repository that itself is not a fork. When the root of a fork network is
forked by a user, the fork still clones the full repository. The pool
repository is created asynchronously.

Either one of these processes can finish earlier than the other. To
handle this race condition, the Join ObjectPool operation is
idempotent. Because it is idempotent, we can schedule it twice, with
the same effect.
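
The idempotence described above can be illustrated with a minimal sketch. It assumes that joining a pool amounts to listing the pool's object directory in the repository's `objects/info/alternates` file, which is the standard Git alternates mechanism. The `join_object_pool` and `read_alternates` helpers and all paths are hypothetical; this is not Gitaly's actual implementation.

```python
import os
import tempfile


def join_object_pool(repo_path, pool_path):
    """Hypothetical helper: link a bare repository to a pool's object directory.

    Returns True if the alternates file was changed, and False if the link
    was already present, so scheduling the join twice is harmless.
    """
    pool_objects = os.path.join(os.path.abspath(pool_path), "objects")
    info_dir = os.path.join(repo_path, "objects", "info")
    alternates = os.path.join(info_dir, "alternates")
    os.makedirs(info_dir, exist_ok=True)

    existing = []
    if os.path.exists(alternates):
        with open(alternates) as fh:
            existing = [line.strip() for line in fh if line.strip()]

    if pool_objects in existing:
        return False  # already joined: nothing to do

    with open(alternates, "a") as fh:
        fh.write(pool_objects + "\n")
    return True


def read_alternates(repo_path):
    """Return the alternate object directories listed for a bare repository."""
    path = os.path.join(repo_path, "objects", "info", "alternates")
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    repo = os.path.join(root, "fork.git")  # assumed bare repository layout
    pool = os.path.join(root, "pool.git")
    join_object_pool(repo, pool)
    join_object_pool(repo, pool)  # second call leaves the file unchanged
    print(read_alternates(repo))
```

Whichever of the two processes finishes last can call the join again without checking what the other one did, because the second call finds the entry already present.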

To accommodate the holding of state, two migrations have been added.
1. Added a state column to the pool_repositories table. This column is
managed by the state machine, allowing for hooks on transitions.
2. pool_repositories now has a source_project_id. This column is
convenient to have for multiple reasons: it has a unique index, allowing
the database to handle race conditions when creating a new record. Also,
it's nice to know who the host is, as that's a short link to the fork
network's root.
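
How a unique index lets the database settle such a race can be sketched as follows. This is an illustration only, using an in-memory SQLite database rather than the real schema: the table and column names follow the message above, while the index name, the `'none'` state value, and the `make_db` and `create_pool_record` helpers are assumptions.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory table resembling pool_repositories."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pool_repositories ("
        "id INTEGER PRIMARY KEY, state TEXT, source_project_id INTEGER)"
    )
    # The unique index is what makes a second insert for the same source fail.
    conn.execute(
        "CREATE UNIQUE INDEX index_pool_on_source_project "
        "ON pool_repositories (source_project_id)"
    )
    return conn


def create_pool_record(conn, source_project_id):
    """Try to insert a pool record; return True only for the caller that wins."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pool_repositories (source_project_id, state) "
                "VALUES (?, 'none')",  # 'none' is a placeholder state value
                (source_project_id,),
            )
        return True
    except sqlite3.IntegrityError:
        # Another caller already created the record for this source project.
        return False


if __name__ == "__main__":
    conn = make_db()
    print(create_pool_record(conn, 42))
    print(create_pool_record(conn, 42))
```

The losing caller gets a constraint violation instead of a duplicate row, so it can fall back to looking up the existing record.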

Object pools are only available for public projects that use hashed
storage, and only when forking from the root of the fork network (that
is, the project being forked from isn't itself a fork).

In this commit message I use both ObjectPool and PoolRepository,
which are related but not the same thing. ObjectPool refers to
whatever is stored on disk and managed by Gitaly. PoolRepository is
the record in the database.
2018-12-07 19:18:37 +01:00
John Jarvis 74e8e9554e Update repository_storage_types.md 2018-10-18 12:56:43 +00:00
Ben Bodenmiller 8be9c0bfad fix hashed storage readiness link 2018-08-12 07:42:35 +00:00
Valery Sizov 10df0eb7cb Resolve "Hashed storage: extend "Enable hashed storage for all new projects" to "for all new and renamed projects"" 2018-08-03 14:34:28 +00:00
Gabriel Mazetto fbc687c032 Improve Hashed Storage documentation for rollback
Fixed storage coverage table with additional information
and wrote down implementation details for a few entities.
2018-06-27 13:01:25 +02:00
Nick Thomas 4e3ad326e1 Backport EE changes to some hashed storage documentation to CE 2018-02-08 18:33:35 +00:00
Marcia Ramos b314ef7de7 search and replace EES, EEP, EEU with Starter, Premium, Ultimate 2018-02-01 19:09:30 -02:00
Gabriel Mazetto 2db542c519 Added file storage documentation and updated hash storage one 2017-11-08 15:58:10 +01:00
Michael Kozono c57a7cafc1 Fix typo 2017-11-01 17:04:08 +00:00
Gabriel Mazetto eed6408e0d Document existing storable objects and their status regarding Hashed storage 2017-10-30 14:31:10 +01:00
Gabriel Mazetto f4de14d71f Add support to migrate existing projects to Hashed Storage async 2017-09-28 16:32:14 +01:00