From 4e3ad326e10f724c0e6dac11e0c2631e666bb8d0 Mon Sep 17 00:00:00 2001
From: Nick Thomas
Date: Thu, 8 Feb 2018 18:33:35 +0000
Subject: [PATCH] Backport EE changes to some hashed storage documentation to CE

---
 .../repository_storage_types.md | 91 +++++++++++--------
 1 file changed, 52 insertions(+), 39 deletions(-)

diff --git a/doc/administration/repository_storage_types.md b/doc/administration/repository_storage_types.md
index c5b286f6804..39bd19ac851 100644
--- a/doc/administration/repository_storage_types.md
+++ b/doc/administration/repository_storage_types.md
@@ -4,50 +4,63 @@
 
 ## Legacy Storage
 
-Legacy Storage is the storage behavior prior to version 10.0. For historical reasons, GitLab replicated the same
-mapping structure from the projects URLs:
+Legacy Storage is the storage behavior prior to version 10.0. For historical
+reasons, GitLab replicated the same mapping structure from the projects URLs:
 
- * Project's repository: `#{namespace}/#{project_name}.git`
- * Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+* Project's repository: `#{namespace}/#{project_name}.git`
+* Project's wiki: `#{namespace}/#{project_name}.wiki.git`
 
-This structure made simple to migrate from existing solutions to GitLab and easy for Administrators to find where the
-repository is stored.
+This structure made it simple to migrate from existing solutions to GitLab and
+easy for Administrators to find where the repository is stored.
 
 On the other hand this has some drawbacks:
 
-Storage location will concentrate huge amount of top-level namespaces. The impact can be reduced by the introduction of [multiple storage paths][storage-paths].
+Storage location will concentrate huge amount of top-level namespaces. The
+impact can be reduced by the introduction of [multiple storage
+paths][storage-paths].
 
-Because Backups are a snapshot of the same URL mapping, if you try to recover a very old backup, you need to verify
-if any project has taken the place of an old removed project sharing the same URL. This means that `mygroup/myproject`
-from your backup may not be the same original project that is today in the same URL.
+Because backups are a snapshot of the same URL mapping, if you try to recover a
+very old backup, you need to verify whether any project has taken the place of
+an old removed or renamed project sharing the same URL. This means that
+`mygroup/myproject` from your backup may not be the same original project that
+is at that same URL today.
 
-Any change in the URL will need to be reflected on disk (when groups / users or projects are renamed). This can add a lot
-of load in big installations, and can be even worst if they are using any type of network based filesystem.
+Any change in the URL will need to be reflected on disk (when groups / users or
+projects are renamed). This can add a lot of load in big installations,
+especially if using any type of network based filesystem.
 
-Last, for GitLab Geo, this storage type means we have to synchronize the disk state, replicate renames in the correct
-order or we may end-up with wrong repository or missing data temporarily.
+For GitLab Geo in particular: Geo does work with legacy storage, but in some
+edge cases due to race conditions it can lead to errors when a project is
+renamed multiple times in short succession, or a project is deleted and
+recreated under the same name very quickly. We expect these race events to be
+rare, and we have not observed a race condition side-effect happening yet.
 
-This pattern also exists in other objects stored in GitLab, like issue Attachments, GitLab Pages artifacts,
-Docker Containers for the integrated Registry, etc.
+This pattern also exists in other objects stored in GitLab, like issue
+Attachments, GitLab Pages artifacts, Docker Containers for the integrated
+Registry, etc.
 
 ## Hashed Storage
 
-Hashed Storage is the new storage behavior we are rolling out with 10.0. It's not enabled by default yet, but we
-encourage everyone to try-it and take the time to fix any script you may have that depends on the old behavior.
+> **Warning:** Hashed storage is in **Beta**. For the latest updates, check the
+> associated [issue](https://gitlab.com/gitlab-com/infrastructure/issues/2821)
+> and please report any problems you encounter.
 
-Instead of coupling project URL and the folder structure where the repository will be stored on disk, we are coupling
-a hash, based on the project's ID.
+Hashed Storage is the new storage behavior we are rolling out with 10.0. Instead
+of coupling project URL and the folder structure where the repository will be
+stored on disk, we are coupling a hash, based on the project's ID. This makes
+the folder structure immutable, and therefore eliminates any requirement to
+synchronize state from URLs to disk structure. This means that renaming a group,
+user, or project will cost only the database transaction, and will take effect
+immediately.
 
-This makes the folder structure immutable, and therefore eliminates any requirement to synchronize state from URLs to
-disk structure. This means that renaming a group, user or project will cost only the database transaction, and will take
-effect immediately.
+The hash also helps to spread the repositories more evenly on the disk, so the
+top-level directory will contain less folders than the total amount of top-level
+namespaces.
 
-The hash also helps to spread the repositories more evenly on the disk, so the top-level directory will contain less
-folders than the total amount of top-level namespaces.
-
-Hash format is based on hexadecimal representation of SHA256: `SHA256(project.id)`.
-Top-level folder uses first 2 characters, followed by another folder with the next 2 characters. They are both stored in
-a special folder `@hashed`, to co-exist with existing Legacy projects:
+The hash format is based on the hexadecimal representation of SHA256:
+`SHA256(project.id)`. The top-level folder uses the first 2 characters, followed
+by another folder with the next 2 characters. They are both stored in a special
+`@hashed` folder, to be able to co-exist with existing Legacy Storage projects:
 
 ```ruby
 # Project's repository:
@@ -57,15 +70,13 @@ a special folder `@hashed`, to co-exist with existing Legacy projects:
 "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
 ```
 
-This new format also makes possible to restore backups with confidence, as when restoring a repository from the backup,
-you will never mistakenly restore a repository in the wrong project (considering the backup is made after the migration).
-
 ### How to migrate to Hashed Storage
 
-In GitLab, go to **Admin > Settings**, find the **Repository Storage** section and select
-"_Create new projects using hashed storage paths_".
+In GitLab, go to **Admin > Settings**, find the **Repository Storage** section
+and select "_Create new projects using hashed storage paths_".
 
-To migrate your existing projects to the new storage type, check the specific [rake tasks].
+To migrate your existing projects to the new storage type, check the specific
+[rake tasks].
 
 [ce-28283]: https://gitlab.com/gitlab-org/gitlab-ce/issues/28283
 [rake tasks]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
@@ -73,11 +84,13 @@ To migrate your existing projects to the new storage type, check the specific [r
 
 ### Hashed Storage coverage
 
-We are incrementally moving every storable object in GitLab to the Hashed Storage pattern. You can check the current
-coverage status below.
+We are incrementally moving every storable object in GitLab to the Hashed
+Storage pattern. You can check the current coverage status below (and also see
+the [issue](https://gitlab.com/gitlab-com/infrastructure/issues/2821)).
 
-Note that things stored in an S3 compatible endpoint will not have the downsides mentioned earlier, if they are not
-prefixed with `#{namespace}/#{project_name}`, which is true for CI Cache and LFS Objects.
+Note that things stored in an S3 compatible endpoint will not have the downsides
+mentioned earlier, if they are not prefixed with `#{namespace}/#{project_name}`,
+which is true for CI Cache and LFS Objects.
 
 | Storable Object | Legacy Storage | Hashed Storage | S3 Compatible | GitLab Version |
 | --------------- | -------------- | -------------- | ------------- | -------------- |