Backport EE changes to some hashed storage documentation to CE
parent 5edd94ae75
commit 4e3ad326e1
@@ -4,50 +4,63 @@

 ## Legacy Storage

-Legacy Storage is the storage behavior prior to version 10.0. For historical reasons, GitLab replicated the same
-mapping structure from the projects URLs:
+Legacy Storage is the storage behavior prior to version 10.0. For historical
+reasons, GitLab replicated the same mapping structure from the projects URLs:

-* Project's repository: `#{namespace}/#{project_name}.git`
-* Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+* Project's repository: `#{namespace}/#{project_name}.git`
+* Project's wiki: `#{namespace}/#{project_name}.wiki.git`

-This structure made simple to migrate from existing solutions to GitLab and easy for Administrators to find where the
-repository is stored.
+This structure made it simple to migrate from existing solutions to GitLab and
+easy for Administrators to find where the repository is stored.

 On the other hand this has some drawbacks:

-Storage location will concentrate huge amount of top-level namespaces. The impact can be reduced by the introduction of [multiple storage paths][storage-paths].
+Storage location will concentrate huge amount of top-level namespaces. The
+impact can be reduced by the introduction of [multiple storage
+paths][storage-paths].

-Because Backups are a snapshot of the same URL mapping, if you try to recover a very old backup, you need to verify
-if any project has taken the place of an old removed project sharing the same URL. This means that `mygroup/myproject`
-from your backup may not be the same original project that is today in the same URL.
+Because backups are a snapshot of the same URL mapping, if you try to recover a
+very old backup, you need to verify whether any project has taken the place of
+an old removed or renamed project sharing the same URL. This means that
+`mygroup/myproject` from your backup may not be the same original project that
+is at that same URL today.

-Any change in the URL will need to be reflected on disk (when groups / users or projects are renamed). This can add a lot
-of load in big installations, and can be even worst if they are using any type of network based filesystem.
+Any change in the URL will need to be reflected on disk (when groups / users or
+projects are renamed). This can add a lot of load in big installations,
+especially if using any type of network based filesystem.

-Last, for GitLab Geo, this storage type means we have to synchronize the disk state, replicate renames in the correct
-order or we may end-up with wrong repository or missing data temporarily.
+For GitLab Geo in particular: Geo does work with legacy storage, but in some
+edge cases due to race conditions it can lead to errors when a project is
+renamed multiple times in short succession, or a project is deleted and
+recreated under the same name very quickly. We expect these race events to be
+rare, and we have not observed a race condition side-effect happening yet.

-This pattern also exists in other objects stored in GitLab, like issue Attachments, GitLab Pages artifacts,
-Docker Containers for the integrated Registry, etc.
+This pattern also exists in other objects stored in GitLab, like issue
+Attachments, GitLab Pages artifacts, Docker Containers for the integrated
+Registry, etc.

 ## Hashed Storage

-Hashed Storage is the new storage behavior we are rolling out with 10.0. It's not enabled by default yet, but we
-encourage everyone to try-it and take the time to fix any script you may have that depends on the old behavior.
+> **Warning:** Hashed storage is in **Beta**. For the latest updates, check the
+> associated [issue](https://gitlab.com/gitlab-com/infrastructure/issues/2821)
+> and please report any problems you encounter.

-Instead of coupling project URL and the folder structure where the repository will be stored on disk, we are coupling
-a hash, based on the project's ID.
+Hashed Storage is the new storage behavior we are rolling out with 10.0. Instead
+of coupling project URL and the folder structure where the repository will be
+stored on disk, we are coupling a hash, based on the project's ID. This makes
+the folder structure immutable, and therefore eliminates any requirement to
+synchronize state from URLs to disk structure. This means that renaming a group,
+user, or project will cost only the database transaction, and will take effect
+immediately.

-This makes the folder structure immutable, and therefore eliminates any requirement to synchronize state from URLs to
-disk structure. This means that renaming a group, user or project will cost only the database transaction, and will take
-effect immediately.
+The hash also helps to spread the repositories more evenly on the disk, so the
+top-level directory will contain less folders than the total amount of top-level
+namespaces.

-The hash also helps to spread the repositories more evenly on the disk, so the top-level directory will contain less
-folders than the total amount of top-level namespaces.
-
-Hash format is based on hexadecimal representation of SHA256: `SHA256(project.id)`.
-Top-level folder uses first 2 characters, followed by another folder with the next 2 characters. They are both stored in
-a special folder `@hashed`, to co-exist with existing Legacy projects:
+The hash format is based on the hexadecimal representation of SHA256:
+`SHA256(project.id)`. The top-level folder uses the first 2 characters, followed
+by another folder with the next 2 characters. They are both stored in a special
+`@hashed` folder, to be able to co-exist with existing Legacy Storage projects:

 ```ruby
 # Project's repository:
@@ -57,15 +70,13 @@ a special folder `@hashed`, to co-exist with existing Legacy projects:
 "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
 ```

-This new format also makes possible to restore backups with confidence, as when restoring a repository from the backup,
-you will never mistakenly restore a repository in the wrong project (considering the backup is made after the migration).
-
 ### How to migrate to Hashed Storage

-In GitLab, go to **Admin > Settings**, find the **Repository Storage** section and select
-"_Create new projects using hashed storage paths_".
+In GitLab, go to **Admin > Settings**, find the **Repository Storage** section
+and select "_Create new projects using hashed storage paths_".

-To migrate your existing projects to the new storage type, check the specific [rake tasks].
+To migrate your existing projects to the new storage type, check the specific
+[rake tasks].

 [ce-28283]: https://gitlab.com/gitlab-org/gitlab-ce/issues/28283
 [rake tasks]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
@@ -73,11 +84,13 @@ To migrate your existing projects to the new storage type, check the specific [r

 ### Hashed Storage coverage

-We are incrementally moving every storable object in GitLab to the Hashed Storage pattern. You can check the current
-coverage status below.
+We are incrementally moving every storable object in GitLab to the Hashed
+Storage pattern. You can check the current coverage status below (and also see
+the [issue](https://gitlab.com/gitlab-com/infrastructure/issues/2821)).

-Note that things stored in an S3 compatible endpoint will not have the downsides mentioned earlier, if they are not
-prefixed with `#{namespace}/#{project_name}`, which is true for CI Cache and LFS Objects.
+Note that things stored in an S3 compatible endpoint will not have the downsides
+mentioned earlier, if they are not prefixed with `#{namespace}/#{project_name}`,
+which is true for CI Cache and LFS Objects.

 | Storable Object | Legacy Storage | Hashed Storage | S3 Compatible | GitLab Version |
 | --------------- | -------------- | -------------- | ------------- | -------------- |
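Reviewer note: the two path layouts this diff documents can be compared with a minimal Ruby sketch. It is an illustration under stated assumptions, not GitLab's implementation. The helper names are hypothetical, and it assumes the project ID is hashed as its decimal string, since the documentation only says `SHA256(project.id)`.

```ruby
require "digest"

# Hypothetical helper: the legacy layout mirrors the project URL,
# so renaming a namespace or project changes the on-disk path.
def legacy_repository_path(namespace, project_name)
  "#{namespace}/#{project_name}.git"
end

# Hypothetical helper: hex SHA256 of the project ID.
# Assumption: the ID is hashed as its decimal string representation.
def project_hash(project_id)
  Digest::SHA256.hexdigest(project_id.to_s)
end

# The hashed layout depends only on the project ID: the first two hex
# characters, then the next two, then the full hash, all under "@hashed".
def hashed_repository_path(project_id)
  hash = project_hash(project_id)
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
end

def hashed_wiki_path(project_id)
  hash = project_hash(project_id)
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
end

# A rename changes the legacy path but leaves the hashed path untouched.
puts legacy_repository_path("mygroup", "myproject")
puts legacy_repository_path("mygroup", "renamed-project")
puts hashed_repository_path(1)
puts hashed_wiki_path(1)
```

Because the hashed path takes no namespace or project name as input, a rename in this sketch has nothing to update on disk, which matches the "only the database transaction" claim in the added text.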