# Repository Storage Types

> [Introduced][ce-28283] in GitLab 10.0.

## Legacy Storage

Legacy Storage is the storage behavior prior to version 10.0. For historical
reasons, GitLab replicated the same mapping structure from the project URLs:

* Project's repository: `#{namespace}/#{project_name}.git`
* Project's wiki: `#{namespace}/#{project_name}.wiki.git`

This structure made it simple to migrate from existing solutions to GitLab and
easy for Administrators to find where the repository is stored.

On the other hand, this has some drawbacks:

The storage location concentrates a huge number of top-level namespaces. The
impact can be reduced by the introduction of [multiple storage
paths][storage-paths].

Because backups are a snapshot of the same URL mapping, if you try to recover a
very old backup, you need to verify whether any project has taken the place of
an old removed or renamed project sharing the same URL. This means that
`mygroup/myproject` from your backup may not be the same original project that
is at that same URL today.

Any change in the URL needs to be reflected on disk (when groups, users, or
projects are renamed). This can add a lot of load in big installations,
especially if using any type of network-based filesystem.

For GitLab Geo in particular: Geo does work with legacy storage, but in some
edge cases race conditions can lead to errors when a project is
renamed multiple times in short succession, or a project is deleted and
recreated under the same name very quickly. We expect these race events to be
rare, and we have not observed a race condition side-effect happening yet.

This pattern also exists in other objects stored in GitLab, like issue
Attachments, GitLab Pages artifacts, Docker Containers for the integrated
Registry, etc.

## Hashed Storage

Hashed Storage is the new storage behavior we rolled out with 10.0. Instead
of coupling the project URL to the folder structure where the repository is
stored on disk, we couple the folder structure to a hash based on the
project's ID. This makes
the folder structure immutable, and therefore eliminates any requirement to
synchronize state from URLs to disk structure. This means that renaming a group,
user, or project costs only the database transaction, and takes effect
immediately.

The hash also helps to spread the repositories more evenly on the disk, so the
top-level directory will contain fewer folders than the total number of top-level
namespaces.

The hash format is based on the hexadecimal representation of SHA256:
`SHA256(project.id)`. The top-level folder uses the first 2 characters, followed
by another folder with the next 2 characters. They are both stored in a special
`@hashed` folder, to be able to co-exist with existing Legacy Storage projects:

```ruby
# Project's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"

# Wiki's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
```

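To illustrate the layout described above, here is a minimal sketch, not GitLab's actual implementation. It assumes the hash is the SHA256 hex digest of the project ID rendered as a decimal string. `hashed_repository_path` is a hypothetical helper name used only for this example.

```ruby
require 'digest'

# Hypothetical helper (not part of GitLab): builds the hashed path for a
# project ID, assuming the ID is hashed as its decimal string.
def hashed_repository_path(project_id, wiki: false)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  suffix = wiki ? '.wiki.git' : '.git'

  # First two characters, then the next two, then the full hash.
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}#{suffix}"
end

puts hashed_repository_path(2)
puts hashed_repository_path(2, wiki: true)
```

Because the path depends only on the ID, renaming the project or its namespace leaves the result unchanged under this assumption.
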
### How to migrate to Hashed Storage

In GitLab, go to **Admin > Settings**, find the **Repository Storage** section
and select "_Use hashed storage paths for newly created and renamed projects_".

To migrate your existing projects to the new storage type, check the specific
[rake tasks].

[ce-28283]: https://gitlab.com/gitlab-org/gitlab-ce/issues/28283
[rake tasks]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
[storage-paths]: repository_storage_types.md

#### Rollback

There is no automated rollback implemented. Below are the steps required to roll back
from each storage migration.

The rollback has to be performed in the reverse order of the migration. To get back to the "Legacy" state,
you need to roll back Attachments first, then Project.

Also note that if Geo is enabled, triggering the migration generates an event
to replicate the operation on any Secondary node. That means the on-disk changes
need to be performed on these nodes as well. Database changes will propagate without issues.

You must make sure the migration event has already been processed; otherwise, it may migrate
the files back to the Hashed state again.

##### Attachments

To roll back a single Attachment migration, rename the `aa/bb/abcdef1234567890...` folder back to `namespace/project`.

Both folder names can be generated by `FileUploader.absolute_base_dir(project)`; you
just need to switch the `storage_version` of the `project` back to the previous one.

```ruby
# Current storage version and the hashed upload directory it maps to:
project.storage_version
# => 2

FileUploader.absolute_base_dir(project)
# => "/opt/gitlab/embedded/service/gitlab-rails/public/uploads/@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35"

# Switch to the previous version to get the legacy upload directory:
project.storage_version = 1

FileUploader.absolute_base_dir(project)
# => "/opt/gitlab/embedded/service/gitlab-rails/public/uploads/gitlab/gitlab-shell-renamed"
```

##### Project

To roll back a single Project migration, move `@hashed/aa/bb/aabbcdef1234567890abcdef.git` and `@hashed/aa/bb/aabbcdef1234567890abcdef.wiki.git`
back to `namespace/project.git` and `namespace/project.wiki.git` respectively, and switch the `storage_version` of the `project` back to `null`.

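The moves described for a Project rollback can be sketched as follows. This is only an illustration under stated assumptions: `rollback_moves` is a hypothetical helper, not a GitLab API, and it only computes the source and destination paths relative to the storage root. Moving the directories and resetting the version in the database remain separate manual steps.

```ruby
# Hypothetical helper (not part of GitLab): lists the [from, to] moves needed
# to roll back one project's repositories, relative to the storage root.
def rollback_moves(disk_hash, full_path)
  hashed_base = "@hashed/#{disk_hash[0..1]}/#{disk_hash[2..3]}/#{disk_hash}"

  # The repository and its wiki are moved as a pair.
  ['.git', '.wiki.git'].map do |suffix|
    ["#{hashed_base}#{suffix}", "#{full_path}#{suffix}"]
  end
end

rollback_moves('aabbcdef1234567890abcdef', 'namespace/project').each do |from, to|
  puts "#{from} -> #{to}"
end
```

Printing the pairs first, rather than moving anything, makes it possible to review the plan before touching the disk.
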
### Hashed Storage coverage

We are incrementally moving every storable object in GitLab to the Hashed
Storage pattern. You can check the current coverage status below (and also see
the [issue](https://gitlab.com/gitlab-com/infrastructure/issues/2821)).

Note that objects stored in an S3-compatible endpoint do not have the downsides
mentioned earlier if they are not prefixed with `#{namespace}/#{project_name}`,
which is true for CI Cache and LFS Objects.

| Storable Object | Legacy Storage | Hashed Storage | S3 Compatible | GitLab Version |
| --------------- | -------------- | -------------- | ------------- | -------------- |
| Repository      | Yes            | Yes            | -             | 10.0           |
| Attachments     | Yes            | Yes            | -             | 10.2           |
| Avatars         | Yes            | No             | -             | -              |
| Pages           | Yes            | No             | -             | -              |
| Docker Registry | Yes            | No             | -             | -              |
| CI Build Logs   | No             | No             | -             | -              |
| CI Artifacts    | No             | No             | Yes           | 9.4 / 10.6     |
| CI Cache        | No             | No             | Yes           | -              |
| LFS Objects     | Yes            | Similar        | Yes           | 10.0 / 10.7    |

#### Implementation Details

##### Avatars

Each file is stored in a folder named after its `id` from the database. The filename is always `avatar.png` for user avatars.
When an avatar is replaced, the `Upload` model is destroyed and a new one takes its place with a different `id`.

##### CI Artifacts

CI Artifacts are S3 compatible since **9.4** (GitLab Premium), and available in GitLab Core since **10.6**.

##### LFS Objects

LFS Objects implement a similar storage pattern using 2 chars, 2 level folders, following Git's own implementation:

```ruby
"shared/lfs-objects/#{oid[0..1]}/#{oid[2..3]}/#{oid[4..-1]}"

# Based on object `oid`: `8909029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c`, path will be:
"shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c"
```

They are also S3 compatible since **10.0** (GitLab Premium), and available in GitLab Core since **10.7**.