# File Storage in GitLab

We use the [CarrierWave](https://github.com/carrierwaveuploader/carrierwave) gem to handle file upload, storage, and retrieval.

File uploads should be accelerated by Workhorse. For details, refer to the [uploads development documentation](uploads.md).

File uploading is used in many places, grouped here by context:

- System
  - Instance Logo (logo visible in sign in/sign up pages)
  - Header Logo (one displayed in the navigation bar)
- Group
  - Group avatars
- User
  - User avatars
  - User snippet attachments
- Project
  - Project avatars
  - Issues/MR/Notes Markdown attachments
  - Issues/MR/Notes Legacy Markdown attachments
  - CI Artifacts (archive, metadata, trace)
  - LFS Objects
  - Merge request diffs
  - Design Management design thumbnails

## Disk storage

GitLab started out saving everything on local disk. Although the directory locations have changed since earlier versions,
they are still not fully standardized. The current locations are listed below:

| Description | In DB? | Relative path (from CarrierWave.root) | Uploader class | model_type |
| ------------------------------------- | ------ | ----------------------------------------------------------- | ---------------------- | ---------- |
| Instance logo | yes | `uploads/-/system/appearance/logo/:id/:filename` | `AttachmentUploader` | Appearance |
| Header logo | yes | `uploads/-/system/appearance/header_logo/:id/:filename` | `AttachmentUploader` | Appearance |
| Group avatars | yes | `uploads/-/system/group/avatar/:id/:filename` | `AvatarUploader` | Group |
| User avatars | yes | `uploads/-/system/user/avatar/:id/:filename` | `AvatarUploader` | User |
| User snippet attachments | yes | `uploads/-/system/personal_snippet/:id/:random_hex/:filename` | `PersonalFileUploader` | Snippet |
| Project avatars | yes | `uploads/-/system/project/avatar/:id/:filename` | `AvatarUploader` | Project |
| Issues/MR/Notes Markdown attachments | yes | `uploads/:project_path_with_namespace/:random_hex/:filename` | `FileUploader` | Project |
| Issues/MR/Notes Legacy Markdown attachments | no | `uploads/-/system/note/attachment/:id/:filename` | `AttachmentUploader` | Note |
| Design Management design thumbnails | yes | `uploads/-/system/design_management/action/image_v432x230/:id/:filename` | `DesignManagement::DesignV432x230Uploader` | DesignManagement::Action |
| CI Artifacts (CE) | yes | `shared/artifacts/:disk_hash[0..1]/:disk_hash[2..3]/:disk_hash/:year_:month_:date/:job_id/:job_artifact_id` (`:disk_hash` is the SHA256 digest of `project_id`) | `JobArtifactUploader` | Ci::JobArtifact |
| LFS Objects (CE) | yes | `shared/lfs-objects/:hex/:hex/:object_hash` | `LfsObjectUploader` | LfsObject |
| External merge request diffs | yes | `shared/external-diffs/merge_request_diffs/mr-:parent_id/diff-:id` | `ExternalDiffUploader` | MergeRequestDiff |

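
To make the CI Artifacts row in the table above more concrete, the following is a minimal sketch of how such a relative path could be assembled. The helper name `job_artifact_relative_path` and its parameters are hypothetical and not part of GitLab. The sketch also assumes that the digest is the hex-encoded SHA256 of the project ID rendered as a string, and that `:year_:month_:date` maps to a `YYYY_MM_DD` date.

```ruby
require 'digest'
require 'date'

# Hypothetical helper (not GitLab code): builds the relative path described
# in the "CI Artifacts (CE)" table row.
def job_artifact_relative_path(project_id:, job_id:, job_artifact_id:, created_on:)
  # Assumption: the hash is the hex SHA256 digest of the stringified project ID.
  disk_hash = Digest::SHA256.hexdigest(project_id.to_s)

  File.join(
    'shared/artifacts',
    disk_hash[0..1],                 # first two hex characters
    disk_hash[2..3],                 # next two hex characters
    disk_hash,                       # full digest
    created_on.strftime('%Y_%m_%d'), # assumed format for :year_:month_:date
    job_id.to_s,
    job_artifact_id.to_s
  )
end

path = job_artifact_relative_path(
  project_id: 1,
  job_id: 42,
  job_artifact_id: 7,
  created_on: Date.new(2020, 5, 7)
)
puts path
```

Splitting the digest into two short leading directories keeps any single directory from accumulating too many entries, which is the usual motivation for this kind of layout.
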
CI Artifacts and LFS Objects behave differently in CE and EE. In CE they inherit from `GitlabUploader`,
while in EE they inherit from `ObjectStorage` and store files in an S3 API-compatible object store.

Issues/MR/Notes Markdown attachments take a different approach when the [Hashed Storage](../administration/repository_storage_types.md) layout is used:
instead of basing the path on the mutable variable `:project_path_with_namespace`, the path can use the
hash of the project ID, provided the project has migrated to the new approach (introduced in 10.2).

> Note: We provide an [all-in-one Rake task](../administration/raketasks/uploads/migrate.md) to migrate all uploads to object
> storage in one go. If a new Uploader class or model type is introduced, make
> sure you add a Rake task invocation corresponding to it to the
> [category list](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/tasks/gitlab/uploads/migrate.rake).

### Path segments

Files are stored in multiple locations and use different path schemes.
All `GitlabUploader`-derived classes should comply with this path segment schema:

```plaintext
|   GitlabUploader
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `<gitlab_root>/public/` | `uploads/-/system/`       | `user/avatar/:id/`                | `:filename`                      |
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `CarrierWave.root`      | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`  | `CarrierWave::Uploader#filename` |
|                         | `CarrierWave::Uploader#store_dir`                             |                                  |

|   FileUploader
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `<gitlab_root>/shared/` | `artifacts/`              | `:year_:month/:id`                | `:filename`                      |
| `<gitlab_root>/shared/` | `snippets/`               | `:secret/`                        | `:filename`                      |
| ----------------------- + ------------------------- + --------------------------------- + -------------------------------- |
| `CarrierWave.root`      | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`  | `CarrierWave::Uploader#filename` |
|                         | `CarrierWave::Uploader#store_dir`                             |                                  |
|                         |                           | `FileUploader#upload_path`                                           |

|   ObjectStorage::Concern (store = remote)
| ----------------------- + ------------------------- + ----------------------------------- + -------------------------------- |
| `<bucket_name>`         | <ignored>                 | `user/avatar/:id/`                  | `:filename`                      |
| ----------------------- + ------------------------- + ----------------------------------- + -------------------------------- |
| `#fog_dir`              | `GitlabUploader.base_dir` | `GitlabUploader#dynamic_segment`    | `CarrierWave::Uploader#filename` |
|                         |                           | `ObjectStorage::Concern#store_dir`  |                                  |
|                         |                           | `ObjectStorage::Concern#upload_path`                                   |
```

The `RecordsUploads::Concern` concern creates an `Upload` entry for every file stored by a `GitlabUploader`, persisting the dynamic parts of the path using
`GitlabUploader#dynamic_path`. You can then use the `Upload#build_uploader` method to manipulate the file.

## Object Storage

Including `ObjectStorage::Concern` in a `GitlabUploader`-derived class makes object storage available for that uploader. To enable it,
you need to either 1) include `RecordsUploads::Concern` and prepend `ObjectStorage::Extension::RecordsUploads`, or 2) mount the uploader and create a new field named `<mount>_store`.

`CarrierWave::Uploader#store_dir` is overridden to return:

- `GitlabUploader.base_dir` + `GitlabUploader#dynamic_segment` when the store is LOCAL
- `GitlabUploader#dynamic_segment` when the store is REMOTE (the bucket name is used to namespace)

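
The `store_dir` rule above can be sketched in plain Ruby. This is an illustration only, not GitLab's implementation: the `SketchUploader` class, the `Store` module and its constant values, and the constructor parameters are all hypothetical stand-ins chosen to mirror the names used in the text.

```ruby
# Hypothetical stand-in for the LOCAL/REMOTE store identifiers (values assumed).
module Store
  LOCAL = 1
  REMOTE = 2
end

# Hypothetical uploader that mimics only the path logic described in the text.
class SketchUploader
  BASE_DIR = 'uploads/-/system'

  def initialize(model_name:, mount:, id:, object_store:)
    @model_name = model_name
    @mount = mount
    @id = id
    @object_store = object_store
  end

  # The dynamic part of the path, for example "user/avatar/1".
  def dynamic_segment
    File.join(@model_name, @mount.to_s, @id.to_s)
  end

  # LOCAL: base_dir + dynamic_segment.
  # REMOTE: dynamic_segment only, because the bucket already namespaces the files.
  def store_dir
    if @object_store == Store::REMOTE
      dynamic_segment
    else
      File.join(BASE_DIR, dynamic_segment)
    end
  end
end

local = SketchUploader.new(model_name: 'user', mount: :avatar, id: 1, object_store: Store::LOCAL)
remote = SketchUploader.new(model_name: 'user', mount: :avatar, id: 1, object_store: Store::REMOTE)

puts local.store_dir  # => uploads/-/system/user/avatar/1
puts remote.store_dir # => user/avatar/1
```
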
### Using `ObjectStorage::Extension::RecordsUploads`

> Note: This concern automatically includes `RecordsUploads::Concern` if it is not already included.

The `ObjectStorage::Concern` uploader searches for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE).

```ruby
class SongUploader < GitlabUploader
  include RecordsUploads::Concern
  include ObjectStorage::Concern
  prepend ObjectStorage::Extension::RecordsUploads

  # ...
end

class Thing < ActiveRecord::Base
  mount_uploader :theme, SongUploader # we have a great theme song!

  # ...
end
```

### Using a mounted uploader
The `ObjectStorage::Concern` will query the `model.<mount>_store` attribute to select the correct object store.
This column must be present in the model schema.

```ruby
class SongUploader < GitlabUploader
  include ObjectStorage::Concern

  # ...
end

class Thing < ActiveRecord::Base
  # `theme_store` is an ActiveRecord attribute backed by the `theme_store` column
  mount_uploader :theme, SongUploader # we have a great theme song!

  # Fall back to local storage when the column is NULL
  def theme_store
    super || ObjectStorage::Store::LOCAL
  end

  # ...
end
```
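
The `super || ObjectStorage::Store::LOCAL` fallback relies on the attribute reader being defined in an ancestor of the model, which is how ActiveRecord provides column readers. The following self-contained sketch shows that mechanism without Rails. The `Store` module, its constant values, and the `GeneratedAttributes` module are hypothetical stand-ins, not GitLab or ActiveRecord code.

```ruby
# Hypothetical stand-in for ObjectStorage::Store (constant values assumed).
module Store
  LOCAL = 1
  REMOTE = 2
end

# Stand-in for the module in which ActiveRecord would define column readers.
module GeneratedAttributes
  attr_accessor :theme_store
end

class Thing
  include GeneratedAttributes

  # `super` reaches the reader defined in GeneratedAttributes; a nil value
  # (an unset column) falls back to the local store.
  def theme_store
    super || Store::LOCAL
  end
end

thing = Thing.new
puts thing.theme_store # => 1 (nothing stored yet, so LOCAL)

thing.theme_store = Store::REMOTE
puts thing.theme_store # => 2
```
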