Refactor the job traces admin docs
parent
f41f8eca53
commit
0b6f7a2a8c
3 changed files with 90 additions and 89 deletions
|
@ -28,6 +28,7 @@ for each GitLab application server in your environment.
|
|||
10.1.0.1:/var/opt/gitlab/.ssh /var/opt/gitlab/.ssh nfs defaults,soft,rsize=1048576,wsize=1048576,noatime,nofail,lookupcache=positive 0 2
|
||||
10.1.0.1:/var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/uploads nfs defaults,soft,rsize=1048576,wsize=1048576,noatime,nofail,lookupcache=positive 0 2
|
||||
10.1.0.1:/var/opt/gitlab/gitlab-rails/shared /var/opt/gitlab/gitlab-rails/shared nfs defaults,soft,rsize=1048576,wsize=1048576,noatime,nofail,lookupcache=positive 0 2
|
||||
10.1.0.1:/var/opt/gitlab/gitlab-ci/builds /var/opt/gitlab/gitlab-ci/builds nfs defaults,soft,rsize=1048576,wsize=1048576,noatime,nofail,lookupcache=positive 0 2
|
||||
10.1.0.1:/var/opt/gitlab/git-data /var/opt/gitlab/git-data nfs defaults,soft,rsize=1048576,wsize=1048576,noatime,nofail,lookupcache=positive 0 2
|
||||
```
|
||||
|
||||
|
@ -35,7 +36,7 @@ for each GitLab application server in your environment.
|
|||
mount locations.
|
||||
|
||||
```
|
||||
mkdir -p /var/opt/gitlab/.ssh /var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/shared /var/opt/gitlab/git-data
|
||||
mkdir -p /var/opt/gitlab/.ssh /var/opt/gitlab/gitlab-rails/uploads /var/opt/gitlab/gitlab-rails/shared /var/opt/gitlab/gitlab-ci/builds /var/opt/gitlab/git-data
|
||||
```
|
||||
|
||||
1. Download/install GitLab Omnibus using **steps 1 and 2** from
|
||||
|
|
|
@ -130,6 +130,7 @@ NFS mount point is `/gitlab-nfs`. Then, add the following bind mounts in
|
|||
/gitlab-nfs/gitlab-data/.ssh /var/opt/gitlab/.ssh none bind 0 0
|
||||
/gitlab-nfs/gitlab-data/uploads /var/opt/gitlab/gitlab-rails/uploads none bind 0 0
|
||||
/gitlab-nfs/gitlab-data/shared /var/opt/gitlab/gitlab-rails/shared none bind 0 0
|
||||
/gitlab-nfs/gitlab-data/builds /var/opt/gitlab/gitlab-ci/builds none bind 0 0
|
||||
```
|
||||
|
||||
Using bind mounts will require manually making sure the data directories
|
||||
|
@ -147,7 +148,8 @@ following are the 5 locations need to be shared:
|
|||
| `/var/opt/gitlab/git-data` | Git repository data. This will account for a large portion of your data | `git_data_dirs({"default" => "/var/opt/gitlab/git-data"})`
|
||||
| `/var/opt/gitlab/.ssh` | SSH `authorized_keys` file and keys used to import repositories from some other Git services | `user['home'] = '/var/opt/gitlab/'`
|
||||
| `/var/opt/gitlab/gitlab-rails/uploads` | User uploaded attachments | `gitlab_rails['uploads_directory'] = '/var/opt/gitlab/gitlab-rails/uploads'`
|
||||
| `/var/opt/gitlab/gitlab-rails/shared` | Job artifacts, job traces, GitLab Pages, LFS objects, temp files, etc. If you're using LFS this may also account for a large portion of your data | `gitlab_rails['shared_path'] = '/var/opt/gitlab/gitlab-rails/shared'`
|
||||
| `/var/opt/gitlab/gitlab-rails/shared` | Build artifacts, GitLab Pages, LFS objects, temp files, etc. If you're using LFS this may also account for a large portion of your data | `gitlab_rails['shared_path'] = '/var/opt/gitlab/gitlab-rails/shared'`
|
||||
| `/var/opt/gitlab/gitlab-ci/builds` | GitLab CI build traces | `gitlab_ci['builds_directory'] = '/var/opt/gitlab/gitlab-ci/builds'`
|
||||
|
||||
Other GitLab directories should not be shared between nodes. They contain
|
||||
node-specific files and GitLab code that does not need to be shared. To ship
|
||||
|
|
|
@ -1,22 +1,30 @@
|
|||
# Job traces (logs)
|
||||
|
||||
Job traces are sent by gitlab-runner while it's processing a job. You can see traces in job pages, pipelines, email notifications, etc.
|
||||
Basically, there are two states in job traces. One is "Live trace", and another one is "Archived trace";
|
||||
|
||||
|state|condition|step|data flow|stored path|
|
||||
|---|---|---|---|---|
|
||||
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => file storage|`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
|
||||
The `ROOT_PATH` varies per your enviroment. For example, if you used omnibus packages, it would be `/var/opt/gitlab/gitlab-ci`,
|
||||
whereas if you used source instlation, it would be `/home/git/gitlab`.
|
||||
Job traces are sent by GitLab Runner while it's processing a job. You can see
|
||||
traces in job pages, pipelines, email notifications, etc.
|
||||
|
||||
There isn't a way to automatically expire old job logs, but it's safe to remove
|
||||
them if they're taking up too much space. If you remove the logs manually, the
|
||||
job output in the UI will be empty.
|
||||
|
||||
## Changing the job traces location
|
||||
## Data flow
|
||||
|
||||
In general, there are two states in job traces: "live trace" and "archived trace".
|
||||
The following table shows the phases a trace goes through.
|
||||
|
||||
| Phase | State | Condition | Data flow | Stored path |
|
||||
| ----- | ----- | --------- | --------- | ----------- |
|
||||
| 1: patching | Live trace | When a job is running | GitLab Runner => Unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
| 2: overwriting | Live trace | When a job is finished | GitLab Runner => Unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
| 3: archiving | Archived trace | After a job is finished | Sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
| 4: uploading | Archived trace | After a trace is archived | Sidekiq moves archived trace to [object storage](#uploading-traces-to-object-storage) (if configured) |`#{bucket_name}/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
|
||||
The `ROOT_PATH` varies per your environment. For example, for Omnibus GitLab it
|
||||
would be `/var/opt/gitlab/gitlab-ci`, whereas for installations from source
|
||||
it would be `/home/git/gitlab`.
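
As an illustration, the path templates in the table can be expanded with a few lines of Ruby. This is only a sketch: the helper names `live_trace_path` and `archived_trace_path` are hypothetical, the `disk_hash` is treated as an opaque string supplied by the caller, and the choice of which date feeds `YYYY_mm` and `YYYY_mm_dd` is an assumption, since the table does not say.

```ruby
require 'date'

# Hypothetical helper: expands the live trace template
# #{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log
def live_trace_path(root_path, date, project_id, job_id)
  File.join(root_path, 'builds', date.strftime('%Y_%m'),
            project_id.to_s, "#{job_id}.log")
end

# Hypothetical helper: expands the archived trace template
# #{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log
def archived_trace_path(root_path, disk_hash, date, job_id, job_artifact_id)
  File.join(root_path, 'shared', 'artifacts', disk_hash,
            date.strftime('%Y_%m_%d'), job_id.to_s,
            job_artifact_id.to_s, 'trace.log')
end

# Example values are made up for illustration.
puts live_trace_path('/var/opt/gitlab/gitlab-ci', Date.new(2018, 6, 1), 42, 1001)
puts archived_trace_path('/home/git/gitlab', 'ab/cd/abcd', Date.new(2018, 6, 1), 1001, 7)
```

A helper like this can be handy when locating a specific job log on disk before removing old logs manually.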
|
||||
|
||||
## Changing the job traces local location
|
||||
|
||||
To change the location where the job logs will be stored, follow the steps below.
|
||||
|
||||
|
@ -48,120 +56,110 @@ To change the location where the job logs will be stored, follow the steps below
|
|||
[reconfigure gitlab]: restart_gitlab.md#omnibus-gitlab-reconfigure "How to reconfigure Omnibus GitLab"
|
||||
[restart gitlab]: restart_gitlab.md#installations-from-source "How to restart GitLab"
|
||||
|
||||
## Upload traces to object storage
|
||||
## Uploading traces to object storage
|
||||
|
||||
Archived trace is one of [job artifacts](job_artifacts.md).
|
||||
If you set up [object storage settings](https://docs.gitlab.com/ce/administration/job_artifacts.html#object-storage-settings),
|
||||
job traces are automatically migrated to object storage as well as other job artifacts.
|
||||
An archived trace is considered a [job artifact](job_artifacts.md).
|
||||
Therefore, when you [set up object storage](job_artifacts.md#object-storage-settings),
|
||||
job traces are automatically migrated to it along with the other job artifacts.
|
||||
|
||||
Here is the data flow;
|
||||
|
||||
|state|condition|step|data flow|stored path|
|
||||
|---|---|---|---|---|
|
||||
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => file storage|`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|
||||
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
|Archived trace|After a trace is archived|4: uploading| sidekiq moves archived trace to object storage |`#{bucket_name}/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
Check the [data flow](#data-flow) to learn about the process.
|
||||
|
||||
## New live trace architecture
|
||||
|
||||
> [Introduced][ce-18169] in GitLab 10.4.
|
||||
> [Introduced][ce-18169] in GitLab 10.4.
|
||||
> [Announced as General availability][ce-46097] in GitLab 11.0.
|
||||
|
||||
> **Notes**:
|
||||
- Performance improvements are scheduled in [11.1](https://gitlab.com/gitlab-org/gitlab-ce/issues/47125).
|
||||
- This feature is off by default. Please check below how to enable/disable this featrue.
|
||||
NOTE: **Note:**
|
||||
This feature is off by default. Check below how to [enable/disable](#enabling-live-trace) it.
|
||||
|
||||
**For cloud-native compatible application**
|
||||
By combining the process with object storage settings, we can completely bypass
|
||||
the local file storage. This is a useful option if GitLab is installed as
|
||||
cloud-native, for example on Kubernetes.
|
||||
|
||||
By combining the process with object storage settings, we can completely bypass file storage. This is useful option in cloud-native GitLab installtion.
|
||||
The data flow is the same as described in the [data flow section](#data-flow)
|
||||
with one change: _the stored path of the first two phases is different_. This new live
|
||||
trace architecture stores chunks of traces in Redis and the database instead of
|
||||
file storage. Redis is used as first-class storage, and it stores up to 128KB
|
||||
of data. Once the full chunk is sent, it is flushed to the database. After a while,
|
||||
the data in Redis and the database will be archived to [object storage](#uploading-traces-to-object-storage).
|
||||
|
||||
Here is the data flow;
|
||||
The data are stored in the following Redis namespace: `Gitlab::Redis::SharedState`.
|
||||
|
||||
|state|condition|step|data flow|stored path|
|
||||
|---|---|---|---|---|
|
||||
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => redis and database|- (Stored in Redis and Database, instead)|
|
||||
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => redis and database |- (Stored in Redis and Database, instead)|
|
||||
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
|Archived trace|After a trace is archived|4: uploading| sidekiq moves archived trace to object storage |`#{bucket_name}/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|
||||
Here is the detailed data flow:
|
||||
|
||||
(Step 3 is scheduled to be improved in https://gitlab.com/gitlab-org/gitlab-ce/issues/44663)
|
||||
1. GitLab Runner picks a job from GitLab
|
||||
1. GitLab Runner sends a piece of trace to GitLab
|
||||
1. GitLab appends the data to Redis
|
||||
1. Once the data in Redis reaches 128KB, the data is flushed to the database.
|
||||
1. The above steps are repeated until the job is finished.
|
||||
1. Once the job is finished, GitLab schedules a Sidekiq worker to archive the trace.
|
||||
1. The Sidekiq worker archives the trace to object storage and cleans up the trace
|
||||
in Redis and the database.
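
The steps above can be simulated with a short Ruby sketch. It is not GitLab's implementation: the class `LiveTraceSketch` and its methods are hypothetical, plain Ruby objects stand in for Redis and the database, and the chunk size is assumed to be 128 * 1024 bytes.

```ruby
# Assumption: "128KB" means 128 * 1024 bytes.
CHUNK_SIZE = 128 * 1024

# Hypothetical in-memory model of the live trace flow.
class LiveTraceSketch
  attr_reader :database_chunks

  def initialize
    @redis_buffer = +''    # stands in for the current chunk held in Redis
    @database_chunks = []  # stands in for full chunks flushed to the database
  end

  # A piece of trace arrives from the runner and is appended to the buffer.
  # Every time the buffer holds a full chunk, that chunk is flushed.
  def append(data)
    @redis_buffer << data
    while @redis_buffer.bytesize >= CHUNK_SIZE
      @database_chunks << @redis_buffer.byteslice(0, CHUNK_SIZE)
      @redis_buffer = @redis_buffer.byteslice(CHUNK_SIZE..-1) || +''
    end
  end

  # Bytes that have not been flushed yet and exist only in the buffer.
  def buffered_bytes
    @redis_buffer.bytesize
  end

  # Once the job is finished: assemble the whole trace, then clean up
  # both the buffer and the flushed chunks.
  def archive
    trace = @database_chunks.join + @redis_buffer
    @database_chunks = []
    @redis_buffer = +''
    trace
  end
end

trace = LiveTraceSketch.new
trace.append('a' * (CHUNK_SIZE + 10))
puts trace.database_chunks.size # full chunks flushed so far
puts trace.buffered_bytes       # bytes still only in the buffer
```

In this model, the bytes reported by `buffered_bytes` are the ones that would be at risk if the buffer were flushed accidentally before archiving.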
|
||||
|
||||
**The detailed mechanizm**
|
||||
### Enabling live trace
|
||||
|
||||
This new live trace architecture stores chunks of traces in Redis and database instead of file storage.
|
||||
Redis is used as first-class storage, and it stores up-to 128kB. Once the full chunk is sent it will be flushed to database. Afterwhile, the data in Redis and database will be archived to ObjectStorage.
|
||||
The following commands are to be issued in a Rails console:
|
||||
|
||||
Here is the detailed data flow.
|
||||
```sh
|
||||
# Omnibus GitLab
|
||||
gitlab-rails console
|
||||
|
||||
1. GitLab Runner picks a job from GitLab-Rails
|
||||
1. GitLab Runner sends a piece of trace to GitLab-Rails
|
||||
1. GitLab-Rails appends the data to Redis
|
||||
1. If the data in Redis is fulfilled 128kB, the data is flushed to Database.
|
||||
1. 2.~4. is continued until the job is finished
|
||||
1. Once the job is finished, GitLab-Rails schedules a sidekiq worker to archive the trace
|
||||
1. The sidekiq worker archives the trace to Object Storage, and cleanup the trace in Redis and Database
|
||||
# Installation from source
|
||||
cd /home/git/gitlab
|
||||
sudo -u git -H bin/rails console production
|
||||
```
|
||||
|
||||
**How to check if it's on or off?**
|
||||
**To check if live trace is enabled:**
|
||||
|
||||
```ruby
|
||||
Feature.enabled?('ci_enable_live_trace')
|
||||
```
|
||||
|
||||
**How to enable?**
|
||||
**To enable live trace:**
|
||||
|
||||
```ruby
|
||||
Feature.enable('ci_enable_live_trace')
|
||||
```
|
||||
|
||||
>**Note:**
|
||||
The transition period will be handled gracefully. Upcoming traces will be generated with the new architecture, and on-going live traces will stay with the legacy architecture (i.e. on-going live traces won't be re-generated forcibly with the new architecture).
|
||||
NOTE: **Note:**
|
||||
The transition period will be handled gracefully. Upcoming traces will be
|
||||
generated with the new architecture, and on-going live traces will stay with the
|
||||
legacy architecture, which means that on-going live traces won't be forcibly
|
||||
re-generated with the new architecture.
|
||||
|
||||
**How to disable?**
|
||||
**To disable live trace:**
|
||||
|
||||
```ruby
|
||||
Feature.disable('ci_enable_live_trace')
|
||||
```
|
||||
|
||||
>**Note:**
|
||||
The transition period will be handled gracefully. Upcoming traces will be generated with the legacy architecture, and on-going live traces will stay with the new architecture (i.e. on-going live traces won't be re-generated forcibly with the legacy architecture).
|
||||
NOTE: **Note:**
|
||||
The transition period will be handled gracefully. Upcoming traces will be generated
|
||||
with the legacy architecture, and on-going live traces will stay with the new
|
||||
architecture, which means that on-going live traces won't be forcibly re-generated
|
||||
with the legacy architecture.
|
||||
|
||||
**Redis namespace:**
|
||||
### Potential implications
|
||||
|
||||
`Gitlab::Redis::SharedState`
|
||||
In some cases, having data stored on Redis could incur data loss:
|
||||
|
||||
**Potential impact:**
|
||||
1. **Case 1: When all data in Redis are accidentally flushed**
|
||||
- On-going live traces could be recovered by re-sending traces (this is
|
||||
supported by all versions of the GitLab Runner).
|
||||
- Finished jobs which have not archived live traces will lose the last part
|
||||
(~128KB) of trace data.
|
||||
|
||||
- This feature could incur data loss:
|
||||
- Case 1: When all data in Redis are accidentally flushed.
|
||||
- On-going live traces could be recovered by re-sending traces (This is supported by all versions of GitLab Runner)
|
||||
- Finished jobs which has not archived live traces will lose the last part (~128kB) of trace data.
|
||||
- Case 2: When sidekiq workers failed to archive (e.g. There was a bug that prevents archiving process, Sidekiq inconsistancy, etc):
|
||||
- Currently all trace data in Redis will be deleted after one week. If the sidekiq workers can't finish by the expiry date, the part of trace data will be lost.
|
||||
- This feature could consume all memory on Redis instance. If the number of jobs is 1000, 128MB (128kB * 1000) is consumed.
|
||||
- This feature could pressure Database replication lag. `INSERT` are generated to indicate that we have trace chunk. `UPDATE` with 128kB of data is issued once we receive multiple chunks.
|
||||
- and so on
|
||||
1. **Case 2: When Sidekiq workers fail to archive (e.g., there was a bug that
|
||||
prevents the archiving process, Sidekiq inconsistency, etc.)**
|
||||
- Currently all trace data in Redis will be deleted after one week. If the
|
||||
Sidekiq workers can't finish by the expiry date, part of the trace data will be lost.
|
||||
|
||||
**How to test?**
|
||||
Another issue that might arise is that it could consume all memory on the Redis
|
||||
instance. If the number of jobs is 1000, 128MB (128KB * 1000) is consumed.
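
The estimate above can be reproduced with a small calculation. This is a rough sketch under stated assumptions: the method name `worst_case_redis_bytes` is hypothetical, every job is assumed to hold one full chunk in Redis at the same time, and 128KB is taken as 128 * 1024 bytes.

```ruby
# Assumption: one chunk is 128 * 1024 bytes.
CHUNK_BYTES = 128 * 1024

# Hypothetical helper: upper bound on Redis memory used by live traces
# if every running job holds one full chunk at once.
def worst_case_redis_bytes(running_jobs, chunk_bytes = CHUNK_BYTES)
  running_jobs * chunk_bytes
end

bytes = worst_case_redis_bytes(1000)
puts format('%.1f MiB', bytes / (1024.0 * 1024)) # prints "125.0 MiB"
```

With binary units the result is 125 MiB, which matches the roughly 128MB figure above when 1MB is counted as 1000KB.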
|
||||
|
||||
We're currently evaluating this feature on dev.gitalb.org or staging.gitlab.com to verify this features. Here is the list of tests/measurements.
|
||||
|
||||
- Features:
|
||||
- Live traces should be visible on job pages
|
||||
- Archived traces should be visible on job pages
|
||||
- Live traces should be archived to Object storage
|
||||
- Live traces should be cleaned up after archived
|
||||
- etc
|
||||
- Performance:
|
||||
- Schedule 1000~10000 jobs and let GitLab-runners process concurrently. Measure memoery presssure, IO load, etc.
|
||||
- etc
|
||||
- Failover:
|
||||
- Simulate Redis outage
|
||||
- etc
|
||||
|
||||
**How to verify the correctnesss?**
|
||||
|
||||
- TBD
|
||||
Also, it could put pressure on database replication lag. `INSERT`s are generated to
|
||||
indicate that we have a trace chunk. An `UPDATE` with 128KB of data is issued once we
|
||||
receive multiple chunks.
|
||||
|
||||
[ce-18169]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18169
|
||||
[ce-46097]: https://gitlab.com/gitlab-org/gitlab-ce/issues/46097
|
||||
|
|