Job traces are sent by gitlab-runner while it's processing a job. You can see traces in job pages, pipelines, email notifications, etc.
Basically, there are two states in job traces. One is "Live trace", and another one is "Archived trace";
|state|condition|step|data flow|stored path|
|---|---|---|---|---|
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => file storage|`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
The `ROOT_PATH` varies per your enviroment. For example, if you used omnibus packages, it would be `/var/opt/gitlab/gitlab-ci`,
whereas if you used source instlation, it would be `/home/git/gitlab`.
Archived trace is one of [job artifacts](job_artifacts.md).
If you set up [object storage settings](https://docs.gitlab.com/ce/administration/job_artifacts.html#object-storage-settings),
job traces are automatically migrated to object storage as well as other job artifacts.
Here is the data flow;
|state|condition|step|data flow|stored path|
|---|---|---|---|---|
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => file storage|`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => file storage |`#{ROOT_PATH}/builds/#{YYYY_mm}/#{project_id}/#{job_id}.log`|
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|Archived trace|After a trace is archived|4: uploading| sidekiq moves archived trace to object storage |`#{bucket_name}/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
By combining the process with object storage settings, we can completely bypass file storage. This is useful option in cloud-native GitLab installtion.
|Live trace|when a job is running|1: patching| gitlab-runner => gitlab-unicorn => redis and database|- (Stored in Redis and Database, instead)|
|Live trace|when a job is finished|2: overwtiring| gitlab-runner => gitlab-unicorn => redis and database |- (Stored in Redis and Database, instead)|
|Archived trace|After a job is finished|3: archiving| sidekiq moves live trace to artifacts folder |`#{ROOT_PATH}/shared/artifacts/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
|Archived trace|After a trace is archived|4: uploading| sidekiq moves archived trace to object storage |`#{bucket_name}/#{disk_hash}/#{YYYY_mm_dd}/#{job_id}/#{job_artifact_id}/trace.log`|
This new live trace architecture stores chunks of traces in Redis and database instead of file storage.
Redis is used as first-class storage, and it stores up-to 128kB. Once the full chunk is sent it will be flushed to database. Afterwhile, the data in Redis and database will be archived to ObjectStorage.
The transition period will be handled gracefully. Upcoming traces will be generated with the new architecture, and on-going live traces will stay with the legacy architecture (i.e. on-going live traces won't be re-generated forcibly with the new architecture).
The transition period will be handled gracefully. Upcoming traces will be generated with the legacy architecture, and on-going live traces will stay with the new architecture (i.e. on-going live traces won't be re-generated forcibly with the legacy architecture).
- Currently all trace data in Redis will be deleted after one week. If the sidekiq workers can't finish by the expiry date, the part of trace data will be lost.
- This feature could consume all memory on Redis instance. If the number of jobs is 1000, 128MB (128kB * 1000) is consumed.
- This feature could pressure Database replication lag. `INSERT` are generated to indicate that we have trace chunk. `UPDATE` with 128kB of data is issued once we receive multiple chunks.