From c5067b1232112804aa9634aef3bafb5ec1cc690e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Kamil=20Trzci=C5=84ski?= Date: Mon, 7 May 2018 10:40:40 +0200 Subject: [PATCH] Update `job_traces.md` documentation --- doc/administration/job_traces.md | 55 ++++++++++++++++---------------- 1 file changed, 27 insertions(+), 28 deletions(-) diff --git a/doc/administration/job_traces.md b/doc/administration/job_traces.md index 3470274e5ea..f0b2054a7f3 100644 --- a/doc/administration/job_traces.md +++ b/doc/administration/job_traces.md @@ -52,35 +52,34 @@ To change the location where the job logs will be stored, follow the steps below **What is "live trace"?** -It's job traces exists while job is being processed by Gitlab-Runner. You can see the progress in job pages(GUI). -In contrast, all traces will be archived after job is finished, that's called "archived trace". +Job trace that is sent by runner while jobs are running. You can see live trace in job pages UI. +The live traces are archived once job finishes. **What is new architecture?** -So far, when GitLab-Runner sends a job trace to GitLab-Rails, traces have been saved to File Storage as text files. -This was a problem on [Cloud Native-compatible GitLab application](https://gitlab.com/gitlab-com/migration/issues/23) that -GitLab-Rails had to rely on File Storage. +So far, when GitLab Runner sends a job trace to GitLab-Rails, traces have been saved to file storage as text files. +This was a problem for [Cloud Native-compatible GitLab application](https://gitlab.com/gitlab-com/migration/issues/23) where GitLab had to rely on File Storage. -This new live trace architecture stores traces to Redis and Database instead of File Storage. -Redis is used as first-class trace storage, it stores each trace upto 128KB. Once the data is fulfileld, it's flushed to Database. Afterwhile, the data in Redis and Database will be archived to ObjectStorage. +This new live trace architecture stores chunks of traces in Redis and database instead of file storage. +Redis is used as first-class storage, and it stores up-to 128kB. Once the full chunk is sent it will be flushed to database. Afterwhile, the data in Redis and database will be archived to ObjectStorage. Here is the detailed data flow. -1. GitLab-Runner picks a job from GitLab-Rails -1. GitLab-Runner sends a piece of trace to GitLab-Rails +1. GitLab Runner picks a job from GitLab-Rails +1. GitLab Runner sends a piece of trace to GitLab-Rails 1. GitLab-Rails appends the data to Redis -1. If the data in Redis is fulfilled 128KB, the data is flushed to Database. +1. If the data in Redis is fulfilled 128kB, the data is flushed to Database. 1. 2.~4. is continued until the job is finished 1. Once the job is finished, GitLab-Rails schedules a sidekiq worker to archive the trace 1. The sidekiq worker archives the trace to Object Storage, and cleanup the trace in Redis and Database -**How to check if it's on or off** +**How to check if it's on or off?** ```ruby Feature.enabled?('ci_enable_live_trace') ``` -**How to enable** +**How to enable?** ```ruby Feature.enable('ci_enable_live_trace') @@ -89,7 +88,7 @@ Feature.enable('ci_enable_live_trace') >**Note:** The transition period will be handled gracefully. Upcoming traces will be generated with the new architecture, and on-going live traces will stay with the legacy architecture (i.e. on-going live traces won't be re-generated forcibly with the new architecture). -**How to disable** +**How to disable?** ```ruby Feature.disable('ci_enable_live_trace') @@ -98,41 +97,41 @@ Feature.disable('ci_enable_live_trace') >**Note:** The transition period will be handled gracefully. Upcoming traces will be generated with the legacy architecture, and on-going live traces will stay with the new architecture (i.e. on-going live traces won't be re-generated forcibly with the legacy architecture). -**Redis namespace** +**Redis namespace:** `Gitlab::Redis::SharedState` -**Potential impact** +**Potential impact:** -- This feature could incur data loss +- This feature could incur data loss: - Case 1: When all data in Redis are accidentally flushed. - - On-going live traces could be recovered by re-sending traces (This is supported by all versions of GitLab-Runner) - - Finished jobs which has not archived live traces will lose the last part(~128KB) of trace data. - - Case 2: When sidekiq workers failed to archive (e.g. There was a bug that prevents archiving process, Sidekiq inconsistancy, etc) + - On-going live traces could be recovered by re-sending traces (This is supported by all versions of GitLab Runner) + - Finished jobs which has not archived live traces will lose the last part (~128kB) of trace data. + - Case 2: When sidekiq workers failed to archive (e.g. There was a bug that prevents archiving process, Sidekiq inconsistancy, etc): - Currently all trace data in Redis will be deleted after one week. If the sidekiq workers can't finish by the expiry date, the part of trace data will be lost. -- This feature could consume all memeory on Redis instance. If the number of jobs is 1000, 128KB * 1000 = 128MB is consumed. -- This feature could pressure Database instance. `INSERT` is queried per 128KB per a job. `UPDATE` is queried with the same condition, but only if the total size of the trace exceeds 128KB. +- This feature could consume all memory on Redis instance. If the number of jobs is 1000, 128MB (128kB * 1000) is consumed. +- This feature could pressure Database replication lag. `INSERT` are generated to indicate that we have trace chunk. `UPDATE` with 128kB of data is issued once we receive multiple chunks. - and so on -**How to test** +**How to test?** We're currently evaluating this feature on dev.gitalb.org or staging.gitlab.com to verify this features. Here is the list of tests/measurements. -- Features +- Features: - Live traces should be visible on job pages - Archived traces should be visible on job pages - Live traces should be archived to Object storage - Live traces should be cleaned up after archived - etc -- Performance +- Performance: - Schedule 1000~10000 jobs and let GitLab-runners process concurrently. Measure memoery presssure, IO load, etc. - etc -- Failover +- Failover: - Simulate Redis outage - etc -**How to verify the correctnesss** - - - TBD +**How to verify the correctnesss?** + +- TBD [ce-44935]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18169