diff --git a/doc/ci/directed_acyclic_graph/index.md b/doc/ci/directed_acyclic_graph/index.md
new file mode 100644
index 00000000000..2f000719c04
--- /dev/null
+++ b/doc/ci/directed_acyclic_graph/index.md
@@ -0,0 +1,76 @@
+---
+type: reference
+---
+
+# Directed Acyclic Graph
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab-ce/issues/47063) in GitLab 12.2 (enabled by `ci_dag_support` feature flag).
+
+A [directed acyclic graph](https://www.techopedia.com/definition/5739/directed-acyclic-graph-dag) can be
+used in the context of a CI/CD pipeline to build relationships between jobs such that
+execution is performed in the quickest possible manner, regardless of how stages may
+be set up.
+
+For example, you may have a specific tool or separate website that is built
+as part of your main project. Using a DAG, you can specify the relationship between
+these jobs and GitLab will then execute the jobs as soon as possible instead of waiting
+for each stage to complete.
+
+Unlike other DAG solutions for CI/CD, GitLab does not require you to choose one or the
+other. You can implement a hybrid combination of DAG and traditional
+stage-based operation within a single pipeline. Configuration is kept very simple,
+requiring a single keyword to enable the feature for any job.
+
+Consider a monorepo as follows:
+
+```
+./service_a
+./service_b
+./service_c
+./service_d
+```
+
+It has a pipeline that looks like the following:
+
+| build | test | deploy |
+| ----- | ---- | ------ |
+| build_a | test_a | deploy_a |
+| build_b | test_b | deploy_b |
+| build_c | test_c | deploy_c |
+| build_d | test_d | deploy_d |
+
+Using a DAG, you can relate the `_a` jobs to each other separately from the `_b` jobs,
+and even if service `a` takes a very long time to build, service `b` will not
+wait for it and will finish as quickly as it can. In this very same pipeline, `_c` and
+`_d` can be left alone and will run together in staged sequence just like any normal
+GitLab pipeline.
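+
+A minimal sketch of how the relationships described above could be written in
+`.gitlab-ci.yml`, assuming the job and stage names from the table. The `script:`
+commands are hypothetical placeholders, not part of the original example:
+
+```yaml
+stages:
+  - build
+  - test
+  - deploy
+
+# Service a: each job starts as soon as the job it needs has finished.
+build_a:
+  stage: build
+  script: ./build.sh service_a    # hypothetical command
+
+test_a:
+  stage: test
+  needs: [build_a]
+  script: ./test.sh service_a     # hypothetical command
+
+deploy_a:
+  stage: deploy
+  needs: [test_a]
+  script: ./deploy.sh service_a   # hypothetical command
+
+# Service b would follow the same pattern (test_b needs build_b, and so on).
+
+# Service c (and d) omit `needs:`, so they wait for each stage to complete.
+build_c:
+  stage: build
+  script: ./build.sh service_c    # hypothetical command
+
+test_c:
+  stage: test
+  script: ./test.sh service_c     # hypothetical command
+```
+
+Here only the `_a` jobs leave the staged sequence. Jobs without `needs:` keep the
+traditional stage-based behavior, which is what makes the hybrid setup possible.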
+
+## Use cases
+
+A DAG can help solve several different kinds of relationships between jobs within
+a CI/CD pipeline. Most typically, this covers cases where jobs need to fan in or out,
+and/or merge back together (diamond dependencies). This can happen when you're
+handling multi-platform builds or complex webs of dependencies as in something like
+an operating system build or a complex deployment graph of independently deployable
+but related microservices.
+
+Additionally, a DAG can speed up pipelines in general and help
+to deliver fast feedback. By creating dependency relationships that don't unnecessarily
+block each other, your pipelines will run as quickly as possible regardless of
+pipeline stages, ensuring output (including errors) is available to developers
+as quickly as possible.
+
+## Usage
+
+Relationships are defined between jobs using the [`needs:` keyword](../yaml/README.md#needs).
+
+Note that `needs:` also works with the [parallel](../yaml/README.md#parallel) keyword,
+giving you powerful options for parallelization within your pipeline.
+
+## Limitations
+
+A directed acyclic graph is a complicated feature, and as of the initial MVC there
+are certain use cases that you may need to work around. For more information:
+
+- [`needs` requirements and limitations](../yaml/README.md#requirements-and-limitations).
+- Related epic [gitlab-org#1716](https://gitlab.com/groups/gitlab-org/-/epics/1716).
diff --git a/doc/ci/yaml/README.md b/doc/ci/yaml/README.md
index a6051e87366..2be93433b36 100644
--- a/doc/ci/yaml/README.md
+++ b/doc/ci/yaml/README.md
@@ -1665,6 +1665,84 @@ You can ask your administrator to
 [flip this switch](../../administration/job_artifacts.md#validation-for-dependencies)
 and bring back the old behavior.
 
+### `needs`
+
+> Introduced in GitLab 12.2.
+
+The `needs:` keyword enables executing jobs out-of-order, allowing you to implement
+a [directed acyclic graph](../directed_acyclic_graph/index.md) in your `.gitlab-ci.yml`.
+
+This lets you run some jobs without waiting for other ones, disregarding stage ordering
+so you can have multiple stages running concurrently.
+
+Let's consider the following example:
+
+```yaml
+linux:build:
+  stage: build
+
+mac:build:
+  stage: build
+
+linux:rspec:
+  stage: test
+  needs: [linux:build]
+
+linux:rubocop:
+  stage: test
+  needs: [linux:build]
+
+mac:rspec:
+  stage: test
+  needs: [mac:build]
+
+mac:rubocop:
+  stage: test
+  needs: [mac:build]
+
+production:
+  stage: deploy
+```
+
+This example creates three paths of execution:
+
+- Linux path: the `linux:rspec` and `linux:rubocop` jobs will be run as soon
+  as the `linux:build` job finishes, without waiting for `mac:build` to finish.
+
+- macOS path: the `mac:rspec` and `mac:rubocop` jobs will be run as soon
+  as the `mac:build` job finishes, without waiting for `linux:build` to finish.
+
+- The `production` job will be executed as soon as all previous jobs
+  finish; in this case: `linux:build`, `linux:rspec`, `linux:rubocop`,
+  `mac:build`, `mac:rspec`, `mac:rubocop`.
+
+#### Requirements and limitations
+
+1. If `needs:` is set to point to a job that is not instantiated
+   because of `only/except` rules or otherwise does not exist, the
+   job will fail.
+1. Note that on day one of the launch, we are temporarily limiting the
+   maximum number of jobs that a single job can need in the `needs:` array. Track
+   our [infrastructure issue](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7541)
+   for details on the current limit.
+1. If you use `dependencies:` with `needs:`, it's important that you
+   do not mark a job as having a dependency on something that won't
+   have been run at the time it needs it. It's better to use both
+   keywords in this case so that GitLab handles the ordering appropriately.
+1. It is currently impossible to have `needs: []` (empty needs);
+   the job always needs to depend on something, unless it is a job
+   in the first stage (see [gitlab-ce#65504](https://gitlab.com/gitlab-org/gitlab-ce/issues/65504)).
+1. If `needs:` refers to a job that is marked as `parallel:`,
+   the current job will depend on all parallel jobs created.
+1. `needs:` is similar to `dependencies:` in that it needs to use jobs from
+   prior stages. This means that it is impossible to create circular
+   dependencies or depend on jobs in the current stage (see [gitlab-ce#65505](https://gitlab.com/gitlab-org/gitlab-ce/issues/65505)).
+1. Related to the above, stages must be explicitly defined for all jobs
+   that have the keyword `needs:` or are referred to by one.
+1. For self-managed users, the feature must be turned on using the `ci_dag_support`
+   feature flag. The `ci_dag_limit_needs` option, if set, will limit the number of
+   jobs that a single job can need to `50`. If unset, the limit is `5`.
+
 ### `coverage`
 
 > [Introduced][ce-7447] in GitLab 8.17.