---
stage: Systems
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
type: reference
---

# How to set up Consul **(PREMIUM SELF)**

A Consul cluster consists of both
[server and client agents](https://www.consul.io/docs/agent).
The servers run on their own nodes and the clients run on other nodes that in
turn communicate with the servers.

GitLab Premium includes a bundled version of [Consul](https://www.consul.io/),
a service networking solution that you can manage by using `/etc/gitlab/gitlab.rb`.

## Prerequisites

Before configuring Consul:

1. Review the [reference architecture](reference_architectures/index.md#available-reference-architectures)
   documentation to determine the number of Consul server nodes you should have.
1. If necessary, ensure the [appropriate ports are open](package_information/defaults.md#ports) in your firewall.

## Configure the Consul nodes

On _each_ Consul server node:

1. Follow the instructions to [install](https://about.gitlab.com/install/)
   GitLab by choosing your preferred platform, but do not supply the
   `EXTERNAL_URL` value when asked.
1. Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the values
   in the `retry_join` setting with your own. The example below lists three
   nodes, two identified by IP address and one by FQDN. You can use either
   notation:

   ```ruby
   # Disable all components except Consul
   roles ['consul_role']

   # Consul nodes: can be FQDN or IP, separated by whitespace
   consul['configuration'] = {
     server: true,
     retry_join: %w(10.10.10.1 consul1.gitlab.example.com 10.10.10.2)
   }

   # Disable auto migrations
   gitlab_rails['auto_migrate'] = false
   ```

1. [Reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes
   to take effect.
1. Run the following command to verify that Consul is configured correctly and
   that all server nodes are communicating:

   ```shell
   sudo /opt/gitlab/embedded/bin/consul members
   ```

   The output should be similar to:

   ```plaintext
   Node                 Address               Status  Type    Build  Protocol  DC
   CONSUL_NODE_ONE      XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_TWO      XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_THREE    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   ```

   If the results display any nodes with a status that isn't `alive`, or if any
   of the three nodes are missing, see the [Troubleshooting section](#troubleshooting-consul).
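
The check in the last step can be scripted. The following is a minimal sketch, not part of the GitLab package: `not_alive_nodes` is a hypothetical helper that assumes the column layout shown above (node name in column 1, status in column 3), and the node names and addresses in the sample are made up for illustration.

```shell
# Sketch: list nodes from `consul members` output whose Status is not `alive`.
# Reads the table on stdin; skips the header row and any short or empty lines.
not_alive_nodes() {
  awk 'NR > 1 && NF >= 3 && $3 != "alive" { print $1 }'
}

# Captured sample output (hypothetical node names and addresses).
sample='Node      Address          Status  Type    Build  Protocol  DC
consul-1  10.10.10.1:8301  alive   server  0.9.2  2         gitlab_consul
consul-2  10.10.10.2:8301  failed  server  0.9.2  2         gitlab_consul'

# Prints the name of each node that needs attention.
printf '%s\n' "$sample" | not_alive_nodes
```

On a real node, you could pipe the live command into the helper instead: `sudo /opt/gitlab/embedded/bin/consul members | not_alive_nodes`. An empty result means every listed node reports `alive`. It does not detect nodes that are missing from the list entirely.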

## Upgrade the Consul nodes

To upgrade your Consul nodes, upgrade the GitLab package.

Nodes should be:

- Members of a healthy cluster prior to upgrading the Omnibus GitLab package.
- Upgraded one node at a time.

Identify any existing health issues in the cluster by running the following command
on each node. The command returns an empty array if the cluster is healthy:

```shell
curl "http://127.0.0.1:8500/v1/health/state/critical"
```
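
If you want to automate this check before an upgrade, one possible approach is to compare the response body with an empty JSON array. The sketch below assumes the agent listens on the default `127.0.0.1:8500` shown above. `is_empty_json_array` is a hypothetical helper name, and it only recognizes a literal empty array (ignoring whitespace).

```shell
# Sketch: succeed only if the given response body is an empty JSON array.
is_empty_json_array() {
  # Strip all whitespace, then compare with the literal "[]".
  [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "[]" ]
}

# Demonstration with canned response bodies.
for body in '[]' '[{"Status":"critical"}]'; do
  if is_empty_json_array "$body"; then
    echo "healthy: no critical checks"
  else
    echo "unhealthy: $body"
  fi
done

# Against a live agent (not run here):
# body=$(curl --silent --fail "http://127.0.0.1:8500/v1/health/state/critical")
# is_empty_json_array "$body" || echo "critical checks found: $body"
```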

If the Consul version has changed, you see a notice at the end of `gitlab-ctl reconfigure`
informing you that Consul must be restarted for the new version to be used.

Restart Consul one node at a time:

```shell
sudo gitlab-ctl restart consul
```

Consul nodes communicate using the Raft protocol. If the current leader goes
offline, the remaining nodes must elect a new leader. A leader node must exist to facilitate
synchronization across the cluster. If too many nodes go offline at the same time,
the cluster loses quorum and cannot elect a leader because of
[broken consensus](https://www.consul.io/docs/architecture/consensus).

Consult the [troubleshooting section](#troubleshooting-consul) if the cluster is not
able to recover after the upgrade. The [outage recovery](#outage-recovery) section may
be of particular interest.

GitLab uses Consul to store only easily regenerated, transient data. If the
bundled Consul wasn't used by any process other than GitLab itself, you can
[rebuild the cluster from scratch](#recreate-from-scratch).

## Troubleshooting Consul

The following operations can help you debug issues.
You can see any error logs by running:

```shell
sudo gitlab-ctl tail consul
```

### Check the cluster membership

To determine which nodes are part of the cluster, run the following on any member in the cluster:

```shell
sudo /opt/gitlab/embedded/bin/consul members
```

The output should be similar to:

```plaintext
Node            Address               Status  Type    Build  Protocol  DC
consul-a        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
consul-b        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
consul-c        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
db-a            XX.XX.X.Y:8301        alive   client  0.9.0  2         gitlab_consul
db-b            XX.XX.X.Y:8301        alive   client  0.9.0  2         gitlab_consul
```

Ideally, all nodes have a `Status` of `alive`.

### Restart Consul

If you must restart Consul, do so in a controlled manner to maintain quorum.
If quorum is lost, you must follow the Consul [outage recovery](#outage-recovery)
process to recover the cluster.

To be safe, restart Consul on only one node at a time to
ensure the cluster remains intact. For larger clusters, it is possible to restart
multiple nodes at a time. See the
[Consul consensus document](https://www.consul.io/docs/architecture/consensus#deployment-table)
for the number of failures a cluster of a given size can tolerate. This is the
number of simultaneous restarts the cluster can sustain.
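
The figures in that table follow from the Raft majority rule: a cluster of N servers needs `floor(N/2) + 1` servers for quorum, and tolerates the loss of the rest. As a quick illustration, the sketch below computes both values. The function names `quorum_size` and `failure_tolerance` are made up for this example.

```shell
# quorum_size N: servers required for quorum in an N-server cluster.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# failure_tolerance N: servers that can be offline at once without losing quorum.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the values for common cluster sizes.
for n in 3 5 7; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerance=$(failure_tolerance "$n")"
done
```

For a three-server cluster the tolerance is 1, which is why restarting one node at a time is the safe default.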

To restart Consul:

```shell
sudo gitlab-ctl restart consul
```

### Consul nodes unable to communicate

By default, Consul attempts to
[bind](https://www.consul.io/docs/agent/config/config-files#bind_addr) to `0.0.0.0`, but
it advertises the first private IP address on the node for other Consul nodes
to communicate with it. If the other nodes cannot communicate with a node on
this address, then the cluster has a failed status.

If you run into this issue, then messages like the following are output in `gitlab-ctl tail consul`:

```plaintext
2017-09-25_19:53:39.90821     2017/09/25 19:53:39 [WARN] raft: no known peers, aborting election
2017-09-25_19:53:41.74356     2017/09/25 19:53:41 [ERR] agent: failed to sync remote state: No cluster leader
```

To fix this:

1. On each node, pick an address through which all of the other nodes can reach that node.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   sudo gitlab-ctl reconfigure
   ```

If you still see the errors, you may have to
[erase the Consul database and reinitialize](#recreate-from-scratch) on the affected node.

### Consul does not start - multiple private IPs

If a node has multiple private IPs, Consul doesn't know
which of the private addresses to advertise, and exits immediately on start.

Messages like the following are output in `gitlab-ctl tail consul`:

```plaintext
2017-11-09_17:41:45.52876 ==> Starting Consul agent...
2017-11-09_17:41:45.53057 ==> Error creating agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
```

To fix this:

1. Pick an address on the node through which all of the other nodes can reach it.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   sudo gitlab-ctl reconfigure
   ```
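
To see which addresses are candidates for `bind_addr`, it can help to list the private IPv4 addresses on the node. The following sketch assumes that "private" means the RFC 1918 ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`). Consul's own detection may differ, and the helper names are hypothetical.

```shell
# is_private_ipv4 ADDR: succeed if ADDR falls in an RFC 1918 range (assumption).
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# list_private_ipv4: read space-separated addresses on stdin, print private ones.
list_private_ipv4() {
  tr ' ' '\n' | while IFS= read -r addr; do
    [ -n "$addr" ] || continue
    if is_private_ipv4 "$addr"; then
      echo "$addr"
    fi
  done
}

# Demonstration with sample addresses; prints the two private ones.
printf '%s\n' "10.10.10.5 203.0.113.7 192.168.1.20" | list_private_ipv4

# On many Linux systems, `hostname -I` prints the node's addresses (not run here):
# hostname -I | list_private_ipv4
```

If more than one address is printed, choose the one the other Consul nodes can reach and set it as `bind_addr`.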

### Outage recovery

If you have lost enough Consul nodes in the cluster to break quorum, then the cluster
is considered to have failed and cannot function without manual intervention.
In that case, you can either recreate the nodes from scratch or attempt a
recovery.

#### Recreate from scratch

By default, GitLab does not store anything in the Consul node that cannot be
recreated. To erase the Consul database and reinitialize:

```shell
sudo gitlab-ctl stop consul
sudo rm -rf /var/opt/gitlab/consul/data
sudo gitlab-ctl start consul
```

After this, the node should start back up, and the rest of the server agents rejoin.
Shortly after that, the client agents should rejoin as well.

#### Recover a failed node

If you have taken advantage of Consul to store other data and want to restore
the failed node, follow the
[Consul guide](https://learn.hashicorp.com/tutorials/consul/recovery-outage)
to recover a failed cluster.