
---
type: reference
---

# Working with the bundled Consul service **(PREMIUM ONLY)**

As part of its High Availability stack, GitLab Premium includes a bundled version of [Consul](https://www.consul.io/) that can be managed through `/etc/gitlab/gitlab.rb`.

A Consul cluster consists of multiple server agents, as well as client agents that run on other nodes which need to talk to the Consul cluster.

## Prerequisites

First, make sure to [download/install](https://about.gitlab.com/install/)
GitLab Omnibus **on each node**.

Choose an installation method, then complete the following steps:

1. Install and configure the necessary dependencies.
1. Add the GitLab package repository and install the package.

When installing the GitLab package, do not supply an `EXTERNAL_URL` value.

## Configuring the Consul nodes

On each Consul node, perform the following:

1. Before you begin, collect [`CONSUL_SERVER_NODES`](database.md#consul-information): the IP addresses or DNS records of the Consul server nodes. You need them in the next step.

1. Edit `/etc/gitlab/gitlab.rb`, replacing the values noted in the `# START user configuration` section:

   ```ruby
   # Disable all components except Consul
   roles ['consul_role']

   # START user configuration
   # Replace placeholders:
   #
   # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
   # with the addresses gathered for CONSUL_SERVER_NODES
   consul['configuration'] = {
     server: true,
     retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
   }

   # Disable auto migrations
   gitlab_rails['auto_migrate'] = false
   #
   # END user configuration
   ```

   > `consul_role` was introduced with GitLab 10.3

1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes
   to take effect.

### Consul checkpoint

Before moving on, make sure Consul is configured correctly. Run the following
command to verify all server nodes are communicating:

```shell
/opt/gitlab/embedded/bin/consul members
```

The output should be similar to:

```plaintext
Node                 Address               Status  Type    Build  Protocol  DC
CONSUL_NODE_ONE      XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
CONSUL_NODE_TWO      XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
CONSUL_NODE_THREE    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
```

If any of the nodes isn't `alive`, or if any of the three nodes is missing,
check the [Troubleshooting section](#troubleshooting) before proceeding.
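
You can also script this checkpoint. The following is a minimal Ruby sketch, assuming the `consul members` output keeps the whitespace-separated column layout shown above (a header row, then one row per node) and assuming a Ruby interpreter is available on the node, for example the one Omnibus ships under `/opt/gitlab/embedded/bin`. The method names `parse_members` and `checkpoint_problems` are hypothetical and not part of GitLab or Consul.

```ruby
# Hypothetical helpers for the Consul checkpoint; not part of GitLab or Consul.

# Parses `consul members` output into an array of hashes keyed by the
# lowercase column name (node, address, status, type, build, protocol, dc).
def parse_members(output)
  lines = output.to_s.lines.map(&:strip).reject(&:empty?)
  return [] if lines.empty?

  header = lines.first.split.map(&:downcase)
  lines.drop(1).map { |line| header.zip(line.split).to_h }
end

# Returns a list of human-readable problems; an empty list means the
# checkpoint passed. expected_servers defaults to the three-node setup.
def checkpoint_problems(output, expected_servers: 3)
  members = parse_members(output)
  problems = members.reject { |m| m['status'] == 'alive' }
                    .map { |m| "#{m['node']} is #{m['status']}" }

  servers = members.count { |m| m['type'] == 'server' }
  if servers < expected_servers
    problems << "expected #{expected_servers} servers, found #{servers}"
  end
  problems
end
```

For example, `checkpoint_problems(%x(/opt/gitlab/embedded/bin/consul members))` returns an empty array when all three server nodes are `alive`, and one entry per problem otherwise.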

## Operations

### Checking cluster membership

To see which nodes are part of the cluster, run the following on any member of the cluster:

```shell
$ /opt/gitlab/embedded/bin/consul members
Node            Address               Status  Type    Build  Protocol  DC
consul-a        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
consul-b        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
consul-c        XX.XX.X.Y:8301        alive   server  0.9.0  2         gitlab_consul
db-a            XX.XX.X.Y:8301        alive   client  0.9.0  2         gitlab_consul
db-b            XX.XX.X.Y:8301        alive   client  0.9.0  2         gitlab_consul
```

In a healthy cluster, all nodes have a `Status` of `alive`.

### Restarting the server cluster

NOTE: **NOTE:**
This section applies only to server agents. It is safe to restart client agents whenever needed.

If you need to restart the server cluster, do so in a controlled fashion to maintain quorum. If quorum is lost, you will need to follow the Consul [outage recovery](#outage-recovery) process to recover the cluster.

To be safe, restart only one server agent at a time so that the cluster remains intact.

For larger clusters, it is possible to restart multiple agents at a time. The [Consul consensus document](https://www.consul.io/docs/internals/consensus.html#deployment-table) lists how many failures a cluster of a given size can tolerate; that is also the number of simultaneous restarts it can sustain.
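
The numbers in that table follow from Raft's majority rule: a cluster of `n` servers needs a quorum of `floor(n / 2) + 1` servers, so it tolerates `n - quorum` failures. The Ruby sketch below only spells out that arithmetic; `quorum_size` and `failure_tolerance` are illustrative names, not a GitLab or Consul API.

```ruby
# Illustrative helpers for Raft majority arithmetic; not a GitLab or Consul API.

# Number of servers that must be up for the cluster to elect a leader.
def quorum_size(servers)
  raise ArgumentError, 'need at least one server' if servers < 1

  servers / 2 + 1 # Integer division floors for positive integers
end

# Number of servers that can be down (or restarting) at the same time
# without losing quorum.
def failure_tolerance(servers)
  servers - quorum_size(servers)
end

[3, 5, 7].each do |n|
  puts "#{n} servers: quorum #{quorum_size(n)}, can restart #{failure_tolerance(n)} at a time"
end
```

For the three-server cluster used on this page, the quorum is 2 and the tolerance is 1, which is why restarting one server agent at a time is safe.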

## Upgrades for bundled Consul

Nodes running GitLab-bundled Consul should be:

- Members of a healthy cluster prior to upgrading the GitLab Omnibus package.
- Upgraded one node at a time.

NOTE: **NOTE:**
Running `curl http://127.0.0.1:8500/v1/health/state/critical` from any Consul node lists any health checks in the cluster that are currently in a critical state. If the cluster is healthy, the command returns an empty array.
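
If you want to run this check from a script, for example before and after upgrading each node, a minimal Ruby sketch using only the standard library could look like the following. It assumes the Consul HTTP API listens on `127.0.0.1:8500`, as in the `curl` command above; `critical_checks` and `summarize_checks` are hypothetical names. The `Node`, `CheckID`, and `Status` keys are fields of the health check objects Consul returns.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetches the health checks that Consul currently reports as critical.
# An empty array means the cluster has no critical checks.
def critical_checks(host: '127.0.0.1', port: 8500)
  uri = URI("http://#{host}:#{port}/v1/health/state/critical")
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Consul API returned HTTP #{response.code}"
  end

  JSON.parse(response.body)
end

# Turns the check list into one line per failing check.
def summarize_checks(checks)
  checks.map { |c| "#{c['Node']}: #{c['CheckID']} is #{c['Status']}" }
end
```

For example, `summarize_checks(critical_checks)` returns an empty array on a healthy cluster, so a wrapper script can wait until that is the case before moving on to the next node.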

Consul server agents use the Raft consensus protocol. A leader node must exist to coordinate synchronization across the cluster, so if the current leader goes offline, the remaining servers must elect a new one. If too many nodes go offline at the same time, the cluster loses quorum and cannot elect a leader due to [broken consensus](https://www.consul.io/docs/internals/consensus.html).

Consult the [troubleshooting section](#troubleshooting) if the cluster is not able to recover after the upgrade. The [outage recovery](#outage-recovery) section may be of particular interest.

NOTE: **NOTE:**
GitLab only uses Consul to store transient data that is easily regenerated. If no process other than GitLab itself used the bundled Consul, it is safe to [rebuild the cluster from scratch](#recreate-from-scratch).

## Troubleshooting

### Consul server agents unable to communicate

By default, the server agents attempt to [bind](https://www.consul.io/docs/agent/options.html#_bind) to `0.0.0.0`, but they advertise the first private IP address on the node for other agents to communicate with them. If the other nodes cannot communicate with a node on this address, the cluster will have a failed status.

If you are running into this issue, you will see messages like the following in the `gitlab-ctl tail consul` output:

```plaintext
2017-09-25_19:53:39.90821     2017/09/25 19:53:39 [WARN] raft: no known peers, aborting election
2017-09-25_19:53:41.74356     2017/09/25 19:53:41 [ERR] agent: failed to sync remote state: No cluster leader
```

To fix this:

1. On each node, pick an address through which all of the other nodes can reach that node.
1. Update `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     # Keep your existing settings (for example, server and retry_join) here.
     bind_addr: 'IP ADDRESS' # Replace with the address chosen above
   }
   ```

1. Run `gitlab-ctl reconfigure`.

If you still see the errors, you may have to [erase the Consul database and reinitialize](#recreate-from-scratch) on the affected node.
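
Before changing `bind_addr`, it can help to confirm that the address you picked is reachable from the other nodes. The following minimal Ruby sketch, run from another node while the Consul agent is running, attempts TCP connections to Consul's default server RPC port (8300) and LAN gossip port (8301, the port shown in the `consul members` output). It assumes those defaults, and it does not test UDP, which gossip also uses on 8301. `tcp_reachable?` and `check_consul_ports` are hypothetical names.

```ruby
require 'socket'

# Consul's default server RPC (8300) and LAN gossip (8301) TCP ports;
# adjust if your configuration changes them.
CONSUL_TCP_PORTS = [8300, 8301].freeze

# Returns true if a TCP connection to host:port succeeds within the timeout.
def tcp_reachable?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Maps each port to whether it accepted a TCP connection on host.
def check_consul_ports(host, ports: CONSUL_TCP_PORTS)
  ports.to_h { |port| [port, tcp_reachable?(host, port)] }
end
```

For example, `check_consul_ports('IP ADDRESS')` returns a hash mapping each port to `true` or `false`; any `false` entry suggests a firewall or routing problem between the two nodes.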

### Consul agents do not start - Multiple private IPs

If a node has multiple private IPs, the agent cannot determine which of the private addresses to advertise, and it exits immediately on start.

If you are running into this issue, you will see messages like the following in the `gitlab-ctl tail consul` output:

```plaintext
2017-11-09_17:41:45.52876 ==> Starting Consul agent...
2017-11-09_17:41:45.53057 ==> Error creating agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
```

To fix this:

1. Pick an address on the node through which all of the other nodes can reach it.
1. Update `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     # Keep your existing settings (for example, server and retry_join) here.
     bind_addr: 'IP ADDRESS' # Replace with the address chosen above
   }
   ```

1. Run `gitlab-ctl reconfigure`.
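
To see which private addresses the agent is choosing between, you can list them with Ruby's standard `Socket.ip_address_list`. This is a minimal sketch: it only considers IPv4 addresses in the RFC 1918 ranges recognized by `Addrinfo#ipv4_private?`, which may not exactly match the rules Consul itself uses, and `private_ipv4?` and `private_ipv4_addresses` are hypothetical names.

```ruby
require 'socket'

# True for IPv4 addresses in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
def private_ipv4?(address)
  Addrinfo.ip(address).ipv4_private?
end

# Lists this host's private IPv4 addresses (an approximation of the
# candidates Consul considers when choosing an address to advertise).
def private_ipv4_addresses
  Socket.ip_address_list
        .select { |addr| addr.ipv4? && addr.ipv4_private? }
        .map(&:ip_address)
        .uniq
end
```

If `private_ipv4_addresses` returns more than one address, use the list as a starting point for choosing the `bind_addr` that the other Consul nodes can reach.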

### Outage recovery

If you lose enough server agents in the cluster to break quorum, the cluster is considered failed and will not function without manual intervention.

#### Recreate from scratch

By default, GitLab does not store anything in the Consul cluster that cannot be recreated. To erase the Consul database and reinitialize it, run:

```shell
gitlab-ctl stop consul
rm -rf /var/opt/gitlab/consul/data
gitlab-ctl start consul
```

After this, the cluster should start back up and the server agents should rejoin. Shortly after that, the client agents should rejoin as well.
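
To confirm that the server agents have elected a leader again, you can query the `/v1/status/leader` endpoint of the Consul HTTP API. It returns the leader's address as a JSON string, or an empty string when no leader is known. The following is a minimal Ruby sketch, assuming the API listens on `127.0.0.1:8500`; `leader_from_body` and `current_leader` are hypothetical names.

```ruby
require 'net/http'
require 'uri'

# Extracts the leader address from the body of /v1/status/leader,
# for example "\"10.0.0.1:8300\"". Returns nil when no leader is reported.
def leader_from_body(body)
  leader = body.to_s.strip.delete('"')
  leader.empty? ? nil : leader
end

# Asks the local agent which server is the current Raft leader.
def current_leader(host: '127.0.0.1', port: 8500)
  uri = URI("http://#{host}:#{port}/v1/status/leader")
  leader_from_body(Net::HTTP.get(uri))
end
```

`current_leader` returns `nil` while the cluster still has no leader, so a recovery script could poll it until it returns an address.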

#### Recover a failed cluster

If you have used Consul to store other data and want to restore the failed cluster, follow the [Consul guide](https://learn.hashicorp.com/consul/day-2-operations/outage) to recover a failed cluster.