---
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
---
# Geo configuration **(PREMIUM ONLY)**
## Configuring a new **secondary** node
NOTE: **Note:**
This is the final step in setting up a **secondary** Geo node. Stages of the
setup process must be completed in the documented order.
Before attempting the steps in this stage, [complete all prior stages](index.md#using-omnibus-gitlab).
The basic steps of configuring a **secondary** node are to:
- Replicate required configurations between the **primary** node and the **secondary** nodes.
- Configure a tracking database on each **secondary** node.
- Start GitLab on each **secondary** node.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
NOTE: **Note:**
**Do not** set up any custom authentication for the **secondary** nodes. This will be handled by the **primary** node.
Any change that requires access to the **Admin Area** needs to be done on the
**primary** node because the **secondary** node is a read-only replica.
### Step 1. Manually replicate secret GitLab values
GitLab stores a number of secret values in the `/etc/gitlab/gitlab-secrets.json`
file which *must* be the same on all nodes. Until there is
a means of automatically replicating these between nodes (see [issue #3789](https://gitlab.com/gitlab-org/gitlab/-/issues/3789)),
they must be manually replicated to the **secondary** node.
1. SSH into the **primary** node, and execute the command below:
```shell
sudo cat /etc/gitlab/gitlab-secrets.json
```
This will display the secrets that need to be replicated, in JSON format.
1. SSH into the **secondary** node and log in as the `root` user:
```shell
sudo -i
```
1. Make a backup of any existing secrets:
```shell
mv /etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json.`date +%F`
```
1. Copy `/etc/gitlab/gitlab-secrets.json` from the **primary** node to the **secondary** node, or
copy-and-paste the file contents between nodes:
```shell
sudo editor /etc/gitlab/gitlab-secrets.json

# paste the output of the `cat` command you ran on the primary
# save and exit
```
1. Ensure the file permissions are correct:
```shell
chown root:root /etc/gitlab/gitlab-secrets.json
chmod 0600 /etc/gitlab/gitlab-secrets.json
```
1. Reconfigure the **secondary** node for the change to take effect:
```shell
gitlab-ctl reconfigure
gitlab-ctl restart
```
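
One way to confirm that the secrets file was copied intact is to compare checksums on both nodes. The following is a minimal sketch, assuming GNU `sha256sum` and `awk` are available; `file_checksum` and `same_contents` are hypothetical helper names, not part of GitLab. Compare right after copying the file, because a later `gitlab-ctl reconfigure` may rewrite it.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
file_checksum() {
  sha256sum "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed only if two local files have identical contents.
same_contents() {
  [ "$(file_checksum "$1")" = "$(file_checksum "$2")" ]
}

# Example usage (as root on each node; the two digests should be identical):
#   file_checksum /etc/gitlab/gitlab-secrets.json
```

`same_contents` is only useful when both copies are on the same machine, for example after fetching the **primary** node's file to a temporary path.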
### Step 2. Manually replicate the **primary** node's SSH host keys
GitLab integrates with the system-installed SSH daemon, designating a user
(typically named `git`) through which all access requests are handled.

In a [Disaster Recovery](../disaster_recovery/index.md) situation, GitLab system
administrators will promote a **secondary** node to the **primary** node. DNS records for the
**primary** domain should also be updated to point to the new **primary** node
(previously a **secondary** node). Doing so will avoid the need to update Git remotes and API URLs.
However, because the domain then resolves to a different server, all SSH requests to the
newly promoted **primary** node will fail due to an SSH host key mismatch. To prevent this,
the **primary** node's SSH host keys must be manually replicated to the **secondary** node.
1. SSH into the **secondary** node and log in as the `root` user:
```shell
sudo -i
```
1. Make a backup of any existing SSH host keys:
```shell
# Quote the pattern so the shell passes it to `find` unexpanded
find /etc/ssh -iname 'ssh_host_*' -exec cp {} {}.backup.`date +%F` \;
```
1. Copy OpenSSH host keys from the **primary** node:
If you can access your **primary** node using the **root** user:
```shell
# Run this from the secondary node, change `<primary_node_fqdn>` for the IP or FQDN of the server
scp root@<primary_node_fqdn>:/etc/ssh/ssh_host_*_key* /etc/ssh
```
If you only have access through a user with `sudo` privileges:
```shell
# Run this from your primary node:
sudo tar --transform 's/.*\///g' -zcvf ~/geo-host-key.tar.gz /etc/ssh/ssh_host_*_key*

# Run this from your secondary node (the archive is saved to your home
# directory, where the `tar` command below expects it):
scp <user_with_sudo>@<primary_node_fqdn>:geo-host-key.tar.gz ~/
tar zxvf ~/geo-host-key.tar.gz -C /etc/ssh
```
1. On your **secondary** node, ensure the file permissions are correct:
```shell
chown root:root /etc/ssh/ssh_host_*_key*
chmod 0600 /etc/ssh/ssh_host_*_key*
```
1. To verify that the key fingerprints match, run the following command on both nodes:
```shell
for file in /etc/ssh/ssh_host_*_key; do ssh-keygen -lf "$file"; done
```
The output should be similar to the following, and identical on both nodes:
```shell
1024 SHA256:FEZX2jQa2bcsd/fn/uxBzxhKdx4Imc4raXrHwsbtP0M root@serverhostname (DSA)
256 SHA256:uw98R35Uf+fYEQ/UnJD9Br4NXUFPv7JAUln5uHlgSeY root@serverhostname (ECDSA)
256 SHA256:sqOUWcraZQKd89y/QQv/iynPTOGQxcOTIXU/LsoPmnM root@serverhostname (ED25519)
2048 SHA256:qwa+rgir2Oy86QI+PZi/QVR+MSmrdrpsuH7YyKknC+s root@serverhostname (RSA)
```
1. Verify that you have the correct public keys for the existing private keys:
```shell
# This will print the fingerprint for private keys:
for file in /etc/ssh/ssh_host_*_key; do ssh-keygen -lf "$file"; done

# This will print the fingerprint for public keys:
for file in /etc/ssh/ssh_host_*_key.pub; do ssh-keygen -lf "$file"; done
```
NOTE: **Note:**
The commands for the private keys and the public keys should print the same fingerprints.
1. Restart `sshd` on your **secondary** node:
```shell
# Debian or Ubuntu installations
sudo service ssh reload

# CentOS installations
sudo service sshd reload
```
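
As a complement to comparing fingerprints by eye, the check that each public key belongs to its private key can be scripted. The following is a minimal sketch, assuming OpenSSH's `ssh-keygen -y` (which prints the public key derived from a private key) and unencrypted host keys; `derived_public_key`, `stored_public_key`, and `keypair_matches` are hypothetical helper names.

```shell
# Hypothetical helper: print "<type> <base64-key>" derived from a private key file.
derived_public_key() {
  ssh-keygen -y -f "$1" | awk '{print $1, $2}'
}

# Hypothetical helper: print "<type> <base64-key>" stored in a public key file.
stored_public_key() {
  awk '{print $1, $2; exit}' "$1"
}

# Hypothetical helper: succeed only if <key>.pub matches the private key <key>.
keypair_matches() {
  [ "$(derived_public_key "$1")" = "$(stored_public_key "$1.pub")" ]
}

# Example usage (as root on the secondary node):
#   for key in /etc/ssh/ssh_host_*_key; do
#     if keypair_matches "$key"; then echo "OK: $key"; else echo "MISMATCH: $key"; fi
#   done
```

Comparing the key material rather than the printed fingerprint avoids depending on which file `ssh-keygen -l` chooses to read.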
### Step 3. Add the **secondary** node
1. SSH into your GitLab **secondary** server and log in as root:
```shell
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` and add a **unique** name for your node. You will need this in the next steps:
```ruby
# The unique identifier for the Geo node.
gitlab_rails['geo_node_name'] = '<node_name_here>'
```
1. Reconfigure the **secondary** node for the change to take effect:
```shell
gitlab-ctl reconfigure
```
1. Visit the **primary** node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser.
1. Click the **New node** button.
![Add secondary node](img/adding_a_secondary_node_v13_3.png)
1. Fill in **Name** with the `gitlab_rails['geo_node_name']` in
`/etc/gitlab/gitlab.rb`. These values must always match *exactly*, character
for character.
1. Fill in **URL** with the `external_url` in `/etc/gitlab/gitlab.rb`. These
values must always match, but it doesn't matter if one ends with a `/` and
the other doesn't.
1. Optionally, choose which groups or storage shards should be replicated by the
**secondary** node. Leave blank to replicate all. Read more in
[selective synchronization](#selective-synchronization).
1. Click the **Add node** button to add the **secondary** node.
1. SSH into your GitLab **secondary** server and restart the services:
```shell
gitlab-ctl restart
```
Check whether there are any common issues with your Geo setup by running:
```shell
gitlab-rake gitlab:geo:check
```
1. SSH into your **primary** server and log in as root to verify that the
**secondary** node is reachable and to check for any common issues with your Geo setup:
```shell
gitlab-rake gitlab:geo:check
```
Once added to the admin panel and restarted, the **secondary** node will automatically start
replicating missing data from the **primary** node in a process known as **backfill**.
Meanwhile, the **primary** node will start to notify each **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.
Make sure the **secondary** node is running and accessible.
You can log in to the **secondary** node with the same credentials as used for the **primary** node.
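
The matching rules for the **Name** and **URL** fields in this step can be illustrated with a short sketch. This is not GitLab's implementation; `strip_trailing_slash`, `urls_match`, and `names_match` are hypothetical helpers that only mirror the rules described above: names must match character for character, while URLs may differ by one trailing `/`.

```shell
# Hypothetical helper: remove a single trailing slash, if present.
strip_trailing_slash() {
  printf '%s\n' "${1%/}"
}

# Hypothetical helper: URLs match if they are equal once a trailing slash is ignored.
urls_match() {
  [ "$(strip_trailing_slash "$1")" = "$(strip_trailing_slash "$2")" ]
}

# Hypothetical helper: node names must match exactly, including letter case.
names_match() {
  [ "$1" = "$2" ]
}

# Example usage with placeholder values:
#   urls_match "https://geo2.example.com/" "https://geo2.example.com" && echo "URL ok"
#   names_match "geo-secondary-1" "Geo-Secondary-1" || echo "Name mismatch"
```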
### Step 4. Enabling Hashed Storage
Using Hashed Storage significantly improves Geo replication. Project and group
renames no longer require synchronization between nodes.
1. Visit the **primary** node's **Admin Area > Settings > Repository**
(`/admin/application_settings/repository`) in your browser.
1. In the **Repository storage** section, check **Use hashed storage paths for newly created and renamed projects**.
### Step 5. (Optional) Configuring the **secondary** node to trust the **primary** node
You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.
If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow
[these instructions](https://docs.gitlab.com/omnibus/settings/ssl.html)
on the **secondary** node.
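
To decide whether this step applies, it can help to check whether the **primary** node's certificate is self-signed. The following is a rough sketch, assuming the `openssl` CLI is installed and the certificate has already been saved as a PEM file; `is_self_issued` is a hypothetical helper, and comparing the subject to the issuer is a heuristic, not full chain verification.

```shell
# Hypothetical helper: succeed if the certificate's subject equals its issuer,
# which is the usual sign of a self-signed certificate.
# The body runs in a subshell so its variables do not leak into the caller.
is_self_issued() (
  subject=$(openssl x509 -noout -subject -in "$1" | sed 's/^subject= *//')
  issuer=$(openssl x509 -noout -issuer -in "$1" | sed 's/^issuer= *//')
  [ -n "$subject" ] && [ "$subject" = "$issuer" ]
)

# Example usage (the path is a placeholder):
#   if is_self_issued /tmp/primary.crt; then
#     echo "Self-signed: add it to the secondary node's trust store"
#   fi
```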
### Step 6. Enable Git access over HTTP/HTTPS
Geo synchronizes repositories over HTTP/HTTPS, and therefore requires this clone
method to be enabled. Navigate to **Admin Area > Settings**
(`/admin/application_settings/general`) on the **primary** node, and set
`Enabled Git access protocols` to `Both SSH and HTTP(S)` or `Only HTTP(S)`.
### Step 7. Verify proper functioning of the **secondary** node
Your **secondary** node is now configured!
You can log in to the **secondary** node with the same credentials you used for the
**primary** node. Visit the **secondary** node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser to check if it's correctly identified as a
**secondary** Geo node and if Geo is enabled.
The initial replication, or 'backfill', will probably still be in progress. You
can monitor the synchronization process on each Geo node from the **primary**
node's **Geo Nodes** dashboard in your browser.
![Geo dashboard](img/geo_node_dashboard.png)
If your installation isn't working properly, check the
[troubleshooting document](troubleshooting.md).
The two most obvious issues that can become apparent in the dashboard are:
1. Database replication not working well.
1. Instance to instance notification not working. In that case, the cause can be
one of the following:
- You are using a custom certificate or custom CA (see the [troubleshooting document](troubleshooting.md)).
- The instance is firewalled (check your firewall rules).
Please note that disabling a **secondary** node will stop the synchronization process.
Please note that if `git_data_dirs` is customized on the **primary** node for multiple
repository shards you must duplicate the same configuration on each **secondary** node.
Point your users to the ["Using a Geo Server" guide](using_a_geo_server.md).
Currently, this is what is synced:
- Git repositories.
- Wikis.
- LFS objects.
- Issues, merge requests, snippets, and comment attachments.
- Users, groups, and project avatars.
## Selective synchronization
Geo supports selective synchronization, which allows admins to choose
which projects should be synchronized by **secondary** nodes.
A subset of projects can be chosen, either by group or by storage shard. The
former is ideal for replicating data belonging to a subset of users, while the
latter is more suited to progressively rolling out Geo to a large GitLab
instance.
It is important to note that selective synchronization:
1. Does not restrict permissions from **secondary** nodes.
1. Does not hide project metadata from **secondary** nodes.
- Since Geo currently relies on PostgreSQL replication, all project metadata
gets replicated to **secondary** nodes, but repositories that have not been
selected will be empty.
1. Does not reduce the number of events generated for the Geo event log.
- The **primary** node generates events as long as any **secondary** nodes are present.
Selective synchronization restrictions are implemented on the **secondary** nodes,
not the **primary** node.
### Git operations on unreplicated repositories
> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2562) in GitLab 12.10 for HTTP(S) and in GitLab 13.0 for SSH.

Git clone, pull, and push operations over HTTP(S) and SSH are supported for repositories that
exist on the **primary** node but not on **secondary** nodes. This situation can occur
when:
- Selective synchronization does not include the project attached to the repository.
- The repository is actively being replicated but has not completed yet.
## Upgrading Geo
See the [updating the Geo nodes document](updating_the_geo_nodes.md).
## Troubleshooting
See the [troubleshooting document](troubleshooting.md).