# Geo with external PostgreSQL instances **(PREMIUM ONLY)**

This document is relevant if you are using a PostgreSQL instance that is *not
managed by Omnibus*. This includes cloud-managed instances like AWS RDS, or
manually installed and configured PostgreSQL instances.

NOTE: **Note:**
We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external
(not managed by Omnibus) databases but we do not guarantee compatibility.

## **Primary** node

1. SSH into a GitLab **primary** application server and log in as root:

   ```shell
   sudo -i
   ```

1. Execute the command below to define the node as the **primary** node:

   ```shell
   gitlab-ctl set-geo-primary-node
   ```

   This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.

### Configure the external database to be replicated

To set up an external database, you can either:

- Set up streaming replication yourself (for example, in AWS RDS).
- Perform the Omnibus configuration manually as follows.

#### Leverage your cloud provider's tools to replicate the primary database

If your primary node runs on AWS EC2 and uses RDS, you can create a read-only
replica in a different region and AWS will manage the replication process.
Make sure you've set the Network ACL, Subnet, and Security Group according to
your needs, so the secondary application node can access the database.

Skip to the [Configure secondary application node](#configure-secondary-application-nodes-to-use-the-external-read-replica) section below.

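
As a rough sketch of the AWS route, the helper below only prints the AWS CLI call that would create a cross-region read replica; it does not run it. The function name `replica_command` and every identifier passed to it are hypothetical placeholders. For a replica in a different region, AWS expects the source instance to be given as an ARN.

```shell
#!/bin/bash

# Hypothetical helper: print (do not run) the AWS CLI call for a read replica.
#   $1: ARN of the source RDS instance (placeholder)
#   $2: identifier for the new replica (placeholder)
#   $3: region in which to create the replica (placeholder)
replica_command () {
  printf 'aws rds create-db-instance-read-replica --db-instance-identifier %s --source-db-instance-identifier %s --region %s\n' "$2" "$1" "$3"
}

# Example with made-up values; review the printed command before running it.
replica_command "arn:aws:rds:us-east-1:123456789012:db:gitlab-primary" "gitlab-replica" "eu-west-1"
```

Printing the command first makes it easy to review the identifiers and region before anything is created in your account.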

#### Manually configure the primary database for replication

The [geo_primary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
configures the **primary** node's database to be replicated by making changes to
`pg_hba.conf` and `postgresql.conf`. Make the following configuration changes
manually to your external database configuration and ensure that you restart PostgreSQL
afterwards for the changes to take effect:

```plaintext
##
## Geo Primary Role
## - pg_hba.conf
##
host all all <trusted primary IP>/32 md5
host replication gitlab_replicator <trusted primary IP>/32 md5
host all all <trusted secondary IP>/32 md5
host replication gitlab_replicator <trusted secondary IP>/32 md5
```

```plaintext
##
## Geo Primary Role
## - postgresql.conf
##
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 50
max_replication_slots = 1 # number of secondary instances
hot_standby = on
```

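
After restarting PostgreSQL, it can help to confirm that the new values took effect. The sketch below is one way to do that, assuming `psql` can reach the primary database. `check_setting` is a hypothetical helper, and the host and user in the usage comment are placeholders.

```shell
#!/bin/bash

# Hypothetical helper: compare an expected setting with the value reported
# by PostgreSQL.
#   $1: setting name, $2: expected value, $3: actual value
check_setting () {
  if [ "$3" = "$2" ]; then
    echo "ok: $1 = $3"
  else
    echo "MISMATCH: $1 is '$3', expected '$2'"
    return 1
  fi
}

# Usage sketch (placeholders; -A and -t make psql print only the bare value):
#   actual="$(psql -h "<primary_database_host>" -U "<admin_user>" -d postgres -Atc 'SHOW max_wal_senders;')"
#   check_setting max_wal_senders 10 "$actual"
```

Note that newer PostgreSQL versions may report `replica` for `wal_level` even when `hot_standby` is configured, so a literal comparison on that setting can show a mismatch that is harmless.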

## **Secondary** nodes

### Manually configure the replica database

Make the following configuration changes manually to the `pg_hba.conf` and `postgresql.conf`
of your external replica database and ensure that you restart PostgreSQL afterwards
for the changes to take effect:

```plaintext
##
## Geo Secondary Role
## - pg_hba.conf
##
host all all <trusted secondary IP>/32 md5
host replication gitlab_replicator <trusted secondary IP>/32 md5
host all all <trusted primary IP>/24 md5
```

```plaintext
##
## Geo Secondary Role
## - postgresql.conf
##
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 10
hot_standby = on
```

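
Once the replica has been restarted, you may want to confirm that it is actually running as a standby. A minimal sketch, assuming `psql` can reach the replica: PostgreSQL's `pg_is_in_recovery()` returns true on a standby, and the hypothetical helper `replica_status` below only interprets that result.

```shell
#!/bin/bash

# Hypothetical helper: interpret the output of `SELECT pg_is_in_recovery();`
# as printed by `psql -At` ("t" for true, "f" for false).
replica_status () {
  case "$1" in
    t) echo "standby: server is in recovery" ;;
    f) echo "not a standby: server is not in recovery" ;;
    *) echo "unknown result: $1" ;;
  esac
}

# Usage sketch (host is a placeholder):
#   replica_status "$(psql -h "<database_read_replica_host>" -U gitlab -d gitlabhq_production -Atc 'SELECT pg_is_in_recovery();')"
```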

### Configure **secondary** application nodes to use the external read-replica

With Omnibus, the
[geo_secondary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
has three main functions:

1. Configure the replica database.
1. Configure the tracking database.
1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).

To configure the connection to the external read-replica database and enable Log Cursor:

1. SSH into a GitLab **secondary** application server and log in as root:

   ```shell
   sudo -i
   ```

1. Edit `/etc/gitlab/gitlab.rb` and add the following:

   ```ruby
   ##
   ## Geo Secondary role
   ## - configure dependent flags automatically to enable Geo
   ##
   roles ['geo_secondary_role']

   # note this is shared between both databases,
   # make sure you define the same password in both
   gitlab_rails['db_password'] = '<your_password_here>'

   gitlab_rails['db_username'] = 'gitlab'
   gitlab_rails['db_host'] = '<database_read_replica_host>'
   ```

1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure).

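
If the reconfigure fails to reach the database, a quick connectivity check from the secondary application node can narrow things down. The sketch below assumes the PostgreSQL client tool `pg_isready` is installed on that node. `describe_pg_isready` is a hypothetical helper that translates the tool's documented exit codes, and the host and port in the usage comment are placeholders.

```shell
#!/bin/bash

# Hypothetical helper: translate a pg_isready exit code into a message.
describe_pg_isready () {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections" ;;
    2) echo "no response" ;;
    3) echo "no attempt made (invalid parameters)" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage sketch (placeholders):
#   pg_isready -h "<database_read_replica_host>" -p 5432
#   describe_pg_isready "$?"
```

A "no response" result usually points at network rules (for example, a security group or firewall) rather than at `pg_hba.conf`, which only applies once a connection reaches PostgreSQL.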

### Configure the tracking database

**Secondary** nodes use a separate PostgreSQL installation as a tracking
database to keep track of replication status and automatically recover from
potential replication issues. Omnibus automatically configures a tracking database
when `roles ['geo_secondary_role']` is set. For high availability,
refer to [Geo High Availability](../../reference_architectures/index.md).
If you want to run this database external to Omnibus, please follow the instructions below.

The tracking database requires an [FDW](https://www.postgresql.org/docs/9.6/postgres-fdw.html)
connection with the **secondary** replica database for improved performance.

If you have an external database ready to be used as the tracking database,
follow the instructions below to use it:

NOTE: **Note:**
If you want to use AWS RDS as a tracking database, make sure it has access to
the secondary database. Unfortunately, just assigning the same security group is not enough as
outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
rule to the read-replica's security group allowing any TCP traffic from
the tracking database on port 5432.

1. Ensure that your secondary node can communicate with your tracking database by
   manually changing the `pg_hba.conf` that is associated with your tracking database.
   Remember to restart PostgreSQL afterwards for the changes to take effect:

   ```plaintext
   ##
   ## Geo Tracking Database Role
   ## - pg_hba.conf
   ##
   host all all <trusted tracking IP>/32 md5
   host all all <trusted secondary IP>/32 md5
   ```

1. SSH into a GitLab **secondary** server and log in as root:

   ```shell
   sudo -i
   ```

1. Edit `/etc/gitlab/gitlab.rb` with the connection parameters and credentials for
   the machine with the PostgreSQL instance:

   ```ruby
   geo_secondary['db_username'] = 'gitlab_geo'
   geo_secondary['db_password'] = '<your_password_here>'

   geo_secondary['db_host'] = '<tracking_database_host>'
   geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
   geo_secondary['db_fdw'] = true # enable FDW
   geo_postgresql['enable'] = false # don't use internal managed instance
   ```

1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure).

1. Run the tracking database migrations:

   ```shell
   gitlab-rake geo:db:create
   gitlab-rake geo:db:migrate
   ```

1. Configure the [PostgreSQL FDW](https://www.postgresql.org/docs/9.6/postgres-fdw.html)
   connection and credentials:

   Save the script below in a file, for example `/tmp/geo_fdw.sh`, and modify the connection
   parameters to match your environment. Execute it to set up the FDW connection.

   ```shell
   #!/bin/bash

   # Secondary Database connection params:
   DB_HOST="<public_ip_or_vpc_private_ip>"
   DB_NAME="gitlabhq_production"
   DB_USER="gitlab"
   DB_PASS="<your_password_here>"
   DB_PORT="5432"

   # Tracking Database connection params:
   GEO_DB_HOST="<public_ip_or_vpc_private_ip>"
   GEO_DB_NAME="gitlabhq_geo_production"
   GEO_DB_USER="gitlab_geo"
   GEO_DB_PORT="5432"

   # Run a single SQL statement against the tracking database.
   query_exec () {
     gitlab-psql -h "$GEO_DB_HOST" -U "$GEO_DB_USER" -d "$GEO_DB_NAME" -p "$GEO_DB_PORT" -c "${1}"
   }

   query_exec "CREATE EXTENSION postgres_fdw;"
   query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
   query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
   query_exec "CREATE SCHEMA gitlab_secondary;"
   query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
   ```

   NOTE: **Note:** The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
   but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
   `psql` for AWS RDS.

1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart).
1. Populate the FDW tables:

   ```shell
   gitlab-rake geo:db:refresh_foreign_tables
   ```

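
To sanity-check the result, you could count the foreign tables visible in the tracking database. This is only a sketch: it assumes the foreign tables are imported into the `gitlab_secondary` schema created by the script above, and `foreign_table_count_sql` is a hypothetical helper that merely builds the query against the standard `information_schema.foreign_tables` view.

```shell
#!/bin/bash

# Hypothetical helper: build a query that counts foreign tables in a schema.
#   $1: schema name (assumed to be gitlab_secondary in this setup)
foreign_table_count_sql () {
  printf "SELECT count(*) FROM information_schema.foreign_tables WHERE foreign_table_schema = '%s';\n" "$1"
}

# Print the query so it can be reviewed or pasted into psql.
foreign_table_count_sql "gitlab_secondary"

# Usage sketch (placeholders; assumes psql can reach the tracking database):
#   psql -h "<tracking_database_host>" -U gitlab_geo -d gitlabhq_geo_production \
#     -Atc "$(foreign_table_count_sql gitlab_secondary)"
```

A count of zero after the refresh would suggest that the FDW server, user mapping, or network access still needs attention.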