258 lines
9.8 KiB
Markdown
258 lines
9.8 KiB
Markdown
---
|
|
stage: Data Stores
|
|
group: Database
|
|
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
|
|
---
|
|
|
|
# Database table partitioning
|
|
|
|
Table partitioning is a powerful database feature that allows a table's
|
|
data to be split into smaller physical tables that act as a single large
|
|
table. If the application is designed to work with partitioning in mind,
|
|
there can be multiple benefits, such as:
|
|
|
|
- Query performance can be improved greatly, because the database can
|
|
cheaply eliminate much of the data from the search space, while still
|
|
providing full SQL capabilities.
|
|
|
|
- Bulk deletes can be achieved with minimal impact on the database by
|
|
dropping entire partitions. This is a natural fit for features that need
|
|
to periodically delete data that falls outside the retention window.
|
|
|
|
- Administrative tasks like `VACUUM` and index rebuilds can operate on
|
|
individual partitions, rather than across a single massive table.
|
|
|
|
Unfortunately, not all models fit a partitioning scheme, and there are
|
|
significant drawbacks if implemented incorrectly. Additionally, tables
|
|
can only be partitioned at their creation, making it nontrivial to apply
|
|
partitioning to a busy database. A suite of migration tools are available
|
|
to enable backend developers to partition existing tables, but the
|
|
migration process is rather heavy, taking multiple steps split across
|
|
several releases. Due to the limitations of partitioning and the related
|
|
migrations, you should understand how partitioning fits your use case
|
|
before attempting to leverage this feature.
|
|
|
|
## Determining when to use partitioning
|
|
|
|
While partitioning can be very useful when properly applied, it's
|
|
imperative to identify if the data and workload of a table naturally fit a
|
|
partitioning scheme. There are a few details you have to understand
|
|
to decide if partitioning is a good fit for your particular
|
|
problem.
|
|
|
|
First, a table is partitioned on a partition key, which is a column or
|
|
set of columns which determine how the data is split across the
|
|
partitions. The partition key is used by the database when reading or
|
|
writing data, to decide which partitions must be accessed. The
|
|
partition key should be a column that would be included in a `WHERE`
|
|
clause on almost all queries accessing that table.
|
|
|
|
Second, it's necessary to understand the strategy the database uses
|
|
to split the data across the partitions. The scheme supported by the
|
|
GitLab migration helpers is date-range partitioning, where each partition
|
|
in the table contains data for a single month. In this case, the partitioning
|
|
key must be a timestamp or date column. In order for this type of
|
|
partitioning to work well, most queries must access data in a
|
|
certain date range.
|
|
|
|
For a more concrete example, the `audit_events` table can be used, which
|
|
was the first table to be partitioned in the application database
|
|
(scheduled for deployment with the GitLab 13.5 release). This
|
|
table tracks audit entries of security events that happen in the
|
|
application. In almost all cases, users want to see audit activity that
|
|
occurs in a certain time frame. As a result, date-range partitioning
|
|
was a natural fit for how the data would be accessed.
|
|
|
|
To look at this in more detail, imagine a simplified `audit_events` schema:
|
|
|
|
```sql
|
|
CREATE TABLE audit_events (
|
|
id SERIAL NOT NULL PRIMARY KEY,
|
|
author_id INT NOT NULL,
|
|
details jsonb NOT NULL,
|
|
created_at timestamptz NOT NULL);
|
|
```
|
|
|
|
Now imagine typical queries in the UI would display the data in a
|
|
certain date range, like a single week:
|
|
|
|
```sql
|
|
SELECT *
|
|
FROM audit_events
|
|
WHERE created_at >= '2020-01-01 00:00:00'
|
|
AND created_at < '2020-01-08 00:00:00'
|
|
ORDER BY created_at DESC
|
|
LIMIT 100
|
|
```
|
|
|
|
If the table is partitioned on the `created_at` column the base table would
|
|
look like:
|
|
|
|
```sql
|
|
CREATE TABLE audit_events (
|
|
id SERIAL NOT NULL,
|
|
author_id INT NOT NULL,
|
|
details jsonb NOT NULL,
|
|
created_at timestamptz NOT NULL,
|
|
PRIMARY KEY (id, created_at))
|
|
PARTITION BY RANGE(created_at);
|
|
```
|
|
|
|
NOTE:
|
|
The primary key of a partitioned table must include the partition key as
|
|
part of the primary key definition.
|
|
|
|
And we might have a list of partitions for the table, such as:
|
|
|
|
```sql
|
|
audit_events_202001 FOR VALUES FROM ('2020-01-01') TO ('2020-02-01')
|
|
audit_events_202002 FOR VALUES FROM ('2020-02-01') TO ('2020-03-01')
|
|
audit_events_202003 FOR VALUES FROM ('2020-03-01') TO ('2020-04-01')
|
|
```
|
|
|
|
Each partition is a separate physical table, with the same structure as
|
|
the base `audit_events` table, but contains only data for rows where the
|
|
partition key falls in the specified range. For example, the partition
|
|
`audit_events_202001` contains rows where the `created_at` column is
|
|
greater than or equal to `2020-01-01` and less than `2020-02-01`.
|
|
|
|
Now, if we look at the previous example query again, the database can
|
|
use the `WHERE` to recognize that all matching rows are in the
|
|
`audit_events_202001` partition. Rather than searching all of the data
|
|
in all of the partitions, it can search only the single month's worth
|
|
of data in the appropriate partition. In a large table, this can
|
|
dramatically reduce the amount of data the database needs to access.
|
|
However, imagine a query that does not filter based on the partitioning
|
|
key, such as:
|
|
|
|
```sql
|
|
SELECT *
|
|
FROM audit_events
|
|
WHERE author_id = 123
|
|
ORDER BY created_at DESC
|
|
LIMIT 100
|
|
```
|
|
|
|
In this example, the database can't prune any partitions from the search,
|
|
because matching data could exist in any of them. As a result, it has to
|
|
query each partition individually, and aggregate the rows into a single result
|
|
set. Because `author_id` would be indexed, the performance impact could
|
|
likely be acceptable, but on more complex queries the overhead can be
|
|
substantial. Partitioning should only be leveraged if the access patterns
|
|
of the data support the partitioning strategy, otherwise performance
|
|
suffers.
|
|
|
|
## Partitioning a table
|
|
|
|
Unfortunately, tables can only be partitioned at their creation, making
|
|
it nontrivial to apply to a busy database. A suite of migration
|
|
tools have been developed to enable backend developers to partition
|
|
existing tables. This migration process takes multiple steps which must
|
|
be split across several releases.
|
|
|
|
### Caveats
|
|
|
|
The partitioning migration helpers work by creating a partitioned duplicate
|
|
of the original table and using a combination of a trigger and a background
|
|
migration to copy data into the new table. Changes to the original table
|
|
schema can be made in parallel with the partitioning migration, but they
|
|
must take care to not break the underlying mechanism that makes the migration
|
|
work. For example, if a column is added to the table that is being
|
|
partitioned, both the partitioned table and the trigger definition must
|
|
be updated to match.
|
|
|
|
### Step 1: Creating the partitioned copy (Release N)
|
|
|
|
The first step is to add a migration to create the partitioned copy of
|
|
the original table. This migration creates the appropriate
|
|
partitions based on the data in the original table, and install a
|
|
trigger that syncs writes from the original table into the
|
|
partitioned copy.
|
|
|
|
An example migration of partitioning the `audit_events` table by its
|
|
`created_at` column would look like:
|
|
|
|
```ruby
|
|
class PartitionAuditEvents < Gitlab::Database::Migration[1.0]
|
|
include Gitlab::Database::PartitioningMigrationHelpers
|
|
|
|
def up
|
|
partition_table_by_date :audit_events, :created_at
|
|
end
|
|
|
|
def down
|
|
drop_partitioned_table_for :audit_events
|
|
end
|
|
end
|
|
```
|
|
|
|
After this has executed, any inserts, updates, or deletes in the
|
|
original table are also duplicated in the new table. For updates and
|
|
deletes, the operation only has an effect if the corresponding row
|
|
exists in the partitioned table.
|
|
|
|
### Step 2: Backfill the partitioned copy (Release N)
|
|
|
|
The second step is to add a post-deployment migration that schedules
|
|
the background jobs that backfill existing data from the original table
|
|
into the partitioned copy.
|
|
|
|
Continuing the above example, the migration would look like:
|
|
|
|
```ruby
|
|
class BackfillPartitionAuditEvents < Gitlab::Database::Migration[1.0]
|
|
include Gitlab::Database::PartitioningMigrationHelpers
|
|
|
|
def up
|
|
enqueue_partitioning_data_migration :audit_events
|
|
end
|
|
|
|
def down
|
|
cleanup_partitioning_data_migration :audit_events
|
|
end
|
|
end
|
|
```
|
|
|
|
This step uses the same mechanism as any background migration, so you
|
|
may want to read the [Background Migration](background_migrations.md)
|
|
guide for details on that process. Background jobs are scheduled every
|
|
2 minutes and copy `50_000` records at a time, which can be used to
|
|
estimate the timing of the background migration portion of the
|
|
partitioning migration.
|
|
|
|
### Step 3: Post-backfill cleanup (Release N+1)
|
|
|
|
The third step must occur at least one release after the release that
|
|
includes the background migration. This gives time for the background
|
|
migration to execute properly in self-managed installations. In this step,
|
|
add another post-deployment migration that cleans up after the
|
|
background migration. This includes forcing any remaining jobs to
|
|
execute, and copying data that may have been missed, due to dropped or
|
|
failed jobs.
|
|
|
|
Once again, continuing the example, this migration would look like:
|
|
|
|
```ruby
|
|
class CleanupPartitionedAuditEventsBackfill < Gitlab::Database::Migration[1.0]
|
|
include Gitlab::Database::PartitioningMigrationHelpers
|
|
|
|
def up
|
|
finalize_backfilling_partitioned_table :audit_events
|
|
end
|
|
|
|
def down
|
|
# no op
|
|
end
|
|
end
|
|
```
|
|
|
|
After this migration has completed, the original table and partitioned
|
|
table should contain identical data. The trigger installed on the
|
|
original table guarantees that the data remains in sync going forward.
|
|
|
|
### Step 4: Swap the partitioned and non-partitioned tables (Release N+1)
|
|
|
|
The final step of the migration makes the partitioned table ready
|
|
for use by the application. This section will be updated when the
|
|
migration helper is ready, for now development can be followed in the
|
|
[Tracking Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/241267).
|