1029 lines
35 KiB
Markdown
1029 lines
35 KiB
Markdown
---
|
||
stage: none
|
||
group: unassigned
|
||
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
|
||
---
|
||
|
||
# Migration Style Guide
|
||
|
||
When writing migrations for GitLab, you have to take into account that
|
||
these are run by hundreds of thousands of organizations of all sizes, some with
|
||
many years of data in their database.
|
||
|
||
In addition, having to take a server offline for an upgrade small or big is a
|
||
big burden for most organizations. For this reason, it is important that your
|
||
migrations are written carefully, can be applied online, and adhere to the style
|
||
guide below.
|
||
|
||
Migrations are **not** allowed to require GitLab installations to be taken
|
||
offline unless _absolutely necessary_.
|
||
|
||
When downtime is necessary the migration has to be approved by:
|
||
|
||
1. The VP of Engineering
|
||
1. A Backend Maintainer
|
||
1. A Database Maintainer
|
||
|
||
An up-to-date list of people holding these titles can be found at
|
||
<https://about.gitlab.com/company/team/>.
|
||
|
||
When writing your migrations, also consider that databases might have stale data
|
||
or inconsistencies and guard for that. Try to make as few assumptions as
|
||
possible about the state of the database.
|
||
|
||
Please don't depend on GitLab-specific code since it can change in future
|
||
versions. If needed copy-paste GitLab code into the migration to make it forward
|
||
compatible.
|
||
|
||
For GitLab.com, please take into consideration that regular migrations (under `db/migrate`)
|
||
are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment),
|
||
and post-deployment migrations (`db/post_migrate`) are run after the deployment to production has finished.
|
||
|
||
## Schema Changes
|
||
|
||
Changes to the schema should be committed to `db/structure.sql`. This
|
||
file is automatically generated by Rails, so you normally should not
|
||
edit this file by hand. If your migration is adding a column to a
|
||
table, that column is added at the bottom. Please do not reorder
|
||
columns manually for existing tables as this causes confusion to
|
||
other people using `db/structure.sql` generated by Rails.
|
||
|
||
When your local database in your GDK is diverging from the schema from
|
||
`master` it might be hard to cleanly commit the schema changes to
|
||
Git. In that case you can use the `scripts/regenerate-schema` script to
|
||
regenerate a clean `db/structure.sql` for the migrations you're
|
||
adding. This script applies all migrations found in `db/migrate`
|
||
or `db/post_migrate`, so if there are any migrations you don't want to
|
||
commit to the schema, rename or remove them. If your branch is not
|
||
targeting `master` you can set the `TARGET` environment variable.
|
||
|
||
```shell
|
||
# Regenerate schema against `master`
|
||
scripts/regenerate-schema
|
||
|
||
# Regenerate schema against `12-9-stable-ee`
|
||
TARGET=12-9-stable-ee scripts/regenerate-schema
|
||
```
|
||
|
||
## What Requires Downtime?
|
||
|
||
The document ["What Requires Downtime?"](what_requires_downtime.md) specifies
|
||
various database operations, such as
|
||
|
||
- [dropping and renaming columns](what_requires_downtime.md#dropping-columns)
|
||
- [changing column constraints and types](what_requires_downtime.md#changing-column-constraints)
|
||
- [adding and dropping indexes, tables, and foreign keys](what_requires_downtime.md#adding-indexes)
|
||
|
||
and whether they require downtime and how to work around that whenever possible.
|
||
|
||
## Downtime Tagging
|
||
|
||
Every migration must specify if it requires downtime or not, and if it should
|
||
require downtime it must also specify a reason for this. This is required even
|
||
if 99% of the migrations don't require downtime as this makes it easier to find
|
||
the migrations that _do_ require downtime.
|
||
|
||
To tag a migration, add the following two constants to the migration class'
|
||
body:
|
||
|
||
- `DOWNTIME`: a boolean that when set to `true` indicates the migration requires
|
||
downtime.
|
||
- `DOWNTIME_REASON`: a String containing the reason for the migration requiring
|
||
downtime. This constant **must** be set when `DOWNTIME` is set to `true`.
|
||
|
||
For example:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
DOWNTIME = true
|
||
DOWNTIME_REASON = 'This migration requires downtime because ...'
|
||
|
||
def change
|
||
...
|
||
end
|
||
end
|
||
```
|
||
|
||
It is an error (that is, CI fails) if the `DOWNTIME` constant is missing
|
||
from a migration class.
|
||
|
||
## Reversibility
|
||
|
||
Your migration **must be** reversible. This is very important, as it should
|
||
be possible to downgrade in case of a vulnerability or bugs.
|
||
|
||
In your migration, add a comment describing how the reversibility of the
|
||
migration was tested.
|
||
|
||
Some migrations cannot be reversed. For example, some data migrations can't be
|
||
reversed because we lose information about the state of the database before the migration.
|
||
You should still create a `down` method with a comment, explaining why
|
||
the changes performed by the `up` method can't be reversed, so that the
|
||
migration itself can be reversed, even if the changes performed during the migration
|
||
can't be reversed:
|
||
|
||
```ruby
|
||
def down
|
||
# no-op
|
||
|
||
# comment explaining why changes performed by `up` cannot be reversed.
|
||
end
|
||
```
|
||
|
||
## Atomicity
|
||
|
||
By default, migrations are single transaction. That is, a transaction is opened
|
||
at the beginning of the migration, and committed after all steps are processed.
|
||
|
||
Running migrations in a single transaction makes sure that if one of the steps fails,
|
||
none of the steps are executed, leaving the database in valid state.
|
||
Therefore, either:
|
||
|
||
- Put all migrations in one single-transaction migration.
|
||
- If necessary, put most actions in one migration and create a separate migration
|
||
for the steps that cannot be done in a single transaction.
|
||
|
||
For example, if you create an empty table and need to build an index for it,
|
||
it is recommended to use a regular single-transaction migration and the default
|
||
rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
|
||
This is a blocking operation, but it doesn't cause problems because the table is not yet used,
|
||
and therefore it does not have any records yet.
|
||
|
||
## Heavy operations in a single transaction
|
||
|
||
When using a single-transaction migration, a transaction holds a database connection
|
||
for the duration of the migration, so you must make sure the actions in the migration
|
||
do not take too much time: GitLab.com’s production database has a `15s` timeout, so
|
||
in general, the cumulative execution time in a migration should aim to fit comfortably
|
||
in that limit. Singular query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries)
|
||
|
||
In case you need to insert, update, or delete a significant amount of data, you:
|
||
|
||
- Must disable the single transaction with `disable_ddl_transaction!`.
|
||
- Should consider doing it in a [Background Migration](background_migrations.md).
|
||
|
||
## Retry mechanism when acquiring database locks
|
||
|
||
When changing the database schema, we use helper methods to invoke DDL (Data Definition
|
||
Language) statements. In some cases, these DDL statements require a specific database lock.
|
||
|
||
Example:
|
||
|
||
```ruby
|
||
def change
|
||
remove_column :users, :full_name, :string
|
||
end
|
||
```
|
||
|
||
Executing this migration requires an exclusive lock on the `users` table. When the table
|
||
is concurrently accessed and modified by other processes, acquiring the lock may take
|
||
a while. The lock request is waiting in a queue and it may also block other queries
|
||
on the `users` table once it has been enqueued.
|
||
|
||
More information about PostgresSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
|
||
|
||
For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
|
||
set. When the migration is invoked, any database query has
|
||
a fixed time to execute. In a worst-case scenario, the request sits in the
|
||
lock queue, blocking other queries for the duration of the configured statement timeout,
|
||
then failing with `canceling statement due to statement timeout` error.
|
||
|
||
This problem could cause failed application upgrade processes and even application
|
||
stability issues, since the table may be inaccessible for a short period of time.
|
||
|
||
To increase the reliability and stability of database migrations, the GitLab codebase
|
||
offers a helper method to retry the operations with different `lock_timeout` settings
|
||
and wait time between the attempts. Multiple smaller attempts to acquire the necessary
|
||
lock allow the database to process other statements.
|
||
|
||
### Examples
|
||
|
||
**Removing a column:**
|
||
|
||
```ruby
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
def up
|
||
with_lock_retries do
|
||
remove_column :users, :full_name
|
||
end
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
add_column :users, :full_name, :string
|
||
end
|
||
end
|
||
```
|
||
|
||
**Removing a foreign key:**
|
||
|
||
```ruby
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
def up
|
||
with_lock_retries do
|
||
remove_foreign_key :issues, :projects
|
||
end
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
add_foreign_key :issues, :projects
|
||
end
|
||
end
|
||
```
|
||
|
||
**Changing default value for a column:**
|
||
|
||
```ruby
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
def up
|
||
with_lock_retries do
|
||
change_column_default :merge_requests, :lock_version, from: nil, to: 0
|
||
end
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
change_column_default :merge_requests, :lock_version, from: 0, to: nil
|
||
end
|
||
end
|
||
```
|
||
|
||
**Creating a new table with a foreign key:**
|
||
|
||
We can simply wrap the `create_table` method with `with_lock_retries`:
|
||
|
||
```ruby
|
||
def up
|
||
with_lock_retries do
|
||
create_table :issues do |t|
|
||
t.references :project, index: true, null: false, foreign_key: { on_delete: :cascade }
|
||
t.string :title, limit: 255
|
||
end
|
||
end
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
drop_table :issues
|
||
end
|
||
end
|
||
```
|
||
|
||
**Creating a new table when we have two foreign keys:**
|
||
|
||
For this, we need three migrations:
|
||
|
||
1. Creating the table without foreign keys (with the indices).
|
||
1. Add foreign key to the first table.
|
||
1. Add foreign key to the second table.
|
||
|
||
Creating the table:
|
||
|
||
```ruby
|
||
def up
|
||
create_table :imports do |t|
|
||
t.bigint :project_id, null: false
|
||
t.bigint :user_id, null: false
|
||
t.string :jid, limit: 255
|
||
end
|
||
|
||
add_index :imports, :project_id
|
||
add_index :imports, :user_id
|
||
end
|
||
|
||
def down
|
||
drop_table :imports
|
||
end
|
||
```
|
||
|
||
Adding foreign key to `projects`:
|
||
|
||
We can use the `add_concurrenct_foreign_key` method in this case, as this helper method
|
||
has the lock retries built into it.
|
||
|
||
```ruby
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
disable_ddl_transaction!
|
||
|
||
def up
|
||
add_concurrent_foreign_key :imports, :projects, column: :project_id, on_delete: :cascade
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
remove_foreign_key :imports, column: :project_id
|
||
end
|
||
end
|
||
```
|
||
|
||
Adding foreign key to `users`:
|
||
|
||
```ruby
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
disable_ddl_transaction!
|
||
|
||
def up
|
||
add_concurrent_foreign_key :imports, :users, column: :user_id, on_delete: :cascade
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
remove_foreign_key :imports, column: :user_id
|
||
end
|
||
end
|
||
```
|
||
|
||
**Usage with `disable_ddl_transaction!`**
|
||
|
||
Generally the `with_lock_retries` helper should work with `disable_ddl_transaction!`. A custom RuboCop rule ensures that only allowed methods can be placed within the lock retries block.
|
||
|
||
```ruby
|
||
disable_ddl_transaction!
|
||
|
||
def up
|
||
with_lock_retries do
|
||
add_column :users, :name, :text
|
||
end
|
||
|
||
add_text_limit :users, :name, 255 # Includes constraint validation (full table scan)
|
||
end
|
||
```
|
||
|
||
The RuboCop rule generally allows standard Rails migration methods, listed below. This example causes a Rubocop offense:
|
||
|
||
```ruby
|
||
disable_ddl_transaction!
|
||
|
||
def up
|
||
with_lock_retries do
|
||
add_concurrent_index :users, :name
|
||
end
|
||
end
|
||
```
|
||
|
||
### When to use the helper method
|
||
|
||
The `with_lock_retries` helper method can be used when you normally use
|
||
standard Rails migration helper methods. Calling more than one migration
|
||
helper is not a problem if they're executed on the same table.
|
||
|
||
Using the `with_lock_retries` helper method is advised when a database
|
||
migration involves one of the [high-traffic tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3).
|
||
|
||
Example changes:
|
||
|
||
- `add_foreign_key` / `remove_foreign_key`
|
||
- `add_column` / `remove_column`
|
||
- `change_column_default`
|
||
- `create_table` / `drop_table`
|
||
|
||
The `with_lock_retries` method **cannot** be used within the `change` method, you must manually define the `up` and `down` methods to make the migration reversible.
|
||
|
||
### How the helper method works
|
||
|
||
1. Iterate 50 times.
|
||
1. For each iteration, set a pre-configured `lock_timeout`.
|
||
1. Try to execute the given block. (`remove_column`).
|
||
1. If `LockWaitTimeout` error is raised, sleep for the pre-configured `sleep_time`
|
||
and retry the block.
|
||
1. If no error is raised, the current iteration has successfully executed the block.
|
||
|
||
For more information check the [`Gitlab::Database::WithLockRetries`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/with_lock_retries.rb) class. The `with_lock_retries` helper method is implemented in the [`Gitlab::Database::MigrationHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb) module.
|
||
|
||
In a worst-case scenario, the method:
|
||
|
||
- Executes the block for a maximum of 50 times over 40 minutes.
|
||
- Most of the time is spent in a pre-configured sleep period after each iteration.
|
||
- After the 50th retry, the block is executed without `lock_timeout`, just
|
||
like a standard migration invocation.
|
||
- If a lock cannot be acquired, the migration fails with `statement timeout` error.
|
||
|
||
The migration might fail if there is a very long running transaction (40+ minutes)
|
||
accessing the `users` table.
|
||
|
||
## Multi-Threading
|
||
|
||
Sometimes a migration might need to use multiple Ruby threads to speed up a
|
||
migration. For this to work your migration needs to include the module
|
||
`Gitlab::Database::MultiThreadedMigration`:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
include Gitlab::Database::MigrationHelpers
|
||
include Gitlab::Database::MultiThreadedMigration
|
||
end
|
||
```
|
||
|
||
You can then use the method `with_multiple_threads` to perform work in separate
|
||
threads. For example:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
include Gitlab::Database::MigrationHelpers
|
||
include Gitlab::Database::MultiThreadedMigration
|
||
|
||
def up
|
||
with_multiple_threads(4) do
|
||
disable_statement_timeout
|
||
|
||
# ...
|
||
end
|
||
end
|
||
end
|
||
```
|
||
|
||
Here the call to `disable_statement_timeout` uses the connection local to
|
||
the `with_multiple_threads` block, instead of re-using the global connection
|
||
pool. This ensures each thread has its own connection object, and doesn't time
|
||
out when trying to obtain one.
|
||
|
||
PostgreSQL has a maximum amount of connections that it allows. This
|
||
limit can vary from installation to installation. As a result, it's recommended
|
||
you do not use more than 32 threads in a single migration. Usually, 4-8 threads
|
||
should be more than enough.
|
||
|
||
## Removing indexes
|
||
|
||
If the table is not empty when removing an index, make sure to use the method
|
||
`remove_concurrent_index` instead of the regular `remove_index` method.
|
||
The `remove_concurrent_index` method drops indexes concurrently, so no locking is required,
|
||
and there is no need for downtime. To use this method, you must disable single-transaction mode
|
||
by calling the method `disable_ddl_transaction!` in the body of your migration
|
||
class like so:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
include Gitlab::Database::MigrationHelpers
|
||
disable_ddl_transaction!
|
||
|
||
INDEX_NAME = 'index_name'
|
||
def up
|
||
remove_concurrent_index :table_name, :column_name, name: INDEX_NAME
|
||
end
|
||
end
|
||
```
|
||
|
||
Note that it is not necessary to check if the index exists prior to
|
||
removing it, however it is required to specify the name of the
|
||
index that is being removed. This can be done either by passing the name
|
||
as an option to the appropriate form of `remove_index` or `remove_concurrent_index`,
|
||
or more simply by using the `remove_concurrent_index_by_name` method. Explicitly
|
||
specifying the name is important to ensure the correct index is removed.
|
||
|
||
For a small table (such as an empty one or one with less than `1,000` records),
|
||
it is recommended to use `remove_index` in a single-transaction migration,
|
||
combining it with other operations that don't require `disable_ddl_transaction!`.
|
||
|
||
### Disabling an index
|
||
|
||
There are certain situations in which you might want to disable an index before removing it.
|
||
See the [maintenance operations guide](database/maintenance_operations.md#disabling-an-index)
|
||
for more details.
|
||
|
||
## Adding indexes
|
||
|
||
Before adding an index, consider if this one is necessary. There are situations in which an index
|
||
might not be required, like:
|
||
|
||
- The table is small (less than `1,000` records) and it's not expected to exponentially grow in size.
|
||
- Any existing indexes filter out enough rows.
|
||
- The reduction in query timings after the index is added is not significant.
|
||
|
||
Additionally, wide indexes are not required to match all filter criteria of queries, we just need
|
||
to cover enough columns so that the index lookup has a small enough selectivity. Please review our
|
||
[Adding Database indexes](adding_database_indexes.md) guide for more details.
|
||
|
||
When adding an index to a non-empty table make sure to use the method
|
||
`add_concurrent_index` instead of the regular `add_index` method.
|
||
The `add_concurrent_index` method automatically creates concurrent indexes
|
||
when using PostgreSQL, removing the need for downtime.
|
||
|
||
To use this method, you must disable single-transactions mode
|
||
by calling the method `disable_ddl_transaction!` in the body of your migration
|
||
class like so:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
DOWNTIME = false
|
||
|
||
disable_ddl_transaction!
|
||
|
||
INDEX_NAME = 'index_name'
|
||
|
||
def up
|
||
add_concurrent_index :table, :column, name: INDEX_NAME
|
||
end
|
||
|
||
def down
|
||
remove_concurrent_index :table, :column, name: INDEX_NAME
|
||
end
|
||
end
|
||
```
|
||
|
||
You must explicitly name indexes that are created with more complex
|
||
definitions beyond table name, column name(s) and uniqueness constraint.
|
||
Consult the [Adding Database Indexes](adding_database_indexes.md#requirements-for-naming-indexes)
|
||
guide for more details.
|
||
|
||
If you need to add a unique index, please keep in mind there is the possibility
|
||
of existing duplicates being present in the database. This means that should
|
||
always _first_ add a migration that removes any duplicates, before adding the
|
||
unique index.
|
||
|
||
For a small table (such as an empty one or one with less than `1,000` records),
|
||
it is recommended to use `add_index` in a single-transaction migration, combining it with other
|
||
operations that don't require `disable_ddl_transaction!`.
|
||
|
||
## Testing for existence of indexes
|
||
|
||
If a migration requires conditional logic based on the absence or
|
||
presence of an index, you must test for existence of that index using
|
||
its name. This helps avoids problems with how Rails compares index definitions,
|
||
which can lead to unexpected results. For more details, review the
|
||
[Adding Database Indexes](adding_database_indexes.md#why-explicit-names-are-required)
|
||
guide.
|
||
|
||
The easiest way to test for existence of an index by name is to use the
|
||
`index_name_exists?` method, but the `index_exists?` method can also
|
||
be used with a name option. For example:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
include Gitlab::Database::MigrationHelpers
|
||
|
||
INDEX_NAME = 'index_name'
|
||
|
||
def up
|
||
# an index must be conditionally created due to schema inconsistency
|
||
unless index_exists?(:table_name, :column_name, name: INDEX_NAME)
|
||
add_index :table_name, :column_name, name: INDEX_NAME
|
||
end
|
||
end
|
||
|
||
def down
|
||
# no op
|
||
end
|
||
end
|
||
```
|
||
|
||
Keep in mind that concurrent index helpers like `add_concurrent_index`,
|
||
`remove_concurrent_index`, and `remove_concurrent_index_by_name` already
|
||
perform existence checks internally.
|
||
|
||
## Adding foreign-key constraints
|
||
|
||
When adding a foreign-key constraint to either an existing or a new column also
|
||
remember to add an index on the column.
|
||
|
||
This is **required** for all foreign-keys, e.g., to support efficient cascading
|
||
deleting: when a lot of rows in a table get deleted, the referenced records need
|
||
to be deleted too. The database has to look for corresponding records in the
|
||
referenced table. Without an index, this results in a sequential scan on the
|
||
table, which can take a long time.
|
||
|
||
Here's an example where we add a new column with a foreign key
|
||
constraint. Note it includes `index: true` to create an index for it.
|
||
|
||
```ruby
|
||
class Migration < ActiveRecord::Migration[6.0]
|
||
|
||
def change
|
||
add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
|
||
end
|
||
end
|
||
```
|
||
|
||
When adding a foreign-key constraint to an existing column in a non-empty table,
|
||
we have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
|
||
instead of `add_reference`.
|
||
|
||
For an empty table (such as a fresh one), it is recommended to use
|
||
`add_reference` in a single-transaction migration, combining it with other
|
||
operations that don't require `disable_ddl_transaction!`.
|
||
|
||
You can read more about adding [foreign key constraints to an existing column](database/add_foreign_key_to_existing_column.md).
|
||
|
||
## `NOT NULL` constraints
|
||
|
||
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.
|
||
|
||
See the style guide on [`NOT NULL` constraints](database/not_null_constraints.md) for more information.
|
||
|
||
## Adding Columns With Default Values
|
||
|
||
With PostgreSQL 11 being the minimum version in GitLab 13.0 and later, adding columns with default values has become much easier and
|
||
the standard `add_column` helper should be used in all cases.
|
||
|
||
Before PostgreSQL 11, adding a column with a default was problematic as it would
|
||
have caused a full table rewrite. The corresponding helper `add_column_with_default`
|
||
has been deprecated and is scheduled to be removed in a later release.
|
||
|
||
If a backport adding a column with a default value is needed for %12.9 or earlier versions,
|
||
it should use `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
|
||
is involved, backporting to %12.9 is contraindicated.
|
||
|
||
## Changing the column default
|
||
|
||
One might think that changing a default column with `change_column_default` is an
|
||
expensive and disruptive operation for larger tables, but in reality it's not.
|
||
|
||
Take the following migration as an example:
|
||
|
||
```ruby
|
||
class DefaultRequestAccessGroups < ActiveRecord::Migration[5.2]
|
||
DOWNTIME = false
|
||
|
||
def change
|
||
change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
|
||
end
|
||
end
|
||
```
|
||
|
||
Migration above changes the default column value of one of our largest
|
||
tables: `namespaces`. This can be translated to:
|
||
|
||
```sql
|
||
ALTER TABLE namespaces
|
||
ALTER COLUMN request_access_enabled
|
||
SET DEFAULT false
|
||
```
|
||
|
||
In this particular case, the default value exists and we're just changing the metadata for
|
||
`request_access_enabled` column, which does not imply a rewrite of all the existing records
|
||
in the `namespaces` table. Only when creating a new column with a default, all the records are going be rewritten.
|
||
|
||
NOTE:
|
||
A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
|
||
was introduced on PostgresSQL 11.0, removing the need of rewriting the table when a new column with a default value is added.
|
||
|
||
For the reasons mentioned above, it's safe to use `change_column_default` in a single-transaction migration
|
||
without requiring `disable_ddl_transaction!`.
|
||
|
||
## Updating an existing column
|
||
|
||
To update an existing column to a particular value, you can use
|
||
`update_column_in_batches`. This splits the updates into batches, so we
|
||
don't update too many rows at in a single statement.
|
||
|
||
This updates the column `foo` in the `projects` table to 10, where `some_column`
|
||
is `'hello'`:
|
||
|
||
```ruby
|
||
update_column_in_batches(:projects, :foo, 10) do |table, query|
|
||
query.where(table[:some_column].eq('hello'))
|
||
end
|
||
```
|
||
|
||
If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel
|
||
treats it as an SQL literal. It's also a required deprecation for [Rails 6](https://gitlab.com/gitlab-org/gitlab/-/issues/28497).
|
||
|
||
The below example is the same as the one above, but
|
||
the value is set to the product of the `bar` and `baz` columns:
|
||
|
||
```ruby
|
||
update_value = Arel.sql('bar * baz')
|
||
|
||
update_column_in_batches(:projects, :foo, update_value) do |table, query|
|
||
query.where(table[:some_column].eq('hello'))
|
||
end
|
||
```
|
||
|
||
Like `add_column_with_default`, there is a RuboCop cop to detect usage of this
|
||
on large tables. In the case of `update_column_in_batches`, it may be acceptable
|
||
to run on a large table, as long as it is only updating a small subset of the
|
||
rows in the table, but do not ignore that without validating on the GitLab.com
|
||
staging environment - or asking someone else to do so for you - beforehand.
|
||
|
||
## Dropping a database table
|
||
|
||
Dropping a database table is uncommon, and the `drop_table` method
|
||
provided by Rails is generally considered safe. Before dropping the table,
|
||
please consider the following:
|
||
|
||
If your table has foreign keys on a high-traffic table (like `projects`), then
|
||
the `DROP TABLE` statement might fail with **statement timeout** error. Determining
|
||
what tables are high traffic can be difficult. Self-managed instances might
|
||
use different features of GitLab with different usage patterns, thus making
|
||
assumptions based on GitLab.com is not enough.
|
||
|
||
Table **has no records** (feature was never in use) and **no foreign
|
||
keys**:
|
||
|
||
- Simply use the `drop_table` method in your migration.
|
||
|
||
```ruby
|
||
def change
|
||
drop_table :my_table
|
||
end
|
||
```
|
||
|
||
Table **has records** but **no foreign keys**:
|
||
|
||
- First release: Remove the application code related to the table, such as models,
|
||
controllers and services.
|
||
- Second release: Use the `drop_table` method in your migration.
|
||
|
||
```ruby
|
||
def up
|
||
drop_table :my_table
|
||
end
|
||
|
||
def down
|
||
# create_table ...
|
||
end
|
||
```
|
||
|
||
Table **has foreign keys**:
|
||
|
||
- First release: Remove the application code related to the table, such as models,
|
||
controllers, and services.
|
||
- Second release: Remove the foreign keys using the `with_lock_retries`
|
||
helper method. Use `drop_table` in another migration file.
|
||
|
||
**Migrations for the second release:**
|
||
|
||
Removing the foreign key on the `projects` table:
|
||
|
||
```ruby
|
||
# first migration file
|
||
|
||
def up
|
||
with_lock_retries do
|
||
remove_foreign_key :my_table, :projects
|
||
end
|
||
end
|
||
|
||
def down
|
||
with_lock_retries do
|
||
add_foreign_key :my_table, :projects
|
||
end
|
||
end
|
||
```
|
||
|
||
Dropping the table:
|
||
|
||
```ruby
|
||
# second migration file
|
||
|
||
def up
|
||
drop_table :my_table
|
||
end
|
||
|
||
def down
|
||
# create_table ...
|
||
end
|
||
```
|
||
|
||
## Integer column type
|
||
|
||
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
|
||
a max value of 2,147,483,647. Be aware of this when creating a column that
|
||
holds file sizes in byte units. If you are tracking file size in bytes, this
|
||
restricts the maximum file size to just over 2GB.
|
||
|
||
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
|
||
set the limit to 8-bytes. This allows the column to hold a value up to
|
||
`9,223,372,036,854,775,807`.
|
||
|
||
Rails migration example:
|
||
|
||
```ruby
|
||
add_column(:projects, :foo, :integer, default: 10, limit: 8)
|
||
```
|
||
|
||
## Strings and the Text data type
|
||
|
||
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.
|
||
|
||
See the [text data type](database/strings_and_the_text_data_type.md) style guide for more information.
|
||
|
||
## Timestamp column type
|
||
|
||
By default, Rails uses the `timestamp` data type that stores timestamp data
|
||
without timezone information. The `timestamp` data type is used by calling
|
||
either the `add_timestamps` or the `timestamps` method.
|
||
|
||
Also, Rails converts the `:datetime` data type to the `timestamp` one.
|
||
|
||
Example:
|
||
|
||
```ruby
|
||
# timestamps
|
||
create_table :users do |t|
|
||
t.timestamps
|
||
end
|
||
|
||
# add_timestamps
|
||
def up
|
||
add_timestamps :users
|
||
end
|
||
|
||
# :datetime
|
||
def up
|
||
add_column :users, :last_sign_in, :datetime
|
||
end
|
||
```
|
||
|
||
Instead of using these methods, one should use the following methods to store
|
||
timestamps with timezones:
|
||
|
||
- `add_timestamps_with_timezone`
|
||
- `timestamps_with_timezone`
|
||
- `datetime_with_timezone`
|
||
|
||
This ensures all timestamps have a time zone specified. This, in turn, means
|
||
existing timestamps don't suddenly use a different timezone when the system's
|
||
timezone changes. It also makes it very clear which timezone was used in the
|
||
first place.
|
||
|
||
## Storing JSON in database
|
||
|
||
The Rails 5 natively supports `JSONB` (binary JSON) column type.
|
||
Example migration adding this column:
|
||
|
||
```ruby
|
||
class AddOptionsToBuildMetadata < ActiveRecord::Migration[5.0]
|
||
DOWNTIME = false
|
||
|
||
def change
|
||
add_column :ci_builds_metadata, :config_options, :jsonb
|
||
end
|
||
end
|
||
```
|
||
|
||
You have to use a serializer to provide a translation layer:
|
||
|
||
```ruby
|
||
class BuildMetadata
|
||
serialize :config_options, Serializers::JSON # rubocop:disable Cop/ActiveRecordSerialize
|
||
end
|
||
```
|
||
|
||
When using a `JSONB` column, use the [JsonSchemaValidator](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/validators/json_schema_validator.rb) to keep control of the data being inserted over time.
|
||
|
||
```ruby
|
||
class BuildMetadata
|
||
validates :config_options, json_schema: { filename: 'build_metadata_config_option' }
|
||
end
|
||
```
|
||
|
||
## Testing
|
||
|
||
See the [Testing Rails migrations](testing_guide/testing_migrations_guide.md) style guide.
|
||
|
||
## Data migration
|
||
|
||
Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
|
||
using plain SQL, you need to quote all input manually with `quote_string` helper.
|
||
|
||
Example with Arel:
|
||
|
||
```ruby
|
||
users = Arel::Table.new(:users)
|
||
users.group(users[:user_id]).having(users[:id].count.gt(5))
|
||
|
||
#update other tables with these results
|
||
```
|
||
|
||
Example with plain SQL and `quote_string` helper:
|
||
|
||
```ruby
|
||
select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(id) > 1").each do |tag|
|
||
tag_name = quote_string(tag["name"])
|
||
duplicate_ids = select_all("SELECT id FROM tags WHERE name = '#{tag_name}'").map{|tag| tag["id"]}
|
||
origin_tag_id = duplicate_ids.first
|
||
duplicate_ids.delete origin_tag_id
|
||
|
||
execute("UPDATE taggings SET tag_id = #{origin_tag_id} WHERE tag_id IN(#{duplicate_ids.join(",")})")
|
||
execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
|
||
end
|
||
```
|
||
|
||
If you need more complex logic, you can define and use models local to a
|
||
migration. For example:
|
||
|
||
```ruby
|
||
class MyMigration < ActiveRecord::Migration[6.0]
|
||
class Project < ActiveRecord::Base
|
||
self.table_name = 'projects'
|
||
end
|
||
|
||
def up
|
||
# Reset the column information of all the models that update the database
|
||
# to ensure the Active Record's knowledge of the table structure is current
|
||
Project.reset_column_information
|
||
|
||
# ... ...
|
||
end
|
||
end
|
||
```
|
||
|
||
When doing so be sure to explicitly set the model's table name, so it's not
|
||
derived from the class name or namespace.
|
||
|
||
Be aware of the limitations [when using models in migrations](#using-models-in-migrations-discouraged).
|
||
|
||
### Renaming reserved paths
|
||
|
||
When a new route for projects is introduced, it could conflict with any
|
||
existing records. The path for these records should be renamed, and the
|
||
related data should be moved on disk.
|
||
|
||
Since we had to do this a few times already, there are now some helpers to help
|
||
with this.
|
||
|
||
To use this you can include `Gitlab::Database::RenameReservedPathsMigration::V1`
|
||
in your migration. This provides 3 methods which you can pass one or more
|
||
paths that need to be rejected.
|
||
|
||
- **`rename_root_paths`**: Renames the path of all _namespaces_ with the
|
||
given name that don't have a `parent_id`.
|
||
- **`rename_child_paths`**: Renames the path of all _namespaces_ with the
|
||
given name that have a `parent_id`.
|
||
- **`rename_wildcard_paths`**: Renames the path of all _projects_, and all
|
||
_namespaces_ that have a `project_id`.
|
||
|
||
The `path` column for these rows are renamed to their previous value followed
|
||
by an integer. For example: `users` would turn into `users0`
|
||
|
||
## Using models in migrations (discouraged)
|
||
|
||
The use of models in migrations is generally discouraged. As such models are
|
||
[contraindicated for background migrations](background_migrations.md#isolation),
|
||
the model needs to be declared in the migration.
|
||
|
||
If using a model in the migrations, you should first
|
||
[clear the column cache](https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-reset_column_information)
|
||
using `reset_column_information`.
|
||
|
||
This avoids problems where a column that you are using was altered and cached
|
||
in a previous migration.
|
||
|
||
### Example: Add a column `my_column` to the users table
|
||
|
||
It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information.
|
||
|
||
```ruby
|
||
class AddAndSeedMyColumn < ActiveRecord::Migration[6.0]
|
||
class User < ActiveRecord::Base
|
||
self.table_name = 'users'
|
||
end
|
||
|
||
def up
|
||
User.count # Any ActiveRecord calls on the model that caches the column information.
|
||
|
||
add_column :users, :my_column, :integer, default: 1
|
||
|
||
User.reset_column_information # The old schema is dropped from the cache.
|
||
User.find_each do |user|
|
||
user.my_column = 42 if some_condition # ActiveRecord sees the correct schema here.
|
||
user.save!
|
||
end
|
||
end
|
||
end
|
||
```
|
||
|
||
The underlying table is modified and then accessed via ActiveRecord.
|
||
|
||
Note that this also needs to be used if the table is modified in a previous, different migration,
|
||
if both migrations are run in the same `db:migrate` process.
|
||
|
||
This results in the following. Note the inclusion of `my_column`:
|
||
|
||
```shell
|
||
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
|
||
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
|
||
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "user"
|
||
-- add_column(:users, :my_column, :integer, {:default=>1})
|
||
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
|
||
-> 0.0016s
|
||
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
|
||
D, [2020-07-18T00:41:26.851769 #459802] DEBUG -- : AddAndSeedMyColumn::User Update (1.1ms) UPDATE "users" SET "my_column" = $1, "updated_at" = $2 WHERE "users"."id" = $3 [["my_column", 42], ["updated_at", "2020-07-17 23:41:26.849044"], ["id", 1]]
|
||
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
|
||
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
|
||
```
|
||
|
||
If you skip clearing the schema cache (`User.reset_column_information`), the column is not
|
||
used by ActiveRecord and the intended changes are not made, leading to the result below,
|
||
where `my_column` is missing from the query.
|
||
|
||
```shell
|
||
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
|
||
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
|
||
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "user"
|
||
-- add_column(:users, :my_column, :integer, {:default=>1})
|
||
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
|
||
-> 0.0016s
|
||
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
|
||
D, [2020-07-06T00:37:12.653459 #130101] DEBUG -- : AddAndSeedMyColumn::User Update (0.5ms) UPDATE "users" SET "updated_at" = $1 WHERE "users"."id" = $2 [["updated_at", "2020-07-05 23:37:12.652297"], ["id", 1]]
|
||
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
|
||
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
|
||
```
|