2015-05-11 09:09:36 -04:00
# Migration Style Guide
When writing migrations for GitLab, you have to take into account that
these will be run by hundreds of thousands of organizations of all sizes, some with
many years of data in their database.

In addition, having to take a server offline for an upgrade, small or big, is a
big burden for most organizations. For this reason, it is important that your
migrations are written carefully, can be applied online, and adhere to the style
guide below.

Migrations are **not** allowed to require GitLab installations to be taken
offline unless _absolutely necessary_.

When downtime is necessary, the migration has to be approved by:

1. The VP of Engineering
1. A Backend Lead
1. A Database Specialist

An up-to-date list of people holding these titles can be found at
<https://about.gitlab.com/company/team/>.

When writing your migrations, also consider that databases might have stale data
or inconsistencies, and guard against that. Try to make as few assumptions as
possible about the state of the database.

Please don't depend on GitLab-specific code since it can change in future
versions. If needed, copy-paste GitLab code into the migration to make it forward
compatible.

## Schema Changes
Migrations that make changes to the database schema (e.g. adding a column) can
only be added in the monthly release. Patch releases may only contain data
migrations _unless_ schema changes are absolutely required to solve a problem.

## What Requires Downtime?
The document ["What Requires Downtime?"](what_requires_downtime.md) specifies
various database operations, such as

- [adding, dropping, and renaming columns](what_requires_downtime.md#adding-columns)
- [changing column constraints and types](what_requires_downtime.md#changing-column-constraints)
- [adding and dropping indexes, tables, and foreign keys](what_requires_downtime.md#adding-indexes)

and whether they require downtime and how to work around that whenever possible.

## Downtime Tagging
Every migration must specify if it requires downtime or not, and if it does
require downtime, it must also specify a reason for this. This is required even
if 99% of the migrations won't require downtime as this makes it easier to find
the migrations that _do_ require downtime.

To tag a migration, add the following two constants to the migration class'
body:

- `DOWNTIME`: a boolean that when set to `true` indicates the migration requires
  downtime.
- `DOWNTIME_REASON`: a String containing the reason for the migration requiring
  downtime. This constant **must** be set when `DOWNTIME` is set to `true`.

For example:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  DOWNTIME = true
  DOWNTIME_REASON = 'This migration requires downtime because ...'

  def change
    ...
  end
end
```

It is an error (that is, CI will fail) if the `DOWNTIME` constant is missing
from a migration class.
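
As a rough illustration of this rule, the plain-Ruby sketch below checks a class for the two constants. The `downtime_tag_errors` helper and the three example classes are hypothetical names made up for this example. GitLab's actual CI check is not shown in this guide and may work differently.

```ruby
# Hypothetical helper: returns a list of tagging problems for a migration class.
def downtime_tag_errors(migration_class)
  # Passing `false` only looks at constants defined on the class itself.
  unless migration_class.const_defined?(:DOWNTIME, false)
    return ['DOWNTIME constant is missing']
  end

  errors = []
  if migration_class.const_get(:DOWNTIME, false)
    has_reason = migration_class.const_defined?(:DOWNTIME_REASON, false) &&
                 !migration_class.const_get(:DOWNTIME_REASON, false).to_s.strip.empty?
    errors << 'DOWNTIME_REASON must be set when DOWNTIME is true' unless has_reason
  end
  errors
end

# Stand-ins for real migration classes (no Rails is needed for this sketch).
class TaggedMigration
  DOWNTIME = false
end

class UntaggedMigration
end

class DowntimeWithoutReason
  DOWNTIME = true
end
```

Calling `downtime_tag_errors(TaggedMigration)` returns an empty list, while the other two classes each produce one error message.
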
## Reversibility
Your migration **must be** reversible. This is very important, as it should
be possible to downgrade in case of a vulnerability or bugs.

In your migration, add a comment describing how the reversibility of the
migration was tested.

## Atomicity
By default, migrations run in a single transaction. That is, a transaction is opened
at the beginning of the migration, and committed after all steps are processed.

Running migrations in a single transaction makes sure that if one of the steps fails,
none of the steps are applied, leaving the database in a valid state.
Therefore, either:

- Put all steps in one single-transaction migration.
- If necessary, put most actions in one migration and create a separate migration
  for the steps that cannot be done in a single transaction.

For example, if you create an empty table and need to build an index for it,
it is recommended to use a regular single-transaction migration and the default
Rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
This is a blocking operation, but it won't cause problems because the table is not yet used,
and therefore it does not have any records yet.
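
A minimal sketch of that case is shown below. The `examples` table and its `project_id` column are hypothetical. Only the `change` method is shown, and it is assumed to live inside a regular migration class that does not call `disable_ddl_transaction!`.

```ruby
def change
  # Both steps run inside the migration's single transaction.
  create_table :examples do |t|
    t.integer :project_id, null: false
  end

  # Blocking, but the table was just created and holds no rows.
  add_index :examples, :project_id
end
```

If either step fails, the transaction is rolled back and neither the table nor the index is left behind.
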
## Heavy operations in a single transaction
When using a single-transaction migration, a transaction holds a database connection
for the duration of the migration, so you must make sure the actions in the migration
do not take too much time. In general, queries executed in a migration need to fit comfortably
within `15s` on GitLab.com.

In case you need to insert, update, or delete a significant amount of data, you:

- Must disable the single transaction with `disable_ddl_transaction!`.
- Should consider doing it in a [Background Migration](background_migrations.md).

## Retry mechanism when acquiring database locks
When changing the database schema, we use helper methods to invoke DDL (Data Definition
Language) statements. In some cases, these DDL statements require a specific database lock.
Example:
```ruby
def change
  remove_column :users, :full_name, :string
end
```

Executing this migration requires an exclusive lock on the `users` table. When the table
is concurrently accessed and modified by other processes, acquiring the lock may take
a while. The lock request waits in a queue, and it may also block other queries
on the `users` table once it has been enqueued.

More information about PostgreSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)

For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
set. When the migration is invoked, any database query has
a fixed time to execute. In a worst-case scenario, the request sits in the
lock queue, blocking other queries for the duration of the configured statement timeout,
and then fails with the `canceling statement due to statement timeout` error.

This problem could cause failed application upgrade processes and even application
stability issues, since the table may be inaccessible for a short period of time.

To increase the reliability and stability of database migrations, the GitLab codebase
offers a helper method to retry the operations with different `lock_timeout` settings
and wait time between the attempts. Multiple smaller attempts to acquire the necessary
lock allow the database to process other statements.

### Examples
Removing a column:

```ruby
include Gitlab::Database::MigrationHelpers

def change
  with_lock_retries do
    remove_column :users, :full_name, :string
  end
end
```

Removing a foreign key:

```ruby
include Gitlab::Database::MigrationHelpers

def change
  with_lock_retries do
    remove_foreign_key :issues, :projects
  end
end
```

Changing default value for a column:

```ruby
include Gitlab::Database::MigrationHelpers

def change
  with_lock_retries do
    change_column_default :merge_requests, :lock_version, from: nil, to: 0
  end
end
```

### When to use the helper method
The `with_lock_retries` helper method can be used when you normally use
standard Rails migration helper methods. Calling more than one migration
helper is not a problem if they're executed on the same table.

Using the `with_lock_retries` helper method is advised when a database
migration involves one of the high-traffic tables:

- `users`
- `projects`
- `namespaces`
- `ci_pipelines`
- `ci_builds`
- `notes`

Example changes:

- `add_foreign_key` / `remove_foreign_key`
- `add_column` / `remove_column`
- `change_column_default`

**Note:** The `with_lock_retries` method **cannot** be used with `disable_ddl_transaction!`.

### How the helper method works
1. Iterate 50 times.
1. For each iteration, set a pre-configured `lock_timeout`.
1. Try to execute the given block (for example, `remove_column`).
1. If a `LockWaitTimeout` error is raised, sleep for the pre-configured `sleep_time`
   and retry the block.
1. If no error is raised, the current iteration has successfully executed the block.

For more information, check the [`Gitlab::Database::WithLockRetries`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/with_lock_retries.rb) class. The `with_lock_retries` helper method is implemented in the [`Gitlab::Database::MigrationHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb) module.

In a worst-case scenario, the method:

- Executes the block for a maximum of 50 times over 40 minutes.
  - Most of the time is spent in a pre-configured sleep period after each iteration.
- After the 50th retry, executes the block without `lock_timeout`, just
  like a standard migration invocation.
- If a lock cannot be acquired, fails the migration with a `statement timeout` error.

The migration might fail if there is a very long running transaction (40+ minutes)
accessing the `users` table.
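
The steps above can be sketched in plain Ruby as follows. This is an illustration only, not the `Gitlab::Database::WithLockRetries` implementation. The `LockWaitTimeout` class defined here, the `retry_with_lock_timeout` method, its parameters, and the `set_lock_timeout` callback are hypothetical stand-ins, and the real timing configuration is not reproduced.

```ruby
# Stand-in for the error raised when a lock is not acquired in time.
class LockWaitTimeout < StandardError; end

# Hypothetical sketch of the retry loop described above.
# set_lock_timeout: callable that applies (or, when given nil, removes) a lock timeout.
# sleeper: callable used to wait between attempts (injectable for testing).
def retry_with_lock_timeout(attempts:, lock_timeout:, sleep_time:,
                            set_lock_timeout:, sleeper: ->(seconds) { sleep(seconds) })
  attempts.times do
    set_lock_timeout.call(lock_timeout)
    begin
      # Success: return the block's result and stop retrying.
      return yield
    rescue LockWaitTimeout
      # Lock not acquired in time: wait so other statements can proceed, then retry.
      sleeper.call(sleep_time)
    end
  end

  # All timed attempts failed: run once more without a lock timeout,
  # like a standard migration invocation.
  set_lock_timeout.call(nil)
  yield
end
```

Keeping each attempt short is the point of the design: a request that gives up quickly spends little time in the lock queue, so it blocks other queries on the table only briefly.
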
## Multi-Threading
Sometimes a migration might need to use multiple Ruby threads to speed up its
work. For this to work, your migration needs to include the module
`Gitlab::Database::MultiThreadedMigration`:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers
  include Gitlab::Database::MultiThreadedMigration
end
```

You can then use the method `with_multiple_threads` to perform work in separate
threads. For example:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers
  include Gitlab::Database::MultiThreadedMigration

  def up
    with_multiple_threads(4) do
      disable_statement_timeout

      # ...
    end
  end
end
```

Here the call to `disable_statement_timeout` will use the connection local to
the `with_multiple_threads` block, instead of re-using the global connection
pool. This ensures each thread has its own connection object, and won't time
out when trying to obtain one.

**NOTE:** PostgreSQL has a maximum number of connections that it allows. This
limit can vary from installation to installation. As a result, it's recommended
you do not use more than 32 threads in a single migration. Usually, 4-8 threads
should be more than enough.
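
To illustrate the general idea of splitting work across a small, fixed number of threads, here is a plain-Ruby sketch. It does not use `with_multiple_threads` or any database connection. The `process_in_threads` method and its `thread_count` argument are hypothetical names for this example.

```ruby
# Hypothetical helper: processes items with a fixed number of worker threads
# and returns the block's results (in no particular order).
# Items must not be nil or false, because a nil pop signals "no more work".
def process_in_threads(items, thread_count: 4)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      while (item = queue.pop)
        results << yield(item)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end
```

Because the number of workers is fixed up front, the number of connections such a migration could need stays bounded, which is the concern the note above describes.
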
## Removing indexes
If the table is not empty when removing an index, make sure to use the method
`remove_concurrent_index` instead of the regular `remove_index` method.
The `remove_concurrent_index` method drops indexes concurrently, so no locking is required,
and there is no need for downtime. To use this method, you must disable single-transaction mode
by calling the method `disable_ddl_transaction!` in the body of your migration
class like so:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers

  disable_ddl_transaction!

  def up
    remove_concurrent_index :table_name, :column_name
  end
end
```

Note that it is not necessary to check if the index exists prior to
removing it.

For a small table (such as an empty one or one with fewer than `1,000` records),
it is recommended to use `remove_index` in a single-transaction migration,
combining it with other operations that don't require `disable_ddl_transaction!`.

## Adding indexes
If you need to add a unique index, please keep in mind there is the possibility
of existing duplicates being present in the database. This means that you should
always _first_ add a migration that removes any duplicates, before adding the
unique index.
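
The idea behind such a clean-up step can be sketched in plain Ruby on in-memory rows. The `duplicate_ids` helper, the sample rows, and the choice to keep the row with the lowest `id` are assumptions for illustration. A real migration would do this with SQL against the table, and which duplicate to keep depends on the data.

```ruby
# Hypothetical helper: returns the ids of rows whose value for `key`
# duplicates another row, keeping the row with the lowest id in each group.
def duplicate_ids(rows, key)
  rows
    .group_by { |row| row[key] }
    .values
    .flat_map { |group| group.sort_by { |row| row[:id] }.drop(1) }
    .map { |row| row[:id] }
end

# Sample rows that would violate a unique index on :email.
SAMPLE_ROWS = [
  { id: 1, email: 'a@example.com' },
  { id: 2, email: 'b@example.com' },
  { id: 3, email: 'a@example.com' }
].freeze
```

Here `duplicate_ids(SAMPLE_ROWS, :email)` identifies the row with `id` 3 as the one to remove before the unique index can be created.
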

When adding an index to a non-empty table, make sure to use the method
`add_concurrent_index` instead of the regular `add_index` method.
The `add_concurrent_index` method automatically creates concurrent indexes
when using PostgreSQL, removing the need for downtime.

To use this method, you must disable single-transaction mode
by calling the method `disable_ddl_transaction!` in the body of your migration
class like so:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers

  disable_ddl_transaction!

  def up
    add_concurrent_index :table, :column
  end

  def down
    remove_concurrent_index :table, :column
  end
end
```

For a small table (such as an empty one or one with fewer than `1,000` records),
it is recommended to use `add_index` in a single-transaction migration, combining it with other
operations that don't require `disable_ddl_transaction!`.

## Adding foreign-key constraints
When adding a foreign-key constraint to either an existing or a new column, also
remember to add an index on the column.

This is **required** for all foreign-keys, e.g., to support efficient cascading
deleting: when a lot of rows in a table get deleted, the records that reference
them need to be deleted too. The database has to look for the corresponding records
in the referencing table. Without an index, this results in a sequential scan on
that table, which can take a long time.

Here's an example where we add a new column with a foreign key
constraint. Note it includes `index: true` to create an index for it.

```ruby
class Migration < ActiveRecord::Migration[4.2]
  def change
    add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
  end
end
```

When adding a foreign-key constraint to an existing column in a non-empty table,
we have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
instead of `add_reference`.
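
A sketch of what this might look like follows, using the hypothetical names `issues.project_id` and `projects`. Only the `up` and `down` methods are shown. They are assumed to sit in a migration class that includes `Gitlab::Database::MigrationHelpers` and calls `disable_ddl_transaction!`. The `column:` and `on_delete:` keyword arguments passed to `add_concurrent_foreign_key` are assumptions here, so check the helper's definition before relying on them.

```ruby
def up
  # Index first, so lookups on the foreign-key column do not scan the table.
  add_concurrent_index :issues, :project_id

  # Assumed keyword arguments; verify against the helper's definition.
  add_concurrent_foreign_key :issues, :projects, column: :project_id, on_delete: :cascade
end

def down
  # Undo the steps in reverse order.
  remove_foreign_key :issues, column: :project_id
  remove_concurrent_index :issues, :project_id
end
```
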

For an empty table (such as a fresh one), it is recommended to use
`add_reference` in a single-transaction migration, combining it with other
operations that don't require `disable_ddl_transaction!`.

## Adding Columns With Default Values
When adding columns with default values to non-empty tables, you must use
`add_column_with_default`. This method ensures the table is updated without
requiring downtime. This method is not reversible so you must manually define
the `up` and `down` methods in your migration class.

For example, to add the column `foo` to the `projects` table with a default
value of `10` you'd write the following:

```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers

  disable_ddl_transaction!

  def up
    add_column_with_default(:projects, :foo, :integer, default: 10)
  end

  def down
    remove_column(:projects, :foo)
  end
end
```

Keep in mind that this operation can easily take 10-15 minutes to complete on
larger installations (e.g. GitLab.com). As a result, you should only add
default values if absolutely necessary. There is a RuboCop cop that will fail if
this method is used on some tables that are very large on GitLab.com, which
would cause other issues.

For a small table (such as an empty one or one with fewer than `1,000` records),
use `add_column` and `change_column_default` in a single-transaction migration,
combining it with other operations that don't require `disable_ddl_transaction!`.
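
For instance, the following sketch adds a hypothetical `foo` column to a hypothetical small `examples` table. Only the `up` and `down` methods are shown, and they are assumed to live in a regular migration class that does not call `disable_ddl_transaction!`.

```ruby
def up
  add_column :examples, :foo, :integer

  # Sets the default for rows inserted from now on; existing rows keep NULL.
  change_column_default :examples, :foo, 10
end

def down
  remove_column :examples, :foo
end
```
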
## Changing the column default
One might think that changing a column default with `change_column_default` is an
expensive and disruptive operation for larger tables, but in reality it's not.

Take the following migration as an example:

```ruby
class DefaultRequestAccessGroups < ActiveRecord::Migration[5.2]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false

  def up
    change_column_default :namespaces, :request_access_enabled, true
  end

  def down
    change_column_default :namespaces, :request_access_enabled, false
  end
end
```

The migration above changes the default column value of one of our largest
tables: `namespaces`. The `up` method can be translated to:

```sql
ALTER TABLE namespaces
ALTER COLUMN request_access_enabled
SET DEFAULT true
```

In this particular case, the default value exists and we're just changing the metadata for
the `request_access_enabled` column, which does not imply a rewrite of all the existing records
in the `namespaces` table. Only when creating a new column with a default are all the records going to be rewritten.

NOTE: **Note:** A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
was introduced in PostgreSQL 11.0, removing the need to rewrite the table when a new column with a default value is added.

For the reasons mentioned above, it's safe to use `change_column_default` in a single-transaction migration
without requiring `disable_ddl_transaction!`.

## Updating an existing column
To update an existing column to a particular value, you can use
`update_column_in_batches` (`add_column_with_default` uses this internally to
fill in the default value). This splits the updates into batches, so we
don't update too many rows in a single statement.
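
The batching idea itself can be illustrated in plain Ruby. This is not how `update_column_in_batches` is implemented. The `id_batches` helper and its `batch_size` argument are hypothetical, and a real migration would issue one `UPDATE` per range instead of collecting the ranges.

```ruby
# Hypothetical helper: splits the id span min_id..max_id into inclusive
# ranges that each cover at most batch_size ids.
def id_batches(min_id, max_id, batch_size)
  batches = []
  start = min_id
  while start <= max_id
    stop = [start + batch_size - 1, max_id].min
    batches << (start..stop)
    start = stop + 1
  end
  batches
end
```

With ids from 1 to 10 and a batch size of 4, this yields three ranges, so no single statement would touch more than four ids.
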
This updates the column `foo` in the `projects` table to 10, where `some_column`
is `'hello'`:
```ruby
update_column_in_batches(:projects, :foo, 10) do |table, query|
  query.where(table[:some_column].eq('hello'))
end
```
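To make the idea of batching concrete, here is a minimal plain-Ruby sketch of how an ID range can be split into fixed-size batches. It is an illustration only: `batch_ranges` and its parameters are hypothetical names, and the batch sizing and queries that `update_column_in_batches` actually uses are not shown here.

```ruby
# Illustration only: splits an inclusive ID range into fixed-size batches.
# `batch_ranges` is a hypothetical name, not a GitLab helper.
def batch_ranges(min_id, max_id, batch_size)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  ranges = []
  start = min_id

  while start <= max_id
    # The last batch may be smaller than batch_size.
    stop = [start + batch_size - 1, max_id].min
    ranges << (start..stop)
    start = stop + 1
  end

  ranges
end

# Each range would correspond to one UPDATE statement restricted to those IDs.
batch_ranges(1, 10, 4).each do |range|
  puts "would update ids #{range.first}..#{range.last}"
end
```

Because every statement touches a bounded number of rows, no single `UPDATE` holds its locks for long.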
If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel
treats it as an SQL literal. This is also required to address a deprecation for [Rails 6](https://gitlab.com/gitlab-org/gitlab-foss/issues/61451).
The example below is the same as the one above, but
the value is set to the product of the `bar` and `baz` columns:
```ruby
update_value = Arel.sql('bar * baz')
update_column_in_batches(:projects, :foo, update_value) do |table, query|
  query.where(table[:some_column].eq('hello'))
end
```
As with `add_column_with_default`, there is a RuboCop cop to detect usage of this
on large tables. In the case of `update_column_in_batches`, it may be acceptable
to run on a large table, as long as it only updates a small subset of the
rows in the table. Do not disable the cop, however, without first validating the
migration on the GitLab.com staging environment, or asking someone else to do so for you.
## Dropping a database table
Dropping a database table is uncommon, and the `drop_table` method
provided by Rails is generally considered safe. Before dropping the table,
please consider the following:
If your table has foreign keys on a high-traffic table (like `projects`), then
the `DROP TABLE` statement might fail with a **statement timeout** error. Determining
which tables are high traffic can be difficult. Self-managed instances might
use different features of GitLab with different usage patterns, so assumptions
based on GitLab.com alone are not enough.
Table **has no records** (feature was never in use) and **no foreign
keys**:
- Simply use the `drop_table` method in your migration.
```ruby
def change
  drop_table :my_table
end
```
Table **has records** but **no foreign keys**:
- First release: Remove the application code related to the table, such as models,
controllers and services.
- Second release: Use the `drop_table` method in your migration.
```ruby
def up
  drop_table :my_table
end

def down
  # create_table ...
end
```
Table **has foreign keys**:
- First release: Remove the application code related to the table, such as models,
controllers, and services.
- Second release: Remove the foreign keys using the `with_lock_retries`
helper method. Use `drop_table` in another migration file.
**Migrations for the second release:**
Removing the foreign key on the `projects` table:
```ruby
# first migration file

def up
  with_lock_retries do
    remove_foreign_key :my_table, :projects
  end
end

def down
  with_lock_retries do
    add_foreign_key :my_table, :projects
  end
end
```
Dropping the table:
```ruby
# second migration file

def up
  drop_table :my_table
end

def down
  # create_table ...
end
```
## Integer column type
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
a max value of 2,147,483,647. Be aware of this when creating a column that will
hold file sizes in byte units. If you are tracking file size in bytes, this
restricts the maximum file size to just over 2GB.
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
set the limit to 8 bytes. This will allow the column to hold a value up to
`9,223,372,036,854,775,807`.
Rails migration example:
```ruby
add_column_with_default(:projects, :foo, :integer, default: 10, limit: 8)
```
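The two limits quoted in this section follow directly from the width of a signed integer. The plain-Ruby sketch below reproduces the arithmetic; `max_signed_integer` is a hypothetical helper name used only for illustration.

```ruby
# Largest value a signed integer of the given byte width can hold:
# one bit is reserved for the sign, so the maximum is 2^(bits - 1) - 1.
def max_signed_integer(bytes)
  2**(bytes * 8 - 1) - 1
end

four_byte_max  = max_signed_integer(4) # 2_147_483_647
eight_byte_max = max_signed_integer(8) # 9_223_372_036_854_775_807

# Expressed in decimal gigabytes, the 4-byte limit is just over 2 GB.
four_byte_max_in_gb = (four_byte_max / 1_000_000_000.0).round(2)

puts four_byte_max
puts eight_byte_max
puts four_byte_max_in_gb
```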
## Timestamp column type
By default, Rails uses the `timestamp` data type that stores timestamp data
without timezone information. The `timestamp` data type is used by calling
either the `add_timestamps` or the `timestamps` method.
Also, Rails converts the `:datetime` data type to the `timestamp` one.
Example:
```ruby
# timestamps
create_table :users do |t|
  t.timestamps
end

# add_timestamps
def up
  add_timestamps :users
end

# :datetime
def up
  add_column :users, :last_sign_in, :datetime
end
```
Instead of using these methods, one should use the following methods to store
timestamps with timezones:
- `add_timestamps_with_timezone`
- `timestamps_with_timezone`
- `datetime_with_timezone`
This ensures all timestamps have a time zone specified. This, in turn, means
existing timestamps won't suddenly use a different timezone when the system's
timezone changes. It also makes it very clear which timezone was used in the
first place.
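The ambiguity this avoids can be shown with plain Ruby, outside of any migration. In this sketch the same wall-clock reading is interpreted under two UTC offsets; the offsets and variable names are arbitrary choices for illustration.

```ruby
# The same wall-clock reading, interpreted under two different UTC offsets.
wall_clock = [2020, 1, 1, 12, 0, 0]

as_utc        = Time.new(*wall_clock, '+00:00')
as_minus_five = Time.new(*wall_clock, '-05:00')

# A value stored without its offset could mean either instant.
# The two instants are five hours apart.
difference_in_hours = (as_minus_five - as_utc) / 3600

puts difference_in_hours
```

Storing the offset with the value removes the need to guess which of the two instants was meant.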
## Storing JSON in database
Rails 5 natively supports the `JSONB` (binary JSON) column type.
Example migration adding this column:
```ruby
class AddOptionsToBuildMetadata < ActiveRecord::Migration[5.0]
  DOWNTIME = false

  def change
    add_column :ci_builds_metadata, :config_options, :jsonb
  end
end
```
You have to use a serializer to provide a translation layer:
```ruby
class BuildMetadata
  serialize :config_options, Serializers::JSON # rubocop:disable Cop/ActiveRecordSerialize
end
```
## Testing
See the [Testing Rails migrations](testing_guide/testing_migrations_guide.md) style guide.
## Data migration
Please prefer Arel and plain SQL over the usual ActiveRecord syntax. When
using plain SQL, you need to quote all input manually with the `quote_string` helper.
Example with Arel:
```ruby
users = Arel::Table.new(:users)
users.group(users[:user_id]).having(users[:id].count.gt(5))

# update other tables with these results
```
Example with plain SQL and `quote_string` helper:
```ruby
# Find every tag name that occurs more than once.
select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(id) > 1").each do |tag|
  tag_name = quote_string(tag["name"])
  duplicate_ids = select_all("SELECT id FROM tags WHERE name = '#{tag_name}'").map { |row| row["id"] }

  # Keep the first tag and treat the rest as duplicates.
  origin_tag_id = duplicate_ids.first
  duplicate_ids.delete origin_tag_id

  # Point taggings at the kept tag, then delete the duplicates.
  execute("UPDATE taggings SET tag_id = #{origin_tag_id} WHERE tag_id IN(#{duplicate_ids.join(",")})")
  execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
end
```
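To see why manual quoting matters, the plain-Ruby sketch below compares an interpolated query with and without escaping. `naive_quote` is a hypothetical, simplified stand-in that only doubles single quotes; it is not the implementation of `quote_string`, and migrations should always use the real helper.

```ruby
# Illustration only: a simplified stand-in that doubles single quotes.
# `naive_quote` is a hypothetical name; use `quote_string` in real migrations.
def naive_quote(value)
  value.gsub("'", "''")
end

name = "O'Reilly"

unquoted = "SELECT id FROM tags WHERE name = '#{name}'"
quoted   = "SELECT id FROM tags WHERE name = '#{naive_quote(name)}'"

puts unquoted # the apostrophe ends the SQL string literal early
puts quoted   # the doubled apostrophe is read as a literal character
```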
If you need more complex logic, you can define and use models local to a
migration. For example:
```ruby
class MyMigration < ActiveRecord::Migration[4.2]
  class Project < ActiveRecord::Base
    self.table_name = 'projects'
  end
end
```
When doing so, be sure to explicitly set the model's table name, so it's not
derived from the class name or namespace.
### Renaming reserved paths
When a new route for projects is introduced, it could conflict with any
existing records. The path for these records should be renamed, and the
related data should be moved on disk.
Since we have had to do this a few times already, there are now some helpers for
it.
To use them, include `Gitlab::Database::RenameReservedPathsMigration::V1`
in your migration. This provides three methods, each of which accepts one or more
paths that need to be rejected.
**`rename_root_paths`**: This will rename the path of all _namespaces_ with the
given name that don't have a `parent_id`.
**`rename_child_paths`**: This will rename the path of all _namespaces_ with the
given name that have a `parent_id`.
**`rename_wildcard_paths`**: This will rename the path of all _projects_, and all
_namespaces_ that have a `project_id`.
The `path` column for these rows will be renamed to their previous value followed
by an integer. For example, `users` would turn into `users0`.
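The renaming rule can be sketched in plain Ruby. This is an illustration under an assumption: the text above only says an integer is appended, and the sketch assumes the integer is incremented until the resulting path is unused. `renamed_path` and `taken_paths` are hypothetical names, not part of the helper's API.

```ruby
require 'set'

# Illustration only: appends the first integer suffix that yields an unused path.
# Assumes the suffix is incremented on collision, which the guide does not state.
def renamed_path(path, taken_paths)
  counter = 0
  counter += 1 while taken_paths.include?("#{path}#{counter}")
  "#{path}#{counter}"
end

puts renamed_path('users', Set.new(%w[users]))        # users0
puts renamed_path('users', Set.new(%w[users users0])) # users1
```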