---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Migration Style Guide

When writing migrations for GitLab, you have to take into account that these are run by hundreds of thousands of organizations of all sizes, some with many years of data in their database.

In addition, having to take a server offline for an upgrade small or big is a big burden for most organizations. For this reason, it is important that your migrations are written carefully, can be applied online, and adhere to the style guide below.

Migrations are **not** allowed to require GitLab installations to be taken offline unless _absolutely necessary_.

When downtime is necessary the migration has to be approved by:

1. The VP of Engineering
1. A Backend Maintainer
1. A Database Maintainer

An up-to-date list of people holding these titles can be found at .

When writing your migrations, also consider that databases might have stale data or inconsistencies and guard for that. Try to make as few assumptions as possible about the state of the database.

Please don't depend on GitLab-specific code since it can change in future versions. If needed copy-paste GitLab code into the migration to make it forward compatible.

For GitLab.com, please take into consideration that regular migrations (under `db/migrate`) are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment), and post-deployment migrations (`db/post_migrate`) are run after the deployment to production has finished.

## Schema Changes

Changes to the schema should be committed to `db/structure.sql`. This file is automatically generated by Rails, so you normally should not edit this file by hand. If your migration is adding a column to a table, that column is added at the bottom.
Please do not reorder columns manually for existing tables as this causes confusion to other people using `db/structure.sql` generated by Rails.

When your local database in your GDK is diverging from the schema from `master` it might be hard to cleanly commit the schema changes to Git. In that case you can use the `scripts/regenerate-schema` script to regenerate a clean `db/structure.sql` for the migrations you're adding. This script applies all migrations found in `db/migrate` or `db/post_migrate`, so if there are any migrations you don't want to commit to the schema, rename or remove them. If your branch is not targeting `master` you can set the `TARGET` environment variable.

```shell
# Regenerate schema against `master`
scripts/regenerate-schema

# Regenerate schema against `12-9-stable-ee`
TARGET=12-9-stable-ee scripts/regenerate-schema
```

## What Requires Downtime?

The document ["What Requires Downtime?"](what_requires_downtime.md) specifies various database operations, such as

- [dropping and renaming columns](what_requires_downtime.md#dropping-columns)
- [changing column constraints and types](what_requires_downtime.md#changing-column-constraints)
- [adding and dropping indexes, tables, and foreign keys](what_requires_downtime.md#adding-indexes)

and whether they require downtime and how to work around that whenever possible.

## Downtime Tagging

Every migration must specify if it requires downtime or not, and if it should require downtime it must also specify a reason for this. This is required even if 99% of the migrations don't require downtime as this makes it easier to find the migrations that _do_ require downtime.

To tag a migration, add the following two constants to the migration class' body:

- `DOWNTIME`: a boolean that when set to `true` indicates the migration requires downtime.
- `DOWNTIME_REASON`: a String containing the reason for the migration requiring downtime. This constant **must** be set when `DOWNTIME` is set to `true`.

For example:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  DOWNTIME = true
  DOWNTIME_REASON = 'This migration requires downtime because ...'

  def change
    ...
  end
end
```

It is an error (that is, CI fails) if the `DOWNTIME` constant is missing from a migration class.

## Reversibility

Your migration **must be** reversible. This is very important, as it should be possible to downgrade in case of a vulnerability or bugs.

In your migration, add a comment describing how the reversibility of the migration was tested.

Some migrations cannot be reversed. For example, some data migrations can't be reversed because we lose information about the state of the database before the migration. You should still create a `down` method with a comment explaining why the changes performed by the `up` method can't be reversed, so that the migration itself can be reversed, even if the changes performed during the migration can't be:

```ruby
def down
  # no-op

  # comment explaining why changes performed by `up` cannot be reversed.
end
```

## Atomicity

By default, migrations run in a single transaction. That is, a transaction is opened at the beginning of the migration, and committed after all steps are processed.

Running migrations in a single transaction makes sure that if one of the steps fails, none of the steps are executed, leaving the database in a valid state. Therefore, either:

- Put all migrations in one single-transaction migration.
- If necessary, put most actions in one migration and create a separate migration for the steps that cannot be done in a single transaction.

For example, if you create an empty table and need to build an index for it, it is recommended to use a regular single-transaction migration and the default Rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
This is a blocking operation, but it doesn't cause problems because the table is not yet used, and therefore it does not have any records yet.

## Heavy operations in a single transaction

When using a single-transaction migration, a transaction holds a database connection for the duration of the migration, so you must make sure the actions in the migration do not take too much time: GitLab.com’s production database has a `15s` timeout, so in general, the cumulative execution time in a migration should aim to fit comfortably in that limit. Singular query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries).

In case you need to insert, update, or delete a significant amount of data, you:

- Must disable the single transaction with `disable_ddl_transaction!`.
- Should consider doing it in a [Background Migration](background_migrations.md).

## Retry mechanism when acquiring database locks

When changing the database schema, we use helper methods to invoke DDL (Data Definition Language) statements. In some cases, these DDL statements require a specific database lock.

Example:

```ruby
def change
  remove_column :users, :full_name, :string
end
```

Executing this migration requires an exclusive lock on the `users` table. When the table is concurrently accessed and modified by other processes, acquiring the lock may take a while. The lock request is waiting in a queue and it may also block other queries on the `users` table once it has been enqueued.

More information about PostgreSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)

For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql) set. When the migration is invoked, any database query has a fixed time to execute.
In a worst-case scenario, the request sits in the lock queue, blocking other queries for the duration of the configured statement timeout, and then fails with a `canceling statement due to statement timeout` error.

This problem could cause failed application upgrade processes and even application stability issues, since the table may be inaccessible for a short period of time.

To increase the reliability and stability of database migrations, the GitLab codebase offers a helper method to retry the operations with different `lock_timeout` settings and wait time between the attempts. Multiple smaller attempts to acquire the necessary lock allow the database to process other statements.

### Examples

**Removing a column:**

```ruby
include Gitlab::Database::MigrationHelpers

def up
  with_lock_retries do
    remove_column :users, :full_name
  end
end

def down
  with_lock_retries do
    add_column :users, :full_name, :string
  end
end
```

**Removing a foreign key:**

```ruby
include Gitlab::Database::MigrationHelpers

def up
  with_lock_retries do
    remove_foreign_key :issues, :projects
  end
end

def down
  with_lock_retries do
    add_foreign_key :issues, :projects
  end
end
```

**Changing default value for a column:**

```ruby
include Gitlab::Database::MigrationHelpers

def up
  with_lock_retries do
    change_column_default :merge_requests, :lock_version, from: nil, to: 0
  end
end

def down
  with_lock_retries do
    change_column_default :merge_requests, :lock_version, from: 0, to: nil
  end
end
```

**Creating a new table with a foreign key:**

We can simply wrap the `create_table` method with `with_lock_retries`:

```ruby
def up
  with_lock_retries do
    create_table :issues do |t|
      t.references :project, index: true, null: false, foreign_key: { on_delete: :cascade }
      t.string :title, limit: 255
    end
  end
end

def down
  with_lock_retries do
    drop_table :issues
  end
end
```

**Creating a new table when we have two foreign keys:**

For this, we need three migrations:

1. Creating the table without foreign keys (with the indices).
1. Add foreign key to the first table.
1. Add foreign key to the second table.

Creating the table:

```ruby
def up
  create_table :imports do |t|
    t.bigint :project_id, null: false
    t.bigint :user_id, null: false
    t.string :jid, limit: 255
  end

  add_index :imports, :project_id
  add_index :imports, :user_id
end

def down
  drop_table :imports
end
```

Adding foreign key to `projects`:

We can use the `add_concurrent_foreign_key` method in this case, as this helper method has the lock retries built into it.

```ruby
include Gitlab::Database::MigrationHelpers

disable_ddl_transaction!

def up
  add_concurrent_foreign_key :imports, :projects, column: :project_id, on_delete: :cascade
end

def down
  with_lock_retries do
    remove_foreign_key :imports, column: :project_id
  end
end
```

Adding foreign key to `users`:

```ruby
include Gitlab::Database::MigrationHelpers

disable_ddl_transaction!

def up
  add_concurrent_foreign_key :imports, :users, column: :user_id, on_delete: :cascade
end

def down
  with_lock_retries do
    remove_foreign_key :imports, column: :user_id
  end
end
```

**Usage with `disable_ddl_transaction!`**

Generally the `with_lock_retries` helper should work with `disable_ddl_transaction!`. A custom RuboCop rule ensures that only allowed methods can be placed within the lock retries block.

```ruby
disable_ddl_transaction!

def up
  with_lock_retries do
    add_column :users, :name, :text
  end

  add_text_limit :users, :name, 255 # Includes constraint validation (full table scan)
end
```

The RuboCop rule generally allows standard Rails migration methods, listed below. This example causes a RuboCop offense:

```ruby
disable_ddl_transaction!

def up
  with_lock_retries do
    add_concurrent_index :users, :name
  end
end
```

### When to use the helper method

The `with_lock_retries` helper method can be used when you normally use standard Rails migration helper methods. Calling more than one migration helper is not a problem if they're executed on the same table.
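
As background for deciding when the helper is worthwhile, the retry behaviour it relies on can be pictured with a plain-Ruby sketch. This is not the GitLab implementation and involves no database: `with_lock_retries_sketch`, the locally defined `LockWaitTimeout` class, the `timings` pairs, and the injectable `sleeper` are all hypothetical stand-ins chosen for illustration.

```ruby
# Hypothetical stand-in for the error raised when a lock is not acquired in time.
class LockWaitTimeout < StandardError; end

# Simplified sketch, not the GitLab implementation.
# `timings` is a list of [lock_timeout, sleep_time] pairs (assumed values).
# The block receives the lock timeout for the attempt, or nil on the final
# attempt, which runs without a lock timeout.
def with_lock_retries_sketch(timings:, sleeper: ->(seconds) { sleep(seconds) })
  timings.each do |lock_timeout, sleep_time|
    begin
      return yield(lock_timeout)
    rescue LockWaitTimeout
      # Give other statements a chance to run before trying again.
      sleeper.call(sleep_time)
    end
  end

  # Every timed attempt failed: run once more without a lock timeout.
  yield(nil)
end

# Usage: the first two attempts fail, the third succeeds.
attempts = 0
result = with_lock_retries_sketch(timings: [[0.1, 0.01], [0.2, 0.01], [0.4, 0.01]]) do |lock_timeout|
  attempts += 1
  raise LockWaitTimeout if attempts < 3

  "acquired with lock_timeout=#{lock_timeout}"
end

puts result
```

Each failed attempt gives up quickly and sleeps instead of sitting in the lock queue, which is what lets other statements on the table make progress between attempts.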

Using the `with_lock_retries` helper method is advised when a database migration involves one of the [high-traffic tables](#high-traffic-tables).

Example changes:

- `add_foreign_key` / `remove_foreign_key`
- `add_column` / `remove_column`
- `change_column_default`
- `create_table` / `drop_table`

The `with_lock_retries` method **cannot** be used within the `change` method. You must manually define the `up` and `down` methods to make the migration reversible.

### How the helper method works

1. Iterate 50 times.
1. For each iteration, set a pre-configured `lock_timeout`.
1. Try to execute the given block (for example, `remove_column`).
1. If `LockWaitTimeout` error is raised, sleep for the pre-configured `sleep_time` and retry the block.
1. If no error is raised, the current iteration has successfully executed the block.

For more information check the [`Gitlab::Database::WithLockRetries`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/with_lock_retries.rb) class. The `with_lock_retries` helper method is implemented in the [`Gitlab::Database::MigrationHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb) module.

In a worst-case scenario, the method:

- Executes the block for a maximum of 50 times over 40 minutes.
  - Most of the time is spent in a pre-configured sleep period after each iteration.
- After the 50th retry, the block is executed without `lock_timeout`, just like a standard migration invocation.
- If a lock cannot be acquired, the migration fails with `statement timeout` error.

The migration might fail if there is a very long running transaction (40+ minutes) accessing the `users` table.

## Multi-Threading

Sometimes a migration might need to use multiple Ruby threads to speed up a migration.
For this to work your migration needs to include the module `Gitlab::Database::MultiThreadedMigration`:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers
  include Gitlab::Database::MultiThreadedMigration
end
```

You can then use the method `with_multiple_threads` to perform work in separate threads. For example:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers
  include Gitlab::Database::MultiThreadedMigration

  def up
    with_multiple_threads(4) do
      disable_statement_timeout

      # ...
    end
  end
end
```

Here the call to `disable_statement_timeout` uses the connection local to the `with_multiple_threads` block, instead of re-using the global connection pool. This ensures each thread has its own connection object, and doesn't time out when trying to obtain one.

PostgreSQL has a maximum amount of connections that it allows. This limit can vary from installation to installation. As a result, it's recommended you do not use more than 32 threads in a single migration. Usually, 4-8 threads should be more than enough.

## Removing indexes

If the table is not empty when removing an index, make sure to use the method `remove_concurrent_index` instead of the regular `remove_index` method. The `remove_concurrent_index` method drops indexes concurrently, so no locking is required, and there is no need for downtime. To use this method, you must disable single-transaction mode by calling the method `disable_ddl_transaction!` in the body of your migration class like so:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  disable_ddl_transaction!

  INDEX_NAME = 'index_name'

  def up
    remove_concurrent_index :table_name, :column_name, name: INDEX_NAME
  end
end
```

It is not necessary to check if the index exists prior to removing it. However, it is required to specify the name of the index that is being removed.
This can be done either by passing the name as an option to the appropriate form of `remove_index` or `remove_concurrent_index`, or more simply by using the `remove_concurrent_index_by_name` method. Explicitly specifying the name is important to ensure the correct index is removed.

For a small table (such as an empty one or one with less than `1,000` records), it is recommended to use `remove_index` in a single-transaction migration, combining it with other operations that don't require `disable_ddl_transaction!`.

### Disabling an index

There are certain situations in which you might want to disable an index before removing it. See the [maintenance operations guide](database/maintenance_operations.md#disabling-an-index) for more details.

## Adding indexes

Before adding an index, consider if this one is necessary. There are situations in which an index might not be required, like:

- The table is small (less than `1,000` records) and it's not expected to exponentially grow in size.
- Any existing indexes filter out enough rows.
- The reduction in query timings after the index is added is not significant.

Additionally, wide indexes are not required to match all filter criteria of queries. We just need to cover enough columns so that the index lookup has a small enough selectivity. Please review our [Adding Database indexes](adding_database_indexes.md) guide for more details.

When adding an index to a non-empty table make sure to use the method `add_concurrent_index` instead of the regular `add_index` method. The `add_concurrent_index` method automatically creates concurrent indexes when using PostgreSQL, removing the need for downtime.

To use this method, you must disable single-transaction mode by calling the method `disable_ddl_transaction!` in the body of your migration class like so:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false

  disable_ddl_transaction!

  INDEX_NAME = 'index_name'

  def up
    add_concurrent_index :table, :column, name: INDEX_NAME
  end

  def down
    remove_concurrent_index :table, :column, name: INDEX_NAME
  end
end
```

You must explicitly name indexes that are created with more complex definitions beyond table name, column name(s) and uniqueness constraint. Consult the [Adding Database Indexes](adding_database_indexes.md#requirements-for-naming-indexes) guide for more details.

If you need to add a unique index, please keep in mind there is the possibility of existing duplicates being present in the database. This means that you should always _first_ add a migration that removes any duplicates, before adding the unique index.

For a small table (such as an empty one or one with less than `1,000` records), it is recommended to use `add_index` in a single-transaction migration, combining it with other operations that don't require `disable_ddl_transaction!`.

## Testing for existence of indexes

If a migration requires conditional logic based on the absence or presence of an index, you must test for existence of that index using its name. This helps avoid problems with how Rails compares index definitions, which can lead to unexpected results. For more details, review the [Adding Database Indexes](adding_database_indexes.md#why-explicit-names-are-required) guide.

The easiest way to test for existence of an index by name is to use the `index_name_exists?` method, but the `index_exists?` method can also be used with a name option.

For example:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  INDEX_NAME = 'index_name'

  def up
    # an index must be conditionally created due to schema inconsistency
    unless index_exists?(:table_name, :column_name, name: INDEX_NAME)
      add_index :table_name, :column_name, name: INDEX_NAME
    end
  end

  def down
    # no op
  end
end
```

Keep in mind that concurrent index helpers like `add_concurrent_index`, `remove_concurrent_index`, and `remove_concurrent_index_by_name` already perform existence checks internally.

## Adding foreign-key constraints

When adding a foreign-key constraint to either an existing or a new column also remember to add an index on the column.

This is **required** for all foreign-keys, e.g., to support efficient cascading deleting: when a lot of rows in a table get deleted, the referenced records need to be deleted too. The database has to look for corresponding records in the referenced table. Without an index, this results in a sequential scan on the table, which can take a long time.

Here's an example where we add a new column with a foreign key constraint. Note it includes `index: true` to create an index for it.

```ruby
class Migration < ActiveRecord::Migration[6.0]
  def change
    add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
  end
end
```

When adding a foreign-key constraint to an existing column in a non-empty table, we have to employ `add_concurrent_foreign_key` and `add_concurrent_index` instead of `add_reference`.

If you have a new or empty table that doesn't reference a [high-traffic table](#high-traffic-tables), we recommend that you use `add_reference` in a single-transaction migration. You can combine it with other operations that don't require `disable_ddl_transaction!`.

You can read more about adding [foreign key constraints to an existing column](database/add_foreign_key_to_existing_column.md).

## `NOT NULL` constraints

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.

See the style guide on [`NOT NULL` constraints](database/not_null_constraints.md) for more information.

## Adding Columns With Default Values

With PostgreSQL 11 being the minimum version in GitLab 13.0 and later, adding columns with default values has become much easier and the standard `add_column` helper should be used in all cases.

Before PostgreSQL 11, adding a column with a default was problematic as it would have caused a full table rewrite. The corresponding helper `add_column_with_default` has been deprecated and is scheduled to be removed in a later release.

If a backport adding a column with a default value is needed for %12.9 or earlier versions, it should use the `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3) is involved, backporting to %12.9 is contraindicated.

## Changing the column default

One might think that changing a column default with `change_column_default` is an expensive and disruptive operation for larger tables, but in reality it's not.

Take the following migration as an example:

```ruby
class DefaultRequestAccessGroups < ActiveRecord::Migration[5.2]
  DOWNTIME = false

  def change
    change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
  end
end
```

The migration above changes the default column value of one of our largest tables: `namespaces`. This can be translated to:

```sql
ALTER TABLE namespaces
ALTER COLUMN request_access_enabled
SET DEFAULT true
```

In this particular case, the default value exists and we're just changing the metadata for the `request_access_enabled` column, which does not imply a rewrite of all the existing records in the `namespaces` table. Only when creating a new column with a default are all the records going to be rewritten.

NOTE:
A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/) was introduced in PostgreSQL 11.0, removing the need to rewrite the table when a new column with a default value is added.

For the reasons mentioned above, it's safe to use `change_column_default` in a single-transaction migration without requiring `disable_ddl_transaction!`.

## Updating an existing column

To update an existing column to a particular value, you can use `update_column_in_batches`. This splits the updates into batches, so we don't update too many rows in a single statement.

This updates the column `foo` in the `projects` table to 10, where `some_column` is `'hello'`:

```ruby
update_column_in_batches(:projects, :foo, 10) do |table, query|
  query.where(table[:some_column].eq('hello'))
end
```

If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel treats it as an SQL literal. Wrapping the value is also required to address a deprecation in [Rails 6](https://gitlab.com/gitlab-org/gitlab/-/issues/28497).

The below example is the same as the one above, but the value is set to the product of the `bar` and `baz` columns:

```ruby
update_value = Arel.sql('bar * baz')

update_column_in_batches(:projects, :foo, update_value) do |table, query|
  query.where(table[:some_column].eq('hello'))
end
```

Like `add_column_with_default`, there is a RuboCop cop to detect usage of this on large tables. In the case of `update_column_in_batches`, it may be acceptable to run on a large table, as long as it is only updating a small subset of the rows in the table. Do not ignore the cop without first validating on the GitLab.com staging environment, or asking someone else to do so for you.

## Dropping a database table

Dropping a database table is uncommon, and the `drop_table` method provided by Rails is generally considered safe.
Before dropping the table, please consider the following:

If your table has foreign keys on a [high-traffic table](#high-traffic-tables) (like `projects`), then the `DROP TABLE` statement is likely to stall concurrent traffic until it fails with **statement timeout** error.

Table **has no records** (feature was never in use) and **no foreign keys**:

- Simply use the `drop_table` method in your migration.

```ruby
def change
  drop_table :my_table
end
```

Table **has records** but **no foreign keys**:

- First release: Remove the application code related to the table, such as models, controllers and services.
- Second release: Use the `drop_table` method in your migration.

```ruby
def up
  drop_table :my_table
end

def down
  # create_table ...
end
```

Table **has foreign keys**:

- First release: Remove the application code related to the table, such as models, controllers, and services.
- Second release: Remove the foreign keys using the `with_lock_retries` helper method. Use `drop_table` in another migration file.

**Migrations for the second release:**

Removing the foreign key on the `projects` table:

```ruby
# first migration file

def up
  with_lock_retries do
    remove_foreign_key :my_table, :projects
  end
end

def down
  with_lock_retries do
    add_foreign_key :my_table, :projects
  end
end
```

Dropping the table:

```ruby
# second migration file

def up
  drop_table :my_table
end

def down
  # create_table ...
end
```

## Integer column type

By default, an integer column can hold up to a 4-byte (32-bit) number. That is a max value of 2,147,483,647. Be aware of this when creating a column that holds file sizes in byte units. If you are tracking file size in bytes, this restricts the maximum file size to just over 2GB.

To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly set the limit to 8-bytes. This allows the column to hold a value up to `9,223,372,036,854,775,807`.
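
The two maximum values quoted above follow from the width of a signed integer: one bit is reserved for the sign, so an `n`-byte column tops out at `2^(8n - 1) - 1`. The plain-Ruby sketch below only reproduces that arithmetic; `max_signed_integer` is a hypothetical helper written for this illustration, not part of Rails or GitLab.

```ruby
# Largest value a signed integer of the given byte width can hold.
# Hypothetical helper for illustration only.
def max_signed_integer(bytes)
  2**(bytes * 8 - 1) - 1
end

max_4_byte = max_signed_integer(4)
max_8_byte = max_signed_integer(8)

# A 3 GiB file size, expressed in bytes.
file_size = 3 * 1024**3

puts "4-byte max: #{max_4_byte}"
puts "8-byte max: #{max_8_byte}"
puts "3 GiB fits in 4 bytes? #{file_size <= max_4_byte}"
puts "3 GiB fits in 8 bytes? #{file_size <= max_8_byte}"
```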

Rails migration example:

```ruby
add_column(:projects, :foo, :integer, default: 10, limit: 8)
```

## Strings and the Text data type

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.

See the [text data type](database/strings_and_the_text_data_type.md) style guide for more information.

## Timestamp column type

By default, Rails uses the `timestamp` data type that stores timestamp data without timezone information. The `timestamp` data type is used by calling either the `add_timestamps` or the `timestamps` method.

Also, Rails converts the `:datetime` data type to the `timestamp` one.

Example:

```ruby
# timestamps
create_table :users do |t|
  t.timestamps
end

# add_timestamps
def up
  add_timestamps :users
end

# :datetime
def up
  add_column :users, :last_sign_in, :datetime
end
```

Instead of using these methods, one should use the following methods to store timestamps with timezones:

- `add_timestamps_with_timezone`
- `timestamps_with_timezone`
- `datetime_with_timezone`

This ensures all timestamps have a time zone specified. This, in turn, means existing timestamps don't suddenly use a different timezone when the system's timezone changes. It also makes it very clear which timezone was used in the first place.

## Storing JSON in database

Rails 5 natively supports the `JSONB` (binary JSON) column type. Example migration adding this column:

```ruby
class AddOptionsToBuildMetadata < ActiveRecord::Migration[5.0]
  DOWNTIME = false

  def change
    add_column :ci_builds_metadata, :config_options, :jsonb
  end
end
```

You have to use a serializer to provide a translation layer:

```ruby
class BuildMetadata
  serialize :config_options, Serializers::JSON # rubocop:disable Cop/ActiveRecordSerialize
end
```

When using a `JSONB` column, use the [JsonSchemaValidator](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/validators/json_schema_validator.rb) to keep control of the data being inserted over time.

```ruby
class BuildMetadata
  validates :config_options, json_schema: { filename: 'build_metadata_config_option' }
end
```

## Testing

See the [Testing Rails migrations](testing_guide/testing_migrations_guide.md) style guide.

## Data migration

Please prefer Arel and plain SQL over usual ActiveRecord syntax. When using plain SQL, you must quote all input manually with the `quote_string` helper.

Example with Arel:

```ruby
users = Arel::Table.new(:users)
users.group(users[:user_id]).having(users[:id].count.gt(5))

# update other tables with these results
```

Example with plain SQL and `quote_string` helper:

```ruby
select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(id) > 1").each do |tag|
  tag_name = quote_string(tag["name"])
  duplicate_ids = select_all("SELECT id FROM tags WHERE name = '#{tag_name}'").map { |row| row["id"] }
  origin_tag_id = duplicate_ids.first
  duplicate_ids.delete origin_tag_id
  execute("UPDATE taggings SET tag_id = #{origin_tag_id} WHERE tag_id IN(#{duplicate_ids.join(",")})")
  execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
end
```

If you need more complex logic, you can define and use models local to a migration. For example:

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  class Project < ActiveRecord::Base
    self.table_name = 'projects'
  end

  def up
    # Reset the column information of all the models that update the database
    # to ensure the Active Record's knowledge of the table structure is current
    Project.reset_column_information

    # ... ...
  end
end
```

When doing so be sure to explicitly set the model's table name, so it's not derived from the class name or namespace.

Be aware of the limitations [when using models in migrations](#using-models-in-migrations-discouraged).

### Renaming reserved paths

When a new route for projects is introduced, it could conflict with any existing records. The path for these records should be renamed, and the related data should be moved on disk.
Since we had to do this a few times already, there are now some helpers to help with this. To use this you can include `Gitlab::Database::RenameReservedPathsMigration::V1` in your migration. This provides 3 methods which you can pass one or more paths that need to be rejected. - **`rename_root_paths`**: Renames the path of all _namespaces_ with the given name that don't have a `parent_id`. - **`rename_child_paths`**: Renames the path of all _namespaces_ with the given name that have a `parent_id`. - **`rename_wildcard_paths`**: Renames the path of all _projects_, and all _namespaces_ that have a `project_id`. The `path` column for these rows are renamed to their previous value followed by an integer. For example: `users` would turn into `users0` ## Using models in migrations (discouraged) The use of models in migrations is generally discouraged. As such models are [contraindicated for background migrations](background_migrations.md#isolation), the model needs to be declared in the migration. If using a model in the migrations, you should first [clear the column cache](https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-reset_column_information) using `reset_column_information`. This avoids problems where a column that you are using was altered and cached in a previous migration. ### Example: Add a column `my_column` to the users table It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information. ```ruby class AddAndSeedMyColumn < ActiveRecord::Migration[6.0] class User < ActiveRecord::Base self.table_name = 'users' end def up User.count # Any ActiveRecord calls on the model that caches the column information. add_column :users, :my_column, :integer, default: 1 User.reset_column_information # The old schema is dropped from the cache. 
    User.find_each do |user|
      user.my_column = 42 if some_condition # ActiveRecord sees the correct schema here.
      user.save!
    end
  end
end
```

The underlying table is modified and then accessed via ActiveRecord.

Note that `reset_column_information` is also needed if the table was modified in a previous, different migration, and both migrations run in the same `db:migrate` process.

This results in the following. Note the inclusion of `my_column`:

```shell
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "users"
-- add_column(:users, :my_column, :integer, {:default=>1})
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
   -> 0.0016s
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
D, [2020-07-18T00:41:26.851769 #459802] DEBUG -- : AddAndSeedMyColumn::User Update (1.1ms) UPDATE "users" SET "my_column" = $1, "updated_at" = $2 WHERE "users"."id" = $3 [["my_column", 42], ["updated_at", "2020-07-17 23:41:26.849044"], ["id", 1]]
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
```

If you skip clearing the schema cache (`User.reset_column_information`), ActiveRecord does not use the new column and the intended changes are not made. This leads to the result below, where `my_column` is missing from the `UPDATE` query.

```shell
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "users"
-- add_column(:users, :my_column, :integer, {:default=>1})
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
   -> 0.0016s
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
D, [2020-07-06T00:37:12.653459 #130101] DEBUG -- : AddAndSeedMyColumn::User Update (0.5ms) UPDATE "users" SET "updated_at" = $1 WHERE "users"."id" = $2 [["updated_at", "2020-07-05 23:37:12.652297"], ["id", 1]]
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
```

## High traffic tables

Here's a list of current [high-traffic tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml).

Determining which tables are high-traffic can be difficult. Self-managed instances might use different features of GitLab with different usage patterns, so assumptions based on GitLab.com alone are not sufficient.

To identify a high-traffic table for GitLab.com, the following measures are considered.
Note that the metrics linked here are GitLab-internal only:

- [Read operations](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20sum%20by%20(relname)%20(rate(pg_stat_user_tables_seq_tup_read%7Benvironment%3D%22gprd%22%7D%5B12h%5D)%20%2B%20rate(pg_stat_user_tables_idx_scan%7Benvironment%3D%22gprd%22%7D%5B12h%5D)%20%2B%20rate(pg_stat_user_tables_idx_tup_fetch%7Benvironment%3D%22gprd%22%7D%5B12h%5D)))&g0.tab=1)
- [Number of records](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20sum%20by%20(relname)%20(rate(pg_stat_user_tables_n_live_tup%7Benvironment%3D%22gprd%22%7D%5B12h%5D)))&g0.tab=1)
- [Size](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20sum%20by%20(relname)%20(rate(pg_total_relation_size_bytes%7Benvironment%3D%22gprd%22%7D%5B12h%5D)))&g0.tab=1) is greater than 10 GB

Any table with a high number of read operations compared to the current [high-traffic tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L4) might be a good candidate.
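
The measures above can be sketched in plain Ruby to show how they might be combined. This is only an illustration, not how GitLab selects the tables: `TableStats`, `high_traffic_candidates`, the `read_threshold` parameter, and the sample table names are all hypothetical. The sketch uses raw counters where the linked queries use rates, and it reads 10 GB as 10 GiB.

```ruby
# Hypothetical container for per-table statistics. The field names mirror the
# PostgreSQL statistics referenced by the linked read-operation and size queries.
TableStats = Struct.new(
  :relname, :seq_tup_read, :idx_scan, :idx_tup_fetch, :total_size_bytes,
  keyword_init: true
) do
  # Same sum as the "Read operations" query, applied to raw counters here.
  def read_operations
    seq_tup_read + idx_scan + idx_tup_fetch
  end
end

# Assumption: "10 GB" is interpreted as 10 GiB for this sketch.
TEN_GB = 10 * 1024**3

# Returns the names of tables that either reach the (hypothetical) read
# threshold or exceed the size threshold, busiest tables first.
def high_traffic_candidates(stats, read_threshold:, size_threshold: TEN_GB)
  stats
    .select { |t| t.read_operations >= read_threshold || t.total_size_bytes > size_threshold }
    .sort_by { |t| -t.read_operations }
    .map(&:relname)
end

# Usage with made-up numbers:
sample = [
  TableStats.new(relname: "table_a", seq_tup_read: 100, idx_scan: 900, idx_tup_fetch: 4000, total_size_bytes: 1 * 1024**3),
  TableStats.new(relname: "table_b", seq_tup_read: 1, idx_scan: 2, idx_tup_fetch: 3, total_size_bytes: 20 * 1024**3),
  TableStats.new(relname: "table_c", seq_tup_read: 1, idx_scan: 1, idx_tup_fetch: 1, total_size_bytes: 1024)
]

candidates = high_traffic_candidates(sample, read_threshold: 1000)
# candidates => ["table_a", "table_b"]
```

In this sample, `table_a` qualifies on read operations and `table_b` on size, while `table_c` meets neither measure. Any real threshold would need to be chosen by comparison with the tables already on the list.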