Merge branch 'database-guides' into 'master'

Guide about what requires downtime

## What does this MR do?

This MR adds a guide describing various SQL operations and whether they need downtime or not.

## Are there points in the code the reviewer needs to double check?

Spelling and styling of the text mostly.

## Why was this MR needed?

Developers aren't always aware of the impact of certain operations and the documentation of MySQL and PostgreSQL can be quite confusing at times.

## Screenshots (if relevant)

![screenshot](/uploads/d8afd4bd3755d26e4786dfafecfa9368/screenshot.png)

<img src="https://emoji.slack-edge.com/T02592416/trollface/8c0ac4ae98.png" width="60" />

## Does this MR meet the acceptance criteria?

- [x] ~~[CHANGELOG](https://gitlab.com/gitlab-org/gitlab-ce/blob/master/CHANGELOG) entry added~~
- [x] [Documentation created/updated](https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/development/doc_styleguide.md)
- Tests
  - [ ] All builds are passing
- [ ] Conform by the [style guides](https://gitlab.com/gitlab-org/gitlab-ce/blob/master/CONTRIBUTING.md#style-guides)
- [ ] Branch has no merge conflicts with `master` (if you do - rebase it please)
- [x] [Squashed related commits together](https://git-scm.com/book/en/Git-Tools-Rewriting-History#Squashing-Commits)

See merge request !5672
2016-08-05 12:49:48 +00:00 · 2016-08-05 12:49:48 +00:00 · eee958616e
commit eee958616e
parent 7b4279984c c462dcec4d
2 changed files with 154 additions and 0 deletions
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -31,6 +31,7 @@
 - [Rake tasks](rake_tasks.md) for development
 - [Shell commands](shell_commands.md) in the GitLab codebase
 - [Sidekiq debugging](sidekiq_debugging.md)
 - [What requires downtime?](what_requires_downtime.md)
 ## Compliance
--- a/doc/development/what_requires_downtime.md
+++ b/doc/development/what_requires_downtime.md
@@ -0,0 +1,153 @@
 # What requires downtime?
 When working with a database, certain operations can be performed without taking
 GitLab offline, while others require a downtime period. This guide describes
 various operations and their impact.
 ## Adding Columns
 On PostgreSQL you can safely add a new column to an existing table as long as it
 does **not** have a default value. For example, this query would not require
 downtime:
 ```sql
 ALTER TABLE projects ADD COLUMN random_value int;
 ```
 Adding a column _with_ a default, however, does require downtime. For example,
 consider this query:
 ```sql
 ALTER TABLE projects ADD COLUMN random_value int DEFAULT 42;
 ```
 This requires updating every row in the `projects` table, along with its
 indexes, so that `random_value` is set to `42`. The update in turn acquires
 enough locks on the table to effectively block any other queries.
 As of MySQL 5.6, adding a column to a table is still quite an expensive
 operation, even when using `ALGORITHM=INPLACE` and `LOCK=NONE`. This means
 downtime _may_ be required when modifying large tables as otherwise the
 operation could potentially take hours to complete.
 ## Dropping Columns
 On PostgreSQL you can safely remove an existing column without the need for
 downtime. When you drop a column in PostgreSQL it's not physically removed right
 away; instead it is simply made invisible to SQL operations. The space it
 occupied is reclaimed over time as existing rows are updated.
 On MySQL this operation requires downtime.
 While dropping a column may be fine on PostgreSQL from the database's point of
 view, this operation still requires downtime because the application code may
 still be using the column that was removed. For example, consider the following
 migration:
 ```ruby
 class MyMigration < ActiveRecord::Migration
   def change
     remove_column :projects, :dummy
   end
 end
 ```
 Now imagine that the GitLab instance is running and actively uses the `dummy`
 column. If we were to run the migration this would result in the GitLab instance
 producing errors whenever it tries to use the `dummy` column.
 As a result of the above, downtime _is_ required when removing a column, even
 when using PostgreSQL.
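The failure mode can be sketched in plain Ruby, without a database. This is only an illustration: `FakeTable` and its methods are hypothetical stand-ins, not ActiveRecord or GitLab code. It shows application code that still reads `dummy` failing as soon as the column is gone.

```ruby
# Hypothetical in-memory stand-in for a database table (not ActiveRecord).
class FakeTable
  def initialize(columns)
    @columns = columns
    @rows = []
  end

  # Stores only the values that belong to known columns.
  def insert(row)
    @rows << row.select { |name, _| @columns.include?(name) }
  end

  # Mimics `remove_column`: the column disappears from the schema and rows.
  def remove_column(name)
    @columns.delete(name)
    @rows.each { |row| row.delete(name) }
  end

  # Mimics a query that names a column; unknown columns raise, much like a
  # database rejects a query that references a dropped column.
  def pluck(name)
    raise KeyError, "unknown column: #{name}" unless @columns.include?(name)

    @rows.map { |row| row[name] }
  end
end

projects = FakeTable.new([:id, :dummy])
projects.insert(id: 1, dummy: 'hello')

# Application code deployed before the migration still reads `dummy`.
read_dummy = -> { projects.pluck(:dummy) }

read_dummy.call                 # works before the migration
projects.remove_column(:dummy)  # the migration runs while the app is live

begin
  read_dummy.call
rescue KeyError => error
  puts "application error: #{error.message}"
end
```

Until every running process has been updated to stop referencing the column, each such read fails in this way.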
 ## Changing Column Constraints
 Generally changing column constraints requires checking all rows in the table to
 see if they meet the new constraint, unless a constraint is _removed_. For
 example, changing a column that previously allowed NULL values to not allow NULL
 values requires the database to verify all existing rows.
 The specific behaviour varies a bit between databases but in general the safest
 approach is to assume changing constraints requires downtime.
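As a rough illustration of why adding a constraint is expensive, the check for a new NOT NULL constraint can be modelled in plain Ruby. The `not_null_violations` helper and the sample rows are hypothetical; a real database performs this verification internally while holding locks on the table.

```ruby
# Hypothetical sketch: what a database conceptually has to do before a
# column can be marked NOT NULL. Rows are plain hashes, not a real table.
def not_null_violations(rows, column)
  # Every existing row must be inspected, so the work grows with table size.
  rows.select { |row| row[column].nil? }
end

rows = [
  { id: 1, name: 'gitlab-ce' },
  { id: 2, name: nil },
  { id: 3, name: 'gitlab-ee' }
]

violations = not_null_violations(rows, :name)

if violations.empty?
  puts 'constraint can be added'
else
  puts "constraint rejected by rows: #{violations.map { |row| row[:id] }.inspect}"
end
```

Removing a constraint needs no such scan, which is why it is the exception mentioned above.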
 ## Changing Column Types
 This operation requires downtime.
 ## Adding Indexes
 Adding indexes is an expensive process that blocks INSERT, UPDATE, and DELETE
 queries for the duration. When using PostgreSQL one can work around this by
 using the `CONCURRENTLY` option:
 ```sql
 CREATE INDEX CONCURRENTLY index_name ON projects (column_name);
 ```
 Migrations can take advantage of this by using the method
 `add_concurrent_index`. For example:
 ```ruby
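 class MyMigration < ActiveRecord::Migration
   # Provides `add_concurrent_index`.
   include Gitlab::Database::MigrationHelpers
   # CREATE INDEX CONCURRENTLY cannot run inside a transaction.
   disable_ddl_transaction!
   def change
     add_concurrent_index :projects, :column_name
   end
 end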
 ```
 When running this on PostgreSQL the `CONCURRENTLY` option mentioned above is
 used. On MySQL this method produces a regular `CREATE INDEX` query.
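The adapter-dependent behaviour described above can be sketched in plain Ruby. The `create_index_sql` helper, its arguments, and the generated index name are hypothetical and only build a SQL string; this is not how `add_concurrent_index` is actually implemented.

```ruby
# Hypothetical helper: builds the CREATE INDEX statement for a given adapter.
# Illustrative only; not the implementation of `add_concurrent_index`.
def create_index_sql(adapter, table, column)
  index_name = "index_#{table}_on_#{column}"
  # Only PostgreSQL gets the CONCURRENTLY option; MySQL gets a regular query.
  keyword = adapter == :postgresql ? 'CREATE INDEX CONCURRENTLY' : 'CREATE INDEX'

  "#{keyword} #{index_name} ON #{table} (#{column});"
end

puts create_index_sql(:postgresql, :projects, :column_name)
puts create_index_sql(:mysql, :projects, :column_name)
```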
 MySQL doesn't really have a workaround for this. Supposedly it _can_ create
 indexes without the need for downtime but only for variable width columns. The
 details on this are a bit sketchy. Since it's better to be safe than sorry one
 should assume that adding indexes requires downtime on MySQL.
 ## Dropping Indexes
 Dropping an index does not require downtime on either PostgreSQL or MySQL.
 ## Adding Tables
 This operation is safe as there's no code using the table just yet.
 ## Dropping Tables
 This operation requires downtime as application code may still be using the
 table.
 ## Adding Foreign Keys
 Adding foreign keys acquires an exclusive lock on both the source and target
 tables in PostgreSQL. This requires downtime as otherwise the entire application
 grinds to a halt for the duration of the operation.
 On MySQL this operation also requires downtime _unless_ foreign key checks are
 disabled. Disabling them means the checks aren't enforced, which is not ideal.
 As such one should assume MySQL also requires downtime.
 ## Removing Foreign Keys
 This operation should not require downtime on either PostgreSQL or MySQL.
 ## Updating Data
 Updating data should generally be safe. The exception to this is data that's
 being migrated from one version to another while the application still produces
 data in the old version.
 For example, imagine the application writes the string `'dog'` to a column but
 it really is meant to write `'cat'` instead. One might think that the following
 migration is all that is needed to solve this problem:
 ```ruby
 class MyMigration < ActiveRecord::Migration
   def up
     execute("UPDATE some_table SET column = 'cat' WHERE column = 'dog';")
   end
 end
 ```
 Unfortunately this is not enough. Because the application is still running and
 using the old value, the table may still contain rows where `column` is set to
 `dog`, even after the migration has finished.
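The race can be modelled in a few lines of plain Ruby. This is a sketch under simplifying assumptions: the table is an in-memory array holding only the values of `column`, and `run_migration` and `old_application_write` are hypothetical names standing in for the migration and the still-running old application code.

```ruby
# The data migration: rewrite every existing 'dog' to 'cat'.
def run_migration(rows)
  rows.map! { |value| value == 'dog' ? 'cat' : value }
end

# The old application code, still running, keeps writing the old value.
def old_application_write(rows)
  rows << 'dog'
end

# Hypothetical in-memory model of `some_table`.
table = %w[dog dog cat]

old_application_write(table)  # written before the migration: gets fixed
run_migration(table)
old_application_write(table)  # written after the migration: slips through

puts table.inspect
```

The final write lands after the migration has already run, so nothing ever converts it.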
 In these cases downtime _is_ required, even for rarely updated tables.