**DO NOT READ THIS FILE ON GITHUB, GUIDES ARE PUBLISHED ON https://guides.rubyonrails.org.**

Multiple Databases with Active Record
=====================================

This guide covers using multiple databases with your Rails application.

After reading this guide you will know:

* How to set up your application for multiple databases.
* How automatic connection switching works.
* How to use horizontal sharding for multiple databases.
* How to migrate from `legacy_connection_handling` to the new connection handling.
* What features are supported and what's still a work in progress.

--------------------------------------------------------------------------------

As an application grows in popularity and usage you'll need to scale the application
to support your new users and their data. One way in which your application may need
to scale is on the database level. Rails now has support for multiple databases
so you don't have to store your data all in one place.

At this time the following features are supported:

* Multiple writer databases and a replica for each
* Automatic connection switching for the model you're working with
* Automatic swapping between the writer and replica depending on the HTTP verb
  and recent writes
* Rails tasks for creating, dropping, migrating, and interacting with the multiple
  databases

The following features are not (yet) supported:

* Automatic swapping for horizontal sharding
* Load balancing replicas
* Dumping schema caches for multiple databases

## Setting up your application

While Rails tries to do most of the work for you there are still some steps you'll
need to do to get your application ready for multiple databases.

Let's say we have an application with a single writer database and we need to add a
new database for some new tables we're adding. The name of the new database will be
"animals".

The `database.yml` looks like this:

```yaml
production:
  database: my_primary_database
  adapter: mysql2
  username: root
  password: <%= ENV['ROOT_PASSWORD'] %>
```

Let's add a replica for the first configuration, and a second database called animals and a
replica for that as well. To do this we need to change our `database.yml` from a 2-tier
to a 3-tier config.

If a primary configuration is provided, it will be used as the "default" configuration. If
there is no configuration named `"primary"`, Rails will use the first configuration as default
for each environment. The default configurations will use the default Rails filenames. For example,
primary configurations will use `schema.rb` for the schema file, whereas all the other entries
will use `[CONFIGURATION_NAMESPACE]_schema.rb` for the filename.

```yaml
production:
  primary:
    database: my_primary_database
    username: root
    password: <%= ENV['ROOT_PASSWORD'] %>
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    username: root_readonly
    password: <%= ENV['ROOT_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
  animals:
    database: my_animals_database
    username: animals_root
    password: <%= ENV['ANIMALS_ROOT_PASSWORD'] %>
    adapter: mysql2
    migrations_paths: db/animals_migrate
  animals_replica:
    database: my_animals_database
    username: animals_readonly
    password: <%= ENV['ANIMALS_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
```

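
To make the shape of a 3-tier config more concrete, here is a small plain-Ruby sketch that reads a trimmed copy of the config above and works out the default entry, the writers, the replicas, and the schema filename for each writer. It only uses Ruby's standard `yaml` library. The helper names (`default_config_name`, `schema_file_for`, `writer_names`, `replica_names`) are made up for this illustration and are not Rails APIs, and the logic is a simplified reading of the rules described above rather than what Rails does internally.

```ruby
require "yaml"

# A trimmed-down three-tier config with the same shape as the example above.
# The ERB password lines are left out so the text parses as plain YAML.
CONFIG = YAML.safe_load(<<~CONFIG_YAML)
  production:
    primary:
      database: my_primary_database
      adapter: mysql2
    primary_replica:
      database: my_primary_database
      adapter: mysql2
      replica: true
    animals:
      database: my_animals_database
      adapter: mysql2
      migrations_paths: db/animals_migrate
    animals_replica:
      database: my_animals_database
      adapter: mysql2
      replica: true
CONFIG_YAML

# Hypothetical helper: the default entry is the one named "primary",
# or else the first entry listed for the environment.
def default_config_name(env_configs)
  env_configs.key?("primary") ? "primary" : env_configs.keys.first
end

# Hypothetical helper: the default entry uses schema.rb, every other
# entry uses a schema file prefixed with its configuration name.
def schema_file_for(name, env_configs)
  name == default_config_name(env_configs) ? "schema.rb" : "#{name}_schema.rb"
end

# Entries without `replica: true` are treated as writers.
def writer_names(env_configs)
  env_configs.reject { |_name, settings| settings["replica"] }.keys
end

# Entries flagged with `replica: true` are treated as replicas.
def replica_names(env_configs)
  env_configs.select { |_name, settings| settings["replica"] }.keys
end

production = CONFIG.fetch("production")
puts "default:  #{default_config_name(production)}"
puts "writers:  #{writer_names(production).join(', ')}"
puts "replicas: #{replica_names(production).join(', ')}"
writer_names(production).each do |name|
  puts "#{name} uses #{schema_file_for(name, production)}"
end
```
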
When using multiple databases, there are a few important settings.

First, the database name for the `primary` and `primary_replica` should be the same because they contain
the same data. This is also the case for `animals` and `animals_replica`.

Second, the username for the writers and replicas should be different, and the
replica user's permissions should be set to only read and not write.

When using a replica database, you need to add a `replica: true` entry to the replica in the
`database.yml`. This is because Rails otherwise has no way of knowing which one is a replica
and which one is the writer.

Lastly, for new writer databases, you need to set the `migrations_paths` to the directory
where you will store migrations for that database. We'll look more at `migrations_paths`
later on in this guide.

Now that we have a new database, let's set up the connection model. In order to use the
new database we need to create a new abstract class and connect to the animals databases.

```ruby
class AnimalsRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :animals, reading: :animals_replica }
end
```

Then we need to update `ApplicationRecord` to be aware of our new replica.

```ruby
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end
```

If you use a differently named class for your application record you need to
set `primary_abstract_class` instead, so that Rails knows which class `ActiveRecord::Base`
should share a connection with.

```ruby
class PrimaryApplicationRecord < ActiveRecord::Base
  self.primary_abstract_class
end
```

Classes that connect to primary/primary_replica can inherit from your primary abstract
class, just as models do in a standard Rails application:

```ruby
class Person < ApplicationRecord
end
```

By default Rails expects the database roles to be `writing` and `reading` for the primary
and replica respectively. If you have a legacy system you may already have roles set up that
you don't want to change. In that case you can set a new role name in your application config.

```ruby
config.active_record.writing_role = :default
config.active_record.reading_role = :readonly
```

It's important to connect to your database in a single model and then inherit from that model
for the tables rather than connect multiple individual models to the same database. Database
clients have a limit to the number of open connections there can be, and connecting from many
individual models multiplies the number of connections you have, since Rails uses the model
class name for the connection specification name.

Now that we have the `database.yml` and the new model set up, it's time to create the databases.
Rails 6.0 ships with all the Rails tasks you need to use multiple databases in Rails.

You can run `bin/rails -T` to see all the commands you're able to run. You should see the following:

```bash
$ bin/rails -T
rails db:create                          # Creates the database from DATABASE_URL or config/database.yml for the ...
rails db:create:animals                  # Create animals database for current environment
rails db:create:primary                  # Create primary database for current environment
rails db:drop                            # Drops the database from DATABASE_URL or config/database.yml for the cu...
rails db:drop:animals                    # Drop animals database for current environment
rails db:drop:primary                    # Drop primary database for current environment
rails db:migrate                         # Migrate the database (options: VERSION=x, VERBOSE=false, SCOPE=blog)
rails db:migrate:animals                 # Migrate animals database for current environment
rails db:migrate:primary                 # Migrate primary database for current environment
rails db:migrate:status                  # Display status of migrations
rails db:migrate:status:animals          # Display status of migrations for animals database
rails db:migrate:status:primary          # Display status of migrations for primary database
rails db:rollback                        # Rolls the schema back to the previous version (specify steps w/ STEP=n)
rails db:rollback:animals                # Rollback animals database for current environment (specify steps w/ STEP=n)
rails db:rollback:primary                # Rollback primary database for current environment (specify steps w/ STEP=n)
rails db:schema:dump                     # Creates a database schema file (either db/schema.rb or db/structure.sql ...
rails db:schema:dump:animals             # Creates a database schema file (either db/schema.rb or db/structure.sql ...
rails db:schema:dump:primary             # Creates a db/schema.rb file that is portable against any DB supported ...
rails db:schema:load                     # Loads a database schema file (either db/schema.rb or db/structure.sql ...
rails db:schema:load:animals             # Loads a database schema file (either db/schema.rb or db/structure.sql ...
rails db:schema:load:primary             # Loads a database schema file (either db/schema.rb or db/structure.sql ...
```

Running a command like `bin/rails db:create` will create both the primary and animals databases.
Note that there is no command for creating the database users, and you'll need to do that manually
to support the readonly users for your replicas. If you want to create just the animals
database you can run `bin/rails db:create:animals`.

## Generators and Migrations

Migrations for multiple databases should live in their own folders prefixed with the
name of the database key in the configuration.

You also need to set the `migrations_paths` in the database configurations to tell Rails
where to find the migrations.

For example the `animals` database would look for migrations in the `db/animals_migrate` directory and
`primary` would look in `db/migrate`. Rails generators now take a `--database` option
so that the file is generated in the correct directory. The command can be run like so:

```bash
$ bin/rails generate migration CreateDogs name:string --database animals
```

If you are using Rails generators, the scaffold and model generators will create the abstract
class for you. Simply pass the database key to the command line.

```bash
$ bin/rails generate scaffold Dog name:string --database animals
```

A class named after the database key, followed by `Record`, will be created. In this example
the database key is `animals` so we end up with `AnimalsRecord`:

```ruby
class AnimalsRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :animals }
end
```

The generated model will automatically inherit from `AnimalsRecord`.

```ruby
class Dog < AnimalsRecord
end
```

Note: Since Rails doesn't know which database is the replica for your writer, you will need to
add the `reading` connection to the abstract class yourself after it has been generated.

Rails will only generate the new class once. It will not be overwritten by new scaffolds
or deleted if the scaffold is deleted.

If you already have an abstract class and its name differs from `AnimalsRecord`, you can pass
the `--parent` option to indicate you want a different abstract class:

```bash
$ bin/rails generate scaffold Dog name:string --database animals --parent Animals::Record
```

This will skip generating `AnimalsRecord` since you've indicated to Rails that you want to
use a different parent class.

## Activating automatic connection switching

Finally, in order to use the read-only replica in your application, you'll need to activate
the middleware for automatic switching.

Automatic switching allows the application to switch from the writer to replica or replica
to writer based on the HTTP verb and whether there was a recent write by the requesting user.

If the application is receiving a POST, PUT, DELETE, or PATCH request the application will
automatically write to the writer database. For the specified time after the write, the
application will read from the primary. For a GET or HEAD request the application will read
from the replica unless there was a recent write.

To activate the automatic connection switching middleware, add or uncomment the following
lines in your application config.

```ruby
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```

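
The switching rules described above can be summarized in a few lines of plain Ruby. The sketch below is only an illustration of the decision, not the middleware's actual code: `role_for_request` and its parameters are hypothetical names, and the `delay` default of 2 seconds simply mirrors the config shown above.

```ruby
# Verbs that are treated as reads; every other verb is treated as a write.
READ_VERBS = %w[GET HEAD].freeze

# Hypothetical helper, not a Rails API: picks :writing or :reading for one request.
#   verb          - the HTTP verb as a String
#   last_write_at - Time of this user's most recent write, or nil if none is known
#   delay         - seconds to keep reading from the writer after a write
#   now           - current Time (passed in so the example is deterministic)
def role_for_request(verb, last_write_at:, delay: 2, now: Time.now)
  # Writes always go to the writer.
  return :writing unless READ_VERBS.include?(verb.upcase)

  # Reads go to the writer only while the user's last write is still recent.
  recently_wrote = !last_write_at.nil? && (now - last_write_at) < delay
  recently_wrote ? :writing : :reading
end

now = Time.at(1_000)
puts role_for_request("POST", last_write_at: nil, now: now)      # a write request
puts role_for_request("GET", last_write_at: now - 1, now: now)   # read 1 second after a write
puts role_for_request("GET", last_write_at: now - 5, now: now)   # read 5 seconds after a write
puts role_for_request("GET", last_write_at: nil, now: now)       # read with no known write
```
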
Rails guarantees "read your own write" and will send your GET or HEAD request to the
writer if it's within the `delay` window. By default the delay is set to 2 seconds. You
should change this based on your database infrastructure. Rails doesn't guarantee "read
a recent write" for other users within the delay window and will send GET and HEAD requests
to the replicas unless they wrote recently.

The automatic connection switching in Rails is relatively primitive and deliberately doesn't
do a whole lot. The goal is a system that demonstrates how to do automatic connection
switching and is flexible enough to be customizable by app developers.

The setup in Rails allows you to easily change how the switching is done and what
parameters it's based on. Let's say you want to use a cookie instead of a session to
decide when to swap connections. You can write your own class:

```ruby
class MyCookieResolver
  # code for your cookie class
end
```

And then pass it to the middleware:

```ruby
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = MyCookieResolver
```

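
What goes inside a class like `MyCookieResolver` depends on your Rails version, so treat the following as a rough sketch rather than a reference. It assumes the context needs a `call(request)` class method that builds the context, plus `last_write_timestamp` and `update_last_write_timestamp` instance methods, which is the shape the built-in session context appears to have; check the Rails source for your version before relying on it. `FakeRequest` and the `last_write` cookie name are hypothetical stand-ins so the example runs without Rails.

```ruby
# Hypothetical stand-in for a request object. A real Rails request exposes
# cookies differently; this Struct exists only so the sketch is runnable.
FakeRequest = Struct.new(:cookies)

class MyCookieResolver
  COOKIE_KEY = "last_write" # hypothetical cookie name

  # Assumed entry point: build a context from the incoming request.
  def self.call(request)
    new(request.cookies)
  end

  def initialize(cookies)
    @cookies = cookies
  end

  # Time of the last write recorded for this client, or the epoch if none.
  def last_write_timestamp
    millis = @cookies[COOKIE_KEY]
    millis ? Time.at(millis.to_i / 1000.0) : Time.at(0)
  end

  # Record the time of the last write as milliseconds since the epoch.
  def update_last_write_timestamp(now = Time.now)
    @cookies[COOKIE_KEY] = (now.to_f * 1000).round.to_s
  end
end

request = FakeRequest.new({})
context = MyCookieResolver.call(request)
puts context.last_write_timestamp.to_i # no write has been recorded yet
context.update_last_write_timestamp(Time.at(1_700_000_000))
puts context.last_write_timestamp.to_i # the write time read back from the cookie
```

Storing the timestamp as a string of milliseconds keeps the cookie value simple to serialize; any format works as long as both methods agree on it.
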
## Using manual connection switching

There are some cases where you may want your application to connect to a writer or a replica
and the automatic connection switching isn't adequate. For example, you may know that for a
particular request you always want to send the request to a replica, even when you are in a
POST request path.

To do this Rails provides a `connected_to` method that will switch to the connection you
need.

```ruby
ActiveRecord::Base.connected_to(role: :reading) do
  # all code in this block will be connected to the reading role
end
```

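
To build intuition for what a block-scoped switch means, here is a plain-Ruby sketch of the general pattern: a role is pushed for the duration of a block and the previous role is restored afterwards, even if the block raises. This is not how Rails implements `connected_to`; the `RoleSwitcher` class, its `UnknownRole` error, and its method names are all invented for this illustration.

```ruby
# Hypothetical sketch, not Rails code: a stack of roles where the top entry is
# the role currently in effect.
class RoleSwitcher
  class UnknownRole < StandardError; end

  def initialize(known_roles, default:)
    @known_roles = known_roles
    @stack = [default]
  end

  def current_role
    @stack.last
  end

  # Run the block with `role` in effect, then restore the previous role.
  def connected_to(role:)
    unless @known_roles.include?(role)
      raise UnknownRole, "no connection registered for the '#{role}' role"
    end

    @stack.push(role)
    begin
      yield
    ensure
      # Always undo the switch, even when the block raises.
      @stack.pop
    end
  end
end

switcher = RoleSwitcher.new(%i[writing reading], default: :writing)
switcher.connected_to(role: :reading) do
  puts switcher.current_role # role in effect inside the block
end
puts switcher.current_role   # role in effect after the block
```
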
The "role" in the `connected_to` call looks up the connections that are connected on that
connection handler (or role). The `reading` connection handler will hold all the connections
that were connected via `connects_to` with the role name of `reading`.

Note that `connected_to` with a role will look up an existing connection and switch
using the connection specification name. This means that if you pass an unknown role
like `connected_to(role: :nonexistent)` you will get an error that says
`ActiveRecord::ConnectionNotEstablished (No connection pool for 'ActiveRecord::Base' found for the 'nonexistent' role.)`.

## Horizontal sharding

Horizontal sharding is when you split up your database to reduce the number of rows on each
database server, but maintain the same schema across "shards". This is commonly called "multi-tenant"
sharding.

The API for supporting horizontal sharding in Rails is similar to the multiple database / vertical
sharding API that's existed since Rails 6.0.

Shards are declared in the three-tier config like this:

```yaml
production:
  primary:
    database: my_primary_database
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    adapter: mysql2
    replica: true
  primary_shard_one:
    database: my_primary_shard_one
    adapter: mysql2
  primary_shard_one_replica:
    database: my_primary_shard_one
    adapter: mysql2
    replica: true
```

Models are then connected with the `connects_to` API via the `shards` key:

```ruby
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end
```

Then models can swap connections manually via the `connected_to` API. If
using sharding, both a `role` and a `shard` must be passed:

```ruby
ActiveRecord::Base.connected_to(role: :writing, shard: :default) do
  @id = Person.create!.id # Creates a record in shard default and keeps its id
end

ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
  Person.find(@id) # Can't find record, doesn't exist because it was created
                   # in the default shard
end
```

The horizontal sharding API also supports read replicas. You can swap the
role and the shard with the `connected_to` API.

```ruby
ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  Person.first # Lookup record from read replica of shard one
end
```

## Migrate to the new connection handling

In Rails 6.1+, Active Record provides a new internal API for connection management.
In most cases applications will not need to make any changes except to opt in to the
new behavior (if upgrading from 6.0 and below) by setting
`config.active_record.legacy_connection_handling = false`. If you have a single database
application, no other changes will be required. If you have a multiple database application
the following changes are required if your application is using these methods:

* `connection_handlers` and `connection_handlers=` no longer work in the new connection
  handling. If you were calling a method on one of the connection handlers, for example,
  `connection_handlers[:reading].retrieve_connection_pool("ActiveRecord::Base")`
  you will now need to update that call to be
  `connection_handler.retrieve_connection_pool("ActiveRecord::Base", role: :reading)`.
* Calls to `ActiveRecord::Base.connection_handler.prevent_writes` will need to be updated
  to `ActiveRecord::Base.connection.preventing_writes?`.
* If you need all the pools, including writing and reading, a new method has been provided on
  the handler. Call `connection_handler.all_connection_pools` to use this. In most cases though
  you'll want writing or reading pools with `connection_handler.connection_pool_list(:writing)` or
  `connection_handler.connection_pool_list(:reading)`.
* If you turn off `legacy_connection_handling` in your application, any method that's unsupported
  will raise an error (e.g. `connection_handlers=`).

## Granular Database Connection Switching

In Rails 6.1 it's possible to switch connections for one database instead of
all databases globally. To use this feature you must first set
`config.active_record.legacy_connection_handling` to `false` in your application
configuration. The majority of applications should not need to make any other
changes since the public APIs have the same behavior. See the above section for
how to enable and migrate away from `legacy_connection_handling`.

With `legacy_connection_handling` set to `false`, any abstract connection class
will be able to switch connections without affecting other connections. This
is useful for switching your `AnimalsRecord` queries to read from the replica
while ensuring your `ApplicationRecord` queries go to the primary.

```ruby
AnimalsRecord.connected_to(role: :reading) do
  Dog.first    # Reads from animals_replica
  Person.first # Reads from primary
end
```

It's also possible to swap connections granularly for shards.

```ruby
AnimalsRecord.connected_to(role: :reading, shard: :shard_one) do
  Dog.first # Will read from shard_one_replica. If no connection exists for shard_one_replica,
            # a ConnectionNotEstablished error will be raised
  Person.first # Will read from primary writer
end
```

To switch only the primary database cluster use `ApplicationRecord`:

```ruby
ApplicationRecord.connected_to(role: :reading, shard: :shard_one) do
  Person.first # Reads from primary_shard_one_replica
  Dog.first    # Reads from the animals writer
end
```

`ActiveRecord::Base.connected_to` maintains the ability to switch
connections globally.

### Handling associations with joins across databases

As of Rails 7.0, Active Record has an option for handling associations that would perform
a join across multiple databases. If you have a has many through or a has one through association
for which you want to skip the join and perform 2 or more queries instead, pass the
`disable_joins: true` option.

For example:

```ruby
class Dog < AnimalsRecord
  has_many :treats, through: :humans, disable_joins: true
  has_many :humans

  belongs_to :home
  has_one :yard, through: :home, disable_joins: true
end
```


Previously, calling `@dog.treats` or `@dog.yard` without `disable_joins` would raise an error
because databases are unable to handle joins across clusters. With the `disable_joins` option,
Rails will generate multiple select queries to avoid attempting to join across clusters. For the
above association, `@dog.treats` would generate the following SQL:

```sql
SELECT "humans"."id" FROM "humans" WHERE "humans"."dog_id" = ? [["dog_id", 1]]
SELECT "treats".* FROM "treats" WHERE "treats"."human_id" IN (?, ?, ?) [["human_id", 1], ["human_id", 2], ["human_id", 3]]
```

For `@dog.yard`, Rails would generate the following SQL:

```sql
SELECT "homes"."id" FROM "homes" WHERE "homes"."dog_id" = ? [["dog_id", 1]]
SELECT "yards".* FROM "yards" WHERE "yards"."home_id" = ? [["home_id", 1]]
```
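
To make the two-step lookup concrete, here is a minimal plain-Ruby sketch of what the `@dog.treats` queries do: first collect the intermediate IDs, then filter the target rows by them. The `Human` and `Treat` structs, the sample rows, and `treats_for_dog` are illustrative stand-ins, not Rails internals.

```ruby
# Hypothetical in-memory stand-ins for the `humans` and `treats` tables.
Human = Struct.new(:id, :dog_id)
Treat = Struct.new(:id, :human_id, :name)

HUMANS = [Human.new(1, 1), Human.new(2, 1), Human.new(3, 2)].freeze
TREATS = [
  Treat.new(1, 1, "biscuit"),
  Treat.new(2, 2, "jerky"),
  Treat.new(3, 3, "bone")
].freeze

# Mimics the two SELECTs: the first step gathers the humans' IDs for a dog,
# the second step filters treats by those IDs instead of joining the tables.
def treats_for_dog(dog_id)
  human_ids = HUMANS.select { |human| human.dog_id == dog_id }.map(&:id)
  TREATS.select { |treat| human_ids.include?(treat.human_id) }
end
```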

There are some important things to be aware of with this option:

1) There may be performance implications, since two or more queries are now performed (depending
on the association) rather than a join. If the select for `humans` returns a large number of IDs,
the select for `treats` may send too many IDs in its `IN` clause.
2) Since joins are no longer performed, a query with an order or limit is now sorted in memory,
because an order from one table cannot be applied to another table.
3) This setting must be added to every association where you want joining to be disabled.
Rails can't guess this for you because association loading is lazy: to load `treats` in `@dog.treats`,
Rails already needs to know what SQL should be generated.

## Caveats

### Automatic swapping for horizontal sharding

While Rails now supports an API for connecting to and swapping connections of shards, it does
not yet support an automatic swapping strategy. Any shard swapping will need to be done manually
in your app via a middleware or `around_action`.

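
As a rough illustration of the manual approach, the sketch below shows a Rack-style middleware that picks a shard per request and runs the rest of the stack inside it. The class name `ManualShardSwitcher` and its `resolver` and `switcher` collaborators are hypothetical; they are injected here so the example runs without Rails.

```ruby
# A minimal Rack-style middleware sketch (hypothetical, not part of Rails).
# `resolver` maps a request env to a shard name, and `switcher` runs a block
# with that shard selected.
class ManualShardSwitcher
  def initialize(app, resolver:, switcher:)
    @app = app
    @resolver = resolver
    @switcher = switcher
  end

  def call(env)
    shard = @resolver.call(env)
    # In a Rails app the switcher would be expected to wrap something like
    # ActiveRecord::Base.connected_to(role: :writing, shard: shard) { ... }
    @switcher.call(shard) { @app.call(env) }
  end
end
```

How a request maps to a shard (subdomain, account ID, header) is application-specific, which is why the resolver is left as a pluggable callable in this sketch.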
### Load Balancing Replicas

Rails also doesn't support automatic load balancing of replicas. Load balancing is very
dependent on your infrastructure. We may implement basic, primitive load balancing
in the future, but for an application at scale this should be something your application
handles outside of Rails.

### Schema Cache

If you use a schema cache and multiple databases, you'll need to write an initializer
that loads the schema cache from your app. We couldn't resolve this issue in
time for Rails 6.0, but we hope to address it in a future version soon.