mirror of https://github.com/rails/rails.git synced 2022-11-09 12:12:34 -05:00

Clarify Active Record Encryption docs

I read quite a bit of this code the other day, so it made sense for me
to read through the guide as well.

While reading through it, I looked for opportunities to increase clarity
and simplify things. I also fixed a few typos!

Co-authored-by: Gannon McGibbon <gannon@hey.com>
Jacob Herrington 2021-11-22 22:25:26 -06:00 committed by GitHub
parent 05ec88cdb2
commit 85b2f7f2bd


@ -5,7 +5,7 @@ Active Record Encryption
This guide covers encrypting your database information using Active Record.
After reading this guide you will know:
After reading this guide, you will know:
* How to set up database encryption with Active Record.
* How to migrate unencrypted data
@ -15,21 +15,21 @@ After reading this guide you will know:
--------------------------------------------------------------------------------
Active Record supports application-level encryption. It works by declaring which attributes should be encrypted and seamlessly encrypting and decrypting them when necessary. The encryption layer is placed between the database and the application. The application will access unencrypted data but the database will store it encrypted.
Active Record supports application-level encryption. It works by declaring which attributes should be encrypted and seamlessly encrypting and decrypting them when necessary. The encryption layer sits between the database and the application. The application will access unencrypted data, but the database will store it encrypted.
## Why Encrypt Data at the Application Level?
Active Record Encryption is meant to protect sensitive information in your application. A typical example is personal information from customers. But why would you want to do this if, for example, you are already encrypting your database at rest?
Active Record Encryption exists to protect sensitive information in your application. A typical example is personally identifiable information from users. But why would you want application-level encryption if you are already encrypting your database at rest?
As an immediate practical benefit, encrypting sensitive attributes adds an additional security layer. For example, if an attacker gained access to your database, a snapshot of it, or your application logs, they wouldn't be able to make sense of the encrypted information. And even without thinking about malicious actors, checking application logs for legit reasons shouldn't expose personal information from customers either.
As an immediate practical benefit, encrypting sensitive attributes adds an additional security layer. For example, if an attacker gained access to your database, a snapshot of it, or your application logs, they wouldn't be able to make sense of the encrypted information. Additionally, encryption can prevent developers from unintentionally exposing users' sensitive data in application logs.
But more importantly, by using Active Record Encryption, you define what constitutes sensitive information in your application at the code level. This enables controlling how this information is accessed and building services around it. As examples, think about auditable Rails consoles that protect encrypted data or check the built-in system to [filter controller params automatically](#filtering-params-named-as-encrypted-columns).
But more importantly, by using Active Record Encryption, you define what constitutes sensitive information in your application at the code level. Active Record Encryption enables granular control of data access in your application and services consuming data from your application. For example, consider auditable Rails consoles that protect encrypted data, or the built-in system to [filter controller params automatically](#filtering-params-named-as-encrypted-columns).
## Basic Usage
### Setup
First, you need to add some keys to your [rails credentials](/security.html#custom-credentials). Run `bin/rails db:encryption:init` to generate a random key set:
First, you need to add some keys to your [Rails credentials](/security.html#custom-credentials). Run `bin/rails db:encryption:init` to generate a random key set:
```bash
$ bin/rails db:encryption:init
@ -41,7 +41,7 @@ active_record_encryption:
key_derivation_salt: xEY0dt6TZcAMg52K7O84wYzkjvbA62Hz
```
NOTE: These generated keys and salt are 32 bytes length. If you generate these yourself, the minimum lengths you should use are 12 bytes for the primary key (this will be used to derive the AES 32 bytes key) and 20 bytes for the salt.
NOTE: These generated values are 32 bytes in length. If you generate these yourself, the minimum lengths you should use are 12 bytes for the primary key (this will be used to derive the 32-byte AES key) and 20 bytes for the salt.
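The minimum lengths above are easier to see in a short sketch. It generates values like the ones `bin/rails db:encryption:init` prints and then stretches the primary key into a 32-byte AES key. The derivation uses PBKDF2-HMAC-SHA256 with an arbitrary iteration count. That choice is an assumption made for illustration and is not necessarily what Active Record does internally.

```ruby
require "securerandom"
require "openssl"

# Hypothetical stand-ins for the generated credentials (32 characters each).
primary_key         = SecureRandom.alphanumeric(32)
key_derivation_salt = SecureRandom.alphanumeric(32)

# Minimum lengths from the note: 12 bytes for the primary key, 20 for the salt.
raise ArgumentError, "primary key too short" if primary_key.bytesize < 12
raise ArgumentError, "salt too short" if key_derivation_salt.bytesize < 20

# Stretch the primary key into a 32-byte (256-bit) AES key.
# PBKDF2-HMAC-SHA256 and 2**16 iterations are assumptions for this sketch.
aes_key = OpenSSL::PKCS5.pbkdf2_hmac(
  primary_key, key_derivation_salt, 2**16, 32, OpenSSL::Digest.new("SHA256")
)

puts aes_key.bytesize # => 32
```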
### Declaration of Encrypted Attributes
@ -53,26 +53,24 @@ class Article < ApplicationRecord
end
````
The library will transparently encrypt these attributes before saving them into the database, and will decrypt them when retrieving their values:
The library will transparently encrypt these attributes before saving them in the database and will decrypt them upon retrieval:
```ruby
article = Article.create title: "Encrypt it all!"
article.title # => "Encrypt it all!"
```
But, under the hood, the executed SQL would look like this:
But, under the hood, the executed SQL looks like this:
```sql
INSERT INTO `articles` (`title`) VALUES ('{\"p\":\"n7J0/ol+a7DRMeaE\",\"h\":{\"iv\":\"DXZMDWUKfp3bg/Yu\",\"at\":\"X1/YjMHbHD4talgF9dt61A==\"}}')
```
Encryption takes additional space in the column. You can estimate the worst-case overload in around 250 bytes when the built-in envelope encryption key provider is used. For medium and large text columns this overload is negligible, but for `string` columns of 255 bytes, you should increase their limit accordingly (510 is recommended).
NOTE: The reason for the additional space are Base 64 encoding and additional metadata stored with the encrypted values.
Because encrypted values are Base 64 encoded and stored together with additional metadata, encryption requires extra space in the column. You can estimate the worst-case overhead at around 250 bytes when the built-in envelope encryption key provider is used. This overhead is negligible for medium and large text columns, but for `string` columns of 255 bytes, you should increase their limit accordingly (510 bytes is recommended).
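The following sketch shows where the extra bytes come from. It encrypts a value with AES-256-GCM and wraps the result in a JSON envelope that copies the visible layout of the SQL example: `"p"` holds the payload, and `"h"` holds headers with `"iv"` and `"at"`. Only the shape is copied. This is not Active Record's serializer, and it leaves out the extra headers the envelope key provider adds.

```ruby
require "openssl"
require "json"

# Strict Base 64 helpers (no line breaks), via Array#pack / String#unpack1.
def b64(bytes)
  [bytes].pack("m0")
end

def unb64(text)
  text.unpack1("m0")
end

# AES-256-GCM encryption wrapped in a JSON envelope shaped like the SQL example.
def encrypt_to_envelope(clear_text, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv # 12 random bytes for GCM
  ciphertext = cipher.update(clear_text) + cipher.final
  JSON.generate({ "p" => b64(ciphertext),
                  "h" => { "iv" => b64(iv), "at" => b64(cipher.auth_tag) } })
end

def decrypt_envelope(json, key)
  message = JSON.parse(json)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = unb64(message["h"]["iv"])
  cipher.auth_tag = unb64(message["h"]["at"]) # verified by cipher.final
  cipher.update(unb64(message["p"])) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
clear_text = "Encrypt it all!"
stored = encrypt_to_envelope(clear_text, key)

puts stored
puts "#{clear_text.bytesize} bytes in, #{stored.bytesize} bytes stored"
puts decrypt_envelope(stored, key) # => Encrypt it all!
```

Even in this reduced form, Base 64 expansion, the IV, the authentication tag and the JSON punctuation add several times the size of a short input. That overhead is the reason a 255-byte `string` column needs a larger limit.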
### Deterministic and Non-deterministic Encryption
By default, Active Record Encryption uses a non-deterministic approach to encryption. This means that encrypting the same content with the same password twice will result in different ciphertexts. This is good for security, since it makes crypto-analysis of encrypted content much harder, but it makes querying the database impossible.
By default, Active Record Encryption uses a non-deterministic approach to encryption. Non-deterministic, in this context, means that encrypting the same content with the same password twice will result in different ciphertexts. This approach improves security by making crypto-analysis of ciphertexts harder, but it makes querying the database impossible.
You can use the `deterministic:` option to generate initialization vectors in a deterministic way, effectively enabling querying encrypted data.
@ -84,11 +82,11 @@ end
Author.find_by_email("some@email.com") # You can query the model normally
```
The recommendation is using the default (non deterministic) unless you need to query the data.
The non-deterministic approach is recommended unless you need to query the data.
NOTE: In non-deterministic mode, it uses AES-GCM with a 256-bits key and a random initialization vector. In deterministic mode, it uses AES-GCM too but the initialization vector is generated as a HMAC-SHA-256 digest of the key and contents to encrypt.
NOTE: In non-deterministic mode, Active Record uses AES-GCM with a 256-bit key and a random initialization vector. In deterministic mode, it also uses AES-GCM, but the initialization vector is generated as an HMAC-SHA-256 digest of the key and contents to encrypt.
NOTE: You can disable deterministic encryption just by not configuring a `deterministic_key`.
NOTE: You can disable deterministic encryption by omitting a `deterministic_key`.
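This sketch contrasts the two initialization-vector strategies described in the notes above. A random IV gives different ciphertexts for the same input. An IV taken from HMAC-SHA-256 of the key and the clear text gives identical ciphertexts, and that is what makes querying possible. One detail is an assumption: the sketch truncates the digest to GCM's 12-byte IV length.

```ruby
require "openssl"

# AES-256-GCM under a caller-supplied IV; returns IV + auth tag + ciphertext.
def aes_gcm_encrypt(clear_text, key, iv)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  cipher.iv = iv
  ciphertext = cipher.update(clear_text) + cipher.final
  iv + cipher.auth_tag + ciphertext
end

# Non-deterministic mode: a fresh random 12-byte IV on every call.
def random_iv
  OpenSSL::Random.random_bytes(12)
end

# Deterministic mode: IV = HMAC-SHA-256(key, clear text), truncated to 12 bytes.
def deterministic_iv(key, clear_text)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, clear_text).byteslice(0, 12)
end

key = OpenSSL::Random.random_bytes(32)
email = "some@email.com"

non_det_1 = aes_gcm_encrypt(email, key, random_iv)
non_det_2 = aes_gcm_encrypt(email, key, random_iv)
det_1 = aes_gcm_encrypt(email, key, deterministic_iv(key, email))
det_2 = aes_gcm_encrypt(email, key, deterministic_iv(key, email))

puts non_det_1 == non_det_2 # => false (different random IVs)
puts det_1 == det_2         # => true (same key, same input, same IV)
```

The equality that makes `find_by_email` work also exposes which rows share a value. This is the trade-off behind the recommendation to use the non-deterministic default unless you need to query.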
## Features
@ -116,7 +114,7 @@ When enabled, all the encryptable attributes will be encrypted according to the
#### Action Text Fixtures
To encrypt action text fixtures you should place them in `fixtures/action_text/encrypted_rich_texts.yml`.
To encrypt Action Text fixtures, you should place them in `fixtures/action_text/encrypted_rich_texts.yml`.
### Supported Types
@ -125,13 +123,13 @@ To encrypt action text fixtures you should place them in `fixtures/action_text/e
If you need to support a custom type, the recommended approach is to use a [serialized attribute](https://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html). The declaration of the serialized attribute should go **before** the encryption declaration:
```ruby
# GOOD
# CORRECT
class Article < ApplicationRecord
serialize :title, Title
encrypts :title
end
# WRONG
# INCORRECT
class Article < ApplicationRecord
encrypts :title
serialize :title, Title
@ -140,7 +138,7 @@ end
### Ignoring Case
You might need to ignore case when querying deterministically encrypted data. There are two options that can help you here.
You might need to ignore casing when querying deterministically encrypted data. Two approaches make accomplishing this easier:
You can use the `:downcase` option when declaring the encrypted attribute to downcase the content before encryption occurs.
@ -150,7 +148,7 @@ class Person
end
```
When using `:downcase`, the original case is lost. In some situations, you might want to ignore the case only when querying, while also storing the original case. For those situations, you can use the option `:ignore_case`. This requires you to add a new column named `original_<column_name>` to store the content with the case unchanged:
When using `:downcase`, the original case is lost. In some situations, you might want to ignore the case only when querying while also storing the original case. For those situations, you can use the option `:ignore_case`. This requires you to add a new column named `original_<column_name>` to store the content with the case unchanged:
```ruby
class Label
@ -163,18 +161,18 @@ end
To ease migrations of unencrypted data, the library includes the option `config.active_record.encryption.support_unencrypted_data`. When set to `true`:
* Trying to read encrypted attributes that are not encrypted will work normally, without raising any error
* Queries with deterministically-encrypted attributes will include the "clear text" version of them, to support finding both encrypted and unencrypted content. You need to set `config.active_record.encryption.extend_queries = true` to enable this.
* Queries with deterministically-encrypted attributes will include the "clear text" version of them to support finding both encrypted and unencrypted content. You need to set `config.active_record.encryption.extend_queries = true` to enable this.
**This options is meant to be used in transition periods** while clear data and encrypted data need to coexist. Their value is `false` by default, which is the recommended goal for any application: errors will be raised when working with unencrypted data.
**This option is meant to be used during transition periods** while clear data and encrypted data must coexist. Both are set to `false` by default, which is the recommended goal for any application: errors will be raised when working with unencrypted data.
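The sketch below illustrates the idea behind `support_unencrypted_data`. When a stored value does not look like an encrypted message, the reader returns it unchanged instead of raising. The names `read_attribute_value`, `parse_envelope` and `UnencryptedDataError` are hypothetical. So is the "is this encrypted?" check, which only looks for a JSON object with `"p"` and `"h"` keys. This is not how Active Record detects ciphertexts.

```ruby
require "json"

# Hypothetical error raised when clear text is found and not allowed.
class UnencryptedDataError < StandardError; end

# Returns the parsed envelope if `stored` looks like an encrypted message, else nil.
def parse_envelope(stored)
  data = JSON.parse(stored)
  data if data.is_a?(Hash) && data.key?("p") && data.key?("h")
rescue JSON::ParserError
  nil
end

# Decrypts envelopes via the given block; passes legacy clear text through
# only when support_unencrypted_data is true.
def read_attribute_value(stored, support_unencrypted_data:, &decrypt)
  envelope = parse_envelope(stored)
  return decrypt.call(envelope) if envelope
  return stored if support_unencrypted_data

  raise UnencryptedDataError, "#{stored.inspect} is not encrypted"
end

# Placeholder "decryption" so the example runs without real keys.
fake_decrypt = ->(envelope) { "decrypted(#{envelope["p"]})" }

puts read_attribute_value('{"p":"abc","h":{}}', support_unencrypted_data: false, &fake_decrypt)
# => decrypted(abc)
puts read_attribute_value("legacy@example.com", support_unencrypted_data: true, &fake_decrypt)
# => legacy@example.com
```

With the flag set back to `false`, the second call raises, which matches the recommended end state once migration is complete.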
### Support for Previous Encryption Schemes
Changing encryption properties of attributes can break existing data. For example, imagine you want to make a "deterministic" attribute "not deterministic". If you just change the declaration in the model, reading existing ciphertexts will fail because they are different now.
Changing encryption properties of attributes can break existing data. For example, imagine you want to make a deterministic attribute non-deterministic. If you just change the declaration in the model, reading existing ciphertexts will fail because the encryption method is different now.
To support these situations, you can declare previous encryption schemes that will be used in two scenarios:
* When reading encrypted data, Active Record Encryption will try previous encryption schemes if the current scheme doesn't work.
* When querying deterministic data, it will add ciphertexts using previous schemes to the queries so that queries work seamlessly with data encrypted with different scheme. You need to set `config.active_record.encryption.extend_queries = true` to enable this.
* When querying deterministic data, it will add ciphertexts using previous schemes so that queries work seamlessly with data encrypted with different schemes. You must set `config.active_record.encryption.extend_queries = true` to enable this.
You can configure previous encryption schemes:
@ -206,11 +204,11 @@ When adding previous encryption schemes:
* With **non-deterministic encryption**, new information will always be encrypted with the *newest* (current) encryption scheme.
* With **deterministic encryption**, new information will always be encrypted with the *oldest* encryption scheme by default.
The reason is that, with deterministic encryption, you normally want ciphertexts to remain constant. You can change this behavior by setting `deterministic: { fixed: false }`. In that case, it will use the *newest* encryption scheme for encrypting new data.
Typically, with deterministic encryption, you want ciphertexts to remain constant. You can change this behavior by setting `deterministic: { fixed: false }`. In that case, it will use the *newest* encryption scheme for encrypting new data.
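The selection rules above can be modeled in a few lines. Schemes are listed oldest first, so the last entry is the current one. New data uses the oldest scheme for deterministic attributes, unless `fixed: false` is given, and the newest scheme otherwise. Reads try the current scheme first and fall back to older ones. Scheme names and both helper methods are hypothetical, and the simulated decryption only stands in for real cryptography.

```ruby
# Hypothetical schemes, ordered oldest first (the last one is current).
SCHEMES = ["v1 (previous)", "v2 (current)"].freeze

# Which scheme encrypts new data, following the rules described above.
def scheme_for_new_data(schemes, deterministic:, fixed: true)
  deterministic && fixed ? schemes.first : schemes.last
end

# Try the current scheme first, then previous ones; the block returns nil on failure.
def decrypt_with_fallback(ciphertext, schemes)
  schemes.reverse_each do |scheme|
    clear_text = yield(scheme, ciphertext)
    return clear_text unless clear_text.nil?
  end
  raise ArgumentError, "no configured scheme could decrypt the value"
end

puts scheme_for_new_data(SCHEMES, deterministic: false)              # => v2 (current)
puts scheme_for_new_data(SCHEMES, deterministic: true)               # => v1 (previous)
puts scheme_for_new_data(SCHEMES, deterministic: true, fixed: false) # => v2 (current)

# Simulated storage: this ciphertext is only readable with the old scheme.
readable_by = { "ciphertext-A" => "v1 (previous)" }
clear = decrypt_with_fallback("ciphertext-A", SCHEMES) do |scheme, ct|
  readable_by[ct] == scheme ? "hello" : nil
end
puts clear # => hello
```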
### Unique Constraints
NOTE: Unique constraints can only be used with data encrypted deterministically.
NOTE: Unique constraints can only be used with deterministically encrypted data.
#### Unique Validations
@ -223,15 +221,15 @@ class Person
end
```
They will also work when combining encrypted and unencrypted data, and when configuring previous encryption schemes.
They will also work when combining encrypted and unencrypted data, and when configuring previous encryption schemes.
NOTE: If you want to ignore case make sure to use `downcase:` or `ignore_case:` in the `encrypts` declaration. Using the `case_sensitive:` option in the validation won't work.
NOTE: If you want to ignore case, make sure to use `downcase:` or `ignore_case:` in the `encrypts` declaration. Using the `case_sensitive:` option in the validation won't work.
#### Unique Indexes
To support unique indexes on deterministically-encrypted columns, you need to make sure their ciphertext doesn't ever change.
To support unique indexes on deterministically-encrypted columns, you need to ensure their ciphertext doesn't ever change.
To encourage this, by default, deterministic attributes will always use the oldest encryption scheme, when multiple encryption schemes are configured. Other than this, it's up to you making sure that encryption properties don't change for these attributes, or the unique indexes won't work.
To encourage this, deterministic attributes will always use the oldest available encryption scheme by default when multiple encryption schemes are configured. Beyond that, it's your job to ensure encryption properties don't change for these attributes, or the unique indexes won't work.
```ruby
class Person
@ -241,7 +239,7 @@ end
### Filtering Params Named as Encrypted Columns
By default, encrypted columns are configured to be [automatically filtered in Rails logs](https://guides.rubyonrails.org/action_controller_overview.html#parameters-filtering). You can disable this behavior by adding this to your `application.rb`:
By default, encrypted columns are configured to be [automatically filtered in Rails logs](https://guides.rubyonrails.org/action_controller_overview.html#parameters-filtering). You can disable this behavior by adding the following to your `application.rb`:
```ruby
config.active_record.encryption.add_to_filter_parameters = false
@ -252,7 +250,7 @@ In case you want exclude specific columns from this automatic filtering, add the
The library will preserve the encoding for string values encrypted non-deterministically.
For values encrypted deterministically, by default, the library will force UTF-8 encoding. The reason is that encoding is stored along with the encrypted payload. This means that the same value with a different encoding will result in different ciphertexts when encrypted. You normally want to avoid this to keep queries and uniqueness constraints working, so the library will perform the conversion automatically on your behalf.
By default, values encrypted deterministically are converted to UTF-8 before encryption. Because the encoding is stored along with the encrypted payload, the same value in a different encoding would otherwise result in a different ciphertext. You usually want to avoid this to keep queries and uniqueness constraints working, so the library performs the conversion automatically on your behalf.
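Here is a minimal illustration of why forcing an encoding matters. An HMAC-SHA-256 stands in for deterministic encryption because it has the one property that matters here: equal inputs give equal outputs. The encoding name is mixed into the input to mimic the encoding being stored with the payload. The constant `KEY` and the method `deterministic_token` are hypothetical.

```ruby
require "openssl"

KEY = "hypothetical-deterministic-key"

# Deterministic stand-in: HMAC over the encoding name plus the raw bytes.
def deterministic_token(value)
  payload = value.encoding.name.b + ":".b + value.b
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), KEY, payload)
end

utf8   = "café"
latin1 = utf8.encode(Encoding::ISO_8859_1)

# Same characters, different encoding: the outputs no longer match.
puts deterministic_token(utf8) == deterministic_token(latin1) # => false

# Converting to UTF-8 first (what the forced encoding does) restores equality.
forced = latin1.encode(Encoding::UTF_8)
puts deterministic_token(utf8) == deterministic_token(forced) # => true
```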
You can configure the desired default encoding for deterministic encryption with:
@ -268,7 +266,7 @@ config.active_record.encryption.forced_encoding_for_deterministic_encryption = n
## Key Management
Key management strategies are implemented by key providers. You can configure key providers globally or on a per-attribute basis.
Key providers implement key management strategies. You can configure key providers globally, or on a per-attribute basis.
### Built-in Key Providers
@ -289,13 +287,13 @@ Implements a simple [envelope encryption](https://docs.aws.amazon.com/kms/latest
- It generates a random key for each data-encryption operation
- It stores the data-key with the data itself, encrypted with a primary key defined in the credential `active_record.encryption.primary_key`.
You can configure by adding this to your `application.rb`:
You can configure Active Record to use this key provider by adding this to your `application.rb`:
```ruby
config.active_record.encryption.key_provider = ActiveRecord::Encryption::EnvelopeEncryptionKeyProvider.new
```
As with other built-in key providers, you can provide a list of primary keys in `active_record.encryption.primary_key`, to implement key-rotation schemes.
As with other built-in key providers, you can provide a list of primary keys in `active_record.encryption.primary_key` to implement key-rotation schemes.
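The envelope scheme described above can be sketched as follows. A fresh random data key encrypts each value, and the primary key encrypts that data key, which is stored next to the data. AES-256-GCM is used for both layers. The message layout (a Hash with `:data` and `:encrypted_data_key`) and all method names are hypothetical. `EnvelopeEncryptionKeyProvider` does not work this way internally.

```ruby
require "openssl"

# AES-256-GCM helpers returning / consuming { iv:, tag:, ciphertext: }.
def gcm_encrypt(clear_text, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(clear_text) + cipher.final
  { iv: iv, tag: cipher.auth_tag, ciphertext: ciphertext }
end

def gcm_decrypt(box, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = box[:iv]
  cipher.auth_tag = box[:tag]
  cipher.update(box[:ciphertext]) + cipher.final
end

def envelope_encrypt(clear_text, primary_key)
  data_key = OpenSSL::Random.random_bytes(32) # new random key per operation
  { data: gcm_encrypt(clear_text, data_key),
    encrypted_data_key: gcm_encrypt(data_key, primary_key) } # stored with the data
end

def envelope_decrypt(message, primary_key)
  data_key = gcm_decrypt(message[:encrypted_data_key], primary_key)
  gcm_decrypt(message[:data], data_key)
end

primary_key = OpenSSL::Random.random_bytes(32)
message = envelope_encrypt("secret", primary_key)
puts envelope_decrypt(message, primary_key) # => secret
```

Only the small data key ever passes through the primary key. That is also why this provider adds storage overhead: each value carries its own encrypted data key.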
### Custom Key Providers
@ -344,7 +342,7 @@ class Article < ApplicationRecord
end
```
The key will be used internally to derive the key used to encrypt and decrypt the data.
Active Record uses this key to derive the key that actually encrypts and decrypts the data.
### Rotating Keys
@ -362,7 +360,7 @@ active_record
key_derivation_salt: a3226b97b3b2f8372d1fc6d497a0c0d3
```
This enables workflows where you keep a short list of keys, by adding new keys, re-encrypting content and deleting old keys.
This enables workflows in which you keep a short list of keys by adding new keys, re-encrypting content, and deleting old keys.
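The rotation workflow above could look like this in a simplified form. The sketch assumes that the last key in the list encrypts new content and that every key is tried, newest first, when decrypting. A wrong key is detected because GCM authentication fails. As the note below says, this applies to non-deterministic encryption only. All method names are hypothetical.

```ruby
require "openssl"

# Assumption: keys are ordered oldest first; the last key encrypts new data.
def encrypt_with_keys(clear_text, keys)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = keys.last
  iv = cipher.random_iv
  ciphertext = cipher.update(clear_text) + cipher.final
  iv + cipher.auth_tag + ciphertext # 12-byte IV, 16-byte tag, then data
end

# Tries each key, newest first; a wrong key makes cipher.final raise.
def decrypt_with_keys(message, keys)
  iv = message.byteslice(0, 12)
  tag = message.byteslice(12, 16)
  ciphertext = message.byteslice(28..)
  keys.reverse_each do |key|
    cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
    cipher.key = key
    cipher.iv = iv
    cipher.auth_tag = tag
    return cipher.update(ciphertext) + cipher.final
  rescue OpenSSL::Cipher::CipherError
    next # authentication failed with this key; try the next one
  end
  raise ArgumentError, "no configured key could decrypt the message"
end

old_key = OpenSSL::Random.random_bytes(32)
new_key = OpenSSL::Random.random_bytes(32)

old_message = encrypt_with_keys("legacy", [old_key]) # written before rotation
keys = [old_key, new_key]                            # rotate: append a new key

puts decrypt_with_keys(old_message, keys) # => legacy
new_message = encrypt_with_keys("fresh", keys) # encrypted with new_key

# Re-encrypt old content; afterwards old_key can be deleted from the list.
reencrypted = encrypt_with_keys(decrypt_with_keys(old_message, keys), keys)
puts decrypt_with_keys(reencrypted, [new_key]) # => legacy
```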
NOTE: Rotating keys is not currently supported for deterministic encryption.
@ -370,19 +368,19 @@ NOTE: Active Record Encryption doesn't provide automatic management of key rotat
### Storing Key References
There is a setting `active_record.encryption.store_key_references` you can use to make `active_record.encryption` store a reference to the encryption key in the encrypted message itself.
You can configure `active_record.encryption.store_key_references` to make `active_record.encryption` store a reference to the encryption key in the encrypted message itself.
```ruby
config.active_record.encryption.store_key_references = true
```
This makes for a more performant decryption since, instead of trying lists of keys, the system can now locate keys directly. The price to pay is storage: encrypted data will be a bit bigger in size.
Doing so makes for more performant decryption because the system can now locate keys directly instead of trying lists of keys. The price to pay is storage: encrypted data will be a bit bigger.
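The trade-off can be sketched by prefixing each message with a short key identifier. Decryption then looks up one key instead of trying each one, at the cost of a few extra bytes per value. The id scheme here (first 4 hex characters of the key's SHA-1 digest), the byte layout and the method names are all hypothetical. They are not Active Record's header format.

```ruby
require "openssl"
require "digest/sha1"

# Hypothetical key id: first 4 hex characters of the key's SHA-1 digest.
def key_id(key)
  Digest::SHA1.hexdigest(key)[0, 4]
end

# Layout: 4-byte key id, 12-byte IV, 16-byte tag, then the ciphertext.
def encrypt_with_reference(clear_text, keys)
  key = keys.last
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(clear_text) + cipher.final
  key_id(key).b + iv + cipher.auth_tag + ciphertext
end

# Picks the key by id instead of trying every configured key.
def decrypt_with_reference(message, keys)
  id = message.byteslice(0, 4)
  key = keys.find { |candidate| key_id(candidate) == id }
  raise KeyError, "no configured key matches id #{id}" unless key

  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = message.byteslice(4, 12)
  cipher.auth_tag = message.byteslice(16, 16)
  cipher.update(message.byteslice(32..)) + cipher.final
end

keys = Array.new(3) { OpenSSL::Random.random_bytes(32) }
message = encrypt_with_reference("direct lookup", keys)
puts decrypt_with_reference(message, keys) # => direct lookup
```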
## API
### Basic API
ActiveRecord encryption is meant to be used declaratively, but it presents an API for advanced usage scenarios.
Active Record Encryption is meant to be used declaratively, but it offers an API for advanced usage scenarios.
#### Encrypt and Decrypt
@ -407,7 +405,7 @@ article.encrypted_attribute?(:title)
### Configuration Options
You can configure Active Record Encryption options by setting them in your `application.rb` (most common scenario) or in a specific environment config file `config/environments/<env name>.rb` if you want to set them on a per-environment basis.
You can configure Active Record Encryption options in your `application.rb` (most common scenario) or in a specific environment config file `config/environments/<env name>.rb` if you want to set them on a per-environment basis.
All the config options are namespaced in `config.active_record.encryption`. For example:
@ -420,19 +418,19 @@ The available config options are:
| Key | Value |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| `support_unencrypted_data` | When true, unencrypted data can be read normally. When false, it will raise. Default: false. |
| `extend_queries` | When true, queries referencing deterministically encrypted attributes will be modified to include additional values if needed. Those additional values will be the clean version of the value, when `support_unencrypted_data` is true) and values encrypted with previous encryption schemes if any (as provided with the `previous:` option). Default: false (experimental). |
| `encrypt_fixtures` | When true, encryptable attributes in fixtures will be automatically encrypted when those are loaded. Default: false. |
| `store_key_references` | When true, a reference to the encryption key is stored in the headers of the encrypted message. This makes for a faster decryption when multiple keys are in use. Default: false. |
| `add_to_filter_parameters` | When true, encrypted attribute names are added automatically to the [list of filtered params](https://guides.rubyonrails.org/configuring.html#rails-general-configuration) that won't be shown in logs. Default: true. |
| `support_unencrypted_data` | When true, unencrypted data can be read normally. When false, it will raise errors. Default: false. |
| `extend_queries` | When true, queries referencing deterministically encrypted attributes will be modified to include additional values if needed. Those additional values will be the clear-text version of the value (when `support_unencrypted_data` is true) and values encrypted with previous encryption schemes, if any (as provided with the `previous:` option). Default: false (experimental). |
| `encrypt_fixtures` | When true, encryptable attributes in fixtures will be automatically encrypted when loaded. Default: false. |
| `store_key_references` | When true, a reference to the encryption key is stored in the headers of the encrypted message. This makes for faster decryption when multiple keys are in use. Default: false. |
| `add_to_filter_parameters` | When true, encrypted attribute names are added automatically to the [list of filtered params](https://guides.rubyonrails.org/configuring.html#rails-general-configuration) and won't be shown in logs. Default: true. |
| `excluded_from_filter_parameters` | You can configure a list of params that won't be filtered out when `add_to_filter_parameters` is true. Default: []. |
| `validate_column_size` | Adds a validation based on the column size. This is recommended to prevent storing huge values using highly compressible payloads. Default: true. |
| `primary_key` | The key or lists of keys that is used to derive root data-encryption keys. They way they are used depends on the key provider configured. It's preferred to configure it via a credential `active_record_encryption.primary_key`. |
| `primary_key` | The key or list of keys used to derive root data-encryption keys. The way they are used depends on the key provider configured. It's preferred to configure it via a credential `active_record_encryption.primary_key`. |
| `deterministic_key` | The key or list of keys used for deterministic encryption. It's preferred to configure it via a credential `active_record_encryption.deterministic_key`. |
| `key_derivation_salt` | The salt used when deriving keys. It's preferred to configure it via a credential `active_record_encryption.key_derivation_salt`. |
| `forced_encoding_for_deterministic_encryption` | The default encoding for attributes encrypted deterministically. You can disable forced encoding by setting this option to `nil`. It's `Encoding::UTF_8` by default. |
NOTE: It's recommended to use Rails built-in credentials support to store keys. If you prefer to set them manually via config properties, make sure you don't commit them with your code (e.g: use environment variables).
NOTE: It's recommended to use Rails built-in credentials support to store keys. If you prefer to set them manually via config properties, make sure you don't commit them with your code (e.g. use environment variables).
### Encryption Contexts
@ -443,11 +441,11 @@ NOTE: Encryption contexts are a flexible but advanced configuration mechanism. M
The main components of encryption contexts are:
* `encryptor`: exposes the internal API for encrypting and decrypting data. It interacts with a `key_provider` to build encrypted messages and deal with their serialization. The encryption/decryption itself is done by the `cipher` and the serialization by `message_serializer`.
* `cipher` the encryption algorithm itself (Aes 256 GCM)
* `key_provider` serves encryption and decryption keys.
* `cipher`: the encryption algorithm itself (AES 256 GCM)
* `key_provider`: serves encryption and decryption keys.
* `message_serializer`: serializes and deserializes encrypted payloads (`Message`).
NOTE: If you decide to build your own `message_serializer`, It's important to use safe mechanisms that can't deserialize arbitrary objects. A common supported scenario is encrypting existing unencrypted data. An attacker can leverage this to enter a tampered payload before encryption takes place and perform RCE attacks. This means custom serializers should avoid `Marshal`, `YAML.load` (use `YAML.safe_load` instead) or `JSON.load` (use `JSON.parse` instead).
NOTE: If you decide to build your own `message_serializer`, it's important to use safe mechanisms that can't deserialize arbitrary objects. A common supported scenario is encrypting existing unencrypted data. An attacker can leverage this to enter a tampered payload before encryption takes place and perform RCE attacks. This means custom serializers should avoid `Marshal`, `YAML.load` (use `YAML.safe_load` instead), or `JSON.load` (use `JSON.parse` instead).
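As an illustration of the advice above, here is a hypothetical serializer that only uses `JSON.generate` and `JSON.parse`. These produce plain Hashes, Arrays, Strings and numbers and never instantiate arbitrary classes. The class name, the `dump`/`load` signatures and the structure check are assumptions for this sketch. They do not reproduce the interface Active Record expects from a custom `message_serializer`.

```ruby
require "json"

# Hypothetical serializer that only ever builds plain data structures.
class SafeMessageSerializer
  def dump(payload:, headers:)
    JSON.generate({ "p" => payload, "h" => headers })
  end

  def load(string)
    # create_additions: false keeps "json_class" entries as plain data.
    data = JSON.parse(string, create_additions: false)
    unless data.is_a?(Hash) && data["p"].is_a?(String) && data["h"].is_a?(Hash)
      raise ArgumentError, "not a valid encrypted message"
    end
    data
  end
end

serializer = SafeMessageSerializer.new
serialized = serializer.dump(payload: "n7J0/ol+a7DRMeaE", headers: { "iv" => "DXZMDWUKfp3bg/Yu" })
puts serializer.load(serialized)["p"] # => n7J0/ol+a7DRMeaE

# A tampered payload naming a class stays an inert Hash.
tampered = '{"json_class":"OpenStruct","p":"x","h":{}}'
puts serializer.load(tampered).class # => Hash
```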
#### Global Encryption Context
@ -489,15 +487,15 @@ ActiveRecord::Encryption.without_encryption do
...
end
```
This means that reading encrypted text will return the ciphertext and saved content will be stored unencrypted.
This means that reading encrypted text will return the ciphertext, and saved content will be stored unencrypted.
##### Protect Encrypted Data
You can run code without encryption but preventing overwriting encrypted content:
You can run code without encryption while preventing encrypted content from being overwritten:
```ruby
ActiveRecord::Encryption.protecting_encrypted_data do
...
end
```
This can be handy if you want to protect encrypted data while still letting someone run arbitrary code against it (e.g: in a Rails console).
This can be handy if you want to protect encrypted data while still running arbitrary code against it (e.g. in a Rails console).
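Block-scoped modes such as `without_encryption` and `protecting_encrypted_data` follow a common Ruby pattern. The mode is kept in a thread-local while the block runs, and the previous value is restored in an `ensure` clause, even if the block raises. The sketch below shows that pattern only. `EncryptionMode` and `store_value` are hypothetical, and the string `"encrypted(...)"` stands in for real encryption.

```ruby
# Hypothetical block-scoped mode stored in a thread-local.
module EncryptionMode
  KEY = :hypothetical_encryption_mode

  def self.current
    Thread.current[KEY] || :encrypt
  end

  def self.with(mode)
    previous = Thread.current[KEY]
    Thread.current[KEY] = mode
    yield
  ensure
    Thread.current[KEY] = previous # restored even if the block raises
  end

  def self.without_encryption(&block)
    with(:disabled, &block)
  end

  def self.protecting_encrypted_data(&block)
    with(:protected, &block)
  end
end

# Hypothetical write path that honors the current mode.
def store_value(value)
  case EncryptionMode.current
  when :encrypt   then "encrypted(#{value})" # placeholder for real encryption
  when :disabled  then value                 # stored as clear text
  when :protected then raise "refusing to overwrite encrypted data"
  end
end

puts store_value("hi")                                       # => encrypted(hi)
puts EncryptionMode.without_encryption { store_value("hi") } # => hi
begin
  EncryptionMode.protecting_encrypted_data { store_value("hi") }
rescue RuntimeError => e
  puts e.message # => refusing to overwrite encrypted data
end
puts EncryptionMode.current # => encrypt
```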