From 3d160480c07cd967480716d7cfbe332dfc53507e Mon Sep 17 00:00:00 2001 From: Yorick Peterse Date: Wed, 31 May 2017 13:47:14 +0200 Subject: [PATCH] Document not storing serialized data --- doc/development/README.md | 1 + doc/development/serializing_data.md | 84 +++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+) create mode 100644 doc/development/serializing_data.md diff --git a/doc/development/README.md b/doc/development/README.md index 934c6849ff9..5989f755a4d 100644 --- a/doc/development/README.md +++ b/doc/development/README.md @@ -49,6 +49,7 @@ - [Adding database indexes](adding_database_indexes.md) - [Post Deployment Migrations](post_deployment_migrations.md) - [Foreign Keys & Associations](foreign_keys.md) +- [Serializing Data](serializing_data.md) ## i18n diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md new file mode 100644 index 00000000000..2b56f48bc44 --- /dev/null +++ b/doc/development/serializing_data.md @@ -0,0 +1,84 @@ +# Serializing Data + +**Summary:** don't store serialized data in the database, use separate columns +and/or tables instead. + +Rails makes it possible to store serialized data in JSON, YAML or other formats. +Such a field can be defined as follows: + +```ruby +class Issue < ActiveRecord::Model + serialize :custom_fields +end +``` + +While it may be tempting to store serialized data in the database there are many +problems with this. This document will outline these problems and provide an +alternative. + +## Serialized Data Is Less Powerful + +When using a relational database you have the ability to query individual +fields, change the schema, index data and so forth. When you use serialized data +all of that becomes either very difficult or downright impossible. While +PostgreSQL does offer the ability to query JSON fields it is mostly meant for +very specialized use cases, and not for more general use. If you use YAML in +turn there's no way to query the data at all. + +## Waste Of Space + +Storing serialized data such as JSON or YAML will end up wasting a lot of space. +This is because these formats often include additional characters (e.g. double +quotes or newlines) besides the data that you are storing. + +## Difficult To Manage + +There comes a time where you will need to add a new field to the serialized +data, or change an existing one. Using serialized data this becomes difficult +and very time consuming as the only way of doing so is to re-write all the +stored values. To do so you would have to: + +1. Retrieve the data +1. Parse it into a Ruby structure +1. Mutate it +1. Serialize it back to a String +1. Store it in the database + +On the other hand, if one were to use regular columns adding a column would be +as easy as this: + +```sql +ALTER TABLE table_name ADD COLUMN column_name type; +``` + +Such a query would take very little to no time and would immediately apply to +all rows, without having to re-write large JSON or YAML structures. + +Finally, there comes a time when the JSON or YAML structure is no longer +sufficient and you need to migrate away from it. When storing only a few rows +this may not be a problem, but when storing millions of rows such a migration +can easily take hours or even days to complete. + +## Relational Databases Are Not Document Stores + +When storing data as JSON or YAML you're essentially using your database as if +it were a document store (e.g. MongoDB), except you're not using any of the +powerful features provided by a typical RDBMS _nor_ are you using any of the +features provided by a typical document store (e.g. the ability to index fields +of documents with variable fields). In other words, it's a waste. + +## Consistent Fields + +One argument sometimes made in favour of serialized data is having to store +widely varying fields and values. Sometimes this is truly the case, and then +perhaps it might make sense to use serialized data. However, in 99% of the cases +the fields and types stored tend to be the same for every row. Even if there is +a slight difference you can still use separate columns and just not set the ones +you don't need. + +## The Solution + +The solution is very simple: just use separate columns and/or separate tables. +This will allow you to use all the features provided by your database, it will +make it easier to manage and migrate the data, you'll conserve space, you can +index the data efficiently and so forth.