gitlab-org--gitlab-foss/doc/development/polymorphic_associations.md

---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Polymorphic Associations

**Summary:** always use separate tables instead of polymorphic associations.

Rails makes it possible to define so-called "polymorphic associations". This
usually works by adding two columns to a table: a target type column, and a
target ID. For example, at the time of writing we have such a setup for
`members` with the following columns:

- `source_type`: a string defining the model to use, can be either `Project` or
  `Namespace`.
- `source_id`: the ID of the row to retrieve based on `source_type`. For
  example, when `source_type` is `Project` then `source_id` contains a
  project ID.
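As a rough illustration, such a layout could look like the following. This is a trimmed-down, hypothetical sketch, not the actual `members` schema: the `user_id` column and all column types are assumptions made for the example.

```sql
-- Hypothetical, simplified polymorphic layout (not the real GitLab schema).
CREATE TABLE members (
    id bigserial PRIMARY KEY,
    user_id bigint NOT NULL,    -- assumed column, for illustration only
    source_type text NOT NULL,  -- 'Project' or 'Namespace'
    source_id bigint NOT NULL   -- ID of a project or a namespace, depending on source_type
);
```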

While such a setup may appear to be useful, it comes with many drawbacks; enough
that you should avoid this at all costs.

## Space Wasted

Because this setup relies on string values to determine the model to use, it
wastes a lot of space. For example, for `Project` and `Namespace` the
maximum size is 9 bytes (the length of `Namespace`), plus 1 extra byte for
every string when using PostgreSQL. While this may only be 10 bytes per row, given enough tables and
rows using such a setup we can end up wasting quite a bit of disk space and
memory (for any indexes).
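If you want to check these numbers against real data, a sketch along these lines may help. It assumes a `members` table with a `source_type` column as described above. `octet_length` and `pg_column_size` are built-in PostgreSQL functions.

```sql
-- Payload size in bytes of the two type names, without any length header.
SELECT octet_length('Project') AS project_bytes,
       octet_length('Namespace') AS namespace_bytes;

-- Size of the stored values, including the length header, grouped by type.
SELECT source_type,
       pg_column_size(source_type) AS stored_bytes,
       count(*) AS row_count
FROM members
GROUP BY source_type, pg_column_size(source_type);
```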

## Indexes

Because our associations are broken up into two columns this may result in
requiring composite indexes for queries to be performed efficiently. While
composite indexes are not wrong at all, they can be tricky to set up as the
ordering of columns in these indexes is important to ensure optimal performance.
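As a minimal sketch, a composite index for the `members` example might be defined as follows. The index name is hypothetical, and whether `source_id` or `source_type` should come first depends on the queries you actually run.

```sql
-- Hypothetical index name. The more selective column (source_id) comes first.
CREATE INDEX index_members_on_source_id_and_source_type
    ON members (source_id, source_type);
```

With this ordering, queries that filter on `source_id` (with or without `source_type`) can use the index, while queries that filter only on `source_type` generally benefit much less.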

## Consistency

One really big problem with polymorphic associations is being unable to enforce
data consistency on the database level using foreign keys. For consistency to be
enforced on the database level one would have to write their own foreign key
logic to support polymorphic associations.

Enforcing consistency on the database level is absolutely crucial for
maintaining a healthy environment, and thus is another reason to avoid
polymorphic associations.
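Without a foreign key, nothing stops a row from pointing at a record that no longer exists. The following sketch looks for such orphaned rows. It assumes that projects live in a `projects` table with an `id` primary key.

```sql
-- Find members whose source_id points at a project that does not exist.
-- Assumes a projects table with an id primary key.
SELECT members.id, members.source_id
FROM members
LEFT JOIN projects ON projects.id = members.source_id
WHERE members.source_type = 'Project'
AND projects.id IS NULL;
```

A similar query would be needed for every other value of `source_type`, which is exactly the kind of hand-written consistency logic a foreign key makes unnecessary.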

## Query Overhead

When using polymorphic associations you always need to filter using both
columns. For example, you may end up writing a query like this:

```sql
SELECT *
FROM members
WHERE source_type = 'Project'
AND source_id = 13083;
```

Here PostgreSQL can perform the query quite efficiently if both columns are
indexed. As the query gets more complex, it may not be able to use these
indexes effectively.
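To see how PostgreSQL handles a more involved filter, you can inspect the query plan with `EXPLAIN`. The query below is only an illustrative sketch; the plan you get depends on your indexes, statistics, and data.

```sql
-- Inspect the plan for a query that mixes both source types.
EXPLAIN
SELECT *
FROM members
WHERE (source_type = 'Project' AND source_id = 13083)
OR (source_type = 'Namespace' AND source_id = 4);
```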

## Mixed Responsibilities

Similar to functions and classes, a table should have a single responsibility:
storing data with a certain set of pre-defined columns. When using polymorphic
associations, you are storing different types of data (possibly with
different columns set) in the same table.

## The Solution

Fortunately, there is a solution to these problems: use a
separate table for every type you would otherwise store in the same table. Using
a separate table allows you to use everything a database may provide to ensure
consistency and query data efficiently, without any additional application logic
being necessary.

Let's say you have a `members` table storing both approved and pending members,
for both projects and groups, and the pending state is determined by the column
`requested_at` being set or not. Schema-wise, such a setup can lead to various
columns only being set for certain rows, wasting space. It's also possible that
certain indexes are only set for certain rows, again wasting space. Finally,
querying such a table requires less-than-ideal queries. For example:

```sql
SELECT *
FROM members
WHERE requested_at IS NULL
AND source_type = 'Namespace'
AND source_id = 4;
```

Instead, such a table should be broken up into separate tables. For example, you
may end up with 4 tables in this case:

- `project_members`
- `group_members`
- `pending_project_members`
- `pending_group_members`
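A minimal sketch of two of these tables could look like this. The referenced `namespaces` and `users` tables, the `user_id` column, the column types, and the index names are all assumptions made for the example.

```sql
-- Approved members of a group. No type column and no nullable requested_at.
CREATE TABLE group_members (
    id bigserial PRIMARY KEY,
    group_id bigint NOT NULL REFERENCES namespaces (id) ON DELETE CASCADE,
    user_id bigint NOT NULL REFERENCES users (id) ON DELETE CASCADE
);

-- Pending members of a group. Here requested_at is always set.
CREATE TABLE pending_group_members (
    id bigserial PRIMARY KEY,
    group_id bigint NOT NULL REFERENCES namespaces (id) ON DELETE CASCADE,
    user_id bigint NOT NULL REFERENCES users (id) ON DELETE CASCADE,
    requested_at timestamptz NOT NULL
);

-- Single-column indexes are enough to look up members by group.
CREATE INDEX index_group_members_on_group_id
    ON group_members (group_id);
CREATE INDEX index_pending_group_members_on_group_id
    ON pending_group_members (group_id);
```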

This makes querying data trivial. For example, to get the members of a group
you'd run:

```sql
SELECT *
FROM group_members
WHERE group_id = 4;
```

To get all the pending members of a group, you'd run:

```sql
SELECT *
FROM pending_group_members
WHERE group_id = 4;
```

If you want to get both, you can use a `UNION`. Be explicit about the columns
you `SELECT`: both queries must return the same number of columns with
compatible types, and the result set uses the column names of the first query.
For example:

```sql
SELECT id, 'Group' AS target_type, group_id AS target_id
FROM group_members

UNION ALL

SELECT id, 'Project' AS target_type, project_id AS target_id
FROM project_members;
```

The above example is perhaps a bit silly, but it shows that there's nothing
stopping you from merging the data together and presenting it on the same page.
Selecting columns explicitly can also speed up queries as the database has to do
less work to get the data (compared to selecting all columns, even ones you're
not using).
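The same approach works for fetching both the approved and the pending members of a single group. The sketch below assumes that both tables have a `user_id` column, which is not shown elsewhere on this page, and adds a literal `pending` column to tell the two sets apart.

```sql
-- Approved and pending members of group 4 in one result set.
-- user_id is an assumed column name.
SELECT id, user_id, false AS pending
FROM group_members
WHERE group_id = 4

UNION ALL

SELECT id, user_id, true AS pending
FROM pending_group_members
WHERE group_id = 4;
```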

Our schema also becomes simpler. We no longer need to store and index the
`source_type` column, we can define foreign keys easily, and we don't need to
filter rows using the `IS NULL` condition.

To summarize: using separate tables allows us to use foreign keys effectively,
create indexes only where necessary, conserve space, query data more
efficiently, and scale these tables more easily (for example, by storing them on
separate disks). A nice side effect of this is that code can also become simpler,
as a single model isn't responsible for handling different kinds of
data.