gitlab-org--gitlab-foss/doc/development/ordering_table_columns.md

# Ordering Table Columns in PostgreSQL

Similar to C structures the space of a table is influenced by the order of
columns. This is because the size of columns is aligned depending on the type of
the following column. Let's consider an example:

- `id` (integer, 4 bytes)
- `name` (text, variable)
- `user_id` (integer, 4 bytes)

The first column is a 4-byte integer. The next is text of variable length. The
`text` data type requires 1-word alignment, and on 64-bit platform, 1 word is 8
bytes. To meet the alignment requirements, four zeros are to be added right
after the first column, so `id` occupies 4 bytes, then 4 bytes of alignment
padding, and only next `name` is being stored. Therefore, in this case, 8 bytes
will be spent for storing a 4-byte integer.

The space between rows is also subject to alignment padding. The `user_id`
column takes only 4 bytes, and on 64-bit platform, 4 zeroes will be added for
alignment padding, to allow storing the next row beginning with the "clear" word.

As a result, the actual size of each column would be (omitting variable length
data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
each row will require at least 16 bytes for the two 4-byte integers. If a table
has a few rows this is not an issue. However, once you start storing millions of
rows you can save space by using a different order. For the above example, the
ideal column order would be the following:

- `id` (integer, 4 bytes)
- `user_id` (integer, 4 bytes)
- `name` (text, variable)

or

- `name` (text, variable)
- `id` (integer, 4 bytes)
- `user_id` (integer, 4 bytes)

In these examples, the `id` and `user_id` columns are packed together, which
means we only need 8 bytes to store _both_ of them. This in turn means each row
will require 8 bytes less space.

For GitLab we require that columns of new tables are ordered based to use the
least amount of space. An easy way of doing this is to order them based on the
type size in descending order with variable sizes (`text`, `varchar`, arrays,
`json`, `jsonb`, and so on) at the end.

## Type Sizes

While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
of information we will list the sizes of common types here so it's easier to
look them up. Here "word" refers to the word size, which is 4 bytes for a 32
bits platform and 8 bytes for a 64 bits platform.

| Type             | Size                                 | Alignment needed |
|:-----------------|:-------------------------------------|:-----------|
| smallint         | 2 bytes                              | 1 word     |
| integer          | 4 bytes                              | 1 word     |
| bigint           | 8 bytes                              | 8 bytes    |
| real             | 4 bytes                              | 1 word     |
| double precision | 8 bytes                              | 8 bytes    |
| boolean          | 1 byte                               | not needed |
| text / string    | variable, 1 byte plus the data       | 1 word     |
| bytea            | variable, 1 or 4 bytes plus the data | 1 word     |
| timestamp        | 8 bytes                              | 8 bytes    |
| timestamptz      | 8 bytes                              | 8 bytes    |
| date             | 4 bytes                              | 1 word     |

A "variable" size means the actual size depends on the value being stored. If
PostgreSQL determines this can be embedded directly into a row it may do so, but
for very large values it will store the data externally and store a pointer (of
1 word in size) in the column. Because of this variable sized columns should
always be at the end of a table.

## Real Example

Let's use the `events` table as an example, which currently has the following
layout:

| Column      | Type                        | Size     |
|:------------|:----------------------------|:---------|
| id          | integer                     | 4 bytes  |
| target_type | character varying           | variable |
| target_id   | integer                     | 4 bytes  |
| title       | character varying           | variable |
| data        | text                        | variable |
| project_id  | integer                     | 4 bytes  |
| created_at  | timestamp without time zone | 8 bytes  |
| updated_at  | timestamp without time zone | 8 bytes  |
| action      | integer                     | 4 bytes  |
| author_id   | integer                     | 4 bytes  |

After adding padding to align the columns this would translate to columns being
divided into fixed size chunks as follows:

| Chunk Size | Columns           |
|:-----------|:------------------|
| 8 bytes    | id                |
| variable   | target_type       |
| 8 bytes    | target_id         |
| variable   | title             |
| variable   | data              |
| 8 bytes    | project_id        |
| 8 bytes    | created_at        |
| 8 bytes    | updated_at        |
| 8 bytes    | action, author_id |

This means that excluding the variable sized data and tuple header, we need at
least 8 * 6 = 48 bytes per row.

We can optimise this by using the following column order instead:

| Column      | Type                        | Size     |
|:------------|:----------------------------|:---------|
| created_at  | timestamp without time zone | 8 bytes  |
| updated_at  | timestamp without time zone | 8 bytes  |
| id          | integer                     | 4 bytes  |
| target_id   | integer                     | 4 bytes  |
| project_id  | integer                     | 4 bytes  |
| action      | integer                     | 4 bytes  |
| author_id   | integer                     | 4 bytes  |
| target_type | character varying           | variable |
| title       | character varying           | variable |
| data        | text                        | variable |

This would produce the following chunks:

| Chunk Size | Columns            |
|:-----------|:-------------------|
| 8 bytes    | created_at         |
| 8 bytes    | updated_at         |
| 8 bytes    | id, target_id      |
| 8 bytes    | project_id, action |
| 8 bytes    | author_id          |
| variable   | target_type        |
| variable   | title              |
| variable   | data               |

Here we only need 40 bytes per row excluding the variable sized data and 24-byte
tuple header. 8 bytes being saved may not sound like much, but for tables as
large as the `events` table it does begin to matter. For example, when storing
80 000 000 rows this translates to a space saving of at least 610 MB, all by
just changing the order of a few columns.
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`# Ordering Table Columns in PostgreSQL`
Add more database development related docs 2017-08-10 11:53:20 -04:00
			`Similar to C structures the space of a table is influenced by the order of`
			`columns. This is because the size of columns is aligned depending on the type of`
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`the following column. Let's consider an example:`
Add more database development related docs 2017-08-10 11:53:20 -04:00
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			- `id` (integer, 4 bytes)
			- `name` (text, variable)
			- `user_id` (integer, 4 bytes)
Add more database development related docs 2017-08-10 11:53:20 -04:00
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`The first column is a 4-byte integer. The next is text of variable length. The`
			`text` data type requires 1-word alignment, and on 64-bit platform, 1 word is 8
			`bytes. To meet the alignment requirements, four zeros are to be added right`
			after the first column, so `id` occupies 4 bytes, then 4 bytes of alignment
			padding, and only next `name` is being stored. Therefore, in this case, 8 bytes
			`will be spent for storing a 4-byte integer.`
Add more database development related docs 2017-08-10 11:53:20 -04:00
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			The space between rows is also subject to alignment padding. The `user_id`
			`column takes only 4 bytes, and on 64-bit platform, 4 zeroes will be added for`
			`alignment padding, to allow storing the next row beginning with the "clear" word.`
Add more database development related docs 2017-08-10 11:53:20 -04:00
Fix misspellings in docs Signed-off-by: Takuya Noguchi <takninnovationresearch@gmail.com> 2019-02-23 07:49:55 -05:00			`As a result, the actual size of each column would be (omitting variable length`
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that`
			`each row will require at least 16 bytes for the two 4-byte integers. If a table`
			`has a few rows this is not an issue. However, once you start storing millions of`
			`rows you can save space by using a different order. For the above example, the`
			`ideal column order would be the following:`

			- `id` (integer, 4 bytes)
			- `user_id` (integer, 4 bytes)
			- `name` (text, variable)

Fix most instances of bare URLs in markdown 2019-01-24 01:52:33 -05:00			`or`
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00
			- `name` (text, variable)
			- `id` (integer, 4 bytes)
			- `user_id` (integer, 4 bytes)

			In these examples, the `id` and `user_id` columns are packed together, which
			`means we only need 8 bytes to store _both_ of them. This in turn means each row`
			`will require 8 bytes less space.`
Add more database development related docs 2017-08-10 11:53:20 -04:00
			`For GitLab we require that columns of new tables are ordered based to use the`
			`least amount of space. An easy way of doing this is to order them based on the`
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			type size in descending order with variable sizes (`text`, `varchar`, arrays,
			`json`, `jsonb`, and so on) at the end.
Add more database development related docs 2017-08-10 11:53:20 -04:00
			`## Type Sizes`

Fix most instances of bare URLs in markdown 2019-01-24 01:52:33 -05:00			`While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty`
Add more database development related docs 2017-08-10 11:53:20 -04:00			`of information we will list the sizes of common types here so it's easier to`
			`look them up. Here "word" refers to the word size, which is 4 bytes for a 32`
			`bits platform and 8 bytes for a 64 bits platform.`

Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`\| Type \| Size \| Alignment needed \|`
Add more database development related docs 2017-08-10 11:53:20 -04:00			`\|:-----------------\|:-------------------------------------\|:-----------\|`
			`\| smallint \| 2 bytes \| 1 word \|`
			`\| integer \| 4 bytes \| 1 word \|`
			`\| bigint \| 8 bytes \| 8 bytes \|`
			`\| real \| 4 bytes \| 1 word \|`
			`\| double precision \| 8 bytes \| 8 bytes \|`
			`\| boolean \| 1 byte \| not needed \|`
			`\| text / string \| variable, 1 byte plus the data \| 1 word \|`
			`\| bytea \| variable, 1 or 4 bytes plus the data \| 1 word \|`
			`\| timestamp \| 8 bytes \| 8 bytes \|`
			`\| timestamptz \| 8 bytes \| 8 bytes \|`
			`\| date \| 4 bytes \| 1 word \|`

			`A "variable" size means the actual size depends on the value being stored. If`
			`PostgreSQL determines this can be embedded directly into a row it may do so, but`
			`for very large values it will store the data externally and store a pointer (of`
			`1 word in size) in the column. Because of this variable sized columns should`
			`always be at the end of a table.`

			`## Real Example`

Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			Let's use the `events` table as an example, which currently has the following
Add more database development related docs 2017-08-10 11:53:20 -04:00			`layout:`

			`\| Column \| Type \| Size \|`
			`\|:------------\|:----------------------------\|:---------\|`
			`\| id \| integer \| 4 bytes \|`
			`\| target_type \| character varying \| variable \|`
			`\| target_id \| integer \| 4 bytes \|`
			`\| title \| character varying \| variable \|`
			`\| data \| text \| variable \|`
			`\| project_id \| integer \| 4 bytes \|`
			`\| created_at \| timestamp without time zone \| 8 bytes \|`
			`\| updated_at \| timestamp without time zone \| 8 bytes \|`
			`\| action \| integer \| 4 bytes \|`
			`\| author_id \| integer \| 4 bytes \|`

			`After adding padding to align the columns this would translate to columns being`
			`divided into fixed size chunks as follows:`

			`\| Chunk Size \| Columns \|`
			`\|:-----------\|:------------------\|`
			`\| 8 bytes \| id \|`
			`\| variable \| target_type \|`
			`\| 8 bytes \| target_id \|`
			`\| variable \| title \|`
			`\| variable \| data \|`
			`\| 8 bytes \| project_id \|`
			`\| 8 bytes \| created_at \|`
			`\| 8 bytes \| updated_at \|`
			`\| 8 bytes \| action, author_id \|`

Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`This means that excluding the variable sized data and tuple header, we need at`
			`least 8 * 6 = 48 bytes per row.`
Add more database development related docs 2017-08-10 11:53:20 -04:00
			`We can optimise this by using the following column order instead:`

			`\| Column \| Type \| Size \|`
			`\|:------------\|:----------------------------\|:---------\|`
			`\| created_at \| timestamp without time zone \| 8 bytes \|`
			`\| updated_at \| timestamp without time zone \| 8 bytes \|`
			`\| id \| integer \| 4 bytes \|`
			`\| target_id \| integer \| 4 bytes \|`
			`\| project_id \| integer \| 4 bytes \|`
			`\| action \| integer \| 4 bytes \|`
			`\| author_id \| integer \| 4 bytes \|`
			`\| target_type \| character varying \| variable \|`
			`\| title \| character varying \| variable \|`
			`\| data \| text \| variable \|`

			`This would produce the following chunks:`

			`\| Chunk Size \| Columns \|`
			`\|:-----------\|:-------------------\|`
			`\| 8 bytes \| created_at \|`
			`\| 8 bytes \| updated_at \|`
			`\| 8 bytes \| id, target_id \|`
			`\| 8 bytes \| project_id, action \|`
			`\| 8 bytes \| author_id \|`
			`\| variable \| target_type \|`
			`\| variable \| title \|`
			`\| variable \| data \|`

Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			`Here we only need 40 bytes per row excluding the variable sized data and 24-byte`
Fix most instances of bare URLs in markdown 2019-01-24 01:52:33 -05:00			`tuple header. 8 bytes being saved may not sound like much, but for tables as`
Postgres alignment padding – logic fixed: alignment padding is applied based on the next column's data type, so, for example, we cannot state that for "boolean" (1 byte of data), the alignment is not needed. This commit fixes the logic in the doc. Update ordering_table_columns.md Typos fixed more polishing of text Minor improvements of text style Get rid of two "following" words in one line, having different meanings More text style polishing according @eread's comments 2018-09-17 02:46:14 -04:00			large as the `events` table it does begin to matter. For example, when storing
			`80 000 000 rows this translates to a space saving of at least 610 MB, all by`
			`just changing the order of a few columns.`