mirror of https://github.com/moby/moby.git synced 2022-11-09 12:21:53 -05:00

Merge pull request #19856 from moxiegirl/carry-close-19240

Updating for CAS changes and new select a driver section
Sebastiaan van Stijn 2016-01-31 15:52:14 +01:00
commit 6f86bcee76
14 changed files with 1163 additions and 484 deletions

@ -10,184 +10,203 @@ parent = "engine_driver"
# Docker and AUFS in practice
AUFS was the first storage driver in use with Docker. As a result, it has a long and close history with Docker, is very stable, has a lot of real-world deployments, and has strong community support. AUFS has several features that make it a good choice for Docker. These features enable:
AUFS was the first storage driver in use with Docker. As a result, it has a
long and close history with Docker, is very stable, has a lot of real-world
deployments, and has strong community support. AUFS has several features that
make it a good choice for Docker. These features enable:
- Fast container startup times.
- Efficient use of storage.
- Efficient use of memory.
Despite its capabilities and long history with Docker, some Linux distributions do not support AUFS. This is usually because AUFS is not included in the mainline (upstream) Linux kernel.
Despite its capabilities and long history with Docker, some Linux distributions
do not support AUFS. This is usually because AUFS is not included in the
mainline (upstream) Linux kernel.
The following sections examine some AUFS features and how they relate to Docker.
The following sections examine some AUFS features and how they relate to
Docker.
## Image layering and sharing with AUFS
AUFS is a *unification filesystem*. This means that it takes multiple directories on a single Linux host, stacks them on top of each other, and provides a single unified view. To achieve this, AUFS uses *union mount*.
AUFS is a *unification filesystem*. This means that it takes multiple
directories on a single Linux host, stacks them on top of each other, and
provides a single unified view. To achieve this, AUFS uses a *union mount*.
AUFS stacks multiple directories and exposes them as a unified view through a single mount point. All of the directories in the stack, as well as the union mount point, must all exist on the same Linux host. AUFS refers to each directory that it stacks as a *branch*.
AUFS stacks multiple directories and exposes them as a unified view through a
single mount point. All of the directories in the stack, as well as the union
mount point, must all exist on the same Linux host. AUFS refers to each
directory that it stacks as a *branch*.
Within Docker, AUFS union mounts enable image layering. The AUFS storage driver implements Docker image layers using this union mount system. AUFS branches correspond to Docker image layers. The diagram below shows a Docker container based on the `ubuntu:latest` image.
Within Docker, AUFS union mounts enable image layering. The AUFS storage driver
implements Docker image layers using this union mount system. AUFS branches
correspond to Docker image layers. The diagram below shows a Docker container
based on the `ubuntu:latest` image.
![](images/aufs_layers.jpg)
This diagram shows the relationship between the Docker image layers and the AUFS branches (directories) in `/var/lib/docker/aufs`. Each image layer and the container layer correspond to an AUFS branch (directory) in the Docker host's local storage area. The union mount point gives the unified view of all layers.
This diagram shows that each image layer, and the container layer, is
represented in the Docker host's filesystem as a directory under
`/var/lib/docker/`. The union mount point provides the unified view of all
layers. As of Docker 1.10, image layer IDs do not correspond to the names of
the directories that contain their data.
AUFS also supports the copy-on-write technology (CoW). Not all storage drivers do.
AUFS also supports the copy-on-write technology (CoW). Not all storage drivers
do.
## Container reads and writes with AUFS
Docker leverages AUFS CoW technology to enable image sharing and minimize the use of disk space. AUFS works at the file level. This means that all AUFS CoW operations copy entire files - even if only a small part of the file is being modified. This behavior can have a noticeable impact on container performance, especially if the files being copied are large, below a lot of image layers, or the CoW operation must search a deep directory tree.
Docker leverages AUFS CoW technology to enable image sharing and minimize the
use of disk space. AUFS works at the file level. This means that all AUFS CoW
operations copy entire files - even if only a small part of the file is being
modified. This behavior can have a noticeable impact on container performance,
especially if the files being copied are large, below a lot of image layers,
or the CoW operation must search a deep directory tree.
Consider, for example, an application running in a container that needs to add a single new value to a large key-value store (file). If this is the first time the file is modified it does not yet exist in the container's top writable layer. So, the CoW must *copy up* the file from the underlying image. The AUFS storage driver searches each image layer for the file. The search order is from top to bottom. When it is found, the entire file is *copied up* to the container's top writable layer. From there, it can be opened and modified.
Larger files obviously take longer to *copy up* than smaller files, and files that exist in lower image layers take longer to locate than those in higher layers. However, a *copy up* operation only occurs once per file on any given container. Subsequent reads and writes happen against the file's copy already *copied-up* to the container's top layer.
Consider, for example, an application running in a container that needs to add a
single new value to a large key-value store (file). If this is the first time
the file is modified, it does not yet exist in the container's top writable
layer. So, the CoW must *copy up* the file from the underlying image. The AUFS
storage driver searches each image layer for the file. The search order is from
top to bottom. When it is found, the entire file is *copied up* to the
container's top writable layer. From there, it can be opened and modified.
Larger files obviously take longer to *copy up* than smaller files, and files
that exist in lower image layers take longer to locate than those in higher
layers. However, a *copy up* operation only occurs once per file on any given
container. Subsequent reads and writes happen against the file's copy already
*copied-up* to the container's top layer.
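The copy-up search described above can be sketched with ordinary directories standing in for AUFS branches. This is a simulation under stated assumptions, not how the AUFS kernel module is implemented: the `copy_up` function, the `demo` directory, and the layer and file names are all hypothetical.

```bash
#!/usr/bin/env bash
# Simulation only: plain directories stand in for AUFS branches.

# copy_up FILE WRITABLE_LAYER IMAGE_LAYER...
# Image layers are listed top to bottom, matching the search order.
copy_up() {
    local file=$1 writable=$2 layer
    shift 2
    # Already copied up: later reads and writes use this copy.
    if [ -e "$writable/$file" ]; then
        return 0
    fi
    for layer in "$@"; do
        if [ -e "$layer/$file" ]; then
            mkdir -p "$(dirname "$writable/$file")"
            # The entire file is copied, however small the pending change.
            cp -p "$layer/$file" "$writable/$file"
            return 0
        fi
    done
    return 1   # the file is not present in any layer
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/container" "$demo/layer2" "$demo/layer1" "$demo/base"
echo "from base"   > "$demo/base/data.db"
echo "from layer2" > "$demo/layer2/app.conf"
echo "layer1 copy" > "$demo/layer1/app.conf"

# First write to data.db: copy it up, then modify the copy.
copy_up data.db "$demo/container" "$demo/layer2" "$demo/layer1" "$demo/base"
echo "new value" >> "$demo/container/data.db"
```

After the run, `$demo/base/data.db` is unchanged and only the copy in `$demo/container` carries the new value. A second call to `copy_up data.db ...` returns immediately because the file already exists in the writable layer.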
## Deleting files with the AUFS storage driver
The AUFS storage driver deletes a file from a container by placing a *whiteout
file* in the container's top layer. The whiteout file effectively obscures the
existence of the file in image's lower, read-only layers. The simplified
existence of the file in the read-only image layers below. The simplified
diagram below shows a container based on an image with three image layers.
![](images/aufs_delete.jpg)
The `file3` was deleted from the container. So, the AUFS storage driver placed
a whiteout file in the container's top layer. This whiteout file effectively
"deletes" `file3` from the container by obscuring any of the original file's
existence in the image's read-only base layer. Of course, the image could have
been in any of the other layers instead or in addition depending on how the
layers are built.
"deletes" `file3` from the container by obscuring any of the original file's
existence in the image's read-only layers. This works the same no matter which
of the image's read-only layers the file exists in.
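A simplified sketch of the whiteout idea follows, again using plain directories in place of AUFS branches. AUFS names a whiteout `.wh.<filename>`; everything else here (the `is_visible` and `delete_file` helpers, the `wh_demo` directory, the layer names) is hypothetical, and the real driver handles more cases than this.

```bash
#!/usr/bin/env bash
# Simulation only: plain directories stand in for AUFS branches.

# is_visible FILE TOP_LAYER IMAGE_LAYER...
# Succeeds if FILE would appear in the unified view.
is_visible() {
    local file=$1 top=$2 layer
    shift 2
    # A whiteout in the top layer hides the file in every layer below.
    if [ -e "$top/$(dirname "$file")/.wh.$(basename "$file")" ]; then
        return 1
    fi
    for layer in "$top" "$@"; do
        if [ -e "$layer/$file" ]; then
            return 0
        fi
    done
    return 1
}

# delete_file FILE TOP_LAYER
# "Deletes" FILE from the container without touching read-only layers.
delete_file() {
    local file=$1 top=$2
    rm -f "$top/$file"
    mkdir -p "$top/$(dirname "$file")"
    : > "$top/$(dirname "$file")/.wh.$(basename "$file")"
}

wh_demo=$(mktemp -d)
mkdir -p "$wh_demo/container" "$wh_demo/layer2" "$wh_demo/layer1" "$wh_demo/base"
echo "data" > "$wh_demo/base/file3"

if is_visible file3 "$wh_demo/container" "$wh_demo/layer2" "$wh_demo/layer1" "$wh_demo/base"; then
    echo "file3 is visible before deletion"
fi
delete_file file3 "$wh_demo/container"
if ! is_visible file3 "$wh_demo/container" "$wh_demo/layer2" "$wh_demo/layer1" "$wh_demo/base"; then
    echo "file3 is hidden, but still present in the base layer"
fi
```

The original `file3` stays in the read-only layer throughout; only the marker in the container's top layer changes what the unified view shows.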
## Configure Docker with AUFS
You can only use the AUFS storage driver on Linux systems with AUFS installed. Use the following command to determine if your system supports AUFS.
You can only use the AUFS storage driver on Linux systems with AUFS installed.
Use the following command to determine if your system supports AUFS.
```bash
$ grep aufs /proc/filesystems
nodev aufs
```
$ grep aufs /proc/filesystems
nodev aufs
This output indicates the system supports AUFS. Once you've verified your
This output indicates the system supports AUFS. Once you've verified your
system supports AUFS, you must instruct the Docker daemon to use it. You do
this from the command line with the `docker daemon` command:
```bash
$ sudo docker daemon --storage-driver=aufs &
```
$ sudo docker daemon --storage-driver=aufs &
Alternatively, you can edit the Docker config file and add the
`--storage-driver=aufs` option to the `DOCKER_OPTS` line.
```bash
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--storage-driver=aufs"
```
# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--storage-driver=aufs"
Once your daemon is running, verify the storage driver with the `docker info` command.
Once your daemon is running, verify the storage driver with the `docker info`
command.
```bash
$ sudo docker info
Containers: 1
Images: 4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 6
Dirperm1 Supported: false
Execution Driver: native-0.2
...output truncated...
```
$ sudo docker info
Containers: 1
Images: 4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 6
Dirperm1 Supported: false
Execution Driver: native-0.2
...output truncated...
The output above shows that the Docker daemon is running the AUFS storage driver on top of an existing ext4 backing filesystem.
The output above shows that the Docker daemon is running the AUFS storage
driver on top of an existing `ext4` backing filesystem.
## Local storage and AUFS
As the `docker daemon` runs with the AUFS driver, the driver stores images and containers on within the Docker host's local storage area in the `/var/lib/docker/aufs` directory.
As the `docker daemon` runs with the AUFS driver, the driver stores images and
containers within the Docker host's local storage area under
`/var/lib/docker/aufs/`.
### Images
Image layers and their contents are stored under
`/var/lib/docker/aufs/diff/<image-id>` directory. The contents of an image
layer in this location includes all the files and directories belonging in that
image layer.
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, image layer IDs do
not correspond to directory names.
The `/var/lib/docker/aufs/layers/` directory contains metadata about how image
layers are stacked. This directory contains one file for every image or
container layer on the Docker host. Inside each file are the image layers names
that exist below it. The diagram below shows an image with 4 layers.
container layer on the Docker host (though file names no longer match image
layer IDs). Inside each file are the names of the directories that exist below
it in the stack.
![](images/aufs_metadata.jpg)
The command below shows the contents of a metadata file in
`/var/lib/docker/aufs/layers/` that lists the three directories that are
stacked below it in the union mount. Remember, these directory names do not map
to image layer IDs with Docker 1.10 and higher.
Inspecting the contents of the file relating to the top layer of the image
shows the three image layers below it. They are listed in the order they are
stacked.
```bash
$ cat /var/lib/docker/aufs/layers/91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
```
$ cat /var/lib/docker/aufs/layers/91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
The base layer in an image has no image layers below it, so its file is empty.
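As a rough sketch, a small helper can turn one of these metadata files into a labelled stack. The `show_stack` function is hypothetical, and the sample file simply reuses the directory names from the example output above; reading the real files under `/var/lib/docker/aufs/layers/` normally requires root.

```bash
#!/usr/bin/env bash

# show_stack METADATA_FILE
# Prints each directory listed in METADATA_FILE (shortened to 12 characters)
# with its position below the layer; the last entry is marked as the base.
show_stack() {
    awk 'NF { lines[++n] = $1 }
         END {
             if (n == 0) { print "base layer (nothing below it)"; exit }
             for (i = 1; i <= n; i++)
                 printf "%d below: %.12s%s\n", i, lines[i], (i == n ? " (base)" : "")
         }' "$1"
}

# Sample metadata file, so the sketch runs without a Docker host.
meta=$(mktemp)
cat > "$meta" <<'EOF'
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
EOF

show_stack "$meta"
```

An empty metadata file, as found for a base layer, produces the single line `base layer (nothing below it)`.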
### Containers
Running containers are mounted at locations in the
`/var/lib/docker/aufs/mnt/<container-id>` directory. This is the AUFS union
mount point that exposes the container and all underlying image layers as a
single unified view. If a container is not running, its directory still exists
but is empty. This is because containers are only mounted when they are running.
Running containers are mounted below `/var/lib/docker/aufs/mnt/<container-id>`.
This is where AUFS places the union mount point that exposes the container and
all underlying image layers as a single unified view. If a container is not
running, it still has a directory here, but the directory is empty. This is
because AUFS only mounts a container when it is running. With Docker 1.10 and
higher, container IDs no longer correspond to directory names under
`/var/lib/docker/aufs/mnt/`.
Container metadata and various config files that are placed into the running
container are stored in `/var/lib/containers/<container-id>`. Files in this
directory exist for all containers on the system, including ones that are
stopped. However, when a container is running the container's log files are also
in this directory.
A container's thin writable layer is stored under
`/var/lib/docker/aufs/diff/<container-id>`. This directory is stacked by AUFS as
the containers top writable layer and is where all changes to the container are
stored. The directory exists even if the container is stopped. This means that
restarting a container will not lose changes made to it. Once a container is
deleted this directory is deleted.
Information about which image layers are stacked below a container's top
writable layer is stored in the following file
`/var/lib/docker/aufs/layers/<container-id>`. The command below shows that the
container with ID `b41a6e5a508d` has 4 image layers below it:
```bash
$ cat /var/lib/docker/aufs/layers/b41a6e5a508dfa02607199dfe51ed9345a675c977f2cafe8ef3e4b0b5773404e-init
91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
```
The image layers are shown in order. In the output above, the layer starting
with image ID "d3a1..." is the image's base layer. The image layer starting
with "91e5..." is the image's topmost layer.
container are stored in `/var/lib/docker/containers/<container-id>`. Files in
this directory exist for all containers on the system, including ones that are
stopped. However, when a container is running the container's log files are
also in this directory.
A container's thin writable layer is stored in a directory under
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, container IDs no
longer correspond to directory names. However, the container's thin writable
layer still exists under here and is stacked by AUFS as the top writable layer
and is where all changes to the container are stored. The directory exists even
if the container is stopped. This means that restarting a container will not
lose changes made to it. Once a container is deleted, its thin writable layer
in this directory is deleted.
## AUFS and Docker performance
To summarize some of the performance related aspects already mentioned:
- The AUFS storage driver is a good choice for PaaS and other similar use-cases where container density is important. This is because AUFS efficiently shares images between multiple running containers, enabling fast container start times and minimal use of disk space.
- The AUFS storage driver is a good choice for PaaS and other similar use-cases
where container density is important. This is because AUFS efficiently shares
images between multiple running containers, enabling fast container start times
and minimal use of disk space.
- The underlying mechanics of how AUFS shares files between image layers and containers uses the system's page cache very efficiently.
- The underlying mechanics of how AUFS shares files between image layers and
containers uses the system's page cache very efficiently.
- The AUFS storage driver can introduce significant latencies into container write performance. This is because the first time a container writes to any file, the file has to be located and copied into the container's top writable layer. These latencies increase and are compounded when these files exist below many image layers and the files themselves are large.
- The AUFS storage driver can introduce significant latencies into container
write performance. This is because the first time a container writes to any
file, the file has to be located and copied into the container's top writable
layer. These latencies increase and are compounded when these files exist below
many image layers and the files themselves are large.
One final point. Data volumes provide the best and most predictable performance.
This is because they bypass the storage driver and do not incur any of the
potential overheads introduced by thin provisioning and copy-on-write. For this
reason, you may want to place heavy write workloads on data volumes.
One final point. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you may want to place heavy write workloads on
data volumes.
## Related information

@ -13,126 +13,118 @@ parent = "engine_driver"
Btrfs is a next generation copy-on-write filesystem that supports many advanced
storage technologies that make it a good fit for Docker. Btrfs is included in
the mainline Linux kernel and its on-disk-format is now considered stable.
However, many of its features are still under heavy development and users should
consider it a fast-moving target.
However, many of its features are still under heavy development and users
should consider it a fast-moving target.
Docker's `btrfs` storage driver leverages many Btrfs features for image and
container management. Among these features are thin provisioning, copy-on-write,
and snapshotting.
container management. Among these features are thin provisioning,
copy-on-write, and snapshotting.
This article refers to Docker's Btrfs storage driver as `btrfs` and the overall Btrfs Filesystem as Btrfs.
This article refers to Docker's Btrfs storage driver as `btrfs` and the overall
Btrfs Filesystem as Btrfs.
>**Note**: The [Commercially Supported Docker Engine (CS-Engine)](https://www.docker.com/compatibility-maintenance) does not currently support the `btrfs` storage driver.
## The future of Btrfs
Btrfs has been long hailed as the future of Linux filesystems. With full support in the mainline Linux kernel, a stable on-disk-format, and active development with a focus on stability, this is now becoming more of a reality.
Btrfs has been long hailed as the future of Linux filesystems. With full
support in the mainline Linux kernel, a stable on-disk-format, and active
development with a focus on stability, this is now becoming more of a reality.
As far as Docker on the Linux platform goes, many people see the `btrfs` storage driver as a potential long-term replacement for the `devicemapper` storage driver. However, at the time of writing, the `devicemapper` storage driver should be considered safer, more stable, and more *production ready*. You should only consider the `btrfs` driver for production deployments if you understand it well and have existing experience with Btrfs.
As far as Docker on the Linux platform goes, many people see the `btrfs`
storage driver as a potential long-term replacement for the `devicemapper`
storage driver. However, at the time of writing, the `devicemapper` storage
driver should be considered safer, more stable, and more *production ready*.
You should only consider the `btrfs` driver for production deployments if you
understand it well and have existing experience with Btrfs.
## Image layering and sharing with Btrfs
Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk components of image and container layers. Btrfs subvolumes look and feel like a normal Unix filesystem. As such, they can have their own internal directory structure that hooks into the wider Unix filesystem.
Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk
components of image and container layers. Btrfs subvolumes look and feel like
a normal Unix filesystem. As such, they can have their own internal directory
structure that hooks into the wider Unix filesystem.
Subvolumes are natively copy-on-write and have space allocated to them on-demand
from an underlying storage pool. They can also be nested and snapped. The
diagram below shows 4 subvolumes. 'Subvolume 2' and 'Subvolume 3' are nested,
whereas 'Subvolume 4' shows its own internal directory tree.
Subvolumes are natively copy-on-write and have space allocated to them
on-demand from an underlying storage pool. They can also be nested and snapped.
The diagram below shows 4 subvolumes. 'Subvolume 2' and 'Subvolume 3' are
nested, whereas 'Subvolume 4' shows its own internal directory tree.
![](images/btfs_subvolume.jpg)
Snapshots are a point-in-time read-write copy of an entire subvolume. They exist directly below the subvolume they were created from. You can create snapshots of snapshots as shown in the diagram below.
Snapshots are a point-in-time read-write copy of an entire subvolume. They
exist directly below the subvolume they were created from. You can create
snapshots of snapshots as shown in the diagram below.
![](images/btfs_snapshots.jpg)
Btrfs allocates space to subvolumes and snapshots on demand from an underlying pool of storage. The unit of allocation is referred to as a *chunk* and *chunks* are normally ~1GB in size.
Btrfs allocates space to subvolumes and snapshots on demand from an underlying
pool of storage. The unit of allocation is referred to as a *chunk*, and
*chunks* are normally ~1GB in size.
Snapshots are first-class citizens in a Btrfs filesystem. This means that they look, feel, and operate just like regular subvolumes. The technology required to create them is built directly into the Btrfs filesystem thanks to its native copy-on-write design. This means that Btrfs snapshots are space efficient with little or no performance overhead. The diagram below shows a subvolume and its snapshot sharing the same data.
Snapshots are first-class citizens in a Btrfs filesystem. This means that they
look, feel, and operate just like regular subvolumes. The technology required
to create them is built directly into the Btrfs filesystem thanks to its
native copy-on-write design. This means that Btrfs snapshots are space
efficient with little or no performance overhead. The diagram below shows a
subvolume and its snapshot sharing the same data.
![](images/btfs_pool.jpg)
Docker's `btrfs` storage driver stores every image layer and container in its own Btrfs subvolume or snapshot. The base layer of an image is stored as a subvolume whereas child image layers and containers are stored as snapshots. This is shown in the diagram below.
Docker's `btrfs` storage driver stores every image layer and container in its
own Btrfs subvolume or snapshot. The base layer of an image is stored as a
subvolume whereas child image layers and containers are stored as snapshots.
This is shown in the diagram below.
![](images/btfs_container_layer.jpg)
The high level process for creating images and containers on Docker hosts running the `btrfs` driver is as follows:
The high level process for creating images and containers on Docker hosts
running the `btrfs` driver is as follows:
1. The image's base layer is stored in a Btrfs subvolume under
1. The image's base layer is stored in a Btrfs *subvolume* under
`/var/lib/docker/btrfs/subvolumes`.
The image ID is used as the subvolume name. E.g., a base layer with image ID
"f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b" will be
stored in
`/var/lib/docker/btrfs/subvolumes/f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b`
2. Subsequent image layers are stored as a Btrfs *snapshot* of the parent
layer's subvolume or snapshot.
2. Subsequent image layers are stored as a Btrfs snapshot of the parent layer's subvolume or snapshot.
The diagram below shows a three-layer image. The base layer is a subvolume. Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of Layer 1's snapshot.
The diagram below shows a three-layer image. The base layer is a subvolume.
Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of
Layer 1's snapshot.
![](images/btfs_constructs.jpg)
As of Docker 1.10, image layer IDs no longer correspond to directory names
under `/var/lib/docker/`.
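The subvolume-then-snapshot layout can be illustrated with the standard `btrfs subvolume create` and `btrfs subvolume snapshot` commands. The sketch below only prints the commands for a three-layer image rather than running them, since running them needs root and a Btrfs filesystem. The `plan_layers` helper and the layer names `base`, `layer1`, and `layer2` are hypothetical, and this is not a description of how the `btrfs` storage driver itself issues these operations.

```bash
#!/usr/bin/env bash

# plan_layers ROOT LAYER...
# Layers are listed base first. Prints the btrfs commands that would build
# the stack: a subvolume for the base, then a snapshot of each parent.
plan_layers() {
    local root=$1 parent="" layer
    shift
    for layer in "$@"; do
        if [ -z "$parent" ]; then
            echo "btrfs subvolume create $root/$layer"
        else
            echo "btrfs subvolume snapshot $root/$parent $root/$layer"
        fi
        parent=$layer
    done
}

plan_layers /var/lib/docker/btrfs/subvolumes base layer1 layer2
```

The first line of output creates the base subvolume; each following line snapshots the previous layer, which mirrors the diagram's chain of base layer, Layer 1, and Layer 2.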
## Image and container on-disk constructs
Image layers and containers are visible in the Docker host's filesystem at
`/var/lib/docker/btrfs/subvolumes/<image-id> OR <container-id>`. Directories for
`/var/lib/docker/btrfs/subvolumes/`. However, as previously stated, directory
names no longer correspond to image layer IDs. That said, directories for
containers are present even for containers with a stopped status. This is
because the `btrfs` storage driver mounts a default, top-level subvolume at
`/var/lib/docker/subvolumes`. All other subvolumes and snapshots exist below
that as Btrfs filesystem objects and not as individual mounts.
The following example shows a single Docker image with four image layers.
```bash
$ sudo docker images -a
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
ubuntu latest 0a17decee413 2 weeks ago 188.3 MB
<none> <none> 3c9a9d7cc6a2 2 weeks ago 188.3 MB
<none> <none> eeb7cb91b09d 2 weeks ago 188.3 MB
<none> <none> f9a9f253f610 2 weeks ago 188.1 MB
```
Each image layer exists as a Btrfs subvolume or snapshot with the same name as its image ID as illustrated by the `btrfs subvolume list` command shown below:
```bash
$ sudo btrfs subvolume list /var/lib/docker
ID 257 gen 9 top level 5 path btrfs/subvolumes/f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b
ID 258 gen 10 top level 5 path btrfs/subvolumes/eeb7cb91b09d5de9edb2798301aeedf50848eacc2123e98538f9d014f80f243c
ID 260 gen 11 top level 5 path btrfs/subvolumes/3c9a9d7cc6a235eb2de58ca9ef3551c67ae42a991933ba4958d207b29142902b
ID 261 gen 12 top level 5 path btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751
```
Under the `/var/lib/docker/btrfs/subvolumes` directory, each of these subvolumes and snapshots are visible as a normal Unix directory:
```bash
$ ls -l /var/lib/docker/btrfs/subvolumes/
total 0
drwxr-xr-x 1 root root 132 Oct 16 14:44 0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751
drwxr-xr-x 1 root root 132 Oct 16 14:44 3c9a9d7cc6a235eb2de58ca9ef3551c67ae42a991933ba4958d207b29142902b
drwxr-xr-x 1 root root 132 Oct 16 14:44 eeb7cb91b09d5de9edb2798301aeedf50848eacc2123e98538f9d014f80f243c
drwxr-xr-x 1 root root 132 Oct 16 14:44 f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b
```
Because Btrfs works at the filesystem level and not the block level, each image
and container layer can be browsed in the filesystem using normal Unix commands.
The example below shows a truncated output of an `ls -l` command against the
image's top layer:
and container layer can be browsed in the filesystem using normal Unix
commands. The example below shows a truncated output of an `ls -l` command
against an image layer:
```bash
$ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/
total 0
drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin
drwxr-xr-x 1 root root 0 Apr 10 2014 boot
drwxr-xr-x 1 root root 882 Oct 9 08:38 dev
drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc
drwxr-xr-x 1 root root 0 Apr 10 2014 home
...output truncated...
```
$ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/
total 0
drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin
drwxr-xr-x 1 root root 0 Apr 10 2014 boot
drwxr-xr-x 1 root root 882 Oct 9 08:38 dev
drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc
drwxr-xr-x 1 root root 0 Apr 10 2014 home
...output truncated...
## Container reads and writes with Btrfs
A container is a space-efficient snapshot of an image. Metadata in the snapshot
points to the actual data blocks in the storage pool. This is the same as with a
subvolume. Therefore, reads performed against a snapshot are essentially the
points to the actual data blocks in the storage pool. This is the same as with
a subvolume. Therefore, reads performed against a snapshot are essentially the
same as reads performed against a subvolume. As a result, no performance
overhead is incurred from the Btrfs driver.
@ -145,28 +137,34 @@ new files to a container's snapshot operate at native Btrfs speeds.
Updating an existing file in a container causes a copy-on-write operation
(technically *redirect-on-write*). The driver leaves the original data and
allocates new space to the snapshot. The updated data is written to this new
space. Then, the driver updates the filesystem metadata in the snapshot to point
to this new data. The original data is preserved in-place for subvolumes and
snapshots further up the tree. This behavior is native to copy-on-write
space. Then, the driver updates the filesystem metadata in the snapshot to
point to this new data. The original data is preserved in-place for subvolumes
and snapshots further up the tree. This behavior is native to copy-on-write
filesystems like Btrfs and incurs very little overhead.
With Btrfs, writing and updating lots of small files can result in slow performance. More on this later.
With Btrfs, writing and updating lots of small files can result in slow
performance. More on this later.
## Configuring Docker with Btrfs
The `btrfs` storage driver only operates on a Docker host where `/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure shows how to configure Btrfs on Ubuntu 14.04 LTS.
The `btrfs` storage driver only operates on a Docker host where
`/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure
shows how to configure Btrfs on Ubuntu 14.04 LTS.
### Prerequisites
If you have already used the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.
If you have already used the Docker daemon on your Docker host and have images
you want to keep, `push` them to Docker Hub or your private Docker Trusted
Registry before attempting this procedure.
Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev/xvdb`. The device identifier may be different in your environment and you should substitute your own values throughout the procedure.
Stop the Docker daemon. Then, ensure that you have a spare block device at
`/dev/xvdb`. The device identifier may be different in your environment and you
should substitute your own values throughout the procedure.
The procedure also assumes your kernel has the appropriate Btrfs modules loaded. To verify this, use the following command:
The procedure also assumes your kernel has the appropriate Btrfs modules
loaded. To verify this, use the following command:
```bash
$ cat /proc/filesystems | grep btrfs
```
$ cat /proc/filesystems | grep btrfs
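For use in a setup script, the same check can be wrapped in a small function that matches the filesystem name exactly instead of as a substring. This is a sketch: `fs_supported` is a hypothetical helper, and it assumes the filesystem type is the last field on each line of `/proc/filesystems`, as in the `nodev aufs` output shown earlier.

```bash
#!/usr/bin/env bash

# fs_supported NAME [FILE]
# Succeeds if NAME is listed as a filesystem type in FILE
# (default: /proc/filesystems).
fs_supported() {
    local name=$1 file=${2:-/proc/filesystems}
    [ -r "$file" ] || return 1
    awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$file"
}

if fs_supported btrfs; then
    echo "btrfs is available"
else
    echo "btrfs is not listed; the kernel module may need to be loaded first"
fi
```

The optional second argument makes the function easy to try against a saved copy of the file, and the same call works for other drivers, for example `fs_supported aufs`.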
### Configure Btrfs on Ubuntu 14.04 LTS
@ -181,7 +179,9 @@ Assuming your system meets the prerequisites, do the following:
2. Create the Btrfs storage pool.
Btrfs storage pools are created with the `mkfs.btrfs` command. Passing multiple devices to the `mkfs.btrfs` command creates a pool across all of those devices. Here you create a pool with a single device at `/dev/xvdb`.
Btrfs storage pools are created with the `mkfs.btrfs` command. Passing
multiple devices to the `mkfs.btrfs` command creates a pool across all of those
devices. Here you create a pool with a single device at `/dev/xvdb`.
$ sudo mkfs.btrfs -f /dev/xvdb
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
@ -199,7 +199,8 @@ Assuming your system meets the prerequisites, do the following:
noted earlier, Btrfs is not currently recommended for production deployments
unless you already have extensive experience.
3. If it does not already exist, create a directory for the Docker host's local storage area at `/var/lib/docker`.
3. If it does not already exist, create a directory for the Docker host's local
storage area at `/var/lib/docker`.
$ sudo mkdir /var/lib/docker
@ -210,7 +211,10 @@ Assuming your system meets the prerequisites, do the following:
$ sudo blkid /dev/xvdb
/dev/xvdb: UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" UUID_SUB="c3927a64-4454-4eef-95c2-a7d44ac0cf27" TYPE="btrfs"
b. Create a `/etc/fstab` entry to automatically mount `/var/lib/docker` each time the system boots.
b. Create an `/etc/fstab` entry to automatically mount `/var/lib/docker`
each time the system boots. Either of the following lines will work; just
remember to replace the UUID value with the one obtained from the previous
command.
/dev/xvdb /var/lib/docker btrfs defaults 0 0
UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" /var/lib/docker btrfs defaults 0 0
@ -223,10 +227,11 @@ Assuming your system meets the prerequisites, do the following:
<output truncated>
/dev/xvdb on /var/lib/docker type btrfs (rw)
The last line in the output above shows the `/dev/xvdb` mounted at `/var/lib/docker` as Btrfs.
The last line in the output above shows the `/dev/xvdb` mounted at
`/var/lib/docker` as Btrfs.
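The UUID-based `/etc/fstab` entry from step b can also be derived from the `blkid` output rather than copied by hand. The sketch below works on the sample output shown in the procedure; the variable names are illustrative, and on a real host you would capture the output of `sudo blkid` for your own device instead.

```bash
# Sample `blkid` output copied from the procedure above.
blkid_line='/dev/xvdb: UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" UUID_SUB="c3927a64-4454-4eef-95c2-a7d44ac0cf27" TYPE="btrfs"'

# Extract the value of the first UUID="..." field after the device name.
uuid=$(printf '%s\n' "$blkid_line" | sed -n 's/^[^ ]* UUID="\([^"]*\)".*/\1/p')

# Build the fstab entry; review it before appending it to /etc/fstab.
fstab_entry="UUID=$uuid /var/lib/docker btrfs defaults 0 0"
echo "$fstab_entry"
```

Mounting by UUID is usually the safer of the two forms, because device names such as `/dev/xvdb` can change between boots.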
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon should automatically load with the `btrfs` storage driver.
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon
should automatically load with the `btrfs` storage driver.
1. Start the Docker daemon.
@ -236,9 +241,10 @@ Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon sh
The procedure for starting the Docker daemon may differ depending on the
Linux distribution you are using.
You can start the Docker daemon with the `btrfs` storage driver by passing
the `--storage-driver=btrfs` flag to the `docker daemon` command or you can
add the `DOCKER_OPTS` line to the Docker config file.
You can force the Docker daemon to start with the `btrfs` storage
driver by either passing the `--storage-driver=btrfs` flag to the
`docker daemon` command at startup, or adding it to the `DOCKER_OPTS` line in
the Docker config file.
2. Verify the storage driver with the `docker info` command.
@ -252,25 +258,54 @@ Your Docker host is now configured to use the `btrfs` storage driver.
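As an illustration of the config file approach mentioned in step 1, the line might look like the following. The file location is an assumption (on Ubuntu 14.04 it is typically `/etc/default/docker`); check where your distribution keeps the Docker config file.

```bash
# /etc/default/docker (assumed location on Ubuntu 14.04)
# Ask the daemon to use the btrfs storage driver at startup.
DOCKER_OPTS="--storage-driver=btrfs"
```

After editing the file, restart the Docker daemon so that the new option takes effect.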
## Btrfs and Docker performance
There are several factors that influence Docker's performance under the `btrfs` storage driver.
There are several factors that influence Docker's performance under the `btrfs`
storage driver.
- **Page caching**. Btrfs does not support page cache sharing. This means that *n* containers accessing the same file require *n* copies to be cached. As a result, the `btrfs` driver may not be the best choice for PaaS and other high density container use cases.
- **Page caching**. Btrfs does not support page cache sharing. This means that
*n* containers accessing the same file require *n* copies to be cached. As a
result, the `btrfs` driver may not be the best choice for PaaS and other high
density container use cases.
- **Small writes**. Containers performing lots of small writes (including Docker hosts that start and stop many containers) can lead to poor use of Btrfs chunks. This can ultimately lead to out-of-space conditions on your Docker host and stop it working. This is currently a major drawback to using current versions of Btrfs.
- **Small writes**. Containers performing lots of small writes (including
Docker hosts that start and stop many containers) can lead to poor use of Btrfs
chunks. This can ultimately lead to out-of-space conditions on your Docker
host and stop it working. This is currently a major drawback to using current
versions of Btrfs.
If you use the `btrfs` storage driver, closely monitor the free space on your Btrfs filesystem using the `btrfs filesystem show` command. Do not trust the output of normal Unix commands such as `df`; always use the Btrfs native commands.
If you use the `btrfs` storage driver, closely monitor the free space on
your Btrfs filesystem using the `btrfs filesystem show` command. Do not trust
the output of normal Unix commands such as `df`; always use the Btrfs native
commands.
- **Sequential writes**. Btrfs writes data to disk via a journaling technique. This can impact sequential writes, reducing performance by up to half.
- **Sequential writes**. Btrfs writes data to disk via a journaling technique.
This can impact sequential writes, reducing performance by up to half.
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write filesystems like Btrfs. Many small random writes can compound this issue. It can manifest as CPU spikes on Docker hosts using SSD media and head thrashing on Docker hosts using spinning media. Both of these result in poor performance.
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
filesystems like Btrfs. Many small random writes can compound this issue. It
can manifest as CPU spikes on Docker hosts using SSD media and head thrashing
on Docker hosts using spinning media. Both of these result in poor performance.
Recent versions of Btrfs allow you to specify `autodefrag` as a mount option. This mode attempts to detect random writes and defragment them. You should perform your own tests before enabling this option on your Docker hosts. Some tests have shown this option has a negative performance impact on Docker hosts performing lots of small writes (including systems that start and stop many containers).
Recent versions of Btrfs allow you to specify `autodefrag` as a mount
option. This mode attempts to detect random writes and defragment them. You
should perform your own tests before enabling this option on your Docker hosts.
Some tests have shown this option has a negative performance impact on Docker
hosts performing lots of small writes (including systems that start and stop
many containers).
- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media. To enable these, mount with the `-o ssd` mount option. These optimizations include enhanced SSD write performance by avoiding things like *seek optimizations* that have no use on SSD media.
- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media.
To enable these, mount with the `-o ssd` mount option. These optimizations
include enhanced SSD write performance by avoiding things like *seek
optimizations* that have no use on SSD media.
Btrfs also supports the TRIM/Discard primitives. However, mounting with the `-o discard` mount option can cause performance issues. Therefore, it is recommended you perform your own tests before using this option.
Btrfs also supports the TRIM/Discard primitives. However, mounting with the
`-o discard` mount option can cause performance issues. Therefore, it is
recommended you perform your own tests before using this option.
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes.
## Related Information

View file

@ -51,56 +51,84 @@ Device Mapper technology works at the block level rather than the file level.
This means that `devicemapper` storage driver's thin provisioning and
copy-on-write operations work with blocks rather than entire files.
>**Note**: Snapshots are also referred to as *thin devices* or *virtual devices*. They all mean the same thing in the context of the `devicemapper` storage driver.
>**Note**: Snapshots are also referred to as *thin devices* or *virtual
>devices*. They all mean the same thing in the context of the `devicemapper`
>storage driver.
With the `devicemapper` the high level process for creating images is as follows:
With `devicemapper` the high level process for creating images is as follows:
1. The `devicemapper` storage driver creates a thin pool.
The pool is created from block devices or loop mounted sparse files (more on this later).
The pool is created from block devices or loop mounted sparse files (more
on this later).
2. Next it creates a *base device*.
A base device is a thin device with a filesystem. You can see which filesystem is in use by running the `docker info` command and checking the `Backing filesystem` value.
A base device is a thin device with a filesystem. You can see which
filesystem is in use by running the `docker info` command and checking the
`Backing filesystem` value.
3. Each new image (and image layer) is a snapshot of this base device.
These are thin provisioned copy-on-write snapshots. This means that they are initially empty and only consume space from the pool when data is written to them.
These are thin provisioned copy-on-write snapshots. This means that they
are initially empty and only consume space from the pool when data is written
to them.
With `devicemapper`, container layers are snapshots of the image they are created from. Just as with images, container snapshots are thin provisioned copy-on-write snapshots. The container snapshot stores all updates to the container. The `devicemapper` allocates space to them on-demand from the pool as and when data is written to the container.
With `devicemapper`, container layers are snapshots of the image they are
created from. Just as with images, container snapshots are thin provisioned
copy-on-write snapshots. The container snapshot stores all updates to the
container. The `devicemapper` allocates space to them on-demand from the pool
as and when data is written to the container.
The high level diagram below shows a thin pool with a base device and two images.
The high level diagram below shows a thin pool with a base device and two
images.
![](images/base_device.jpg)
If you look closely at the diagram you'll see that it's snapshots all the way down. Each image layer is a snapshot of the layer below it. The lowest layer of each image is a snapshot of the base device that exists in the pool. This base device is a `Device Mapper` artifact and not a Docker image layer.
If you look closely at the diagram you'll see that it's snapshots all the way
down. Each image layer is a snapshot of the layer below it. The lowest layer of
each image is a snapshot of the base device that exists in the pool. This
base device is a `Device Mapper` artifact and not a Docker image layer.
A container is a snapshot of the image it is created from. The diagram below shows two containers - one based on the Ubuntu image and the other based on the Busybox image.
A container is a snapshot of the image it is created from. The diagram below
shows two containers - one based on the Ubuntu image and the other based on the
Busybox image.
![](images/two_dm_container.jpg)
## Reads with the devicemapper
Let's look at how reads and writes occur using the `devicemapper` storage driver. The diagram below shows the high level process for reading a single block (`0x44f`) in an example container.
Let's look at how reads and writes occur using the `devicemapper` storage
driver. The diagram below shows the high level process for reading a single
block (`0x44f`) in an example container.
![](images/dm_container.jpg)
1. An application makes a read request for block 0x44f in the container.
1. An application makes a read request for block `0x44f` in the container.
Because the container is a thin snapshot of an image it does not have the data. Instead, it has a pointer (PTR) to where the data is stored in the image snapshot lower down in the image stack.
Because the container is a thin snapshot of an image it does not have the
data. Instead, it has a pointer (PTR) to where the data is stored in the image
snapshot lower down in the image stack.
2. The storage driver follows the pointer to block `0xf33` in the snapshot relating to image layer `a005...`.
2. The storage driver follows the pointer to block `0xf33` in the snapshot
relating to image layer `a005...`.
3. The `devicemapper` copies the contents of block `0xf33` from the image snapshot to memory in the container.
3. The `devicemapper` copies the contents of block `0xf33` from the image
snapshot to memory in the container.
4. The storage driver returns the data to the requesting application.
### Write examples
With the `devicemapper` driver, writing new data to a container is accomplished by an *allocate-on-demand* operation. Updating existing data uses a copy-on-write operation. Because Device Mapper is a block-based technology these operations occur at the block level.
With the `devicemapper` driver, writing new data to a container is accomplished
by an *allocate-on-demand* operation. Updating existing data uses a
copy-on-write operation. Because Device Mapper is a block-based technology
these operations occur at the block level.
For example, when making a small change to a large file in a container, the `devicemapper` storage driver does not copy the entire file. It only copies the blocks to be modified. Each block is 64KB.
For example, when making a small change to a large file in a container, the
`devicemapper` storage driver does not copy the entire file. It only copies the
blocks to be modified. Each block is 64KB.
#### Writing new data
@ -108,9 +136,11 @@ To write 56KB of new data to a container:
1. An application makes a request to write 56KB of new data to the container.
2. The allocate-on-demand operation allocates a single new 64KB block to the containers snapshot.
2. The allocate-on-demand operation allocates a single new 64KB block to the
container's snapshot.
If the write operation is larger than 64KB, multiple new blocks are allocated to the container snapshot.
If the write operation is larger than 64KB, multiple new blocks are
allocated to the container's snapshot.
3. The data is written to the newly allocated block.
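The allocate-on-demand arithmetic above can be sketched as a ceiling division. This is illustrative only: the helper name `blocks_for_write` is hypothetical, and it assumes the write starts on a block boundary.

```bash
# Block size used by the devicemapper driver, as stated above (in KB).
BLOCK_KB=64

# Print how many 64KB blocks are allocated for a new write of the given size (KB).
blocks_for_write() {
    write_kb="$1"
    # Integer ceiling division: round up to a whole number of blocks.
    echo $(( (write_kb + BLOCK_KB - 1) / BLOCK_KB ))
}

blocks_for_write 56    # prints 1: a 56KB write still consumes one full block
blocks_for_write 200   # prints 4: larger writes consume multiple blocks
```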
@ -122,7 +152,8 @@ To modify existing data for the first time:
2. A copy-on-write operation locates the blocks that need updating.
3. The operation allocates new blocks to the container snapshot and copies the data into those blocks.
3. The operation allocates new empty blocks to the container snapshot and
copies the data into those blocks.
4. The modified data is written into the newly allocated blocks.
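To get a feel for how much data a first-time update copies, the sketch below counts the 64KB blocks touched by an update at a given offset. The helper name `cow_blocks` and its arguments are hypothetical; it models only the block arithmetic, not the driver itself.

```bash
# Block size used by the devicemapper driver, as stated above (in KB).
BLOCK_KB=64

# Print the number of blocks a copy-on-write operation must copy for an
# update of length_kb starting at offset_kb within a file.
cow_blocks() {
    offset_kb="$1"
    length_kb="$2"
    first=$(( offset_kb / BLOCK_KB ))                    # first block touched
    last=$(( (offset_kb + length_kb - 1) / BLOCK_KB ))   # last block touched
    echo $(( last - first + 1 ))
}

cow_blocks 0 32     # prints 1: the update fits inside block 0
cow_blocks 48 32    # prints 2: the update straddles blocks 0 and 1
```

The size of the file itself never appears in the calculation, which is the point of block-level copy-on-write: only the touched blocks are copied.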
@ -133,7 +164,8 @@ to the application's read and write operations.
## Configuring Docker with Device Mapper
The `devicemapper` is the default Docker storage driver on some Linux
distributions. This includes RHEL and most of its forks. Currently, the following distributions support the driver:
distributions. This includes RHEL and most of its forks. Currently, the
following distributions support the driver:
* RHEL/CentOS/Fedora
* Ubuntu 12.04
@ -142,9 +174,9 @@ distributions. This includes RHEL and most of its forks. Currently, the followin
Docker hosts running the `devicemapper` storage driver default to a
configuration mode known as `loop-lvm`. This mode uses sparse files to build
the thin pool used by image and container snapshots. The mode is designed to work out-of-the-box
with no additional configuration. However, production deployments should not run
under `loop-lvm` mode.
the thin pool used by image and container snapshots. The mode is designed to
work out-of-the-box with no additional configuration. However, production
deployments should not run under `loop-lvm` mode.
You can detect the mode by viewing the output of the `docker info` command:
@ -161,56 +193,84 @@ You can detect the mode by viewing the `docker info` command:
Library Version: 1.02.93-RHEL7 (2015-01-28)
...
The output above shows a Docker host running with the `devicemapper` storage driver operating in `loop-lvm` mode. This is indicated by the fact that the `Data loop file` and the `Metadata loop file` are files under `/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse files.
The output above shows a Docker host running with the `devicemapper` storage
driver operating in `loop-lvm` mode. This is indicated by the fact that the
`Data loop file` and the `Metadata loop file` are files under
`/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse
files.
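A script can make the same check by looking for the `Data loop file` field. The sketch below is hedged: the helper name `uses_loop_lvm` is hypothetical, the sample line (including the file name `data`) is illustrative, and it assumes the `devicemapper` storage driver is the one in use.

```bash
# Hypothetical helper: reads `docker info` output on stdin and succeeds if
# loop files are reported, which indicates loop-lvm mode.
uses_loop_lvm() {
    grep -q 'Data loop file:'
}

# Illustrative sample line; on a real host you might instead run
#   sudo docker info 2>/dev/null | uses_loop_lvm
sample=' Data loop file: /var/lib/docker/devicemapper/devicemapper/data'

if printf '%s\n' "$sample" | uses_loop_lvm; then
    echo "loop-lvm mode: not recommended for production"
else
    echo "no loop files reported"
fi
```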
### Configure direct-lvm mode for production
The preferred configuration for production deployments is `direct-lvm`. This
mode uses block devices to create the thin pool. The following procedure shows
you how to configure a Docker host to use the `devicemapper` storage driver in a
`direct-lvm` configuration.
you how to configure a Docker host to use the `devicemapper` storage driver in
a `direct-lvm` configuration.
> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.
> **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.
The procedure below will create a 90GB data volume and 4GB metadata volume to use as backing for the storage pool. It assumes that you have a spare block device at `/dev/xvdf` with enough free space to complete the task. The device identifier and volume sizes may be different in your environment and you should substitute your own values throughout the procedure. The procedure also assumes that the Docker daemon is in the `stopped` state.
The procedure below will create a 90GB data volume and 4GB metadata volume to
use as backing for the storage pool. It assumes that you have a spare block
device at `/dev/xvdf` with enough free space to complete the task. The device
identifier and volume sizes may be different in your environment and you
should substitute your own values throughout the procedure. The procedure also
assumes that the Docker daemon is in the `stopped` state.
1. Log in to the Docker host you want to configure and stop the Docker daemon.
2. If it exists, delete your existing image store by removing the `/var/lib/docker` directory.
2. If it exists, delete your existing image store by removing the
`/var/lib/docker` directory.
$ sudo rm -rf /var/lib/docker
3. Create an LVM physical volume (PV) on your spare block device using the `pvcreate` command.
3. Create an LVM physical volume (PV) on your spare block device using the
`pvcreate` command.
$ sudo pvcreate /dev/xvdf
Physical volume `/dev/xvdf` successfully created
The device identifier may be different on your system. Remember to substitute your value in the command above.
The device identifier may be different on your system. Remember to
substitute your value in the command above.
4. Create a new volume group (VG) called `vg-docker` using the PV created in the previous step.
4. Create a new volume group (VG) called `vg-docker` using the PV created in
the previous step.
$ sudo vgcreate vg-docker /dev/xvdf
Volume group `vg-docker` successfully created
5. Create a new 90GB logical volume (LV) called `data` from space in the `vg-docker` volume group.
5. Create a new 90GB logical volume (LV) called `data` from space in the
`vg-docker` volume group.
$ sudo lvcreate -L 90G -n data vg-docker
Logical volume `data` created.
The command creates an LVM logical volume called `data` and an associated block device file at `/dev/vg-docker/data`. In a later step, you instruct the `devicemapper` storage driver to use this block device to store image and container data.
The command creates an LVM logical volume called `data` and an associated
block device file at `/dev/vg-docker/data`. In a later step, you instruct the
`devicemapper` storage driver to use this block device to store image and
container data.
If you receive a signature detection warning, make sure you are working on the correct devices before continuing. Signature warnings indicate that the device you're working on is currently in use by LVM or has been used by LVM in the past.
If you receive a signature detection warning, make sure you are working on
the correct devices before continuing. Signature warnings indicate that the
device you're working on is currently in use by LVM or has been used by LVM in
the past.
6. Create a new logical volume (LV) called `metadata` from space in the `vg-docker` volume group.
6. Create a new logical volume (LV) called `metadata` from space in the
`vg-docker` volume group.
$ sudo lvcreate -L 4G -n metadata vg-docker
Logical volume `metadata` created.
This creates an LVM logical volume called `metadata` and an associated block device file at `/dev/vg-docker/metadata`. In the next step you instruct the `devicemapper` storage driver to use this block device to store image and container metadata.
This creates an LVM logical volume called `metadata` and an associated
block device file at `/dev/vg-docker/metadata`. In the next step you instruct
the `devicemapper` storage driver to use this block device to store image and
container metadata.
5. Start the Docker daemon with the `devicemapper` storage driver and the `--storage-opt` flags.
7. Start the Docker daemon with the `devicemapper` storage driver and the
`--storage-opt` flags.
The `data` and `metadata` devices that you pass to the `--storage-opt` options were created in the previous steps.
The `data` and `metadata` devices that you pass to the `--storage-opt`
options were created in the previous steps.
$ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
[1] 2163
@ -221,11 +281,12 @@ The procedure below will create a 90GB data volume and 4GB metadata volume to us
INFO[0027] Daemon has completed initialization
INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2
It is also possible to set the `--storage-driver` and `--storage-opt` flags in
the Docker config file and start the daemon normally using the `service` or
`systemctl` commands.
It is also possible to set the `--storage-driver` and `--storage-opt` flags
in the Docker config file and start the daemon normally using the `service` or
`systemctl` commands.
6. Use the `docker info` command to verify that the daemon is using the `data` and `metadata` devices you created.
8. Use the `docker info` command to verify that the daemon is using the `data`
and `metadata` devices you created.
$ sudo docker info
INFO[0180] GET /v1.20/info
@ -239,11 +300,14 @@ The procedure below will create a 90GB data volume and 4GB metadata volume to us
Metadata file: /dev/vg-docker/metadata
[...]
The output of the command above shows the storage driver as `devicemapper`. The last two lines also confirm that the correct devices are being used for the `Data file` and the `Metadata file`.
The output of the command above shows the storage driver as `devicemapper`.
The last two lines also confirm that the correct devices are being used for
the `Data file` and the `Metadata file`.
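The LVM commands from steps 3 through 6 above can be collected into one script. The sketch below defaults to a dry run that only prints the commands, because running them against the wrong device is destructive. The `DEVICE`, `VG`, and `DRY_RUN` variables and both function names are illustrative; the LVM commands themselves are the ones shown in the procedure.

```bash
# Illustrative settings; substitute your own device and volume group name.
DEVICE="${DEVICE:-/dev/xvdf}"
VG="${VG:-vg-docker}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the PV, the VG, and the data and metadata LVs, stopping on failure.
create_thin_pool_volumes() {
    run sudo pvcreate "$DEVICE" &&
    run sudo vgcreate "$VG" "$DEVICE" &&
    run sudo lvcreate -L 90G -n data "$VG" &&
    run sudo lvcreate -L 4G -n metadata "$VG"
}

create_thin_pool_volumes
```

Review the printed commands first, then set `DRY_RUN=0` once you are sure the device is the spare one.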
### Examine devicemapper structures on the host
You can use the `lsblk` command to see the device files created above and the `pool` that the `devicemapper` storage driver creates on top of them.
You can use the `lsblk` command to see the device files created above and the
`pool` that the `devicemapper` storage driver creates on top of them.
$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
@ -255,11 +319,14 @@ You can use the `lsblk` command to see the device files created above and the `p
└─vg--docker-metadata 253:1 0 4G 0 lvm
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
The diagram below shows the image from prior examples updated with the detail from the `lsblk` command above.
The diagram below shows the image from prior examples updated with the detail
from the `lsblk` command above.
![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data` and `metadata` devices created earlier. The `devicemapper` constructs the pool name as follows:
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
and `metadata` devices created earlier. The `devicemapper` constructs the pool
name as follows:
```
Docker-MAJ:MIN-INO-pool
@ -268,41 +335,74 @@ Docker-MAJ:MIN-INO-pool
`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and inode.
Because Device Mapper operates at the block level it is more difficult to see
diffs between image layers and containers. However, there are two key
directories. The `/var/lib/docker/devicemapper/mnt` directory contains the mount
points for images and containers. The `/var/lib/docker/devicemapper/metadata`
directory contains one file for every image and container snapshot. The files
contain metadata about each snapshot in JSON format.
diffs between image layers and containers. Docker 1.10 and later no longer
matches image layer IDs with directory names in `/var/lib/docker`. However,
there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory
contains the mount points for image and container layers. The
`/var/lib/docker/devicemapper/metadata` directory contains one file for every
image layer and container snapshot. The files contain metadata about each
snapshot in JSON format.
## Device Mapper and Docker performance
It is important to understand the impact that allocate-on-demand and copy-on-write operations can have on overall container performance.
It is important to understand the impact that allocate-on-demand and
copy-on-write operations can have on overall container performance.
### Allocate-on-demand performance impact
The `devicemapper` storage driver allocates new blocks to a container via an allocate-on-demand operation. This means that each time an app writes to somewhere new inside a container, one or more empty blocks have to be located from the pool and mapped into the container.
The `devicemapper` storage driver allocates new blocks to a container via an
allocate-on-demand operation. This means that each time an app writes to
somewhere new inside a container, one or more empty blocks have to be located
from the pool and mapped into the container.
All blocks are 64KB. A write that uses less than 64KB still results in a single 64KB block being allocated. Writing more than 64KB of data uses multiple 64KB blocks. This can impact container performance, especially in containers that perform lots of small writes. However, once a block is allocated to a container subsequent reads and writes can operate directly on that block.
All blocks are 64KB. A write that uses less than 64KB still results in a single
64KB block being allocated. Writing more than 64KB of data uses multiple 64KB
blocks. This can impact container performance, especially in containers that
perform lots of small writes. However, once a block is allocated to a container
subsequent reads and writes can operate directly on that block.
### Copy-on-write performance impact
Each time a container updates existing data for the first time, the `devicemapper` storage driver has to perform a copy-on-write operation. This copies the data from the image snapshot to the container's snapshot. This process can have a noticeable impact on container performance.
Each time a container updates existing data for the first time, the
`devicemapper` storage driver has to perform a copy-on-write operation. This
copies the data from the image snapshot to the container's snapshot. This
process can have a noticeable impact on container performance.
All copy-on-write operations have a 64KB granularity. As a result, updating 32KB of a 1GB file causes the driver to copy a single 64KB block into the container's snapshot. This has obvious performance advantages over file-level copy-on-write operations which would require copying the entire 1GB file into the container layer.
All copy-on-write operations have a 64KB granularity. As a result, updating
32KB of a 1GB file causes the driver to copy a single 64KB block into the
container's snapshot. This has obvious performance advantages over file-level
copy-on-write operations which would require copying the entire 1GB file into
the container layer.
In practice, however, containers that perform lots of small block writes (<64KB) can perform worse with `devicemapper` than with AUFS.
In practice, however, containers that perform lots of small block writes
(<64KB) can perform worse with `devicemapper` than with AUFS.
### Other device mapper performance considerations
There are several other things that impact the performance of the `devicemapper` storage driver..
There are several other things that impact the performance of the
`devicemapper` storage driver.
- **The mode.** The default mode for Docker running the `devicemapper` storage driver is `loop-lvm`. This mode uses sparse files and suffers from poor performance. It is **not recommended for production**. The recommended mode for production environments is `direct-lvm` where the storage driver writes directly to raw block devices.
- **The mode.** The default mode for Docker running the `devicemapper` storage
driver is `loop-lvm`. This mode uses sparse files and suffers from poor
performance. It is **not recommended for production**. The recommended mode for
production environments is `direct-lvm` where the storage driver writes
directly to raw block devices.
- **High speed storage.** For best performance you should place the `Data file` and `Metadata file` on high speed storage such as SSD. This can be direct attached storage or from a SAN or NAS array.
- **High speed storage.** For best performance you should place the `Data file`
and `Metadata file` on high speed storage such as SSD. This can be direct
attached storage or from a SAN or NAS array.
- **Memory usage.** `devicemapper` is not the most memory efficient Docker storage driver. Launching *n* copies of the same container loads *n* copies of its files into memory. This can have a memory impact on your Docker host. As a result, the `devicemapper` storage driver may not be the best choice for PaaS and other high density use cases.
- **Memory usage.** `devicemapper` is not the most memory efficient Docker
storage driver. Launching *n* copies of the same container loads *n* copies of
its files into memory. This can have a memory impact on your Docker host. As a
result, the `devicemapper` storage driver may not be the best choice for PaaS
and other high density use cases.
One final point, data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
One final point, data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on
data volumes.
## Related Information

View file

@ -13,25 +13,159 @@ weight = -2
# Understand images, containers, and storage drivers
To use storage drivers effectively, you must understand how Docker builds and
stores images. Then, you need an understanding of how these images are used in containers. Finally, you'll need a short introduction to the technologies that enable both images and container operations.
stores images. Then, you need an understanding of how these images are used by
containers. Finally, you'll need a short introduction to the technologies that
enable both images and container operations.
## Images and containers rely on layers
## Images and layers
Each Docker image references a list of read-only layers that represent filesystem differences. Layers are stacked on top of each other to form a base for a container's root filesystem. The diagram below shows the Ubuntu 15.04 image comprising 4 stacked image layers.
Each Docker image references a list of read-only layers that represent
filesystem differences. Layers are stacked on top of each other to form a base
for a container's root filesystem. The diagram below shows the Ubuntu 15.04
image comprising 4 stacked image layers.
![](images/image-layers.jpg)
When you make a change inside a container by, for example, adding a new file to a container created from Ubuntu 15.04 image, you add a new layer on top of the underlying stack. This change creates a new writable layer containing the newly added file on top of the image layers. Each image layer is stored by a cryptographic hash over its contents and multiple images can share the same layers. The diagram below shows a container running the Ubuntu 15.04 image.
The Docker storage driver is responsible for stacking these layers and
providing a single unified view.
When you create a new container, you add a new, thin, writable layer on top of
the underlying stack. This layer is often called the "container layer". All
changes made to the running container - such as writing new files, modifying
existing files, and deleting files - are written to this thin writable
container layer. The diagram below shows a container based on the Ubuntu 15.04
image.
![](images/container-layers.jpg)
The major difference between a container and an image is this writable layer. All writes to the container that add new or modifying existing data are stored in this writable layer. When the container is deleted the writeable layer is also deleted. The image remains unchanged.
### Content addressable storage
Because each container has its own thin writable container layer and all data is stored this container layer, this means that multiple containers can share access to the same underlying image and yet have their own data state. The diagram below shows multiple containers sharing the same Ubuntu 15.04 image.
Docker 1.10 introduced a new content addressable storage model. This is a
completely new way to address image and layer data on disk. Previously, image
and layer data was referenced and stored using a randomly generated UUID. In
the new model this is replaced by a secure *content hash*.
The new model improves security, provides a built-in way to avoid ID
collisions, and guarantees data integrity after pull, push, load, and save
operations. It also enables better sharing of layers by allowing many images to
freely share their layers even if they didn't come from the same build.
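The idea of addressing data by a content hash can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
`content_id` helper and the sample byte strings are hypothetical, and the
sketch does not reproduce exactly what Docker hashes or how it stores the
result on disk.

```python
import hashlib


def content_id(layer_bytes: bytes) -> str:
    """Return an identifier derived purely from the data itself.

    Hypothetical helper for illustration; not Docker's implementation.
    """
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()


# Byte-identical data produced by two unrelated builds gets the same ID,
# so it can be shared instead of stored twice.
layer_from_build_a = b"example layer contents"
layer_from_build_b = b"example layer contents"

# Any change to the data changes the ID, which is what makes
# integrity checks after pull, push, load, and save possible.
modified_layer = b"example layer contents, modified"

same_id = content_id(layer_from_build_a) == content_id(layer_from_build_b)
changed_id = content_id(layer_from_build_a) != content_id(modified_layer)
```

With a randomly generated UUID, neither property holds: identical data would
receive different IDs, and a modified layer could keep its old ID.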
The diagram below shows an updated version of the previous diagram,
highlighting the changes implemented by Docker 1.10.
![](images/container-layers-cas.jpg)
As can be seen, all image layer IDs are cryptographic hashes, whereas the
container ID is still a randomly generated UUID.
There are several things to note regarding the new model. These include:
1. Migration of existing images
2. Image and layer filesystem structures
Existing images, those created and pulled by earlier versions of Docker, need
to be migrated before they can be used with the new model. This migration
involves calculating new secure checksums and is performed automatically the
first time you start an updated Docker daemon. After the migration is complete,
all images and tags will have brand new secure IDs.
Although the migration is automatic and transparent, it is computationally
intensive. This means it can take time if you have lots of image data.
During this time your Docker daemon will not respond to other requests.
A migration tool exists that allows you to migrate existing images to the new
format before upgrading your Docker daemon. This means that upgraded Docker
daemons do not need to perform the migration in-band, and therefore avoid any
associated downtime. It also provides a way to manually migrate existing images
so that they can be distributed to other Docker daemons in your environment
that are already running the latest versions of Docker.
The migration tool is provided by Docker, Inc., and runs as a container. You
can download it from [https://github.com/docker/v1.10-migrator/releases](https://github.com/docker/v1.10-migrator/releases).
While running the "migrator" image you need to expose your Docker host's data
directory to the container. If you are using the default Docker data path, the
command to run the container will look like this:
$ sudo docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
If you use the `devicemapper` storage driver, you will need to include the
`--privileged` option so that the container has access to your storage devices.
#### Migration example
The following example shows the migration tool in use on a Docker host running
version 1.9.1 of the Docker daemon and the AUFS storage driver. The Docker host
is running on a **t2.micro** AWS EC2 instance with 1 vCPU, 1GB RAM, and a
single 8GB general purpose SSD EBS volume. The Docker data directory
(`/var/lib/docker`) was consuming 2GB of space.
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
jenkins latest 285c9f0f9d3d 17 hours ago 708.5 MB
mysql latest d39c3fa09ced 8 days ago 360.3 MB
mongo latest a74137af4532 13 days ago 317.4 MB
postgres latest 9aae83d4127f 13 days ago 270.7 MB
redis latest 8bccd73928d9 2 weeks ago 151.3 MB
centos latest c8a648134623 4 weeks ago 196.6 MB
ubuntu 15.04 c8be1ac8145a 7 weeks ago 131.3 MB
$ du -hs /var/lib/docker
2.0G /var/lib/docker
$ time docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
Unable to find image 'docker/v1.10-migrator:latest' locally
latest: Pulling from docker/v1.10-migrator
ed1f33c5883d: Pull complete
b3ca410aa2c1: Pull complete
2b9c6ed9099e: Pull complete
dce7e318b173: Pull complete
Digest: sha256:bd2b245d5d22dd94ec4a8417a9b81bb5e90b171031c6e216484db3fe300c2097
Status: Downloaded newer image for docker/v1.10-migrator:latest
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d from /var/lib/docker/aufs/diff/01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d"
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d from /var/lib/docker/aufs/diff/07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d"
<snip>
time="2016-01-27T12:32:00Z" level=debug msg="layer dbacfa057b30b1feaf15937c28bd8ca0d6c634fc311ccc35bd8d56d017595d5b took 10.80 seconds"
real 0m59.583s
user 0m0.046s
sys 0m0.008s
The Unix `time` command is prepended to the `docker run` command to produce
timings for the operation. As can be seen, migrating 7 images comprising 2GB
of disk space took approximately 1 minute. However, this included the time
taken to pull the `docker/v1.10-migrator` image (approximately 3.5 seconds).
The same operation on an m4.10xlarge EC2 instance with 40 vCPUs, 160GB RAM and
an 8GB provisioned IOPS EBS volume resulted in the following improved timings:
real 0m9.871s
user 0m0.094s
sys 0m0.021s
This shows that the migration operation is affected by the hardware spec of the
machine performing the migration.
## Container and layers
The major difference between a container and an image is the top writable
layer. All writes to the container that add new or modify existing data are
stored in this writable layer. When the container is deleted the writable layer
is also deleted. The underlying image remains unchanged.
Because each container has its own thin writable container layer, and all
changes are stored in this container layer, multiple containers can share
access to the same underlying image and yet have their own data state. The
diagram below shows multiple containers sharing the same Ubuntu 15.04 image.
![](images/sharing-layers.jpg)
A storage driver is responsible for enabling and managing both the image layers and the writeable container layer. How a storage driver accomplishes these behaviors can vary. Two key technologies behind Docker image and container management are stackable image layers and copy-on-write (CoW).
The Docker storage driver is responsible for enabling and managing both the
image layers and the writable container layer. How a storage driver
accomplishes these can vary between drivers. Two key technologies behind Docker
image and container management are stackable image layers and copy-on-write
(CoW).
## The copy-on-write strategy
@ -40,24 +174,29 @@ Sharing is a good way to optimize resources. People do this instinctively in
daily life. For example, twins Jane and Joseph taking an Algebra class at
different times from different teachers can share the same exercise book by
passing it between each other. Now, suppose Jane gets an assignment to complete
the homework on page 11 in the book. At that point, Jane copies page 11, completes the homework, and hands in her copy. The original exercise book is unchanged and only Jane has a copy of the changed page 11.
the homework on page 11 in the book. At that point, Jane copies page 11,
completes the homework, and hands in her copy. The original exercise book is
unchanged and only Jane has a copy of the changed page 11.
Copy-on-write is a similar strategy of sharing and copying. In this strategy,
system processes that need the same data share the same instance of that data
rather than having their own copy. At some point, if one process needs to modify
or write to the data, only then does the operating system make a copy of the
data for that process to use. Only the process that needs to write has access to
the data copy. All the other processes continue to use the original data.
rather than having their own copy. At some point, if one process needs to
modify or write to the data, only then does the operating system make a copy of
the data for that process to use. Only the process that needs to write has
access to the data copy. All the other processes continue to use the original
data.
Docker uses a copy-on-write technology with both images and containers. This CoW
strategy optimizes both image disk space usage and the performance of container
start times. The next sections look at how copy-on-write is leveraged with
images and containers thru sharing and copying.
Docker uses a copy-on-write technology with both images and containers. This
CoW strategy optimizes both image disk space usage and the performance of
container start times. The next sections look at how copy-on-write is leveraged
with images and containers through sharing and copying.
### Sharing promotes smaller images
This section looks at image layers and copy-on-write technology. All image and container layers exist inside the Docker host's *local storage area* and are managed by the storage driver. It is a location on the host's
filesystem.
This section looks at image layers and copy-on-write technology. All image and
container layers exist inside the Docker host's *local storage area* and are
managed by the storage driver. On Linux-based Docker hosts this is usually
located under `/var/lib/docker/`.
The Docker client reports on image layers when instructed to pull and push
images with `docker pull` and `docker push`. The command below pulls the
@ -65,38 +204,85 @@ images with `docker pull` and `docker push`. The command below pulls the
$ docker pull ubuntu:15.04
15.04: Pulling from library/ubuntu
6e6a100fa147: Pull complete
13c0c663a321: Pull complete
2bd276ed39d5: Pull complete
013f3d01d247: Pull complete
Digest: sha256:c7ecf33cef00ae34b131605c31486c91f5fd9a76315d075db2afd39d1ccdf3ed
1ba8ac955b97: Pull complete
f157c4e5ede7: Pull complete
0b7e98f84c4c: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
Status: Downloaded newer image for ubuntu:15.04
From the output, you'll see that the command actually pulls 4 image layers.
Each of the above lines lists an image layer and its UUID. The combination of
these four layers makes up the `ubuntu:15.04` Docker image.
Each of the above lines lists an image layer and its UUID or cryptographic
hash. The combination of these four layers makes up the `ubuntu:15.04` Docker
image.
The image layers are stored in the Docker host's local storage area. Typically,
the local storage area is in the host's `/var/lib/docker` directory. Depending
on which storage driver the local storage area may be in a different location. You can list the layers in the local storage area. The following example shows the storage as it appears under the AUFS storage driver:
Each of these layers is stored in its own directory inside the Docker host's
local storage area.
$ sudo ls /var/lib/docker/aufs/layers
013f3d01d24738964bb7101fa83a926181d600ebecca7206dced59669e6e6778 2bd276ed39d5fcfd3d00ce0a190beeea508332f5aec3c6a125cc619a3fdbade6
13c0c663a321cd83a97f4ce1ecbaf17c2ba166527c3b06daaefe30695c5fcb8c 6e6a100fa147e6db53b684c8516e3e2588b160fd4898b6265545d5d4edb6796d
Versions of Docker prior to 1.10 stored each layer in a directory with the same
name as the image layer ID. However, this is not the case for images pulled
with Docker version 1.10 and later. For example, the command below shows an
image being pulled from Docker Hub, followed by a directory listing on a host
running version 1.9.1 of the Docker Engine.
If you `pull` another image that shares some of the same image layers as the `ubuntu:15.04` image, the Docker daemon recognize this, and only pull the layers it hasn't already stored. After the second pull, the two images will share any common image layers.
$ docker pull ubuntu:15.04
15.04: Pulling from library/ubuntu
47984b517ca9: Pull complete
df6e891a3ea9: Pull complete
e65155041eed: Pull complete
c8be1ac8145a: Pull complete
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
Status: Downloaded newer image for ubuntu:15.04
You can illustrate this now for yourself. Starting the `ubuntu:15.04` image that
you just pulled, make a change to it, and build a new image based on the change.
One way to do this is using a Dockerfile and the `docker build` command.
$ ls /var/lib/docker/aufs/layers
47984b517ca9ca0312aced5c9698753ffa964c2015f2a5f18e5efa9848cf30e2
c8be1ac8145a6e59a55667f573883749ad66eaeef92b4df17e5ea1260e2d7356
df6e891a3ea9cdce2a388a2cf1b1711629557454fd120abd5be6d32329a0e0ac
e65155041eed7ec58dea78d90286048055ca75d41ea893c7246e794389ecf203
1. In an empty directory, create a simple `Dockerfile` that starts with the ubuntu:15.04 image.
Notice how the four directories match up with the layer IDs of the downloaded
image. Now compare this with the same operations performed on a host running
version 1.10 of the Docker Engine.
$ docker pull ubuntu:15.04
15.04: Pulling from library/ubuntu
1ba8ac955b97: Pull complete
f157c4e5ede7: Pull complete
0b7e98f84c4c: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
Status: Downloaded newer image for ubuntu:15.04
$ ls /var/lib/docker/aufs/layers/
1d6674ff835b10f76e354806e16b950f91a191d3b471236609ab13a930275e24
5dbb0cbe0148cf447b9464a358c1587be586058d9a4c9ce079320265e2bb94e7
bef7199f2ed8e86fa4ada1309cfad3089e0542fec8894690529e4c04a7ca2d73
ebf814eccfe98f2704660ca1d844e4348db3b5ccc637eb905d4818fbfb00a06a
See how the four directories do not match up with the image layer IDs pulled in
the previous step.
Despite the differences between image management before and after version 1.10,
all versions of Docker still allow images to share layers. For example, if you
`pull` an image that shares some of the same image layers as an image that has
already been pulled, the Docker daemon recognizes this, and only pulls the
layers it doesn't already have stored locally. After the second pull, the two
images will share any common image layers.
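The pull behavior described above amounts to fetching only the layers that are
not already in local storage. The following is a minimal sketch of that logic,
assuming made-up layer names and a hypothetical `layers_to_pull` helper; it
models the bookkeeping only and makes no claim about how the Docker daemon
implements it.

```python
def layers_to_pull(image_layers, local_layers):
    """Return the layers of an image that are not stored locally yet.

    Hypothetical helper: image_layers is an ordered list of layer IDs,
    local_layers is the set of layer IDs already on the host.
    """
    return [layer for layer in image_layers if layer not in local_layers]


local_storage = set()

# First pull: nothing is stored locally, so all four layers are fetched.
base_image = ["layer-a", "layer-b", "layer-c", "layer-d"]
first_pull = layers_to_pull(base_image, local_storage)
local_storage.update(first_pull)

# Second pull: an image built on top of the first one shares four layers,
# so only the one new layer needs to be fetched.
derived_image = base_image + ["layer-e"]
second_pull = layers_to_pull(derived_image, local_storage)
local_storage.update(second_pull)
```

After both pulls the host holds five layers in total, not nine, because the
four common layers are stored once and referenced by both images.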
You can illustrate this now for yourself. Starting with the `ubuntu:15.04`
image that you just pulled, make a change to it, and build a new image based on
the change. One way to do this is using a `Dockerfile` and the `docker build`
command.
1. In an empty directory, create a simple `Dockerfile` that starts with the
   ubuntu:15.04 image.
FROM ubuntu:15.04
2. Add a new file called "newfile" in the image's `/tmp` directory with the text "Hello world" in it.
2. Add a new file called "newfile" in the image's `/tmp` directory with the
   text "Hello world" in it.
When you are done, the `Dockerfile` contains two lines:
When you are done, the `Dockerfile` contains two lines:
FROM ubuntu:15.04
@ -104,78 +290,125 @@ One way to do this is using a Dockerfile and the `docker build` command.
3. Save and close the file.
2. From a terminal in the same folder as your Dockerfile, run the following command:
4. From a terminal in the same folder as your `Dockerfile`, run the following
   command:
$ docker build -t changed-ubuntu .
Sending build context to Docker daemon 2.048 kB
Step 0 : FROM ubuntu:15.04
---> 013f3d01d247
Step 1 : RUN echo "Hello world" > /tmp/newfile
---> Running in 2023460815df
---> 03b964f68d06
Removing intermediate container 2023460815df
Successfully built 03b964f68d06
Step 1 : FROM ubuntu:15.04
---> 3f7bcee56709
Step 2 : RUN echo "Hello world" > /tmp/newfile
---> Running in d14acd6fad4e
---> 94e6b7d2c720
Removing intermediate container d14acd6fad4e
Successfully built 94e6b7d2c720
> **Note:** The period (.) at the end of the above command is important. It tells the `docker build` command to use the current working directory as its build context.
> **Note:** The period (.) at the end of the above command is important. It
> tells the `docker build` command to use the current working directory as
> its build context.
The output above shows a new image with image ID `03b964f68d06`.
The output above shows a new image with image ID `94e6b7d2c720`.
3. Run the `docker images` command to verify the new image is in the Docker host's local storage area.
5. Run the `docker images` command to verify the new `changed-ubuntu` image is
   in the Docker host's local storage area.
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
changed-ubuntu latest 03b964f68d06 33 seconds ago 131.4 MB
ubuntu
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
changed-ubuntu latest 03b964f68d06 33 seconds ago 131.4 MB
ubuntu 15.04 013f3d01d247 6 weeks ago 131.3 MB
4. Run the `docker history` command to see which image layers were used to create the new `changed-ubuntu` image.
6. Run the `docker history` command to see which image layers were used to
   create the new `changed-ubuntu` image.
$ docker history changed-ubuntu
IMAGE CREATED CREATED BY SIZE COMMENT
03b964f68d06 About a minute ago /bin/sh -c echo "Hello world" > /tmp/newfile 12 B
013f3d01d247 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B
<missing> 6 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.879 kB
<missing> 6 weeks ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 701 B
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:49710b44e2ae0edef4 131.4 MB
IMAGE CREATED CREATED BY SIZE COMMENT
94e6b7d2c720 2 minutes ago /bin/sh -c echo "Hello world" > /tmp/newfile 12 B
3f7bcee56709 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B
<missing> 6 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.879 kB
<missing> 6 weeks ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 701 B
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:8e4943cd86e9b2ca13 131.3 MB
The `docker history` output shows the new `03b964f68d06` image layer at the
top. You know that the `03b964f68d06` layer was added because it was created
by the `echo "Hello world" > /tmp/newfile` command in your `Dockerfile`.
The 4 image layers below it are the exact same image layers the make up the
ubuntu:15.04 image as their UUIDs match.
The `docker history` output shows the new `94e6b7d2c720` image layer at the
top. You know that this is the new image layer added because it was created
by the `echo "Hello world" > /tmp/newfile` command in your `Dockerfile`.
The 4 image layers below it are the exact same image layers
that make up the `ubuntu:15.04` image.
Notice the new `changed-ubuntu` image does not have its own copies of every layer. As can be seen in the diagram below, the new image is sharing it's four underlying layers with the `ubuntu:15.04` image.
> **Note:** Under the content addressable storage model introduced with Docker
> 1.10, image history data is no longer stored in a config file with each image
> layer. It is now stored as a string of text in a single config file that
> relates to the overall image. This can result in some image layers showing as
> "missing" in the output of the `docker history` command. This is normal
> behaviour and can be ignored.
>
> You may hear images like these referred to as *flat images*.
Notice the new `changed-ubuntu` image does not have its own copies of every
layer. As can be seen in the diagram below, the new image is sharing its four
underlying layers with the `ubuntu:15.04` image.
![](images/saving-space.jpg)
The `docker history` command also shows the size of each image layer. The `03b964f68d06` is only consuming 13 Bytes of disk space. Because all of the layers below it already exist on the Docker host and are shared with the `ubuntu15:04` image, this means the entire `changed-ubuntu` image only consumes 13 Bytes of disk space.
The `docker history` command also shows the size of each image layer. As you
can see, the `94e6b7d2c720` layer is only consuming 12 Bytes of disk space.
This means that the `changed-ubuntu` image we just created is only consuming an
additional 12 Bytes of disk space on the Docker host - all layers below the
`94e6b7d2c720` layer already exist on the Docker host and are shared by other
images.
This sharing of image layers is what makes Docker images and containers so space
efficient.
This sharing of image layers is what makes Docker images and containers so
space efficient.
### Copying makes containers efficient
You learned earlier that a container a Docker image with a thin writable, container layer added. The diagram below shows the layers of a container based on the `ubuntu:15.04` image:
You learned earlier that a container is a Docker image with a thin, writable
container layer added. The diagram below shows the layers of a container based
on the `ubuntu:15.04` image:
![](images/container-layers.jpg)
![](images/container-layers-cas.jpg)
All writes made to a container are stored in the thin writable container layer. The other layers are read-only (RO) image layers and can't be changed. This means that multiple containers can safely share a single underlying image. The diagram below shows multiple containers sharing a single copy of the `ubuntu:15.04` image. Each container has its own thin RW layer, but they all share a single instance of the ubuntu:15.04 image:
All writes made to a container are stored in the thin writable container layer.
The other layers are read-only (RO) image layers and can't be changed. This
means that multiple containers can safely share a single underlying image. The
diagram below shows multiple containers sharing a single copy of the
`ubuntu:15.04` image. Each container has its own thin RW layer, but they all
share a single instance of the ubuntu:15.04 image:
![](images/sharing-layers.jpg)
When a write operation occurs in a container, Docker uses the storage driver to perform a copy-on-write operation. The type of operation depends on the storage driver. For AUFS and OverlayFS storage drivers the copy-on-write operation is pretty much as follows:
When an existing file in a container is modified, Docker uses the storage
driver to perform a copy-on-write operation. The specifics of the operation
depend on the storage driver. For the AUFS and OverlayFS storage drivers, the
copy-on-write operation is roughly as follows:
* Search through the layers for the file to update. The process starts at the top, newest layer and works down to the base layer one-at-a-time.
* Perform a "copy-up" operation on the first copy of the file that is found. A "copy up" copies the file up to the container's own thin writable layer.
* Search through the image layers for the file to update. The process starts
at the top, newest layer and works down to the base layer one layer at a
time.
* Perform a "copy-up" operation on the first copy of the file that is found. A
"copy-up" copies the file up to the container's own thin writable layer.
* Modify the *copy of the file* in the container's thin writable layer.
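The three steps above can be modeled with a small Python sketch. It is a toy
model under stated assumptions: layers are plain dictionaries mapping paths to
file contents, and the `Container` class, its `copy_ups` counter, and the
sample paths are all hypothetical. It does not reflect how AUFS or OverlayFS
are implemented in the kernel.

```python
class Container:
    """Toy model: read-only image layers plus one thin writable layer."""

    def __init__(self, image_layers):
        # image_layers is ordered base layer first, newest layer last.
        # The model treats these dictionaries as read-only.
        self.image_layers = image_layers
        self.writable = {}   # the container's own thin writable layer
        self.copy_ups = 0    # how many copy-up operations have happened

    def read(self, path):
        # The writable layer sits on top, so it is checked first.
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def modify(self, path, transform):
        if path not in self.writable:
            # Step 1: search the image layers from the newest down to the base.
            for layer in reversed(self.image_layers):
                if path in layer:
                    # Step 2: copy the first match up into the writable layer.
                    self.writable[path] = layer[path]
                    self.copy_ups += 1
                    break
            else:
                raise FileNotFoundError(path)
        # Step 3: modify the copy, never the image layer.
        self.writable[path] = transform(self.writable[path])


base_layer = {"/etc/motd": "welcome\n"}
top_layer = {"/tmp/newfile": "Hello world\n"}
image = [base_layer, top_layer]

first = Container(image)
second = Container(image)

first.modify("/tmp/newfile", str.upper)
first.modify("/tmp/newfile", lambda text: text + "again\n")
```

In this model the second call to `modify` finds the file already in the
writable layer, so no further copy-up takes place. The `second` container and
the image layers themselves still see the original file contents.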
BTFS, ZFS, and other drivers handle the copy-on-write differently. You can read more about the methods of these drivers later in their detailed descriptions.
Btrfs, ZFS, and other drivers handle the copy-on-write differently. You can
read more about the methods of these drivers later in their detailed
descriptions.
Containers that write a lot of data will consume more space than containers that do not. This is because most write operations consume new space in the containers thin writable top layer. If your container needs to write a lot of data, you can use a data volume.
Containers that write a lot of data will consume more space than containers
that do not. This is because most write operations consume new space in the
container's thin writable top layer. If your container needs to write a lot of
data, you should consider using a data volume.
A copy-up operation can incur a noticeable performance overhead. This overhead is different depending on which storage driver is in use. However, large files, lots of layers, and deep directory trees can make the impact more noticeable. Fortunately, the operation only occurs the first time any particular file is modified. Subsequent modifications to the same file do not cause a copy-up operation and can operate directly on the file's existing copy already present in container layer.
A copy-up operation can incur a noticeable performance overhead. This overhead
is different depending on which storage driver is in use. However, large files,
lots of layers, and deep directory trees can make the impact more noticeable.
Fortunately, the operation only occurs the first time any particular file is
modified. Subsequent modifications to the same file do not cause a copy-up
operation and can operate directly on the file's existing copy already present
in the container layer.
Let's see what happens if we spin up 5 containers based on our `changed-ubuntu` image we built earlier:
Let's see what happens if we spin up 5 containers based on our `changed-ubuntu`
image we built earlier:
1. From a terminal on your Docker host, run the following `docker run` command 5 times.
1. From a terminal on your Docker host, run the following `docker run` command
5 times.
$ docker run -dit changed-ubuntu bash
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
@ -188,28 +421,38 @@ Let's see what happens if we spin up 5 containers based on our `changed-ubuntu`
$ docker run -dit changed-ubuntu bash
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
This launches 5 containers based on the `changed-ubuntu` image. As the container is created, Docker adds a writable layer and assigns it a UUID. This is the value returned from the `docker run` command.
This launches 5 containers based on the `changed-ubuntu` image. As each
container is created, Docker adds a writable layer and assigns it a random
UUID. This is the value returned from the `docker run` command.
2. Run the `docker ps` command to verify the 5 containers are running.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ad25d06bdf6 changed-ubuntu "bash" About a minute ago Up About a minute stoic_ptolemy
8eb24b3b2d24 changed-ubuntu "bash" About a minute ago Up About a minute pensive_bartik
a651680bd6c2 changed-ubuntu "bash" 2 minutes ago Up 2 minutes hopeful_turing
9280e777d109 changed-ubuntu "bash" 2 minutes ago Up 2 minutes backstabbing_mahavira
75bab0d54f3c changed-ubuntu "bash" 2 minutes ago Up 2 minutes boring_pasteur
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ad25d06bdf6 changed-ubuntu "bash" About a minute ago Up About a minute stoic_ptolemy
8eb24b3b2d24 changed-ubuntu "bash" About a minute ago Up About a minute pensive_bartik
a651680bd6c2 changed-ubuntu "bash" 2 minutes ago Up 2 minutes hopeful_turing
9280e777d109 changed-ubuntu "bash" 2 minutes ago Up 2 minutes backstabbing_mahavira
75bab0d54f3c changed-ubuntu "bash" 2 minutes ago Up 2 minutes boring_pasteur
The output above shows 5 running containers, all sharing the `changed-ubuntu` image. Each `CONTAINER ID` is derived from the UUID when creating each container.
The output above shows 5 running containers, all sharing the
`changed-ubuntu` image. Each `CONTAINER ID` is a shortened form of the UUID
assigned to that container when it was created.
3. List the contents of the local storage area.
$ sudo ls containers
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef 9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4 a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
$ sudo ls /var/lib/docker/containers
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
8eb24b3b2d246f225b24f2fca39625aaad71689c392a7b552b78baf264647373
Docker's copy-on-write strategy not only reduces the amount of space consumed by containers, it also reduces the time required to start a container. At start time, Docker only has to create the thin writable layer for each container. The diagram below shows these 5 containers sharing a single read-only (RO) copy of the `changed-ubuntu` image.
Docker's copy-on-write strategy not only reduces the amount of space consumed
by containers, it also reduces the time required to start a container. At start
time, Docker only has to create the thin writable layer for each container.
The diagram below shows these 5 containers sharing a single read-only (RO)
copy of the `changed-ubuntu` image.
![](images/shared-uuid.jpg)
@ -219,18 +462,30 @@ significantly increased.
## Data volumes and the storage driver
When a container is deleted, any data written to the container that is not stored in a *data volume* is deleted along with the container. A data volume is directory or file that is mounted directly into a container.
When a container is deleted, any data written to the container that is not
stored in a *data volume* is deleted along with the container.
Data volumes are not controlled by the storage driver. Reads and writes to data
volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.
A data volume is a directory or file in the Docker host's filesystem that is
mounted directly into a container. Data volumes are not controlled by the
storage driver. Reads and writes to data volumes bypass the storage driver and
operate at native host speeds. You can mount any number of data volumes into a
container. Multiple containers can also share one or more data volumes.
The diagram below shows a single Docker host running two containers. Each container exists inside of its own address space within the Docker host's local storage area. There is also a single shared data volume located at `/data` on the Docker host. This is mounted directly into both containers.
The diagram below shows a single Docker host running two containers. Each
container exists inside of its own address space within the Docker host's local
storage area (`/var/lib/docker/...`). There is also a single shared data
volume located at `/data` on the Docker host. This is mounted directly into
both containers.
![](images/shared-volume.jpg)
The data volume resides outside of the local storage area on the Docker host further reinforcing its independence from the storage driver's control. When a container is deleted, any data stored in shared data volumes persists on the Docker host.
Data volumes reside outside of the local storage area on the Docker host,
further reinforcing their independence from the storage driver's control. When
a container is deleted, any data stored in data volumes persists on the Docker
host.
For detailed information about data volumes [Managing data in containers](https://docs.docker.com/userguide/dockervolumes/).
For detailed information about data volumes, see
[Managing data in containers](https://docs.docker.com/userguide/dockervolumes/).
## Related information


@ -10,47 +10,83 @@ parent = "engine_driver"
# Docker and OverlayFS in practice
OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison to AUFS, OverlayFS:
OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison
to AUFS, OverlayFS:
* has a simpler design
* has been in the mainline Linux kernel since version 3.18
* is potentially faster
As a result, OverlayFS is rapidly gaining popularity in the Docker community and is seen by many as a natural successor to AUFS. As promising as OverlayFS is, it is still relatively young. Therefore caution should be taken before using it in production Docker environments.
As a result, OverlayFS is rapidly gaining popularity in the Docker community
and is seen by many as a natural successor to AUFS. As promising as OverlayFS
is, it is still relatively young. Therefore caution should be taken before
using it in production Docker environments.
Docker's `overlay` storage driver leverages several OverlayFS features to build and manage the on-disk structures of images and containers.
>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel module* was renamed from "overlayfs" to "overlay". As a result you may see the two terms used interchangeably in some documentation. However, this document uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer to Docker's storage-driver.
Docker's `overlay` storage driver leverages several OverlayFS features to build
and manage the on-disk structures of images and containers.
>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel
>module* was renamed from "overlayfs" to "overlay". As a result you may see the
> two terms used interchangeably in some documentation. However, this document
> uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer
> to Docker's storage driver.
## Image layering and sharing with OverlayFS
OverlayFS takes two directories on a single Linux host, layers one on top of the other, and provides a single unified view. These directories are often referred to as *layers* and the technology used to layer them is known as a *union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and "upperdir" for the top layer. The unified view is exposed through its own directory called "merged".
OverlayFS takes two directories on a single Linux host, layers one on top of
the other, and provides a single unified view. These directories are often
referred to as *layers* and the technology used to layer them is known as a
*union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and
"upperdir" for the top layer. The unified view is exposed through its own
directory called "merged".
The diagram below shows how a Docker image and a Docker container are layered. The image layer is the "lowerdir" and the container layer is the "upperdir". The unified view is exposed through a directory called "merged" which is effectively the containers mount point. The diagram shows how Docker constructs map to OverlayFS constructs.
The diagram below shows how a Docker image and a Docker container are layered.
The image layer is the "lowerdir" and the container layer is the "upperdir".
The unified view is exposed through a directory called "merged" which is
effectively the container's mount point. The diagram shows how Docker constructs
map to OverlayFS constructs.
![](images/overlay_constructs.jpg)
Notice how the image layer and container layer can contain the same files. When this happens, the files in the container layer ("upperdir") are dominant and obscure the existence of the same files in the image layer ("lowerdir"). The container mount ("merged") presents the unified view.
Notice how the image layer and container layer can contain the same files. When
this happens, the files in the container layer ("upperdir") are dominant and
obscure the existence of the same files in the image layer ("lowerdir"). The
container mount ("merged") presents the unified view.
OverlayFS only works with two layers. This means that multi-layered images cannot be implemented as multiple OverlayFS layers. Instead, each image layer is implemented as its own directory under `/var/lib/docker/overlay`. Hard links are then used as a space-efficient way to reference data shared with lower layers. The diagram below shows a four-layer image and how it is represented in the Docker host's filesystem.
OverlayFS only works with two layers. This means that multi-layered images
cannot be implemented as multiple OverlayFS layers. Instead, each image layer
is implemented as its own directory under `/var/lib/docker/overlay`.
Hard links are then used as a space-efficient way to reference data shared with
lower layers. As of Docker 1.10, image layer IDs no longer correspond to
directory names in `/var/lib/docker/`.
![](images/overlay_constructs2.jpg)
To create a container, the `overlay` driver combines the directory representing the image's top layer plus a new directory for the container. The image's top layer is the "lowerdir" in the overlay and read-only. The new directory for the container is the "upperdir" and is writable.
To create a container, the `overlay` driver combines the directory representing
the image's top layer plus a new directory for the container. The image's top
layer is the "lowerdir" in the overlay and read-only. The new directory for the
container is the "upperdir" and is writable.
## Example: Image and container on-disk constructs
The following `docker images -a` command shows a Docker host with a single image. As can be seen, the image consists of four layers.
The following `docker pull` command shows a Docker host downloading a
Docker image comprising four layers.
$ docker images -a
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
ubuntu latest 1d073211c498 7 days ago 187.9 MB
<none> <none> 5a4526e952f0 7 days ago 187.9 MB
<none> <none> 99fcaefe76ef 7 days ago 187.9 MB
<none> <none> c63fb41c2213 7 days ago 187.7 MB
$ sudo docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
8387d9ff0016: Pull complete
3b52deaaf0ed: Pull complete
4bd501fad6de: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:457b05828bdb5dcc044d93d042863fba3f2158ae249a6db5ae3934307c757c54
Status: Downloaded newer image for ubuntu:latest
Below, the command's output illustrates that each of the four image layers has it's own directory under `/var/lib/docker/overlay/`.
Each image layer has its own directory under `/var/lib/docker/overlay/`. This
is where the contents of each image layer are stored.
The output of the command below shows the four directories that store the
contents of each image layer just pulled. However, as can be seen, the image
layer IDs do not match the directory names in `/var/lib/docker/overlay`. This
is normal behavior in Docker 1.10 and later.
$ ls -l /var/lib/docker/overlay/
total 24
@ -59,35 +95,42 @@ Below, the command's output illustrates that each of the four image layers has i
drwx------ 5 root root 4096 Oct 28 11:06 99fcaefe76ef1aa4077b90a413af57fd17d19dce4e50d7964a273aae67055235
drwx------ 3 root root 4096 Oct 28 11:01 c63fb41c2213f511f12f294dd729b9903a64d88f098c20d2350905ac1fdbcbba
Each directory is named after the image layer IDs in the previous `docker images -a` command. The image layer directories contain the files unique to that layer as well as hard links to the data that is shared with lower layers. This allows for efficient use of disk space.
The image layer directories contain the files unique to that layer as well as
hard links to the data that is shared with lower layers. This allows for
efficient use of disk space.
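The hard-link sharing described above can be tried without Docker. The following is a minimal sketch, assuming a throwaway temporary directory. The `layer1` and `layer2` directories and the `tool` file are hypothetical stand-ins, not real entries under `/var/lib/docker/overlay`.

```shell
#!/bin/sh
# Illustration only: two hypothetical "layer" directories share one file
# through a hard link, the mechanism the text describes for image layers.
demo=$(mktemp -d)
mkdir -p "$demo/layer1/bin" "$demo/layer2/bin"

# The lower layer owns the file.
echo "shared data" > "$demo/layer1/bin/tool"

# The upper layer references the same data with a hard link (no copy is made).
ln "$demo/layer1/bin/tool" "$demo/layer2/bin/tool"

# Both names show the same inode number, so the data is stored only once.
ls -li "$demo/layer1/bin/tool" "$demo/layer2/bin/tool"
```

Because both paths point at one inode, adding the second name consumes no additional data blocks.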
The following `docker ps` command shows the same Docker host running a single container. The container ID is "73de7176c223".
Containers also exist on-disk in the Docker host's filesystem under
`/var/lib/docker/overlay/`. If you inspect the directory relating to a running
container using the `ls -l` command, you find the following file and
directories.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
73de7176c223 ubuntu "bash" 2 days ago Up 2 days stupefied_nobel
This container exists on-disk in the Docker host's filesystem under `/var/lib/docker/overlay/73de7176c223...`. If you inspect this directory using the `ls -l` command you find the following file and directories.
$ ls -l /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e
$ ls -l /var/lib/docker/overlay/<directory-of-running-container>
total 16
-rw-r--r-- 1 root root 64 Oct 28 11:06 lower-id
drwxr-xr-x 1 root root 4096 Oct 28 11:06 merged
drwxr-xr-x 4 root root 4096 Oct 28 11:06 upper
drwx------ 3 root root 4096 Oct 28 11:06 work
These four filesystem objects are all artifacts of OverlayFS. The "lower-id" file contains the ID of the top layer of the image the container is based on. This is used by OverlayFS as the "lowerdir".
These four filesystem objects are all artefacts of OverlayFS. The "lower-id"
file contains the ID of the top layer of the image the container is based on.
This is used by OverlayFS as the "lowerdir".
$ cat /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e/lower-id
1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e
The "upper" directory is the containers read-write layer. Any changes made to the container are written to this directory.
The "upper" directory is the container's read-write layer. Any changes made to
the container are written to this directory.
The "merged" directory is effectively the containers mount point. This is where the unified view of the image ("lowerdir") and container ("upperdir") is exposed. Any changes written to the container are immediately reflected in this directory.
The "merged" directory is effectively the container's mount point. This is where
the unified view of the image ("lowerdir") and container ("upperdir") is
exposed. Any changes written to the container are immediately reflected in this
directory.
The "work" directory is required for OverlayFS to function. It is used for things such as *copy_up* operations.
The "work" directory is required for OverlayFS to function. It is used for
things such as *copy_up* operations.
You can verify all of these constructs from the output of the `mount` command. (Ellipses and line breaks are used in the output below to enhance readability.)
You can verify all of these constructs from the output of the `mount` command.
(Ellipses and line breaks are used in the output below to enhance readability.)
$ mount | grep overlay
overlay on /var/lib/docker/overlay/73de7176c223.../merged
@ -95,39 +138,73 @@ You can verify all of these constructs from the output of the `mount` command. (
upperdir=/var/lib/docker/overlay/73de7176c223.../upper,
workdir=/var/lib/docker/overlay/73de7176c223.../work)
The output reflects the overlay is mounted as read-write ("rw").
The output reflects that the overlay is mounted as read-write ("rw").
## Container reads and writes with overlay
Consider three scenarios where a container opens a file for read access with overlay.
Consider three scenarios where a container opens a file for read access with
overlay.
- **The file does not exist in the container layer**. If a container opens a file for read access and the file does not already exist in the container ("upperdir") it is read from the image ("lowerdir"). This should incur very little performance overhead.
- **The file does not exist in the container layer**. If a container opens a
file for read access and the file does not already exist in the container
("upperdir") it is read from the image ("lowerdir"). This should incur very
little performance overhead.
- **The file only exists in the container layer**. If a container opens a file for read access and the file exists in the container ("upperdir") and not in the image ("lowerdir"), it is read directly from the container.
- **The file only exists in the container layer**. If a container opens a file
for read access and the file exists in the container ("upperdir") and not in
the image ("lowerdir"), it is read directly from the container.
- **The file exists in the container layer and the image layer**. If a container opens a file for read access and the file exists in the image layer and the container layer, the file's version in the container layer is read. This is because files in the container layer ("upperdir") obscure files with the same name in the image layer ("lowerdir").
- **The file exists in the container layer and the image layer**. If a
container opens a file for read access and the file exists in the image layer
and the container layer, the file's version in the container layer is read.
This is because files in the container layer ("upperdir") obscure files with
the same name in the image layer ("lowerdir").
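The three read scenarios can be mimicked with ordinary directories. This is only a simulation under stated assumptions: two temporary directories stand in for "lowerdir" and "upperdir", and `overlay_read` is a hypothetical helper written for this sketch. The real lookup is performed by OverlayFS inside the kernel.

```shell
#!/bin/sh
# Simulation only: plain directories stand in for the two OverlayFS layers.
lower=$(mktemp -d)   # stands in for the image layer ("lowerdir")
upper=$(mktemp -d)   # stands in for the container layer ("upperdir")

# overlay_read (hypothetical): prefer the container layer and fall back
# to the image layer when the container layer has no such file.
overlay_read() {
    if [ -e "$upper/$1" ]; then
        cat "$upper/$1"
    elif [ -e "$lower/$1" ]; then
        cat "$lower/$1"
    else
        echo "no such file: $1" >&2
        return 1
    fi
}

echo "from image"        > "$lower/only-in-image"
echo "from container"    > "$upper/only-in-container"
echo "image version"     > "$lower/in-both"
echo "container version" > "$upper/in-both"

overlay_read only-in-image       # scenario 1: served from the image layer
overlay_read only-in-container   # scenario 2: served from the container layer
overlay_read in-both             # scenario 3: container copy obscures the image copy
```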
Consider some scenarios where files in a container are modified.
- **Writing to a file for the first time**. The first time a container writes to an existing file, that file does not exist in the container ("upperdir"). The `overlay` driver performs a *copy_up* operation to copy the file from the image ("lowerdir") to the container ("upperdir"). The container then writes the changes to the new copy of the file in the container layer.
- **Writing to a file for the first time**. The first time a container writes
to an existing file, that file does not exist in the container ("upperdir").
The `overlay` driver performs a *copy_up* operation to copy the file from the
image ("lowerdir") to the container ("upperdir"). The container then writes the
changes to the new copy of the file in the container layer.
However, OverlayFS works at the file level not the block level. This means that all OverlayFS copy-up operations copy entire files, even if the file is very large and only a small part of it is being modified. This can have a noticeable impact on container write performance. However, two things are worth noting:
However, OverlayFS works at the file level, not the block level. This means
that all OverlayFS copy-up operations copy entire files, even if the file is
very large and only a small part of it is being modified. This can have a
noticeable impact on container write performance. However, two things are
worth noting:
* The copy_up operation only occurs the first time any given file is written to. Subsequent writes to the same file will operate against the copy of the file already copied up to the container.
* The copy_up operation only occurs the first time any given file is
written to. Subsequent writes to the same file will operate against the copy of
the file already copied up to the container.
* OverlayFS only works with two layers. This means that performance should be better than AUFS which can suffer noticeable latencies when searching for files in images with many layers.
* OverlayFS only works with two layers. This means that performance should
be better than AUFS which can suffer noticeable latencies when searching for
files in images with many layers.
- **Deleting files and directories**. When files are deleted within a container a *whiteout* file is created in the containers "upperdir". The version of the file in the image layer ("lowerdir") is not deleted. However, the whiteout file in the container obscures it.
- **Deleting files and directories**. When files are deleted within a container,
a *whiteout* file is created in the container's "upperdir". The version of the
file in the image layer ("lowerdir") is not deleted. However, the whiteout file
in the container obscures it.
Deleting a directory in a container results in *opaque directory* being created in the "upperdir". This has the same effect as a whiteout file and effectively masks the existence of the directory in the image's "lowerdir".
Deleting a directory in a container results in an *opaque directory* being
created in the "upperdir". This has the same effect as a whiteout file and
effectively masks the existence of the directory in the image's "lowerdir".
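The write and delete behavior above can be sketched the same way. This is a simulation, not the kernel's implementation: `overlay_write` and `overlay_delete` are hypothetical helpers, plain directories stand in for the layers, and the `.whiteout` suffix is only a marker invented for this sketch (OverlayFS records whiteouts differently).

```shell
#!/bin/sh
# Simulation only: plain directories stand in for the two OverlayFS layers.
lower=$(mktemp -d)   # stands in for the image layer ("lowerdir")
upper=$(mktemp -d)   # stands in for the container layer ("upperdir")

# overlay_write (hypothetical): on the first write, copy the whole file up
# from the image layer, then apply the change to the container's copy.
overlay_write() {
    if [ ! -e "$upper/$1" ] && [ -e "$lower/$1" ]; then
        cp -p "$lower/$1" "$upper/$1"   # copy_up: the entire file is copied
    fi
    printf '%s\n' "$2" >> "$upper/$1"
}

# overlay_delete (hypothetical): leave the image copy untouched and record
# a marker in the container layer that hides it.
overlay_delete() {
    rm -f "$upper/$1"
    : > "$upper/$1.whiteout"
}

echo "original" > "$lower/config"
echo "old log"  > "$lower/old.log"

overlay_write config "change 1"   # triggers the copy_up
overlay_write config "change 2"   # reuses the copy already in the container
overlay_delete old.log            # image copy remains, marker hides it

cat "$upper/config"
```

Note that the second write does not copy anything: the file already exists in the container layer, which matches the point that copy_up happens only once per file.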
## Configure Docker with the overlay storage driver
To configure Docker to use the overlay storage driver your Docker host must be running version 3.18 of the Linux kernel (preferably newer) with the overlay kernel module loaded. OverlayFS can operate on top of most supported Linux filesystems. However, ext4 is currently recommended for use in production environments.
To configure Docker to use the overlay storage driver your Docker host must be
running version 3.18 of the Linux kernel (preferably newer) with the overlay
kernel module loaded. OverlayFS can operate on top of most supported Linux
filesystems. However, ext4 is currently recommended for use in production
environments.
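Before switching drivers, it can help to check the kernel requirement from a shell. The following is a minimal sketch: `kernel_at_least` is a hypothetical helper written for this example and is not part of Docker, and it only compares the major and minor numbers reported by `uname -r`.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds when RELEASE (default: output of `uname -r`) is at least
# MAJOR.MINOR. Hypothetical helper for this sketch.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the minor version
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least 3 18; then
    echo "kernel $(uname -r) meets the 3.18 requirement"
else
    echo "kernel $(uname -r) is older than 3.18"
fi

# The filesystem is usable once "overlay" appears in this list.
grep overlay /proc/filesystems 2>/dev/null ||
    echo "overlay filesystem is not currently registered"
```

If the last command reports that the filesystem is not registered, loading the module (for example with `modprobe overlay`, run as root) is the usual next step.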
The following procedure shows you how to configure your Docker host to use OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.
The following procedure shows you how to configure your Docker host to use
OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.
> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them Docker Hub or your private Docker Trusted Registry before attempting this procedure.
> **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.
1. If it is running, stop the Docker `daemon`.
@ -163,28 +240,60 @@ The following procedure shows you how to configure your Docker host to use Overl
Backing Filesystem: extfs
<output truncated>
Notice that the *Backing filesystem* in the output above is showing as `extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is recommended for production use cases.
Notice that the *Backing filesystem* in the output above is showing as
`extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is
recommended for production use cases.
Your Docker host is now using the `overlay` storage driver. If you run the `mount` command, you'll find Docker has automatically created the `overlay` mount with the required "lowerdir", "upperdir", "merged" and "workdir" constructs.
Your Docker host is now using the `overlay` storage driver. If you run the
`mount` command, you'll find Docker has automatically created the `overlay`
mount with the required "lowerdir", "upperdir", "merged" and "workdir"
constructs.
## OverlayFS and Docker Performance
As a general rule, the `overlay` driver should be fast. Almost certainly faster than `aufs` and `devicemapper`. In certain circumstances it may also be faster than `btrfs`. That said, there are a few things to be aware of relative to the performance of Docker using the `overlay` storage driver.
As a general rule, the `overlay` driver should be fast, almost certainly faster
than `aufs` and `devicemapper`. In certain circumstances it may also be faster
than `btrfs`. That said, there are a few things to be aware of relative to the
performance of Docker using the `overlay` storage driver.
- **Page Caching**. OverlayFS supports page cache sharing. This means multiple containers accessing the same file can share a single page cache entry (or entries). This makes the `overlay` driver efficient with memory and a good option for PaaS and other high density use cases.
- **Page Caching**. OverlayFS supports page cache sharing. This means multiple
containers accessing the same file can share a single page cache entry (or
entries). This makes the `overlay` driver efficient with memory and a good
option for PaaS and other high density use cases.
- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any time a container writes to a file for the first time. This can insert latency into the write operation &mdash; especially if the file being copied up is large. However, once the file has been copied up, all subsequent writes to that file occur without the need for further copy-up operations.
- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any
time a container writes to a file for the first time. This can insert latency
into the write operation &mdash; especially if the file being copied up is
large. However, once the file has been copied up, all subsequent writes to that
file occur without the need for further copy-up operations.
The OverlayFS copy_up operation should be faster than the same operation with AUFS. This is because AUFS supports more layers than OverlayFS and it is possible to incur far larger latencies if searching through many AUFS layers.
The OverlayFS copy_up operation should be faster than the same operation
with AUFS. This is because AUFS supports more layers than OverlayFS and it is
possible to incur far larger latencies if searching through many AUFS layers.
- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards. This can result in certain OverlayFS operations breaking POSIX standards. One such operation is the *copy-up* operation. Therefore, using `yum` inside of a container on a Docker host using the `overlay` storage driver is unlikely to work without implementing workarounds.
- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards.
This can result in certain OverlayFS operations breaking POSIX standards. One
such operation is the *copy-up* operation. Therefore, using `yum` inside of a
container on a Docker host using the `overlay` storage driver is unlikely to
work without implementing workarounds.
- **Inode limits**. Use of the `overlay` storage driver can cause excessive inode consumption. This is especially so as the number of images and containers on the Docker host grows. A Docker host with a large number of images and lots of started and stopped containers can quickly run out of inodes.
- **Inode limits**. Use of the `overlay` storage driver can cause excessive
inode consumption. This is especially so as the number of images and containers
on the Docker host grows. A Docker host with a large number of images and lots
of started and stopped containers can quickly run out of inodes.
Unfortunately you can only specify the number of inodes in a filesystem at the time of creation. For this reason, you may wish to consider putting `/var/lib/docker` on a separate device with its own filesystem or manually specifying the number of inodes when creating the filesystem.
Unfortunately you can only specify the number of inodes in a filesystem at the
time of creation. For this reason, you may wish to consider putting
`/var/lib/docker` on a separate device with its own filesystem, or manually
specifying the number of inodes when creating the filesystem.
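Inode consumption can be watched with `df -i`. The sketch below assumes the default local storage area at `/var/lib/docker`; the `DOCKER_ROOT` variable is a name introduced for this example only, and the script falls back to `/` when the directory does not exist.

```shell
#!/bin/sh
# Report inode usage for the filesystem that holds Docker's storage area.
# DOCKER_ROOT is an assumption of this sketch, not a Docker setting.
dir=${DOCKER_ROOT:-/var/lib/docker}
[ -d "$dir" ] || dir=/

# The IUse% column shows how close the filesystem is to running out of inodes.
df -i "$dir"

# The inode count is fixed when an ext4 filesystem is created, for example:
#   mkfs.ext4 -N <inode-count> <device>
# Both placeholders must be chosen for your own host.
```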
The following generic performance best practices also apply to OverlayFS.
- **Solid State Devices (SSD)**. For best performance it is always a good idea to use fast storage media such as solid state devices (SSD).
- **Solid State Devices (SSD)**. For best performance it is always a good idea
to use fast storage media such as solid state devices (SSD).
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes.


@ -12,15 +12,27 @@ weight = -1
# Select a storage driver
This page describes Docker's storage driver feature. It lists the storage
driver's that Docker supports and the basic commands associated with managing them. Finally, this page provides guidance on choosing a storage driver.
drivers that Docker supports and the basic commands associated with managing
them. Finally, this page provides guidance on choosing a storage driver.
The material on this page is intended for readers who already have an [understanding of the storage driver technology](imagesandcontainers.md).
The material on this page is intended for readers who already have an
[understanding of the storage driver technology](imagesandcontainers.md).
## A pluggable storage driver architecture
The Docker has a pluggable storage driver architecture. This gives you the flexibility to "plug in" the storage driver is best for your environment and use-case. Each Docker storage driver is based on a Linux filesystem or volume manager. Further, each storage driver is free to implement the management of image layers and the container layer in it's own unique way. This means some storage drivers perform better than others in different circumstances.
Docker has a pluggable storage driver architecture. This gives you the
flexibility to "plug in" the storage driver that is best for your environment
and use-case. Each Docker storage driver is based on a Linux filesystem or
volume manager. Further, each storage driver is free to implement the
management of image layers and the container layer in its own unique way. This
means some storage drivers perform better than others in different
circumstances.
Once you decide which driver is best, you set this driver on the Docker daemon at start time. As a result, the Docker daemon can only run one storage driver, and all containers created by that daemon instance use the same storage driver. The table below shows the supported storage driver technologies and their driver names:
Once you decide which driver is best, you set this driver on the Docker daemon
at start time. As a result, the Docker daemon can only run one storage driver,
and all containers created by that daemon instance use the same storage driver.
The table below shows the supported storage driver technologies and their
driver names:
|Technology |Storage driver name |
|--------------|---------------------|
@ -31,7 +43,8 @@ Once you decide which driver is best, you set this driver on the Docker daemon a
|VFS* |`vfs` |
|ZFS |`zfs` |
To find out which storage driver is set on the daemon , you use the `docker info` command:
To find out which storage driver is set on the daemon, you use the
`docker info` command:
$ docker info
Containers: 0
@ -44,9 +57,19 @@ To find out which storage driver is set on the daemon , you use the `docker info
Operating System: Ubuntu 15.04
... output truncated ...
The `info` subcommand reveals that the Docker daemon is using the `overlay` storage driver with a `Backing Filesystem` value of `extfs`. The `extfs` value means that the `overlay` storage driver is operating on top of an existing (ext) filesystem. The backing filesystem refers to the filesystem that was used to create the Docker host's local storage area under `/var/lib/docker`.
The `info` subcommand reveals that the Docker daemon is using the `overlay`
storage driver with a `Backing Filesystem` value of `extfs`. The `extfs` value
means that the `overlay` storage driver is operating on top of an existing
(ext) filesystem. The backing filesystem refers to the filesystem that was used
to create the Docker host's local storage area under `/var/lib/docker`.
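When scripting against a host, the driver name can be pulled out of the `docker info` text. This is a minimal sketch: `storage_driver_from_info` is a hypothetical helper, and the sample text is a shortened stand-in in the same shape as the output described above, not captured output.

```shell
#!/bin/sh
# storage_driver_from_info (hypothetical): read `docker info` text on
# standard input and print the value of the "Storage Driver:" line.
storage_driver_from_info() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p'
}

# Shortened sample input for illustration only.
sample='Containers: 0
Images: 0
Storage Driver: overlay
 Backing Filesystem: extfs'

printf '%s\n' "$sample" | storage_driver_from_info

# On a host with a running daemon:
#   docker info | storage_driver_from_info
```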
Which storage driver you use, in part, depends on the backing filesystem you plan to use for your Docker host's local storage area. Some storage drivers can operate on top of different backing filesystems. However, other storage drivers require the backing filesystem to be the same as the storage driver. For example, the `btrfs` storage driver can only work with a `btrfs` backing filesystem. The following table lists each storage driver and whether it must match the host's backing file system:
Which storage driver you use, in part, depends on the backing filesystem you
plan to use for your Docker host's local storage area. Some storage drivers can
operate on top of different backing filesystems. However, other storage
drivers require the backing filesystem to be the same as the storage driver.
For example, the `btrfs` storage driver can only work with a Btrfs backing
filesystem. The following table lists each storage driver and whether it must
match the host's backing filesystem:
|Storage driver |Must match backing filesystem |
|---------------|------------------------------|
@ -58,9 +81,12 @@ Which storage driver you use, in part, depends on the backing filesystem you pla
|zfs |Yes |
You can set the storage driver by passing the `--storage-driver=<name>` option to the `docker daemon` command line or by setting the option on the `DOCKER_OPTS` line in `/etc/default/docker` file.
You can set the storage driver by passing the `--storage-driver=<name>` option
to the `docker daemon` command line, or by setting the option on the
`DOCKER_OPTS` line in the `/etc/default/docker` file.
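As a sketch of the second approach, the fragment below sets the option in `/etc/default/docker`. It assumes a host whose init scripts read that file; hosts managed differently may ignore it, so treat the location as an assumption to verify. The `overlay` value is only an example of a driver name from the table above.

```shell
# /etc/default/docker (fragment)
# Extra options passed to the Docker daemon at start time.
DOCKER_OPTS="--storage-driver=overlay"
```

The daemon must be restarted for the change to take effect.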
The following command shows how to start the Docker daemon with the `devicemapper` storage driver using the `docker daemon` command:
The following command shows how to start the Docker daemon with the
`devicemapper` storage driver using the `docker daemon` command:
$ docker daemon --storage-driver=devicemapper &
@ -90,25 +116,82 @@ The following command shows how to start the Docker daemon with the `devicemappe
Operating System: Ubuntu 15.04
<output truncated>
Your choice of storage driver can affect the performance of your containerized applications. So it's important to understand the different storage driver options available and select the right one for your application. Later, in this page you'll find some advice for choosing an appropriate driver.
Your choice of storage driver can affect the performance of your containerized
applications. So it's important to understand the different storage driver
options available and select the right one for your application. Later in this
page, you'll find some advice for choosing an appropriate driver.
## Shared storage systems and the storage driver
Many enterprises consume storage from shared storage systems such as SAN and NAS arrays. These often provide increased performance and availability, as well as advanced features such as thin provisioning, deduplication and compression.
Many enterprises consume storage from shared storage systems such as SAN and
NAS arrays. These often provide increased performance and availability, as well
as advanced features such as thin provisioning, deduplication and compression.
The Docker storage driver and data volumes can both operate on top of storage provided by shared storage systems. This allows Docker to leverage the increased performance and availability these systems provide. However, Docker does not integrate with these underlying systems.
The Docker storage driver and data volumes can both operate on top of storage
provided by shared storage systems. This allows Docker to leverage the
increased performance and availability these systems provide. However, Docker
does not integrate with these underlying systems.
Remember that each Docker storage driver is based on a Linux filesystem or volume manager. Be sure to follow existing best practices for operating your storage driver (filesystem or volume manager) on top of your shared storage system. For example, if using the ZFS storage driver on top of *XYZ* shared storage system, be sure to follow best practices for operating ZFS filesystems on top of XYZ shared storage system.
Remember that each Docker storage driver is based on a Linux filesystem or
volume manager. Be sure to follow existing best practices for operating your
storage driver (filesystem or volume manager) on top of your shared storage
system. For example, if using the ZFS storage driver on top of *XYZ* shared
storage system, be sure to follow best practices for operating ZFS filesystems
on top of XYZ shared storage system.
## Which storage driver should you choose?
As you might expect, the answer to this question is "it depends". While there are some clear cases where one particular storage driver outperforms other for certain workloads, you should factor all of the following into your decision:
Several factors influence the selection of a storage driver. However, these two
facts must be kept in mind:
Choose a storage driver that you and your team/organization are comfortable with. Consider how much experience you have with a particular storage driver. There is no substitute for experience and it is rarely a good idea to try something brand new in production. That's what labs and laptops are for!
1. No single driver is well suited to every use-case
2. Storage drivers are improving and evolving all of the time
If your Docker infrastructure is under support contracts, choose an option that will get you good support. You probably don't want to go with a solution that your support partners have little or no experience with.
With these factors in mind, the following points, coupled with the diagram at
the end of this section, should provide some guidance.
Whichever driver you choose, make sure it has strong community support and momentum. This is important because storage driver development in the Docker project relies on the community as much as the Docker staff to thrive.
### Stability
For the most stable and hassle-free Docker experience, you should consider the
following:
- **Use the default storage driver for your distribution**. When Docker
installs, it chooses a default storage driver based on the configuration of
your system. Stability is an important factor influencing which storage driver
is used by default. Straying from this default may increase your chances of
encountering bugs and nuances.
- **Follow the configuration specified on the CS Engine
[compatibility matrix](https://www.docker.com/compatibility-maintenance)**. The
CS Engine is the commercially supported version of the Docker Engine. Its
code base is identical to the open source Engine, but it has a limited set of
supported configurations. These *supported configurations* use the most stable
and mature storage drivers. Straying from these configurations may also
increase your chances of encountering bugs and nuances.
### Experience and expertise
Choose a storage driver that you and your team/organization have experience
with. For example, if you use RHEL or one of its downstream forks, you may
already have experience with LVM and Device Mapper. If so, you may wish to use
the `devicemapper` driver.
If you do not feel you have expertise with any of the storage drivers supported
by Docker, and you want an easy-to-use stable Docker experience, you should
consider using the default driver installed by your distribution's Docker
package.
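To see which driver your installation selected, you can read it from the
output of `docker info`. The sketch below assumes that output contains a
`Storage Driver:` line; the `storage_driver_from_info` helper is a
hypothetical name used for illustration and is not part of Docker.

```shell
# Hypothetical helper: extract the driver name from `docker info` output
# supplied on standard input.
storage_driver_from_info() {
    sed -n 's/^ *Storage Driver: *//p'
}

# Only query the daemon when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    docker info 2>/dev/null | storage_driver_from_info
fi
```

If the printed name matches the default for your distribution, no further
configuration is needed to follow the advice above.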
### Future-proofing
Many people consider OverlayFS to be the future of the Docker storage driver.
However, it is less mature, and potentially less stable, than established
drivers such as `aufs` and `devicemapper`. For this reason, you should use the
OverlayFS driver with caution and expect to encounter more bugs and nuances
than if you were using a more mature driver.
The following diagram lists each storage driver and provides insight into some
of their pros and cons. When selecting which storage driver to use, consider
the guidance offered by the diagram along with the points mentioned above.
![](images/driver-pros-cons.png)
## Related information

View file

@ -10,13 +10,24 @@ parent = "engine_driver"
# Docker and ZFS in practice
ZFS is a next generation filesystem that supports many advanced storage technologies such as volume management, snapshots, checksumming, compression and deduplication, replication and more.
ZFS is a next generation filesystem that supports many advanced storage
technologies such as volume management, snapshots, checksumming, compression
and deduplication, replication and more.
It was created by Sun Microsystems (now Oracle Corporation) and is open sourced under the CDDL license. Due to licensing incompatibilities between the CDDL and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However, the ZFS On Linux (ZoL) project provides an out-of-tree kernel module and userspace tools which can be installed separately.
It was created by Sun Microsystems (now Oracle Corporation) and is open sourced
under the CDDL license. Due to licensing incompatibilities between the CDDL
and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However,
the ZFS On Linux (ZoL) project provides an out-of-tree kernel module and
userspace tools which can be installed separately.
The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in time it is not recommended to use the `zfs` Docker storage driver for production use unless you have substantial experience with ZFS on Linux.
The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in
time it is not recommended to use the `zfs` Docker storage driver for
production use unless you have substantial experience with ZFS on Linux.
> **Note:** There is also a FUSE implementation of ZFS on the Linux platform. This should work with Docker but is not recommended. The native ZFS driver (ZoL) is more tested, more performant, and is more widely used. The remainder of this document will relate to the native ZoL port.
> **Note:** There is also a FUSE implementation of ZFS on the Linux platform.
> This should work with Docker but is not recommended. The native ZFS driver
> (ZoL) is more tested, more performant, and is more widely used. The remainder
> of this document will relate to the native ZoL port.
## Image layering and sharing with ZFS
@ -27,53 +38,96 @@ The Docker `zfs` storage driver makes extensive use of three ZFS datasets:
- snapshots
- clones
ZFS filesystems are thinly provisioned and have space allocated to them from a ZFS pool (zpool) via allocate on demand operations. Snapshots and clones are space-efficient point-in-time copies of ZFS filesystems. Snapshots are read-only. Clones are read-write. Clones can only be created from snapshots. This simple relationship is shown in the diagram below.
ZFS filesystems are thinly provisioned and have space allocated to them from a
ZFS pool (zpool) via allocate-on-demand operations. Snapshots and clones are
space-efficient point-in-time copies of ZFS filesystems. Snapshots are
read-only. Clones are read-write. Clones can only be created from snapshots.
This simple relationship is shown in the diagram below.
![](images/zfs_clones.jpg)
The solid line in the diagram shows the process flow for creating a clone. Step 1 creates a snapshot of the filesystem, and step two creates the clone from the snapshot. The dashed line shows the relationship between the clone and the filesystem, via the snapshot. All three ZFS datasets draw space form the same underlying zpool.
The solid line in the diagram shows the process flow for creating a clone. Step
1 creates a snapshot of the filesystem, and step 2 creates the clone from
the snapshot. The dashed line shows the relationship between the clone and the
filesystem, via the snapshot. All three ZFS datasets draw space from the same
underlying zpool.
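The two steps can be sketched with the standard `zfs snapshot` and `zfs clone`
subcommands. The helper name and the dataset names in the example invocation
are placeholders chosen for illustration, and the commands need to run as
root on a host with ZFS installed.

```shell
# Hypothetical helper: create a writable clone of a filesystem.
#   $1 = source filesystem, $2 = snapshot name, $3 = clone dataset
snapshot_and_clone() {
    # Step 1: take a read-only, point-in-time snapshot of the filesystem.
    zfs snapshot "$1@$2" || return 1
    # Step 2: create a read-write clone from that snapshot.
    zfs clone "$1@$2" "$3"
}

# Example invocation (placeholder dataset names, run as root):
#   snapshot_and_clone zpool-docker/base layer1-snap zpool-docker/layer1
```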
On Docker hosts using the `zfs` storage driver, the base layer of an image is a ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the layer below it. A container is a ZFS clone based on a ZFS Snapshot of the top layer of the image it's created from. All ZFS datasets draw their space from a common zpool. The diagram below shows how this is put together with a running container based on a two-layer image.
On Docker hosts using the `zfs` storage driver, the base layer of an image is a
ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the
layer below it. A container is a ZFS clone based on a ZFS snapshot of the top
layer of the image it's created from. All ZFS datasets draw their space from a
common zpool. The diagram below shows how this is put together with a running
container based on a two-layer image.
![](images/zfs_zpool.jpg)
The following process explains how images are layered and containers created. The process is based on the diagram above.
The following process explains how images are layered and containers created.
The process is based on the diagram above.
1. The base layer of the image exists on the Docker host as a ZFS filesystem.
This filesystem consumes space from the zpool used to create the Docker host's local storage area at `/var/lib/docker`.
This filesystem consumes space from the zpool used to create the Docker
host's local storage area at `/var/lib/docker`.
2. Additional image layers are clones of the dataset hosting the image layer directly below it.
2. Additional image layers are clones of the dataset hosting the image layer
directly below it.
In the diagram, "Layer 1" is added by making a ZFS snapshot of the base layer and then creating a clone from that snapshot. The clone is writable and consumes space on-demand from the zpool. The snapshot is read-only, maintaining the base layer as an immutable object.
In the diagram, "Layer 1" is added by making a ZFS snapshot of the base
layer and then creating a clone from that snapshot. The clone is writable and
consumes space on-demand from the zpool. The snapshot is read-only, maintaining
the base layer as an immutable object.
3. When the container is launched, a read-write layer is added above the image.
In the diagram above, the container's read-write layer is created by making a snapshot of the top layer of the image (Layer 1) and creating a clone from that snapshot.
In the diagram above, the container's read-write layer is created by making
a snapshot of the top layer of the image (Layer 1) and creating a clone from
that snapshot.
As changes are made to the container, space is allocated to it from the zpool via allocate-on-demand operations. By default, ZFS will allocate space in blocks of 128K.
As changes are made to the container, space is allocated to it from the
zpool via allocate-on-demand operations. By default, ZFS will allocate space in
blocks of 128K.
This process of creating child layers and containers from *read-only* snapshots allows images to be maintained as immutable objects.
This process of creating child layers and containers from *read-only* snapshots
allows images to be maintained as immutable objects.
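On a configured host, one way to inspect these relationships is to list the
datasets together with their `origin` property, which for a clone names the
snapshot it was created from. This is only a sketch: the helper name is
hypothetical, and the parent dataset in the example invocation is an assumed
name that you should replace with your own.

```shell
# Hypothetical helper: list every dataset under a parent, with its origin.
# For a clone, the origin column names the snapshot it was created from.
list_layer_datasets() {
    zfs list -r -t all -o name,origin,used,referenced "$1"
}

# Example invocation (assumed parent dataset name):
#   list_layer_datasets zpool-docker/docker
```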
## Container reads and writes with ZFS
Container reads with the `zfs` storage driver are very simple. A newly launched container is based on a ZFS clone. This clone initially shares all of its data with the dataset it was created from. This means that read operations with the `zfs` storage driver are fast &ndash; even if the data being read was copied into the container yet. This sharing of data blocks is shown in the diagram below.
Container reads with the `zfs` storage driver are very simple. A newly launched
container is based on a ZFS clone. This clone initially shares all of its data
with the dataset it was created from. This means that read operations with the
`zfs` storage driver are fast &ndash; even if the data being read has not yet
been copied into the container. This sharing of data blocks is shown in the
diagram below.
![](images/zpool_blocks.jpg)
Writing new data to a container is accomplished via an allocate-on-demand operation. Every time a new area of the container needs writing to, a new block is allocated from the zpool. This means that containers consume additional space as new data is written to them. New space is allocated to the container (ZFS Clone) from the underlying zpool.
Writing new data to a container is accomplished via an allocate-on-demand
operation. Every time a new area of the container needs to be written to, a new
block is allocated from the zpool. This means that containers consume
additional space as new data is written to them. New space is allocated to the
container (ZFS clone) from the underlying zpool.
Updating *existing data* in a container is accomplished by allocating new blocks to the containers clone and storing the changed data in those new blocks. The original are unchanged, allowing the underlying image dataset to remain immutable. This is the same as writing to a normal ZFS filesystem and is an implementation of copy-on-write semantics.
Updating *existing data* in a container is accomplished by allocating new
blocks to the container's clone and storing the changed data in those new
blocks. The original blocks are unchanged, allowing the underlying image
dataset to remain immutable. This is the same as writing to a normal ZFS
filesystem and is an implementation of copy-on-write semantics.
## Configure Docker with the ZFS storage driver
The `zfs` storage driver is only supported on a Docker host where `/var/lib/docker` is mounted as a ZFS filesystem. This section shows you how to install and configure native ZFS on Linux (ZoL) on an Ubuntu 14.04 system.
The `zfs` storage driver is only supported on a Docker host where
`/var/lib/docker` is mounted as a ZFS filesystem. This section shows you how to
install and configure native ZFS on Linux (ZoL) on an Ubuntu 14.04 system.
### Prerequisites
If you have already used the Docker daemon on your Docker host and have images you want to keep, `push` them Docker Hub or your private Docker Trusted Registry before attempting this procedure.
If you have already used the Docker daemon on your Docker host and have images
you want to keep, `push` them to Docker Hub or your private Docker Trusted
Registry before attempting this procedure.
Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev/xvdb`. The device identifier may be be different in your environment and you should substitute your own values throughout the procedure.
Stop the Docker daemon. Then, ensure that you have a spare block device at
`/dev/xvdb`. The device identifier may be different in your environment and
you should substitute your own values throughout the procedure.
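Before continuing, you may want to confirm that the path really is a block
device. The following is a minimal sketch: `is_block_device` is a
hypothetical helper, and `DEVICE` defaults to the `/dev/xvdb` path used in
this procedure.

```shell
# Hypothetical helper: succeeds only if the given path is a block device.
is_block_device() {
    [ -b "$1" ]
}

# Device path taken from the text above; override DEVICE with your own.
DEVICE="${DEVICE:-/dev/xvdb}"

if is_block_device "$DEVICE"; then
    echo "$DEVICE is a block device"
else
    echo "$DEVICE is not a block device; adjust DEVICE before continuing" >&2
fi
```

The check only confirms the device type. It does not tell you whether the
device is unused, so verify that separately before creating a zpool on it.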
### Install ZFS on Ubuntu 14.04 LTS
@ -98,7 +152,8 @@ Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev
gpg: imported: 1 (RSA: 1)
OK
3. Get the latest package lists for all registered repositories and package archives.
3. Get the latest package lists for all registered repositories and package
archives.
$ sudo apt-get update
Ign http://us-west-2.ec2.archive.ubuntu.com trusty InRelease
@ -156,7 +211,8 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
zpool-docker 93.5K 3.84G 19K /zpool-docker
zpool-docker/docker 19K 3.84G 19K /var/lib/docker
Now that you have a ZFS filesystem mounted to `/var/lib/docker`, the daemon should automatically load with the `zfs` storage driver.
Now that you have a ZFS filesystem mounted to `/var/lib/docker`, the daemon
should automatically load with the `zfs` storage driver.
5. Start the Docker daemon.
@ -165,9 +221,9 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
The procedure for starting the Docker daemon may differ depending on the
Linux distribution you are using. It is possible to force the Docker daemon
to start with the `zfs` storage driver by passing the `--storage-driver=zfs`
flag to the `docker daemon` command, or to the `DOCKER_OPTS` line in the
Docker config file.
to start with the `zfs` storage driver by passing the
`--storage-driver=zfs` flag to the `docker daemon` command, or to the
`DOCKER_OPTS` line in the Docker config file.
6. Verify that the daemon is using the `zfs` storage driver.
@ -186,33 +242,55 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
[...]
The output of the command above shows that the Docker daemon is using the
`zfs` storage driver and that the parent dataset is the `zpool-docker/docker`
filesystem created earlier.
`zfs` storage driver and that the parent dataset is the
`zpool-docker/docker` filesystem created earlier.
Your Docker host is now using ZFS to store and manage its images and containers.
## ZFS and Docker performance
There are several factors that influence the performance of Docker using the `zfs` storage driver.
There are several factors that influence the performance of Docker using the
`zfs` storage driver.
- **Memory**. Memory has a major impact on ZFS performance. This goes back to the fact that ZFS was originally designed for use on big Sun Solaris servers with large amounts of memory. Keep this in mind when sizing your Docker hosts.
- **Memory**. Memory has a major impact on ZFS performance. This goes back to
the fact that ZFS was originally designed for use on big Sun Solaris servers
with large amounts of memory. Keep this in mind when sizing your Docker hosts.
- **ZFS Features**. Using ZFS features, such as deduplication, can significantly increase the amount
of memory ZFS uses. For memory consumption and performance reasons it is
recommended to turn off ZFS deduplication. However, deduplication at other
layers in the stack (such as SAN or NAS arrays) can still be used as these do
not impact ZFS memory usage and performance. If using SAN, NAS or other hardware
RAID technologies you should continue to follow existing best practices for
using them with ZFS.
- **ZFS Features**. Using ZFS features, such as deduplication, can
significantly increase the amount of memory ZFS uses. For memory consumption
and performance reasons it is recommended to turn off ZFS deduplication.
However, deduplication at other layers in the stack (such as SAN or NAS arrays)
can still be used as these do not impact ZFS memory usage and performance. If
using SAN, NAS or other hardware RAID technologies you should continue to
follow existing best practices for using them with ZFS.
* **ZFS Caching**. ZFS caches disk blocks in a memory structure called the adaptive replacement cache (ARC). The *Single Copy ARC* feature of ZFS allows a single cached copy of a block to be shared by multiple clones of a filesystem. This means that multiple running containers can share a single copy of cached block. This means that ZFS is a good option for PaaS and other high density use cases.
- **ZFS Caching**. ZFS caches disk blocks in a memory structure called the
adaptive replacement cache (ARC). The *Single Copy ARC* feature of ZFS allows a
single cached copy of a block to be shared by multiple clones of a filesystem.
This means that multiple running containers can share a single copy of a cached
block, which makes ZFS a good option for PaaS and other high density use
cases.
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write filesystems like ZFS. However, ZFS writes in 128K blocks and allocates *slabs* (multiple 128K blocks) to CoW operations in an attempt to reduce fragmentation. The ZFS intent log (ZIL) and the coalescing of writes (delayed writes) also help to reduce fragmentation.
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
filesystems like ZFS. However, ZFS writes in 128K blocks and allocates *slabs*
(multiple 128K blocks) to CoW operations in an attempt to reduce fragmentation.
The ZFS intent log (ZIL) and the coalescing of writes (delayed writes) also
help to reduce fragmentation.
- **Use the native ZFS driver for Linux**. Although the Docker `zfs` storage driver supports the ZFS FUSE implementation, it is not recommended when high performance is required. The native ZFS on Linux driver tends to perform better than the FUSE implementation.
- **Use the native ZFS driver for Linux**. Although the Docker `zfs` storage
driver supports the ZFS FUSE implementation, it is not recommended when high
performance is required. The native ZFS on Linux driver tends to perform better
than the FUSE implementation.
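The memory-related points above can be checked from the shell. The sketch
below assumes a native ZFS on Linux host, where ARC statistics are exposed in
`/proc/spl/kstat/zfs/arcstats`. Both helper names are hypothetical, and the
pool name in the example invocation is an assumption.

```shell
# Hypothetical helper: print the current ARC size in bytes.
#   $1 = kstat file (defaults to the ZFS on Linux location)
arc_size_bytes() {
    awk '$1 == "size" { print $3 }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# Hypothetical helper: turn deduplication off unless it is already off.
#   $1 = pool or dataset name
ensure_dedup_off() {
    dataset="$1"
    if [ "$(zfs get -H -o value dedup "$dataset")" != "off" ]; then
        zfs set dedup=off "$dataset"
    fi
}

# Example invocations (assumed pool name, run as root):
#   arc_size_bytes
#   ensure_dedup_off zpool-docker
```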
The following generic performance best practices also apply to ZFS.
- **Use of SSD**. For best performance it is always a good idea to use fast storage media such as solid state devices (SSD). However, if you only have a limited amount of SSD storage available it is recommended to place the ZIL on SSD.
- **Use of SSD**. For best performance it is always a good idea to use fast
storage media such as solid state devices (SSD). However, if you only have a
limited amount of SSD storage available it is recommended to place the ZIL on
SSD.
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes.
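Both practices can be sketched as follows. `zpool add <pool> log <device>`
attaches a dedicated log device for the ZIL, and `docker run -v` creates a
data volume at the given container path. The helper names, the device path,
the image, and the command in the example invocations are placeholders chosen
for illustration.

```shell
# Hypothetical helper: add a dedicated log (ZIL) device to a pool.
#   $1 = pool name, $2 = SSD block device
add_log_device() {
    zpool add "$1" log "$2"
}

# Hypothetical helper: start a container with a data volume at a given path.
#   $1 = path inside the container, $2 = image; remaining args = command
run_with_data_volume() {
    mount_point="$1"
    image="$2"
    shift 2
    docker run -d -v "$mount_point" "$image" "$@"
}

# Example invocations (device, image, and command are placeholders):
#   add_log_device zpool-docker /dev/xvdc
#   run_with_data_volume /data ubuntu sleep 3600
```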