<!--[metadata]>
+++
title = "OverlayFS storage in practice"
description = "Learn how to optimize your use of OverlayFS driver."
keywords = ["container, storage, driver, OverlayFS "]
[menu.main]
parent = "engine_driver"
+++
<![end-metadata]-->

# Docker and OverlayFS in practice

OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison
to AUFS, OverlayFS:

* has a simpler design
* has been in the mainline Linux kernel since version 3.18
* is potentially faster

As a result, OverlayFS is rapidly gaining popularity in the Docker community
and is seen by many as a natural successor to AUFS. As promising as OverlayFS
is, it is still relatively young. Therefore caution should be taken before
using it in production Docker environments.

Docker's `overlay` storage driver leverages several OverlayFS features to build
and manage the on-disk structures of images and containers.

> **Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel
> module* was renamed from "overlayfs" to "overlay". As a result you may see the
> two terms used interchangeably in some documentation. However, this document
> uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer
> to Docker's storage driver.

## Image layering and sharing with OverlayFS

OverlayFS takes two directories on a single Linux host, layers one on top of
the other, and provides a single unified view. These directories are often
referred to as *layers* and the technology used to layer them is known as a
*union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and
"upperdir" for the top layer. The unified view is exposed through its own
directory called "merged".

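You can see a union mount in action independently of Docker by creating one by
hand. The following is a minimal sketch using placeholder directories under
`/tmp`; the "workdir" must be an empty directory on the same filesystem as the
"upperdir" and is used internally by OverlayFS.

    # These directories are placeholders created for the demo, not paths
    # that Docker manages.
    $ mkdir /tmp/lower /tmp/upper /tmp/work /tmp/merged
    $ echo "from the lower layer" > /tmp/lower/file1
    $ echo "from the upper layer" > /tmp/upper/file2
    $ sudo mount -t overlay overlay \
        -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/merged
    $ ls /tmp/merged
    file1  file2
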
The diagram below shows how a Docker image and a Docker container are layered.
The image layer is the "lowerdir" and the container layer is the "upperdir".
The unified view is exposed through a directory called "merged" which is
effectively the container's mount point. The diagram shows how Docker
constructs map to OverlayFS constructs.

![](images/overlay_constructs.jpg)

Notice how the image layer and container layer can contain the same files. When
this happens, the files in the container layer ("upperdir") are dominant and
obscure the existence of the same files in the image layer ("lowerdir"). The
container mount ("merged") presents the unified view.

OverlayFS only works with two layers. This means that multi-layered images
cannot be implemented as multiple OverlayFS layers. Instead, each image layer
is implemented as its own directory under `/var/lib/docker/overlay`.
Hard links are then used as a space-efficient way to reference data shared with
lower layers. As of Docker 1.10, image layer IDs no longer correspond to
directory names in `/var/lib/docker/`.

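You can see this sharing by comparing inode numbers for the same file across
two layer directories with `ls -i`. The sketch below uses truncated,
illustrative directory names and inode values; identical inode numbers mean the
two layers reference a single copy of the data on disk.

    # Layer directory names are truncated examples; the inode numbers shown
    # are illustrative. Matching inode numbers indicate a hard link.
    $ ls -i /var/lib/docker/overlay/1d073211c498.../root/bin/ls
    19793696 /var/lib/docker/overlay/1d073211c498.../root/bin/ls

    $ ls -i /var/lib/docker/overlay/5a4526e952f0.../root/bin/ls
    19793696 /var/lib/docker/overlay/5a4526e952f0.../root/bin/ls
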
To create a container, the `overlay` driver combines the directory representing
the image's top layer with a new directory for the container. The image's top
layer is the "lowerdir" in the overlay and read-only. The new directory for the
container is the "upperdir" and is writable.

## Example: Image and container on-disk constructs

The following `docker pull` command shows a Docker host downloading a Docker
image comprising four layers.

    $ sudo docker pull ubuntu
    Using default tag: latest
    latest: Pulling from library/ubuntu
    8387d9ff0016: Pull complete
    3b52deaaf0ed: Pull complete
    4bd501fad6de: Pull complete
    a3ed95caeb02: Pull complete
    Digest: sha256:457b05828bdb5dcc044d93d042863fba3f2158ae249a6db5ae3934307c757c54
    Status: Downloaded newer image for ubuntu:latest

Each image layer has its own directory under `/var/lib/docker/overlay/`. This
is where the contents of each image layer are stored.

The output of the command below shows the four directories that store the
contents of each image layer just pulled. However, as can be seen, the image
layer IDs do not match the directory names in `/var/lib/docker/overlay`. This
is normal behavior in Docker 1.10 and later.

    $ ls -l /var/lib/docker/overlay/
    total 24
    drwx------ 3 root root 4096 Oct 28 11:02 1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e
    drwx------ 3 root root 4096 Oct 28 11:02 5a4526e952f0aa24f3fcc1b6971f7744eb5465d572a48d47c492cb6bbf9cbcda
    drwx------ 5 root root 4096 Oct 28 11:06 99fcaefe76ef1aa4077b90a413af57fd17d19dce4e50d7964a273aae67055235
    drwx------ 3 root root 4096 Oct 28 11:01 c63fb41c2213f511f12f294dd729b9903a64d88f098c20d2350905ac1fdbcbba

The image layer directories contain the files unique to that layer as well as
hard links to the data that is shared with lower layers. This allows for
efficient use of disk space.

Containers also exist on-disk in the Docker host's filesystem under
`/var/lib/docker/overlay/`. If you inspect the directory relating to a running
container using the `ls -l` command, you find the following file and
directories.

    $ ls -l /var/lib/docker/overlay/<directory-of-running-container>
    total 16
    -rw-r--r-- 1 root root   64 Oct 28 11:06 lower-id
    drwxr-xr-x 1 root root 4096 Oct 28 11:06 merged
    drwxr-xr-x 4 root root 4096 Oct 28 11:06 upper
    drwx------ 3 root root 4096 Oct 28 11:06 work

These four filesystem objects are all artifacts of OverlayFS. The "lower-id"
file contains the ID of the top layer of the image the container is based on.
This is used by OverlayFS as the "lowerdir".

    $ cat /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e/lower-id
    1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e

The "upper" directory is the container's read-write layer. Any changes made to
the container are written to this directory.

The "merged" directory is effectively the container's mount point. This is
where the unified view of the image ("lowerdir") and container ("upperdir") is
exposed. Any changes written to the container are immediately reflected in this
directory.

The "work" directory is required for OverlayFS to function. It is used for
things such as *copy_up* operations.

You can verify all of these constructs from the output of the `mount` command.
(Ellipses and line breaks are used in the output below to enhance readability.)

    $ mount | grep overlay
    overlay on /var/lib/docker/overlay/73de7176c223.../merged
    type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay/1d073211c498.../root,
    upperdir=/var/lib/docker/overlay/73de7176c223.../upper,
    workdir=/var/lib/docker/overlay/73de7176c223.../work)

The output reflects that the overlay is mounted as read-write ("rw").

## Container reads and writes with overlay

Consider three scenarios where a container opens a file for read access with
overlay.

- **The file does not exist in the container layer**. If a container opens a
file for read access and the file does not already exist in the container
("upperdir") it is read from the image ("lowerdir"). This should incur very
little performance overhead (see the sketch after this list).

- **The file only exists in the container layer**. If a container opens a file
for read access and the file exists in the container ("upperdir") and not in
the image ("lowerdir"), it is read directly from the container.

- **The file exists in the container layer and the image layer**. If a
container opens a file for read access and the file exists in the image layer
and the container layer, the file's version in the container layer is read.
This is because files in the container layer ("upperdir") obscure files with
the same name in the image layer ("lowerdir").

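You can confirm the first scenario from the Docker host: a file the container
has never modified is absent from its "upper" directory but still visible in
its "merged" view. A minimal sketch, assuming a running container;
`<directory-of-running-container>` is a placeholder and `/etc/os-release` is
just an example of a file that ships in the image.

    # The unmodified file is not in "upper" (the first command prints nothing),
    # but it is visible in the unified "merged" view.
    $ sudo ls /var/lib/docker/overlay/<directory-of-running-container>/upper/etc/ | grep os-release
    $ sudo ls /var/lib/docker/overlay/<directory-of-running-container>/merged/etc/ | grep os-release
    os-release
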
Consider some scenarios where files in a container are modified.

- **Writing to a file for the first time**. The first time a container writes
to an existing file, that file does not exist in the container ("upperdir").
The `overlay` driver performs a *copy_up* operation to copy the file from the
image ("lowerdir") to the container ("upperdir"). The container then writes the
changes to the new copy of the file in the container layer (see the sketch
after this list).

However, OverlayFS works at the file level, not the block level. This means
that all OverlayFS copy-up operations copy entire files, even if the file is
very large and only a small part of it is being modified. This can have a
noticeable impact on container write performance. However, two things are
worth noting:

* The copy_up operation only occurs the first time any given file is
written to. Subsequent writes to the same file will operate against the copy of
the file already copied up to the container.

* OverlayFS only works with two layers. This means that performance should
be better than AUFS, which can suffer noticeable latencies when searching for
files in images with many layers.

- **Deleting files and directories**. When files are deleted within a
container, a *whiteout* file is created in the container's "upperdir". The
version of the file in the image layer ("lowerdir") is not deleted. However,
the whiteout file in the container obscures it.

Deleting a directory in a container results in an *opaque directory* being
created in the "upperdir". This has the same effect as a whiteout file and
effectively masks the existence of the directory in the image's "lowerdir".

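You can observe both the copy-up and the whiteout behavior from the Docker
host. The following is a minimal sketch: `<container-name>` and
`<directory-of-running-container>` are placeholders, `/etc/issue` and
`/etc/legal` are just example files from the Ubuntu image, and the `ls` output
shown is illustrative.

    # Inside the running container, modify one file and delete another.
    $ sudo docker exec <container-name> sh -c 'echo edited >> /etc/issue; rm /etc/legal'

    # On the host, the modified file has been copied up to "upper", while the
    # deleted file is represented by a character device (a whiteout file).
    $ sudo ls -l /var/lib/docker/overlay/<directory-of-running-container>/upper/etc/
    <output truncated>
    -rw-r--r-- 1 root root   33 Oct 28 11:10 issue
    c--------- 1 root root 0, 0 Oct 28 11:10 legal
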
## Configure Docker with the overlay storage driver

To configure Docker to use the overlay storage driver your Docker host must be
running version 3.18 of the Linux kernel (preferably newer) with the overlay
kernel module loaded. OverlayFS can operate on top of most supported Linux
filesystems. However, ext4 is currently recommended for use in production
environments.

The following procedure shows you how to configure your Docker host to use
OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.

> **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure.

1. If it is running, stop the Docker `daemon`.

2. Verify your kernel version and that the overlay kernel module is loaded.

        $ uname -r
        3.19.0-21-generic

        $ lsmod | grep overlay
        overlay

3. Start the Docker daemon with the `overlay` storage driver.

        $ dockerd --storage-driver=overlay &
        [1] 29403
        root@ip-10-0-0-174:/home/ubuntu# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
        INFO[0000] Option DefaultDriver: bridge
        INFO[0000] Option DefaultNetwork: bridge
        <output truncated>

    Alternatively, you can force the Docker daemon to automatically start with
    the `overlay` driver by editing the Docker config file and adding the
    `--storage-driver=overlay` flag to the `DOCKER_OPTS` line. Once this option
    is set you can start the daemon using normal startup scripts without having
    to manually pass in the `--storage-driver` flag (a sketch of this follows
    the procedure).

4. Verify that the daemon is using the `overlay` storage driver.

        $ docker info
        Containers: 0
        Images: 0
        Storage Driver: overlay
        Backing Filesystem: extfs
        <output truncated>

    Notice that the *Backing filesystem* in the output above is showing as
    `extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is
    recommended for production use cases.

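If you prefer the configuration-file approach mentioned in step 3, the relevant
line might look like the following sketch. The file location is an assumption
that varies by distribution and init system; on Ubuntu systems using SysVinit
or Upstart it is typically `/etc/default/docker`.

    # /etc/default/docker (location varies by distribution)
    DOCKER_OPTS="--storage-driver=overlay"
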
Your Docker host is now using the `overlay` storage driver. If you run the
`mount` command, you'll find Docker has automatically created the `overlay`
mount with the required "lowerdir", "upperdir", "merged" and "workdir"
constructs.

## OverlayFS and Docker Performance

As a general rule, the `overlay` driver should be fast; almost certainly faster
than `aufs` and `devicemapper`. In certain circumstances it may also be faster
than `btrfs`. That said, there are a few things to be aware of relative to the
performance of Docker using the `overlay` storage driver.

- **Page Caching**. OverlayFS supports page cache sharing. This means multiple
containers accessing the same file can share a single page cache entry (or
entries). This makes the `overlay` driver efficient with memory and a good
option for PaaS and other high density use cases.

- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any
time a container writes to a file for the first time. This can insert latency
into the write operation, especially if the file being copied up is
large. However, once the file has been copied up, all subsequent writes to that
file occur without the need for further copy-up operations.

The OverlayFS copy_up operation should be faster than the same operation
with AUFS. This is because AUFS supports more layers than OverlayFS and it is
possible to incur far larger latencies if searching through many AUFS layers.

- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards.
This can result in certain OverlayFS operations breaking POSIX standards. One
such operation is the *copy-up* operation. Therefore, using `yum` inside of a
container on a Docker host using the `overlay` storage driver is unlikely to
work without implementing workarounds.

- **Inode limits**. Use of the `overlay` storage driver can cause excessive
inode consumption. This is especially so as the number of images and containers
on the Docker host grows. A Docker host with a large number of images and lots
of started and stopped containers can quickly run out of inodes.

Unfortunately, you can only specify the number of inodes in a filesystem at the
time of creation. For this reason, you may wish to consider putting
`/var/lib/docker` on a separate device with its own filesystem, or manually
specifying the number of inodes when creating the filesystem.

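The following sketch shows one way to check inode headroom and, when creating a
dedicated filesystem for `/var/lib/docker`, to reserve a larger number of
inodes up front. The device name `/dev/xvdb` and the values shown are examples
only.

    # Check inode usage on the filesystem backing /var/lib/docker.
    $ df -i /var/lib/docker
    Filesystem      Inodes  IUsed   IFree IUse% Mounted on
    /dev/xvda1     1310720 148234 1162486   12% /

    # When creating a dedicated ext4 filesystem, -N sets the total inode count.
    $ sudo mkfs.ext4 -N 3000000 /dev/xvdb
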
The following generic performance best practices also apply to OverlayFS.

- **Solid State Devices (SSD)**. For best performance it is always a good idea
to use fast storage media such as solid state devices (SSD).

- **Use Data Volumes**. Data volumes provide the best and most predictable
performance. This is because they bypass the storage driver and do not incur
any of the potential overheads introduced by thin provisioning and
copy-on-write. For this reason, you should place heavy write workloads on data
volumes.

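As a simple illustration, the following sketch starts a container with a named
volume mounted at an example path; writes under `/data` go to the volume and
bypass the `overlay` driver entirely. The volume, container, and path names are
placeholders.

    # "webdata" and "/data" are example names; the write-heavy path lives on
    # the volume, not in the container's "upperdir".
    $ docker run -d --name writer -v webdata:/data ubuntu \
        sh -c 'while true; do date >> /data/log.txt; sleep 1; done'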