mirror of https://github.com/moby/moby.git synced 2022-11-09 12:21:53 -05:00

Add docs about how to extend devicemapper thin pool

Signed-off-by: Chun Chen <ramichen@tencent.com>

Update to device mapper
Entering comments

Signed-off-by: Mary Anthony <mary@docker.com>
Chun Chen 2016-04-05 15:35:24 +08:00 committed by Mary Anthony
parent 973d6f0820
commit a7b2f87b06


@ -16,12 +16,10 @@ leverages the thin provisioning and snapshotting capabilities of this framework
for image and container management. This article refers to the Device Mapper for image and container management. This article refers to the Device Mapper
storage driver as `devicemapper`, and the kernel framework as `Device Mapper`. storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.
>**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL >**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL
and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires
that you use the `devicemapper` storage driver. that you use the `devicemapper` storage driver.
## An alternative to AUFS ## An alternative to AUFS
Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage
@ -61,20 +59,20 @@ With `devicemapper` the high level process for creating images is as follows:
1. The `devicemapper` storage driver creates a thin pool. 1. The `devicemapper` storage driver creates a thin pool.
The pool is created from block devices or loop mounted sparse files (more The pool is created from block devices or loop mounted sparse files (more
on this later). on this later).
2. Next it creates a *base device*. 2. Next it creates a *base device*.
A base device is a thin device with a filesystem. You can see which A base device is a thin device with a filesystem. You can see which
filesystem is in use by running the `docker info` command and checking the filesystem is in use by running the `docker info` command and checking the
`Backing filesystem` value. `Backing filesystem` value.
3. Each new image (and image layer) is a snapshot of this base device. 3. Each new image (and image layer) is a snapshot of this base device.
These are thin provisioned copy-on-write snapshots. This means that they These are thin provisioned copy-on-write snapshots. This means that they
are initially empty and only consume space from the pool when data is written are initially empty and only consume space from the pool when data is written
to them. to them.
With `devicemapper`, container layers are snapshots of the image they are With `devicemapper`, container layers are snapshots of the image they are
created from. Just as with images, container snapshots are thin provisioned created from. Just as with images, container snapshots are thin provisioned
@ -109,9 +107,9 @@ block (`0x44f`) in an example container.
1. An application makes a read request for block `0x44f` in the container. 1. An application makes a read request for block `0x44f` in the container.
Because the container is a thin snapshot of an image it does not have the Because the container is a thin snapshot of an image it does not have the
data. Instead, it has a pointer (PTR) to where the data is stored in the image data. Instead, it has a pointer (PTR) to where the data is stored in the image
snapshot lower down in the image stack. snapshot lower down in the image stack.
2. The storage driver follows the pointer to block `0xf33` in the snapshot 2. The storage driver follows the pointer to block `0xf33` in the snapshot
relating to image layer `a005...`. relating to image layer `a005...`.
@ -121,7 +119,7 @@ snapshot to memory in the container.
4. The storage driver returns the data to the requesting application. 4. The storage driver returns the data to the requesting application.
### Write examples ## Write examples
With the `devicemapper` driver, writing new data to a container is accomplished With the `devicemapper` driver, writing new data to a container is accomplished
by an *allocate-on-demand* operation. Updating existing data uses a by an *allocate-on-demand* operation. Updating existing data uses a
@ -132,7 +130,7 @@ For example, when making a small change to a large file in a container, the
`devicemapper` storage driver does not copy the entire file. It only copies the `devicemapper` storage driver does not copy the entire file. It only copies the
blocks to be modified. Each block is 64KB. blocks to be modified. Each block is 64KB.
#### Writing new data ### Writing new data
To write 56KB of new data to a container: To write 56KB of new data to a container:
@ -141,12 +139,12 @@ To write 56KB of new data to a container:
2. The allocate-on-demand operation allocates a single new 64KB block to the 2. The allocate-on-demand operation allocates a single new 64KB block to the
container's snapshot. container's snapshot.
If the write operation is larger than 64KB, multiple new blocks are If the write operation is larger than 64KB, multiple new blocks are
allocated to the container's snapshot. allocated to the container's snapshot.
3. The data is written to the newly allocated block. 3. The data is written to the newly allocated block.
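The allocate-on-demand arithmetic above can be sketched in shell. This is only an illustration: `blocks_for_write` is a hypothetical helper name, not part of Docker or Device Mapper, and it assumes the 64KB block size stated in the text.

```bash
# Hypothetical helper: how many 64KB pool blocks a new write consumes.
BLOCK_KB=64  # block size stated in the text above

blocks_for_write() {
    # $1 is the size of the write in KB.
    # Round up, because a partially filled block still uses a whole block.
    echo $(( ($1 + BLOCK_KB - 1) / BLOCK_KB ))
}

blocks_for_write 56    # a 56KB write fits in a single block
blocks_for_write 200   # a 200KB write spans several blocks
```

The round-up is the point of the sketch: space in the pool is consumed in whole blocks, so small writes cost more pool space than their byte count suggests.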
#### Overwriting existing data ### Overwriting existing data
To modify existing data for the first time: To modify existing data for the first time:
@ -163,7 +161,7 @@ The application in the container is unaware of any of these
allocate-on-demand and copy-on-write operations. However, they may add latency allocate-on-demand and copy-on-write operations. However, they may add latency
to the application's read and write operations. to the application's read and write operations.
## Configuring Docker with Device Mapper ## Configure Docker with devicemapper
The `devicemapper` is the default Docker storage driver on some Linux The `devicemapper` is the default Docker storage driver on some Linux
distributions. This includes RHEL and most of its forks. Currently, the distributions. This includes RHEL and most of its forks. Currently, the
@ -182,18 +180,20 @@ deployments should not run under `loop-lvm` mode.
You can detect the mode by viewing the `docker info` command: You can detect the mode by viewing the `docker info` command:
$ sudo docker info ```bash
Containers: 0 $ sudo docker info
Images: 0 Containers: 0
Storage Driver: devicemapper Images: 0
Pool Name: docker-202:2-25220302-pool Storage Driver: devicemapper
Pool Blocksize: 65.54 kB Pool Name: docker-202:2-25220302-pool
Backing Filesystem: xfs Pool Blocksize: 65.54 kB
... Backing Filesystem: xfs
Data loop file: /var/lib/docker/devicemapper/devicemapper/data [...]
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Library Version: 1.02.93-RHEL7 (2015-01-28) Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
... Library Version: 1.02.93-RHEL7 (2015-01-28)
[...]
```
The output above shows a Docker host running with the `devicemapper` storage The output above shows a Docker host running with the `devicemapper` storage
driver operating in `loop-lvm` mode. This is indicated by the fact that the driver operating in `loop-lvm` mode. This is indicated by the fact that the
@ -203,175 +203,141 @@ files.
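The check described above can be scripted as a rough sketch. `devicemapper_mode` is a hypothetical helper, not a Docker command. It assumes that the presence of a `Data loop file` line means `loop-lvm`, and that a `devicemapper` host without that line is running `direct-lvm`.

```bash
# Hypothetical helper: classify the storage mode from `docker info` text on stdin.
devicemapper_mode() {
    info=$(cat)
    # Anything other than the devicemapper driver is out of scope here.
    case $info in
        *"Storage Driver: devicemapper"*) ;;
        *) echo "not-devicemapper"; return 0 ;;
    esac
    # Assumption: loop-backed sparse files only appear in loop-lvm mode.
    case $info in
        *"Data loop file:"*) echo "loop-lvm" ;;
        *) echo "direct-lvm" ;;
    esac
}

# Usage (assumes Docker is installed and the daemon is running):
#   sudo docker info 2>/dev/null | devicemapper_mode
```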
### Configure direct-lvm mode for production ### Configure direct-lvm mode for production
The preferred configuration for production deployments is `direct lvm`. This The preferred configuration for production deployments is `direct-lvm`. This
mode uses block devices to create the thin pool. The following procedure shows mode uses block devices to create the thin pool. The following procedure shows
you how to configure a Docker host to use the `devicemapper` storage driver in you how to configure a Docker host to use the `devicemapper` storage driver in
a `direct-lvm` configuration. a `direct-lvm` configuration.
> **Caution:** If you have already run the Engine daemon on your Docker host > **Caution:** If you have already run the Docker daemon on your Docker host
> and have images you want to keep, `push` them to Docker Hub or your private > and have images you want to keep, `push` them to Docker Hub or your private
> Docker Trusted Registry before attempting this procedure. > Docker Trusted Registry before attempting this procedure.
The procedure below will create a 90GB data volume and 4GB metadata volume to The procedure below will create a 90GB data volume and 4GB metadata volume to
use as backing for the storage pool. It assumes that you have a spare block use as backing for the storage pool. It assumes that you have a spare block
device at `/dev/sdd` with enough free space to complete the task. The device device at `/dev/xvdf` with enough free space to complete the task. The device
identifier and volume sizes may be different in your environment and you identifier and volume sizes may be different in your environment and you
should substitute your own values throughout the procedure. should substitute your own values throughout the procedure. The procedure also
assumes that the Docker daemon is in the `stopped` state.
The procedure also assumes that the Engine daemon is in the `stopped` state. 1. Log in to the Docker host you want to configure and stop the Docker daemon.
Any existing images or data are lost by this process.
1. Log in to the Docker host you want to configure. 2. If it exists, delete your existing image store by removing the
2. If it is running, stop the Engine daemon. `/var/lib/docker` directory.
3. Install the logical volume management version 2.
```bash ```bash
$ yum install lvm2 $ sudo rm -rf /var/lib/docker
``` ```
4. Create a physical volume replacing `/dev/sdd` with your block device.
```bash 3. Create an LVM physical volume (PV) on your spare block device using the
$ pvcreate /dev/sdd `pvcreate` command.
```
5. Create a 'docker' volume group. ```bash
$ sudo pvcreate /dev/xvdf
Physical volume `/dev/xvdf` successfully created
```
```bash The device identifier may be different on your system. Remember to substitute
$ vgcreate docker /dev/sdd your value in the command above. If your host is running on AWS EC2, you may
``` need to install `lvm2` and <a href="http://goo.gl/Q5pUwG"
target="_blank">attach an EBS device</a> to use this procedure.
6. Create a thin pool named `thinpool`. 4. Create a new volume group (VG) called `vg-docker` using the PV created in
the previous step.
In this example, the data logical is 95% of the 'docker' volume group size. ```bash
Leaving this free space allows for auto expanding of either the data or $ sudo vgcreate vg-docker /dev/xvdf
metadata if space runs low as a temporary stopgap. Volume group `vg-docker` successfully created
```
```bash 5. Create a new 90GB logical volume (LV) called `data` from space in the
$ lvcreate --wipesignatures y -n thinpool docker -l 95%VG `vg-docker` volume group.
$ lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
```
7. Convert the pool to a thin pool. ```bash
$ sudo lvcreate -L 90G -n data vg-docker
Logical volume `data` created.
```
```bash The command creates an LVM logical volume called `data` and an associated
$ lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta block device file at `/dev/vg-docker/data`. In a later step, you instruct the
``` `devicemapper` storage driver to use this block device to store image and
container data.
8. Configure autoextension of thin pools via an `lvm` profile. If you receive a signature detection warning, make sure you are working on
the correct devices before continuing. Signature warnings indicate that the
device you're working on is currently in use by LVM or has been used by LVM in
the past.
```bash 6. Create a new logical volume (LV) called `metadata` from space in the
$ vi /etc/lvm/profile/docker-thinpool.profile `vg-docker` volume group.
```
9. Specify 'thin_pool_autoextend_threshold' value. ```bash
$ sudo lvcreate -L 4G -n metadata vg-docker
Logical volume `metadata` created.
```
The value should be the percentage of space used before `lvm` attempts This creates an LVM logical volume called `metadata` and an associated
to autoextend the available space (100 = disabled). block device file at `/dev/vg-docker/metadata`. In the next step you instruct
the `devicemapper` storage driver to use this block device to store image and
container metadata.
``` 7. Start the Docker daemon with the `devicemapper` storage driver and the
thin_pool_autoextend_threshold = 80 `--storage-opt` flags.
```
10. Modify the `thin_pool_autoextend_percent` for when thin pool autoextension occurs. The `data` and `metadata` devices that you pass to the `--storage-opt`
options were created in the previous steps.
The value's setting is the percentage of space to increase the thin pool (100 = ```bash
disabled) $ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
[1] 2163
[root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
INFO[0027] Option DefaultDriver: bridge
INFO[0027] Option DefaultNetwork: bridge
<-- output truncated -->
INFO[0027] Daemon has completed initialization
INFO[0027] Docker daemon commit=1b09a95 graphdriver=aufs version=1.11.0-dev
```
``` It is also possible to set the `--storage-driver` and `--storage-opt` flags
thin_pool_autoextend_percent = 20 in the Docker config file and start the daemon normally using the `service` or
``` `systemd` commands.
11. Check your work, your `docker-thinpool.profile` file should appear similar to the following: 8. Use the `docker info` command to verify that the daemon is using `data` and
`metadata` devices you created.
An example `/etc/lvm/profile/docker-thinpool.profile` file: ```bash
$ sudo docker info
``` INFO[0180] GET /v1.20/info
activation { Containers: 0
thin_pool_autoextend_threshold=80 Images: 0
thin_pool_autoextend_percent=20 Storage Driver: devicemapper
} Pool Name: docker-202:1-1032-pool
``` Pool Blocksize: 65.54 kB
Backing Filesystem: xfs
12. Apply your new lvm profile Data file: /dev/vg-docker/data
Metadata file: /dev/vg-docker/metadata
```bash [...]
$ lvchange --metadataprofile docker-thinpool docker/thinpool ```
```
13. Verify the `lv` is monitored.
```bash
$ lvs -o+seg_monitor
```
14. If Engine was previously started, clear your graph driver directory.
Clearing your graph driver removes any images and containers in your Docker
installation.
```bash
$ rm -rf /var/lib/docker/*
```
14. Configure the Engine daemon with specific devicemapper options.
There are two ways to do this. You can set options on the command line if you start the daemon there:
```bash
--storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.use_deferred_removal=true
```
You can also set them for startup in the `daemon.json` configuration, for example:
```json
{
"storage-driver": "devicemapper",
"storage-opts": [
"dm.thinpooldev=/dev/mapper/docker-thinpool",
"dm.use_deferred_removal=true"
]
}
```
15. Start the Engine daemon.
```bash
$ systemctl start docker
```
After you start the Engine daemon, ensure you monitor your thin pool and volume
group free space. While the volume group will auto-extend, it can still fill
up. To monitor logical volumes, use `lvs` without options or `lvs -a` to see the
data and metadata sizes. To monitor volume group free space, use the `vgs` command.
Logs can show the auto-extension of the thin pool when it hits the threshold. To
view the logs, use:
```bash
journalctl -fu dm-event.service
```
If you run into repeated problems with thin pool, you can use the
`dm.min_free_space` option to tune the Engine behavior. This value ensures that
operations fail with a warning when the free space is at or near the minimum.
For information, see <a
href="https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"
target="_blank">the storage driver options in the Engine daemon reference</a>.
The output of the command above shows the storage driver as `devicemapper`.
The last two lines also confirm that the correct devices are being used for
the `Data file` and the `Metadata file`.
### Examine devicemapper structures on the host ### Examine devicemapper structures on the host
You can use the `lsblk` command to see the device files created above and the You can use the `lsblk` command to see the device files created above and the
`pool` that the `devicemapper` storage driver creates on top of them. `pool` that the `devicemapper` storage driver creates on top of them.
$ sudo lsblk ```bash
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT $ sudo lsblk
xvda 202:0 0 8G 0 disk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
└─xvda1 202:1 0 8G 0 part / xvda 202:0 0 8G 0 disk
xvdf 202:80 0 10G 0 disk └─xvda1 202:1 0 8G 0 part /
├─vg--docker-data 253:0 0 90G 0 lvm xvdf 202:80 0 10G 0 disk
│ └─docker-202:1-1032-pool 253:2 0 10G 0 dm ├─vg--docker-data 253:0 0 90G 0 lvm
└─vg--docker-metadata 253:1 0 4G 0 lvm │ └─docker-202:1-1032-pool 253:2 0 10G 0 dm
└─docker-202:1-1032-pool 253:2 0 10G 0 dm └─vg--docker-metadata 253:1 0 4G 0 lvm
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
```
The diagram below shows the image from prior examples updated with the detail The diagram below shows the image from prior examples updated with the detail
from the `lsblk` command above. from the `lsblk` command above.
@ -379,8 +345,8 @@ from the `lsblk` command above.
![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg) ![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data` In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
and `metadata` devices created earlier. The `devicemapper` constructs the pool and `metadata` devices created earlier. The `devicemapper` constructs the pool
name as follows: name as follows:
``` ```
Docker-MAJ:MIN-INO-pool Docker-MAJ:MIN-INO-pool
@ -440,18 +406,18 @@ Logging Driver: json-file
[...] [...]
``` ```
The `Data Space` values show that the pool is 100GiB total. This example extends the pool to 200GiB. The `Data Space` values show that the pool is 100GB total. This example extends the pool to 200GB.
1. List the sizes of the devices. 1. List the sizes of the devices.
```bash ```bash
$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/ $ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/
total 1.2G total 1175492
-rw------- 1 root root 100G Apr 14 08:47 data -rw------- 1 root root 100G Mar 30 05:22 data
-rw------- 1 root root 2.0G Apr 19 13:27 metadata -rw------- 1 root root 2.0G Mar 31 11:17 metadata
``` ```
2. Truncate `data` file to 200GiB. 2. Truncate `data` file to the size of the `metadata` file (approximately 200GB).
```bash ```bash
$ sudo truncate -s 214748364800 /var/lib/docker/devicemapper/devicemapper/data $ sudo truncate -s 214748364800 /var/lib/docker/devicemapper/devicemapper/data
@ -460,10 +426,12 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
3. Verify the file size changed. 3. Verify the file size changed.
```bash ```bash
$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/ $ sudo ls -al /var/lib/docker/devicemapper/devicemapper/
total 1.2G total 1175492
-rw------- 1 root root 200G Apr 14 08:47 data drwx------ 2 root root 4096 Mar 29 02:45 .
-rw------- 1 root root 2.0G Apr 19 13:27 metadata drwx------ 5 root root 4096 Mar 29 02:48 ..
-rw------- 1 root root 214748364800 Mar 31 11:20 data
-rw------- 1 root root 2147483648 Mar 31 11:17 metadata
``` ```
4. Reload data loop device 4. Reload data loop device
@ -480,19 +448,19 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
a. Get the pool name first. a. Get the pool name first.
$ sudo dmsetup status | grep pool $ sudo dmsetup status docker-8:1-123141-pool: 0 209715200 thin-pool 91
docker-8:1-123141-pool: 0 209715200 thin-pool 91 422/524288 18338/1638400 - rw discard_passdown queue_if_no_space - 422/524288 18338/1638400 - rw discard_passdown queue_if_no_space -
The name is the string before the final colon, the one followed by a space. The name is the string before the final colon, the one followed by a space.
b. Dump the device mapper table first. b. Dump the device mapper table first.
$ sudo dmsetup table docker-8:1-123141-pool $ sudo dmsetup table docker-8:1-123141-pool
0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing 0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing
c. Calculate the real total sectors of the thin pool now. c. Calculate the real total sectors of the thin pool now.
Change the second number of the table info (i.e. the number of sectors) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GiB, change the second number to 419430400. Change the second number of the table info (i.e. the disk end sector) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GB, change the second number to 419430400.
d. Reload the thin pool with the new sector number d. Reload the thin pool with the new sector number
@ -514,7 +482,7 @@ $ ./device_tool resize 200GB
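The pool-name and sector steps of the loop-lvm procedure can be sketched as follows. Both helper names are hypothetical. The sketch assumes 512-byte sectors as stated in the text, and that the pool name is followed by a colon and a space in the `dmsetup status` output shown above.

```bash
SECTOR_BYTES=512  # sector size stated in the text

# Hypothetical helper: extract the pool name from one line of `dmsetup status` output.
# The name itself contains a colon (for example `8:1`), so cut at the first
# colon that is followed by a space rather than at the first colon.
pool_name_from_status() {
    printf '%s\n' "${1%%: *}"
}

# Hypothetical helper: convert a size in GiB to a count of 512-byte sectors.
gib_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 1024 / SECTOR_BYTES ))
}

pool_name_from_status 'docker-8:1-123141-pool: 0 209715200 thin-pool 91'
gib_to_sectors 200
```

With the example values from this section, the first call prints `docker-8:1-123141-pool` and the second prints `419430400`, the sector count used for the 200GiB loop file.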
### For a direct-lvm mode configuration ### For a direct-lvm mode configuration
In this example, you extend the capacity of a running device that uses the In this example, you extend the capacity of a running device that uses the
`direct-lvm` configuration. This example assumes you are using the `/dev/sdh1` `direct-lvm` configuration. This example assumes you are using the `/dev/sdh1`
disk partition. disk partition.
1. Extend the volume group (VG) `vg-docker`. 1. Extend the volume group (VG) `vg-docker`.
@ -550,7 +518,7 @@ disk partition.
c. Calculate the real total sectors of the thin pool now. You can use `blockdev` to get the real size of the data `lv`. c. Calculate the real total sectors of the thin pool now. You can use `blockdev` to get the real size of the data `lv`.
Change the second number of the table info (i.e. the number of sectors) to Change the second number of the table info (i.e. the disk end sector) to
reflect the new number of 512 byte sectors in the disk. For example, as the reflect the new number of 512 byte sectors in the disk. For example, as the
new data `lv` size is `264132100096` bytes, change the second number to new data `lv` size is `264132100096` bytes, change the second number to
`515883008`. `515883008`.
@ -562,7 +530,6 @@ disk partition.
$ sudo dmsetup suspend docker-253:17-1835016-pool && sudo dmsetup reload docker-253:17-1835016-pool --table '0 515883008 thin-pool 252:0 252:1 128 32768 1 skip_block_zeroing' && sudo dmsetup resume docker-253:17-1835016-pool $ sudo dmsetup suspend docker-253:17-1835016-pool && sudo dmsetup reload docker-253:17-1835016-pool --table '0 515883008 thin-pool 252:0 252:1 128 32768 1 skip_block_zeroing' && sudo dmsetup resume docker-253:17-1835016-pool
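The table edit in steps c and d can be sketched in shell. The helper names are hypothetical, and the starting table line in the usage example is an assumed value for illustration, since the dumped table for this pool is not shown here. Only the second field (the number of 512-byte sectors) changes.

```bash
# Hypothetical helper: convert a device size in bytes to 512-byte sectors.
bytes_to_sectors() {
    echo $(( $1 / 512 ))
}

# Hypothetical helper: return a thin-pool table line with its second field
# (the sector count) replaced. Runs in a subshell so `set` stays local.
resized_table() (
    table=$1
    new_sectors=$2
    set -f            # do not glob-expand the table fields
    set -- $table     # split the table line on whitespace
    start=$1
    shift 2           # drop the start sector and the old sector count
    echo "$start $new_sectors $*"
)

# Usage sketch. On a real host the byte count might come from something like
# `sudo blockdev --getsize64 /dev/vg-docker/data` (device path assumed).
new_sectors=$(bytes_to_sectors 264132100096)
resized_table '0 209715200 thin-pool 252:0 252:1 128 32768 1 skip_block_zeroing' "$new_sectors"
```

For the example size of `264132100096` bytes, this prints the table string passed to `dmsetup reload` in the command above.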
## Device Mapper and Docker performance ## Device Mapper and Docker performance
It is important to understand the impact that allocate-on-demand and It is important to understand the impact that allocate-on-demand and