This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
Fixed issue #23459
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
We just introduced a new tunable dm.xfs_nospace_max_retries. But this tunable
will work only on new kernels where xfs supports this feature. On older
kernels xfs does not allow tuning this behavior.
There are two issues. First one is that if xfsSetNospaceRetries() fails,
it returns error but leaves the device activated and mounted. We should
be unmounting the device and deactivate it before returning.
Second issue is, if docker is started on older kernel, with
dm.xfs_nospace_max_retries specified, then docker will silently ignore the
fact that /sys file to tweak this behavior is not present and will continue.
But I think it might be better to fail container creation/start if kernel
does not support this feature.
This patch fixes it. After this patch, user will get an error like following
when container is run.
# docker run -ti fedora bash
docker: Error response from daemon: devmapper: user specified daemon option dm.xfs_nospace_max_retries but it does not seem to be supported on this system :open /sys/fs/xfs/dm-5/error/metadata/ENOSPC/max_retries: no such file or directory.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
When xfs filesystem is being used on top of thin pool, xfs can get ENOSPC
errors from thin pool when thin pool is full. As of now xfs retries the
IO and keeps on retrying and does not give up. This can result in container
application being stuck for a very long time. In fact I have seen instances
of unkillable processes. So that means once thin pool is full and process
gets stuck, container can't be stopped/killed either and only option left
seems to be power recycle of the box.
In another instance, writer did not block but failed after a while. But
when I tried to exit/stop the container, unmounting xfs hanged and only
thing I could do was power cycle the machine.
Now upstream kernel has committed patches where it allows user space to
customize user space behavior in case of errors. One of the knobs is
max_retries, which specifies how many times an IO should be retried
when ENOSPC is encountered.
This patch sets provides a tunable knob (dm.xfs_nospace_max_retries) so
that user can specify value for max_retries and tune xfs behavior. If
one sets this value to 0, xfs will not retry IO when ENOSPC error is
encountered. It will instead give up and shutdown filesystem.
This knob can be useful if one is running into unkillable
processes/containers issue on top of xfs.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Problem Description:
An example scenario that involves deferred removal
1. A new base image gets created (e.g. 'docker load -i'). The base device is activated and
mounted at some point in time during image creation.
2. While image creation is in progress, a privileged container is started
from another image and the host's mount name space is shared with this
container ('docker run --privileged -v /:/host').
3. Image creation completes and the base device gets unmounted. However,
as the privileged container still holds a reference on the base image
mount point, the base device cannot be removed right away. So it gets
flagged for deferred removal.
4. Next, the privileged container terminates and thus its reference to the
base image mount point gets released. The base device (which is flagged
for deferred removal) may now be cleaned up by the device-mapper. This
opens up an opportunity for a race between a 'kworker' thread (executing
the do_deferred_remove() function) and the Docker daemon (executing the
CreateSnapDevice() function).
This PR cancel the deferred removal, if the device is marked for it. And reschedule the
deferred removal later after the device is resumed successfully.
Signed-off-by: Shishir Mahajan <shishir.mahajan@redhat.com>
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
This fix fixes#23459.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
device Base should not exists on failure:
--- FAIL: TestDevmapperCreateBase (0.06s)
graphtest_unix.go:122: stat
/tmp/docker-graphtest-079240530/devicemapper/mnt/Base/rootfs/a subdir:
no such file or directory
--- FAIL: TestDevmapperCreateSnap (0.00s)
graphtest_unix.go:219: devmapper: device Base already
exists.
it should be:
--- FAIL: TestDevmapperCreateBase (0.25s)
graphtest_unix.go:122: stat
/tmp/docker-graphtest-828994195/devicemapper/mnt/Base/rootfs/a subdir:
no such file or directory
--- FAIL: TestDevmapperCreateSnap (0.13s)
graphtest_unix.go:122: stat
/tmp/docker-graphtest-828994195/devicemapper/mnt/Snap/rootfs/a subdir:
no such file or directory
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
1) docker create / run / start: this would create a snapshot device and mounts it onto the filesystem.
So the first time GET operation is called. it will create the rootfs directory and return the path to rootfs
2) Now when I do docker commit. It will call the GET operation second time. This time the refcount will check
that the count > 1 (count=2). so the rootfs already exists, it will just return the path to rootfs.
Earlier it was just returning the mp: /var/lib/docker/devicemapper/mnt/{ID} and hence the inconsistent paths error.
Signed-off-by: Shishir Mahajan <shishir.mahajan@redhat.com>
For things that we can check if they are mounted by using their fsmagic
we should use that and for others do it the slow way.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This makes sure fsdiff doesn't try to unmount things that shouldn't be.
**Note**: This is intended as a temporary solution to have as minor a
change as possible for 1.11.1. A bigger change will be required in order
to support container re-attach.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Right now there is no way to know what's the minimum free space threshold
daemon is applying. It would be good to export it through docker info and
then user knows what's the current value. Also this could be useful to
higher level management tools which can look at this value and setup their
own internal thresholds for image garbage collection etc.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Since the layer store was introduced, the level above the graphdriver
now differentiates between read/write and read-only layers. This
distinction is useful for graphdrivers that need to take special steps
when creating a layer based on whether it is read-only or not.
Adding this parameter allows the graphdrivers to differentiate, which
in the case of the Windows graphdriver, removes our dependence on parsing
the id of the parent for "-init" in order to infer this information.
This will also set the stage for unblocking some of the layer store
unit tests in the next preview build of Windows.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
Instead of implementing refcounts at each graphdriver, implement this in
the layer package which is what the engine actually interacts with now.
This means interacting directly with the graphdriver is no longer
explicitly safe with regard to Get/Put calls being refcounted.
In addition, with the containerd, layers may still be mounted after
a daemon restart since we will no longer explicitly kill containers when
we shutdown or startup engine.
Because of this ref counts would need to be repopulated.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Instead of implementing refcounts at each graphdriver, implement this in
the layer package which is what the engine actually interacts with now.
This means interacting directly with the graphdriver is no longer
explicitly safe with regard to Get/Put calls being refcounted.
In addition, with the containerd, layers may still be mounted after
a daemon restart since we will no longer explicitly kill containers when
we shutdown or startup engine.
Because of this ref counts would need to be repopulated.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Now what we provide dynamic binaries for all plaforms,
we shouldn't try to run docker without udev sync support.
This change changes the previous warning to an Error,
unless the user explicitly overrides the warning, in
which case they're at their own risk.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Once thin pool gets full, bad things can happen. Especially in case of xfs
it is possible that xfs keeps on retrying IO infinitely (for certain kind
of IO) and container hangs.
One way to mitigate the problem is that once thin pool is about to get full,
start failing some of the docker operations like pulling new images or
creation of new containers. That way user will get warning ahead of time
and can try to rectify it by creating more free space in thin pool. This
can be done either by deleting existing images/containers or by adding more
free space to thin pool.
This patch adds a new option dm.min_free_space to devicemapper graph
driver. Say one specifies dm.min_free_space=10%. This means atleast
10% of data and metadata blocks should be free in pool before new device
creation is allowed, otherwise operation will fail.
By default min_free_space is 10%. User can change it by specifying
dm.min_free_space=X% on command line. A value of 0% will disable the
check.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now if somebody has enabled deferred device deletion, then
deleteTransaction() returns success even if device could not be deleted. It
has been marked for deferred deletion. Right now we will mark device ID free
and potentially use it again when somebody tries to create new container. And
that's wrong. Device ID is not free yet. It will become free once devices
has actually been deleted by the goroutine later.
So move the location of call to markDeviceIDFree() to a place where we know
device actually got deleted and was not marked for deferred deletion.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
The loopback logic is not technically exclusive to the devicemapper
driver. This reorganizes the code such that the loopback code is usable
outside of the devicemapper package and driver.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
After the very first init of the graph `docker info` correctly shows the
base fs type under `Backing Filesystem`. This information isn't stored
anywhere. After a restart (w/o erasing `/var/lib/docker`) `docker info`
shows an empty string under `Backing Filesystem`.
This patch records the base fs type after the first run in the metadata
or, to fix old devices that don't have this info in the metadata, just
probe the fs type of the base device at graph startup.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>