1
0
Fork 0
mirror of https://github.com/moby/moby.git synced 2022-11-09 12:21:53 -05:00
Commit graph

74 commits

Author SHA1 Message Date
Akihiro Suda
3deac5dc85
btrfs: annotate error with human-readable hint string
Add hints for "Failed to destroy btrfs snapshot <DIR> for <ID>: operation not permitted" on rootless

Related to issue 41762

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-07-27 15:45:02 +09:00
Michal Rostecki
1ec689c4c2 btrfs: Do not disable quota on cleanup
Before this change, cleanup of the btrfs driver (occuring on each daemon
shutdown) resulted in disabling quotas. It was done with an assumption
that quotas can be enabled or disabled on a subvolume level, which is
not true - enabling or disabling quota is always done on a filesystem
level.

That was leading to disabling quota on btrfs filesystems on each daemon
shutdown.

This change fixes that behavior and removes misleading `subvol` prefix
from functions and methods which set up quota (on a filesystem level).

Fixes: 
Fixes: 401c8d1767 ("Add disk quota support for btrfs")
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
2021-04-13 16:23:39 +01:00
Akihiro Suda
62b5194f62
btrfs: Allow unprivileged user to delete subvolumes (kernel >= 4.18)
Fix issue 41762

Cherry-pick "drivers: btrfs: Allow unprivileged user to delete subvolumes" from containers/storage
831e32b6bd

> In btrfs, subvolume can be deleted by IOC_SNAP_DESTROY ioctl but there
> is one catch: unprivileged IOC_SNAP_DESTROY call is restricted by default.
>
> This is because IOC_SNAP_DESTROY only performs permission checks on
> the top directory(subvolume) and unprivileged user might delete dirs/files
> which cannot be deleted otherwise. This restriction can be relaxed if
> user_subvol_rm_allowed mount option is used.
>
> Although the above ioctl had been the only way to delete a subvolume,
> btrfs now allows deletion of subvolume just like regular directory
> (i.e. rmdir sycall) since kernel 4.18.
>
> So if we fail to cleanup subvolume in subvolDelete(), just fallback to
> system.EnsureRmoveall() to try to cleanup subvolumes again.
> (Note: quota needs privilege, so if quota is enabled we do not fallback)
>
> This fix will allow non-privileged container works with btrfs backend.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-03-26 14:30:40 +09:00
Brian Goff
7f5e39bd4f
Use real root with 0701 perms
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.

Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.

This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-02-02 13:01:25 +01:00
Kir Kolyshkin
39048cf656 Really switch to moby/sys/mount*
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.

This commit was generated by the following bash script:

```
set -e -u -o pipefail

for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
	sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
		-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
		$file
	goimports -w $file
done
```

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 09:46:25 -07:00
Sebastiaan van Stijn
ec4bc83258
daemon/graphdriver: normalize comment formatting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-11-27 15:43:23 +01:00
Sebastiaan van Stijn
cba180cac9
graphdriver/btrfs: SA4003: no value of type uint64 is less than 0 (staticcheck)
```
daemon/graphdriver/btrfs/btrfs.go:609:5: SA4003: no value of type uint64 is less than 0 (staticcheck)
	if driver.options.size <= 0 {
	   ^
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-10-18 00:45:39 +02:00
Sebastiaan van Stijn
07ff4f1de8
goimports: fix imports
Format the source according to latest goimports.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-18 12:56:54 +02:00
Eli Uriegas
e665263b10
daemon: Remove btrfs_noversion build flag
btrfs_noversion was added in d7c37b5a28
for distributions that did not have the `btrfs/version.h` header file.

Seeing how all of the distributions we currently support do have the
`btrfs/version.h` file we should probably just remove this build flag
altogether.

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
2019-08-06 22:55:29 +00:00
Kir Kolyshkin
6533136961 pkg/mount: wrap mount/umount errors
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.

Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:

> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted

or

> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted

Before this patch, it was just

> operation not permitted

[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-12-10 20:07:02 -08:00
Kir Kolyshkin
16d822bba8 btrfs: ensure graphdriver home is bind mount
For some reason, shared mount propagation between the host
and a container does not work for btrfs, unless container
root directory (i.e. graphdriver home) is a bind mount.

The above issue was reproduced on SLES 12sp3 + btrfs using
the following script:

	#!/bin/bash
	set -eux -o pipefail

	# DIR should not be under a subvolume
	DIR=${DIR:-/lib}
	MNT=$DIR/my-mnt
	FILE=$MNT/file

	ID=$(docker run -d --privileged -v $DIR:$DIR:rshared ubuntu sleep 24h)
	docker exec $ID mkdir -p $MNT
	docker exec $ID mount -t tmpfs tmpfs $MNT
	docker exec $ID touch $FILE
	ls -l $FILE
	umount $MNT
	docker rm -f $ID

which fails this way:

	+ ls -l /lib/my-mnt/file
	ls: cannot access '/lib/my-mnt/file': No such file or directory

meaning the mount performed inside a priviledged container is not
propagated back to the host (even if all the mounts have "shared"
propagation mode).

The remedy to the above is to make graphdriver home a bind mount.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-10-11 23:45:00 -07:00
Salahuddin Khan
763d839261 Add ADD/COPY --chown flag support to Windows
This implements chown support on Windows. Built-in accounts as well
as accounts included in the SAM database of the container are supported.

NOTE: IDPair is now named Identity and IDMappings is now named
IdentityMapping.

The following are valid examples:
ADD --chown=Guest . <some directory>
COPY --chown=Administrator . <some directory>
COPY --chown=Guests . <some directory>
COPY --chown=ContainerUser . <some directory>

On Windows an owner is only granted the permission to read the security
descriptor and read/write the discretionary access control list. This
fix also grants read/write and execute permissions to the owner.

Signed-off-by: Salahuddin Khan <salah@docker.com>
2018-08-13 21:59:11 -07:00
Alejandro González Hevia
9392838150 Standardized log messages accross the different storage drivers.
Now all of the storage drivers use the field "storage-driver" in their log
messages, which is set to name of the respective driver.
Storage drivers changed:
- Aufs
- Btrfs
- Devicemapper
- Overlay
- Overlay 2
- Zfs

Signed-off-by: Alejandro GonzÃlez Hevia <alejandrgh11@gmail.com>
2018-03-27 14:37:30 +02:00
Yong Tang
742d4506bd Golint fix up
This fix fixes a golint issue.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-02-23 16:40:37 +00:00
Brian Goff
68c3201626
Merge pull request from cpuguy83/zfs_do_not_unmount
Do not recursive unmount on cleanup of zfs/btrfs
2018-02-14 09:49:17 -05:00
Brian Goff
2fe4f888be Do not recursive unmount on cleanup of zfs/btrfs
This was added in  just as a way to make sure the tree is fully
unmounted on shutdown.

For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.

BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.

For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-02-07 15:08:17 -05:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Brian Goff
9803272f2d Do not make graphdriver homes private mounts.
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.

There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.

This also means there is a negative impact when other system services
are started while the daemon is running.

Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.

It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2018-01-18 09:34:00 -05:00
Sebastiaan van Stijn
b4a6313969
Golint: remove redundant ifs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-01-15 00:42:25 +01:00
Sebastiaan van Stijn
f9c8fa305e
Perform fsmagic detection on driver's home-dir if it exists
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).

This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.

This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-12-04 17:10:07 -08:00
Sebastiaan van Stijn
38b3af567f
Remove deprecated MkdirAllAs(), MkdirAs()
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-11-21 13:53:54 +01:00
Akash Gupta
7a7357dae1 LCOW: Implemented support for docker cp + build
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.

Signed-off-by: Akash Gupta <akagup@microsoft.com>
2017-09-14 12:07:52 -07:00
Derek McGowan
1009e6a40b
Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Christopher Jones
069fdc8a08
[project] change syscall to /x/sys/unix|windows
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.

Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>

[s390x] switch utsname from unsigned to signed

per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags

Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
2017-07-11 08:00:32 -04:00
Yong Tang
16328cc207 Persist the quota size for btrfs so that daemon restart keeps quota
This commit is an extension of fix for 29325 based on the review comment.
In this commit, the quota size for btrfs is kept in `/var/lib/docker/btrfs/quotas`
so that a daemon restart keeps quota.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2017-06-01 21:15:51 -07:00
Yong Tang
e907c6418a Remove btrfs quota groups after containers destroyed
This fix tries to address the issue raised in 29325 where
btrfs quota groups are not clean up even after containers
have been destroyed.

The reason for the issue is that btrfs quota groups have
to be explicitly destroyed. This fix fixes this issue.

This fix is tested manually in Ubuntu 16.04,
with steps specified in 29325.

This fix fixes 29325.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2017-06-01 20:24:26 -07:00
Brian Goff
54dcbab25e Do not remove containers from memory on error
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).

In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-05-05 17:02:04 -04:00
Antonio Murdaca
abbbf91498
Switch to using opencontainers/selinux for selinux bindings
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-24 21:29:47 +02:00
Yong Tang
b36e613d9f Run btrfs rescan only if userDiskQuota is enabled
This fix tries to address the issue raised in 29810
where btrfs subvolume removal failed when docker
is in an unprivileged lxc container. The failure
was caused by `Failed to rescan btrfs quota` with
`operation not permitted`.

However, if disk quota is not enabled, there is no
need to run a btrfs rescan at the first place.

This fix checks for `quotaEnabled` and only run btrfs
rescan if `quotaEnabled` is true.

This fix fixes 29810.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2017-01-05 05:18:11 -08:00
wefine
f78f7de96a fix t.Errorf to t.Error in serveral _test.go
Signed-off-by: wefine <wang.xiaoren@zte.com.cn>
2016-11-14 17:54:43 +08:00
Vivek Goyal
b937aa8e69 Pass all graphdriver create() parameters in a struct
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2016-11-09 15:59:58 -05:00
Zhu Guihua
401c8d1767 Add disk quota support for btrfs
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
2016-05-05 14:35:13 +08:00
John Howard
fec6cd2eb9 Merge pull request from Microsoft/sjw/update-graphdriver-create
Adding readOnly parameter to graphdriver Create method
2016-04-08 20:44:03 -07:00
Stefan J. Wernli
ef5bfad321 Adding readOnly parameter to graphdriver Create method
Since the layer store was introduced, the level above the graphdriver
now differentiates between read/write and read-only layers.  This
distinction is useful for graphdrivers that need to take special steps
when creating a layer based on whether it is read-only or not.
Adding this parameter allows the graphdrivers to differentiate, which
in the case of the Windows graphdriver, removes our dependence on parsing
the id of the parent for "-init" in order to infer this information.

This will also set the stage for unblocking some of the layer store
unit tests in the next preview build of Windows.

Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
2016-04-06 13:52:53 -07:00
Julio Montes
a038cccf88 Fix compilation errors with btrfs-progs-4.5
btrfs-progs-4.5 introduces device delete by devid
for this reason btrfs_ioctl_vol_args_v2's name was encapsulated
in a union

this patch is for setting btrfs_ioctl_vol_args_v2's name
using a C function in order to preserve compatibility
with all btrfs-progs versions

Signed-off-by: Julio Montes <imc.coder@gmail.com>
2016-04-01 08:58:29 -06:00
Shishir Mahajan
b16decfccf CLI flag for docker create(run) to change block device size.
Signed-off-by: Shishir Mahajan <shishir.mahajan@redhat.com>
2016-03-28 10:05:18 -04:00
Kai Qiang Wu(Kennan)
c33cdf9ee3 Fix the typo
Signed-off-by: Kai Qiang Wu(Kennan) <wkqwu@cn.ibm.com>
2016-02-16 07:00:01 +00:00
Liu Bo
b2e27fee53 Graphdriver/btrfs: Avoid using single d.Get()
For btrfs driver, in d.Create(), Get() of parentDir is called but not followed
by Put().

If we apply SElinux mount label, we need to mount btrfs subvolumes in d.Get(),
without a Put() would end up with a later Remove() failure on
"Device resourse is busy".

This calls the subvolume helper function directly in d.Create().

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2016-02-04 10:25:24 -08:00
Kai Qiang Wu(Kennan)
feda5d7684 Make btrfs call same interface as others
Most storage drivers call graphdriver.GetFSMagic(home),
it is more clean to easy to maintain. So btrfs need to
adopt such change.

Signed-off-by: Kai Qiang Wu(Kennan) <wkqwu@cn.ibm.com>
2016-02-01 07:50:21 +00:00
Phil Estes
72e65e8793 Fix btrfs subvolume snapshot dir perms for user namespaces
Make sure btrfs mounted subvolumes are owned properly when a remapped
root exists (user namespaces are enabled, for example)

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2016-01-07 23:05:28 -05:00
Shijiang Wei
de7f6cf16b ingnore the NotExist error when removing inexistent files
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
2015-12-25 15:19:48 +08:00
Vincent Batts
f57d56350e Merge pull request from cpuguy83/fix_btrfs_subvol_delete_panic
Fix btrfs recursive btrfs subvol delete
2015-12-16 14:26:40 -05:00
Brian Goff
f9befce2d3 Fix btrfs recursive btrfs subvol delete
Really fixing 2 things:

1. Panic when any error is detected while walking the btrfs graph dir on
removal due to no error check.
2. Nested subvolumes weren't actually being removed due to passing in
the wrong path

On point 2, for a path detected as a nested subvolume, we were calling
`subvolDelete("/path/to/subvol", "subvol")`, where the last part of the
path was duplicated due to a logic error, and as such actually causing
point  since `subvolDelete` joins the two arguemtns, and
`/path/to/subvol/subvol` (the joined version) doesn't exist.

Also adds a test for nested subvol delete.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-12-15 18:12:40 -05:00
Justas Brazauskas
927b334ebf Fix typos found across repository
Signed-off-by: Justas Brazauskas <brazauskasjustas@gmail.com>
2015-12-13 18:04:12 +02:00
Dan Walsh
1716d497a4 Relabel BTRFS Content on container Creation
This change will allow us to run SELinux in a container with
BTRFS back end.  We continue to work on fixing the kernel/BTRFS
but this change will allow SELinux Security separation on BTRFS.

It basically relabels the content on container creation.

Just relabling -init directory in BTRFS use case. Everything looks like it
works. I don't believe tar/achive stores the SELinux labels, so we are good
as far as docker commit.

Tested Speed on startup with BTRFS on top of loopback directory. BTRFS
not on loopback should get even better perfomance on startup time.  The
more inodes inside of the container image will increase the relabel time.

This patch will give people who care more about security the option of
runnin BTRFS with SELinux.  Those who don't want to take the slow down
can disable SELinux either in individual containers or for all containers
by continuing to disable SELinux in the daemon.

Without relabel:

> time docker run --security-opt label:disable fedora echo test
test

real    0m0.918s
user    0m0.009s
sys    0m0.026s

With Relabel

test

real    0m1.942s
user    0m0.007s
sys    0m0.030s

Signed-off-by: Dan Walsh <dwalsh@redhat.com>

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-11-11 14:49:27 -05:00
Phil Estes
442b45628e Add user namespace (mapping) support to the Docker engine
Adds support for the daemon to handle user namespace maps as a
per-daemon setting.

Support for handling uid/gid mapping is added to the builder,
archive/unarchive packages and functions, all graphdrivers (except
Windows), and the test suite is updated to handle user namespace daemon
rootgraph changes.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2015-10-09 17:47:37 -04:00
Chun Chen
2458452a3b Try to resize data and metadata loopback file when initiating devicemapper
Signed-off-by: Chun Chen <ramichen@tencent.com>
2015-09-24 09:31:00 +08:00
Jessica Frazelle
bd06432ba3 cleanup and fix btrfs subvolume recursion deletion
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2015-08-25 13:00:41 -07:00
Ma Shimiao
dea78fc2ce fix 9939: docker does not remove btrfs subvolumes when destroying container
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-08-24 14:52:07 -07:00
Srini Brahmaroutu
22873eae31 fix unit test breakage due to lint changes
Addresses 

Signed-off-by: Srini Brahmaroutu <srbrahma@us.ibm.com>
2015-07-31 00:22:28 +00:00