The implementation in libcontainer/system is quite complicated,
and we only use it to detect if user-namespaces are enabled.
In addition, the implementation in containerd uses a sync.Once,
so that detection (and reading/parsing `/proc/self/uid_map`) is
only performed once.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Switch to moby/sys/mount and mountinfo. Keep the pkg/mount for potential
outside users.
This commit was generated by the following bash script:
```
set -e -u -o pipefail
for file in $(git grep -l 'docker/docker/pkg/mount"' | grep -v ^pkg/mount); do
sed -i -e 's#/docker/docker/pkg/mount"#/moby/sys/mount"#' \
-e 's#mount\.\(GetMounts\|Mounted\|Info\|[A-Za-z]*Filter\)#mountinfo.\1#g' \
$file
goimports -w $file
done
```
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
With `rprivate` there exists a race where a reference to a mount has
propagated to the new namespace, when `rprivate` is set the parent
namespace is not able to remove the mount due to that reference.
With `rslave` unmounts will propagate correctly into the namespace and
prevent the sort of transient errors that are possible with `rprivate`.
This is a similar fix to 117c92745b
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
In some cases, attempting to `docker cp` to a container's volume dir
would fail due to the volume mounts not existing after performing a
bind-mount on the container path prior to doing a pivot_root.
This does not seem to be effecting all systems, but was found to be a
problem on centos.
The solution is to use an `rbind` rather than `bind` so that any
existing mounts are carried over.
The `MakePrivate` on `path` is no longer neccessary since we are already
doing `MakeRPrivate` on `/`.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
If parent of the destination path is shared, this
path will be unmounted from the parent ns even if
the path itself is private.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
The namespace unshare+pivot root is not possible when running inside a
user namespace, so fallback to the original "real" chroot code.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
When pivot_root fails we need to unmount the bind mounted path we
previously mounted in preparation for pivot_root.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The path we're trying to remove doesn't exist after a successful
chroot+chdir because a / is only appended after pivot_root is
successful and so we can't cleanup anymore with the old path.
Also fix leaking .pivot_root dirs under /var/lib/docker/tmp/docker-builder*
on error.
Fix https://github.com/docker/docker/issues/22587
Introduced by https://github.com/docker/docker/pull/22506
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
This fixes one issue with Docker running under a grsec kernel, which
denies chmod and mknod under chroot.
Note, if pivot_root fails it will still fallback to chroot.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>