Movified from 686be57d0a, and re-ran
gofmt again to address for files not present in 20.10 and vice-versa.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 686be57d0a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
libcontainer does not guarantee a stable API, and is not intended
for external consumers.
this patch replaces some uses of libcontainer/cgroups with
containerd/cgroups.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
dockerd currently sets the oom-score-adjust itself. This functionality
was added when we did not yet run dockerd as a systemd service.
Now that we do, it's better to instead have systemd handle this.
Keeping the option itself for situations where dockerd is started
manually or without using systemd.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Format the source according to latest goimports.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These options configure the parent cgroup, not the default for containers,
nor the daemon itself, so adding that information to the flag description
to make this slightly more clear.
relates to 56f77d5ade (#23430) which implemented
these flags.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This adds both a daemon-wide flag and a container creation property:
- Set the `CgroupnsMode: "host|private"` HostConfig property at
container creation time to control what cgroup namespace the container
is created in
- Set the `--default-cgroupns-mode=host|private` daemon flag to control
what cgroup namespace containers are created in by default
- Set the default if the daemon flag is unset to "host", for backward
compatibility
- Default to CgroupnsMode: "host" for client versions < 1.40
Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
The `--rootless` flag had a couple of issues:
* #38702: euid=0, $USER="root" but no access to cgroup ("rootful" Docker in rootless Docker)
* #39009: euid=0 but $USER="docker" (rootful boot2docker)
To fix#38702, XDG dirs are ignored as in rootful Docker, unless the
dockerd is directly running under RootlessKit namespaces.
RootlessKit detection is implemented by checking whether `$ROOTLESSKIT_STATE_DIR` is set.
To fix#39009, the non-robust `$USER` check is now completely removed.
The entire logic can be illustrated as follows:
```
withRootlessKit := getenv("ROOTLESSKIT_STATE_DIR")
rootlessMode := withRootlessKit || cliFlag("--rootless")
honorXDG := withRootlessKit
useRootlessKitDockerProxy := withRootlessKit
removeCgroupSpec := rootlessMode
adjustOOMScoreAdj := rootlessMode
```
Close#39024Fix#38702#39009
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Please refer to `docs/rootless.md`.
TLDR:
* Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
* `dockerd-rootless.sh --experimental`
* `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
The `--enable-api-cors` flag was deprecated in f3dd2db4ff,
and marked for removal in docker 17.09 through 85f92ef359.
This patch removes the deprecated flag.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The daemon config for defaulting to no-new-privileges for containers was
added in d7fda019bb, but somehow we
managed to omit the flag itself, but also documented the flag.
This just adds the actual flag.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This also moves some cli specific in `cmd/dockerd` as it does not
really belong to the `daemon/config` package.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>