1
0
Fork 0
mirror of https://github.com/moby/moby.git synced 2022-11-09 12:21:53 -05:00
Commit graph

23 commits

Author SHA1 Message Date
Christophe Vidal
dffa5d6df2 Dropped hyphen in bind mount where appropriate
Signed-off-by: Christophe Vidal <kriss@krizalys.com>
2017-08-19 21:25:07 +07:00
Kir Kolyshkin
7120976d74 Implement none, private, and shareable ipc modes
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.

Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).

This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:

 - 'shareable':	enables sharing this container's IPC with others
		(this used to be the implicit default);

 - 'private':	disables sharing this container's IPC.

In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.

While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:

> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...

...so here's yet yet another mode:

 - 'none':	no /dev/shm mount inside the container (though it still
		has its own private IPC namespace).

Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.

Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).

Some other changes this patch introduces are:

1. A mount for /dev/shm is added to default OCI Linux spec.

2. IpcMode.Valid() is simplified to remove duplicated code that parsed
   'container:ID' form. Note the old version used to check that ID does
   not contain a semicolon -- this is no longer the case (tests are
   modified accordingly). The motivation is we should either do a
   proper check for container ID validity, or don't check it at all
   (since it is checked in other places anyway). I chose the latter.

3. IpcMode.Container() is modified to not return container ID if the
   mode value does not start with "container:", unifying the check to
   be the same as in IpcMode.IsContainer().

3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
   to add checks for newly added values.

[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
     container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-08-14 10:50:39 +03:00
Daniel J Walsh
bfdb0f3cb8 /dev should be constrained in size
There really is no reason why anyone should create content in /dev
other then device nodes.  Limiting it size to the 64 k size limit.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2017-07-20 08:59:56 -04:00
John Howard
f154588226 LCOW: OCI Spec and Environment for container start
Signed-off-by: John Howard <jhoward@microsoft.com>
2017-06-20 19:50:11 -07:00
Michael Crosby
005506d36c Update moby to runc and oci 1.0 runtime final rc
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-05-05 13:45:45 -07:00
Ma Shimiao
730e0994c8 oci/namespace: remove unnecessary variable idx
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2016-12-22 09:08:43 +08:00
Tibor Vass
6547609870 plugins: misc fixes
Rename variable to reflect manifest -> config renaming
Populate Description fields when computing privileges.
Refactor/reuse code from daemon/oci_linux.go

Signed-off-by: Tibor Vass <tibor@docker.com>
2016-11-22 14:32:07 -08:00
Tibor Vass
53b9b99e5c plugins: support for devices
Signed-off-by: Tibor Vass <tibor@docker.com>
2016-11-22 09:54:45 -08:00
Amit Krishnan
934328d8ea Add functional support for Docker sub commands on Solaris
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-11-07 09:06:34 -08:00
Akihiro Suda
8b1772c86b oci/defaults_linux.go: mask /sys/firmware
On typical x86_64 machines, /sys/firmware can contain SMBIOS and ACPI tables.
There is no need to expose the directory to containers.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2016-10-12 06:07:13 +00:00
John Howard
02309170a5 Remove hacked Windows OCI spec, compile fixups
Signed-off-by: John Howard <jhoward@microsoft.com>
2016-09-27 12:07:35 -07:00
Michael Crosby
ee3ac3aa66 Add init process for zombie fighting
This adds a small C binary for fighting zombies.  It is mounted under
`/dev/init` and is prepended to the args specified by the user.  You
enable it via a daemon flag, `dockerd --init`, as it is disable by
default for backwards compat.

You can also override the daemon option or specify this on a per
container basis with `docker run --init=true|false`.

You can test this by running a process like this as the pid 1 in a
container and see the extra zombie that appears in the container as it
is running.

```c

int main(int argc, char ** argv) {
	pid_t pid = fork();
	if (pid == 0) {
		pid = fork();
		if (pid == 0) {
			exit(0);
		}
		sleep(3);
		exit(0);
	}
	printf("got pid %d and exited\n", pid);
	sleep(20);
}
```

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-19 17:33:50 -07:00
John Howard
7c7c3d7746 Windows OCI convergence step 1
Signed-off-by: John Howard <jhoward@microsoft.com>
2016-09-12 16:11:47 -07:00
Michael Crosby
041e5a21dc Replace old oci specs import with runtime-specs
Fixes #25804

The upstream repo changed the import paths.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-08-17 09:38:34 -07:00
Justin Cormack
39ecc08f32 Do not create /dev/fuse by default
This device is not required by the OCI spec.

The rationale for this was linked to docker/docker#2393

So a non functional /dev/fuse was created, and actual fuse use still is
required to add the device explicitly. However even old versions of the JVM
on Ubuntu 12.04 no longer require the fuse package, and this is all not
needed.

See also https://github.com/opencontainers/runc/pull/983 although this
change alone stops the fuse device being created.

Tested and does not change actual ability to use fuse.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-08-12 12:33:42 +01:00
Davanum Srinivas
03bd00b68f Adding /proc/timer_list to the masked paths list
/proc/timer_list seems to leak information about the host. Here is
an example from a busybox container running on docker+kubernetes.

 # cat /proc/timer_list | grep -i -e kube
 <ffff8800b8cc3db0>, hrtimer_wakeup, S:01, futex_wait_queue_me, kubelet/2497
 <ffff880129ac3db0>, hrtimer_wakeup, S:01, futex_wait_queue_me, kube-proxy/3478
 <ffff8800b1b77db0>, hrtimer_wakeup, S:01, futex_wait_queue_me, kube-proxy/3470
 <ffff8800bb6abdb0>, hrtimer_wakeup, S:01, futex_wait_queue_me, kubelet/2499

Signed-Off-By: Davanum Srinivas <davanum@gmail.com>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2016-08-11 15:19:26 -04:00
Amit Krishnan
86d8758e2b Get the Docker Engine to build clean on Solaris
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>
2016-05-23 16:37:12 -07:00
Akihiro Suda
b397f978c6 oci: default devices don't need to be listed explicitly
Eliminating these things make the code much more understandable.

See also adcbe530a9/config-linux.md (default-devices)
df25eddce6/libcontainer/specconv/spec_linux.go (L454)

Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
2016-04-11 05:58:49 +00:00
Alexander Morozov
5ee8652a21 all: remove some unused funcs and variables
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-04-06 10:40:01 -07:00
Tonis Tiigi
3f81b49352 Define readonly/mask paths in spec
This vendors in new spec/runc that supports
setting readonly and masked paths in the 
configuration. Using this allows us to make an
exception for `—-privileged`.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2016-04-04 18:55:55 -07:00
Zhang Wei
c9db9e4ff1 Clean dead code
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-03-21 16:43:24 +08:00
John Howard
94d70d8355 Windows libcontainerd implementation
Signed-off-by: John Howard <jhoward@microsoft.com>
Signed-off-by: John Starks <jostarks@microsoft.com>
Signed-off-by: Darren Stahl <darst@microsoft.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
2016-03-18 13:38:41 -07:00
Tonis Tiigi
9c4570a958 Replace execdrivers with containerd implementation
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
2016-03-18 13:38:32 -07:00