Commit Graph

20 Commits

Author SHA1 Message Date
Kir Kolyshkin 7120976d74 Implement none, private, and shareable ipc modes
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.

Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).

This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:

 - 'shareable':	enables sharing this container's IPC with others
		(this used to be the implicit default);

 - 'private':	disables sharing this container's IPC.

In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.

While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:

> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...

...so here's yet yet another mode:

 - 'none':	no /dev/shm mount inside the container (though it still
		has its own private IPC namespace).

Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.

Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).

Some other changes this patch introduces are:

1. A mount for /dev/shm is added to default OCI Linux spec.

2. IpcMode.Valid() is simplified to remove duplicated code that parsed
   'container:ID' form. Note the old version used to check that ID does
   not contain a semicolon -- this is no longer the case (tests are
   modified accordingly). The motivation is we should either do a
   proper check for container ID validity, or don't check it at all
   (since it is checked in other places anyway). I chose the latter.

3. IpcMode.Container() is modified to not return container ID if the
   mode value does not start with "container:", unifying the check to
   be the same as in IpcMode.IsContainer().

3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
   to add checks for newly added values.

[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
     container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2017-08-14 10:50:39 +03:00
Antonio Murdaca a18d103b5e
remove --init-path from client
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-04-10 16:49:43 +02:00
Madhan Raj Mookkandy 040afcce8f (*) Support --net:container:<containername/id> for windows
(*) (vdemeester) Removed duplicate code across Windows and Unix wrt Net:Containers
(*) Return unsupported error for network sharing for hyperv isolation containers

Signed-off-by: Madhan Raj Mookkandy <MadhanRaj.Mookkandy@microsoft.com>
2017-02-28 20:03:43 -08:00
Brian Goff 054abff3b6 Implement optional ring buffer for container logs
This allows the user to set a logging mode to "blocking" (default), or
"non-blocking", which uses the ring buffer as a proxy to the real log
driver.

This allows a container to never be blocked on stdio at the cost of
dropping log messages.

Introduces 2 new log-opts that works for all drivers, `log-mode` and
`log-size`. `log-mode` takes a  value of "blocking", or "non-blocking"
I chose not to implement this as a bool since it is difficult to
determine if the mode was set to false vs just not set... especially
difficult when merging the default daemon config with the container config.
`log-size` takes a size string, e.g. `2MB`, which sets the max size
of the ring buffer. When the max size is reached, it will start
dropping log messages.

```
BenchmarkRingLoggerThroughputNoReceiver-8           	2000000000	        36.2 ns/op	 856.35 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputWithReceiverDelay0-8   	300000000	       156 ns/op	 198.48 MB/s	      32 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay1-8        	2000000000	        36.1 ns/op	 857.80 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay10-8       	1000000000	        36.2 ns/op	 856.53 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay50-8       	2000000000	        34.7 ns/op	 894.65 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay100-8      	2000000000	        35.1 ns/op	 883.91 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay300-8      	1000000000	        35.9 ns/op	 863.90 MB/s	       0 B/op	       0 allocs/op
BenchmarkRingLoggerThroughputConsumeDelay500-8      	2000000000	        35.8 ns/op	 866.88 MB/s	       0 B/op	       0 allocs/op
```

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-02-01 13:52:37 -05:00
Kenfe-Mickael Laventure 1756af6faf Allow adding rules to cgroup devices.allow on container create/run
This introduce a new `--device-cgroup-rule` flag that allow a user to
add one or more entry to the container cgroup device `devices.allow`

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-01-26 07:20:45 -08:00
yuexiao-wang 11454e1c97 Fix a bit typos
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2016-12-09 03:05:11 +08:00
Yong Tang 846baf1fd3 Add `--cpus` flag to control cpu resources
This fix tries to address the proposal raised in 27921 and add
`--cpus` flag for `docker run/create`.

Basically, `--cpus` will allow user to specify a number (possibly partial)
about how many CPUs the container will use. For example, on a 2-CPU system
`--cpus 1.5` means the container will take 75% (1.5/2) of the CPU share.

This fix adds a `NanoCPUs` field to `HostConfig` since swarmkit alreay
have a concept of NanoCPUs for tasks. The `--cpus` flag will translate
the number into reused `NanoCPUs` to be consistent.

This fix adds integration tests to cover the changes.

Related docs (`docker run` and Remote APIs) have been updated.

This fix fixes 27921.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-11-04 09:43:10 -07:00
Erik St. Martin 56f77d5ade Implementing support for --cpu-rt-period and --cpu-rt-runtime so that
containers may specify these cgroup values at runtime. This will allow
processes to change their priority to real-time within the container
when CONFIG_RT_GROUP_SCHED is enabled in the kernel. See #22380.

Also added sanity checks for the new --cpu-rt-runtime and --cpu-rt-period
flags to ensure that that the kernel supports these features and that
runtime is not greater than period.

Daemon will support a --cpu-rt-runtime flag to initialize the parent
cgroup on startup, this prevents the administrator from alotting runtime
to docker after each restart.

There are additional checks that could be added but maybe too far? Check
parent cgroups to ensure values are <= parent, inspecting rtprio ulimit
and issuing a warning.

Signed-off-by: Erik St. Martin <alakriti@gmail.com>
2016-10-26 11:33:06 -04:00
Antonio Murdaca 6a12685bb7
configure docker-init binary path
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-09-27 14:49:17 +02:00
John Howard 6b74e2f09d Revert Box from HostConfig
Signed-off-by: John Howard <jhoward@microsoft.com>
2016-09-20 12:01:04 -07:00
David Sheets a003419fca Fix typo in api/types/container/host_config.go
Originally merged in #26061.

Signed-off-by: David Sheets <sheets@alum.mit.edu>
2016-09-20 10:41:08 +01:00
Michael Crosby ee3ac3aa66 Add init process for zombie fighting
This adds a small C binary for fighting zombies.  It is mounted under
`/dev/init` and is prepended to the args specified by the user.  You
enable it via a daemon flag, `dockerd --init`, as it is disable by
default for backwards compat.

You can also override the daemon option or specify this on a per
container basis with `docker run --init=true|false`.

You can test this by running a process like this as the pid 1 in a
container and see the extra zombie that appears in the container as it
is running.

```c

int main(int argc, char ** argv) {
	pid_t pid = fork();
	if (pid == 0) {
		pid = fork();
		if (pid == 0) {
			exit(0);
		}
		sleep(3);
		exit(0);
	}
	printf("got pid %d and exited\n", pid);
	sleep(20);
}
```

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-19 17:33:50 -07:00
John Howard 53774423ff Windows: OCI process struct convergence
Signed-off-by: John Howard <jhoward@microsoft.com>
2016-09-19 10:34:31 -07:00
Michael Crosby 91e197d614 Add engine-api types to docker
This moves the types for the `engine-api` repo to the existing types
package.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-09-07 11:05:58 -07:00
David Calavera c7d811c816 Remove types and lib packages from the engine.
They live in github.com/docker/engine-api now.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2016-01-06 19:49:00 -05:00
David Calavera 907407d0b2 Modify import paths to point to the new engine-api package.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2016-01-06 19:48:59 -05:00
Anusha Ragunathan 5190794f1d Use ImageBuildOptions in builder.
dockerfile.Config is almost redundant with ImageBuildOptions.
Unify the two so that the latter can be removed. This also
helps build's API endpoint code to be less dependent on package
dockerfile.

Signed-off-by: Anusha Ragunathan <anusha@docker.com>
2016-01-05 10:09:34 -08:00
Qiang Huang 2e02077e9f Remove duplicated OomKilldisable
It's in Resources, but wrongly added back to HostConfig in
https://github.com/docker/docker/pull/18762

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-24 15:28:56 +08:00
Daniel Nephin 83237aab2b Remove package pkg/ulimit, use go-units instead.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2015-12-23 13:27:58 -05:00
David Calavera 7ac4232e70 Move Config and HostConfig from runconfig to types/container.
- Make the API client library completely standalone.
- Move windows partition isolation detection to the client, so the
  driver doesn't use external types.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-22 13:34:30 -05:00