Fix 2720 -- Expanded documentation for docker run.
Docker-DCO-1.1-Signed-off-by: Andy Rothfusz <github@developersupport.net> (github: metalivedev)
parent
77d4df1e0b
commit
07c4eda46a
6 changed files with 867 additions and 4 deletions
|
@ -12,3 +12,4 @@ Articles
|
|||
|
||||
security
|
||||
baseimages
|
||||
runmetrics
|
||||
|
|
469
docs/sources/articles/runmetrics.rst
Normal file
|
@ -0,0 +1,469 @@
|
|||
:title: Runtime Metrics
|
||||
:description: Measure the behavior of running containers
|
||||
:keywords: docker, metrics, CPU, memory, disk, IO, run, runtime
|
||||
|
||||
.. _run_metrics:
|
||||
|
||||
|
||||
Runtime Metrics
|
||||
===============
|
||||
|
||||
Linux Containers rely on `control groups
|
||||
<https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>`_ which
|
||||
not only track groups of processes, but also expose metrics about CPU,
|
||||
memory, and block I/O usage. You can access those metrics and obtain
|
||||
network usage metrics as well. This is relevant for "pure" LXC
|
||||
containers, as well as for Docker containers.
|
||||
|
||||
Control Groups
|
||||
--------------
|
||||
|
||||
Control groups are exposed through a pseudo-filesystem. In recent
|
||||
distros, you should find this filesystem under
|
||||
``/sys/fs/cgroup``. Under that directory, you will see multiple
|
||||
sub-directories, called devices, freezer, blkio, etc.; each
|
||||
sub-directory actually corresponds to a different cgroup hierarchy.
|
||||
|
||||
On older systems, the control groups might be mounted on ``/cgroup``,
|
||||
without distinct hierarchies. In that case, instead of seeing the
|
||||
sub-directories, you will see a bunch of files in that directory, and
|
||||
possibly some directories corresponding to existing containers.
|
||||
|
||||
To figure out where your control groups are mounted, you can run:
|
||||
|
||||
::
|
||||
|
||||
grep cgroup /proc/mounts
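The same lookup can be done from a program. The following is a minimal Python sketch, not part of Docker: the helper name ``cgroup_mounts`` is hypothetical, and it assumes the usual ``/proc/mounts`` layout of whitespace-separated fields (device, mount point, filesystem type, options). Unlike ``grep``, it only keeps lines whose filesystem type is exactly ``cgroup``.

```python
def cgroup_mounts(mounts_text):
    """Map each cgroup mount point to the set of its mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            result[fields[1]] = set(fields[3].split(","))
    return result
```

On a Linux host you would pass it the contents of ``/proc/mounts``, for example ``cgroup_mounts(open("/proc/mounts").read())``.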
|
||||
|
||||
.. _run_findpid:
|
||||
|
||||
Enumerating Cgroups
|
||||
--------------------
|
||||
|
||||
You can look into ``/proc/cgroups`` to see the different control group
|
||||
subsystems known to the system, the hierarchy they belong to, and how
|
||||
many groups they contain.
|
||||
|
||||
You can also look at ``/proc/<pid>/cgroup`` to see which control
|
||||
groups a process belongs to. The control group will be shown as a path
|
||||
relative to the root of the hierarchy mountpoint; e.g. ``/`` means
|
||||
“this process has not been assigned into a particular group”, while
|
||||
``/lxc/pumpkin`` means that the process is likely to be a member of a
|
||||
container named ``pumpkin``.
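As an illustration, the sketch below parses the contents of ``/proc/<pid>/cgroup`` in Python. The function name ``parse_proc_cgroup`` is hypothetical; the sketch assumes each line has the form ``hierarchy-id:subsystems:path``, with the subsystems separated by commas.

```python
def parse_proc_cgroup(text):
    """Return {subsystem: cgroup path} from the contents of /proc/<pid>/cgroup."""
    groups = {}
    for line in text.splitlines():
        # Each line looks like "hierarchy-id:subsystem[,subsystem...]:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _, subsystems, path = parts
        for name in subsystems.split(","):
            if name:
                groups[name] = path
    return groups
```

A process in the ``pumpkin`` container would then map ``memory`` to ``/lxc/pumpkin``, while an unassigned process maps every subsystem to ``/``.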
|
||||
|
||||
Finding the Cgroup for a Given Container
|
||||
----------------------------------------
|
||||
|
||||
For each container, one cgroup will be created in each hierarchy. On
|
||||
older systems with older versions of the LXC userland tools, the name
|
||||
of the cgroup will be the name of the container. With more recent
|
||||
versions of the LXC tools, the cgroup will be ``lxc/<container_name>``.
|
||||
|
||||
For Docker containers using cgroups, the container name will be the
|
||||
full ID or long ID of the container. If a container shows up as
|
||||
ae836c95b4c3 in ``docker ps``, its long ID might be something like
|
||||
``ae836c95b4c3c9e9179e0e91015512da89fdec91612f63cebae57df9a5444c79``. You
|
||||
can look it up with ``docker inspect`` or ``docker ps -notrunc``.
|
||||
|
||||
Putting everything together to look at the memory metrics for a Docker
|
||||
container, take a look at ``/sys/fs/cgroup/memory/lxc/<longid>/``.
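If you only have the short ID, a prefix match on the directory name is usually enough. This Python sketch assumes the ``<root>/<subsystem>/lxc/<longid>`` layout described above; the helper name ``find_cgroup_dirs`` and its parameters are hypothetical.

```python
import glob
import os


def find_cgroup_dirs(short_id, subsystem="memory", root="/sys/fs/cgroup"):
    """Return cgroup directories under <root>/<subsystem>/lxc/ whose name starts with short_id."""
    pattern = os.path.join(root, subsystem, "lxc", glob.escape(short_id) + "*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
```

An empty list means no matching cgroup was found in that hierarchy; more than one entry means the prefix is ambiguous.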
|
||||
|
||||
Metrics from Cgroups: Memory, CPU, Block IO
|
||||
-------------------------------------------
|
||||
|
||||
For each subsystem (memory, CPU, and block I/O), you will find one or
|
||||
more pseudo-files containing statistics.
|
||||
|
||||
Memory Metrics: ``memory.stat``
|
||||
...............................
|
||||
|
||||
Memory metrics are found in the "memory" cgroup. Note that the memory
|
||||
control group adds a little overhead, because it does very
|
||||
fine-grained accounting of the memory usage on your system. Therefore,
|
||||
many distros chose to not enable it by default. Generally, to enable
|
||||
it, all you have to do is to add some kernel command-line parameters:
|
||||
``cgroup_enable=memory swapaccount=1``.
|
||||
|
||||
The metrics are in the pseudo-file ``memory.stat``. Here is what it
|
||||
will look like:
|
||||
|
||||
::
|
||||
|
||||
cache 11492564992
|
||||
rss 1930993664
|
||||
mapped_file 306728960
|
||||
pgpgin 406632648
|
||||
pgpgout 403355412
|
||||
swap 0
|
||||
pgfault 728281223
|
||||
pgmajfault 1724
|
||||
inactive_anon 46608384
|
||||
active_anon 1884520448
|
||||
inactive_file 7003344896
|
||||
active_file 4489052160
|
||||
unevictable 32768
|
||||
hierarchical_memory_limit 9223372036854775807
|
||||
hierarchical_memsw_limit 9223372036854775807
|
||||
total_cache 11492564992
|
||||
total_rss 1930993664
|
||||
total_mapped_file 306728960
|
||||
total_pgpgin 406632648
|
||||
total_pgpgout 403355412
|
||||
total_swap 0
|
||||
total_pgfault 728281223
|
||||
total_pgmajfault 1724
|
||||
total_inactive_anon 46608384
|
||||
total_active_anon 1884520448
|
||||
total_inactive_file 7003344896
|
||||
total_active_file 4489052160
|
||||
total_unevictable 32768
|
||||
|
||||
The first half (without the ``total_`` prefix) contains statistics
|
||||
relevant to the processes within the cgroup, excluding
|
||||
sub-cgroups. The second half (with the ``total_`` prefix) includes
|
||||
sub-cgroups as well.
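A collector will typically want the two halves separated. The Python sketch below (the function name ``parse_memory_stat`` is hypothetical) splits the file into a "local" dictionary and a "total" dictionary, stripping the ``total_`` prefix so the keys line up. Entries without the prefix, including the ``hierarchical_*`` limits, end up in the first dictionary.

```python
def parse_memory_stat(text):
    """Split memory.stat into (local, total) dicts of integer values."""
    local, total = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        key, value = fields[0], int(fields[1])
        if key.startswith("total_"):
            total[key[len("total_"):]] = value
        else:
            local[key] = value
    return local, total
```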
|
||||
|
||||
Some metrics are "gauges", i.e. values that can increase or decrease
|
||||
(e.g. swap, the amount of swap space used by the members of the
|
||||
cgroup). Some others are "counters", i.e. values that can only go up,
|
||||
because they represent occurrences of a specific event (e.g. pgfault,
|
||||
which indicates the number of page faults which happened since the
|
||||
creation of the cgroup; this number can never decrease).
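Gauges can be plotted as they are, while counters are usually turned into rates by comparing two readings. A minimal Python sketch of that conversion follows; the name ``counter_rate`` and the choice to return ``None`` when the counter appears to go backwards are assumptions made for illustration.

```python
def counter_rate(previous, current, interval_seconds):
    """Events per second between two readings of a monotonically increasing counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # A counter cannot decrease; a smaller value suggests the cgroup was recreated.
        return None
    return (current - previous) / interval_seconds
```

For example, two ``pgfault`` readings taken 60 seconds apart that differ by 60 correspond to one page fault per second.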
|
||||
|
||||
cache
|
||||
the amount of memory used by the processes of this control group
|
||||
that can be associated precisely with a block on a block
|
||||
device. When you read and write files from and to disk, this amount
|
||||
will increase. This will be the case if you use "conventional" I/O
|
||||
(``open``, ``read``, ``write`` syscalls) as well as mapped files
|
||||
(with ``mmap``). It also accounts for the memory used by ``tmpfs``
|
||||
mounts, though the reasons are unclear.
|
||||
|
||||
rss
|
||||
the amount of memory that *doesn't* correspond to anything on
|
||||
disk: stacks, heaps, and anonymous memory maps.
|
||||
|
||||
mapped_file
|
||||
indicates the amount of memory mapped by the processes in the
|
||||
control group. It doesn't give you information about *how much*
|
||||
memory is used; it rather tells you *how* it is used.
|
||||
|
||||
pgpgin and pgpgout
|
||||
correspond to *charging events*. Each time a page is "charged"
|
||||
(=added to the accounting) to a cgroup, pgpgin increases. When a
|
||||
page is "uncharged" (=no longer "billed" to a cgroup), pgpgout
|
||||
increases.
|
||||
|
||||
pgfault and pgmajfault
|
||||
indicate the number of times that a process of the cgroup triggered
|
||||
a "page fault" and a "major fault", respectively. A page fault
|
||||
happens when a process accesses a part of its virtual memory space
|
||||
which is nonexistent or protected. The former can happen if the
|
||||
process is buggy and tries to access an invalid address (it will
|
||||
then be sent a ``SIGSEGV`` signal, typically killing it with the
|
||||
famous ``Segmentation fault`` message). The latter can happen when
|
||||
the process reads from a memory zone which has been swapped out, or
|
||||
which corresponds to a mapped file: in that case, the kernel will
|
||||
load the page from disk, and let the CPU complete the memory
|
||||
access. It can also happen when the process writes to a
|
||||
copy-on-write memory zone: likewise, the kernel will preempt the
|
||||
process, duplicate the memory page, and resume the write operation
|
||||
on the process' own copy of the page. "Major" faults happen when the
|
||||
kernel actually has to read the data from disk. When it just has to
|
||||
duplicate an existing page, or allocate an empty page, it's a
|
||||
regular (or "minor") fault.
|
||||
|
||||
swap
|
||||
the amount of swap currently used by the processes in this cgroup.
|
||||
|
||||
active_anon and inactive_anon
|
||||
the amount of *anonymous* memory that has been identified as
|
||||
respectively *active* and *inactive* by the kernel. "Anonymous"
|
||||
memory is the memory that is *not* linked to disk pages. In other
|
||||
words, that's the equivalent of the rss counter described above. In
|
||||
fact, the very definition of the rss counter is **active_anon** +
|
||||
**inactive_anon** - **tmpfs** (where tmpfs is the amount of memory
|
||||
used up by ``tmpfs`` filesystems mounted by this control
|
||||
group). Now, what's the difference between "active" and "inactive"?
|
||||
Pages are initially "active"; and at regular intervals, the kernel
|
||||
sweeps over the memory, and tags some pages as "inactive". Whenever
|
||||
they are accessed again, they are immediately retagged
|
||||
"active". When the kernel is almost out of memory, and time comes to
|
||||
swap out to disk, the kernel will swap "inactive" pages.
|
||||
|
||||
active_file and inactive_file
|
||||
cache memory, with *active* and *inactive* similar to the *anon*
|
||||
memory above. The exact formula is cache = **active_file** +
|
||||
**inactive_file** + **tmpfs**. The exact rules used by the kernel to
|
||||
move memory pages between active and inactive sets are different
|
||||
from the ones used for anonymous memory, but the general principle
|
||||
is the same. Note that when the kernel needs to reclaim memory, it
|
||||
is cheaper to reclaim a clean (=non modified) page from this pool,
|
||||
since it can be reclaimed immediately (while anonymous pages and
|
||||
dirty/modified pages have to be written to disk first).
|
||||
|
||||
unevictable
|
||||
the amount of memory that cannot be reclaimed; generally, it will
|
||||
account for memory that has been "locked" with ``mlock``. It is
|
||||
often used by crypto frameworks to make sure that secret keys and
|
||||
other sensitive material never gets swapped out to disk.
|
||||
|
||||
memory and memsw limits
|
||||
These are not really metrics, but a reminder of the limits applied
|
||||
to this cgroup. The first one indicates the maximum amount of
|
||||
physical memory that can be used by the processes of this control
|
||||
group; the second one indicates the maximum amount of RAM+swap.
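The two formulas given for ``rss`` and ``cache`` can be checked against the sample ``memory.stat`` output shown earlier by solving each one for the tmpfs term. This is only an arithmetic sketch in Python; the function names are hypothetical.

```python
# Values copied from the sample memory.stat output shown earlier.
sample = {
    "cache": 11492564992,
    "rss": 1930993664,
    "inactive_anon": 46608384,
    "active_anon": 1884520448,
    "inactive_file": 7003344896,
    "active_file": 4489052160,
    "unevictable": 32768,
}


def implied_tmpfs_from_rss(stats):
    """Solve rss = active_anon + inactive_anon - tmpfs for tmpfs."""
    return stats["active_anon"] + stats["inactive_anon"] - stats["rss"]


def implied_tmpfs_from_cache(stats):
    """Solve cache = active_file + inactive_file + tmpfs for tmpfs."""
    return stats["cache"] - stats["active_file"] - stats["inactive_file"]


print(implied_tmpfs_from_rss(sample))    # 135168
print(implied_tmpfs_from_cache(sample))  # 167936
```

In this sample the two estimates differ by 32768 bytes, which happens to equal the ``unevictable`` value, so on a live system it is safer to treat the formulas as approximations rather than exact identities.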
|
||||
|
||||
Accounting for memory in the page cache is very complex. If two
|
||||
processes in different control groups both read the same file
|
||||
(ultimately relying on the same blocks on disk), the corresponding
|
||||
memory charge will be split between the control groups. It's nice, but
|
||||
it also means that when a cgroup is terminated, it could increase the
|
||||
memory usage of another cgroup, because they are not splitting the
|
||||
cost anymore for those memory pages.
|
||||
|
||||
CPU metrics: ``cpuacct.stat``
|
||||
.............................
|
||||
|
||||
Now that we've covered memory metrics, everything else will look very
|
||||
simple in comparison. CPU metrics will be found in the ``cpuacct``
|
||||
controller.
|
||||
|
||||
For each container, you will find a pseudo-file ``cpuacct.stat``,
|
||||
containing the CPU usage accumulated by the processes of the
|
||||
container, broken down between ``user`` and ``system`` time. If you're
|
||||
not familiar with the distinction, ``user`` is the time during which
|
||||
the processes were in direct control of the CPU (i.e. executing
|
||||
process code), and ``system`` is the time during which the CPU was
|
||||
executing system calls on behalf of those processes.
|
||||
|
||||
Those times are expressed in ticks of 1/100th of a second. Actually,
|
||||
they are expressed in "user jiffies". There are ``USER_HZ``
|
||||
*"jiffies"* per second, and on x86 systems, ``USER_HZ`` is 100. This
|
||||
used to map exactly to the number of scheduler "ticks" per second; but
|
||||
with the advent of higher frequency scheduling, as well as `tickless
|
||||
kernels <http://lwn.net/Articles/549580/>`_, the number of kernel
|
||||
ticks wasn't relevant anymore. It stuck around anyway, mainly for
|
||||
legacy and compatibility reasons.
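To turn those values into seconds, divide by ``USER_HZ``. The Python sketch below assumes ``cpuacct.stat`` contains one ``user <n>`` line and one ``system <n>`` line, and that ``os.sysconf("SC_CLK_TCK")`` reports ``USER_HZ`` on the host; the function name ``parse_cpuacct_stat`` is hypothetical.

```python
import os


def parse_cpuacct_stat(text, user_hz=None):
    """Convert cpuacct.stat jiffies to seconds: {"user": ..., "system": ...}."""
    if user_hz is None:
        # SC_CLK_TCK reports USER_HZ on Linux (typically 100).
        user_hz = os.sysconf("SC_CLK_TCK")
    seconds = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            seconds[fields[0]] = int(fields[1]) / user_hz
    return seconds
```

With ``USER_HZ`` at 100, a reading of ``user 250`` corresponds to 2.5 seconds of user time.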
|
||||
|
||||
Block I/O metrics
|
||||
.................
|
||||
|
||||
Block I/O is accounted in the ``blkio`` controller. Different metrics
|
||||
are scattered across different files. While you can find in-depth
|
||||
details in the `blkio-controller
|
||||
<https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt>`_
|
||||
file in the kernel documentation, here is a short list of the most
|
||||
relevant ones:
|
||||
|
||||
blkio.sectors
|
||||
contains the number of 512-byte sectors read and written by the
|
||||
processes that are members of the cgroup, device by device. Reads and writes
|
||||
are merged in a single counter.
|
||||
|
||||
blkio.io_service_bytes
|
||||
indicates the number of bytes read and written by the cgroup. It has
|
||||
4 counters per device, because for each device, it differentiates
|
||||
between synchronous vs. asynchronous I/O, and reads vs. writes.
|
||||
|
||||
blkio.io_serviced
|
||||
the number of I/O operations performed, regardless of their size. It
|
||||
also has 4 counters per device.
|
||||
|
||||
blkio.io_queued
|
||||
indicates the number of I/O operations currently queued for this
|
||||
cgroup. In other words, if the cgroup isn't doing any I/O, this will
|
||||
be zero. Note that the opposite is not true. In other words, if
|
||||
there is no I/O queued, it does not mean that the cgroup is idle
|
||||
(I/O-wise). It could be doing purely synchronous reads on an
|
||||
otherwise quiescent device, which is therefore able to handle them
|
||||
immediately, without queuing. Also, while it is helpful to figure
|
||||
out which cgroup is putting stress on the I/O subsystem, keep in
|
||||
mind that it is a relative quantity. Even if a process group does
|
||||
not perform more I/O, its queue size can increase just because the
|
||||
device load increases because of other devices.
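The per-device files share a simple line format, so one parser can cover several of them. This Python sketch assumes lines of the form ``major:minor Operation value`` (for example ``8:0 Read 4096``) followed by a final two-field ``Total`` line, which it skips; the function name ``parse_blkio`` is hypothetical.

```python
def parse_blkio(text):
    """Return {device: {operation: value}} from a blkio.io_* pseudo-file."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-device lines have three fields; the final grand "Total" line has two.
        if len(fields) != 3:
            continue
        device, operation, value = fields
        devices.setdefault(device, {})[operation] = int(value)
    return devices
```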
|
||||
|
||||
Network Metrics
|
||||
---------------
|
||||
|
||||
Network metrics are not exposed directly by control groups. There is a
|
||||
good explanation for that: network interfaces exist within the context
|
||||
of *network namespaces*. The kernel could probably accumulate metrics
|
||||
about packets and bytes sent and received by a group of processes, but
|
||||
those metrics wouldn't be very useful. You want per-interface metrics
|
||||
(because traffic happening on the local ``lo`` interface doesn't
|
||||
really count). But since processes in a single cgroup can belong to
|
||||
multiple network namespaces, those metrics would be harder to
|
||||
interpret: multiple network namespaces means multiple ``lo``
|
||||
interfaces, potentially multiple ``eth0`` interfaces, etc.; so this is
|
||||
why there is no easy way to gather network metrics with control
|
||||
groups.
|
||||
|
||||
Instead we can gather network metrics from other sources:
|
||||
|
||||
IPtables
|
||||
........
|
||||
|
||||
IPtables (or rather, the netfilter framework for which iptables is
|
||||
just an interface) can do some serious accounting.
|
||||
|
||||
For instance, you can set up a rule to account for the outbound HTTP
|
||||
traffic on a web server:
|
||||
|
||||
::
|
||||
|
||||
iptables -I OUTPUT -p tcp --sport 80
|
||||
|
||||
|
||||
There is no ``-j`` or ``-g`` flag, so the rule will just count matched
|
||||
packets and go to the following rule.
|
||||
|
||||
Later, you can check the values of the counters, with:
|
||||
|
||||
::
|
||||
|
||||
iptables -nxvL OUTPUT
|
||||
|
||||
Technically, ``-n`` is not required, but it will prevent iptables from
|
||||
doing DNS reverse lookups, which are probably useless in this
|
||||
scenario.
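If you collect these counters from a script, the packet and byte columns are the first two fields of every rule line. The Python sketch below assumes the usual ``iptables -nxvL`` listing, where header lines start with text and rule lines start with two exact integers; the function name ``parse_iptables_counters`` is hypothetical.

```python
def parse_iptables_counters(output):
    """Return (packets, bytes, rule_text) for each rule in `iptables -nxvL` output."""
    rules = []
    for line in output.splitlines():
        fields = line.split()
        # Rule lines start with two exact integers (thanks to -x); headers do not.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            rules.append((int(fields[0]), int(fields[1]), " ".join(fields[2:])))
    return rules
```

The remaining columns are returned as one string because a rule without a target leaves the target column empty, which shifts the whitespace-separated fields.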
|
||||
|
||||
Counters include packets and bytes. If you want to set up metrics for
|
||||
container traffic like this, you could execute a ``for`` loop to add
|
||||
two ``iptables`` rules per container IP address (one in each
|
||||
direction), in the ``FORWARD`` chain. This will only meter traffic
|
||||
going through the NAT layer; you will also have to add traffic going
|
||||
through the userland proxy.
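The loop described above could be sketched as follows in Python. The function only builds the argument lists; running them (for example with ``subprocess.check_call``, as root) is left to the caller. The function name ``accounting_rules`` and the example address in the note below are hypothetical.

```python
def accounting_rules(container_ips, chain="FORWARD"):
    """Build two target-less iptables commands per IP: one per traffic direction."""
    commands = []
    for ip in container_ips:
        commands.append(["iptables", "-I", chain, "-s", ip])  # traffic from the container
        commands.append(["iptables", "-I", chain, "-d", ip])  # traffic to the container
    return commands
```

For a container at ``172.17.0.2`` this yields one rule matching it as the source and one matching it as the destination, neither with a ``-j`` target, so they only count packets.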
|
||||
|
||||
Then, you will need to check those counters on a regular basis. If you
|
||||
happen to use ``collectd``, there is a nice plugin to automate
|
||||
iptables counters collection.
|
||||
|
||||
Interface-level counters
|
||||
........................
|
||||
|
||||
Since each container has a virtual Ethernet interface, you might want
|
||||
to directly check the TX and RX counters of this interface. You will
|
||||
notice that each container is associated with a virtual Ethernet
|
||||
interface in your host, with a name like ``vethKk8Zqi``. Figuring out
|
||||
which interface corresponds to which container is, unfortunately,
|
||||
difficult.
|
||||
|
||||
But for now, the best way is to check the metrics *from within the
|
||||
containers*. To accomplish this, you can run an executable from the
|
||||
host environment within the network namespace of a container using
|
||||
**ip-netns magic**.
|
||||
|
||||
The ``ip netns exec`` command will let you execute any program
|
||||
(present in the host system) within any network namespace visible to
|
||||
the current process. This means that your host will be able to enter
|
||||
the network namespace of your containers, but your containers won't be
|
||||
able to access the host, nor their sibling containers. Containers will
|
||||
be able to “see” and affect their sub-containers, though.
|
||||
|
||||
The exact format of the command is::
|
||||
|
||||
ip netns exec <nsname> <command...>
|
||||
|
||||
For example::
|
||||
|
||||
ip netns exec mycontainer netstat -i
|
||||
|
||||
``ip netns`` finds the "mycontainer" container by using namespaces
|
||||
pseudo-files. Each process belongs to one network namespace, one PID
|
||||
namespace, one ``mnt`` namespace, etc., and those namespaces are
|
||||
materialized under ``/proc/<pid>/ns/``. For example, the network
|
||||
namespace of PID 42 is materialized by the pseudo-file
|
||||
``/proc/42/ns/net``.
|
||||
|
||||
When you run ``ip netns exec mycontainer ...``, it expects
|
||||
``/var/run/netns/mycontainer`` to be one of those
|
||||
pseudo-files. (Symlinks are accepted.)
|
||||
|
||||
In other words, to execute a command within the network namespace of a
|
||||
container, we need to:
|
||||
|
||||
* find out the PID of any process within the container that we want to
|
||||
investigate;
|
||||
* create a symlink from ``/var/run/netns/<somename>`` to
|
||||
``/proc/<thepid>/ns/net``
|
||||
* execute ``ip netns exec <somename> ....``
|
||||
|
||||
Please review :ref:`run_findpid` to learn how to find the cgroup of a
|
||||
process running in the container of which you want to measure network
|
||||
usage. From there, you can examine the pseudo-file named ``tasks``,
|
||||
which contains the PIDs that are in the control group (i.e. in the
|
||||
container). Pick any one of them.
|
||||
|
||||
Putting everything together, if the "short ID" of a container is held
|
||||
in the environment variable ``$CID``, then you can do this::
|
||||
|
||||
TASKS=/sys/fs/cgroup/devices/$CID*/tasks
|
||||
PID=$(head -n 1 $TASKS)
|
||||
mkdir -p /var/run/netns
|
||||
ln -sf /proc/$PID/ns/net /var/run/netns/$CID
|
||||
ip netns exec $CID netstat -i
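The symlink step of the shell snippet above can also be written in Python, which is convenient if the rest of your collector is a script. This is a sketch under the same assumptions as the shell version; the function name ``link_netns`` and its parameters are hypothetical, and the directories are parameters only so that the logic can be exercised without root.

```python
import os


def link_netns(tasks_path, name, netns_dir="/var/run/netns", proc_root="/proc"):
    """Symlink <netns_dir>/<name> to the network namespace of the first PID in tasks_path."""
    with open(tasks_path) as f:
        pids = f.read().split()
    if not pids:
        raise RuntimeError("no processes listed in " + tasks_path)
    target = os.path.join(proc_root, pids[0], "ns", "net")
    link = os.path.join(netns_dir, name)
    os.makedirs(netns_dir, exist_ok=True)
    if os.path.lexists(link):
        os.remove(link)  # mirror the -f in `ln -sf`
    os.symlink(target, link)
    return link
```

After the link exists, ``ip netns exec <name> netstat -i`` can be run as before.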
|
||||
|
||||
|
||||
Tips for high-performance metric collection
|
||||
-------------------------------------------
|
||||
|
||||
Note that running a new process each time you want to update metrics
|
||||
is (relatively) expensive. If you want to collect metrics at high
|
||||
resolutions, and/or over a large number of containers (think 1000
|
||||
containers on a single host), you do not want to fork a new process
|
||||
each time.
|
||||
|
||||
Here is how to collect metrics from a single process. You will have to
|
||||
write your metric collector in C (or any language that lets you do
|
||||
low-level system calls). You need to use a special system call,
|
||||
``setns()``, which lets the current process enter any arbitrary
|
||||
namespace. It requires, however, an open file descriptor to the
|
||||
namespace pseudo-file (remember: that’s the pseudo-file in
|
||||
``/proc/<pid>/ns/net``).
|
||||
|
||||
However, there is a catch: you must not keep this file descriptor
|
||||
open. If you do, when the last process of the control group exits, the
|
||||
namespace will not be destroyed, and its network resources (like the
|
||||
virtual interface of the container) will stay around forever (or
|
||||
until you close that file descriptor).
|
||||
|
||||
The right approach would be to keep track of the first PID of each
|
||||
container, and re-open the namespace pseudo-file each time.
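A minimal Python sketch of that open, enter, close pattern is shown below. It calls ``setns()`` through ``ctypes``, which assumes a Linux libc that exports ``setns`` and a process with enough privileges; the names ``setns_net`` and ``enter_namespace`` are hypothetical. The important part is the ``finally`` clause, which guarantees the descriptor is closed even if entering the namespace fails.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # namespace type flag for network namespaces


def setns_net(fd):
    """Call setns(2) through libc to join the network namespace behind fd."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, CLONE_NEWNET) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def enter_namespace(ns_path, enter=setns_net):
    """Open ns_path, hand the descriptor to enter(), and always close it again."""
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        enter(fd)
    finally:
        os.close(fd)
```

A collector would call ``enter_namespace("/proc/<pid>/ns/net")`` for a container, read the interface counters, and repeat for the next container. The ``enter`` parameter exists only so the descriptor handling can be tested without privileges.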
|
||||
|
||||
Collecting metrics when a container exits
|
||||
-----------------------------------------
|
||||
|
||||
Sometimes, you do not care about real-time metric collection, but when
|
||||
a container exits, you want to know how much CPU, memory, etc. it has
|
||||
used.
|
||||
|
||||
Docker makes this difficult because it relies on ``lxc-start``, which
|
||||
carefully cleans up after itself, but it is still possible. It is
|
||||
usually easier to collect metrics at regular intervals (e.g. every
|
||||
minute, with the collectd LXC plugin) and rely on that instead.
|
||||
|
||||
But, if you'd still like to gather the stats when a container stops,
|
||||
here is how:
|
||||
|
||||
For each container, start a collection process, and move it to the
|
||||
control groups that you want to monitor by writing its PID to the
|
||||
tasks file of the cgroup. The collection process should periodically
|
||||
re-read the tasks file to check if it's the last process of the
|
||||
control group. (If you also want to collect network statistics as
|
||||
explained in the previous section, you should also move the process to
|
||||
the appropriate network namespace.)
|
||||
|
||||
When the container exits, ``lxc-start`` will try to delete the control
|
||||
groups. It will fail, since the control group is still in use; but
|
||||
that’s fine. Your process should now detect that it is the only one
|
||||
remaining in the group. Now is the right time to collect all the
|
||||
metrics you need!
|
||||
|
||||
Finally, your process should move itself back to the root control
|
||||
group, and remove the container control group. To remove a control
|
||||
group, just ``rmdir`` its directory. It's counter-intuitive to
|
||||
``rmdir`` a directory as it still contains files; but remember that
|
||||
this is a pseudo-filesystem, so usual rules don't apply. After the
|
||||
cleanup is done, the collection process can exit safely.
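The two cgroup operations this procedure relies on, joining a group and checking whether you are its last member, can be sketched in Python as follows. The function names are hypothetical, and the check assumes a single-threaded collection process so that exactly one ID of its own appears in the ``tasks`` file.

```python
import os


def join_cgroup(tasks_path, pid=None):
    """Move a process into a cgroup by writing its PID to the cgroup's tasks file."""
    pid = os.getpid() if pid is None else pid
    with open(tasks_path, "w") as f:
        f.write(str(pid))


def is_last_process(tasks_path, pid=None):
    """True when the tasks file lists only the given PID (default: this process)."""
    pid = os.getpid() if pid is None else pid
    with open(tasks_path) as f:
        return f.read().split() == [str(pid)]
```

The collection process would call ``join_cgroup`` once per hierarchy at start-up, poll ``is_last_process`` at its sampling interval, and read the final metrics as soon as it returns ``True``.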
|
||||
|
|
@ -1,12 +1,12 @@
|
|||
:title: Build Images (Dockerfile Reference)
|
||||
:title: Dockerfile Reference
|
||||
:description: Dockerfiles use a simple DSL which allows you to automate the steps you would normally manually take to create an image.
|
||||
:keywords: builder, docker, Dockerfile, automation, image creation
|
||||
|
||||
.. _dockerbuilder:
|
||||
|
||||
===================================
|
||||
Build Images (Dockerfile Reference)
|
||||
===================================
|
||||
====================
|
||||
Dockerfile Reference
|
||||
====================
|
||||
|
||||
**Docker can act as a builder** and read instructions from a text
|
||||
``Dockerfile`` to automate the steps you would otherwise take manually
|
||||
|
|
|
@ -18,6 +18,45 @@ To list available commands, either run ``docker`` with no parameters or execute
|
|||
|
||||
...
|
||||
|
||||
.. _cli_options:
|
||||
|
||||
Types of Options
|
||||
----------------
|
||||
|
||||
Boolean
|
||||
~~~~~~~
|
||||
|
||||
Boolean options look like ``-d=false``. The value you see is the
|
||||
default value which gets set if you do **not** use the boolean
|
||||
flag. If you do call ``run -d``, that sets the opposite boolean value,
|
||||
so in this case, ``true``, and so ``docker run -d`` **will** run in
|
||||
"detached" mode, in the background. Other boolean options are similar
|
||||
-- specifying them will set the value to the opposite of the default
|
||||
value.
|
||||
|
||||
Multi
|
||||
~~~~~
|
||||
|
||||
Options like ``-a=[]`` indicate they can be specified multiple times::
|
||||
|
||||
docker run -a stdin -a stdout -a stderr -i -t ubuntu /bin/bash
|
||||
|
||||
Sometimes this can use a more complex value string, as for ``-v``::
|
||||
|
||||
docker run -v /host:/container example/mysql
|
||||
|
||||
Strings and Integers
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Options like ``-name=""`` expect a string, and they can only be
|
||||
specified once. Options like ``-c=0`` expect an integer, and they can
|
||||
only be specified once.
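As an analogy only, the three option types behave much like the following Python ``argparse`` declarations. This is not how Docker parses its command line; the parser and the function name ``build_parser`` are hypothetical, and ``argparse`` does not reject a single-value option that is repeated (the last value wins).

```python
import argparse


def build_parser():
    """Hypothetical parser mimicking the three option types described above."""
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("-d", action="store_true")          # boolean: default False, flag flips it
    parser.add_argument("-a", action="append", default=[])  # multi: may be repeated
    parser.add_argument("-c", type=int, default=0)          # integer
    return parser
```

Parsing ``-a stdin -a stdout -c 2`` with this parser leaves ``d`` at its default of ``False``, collects both ``-a`` values in a list, and converts ``-c`` to the integer 2.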
|
||||
|
||||
----
|
||||
|
||||
Commands
|
||||
--------
|
||||
|
||||
.. _cli_daemon:
|
||||
|
||||
``daemon``
|
||||
|
|
|
@ -14,4 +14,5 @@ Contents:
|
|||
|
||||
commandline/index
|
||||
builder
|
||||
run
|
||||
api/index
|
||||
|
|
353
docs/sources/reference/run.rst
Normal file
|
@ -0,0 +1,353 @@
|
|||
:title: Docker Run Reference
|
||||
:description: Configure containers at runtime
|
||||
:keywords: docker, run, configure, runtime
|
||||
|
||||
.. _run_docker:
|
||||
|
||||
====================
|
||||
Docker Run Reference
|
||||
====================
|
||||
|
||||
**Docker runs processes in isolated containers**. When an operator
|
||||
executes ``docker run``, she starts a process with its own file
|
||||
system, its own networking, and its own isolated process tree. The
|
||||
:ref:`image_def` which starts the process may define defaults related
|
||||
to the binary to run, the networking to expose, and more, but ``docker
|
||||
run`` gives final control to the operator who starts the container
|
||||
from the image. That's the main reason :ref:`cli_run` has more options
|
||||
than any other ``docker`` command.
|
||||
|
||||
Every one of the :ref:`example_list` shows running containers, and so
|
||||
here we try to give more in-depth guidance.
|
||||
|
||||
.. contents:: Table of Contents
|
||||
|
||||
.. _run_running:
|
||||
|
||||
General Form
|
||||
============
|
||||
|
||||
As you've seen in the :ref:`example_list`, the basic ``run`` command
|
||||
takes this form::
|
||||
|
||||
docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
|
||||
|
||||
To learn how to interpret the types of ``[OPTIONS]``, see
|
||||
:ref:`cli_options`.
|
||||
|
||||
The list of ``[OPTIONS]`` breaks down into two groups:
|
||||
|
||||
* options that define the runtime behavior or environment, and
|
||||
* options that override image defaults.
|
||||
|
||||
Since image defaults usually get set in :ref:`Dockerfiles
|
||||
<dockerbuilder>` (though they could also be set at :ref:`cli_commit`
|
||||
time), we will group the runtime options here by their related
|
||||
Dockerfile commands so that it is easier to see how to override image
|
||||
defaults and set new behavior.
|
||||
|
||||
We'll start, though, with the options that are unique to ``docker
|
||||
run``, the options which define the runtime behavior or the container
|
||||
environment.
|
||||
|
||||
.. note:: The runtime operator always has final control over the
|
||||
behavior of a Docker container.
|
||||
|
||||
Detached or Foreground
|
||||
======================
|
||||
|
||||
When starting a Docker container, you must first decide if you want to
|
||||
run the container in the background in a "detached" mode or in the
|
||||
default foreground mode::
|
||||
|
||||
-d=false: Detached mode: Run container in the background, print new container id
|
||||
|
||||
Detached (-d)
|
||||
.............
|
||||
|
||||
In detached mode (``-d=true`` or just ``-d``), all IO should be done
|
||||
through network connections or shared volumes because the container is
|
||||
no longer listening to the commandline where you executed ``docker
|
||||
run``. You can reattach to a detached container with ``docker``
|
||||
:ref:`cli_attach`. If you choose to run a container in the detached
|
||||
mode, then you cannot use the ``-rm`` option.
|
||||
|
||||
Foreground
|
||||
..........
|
||||
|
||||
In foreground mode (the default when ``-d`` is not specified),
|
||||
``docker run`` can start the process in the container and attach the
|
||||
console to the process's standard input, output, and standard
|
||||
error. It can even pretend to be a TTY (this is what most commandline
|
||||
executables expect) and pass along signals. All of that is
|
||||
configurable::
|
||||
|
||||
-a=[] : Attach to stdin, stdout and/or stderr
|
||||
-t=false : Allocate a pseudo-tty
|
||||
-sig-proxy=true: Proxify all received signal to the process (even in non-tty mode)
|
||||
-i=false : Keep stdin open even if not attached
|
||||
|
||||
If you do not specify ``-a`` then Docker will `attach everything
|
||||
(stdin,stdout,stderr)
|
||||
<https://github.com/dotcloud/docker/blob/master/commands.go#L1797>`_. You
|
||||
can specify which of the three standard streams (stdin, stdout,
|
||||
stderr) you'd like to connect instead, as in::
|
||||
|
||||
docker run -a stdin -a stdout -i -t ubuntu /bin/bash
|
||||
|
||||
For interactive processes (like a shell) you will typically want a tty
|
||||
as well as persistent standard in, so you'll use ``-i -t`` together in
|
||||
most interactive cases.
|
||||
|
||||
Clean Up (-rm)
|
||||
--------------
|
||||
|
||||
By default a container's file system persists even after the container
|
||||
exits. This makes debugging a lot easier (since you can inspect the
|
||||
final state) and you retain all your data by default. But if you are
|
||||
running short-term **foreground** processes, these container file
|
||||
systems can really pile up. If instead you'd like Docker to
|
||||
**automatically clean up the container and remove the file system when
|
||||
the container exits**, you can add the ``-rm`` flag::
|
||||
|
||||
-rm=false: Automatically remove the container when it exits (incompatible with -d)
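Scripts that launch containers can enforce this incompatibility before calling Docker. The Python sketch below only assembles an argument list; the function name ``build_run_argv`` is hypothetical, and the flag spellings are the ones used in this document.

```python
def build_run_argv(image, command=(), detached=False, remove=False):
    """Assemble a `docker run` argument list, refusing the -rm plus -d combination."""
    if detached and remove:
        raise ValueError("-rm is incompatible with -d")
    argv = ["docker", "run"]
    if detached:
        argv.append("-d")
    if remove:
        argv.append("-rm")
    argv.append(image)
    argv.extend(command)
    return argv
```

For example, ``build_run_argv("ubuntu", ["/bin/true"], remove=True)`` produces a short-lived foreground run whose file system is removed on exit.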
|
||||
|
||||
Name (-name)
|
||||
============
|
||||
|
||||
The operator can identify a container in three ways:
|
||||
|
||||
* UUID long identifier ("f78375b1c487e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778")
|
||||
* UUID short identifier ("f78375b1c487")
|
||||
* name ("evil_ptolemy")
|
||||
|
||||
The UUID identifiers come from the Docker daemon, and if you do not
|
||||
assign a name to the container with ``-name`` then the daemon will
|
||||
also generate a random string name. The name can become a handy
|
||||
way to add meaning to a container since you can use this name when
|
||||
defining :ref:`links <working_with_links_names>` (or any other place
|
||||
you need to identify a container). This works for both background and
|
||||
foreground Docker containers.
|
||||
|
||||
PID Equivalent
|
||||
==============
|
||||
|
||||
And finally, to help with automation, you can have Docker write the
|
||||
container id out to a file of your choosing. This is similar to how
|
||||
some programs might write out their process ID to a file (you've seen
|
||||
them as .pid files)::
|
||||
|
||||
-cidfile="": Write the container ID to the file
|
||||
|
||||
Overriding Dockerfile Image Defaults
|
||||
====================================
|
||||
|
||||
When a developer builds an image from a :ref:`Dockerfile
|
||||
<dockerbuilder>` or when she commits it, the developer can set a
|
||||
number of default parameters that take effect when the image starts up
|
||||
as a container.
|
||||
|
||||
Four of the Dockerfile commands cannot be overridden at runtime:
|
||||
``FROM``, ``MAINTAINER``, ``RUN``, and ``ADD``. Everything else has a
|
||||
corresponding override in ``docker run``. We'll go through what the
|
||||
developer might have set in each Dockerfile instruction and how the
|
||||
operator can override that setting.
|
||||
|
||||
|
||||
CMD
|
||||
...
|
||||
|
||||
Remember the optional ``COMMAND`` in the Docker commandline::
|
||||
|
||||
docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
|
||||
|
||||
This command is optional because the person who created the ``IMAGE``
|
||||
may have already provided a default ``COMMAND`` using the Dockerfile
|
||||
``CMD``. As the operator (the person running a container from the
|
||||
image), you can override that ``CMD`` just by specifying a new
|
||||
``COMMAND``.
|
||||
|
||||
If the image also specifies an ``ENTRYPOINT`` then the ``CMD`` or
|
||||
``COMMAND`` gets appended as arguments to the ``ENTRYPOINT``.
|
||||
|
||||
|
||||
ENTRYPOINT
|
||||
..........
|
||||
|
||||
::
|
||||
|
||||
-entrypoint="": Overwrite the default entrypoint set by the image
|
||||
|
||||
The ENTRYPOINT of an image is similar to a COMMAND because it
|
||||
specifies what executable to run when the container starts, but it is
|
||||
(purposely) more difficult to override. The ENTRYPOINT gives a
|
||||
container its default nature or behavior, so that when you set an
|
||||
ENTRYPOINT you can run the container *as if it were that binary*,
|
||||
complete with default options, and you can pass in more options via
|
||||
the COMMAND. But, sometimes an operator may want to run something else
|
||||
inside the container, so you can override the default ENTRYPOINT at
|
||||
runtime by using a string to specify the new ENTRYPOINT. Here is an
|
||||
example of how to run a shell in a container that has been set up to
|
||||
automatically run something else (like ``/usr/bin/redis-server``)::
|
||||
|
||||
docker run -i -t -entrypoint /bin/bash example/redis
|
||||
|
||||
or two examples of how to pass more parameters to that ENTRYPOINT::
|
||||
|
||||
docker run -i -t -entrypoint /bin/bash example/redis -c "ls -l"
|
||||
docker run -i -t -entrypoint /usr/bin/redis-cli example/redis --help
|
||||
|
||||
|
||||
EXPOSE (``run`` Networking Options)
|
||||
...................................
|
||||
|
||||
The *Dockerfile* doesn't give much control over networking, only
|
||||
providing the EXPOSE instruction to give a hint to the operator about
|
||||
what incoming ports might provide services. At runtime, however,
|
||||
Docker provides a number of ``run`` options related to networking::
|
||||
|
||||
-n=true   : Enable networking for this container
-dns=[]   : Set custom dns servers for the container
-expose=[]: Expose a port from the container without publishing it to your host
-P=false  : Publish all exposed ports to the host interfaces
-p=[]     : Publish a container's port to the host (format: ip:hostPort:containerPort | ip::containerPort | hostPort:containerPort)
            (use 'docker port' to see the actual mapping)
-link=""  : Add link to another container (name:alias)
|
||||
|
||||
By default, all containers have networking enabled and they can make
|
||||
any outgoing connections. The operator can completely disable
|
||||
networking with ``run -n=false`` which disables all incoming and outgoing
|
||||
networking. In cases like this, you would perform IO through files or
|
||||
stdin/stdout only.
|
||||
|
||||
Your container will use the same DNS servers as the host by default,
|
||||
but you can override this with ``-dns``.
|
||||
|
||||
As mentioned previously, ``EXPOSE`` (and ``-expose``) make a port
|
||||
available **in** a container for incoming connections. The port number
|
||||
on the inside of the container (where the service listens) does not
|
||||
need to be the same number as the port exposed on the outside of the
|
||||
container (where clients connect), so inside the container you might
|
||||
have an HTTP service listening on port 80 (and so you ``EXPOSE 80`` in
|
||||
the Dockerfile), but outside the container the port might be 42800.
|
||||
|
||||
To help a new client container reach the server container's internal
|
||||
port ``-expose'd`` by the operator or ``EXPOSE'd`` by the
|
||||
developer, the operator has three choices: start the server container
|
||||
with ``-P`` or ``-p``, or start the client container with ``-link``.
|
||||
|
||||
If the operator uses ``-P`` or ``-p`` then Docker will make the
|
||||
exposed port accessible on the host and the ports will be available to
|
||||
any client that can reach the host. To find the map between the host
|
||||
ports and the exposed ports, use ``docker port``.
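
The three ``-p`` formats listed in the option summary differ only in which fields are present. As a rough illustration (not Docker's own parser), the sketch below splits a publish spec into its parts; the function name ``parse_publish`` and the ``-`` placeholder for an omitted field are assumptions made for this example.

```shell
#!/bin/sh
# Split a -p spec into ip, hostPort and containerPort.
# Handles only the three formats listed in the option summary.
parse_publish() {
    spec="$1"
    ip=""
    host_port=""
    container_port=""
    # Count the colons to tell the formats apart.
    colons=$(printf '%s' "$spec" | tr -cd ':' | wc -c | tr -d ' ')
    case "$colons" in
        2)  # ip:hostPort:containerPort or ip::containerPort
            ip=${spec%%:*}
            rest=${spec#*:}
            host_port=${rest%%:*}
            container_port=${rest#*:} ;;
        1)  # hostPort:containerPort
            host_port=${spec%%:*}
            container_port=${spec#*:} ;;
        *)  echo "unrecognized publish spec: $spec" >&2
            return 1 ;;
    esac
    # "-" marks a field that was left out of the spec.
    printf 'ip=%s hostPort=%s containerPort=%s\n' \
        "${ip:--}" "${host_port:--}" "$container_port"
}

parse_publish 127.0.0.1:8080:80
parse_publish 127.0.0.1::80
parse_publish 8080:80
```

When the host port is left out, the actual mapping is not known until the container runs, which is why ``docker port`` is the way to look it up.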
|
||||
|
||||
If the operator uses ``-link`` when starting the new client container,
|
||||
then the client container can access the exposed port via a private
|
||||
networking interface. Docker will set some environment variables in
|
||||
the client container to help indicate which interface and port to use.
|
||||
|
||||
ENV (Environment Variables)
|
||||
...........................
|
||||
|
||||
The operator can **set any environment variable** in the container by
|
||||
using one or more ``-e``, even overriding those already defined by the
|
||||
developer with a Dockerfile ``ENV``::
|
||||
|
||||
$ docker run -e "deep=purple" -rm ubuntu /bin/bash -c export
|
||||
declare -x HOME="/"
|
||||
declare -x HOSTNAME="85bc26a0e200"
|
||||
declare -x OLDPWD
|
||||
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
|
||||
declare -x PWD="/"
|
||||
declare -x SHLVL="1"
|
||||
declare -x container="lxc"
|
||||
declare -x deep="purple"
|
||||
|
||||
Similarly the operator can set the **hostname** with ``-h``.
|
||||
|
||||
``-link name:alias`` also sets environment variables, using the
|
||||
*alias* string to define environment variables within the container
|
||||
that give the IP and PORT information for connecting to the service
|
||||
container. Let's imagine we have a container running Redis::
|
||||
|
||||
# Start the service container, named redis-name
|
||||
$ docker run -d -name redis-name dockerfiles/redis
|
||||
4241164edf6f5aca5b0e9e4c9eccd899b0b8080c64c0cd26efe02166c73208f3
|
||||
|
||||
# The redis-name container exposed port 6379
|
||||
$ docker ps
|
||||
CONTAINER ID        IMAGE                      COMMAND                CREATED             STATUS              PORTS               NAMES
4241164edf6f        dockerfiles/redis:latest   /redis-stable/src/re   5 seconds ago       Up 4 seconds        6379/tcp            redis-name
|
||||
|
||||
# Note that there are no public ports exposed since we didn't use -p or -P
|
||||
$ docker port 4241164edf6f 6379
|
||||
2014/01/25 00:55:38 Error: No public port '6379' published for 4241164edf6f
|
||||
|
||||
|
||||
Yet we can get information about the redis container's exposed ports with ``-link``. Choose an alias that will form a valid environment variable!
|
||||
|
||||
::
|
||||
|
||||
$ docker run -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c export
|
||||
declare -x HOME="/"
|
||||
declare -x HOSTNAME="acda7f7b1cdc"
|
||||
declare -x OLDPWD
|
||||
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
|
||||
declare -x PWD="/"
|
||||
declare -x REDIS_ALIAS_NAME="/distracted_wright/redis"
|
||||
declare -x REDIS_ALIAS_PORT="tcp://172.17.0.32:6379"
|
||||
declare -x REDIS_ALIAS_PORT_6379_TCP="tcp://172.17.0.32:6379"
|
||||
declare -x REDIS_ALIAS_PORT_6379_TCP_ADDR="172.17.0.32"
|
||||
declare -x REDIS_ALIAS_PORT_6379_TCP_PORT="6379"
|
||||
declare -x REDIS_ALIAS_PORT_6379_TCP_PROTO="tcp"
|
||||
declare -x SHLVL="1"
|
||||
declare -x container="lxc"
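
In the output above, the alias ``redis_alias`` shows up as the prefix ``REDIS_ALIAS``. The sketch below checks an alias before you use it and prints the prefix you would expect. It assumes the only transformation is uppercasing, which is inferred from this one example rather than from a stated rule; the function name ``link_env_prefix`` is hypothetical.

```shell
#!/bin/sh
# Check that a link alias can form a shell variable name and
# print the uppercase prefix seen in the example output.
# Assumption: uppercasing is the only transformation applied.
link_env_prefix() {
    alias_name="$1"
    case "$alias_name" in
        ""|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            echo "alias '$alias_name' would not form a valid variable name" >&2
            return 1 ;;
    esac
    printf '%s\n' "$alias_name" | tr '[:lower:]' '[:upper:]'
}

link_env_prefix redis_alias   # prints REDIS_ALIAS
```

An alias such as ``redis-alias`` is rejected by this check because a hyphen cannot appear in a shell variable name.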
|
||||
|
||||
And we can use that information to connect from another container as a client::
|
||||
|
||||
$ docker run -i -t -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c '/redis-stable/src/redis-cli -h $REDIS_ALIAS_PORT_6379_TCP_ADDR -p $REDIS_ALIAS_PORT_6379_TCP_PORT'
|
||||
172.17.0.32:6379>
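
If a client only looks at the combined ``REDIS_ALIAS_PORT`` value, it can still recover the protocol, address and port with plain shell parameter expansion. This is a small sketch assuming the ``proto://addr:port`` shape shown in the output above; the function name ``parse_link_url`` is hypothetical.

```shell
#!/bin/sh
# Split a link URL of the form proto://addr:port into its parts.
parse_link_url() {
    url="$1"
    proto=${url%%://*}        # text before "://"
    hostport=${url#*://}      # text after "://"
    addr=${hostport%:*}       # everything before the last ":"
    port=${hostport##*:}      # everything after the last ":"
    printf '%s %s %s\n' "$proto" "$addr" "$port"
}

parse_link_url "tcp://172.17.0.32:6379"   # prints: tcp 172.17.0.32 6379
```

The three fields correspond to the ``_TCP_PROTO``, ``_TCP_ADDR`` and ``_TCP_PORT`` variables in the example, so in practice reading those variables directly is simpler.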
|
||||
|
||||
VOLUME (Shared Filesystems)
|
||||
...........................
|
||||
|
||||
::
|
||||
|
||||
-v=[]: Create a bind mount with: [host-dir]:[container-dir]:[rw|ro].
|
||||
If "host-dir" is missing, then docker creates a new volume.
|
||||
-volumes-from="": Mount all volumes from the given container(s)
|
||||
|
||||
The volumes commands are complex enough to have their own
|
||||
documentation in section :ref:`volume_def`. A developer can define one
|
||||
or more VOLUMEs associated with an image, but only the operator can
|
||||
give access from one container to another (or from a container to a
|
||||
volume mounted on the host).
|
||||
|
||||
USER
|
||||
....
|
||||
|
||||
::
|
||||
|
||||
-u="": Username or UID
|
||||
|
||||
WORKDIR
|
||||
.......
|
||||
|
||||
::
|
||||
|
||||
-w="": Working directory inside the container
|
||||
|
||||
Performance
|
||||
===========
|
||||
|
||||
The operator can also adjust the performance parameters of the container::
|
||||
|
||||
-c=0 : CPU shares (relative weight)
|
||||
-m="": Memory limit (format: <number><optional unit>, where unit = b, k, m or g)
|
||||
|
||||
-lxc-conf=[]: Add custom lxc options -lxc-conf="lxc.cgroup.cpuset.cpus = 0,1"
|
||||
-privileged=false: Give extended privileges to this container
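
To make the ``-m`` format concrete, the sketch below converts a value such as ``512m`` into bytes. It assumes binary multiples (1k = 1024 bytes) and that a bare number means bytes; neither is stated in the option summary above, and the function name ``mem_to_bytes`` is hypothetical.

```shell
#!/bin/sh
# Convert <number><optional unit> (unit = b, k, m or g) to bytes.
# Assumptions: units are powers of 1024; no unit means bytes.
mem_to_bytes() {
    value="$1"
    case "$value" in
        *b) num=${value%?}; mult=1 ;;
        *k) num=${value%?}; mult=1024 ;;
        *m) num=${value%?}; mult=$((1024 * 1024)) ;;
        *g) num=${value%?}; mult=$((1024 * 1024 * 1024)) ;;
        *)  num=$value;     mult=1 ;;
    esac
    # Reject anything that is not a plain non-negative integer.
    case "$num" in
        ""|*[!0-9]*) echo "invalid memory limit: $value" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

mem_to_bytes 512m   # prints 536870912
```

A value with an unknown suffix, such as ``12x``, is rejected rather than silently treated as bytes.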
|
||||
|