diff --git a/docs/sources/articles/index.rst b/docs/sources/articles/index.rst index 2cfc427420..75c0cd3fa9 100644 --- a/docs/sources/articles/index.rst +++ b/docs/sources/articles/index.rst @@ -12,3 +12,4 @@ Articles security baseimages + runmetrics diff --git a/docs/sources/articles/runmetrics.rst b/docs/sources/articles/runmetrics.rst new file mode 100644 index 0000000000..f7406bc5ed --- /dev/null +++ b/docs/sources/articles/runmetrics.rst @@ -0,0 +1,469 @@ +:title: Runtime Metrics +:description: Measure the behavior of running containers +:keywords: docker, metrics, CPU, memory, disk, IO, run, runtime

.. _run_metrics:


Runtime Metrics
===============

Linux Containers rely on `control groups
`_ which
not only track groups of processes, but also expose metrics about CPU,
memory, and block I/O usage. You can access those metrics and obtain
network usage metrics as well. This is relevant for "pure" LXC
containers, as well as for Docker containers.

Control Groups
--------------

Control groups are exposed through a pseudo-filesystem. In recent
distros, you should find this filesystem under
``/sys/fs/cgroup``. Under that directory, you will see multiple
sub-directories, named devices, freezer, blkio, etc.; each
sub-directory actually corresponds to a different cgroup hierarchy.

On older systems, the control groups might be mounted on ``/cgroup``,
without distinct hierarchies. In that case, instead of seeing the
sub-directories, you will see a bunch of files in that directory, and
possibly some directories corresponding to existing containers.

To figure out where your control groups are mounted, you can run:

::

    grep cgroup /proc/mounts

.. _run_findpid:

Enumerating Cgroups
-------------------

You can look into ``/proc/cgroups`` to see the different control group
subsystems known to the system, the hierarchy they belong to, and how
many groups they contain.

You can also look at ``/proc/<pid>/cgroup`` to see which control
groups a process belongs to. The control group will be shown as a path
relative to the root of the hierarchy mountpoint; e.g. ``/`` means
“this process has not been assigned to a particular group”, while
``/lxc/pumpkin`` means that the process is likely to be a member of a
container named ``pumpkin``.

Finding the Cgroup for a Given Container
----------------------------------------

For each container, one cgroup will be created in each hierarchy. On
older systems with older versions of the LXC userland tools, the name
of the cgroup will be the name of the container. With more recent
versions of the LXC tools, the cgroup will be ``lxc/<container_name>``.

For Docker containers using cgroups, the container name will be the
full ID or long ID of the container. If a container shows up as
ae836c95b4c3 in ``docker ps``, its long ID might be something like
``ae836c95b4c3c9e9179e0e91015512da89fdec91612f63cebae57df9a5444c79``. You
can look it up with ``docker inspect`` or ``docker ps -notrunc``.

Putting everything together, to look at the memory metrics for a Docker
container, take a look at ``/sys/fs/cgroup/memory/lxc/<longid>/``.

Metrics from Cgroups: Memory, CPU, Block IO
-------------------------------------------

For each subsystem (memory, cpu, and block i/o), you will find one or
more pseudo-files containing statistics.

Memory Metrics: ``memory.stat``
...............................

Memory metrics are found in the "memory" cgroup.
Note that the memory +control group adds a little overhead, because it does very +fine-grained accounting of the memory usage on your system. Therefore, +many distros chose to not enable it by default. Generally, to enable +it, all you have to do is to add some kernel command-line parameters: +``cgroup_enable=memory swapaccount=1``. + +The metrics are in the pseudo-file ``memory.stat``. Here is what it +will look like: + +:: + + cache 11492564992 + rss 1930993664 + mapped_file 306728960 + pgpgin 406632648 + pgpgout 403355412 + swap 0 + pgfault 728281223 + pgmajfault 1724 + inactive_anon 46608384 + active_anon 1884520448 + inactive_file 7003344896 + active_file 4489052160 + unevictable 32768 + hierarchical_memory_limit 9223372036854775807 + hierarchical_memsw_limit 9223372036854775807 + total_cache 11492564992 + total_rss 1930993664 + total_mapped_file 306728960 + total_pgpgin 406632648 + total_pgpgout 403355412 + total_swap 0 + total_pgfault 728281223 + total_pgmajfault 1724 + total_inactive_anon 46608384 + total_active_anon 1884520448 + total_inactive_file 7003344896 + total_active_file 4489052160 + total_unevictable 32768 + +The first half (without the ``total_`` prefix) contains statistics +relevant to the processes within the cgroup, excluding +sub-cgroups. The second half (with the ``total_`` prefix) includes +sub-cgroups as well. + +Some metrics are "gauges", i.e. values that can increase or decrease +(e.g. swap, the amount of swap space used by the members of the +cgroup). Some others are "counters", i.e. values that can only go up, +because they represent occurrences of a specific event (e.g. pgfault, +which indicates the number of page faults which happened since the +creation of the cgroup; this number can never decrease). + +cache + the amount of memory used by the processes of this control group + that can be associated precisely with a block on a block + device. When you read and write files from and to disk, this amount + will increase. This will be the case if you use "conventional" I/O + (``open``, ``read``, ``write`` syscalls) as well as mapped files + (with ``mmap``). It also accounts for the memory used by ``tmpfs`` + mounts, though the reasons are unclear. + +rss + the amount of memory that *doesn't* correspond to anything on + disk: stacks, heaps, and anonymous memory maps. + +mapped_file + indicates the amount of memory mapped by the processes in the + control group. It doesn't give you information about *how much* + memory is used; it rather tells you *how* it is used. + +pgpgin and pgpgout + correspond to *charging events*. Each time a page is "charged" + (=added to the accounting) to a cgroup, pgpgin increases. When a + page is "uncharged" (=no longer "billed" to a cgroup), pgpgout + increases. + +pgfault and pgmajfault + indicate the number of times that a process of the cgroup triggered + a "page fault" and a "major fault", respectively. A page fault + happens when a process accesses a part of its virtual memory space + which is inexistent or protected. The former can happen if the + process is buggy and tries to access an invalid address (it will + then be sent a ``SIGSEGV`` signal, typically killing it with the + famous ``Segmentation fault`` message). The latter can happen when + the process reads from a memory zone which has been swapped out, or + which corresponds to a mapped file: in that case, the kernel will + load the page from disk, and let the CPU complete the memory + access. 
  It can also happen when the process writes to a
  copy-on-write memory zone: likewise, the kernel will preempt the
  process, duplicate the memory page, and resume the write operation
  on the process' own copy of the page. "Major" faults happen when the
  kernel actually has to read the data from disk. When it just has to
  duplicate an existing page, or allocate an empty page, it's a
  regular (or "minor") fault.

swap
  the amount of swap currently used by the processes in this cgroup.

active_anon and inactive_anon
  the amount of *anonymous* memory that has been identified as
  respectively *active* and *inactive* by the kernel. "Anonymous"
  memory is the memory that is *not* linked to disk pages. In other
  words, that's the equivalent of the rss counter described above. In
  fact, the very definition of the rss counter is **active_anon** +
  **inactive_anon** - **tmpfs** (where tmpfs is the amount of memory
  used up by ``tmpfs`` filesystems mounted by this control
  group). Now, what's the difference between "active" and "inactive"?
  Pages are initially "active"; and at regular intervals, the kernel
  sweeps over the memory, and tags some pages as "inactive". Whenever
  they are accessed again, they are immediately retagged
  "active". When the kernel is almost out of memory, and time comes to
  swap out to disk, the kernel will swap "inactive" pages.

active_file and inactive_file
  cache memory, with *active* and *inactive* similar to the *anon*
  memory above. The exact formula is cache = **active_file** +
  **inactive_file** + **tmpfs**. The exact rules used by the kernel to
  move memory pages between active and inactive sets are different
  from the ones used for anonymous memory, but the general principle
  is the same. Note that when the kernel needs to reclaim memory, it
  is cheaper to reclaim a clean (=non-modified) page from this pool,
  since it can be reclaimed immediately (while anonymous pages and
  dirty/modified pages have to be written to disk first).

unevictable
  the amount of memory that cannot be reclaimed; generally, it will
  account for memory that has been "locked" with ``mlock``. It is
  often used by crypto frameworks to make sure that secret keys and
  other sensitive material never gets swapped out to disk.

memory and memsw limits
  These are not really metrics, but a reminder of the limits applied
  to this cgroup. The first one indicates the maximum amount of
  physical memory that can be used by the processes of this control
  group; the second one indicates the maximum amount of RAM+swap.

Accounting for memory in the page cache is very complex. If two
processes in different control groups both read the same file
(ultimately relying on the same blocks on disk), the corresponding
memory charge will be split between the control groups. It's nice, but
it also means that when a cgroup is terminated, it could increase the
memory usage of another cgroup, because they are no longer splitting
the cost for those memory pages.

CPU metrics: ``cpuacct.stat``
.............................

Now that we've covered memory metrics, everything else will look very
simple in comparison. CPU metrics will be found in the ``cpuacct``
controller.

For each container, you will find a pseudo-file ``cpuacct.stat``,
containing the CPU usage accumulated by the processes of the
container, broken down between ``user`` and ``system`` time.
If you're not familiar with the distinction, ``user`` is the time
during which the processes were in direct control of the CPU
(i.e. executing process code), and ``system`` is the time during which
the CPU was executing system calls on behalf of those processes.

Those times are expressed in ticks of 1/100th of a second. Actually,
they are expressed in "user jiffies". There are ``USER_HZ``
*"jiffies"* per second, and on x86 systems, ``USER_HZ`` is 100. This
used to map exactly to the number of scheduler "ticks" per second; but
with the advent of higher frequency scheduling, as well as `tickless
kernels `_, the number of kernel
ticks wasn't relevant anymore. It stuck around anyway, mainly for
legacy and compatibility reasons.

Block I/O metrics
.................

Block I/O is accounted in the ``blkio`` controller. Different metrics
are scattered across different files. While you can find in-depth
details in the `blkio-controller
`_
file in the kernel documentation, here is a short list of the most
relevant ones:

blkio.sectors
  contains the number of 512-byte sectors read and written by the
  processes that are members of the cgroup, device by device. Reads
  and writes are merged in a single counter.

blkio.io_service_bytes
  indicates the number of bytes read and written by the cgroup. It has
  4 counters per device, because for each device, it differentiates
  between synchronous vs. asynchronous I/O, and reads vs. writes.

blkio.io_serviced
  the number of I/O operations performed, regardless of their size. It
  also has 4 counters per device.

blkio.io_queued
  indicates the number of I/O operations currently queued for this
  cgroup. In other words, if the cgroup isn't doing any I/O, this will
  be zero. Note that the opposite is not true. In other words, if
  there is no I/O queued, it does not mean that the cgroup is idle
  (I/O-wise). It could be doing purely synchronous reads on an
  otherwise quiescent device, which is therefore able to handle them
  immediately, without queuing. Also, while it is helpful to figure
  out which cgroup is putting stress on the I/O subsystem, keep in
  mind that it is a relative quantity. Even if a process group does
  not perform more I/O, its queue size can increase just because the
  device load increases due to other devices.

Network Metrics
---------------

Network metrics are not exposed directly by control groups. There is a
good explanation for that: network interfaces exist within the context
of *network namespaces*. The kernel could probably accumulate metrics
about packets and bytes sent and received by a group of processes, but
those metrics wouldn't be very useful. You want per-interface metrics
(because traffic happening on the local ``lo`` interface doesn't
really count). But since processes in a single cgroup can belong to
multiple network namespaces, those metrics would be harder to
interpret: multiple network namespaces means multiple ``lo``
interfaces, potentially multiple ``eth0`` interfaces, etc.; so this is
why there is no easy way to gather network metrics with control
groups.

Instead we can gather network metrics from other sources:

IPtables
........

IPtables (or rather, the netfilter framework for which iptables is
just an interface) can do some serious accounting.

For instance, you can set up a rule to account for the outbound HTTP
traffic on a web server:

::

    iptables -I OUTPUT -p tcp --sport 80


There is no ``-j`` or ``-g`` flag, so the rule will just count matched
packets and go to the following rule.

Later, you can check the values of the counters, with:

::

    iptables -nxvL OUTPUT

Technically, ``-n`` is not required, but it will prevent iptables from
doing DNS reverse lookups, which are probably useless in this
scenario.

Counters include packets and bytes. If you want to set up metrics for
container traffic like this, you could execute a ``for`` loop to add
two ``iptables`` rules per container IP address (one in each
direction), in the ``FORWARD`` chain. This will only meter traffic
going through the NAT layer; you will also have to add traffic going
through the userland proxy.

Then, you will need to check those counters on a regular basis. If you
happen to use ``collectd``, there is a nice plugin to automate
iptables counters collection.

Interface-level counters
........................

Since each container has a virtual Ethernet interface, you might want
to check the TX and RX counters of this interface directly. You will
notice that each container is associated with a virtual Ethernet
interface in your host, with a name like ``vethKk8Zqi``. Figuring out
which interface corresponds to which container is, unfortunately,
difficult.

But for now, the best way is to check the metrics *from within the
containers*. To accomplish this, you can run an executable from the
host environment within the network namespace of a container using
**ip-netns magic**.

The ``ip-netns exec`` command will let you execute any program
(present in the host system) within any network namespace visible to
the current process. This means that your host will be able to enter
the network namespace of your containers, but your containers won't be
able to access the host, nor their sibling containers. Containers will
be able to “see” and affect their sub-containers, though.

The exact format of the command is::

    ip netns exec <nsname> <command>

For example::

    ip netns exec mycontainer netstat -i

``ip netns`` finds the "mycontainer" container by using namespaces
pseudo-files. Each process belongs to one network namespace, one PID
namespace, one ``mnt`` namespace, etc., and those namespaces are
materialized under ``/proc/<pid>/ns/``. For example, the network
namespace of PID 42 is materialized by the pseudo-file
``/proc/42/ns/net``.

When you run ``ip netns exec mycontainer ...``, it expects
``/var/run/netns/mycontainer`` to be one of those
pseudo-files. (Symlinks are accepted.)

In other words, to execute a command within the network namespace of a
container, we need to:

* find out the PID of any process within the container that we want to
  investigate;
* create a symlink from ``/var/run/netns/<somename>`` to
  ``/proc/<thepid>/ns/net``
* execute ``ip netns exec <somename> ....``

Please review :ref:`run_findpid` to learn how to find the cgroup of a
process running in the container whose network usage you want to
measure. From there, you can examine the pseudo-file named ``tasks``,
which contains the PIDs that are in the control group (i.e. in the
container). Pick any one of them.
Putting everything together, if the "short ID" of a container is held
in the environment variable ``$CID``, then you can do this::

    TASKS=/sys/fs/cgroup/devices/$CID*/tasks
    PID=$(head -n 1 $TASKS)
    mkdir -p /var/run/netns
    ln -sf /proc/$PID/ns/net /var/run/netns/$CID
    ip netns exec $CID netstat -i


Tips for high-performance metric collection
--------------------------------------------

Note that running a new process each time you want to update metrics
is (relatively) expensive. If you want to collect metrics at high
resolutions, and/or over a large number of containers (think 1000
containers on a single host), you do not want to fork a new process
each time.

Here is how to collect metrics from a single process. You will have to
write your metric collector in C (or any language that lets you do
low-level system calls). You need to use a special system call,
``setns()``, which lets the current process enter any arbitrary
namespace. It requires, however, an open file descriptor to the
namespace pseudo-file (remember: that’s the pseudo-file in
``/proc/<pid>/ns/net``).

However, there is a catch: you must not keep this file descriptor
open. If you do, when the last process of the control group exits, the
namespace will not be destroyed, and its network resources (like the
virtual interface of the container) will stay around forever (or
until you close that file descriptor).

The right approach would be to keep track of the first PID of each
container, and re-open the namespace pseudo-file each time.

Collecting metrics when a container exits
-----------------------------------------

Sometimes, you do not care about real time metric collection, but when
a container exits, you want to know how much CPU, memory, etc. it has
used.

Docker makes this difficult because it relies on ``lxc-start``, which
carefully cleans up after itself, but it is still possible. It is
usually easier to collect metrics at regular intervals (e.g. every
minute, with the collectd LXC plugin) and rely on that instead.

But, if you'd still like to gather the stats when a container stops,
here is how:

For each container, start a collection process, and move it to the
control groups that you want to monitor by writing its PID to the
tasks file of the cgroup. The collection process should periodically
re-read the tasks file to check if it's the last process of the
control group. (If you also want to collect network statistics as
explained in the previous section, you should also move the process to
the appropriate network namespace.)

When the container exits, ``lxc-start`` will try to delete the control
groups. It will fail, since the control group is still in use; but
that’s fine. Your process should now detect that it is the only one
remaining in the group. Now is the right time to collect all the
metrics you need!

Finally, your process should move itself back to the root control
group, and remove the container control group. To remove a control
group, just ``rmdir`` its directory. It's counter-intuitive to
``rmdir`` a directory while it still contains files; but remember that
this is a pseudo-filesystem, so usual rules don't apply. After the
cleanup is done, the collection process can exit safely.
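
To make this more concrete, here is a minimal shell sketch of such a
collection process, assuming ``$CID`` holds the *full* ID of the
container (so that no globbing is needed) and that the ``memory``
hierarchy is mounted under ``/sys/fs/cgroup`` as described above. A
real collector would join every hierarchy it cares about and store the
counters somewhere instead of just printing them::

    # Hypothetical sketch: watch one container and dump its final memory stats
    CG=/sys/fs/cgroup/memory/lxc/$CID

    # Join the container's cgroup so that it survives lxc-start's cleanup
    echo $$ > $CG/tasks

    # Wait until we are the last process left in the control group
    while [ "$(wc -l < $CG/tasks)" -gt 1 ]; do
        sleep 1
    done

    # The container has exited; collect the final counters
    cat $CG/memory.stat

    # Move back to the root cgroup and remove the container's group
    echo $$ > /sys/fs/cgroup/memory/tasks
    rmdir $CG

The same pattern works for the ``cpuacct`` and ``blkio`` hierarchies;
the only thing that changes is the set of pseudo-files you read before
removing the group.
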
+ diff --git a/docs/sources/reference/builder.rst b/docs/sources/reference/builder.rst index 45cb2ab86e..9889660913 100644 --- a/docs/sources/reference/builder.rst +++ b/docs/sources/reference/builder.rst @@ -1,12 +1,12 @@ -:title: Build Images (Dockerfile Reference) +:title: Dockerfile Reference :description: Dockerfiles use a simple DSL which allows you to automate the steps you would normally manually take to create an image. :keywords: builder, docker, Dockerfile, automation, image creation .. _dockerbuilder: -=================================== -Build Images (Dockerfile Reference) -=================================== +==================== +Dockerfile Reference +==================== **Docker can act as a builder** and read instructions from a text ``Dockerfile`` to automate the steps you would otherwise take manually diff --git a/docs/sources/reference/commandline/cli.rst b/docs/sources/reference/commandline/cli.rst index e71b691bcc..a636df6259 100644 --- a/docs/sources/reference/commandline/cli.rst +++ b/docs/sources/reference/commandline/cli.rst @@ -18,6 +18,45 @@ To list available commands, either run ``docker`` with no parameters or execute ... +.. _cli_options: + +Types of Options +---------------- + +Boolean +~~~~~~~ + +Boolean options look like ``-d=false``. The value you see is the +default value which gets set if you do **not** use the boolean +flag. If you do call ``run -d``, that sets the opposite boolean value, +so in this case, ``true``, and so ``docker run -d`` **will** run in +"detached" mode, in the background. Other boolean options are similar +-- specifying them will set the value to the opposite of the default +value. + +Multi +~~~~~ + +Options like ``-a=[]`` indicate they can be specified multiple times:: + + docker run -a stdin -a stdout -a stderr -i -t ubuntu /bin/bash + +Sometimes this can use a more complex value string, as for ``-v``:: + + docker run -v /host:/container example/mysql + +Strings and Integers +~~~~~~~~~~~~~~~~~~~~ + +Options like ``-name=""`` expect a string, and they can only be +specified once. Options like ``-c=0`` expect an integer, and they can +only be specified once. + +---- + +Commands +-------- + .. _cli_daemon: ``daemon`` diff --git a/docs/sources/reference/index.rst b/docs/sources/reference/index.rst index 49099d5621..d35a19b93d 100644 --- a/docs/sources/reference/index.rst +++ b/docs/sources/reference/index.rst @@ -14,4 +14,5 @@ Contents: commandline/index builder + run api/index diff --git a/docs/sources/reference/run.rst b/docs/sources/reference/run.rst new file mode 100644 index 0000000000..7505b7c02f --- /dev/null +++ b/docs/sources/reference/run.rst @@ -0,0 +1,353 @@ +:title: Docker Run Reference +:description: Configure containers at runtime +:keywords: docker, run, configure, runtime + +.. _run_docker: + +==================== +Docker Run Reference +==================== + +**Docker runs processes in isolated containers**. When an operator +executes ``docker run``, she starts a process with its own file +system, its own networking, and its own isolated process tree. The +:ref:`image_def` which starts the process may define defaults related +to the binary to run, the networking to expose, and more, but ``docker +run`` gives final control to the operator who starts the container +from the image. That's the main reason :ref:`cli_run` has more options +than any other ``docker`` command. + +Every one of the :ref:`example_list` shows running containers, and so +here we try to give more in-depth guidance. + +.. 
.. contents:: Table of Contents

.. _run_running:

General Form
============

As you've seen in the :ref:`example_list`, the basic `run` command
takes this form::

    docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]

To learn how to interpret the types of ``[OPTIONS]``, see
:ref:`cli_options`.

The list of ``[OPTIONS]`` breaks down into two groups:

* options that define the runtime behavior or environment, and
* options that override image defaults.

Since image defaults usually get set in :ref:`Dockerfiles
` (though they could also be set at :ref:`cli_commit`
time), we will group the runtime options here by their related
Dockerfile commands so that it is easier to see how to override image
defaults and set new behavior.

We'll start, though, with the options that are unique to ``docker
run``, the options which define the runtime behavior or the container
environment.

.. note:: The runtime operator always has final control over the
   behavior of a Docker container.

Detached or Foreground
======================

When starting a Docker container, you must first decide if you want to
run the container in the background in a "detached" mode or in the
default foreground mode::

    -d=false: Detached mode: Run container in the background, print new container id

Detached (-d)
.............

In detached mode (``-d=true`` or just ``-d``), all IO should be done
through network connections or shared volumes because the container is
no longer listening to the commandline where you executed ``docker
run``. You can reattach to a detached container with ``docker``
:ref:`cli_attach`. If you choose to run a container in the detached
mode, then you cannot use the ``-rm`` option.

Foreground
..........

In foreground mode (the default when ``-d`` is not specified),
``docker run`` can start the process in the container and attach the
console to the process's standard input, output, and error. It can
even pretend to be a TTY (this is what most commandline executables
expect) and pass along signals. All of that is configurable::

    -a=[]   : Attach to stdin, stdout and/or stderr
    -t=false : Allocate a pseudo-tty
    -sig-proxy=true: Proxify all received signal to the process (even in non-tty mode)
    -i=false : Keep stdin open even if not attached

If you do not specify ``-a`` then Docker will `attach everything
(stdin,stdout,stderr)
`_. You
can specify which of the three standard streams (stdin, stdout,
stderr) you'd like to connect instead, as in::

    docker run -a stdin -a stdout -i -t ubuntu /bin/bash

For interactive processes (like a shell) you will typically want a tty
as well as persistent standard input, so you'll use ``-i -t`` together
in most interactive cases.

Clean Up (-rm)
--------------

By default a container's file system persists even after the container
exits. This makes debugging a lot easier (since you can inspect the
final state) and you retain all your data by default. But if you are
running short-term **foreground** processes, these container file
systems can really pile up.
If instead you'd like Docker to +**automatically clean up the container and remove the file system when +the container exits**, you can add the ``-rm`` flag:: + + -rm=false: Automatically remove the container when it exits (incompatible with -d) + +Name (-name) +============ + +The operator can identify a container in three ways: + +* UUID long identifier ("f78375b1c487e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778") +* UUID short identifier ("f78375b1c487") +* name ("evil_ptolemy") + +The UUID identifiers come from the Docker daemon, and if you do not +assign a name to the container with ``-name`` then the daemon will +also generate a random string name too. The name can become a handy +way to add meaning to a container since you can use this name when +defining :ref:`links ` (or any other place +you need to identify a container). This works for both background and +foreground Docker containers. + +PID Equivalent +============== + +And finally, to help with automation, you can have Docker write the +container id out to a file of your choosing. This is similar to how +some programs might write out their process ID to a file (you've seen +them as .pid files):: + + -cidfile="": Write the container ID to the file + +Overriding Dockerfile Image Defaults +==================================== + +When a developer builds an image from a :ref:`Dockerfile +` or when she commits it, the developer can set a +number of default parameters that take effect when the image starts up +as a container. + +Four of the Dockerfile commands cannot be overridden at runtime: +``FROM, MAINTAINER, RUN``, and ``ADD``. Everything else has a +corresponding override in ``docker run``. We'll go through what the +developer might have set in each Dockerfile instruction and how the +operator can override that setting. + + +CMD +... + +Remember the optional ``COMMAND`` in the Docker commandline:: + + docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...] + +This command is optional because the person who created the ``IMAGE`` +may have already provided a default ``COMMAND`` using the Dockerfile +``CMD``. As the operator (the person running a container from the +image), you can override that ``CMD`` just by specifying a new +``COMMAND``. + +If the image also specifies an ``ENTRYPOINT`` then the ``CMD`` or +``COMMAND`` get appended as arguments to the ``ENTRYPOINT``. + + +ENTRYPOINT +.......... + +:: + + -entrypoint="": Overwrite the default entrypoint set by the image + +The ENTRYPOINT of an image is similar to a COMMAND because it +specifies what executable to run when the container starts, but it is +(purposely) more difficult to override. The ENTRYPOINT gives a +container its default nature or behavior, so that when you set an +ENTRYPOINT you can run the container *as if it were that binary*, +complete with default options, and you can pass in more options via +the COMMAND. But, sometimes an operator may want to run something else +inside the container, so you can override the default ENTRYPOINT at +runtime by using a string to specify the new ENTRYPOINT. 
Here is an
example of how to run a shell in a container that has been set up to
automatically run something else (like ``/usr/bin/redis-server``)::

    docker run -i -t -entrypoint /bin/bash example/redis

or two examples of how to pass more parameters to that ENTRYPOINT::

    docker run -i -t -entrypoint /bin/bash example/redis -c ls -l
    docker run -i -t -entrypoint /usr/bin/redis-cli example/redis --help


EXPOSE (``run`` Networking Options)
...................................

The *Dockerfile* doesn't give much control over networking, only
providing the EXPOSE instruction to give a hint to the operator about
what incoming ports might provide services. At runtime, however,
Docker provides a number of ``run`` options related to networking::

    -n=true   : Enable networking for this container
    -dns=[]   : Set custom dns servers for the container
    -expose=[]: Expose a port from the container
                without publishing it to your host
    -P=false  : Publish all exposed ports to the host interfaces
    -p=[]     : Publish a container's port to the host (format:
                ip:hostPort:containerPort | ip::containerPort |
                hostPort:containerPort)
                (use 'docker port' to see the actual mapping)
    -link=""  : Add link to another container (name:alias)

By default, all containers have networking enabled and they can make
any outgoing connections. The operator can completely disable
networking with ``run -n`` which disables all incoming and outgoing
networking. In cases like this, you would perform IO through files or
stdin/stdout only.

Your container will use the same DNS servers as the host by default,
but you can override this with ``-dns``.

As mentioned previously, ``EXPOSE`` (and ``-expose``) make a port
available **in** a container for incoming connections. The port number
on the inside of the container (where the service listens) does not
need to be the same number as the port exposed on the outside of the
container (where clients connect), so inside the container you might
have an HTTP service listening on port 80 (and so you ``EXPOSE 80`` in
the Dockerfile), but outside the container the port might be 42800.

To help a new client container reach the server container's internal
port ``-expose'd`` by the operator or ``EXPOSE'd`` by the developer,
the operator has three choices: start the server container with ``-P``
or ``-p``, or start the client container with ``-link``.

If the operator uses ``-P`` or ``-p`` then Docker will make the
exposed port accessible on the host and the ports will be available to
any client that can reach the host. To find the map between the host
ports and the exposed ports, use ``docker port``.

If the operator uses ``-link`` when starting the new client container,
then the client container can access the exposed port via a private
networking interface. Docker will set some environment variables in
the client container to help indicate which interface and port to use.

ENV (Environment Variables)
...........................

The operator can **set any environment variable** in the container by
using one or more ``-e`` flags, even overriding those already defined
by the developer with a Dockerfile ``ENV``::

    $ docker run -e "deep=purple" -rm ubuntu /bin/bash -c export
    declare -x HOME="/"
    declare -x HOSTNAME="85bc26a0e200"
    declare -x OLDPWD
    declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    declare -x PWD="/"
    declare -x SHLVL="1"
    declare -x container="lxc"
    declare -x deep="purple"

Similarly the operator can set the **hostname** with ``-h``.

``-link name:alias`` also sets environment variables, using the
*alias* string to define environment variables within the container
that give the IP and PORT information for connecting to the service
container. Let's imagine we have a container running Redis::

    # Start the service container, named redis-name
    $ docker run -d -name redis-name dockerfiles/redis
    4241164edf6f5aca5b0e9e4c9eccd899b0b8080c64c0cd26efe02166c73208f3

    # The redis-name container exposed port 6379
    $ docker ps
    CONTAINER ID        IMAGE                      COMMAND                CREATED             STATUS              PORTS               NAMES
    4241164edf6f        dockerfiles/redis:latest   /redis-stable/src/re   5 seconds ago       Up 4 seconds        6379/tcp            redis-name

    # Note that there are no public ports exposed since we didn't use -p or -P
    $ docker port 4241164edf6f 6379
    2014/01/25 00:55:38 Error: No public port '6379' published for 4241164edf6f


Yet we can get information about the redis container's exposed ports
with ``-link``. Choose an alias that will form a valid environment
variable!

::

    $ docker run -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c export
    declare -x HOME="/"
    declare -x HOSTNAME="acda7f7b1cdc"
    declare -x OLDPWD
    declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    declare -x PWD="/"
    declare -x REDIS_ALIAS_NAME="/distracted_wright/redis"
    declare -x REDIS_ALIAS_PORT="tcp://172.17.0.32:6379"
    declare -x REDIS_ALIAS_PORT_6379_TCP="tcp://172.17.0.32:6379"
    declare -x REDIS_ALIAS_PORT_6379_TCP_ADDR="172.17.0.32"
    declare -x REDIS_ALIAS_PORT_6379_TCP_PORT="6379"
    declare -x REDIS_ALIAS_PORT_6379_TCP_PROTO="tcp"
    declare -x SHLVL="1"
    declare -x container="lxc"

And we can use that information to connect from another container as a client::

    $ docker run -i -t -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c '/redis-stable/src/redis-cli -h $REDIS_ALIAS_PORT_6379_TCP_ADDR -p $REDIS_ALIAS_PORT_6379_TCP_PORT'
    172.17.0.32:6379>

VOLUME (Shared Filesystems)
...........................

::

    -v=[]: Create a bind mount with: [host-dir]:[container-dir]:[rw|ro].
           If "container-dir" is missing, then docker creates a new volume.
    -volumes-from="": Mount all volumes from the given container(s)

The volumes commands are complex enough to have their own
documentation in section :ref:`volume_def`. A developer can define one
or more VOLUMEs associated with an image, but only the operator can
give access from one container to another (or from a container to a
volume mounted on the host).

USER
....

::

    -u="": Username or UID

WORKDIR
.......

::

    -w="": Working directory inside the container

Performance
===========

The operator can also adjust the performance parameters of the
container::

    -c=0 : CPU shares (relative weight)
    -m="": Memory limit (format: <number><optional unit>, where unit = b, k, m or g)

    -lxc-conf=[]: Add custom lxc options -lxc-conf="lxc.cgroup.cpuset.cpus = 0,1"
    -privileged=false: Give extended privileges to this container
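
For example, to start an interactive shell with a relative CPU weight
of 512 and a 512 MB memory limit, you might combine these flags like
this (the image and command here are just placeholders)::

    docker run -i -t -c=512 -m=512m ubuntu /bin/bash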