Stale sandboxes and endpoints are cleaned up during controller init.
Since we reuse the exact same code path for sandbox and endpoint
delete, the delete tries to load the plugin, which causes daemon
startup timeouts because the external plugin containers cannot be
loaded at that time. Since the cleanup only touches the libnetwork
core state, we can force-delete the sandbox and endpoint even if the
driver is not loaded.
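A minimal sketch of the force-delete shape, assuming hypothetical
names (driverTable, deleteEndpoint) rather than the real libnetwork
types: with force set, the driver call is skipped when the plugin is
not loaded and only the core state is removed.

    package main

    import (
        "errors"
        "fmt"
    )

    // driverTable stands in for libnetwork's driver registry; a remote
    // plugin may not be resolvable while the daemon is still starting.
    var driverTable = map[string]func(epID string) error{}

    var errNoDriver = errors.New("driver not loaded")

    // deleteEndpoint removes an endpoint. With force=true the driver-side
    // teardown is skipped when the driver cannot be loaded, and only the
    // libnetwork core (store) state is removed.
    func deleteEndpoint(driverName, epID string, force bool) error {
        del, ok := driverTable[driverName]
        if !ok && !force {
            return errNoDriver
        }
        if ok {
            if err := del(epID); err != nil && !force {
                return err
            }
        }
        fmt.Printf("removed endpoint %s from core state\n", epID)
        return nil
    }

    func main() {
        // During controller init the plugin-backed driver is not loaded,
        // so only a forced delete can clean up the stale endpoint.
        _ = deleteEndpoint("remote-plugin", "ep1", true)
    }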
Signed-off-by: Madhu Venugopal <madhu@docker.com>
If the endpoint and the corresponding network are
not persistent, then skip adding it to the sandbox
store.
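A rough sketch of the check, with assumed field names (persist, nw);
only endpoints whose network is persistent are checkpointed into the
sandbox store.

    // Hypothetical types; the real libnetwork structures differ.
    type network struct {
        persist bool // false for transient networks
    }

    type endpoint struct {
        id string
        nw *network
    }

    // endpointsToStore filters out endpoints on non-persistent networks
    // so they are never added to the sandbox store.
    func endpointsToStore(eps []*endpoint) []*endpoint {
        var out []*endpoint
        for _, ep := range eps {
            if ep.nw != nil && ep.nw.persist {
                out = append(out, ep)
            }
        }
        return out
    }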
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
At times, when a checkpointed sandbox from the store cannot be
cleaned up properly, we still retain the sandbox both in the
store and in memory. But this sandbox may not contain important
configuration information from docker. So when docker requests a
new sandbox, instead of using it as is, reconcile the sandbox
state from the store with the configuration information provided
by docker. To do this, mark the sandbox from the store as a stub
and never reveal it to external searches. When docker requests a
new sandbox, update the stub sandbox and clear the stub flag.
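A simplified sketch of the stub mechanism, with assumed names (isStub,
getSandbox, newSandbox): a sandbox restored from the store is hidden
from lookups until docker asks for a sandbox with the same ID, at
which point the stub is updated with docker's configuration and the
flag is cleared.

    // Hypothetical sketch of stub-sandbox handling.
    type sandbox struct {
        id     string
        isStub bool
        config map[string]string
    }

    var sandboxes = map[string]*sandbox{}

    // getSandbox never reveals stub sandboxes to external searches.
    func getSandbox(id string) *sandbox {
        sb := sandboxes[id]
        if sb == nil || sb.isStub {
            return nil
        }
        return sb
    }

    // newSandbox reuses a stub restored from the store, merging in the
    // configuration provided by docker and clearing the stub flag.
    func newSandbox(id string, cfg map[string]string) *sandbox {
        sb := sandboxes[id]
        if sb == nil {
            sb = &sandbox{id: id, config: map[string]string{}}
            sandboxes[id] = sb
        }
        for k, v := range cfg {
            sb.config[k] = v
        }
        sb.isStub = false
        return sb
    }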
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
For ungraceful daemon restarts, libnetwork has sandbox cleanup logic to
remove any stale and dangling resources. But if the store is down during
the daemon restart, the cleanup logic cannot perform a complete cleanup.
Previously the sandbox was removed anyway in such cases. With this fix,
we retain the sandbox if the store is down and the endpoint could not be
cleaned up. When the container is later restarted in the docker daemon,
we will perform a sandbox cleanup, and that will complete the cleanup
round.
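A minimal sketch of the retention logic, with assumed names
(cleanupSandbox, removeSandbox): if any endpoint cannot be deleted,
for example because the store is down, the sandbox is kept so a later
container restart can finish the cleanup.

    // Hypothetical sketch; the real cleanup walks libnetwork's own types.
    type sandbox struct {
        id        string
        endpoints []string
    }

    func removeSandbox(id string) {
        // Drop the sandbox from the store and from memory.
    }

    // cleanupSandbox reports whether the sandbox was retained because
    // the cleanup could not be completed.
    func cleanupSandbox(sb *sandbox, deleteEndpoint func(epID string) error) bool {
        for _, epID := range sb.endpoints {
            if err := deleteEndpoint(epID); err != nil {
                // The store may be down: keep the sandbox and retry when
                // the container is restarted in the docker daemon.
                return true
            }
        }
        removeSandbox(sb.id)
        return false
    }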
Signed-off-by: Madhu Venugopal <madhu@docker.com>
When the daemon has a lot of containers, the 15 seconds it gives
to stop all containers is not enough, so the daemon forces a
shutdown at the end of those 15 seconds. Hence, with a lot of
containers, even bringing the daemon down gracefully will leave
many containers not fully stopped.
In addition, the daemon force-killing itself can happen at any
arbitrary point in time, which results in an inconsistent
checkpointed state for the sandbox. This makes the cleanup fail
when we come back up, and in many cases this inability to clean
up properly on restart leaves the daemon unable to restart
because we cannot delete the default network.
This commit ensures that the sandbox state stored on disk is
never inconsistent, so that when we come back up we will always
be able to clean up the sandbox state.
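One way to get this property (shown here as an illustration, not
necessarily the mechanism used in this commit) is to make every
checkpoint update atomic, e.g. write to a temporary file and rename
it over the old one, so a kill at any point leaves either the old
state or the new state on disk, never a partial write.

    import "os"

    // writeCheckpoint atomically replaces the on-disk sandbox state.
    func writeCheckpoint(path string, state []byte) error {
        tmp := path + ".tmp"
        if err := os.WriteFile(tmp, state, 0o600); err != nil {
            return err
        }
        // rename(2) is atomic on the same filesystem, so readers see
        // either the old checkpoint or the new one, never a torn write.
        return os.Rename(tmp, path)
    }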
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Currently, when docker exits ungracefully it may leave
dangling sandboxes which may hold onto precious network
resources. Added checkpoint state for sandboxes, which
on bootup will be used to clean up the sandboxes and
network resources.
On bootup, the remaining dangling state in the checkpoint
is read and cleaned up before accepting any new
network allocation requests.
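A high-level sketch of the bootup pass, against an assumed
checkpointStore interface (not libnetwork's actual store API): the
dangling entries are enumerated and released before the controller
starts serving allocation requests.

    // Hypothetical interface for the checkpoint store.
    type checkpointStore interface {
        ListSandboxes() ([]string, error)
        DeleteSandbox(id string) error
    }

    // cleanupDanglingSandboxes runs once at controller start, before
    // any new network allocation request is accepted.
    func cleanupDanglingSandboxes(store checkpointStore) error {
        ids, err := store.ListSandboxes()
        if err != nil {
            return err
        }
        for _, id := range ids {
            // Release the network resources held by the stale sandbox,
            // then drop its checkpoint entry.
            if err := store.DeleteSandbox(id); err != nil {
                return err
            }
        }
        return nil
    }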
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>