The internal directory is designed to contain libraries
that are used exclusively by this project.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Instead of using "sync.Once" to determine whether to initialize a
network sandbox or subnet sandbox, we use a traditional mutex +
initialization boolean. This is because the initialization state isn't
truly a once-and-done condition. Rather, libnetwork destroys network
and subnet sandboxes when the last endpoint leaves them. Using
sync.Once in this kind of scenario would therefore require
re-initializing the Once, which is impossible. So the approach that
libnetwork currently takes is to use a pointer to a Once and redirect
that pointer to a new Once on reset. This leads to nasty race conditions.
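A minimal sketch of the mutex-plus-boolean pattern, assuming a hypothetical sandboxState type (not libnetwork's real structure). Because initialization is tracked by a plain flag guarded by the same mutex as the endpoint count, the flag can simply be cleared when the last endpoint leaves, which a sync.Once cannot do.

```go
package main

import (
	"fmt"
	"sync"
)

// sandboxState is a hypothetical stand-in for per-network (or per-subnet)
// sandbox state; it is not libnetwork's actual type.
type sandboxState struct {
	mu          sync.Mutex
	initialized bool
	initCount   int // number of times initialization ran, for demonstration
	endpoints   int
}

// join initializes the sandbox on first use, holding the lock across the
// whole check-initialize-increment sequence.
func (s *sandboxState) join() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.initialized {
		s.initCount++ // real code would create the namespace, bridge, etc.
		s.initialized = true
	}
	s.endpoints++
}

// leave tears the sandbox down when the last endpoint leaves. Clearing a
// boolean under the lock is safe, unlike swapping out a *sync.Once.
func (s *sandboxState) leave() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.endpoints--
	if s.endpoints == 0 {
		s.initialized = false // real code would destroy the sandbox here
	}
}

func main() {
	var s sandboxState
	s.join()
	s.leave() // last endpoint leaves: sandbox destroyed
	s.join()  // next join re-initializes it
	fmt.Println(s.initCount) // 2
}
```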
In addition to refactoring the locking, this patch merges the functions
joinSandbox() and joinSubnetSandbox(). This makes the code cleaner and
also holds the network and subnet locks through the series of
read-modify-writes, avoiding further potential races. This does reduce
the potential parallelism which could be applied should there be many
joins coming in on many different subnets in the same overlay network.
However, this should be an extremely minor performance hit for a very
obscure case.
One important pattern in this commit is that it is crucial to avoid
sending peerDB messages while holding a driver or network lock. The
changes herein defer such (asynchronous) notifications until after
release of such locks. This prevents deadlocks where the peerDB
blocks acquiring said locks while the network method blocks trying
to send to the peerDB's channel.
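The pattern can be sketched as follows, assuming hypothetical network and peerEvent types and a plain channel standing in for the peerDB. State changes are made under the lock and the resulting notifications are only collected; the channel sends happen after Unlock, so a consumer that itself takes the network lock cannot deadlock with the sender.

```go
package main

import (
	"fmt"
	"sync"
)

// peerEvent is a hypothetical notification for a peerDB-like consumer.
type peerEvent struct{ ip string }

// network is a hypothetical stand-in for driver/network state.
type network struct {
	mu    sync.Mutex
	peers map[string]bool
}

// addPeers updates state under the lock but defers the notifications
// until the lock has been released.
func (n *network) addPeers(ips []string, notify chan<- peerEvent) {
	var pending []peerEvent

	n.mu.Lock()
	for _, ip := range ips {
		if !n.peers[ip] {
			n.peers[ip] = true
			pending = append(pending, peerEvent{ip: ip})
		}
	}
	n.mu.Unlock()

	// Channel sends may block; perform them without holding n.mu.
	for _, ev := range pending {
		notify <- ev
	}
}

func main() {
	n := &network{peers: map[string]bool{}}
	ch := make(chan peerEvent) // unbuffered: each send waits for the consumer
	done := make(chan struct{})
	go func() {
		// The consumer takes the network lock per event; this would deadlock
		// if addPeers were still holding n.mu while sending.
		for ev := range ch {
			n.mu.Lock()
			fmt.Println("peer added:", ev.ip, n.peers[ev.ip])
			n.mu.Unlock()
		}
		close(done)
	}()
	n.addPeers([]string{"10.0.0.2", "10.0.0.3"}, ch)
	close(ch)
	<-done
}
```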
Signed-off-by: Chris Telfer <ctelfer@docker.com>
'make check' will now fail if the files produced by re-running protoc
differ from those which are checked into the repository.
Signed-off-by: Euan Harris <euan.harris@docker.com>
Outside the build container, run: make protobuf
Inside the build container, run: make protobuf-local
Signed-off-by: Euan Harris <euan.harris@docker.com>
This makes it possible to Ctrl-C tests and builds again. Zombie
processes will also be reaped correctly.
Signed-off-by: Euan Harris <euan.harris@docker.com>
The previous code used string slices to limit the length of certain
fields like endpoint or sandbox IDs. This assumes that these strings
are at least as long as the slice length. Unfortunately, some sandbox
IDs can be shorter than 7 characters. This fix addresses this issue
by systematically converting format string calls that were taking
fixed-slice arguments to use a precision specifier in the string format
itself. From the golang fmt package documentation:
For strings, byte slices and byte arrays, however, precision limits
the length of the input to be formatted (not the size of the output),
truncating if necessary. Normally it is measured in runes, but for
these types when formatted with the %x or %X format it is measured
in bytes.
This nicely fits the desired behavior: it will limit the number of
runes considered for string interpolation to the precision value.
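A small illustration of the difference, using a hypothetical shortID helper: slicing with id[:7] panics with an out-of-range error when the ID is shorter than 7 bytes, whereas the %.7s verb formats at most 7 runes and leaves shorter IDs intact.

```go
package main

import "fmt"

// shortID truncates an ID for log output using a precision specifier
// instead of a slice expression, so short IDs never cause a panic.
func shortID(id string) string {
	return fmt.Sprintf("%.7s", id)
}

func main() {
	fmt.Println(shortID("3f4c2a9e8b7d")) // 3f4c2a9
	fmt.Println(shortID("abc"))          // abc (id[:7] would panic here)
}
```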
Signed-off-by: Chris Telfer <ctelfer@docker.com>
TestOverlappingRequests checks pool requests which are supersets or
subsets of existing allocations, as well as those which overlap with
existing allocations at the beginning or the end.
Multiple allocation is now tested by TestOverlappingRequests, so
TestDoublePoolRelease only needs to test double releasing.
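The overlap cases listed above (superset, subset, overlap at the beginning, overlap at the end) all reduce to a single interval-intersection check. The sketch below illustrates it with a hypothetical inclusive addrRange type built on net/netip; it is not the IPAM allocator's implementation.

```go
package main

import (
	"fmt"
	"net/netip"
)

// addrRange is a hypothetical inclusive address range used only to
// illustrate the overlap cases; it is not libnetwork's type.
type addrRange struct{ start, end netip.Addr }

func mustRange(start, end string) addrRange {
	return addrRange{netip.MustParseAddr(start), netip.MustParseAddr(end)}
}

// overlaps reports whether two inclusive ranges share at least one
// address; this covers supersets, subsets, and overlaps at either end.
func overlaps(a, b addrRange) bool {
	return a.start.Compare(b.end) <= 0 && b.start.Compare(a.end) <= 0
}

func main() {
	existing := mustRange("10.0.0.0", "10.0.0.255")
	cases := []struct {
		name string
		r    addrRange
	}{
		{"superset", mustRange("10.0.0.0", "10.0.1.255")},
		{"subset", mustRange("10.0.0.64", "10.0.0.127")},
		{"beginning", mustRange("9.255.255.0", "10.0.0.10")},
		{"end", mustRange("10.0.0.250", "10.0.1.10")},
		{"disjoint", mustRange("10.0.1.0", "10.0.1.255")},
	}
	for _, c := range cases {
		fmt.Println(c.name, overlaps(existing, c.r))
	}
}
```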
Signed-off-by: Euan Harris <euan.harris@docker.com>
Added some optimizations to reduce the number of messages in the queue:
1) On network join, the node executes a TCP sync with all the nodes that
it knows are part of that network. Previously, during this sync, the
node redistributed all the entries. This meant that if the network
had 10K entries, the queue of the joining node would jump to 10K. The fix
adds a flag on the network that prevents any entry from being inserted
into the queue until the sync happens. Note that right now the flag is
set in a best-effort way; there is no real check that the sync with at
least one of the nodes succeeded.
2) Limit the number of messages coming from a TCP sync that are
redistributed. A threshold now limits the number of messages that are
propagated, so under heavy load this redistribution is disabled.
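Both optimizations can be sketched as a single admission check on the queue. The netState type, its inSync flag, and the threshold value of 3 are hypothetical stand-ins chosen for illustration; networkdb's actual names and threshold may differ.

```go
package main

import "fmt"

// maxQueueLenOnSync is a hypothetical threshold for illustration only.
const maxQueueLenOnSync = 3

// netState is a hypothetical per-network view on one node.
type netState struct {
	inSync bool     // set (best effort) once the join-time TCP sync has happened
	queue  []string // entries waiting to be redistributed
}

// enqueue decides whether a received entry is queued for redistribution.
func (n *netState) enqueue(entry string, fromSync bool) bool {
	// (1) Until the sync has happened, queue nothing; otherwise joining a
	// network with 10K entries would push 10K messages into the queue.
	if !n.inSync {
		return false
	}
	// (2) Entries that arrive via a TCP sync are only redistributed while
	// the queue is below the threshold.
	if fromSync && len(n.queue) >= maxQueueLenOnSync {
		return false
	}
	n.queue = append(n.queue, entry)
	return true
}

func main() {
	n := &netState{}
	fmt.Println(n.enqueue("e0", true)) // false: sync not done yet
	n.inSync = true
	for i := 1; i <= 5; i++ {
		n.enqueue(fmt.Sprintf("e%d", i), true)
	}
	fmt.Println(len(n.queue))           // 3: capped by the threshold
	fmt.Println(n.enqueue("e6", false)) // true: non-sync entries still queued
}
```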
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Instead of printing the whole option, print only the _number_,
because that is what the error message refers to.
Before this change:
invalid number for ndots option ndots:foobar
After this change:
invalid number for ndots option: foobar
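A sketch of the new message format, using a hypothetical parseNdots helper rather than libnetwork's actual parsing code: the error reports only the value that failed to parse.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNdots is a hypothetical helper that extracts N from an "ndots:N"
// resolver option. On failure the error names only the offending value.
func parseNdots(opt string) (int, error) {
	value := strings.TrimPrefix(opt, "ndots:")
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid number for ndots option: %v", value)
	}
	return n, nil
}

func main() {
	_, err := parseNdots("ndots:foobar")
	fmt.Println(err) // invalid number for ndots option: foobar
}
```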
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Refactor the ostweaks file to allow easier reuse.
Add a method on the osl.Sandbox interface to allow setting
knobs on the sandbox
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
`ndots:0` is a valid DNS option; previously, a user-supplied `ndots:0` was
not recognized as setting ndots, so the default (`ndots:0`) was added as
well, resulting in a duplicate option.
Before this change:
docker network create foo
docker run --rm --network foo --dns-opt ndots:0 alpine cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0 ndots:0
After this change:
docker network create foo
docker run --rm --network foo --dns-opt ndots:0 alpine cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
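A sketch of the intended behavior, using a hypothetical withDefaultNdots helper rather than libnetwork's actual code: the default is appended only when the user supplied no ndots option at all, and an explicit `ndots:0` counts as supplied.

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultNdots appends def only when no "ndots:" option is present.
// Checking for the option's presence means an explicit "ndots:0" counts
// as set, so the default is not added a second time.
func withDefaultNdots(userOpts []string, def string) []string {
	for _, opt := range userOpts {
		if strings.HasPrefix(opt, "ndots:") {
			return userOpts
		}
	}
	return append(userOpts, def)
}

func main() {
	fmt.Println(withDefaultNdots([]string{"ndots:0"}, "ndots:0")) // [ndots:0]
	fmt.Println(withDefaultNdots(nil, "ndots:0"))                 // [ndots:0]
}
```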
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The "message" argument in assert.Equal expects a format
string; the current string was not one (it has no verb for the extra
argument), resulting in an incorrect message being printed:
--- FAIL: TestDNSOptions (1.28s)
Location: service_common_test.go:92
Error: Not equal: "ndots:5" (expected)
!= "ndots:0" (actual)
Messages: The option must be ndots:5 instead:%!(EXTRA string=ndots:0)
This patch removes the message altogether, because assert.Equal
already prints enough information to catch the error:
--- FAIL: TestDNSOptions (1.28s)
Location: service_common_test.go:92
Error: Not equal: "ndots:5" (expected)
!= "ndots:0" (actual)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add debug and error logs to notify when a load balancing sandbox
is not found. This can occur in normal operation during removal.
Signed-off-by: Chris Telfer <ctelfer@docker.com>
Lock the network ID in the controller during addServiceBinding to
prevent racing with network.delete(). Such a race could cause the
binding to be silently ignored by the system.
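A minimal sketch of per-network-ID locking, assuming a hypothetical idLocker and a simplified controller (libnetwork's real controller and locker differ). The lookup-then-update in addServiceBinding releases the controller mutex between steps; it is safe against a concurrent deleteNetwork only because both operations hold the same per-ID lock.

```go
package main

import (
	"fmt"
	"sync"
)

// idLocker is a minimal, hypothetical per-ID mutex. For brevity it never
// frees entries; a production locker would reference-count them.
type idLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (l *idLocker) Lock(id string) {
	l.mu.Lock()
	if l.locks == nil {
		l.locks = make(map[string]*sync.Mutex)
	}
	m, ok := l.locks[id]
	if !ok {
		m = &sync.Mutex{}
		l.locks[id] = m
	}
	l.mu.Unlock()
	m.Lock()
}

func (l *idLocker) Unlock(id string) {
	l.mu.Lock()
	m := l.locks[id]
	l.mu.Unlock()
	m.Unlock()
}

type network struct{ bindings map[string]bool }

// controller is a hypothetical stand-in for libnetwork's controller.
type controller struct {
	netLocker idLocker
	mu        sync.Mutex
	networks  map[string]*network
}

func (c *controller) addServiceBinding(nid, svc string) error {
	c.netLocker.Lock(nid)
	defer c.netLocker.Unlock(nid)

	c.mu.Lock()
	n, ok := c.networks[nid]
	c.mu.Unlock()
	if !ok {
		return fmt.Errorf("network %s not found", nid)
	}
	// Without the per-ID lock, deleteNetwork could run at this point and the
	// binding below would land on an orphaned network, silently ignored.
	c.mu.Lock()
	n.bindings[svc] = true
	c.mu.Unlock()
	return nil
}

func (c *controller) deleteNetwork(nid string) {
	c.netLocker.Lock(nid)
	defer c.netLocker.Unlock(nid)
	c.mu.Lock()
	delete(c.networks, nid)
	c.mu.Unlock()
}

func main() {
	c := &controller{networks: map[string]*network{"n1": {bindings: map[string]bool{}}}}
	fmt.Println(c.addServiceBinding("n1", "web")) // <nil>
	c.deleteNetwork("n1")
	fmt.Println(c.addServiceBinding("n1", "web")) // network n1 not found
}
```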
Signed-off-by: Chris Telfer <ctelfer@docker.com>
This is the heart of the scalability change for services in libnetwork.
The present routing mesh adds load-balancing rules for a network to
every container connected to the network. This newer approach creates a
load-balancing endpoint per network per node. For every service on a
network, libnetwork assigns the VIP of the service to the endpoint's
interface as an alias. This endpoint must have a unique IP address in
order to route return traffic to it. Traffic destined for a service's
VIP arrives at the load-balancing endpoint on the VIP and from there,
Linux load balances it among backend destinations while SNATing said
traffic to the endpoint's unique IP address.
The net result of this scheme is that each node in a swarm need only
have one set of load balancing state per service instead of one per
container on the node. This scheme is very similar to how services
currently operate on Windows nodes in libnetwork. As with Windows
nodes, it costs extra IP addresses in a network (one per node) and an
extra network hop in the stack, although always in the stack local to
the container.
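To make the scaling claim concrete, here is a rough count of load-balancing state sets on one node for one network, assuming (as described above) that the existing routing mesh programs every service into every local container on the network, while the new scheme programs each service once on the load-balancing endpoint. lbStateSets is illustrative arithmetic, not a measurement of libnetwork.

```go
package main

import "fmt"

// lbStateSets returns how many sets of load-balancing state one node
// holds for a network with the given number of services and local
// containers, under the old (per-container) or new (per-node) scheme.
func lbStateSets(services, localContainers int, perNodeEndpoint bool) int {
	if perNodeEndpoint {
		return services // one set per service, on the LB endpoint
	}
	return services * localContainers // one set per service per container
}

func main() {
	fmt.Println(lbStateSets(20, 50, false)) // 1000
	fmt.Println(lbStateSets(20, 50, true))  // 20
}
```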
In order to prevent existing deployments from suddenly failing if they
failed to allocate sufficient address space to include per-node
load-balancing endpoint IP addresses, this patch preserves the existing
functionality and activates the new functionality on a per-network
basis depending on whether the network has a load-balancing endpoint.
Eventually, moby should always set this option when creating new
networks and should only omit it for networks created as part of a swarm
that are not marked to use endpoint load balancing.
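The per-network switch can be sketched as follows; lbNetwork and its lbEndpoint field are hypothetical, standing in for however libnetwork records whether a network has a load-balancing endpoint. Networks without one keep the existing per-container behavior.

```go
package main

import "fmt"

// lbNetwork is a hypothetical network view; the fields are assumptions.
type lbNetwork struct {
	name       string
	lbEndpoint string // IP of the per-node load-balancing endpoint, "" if none
}

// lbTargets returns where service load-balancing state is programmed on
// this node: on the LB endpoint if the network has one (new scheme),
// otherwise in every local container sandbox (existing scheme).
func lbTargets(n lbNetwork, containers []string) []string {
	if n.lbEndpoint != "" {
		return []string{n.lbEndpoint}
	}
	return containers
}

func main() {
	ctrs := []string{"c1", "c2", "c3"}
	fmt.Println(lbTargets(lbNetwork{name: "new", lbEndpoint: "10.0.0.250"}, ctrs)) // [10.0.0.250]
	fmt.Println(lbTargets(lbNetwork{name: "old"}, ctrs))                           // [c1 c2 c3]
}
```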
This patch also normalizes the code to treat "load" and "balancer"
as two separate words from the perspective of variable/function naming.
This means that the 'b' in "balancer" must be capitalized.
Signed-off-by: Chris Telfer <ctelfer@docker.com>