moby--moby

mirror of https://github.com/moby/moby.git synced 2022-11-09 12:21:53 -05:00

Author	SHA1	Message	Date
Madhu Venugopal	59994bbb15	Merge pull request #1775 from sanimej/gossip Handle single manager reload by having workers reconnect	2017-05-31 14:57:34 -07:00
Santhosh Manohar	ca9a768d80	Handle single manager reload by having workers reconnect Signed-off-by: Santhosh Manohar <santhosh@docker.com>	2017-05-31 14:36:23 -07:00
Flavio Crisciani	f585f33042	Node failure timeout fix The time to keep a failed node in the failed node list was originally supposed to be 24h. If a node leaves explicitly, it will be removed from the list of nodes and put into the leftNodes list. This way the NotifyLeave event won't insert it into the retry list. NOTE: if the event is lost, the behavior will instead be the same as for a failed node. If a node fails, the NotifyLeave will insert it into the failedNodes list with a reapTime of 24h. This means that the node will be checked for 24h before being completely forgotten. The current check time is every 1 second and is done by the reconnectNode function. The failed node list is updated every 2h instead. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-05-22 17:19:31 -07:00
Madhu Venugopal	bb560a1f44	Generating node discovery events to the drivers from networkdb With the introduction of networkdb, the node discovery events were not sent to the drivers. This commit generates the node discovery events and sends them to the drivers interested in them. Signed-off-by: Madhu Venugopal <madhu@docker.com>	2017-02-01 17:54:51 -08:00
Santhosh Manohar	e98b152bac	Reap failed nodes after 24 hours Signed-off-by: Santhosh Manohar <santhosh@docker.com>	2016-10-20 11:24:04 -07:00
Jana Radhakrishnan	5f5dad3c02	Recover from transient gossip failures Currently if there is any transient gossip failure in any node, the recovery process depends on other nodes propagating the information indirectly. If these transient failures affect all the nodes that this node has in its memberlist, then this node will be permanently cut off from the gossip channel. Added node state management code in networkdb to address these problems by trying to rejoin the cluster via the failed nodes when there is a failure. This also necessitates adding new messages called node event messages to differentiate between node leave and node failure. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-09-19 15:58:14 -07:00
Jana Radhakrishnan	f5f576ad34	Properly purge node networks when node goes away When a node goes away purge all the network attachments from the node and make sure we don't attempt bulk syncing to that node once removed. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-06-14 12:39:38 -07:00
Jana Radhakrishnan	28f4561e3f	Add network scoped gossip database Network DB is a network scoped gossip database built on top of hashicorp/memberlist providing an eventually consistent state store. It limits the scope of the gossip and periodic bulk syncing for table entries to only the nodes which participate in the network to which the gossip belongs. This design makes the gossip layer scale better and consumes resources only for the network state that the node participates in. Since the complete state for a network is maintained by all nodes participating in the network, all nodes will eventually converge to the same state. NetworkDB also provides facilities for the users of the package to watch any table (or all tables) and get notified if there are state changes of interest that happened anywhere in the cluster when that state change eventually finds its way to the watcher's node. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-04-08 12:58:09 -07:00

8 commits
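
The bookkeeping described in commit f585f33042 (explicit leave versus failure, and the 24h reap window updated every 2h) can be illustrated with a minimal Go sketch. This is not the networkdb code: the `nodeTable` type, its method names, and the constants `reapInterval` and `reapPeriod` are hypothetical and only mirror the numbers quoted in the commit message. The 1-second reconnect attempts are left out.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical constants mirroring the numbers in the commit message.
const (
	reapInterval = 24 * time.Hour // how long a failed node is remembered
	reapPeriod   = 2 * time.Hour  // how often the failed node list is updated
)

// nodeTable is an illustrative stand-in for the node lists the commit mentions.
type nodeTable struct {
	active map[string]bool
	left   map[string]bool
	failed map[string]time.Duration // node name -> remaining reap time
}

func newNodeTable() *nodeTable {
	return &nodeTable{
		active: map[string]bool{},
		left:   map[string]bool{},
		failed: map[string]time.Duration{},
	}
}

// join marks a node as active and clears any earlier left/failed state.
func (nt *nodeTable) join(name string) {
	delete(nt.left, name)
	delete(nt.failed, name)
	nt.active[name] = true
}

// leaveExplicitly handles a node that announced its departure.
func (nt *nodeTable) leaveExplicitly(name string) {
	delete(nt.active, name)
	nt.left[name] = true
}

// notifyLeave handles the membership-level leave event. A node that already
// left explicitly is ignored; any other node is treated as failed.
func (nt *nodeTable) notifyLeave(name string) {
	if nt.left[name] {
		return
	}
	delete(nt.active, name)
	nt.failed[name] = reapInterval
}

// reapFailed is meant to be called once per reapPeriod. It decrements each
// failed node's remaining time and forgets nodes whose time has run out.
func (nt *nodeTable) reapFailed() {
	for name, remaining := range nt.failed {
		remaining -= reapPeriod
		if remaining <= 0 {
			delete(nt.failed, name)
			continue
		}
		nt.failed[name] = remaining
	}
}

func main() {
	nt := newNodeTable()
	nt.join("worker-1")
	nt.join("worker-2")

	nt.leaveExplicitly("worker-1")
	nt.notifyLeave("worker-1") // ignored: the node left explicitly
	nt.notifyLeave("worker-2") // treated as a failure

	for i := 0; i < 11; i++ {
		nt.reapFailed()
	}
	fmt.Println("failed nodes after 22h:", len(nt.failed))
	nt.reapFailed()
	fmt.Println("failed nodes after 24h:", len(nt.failed))
}
```

Calling `reapFailed` directly instead of from a ticker keeps the sketch deterministic; a real implementation would presumably drive it from a timer and guard the maps with a lock.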