moby--moby

Commit Graph

Author	SHA1	Message	Date
Flavio Crisciani	64da6b8889	Avoid delay on node rejoin, avoid useless witness Avoid waiting for a double notification once a node rejoins; just put it back into the active state. Waiting for a further message does not really add anything to the safety of the operation, since the source of truth for the node status resides inside memberlist. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2018-01-23 16:21:18 -08:00
Flavio Crisciani	b190ee3ccf	Cleanup node management logic Created method to handle the node state change with cleanup operation associated. Realign testing client with the new diagnostic interface Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-12-13 09:40:38 -08:00
Flavio Crisciani	3e544bc500	Avoid extra notification on node leave If a node leaves, avoid notifying the upper layer about entries that are already marked for deletion Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-12-01 16:19:38 -08:00
Flavio Crisciani	f0fcb0bbe6	Fixed race on quick node fail/join The previous logic was not properly handling the case of a node that was failing and joining back in a short period of time. The issue was in the handling of the network messages. When a node joins, it syncs with other nodes, which pass it the whole list of nodes that, to the best of their knowledge, are part of a network. At this point, if the node receives that node A is part of the network, it saves it before having received the notification that node A is actually alive (coming from memberlist). If node A failed, the source node will receive the notification while the newly joined node won't, because memberlist never advertises node A as available. In this case the new node will never purge node A from its state and, worse, will accept any table notification where node A is the owner, and so will end up in an out-of-sync state with the rest of the cluster. This commit also contains some code cleanup around the area of node management Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-11-27 14:38:06 -08:00
Derek McGowan	710e0664c4	Update logrus to v1.0.1 Fix case sensitivity issue Update docker and runc vendors Signed-off-by: Derek McGowan <derek@mcgstyle.net>	2017-08-07 11:20:47 -07:00
Flavio Crisciani	d6440c9139	optimize the rebroadcast for failure case Before, when a node was failing, all the nodes would bump the lamport time of all their entries. This means that if a node flaps, there will be a storm of updates for all the entries. This commit, building on the previous logic, guarantees that only the node that joins back will readvertise its own entries; the other nodes won't need to advertise again. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-08-01 14:08:54 -07:00
Madhu Venugopal	59994bbb15	Merge pull request #1775 from sanimej/gossip Handle single manager reload by having workers reconnect	2017-05-31 14:57:34 -07:00
Santhosh Manohar	ca9a768d80	Handle single manager reload by having workers reconnect Signed-off-by: Santhosh Manohar <santhosh@docker.com>	2017-05-31 14:36:23 -07:00
Flavio Crisciani	f585f33042	Node failure timeout fix The time to keep a failed node in the failed node list was originally supposed to be 24h. If a node leaves explicitly it will be removed from the list of nodes and put into the leftNodes list. This way the NotifyLeave event won't insert it into the retry list. NOTE: if the event is lost instead, the behavior will be the same as for a failed node. If a node fails, the NotifyLeave will insert it into the failedNodes list with a reapTime of 24h. This means that the node will be checked for 24h before being completely forgotten. The current check time is every 1 second and is done by the reconnectNode function. The failed node list is updated every 2h instead. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>	2017-05-22 17:19:31 -07:00
Madhu Venugopal	bb560a1f44	Generating node discovery events to the drivers from networkdb With the introduction of networkdb, the node discovery events were not sent to the drivers. This commit generates the node discovery events and sends them to the drivers interested in them. Signed-off-by: Madhu Venugopal <madhu@docker.com>	2017-02-01 17:54:51 -08:00
Santhosh Manohar	e98b152bac	Reap failed nodes after 24 hours Signed-off-by: Santhosh Manohar <santhosh@docker.com>	2016-10-20 11:24:04 -07:00
Jana Radhakrishnan	5f5dad3c02	Recover from transient gossip failures Currently if there is any transient gossip failure in any node, the recovery process depends on other nodes propagating the information indirectly. If these transient failures affect all the nodes that this node has in its memberlist, then this node will be permanently cut off from the gossip channel. Added node state management code in networkdb to address these problems by trying to rejoin the cluster via the failed nodes when there is a failure. This also necessitates adding new messages called node event messages to differentiate between node leave and node failure. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-09-19 15:58:14 -07:00
Jana Radhakrishnan	f5f576ad34	Properly purge node networks when node goes away When a node goes away purge all the network attachments from the node and make sure we don't attempt bulk syncing to that node once removed. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-06-14 12:39:38 -07:00
Jana Radhakrishnan	28f4561e3f	Add network scoped gossip database Network DB is a network scoped gossip database built on top of hashicorp/memberlist providing an eventually consistent state store. It limits the scope of the gossip and periodic bulk syncing for table entries to only the nodes which participate in the network to which the gossip belongs. This design makes the gossip layer scale better and only consumes resources for the network state that the node participates in. Since the complete state for a network is maintained by all nodes participating in the network, all nodes will eventually converge to the same state. NetworkDB also provides facilities for the users of the package to watch on any table (or all tables) and get notified if there are state changes of interest that happened anywhere in the cluster when that state change eventually finds its way to the watcher's node. Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>	2016-04-08 12:58:09 -07:00

14 Commits
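
The retention behaviour described in commit f585f33042 (Node failure timeout fix) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual networkdb code: the `tracker` type and its `leave`, `notifyLeave` and `reap` methods are hypothetical names, and only the 24h retention and the 2h list update period are taken from the commit message. The 1-second reconnect attempts done by reconnectNode are left out.

```go
package main

import (
	"fmt"
	"time"
)

const (
	reapInterval = 24 * time.Hour // retention of a failed node (from the commit message)
	reapPeriod   = 2 * time.Hour  // how often the failed list is updated (from the commit message)
)

// tracker is a hypothetical stand-in for the node bookkeeping.
type tracker struct {
	failed map[string]time.Duration // node name -> time left before it is forgotten
	left   map[string]bool          // nodes that left explicitly
}

func newTracker() *tracker {
	return &tracker{failed: map[string]time.Duration{}, left: map[string]bool{}}
}

// leave records an explicit leave, so a later leave notification
// does not put the node on the retry (failed) list.
func (t *tracker) leave(name string) {
	t.left[name] = true
	delete(t.failed, name)
}

// notifyLeave models the leave notification: a node that did not
// leave explicitly is treated as failed and retained for reapInterval.
func (t *tracker) notifyLeave(name string) {
	if t.left[name] {
		return
	}
	t.failed[name] = reapInterval
}

// reap models one periodic update of the failed list: it shortens the
// remaining retention and forgets nodes whose retention has run out.
func (t *tracker) reap() {
	for name, remaining := range t.failed {
		if remaining > reapPeriod {
			t.failed[name] = remaining - reapPeriod
			continue
		}
		delete(t.failed, name)
	}
}

func main() {
	t := newTracker()
	t.leave("node-a")       // explicit leave
	t.notifyLeave("node-a") // ignored: node-a is in the left list
	t.notifyLeave("node-b") // node-b is treated as failed

	// 24h / 2h = 12 updates are needed before node-b is forgotten.
	for i := 0; i < 12; i++ {
		t.reap()
	}
	fmt.Println("failed nodes still tracked:", len(t.failed))
}
```

Storing the remaining retention as a duration, rather than an absolute deadline, keeps the sketch independent of the wall clock; whether the real implementation does the same is not stated in the commit message.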