Commit Graph

31 Commits

Author SHA1 Message Date
Flavio Crisciani f0fcb0bbe6 Fixed race on quick node fail/join
The previous logic did not properly handle the case of a node
failing and joining back within a short period of time.
The issue was in the handling of the network messages.
When a node joins, it syncs with other nodes, and these pass on
the whole list of nodes that, to the best of their knowledge, are
part of a network. At this point, if the node hears that node A is
part of the network, it saves that before having received the
notification that node A is actually alive (which comes from
memberlist).
If node A has failed, the source node will receive the failure
notification, while the newly joined node won't, because memberlist
never advertised node A as available. In this case the new node will
never purge node A from its state and, even worse, will accept any
table notification where node A is the owner, ending up in an
out-of-sync state with the rest of the cluster.

This commit also contains some code cleanup around node
management.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-27 14:38:06 -08:00
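
As an illustration of the idea behind the fix, here is a minimal Go sketch (all names are hypothetical, not the actual libnetwork code): nodes learned from network messages are parked until memberlist confirms them alive, so table events owned by an already-failed node are rejected.

```go
package main

import (
	"fmt"
	"sync"
)

type nodeDB struct {
	mu           sync.Mutex
	nodes        map[string]bool // nodes confirmed alive by memberlist
	unknownNodes map[string]bool // nodes seen only in network messages
}

// learnFromNetworkMessage records a node advertised by a peer without
// trusting it to be alive yet.
func (db *nodeDB) learnFromNetworkMessage(name string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if !db.nodes[name] {
		db.unknownNodes[name] = true
	}
}

// memberlistAlive promotes a node once memberlist reports it alive.
func (db *nodeDB) memberlistAlive(name string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.unknownNodes, name)
	db.nodes[name] = true
}

// isTableOwnerValid accepts table events only from confirmed nodes.
func (db *nodeDB) isTableOwnerValid(name string) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.nodes[name]
}

func main() {
	db := &nodeDB{nodes: map[string]bool{}, unknownNodes: map[string]bool{}}
	db.learnFromNetworkMessage("nodeA")        // seen in a sync, not yet alive
	fmt.Println(db.isTableOwnerValid("nodeA")) // false: event rejected
	db.memberlistAlive("nodeA")
	fmt.Println(db.isTableOwnerValid("nodeA")) // true
}
```
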
Flavio Crisciani 7fbaf6de2c Add test to confirm garbage collection
- Create a test to verify that a node that joins
  in an async way does not extend the life
  of an already deleted object

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-10-23 09:58:57 +02:00
Flavio Crisciani b92d91d6a1 Fix comparison against wrong constant
The comparison was against the wrong constant value.
As described in the comment, the check is there to guarantee
that events related to stale deleted elements are not propagated.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-29 21:05:24 -07:00
Flavio Crisciani 8c31217a44 NetworkDB create NodeID for cluster nodes
Separate the hostname from the node identifier. All the messages
exchanged on the network contain a nodeName field that until now
was hostname-uniqueid. Being encoded as strings in the protobuf
without any length restriction, these names affect the efficiency
of the protocol itself: if the hostname is very long, the overhead
increases and degrades the performance of the database, where each
single cycle by default allows a 1400-byte payload.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-26 10:48:04 -07:00
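
A minimal sketch of the approach, assuming a 10-hex-character ID (NetworkDB's actual ID format may differ): gossip messages carry a short fixed-length identifier instead of an unbounded hostname.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newNodeID returns a short random identifier, decoupled from the
// hostname.
func newNodeID() string {
	b := make([]byte, 5)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no entropy available: nothing sensible to do
	}
	return hex.EncodeToString(b)
}

func main() {
	hostname := "very-long-datacenter-hostname.example.internal"
	id := newNodeID()
	// Messages carry the compact ID; the hostname stays local metadata,
	// preserving the ~1400-byte per-cycle payload budget for entries.
	fmt.Printf("hostname=%s nodeID=%s\n", hostname, id)
}
```
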
Flavio Crisciani a4e64d05c1 Avoid alignment of reapNetwork and tableEntries
Make sure that the network is garbage collected after
its entries. Deleting entries requires that the network
still be present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-22 10:57:47 -07:00
Flavio Crisciani 053a534ab1 Changed ReapTable logic
- Changed the per-network loop. The previous implementation took a
  ReadLock to update the reapTime, but now with the residualReapTime
  the bulkSync also uses the same ReadLock, creating possible
  issues with concurrent reads and updates of the value.
  The new logic fetches the list of networks and proceeds with the
  cleanup network by network, locking the database and releasing it
  after each network. This should ensure fair locking, avoiding
  keeping the database blocked for too long (see the sketch after
  this message).

  Note: The ticker does not guarantee that the reap logic runs
  precisely every reapTimePeriod; the documentation actually says
  that if a run takes too long, ticks will be skipped. If the
  process itself slows down, the lifetime of deleted entries may
  increase. This still should not be a big problem, because the
  residual reap time is now propagated among all the nodes: a slower
  node will let a deleted entry be repropagated multiple times, but
  the state will still remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:47 -07:00
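
The locking pattern described above, as a hedged sketch with illustrative names rather than the real NetworkDB fields: snapshot the network IDs under the lock, then lock again per network, so readers are never blocked for the whole reap pass.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type db struct {
	sync.Mutex
	networks map[string]map[string]time.Duration // networkID -> entry -> reapTime
}

func (d *db) reapTableEntries(interval time.Duration) {
	// Snapshot the network list under the lock, then release it.
	d.Lock()
	var ids []string
	for id := range d.networks {
		ids = append(ids, id)
	}
	d.Unlock()

	// Reap one network at a time, holding the lock only per network.
	for _, id := range ids {
		d.Lock()
		for key, reap := range d.networks[id] { // nil map if deleted: no-op
			reap -= interval
			if reap <= 0 {
				delete(d.networks[id], key)
				continue
			}
			d.networks[id][key] = reap
		}
		d.Unlock()
	}
}

func main() {
	d := &db{networks: map[string]map[string]time.Duration{
		"net1": {"stale": 10 * time.Second, "fresh": time.Hour},
	}}
	d.reapTableEntries(30 * time.Second)
	fmt.Println(d.networks["net1"]) // "stale" has been reaped
}
```
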
Flavio Crisciani 2d2a2bc568 Fix reapTime logic in NetworkDB
- Added the remainingReapTime field to the table event.
  Without it, a node that did not have state for the element
  was marking the element for deletion with the maximum reapTime.
  This made it possible for the entry to keep being resynced
  between nodes forever, defeating the purpose of the reap time
  itself (see the sketch after this message).

- On broadcast of the table event the owner was being rewritten
  with the local node name. This was not correct: the owner
  should remain the original one from the message.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:37 -07:00
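
A sketch of how a receiver might use the new field (names and the 30-minute maximum are assumptions for illustration): a node with no state for a deleted entry adopts the sender's residual countdown instead of restarting at the maximum.

```go
package main

import (
	"fmt"
	"time"
)

const maxReapTime = 30 * time.Minute // assumed maximum lifetime

type tableEvent struct {
	Key          string
	ResidualReap time.Duration // remaining lifetime at the sender
}

// reapTimeFor picks the lifetime for a newly learned deleted entry.
func reapTimeFor(ev tableEvent) time.Duration {
	if ev.ResidualReap > 0 && ev.ResidualReap < maxReapTime {
		return ev.ResidualReap // adopt the sender's countdown
	}
	return maxReapTime
}

func main() {
	ev := tableEvent{Key: "svc1", ResidualReap: 5 * time.Minute}
	fmt.Println(reapTimeFor(ev)) // 5m0s, not a fresh 30m
}
```
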
Derek McGowan 710e0664c4 Update logrus to v1.0.1
Fix case sensitivity issue
Update docker and runc vendors

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-08-07 11:20:47 -07:00
Flavio Crisciani a3ecb8902a fix join/leave
join/leave fixes:
 - when a node leaves a network, it deletes all the other nodes' entries but keeps track of its own,
   to make sure that other nodes that are still tcp-syncing are aware of them being deleted (a node that
   did not yet receive the network leave will potentially tcp/sync)

also set the network reapTime, which was not being set locally

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 14:08:45 -07:00
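
An illustrative sketch of the leave handling described above (all names invented): on a network leave, entries owned by other nodes are dropped outright, while the local node's own entries are kept tombstoned so a late bulk sync still sees them as deleted.

```go
package main

import "fmt"

type entry struct {
	owner   string
	deleted bool
}

const localNode = "node-self" // assumed local node ID

// leaveNetwork prunes a network's table on leave.
func leaveNetwork(table map[string]*entry) {
	for key, e := range table {
		if e.owner == localNode {
			e.deleted = true // tombstone: visible to late tcp/sync peers
			continue
		}
		delete(table, key) // other owners: remove immediately
	}
}

func main() {
	table := map[string]*entry{
		"mine":   {owner: "node-self"},
		"theirs": {owner: "node-far"},
	}
	leaveNetwork(table)
	fmt.Println(len(table), table["mine"].deleted) // 1 true
}
```
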
Flavio Crisciani e77c245e45 2x faster to converge
- Reintroduced the Invalidate
- Optimized the rebroadcast logic

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 13:47:18 -07:00
Flavio Crisciani 585964bf32 NetworkDB testing infra
- Diagnose framework that exposes a REST API for db interaction
- Dockerfile to build the test image
- Periodic print of stats regarding queue size
- Client and server side for integration with testkit
- Added write-delete-leave-join
- Added test write-delete-wait-leave-join
- Added write-wait-leave-join

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-27 08:50:43 -07:00
Flavio Crisciani 051a0d5ce9 NetworkDB incorrect number of entries in networkNodes
A rapid leave/join of a network (within the 30-minute
networkReapTime) can corrupt the list of nodes per network with
multiple copies of the same node.
The fix makes sure that each node is present only once.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-18 16:57:49 -07:00
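
A minimal sketch of the invariant the fix enforces, with hypothetical names: adding a node to a network's node list must be idempotent.

```go
package main

import "fmt"

// addNetworkNode appends a node only if it is not already present.
func addNetworkNode(nodes []string, name string) []string {
	for _, n := range nodes {
		if n == name {
			return nodes // already present: no duplicate
		}
	}
	return append(nodes, name)
}

func main() {
	nodes := []string{"node1"}
	nodes = addNetworkNode(nodes, "node2")
	nodes = addNetworkNode(nodes, "node2") // rapid rejoin: no duplicate
	fmt.Println(nodes)                     // [node1 node2]
}
```
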
Sebastiaan van Stijn 3dd1fb1217 Make node join event logging less noisy
Commit ca9a768d80
added a number of debugging messages for node join/leave
events.

This patch checks whether a node was already listed and, if so,
skips the logging to make the logs a bit less noisy.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-07-10 17:25:14 -07:00
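
A tiny sketch of the quieter handling (names assumed): log only when the node was not already known.

```go
package main

import "log"

var known = map[string]struct{}{}

// handleNodeJoin records a join and logs it only the first time.
func handleNodeJoin(name string) {
	if _, ok := known[name]; ok {
		return // already listed: skip the log line
	}
	known[name] = struct{}{}
	log.Printf("node %s joined", name)
}

func main() {
	handleNodeJoin("node1")
	handleNodeJoin("node1") // no second log line
}
```
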
Santhosh Manohar ca9a768d80 Handle single manager reload by having workers reconnect
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-05-31 14:36:23 -07:00
Santhosh Manohar 69ad7ef244 control-plane hardening: cleanup local state on peer leaving a network
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-03-31 01:49:03 -07:00
Santhosh Manohar 0a2537eea3 Use monotonic clock for reaping networkDB entries
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-10-19 22:30:47 -07:00
Jana Radhakrishnan f649d5ae61 Do not hold ack channel in ack table after closing
Once the bulksync ack channel is closed, remove it from the ack table
right away. There is no reason to keep it in the ack table and
delete it later in the ack waiter: the ack waiter already holds a
reference to the channel it is waiting on.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-10-03 09:50:02 -07:00
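
A hedged sketch of the pattern, not the actual diff: delete the map entry and close the channel in one critical section; the waiter keeps its own channel reference.

```go
package main

import (
	"fmt"
	"sync"
)

type ackTable struct {
	mu   sync.Mutex
	acks map[uint64]chan struct{}
}

// closeAck removes the entry immediately and then signals completion.
func (t *ackTable) closeAck(id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if ch, ok := t.acks[id]; ok {
		delete(t.acks, id) // no deferred cleanup in the waiter needed
		close(ch)          // waiter wakes via its own reference
	}
}

func main() {
	t := &ackTable{acks: map[uint64]chan struct{}{}}
	ch := make(chan struct{})
	t.acks[42] = ch

	go t.closeAck(42)
	<-ch // waiter blocks on its own reference to the channel
	fmt.Println("bulk sync acked; table is empty:", len(t.acks) == 0)
}
```
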
Jana Radhakrishnan 7b905d3c63 Purge stale nodes with same prefix and IP
Since the node name randomization fix, we need to make sure that we
purge the old node with the same prefix and same IP from the nodes
database if it is still present; otherwise it causes unnecessary
reconnect attempts.

Also added a change to avoid unnecessary updates of the local lamport
time, and only do it if we are ready to do a push/pull on a join. Join
should happen only when the node is bootstrapped or when trying to
reconnect with a failed node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-23 14:48:54 -07:00
Jana Radhakrishnan 5f5dad3c02 Recover from transient gossip failures
Currently, if there is any transient gossip failure in a node, the
recovery process depends on other nodes propagating the information
indirectly. If these transient failures affect all the nodes that
this node has in its memberlist, then this node will be permanently
cut off from the gossip channel. Added node state management code in
networkdb to address these problems by trying to rejoin the cluster
via the failed nodes when there is a failure. This also necessitates
adding new messages, called node event messages, to differentiate
between node leave and node failure.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-19 15:58:14 -07:00
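
An illustrative sketch of the recovery idea, with assumed state names: failed nodes stay on a retry list so the node keeps attempting to rejoin through them, while nodes that left voluntarily are never retried.

```go
package main

import "fmt"

type nodeState int

const (
	stateActive nodeState = iota
	stateFailed
	stateLeft // voluntary leave: no reconnect attempts
)

type cluster struct {
	states map[string]nodeState
}

// retryCandidates returns the failed nodes to attempt a rejoin through.
func (c *cluster) retryCandidates() []string {
	var out []string
	for n, s := range c.states {
		if s == stateFailed {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	c := &cluster{states: map[string]nodeState{
		"node1": stateFailed, "node2": stateLeft, "node3": stateActive,
	}}
	// Only node1 is retried; node2 left on purpose.
	fmt.Println(c.retryCandidates())
}
```
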
Santhosh Manohar 173832dd19 Merge pull request #1406 from mrjana/bugs
Ensure add newly joined node to networknodes
2016-08-21 22:03:03 -07:00
Jana Radhakrishnan 1b027335f1 Ensure add newly joined node to networknodes
When a node leaves the cluster and quickly rejoins before its node
entry has expired on the other nodes in the cluster, we failed to add
it back to the quick-lookup database on rejoin. Fixed it.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-19 17:18:15 -07:00
Jana Radhakrishnan 2bead02c87 Ignore delete events for non-existent entries
In networkdb we should ignore delete events for entries which don't
exist in the db. This is always safe, because if the entry does not
exist then it was removed much earlier, got purged after the reap
timer, and this notification is simply stale.

Also, duplicate delete notifications were being sent to the clients:
one when the actual delete event was received from gossip, and
another when the entry was getting reaped. The second notification is
unnecessary and may cause issues with clients that are not coded for
idempotency.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-18 13:57:24 -07:00
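
A minimal sketch of both behaviors (names assumed): unknown entries are ignored, and clients are notified of a delete exactly once.

```go
package main

import "fmt"

type entry struct{ deleting bool }

var table = map[string]*entry{}

func handleDeleteEvent(key string, notify func(string)) {
	e, ok := table[key]
	if !ok {
		return // never seen: the delete already happened and was reaped
	}
	if !e.deleting {
		e.deleting = true
		notify(key) // first and only delete notification
	}
	// the reaper later purges the entry without notifying again
}

func main() {
	table["svc1"] = &entry{}
	notify := func(k string) { fmt.Println("deleted:", k) }
	handleDeleteEvent("svc1", notify)  // notifies once
	handleDeleteEvent("svc1", notify)  // duplicate event: silent
	handleDeleteEvent("ghost", notify) // unknown entry: ignored
}
```
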
Santhosh Manohar 2bab9b6bdb Cleanup networkdb state when the network is deleted locally
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-08-10 12:44:05 -07:00
Alexander Morozov 392b089170 networkdb: fix data races in map access
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-08-05 14:24:30 -07:00
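
A generic sketch of the fix pattern rather than the actual diff: every access to shared maps goes through the database lock, since Go maps are not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync"
)

type networkDB struct {
	sync.RWMutex
	networks map[string]string
}

// getNetwork reads under the read lock.
func (db *networkDB) getNetwork(id string) (string, bool) {
	db.RLock()
	defer db.RUnlock()
	n, ok := db.networks[id]
	return n, ok
}

// setNetwork writes under the write lock.
func (db *networkDB) setNetwork(id, name string) {
	db.Lock()
	defer db.Unlock()
	db.networks[id] = name
}

func main() {
	db := &networkDB{networks: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) { // concurrent writers: safe under the lock
			defer wg.Done()
			db.setNetwork(fmt.Sprint(i), "net")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(db.networks)) // 10
}
```
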
Santhosh Manohar 8af5fdb9b1 Do not create network entry in networkdb for the local node based on table event from peer

Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-07-26 06:51:47 -07:00
Jana Radhakrishnan 8936daab5e Retain deleted entries for longer time
When deleting entries, or when learning about deleted entries,
remember them for a longer time to avoid excessive duplicate deletes
in the gossip cluster. Also added code changes to ignore event
messages originating from the source node, so that they don't get
added to the rebroadcast queue.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-30 18:24:13 -07:00
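
A tiny sketch of the rebroadcast filter (the local node ID is an assumed placeholder): events owned by the local node are never queued, since this node already gossiped them.

```go
package main

import "fmt"

const localNode = "node-self" // assumed local node ID

// shouldRebroadcast rejects events originating from this node.
func shouldRebroadcast(owner string) bool {
	return owner != localNode
}

func main() {
	fmt.Println(shouldRebroadcast("node-self")) // false: own event
	fmt.Println(shouldRebroadcast("node-far"))  // true
}
```
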
Jana Radhakrishnan 8245296aa5 Make sure node map is valid before accessing it
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-13 18:30:31 -07:00
Jana Radhakrishnan 78a3cf5f6c Do not rebroadcast bulk sync updates
Bulksync is not meant to be rebroadcast in gossip. Stopped
rebroadcasting bulksync updates.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-12 20:19:40 -07:00
Jana Radhakrishnan 774399fd66 Fix a couple of panics in networkdb
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-02 20:29:37 -07:00
Jana Radhakrishnan 77abea9c1e Use protobuf in networkdb core messages
Convert all networkdb core message types from Go message types to
protobuf message types. This facilitates future modification of the
message structure without breaking backward compatibility.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-05-17 09:18:24 -07:00
Jana Radhakrishnan 28f4561e3f Add network scoped gossip database
Network DB is a network scoped gossip database built
on top of hashicorp/memberlist providing an eventually
consistent state store.

It limits the scope of the gossip and the periodic bulk syncing
of table entries to only the nodes which participate in the
network to which the gossip belongs. This design makes the
gossip layer scale better and only consume resources for the
network state that the node participates in.

Since the complete state for a network is maintained by all nodes
participating in the network, all nodes will eventually converge
to the same state.

NetworkDB also provides facilities for users of the package to
watch any table (or all tables) and get notified of state changes
of interest that happened anywhere in the cluster, once that state
change eventually finds its way to the watcher's node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-04-08 12:58:09 -07:00
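
A hedged sketch of how a consumer of such a watch facility might look; the WatchEvent type and Watch signature are hypothetical stand-ins, not the real NetworkDB API.

```go
package main

import "fmt"

// WatchEvent is a hypothetical notification for a table change.
type WatchEvent struct {
	Table, NetworkID, Key string
	Value                 []byte
}

// fakeWatcher simulates a table watch fed by cluster gossip.
type fakeWatcher struct{ ch chan WatchEvent }

func (f fakeWatcher) Watch(table string) <-chan WatchEvent { return f.ch }

func main() {
	f := fakeWatcher{ch: make(chan WatchEvent, 1)}
	// In NetworkDB this event would originate on another node and
	// reach the watcher once gossip converges.
	f.ch <- WatchEvent{Table: "endpoint_table", NetworkID: "net1", Key: "ep1", Value: []byte("10.0.0.2")}
	close(f.ch)
	for ev := range f.Watch("endpoint_table") {
		fmt.Printf("%s: %s/%s -> %s\n", ev.Table, ev.NetworkID, ev.Key, ev.Value)
	}
}
```
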