Commit Graph

31 Commits

Author SHA1 Message Date
Flavio Crisciani f0fcb0bbe6 Fixed race on quick node fail/join
The previous logic did not properly handle the case of a node
failing and joining back within a short period of time.
The issue was in the handling of the network messages.
When a node joins, it syncs with other nodes, and these pass on
the whole list of nodes that, to the best of their knowledge, are
part of a network. At this point, if the node hears that node A is
part of the network, it saves that before having received the
notification that node A is actually alive (which comes from
memberlist).
If node A has failed, the source node will receive the failure
notification, while the newly joined node won't, because memberlist
never advertised node A as available. In this case the new node will
never purge node A from its state and, even worse, will accept any
table notification where node A is the owner, ending up in an
out-of-sync state with the rest of the cluster.

This commit also contains some code cleanup around node
management.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-11-27 14:38:06 -08:00
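
As an illustration of the idea behind the fix, here is a minimal Go sketch (all names are hypothetical, not the actual libnetwork code): nodes learned from network messages are parked until memberlist confirms them alive, so table events owned by an already-failed node are rejected.

```go
package main

import (
	"fmt"
	"sync"
)

type nodeDB struct {
	mu           sync.Mutex
	nodes        map[string]bool // nodes confirmed alive by memberlist
	unknownNodes map[string]bool // nodes seen only in network messages
}

// learnFromNetworkMessage records a node advertised by a peer without
// trusting it to be alive yet.
func (db *nodeDB) learnFromNetworkMessage(name string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	if !db.nodes[name] {
		db.unknownNodes[name] = true
	}
}

// memberlistAlive promotes a node once memberlist reports it alive.
func (db *nodeDB) memberlistAlive(name string) {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.unknownNodes, name)
	db.nodes[name] = true
}

// isTableOwnerValid accepts table events only from confirmed nodes.
func (db *nodeDB) isTableOwnerValid(name string) bool {
	db.mu.Lock()
	defer db.mu.Unlock()
	return db.nodes[name]
}

func main() {
	db := &nodeDB{nodes: map[string]bool{}, unknownNodes: map[string]bool{}}
	db.learnFromNetworkMessage("nodeA")        // seen in a sync, not yet alive
	fmt.Println(db.isTableOwnerValid("nodeA")) // false: event rejected
	db.memberlistAlive("nodeA")
	fmt.Println(db.isTableOwnerValid("nodeA")) // true
}
```
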
Flavio Crisciani 7fbaf6de2c Add test to confirm garbage collection
- Create a test to verify that a node that joins
  in an async way does not extend the life
  of an already deleted object

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-10-23 09:58:57 +02:00
Flavio Crisciani b92d91d6a1 Fix comparison against wrong constant
The comparison was against the wrong constant value.
As described in the comment, the check is there to guarantee
that events related to stale deleted elements are not propagated.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-29 21:05:24 -07:00
Flavio Crisciani 8c31217a44 NetworkDB create NodeID for cluster nodes
Separate the hostname from the node identifier. All the messages
exchanged on the network contain a nodeName field that until now
was hostname-uniqueid. Being encoded as strings in the protobuf
without any length restriction, these names affect the efficiency
of the protocol itself: if the hostname is very long, the overhead
increases and degrades the performance of the database, where each
single cycle by default allows a 1400-byte payload.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-26 10:48:04 -07:00
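
A minimal sketch of the approach, assuming a 10-hex-character ID (NetworkDB's actual ID format may differ): gossip messages carry a short fixed-length identifier instead of an unbounded hostname.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newNodeID returns a short random identifier, decoupled from the
// hostname.
func newNodeID() string {
	b := make([]byte, 5)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no entropy available: nothing sensible to do
	}
	return hex.EncodeToString(b)
}

func main() {
	hostname := "very-long-datacenter-hostname.example.internal"
	id := newNodeID()
	// Messages carry the compact ID; the hostname stays local metadata,
	// preserving the ~1400-byte per-cycle payload budget for entries.
	fmt.Printf("hostname=%s nodeID=%s\n", hostname, id)
}
```
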
Flavio Crisciani a4e64d05c1 Avoid alignment of reapNetwork and tableEntries
Make sure that the network is garbage collected after
its entries. Deleting entries requires that the network
still be present.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-22 10:57:47 -07:00
Flavio Crisciani 053a534ab1 Changed ReapTable logic
- Changed the per-network loop. The previous implementation took a
  ReadLock to update the reapTime, but now with the residualReapTime
  the bulkSync also uses the same ReadLock, creating possible
  issues with concurrent reads and updates of the value.
  The new logic fetches the list of networks and proceeds with the
  cleanup network by network, locking the database and releasing it
  after each network. This should ensure fair locking, avoiding
  keeping the database blocked for too long (see the sketch after
  this message).

  Note: The ticker does not guarantee that the reap logic runs
  precisely every reapTimePeriod; the documentation actually says
  that if a run takes too long, ticks will be skipped. If the
  process itself slows down, the lifetime of deleted entries may
  increase. This still should not be a big problem, because the
  residual reap time is now propagated among all the nodes: a slower
  node will let a deleted entry be repropagated multiple times, but
  the state will still remain consistent.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:47 -07:00
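
The locking pattern described above, as a hedged sketch with illustrative names rather than the real NetworkDB fields: snapshot the network IDs under the lock, then lock again per network, so readers are never blocked for the whole reap pass.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type db struct {
	sync.Mutex
	networks map[string]map[string]time.Duration // networkID -> entry -> reapTime
}

func (d *db) reapTableEntries(interval time.Duration) {
	// Snapshot the network list under the lock, then release it.
	d.Lock()
	var ids []string
	for id := range d.networks {
		ids = append(ids, id)
	}
	d.Unlock()

	// Reap one network at a time, holding the lock only per network.
	for _, id := range ids {
		d.Lock()
		for key, reap := range d.networks[id] { // nil map if deleted: no-op
			reap -= interval
			if reap <= 0 {
				delete(d.networks[id], key)
				continue
			}
			d.networks[id][key] = reap
		}
		d.Unlock()
	}
}

func main() {
	d := &db{networks: map[string]map[string]time.Duration{
		"net1": {"stale": 10 * time.Second, "fresh": time.Hour},
	}}
	d.reapTableEntries(30 * time.Second)
	fmt.Println(d.networks["net1"]) // "stale" has been reaped
}
```
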
Flavio Crisciani 2d2a2bc568 Fix reapTime logic in NetworkDB
- Added the remainingReapTime field to the table event.
  Without it, a node that did not have state for the element
  was marking the element for deletion with the maximum reapTime.
  This made it possible for the entry to keep being resynced
  between nodes forever, defeating the purpose of the reap time
  itself (see the sketch after this message).

- On broadcast of the table event the owner was being rewritten
  with the local node name. This was not correct: the owner
  should remain the original one from the message.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-09-21 09:37:37 -07:00
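
A sketch of how a receiver might use the new field (names and the 30-minute maximum are assumptions for illustration): a node with no state for a deleted entry adopts the sender's residual countdown instead of restarting at the maximum.

```go
package main

import (
	"fmt"
	"time"
)

const maxReapTime = 30 * time.Minute // assumed maximum lifetime

type tableEvent struct {
	Key          string
	ResidualReap time.Duration // remaining lifetime at the sender
}

// reapTimeFor picks the lifetime for a newly learned deleted entry.
func reapTimeFor(ev tableEvent) time.Duration {
	if ev.ResidualReap > 0 && ev.ResidualReap < maxReapTime {
		return ev.ResidualReap // adopt the sender's countdown
	}
	return maxReapTime
}

func main() {
	ev := tableEvent{Key: "svc1", ResidualReap: 5 * time.Minute}
	fmt.Println(reapTimeFor(ev)) // 5m0s, not a fresh 30m
}
```
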
Derek McGowan 710e0664c4 Update logrus to v1.0.1
Fix case sensitivity issue
Update docker and runc vendors

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-08-07 11:20:47 -07:00
Flavio Crisciani a3ecb8902a fix join/leave
join/leave fixes:
 - when a node leaves a network, it deletes all the other nodes' entries but keeps track of its own,
   to make sure that other nodes that are still tcp-syncing are aware of them being deleted (a node that
   did not yet receive the network leave will potentially tcp/sync)

also set the network reapTime, which was not being set locally

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 14:08:45 -07:00
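
An illustrative sketch of the leave handling described above (all names invented): on a network leave, entries owned by other nodes are dropped outright, while the local node's own entries are kept tombstoned so a late bulk sync still sees them as deleted.

```go
package main

import "fmt"

type entry struct {
	owner   string
	deleted bool
}

const localNode = "node-self" // assumed local node ID

// leaveNetwork prunes a network's table on leave.
func leaveNetwork(table map[string]*entry) {
	for key, e := range table {
		if e.owner == localNode {
			e.deleted = true // tombstone: visible to late tcp/sync peers
			continue
		}
		delete(table, key) // other owners: remove immediately
	}
}

func main() {
	table := map[string]*entry{
		"mine":   {owner: "node-self"},
		"theirs": {owner: "node-far"},
	}
	leaveNetwork(table)
	fmt.Println(len(table), table["mine"].deleted) // 1 true
}
```
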
Flavio Crisciani e77c245e45 2x faster to converge
- Reintroduced the Invalidate
- Optimized the rebroadcast logic

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-08-01 13:47:18 -07:00
Flavio Crisciani 585964bf32 NetworkDB testing infra
- Diagnose framework that exposes a REST API for db interaction
- Dockerfile to build the test image
- Periodic print of stats regarding queue size
- Client and server side for integration with testkit
- Added write-delete-leave-join
- Added test write-delete-wait-leave-join
- Added write-wait-leave-join

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-27 08:50:43 -07:00
Flavio Crisciani 051a0d5ce9 NetworkDB incorrect number of entries in networkNodes
A rapid leave/join of a network (within the 30-minute
networkReapTime) can corrupt the list of nodes per network with
multiple copies of the same node.
The fix makes sure that each node is present only once.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-18 16:57:49 -07:00
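
A minimal sketch of the invariant the fix enforces, with hypothetical names: adding a node to a network's node list must be idempotent.

```go
package main

import "fmt"

// addNetworkNode appends a node only if it is not already present.
func addNetworkNode(nodes []string, name string) []string {
	for _, n := range nodes {
		if n == name {
			return nodes // already present: no duplicate
		}
	}
	return append(nodes, name)
}

func main() {
	nodes := []string{"node1"}
	nodes = addNetworkNode(nodes, "node2")
	nodes = addNetworkNode(nodes, "node2") // rapid rejoin: no duplicate
	fmt.Println(nodes)                     // [node1 node2]
}
```
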
Sebastiaan van Stijn 3dd1fb1217 Make node join event logging less noisy
Commit ca9a768d80
added a number of debugging messages for node join/leave
events.

This patch checks whether a node was already listed and, if so,
skips the logging to make the logs a bit less noisy.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2017-07-10 17:25:14 -07:00
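
A tiny sketch of the quieter handling (names assumed): log only when the node was not already known.

```go
package main

import "log"

var known = map[string]struct{}{}

// handleNodeJoin records a join and logs it only the first time.
func handleNodeJoin(name string) {
	if _, ok := known[name]; ok {
		return // already listed: skip the log line
	}
	known[name] = struct{}{}
	log.Printf("node %s joined", name)
}

func main() {
	handleNodeJoin("node1")
	handleNodeJoin("node1") // no second log line
}
```
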
Santhosh Manohar ca9a768d80 Handle single manager reload by having workers reconnect
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-05-31 14:36:23 -07:00
Santhosh Manohar 69ad7ef244 control-plane hardening: cleanup local state on peer leaving a network
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-03-31 01:49:03 -07:00
Santhosh Manohar 0a2537eea3 Use monotonic clock for reaping networkDB entries
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-10-19 22:30:47 -07:00
Jana Radhakrishnan f649d5ae61 Do not hold ack channel in ack table after closing
Once the bulksync ack channel is closed, remove it from the ack table
right away. There is no reason to keep it in the ack table and
delete it later in the ack waiter: the ack waiter already holds a
reference to the channel it is waiting on.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-10-03 09:50:02 -07:00
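
A hedged sketch of the pattern, not the actual diff: delete the map entry and close the channel in one critical section; the waiter keeps its own channel reference.

```go
package main

import (
	"fmt"
	"sync"
)

type ackTable struct {
	mu   sync.Mutex
	acks map[uint64]chan struct{}
}

// closeAck removes the entry immediately and then signals completion.
func (t *ackTable) closeAck(id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if ch, ok := t.acks[id]; ok {
		delete(t.acks, id) // no deferred cleanup in the waiter needed
		close(ch)          // waiter wakes via its own reference
	}
}

func main() {
	t := &ackTable{acks: map[uint64]chan struct{}{}}
	ch := make(chan struct{})
	t.acks[42] = ch

	go t.closeAck(42)
	<-ch // waiter blocks on its own reference to the channel
	fmt.Println("bulk sync acked; table is empty:", len(t.acks) == 0)
}
```
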
Jana Radhakrishnan 7b905d3c63 Purge stale nodes with same prefix and IP
Since the node name randomization fix, we need to make sure that we
purge the old node with the same prefix and same IP from the nodes
database if it is still present; otherwise it causes unnecessary
reconnect attempts.

Also added a change to avoid unnecessary updates of the local lamport
time, and only do it if we are ready to do a push/pull on a join. Join
should happen only when the node is bootstrapped or when trying to
reconnect with a failed node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-23 14:48:54 -07:00
Jana Radhakrishnan 5f5dad3c02 Recover from transient gossip failures
Currently, if there is any transient gossip failure in a node, the
recovery process depends on other nodes propagating the information
indirectly. If these transient failures affect all the nodes that
this node has in its memberlist, then this node will be permanently
cut off from the gossip channel. Added node state management code in
networkdb to address these problems by trying to rejoin the cluster
via the failed nodes when there is a failure. This also necessitates
adding new messages, called node event messages, to differentiate
between node leave and node failure.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-19 15:58:14 -07:00
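
An illustrative sketch of the recovery idea, with assumed state names: failed nodes stay on a retry list so the node keeps attempting to rejoin through them, while nodes that left voluntarily are never retried.

```go
package main

import "fmt"

type nodeState int

const (
	stateActive nodeState = iota
	stateFailed
	stateLeft // voluntary leave: no reconnect attempts
)

type cluster struct {
	states map[string]nodeState
}

// retryCandidates returns the failed nodes to attempt a rejoin through.
func (c *cluster) retryCandidates() []string {
	var out []string
	for n, s := range c.states {
		if s == stateFailed {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	c := &cluster{states: map[string]nodeState{
		"node1": stateFailed, "node2": stateLeft, "node3": stateActive,
	}}
	// Only node1 is retried; node2 left on purpose.
	fmt.Println(c.retryCandidates())
}
```
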
Santhosh Manohar 173832dd19 Merge pull request #1406 from mrjana/bugs
Ensure add newly joined node to networknodes
2016-08-21 22:03:03 -07:00
Jana Radhakrishnan 1b027335f1 Ensure add newly joined node to networknodes
When a node leaves the cluster and quickly rejoins before its node
entry has expired on the other nodes in the cluster, we failed to add
it back to the quick-lookup database on rejoin. Fixed it.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-19 17:18:15 -07:00
Jana Radhakrishnan 2bead02c87 Ignore delete events for non-existent entries
In networkdb we should ignore delete events for entries which don't
exist in the db. This is always safe, because if the entry does not
exist then it was removed much earlier, got purged after the reap
timer, and this notification is simply stale.

Also, duplicate delete notifications were being sent to the clients:
one when the actual delete event was received from gossip, and
another when the entry was getting reaped. The second notification is
unnecessary and may cause issues with clients that are not coded for
idempotency.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-18 13:57:24 -07:00
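
A minimal sketch of both behaviors (names assumed): unknown entries are ignored, and clients are notified of a delete exactly once.

```go
package main

import "fmt"

type entry struct{ deleting bool }

var table = map[string]*entry{}

func handleDeleteEvent(key string, notify func(string)) {
	e, ok := table[key]
	if !ok {
		return // never seen: the delete already happened and was reaped
	}
	if !e.deleting {
		e.deleting = true
		notify(key) // first and only delete notification
	}
	// the reaper later purges the entry without notifying again
}

func main() {
	table["svc1"] = &entry{}
	notify := func(k string) { fmt.Println("deleted:", k) }
	handleDeleteEvent("svc1", notify)  // notifies once
	handleDeleteEvent("svc1", notify)  // duplicate event: silent
	handleDeleteEvent("ghost", notify) // unknown entry: ignored
}
```
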
Santhosh Manohar 2bab9b6bdb Cleanup networkdb state when the network is deleted locally
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-08-10 12:44:05 -07:00
Alexander Morozov 392b089170 networkdb: fix data races in map access
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-08-05 14:24:30 -07:00
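
A generic sketch of the fix pattern rather than the actual diff: every access to shared maps goes through the database lock, since Go maps are not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync"
)

type networkDB struct {
	sync.RWMutex
	networks map[string]string
}

// getNetwork reads under the read lock.
func (db *networkDB) getNetwork(id string) (string, bool) {
	db.RLock()
	defer db.RUnlock()
	n, ok := db.networks[id]
	return n, ok
}

// setNetwork writes under the write lock.
func (db *networkDB) setNetwork(id, name string) {
	db.Lock()
	defer db.Unlock()
	db.networks[id] = name
}

func main() {
	db := &networkDB{networks: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) { // concurrent writers: safe under the lock
			defer wg.Done()
			db.setNetwork(fmt.Sprint(i), "net")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(db.networks)) // 10
}
```
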
Santhosh Manohar 8af5fdb9b1 Do not create network entry in networkdb for the local node based on table event from peer

Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-07-26 06:51:47 -07:00
Jana Radhakrishnan 8936daab5e Retain deleted entries for longer time
When deleting entries, or when learning about deleted entries,
remember them for a longer time to avoid excessive duplicate deletes
in the gossip cluster. Also added code changes to ignore event
messages originating from the source node, so that they don't get
added to the rebroadcast queue.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-30 18:24:13 -07:00
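
A tiny sketch of the rebroadcast filter (the local node ID is an assumed placeholder): events owned by the local node are never queued, since this node already gossiped them.

```go
package main

import "fmt"

const localNode = "node-self" // assumed local node ID

// shouldRebroadcast rejects events originating from this node.
func shouldRebroadcast(owner string) bool {
	return owner != localNode
}

func main() {
	fmt.Println(shouldRebroadcast("node-self")) // false: own event
	fmt.Println(shouldRebroadcast("node-far"))  // true
}
```
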
Jana Radhakrishnan 8245296aa5 Make sure node map is valid before accessing it
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-13 18:30:31 -07:00
Jana Radhakrishnan 78a3cf5f6c Do not rebroadcast bulk sync updates
Bulksync is not meant to be rebroadcast in gossip. Stopped
rebroadcasting bulksync updates.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-12 20:19:40 -07:00
Jana Radhakrishnan 774399fd66 Fix a couple of panics in networkdb
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-02 20:29:37 -07:00
Jana Radhakrishnan 77abea9c1e Use protobuf in networkdb core messages
Convert all networkdb core message types from Go message types to
protobuf message types. This facilitates future modification of the
message structure without breaking backward compatibility.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-05-17 09:18:24 -07:00
Jana Radhakrishnan 28f4561e3f Add network scoped gossip database
Network DB is a network scoped gossip database built
on top of hashicorp/memberlist providing an eventually
consistent state store.

It limits the scope of the gossip and the periodic bulk syncing
of table entries to only the nodes which participate in the
network to which the gossip belongs. This design makes the
gossip layer scale better and only consume resources for the
network state that the node participates in.

Since the complete state for a network is maintained by all nodes
participating in the network, all nodes will eventually converge
to the same state.

NetworkDB also provides facilities for users of the package to
watch any table (or all tables) and get notified of state changes
of interest that happened anywhere in the cluster, once that state
change eventually finds its way to the watcher's node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-04-08 12:58:09 -07:00
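
A hedged sketch of how a consumer of such a watch facility might look; the WatchEvent type and Watch signature are hypothetical stand-ins, not the real NetworkDB API.

```go
package main

import "fmt"

// WatchEvent is a hypothetical notification for a table change.
type WatchEvent struct {
	Table, NetworkID, Key string
	Value                 []byte
}

// fakeWatcher simulates a table watch fed by cluster gossip.
type fakeWatcher struct{ ch chan WatchEvent }

func (f fakeWatcher) Watch(table string) <-chan WatchEvent { return f.ch }

func main() {
	f := fakeWatcher{ch: make(chan WatchEvent, 1)}
	// In NetworkDB this event would originate on another node and
	// reach the watcher once gossip converges.
	f.ch <- WatchEvent{Table: "endpoint_table", NetworkID: "net1", Key: "ep1", Value: []byte("10.0.0.2")}
	close(f.ch)
	for ev := range f.Watch("endpoint_table") {
		fmt.Printf("%s: %s/%s -> %s\n", ev.Table, ev.NetworkID, ev.Key, ev.Value)
	}
}
```
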