Commit Graph

14 Commits

Author SHA1 Message Date
Jana Radhakrishnan 7b905d3c63 Purge stale nodes with same prefix and IP
Since the node name randomization fix, we need to make sure that we
purge the old node with the same prefix and same IP from the nodes
database if it still present. This causes unnecessary reconnect
attempts.

Also added a change to avoid unnecessary update of local lamport time
and only do it of we are ready to do a push pull on a join. Join should
happen only when the node is bootstrapped or when trying to reconnect
with a failed node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-23 14:48:54 -07:00
Jana Radhakrishnan 5f5dad3c02 Recover from transient gossip failures
Currently if there is any transient gossip failure in any node the
recoevry process depends on other nodes propogating the information
indirectly. In cases if these transient failures affects all the nodes
that this node has in its memberlist then this node will be permenantly
cutoff from the the gossip channel. Added node state management code in
networkdb to address these problems by trying to rejoin the cluster via
the failed nodes when there is a failure. This also necessitates the
need to add new messages called node event messages to differentiate
between node leave and node failure.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-19 15:58:14 -07:00
Santhosh Manohar 173832dd19 Merge pull request #1406 from mrjana/bugs
Ensure add newly joined node to networknodes
2016-08-21 22:03:03 -07:00
Jana Radhakrishnan 1b027335f1 Ensure add newly joined node to networknodes
In cases a node left the cluster and quickly rejoined before the node
entry is expired by other nodes in the cluster, when the node rejoins we
fail to add it to the quick lookup database. Fixed it.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-19 17:18:15 -07:00
Jana Radhakrishnan 2bead02c87 Ignore delete events for non-existent entries
In networkdb we should ignore delete events for entries which doesn't
exist in the db. This is always true because if the entry did not exist
then the entry has been removed way earlier and got purged after the
reap timer and this notification is very stale.

Also there were duplicate delete notifications being sent to the
clients. One when the actual delete event was received from gossip and
later when the entry was getting reaped. The second notification is
unnecessary and may cause issues with the clients if they are not
coded for idempotency.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-18 13:57:24 -07:00
Santhosh Manohar 2bab9b6bdb Cleanup networkdb state when the network is deleted locally
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-08-10 12:44:05 -07:00
Alexander Morozov 392b089170 networkdb: fix data races in map access
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-08-05 14:24:30 -07:00
Santhosh Manohar 8af5fdb9b1 Do not create network entry in networkdb for the local node based on table
event from peer

Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-07-26 06:51:47 -07:00
Jana Radhakrishnan 8936daab5e Retain deleted entries for longer time
When deleting entries or when learning about deleted entries remember
then for a longer time to avoid excessive delete duplicates in the
gossip cluster. Also added code changes to ignore event messages
originated from the source node so that it doesn't get added into the
rebroadcast queue.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-30 18:24:13 -07:00
Jana Radhakrishnan 8245296aa5 Make sure node map is valid before accessing it
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-13 18:30:31 -07:00
Jana Radhakrishnan 78a3cf5f6c Do not rebroacast bulk sync updates
Bulksync is not meant to be rebroadcast in gossip. Stopped
rebroadcasting bulksync updates.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-12 20:19:40 -07:00
Jana Radhakrishnan 774399fd66 Fix couple of panics in networkdb
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-02 20:29:37 -07:00
Jana Radhakrishnan 77abea9c1e Use protobuf in networkdb core messages
Convert all networkdb core message types from go message types to
protobuf message types. This faciliates future modification of the
message structure without breaking backward compatibility.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-05-17 09:18:24 -07:00
Jana Radhakrishnan 28f4561e3f Add network scoped gossip database
Network DB is a network scoped gossip database built
on top of hashicorp/memberlist providing an eventually
consistent state store.

It limits the scope of the gossip and periodic bulk syncing
for table entries to only the nodes which participate in the
network to which the gossip belongs. This designs make the
gossip layer scale better and only consumes resources for the
network state that the node participates in.

Since the complete state for a network is maintained by all nodes
participating in the network, all nodes will eventually converge
to the same state.

NetworkDB also provides facilities for the users of the package to
watch on any table (or all tables) and get notified if there are
state changes of interest that happened anywhere in the cluster when
that state change eventually finds it's way to the watcher's node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-04-08 12:58:09 -07:00