The staticcheck Go linter warns:
> distribution/xfer/transfer_test.go:37:2: SA2002: the goroutine calls T.Fatalf, which must be called in the same goroutine as the test (staticcheck)
What it doesn't say is why. The reason is that t.Fatalf() calls t.FailNow(),
which is expected to stop test execution right away. It does so by
calling runtime.Goexit(), which, unless called from the test's own
goroutine, only exits the calling goroutine and does not stop the test.
Anyway, long story short: if we don't care much about stopping the test
case immediately, we can simply replace t.Fatalf() with t.Errorf(), which
still marks the test case as failed but does not stop it immediately.
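For illustration, a minimal sketch of the pattern (not the actual
transfer_test.go code; doWork is a hypothetical stand-in for whatever the
spawned goroutine does, and the test fails by design):

package xfer_test

import (
    "errors"
    "testing"
)

// doWork stands in for the work a test goroutine performs.
func doWork() error { return errors.New("boom") }

// Calling t.Fatalf from the spawned goroutine would only exit that
// goroutine (via runtime.Goexit) without stopping the test; t.Errorf
// records the failure and lets the test run to completion.
func TestWorkerFailure(t *testing.T) {
    done := make(chan struct{})
    go func() {
        defer close(done)
        if err := doWork(); err != nil {
            // Would be t.Fatalf in the problematic pattern.
            t.Errorf("doWork failed: %v", err)
        }
    }()
    <-done
}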
This patch was tested by checking that the test fails if any of the
goroutines calls t.Errorf():
1. Failure in DoFunc ("transfer function not started ...") was tested by
decreasing the NewTransferManager() argument:
- tm := NewTransferManager(5)
+ tm := NewTransferManager(2)
2. Failure "got unexpected progress value" was tested by injecting a random failure:
- if present && p.Current <= val {
+ if present && p.Current <= val || rand.Intn(100) > 80 {
3. Failure in DoFunc ("too many jobs running") was tested by increasing
the NewTransferManager() argument:
- tm := NewTransferManager(concurrencyLimit)
+ tm := NewTransferManager(concurrencyLimit + 1)
While at it:
* fix/amend some error messages
* use _ for unused arguments of DoFunc
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This test was checking that it received every progress update that was
produced. But delivery of these intermediate progress updates is not
guaranteed. A new update can overwrite the previous one if the previous
one hasn't been sent to the channel yet.
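To illustrate why updates can be lost, here is a rough sketch of the
coalescing idea (assumed names, not the xfer package's actual progress
plumbing): the sender keeps only the most recent value, and a newer update
replaces one that the receiver has not picked up yet.

package main

import "fmt"

// coalesce forwards values from in to out, but keeps only the most recent
// unsent value; a newer update overwrites one the receiver has not read yet.
func coalesce(in <-chan int, out chan<- int) {
    var pending int
    havePending := false
    for {
        var outCh chan<- int
        if havePending {
            outCh = out // only offer a send when something is pending
        }
        select {
        case v, ok := <-in:
            if !ok {
                if havePending {
                    out <- pending // flush the final value
                }
                close(out)
                return
            }
            pending = v // overwrite any update that was never sent
            havePending = true
        case outCh <- pending:
            havePending = false
        }
    }
}

func main() {
    in := make(chan int)
    out := make(chan int)
    go coalesce(in, out)
    for i := 1; i <= 5; i++ {
        in <- i
    }
    close(in)
    for v := range out {
        fmt.Println(v) // with no concurrent reader, only 5 is delivered
    }
}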
The call to t.Fatalf exited the goroutine that was consuming the
channel, which caused a deadlock and an eventual test timeout rather
than a proper failure message.
Failure seen here:
https://jenkins.dockerproject.org/job/Docker-PRs-experimental/16400/console
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
A watcher would output the current progress item when it was detached,
in case it had missed that item earlier, which would otherwise leave the
user seeing some intermediate step of the operation. This commit changes
it to output the item on detach only if it has not already output the
same item.
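A rough sketch of the detach-time deduplication (assumed type and field
names, not the actual watcher code):

package main

import "fmt"

type progressItem struct {
    ID             string
    Current, Total int64
}

// watcher remembers the last item it wrote so detach can avoid re-sending
// an item the user has already seen.
type watcher struct {
    lastWritten progressItem
    hasLast     bool
}

func (w *watcher) write(p progressItem) {
    w.lastWritten, w.hasLast = p, true
    fmt.Println("output:", p)
}

// detach outputs the transfer's current item only if this watcher has not
// already written that exact item.
func (w *watcher) detach(current progressItem, present bool) {
    if present && (!w.hasLast || w.lastWritten != current) {
        w.write(current)
    }
}

func main() {
    w := &watcher{}
    item := progressItem{ID: "layer", Current: 10, Total: 10}
    w.write(item)        // the user already saw this item
    w.detach(item, true) // no duplicate output on detach
}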
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Things could go wrong if Watch was called after the last existing
watcher was released. The call to Watch would succeed even though it was
not really adding a watcher, and the corresponding call to Release would
close hasWatchers a second time.
The fix for this is twofold:
1. We allow transfers to gain new watchers after the watcher count has
touched zero. This means that the channel returned by Released should
not be closed until all watchers have been released AND the transfer is
no longer tracked by the transfer manager, meaning it won't be possible
for additional calls to Watch to race with closing the channel returned
by Released.
The Transfer interface has a new method called Close so the transfer can
know when the transfer manager no longer references it.
Remove the Cancel method. It's not used and should not be exported.
2. Even though (1) makes it possible to add watchers after all the
previous watchers have been released, we want to avoid doing this in
practice. A transfer that has had all its watchers released is in the
process of being cancelled, and attaching to one of these will never be
the correct behavior. Add a check for whether a watcher is attaching to a
cancelled transfer; in this case, wait for the transfer to be removed
from the map and try again. This will ensure correct behavior when a
watcher tries to attach during the race window.
Either (1) or (2) should be sufficient to fix the race involved here,
but the combination is the most correct approach. (1) fixes the
low-level plumbing to be resilient to the race condition, and (2) avoids
using it in a racy way.
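A pared-down model of fix (1), using assumed names rather than the real
xfer types (the manager-side retry from (2) is omitted): Released is closed
only once the watcher count is zero AND the manager has called Close, so a
late Watch/Release pair can no longer close the channel a second time.

package main

import (
    "fmt"
    "sync"
)

type transfer struct {
    mu       sync.Mutex
    watchers int
    closed   bool // set by Close: the manager no longer references this transfer
    done     bool // released channel already closed
    released chan struct{}
}

func newTransfer() *transfer { return &transfer{released: make(chan struct{})} }

func (t *transfer) Watch() { t.mu.Lock(); t.watchers++; t.mu.Unlock() }

func (t *transfer) Release() {
    t.mu.Lock()
    t.watchers--
    t.maybeFinish()
    t.mu.Unlock()
}

// Close tells the transfer that the transfer manager has dropped it.
func (t *transfer) Close() {
    t.mu.Lock()
    t.closed = true
    t.maybeFinish()
    t.mu.Unlock()
}

// maybeFinish closes released exactly once, and only when both conditions hold.
func (t *transfer) maybeFinish() {
    if t.watchers == 0 && t.closed && !t.done {
        t.done = true
        close(t.released)
    }
}

func (t *transfer) Released() <-chan struct{} { return t.released }

func main() {
    tr := newTransfer()
    tr.Watch()
    tr.Release() // watcher count touches zero; released stays open
    tr.Watch()   // late watcher: no double close is possible
    tr.Release()
    tr.Close() // manager drops the transfer; now released is closed
    <-tr.Released()
    fmt.Println("released closed exactly once")
}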
Fixes #19606
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This commit adds a transfer manager which deduplicates and schedules
transfers, and also an upload manager and download manager that build on
top of the transfer manager to provide high-level interfaces for uploads
and downloads. The push and pull code is modified to use these building
blocks.
Some benefits of the changes:
- Simplification of push/pull code
- Pushes can upload layers concurrently
- Failed downloads and uploads are retried after backoff delays
- Cancellation is supported, but individual transfers will only be
cancelled if all pushes or pulls using them are cancelled.
- The distribution code is decoupled from Docker Engine packages and API
conventions (e.g. streamformatter), which will make it easier to split
out.
This commit also includes unit tests for the new distribution/xfer
package. The tests cover 87.8% of the statements in the package.
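For illustration, a minimal sketch of the deduplication and
concurrency-limiting idea (assumed names, not the actual distribution/xfer
API; retries, backoff, progress reporting and cancellation are omitted):

package main

import (
    "fmt"
    "sync"
)

type entry struct {
    done chan struct{}
    err  error
}

// manager deduplicates transfers by key and bounds how many run at once.
type manager struct {
    mu       sync.Mutex
    inFlight map[string]*entry
    slots    chan struct{} // semaphore limiting concurrent transfers
}

func newManager(limit int) *manager {
    return &manager{
        inFlight: make(map[string]*entry),
        slots:    make(chan struct{}, limit),
    }
}

// transfer runs fn for key; concurrent callers for the same key share one run.
func (m *manager) transfer(key string, fn func() error) error {
    m.mu.Lock()
    if e, ok := m.inFlight[key]; ok {
        m.mu.Unlock()
        <-e.done // another caller is already transferring this key
        return e.err
    }
    e := &entry{done: make(chan struct{})}
    m.inFlight[key] = e
    m.mu.Unlock()

    m.slots <- struct{}{} // acquire a concurrency slot
    e.err = fn()
    <-m.slots // release the slot

    m.mu.Lock()
    delete(m.inFlight, key)
    m.mu.Unlock()
    close(e.done)
    return e.err
}

func main() {
    m := newManager(2)
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _ = m.transfer("layer:sha256:abc", func() error {
                fmt.Println("running transfer for layer:sha256:abc")
                return nil
            })
        }()
    }
    wg.Wait()
}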
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>