Most importantly, the original request thread must yield its share lock
while waiting for the live thread to commit -- otherwise a request's
base and live threads can deadlock against each other.
While we know no user code is running, we should do as much loading as
we can. That way, all the threads will then be able to resume running
user code together.
Otherwise, only the last arriving thread would get to do its load, and
would then return to userspace, leaving the others still blocked.
Specifically, clean up if the thread is killed while it's blocked
awaiting the lock... if we get killed on some other arbitrary line, the
result remains quite undefined.