With queue_requests set to true (the default), workers accept all
requests and queue them before passing them to the handlers.
With it set to false, each worker process accepts exactly as
many requests as it is configured to simultaneously handle.
In combination with threads 1, 1, this ensures that requests
are balanced across workers in a single-threaded application.
This can avoid deadlocks when a single-threaded app sends a
request to itself (for example, to generate a PDF).
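
As a rough illustration of the setup described above, a Puma config file might look like the following. The worker count of 2 is an assumption chosen for the example; only `queue_requests` and `threads 1, 1` come from the text.

```ruby
# config/puma.rb (illustrative values; the worker count is an assumption)
workers 2             # two forked worker processes
threads 1, 1          # minimum and maximum of one thread per worker
queue_requests false  # each worker accepts only what it can handle at once
```

With this shape, a worker that is busy rendering a response does not accept a second connection, so a request the app sends to itself can be picked up by the other worker.
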
Rapidly adding work to the ThreadPool can result in many jobs making it onto `@todo` before one of the jobs gets the mutex lock to decrement `@waiting`. As a result, `@waiting == 0` is false and no thread is spawned even though work is piling up. For example:
```ruby
require 'puma/thread_pool'
pool = Puma::ThreadPool.new(1, 3) { sleep 2 }
3.times { pool << 1 }
sleep 1
pool.spawned #=> 1 # When 3 is expected
```
Checking if `@waiting < @todo.size` shows that there's more work to do than threads waiting, even if `@waiting` hasn't been decremented to `0`. It also covers the base case where `@waiting == 0` and `@todo.size == 1`.
An alternative would be to add the new check without removing the old one, something like `(@waiting == 0 or @waiting < @todo.size)`, but I don't think it's necessary unless there is some kind of performance reason.
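
To make the difference between the two checks concrete, here is a minimal sketch that models only the two counters. `PoolState`, `spawn_old?`, and `spawn_new?` are hypothetical names invented for this example and are not part of Puma.

```ruby
# Hypothetical stand-in for the pool's bookkeeping; not Puma's actual class.
PoolState = Struct.new(:waiting, :todo_size)

# Old condition: spawn only when no thread is idle.
def spawn_old?(state)
  state.waiting == 0
end

# New condition: spawn whenever queued work exceeds idle threads.
def spawn_new?(state)
  state.waiting < state.todo_size
end

# One idle thread, three jobs queued before it could decrement the counter.
burst = PoolState.new(1, 3)
# Base case: no idle thread and a single job just queued.
base = PoolState.new(0, 1)
# One idle thread and one queued job: the idle thread can take it.
idle = PoolState.new(1, 1)

[burst, base, idle].each do |state|
  puts "waiting=#{state.waiting} todo=#{state.todo_size} " \
       "old=#{spawn_old?(state)} new=#{spawn_new?(state)}"
end
```

Only the burst case differs between the two checks, which is the situation the example with three queued jobs runs into.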
Puma::IOBuffer is a very simple memory buffer that allows for fast
append without additional object overhead.
Additionally, it turns out that IO#write on 1.9.3 is extremely
slow because it allocates a Hash object on every invocation.
Avoid calling IO#write in a loop on 1.9.3! Use IO#syswrite if you can
(for instance, when writing to sockets, where the encoding of the
output does not matter).
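
The idea of collecting output in one buffer and then writing it with IO#syswrite can be sketched as follows. `SimpleBuffer` and `flush_to` are hypothetical names for this example, not Puma::IOBuffer itself, and an `IO.pipe` stands in for a socket.

```ruby
# Hypothetical append-only buffer; a sketch of the idea, not Puma::IOBuffer.
class SimpleBuffer
  def initialize
    @data = String.new
  end

  # Append a chunk in place and return self so calls can be chained.
  def <<(chunk)
    @data << chunk
    self
  end

  def to_s
    @data
  end

  def reset
    @data.clear
  end
end

# Write the whole buffer with syswrite, retrying on partial writes.
def flush_to(io, buffer)
  data = buffer.to_s
  written = 0
  while written < data.bytesize
    written += io.syswrite(data.byteslice(written, data.bytesize - written))
  end
  written
end

reader, writer = IO.pipe # stand-in for a client socket
buf = SimpleBuffer.new
buf << "HTTP/1.1 200 OK\r\n" << "Content-Length: 2\r\n\r\n" << "ok"
bytes = flush_to(writer, buf)
writer.close
output = reader.read
```

The loop in `flush_to` is there because syswrite may write fewer bytes than requested; it returns the number of bytes actually written.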
Using the work queue to communicate trimming doesn't work; it's far too
easy to starve the system doing that. Instead we now detect trimming and
work as separate actions.
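
One way to picture trimming and work as separate actions is a worker loop that checks a trim counter on its own, instead of pulling a special "trim" job off the queue. The sketch below is a hypothetical miniature pool; `TinyPool`, `alive_count`, and the instance variable names are illustrative assumptions, not Puma's implementation.

```ruby
# Hypothetical miniature pool; names are illustrative, not Puma's API.
class TinyPool
  def initialize(size, &block)
    @mutex = Mutex.new
    @not_empty = ConditionVariable.new
    @todo = []
    @trim_requested = 0
    @block = block
    @workers = Array.new(size) { spawn_worker }
  end

  # Queue a job and wake one waiting worker.
  def <<(job)
    @mutex.synchronize do
      @todo << job
      @not_empty.signal
    end
    self
  end

  # Ask one worker to exit; this never touches the work queue.
  def trim
    @mutex.synchronize do
      @trim_requested += 1
      @not_empty.signal
    end
  end

  def alive_count
    @workers.count(&:alive?)
  end

  private

  def spawn_worker
    Thread.new do
      loop do
        job = nil
        exiting = false
        @mutex.synchronize do
          # Only an idle worker honours a trim request, so queued work
          # is always drained first.
          while @todo.empty?
            if @trim_requested > 0
              @trim_requested -= 1
              exiting = true
              break
            end
            @not_empty.wait(@mutex)
          end
          job = @todo.shift unless exiting
        end
        break if exiting
        @block.call(job)
      end
    end
  end
end

results = Queue.new
pool = TinyPool.new(2) { |job| results << job }
[1, 2, 3].each { |job| pool << job }
pool.trim
```

Because the trim request is a counter rather than a queue entry, it cannot sit behind a backlog of jobs or displace them.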