From our friends at GitLab, the new experimental `wait_for_less_busy_worker` config option may reduce latency and improve throughput for high-load Puma apps on MRI. See the [pull request](https://github.com/puma/puma/pull/2079) for more discussion.
Users of this option should see reduced request queue latency and possibly less overall latency.
Add the following to your `puma.rb` to try it:
```ruby
wait_for_less_busy_worker
# or
wait_for_less_busy_worker 0.001
```
Production testing at GitLab suggests values between `0.001` and `0.010` are best.
### Better memory usage
5.0 brings two new options to your config which may improve memory usage.
`nakayoshi_fork` calls GC a handful of times and compacts the heap on Ruby 2.7+ before forking. This may reduce memory usage on Pumas on MRI with preload enabled. It's inspired by [Koichi Sasada's work](https://github.com/ko1/nakayoshi_fork).
Puma 5 introduces an experimental new cluster-mode configuration option, `fork_worker` (`--fork-worker` from the CLI). This mode causes Puma to fork additional workers from worker 0, instead of directly from the master process:
```
10000 \_ puma 4.3.3 (tcp://0.0.0.0:9292) [puma]
10001 \_ puma: cluster worker 0: 10000 [puma]
10002 \_ puma: cluster worker 1: 10000 [puma]
10003 \_ puma: cluster worker 2: 10000 [puma]
10004 \_ puma: cluster worker 3: 10000 [puma]
```
This allows the usage of `preload` with `phased_restart`. It also may improve memory usage because the worker process loads additional code after processing requests.
To learn more about using `refork` and `fork_worker`, see [the documentation](https://github.com/puma/puma/blob/master/docs/fork_worker.md).
* pumactl now has a `thread-backtraces` command to print thread backtraces, bringing thread backtrace printing to all platforms, not just *BSD and Mac. (#2053)
* Added incrementing `requests_count` to `Puma.stats`. (#2106)
* If you are running MRI, default thread count on Puma is now 5, not 16. This may change the amount of threads running in your threadpool. We believe 5 is a better default for most Ruby web applications on MRI. Higher settings increase latency by causing GVL contention.
* If you are using a worker count of more than 1 and you are not using phased_restart, Puma will now `preload` by default. We believe this is a better default, but may cause issues in non-Rails applications if you do not have the proper `before` and `after` fork hooks configured. See documentation for your framework. Rails users do not need to change anything.
* tcp mode and daemonization have been removed without replacement. For daemonization, please use a modern process management solution, such as systemd or monit.