>Note: Puma 5 now automatically uses the `WEB_CONCURRENCY` env var if it is set; see [this post for an explanation](https://github.com/puma/puma/issues/2393#issuecomment-702352208). If your memory use goes up after upgrading to Puma 5, it indicates you're now running with multiple workers (processes). You can decrease memory use by lowering this number.
Puma 5 was named Spoony Bard by our newest supercontributor, [@wjordan](https://github.com/puma/puma/commits?author=wjordan). Will brought you one of our new perf features for this release, as well as [many other fixes and refactors.](https://github.com/puma/puma/commits?author=wjordan) If you'd like to name a Puma release in the future, take a look at [CONTRIBUTING.md](https://github.com/puma/puma/blob/master/CONTRIBUTING.md) and get started helping us out :)
Puma 5 also welcomes [@MSP-Greg](https://github.com/puma/puma/commits?author=MSP-Greg) as our newest committer. Greg has been instrumental in improving our CI setup and SSL features. Greg also [named our 4.3.0 release](https://github.com/puma/puma/releases/tag/v4.3.0): Mysterious Traveller.
Puma 5 contains three new "experimental" performance features for cluster-mode Pumas running on MRI.
If you try any of these features, please report your results to [our report issue](https://github.com/puma/puma/issues/2258).
Part of the reason we're calling them _experimental_ is that we're not sure if they'll actually have any benefit. People's workloads in the real world are often not what we anticipate, and synthetic benchmarks are usually not of any help in figuring out if a change will be beneficial or not.
We do not believe any of the new features will have a negative effect or impact the stability of your application. This is either an "it works" or an "it does nothing" experiment.
If any of the features turn out to be particularly beneficial, we may make them defaults in future versions of Puma.
From our friends at GitLab, the new experimental `wait_for_less_busy_worker` config option may reduce latency and improve throughput for high-load Puma apps on MRI. See the [pull request](https://github.com/puma/puma/pull/2079) for more discussion.
Users of this option should see reduced request queue latency and possibly less overall latency.
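A minimal sketch of enabling this option in `config/puma.rb`, assuming a cluster-mode setup on MRI. The worker count and the delay value here are illustrative, not recommendations; check the Puma DSL documentation for the argument your version accepts.

```ruby
# config/puma.rb (sketch; values are illustrative assumptions)

# The option targets cluster mode, so run more than one worker.
workers 2

# Ask busy workers to pause briefly before accepting a new connection,
# giving less busy workers a chance to pick it up first.
# The argument is a delay in seconds; 0.005 is used here as an example.
wait_for_less_busy_worker 0.005
```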
`nakayoshi_fork` calls GC a handful of times and compacts the heap on Ruby 2.7+ before forking. This may reduce memory usage of Puma on MRI with preload enabled. It's inspired by [Koichi Sasada's work](https://github.com/ko1/nakayoshi_fork).
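As a rough sketch, the option can be switched on in `config/puma.rb` alongside preloading. The worker count is an assumption for illustration; the feature only matters when workers are forked from a preloaded process.

```ruby
# config/puma.rb (sketch; worker count is an illustrative assumption)

workers 2

# Load the application before forking so workers can share memory
# via copy-on-write.
preload_app!

# Run GC (and heap compaction on Ruby 2.7+) before forking workers.
nakayoshi_fork
```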
Puma 5 introduces an experimental new cluster-mode configuration option, `fork_worker` (`--fork-worker` from the CLI). This mode causes Puma to fork additional workers from worker 0, instead of directly from the master process.
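A minimal configuration sketch for trying this mode, assuming a four-worker cluster (the count is arbitrary and only for illustration):

```ruby
# config/puma.rb (sketch; worker count is an illustrative assumption)

# fork_worker is a cluster-mode option, so it needs workers > 0.
workers 4

# Fork workers 1..n from worker 0 rather than from the master process.
fork_worker
```

The same mode can be requested from the command line with the `--fork-worker` flag mentioned above.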
* pumactl now has a `thread-backtraces` command to print thread backtraces, bringing thread backtrace printing to all platforms, not just *BSD and Mac. (#2053)
* Added incrementing `requests_count` to `Puma.stats`. (#2106)
* If you are running MRI, the default thread count on Puma is now 5, not 16. This may change the number of threads running in your threadpool. We believe 5 is a better default for most Ruby web applications on MRI. Higher settings increase latency by causing GVL contention.
* If you are using a worker count of more than 1 and you are not using phased_restart, Puma will now `preload` by default. We believe this is a better default, but may cause issues in non-Rails applications if you do not have the proper `before` and `after` fork hooks configured. See documentation for your framework. Rails users do not need to change anything.
* TCP mode and daemonization have been removed without replacement. For daemonization, please use a modern process management solution, such as systemd or monit.
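If you would rather pin these settings than rely on the new defaults, a `config/puma.rb` along the following lines makes them explicit. This is a sketch under assumptions: the fallback worker count, the thread range, and the hook bodies are placeholders, and non-Rails applications should follow their framework's guidance for what to disconnect and reconnect around a fork.

```ruby
# config/puma.rb (sketch; all values and hook bodies are illustrative)

# Set the worker count explicitly instead of relying on WEB_CONCURRENCY
# being picked up automatically. The fallback of 2 is an assumption.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Make the thread pool size explicit (min, max). 5 matches the new MRI default.
threads 1, 5

# Preloading is now the default with more than one worker;
# stating it keeps the behavior obvious to readers of the config.
preload_app!

before_fork do
  # Placeholder: close connections that must not be shared across forks,
  # e.g. database or cache clients opened during preload.
end

on_worker_boot do
  # Placeholder: re-establish per-worker connections here.
end
```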