Issue
My application is getting H12 timeout errors and the following are true:
- Using Unicorn or Gunicorn for my web server
- Gunicorn logs show `[CRITICAL] WORKER TIMEOUT`
- Unicorn logs show `SIGKILL`
Other possible symptoms:
- Increase in database connections
- Timeouts may correlate with restarts or dyno cycling
Resolution
The explanation below applies to both the Unicorn and Gunicorn web servers.
Your Unicorn master process is timing out and killing web workers. With (temporarily) fewer web workers, your app can't handle as many concurrent requests, hence the H12 timeouts.
Sometimes this turns into a bigger problem during restarts. The process forking Unicorn performs at boot can be slow and expensive. If workers take too long to boot, or too long to serve their first request, the Unicorn master kills them and forks replacements. The result can be a seemingly unexplained death spiral: workers are killed on startup, the remaining workers can't keep up with incoming requests, requests time out, and workers are killed again.
Because the workers are sent a SIGKILL (not a SIGTERM), they have no chance to log anything as they die; usually all you'll see in your logs is the SIGKILL itself and nothing else.
Heroku and monitoring tools like New Relic or Scout have no visibility into these kills, so it can look as though you're getting H12 timeout errors for no reason. It's an unfortunately harsh behaviour of Unicorn servers.
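For reference, this kill timer is set by the `timeout` directive in your Unicorn config file. A minimal sketch (the `config/unicorn.rb` path and the 15-second value are illustrative, not prescriptive):

```ruby
# config/unicorn.rb (minimal sketch; values here are illustrative)
worker_processes Integer(ENV.fetch("WEB_CONCURRENCY", "3"))

# If a worker doesn't finish a request within this many seconds, the
# master sends it SIGKILL and forks a replacement, the behaviour
# described above.
timeout 15
```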
How to mitigate the Unicorn problem
Check that all of your dependencies and initializers have been configured with timeouts, which keeps your worker dynos' startup time down. Also make sure nothing else is delaying the booting/forking process.
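One common way to cut the per-worker boot cost is to preload the app in the master process so forked workers start already booted. The sketch below assumes a Rails app using ActiveRecord; adjust the fork hooks for whatever connections your app holds:

```ruby
# config/unicorn.rb (a sketch assuming Rails + ActiveRecord)
preload_app true # boot the app once in the master; workers fork already warm

before_fork do |server, worker|
  # Drop the master's database connection so it isn't shared across forks.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # Each worker opens its own database connection after forking.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```

Preloading also means that workers which are killed and re-forked come back quickly, shrinking the window in which the death spiral described above can take hold.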
Using Ruby? Try Rack::Timeout
Instead of letting Unicorn time out your requests, Rack::Timeout raises the error inside your application, so it bubbles up to the application logs rather than the worker dying silently with a SIGKILL.
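A minimal sketch of wiring it up in a plain Rack app (in Rails, adding the gem is enough on its own; `MyApp` is a placeholder, and the 15-second value is illustrative and should be lower than Unicorn's own timeout so that Rack::Timeout fires first):

```ruby
# Gemfile
gem "rack-timeout"
```

```ruby
# config.ru (sketch for a plain Rack app; MyApp is a placeholder)
require "rack-timeout"

# Raise a Rack::Timeout error inside the request after 15 seconds,
# so the failure reaches your logs before Unicorn's SIGKILL would.
use Rack::Timeout, service_timeout: 15
run MyApp
```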
You can also check out more timeout solutions for Ruby apps here.