Is it normal for request queueing (as reported by NewRelic) to take around 60ms at minimum or longer with only two requests per second and three dynos each running two Unicorn workers?
When New Relic shows request queueing, it might not mean what you think it means. There are two places between the outside world and your application stack where a request can spend time: in the routing layer, and in a queue on your dyno. Both of these together are "Request Queueing" according to New Relic.
When a request comes in to the Heroku router, the routing layer adds a header called
X-Request-Start that is a UNIX timestamp of when the request was first received. The request is dispatched to one of your app dynos, where it may wait in a request queue. As soon as your app starts handling that specific request, the New Relic middleware will compare the current time to the
X-Request-Start time and report all of that as "Request Queuing".
If you have requests coming in to your app faster than your Unicorn workers can respond to them, the router will continue to randomly distribute them, but the requests will start to queue up at the dyno level. You can configure how many requests Unicorn will allow to queue before refusing to accept any more using the
backlog property documented here.
You can tell how much request queueing time is coming from the routing layer by looking at the
connect time in the router log lines. This is the amount of time it took the routing layer to find one of your dynos and dispatch the request:
heroku[router]: at=info method=POST path=/foo host=example.com fwd="184.108.40.2069" dyno=web.1 connect=1ms service=17623ms status=200 bytes=38
In this case, the router found your dyno and dispatched the request in 1 ms. In this example, the dyno took 17 seconds to generate a response, as indicated by the