Why am I getting "Error waiting for network: Resource temporarily unavailable" errors?

Resolution

This is caused by a low-level, long-standing kernel bug that occurs with varying frequency. We've been working with kernel developers at Canonical on this issue, but it is extremely tricky to solve because of where the bug is triggered. Many people run into it in a variety of situations, but we tend to see it more often because of our scale. It is, however, extremely hard to reproduce on demand, which is why it is taking so long to fix.

Unfortunately, there is no real workaround for this issue. Scheduler dynos will crash when it happens, and the job will not run again until its next scheduled time. Other dyno types will typically enter a bad state and require a manual restart (you may see "App boot timeout" errors when this happens).

The only known way to reduce your chances of hitting this is to switch to Performance dynos. They see the problem much less often because each backing server runs only one dyno at a time, so the bug is triggered less frequently. If that's not an option, consider using the webhook functionality of a log retention add-on to restart the affected dyno automatically through our Platform API.
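As a minimal sketch, assuming a log add-on whose alert webhook hands you the matching log lines as plain text, the Python below shows the two pieces involved: spotting the error in a log line and calling the Platform API to restart the named dyno (a `DELETE` on `/apps/{app}/dynos/{dyno}`). The log-line format, the function names (`dyno_to_restart`, `build_restart_request`, `handle_alert`), and how the API token and app name reach the handler are assumptions for illustration, not any add-on's documented interface.

```python
import re
import urllib.request
from typing import List, Optional

HEROKU_API = "https://api.heroku.com"
# The error text quoted in this article.
ERROR_TEXT = "Error waiting for network: Resource temporarily unavailable"
# ASSUMPTION: log lines look like "... heroku[web.1]: message" or "... app[web.1]: message".
DYNO_RE = re.compile(r"(?:app|heroku)\[(?P<dyno>[\w.-]+)\]")


def dyno_to_restart(log_line: str) -> Optional[str]:
    """Return the dyno name (e.g. "web.1") if the line reports the network error, else None."""
    if ERROR_TEXT not in log_line:
        return None
    match = DYNO_RE.search(log_line)
    return match.group("dyno") if match else None


def build_restart_request(app_name: str, dyno_name: str, api_token: str) -> urllib.request.Request:
    """Build a Platform API request that restarts one dyno (DELETE on the dyno resource)."""
    url = f"{HEROKU_API}/apps/{app_name}/dynos/{dyno_name}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Accept": "application/vnd.heroku+json; version=3",
            "Authorization": f"Bearer {api_token}",
        },
    )


def handle_alert(log_lines: List[str], app_name: str, api_token: str) -> List[str]:
    """Hypothetical webhook handler body: restart each affected dyno at most once."""
    restarted: List[str] = []
    for line in log_lines:
        dyno = dyno_to_restart(line)
        if dyno and dyno not in restarted:
            req = build_restart_request(app_name, dyno, api_token)
            # urlopen raises urllib.error.HTTPError if the API returns a non-2xx status.
            with urllib.request.urlopen(req, timeout=10) as resp:
                resp.read()
            restarted.append(dyno)
    return restarted
```

The sketch sticks to the standard library so it runs anywhere; a production handler would also need to verify that the webhook really came from your add-on and keep the API token out of source code.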