Understanding Heroku Postgres Out of Memory Errors

Issue

As a user of Heroku Postgres, I am getting out of memory errors, but metrics are still showing memory in cached and free. I'd like to understand why that is.

Resolution

Postgres out-of-memory (OOM) errors can be surprising to users, especially when there is still memory available in the memory-cached. The reason this happens is due to a combination of VM settings and Postgres preference for contiguous memory allocation. The effect of this is that smaller Postgres plans may run out of memory with half of their memory still in memory-cached. This does not mean that Postgres is not using this memory, as Postgres is able to leverage the OS cache (a majority of memory-cached) to improve query performance. It does mean that Postgres is prevented from using this memory directly for things like creating new connections or complex querying (which may use work_mem).

Understanding Postgres Memory Pressure and Out of Memory Errors

In order to monitor memory pressure on your server, you will want to track the amount of memory available to Postgres and other server processes. The two most important metrics for doing so, memory-free and memory-cached, come from the Postgres Metrics Logs.

Server memory is a valuable resource and so in order not to waste it an operating system prefers to use all of the memory available to it. It will use this memory either by granting it to a process that request it, or allocating it to memory-cached in order to improve overall performance. Because of this, it is not uncommon to see low numbers for the memory-free metric. These low numbers don't necessarily indicate that your server is low on memory, so in order to understand if there is memory contention we must look at another metric, memory-cached.

memory-cached represents the amount of memory held in the operating system cache. It is common to see large portions of your total-memory being held by memory-cached. This is a feature, not a bug. It means that Postgres PIDs do not currently need additional memory and the operating system is using it to optimize read and write performance.

In the event that Postgres needs additional memory, the operating system will begin to free cached memory. When this happens you should see an upward spike in the memory-free metric as well as a similar downward spike in utilization for the memory-cached metric.

The operating system is not able to free up memory indefinitely though. After certain limits are reached the OS will no longer be able to move memory from the memory-cached bucket into the memory-free bucket. This why you may see some of the smaller Postgres instances run out of memory with close to half of their memory in the memory-cached bucket.

When this happens your Postgres instance will begin reporting Out of Memory (OOM) errors. These errors, as you would expect, indicate that the Postgres processes are no longer able to obtain the memory from memory-free that they are requesting.

If this happens frequently - you will want to consider upgrading your Postgres plan in order to take advantage of larger amounts of memory.

What is the best way to monitor memory pressure on an instance?

The best way to monitor memory pressure on a Postgres instance is to monitor the memory-free and memory-cached metrics. As discussed above, when a Postgres process requests more memory, the operating system will release cached memory to free memory where it can then be consumed by the process requesting it. When the operating system is no longer able to release cached memory and there is very small amounts of memory available in free memory, Postgres will throw "out of memory" errors. This can begin to happen when memory-cached + memory-free drops below ~50% of the databases memory-total. On smaller plans this may be skewed closer to 60% and on larger plans closer to 40%.

Why is my Postgres server out of memory with memory still available in memory-free and memory-cached?

Due to configuration (overcommit_ratio) on the server meant to protect Postgres from being killed unexpectedly, not all of the memory in the memory-cached is available to be used by Postgres processes directly.

This memory still benefits Postgres processes due to way that Postgres relies on the operating system's page cache when its own shared buffers cache is full.

Ask on Stack Overflow

Engage with a community of passionate experts to get the answers you need

Ask on Stack Overflow

Heroku Support

Create a support ticket and our support experts will get back to you

Contact Heroku Support