Issue
You want to capture a logical backup of a large (20GB+) Heroku Postgres database.
Resolution
Heroku PGBackups (ref: https://devcenter.heroku.com/articles/heroku-postgres-backups) is primarily intended for databases under moderate load, up to 20GB in size. If you wish to capture a logical backup of a database larger than this, or one under heavy load, consider the following suggestions.
Capture against a fork
Logical backups should be taken against a short-lived fork (ref: https://devcenter.heroku.com/articles/heroku-postgres-fork) of your Heroku Postgres database. This allows you to place the extra load produced by capturing a logical backup on the fork, without impacting your production database and followers.
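As a rough illustration, a short-lived fork can be created and destroyed from the Heroku CLI along these lines (the plan, attachment name, and app name below are placeholders; see the fork article for the exact flags):

heroku addons:create heroku-postgresql:standard-2 --fork DATABASE_URL --app example-app
# Wait for the fork to finish provisioning before dumping from it:
heroku pg:wait --app example-app
# Once the logical backup has been captured, destroy the fork:
heroku addons:destroy HEROKU_POSTGRESQL_SILVER --app example-app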
Capture from a large one-off dyno
For logical backups that are failing with pg:backups, you may wish to use pg_dump on a detached one-off dyno (ref: https://devcenter.heroku.com/articles/one-off-dynos) with sufficient compute power, such as a Performance-L (ref: https://devcenter.heroku.com/articles/dyno-types).
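As a sketch, the detached dyno might be started along the following lines. The app name example-app and the dump path are placeholders, and single quotes are used so that $DATABASE_URL expands on the dyno rather than locally:

heroku run:detached --size=performance-l --app example-app \
  'pg_dump -F c --no-acl --no-owner --quote-all-identifiers -f /tmp/backup.dump $DATABASE_URL'
# run:detached prints the dyno name (e.g. run.1234); follow its output with:
heroku logs --app example-app --dyno run.1234 --tail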
You can generate a dump that is the same as pg:backups with pg_dump -F c --no-acl --no-owner --quote-all-identifiers $DATABASE_URL. You can optionally use the --jobs flag to parallelise the dump operation (note that parallel dumps require the directory output format, -F d), and adjust the compression level with the --compress flag. More on these options can be found at https://www.postgresql.org/docs/current/app-pgdump.html.
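For example, a parallel dump and a faster, lightly compressed dump might look like this (paths and job count are illustrative):

# Directory-format dump using 4 parallel jobs:
pg_dump -F d --jobs=4 --no-acl --no-owner --quote-all-identifiers -f /tmp/backup_dir $DATABASE_URL
# Custom-format dump with a lower compression level (0-9) to trade file size for speed:
pg_dump -F c --compress=3 --no-acl --no-owner --quote-all-identifiers -f /tmp/backup.dump $DATABASE_URL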
If the backup will take longer than the one-off dyno timeout, you will need to either ensure that the session remains active while the backup is in progress, or use a detached dyno as described in the docs.
After the dump is complete, you will need to transfer it to long-term storage of your choice. When using one-off dynos, you have a limit of 24 hours to complete the dump and upload.
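As one hedged example, if the AWS CLI is available on the dyno (for instance via a buildpack) and credentials are set as config vars, the dump could be copied to S3 as follows (the bucket and paths are placeholders):

aws s3 cp /tmp/backup.dump s3://example-bucket/backups/backup.dump

Chaining the dump and the upload in a single detached dyno command (pg_dump ... && aws s3 cp ...) keeps both steps within the dyno's lifetime.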
Capture from a VPS/server
For logical backups that will take longer than 24 hours, or that otherwise exceed the limitations of a one-off dyno, you may wish to use a VPS, or a physical server should you own one. You will need to install the relevant Postgres client tools and provide the server with sufficient access to your Heroku Postgres instance.
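A minimal sketch from such a server, assuming the Heroku CLI and Postgres client tools are installed and using example-app as a placeholder:

pg_dump -F c --no-acl --no-owner --quote-all-identifiers \
  -f backup.dump "$(heroku config:get DATABASE_URL --app example-app)"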
Limitations
As pg_dump runs using a Heroku Postgres credential passed in the connection string, information stored in the system catalog, such as roles (ref: https://devcenter.heroku.com/articles/heroku-postgresql-credentials), is not included in the generated dump.
pg_dump performance suffers greatly when large objects (ref: https://www.postgresql.org/docs/current/largeobjects.html) are in use, or when there are very many database objects, typically due to a large number (50+) of schemas.
Alternatives
There are also third-party solutions in the Add-ons Marketplace that can help with this process.