Issue
You want to capture a logical backup of a large (20GB+) Heroku Postgres database.
Resolution
Heroku PGBackups are primarily intended for databases under moderate load, up to 20GB in size. If you wish to capture a logical backup for databases larger than this, or under heavy load, please consider the following suggestions.
Capture against a fork
Logical backups should be taken against a short-lived fork of your Heroku Postgres database (see: Forking your Database). This allows you to place the extra load produced by capturing a logical backup on the fork, without impacting your production database and followers.
Capture from a large one-off dyno
For logical backups that are failing using pg:backups
, you may wish to consider using pg_dump
on a detached one-off dyno of sufficient compute power, like a Performance-L (see: Dyno types).
Note that a one-off dyno will terminate once the process it is running has completed, so a detached dyno will need to run a script that handles all necessary tasks (including transferring the backup to persistent storage) rather than calling pg_dump
directly.
You can generate a dump that is the same as pg:backups
with:
pg_dump -F c \
--no-acl \
--no-owner \
--quote-all-identifiers \
--verbose \
--file backup.dump \
--exclude-schema=heroku_ext \
$DATABASE_URL
You can optionally use the --jobs
flag to parallelise the dump operation, although this requires changing the output format to "directory" by using the -F d
argument instead of -F c
. You can also adjust the amount of compression with the --compress
flag. For further details consult the PostgreSQL documentation for pg_dump
.
If the backup will take longer than the one-off dyno timeout then you will need to either ensure that the session remains active while the backup is in progress, or use a detached dyno as described in the docs.
After the dump is complete, you will need to transfer the dump to long term storage of your choice. When using one-off dynos, you will have a limit of 24 hours to complete the dump and upload.
Capture from a VPS/server
For logical backups that will take longer than 24 hours, or otherwise exceed the limitations of a one-off dyno, you may wish to consider using a VPS, or a physical server should you own one. You will need to install the relevant Postgres tools and provide sufficient access to your Heroku Postgres instance.
Limitations
As pg_dump
runs using a Heroku Postgres credential when passing the connection string, information stored in the system catalog, like roles (ref: https://devcenter.heroku.com/articles/heroku-postgresql-credentials) are not included in the generated dump.
pg_dump
performance suffers greatly when large objects (ref: https://www.postgresql.org/docs/current/largeobjects.html) are being used, or if there are very many database objects, normally due to a large number (50+) of schema.
Alternatives
There are also third-party solutions in the Add-ons Marketplace that can help with this process.