You want to capture a logical backup of a large (20GB+) Heroku Postgres database.
Heroku PGBackups are primarily intended for databases under moderate load, up to 20GB in size. If you wish to capture a logical backup for databases larger than this, or under heavy load, please consider the following suggestions.
Capture against a fork
Logical backups should be taken against a short-lived fork of your Heroku Postgres database (see: Forking your Database). This allows you to place the extra load produced by capturing a logical backup on the fork, without impacting your production database and followers.
Capture from a large one-off dyno
Note that a one-off dyno will terminate once the process it is running has completed, so a detached dyno will need to run a script that handles all necessary tasks (including transferring the backup to persistent storage) rather than calling
You can generate a dump that is the same as
pg_dump -F c --no-acl --no-owner --quote-all-identifiers $DATABASE_URL
You can optionally use the
--jobs flag to parallelise the dump operation, although this requires changing the output format to "directory" by using the
-F d argument instead of
-F c. You can also adjust the amount of compression with the
--compress flag. For further details consult the PostgreSQL documentation for
If the backup will take longer than the one-off dyno timeout then you will need to either ensure that the session remains active while the backup is in progress, or use a detached dyno as described in the docs.
After the dump is complete, you will need to transfer the dump to long term storage of your choice. When using one-off dynos, you will have a limit of 24 hours to complete the dump and upload.
Capture from a VPS/server
For logical backups that will take longer than 24 hours, or otherwise exceed the limitations of a one-off dyno, you may wish to consider using a VPS, or a physical server should you own one. You will need to install the relevant Postgres tools and provide sufficient access to your Heroku Postgres instance.
pg_dump runs using a Heroku Postgres credential when passing the connection string, information stored in the system catalog, like roles (ref: https://devcenter.heroku.com/articles/heroku-postgresql-credentials) are not included in the generated dump.
pg_dump performance suffers greatly when large objects (ref: https://www.postgresql.org/docs/current/largeobjects.html) are being used, or if there are very many database objects, normally due to a large number (50+) of schema.
There are also third-party solutions in the Add-ons Marketplace that can help with this process.