Issue
You want to capture a logical backup of a large (20GB+) Heroku Postgres database.
Resolution
Heroku PGBackups (ref: https://devcenter.heroku.com/articles/heroku-postgres-backups) is primarily intended for databases under moderate load, up to 20GB in size. If you wish to capture a logical backup of a database larger than this, or one under heavy load, consider the following suggestions.
Capture against a fork
Logical backups should be taken against a short-lived fork (ref: https://devcenter.heroku.com/articles/heroku-postgres-fork) of your Heroku Postgres database. This allows you to place the extra load produced by capturing a logical backup on the fork, without impacting your production database and followers.
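As a rough illustration, a short-lived fork can be created and destroyed from the Heroku CLI along these lines (the plan, attachment name, and app name below are placeholders; see the fork article for the exact flags):

heroku addons:create heroku-postgresql:standard-2 --fork DATABASE_URL --app example-app
# Wait for the fork to finish provisioning before dumping from it:
heroku pg:wait --app example-app
# Once the logical backup has been captured, destroy the fork:
heroku addons:destroy HEROKU_POSTGRESQL_SILVER --app example-app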
Capture from a large one-off dyno
For logical backups that are failing with pg:backups, you may wish to use pg_dump on a detached one-off dyno (ref: https://devcenter.heroku.com/articles/one-off-dynos) with sufficient compute power, such as a Performance-L (ref: https://devcenter.heroku.com/articles/dyno-types).
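As a sketch, the detached dyno might be started along the following lines. The app name example-app and the dump path are placeholders, and single quotes are used so that $DATABASE_URL expands on the dyno rather than locally:

heroku run:detached --size=performance-l --app example-app \
  'pg_dump -F c --no-acl --no-owner --quote-all-identifiers -f /tmp/backup.dump $DATABASE_URL'
# run:detached prints the dyno name (e.g. run.1234); follow its output with:
heroku logs --app example-app --dyno run.1234 --tail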
You can generate a dump that is the same as pg:backups with pg_dump -F c --no-acl --no-owner --quote-all-identifiers $DATABASE_URL. You can optionally use the --jobs flag to parallelise the dump operation (note that parallel dumps require the directory output format, -F d), and adjust the compression level with the --compress flag. More on these options can be found at https://www.postgresql.org/docs/current/app-pgdump.html.
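For example, a parallel dump and a faster, lightly compressed dump might look like this (paths and job count are illustrative):

# Directory-format dump using 4 parallel jobs:
pg_dump -F d --jobs=4 --no-acl --no-owner --quote-all-identifiers -f /tmp/backup_dir $DATABASE_URL
# Custom-format dump with a lower compression level (0-9) to trade file size for speed:
pg_dump -F c --compress=3 --no-acl --no-owner --quote-all-identifiers -f /tmp/backup.dump $DATABASE_URL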
If the backup will take longer than the one-off dyno timeout, you will need to either ensure that the session remains active while the backup is in progress, or use a detached dyno as described in the docs.
After the dump is complete, you will need to transfer it to long-term storage of your choice. When using one-off dynos, you have a limit of 24 hours to complete the dump and upload.
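As one hedged example, if the AWS CLI is available on the dyno (for instance via a buildpack) and credentials are set as config vars, the dump could be copied to S3 as follows (the bucket and paths are placeholders):

aws s3 cp /tmp/backup.dump s3://example-bucket/backups/backup.dump

Chaining the dump and the upload in a single detached dyno command (pg_dump ... && aws s3 cp ...) keeps both steps within the dyno's lifetime.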
Capture from a VPS/server
For logical backups that will take longer than 24 hours, or that otherwise exceed the limitations of a one-off dyno, you may wish to use a VPS, or a physical server should you own one. You will need to install the relevant Postgres client tools and provide the server with sufficient access to your Heroku Postgres instance.
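A minimal sketch from such a server, assuming the Heroku CLI and Postgres client tools are installed and using example-app as a placeholder:

pg_dump -F c --no-acl --no-owner --quote-all-identifiers \
  -f backup.dump "$(heroku config:get DATABASE_URL --app example-app)"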
Limitations
As pg_dump runs using a Heroku Postgres credential passed in the connection string, information stored in the system catalog, such as roles (ref: https://devcenter.heroku.com/articles/heroku-postgresql-credentials), is not included in the generated dump.
pg_dump performance suffers greatly when large objects (ref: https://www.postgresql.org/docs/current/largeobjects.html) are in use, or when there are very many database objects, typically due to a large number (50+) of schemas.
Alternatives
There are also third-party solutions in the Add-ons Marketplace that can help with this process.