How do I rebuild a UNIQUE index if there are duplicate values?

Issue

You need to rebuild a corrupted UNIQUE index, in which the UNIQUE constraint may have been violated.

Resolution

How do I know if the UNIQUE constraint on my index has been violated?

The simplest way to determine whether this has occured is to attempt to rebuild the index normally. There are two approaches:

The procedure described below is only necessary if rebuilding a UNIQUE index fails due to an error about a duplicate key.

How do I rebuild a Unique Index if there are duplicate rows?

If the uniqueness constraint has been violated you will be unable to use REINDEX. Due to this constraint, resolving corruption involves manual intervention.

The general methodology will be to create two new tables. One table, without a uniqueness constraint, to house the corrupted data. Another table to house the duplicates as they are found.

Note that the approach described below will consume at least as much additional disk space as the original table occupied, and for large tables may take some time to carry out during which there will be a lot of disk IO.

It's also important to note that writes to the existing table should be avoided during this procedure, otherwise there is a risk that records written to the original table after data is copied to the new table will be lost.

Creating your tables

Create a table to house all affected records:

CREATE TABLE <new_table> AS TABLE <affected_table>;

Create a table to store duplicates:

CREATE TABLE duplicates ( LIKE <affected_table> );

Remove Duplicates

Copy duplicates into duplicates table

INSERT INTO duplicates (SELECT * FROM <new_table> WHERE <id> IN (SELECT <id> FROM <new_table> GROUP BY (<id>) HAVING COUNT(*) > 1));

Delete Duplicates

DELETE from <new_table> WHERE <id> in (SELECT <id> FROM duplicates);

Recreate indexes on

Here you will find all of the indexes on the previous table and recreate them.

Query to find all indexes on the old table. Please be sure to replace your table name with the placehold <old-table>

SELECT tablename, indexname, indexdef
FROM
    pg_indexes
WHERE
    schemaname = 'public' and tablename = '<old-table>'
ORDER BY
    tablename,
    indexname;

You can then run the generated query in the indexdef column to recreate the index(es) on the new table.

Example

CREATE UNIQUE INDEX <index-name> ON <schema>.<old-table> USING btree (<column>)

Would become

CREATE UNIQUE INDEX <index-name> ON <schema>.<new-table> USING btree (<column>)

** Please note that we have changed the definition to run a create against the <new-table> as opposed to the <old-table>

If your index is a primary key index, you will need to remove the old primary key constraint and recreate it using the new index you just created.
You should do this inside of a transaction to avoid any writes being performed on the table while you are recreating the primary key:

BEGIN;
-- alter table to drop the old index
ALTER TABLE <table_name> DROP CONSTRAINT <table_name>_pkey;
-- then add the new constraint
ALTER TABLE <table_name> ADD PRIMARY KEY USING INDEX <new-index-name>;
COMMIT;

I have questions, where can I read more?

We have an article answering FAQs and providing more detail about index corruption that occurred in March 2019:

Frequently Asked Questions about index corruption

Ask on Stack Overflow

Engage with a community of passionate experts to get the answers you need

Ask on Stack Overflow

Heroku Support

Create a support ticket and our support experts will get back to you

Contact Heroku Support