How do I rebuild a UNIQUE index if there are duplicate values?

Issue

You need to rebuild a corrupted UNIQUE index, in which the UNIQUE constraint may have been violated.

Resolution

How do I know if the UNIQUE constraint on my index has been violated?

The simplest way to determine whether this has occured is to attempt to rebuild the index normally. There are two approaches:

For smaller indexes, or if it's possible to schedule downtime, using the REINDEX approach described in this article: https://help.heroku.com/RKRZV2NO/how-do-i-fix-a-corrupted-index
For larger indexes where REINDEX might take long enough to cause downtime, it's possible to use the approach described in this article to create a replacement index CONCURRENTLY: https://help.heroku.com/H3LU3G7Z/how-do-i-rebuild-a-non-unique-corrupted-index

The procedure described below is only necessary if rebuilding a UNIQUE index fails due to an error about a duplicate key.

How do I rebuild a Unique Index if there are duplicate rows?

If the uniqueness constraint has been violated you will be unable to use REINDEX. Due to this constraint, resolving corruption involves manual intervention.

The general methodology will be to create two new tables. One table, without a uniqueness constraint, to house the corrupted data. Another table to house the duplicates as they are found.

Note that the approach described below will consume at least as much additional disk space as the original table occupied, and for large tables may take some time to carry out during which there will be a lot of disk IO.

It's also important to note that writes to the existing table should be avoided during this procedure, otherwise there is a risk that records written to the original table after data is copied to the new table will be lost.

Creating your tables

Create a table to house all affected records:

CREATE TABLE <new_table> AS TABLE <affected_table>;

Create a table to store duplicates:

CREATE TABLE duplicates ( LIKE <affected_table> );

Remove Duplicates

Copy duplicates into duplicates table

INSERT INTO duplicates (SELECT * FROM <new_table> WHERE <id> IN (SELECT <id> FROM <new_table> GROUP BY (<id>) HAVING COUNT(*) > 1));

Delete Duplicates

DELETE from <new_table> WHERE <id> in (SELECT <id> FROM duplicates);

Recreate indexes on

Here you will find all of the indexes on the previous table and recreate them.

Query to find all indexes on the old table. Please be sure to replace your table name with the placehold <old-table>

SELECT tablename, indexname, indexdef
FROM
    pg_indexes
WHERE
    schemaname = 'public' and tablename = '<old-table>'
ORDER BY
    tablename,
    indexname;

You can then run the generated query in the indexdef column to recreate the index(es) on the new table.

Example

CREATE UNIQUE INDEX <index-name> ON <schema>.<old-table> USING btree (<column>)

Would become

CREATE UNIQUE INDEX <index-name> ON <schema>.<new-table> USING btree (<column>)

** Please note that we have changed the definition to run a create against the <new-table> as opposed to the <old-table>

If your index is a primary key index, you will need to remove the old primary key constraint and recreate it using the new index you just created.
You should do this inside of a transaction to avoid any writes being performed on the table while you are recreating the primary key:

BEGIN;
-- alter table to drop the old index
ALTER TABLE <table_name> DROP CONSTRAINT <table_name>_pkey;
-- then add the new constraint
ALTER TABLE <table_name> ADD PRIMARY KEY USING INDEX <new-index-name>;
COMMIT;