Finding records > Finding duplicate values using a self-join relationship
 

Finding duplicate values using a self-join relationship

This procedure uses a self-join relationship and a calculation field referencing the relationship to identify all duplicate records except the first instance, according to the order in which the records were created. Once this system is set up, duplicate records will be identified as they are created.

To find duplicate records except the first instance:

1. If you plan to delete the duplicate records that you find, make a backup copy of the file.

See Saving and copying files.

2. Identify a field that determines a unique entity in your file.

For example, in a Contacts database, use an Employee ID field. You can also create a calculation field with a calculation (returning a text result) that combines data from several fields to make a unique identifier. For example, First Name & Last Name & Phone Number. Don’t use a repeating field.

3. Define a self-join relationship.

Use your chosen identifying field as the match field in both tables in the relationship. See Creating and changing relationships.

4. Create a new calculation field named Check Duplicate with the formula:

If(Count(<name of the table’s second occurrence>::<match field>) > 1; "Duplicates"; "Unique")

5. Click Show All in the status toolbar.

6. Perform a find for Duplicates in the Check Duplicates field.

All records with duplicates are marked Duplicates.