Excel: Find Duplicates.. Do Not Remove Them!


Find_DuplicateOne of the finer tasks in Excel is working with lists. When working with lists, one of the most common questions is “are there any duplicates”. Now most people know that there is a button at their disposal the famous “Remove Duplicates”. This is great but it just removes them. How about if you want to inspect the data before it is removed? There are various ways to find duplicates in your lists. Conditional formatting offers a great way to highlight duplicate values in your list. Personally I have been using a simple function to check for duplicates. It gives me control of what I can do when it finds duplicates and how to handle them. Let me show you what I mean.

Sample data
To give you an idea how to follow the steps I use some sample data from the old Northwind database. Just a couple of rows with data which I have enhanced with some duplicate values so you get the picture.

Find_duplicates_I

Sort the data and add an additional column
The first step I take is sorting the data. My preference is to use the Data tab because it has this great group of functionality you can use at once.

Find_duplicates_IISo sort the data and we are ready for step 2. The second step I take is adding an additional column which I call “Duplicate”.

Find_duplicates_III
Now the third step I take is I add an IF statement (function) in cell A2.
Find_duplicates_IVThe IF function is pretty straight forward. I compare the value of cell B2 with the value of cell B1. If it has the same value it should show “Duplicate”and otherwise leave the cell empty. Once you have entered the function press Enter and copy it down in column A. It should look something like this:

Find_duplicates_VAs you can see it skips the first new value and every value after that which has a duplicate value gets the word Duplicate in front of it.

Duplicates on more columns
I know what you are thinking. This is one column but what if I want to check more columns? Well basically you take the same steps as above. In the example above I checked for duplicates in the first column. Now I will check the first and second column for duplicates. Have a look at the picture below:

Find_duplicates_VI

As you can see I enhanced the function a bit. Using the & -sign I tied two values together to match the other two values that are tied. So basically I say look at the value of cell C2 and stitch the value of cell D2 to it. These combined values should be equal to the values of cell C1 & D1. If this is the same value then show “Duplicate” otherwise leave empty.

Finding a whole duplicate row
Knowing what we know now we can easily look for a duplicate row. Just concatenate all the values from every column using the & -sign. Then compare that with all the concatenated values of all cells from the row above. Copy that formula down and you are good to go.

Find_duplicates_VII

Concluding
Working with lists of data is currently one of the most common tasks to do at a job. When you are like me and working with lots of data having the knowledge to de-dup is mandatory. Find the best way that fits your needs. For me it is the simple IF-function that does the trick without having them removed straight away 🙂

 

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s