Microsoft Excel 2010: Removing Duplicates, Consolidating Data

Removing Duplicates

After a cell is selected in the dataset, the Remove Duplicates tool can be found under Data, Data Tools, Remove Duplicates or, if the dataset is a table, under Table Tools, Design, Tools, Remove Duplicates.

The tool permanently deletes rows from the dataset based on the columns chosen in the Remove Duplicates dialog. Unlike a filter, it does not simply hide the rows. For this reason, you might want to copy the data before removing duplicates.

Removing Duplicates from a Dataset

To remove duplicates from a dataset, follow these steps:

1. Select a cell in the dataset.

2. Go to Data, Data Tools, Remove Duplicates.

3. Excel will highlight the dataset. If columns are missing from the selection, go back and make sure no blank columns separate the data.

4. In the Remove Duplicates dialog, make sure My Data Has Headers is selected if the dataset has headers.

5. By default, all the columns are selected. A selected column is one the tool will compare when looking for duplicates. Duplicates in an unselected column will be ignored. Select the columns to search for duplicates.

6. Click OK. The dataset will update, deleting any duplicate rows. A message box will appear informing you of the number of rows deleted and the number remaining in the dataset.
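The effect of selecting columns in the dialog can be illustrated outside Excel. The following is a minimal Python sketch, not Excel's own implementation: the function name `remove_duplicates`, the sample rows, and the column names are hypothetical. It assumes that the first occurrence of each combination is the one kept and that values are compared exactly.

```python
def remove_duplicates(rows, key_columns):
    """Return (kept_rows, removed_count), keeping the first row for each key.

    Only the columns named in key_columns are compared, mirroring the
    checked columns in the Remove Duplicates dialog.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            continue  # a row with the same key was already kept
        seen.add(key)
        kept.append(row)
    return kept, len(rows) - len(kept)


# Hypothetical sample data.
rows = [
    {"Region": "East", "Product": "Widget", "Qty": 10},
    {"Region": "East", "Product": "Widget", "Qty": 25},
    {"Region": "West", "Product": "Widget", "Qty": 10},
    {"Region": "East", "Product": "Widget", "Qty": 10},
]

# All columns selected: only fully identical rows count as duplicates.
kept_all, removed_all = remove_duplicates(rows, ["Region", "Product", "Qty"])

# Qty unselected: differences in Qty are ignored.
kept_some, removed_some = remove_duplicates(rows, ["Region", "Product"])

print(f"{removed_all} removed, {len(kept_all)} remain")
print(f"{removed_some} removed, {len(kept_some)} remain")
```

With Qty unselected, the East/Widget row holding 25 is deleted along with the exact duplicate, which is why copying the data first is worthwhile.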


Consolidating Data

You can use the Consolidate tool, found under Data, Data Tools, to combine data in three ways:

  • By Position— Sum data found on different sheets or in different workbooks based on their positions in the datasets. For example, if the ranges are A1:A10 and C220:C230, the results will be A1+C220, A2+C221, A3+C222, and so on. Do not select either of the options under Use Labels In.

    Note: The function applied to the data can be any available from the drop-down list in the Consolidate dialog. These include Sum, Count, Average, and more.

  • By Category— Sum data found on different sheets or in different workbooks based on matching row and column labels, similar to a pivot table report. The references must include the labels in the leftmost column of the ranges. Select either or both of the options under Use Labels In to have the labels appear in the final data.

  • By Column— Combine the data to a new sheet, with each dataset in its own column. Select the Top Row option under Use Labels In.
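As a rough illustration of consolidating by position, the Python sketch below combines ranges cell by cell, with no labels involved. The names `consolidate_by_position` and `FUNCTIONS` and the sample values are hypothetical, only three of the dialog's functions are modeled, and the sketch assumes every range holds the same number of cells.

```python
from statistics import mean

# A small subset of the functions offered in the Consolidate dialog.
FUNCTIONS = {"Sum": sum, "Count": len, "Average": mean}


def consolidate_by_position(ranges, function="Sum"):
    """Combine ranges cell by cell: first with first, second with second."""
    if len({len(r) for r in ranges}) != 1:
        raise ValueError("this sketch assumes ranges of equal length")
    func = FUNCTIONS[function]
    # zip(*ranges) groups the values that share a position.
    return [func(values) for values in zip(*ranges)]


# Hypothetical values standing in for two ranges on different sheets.
range_one = [10, 20, 30]
range_two = [1, 2, 3]

totals = consolidate_by_position([range_one, range_two], "Sum")
averages = consolidate_by_position([range_one, range_two], "Average")
print(totals)
print(averages)
```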

The Reference field is where the datasets are entered. Click Add to add the selection to the All References list. If the dataset is in a closed workbook, you can reference it only by using a range name. Click the Browse button to find and select the workbook. After the exclamation point (!) at the end of the path, enter the range name assigned to the dataset.

Create Links to Source Data applies only to external workbooks. If this option is selected, the consolidated data will update automatically when the source is changed. Also, the consolidated data will be grouped, as shown in Figure 1. Click the + icon to the left of the data to open the group and see the data used in the summary.

Figure 1. When linking to an external workbook, the consolidation will include the individual values of the selected references.

Consolidating Duplicate Data by Category

To combine duplicate data from within the same workbook, follow these steps:

1. Select the top-left cell of the area where the consolidated report should be placed. If other data is on the sheet, make sure there is enough room for the new data.

2. Go to Data, Data Tools, Consolidate.

3. Select the desired function from the Function drop-down.

4. Place the cursor in the Reference field.

5. Go to the sheet with the desired dataset.

6. Select the dataset, making sure the duplicated labels are in the leftmost column and that the column headers are included in the selection.

7. Click the Add button.

8. Repeat steps 4 to 7 for each additional dataset.

9. To include the top row and/or left column labels, select the corresponding option under Use Labels In.

10. Click OK.
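The result of these steps can be approximated in a short Python sketch. It is an illustration under stated assumptions, not a description of how Excel works internally: the function name `consolidate_by_category` and the sample tables are hypothetical, each table is assumed to have a header row and its labels in the first column, and a label with no value for a column is written as 0.

```python
def consolidate_by_category(datasets, function=sum):
    """Merge tables by matching row labels and column headers.

    Each dataset is a list of rows: the first row holds the column
    headers, and the first item of every other row is its label.
    """
    collected = {}   # label -> header -> list of values found
    row_order = []   # labels in order of first appearance
    col_order = []   # headers in order of first appearance
    for table in datasets:
        headers = table[0][1:]
        for header in headers:
            if header not in col_order:
                col_order.append(header)
        for row in table[1:]:
            label = row[0]
            if label not in collected:
                collected[label] = {}
                row_order.append(label)
            for header, value in zip(headers, row[1:]):
                collected[label].setdefault(header, []).append(value)
    result = [[""] + col_order]
    for label in row_order:
        # With the default sum, a missing label/header pair yields 0.
        values = [function(collected[label].get(h, [])) for h in col_order]
        result.append([label] + values)
    return result


# Hypothetical datasets; "Apples" appears twice in the first one.
sheet_one = [
    ["Product", "Q1", "Q2"],
    ["Apples", 10, 20],
    ["Pears", 5, 7],
    ["Apples", 1, 2],
]
sheet_two = [
    ["Product", "Q2", "Q3"],
    ["Pears", 3, 4],
    ["Plums", 8, 9],
]

report = consolidate_by_category([sheet_one, sheet_two])
for line in report:
    print(line)
```

The duplicated Apples rows collapse into one row, and values are matched by header rather than by position, so Q2 from both sheets lands in the same column.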