Remove Duplicate Rows in Excel with Python

This quick guide explains how to remove duplicate rows in Excel with Python. It covers setting up the development environment, lists the steps to write the application, and provides sample code that eliminates duplicate rows in Excel with Python. It also shows how different parameters control which rows are removed.

Steps to Delete Duplicate Lines in Excel with Python

Set the IDE to use Aspose.Cells for Python via Java to remove duplicate rows
Load the Excel file into the Workbook object
Access the Cells collection from the selected sheet
Call the removeDuplicates() method to remove all duplicate rows from a sheet
Call the removeDuplicates() method with a defined range to eliminate duplicate lines
Call the removeDuplicates() method with a range of cells, a header flag, and the target columns
Save the output

The above steps explain how to delete duplicate entries in Excel with Python. Begin by loading the source Excel file into a Workbook object, then call the variant of the removeDuplicates() method that fits your data. Save the output Excel file after deleting the repeated rows.
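The range-based steps take numeric row and column indexes rather than cell addresses. As a rough aid, the sketch below converts such indexes to an A1-style address so a range can be checked against the sheet. It assumes the arguments are zero-based and ordered as start row, start column, end row, end column; the helper names column_letters and to_a1_range are hypothetical and are not part of Aspose.Cells.

```python
def column_letters(index):
    """Convert a zero-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1  # switch to one-based for the base-26 conversion
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1_range(start_row, start_col, end_row, end_col):
    """Build an A1-style range from zero-based indexes (assumed argument order)."""
    start = f"{column_letters(start_col)}{start_row + 1}"
    end = f"{column_letters(end_col)}{end_row + 1}"
    return f"{start}:{end}"


# Under the assumptions above, the arguments (0, 7, 5, 10) map to H1:K6.
print(to_a1_range(0, 7, 5, 10))
```

Under the same assumptions, the arguments (0, 0, 6, 3) correspond to A1:D7.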

Code to Delete Repeated Rows in Excel with Python

import jpype
import asposecells as cells

# Start the JVM before importing anything from asposecells.api
jpype.startJVM()
from asposecells.api import License, Workbook

# Instantiate a license
license = License()
license.setLicense("License.lic")

# Load the source workbook
book = Workbook("removeduplicates.xlsx")

# Remove duplicates from the entire sheet (here, the second worksheet)
book.getWorksheets().get(1).getCells().removeDuplicates()

# Remove duplicates from the defined range
book.getWorksheets().get(0).getCells().removeDuplicates(0, 7, 5, 10)

# Remove duplicates based on data from the selected columns
cols = [0, 3]
book.getWorksheets().get(0).getCells().removeDuplicates(0, 0, 6, 3, True, cols)

# Save the result
book.save("removeduplicates-result.xlsx")
print("Duplicate rows removed successfully")


This code shows how to delete duplicate records in Excel with Python. Called without arguments, the removeDuplicates() method removes all repeated rows from the target sheet. The second overload takes the starting and ending cells of the range from which duplicates are removed. The third overload takes a range of cells, a flag indicating whether the data has a header row, and the list of column indexes to compare within the given range.
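To make the effect of the header flag and the column list easier to picture, here is a minimal pure-Python sketch that works on a list of rows instead of a workbook. It is not Aspose.Cells code: the function dedupe_rows and the sample data are hypothetical, and it assumes the first occurrence of each duplicate is the one kept, as Excel's own Remove Duplicates command does.

```python
def dedupe_rows(rows, has_headers=False, column_offsets=None):
    """Return rows with later duplicates dropped, keeping first occurrences."""
    # Set the header row aside so it is never compared with the data rows.
    header, body = (rows[:1], rows[1:]) if has_headers else ([], rows)
    seen = set()
    kept = []
    for row in body:
        # Compare whole rows unless specific column offsets are given.
        if column_offsets is None:
            key = tuple(row)
        else:
            key = tuple(row[i] for i in column_offsets)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return header + kept


data = [
    ["Name", "City", "Year", "Team"],
    ["Ann", "Oslo", 2020, "A"],
    ["Ann", "Rome", 2021, "A"],
    ["Bob", "Oslo", 2020, "B"],
    ["Ann", "Oslo", 2020, "A"],
]

# Whole-row comparison drops only the exact repeat in the last row.
whole_row = dedupe_rows(data, has_headers=True)

# Comparing only columns 0 and 3 also drops the second "Ann" row.
by_columns = dedupe_rows(data, has_headers=True, column_offsets=[0, 3])
```

Comparing fewer columns generally removes more rows, because rows that differ only in the ignored columns count as duplicates.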

This article has shown how to eliminate duplicate entries in Excel with Python. To remove formulas from an Excel file, refer to the article on how to remove formula in Excel using Python.
