Remove Highlight from PDF using Python

This article explains how to remove highlights from a PDF using Python. It covers setting up the IDE, a list of steps, and sample code showing how to remove highlights in a PDF using Python with different criteria. It helps you select particular highlights, or all highlights, for deletion.

Steps to Remove Highlight on PDF using Python

  1. Set up the environment to use Aspose.PDF for Python via .NET for removing highlights
  2. Load the sample input PDF, which contains multiple highlighted text passages, into a Document object
  3. Iterate through all the pages in the PDF and collect the target annotations
  4. To remove all Highlight annotations, mark every Highlight annotation for removal
  5. To remove selected highlights, mark only those annotations that match the target color
  6. Remove the annotations marked for deletion from each page using the delete() method of the annotations collection
  7. Save the output PDF file

These steps summarize how to delete highlights in a PDF using Python. Set up the environment, load the source PDF, iterate through its pages, access the collection of annotations on each page, and mark annotations for deletion according to the selected criteria. Finally, delete the marked annotations from each page and save the resulting PDF file to disk.
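Marking annotations first and deleting them in a second pass is a common way to avoid modifying a collection while it is being iterated. The short sketch below illustrates the idea with a plain Python list standing in for a page's annotations. The function names are hypothetical, and the library's own collection may behave differently from a list when modified during iteration.

```python
def remove_while_iterating(annotations):
    """Delete matches inside the loop (shown only to illustrate the pitfall)."""
    items = list(annotations)
    for item in items:
        if item == "highlight":
            # Removing shifts the remaining elements, so the loop skips one.
            items.remove(item)
    return items


def mark_then_delete(annotations):
    """Collect matches first, then delete them in a separate pass."""
    items = list(annotations)
    marked = [item for item in items if item == "highlight"]  # pass 1: mark
    for item in marked:                                       # pass 2: delete
        items.remove(item)
    return items


sample = ["highlight", "highlight", "note", "highlight"]
print(remove_while_iterating(sample))  # ['note', 'highlight'], one highlight survives
print(mark_then_delete(sample))        # ['note']
```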

Code to Remove PDF Highlight using Python
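A minimal sketch of such code follows, assuming the `aspose.pdf` package is installed. The function names `to_hex` and `remove_highlights` are hypothetical, and reading the annotation color through `annotation.color.to_rgb()` with `r`, `g`, and `b` components is an assumption that should be checked against the installed version of the library.

```python
def to_hex(r, g, b):
    """Format 0-255 RGB components as an upper-case '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


def remove_highlights(input_path, output_path, target_hex=None):
    """Remove all highlights, or only those matching target_hex ('#RRGGBB')."""
    # Imported lazily so to_hex stays usable without the package installed.
    import aspose.pdf as ap

    wanted = target_hex.upper() if target_hex else None
    document = ap.Document(input_path)
    removed = 0

    for page in document.pages:
        # Pass 1: mark the annotations that meet the criteria.
        marked = []
        for annotation in page.annotations:
            if annotation.annotation_type != ap.annotations.AnnotationType.HIGHLIGHT:
                continue
            if wanted is not None:
                color = annotation.color
                if color is None:
                    continue
                # Assumption: to_rgb() returns a color exposing r, g, b in 0-255.
                rgb = color.to_rgb()
                if to_hex(rgb.r, rgb.g, rgb.b) != wanted:
                    continue
            marked.append(annotation)

        # Pass 2: delete the marked annotations from this page.
        for annotation in marked:
            page.annotations.delete(annotation)
            removed += 1

    document.save(output_path)
    return removed
```

Calling `remove_highlights("input.pdf", "output.pdf")` would remove every highlight, while passing `target_hex="#FFFF00"` would limit removal to yellow ones. Both file names are placeholders.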

The code above demonstrates how to remove PDF highlights using Python. Note that the target color must be specified in HEX format. You can also filter the highlights by page number or by other properties exposed by the Page class.
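The color and page-number criteria can be combined in a small, library-independent helper. The sketch below is one possible way to do it. The names `parse_hex` and `should_remove` are hypothetical, and the caller is assumed to supply the 1-based page number and the annotation color as 0-255 RGB components.

```python
import string


def parse_hex(color):
    """Convert '#RRGGBB', 'RRGGBB' or shorthand '#RGB' to an (r, g, b) tuple."""
    text = color.strip().lstrip("#")
    if len(text) == 3:
        # Expand shorthand such as 'ff0' to 'ffff00'.
        text = "".join(ch * 2 for ch in text)
    if len(text) != 6 or any(ch not in string.hexdigits for ch in text):
        raise ValueError("not a HEX color: {!r}".format(color))
    value = int(text, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def should_remove(page_number, rgb, target_hex=None, pages=None):
    """Decide whether a highlight meets the page and color criteria."""
    if pages is not None and page_number not in pages:
        return False
    if target_hex is not None and tuple(rgb) != parse_hex(target_hex):
        return False
    return True


# A yellow highlight on page 2, restricted to pages 2 and 3.
print(should_remove(2, (255, 255, 0), target_hex="#ffff00", pages={2, 3}))  # True
```

Leaving both `target_hex` and `pages` unset makes every highlight match, which corresponds to removing all highlights.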

This article has shown how to erase highlights from text in a PDF. To convert a PS file to PDF, refer to the article Convert a PS file to PDF using Python.