This quick tutorial explains how to find and replace text in a PDF using Python. It covers configuring the IDE, a detailed step-by-step process, and runnable sample code to find and replace a word in a PDF using Python. You will also learn how to search and replace text on all the pages of a PDF or on a particular page, depending on your application requirements.
Steps to Find and Replace in PDF using Python
- Set the environment to use Aspose.PDF for Python via .NET to replace the text
- Load the target PDF file, in which the text is to be searched and replaced, using the Document class object
- Define the text that is to be searched using the TextFragmentAbsorber class object
- Apply the TextFragmentAbsorber to all the pages in the PDF using the Document.pages.accept() method
- Get access to the collection of all the searched items in the PDF through the TextFragmentAbsorber.text_fragments property
- Iterate through all the searched text fragments and set new values as per your requirements
- Save the PDF file with the replaced text to disk
These steps summarize the process to find and replace all occurrences of a text in a PDF using Python. A TextFragmentAbsorber object is created with the string to be searched, and then the Document.pages.accept() method is called to parse all the pages in the PDF and collect the text fragments containing the target word. Once the collection of found fragments is ready, you can replace all or selected fragments with new text as per your needs.
Code to Find and Replace Text in PDF using Python
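The following is a minimal sketch of the steps listed above, assuming the aspose-pdf package is installed. The function name replace_text_in_pdf, its parameters, and the returned count are illustrative choices and not part of the library. The import sits inside the function only so that the file can be loaded on a machine where the package is missing.

```python
def replace_text_in_pdf(input_path, output_path, search_text, new_text):
    """Replace every occurrence of search_text in a PDF; return the match count."""
    # Assumption: Aspose.PDF for Python via .NET is installed
    # (commonly with: pip install aspose-pdf).
    import aspose.pdf as ap

    # Load the PDF in which the text is to be searched and replaced
    document = ap.Document(input_path)

    # Define the text to search for
    absorber = ap.text.TextFragmentAbsorber(search_text)

    # Run the search over all the pages of the document
    document.pages.accept(absorber)

    # Walk through the found fragments and set the new text on each one
    replaced = 0
    for fragment in absorber.text_fragments:
        fragment.text = new_text
        replaced += 1

    # Save the updated PDF to disk
    document.save(output_path)
    return replaced
```

For example, replace_text_in_pdf("Input.pdf", "Output.pdf", "Aspose", "Aspose.Total") would write the modified file to Output.pdf; the file names and phrases here are placeholders.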
This code demonstrates how to implement the PDF search and replace feature using Python. It uses the Document.pages.accept() method to search text in the entire PDF; however, if you want to search and replace text on a particular page only, you may select the page by providing the page index in the Document.pages collection and then call the Page.accept() method. You may also pass a TextSearchOptions class object as the second argument while instantiating the TextFragmentAbsorber object to customize the search operation.
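The single-page variant with search options might look like the sketch below. It rests on several assumptions: the function name and parameters are hypothetical, the Document.pages collection is taken to be 1-based, and TextSearchOptions(True) is taken to switch on regular-expression matching. Check these against the library reference for your installed version.

```python
def replace_text_on_page(input_path, output_path, pattern, new_text, page_number=1):
    """Replace regex matches on one page of a PDF; return the match count."""
    # Assumption: Aspose.PDF for Python via .NET is installed.
    import aspose.pdf as ap

    document = ap.Document(input_path)

    # Assumption: True enables regular-expression search
    options = ap.text.TextSearchOptions(True)

    # Pass the options as the second argument to customize the search
    absorber = ap.text.TextFragmentAbsorber(pattern, options)

    # Search only the selected page (assumed 1-based index)
    document.pages[page_number].accept(absorber)

    # Replace the text of each fragment found on that page
    replaced = 0
    for fragment in absorber.text_fragments:
        fragment.text = new_text
        replaced += 1

    document.save(output_path)
    return replaced
```

Searching one page instead of the whole document keeps fragments on other pages untouched, which is useful when the same word must stay unchanged elsewhere in the file.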
This article has taught us to find and replace text in a PDF. If you want to learn how to find and highlight text in a PDF, refer to the article on how to highlight in PDF using Python.