This short guide explains how to clean metadata from a PDF using Python. It covers setting up the development environment, lists the programming steps, and provides runnable sample code to strip PDF metadata with Python. It shows how to remove the default properties one by one and all custom properties at once.
Steps to Remove Metadata from PDF using Python
- Set up the environment to use Aspose.PDF for Python via .NET
- Load the source PDF file into a Document object
- Optionally, create a function to display the metadata
- Access the metadata through a DocumentInfo object
- Remove each default property with the remove() method
- Delete all custom properties with the clear_custom_data() method
- Save the resulting PDF file
These steps summarize how to clean metadata from a PDF using Python. Load the source PDF file into a Document object, then access its metadata through the DocumentInfo object. Remove each default property by calling the remove() method with the property name as its argument, and call the clear_custom_data() method to remove all custom properties at once.
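As a rough sketch of the removal step on its own, the helper below calls remove() once per default property and then calls clear_custom_data() once. It is written against a minimal duck-typed interface instead of importing Aspose, so the logic can be tried without the library installed. The helper name `strip_metadata` and the `DEFAULT_PROPERTIES` tuple are hypothetical. The key names are the standard entries of the PDF document information dictionary; that Aspose's `DocumentInfo.remove()` accepts exactly these names is an assumption.

```python
from typing import Iterable, Protocol

# Standard entries of the PDF document information dictionary.
# Assumption: DocumentInfo.remove() in Aspose.PDF accepts these names.
DEFAULT_PROPERTIES = (
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)


class DocumentInfoLike(Protocol):
    """The two methods this sketch relies on; Aspose's DocumentInfo is assumed to provide them."""

    def remove(self, key: str) -> object: ...

    def clear_custom_data(self) -> None: ...


def strip_metadata(info: DocumentInfoLike,
                   keys: Iterable[str] = DEFAULT_PROPERTIES) -> None:
    """Remove each named default property, then drop every custom property."""
    for key in keys:
        info.remove(key)      # one call per default property, by name
    info.clear_custom_data()  # a single call with no arguments clears all custom properties
```

With Aspose.PDF installed, you would pass the document's DocumentInfo object, for example `strip_metadata(document.info)`, before saving; the `document.info` property name is itself an assumption about the Python wrapper.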
Code to Delete PDF Metadata using Python
This code demonstrates how to clear metadata from a PDF using Python. The optional DisplayMetadata() function prints the metadata before and after removal. DocumentInfo.remove() takes the name of a default property as its argument, whereas clear_custom_data() takes no arguments and deletes all custom properties.
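A minimal sketch of such a program follows. It assumes the Aspose.PDF for Python via .NET package is installed and importable as `aspose.pdf`, that `Document` exposes its DocumentInfo object through `document.info`, and that DocumentInfo offers snake_case properties such as `author` and `creation_date`; license setup is omitted. `display_metadata` is a snake_case stand-in for the DisplayMetadata() helper mentioned above, and `clean_pdf`, its `show` parameter, and `DISPLAYED_FIELDS` are hypothetical names.

```python
# Assumed snake_case attribute names for the default properties on DocumentInfo.
DISPLAYED_FIELDS = ("title", "author", "subject", "keywords",
                    "creator", "producer", "creation_date", "mod_date")


def display_metadata(info, label=""):
    """Print and return one 'field: value' line per default property."""
    lines = [f"{field}: {getattr(info, field, None)}" for field in DISPLAYED_FIELDS]
    if label:
        print(f"--- {label} ---")
    for line in lines:
        print(line)
    return lines


def clean_pdf(input_path, output_path, show=True):
    """Load a PDF, optionally display its metadata, strip it, and save the result."""
    import aspose.pdf as ap  # imported here so display_metadata works without Aspose

    document = ap.Document(input_path)
    info = document.info  # assumed: DocumentInfo with default and custom properties
    if show:
        display_metadata(info, "before")

    # Default properties are removed one at a time by name.
    for key in ("Title", "Author", "Subject", "Keywords",
                "Creator", "Producer", "CreationDate", "ModDate"):
        info.remove(key)
    # Custom properties are removed collectively.
    info.clear_custom_data()

    if show:
        display_metadata(info, "after")
    document.save(output_path)
```

For example, `clean_pdf("input.pdf", "output.pdf")` would print the properties before and after removal and write the cleaned copy; `show=False` skips the display step. Keeping the Aspose import inside `clean_pdf` means `display_metadata` can be reused on any object exposing those attributes.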
In this article, we learned how to develop a PDF metadata removal tool using Python. To remove restrictions from a PDF file, refer to the article on how to remove restrictions from a PDF document in Python.