How to Clean Metadata from PDF using Python

This short guide explains how to clean metadata from a PDF using Python. It covers setting up the development environment, lists the programming steps, and provides runnable sample code to strip PDF metadata. It shows how to remove the default properties one by one and all the custom properties at once.

Steps to Remove Metadata from PDF using Python

Set the environment to use Aspose.PDF for Python via .NET to clean metadata
Load the source PDF file using the Document class for removing the metadata
Create and use the metadata display function if required
Access the metadata using the DocumentInfo class object
Clear the default metadata using the remove() method
Delete the custom metadata
Save the resultant PDF file

These steps summarize the process to clean metadata from a PDF using Python. Load the source PDF file into a Document object, then access its metadata through a DocumentInfo object. Remove each default property by calling the remove() method with the property name as an argument, and call the clear_custom_data() method to remove all the custom properties.
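The removal steps above can be sketched as a small helper. This is a minimal sketch, not the library's own utility: strip_metadata and DEFAULT_PROPERTIES are hypothetical names, and the helper only assumes that the object passed in exposes the remove() and clear_custom_data() methods described above. The property list follows the standard PDF Info dictionary keys; only "Title" appears in this article's sample, so treating the other names the same way is an assumption.

```python
# Standard PDF Info dictionary keys. Only "Title" is used in the article's
# sample; passing the other names to remove() is an assumption.
DEFAULT_PROPERTIES = (
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)


def strip_metadata(info, names=DEFAULT_PROPERTIES):
    """Remove each named default property, then all custom properties.

    `info` is any object with remove(name) and clear_custom_data(),
    such as the DocumentInfo object described in the steps above.
    Returns the list of property names passed to remove().
    """
    removed = []
    for name in names:
        info.remove(name)        # default properties go one by one
        removed.append(name)
    info.clear_custom_data()     # custom properties go collectively
    return removed
```

With the library installed, the helper would be called on the DocumentInfo object created from the loaded document, and the document saved afterwards.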

Code to Delete PDF Metadata using Python

	import aspose.pdf as pdf

	# Load License
	license = pdf.License()
	license.set_license("License.lic")

	def DisplayMetadata(info):
	    # Print the default metadata properties of the document
	    print(f"title:{info.title}")
	    print(f"author:{info.author}")
	    # Reading an empty date property can raise, so guard it
	    try:
	        print(f"creation_date:{info.creation_date}")
	    except Exception:
	        print("creation_date is empty")
	    print(f"creator:{info.creator}")
	    try:
	        print(f"mod_date:{info.mod_date}")
	    except Exception:
	        print("mod_date is empty")
	    print(f"producer:{info.producer}")
	    print(f"subject:{info.subject}")

	# Open document
	pdfDocument = pdf.Document("Sample.pdf")

	# Access the metadata
	info = pdf.DocumentInfo(pdfDocument)
	DisplayMetadata(info)

	# Clear the default metadata
	info.remove("Title")

	# Delete the custom metadata
	info.clear_custom_data()

	# Save the resultant PDF file ("Output.pdf" is an assumed file name)
	pdfDocument.save("Output.pdf")


This code demonstrates how to clear metadata from a PDF using Python. The DisplayMetadata() function optionally displays the metadata before and after the removal. The DocumentInfo.remove() method takes the name of a default property, whereas the clear_custom_data() method takes no arguments and deletes all the custom properties.
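To check the effect of the removal rather than only print it, the before and after values can be captured and compared. The following is a minimal sketch under stated assumptions: metadata_snapshot and changed_fields are hypothetical helpers, and the attribute names are the ones read by DisplayMetadata() above. Every read is guarded, since the sample treats the date properties as liable to raise when empty.

```python
# Attribute names taken from the DisplayMetadata() function above.
FIELDS = ("title", "author", "creation_date", "creator",
          "mod_date", "producer", "subject")


def metadata_snapshot(info, fields=FIELDS):
    """Read each metadata attribute defensively into a plain dict."""
    snapshot = {}
    for field in fields:
        try:
            snapshot[field] = getattr(info, field)
        except Exception:
            # An unreadable (for example, empty) property is recorded as None
            snapshot[field] = None
    return snapshot


def changed_fields(before, after):
    """Return the sorted names of fields whose values differ."""
    return sorted(key for key in before if before[key] != after.get(key))
```

Taking one snapshot before the remove() calls and another after them, then passing both to changed_fields(), gives the list of properties that were actually altered.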

In this article, we have learned how to develop a PDF metadata removal tool using Python. To remove restrictions from a PDF file, refer to the article on how to remove restrictions on a PDF document in Python.
