This article explains how to clean metadata from a Word document in Python. It covers setting up the development environment to run the sample code, the programming steps, and a runnable sample for removing metadata from Word files in Python. You will also learn how removing custom properties differs from clearing built-in properties.
Steps to Remove Metadata from a Word Document in Python
- Set up the IDE to use Aspose.Words for Python via .NET (installed from PyPI as aspose-words) to remove the metadata
- Load the Word file (DOC or DOCX) into a Document object
- Access the custom properties collection through the custom_document_properties property
- Call clear() on this collection to remove both the properties and their values in one step
- Access the built-in properties collection through the built_in_document_properties property
- Call clear() on this collection to reset the values only; the built-in properties themselves remain
- Save the resulting Word file
These steps give a systematic approach to cleaning metadata from a Word document in Python. The process is straightforward: the target file is loaded first, and then the custom and built-in property collections are accessed. Both collections expose a clear() method, although its effect differs between the two.
Code to Clear Metadata from a Word Document in Python
The code for this task is concise and removes all document properties and personal information in Python. The clear() method of custom_document_properties removes each property along with its value, whereas the clear() method of built_in_document_properties only clears the values and leaves the properties themselves in place. Once the properties are cleared, you can adjust the output document further through the many properties of the Document class before saving it.
This code has shown how to remove all document properties and personal information in Python. To remove comments from a Word file, refer to the article on how to remove comments in Word using Python.