How to Read PDF Metadata using Python

This quick tutorial explains how to read PDF metadata using Python. It covers setting up the development environment, a step-by-step procedure, and sample code for extracting metadata from a PDF. You will see that the application needs only a few API calls, and that apart from the Aspose.PDF for Python via .NET package itself, no other third-party tool has to be installed in any Python-supported environment.

Steps to Read PDF Metadata using Python

  1. Set up the environment to use Aspose.PDF for Python via .NET (install it from PyPI with pip install aspose-pdf)
  2. Load the source PDF file into a Document class object
  3. Access the DocumentInfo object that holds the PDF metadata
  4. Read a few of its properties and display them on the console

These steps describe how to view PDF metadata using Python. First, load the target PDF file, then access the DocumentInfo object through the Info property of the Document class (named info in the Python API). This object holds the PDF metadata, such as the creator, modification time zone, producer, creation date, and modification date.
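A minimal sketch of the four steps follows, assuming Aspose.PDF for Python via .NET is installed as aspose-pdf and imported as aspose.pdf. The snake_case property names (creator, mod_time_zone, producer, creation_date, mod_date) are assumed to mirror the .NET DocumentInfo properties, so check them against the API reference for your version. The helper names format_metadata and read_pdf_metadata are hypothetical. Properties are read with getattr and a default, so a field that is missing or unset prints as a placeholder instead of raising an error.

```python
# Metadata properties named in this article; snake_case names are assumed
# to mirror the .NET DocumentInfo properties (Creator, ModTimeZone, ...).
FIELDS = ("creator", "mod_time_zone", "producer", "creation_date", "mod_date")


def format_metadata(info, fields=FIELDS):
    """Return 'name: value' lines for each requested property of a DocumentInfo-like object."""
    lines = []
    for name in fields:
        # getattr with a default avoids AttributeError for an absent property
        value = getattr(info, name, None)
        lines.append(f"{name}: {value if value is not None else '(not set)'}")
    return lines


def read_pdf_metadata(path):
    """Hypothetical helper: load a PDF and print a few metadata properties."""
    # Step 1: the package is assumed installed via pip install aspose-pdf
    import aspose.pdf as ap

    # Step 2: load the source PDF into a Document object
    document = ap.Document(path)
    # Step 3: the info property returns the DocumentInfo metadata object
    info = document.info
    # Step 4: display a few properties on the console
    for line in format_metadata(info):
        print(line)
```

Calling read_pdf_metadata("sample.pdf") (a hypothetical file name) prints one line per property. Keeping the formatting in a separate function means it can be exercised with any object that has the same attribute names, without a PDF at hand.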

Code to Get PDF Metadata using Python

To fetch PDF metadata in code, access the DocumentInfo object of the loaded document. Besides the properties listed above, it exposes the trapped flag and the title, subject, keywords, and author of the document. To add metadata entries, use the DocumentInfo.add() method; to clear all metadata, use clear(); and to remove only a specified entry, use remove().
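The editing side can be sketched under the same assumptions. Here, standard fields are set by assigning to the snake_case properties and the file is written with Document.save(). The add(key, value), remove(key), and clear() signatures are assumed from the .NET DocumentInfo methods Add, Remove, and Clear named above, so verify them against the API reference before relying on them. The helpers apply_fields and update_pdf_metadata and their parameters are hypothetical.

```python
def apply_fields(info, **fields):
    """Assign standard metadata properties (e.g. title, author) on a DocumentInfo-like object."""
    for name, value in fields.items():
        setattr(info, name, value)
    return info


def update_pdf_metadata(src, dst, custom=None, drop=(), **fields):
    """Hypothetical helper: edit metadata of src and save the result to dst."""
    import aspose.pdf as ap  # assumed installed via pip install aspose-pdf

    document = ap.Document(src)
    info = document.info
    # Overwrite standard properties such as title, subject, keywords, author
    apply_fields(info, **fields)
    # Add custom entries; the add(key, value) signature is an assumption
    for key, value in (custom or {}).items():
        info.add(key, value)
    # Remove only the named entries; the remove(key) signature is an assumption
    for key in drop:
        info.remove(key)
    # info.clear() would instead remove all metadata entries
    document.save(dst)
```

Writing to a separate output path leaves the source PDF untouched, which makes it easy to compare the metadata before and after the change.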

This article has described how to retrieve metadata from a PDF. To learn how to read the contents of a PDF, refer to the article on how to read PDF content in Python.
