Extract Data from PDF Form using Python

This article explains how to extract data from a PDF form using Python. It covers setting up the IDE, gives a list of steps, and provides sample code for accessing form field data. The sample code creates a test PDF with filled-in fields and then fetches the data from all of them.

Steps to Extract Data from PDF Form Fields using Python

Set up the environment to use Aspose.PDF for Python via .NET to extract form data
Create or load a PDF file with input fields containing data into a Document object
Fetch all the fields from the form property of the loaded PDF document
Iterate through all the fields and access each one
Display each field's full name and value

These steps describe how to extract data from a fillable PDF using Python. Create or load a PDF file with fields and values, and access the collection of fields through the form property of the Document. Iterate through all the fields and read each field's full name and value for further processing.
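To process the values rather than just print them, the loop can collect them into a dictionary keyed by each field's full name. The sketch below uses a hypothetical helper, `collect_form_data`, which is not part of Aspose.PDF. It only assumes that each field object exposes the `full_name` and `value` attributes that the sample code reads, so stand-in objects are used for the demo.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def collect_form_data(fields: Iterable[Any]) -> Dict[str, Any]:
    """Map each field's full name to its value.

    Hypothetical helper: relies only on the `full_name` and `value`
    attributes that the article's sample reads from each form field.
    """
    data: Dict[str, Any] = {}
    for field in fields:
        data[field.full_name] = field.value
    return data


if __name__ == "__main__":
    # Stand-in objects used here instead of real Aspose.PDF fields
    sample_fields = [
        SimpleNamespace(full_name="inputField_1_1", value="Data Entry 1-1"),
        SimpleNamespace(full_name="inputField_1_2", value="Data Entry 1-2"),
    ]
    # Prints {'inputField_1_1': 'Data Entry 1-1', 'inputField_1_2': 'Data Entry 1-2'}
    print(collect_form_data(sample_fields))
```

With a real document, you would pass `pdf_document.form.fields` to the helper after loading the PDF. The resulting dictionary can then be serialized with `json.dumps` or stored elsewhere.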

Code to Extract Form Fields from PDF using Python

	from aspose.pdf import Document, License, Rectangle
	from aspose.pdf.forms import TextBoxField


	def main():
	    # Load Aspose PDF license (the file path is a placeholder)
	    license = License()
	    license.set_license("license.lic")

	    # Generate PDF with input fields
	    create_pdf_with_fields()

	    # Open and process the generated PDF file
	    pdf_document = Document("UserForm.pdf")

	    # Retrieve and display form fields
	    form_fields = pdf_document.form.fields
	    for form_field in form_fields:
	        print("Field Name:", form_field.full_name)
	        print("Field Content:", form_field.value)


	def create_pdf_with_fields():
	    # Instantiate new PDF document
	    pdf_file = Document()

	    for page_index in range(1, 4):  # 3 pages
	        new_page = pdf_file.pages.add()

	        for field_index in range(1, 5):  # 4 fields per page
	            # Define a text input field
	            input_field = TextBoxField(new_page, Rectangle(120, field_index * 90, 320, (field_index + 1) * 90, True))
	            input_field.partial_name = f"inputField_{page_index}_{field_index}"
	            input_field.value = f"Data Entry {page_index}-{field_index}"

	            # Attach field to the document form on the current page
	            pdf_file.form.add(input_field, page_index)

	    # Save document to disk once all fields are added
	    pdf_file.save("UserForm.pdf")


	if __name__ == "__main__":
	    main()
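Because the sample fills every field with a predictable value, you can check the extraction result against what `create_pdf_with_fields()` wrote. The hypothetical helper below, `expected_form_data`, rebuilds the 12 expected name/value pairs (3 pages × 4 fields) from the same naming scheme. It assumes that the `full_name` of each top-level text box equals the `partial_name` assigned in the sample. If your Aspose.PDF version reports names differently, adjust the keys.

```python
from typing import Dict


def expected_form_data(pages: int = 3, fields_per_page: int = 4) -> Dict[str, str]:
    """Rebuild the names and values written by create_pdf_with_fields().

    Hypothetical helper; assumes full_name == partial_name for these
    top-level fields.
    """
    expected: Dict[str, str] = {}
    # Same 1-based ranges as the sample: range(1, 4) and range(1, 5)
    for page_index in range(1, pages + 1):
        for field_index in range(1, fields_per_page + 1):
            name = f"inputField_{page_index}_{field_index}"
            expected[name] = f"Data Entry {page_index}-{field_index}"
    return expected


if __name__ == "__main__":
    expected = expected_form_data()
    print(len(expected))  # 12
    print(expected["inputField_2_3"])  # Data Entry 2-3
```

After extraction, a comparison such as `{f.full_name: f.value for f in pdf_document.form.fields} == expected_form_data()` should confirm that every field round-tripped, assuming `value` comes back as the same string that was written.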


This code demonstrates how to extract data from a PDF form. It uses the Document.form.fields collection, which contains all the fields in the PDF. To work only with the fields on a particular page, filter them by the page_index property of each Field object in the collection.
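The page filter can be sketched as a small grouping step. The hypothetical `fields_by_page` helper below assumes each field exposes `page_index`, `full_name`, and `value` attributes. It does not call Aspose.PDF directly, so stand-in objects represent the fields in the demo.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Tuple


def fields_by_page(fields: Iterable[Any]) -> Dict[int, List[Tuple[str, Any]]]:
    """Group (full_name, value) pairs by the page each field sits on.

    Hypothetical helper: assumes every field has `page_index`,
    `full_name`, and `value` attributes.
    """
    grouped: Dict[int, List[Tuple[str, Any]]] = defaultdict(list)
    for field in fields:
        grouped[field.page_index].append((field.full_name, field.value))
    return dict(grouped)


if __name__ == "__main__":
    # Stand-ins for fields returned by pdf_document.form.fields
    demo = [
        SimpleNamespace(page_index=1, full_name="inputField_1_1", value="Data Entry 1-1"),
        SimpleNamespace(page_index=2, full_name="inputField_2_1", value="Data Entry 2-1"),
        SimpleNamespace(page_index=2, full_name="inputField_2_2", value="Data Entry 2-2"),
    ]
    # Keep only the fields on page 2
    for name, value in fields_by_page(demo).get(2, []):
        print(name, value)
```

Grouping once and then looking up a page number avoids rescanning the whole collection for every page you want to inspect.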

This article showed how to read PDF form data. If you want to flatten a PDF file, refer to the article on How to flatten PDF in Python.
