Extract Data from PDF Form using Java

This short tutorial describes how to extract data from a PDF form using Java. It explains how to set up the IDE, lists the steps for writing the program, and provides sample code that demonstrates how to export data from a PDF form using Java. It also shows how to access all or selected fields in the form and process them as required.

Steps to Extract Data from PDF Form Fields using Java

  1. Set up the IDE to use Aspose.PDF for Java to extract form data
  2. Create a PDF file with Textbox fields and sample data
  3. Load the PDF file with Form and input fields into the Document object
  4. Access the collection of fields in the Form from the loaded document
  5. Iterate through all the fields and get the full name and value for displaying on the console

These steps explain how to extract form fields from a PDF using Java. Create a PDF file with form fields and sample data, or load an existing file that contains form data. Access the field collection through the Form property of the document, iterate through all the fields, and display the desired properties.

Code to Extract Data from Fillable PDF using Java

import com.aspose.pdf.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // Load Aspose PDF license
        License license = new License();
        license.setLicense("license.lic");

        // Generate PDF with input fields
        createPdfWithFields();

        // Open and process the generated PDF file
        Document pdfDocument = new Document("UserForm.pdf");

        // Retrieve and display form fields
        Field[] formFields = pdfDocument.getForm().getFields();
        for (Field formField : formFields) {
            System.out.println("Field Name: " + formField.getFullName());
            System.out.println("Field Content: " + formField.getValue());
        }

        // Release resources
        pdfDocument.close();
    }

    private static void createPdfWithFields() {
        // Instantiate new PDF document
        Document pdfFile = new Document();
        for (int pageIndex = 1; pageIndex <= 3; pageIndex++) {
            Page newPage = pdfFile.getPages().add();
            for (int fieldIndex = 1; fieldIndex <= 4; fieldIndex++) {
                // Define a text input field
                TextBoxField inputField = new TextBoxField(newPage,
                        new Rectangle(120, fieldIndex * 90, 320, (fieldIndex + 1) * 90));
                inputField.setPartialName("inputField_" + pageIndex + "_" + fieldIndex);
                inputField.setValue("Data Entry " + pageIndex + "-" + fieldIndex);
                // Attach field to the document form
                pdfFile.getForm().add(inputField, pageIndex);
            }
        }
        // Save document to disk
        pdfFile.save("UserForm.pdf");
        // Free resources
        pdfFile.close();
    }
}

This code demonstrates how to extract data from a PDF form using Java. You can access various properties of each field, such as the alternate name, mapping name, contents, partial name, active state, checked state name, and page index. To access only selected fields, use the array index. Because Java arrays are zero-based, formFields[0].getValue() returns the value of the first field.
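
Selecting fields by name can be less brittle than relying on their position in the array. The following is a minimal sketch of that idea, assuming the field names and values have already been copied into a map (for example with getFullName() and getValue() inside the loop above). The FieldSelector class and its selectByPrefix method are hypothetical helpers written for illustration and are not part of Aspose.PDF; the sample entries only mimic the names and values generated by the sample program.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Aspose.PDF: it works on name/value pairs
// that were already copied out of the form fields.
class FieldSelector {

    // Returns the entries whose field name starts with the given prefix,
    // preserving the original order of the input map.
    static Map<String, String> selectByPrefix(Map<String, String> fields, String prefix) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Stand-in data shaped like the names and values in the sample form
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("inputField_1_1", "Data Entry 1-1");
        extracted.put("inputField_1_2", "Data Entry 1-2");
        extracted.put("inputField_2_1", "Data Entry 2-1");

        // Keep only the fields that were created for page 1
        Map<String, String> pageOne = selectByPrefix(extracted, "inputField_1_");
        pageOne.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A LinkedHashMap is used here so that the selected fields keep the order in which they were read from the form. Matching on a name prefix is only one possible criterion; an exact name or any other condition on the key would work the same way.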

In this article, we have processed forms in a PDF file. For extracting fonts from a PDF file, refer to the article on Extract font from PDF with Java.
