Extract Data from PDF Form using C#

This article explains how to extract data from a PDF form using C#. It covers setting up the IDE, lists the steps involved, and provides sample code that extracts form fields from a PDF using C#. The sample reads the name and value of every field in the loaded PDF.

Steps to Extract Data from PDF Form Fields using C#

  1. Set the environment to use Aspose.PDF for .NET to read form data
  2. Create an empty PDF document, add multiple pages and multiple fields with data for testing
  3. Load the PDF file with fields into the Document object
  4. Access the form object from the loaded Document
  5. Iterate over each field in the Form and access its information
  6. Display each field's partial name and value

These steps summarize the process to extract data from a PDF form using C#. Create a PDF file and add fields with values to it, or load an existing PDF file that already contains form fields. Then access the field collection through the Document.Form object and display each field's name and value.

Code to Extract Data from Fillable PDF using C#

using System;
using Aspose.Pdf;

// Apply the license (adjust the path to your license file)
License lic = new License();
lic.SetLicense("license.lic");

// Create a test PDF that contains text box fields
AddTextBoxFieldToPdf();

// Open the PDF document
using (var pdfDoc = new Document("TextBox_out.pdf"))
{
    // Read the name and value of every field in the form
    foreach (Aspose.Pdf.Forms.Field field in pdfDoc.Form)
    {
        Console.WriteLine("Field Title : {0} ", field.PartialName);
        Console.WriteLine("Field Data : {0} ", field.Value);
    }
}

void AddTextBoxFieldToPdf()
{
    // Create a new, empty PDF document
    using (var document = new Aspose.Pdf.Document())
    {
        // Add four pages with five text boxes on each page
        for (int iPage = 1; iPage < 5; iPage++)
        {
            var page = document.Pages.Add();
            for (int i = 1; i <= 5; i++)
            {
                // Create a text box field and place it on the page
                var textBoxField = new Aspose.Pdf.Forms.TextBoxField(page,
                    new Aspose.Pdf.Rectangle(100, i * 100, 300, (i + 1) * 100));
                textBoxField.PartialName = $"textbox{iPage}{i}";
                textBoxField.Value = $"Text Box {iPage}{i} Value";
                document.Form.Add(textBoxField, iPage);
            }
        }

        // Save the PDF document
        document.Save("TextBox_out.pdf");
    }
}

This code demonstrates how to extract data from a PDF form using C#. You can access every control on the Form, including text boxes, radio buttons, and combo boxes. Note that the Form collection holds all the fields in the PDF, so a single loop covers the fields on every page of the loaded file.
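Since the Form can hold different kinds of controls, it may help to label each field by its type while reading it. The following is a minimal sketch rather than a definitive implementation. It assumes the file TextBox_out.pdf produced by the sample above, C# 9 or later, and that the field classes TextBoxField, RadioButtonField, ComboBoxField and CheckboxField from the Aspose.Pdf.Forms namespace are available in your version of the library. The `values` dictionary and the `kind` label are illustrative names that do not appear in the original sample.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Forms;

// Assumption: "TextBox_out.pdf" is the file created by the sample above
using (var pdfDoc = new Document("TextBox_out.pdf"))
{
    // Hypothetical helper collection: keeps name/value pairs for later use
    var values = new Dictionary<string, string>();

    foreach (Field field in pdfDoc.Form)
    {
        // Label the control by its runtime type; unknown types fall back to the class name
        string kind = field switch
        {
            TextBoxField => "text box",
            RadioButtonField => "radio button",
            ComboBoxField => "combo box",
            CheckboxField => "check box",
            _ => field.GetType().Name
        };

        // The indexer overwrites on a repeated name instead of throwing
        values[field.PartialName] = field.Value;
        Console.WriteLine($"{field.PartialName} ({kind}): {field.Value}");
    }

    Console.WriteLine($"Fields read: {values.Count}");
}
```

Keying the dictionary on PartialName keeps the sketch close to the sample above. If a form reuses the same partial name under different parent fields, a fully qualified name would be a safer key.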

This article has taught us the process of accessing all the fields from a PDF file. To extract fonts from a PDF file, refer to the article on Extract Font from PDF using C#.
