Extract a Table from PDF to Excel using Java

This quick guide explains how to extract a table from PDF to Excel using Java. It covers setting up the environment, a list of steps, and sample code to pull a table from a PDF into Excel using Java. It shows how to access every table on any PDF page, read the text from all the cells, and copy the content to the respective cell in the output Workbook.

Steps to Extract Table from PDF to Excel using Java

  1. Set the environment to use Aspose.Total for Java to extract the PDF table to the Excel sheet
  2. Apply the Aspose.Total license for the Aspose.PDF and Aspose.Cells products
  3. Load the source PDF file into the Document class object
  4. Create an empty Excel file using the Workbook class from Aspose.Cells
  5. Parse through each page in the PDF and access the table collection on each page
  6. Iterate through all the rows in each table and access each cell one by one
  7. Fetch text from each cell and save the content to the respective row and column in the destination sheet
  8. Autofit the columns in the sheet and save the output Excel file

Follow these steps to get a table from PDF to Excel using Java. Start by loading the source PDF file, then go through its pages one at a time, get the collection of tables on each page, and access each cell in the selected table. Combine the text within a cell into a single string and save it to the respective row and column on the target sheet of the output Excel file.
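The traversal described in the steps can be sketched in plain Java without any library. This is a minimal illustration, not the Aspose API: `PdfCell`, `PdfTable`, `cellText` and `copyToSheet` are hypothetical stand-ins for the PDF table objects and the Excel sheet, and the sheet is modeled as a list of rows. It only shows the order of the loops and how the text fragments of one cell are combined into a single string.

```java
import java.util.ArrayList;
import java.util.List;

public class PdfTableToGridSketch {

    /** Hypothetical stand-in for a PDF table cell: a list of text fragments. */
    static final class PdfCell {
        final List<String> fragments;
        PdfCell(String... fragments) { this.fragments = List.of(fragments); }
    }

    /** Hypothetical stand-in for a PDF table: rows of cells. */
    static final class PdfTable {
        final List<List<PdfCell>> rows = new ArrayList<>();
        PdfTable row(PdfCell... cells) { rows.add(List.of(cells)); return this; }
    }

    /** Combine all text fragments of one cell into a single string. */
    static String cellText(PdfCell cell) {
        StringBuilder sb = new StringBuilder();
        for (String fragment : cell.fragments) {
            sb.append(fragment);
        }
        return sb.toString();
    }

    /**
     * Walk pages -> tables -> rows -> cells and write each cell's text to the
     * matching column of the next free row in an in-memory sheet.
     */
    static List<List<String>> copyToSheet(List<List<PdfTable>> pages) {
        List<List<String>> sheet = new ArrayList<>();
        for (List<PdfTable> tablesOnPage : pages) {
            for (PdfTable table : tablesOnPage) {
                for (List<PdfCell> row : table.rows) {
                    List<String> sheetRow = new ArrayList<>();
                    for (PdfCell cell : row) {
                        sheetRow.add(cellText(cell));
                    }
                    sheet.add(sheetRow);
                }
            }
        }
        return sheet;
    }

    public static void main(String[] args) {
        PdfTable table = new PdfTable()
            .row(new PdfCell("Na", "me"), new PdfCell("Qty"))
            .row(new PdfCell("Pen"), new PdfCell("1", "2"));
        // One page that holds one table.
        List<List<String>> sheet = copyToSheet(List.of(List.of(table)));
        for (List<String> row : sheet) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In a real program the inner loops would read from the tables found on each PDF page and write to the cells of the Workbook's sheet; rows from consecutive tables are simply appended one after another here, which is one possible layout choice.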

Code to Extract Table from PDF to Excel using Java

This code demonstrates how to extract data from a PDF table to Excel using Java. The formatting of the source PDF text can also be applied to the Excel table by reading the PDF table cell's colour, bold/italic style, font name, and font size, and setting the same values on the Excel cell while writing the content. Take care when handling merged cells in the PDF tables and recreate them in the Excel file so that the organization of the table contents stays the same.
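As a rough sketch of the merged-cell bookkeeping mentioned above, the following plain Java example computes where each source cell should land in the sheet when some cells span several columns. It assumes the column span and bold flag of each PDF cell are already known; how a PDF library exposes them is not modeled here. `SourceCell`, `Placement` and `layoutRow` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedCellLayoutSketch {

    /** Hypothetical source cell: text, number of columns it spans, bold flag. */
    static final class SourceCell {
        final String text;
        final int colSpan;
        final boolean bold;
        SourceCell(String text, int colSpan, boolean bold) {
            this.text = text;
            this.colSpan = colSpan;
            this.bold = bold;
        }
    }

    /** Where a cell lands in the sheet: first column, merged width, text, style. */
    static final class Placement {
        final int firstColumn;
        final int totalColumns;
        final String text;
        final boolean bold;
        Placement(int firstColumn, int totalColumns, String text, boolean bold) {
            this.firstColumn = firstColumn;
            this.totalColumns = totalColumns;
            this.text = text;
            this.bold = bold;
        }
    }

    /** Assign sheet columns to one table row, reserving room for merged cells. */
    static List<Placement> layoutRow(List<SourceCell> row) {
        List<Placement> placements = new ArrayList<>();
        int column = 0;
        for (SourceCell cell : row) {
            int span = Math.max(1, cell.colSpan); // treat missing or invalid spans as 1
            placements.add(new Placement(column, span, cell.text, cell.bold));
            column += span; // the next cell starts after the merged range
        }
        return placements;
    }

    public static void main(String[] args) {
        List<Placement> placements = layoutRow(List.of(
            new SourceCell("Total", 2, true),
            new SourceCell("42", 1, false)));
        for (Placement p : placements) {
            System.out.println(p.text + " -> column " + p.firstColumn
                + ", width " + p.totalColumns + ", bold=" + p.bold);
        }
    }
}
```

Each placement carries enough information to write the value into its first column, merge the range when the width is greater than one, and apply the bold flag to the destination cell. Colour, font name and font size could be carried along in the same way.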

This article teaches the process of copying text from a PDF table to an Excel sheet table. To convert a scanned PDF to an editable PDF, refer to the article Convert scanned PDF to editable PDF using Java.