How to Read PDF Table in Java

This tutorial explains how to read a PDF table in Java and access the text in each cell of the desired table. You can select a particular table on the target page of the PDF and iterate over all of its rows and cells to retrieve the data. Writing this PDF table reader in Java requires no third-party tool or software other than the Aspose.PDF library.

Steps to Read PDF Table in Java

Configure your PDF table reader application to use Aspose.PDF from the Maven repository
Load the sample PDF file that contains a table using the Document class
Instantiate a TableAbsorber object and use it to fetch all the tables from the selected PDF page
Access the desired table and iterate through all of its rows
Iterate through all the cells in each row and fetch the text fragments from each cell
Display the text fetched from each cell

These steps explain how to extract a table from a PDF using Java and name the library that must be added to the project. They also give the order of operations: first load the PDF, then access a particular page, and then fetch the desired table from that page. Finally, iterate through all the rows and cells of the table to read their text.

Code to Read PDF Table in Java

	import com.aspose.pdf.AbsorbedCell;
	import com.aspose.pdf.AbsorbedRow;
	import com.aspose.pdf.AbsorbedTable;
	import com.aspose.pdf.Document;
	import com.aspose.pdf.License;
	import com.aspose.pdf.TableAbsorber;
	import com.aspose.pdf.TextFragment;
	import com.aspose.pdf.TextFragmentCollection;

	public class ReadPDFTableInJava {

		// Entry point: reads a table from the PDF and prints the text of every cell
		public static void main(String[] args) throws Exception {

			// Load the Aspose.PDF license before reading table data to avoid the trial version limitation
			License license = new License();
			license.setLicense("Aspose.Pdf.lic");

			// Load the source PDF document that contains a table
			Document pdfDocument = new Document("PdfWithTable.pdf");

			// Instantiate the TableAbsorber object for PDF table extraction
			TableAbsorber tableAbsorber = new TableAbsorber();

			// Find the tables on the first page of the input PDF (page indexes start at 1)
			tableAbsorber.visit(pdfDocument.getPages().get_Item(1));

			// Access the desired table from the tables collection (index 0 is the first table found)
			AbsorbedTable absorbedTable = tableAbsorber.getTableList().get(0);

			// Iterate over all the rows; each row is an AbsorbedRow
			for (AbsorbedRow pdfTableRow : absorbedTable.getRowList())
			{
				// Iterate over the cells of the row; each cell is an AbsorbedCell
				for (AbsorbedCell pdfTableCell : pdfTableRow.getCellList())
				{
					// Get the collection of text fragments in the cell
					TextFragmentCollection textFragmentCollection = pdfTableCell.getTextFragments();

					// Visit each text fragment in the collection
					for (TextFragment textFragment : textFragmentCollection)
					{
						// Display the table cell text
						System.out.println(textFragment.getText());
					}
				}
			}

			System.out.println("Done");
		}
	}


The Java code above extracts a table from a PDF using the TableAbsorber and AbsorbedTable classes. It uses the AbsorbedRow and AbsorbedCell classes to walk through the rows and cells, and the TextFragment class to read the text of each cell. Other absorber classes are available for other elements in the document, such as fonts, paragraphs, text, and text fragments.
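The code above prints every text fragment on its own line, so a cell that holds several fragments is spread over several lines and row boundaries are lost. As a minimal sketch of one way to regroup the text, the following helper joins the fragments of a cell with a space and the cells of a row with a delimiter. The class TableTextFormatter and its methods are hypothetical and not part of Aspose.PDF; the sketch uses only the Java standard library and assumes the fragment strings have already been collected with textFragment.getText() into nested lists (rows, then cells, then fragments).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not an Aspose.PDF class): formats already-extracted
// fragment text as one delimited line per table row.
class TableTextFormatter {

    // Joins the fragments of one cell with a single space, skipping blank fragments.
    static String cellText(List<String> fragments) {
        return fragments.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(" "));
    }

    // Joins the cells of one row with the given delimiter.
    static String rowText(List<List<String>> cells, String delimiter) {
        return cells.stream()
                .map(TableTextFormatter::cellText)
                .collect(Collectors.joining(delimiter));
    }

    // Produces one formatted line per row of the table.
    static List<String> tableLines(List<List<List<String>>> rows, String delimiter) {
        List<String> lines = new ArrayList<>();
        for (List<List<String>> row : rows) {
            lines.add(rowText(row, delimiter));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample data standing in for text read from the PDF: rows -> cells -> fragments.
        List<List<List<String>>> rows = Arrays.asList(
                Arrays.asList(Arrays.asList("Item"), Arrays.asList("Unit", "Price")),
                Arrays.asList(Arrays.asList("Pen"), Arrays.asList("1.50")));

        for (String line : tableLines(rows, " | ")) {
            System.out.println(line);
        }
    }
}
```

To use it with the reader above, one option is to add each fragment's text to a list per cell inside the innermost loop instead of printing it, and to call tableLines once all rows have been visited.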

This article has shown that PDF table extraction can be performed in Java in a few steps. If you want to learn how to read text and images from a PDF file, refer to the article on how to read PDF file in Java.
