How to Read PDF Table in C#

This short tutorial explains how to read a PDF table in C# and access all of its contents. It describes how to parse the tables in a PDF file and then access each individual row and cell of a particular table. Reading a table from a PDF in C# takes only a few lines of code: the source PDF file is loaded, and then the tables are parsed so that their contents can be read.

Steps to Read PDF Table in C#

Add a reference to Aspose.PDF for .NET to read table data in the PDF
Load the source PDF file using the Document class object
Instantiate the TableAbsorber class object and read all tables from the desired PDF page
Iterate through all the rows in the target PDF table
Iterate through all the cells in each row and fetch all text fragments
Display or process each text fragment in a cell

These steps follow a systematic approach to reading a PDF table in C#: first the PDF file is loaded, and then the tables on the desired page are parsed using a TableAbsorber class object. Once the tables have been parsed, you can get a reference to any table in the resulting collection and then access its rows, cells, and text fragments to process or display them.

Code to Read PDF Table in C#

	using System;
	using Aspose.Pdf;
	using Aspose.Pdf.Text;

	namespace ReadPDFTableInCSharp
	{
	    class Program
	    {
	        static void Main(string[] args)
	        {
	            // Instantiate the license to avoid trial limitations while reading table data from PDF
	            License asposePdfLicense = new License();
	            asposePdfLicense.SetLicense("Aspose.pdf.lic");

	            // Load source PDF document having a table in it
	            Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"PdfWithTable.pdf");

	            // Declare and initialize TableAbsorber class object for reading table from the PDF
	            Aspose.Pdf.Text.TableAbsorber tableAbsorber = new Aspose.Pdf.Text.TableAbsorber();

	            // Parse all the tables from the desired page in the PDF (page numbering starts at 1)
	            tableAbsorber.Visit(pdfDocument.Pages[1]);

	            // Stop early if no table was found, so that indexing TableList[0] does not throw
	            if (tableAbsorber.TableList.Count == 0)
	            {
	                Console.WriteLine("No tables found on the page");
	                return;
	            }

	            // Get reference to the first table in the parsed collection
	            AbsorbedTable absorbedTable = tableAbsorber.TableList[0];

	            // Iterate through all the rows in the PDF table
	            foreach (AbsorbedRow pdfTableRow in absorbedTable.RowList)
	            {
	                // Iterate through all the cells in the PDF table row
	                foreach (AbsorbedCell pdfTableCell in pdfTableRow.CellList)
	                {
	                    // Fetch all the text fragments in the cell
	                    TextFragmentCollection textFragmentCollection = pdfTableCell.TextFragments;

	                    // Iterate through all the text fragments
	                    foreach (TextFragment textFragment in textFragmentCollection)
	                    {
	                        // Display the text
	                        Console.WriteLine(textFragment.Text);
	                    }
	                }
	            }
	            Console.WriteLine("Done");
	        }
	    }
	}


In this sample code, the PDF table is parsed in C# using the TableAbsorber class, which is used for reading tables. You can also use other classes such as TextAbsorber, ParagraphAbsorber, FontAbsorber, and TextFragmentAbsorber to access different elements of the document. You can either iterate through the entire collection or access individual elements by index.
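
As an illustration of iterating through the entire collection rather than reading a single table, the following is a minimal sketch that visits every page, accesses each parsed table by its index, and prints one line per row. It assumes the same placeholder file name as the sample above; the " | " separator, the namespace name, and the variable names are illustrative choices only, and the license setup is omitted for brevity, so trial limitations may apply.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ReadAllPdfTablesSketch
{
    class Program
    {
        static void Main(string[] args)
        {
            // Placeholder file name, same as in the sample above
            Document pdfDocument = new Document(@"PdfWithTable.pdf");

            // Iterate through every page instead of only the first one
            foreach (Page page in pdfDocument.Pages)
            {
                // Use a fresh absorber per page so that TableList holds only this page's tables
                TableAbsorber tableAbsorber = new TableAbsorber();
                tableAbsorber.Visit(page);

                // Access each table by its index in the parsed collection
                for (int tableIndex = 0; tableIndex < tableAbsorber.TableList.Count; tableIndex++)
                {
                    AbsorbedTable table = tableAbsorber.TableList[tableIndex];
                    Console.WriteLine("Page " + page.Number + ", table " + tableIndex);

                    foreach (AbsorbedRow row in table.RowList)
                    {
                        // Collect the text of each cell so the row prints on one line
                        List<string> cellTexts = new List<string>();
                        foreach (AbsorbedCell cell in row.CellList)
                        {
                            // A cell may hold several text fragments; join them with spaces
                            List<string> fragments = new List<string>();
                            foreach (TextFragment fragment in cell.TextFragments)
                            {
                                fragments.Add(fragment.Text);
                            }
                            cellTexts.Add(string.Join(" ", fragments));
                        }

                        // Illustrative separator; choose whatever suits your output format
                        Console.WriteLine(string.Join(" | ", cellTexts));
                    }
                }
            }
        }
    }
}
```

Because the loop runs over TableList.Count, a page without tables is simply skipped and no index is read from an empty collection.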

In this topic, we have learned how to read a PDF table in C#. If you want to read PDF bookmarks, refer to the article on how to read bookmarks in PDF using C#.
