This short tutorial explains how to read a PDF table in C# and access all of its contents. It describes how to parse all the tables in a PDF file and then access each individual row and cell of a particular table. The C# code to read a table from a PDF comprises only a few lines: the source PDF file is loaded, and then all the tables are parsed so that their contents can be read.
Steps to Read PDF Table in C#
- Add a reference to Aspose.PDF for .NET to read table data in the PDF
- Load the source PDF file using the Document class object
- Instantiate the TableAbsorber class object and read all tables from the desired PDF page
- Iterate through all the rows in the target PDF table
- Iterate through all the cells in each row and fetch all text fragments
- Display or process each text fragment in a cell
These steps follow a systematic approach to reading a PDF table in C#: the PDF file is loaded first, and then the tables are parsed using a TableAbsorber class object. Once the tables in the PDF file have been visited, you can get a reference to any table in the parsed collection. From there, you can access any table, row, cell, and text fragment to process or display it.
Code to Read PDF Table in C#
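The steps above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the file name input.pdf and the class name ReadPdfTables are placeholders chosen for this example, and it assumes the Aspose.PDF for .NET package is already referenced in the project.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ReadPdfTables
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file.
        Document document = new Document("input.pdf");

        foreach (Page page in document.Pages)
        {
            // Parse the tables found on the current page.
            TableAbsorber absorber = new TableAbsorber();
            absorber.Visit(page);

            foreach (AbsorbedTable table in absorber.TableList)
            {
                foreach (AbsorbedRow row in table.RowList)
                {
                    foreach (AbsorbedCell cell in row.CellList)
                    {
                        // A cell may hold several text fragments.
                        foreach (TextFragment fragment in cell.TextFragments)
                        {
                            Console.WriteLine(fragment.Text);
                        }
                    }
                }
            }
        }
    }
}
```

Here each fragment is simply printed to the console; in practice you might collect the text per cell or per row before processing it.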
The sample code for this task parses a PDF table in C# using the TableAbsorber class, which is intended for reading tables. You can also use other options such as TextAbsorber, ParagraphAbsorber, FontAbsorber, and TextFragmentAbsorber to access different elements of the document. You can either iterate through the entire collection or access individual elements by index.
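As a sketch of index-based access, the example below reads only the first cell of the first row of the first table on the first page. The file name input.pdf and the class name ReadFirstCell are assumptions made for this illustration, and the count checks are there because a page may contain no tables at all.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ReadFirstCell
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file.
        Document document = new Document("input.pdf");

        // The Pages collection is 1-based, so index 1 is the first page.
        TableAbsorber absorber = new TableAbsorber();
        absorber.Visit(document.Pages[1]);

        // Guard each level so an empty page, table, or row does not throw.
        if (absorber.TableList.Count == 0)
        {
            Console.WriteLine("No tables found on the first page.");
            return;
        }

        AbsorbedTable table = absorber.TableList[0];
        if (table.RowList.Count == 0 || table.RowList[0].CellList.Count == 0)
        {
            Console.WriteLine("The first table has no cells.");
            return;
        }

        AbsorbedCell cell = table.RowList[0].CellList[0];
        foreach (TextFragment fragment in cell.TextFragments)
        {
            Console.WriteLine(fragment.Text);
        }
    }
}
```

Note that the table, row, and cell lists use zero-based indexes, while the page collection starts at 1.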
In this topic, we have learned how to read a PDF table in C#. If you want to read PDF bookmarks, refer to the article on how to read bookmarks in PDF using C#.