How to Read a PDF Table in C#

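This short how-to guide explains how to read a table in a PDF file using C# and extract all of its contents. It gives detailed instructions for parsing the tables on a PDF page and then accessing each individual row and cell of a specific table. Reading a table from a PDF takes only a few lines of C# code: load the source PDF file, then parse its tables to read their contents.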

Steps to Read a PDF Table in C#

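Add a reference to Aspose.PDF for .NET to read table data from a PDF
Load the source PDF file using a Document class object
Instantiate a TableAbsorber class object and read all the tables from the desired PDF page
Iterate through all the rows in the target PDF table
Iterate through all the cells in each row and get all their text fragments
Display or process each text fragment in the cell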

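These steps follow a systematic approach to reading a PDF table in C#: first load the PDF file, then parse the tables on a page with a TableAbsorber object. Once the tables have been parsed, you can get a reference to any table in the resulting collection and, from there, access any of its rows, cells, and text fragments for processing or display.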
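To illustrate accessing a single element by position, here is a minimal sketch of a hypothetical helper, `PdfTableCellReader.TryGetCellText`. The class name, method name, and the convention of returning null for a missing cell are assumptions made for this illustration, not part of Aspose.PDF. It uses only the TableAbsorber, TableList, RowList, CellList, and TextFragments members that the sample code in this article also uses, and it checks every index before using it.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for illustration only; not an Aspose.PDF API.
public static class PdfTableCellReader
{
    // Returns the text of one table cell, or null if any index is out of range.
    // pageNumber is 1-based (like Document.Pages); the other indexes are 0-based.
    public static string TryGetCellText(Document pdfDocument, int pageNumber,
                                        int tableIndex, int rowIndex, int cellIndex)
    {
        if (pageNumber < 1 || pageNumber > pdfDocument.Pages.Count)
            return null;

        // Parse all tables on the requested page
        TableAbsorber tableAbsorber = new TableAbsorber();
        tableAbsorber.Visit(pdfDocument.Pages[pageNumber]);

        if (tableIndex < 0 || tableIndex >= tableAbsorber.TableList.Count)
            return null;
        AbsorbedTable table = tableAbsorber.TableList[tableIndex];

        if (rowIndex < 0 || rowIndex >= table.RowList.Count)
            return null;
        AbsorbedRow row = table.RowList[rowIndex];

        if (cellIndex < 0 || cellIndex >= row.CellList.Count)
            return null;
        AbsorbedCell cell = row.CellList[cellIndex];

        // A cell may hold several text fragments; join them with single spaces
        List<string> parts = new List<string>();
        foreach (TextFragment fragment in cell.TextFragments)
            parts.Add(fragment.Text);
        return string.Join(" ", parts);
    }
}
```

For instance, `PdfTableCellReader.TryGetCellText(pdfDocument, 1, 0, 0, 0)` would return the text of the first cell in the first row of the first table on page 1, or null if no such cell exists.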

Code to Read a PDF Table in C#

	using System;
	using Aspose.Pdf;
	using Aspose.Pdf.Text;

	namespace ReadPDFTableInCSharp
	{
		class Program
		{
			static void Main(string[] args)
			{
				// Instantiate the license to avoid trial limitations while reading table data from PDF
				// (without a license, Aspose.PDF runs in evaluation mode)
				License asposePdfLicense = new License();
				asposePdfLicense.SetLicense("Aspose.pdf.lic");

				// Load the source PDF document that contains a table; Document is disposable
				using (Document pdfDocument = new Document(@"PdfWithTable.pdf"))
				{
					// Declare and initialize a TableAbsorber object for reading tables from the PDF
					TableAbsorber tableAbsorber = new TableAbsorber();

					// Parse all the tables on the desired page (Pages is 1-based)
					tableAbsorber.Visit(pdfDocument.Pages[1]);

					// Stop if the page has no tables; TableList[0] would otherwise throw
					if (tableAbsorber.TableList.Count == 0)
					{
						Console.WriteLine("No tables found on page 1");
						return;
					}

					// Get a reference to the first table in the parsed collection (0-based)
					AbsorbedTable absorbedTable = tableAbsorber.TableList[0];

					// Iterate through all the rows in the PDF table
					foreach (AbsorbedRow pdfTableRow in absorbedTable.RowList)
					{
						// Iterate through all the cells in the PDF table row
						foreach (AbsorbedCell pdfTableCell in pdfTableRow.CellList)
						{
							// Fetch all the text fragments in the cell
							TextFragmentCollection textFragmentCollection = pdfTableCell.TextFragments;

							// Iterate through all the text fragments and display their text
							foreach (TextFragment textFragment in textFragmentCollection)
							{
								Console.WriteLine(textFragment.Text);
							}
						}
					}
				}

				Console.WriteLine("Done");
			}
		}
	}
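The sample above reads only the first table on page 1. If you need every table in the document, a minimal sketch could visit each page in turn and copy the cell text into plain .NET lists. The `PdfTableExtractor` class, the `ReadAllTables` method, and its nested-list return type are hypothetical choices for this illustration; the Aspose.PDF members are the same ones the sample uses, plus enumeration of `Document.Pages`.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for illustration only; not an Aspose.PDF API.
public static class PdfTableExtractor
{
    // Returns every table in the document as a list of rows,
    // where each row is a list of cell strings.
    public static List<List<List<string>>> ReadAllTables(Document pdfDocument)
    {
        var tables = new List<List<List<string>>>();

        // Enumerate every page of the document
        foreach (Page page in pdfDocument.Pages)
        {
            // Use a fresh absorber per page so each page's results stay separate
            TableAbsorber tableAbsorber = new TableAbsorber();
            tableAbsorber.Visit(page);

            foreach (AbsorbedTable absorbedTable in tableAbsorber.TableList)
            {
                var rows = new List<List<string>>();
                foreach (AbsorbedRow row in absorbedTable.RowList)
                {
                    var cells = new List<string>();
                    foreach (AbsorbedCell cell in row.CellList)
                    {
                        // Join all text fragments of the cell into one string
                        var parts = new List<string>();
                        foreach (TextFragment fragment in cell.TextFragments)
                            parts.Add(fragment.Text);
                        cells.Add(string.Join(" ", parts));
                    }
                    rows.Add(cells);
                }
                tables.Add(rows);
            }
        }
        return tables;
    }
}
```

With the file from the sample, `PdfTableExtractor.ReadAllTables(pdfDocument).Count` would give the number of tables found across all pages. Returning plain strings means the extracted data no longer depends on the Document object, so it can be processed after the document is disposed.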


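The code in this article uses the TableAbsorber class to parse PDF tables in C#. Aspose.PDF also offers other absorbers, such as TextAbsorber, ParagraphAbsorber, FontAbsorber, and TextFragmentAbsorber, for accessing other elements of a document. You can iterate over an entire collection or access individual elements by index; note that Document.Pages is 1-based, whereas TableList, RowList, and CellList are 0-based.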
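As a sketch of two of the other absorbers mentioned above, the hypothetical `PdfTextHelpers` class below uses TextAbsorber to collect the plain text of all pages and TextFragmentAbsorber to find occurrences of a phrase. The class and method names are illustrative assumptions; `Pages.Accept`, `TextAbsorber.Text`, and `TextFragmentAbsorber.TextFragments` are existing Aspose.PDF members.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for illustration only; not an Aspose.PDF API.
public static class PdfTextHelpers
{
    // TextAbsorber gathers the plain text of every page it visits
    public static string ExtractAllText(Document pdfDocument)
    {
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);
        return textAbsorber.Text;
    }

    // TextFragmentAbsorber finds each occurrence of a phrase as a text fragment
    public static List<string> FindPhrase(Document pdfDocument, string phrase)
    {
        TextFragmentAbsorber fragmentAbsorber = new TextFragmentAbsorber(phrase);
        pdfDocument.Pages.Accept(fragmentAbsorber);

        var matches = new List<string>();
        foreach (TextFragment fragment in fragmentAbsorber.TextFragments)
            matches.Add(fragment.Text);
        return matches;
    }
}
```

A call such as `PdfTextHelpers.FindPhrase(pdfDocument, "Total")` would return one entry per match; "Total" is only an example value.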

In this topic, we have learned how to read a PDF table in C#. If you want to read PDF bookmarks instead, see the article on how to read bookmarks in a PDF using C#.

Aspose Knowledge Base
