How to Read a PDF Table in C#

This short how-to tutorial explains how to read a PDF table in C# and extract all of its content. It describes how to parse the tables in a PDF file and then access each individual row and cell of a particular table. The C# code needed to read a PDF table is only a few lines long: it loads the source PDF file and then parses the tables to read their content.

Steps to Read a PDF Table in C#

1. Add a reference to Aspose.PDF for .NET to read table data from the PDF
2. Load the source PDF file using a Document class object
3. Instantiate a TableAbsorber class object and read all the tables from the desired PDF page
4. Iterate through all the rows in the target PDF table
5. Iterate through all the cells in each row and fetch all their text fragments
6. Display or process each text fragment in a cell

These steps follow a systematic approach to reading a PDF table in C#: the PDF file is loaded first, and then the tables on the chosen page are parsed using a TableAbsorber class object. Once the page has been visited, you can get a reference to any table in the parsed collection, and from there reach any row, cell, and text fragment to process or display it.

Code to Read a PDF Table in C#

	using System;
	using Aspose.Pdf;
	using Aspose.Pdf.Text;

	namespace ReadPDFTableInCSharp
	{
	    class Program
	    {
	        static void Main(string[] args)
	        {
	            // Instantiate the license to avoid trial limitations while reading table data from PDF
	            License asposePdfLicense = new License();
	            asposePdfLicense.SetLicense("Aspose.pdf.lic");

	            // Load source PDF document having a table in it
	            Document pdfDocument = new Document(@"PdfWithTable.pdf");

	            // Declare and initialize TableAbsorber class object for reading table from the PDF
	            TableAbsorber tableAbsorber = new TableAbsorber();

	            // Parse all the tables from the desired page in the PDF (page indexes start at 1)
	            tableAbsorber.Visit(pdfDocument.Pages[1]);

	            // Stop early if the page has no tables, so the index below cannot go out of range
	            if (tableAbsorber.TableList.Count == 0)
	            {
	                Console.WriteLine("No tables found on the page");
	                return;
	            }

	            // Get reference to the first table in the parsed collection
	            AbsorbedTable absorbedTable = tableAbsorber.TableList[0];

	            // Iterate through all the rows in the PDF table
	            foreach (AbsorbedRow pdfTableRow in absorbedTable.RowList)
	            {
	                // Iterate through all the cells in the PDF table row
	                foreach (AbsorbedCell pdfTableCell in pdfTableRow.CellList)
	                {
	                    // Fetch all the text fragments in the cell
	                    TextFragmentCollection textFragmentCollection = pdfTableCell.TextFragments;

	                    // Iterate through all the text fragments
	                    foreach (TextFragment textFragment in textFragmentCollection)
	                    {
	                        // Display the text
	                        Console.WriteLine(textFragment.Text);
	                    }
	                }
	            }
	            Console.WriteLine("Done");
	        }
	    }
	}
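
The example above reads only the first table on the first page. As a rough sketch of how the same classes might be used to walk every table on every page, the variant below creates one TableAbsorber per page and prints each row on a single line. The class name ReadAllTables and the " | " cell separator are illustrative choices, not part of the original sample, and the license setup is omitted here, so evaluation limits may apply when it runs.

	using System;
	using Aspose.Pdf;
	using Aspose.Pdf.Text;

	// Hypothetical class name, used only for this sketch
	class ReadAllTables
	{
	    static void Main()
	    {
	        // Same sample file name as in the example above
	        Document pdfDocument = new Document("PdfWithTable.pdf");

	        foreach (Page page in pdfDocument.Pages)
	        {
	            // A fresh absorber per page keeps each page's tables separate
	            TableAbsorber absorber = new TableAbsorber();
	            absorber.Visit(page);

	            for (int t = 0; t < absorber.TableList.Count; t++)
	            {
	                Console.WriteLine($"Page {page.Number}, table {t + 1}");

	                foreach (AbsorbedRow row in absorber.TableList[t].RowList)
	                {
	                    foreach (AbsorbedCell cell in row.CellList)
	                    {
	                        // A cell may hold several fragments; print them together
	                        foreach (TextFragment fragment in cell.TextFragments)
	                        {
	                            Console.Write(fragment.Text);
	                        }
	                        // Arbitrary separator between cells
	                        Console.Write(" | ");
	                    }
	                    Console.WriteLine();
	                }
	            }
	        }
	    }
	}

Pages without tables simply produce no output, because the inner loop over TableList never runs.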


In this sample code, the PDF table is parsed in C# using the TableAbsorber class, which is designed for reading tables. You can also use other absorbers, such as TextAbsorber, ParagraphAbsorber, FontAbsorber, and TextFragmentAbsorber, to access other elements of the document. You can iterate through an entire collection or access individual elements by index.
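
For comparison, here is a minimal sketch of one of those alternatives, TextAbsorber, which collects the plain text of a page without any table structure. It assumes the same sample file name as the table example; the class name ExtractPageText is hypothetical, and the license setup is again left out.

	using System;
	using Aspose.Pdf;
	using Aspose.Pdf.Text;

	// Hypothetical class name, used only for this sketch
	class ExtractPageText
	{
	    static void Main()
	    {
	        // Same sample file name as in the table example
	        Document pdfDocument = new Document("PdfWithTable.pdf");

	        // TextAbsorber gathers the text of whatever it is applied to
	        TextAbsorber textAbsorber = new TextAbsorber();

	        // Apply the absorber to the first page only (page indexes start at 1)
	        pdfDocument.Pages[1].Accept(textAbsorber);

	        // The extracted text is exposed through the Text property
	        Console.WriteLine(textAbsorber.Text);
	    }
	}

This returns the page text as one string, so it suits cases where the row and cell boundaries are not needed.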

In this topic, we learned how to read a PDF table in C#. If you want to read bookmarks in a PDF instead, refer to the article on how to read bookmarks in PDF using C#.
