How to Read a PDF File in C#

Reading different types of documents programmatically is common practice today. In this how-to guide, you will learn to read a PDF file in C# by following the simple steps below.

Steps to Read a PDF File in C#

Create an empty C# console application in Visual Studio
Add a reference to Aspose.PDF for .NET by installing it from NuGet.org
Load an existing PDF file into a Document object
Initialize the TextAbsorber class to read the PDF file
Extract the text from the PDF and write it to the console output
Iterate through the PDF page Resources to find images
Create a FileStream object for each image found
Save the image to the local disk

The code snippet below shows how to open and read a PDF file in C#. You can use it to read text and extract images from a PDF file. The API provides the TextAbsorber class for reading text from the PDF file, and the extracted result is available through its Text property. You can also find images and save them to the local disk by iterating through the PDF page resources, as shown below.

Code to Read a PDF File in C#

	using System;
	using System.IO;
	// Add reference to Aspose.PDF for .NET API
	// Use following namespace to read PDF file
	using Aspose.Pdf;

	namespace ReadPDFFiles
	{
	    class Program
	    {
	        static void Main(string[] args)
	        {
	            // Set license before reading PDF file
	            Aspose.Pdf.License AsposePDFLicense = new Aspose.Pdf.License();
	            AsposePDFLicense.SetLicense(@"c:\asposelicense\license.lic");

	            string inFile = @"c:\ReadPDFFileInCSharp.pdf";

	            // Load an existing PDF file in Document object to read
	            Document pdf = new Document(inFile);

	            // 1. Read text from PDF file
	            // Initialize TextAbsorber class to read text from PDF file
	            Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();

	            // Call Pages.Accept() method to let TextAbsorber find text in all PDF pages
	            pdf.Pages.Accept(textAbsorber);

	            // Write the extracted text to Console output
	            Console.WriteLine(textAbsorber.Text);

	            // 2. Extract images from PDF file
	            // Iterate through PDF pages
	            foreach (var pdfPage in pdf.Pages)
	            {
	                // Image index restarts at 1 on every page
	                int imageIndex = 1;

	                // Check available images while reading the PDF
	                foreach (XImage image in pdfPage.Resources.Images)
	                {
	                    // Create file stream for found image; the using block
	                    // closes the stream even if saving throws
	                    using (FileStream extractedImage = new FileStream(
	                        String.Format("Page{0}_Image{1}.jpg", pdfPage.Number, imageIndex), FileMode.Create))
	                    {
	                        // Save output image to the disk
	                        image.Save(extractedImage, System.Drawing.Imaging.ImageFormat.Jpeg);
	                    }

	                    imageIndex++;
	                }
	            }
	        }
	    }
	}
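
As a variation on the text-extraction step above, the following is a minimal sketch of reading the text one page at a time instead of for the whole document at once. It assumes that a single Page accepts a TextAbsorber through Page.Accept() in the same way the Pages collection does in the example above. It reuses the same input path and omits the license setup for brevity. The class name PerPageText is hypothetical and used only for illustration.

```csharp
using System;
// Assumes the same Aspose.PDF for .NET reference as the example above
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ReadPDFFiles
{
    // Hypothetical class name, used only for this sketch
    class PerPageText
    {
        static void Main(string[] args)
        {
            // Same input path as in the example above
            string inFile = @"c:\ReadPDFFileInCSharp.pdf";

            // Load the existing PDF file
            Document pdf = new Document(inFile);

            foreach (Page pdfPage in pdf.Pages)
            {
                // A fresh absorber per page keeps each page's text separate
                TextAbsorber pageAbsorber = new TextAbsorber();

                // Let the absorber collect text from this page only
                pdfPage.Accept(pageAbsorber);

                // Print a page header followed by the page's text
                Console.WriteLine("--- Page {0} ---", pdfPage.Number);
                Console.WriteLine(pageAbsorber.Text);
            }
        }
    }
}
```

Creating a new absorber inside the loop is a deliberate choice in this sketch: it avoids relying on how a reused absorber accumulates text across pages.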

In the previous topic, you learned how to process large PDF files in C#. The information and code example above will enable you to open and read PDF files in C# to extract text and images.
