How to Read a PDF File in C#

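Reading different types of documents programmatically is a common practice these days. In this how-to guide, you will learn how to read a PDF file in C# by following the simple steps below.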

Steps to Read a PDF File in C#

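Create an empty C# console application in Visual Studio
Add a reference to Aspose.PDF for .NET by installing it from NuGet.org
Load an existing PDF file into a Document object
Initialize the TextAbsorber class to read the PDF file
Extract the PDF text and write it to the console output
Iterate through the Resources of each PDF page to find images
Create a FileStream object for each image found
Save the image to the local disk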

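The following code snippet shows how to open and read a PDF file in C#. It reads the text of the PDF file and also extracts its images. The API provides the TextAbsorber class for reading text from a PDF file, and the extracted result is available through its Text property. You can also loop through the Resources of each PDF page to find images and save them to the local disk, as shown below.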

Code to Read a PDF File in C#

	using System;
	using System.IO;
	// Add a reference to the Aspose.PDF for .NET API
	// Use the following namespace to read a PDF file
	using Aspose.Pdf;

	namespace ReadPDFFiles
	{
		class Program
		{
			static void Main(string[] args)
			{
				// Set the license before reading the PDF file
				Aspose.Pdf.License AsposePDFLicense = new Aspose.Pdf.License();
				AsposePDFLicense.SetLicense(@"c:\asposelicense\license.lic");

				string inFile = @"c:\ReadPDFFileInCSharp.pdf";

				// Load an existing PDF file into a Document object to read it
				Document pdf = new Document(inFile);

				// 1. Read text from the PDF file
				// Initialize the TextAbsorber class to read text from the PDF file
				Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();

				// Call Pages.Accept() so the TextAbsorber visits every page of the PDF
				pdf.Pages.Accept(textAbsorber);

				// Write the extracted text to the console output
				Console.WriteLine(textAbsorber.Text);

				// 2. Extract images from the PDF file
				int imageIndex = 1;

				// Iterate through the PDF pages
				foreach (var pdfPage in pdf.Pages)
				{
					// Visit every image in the resources of the current page
					foreach (XImage image in pdfPage.Resources.Images)
					{
						// Create a file stream for the found image; the using block
						// closes the stream even if saving throws an exception
						using (FileStream extractedImage = new FileStream(String.Format("Page{0}_Image{1}.jpg", pdfPage.Number, imageIndex), FileMode.Create))
						{
							// Save the image to disk in JPEG format
							image.Save(extractedImage, System.Drawing.Imaging.ImageFormat.Jpeg);
						}

						imageIndex++;
					}

					// Reset the image index for the next page
					imageIndex = 1;
				}
			}
		}
	}

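In the previous topic, you learned how to process large PDF files in C#. The information and code sample above will enable you to open and read a PDF file in C# to extract its text and images.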
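
If you need the text of each page separately rather than the text of the whole document at once, one possible variation is to call Accept on individual pages instead of on the Pages collection. The following is only a minimal sketch, not part of the original sample: it assumes the same input path as the code above, assumes that your Aspose.PDF for .NET version lets a Page accept a TextAbsorber, and uses a hypothetical class name, ReadPdfTextPerPage. License setup is omitted here; set it as in the code above.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ReadPDFFiles
{
    // Hypothetical class name, used only for this sketch
    class ReadPdfTextPerPage
    {
        static void Main(string[] args)
        {
            // Same sample path as in the code above (an assumption of this sketch)
            string inFile = @"c:\ReadPDFFileInCSharp.pdf";

            // Dispose the Document once reading is finished
            using (Document pdf = new Document(inFile))
            {
                foreach (Page pdfPage in pdf.Pages)
                {
                    // Use a fresh TextAbsorber for each page so that
                    // its Text holds the text of that page only
                    TextAbsorber pageAbsorber = new TextAbsorber();
                    pdfPage.Accept(pageAbsorber);

                    // Print a page header followed by the page text
                    Console.WriteLine("--- Page {0} ---", pdfPage.Number);
                    Console.WriteLine(pageAbsorber.Text);
                }
            }
        }
    }
}
```

Creating a new TextAbsorber per page keeps the results independent of each other, at the cost of one small allocation per page.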
