How to Read PDF File in C#

Reading documents programmatically is a common requirement. This how-to guide shows how to read a PDF file in C# by following the simple steps below.

Steps to Read PDF File in C#

Create an empty C# Console Application in Visual Studio
Add a reference to Aspose.PDF for .NET by installing it from NuGet.org
Load an existing PDF file into a Document object
Initialize the TextAbsorber class to read the PDF file
Extract the PDF text and write it to the Console output
Iterate through the PDF page resources to find images
Create a FileStream object for each image found
Save the image to local disk

The code snippet below shows how to open and read a PDF file in C#, extracting both its text and its images. The API provides the TextAbsorber class for reading text from a PDF file, and the extracted text is available through its Text property. Images can be found by looping through each page's resources and then saved to local disk, as shown below.

Code to Read PDF File in C#

	using System;
	using System.IO;
	// Add reference to Aspose.PDF for .NET API
	// Use following namespace to read PDF file
	using Aspose.Pdf;

	namespace ReadPDFFiles
	{
		class Program
		{
			static void Main(string[] args)
			{
				// Set license before reading PDF file
				Aspose.Pdf.License AsposePDFLicense = new Aspose.Pdf.License();
				AsposePDFLicense.SetLicense(@"c:\asposelicense\license.lic");

				string inFile = @"c:\ReadPDFFileInCSharp.pdf";

				// Load an existing PDF file in Document object to read
				Document pdf = new Document(inFile);

				// 1. Read text from PDF file
				// Initialize TextAbsorber class to read text from PDF file
				Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();

				// Call Pages.Accept() to let TextAbsorber find text on all PDF pages
				pdf.Pages.Accept(textAbsorber);

				// Write the extracted text to Console output
				Console.WriteLine(textAbsorber.Text);

				// 2. Extract images from PDF file
				// Iterate through PDF pages
				foreach (var pdfPage in pdf.Pages)
				{
					// Image index restarts at 1 on every page
					int imageIndex = 1;

					// Check available images while reading the PDF
					foreach (XImage image in pdfPage.Resources.Images)
					{
						string fileName = String.Format("Page{0}_Image{1}.jpg", pdfPage.Number, imageIndex);

						// Create file stream for found image; the using block
						// closes the stream even if saving throws
						using (FileStream extractedImage = new FileStream(fileName, FileMode.Create))
						{
							// Save output image to the disk
							image.Save(extractedImage, System.Drawing.Imaging.ImageFormat.Jpeg);
						}

						imageIndex++;
					}
				}
			}
		}
	}
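If the text of each page needs to be kept separate rather than combined into one string, a small variation is to visit pages one at a time. The following is a minimal sketch, not part of the original example: it assumes that calling Accept() on an individual Page with a fresh TextAbsorber yields only that page's text, and it reuses the sample input path from above. The class name PerPageText is hypothetical.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical class name, used only for this illustration
class PerPageText
{
    static void Main()
    {
        // Same sample path as in the example above; adjust to a real file
        Document pdf = new Document(@"c:\ReadPDFFileInCSharp.pdf");

        foreach (Page page in pdf.Pages)
        {
            // Assumption: a new absorber per page keeps each page's text separate
            TextAbsorber absorber = new TextAbsorber();
            page.Accept(absorber);

            Console.WriteLine("--- Page {0} ---", page.Number);
            Console.WriteLine(absorber.Text);
        }
    }
}
```

This trades one absorber for many, which may be convenient when the output has to be indexed or stored per page.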


In the previous topic, you learned how to process large PDF files in C#. The information and code example above enable you to open and read PDF files in C# to extract text and images.
