How to Convert PDF to Text File using C#

This basic tutorial explains how to convert a PDF to a Text file using C#, including the configuration settings and a runnable code snippet. It shows that a C# PDF to Text converter can be created with a few API calls: you only need to load the source PDF document, extract its text, and save the output Text file.

Steps to Convert PDF to Text File using C#

  1. Add a reference to Aspose.PDF for .NET in your application to convert a PDF to a Text file
  2. Load the source PDF file using an instance of the Document class
  3. Create an instance of the TextAbsorber class and extract text from all pages
  4. Save the extracted text to the output Text file

This section covers the configuration and the step-by-step procedure for writing a C# based PDF to Text converter application in .NET. First, configure the API and load the input PDF file. Next, extract the text from all of its pages and write the extracted text to a file or a stream, as required.

Code Snippet to Convert PDF to Text using C#

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ConvertPdfToTextUsingCSharp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Apply the license to avoid evaluation limitations while converting a PDF to Text
            License pdfToTextLicense = new License();
            pdfToTextLicense.SetLicense("Aspose.pdf.lic");

            // Open the source PDF document
            Document pdfDocument = new Document("PDFtoText.pdf");

            // Instantiate a TextAbsorber in Pure formatting mode.
            // Note: the TextSearchOptions rectangle limits extraction to that area of each page;
            // omit the second argument to extract the full text of every page.
            TextAbsorber textAbsorber = new TextAbsorber(
                new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure),
                new TextSearchOptions(new Rectangle(5, 5, 50, 50)));

            // Call Accept() on the page collection to process all the pages
            pdfDocument.Pages.Accept(textAbsorber);

            // Get the extracted text as a string
            string extractedText = textAbsorber.Text;

            // Save the text file
            File.WriteAllText("PDFtoText.txt", extractedText);
            System.Console.WriteLine("Done");
        }
    }
}
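
The procedure above mentions writing the extracted text to a stream instead of a file. The following is a minimal sketch of that variation, not a definitive implementation. The helper name WritePdfTextToStream and the class name StreamOutputExample are hypothetical and chosen only for illustration; the sketch also assumes a .NET version where StreamWriter offers the leaveOpen constructor overload (.NET Framework 4.5 or later). No license is applied here, so evaluation limitations may affect the output.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ConvertPdfToTextUsingCSharp
{
    class StreamOutputExample
    {
        // Hypothetical helper: extracts text from every page of the PDF
        // and writes it to any writable stream supplied by the caller
        static void WritePdfTextToStream(string pdfPath, Stream output)
        {
            using (Document pdfDocument = new Document(pdfPath))
            {
                // Default TextAbsorber with no search rectangle, so whole pages are read
                TextAbsorber textAbsorber = new TextAbsorber();
                pdfDocument.Pages.Accept(textAbsorber);

                // leaveOpen: true keeps the stream usable by the caller after writing
                using (StreamWriter writer = new StreamWriter(output, Encoding.UTF8, 1024, true))
                {
                    writer.Write(textAbsorber.Text);
                }
            }
        }

        static void Main(string[] args)
        {
            using (MemoryStream buffer = new MemoryStream())
            {
                WritePdfTextToStream("PDFtoText.pdf", buffer);
                System.Console.WriteLine("Bytes written: " + buffer.Length);
            }
        }
    }
}
```

Passing a Stream rather than a file path lets the same helper target a MemoryStream, a FileStream, or a network response, depending on the application.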

The C# convert PDF to Text feature can be integrated into your applications with control over how text is read from the source PDF: you can read text from all the pages or from a specified page. Similarly, if you want to read text from a particular rectangular area on a PDF page, you can define that area as well. Different modes, such as Pure, Raw, and MemorySaving, can also be set for converting PDF to text.
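
As a rough illustration of these options, the sketch below reads a single page in Raw mode and then reads a rectangular region of the same page. It reuses the constructors shown in the snippet above; the class name, the output file names, and the rectangle coordinates are assumptions made up for this example, and the coordinates would need to be adjusted to the layout of your own PDF.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ConvertPdfToTextUsingCSharp
{
    class PageAndRegionExample
    {
        static void Main(string[] args)
        {
            using (Document pdfDocument = new Document("PDFtoText.pdf"))
            {
                // Read only the first page (the Pages collection is 1-based) in Raw mode
                TextAbsorber pageAbsorber = new TextAbsorber(
                    new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
                pdfDocument.Pages[1].Accept(pageAbsorber);
                File.WriteAllText("FirstPage.txt", pageAbsorber.Text);

                // Read only a rectangular area of the first page in Pure mode.
                // Illustrative coordinates: lower-left x, lower-left y, upper-right x, upper-right y
                TextAbsorber regionAbsorber = new TextAbsorber(
                    new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure),
                    new TextSearchOptions(new Rectangle(100, 200, 250, 350)));
                pdfDocument.Pages[1].Accept(regionAbsorber);
                File.WriteAllText("FirstPageRegion.txt", regionAbsorber.Text);
            }
        }
    }
}
```

Calling Accept() on a single page instead of the whole Pages collection is what narrows the extraction to that page; the rectangle narrows it further to one area.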

In this article, we have learned how C# code for converting PDF to Text can be used in your .NET applications. If you also want to explore the conversion of PDF to HTML documents, refer to the article on how to convert PDF to HTML using C#.
