How to Convert PDF to Text in Java

This short tutorial explains how to convert PDF to Text in Java by loading the input PDF document and saving its text to a Text file. The Java PDF to Text converter can also be customized to control whether the output text keeps or drops the formatting of the source PDF file.

Steps to Convert PDF to Text in Java

  1. Configure your application by adding the reference to Aspose.PDF from the Maven repository to convert PDF to a Text file
  2. Load the input PDF file with the Document class object for conversion of PDF to a Text file
  3. Create an object of TextAbsorber class to set the text extraction options
  4. Write the extracted text to a Text file

The steps above outline the process of developing a PDF to Text converter application in Java. After the library reference is added, the input PDF document is loaded using a Document class instance, and a TextAbsorber object is created with options that select whether the text is extracted with formatting or not. Finally, you can write the extracted text string to a file or process it further as per your requirements.
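The last step, writing the extracted string to a file, needs only the standard library. The following is a minimal sketch of that step alone, assuming the text has already been obtained from the TextAbsorber. The class name ExtractedTextWriter, the method writeText, and the sample string are hypothetical names used for illustration, and the choice of UTF-8 as the output encoding is an assumption rather than a requirement of Aspose.PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Aspose.PDF: saves already-extracted text to disk.
public class ExtractedTextWriter {

    // Writes the text to the given path as UTF-8, creating or overwriting the file.
    public static Path writeText(String extractedText, String outputPath) throws IOException {
        Path target = Paths.get(outputPath);
        // Files.write creates the file if it is missing and truncates it if it exists
        return Files.write(target, extractedText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for the string returned by TextAbsorber.getText()
        String extractedText = "Sample text extracted from a PDF page.";
        Path written = writeText(extractedText, "SampleOutput.txt");
        System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

Fixing the encoding explicitly keeps the output identical across machines whose default character sets differ, which matters when the PDF contains non-ASCII text.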

Code to Convert PDF to Text in Java

import com.aspose.pdf.Document;
import com.aspose.pdf.License;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.TextExtractionOptions;
import java.io.BufferedWriter;
import java.io.FileWriter;

public class ConvertPdfToTextInJava {
    // Main method to convert a PDF document to a Text file
    public static void main(String[] args) throws Exception {
        // Instantiate the license to avoid trial limitations while converting the PDF to a text file
        License asposePdfLicenseText = new License();
        asposePdfLicenseText.setLicense("Aspose.pdf.lic");

        // Load the source PDF file that is to be converted to a Text file
        Document convertPDFDocumentToText = new Document("input.pdf");

        // Instantiate a TextAbsorber class object with the Pure formatting mode
        TextAbsorber textAbsorber = new TextAbsorber(
                new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));

        // Call the accept method so the absorber visits all pages of the document
        convertPDFDocumentToText.getPages().accept(textAbsorber);

        // Read the extracted text as a string
        String extractedText = textAbsorber.getText();

        // Open the output file; try-with-resources closes the writer even if writing fails
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("SampleOutput.txt"))) {
            // Write the extracted contents to the file
            writer.write(extractedText);
        }
        System.out.println("Done");
    }
}

This sample code demonstrates how to convert PDF to text in Java with full control over the extraction. For example, the TextAbsorber class has multiple constructors, some of which accept TextSearchOptions, which provides the option to convert the shaded text in the source PDF as separate text. Similarly, you can set flags to search for text only within the page bounds, or set a rectangle to extract text only from a specified area on all the pages.

In this article, we have learned how to convert PDF to Text in Java, along with a code snippet. If you want to learn how to convert PDF to Word, refer to the article on how to convert PDF to Word in Java.
