This short tutorial explains how to convert a scanned PDF into a searchable PDF with selectable text using Java. It covers configuring the IDE, lists the steps, and provides sample code for the conversion. It also shows how to customize the way the scanned images are recognized and converted to text.
Steps to Make a Scanned PDF Searchable using Java
- Configure the IDE to use Aspose.Total for Java (the steps below use Aspose.OCR and Aspose.PDF) to transform a scanned PDF into a searchable PDF
- Apply the license for each product used to avoid evaluation watermarks in the output
- Create an instance of the recognition engine using the AsposeOCR class
- Create an OcrInput object and add the source scanned PDF to it
- Create a RecognitionSettings object to customize the recognition parameters
- Call the AsposeOCR.Recognize() method to extract the text from the scanned PDF, and save the results to an intermediate PDF
- Load the intermediate PDF into an Aspose.PDF Document object and set its metadata
- Save the final PDF, with searchable text and metadata, to disk
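The steps above can be sketched as a single Java class. This is a minimal sketch, assuming Aspose.OCR for Java (a version that provides the OcrInput class) and Aspose.PDF for Java are on the classpath. The file names, the license file name `Aspose.Total.Java.lic` and the metadata values are hypothetical placeholders, and method names should be checked against the API reference of the installed versions.

```java
import com.aspose.ocr.*;
import java.io.File;
import java.util.ArrayList;

public class ScannedPdfToSearchablePdf {

    // Hypothetical license file name; replace with the path to your own license.
    static final String LICENSE_FILE = "Aspose.Total.Java.lic";

    public static void convert(String scannedPdf, String intermediatePdf, String finalPdf) throws Exception {
        // Apply both product licenses; without them the output carries evaluation watermarks.
        // Fully qualified names are used because both products define a License class.
        if (new File(LICENSE_FILE).exists()) {
            new com.aspose.ocr.License().setLicense(LICENSE_FILE);
            new com.aspose.pdf.License().setLicense(LICENSE_FILE);
        }

        // Create the recognition engine.
        AsposeOCR api = new AsposeOCR();

        // Prepare the input and add the scanned PDF to it.
        OcrInput input = new OcrInput(InputType.PDF);
        input.add(scannedPdf);

        // Recognition parameters; the defaults are kept here.
        RecognitionSettings settings = new RecognitionSettings();

        // Run OCR on every page and save the results as an intermediate searchable PDF.
        ArrayList<RecognitionResult> results = api.Recognize(input, settings);
        AsposeOCR.SaveMultipageDocument(intermediatePdf, SaveFormat.Pdf, results);

        // Reopen the intermediate PDF with Aspose.PDF, set metadata and save the final file.
        com.aspose.pdf.Document pdf = new com.aspose.pdf.Document(intermediatePdf);
        try {
            pdf.getInfo().setTitle("Searchable copy of " + scannedPdf); // placeholder metadata
            pdf.getInfo().setAuthor("OCR pipeline");
            pdf.getInfo().setKeywords("OCR, searchable PDF");
            pdf.save(finalPdf);
        } finally {
            pdf.close();
        }
    }

    public static void main(String[] args) throws Exception {
        convert("scanned.pdf", "intermediate.pdf", "searchable.pdf");
        System.out.println("Saved searchable.pdf");
    }
}
```

The license is applied only when the file exists, so the sketch also runs in evaluation mode, where the products' evaluation limitations and watermarks apply.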
These steps describe how to convert an image-based PDF into a text-based PDF using Java. Create the recognition engine object, add the scanned PDF file to an OcrInput object, define the recognition parameters in a RecognitionSettings object, call the Recognize() method to extract the text, and save the results to an intermediate PDF file. Finally, load the intermediate PDF file with the Aspose.PDF com.aspose.pdf.Document class, add metadata or apply further formatting, and save the final PDF file.
Code to Convert PDF Images to Text using Java
The conversion of a scanned PDF into a searchable PDF can be customized through the RecognitionSettings object. You can set a specific recognition language, set a flag to detect the language automatically, restrict recognition to specific characters, or list characters that OCR should ignore. You can also choose the strategy used to detect text areas and the document layout, if required.
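A minimal sketch of these customization options follows, assuming the RecognitionSettings setters `setLanguage`, `setIgnoredCharacters`, `setAllowedCharacters` and `setDetectAreasMode` available in recent Aspose.OCR for Java releases; their names may differ in other versions. The input and output file names and the ignored characters are hypothetical. The automatic language detection flag is not shown because its exact setter is not confirmed here.

```java
import com.aspose.ocr.*;
import java.util.ArrayList;

public class CustomOcrSettings {
    public static void main(String[] args) throws Exception {
        AsposeOCR api = new AsposeOCR();

        // Hypothetical scanned input file.
        OcrInput input = new OcrInput(InputType.PDF);
        input.add("scanned.pdf");

        RecognitionSettings settings = new RecognitionSettings();
        // Recognize using a specific language.
        settings.setLanguage(Language.Eng);
        // Skip characters that often appear as scanning noise (illustrative choice).
        settings.setIgnoredCharacters("|~^");
        // Alternatively, restrict recognition to a whitelist, for example digits only:
        // settings.setAllowedCharacters("0123456789");
        // Choose the area/layout detection strategy suited to structured documents.
        settings.setDetectAreasMode(DetectAreasMode.DOCUMENT);

        // Recognize and save the results as a searchable PDF.
        ArrayList<RecognitionResult> results = api.Recognize(input, settings);
        AsposeOCR.SaveMultipageDocument("searchable.pdf", SaveFormat.Pdf, results);

        // Print the recognized text of each page for a quick check.
        for (RecognitionResult page : results) {
            System.out.println(page.recognitionText);
        }
    }
}
```

Ignoring characters and whitelisting characters pull in opposite directions, so the sketch enables only one of them and leaves the other as a comment.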
This article showed how to convert a scanned PDF into a PDF with selectable text. To export data from a PDF form to Excel, refer to the article Export Data from a PDF Form to Excel using Java.