This brief tutorial shows how to read a PDF file in Java. It contains Java code that first reads the text from the PDF into a string and then fetches all the images from the PDF file and saves them to disk as JPG. Apart from the Aspose.PDF library itself, no other third-party tool needs to be installed to read a PDF in Java.
Steps to Read PDF File in Java
- Add Aspose.PDF to your project from the Maven repository to read the PDF file
- Load the sample PDF file into the Document class object
- Instantiate a TextAbsorber class object that can read the entire text of the PDF file
- Read the PDF text from the loaded file using the TextAbsorber class object
- Display the entire text read from the PDF file on the console
- Iterate through all the pages in the PDF file to access the images
- Go through the images collection of each page and save every image to disk
In this quick step-by-step tutorial, we first load the target PDF file and then instantiate the TextAbsorber class object, which is capable of searching for text through all the pages in the PDF. The whole text is returned as a string that can be displayed or processed as required. Similarly, we can go through all the images in each page's images collection and save them to disk in the desired format, which is JPG in this tutorial.
Code to Read PDF using Java
In this sample code, we used the TextAbsorber class and the getImages() function of Page.getResources() to read a PDF using Java. The TextAbsorber object reads the text when it is passed to the accept() function of the document's PageCollection, while the getImages() function of the object returned by getResources() returns all the images on a page.
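The steps described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: it assumes the Aspose.PDF for Java library is on the classpath, and the class name `ReadPdfSketch`, the helper `imageFileName`, the default input path `Input.pdf`, and the `ImageN.jpg` output names are all hypothetical choices made for this example.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.XImage;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class ReadPdfSketch {

    // Hypothetical naming scheme for extracted images: Image1.jpg, Image2.jpg, ...
    static String imageFileName(int index) {
        return "Image" + index + ".jpg";
    }

    public static void main(String[] args) throws Exception {
        // Assumed input path; a different file can be passed as the first argument.
        String inputPath = args.length > 0 ? args[0] : "Input.pdf";

        // Load the PDF file into a Document object.
        Document pdf = new Document(inputPath);
        try {
            // Let the TextAbsorber visit every page and collect the text.
            TextAbsorber textAbsorber = new TextAbsorber();
            pdf.getPages().accept(textAbsorber);

            // The whole text is available as a single string.
            String extractedText = textAbsorber.getText();
            System.out.println(extractedText);

            // Walk through each page and save the images it holds.
            int imageCount = 1;
            for (Page page : pdf.getPages()) {
                for (XImage image : page.getResources().getImages()) {
                    // try-with-resources closes the stream even if saving fails.
                    try (OutputStream out = new FileOutputStream(imageFileName(imageCount))) {
                        image.save(out);
                    }
                    imageCount++;
                }
            }
        } finally {
            // Release the resources held by the document.
            pdf.close();
        }
    }
}
```

A single running counter is used for the file names so that images from different pages never overwrite each other. If the text needs further processing instead of printing, `extractedText` can be passed on to any other code at that point.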
Note that these steps to read a PDF in Java can be performed on any operating system, such as Windows, Linux, or macOS. If you want to learn more about working with PDF files, refer to the article on how to read bookmarks in PDF using Java.