This brief tutorial explains how to read a Word document in Java, with a step-by-step procedure and runnable Java code that reads a sample document in different ways. It introduces the classes used to read a Word file and access its different segments. When reading a Word document in Java, whether DOCX, DOC, or another format supported by MS Word, you iterate through the child nodes of the document and process each one as required.
Steps to Read Word File in Java
- Install Aspose.Words for Java using the Maven repository to read the DOCX file
- Load the source DOCX file into the Document class object for reading in Java
- Iterate through all the Paragraph type nodes in the document
- Convert each paragraph text to a string and display it on the console
- Iterate through all the Run type nodes in the document
- Cast each node to the Run type and access the font name, size, and text of the Run
- Display each run text on the console
These steps describe how to read a Word file in Java, starting with the configuration page and then loading the source Word document. Once the Word file is loaded, its document object model (DOM), i.e. its logical structure, is available and can be parsed in different ways. The steps build two main collections, Paragraphs and Runs, which give access to different parts of the loaded Word document.
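To make the idea of a DOM with two filtered collections concrete, here is a small library-free sketch. The `SketchNode` and `Kind` types and the `collect` helper are hypothetical stand-ins invented for illustration; they are not Aspose.Words classes. The sketch only models how a recursive search can return every paragraph or every run from the same tree.

```java
import java.util.ArrayList;
import java.util.List;

public class DomSketch {
    // Hypothetical stand-in node kinds; NOT part of Aspose.Words.
    enum Kind { DOCUMENT, PARAGRAPH, RUN }

    // Hypothetical stand-in for a DOM node: containers hold children, runs hold text.
    static final class SketchNode {
        final Kind kind;
        final String text;      // only set for RUN nodes
        final String fontName;  // only set for RUN nodes
        final boolean bold;     // only meaningful for RUN nodes
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(Kind kind, String text, String fontName, boolean bold) {
            this.kind = kind;
            this.text = text;
            this.fontName = fontName;
            this.bold = bold;
        }

        static SketchNode container(Kind kind, SketchNode... kids) {
            SketchNode node = new SketchNode(kind, null, null, false);
            for (SketchNode kid : kids) {
                node.children.add(kid);
            }
            return node;
        }

        static SketchNode run(String text, String fontName, boolean bold) {
            return new SketchNode(Kind.RUN, text, fontName, bold);
        }

        // Concatenates the text of every run at or below this node.
        String fullText() {
            if (kind == Kind.RUN) {
                return text;
            }
            StringBuilder sb = new StringBuilder();
            for (SketchNode child : children) {
                sb.append(child.fullText());
            }
            return sb.toString();
        }
    }

    // Depth-first search returning all descendants of the requested kind.
    static List<SketchNode> collect(SketchNode root, Kind wanted) {
        List<SketchNode> found = new ArrayList<>();
        collectInto(root, wanted, found);
        return found;
    }

    private static void collectInto(SketchNode node, Kind wanted, List<SketchNode> out) {
        for (SketchNode child : node.children) {
            if (child.kind == wanted) {
                out.add(child);
            }
            collectInto(child, wanted, out);
        }
    }

    // A two-paragraph sample; the first paragraph has three runs because
    // the formatting changes in the middle of the sentence.
    static SketchNode sampleDocument() {
        return SketchNode.container(Kind.DOCUMENT,
            SketchNode.container(Kind.PARAGRAPH,
                SketchNode.run("Plain, ", "Calibri", false),
                SketchNode.run("bold", "Calibri", true),
                SketchNode.run(" and plain again.", "Calibri", false)),
            SketchNode.container(Kind.PARAGRAPH,
                SketchNode.run("Second paragraph.", "Arial", false)));
    }

    public static void main(String[] args) {
        SketchNode doc = sampleDocument();
        // First pass: one line per paragraph.
        for (SketchNode p : collect(doc, Kind.PARAGRAPH)) {
            System.out.println(p.fullText());
        }
        // Second pass: one line per run, with its formatting.
        for (SketchNode r : collect(doc, Kind.RUN)) {
            System.out.println(r.fontName + (r.bold ? " bold" : "") + ": " + r.text);
        }
    }
}
```

The same tree yields two paragraphs in the first pass and four runs in the second, which mirrors the two collections prepared in the steps above.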
Code to Read DOCX File in Java
The Java code for reading a Word document parses the DOM with different filters. First, it fetches all the paragraph nodes. The Paragraph class provides the toString() function, which extracts the text of the entire paragraph and returns it as a string; because the search covers the whole document, paragraphs inside tables and other containers are included too. Similarly, fetching all the Runs separates the content by formatting such as style and font, so a single paragraph is divided into multiple segments: bold text is returned as one run, italic text as another, and so on.
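A minimal sketch of such code follows. It assumes that Aspose.Words for Java is already on the classpath and that a file named `Document.docx` exists in the working directory. The file name and the class name `ReadWordDocument` are placeholders chosen for illustration. Treat it as a starting point rather than a definitive implementation.

```java
import com.aspose.words.Document;
import com.aspose.words.Font;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import com.aspose.words.Run;
import com.aspose.words.SaveFormat;

public class ReadWordDocument {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // Load the source document ("Document.docx" is a placeholder path).
        Document doc = new Document("Document.docx");

        // Collect every Paragraph node in the document (true = search recursively).
        NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
        for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs) {
            // Export the paragraph as plain text and print it.
            System.out.println(paragraph.toString(SaveFormat.TEXT));
        }

        // Collect every Run node; each run is a stretch of uniformly formatted text.
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        for (Run run : (Iterable<Run>) runs) {
            Font font = run.getFont();
            System.out.println(font.getName() + ", " + font.getSize());
            System.out.println(run.getText());
        }
    }
}
```

The first loop prints one block of text per paragraph. The second prints the font name and size of each run, followed by the run's text, so a paragraph with mixed formatting appears as several separate entries.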
This tutorial has shown how to read a DOCX file. If you need a conversion such as Word to PDF, refer to the article on how to convert Word to PDF in Java.