How to Convert HTML to Text in Java

This short topic explains how to convert HTML to text in Java. A Java application that converts HTML to plain text can be developed with a few simple API calls and can run on Windows, Linux or macOS.

Steps to Convert HTML to Text in Java

  1. Configure your project to add Aspose.HTML for Java from the Maven repository
  2. Include a reference to the Aspose.HTML package in your application
  3. Read the source HTML file content into a String object
  4. Initialize an HTMLDocument class object to load the source HTML String
  5. Initialize an INodeIterator class object to iterate over the nodes and append them to a StringBuilder
  6. Save the text extracted from the HTML to disk

A Java application needs only a few lines of code to extract text from HTML. The process starts by loading the source HTML into a String object and then loading that String with the HTMLDocument class. An INodeIterator then traverses the HTML nodes, and the content of each node is appended to a StringBuilder. Finally, the contents of the StringBuilder are saved to disk as a plain text file.

Code to Convert HTML to Text in Java

The example converts HTML to plain text in Java with a few API calls. It defines a StyleFilter class that extends the NodeFilter class and implements the AcceptNode method to set custom node filters and omit undesirable nodes from the HTML during the conversion process.
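
As a rough, library-independent illustration of the same iterate-and-filter pattern, the sketch below uses only the JDK's own W3C DOM traversal interfaces (`org.w3c.dom.traversal.NodeIterator` and `NodeFilter`), not Aspose.HTML. It makes several assumptions: the input is well-formed XHTML, because the JDK's XML parser cannot read arbitrary HTML; the parser's `Document` supports traversal, as the JDK's built-in implementation does; and the class name `XhtmlTextSketch`, the method `extractText` and the output file `extracted.txt` are hypothetical names chosen for this illustration. The `StyleFilter` here implements the JDK interface and only mirrors the idea of the filter described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

class XhtmlTextSketch {

    // Hypothetical filter: rejects text nodes whose parent is <style> or <script>.
    static class StyleFilter implements NodeFilter {
        @Override
        public short acceptNode(Node n) {
            Node parent = n.getParentNode();
            String name = (parent == null) ? "" : parent.getNodeName();
            boolean skip = name.equalsIgnoreCase("style") || name.equalsIgnoreCase("script");
            return skip ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT;
        }
    }

    // Parses well-formed XHTML and concatenates the text nodes the filter accepts.
    static String extractText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        // Assumption: the Document implementation also implements DocumentTraversal.
        NodeIterator iterator = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_TEXT, new StyleFilter(), true);

        StringBuilder sb = new StringBuilder();
        for (Node node = iterator.nextNode(); node != null; node = iterator.nextNode()) {
            sb.append(node.getNodeValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><style>p { color: red; }</style></head>"
                + "<body><p>Hello</p><script>var x = 1;</script><p>World</p></body></html>";
        String text = extractText(xhtml);

        // Save the extracted text to disk as a plain text file.
        Files.write(Paths.get("extracted.txt"), text.getBytes(StandardCharsets.UTF_8));
        System.out.println(text); // prints "HelloWorld"
    }
}
```

Because the iterator is created with `SHOW_TEXT`, only text nodes reach the filter, so the filter needs to inspect nothing but each node's parent element. Real-world HTML is often not well-formed XML, which is one reason to use a dedicated HTML parser for the actual conversion.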

In this topic, we have explored how to extract text from HTML in Java. If you are interested in converting an MD file to XPS format, proceed to the topic on how to convert Markdown to XPS using Java.
