How to Convert HTML to Text in C#

This short how-to demonstrates how to convert HTML to text in C#. The conversion from HTML to plain text takes only a few lines of code and works in any .NET-based application running on Windows, macOS or Linux.

Steps to Convert HTML to Text in C#

  1. Install Aspose.HTML for .NET from the NuGet package manager
  2. Include the Aspose.Html namespace in your project
  3. Load the content of the HTML file into a String
  4. Create an instance of the HTMLDocument class to load the String containing the HTML
  5. Create an INodeIterator to iterate through the nodes and append their text to a StringBuilder
  6. Finally, save the text converted from HTML to disk

To get plain text from HTML in C#, a few lines of code are enough in any .NET-based application. The process begins by reading the HTML file into a String with the File.ReadAllText method and loading that String into an HTMLDocument class instance. An INodeIterator is then used to walk the nodes of the HTML document, and the text of each node is appended to a StringBuilder. Finally, the text collected in the StringBuilder is saved to disk.

Code to Convert HTML to Text in C#
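
The following is a minimal sketch of the steps listed earlier, not a definitive listing. It assumes the Aspose.HTML for .NET package is installed. The file names input.html and output.txt are placeholders, and rejecting text inside STYLE and SCRIPT elements is an assumption about which nodes count as undesirable. Check the class and method signatures against the API reference for your installed version.

```csharp
using System.IO;
using System.Text;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Dom.Traversal;
using Aspose.Html.Dom.Traversal.Filters;

class Program
{
    static void Main()
    {
        // Read the HTML file into a string (placeholder path).
        string content = File.ReadAllText("input.html");

        // Load the string into an HTMLDocument; the second argument is the base URI.
        using (var document = new HTMLDocument(content, string.Empty))
        {
            // Iterate over text nodes only, skipping the ones the filter rejects.
            INodeIterator iterator = document.CreateNodeIterator(
                document, NodeFilter.SHOW_TEXT, new StyleFilter());

            var sb = new StringBuilder();
            Node node;
            while ((node = iterator.NextNode()) != null)
            {
                sb.Append(node.NodeValue);
            }

            // Save the extracted text to disk (placeholder path).
            File.WriteAllText("output.txt", sb.ToString());
        }
    }
}

// Custom filter: rejects text nodes whose parent is a STYLE or SCRIPT element
// (an assumed definition of "undesirable" nodes).
class StyleFilter : NodeFilter
{
    public override short AcceptNode(Node n)
    {
        string tag = n.ParentElement?.TagName;
        return (tag == "STYLE" || tag == "SCRIPT") ? FILTER_REJECT : FILTER_ACCEPT;
    }
}
```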

The above code converts HTML to plain text in C# using a few API calls. It uses a custom StyleFilter class that inherits from the NodeFilter class and overrides the AcceptNode method, which filters undesirable nodes out of the HTML during conversion.

In the previous topic, we learnt how to create an HTML file in C#. The example above, in turn, shows how to get plain text from an HTML file programmatically in C#.
