In this short tutorial, we will learn how to read a Word document in C#, with details about environment configuration, a list of steps, and runnable code. The code demonstrates reading a Word file in different ways. You will learn how C# reads a Word document by loading a file such as DOCX, DOC, RTF, or HTML and then accessing its different elements to process or view them.
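The same loading code works for DOCX, DOC, RTF, and HTML input because the Document constructor detects the format from the file content. Here is a minimal sketch, assuming the Aspose.Words NuGet package is installed. The file names and the AnyFormatReader class are hypothetical placeholders. It also reports the detected format using FileFormatUtil.DetectFileFormat before loading each file.

```csharp
using System;
using System.IO;
using Aspose.Words;

public static class AnyFormatReader
{
    // Loads a Word-compatible file (DOCX, DOC, RTF, HTML, ...) and returns its plain text.
    public static string ReadAllText(string path)
    {
        // Optional: inspect which format Aspose.Words detects for this file.
        FileFormatInfo info = FileFormatUtil.DetectFileFormat(path);
        Console.WriteLine($"{path}: detected {info.LoadFormat}");

        // The constructor picks the loader from the detected format.
        var doc = new Document(path);
        return doc.ToString(SaveFormat.Text);
    }

    public static void Main(string[] args)
    {
        // Hypothetical sample files; replace them with your own paths.
        string[] paths = { "sample.docx", "sample.doc", "sample.rtf", "sample.html" };
        foreach (string path in paths)
        {
            if (!File.Exists(path))
            {
                Console.WriteLine($"Skipping missing file: {path}");
                continue;
            }
            Console.WriteLine(ReadAllText(path));
        }
    }
}
```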
Steps to Read Data From Word Document in C#
- Configure the project environment to use Aspose.Words from the NuGet package manager
- Load the input DOCX file into the Document class object
- Get all the nodes of type Paragraph from the document
- Convert each paragraph to a string and display it on the console
- Get all the Run type nodes from the document
- Convert each Run item to a string and display it along with the font name and size
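The steps above can be sketched as a small console program. This is a minimal sketch that assumes the Aspose.Words package has been added to the project (for example with dotnet add package Aspose.Words). input.docx is a placeholder path, and the WordReader class and its method names are hypothetical. Each pass returns its results as a list of strings so the reading logic stays separate from the console output.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

public static class WordReader
{
    // Steps 3-4: collect the plain text of every paragraph, including paragraphs inside tables.
    public static List<string> ReadParagraphs(Document doc)
    {
        var result = new List<string>();
        // isDeep = true searches all descendants, not only direct children.
        NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
        foreach (Paragraph para in paragraphs)
        {
            // ToString(SaveFormat.Text) ends with a line break, so trim it.
            result.Add(para.ToString(SaveFormat.Text).Trim());
        }
        return result;
    }

    // Steps 5-6: collect each run's text with its font name and size.
    public static List<string> ReadRuns(Document doc)
    {
        var result = new List<string>();
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            result.Add($"{run.Text} | {run.Font.Name} | {run.Font.Size}");
        }
        return result;
    }

    public static void Main(string[] args)
    {
        // Step 2: load the input file (placeholder path unless one is passed on the command line).
        var doc = new Document(args.Length > 0 ? args[0] : "input.docx");

        Console.WriteLine("--- Paragraphs ---");
        foreach (string text in ReadParagraphs(doc))
            Console.WriteLine(text);

        Console.WriteLine("--- Runs ---");
        foreach (string line in ReadRuns(doc))
            Console.WriteLine(line);
    }
}
```

Each Run holds a stretch of text that shares the same formatting. This means the second pass prints a new line whenever the font or another style property changes, even within a single paragraph.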
These steps cover the environment configuration and the tasks a Word file reader program performs. They show how C# reads a DOCX file: the source file is loaded into a Document class instance, and then all its paragraphs are accessed to display the text. The same approach extends to other content such as tables. Each segment of text with a different style can be read separately, and each table cell value can be accessed individually for processing.
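Reading each table cell separately can be sketched in the same style. This is a minimal sketch that assumes Aspose.Words is installed and uses a placeholder input path. It walks every table, row, and cell and returns one labeled line per cell value. The TableReader class and the "Table t, row r, cell c" label format are hypothetical choices for illustration.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableReader
{
    // Returns one line per cell, labeled with zero-based table, row, and cell indexes.
    public static List<string> ReadCells(Document doc)
    {
        var result = new List<string>();
        NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
        for (int t = 0; t < tables.Count; t++)
        {
            // Cast the generic Node to Table to reach its Rows collection.
            var table = (Table)tables[t];
            for (int r = 0; r < table.Rows.Count; r++)
            {
                Row row = table.Rows[r];
                for (int c = 0; c < row.Cells.Count; c++)
                {
                    // Plain-text export of the cell, trimmed of the trailing line break.
                    string value = row.Cells[c].ToString(SaveFormat.Text).Trim();
                    result.Add($"Table {t}, row {r}, cell {c}: {value}");
                }
            }
        }
        return result;
    }

    public static void Main(string[] args)
    {
        // Placeholder path unless one is passed on the command line.
        var doc = new Document(args.Length > 0 ? args[0] : "input.docx");
        foreach (string line in ReadCells(doc))
            Console.WriteLine(line);
    }
}
```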
Code to Read Word File in C#
This code demonstrates how to read a Word file in C# using the Document.GetChildNodes() method. The method takes the type of node to fetch, such as Paragraph, Run, Section, Body, HeaderFooter, or Comment, along with a flag that specifies whether to search all descendants or only the immediate children. GetChildNodes() returns generic nodes, so you cast each one to its specific type to use that type's methods and properties. In this example, the document is read twice. The first pass displays all the text in the document, whether it sits in a normal paragraph, a table, or elsewhere. The second pass reads the text run by run, so each change in style is reported separately.
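GetChildNodes() works the same way for the other node types listed here, and the cast is what exposes type-specific members. This is a minimal sketch that assumes Aspose.Words is installed and uses a placeholder input path. It reads HeaderFooter nodes, using their HeaderFooterType property, and Comment nodes, using their Author property. The NodeTypeReader class and its output format are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

public static class NodeTypeReader
{
    // Headers and footers: the cast to HeaderFooter exposes HeaderFooterType.
    public static List<string> ReadHeadersFooters(Document doc)
    {
        var result = new List<string>();
        foreach (HeaderFooter hf in doc.GetChildNodes(NodeType.HeaderFooter, true))
        {
            result.Add($"{hf.HeaderFooterType}: {hf.ToString(SaveFormat.Text).Trim()}");
        }
        return result;
    }

    // Comments: the cast to Comment exposes Author.
    public static List<string> ReadComments(Document doc)
    {
        var result = new List<string>();
        foreach (Comment comment in doc.GetChildNodes(NodeType.Comment, true))
        {
            result.Add($"{comment.Author}: {comment.ToString(SaveFormat.Text).Trim()}");
        }
        return result;
    }

    public static void Main(string[] args)
    {
        // Placeholder path unless one is passed on the command line.
        var doc = new Document(args.Length > 0 ? args[0] : "input.docx");
        ReadHeadersFooters(doc).ForEach(Console.WriteLine);
        ReadComments(doc).ForEach(Console.WriteLine);
    }
}
```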
This article has shown how to read Word files. If you want to learn how to convert Word documents to HTML, refer to the article on how to convert a Word document to HTML using C#.