Extract Text from PowerPoint using Java

This short tutorial explains how to extract text from PowerPoint using Java. It covers setting up the development environment, lists the steps, and provides sample code for building a PowerPoint-to-text converter in Java. It also discusses several options for extracting text from a presentation.

Steps to Extract All Text from PowerPoint using Java

  1. Set up the environment to use Aspose.Slides for Java for PPTX-to-TXT conversion
  2. Import the dependencies for slide parsing and file output
  3. Load the source PPTX file into memory using the Presentation class
  4. Retrieve all text frames to collect every text container from the slides
  5. Iterate through each frame's paragraphs and portions, appending their text to a StringBuilder
  6. Save the accumulated text to a TXT file

These steps summarize how to extract text from a PPTX file using Java. Load the presentation, get all of its text frames, create a StringBuilder, and iterate through the frames to fetch their paragraphs. From each paragraph, fetch its portions, append each portion's text to the StringBuilder, and finally save the accumulated text to a TXT file.

Code to Convert PPTX to TXT using Java
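The steps above can be sketched as the following program. It is a minimal example that assumes Aspose.Slides for Java is already on the classpath. The class name PptxToText, the method extractAllText, and the default file names input.pptx and output.txt are illustrative placeholders, not part of the library. It collects the text frames with SlideUtil.getAllTextFrames() and then walks paragraphs and portions as described.

```java
// Step 2: import the dependencies for slide parsing and file output
import com.aspose.slides.IParagraph;
import com.aspose.slides.IPortion;
import com.aspose.slides.IPresentation;
import com.aspose.slides.ITextFrame;
import com.aspose.slides.Presentation;
import com.aspose.slides.SlideUtil;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name chosen for this example
public class PptxToText {

    // Steps 4 and 5: collect every text frame, then append the text of
    // each portion of each paragraph to a StringBuilder.
    public static String extractAllText(IPresentation pres) {
        StringBuilder sb = new StringBuilder();
        // false: only normal slides; pass true to also include master slides
        ITextFrame[] frames = SlideUtil.getAllTextFrames(pres, false);
        for (ITextFrame frame : frames) {
            for (IParagraph paragraph : frame.getParagraphs()) {
                for (IPortion portion : paragraph.getPortions()) {
                    sb.append(portion.getText());
                }
                // Put each paragraph on its own line in the TXT output
                sb.append(System.lineSeparator());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths; pass real ones on the command line
        String input = args.length > 0 ? args[0] : "input.pptx";
        String output = args.length > 1 ? args[1] : "output.txt";

        // Step 3: load the source PPTX file
        Presentation pres = new Presentation(input);
        try {
            String text = extractAllText(pres);
            // Step 6: save the accumulated text as a UTF-8 TXT file
            Files.write(Paths.get(output), text.getBytes(StandardCharsets.UTF_8));
        } finally {
            pres.dispose(); // release resources held by the presentation
        }
    }
}
```

Run it as `java PptxToText deck.pptx deck.txt`. Portions within a paragraph are concatenated without separators because a portion is a run of text sharing the same formatting, so one sentence can span several portions.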

Text can also be extracted selectively. To extract text slide by slide, use the SlideUtil.getAllTextBoxes() method, which takes a single slide as input. To extract speaker notes, use slide.getNotesSlideManager().getNotesSlide(). To extract text from tables, call slide.getShapes() to get the shapes collection and keep only the shapes that are instances of ITable. JSON output can also be generated by filling a JsonObject and saving the data in a JSONArray.
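A sketch of these per-slide options follows, again assuming Aspose.Slides for Java on the classpath. The class SlideTextReport, its methods, and the section labels in the output are hypothetical names chosen for illustration. The sketch uses SlideUtil.getAllTextBoxes(), getNotesSlideManager().getNotesSlide() (which returns null when a slide has no notes), and the ITable check described above, reading table cells with ITable.get_Item(column, row).

```java
import com.aspose.slides.ICell;
import com.aspose.slides.INotesSlide;
import com.aspose.slides.IPresentation;
import com.aspose.slides.IShape;
import com.aspose.slides.ISlide;
import com.aspose.slides.ITable;
import com.aspose.slides.ITextFrame;
import com.aspose.slides.Presentation;
import com.aspose.slides.SlideUtil;

// Hypothetical helper class: per-slide text, table, and notes extraction
public class SlideTextReport {

    private static final String NL = System.lineSeparator();

    // Text of every text box on one slide only
    public static String slideText(ISlide slide) {
        StringBuilder sb = new StringBuilder();
        for (ITextFrame frame : SlideUtil.getAllTextBoxes(slide)) {
            sb.append(frame.getText()).append(NL);
        }
        return sb.toString();
    }

    // Speaker notes of one slide, or an empty string if it has none
    public static String notesText(ISlide slide) {
        INotesSlide notes = slide.getNotesSlideManager().getNotesSlide();
        if (notes == null) {
            return "";
        }
        ITextFrame frame = notes.getNotesTextFrame();
        return frame == null ? "" : frame.getText() + NL;
    }

    // Text of every table on one slide: one row per line, cells separated by tabs
    public static String tableText(ISlide slide) {
        StringBuilder sb = new StringBuilder();
        for (IShape shape : slide.getShapes()) {
            if (!(shape instanceof ITable)) {
                continue; // skip everything that is not a table
            }
            ITable table = (ITable) shape;
            for (int row = 0; row < table.getRows().size(); row++) {
                for (int col = 0; col < table.getColumns().size(); col++) {
                    if (col > 0) {
                        sb.append('\t');
                    }
                    ICell cell = table.get_Item(col, row);
                    if (cell.getTextFrame() != null) {
                        sb.append(cell.getTextFrame().getText());
                    }
                }
                sb.append(NL);
            }
        }
        return sb.toString();
    }

    // Combines the three helpers into a report with one section per slide
    public static String report(IPresentation pres) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pres.getSlides().size(); i++) {
            ISlide slide = pres.getSlides().get_Item(i);
            sb.append("=== Slide ").append(i + 1).append(" ===").append(NL);
            sb.append(slideText(slide));
            sb.append("[Tables]").append(NL).append(tableText(slide));
            sb.append("[Notes]").append(NL).append(notesText(slide));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder input path; pass a real one on the command line
        Presentation pres = new Presentation(args.length > 0 ? args[0] : "input.pptx");
        try {
            System.out.print(report(pres));
        } finally {
            pres.dispose();
        }
    }
}
```

Each helper works on a single slide, so they can be combined as needed. Check whether getAllTextBoxes() in your Aspose.Slides version already returns the text frames inside tables; if it does, drop the table section to avoid printing table text twice. For JSON output, the three strings per slide could be stored in one JSON object per slide and the objects collected in an array, using whichever JSON library the project already depends on.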

This article explains how to convert PowerPoint to text. To convert a presentation to video, refer to the article Convert PowerPoint to Video using Java.