Extract Links from PDF using Java

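In this short how-to article, you will learn how to extract links from a PDF using Java. It covers the IDE setup, a list of steps, and sample code for extracting hyperlinks from a PDF using Java. You will learn how to fetch the link annotations on each page and cast each annotation's action to GoToURIAction to obtain the URI.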

Steps to Extract URLs from PDF using Java

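Set up the IDE to use Aspose.PDF for Java for extracting links
Load the source PDF file, iterate through all the pages, and create an annotation selector for each page
Extract the link annotations from the page and retrieve them as the selected collection
Iterate through the annotations and cast the action of each link annotation to GoToURIAction
Call the getURI() method to access the link and display it on the console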

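This guide describes how to extract all links from a PDF using Java. Load the source PDF file, access the target pages, and create an annotation selector for each page. Call the page's accept() method with the defined selector, get the list of link annotations, and obtain the URI by casting each annotation's action to the GoToURIAction class.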
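The check-and-cast step described above can be tried in isolation, without the library. The following is only a minimal sketch: Action, UriAction, and GoToPageAction are hypothetical stand-in types invented for illustration, not Aspose.PDF classes. It shows why the action type is tested with instanceof before casting, since a link may carry a different kind of action or no action at all.

```java
import java.util.ArrayList;
import java.util.List;

class LinkCastSketch {

    // Hypothetical stand-in for a link action (not an Aspose.PDF type)
    interface Action {}

    // Hypothetical stand-in for an action that opens a URI
    static final class UriAction implements Action {
        private final String uri;
        UriAction(String uri) { this.uri = uri; }
        String getUri() { return uri; }
    }

    // Hypothetical stand-in for an action that jumps to a page in the same file
    static final class GoToPageAction implements Action {
        final int page;
        GoToPageAction(int page) { this.page = page; }
    }

    // Collects the URIs of all URI actions and ignores everything else
    static List<String> collectUris(List<Action> actions) {
        List<String> uris = new ArrayList<>();
        for (Action action : actions) {
            // instanceof is false for null, so links without an action are skipped safely
            if (action instanceof UriAction) {
                uris.add(((UriAction) action).getUri());
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        List<Action> actions = new ArrayList<>();
        actions.add(new UriAction("https://example.com"));
        actions.add(new GoToPageAction(3)); // internal jump, carries no URI
        actions.add(null);                  // link with no action
        System.out.println(collectUris(actions)); // prints [https://example.com]
    }
}
```

Testing the type first means that links pointing inside the document are skipped quietly instead of causing a ClassCastException.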

Code to Extract Hyperlinks from PDF using Java

	import com.aspose.pdf.*;
	import java.util.List;

	public class Main {

	    public static void main(String[] args) throws Exception { // main() method for fetching URIs
	        License license = new License(); // Initialize the PDF license
	        license.setLicense("license.lic"); // Apply the license

	        Document pdfDocument = new Document("PdfWithLinks.pdf"); // Load the PDF that contains hyperlinks

	        // Iterate through all the pages (page numbers start at 1)
	        for (int pageNumber = 1; pageNumber <= pdfDocument.getPages().size(); pageNumber++) {
	            System.out.println("Processing Page " + pageNumber); // Display the current page number

	            Page pdfPage = pdfDocument.getPages().get_Item(pageNumber); // Get the current page

	            // Create an annotation selector to find link annotations on the page
	            AnnotationSelector linkSelector = new AnnotationSelector(new LinkAnnotation(pdfPage, Rectangle.getTrivial()));

	            // Let the selector visit the page and collect its link annotations
	            pdfPage.accept(linkSelector);

	            // Retrieve the list of selected link annotations
	            List<Annotation> linkAnnotations = linkSelector.getSelected();

	            // Iterate through each selected annotation
	            for (Annotation annotation : linkAnnotations) {
	                // Make sure the annotation is a LinkAnnotation before casting
	                if (annotation instanceof LinkAnnotation) {
	                    LinkAnnotation linkAnnotation = (LinkAnnotation) annotation;

	                    // Only links whose action is a GoToURIAction carry a URI
	                    if (linkAnnotation.getAction() instanceof GoToURIAction) {
	                        // Cast the action to a GoToURIAction to access the URI
	                        GoToURIAction uriAction = (GoToURIAction) linkAnnotation.getAction();

	                        // Display the extracted URI
	                        System.out.println("Found URI: " + uriAction.getURI());
	                    }
	                }
	            }
	        }

	        // Indicate that the process is complete
	        System.out.println("URI extraction completed.");
	    }
	}


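The code above demonstrates a PDF link extractor in Java. While iterating through the pages of the PDF, you can use the Page class object to analyze the page content and skip or select pages. The getAction() method returns the action attached to a link annotation; when that action is a GoToURIAction, it holds the URI of the link.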
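One way to structure the page selection mentioned above is to pass the decision into the loop as a predicate. The sketch below is a library-free illustration under stated assumptions: selectPages and the "skip the cover page" rule are hypothetical, and the filter works on page numbers only. In real code the predicate would inspect the Page object instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class PageSelectionSketch {

    // Returns the 1-based page numbers that pass the filter, in ascending order
    static List<Integer> selectPages(int pageCount, IntPredicate keepPage) {
        List<Integer> selected = new ArrayList<>();
        // Page numbers start at 1, matching the page loop in the extraction code
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
            if (keepPage.test(pageNumber)) {
                selected.add(pageNumber);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical rule: skip the cover page and process the rest
        List<Integer> pages = selectPages(5, n -> n > 1);
        System.out.println(pages); // prints [2, 3, 4, 5]
    }
}
```

Keeping the rule in a predicate leaves the extraction loop unchanged when the selection criteria change.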

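In this article, we have gone through the process of fetching hyperlinks from a PDF. To create hyperlinks in a PDF, refer to the article on how to create a hyperlink in a PDF using Java.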
