Extract Links from PDF using C#

This article explains how to extract links from a PDF using C#. It covers the IDE settings, the steps, and sample code for developing a PDF link extractor in C#. You will learn to select the link annotations on a page and read the URI from each link's action.

Steps to Extract URL from PDF using C#

Set up the IDE to use Aspose.PDF for .NET to extract URIs from a PDF page
Load the source PDF file and loop through all of its pages
Create an annotation selector for finding link annotations on a page
Pass the selector to the page's Accept() method and retrieve the list of selected annotations
Iterate through the link annotations and find the action associated with each one
Cast the action to GoToURIAction to access and display its URI

These steps describe how to extract hyperlinks from a PDF using C#. Create an annotation selector for link annotations and use it to select the target annotations on each page. Then cast the action of each link annotation to GoToURIAction and read the URI from that action.

Code to Extract Hyperlink from PDF using C#

	using System;
	using System.Linq;
	using Aspose.Pdf;
	using Aspose.Pdf.Annotations;

	class PdfLinkExtractor
	{
	    static void Main()
	    {
	        // Initialize and apply the Aspose.PDF license
	        new License().SetLicense("license.lic");

	        // Load the PDF document containing hyperlinks
	        using (var pdfDocument = new Document("PdfWithLinks.pdf"))
	        {
	            // Loop through each page, pairing it with its 1-based page number
	            foreach (var (pdfPage, pageNumber) in pdfDocument.Pages.Select((page, index) => (page, index + 1)))
	            {
	                // Display the current page number
	                Console.WriteLine($"Processing Page {pageNumber}");

	                // Create an annotation selector to find link annotations on the page
	                var linkSelector = new AnnotationSelector(new LinkAnnotation(pdfPage, Rectangle.Trivial));

	                // Let the selector visit the annotations on the current page
	                pdfPage.Accept(linkSelector);

	                // Retrieve the list of selected link annotations
	                var linkAnnotations = linkSelector.Selected;

	                // Iterate through each link annotation
	                foreach (var annotation in linkAnnotations)
	                {
	                    // Only links whose action is a GoToURIAction carry a URI;
	                    // other links (for example, jumps within the document) are skipped
	                    if (annotation is LinkAnnotation link && link.Action is GoToURIAction uriAction)
	                    {
	                        // Display the extracted URI
	                        Console.WriteLine($"Found URI: {uriAction.URI}");
	                    }
	                }
	            }
	        }

	        // Indicate that the process is complete
	        Console.WriteLine("URI extraction completed.");
	    }
	}


The code above demonstrates how to extract all links from a PDF using C#. The AnnotationSelector is constructed from a LinkAnnotation object, which serves as a template for the annotation type to select and itself requires a page and a rectangle. The Accept() method of the Page class takes the selector and stores the matching link annotations in its Selected collection.
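If the URIs are needed for further processing rather than console output, the same selector-based approach can be wrapped in a reusable method. The following is a minimal sketch, not part of the original sample: the class name PdfLinkCollector and the method CollectUris are hypothetical, and it assumes that links without a GoToURIAction (for example, jumps within the document) should simply be skipped and that duplicate URIs should be reported once.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

static class PdfLinkCollector
{
    // Hypothetical helper (not an Aspose.PDF API): returns the distinct URIs
    // found in link annotations across all pages of the given PDF file.
    public static List<string> CollectUris(string pdfPath)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var uris = new List<string>();

        using (var document = new Document(pdfPath))
        {
            foreach (Page page in document.Pages)
            {
                // Select only the link annotations on this page
                var selector = new AnnotationSelector(new LinkAnnotation(page, Rectangle.Trivial));
                page.Accept(selector);

                foreach (Annotation annotation in selector.Selected)
                {
                    // Assumption: links whose action is not a GoToURIAction are ignored
                    if (annotation is LinkAnnotation link && link.Action is GoToURIAction uriAction)
                    {
                        // HashSet.Add returns false for duplicates, so each URI is kept once,
                        // in the order it was first encountered
                        if (seen.Add(uriAction.URI))
                        {
                            uris.Add(uriAction.URI);
                        }
                    }
                }
            }
        }

        return uris;
    }
}
```

A caller could then write the result to the console or a file, for instance by looping over PdfLinkCollector.CollectUris("PdfWithLinks.pdf"). Returning a list keeps the extraction separate from how the URIs are reported.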

This quick tutorial has covered the process of extracting hyperlinks from the pages of a PDF. To remove hyperlinks from a PDF file, refer to the article How to remove hyperlink from PDF in C#.
