Extract Links from PDF using C#

This article explains how to extract links from PDF using C#. It covers the IDE settings, the steps, and sample code for developing a PDF link extractor using C#. You will learn to select the link annotations on a page and fetch the URI from each link-type annotation.

Steps to Extract URL from PDF using C#

  1. Set the IDE to use Aspose.PDF for .NET to extract URI from a PDF page
  2. Load the source PDF file and loop through all the pages in it
  3. Create an annotation selector for finding link annotations on a page
  4. Pass the selector to the page's Accept() method and retrieve the list of selected annotations
  5. Iterate through each link annotation and find its associated action
  6. Cast the action to a GoToURIAction to access and display the URI

These steps describe how to extract hyperlinks from PDF using C#. Create an annotation selector for link annotations and use it to select the target annotations on each page. For each link annotation, cast its action to GoToURIAction and fetch the URI from that action.
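
The steps above can be sketched as the following console program. It is a minimal sketch rather than a definitive implementation: it assumes the Aspose.PDF for .NET package is referenced in the project, the file name input.pdf is a placeholder, and license setup is omitted. The type check on the action is a defensive assumption, since a link may carry an action other than a URI action.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class Program
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file
        using (Document document = new Document("input.pdf"))
        {
            foreach (Page page in document.Pages)
            {
                // Selector that matches link annotations on the current page
                AnnotationSelector selector = new AnnotationSelector(
                    new LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));

                // Visit the page's annotations; matches are stored in selector.Selected
                page.Accept(selector);

                foreach (Annotation annotation in selector.Selected)
                {
                    // Only links whose action is a URI action expose a URI
                    if (annotation is LinkAnnotation link &&
                        link.Action is GoToURIAction uriAction)
                    {
                        Console.WriteLine(uriAction.URI);
                    }
                }
            }
        }
    }
}
```

Links that jump to another page inside the document use a different action type, so the sketch skips them instead of failing on the cast.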

The sample code demonstrates how to extract all links from PDF using C#. The AnnotationSelector constructor takes a LinkAnnotation object, which in turn requires a page and a rectangle. The Accept() method of the Page class takes the selector object and stores the matching link annotations in the selector's Selected collection.

This quick tutorial has taught us the process of extracting hyperlinks from a PDF page. To remove hyperlinks from the PDF file, refer to the article How to remove hyperlink from PDF in C#.
