This article explains how to extract links from a PDF using C#. It covers the IDE settings, the steps, and sample code for developing a PDF link extractor in C#. You will learn to select link annotations on a page and fetch the URI from each link annotation's action.
Steps to Extract URL from PDF using C#
- Set the IDE to use Aspose.PDF for .NET to extract URI from a PDF page
- Load the source PDF file and loop through all the pages in it
- Create an annotation selector for finding link annotations on a page
- Extract all annotations and retrieve the list of selected annotations
- Iterate through each link annotation and read its associated action
- Cast the action to GoToURIAction to access and display the URI
These steps describe how to extract hyperlinks from a PDF using C#. Create an annotation selector for link annotations and use it to select the target annotations on each page. Then cast the action of each link annotation to GoToURIAction and fetch the URI from this action.
Code to Extract Hyperlink from PDF using C#
using System;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class PdfLinkExtractor
{
    static void Main()
    {
        // Initialize and apply the Aspose.PDF license
        new License().SetLicense("license.lic");

        // Load the PDF document containing hyperlinks
        using (var pdfDocument = new Document("PdfWithLinks.pdf"))
        {
            // Loop through each page in the PDF document
            foreach (var (pdfPage, pageNumber) in pdfDocument.Pages.Select((page, index) => (page, index + 1)))
            {
                // Display the current page number
                Console.WriteLine($"Processing Page {pageNumber}");

                // Create an annotation selector to find link annotations on the page
                var linkSelector = new AnnotationSelector(new LinkAnnotation(pdfPage, Rectangle.Trivial));

                // Let the selector visit the page and collect matching annotations
                pdfPage.Accept(linkSelector);

                // Retrieve the list of selected link annotations
                var linkAnnotations = linkSelector.Selected;

                // Iterate through each link annotation
                foreach (var annotation in linkAnnotations)
                {
                    // Proceed only if the link's action is a GoToURIAction;
                    // a direct cast would throw for other action types
                    if (annotation is LinkAnnotation link && link.Action is GoToURIAction uriAction)
                    {
                        // Display the extracted URI
                        Console.WriteLine($"Found URI: {uriAction.URI}");
                    }
                }
            }
        }

        // Indicate that the process is complete
        Console.WriteLine("URI extraction completed.");
    }
}
The code above demonstrates how to extract all links from a PDF using C#. The AnnotationSelector constructor takes a LinkAnnotation object, which in turn requires a page and a rectangle; the sample passes Rectangle.Trivial. The Accept() method of the Page class takes the selector and fills the selector's Selected collection with the link annotations found on that page.
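As a variation on the code above, the same selector logic can be wrapped in a method that returns the links instead of printing them. The following is a minimal sketch, not a definitive implementation: PdfLinkHelper and CollectUris are hypothetical names that are not part of Aspose.PDF, and the sketch assumes that link annotations whose action is not a GoToURIAction (for example, links that jump to another page of the same document) should simply be skipped.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

// Hypothetical helper class; the name is an assumption for illustration only
static class PdfLinkHelper
{
    // Returns (page number, URI) pairs for every URI link found in the document
    public static List<(int Page, string Uri)> CollectUris(Document document)
    {
        var results = new List<(int Page, string Uri)>();

        // Page numbering in Aspose.PDF starts at 1
        for (int pageNumber = 1; pageNumber <= document.Pages.Count; pageNumber++)
        {
            Page page = document.Pages[pageNumber];

            // Select the link annotations on the current page
            var selector = new AnnotationSelector(new LinkAnnotation(page, Rectangle.Trivial));
            page.Accept(selector);

            foreach (Annotation annotation in selector.Selected)
            {
                // Keep only links whose action carries a URI; skip all others
                if (annotation is LinkAnnotation link && link.Action is GoToURIAction uriAction)
                {
                    results.Add((pageNumber, uriAction.URI));
                }
            }
        }

        return results;
    }
}
```

A caller could open the file with new Document("PdfWithLinks.pdf") inside a using block, pass the document to PdfLinkHelper.CollectUris, and then filter, deduplicate, or export the returned list as needed.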
This quick tutorial has shown how to extract hyperlinks from a PDF page. To remove hyperlinks from a PDF file, refer to the article How to remove hyperlink from PDF in C#.