How to Process Large PDF Files in C#

Processing large PDF files in C# with the standard MemoryStream class can run into memory restrictions. A MemoryStream is backed by a single byte array, so it cannot hold more than about 2 GB, and any solution that caps the input size this way fails when the PDF is much bigger, for example larger than 2.5 GB. The step-by-step guide below shows how to process large PDF files in C# using advanced streams.
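To make the limit concrete, here is a minimal sketch that uses only System.IO and asks a standard MemoryStream to grow past int.MaxValue bytes. The class and method names (MemoryStreamLimitDemo, ExceedsMemoryStreamLimit) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;

public static class MemoryStreamLimitDemo
{
    // Returns true when a standard MemoryStream refuses the requested length.
    public static bool ExceedsMemoryStreamLimit(long length)
    {
        using (var ms = new MemoryStream())
        {
            try
            {
                // SetLength validates the value against the int-based maximum
                // before it tries to allocate anything.
                ms.SetLength(length);
                return false;
            }
            catch (ArgumentOutOfRangeException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        long tooLarge = (long)int.MaxValue + 1; // just over the limit

        Console.WriteLine(ExceedsMemoryStreamLimit(1024));     // False
        Console.WriteLine(ExceedsMemoryStreamLimit(tooLarge)); // True
    }
}
```

The check fails before any large allocation happens, so the sketch is safe to run on a machine with little free memory.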

Steps to Process Large PDF Files in C#

  1. Open Visual Studio and create an empty C# console application
  2. Install the latest version of Aspose.PDF for .NET from NuGet.org
  3. Initialize an OptimizedMemoryStream object to hold the large PDF file
  4. Open the large PDF using FileStream
  5. Write the FileStream bytes into the OptimizedMemoryStream
  6. Initialize a Document object using the stream-based constructor
  7. Manipulate or modify the PDF document as needed
  8. Save the processed document to disk
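Steps 4 and 5 can be sketched as a small reusable helper that moves bytes between any two streams in fixed-size chunks, so no single buffer ever has to hold the whole file. The helper name CopyInChunks, the default buffer size, and the optional progress callback are assumptions made for this illustration and are not part of Aspose.PDF. The sketch also assumes the destination is a System.IO.Stream, which is how this guide uses OptimizedMemoryStream.

```csharp
using System;
using System.IO;

public static class StreamCopyHelper
{
    // Copies everything from source to destination in chunks of bufferSize bytes.
    // Returns the total number of bytes copied.
    public static long CopyInChunks(Stream source, Stream destination,
                                    int bufferSize = 81920,
                                    Action<long> onProgress = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read may return fewer bytes than requested, so loop until it returns 0.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
            onProgress?.Invoke(total); // optional: report bytes copied so far
        }

        return total;
    }
}
```

With a FileStream named file and an OptimizedMemoryStream named ms, a call would look like StreamCopyHelper.CopyInChunks(file, ms). Comparing the returned total with file.Length is a cheap way to confirm that the whole document was transferred.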

When you work with large PDF documents and local disk space is limited, you need a seekable stream that can hold the whole document. The standard C# MemoryStream class is seekable, but it stores its data in a single byte array, so it is limited to about 2 GB, and growing it means reallocating and copying that array, which drives memory usage up. This is where advanced streams come in. The following code snippet shows how you can use advanced streams to load huge PDF files in C#.

Code to Process Large PDF Files in C#

using System;
using System.IO;
// Add reference to Aspose.PDF for .NET API
// Use following namespace to process large PDF files
using Aspose.Pdf;

namespace ProcessLargePDFFiles
{
    class Program
    {
        static void Main(string[] args)
        {
            // Set license before processing large PDF files
            Aspose.Pdf.License AsposePDFLicense = new Aspose.Pdf.License();
            AsposePDFLicense.SetLicense(@"c:\asposelicense\license.lic");

            string outFile = @"c:\LargeSizePDF_Processed.pdf";

            // Initialize OptimizedMemoryStream object in which the large PDF will be stored for loading
            OptimizedMemoryStream ms = new OptimizedMemoryStream();

            // Read the large PDF document from disk using FileStream
            using (FileStream file = new FileStream(@"c:\LargeSizePDF.pdf", FileMode.Open, FileAccess.Read))
            {
                // Copy in fixed-size chunks: a single byte[] cannot hold a file over 2 GB,
                // and Read may return fewer bytes than requested
                byte[] buffer = new byte[81920];
                int bytesRead;
                while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
                {
                    // Write each chunk of PDF bytes to OptimizedMemoryStream
                    ms.Write(buffer, 0, bytesRead);
                }
            }

            // Use advanced stream to load the large PDF into a Document object
            Document doc = new Document(ms);

            // Manipulate or modify the document here as needed

            // Save the output PDF document
            doc.Save(outFile);
        }
    }
}

The code snippet above lets you process very large PDF documents from a stream, so the document itself does not have to be stored on the local disk; the bytes written to the stream can come from any source. The OptimizedMemoryStream class in Aspose.PDF for .NET makes it possible to load huge PDF documents through a memory stream in C#. It defines a MemoryStream whose capacity can exceed the standard one, which allows you to process PDF files larger than 2.5 GB.

If your PDF document has bookmarks and you want to read them in your .NET application, you can also check the guide on how to read PDF bookmarks using C#.
