How to Read a PDF Table in Java

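This tutorial explains in detail how to read PDF tables in Java and access the text in each cell of a target table. You get full control: you can reference a specific table on a chosen page of the PDF and parse all of its rows and cells to retrieve the data. Apart from Aspose.PDF for Java, no other third-party tools or software are needed to build this PDF table reader in Java.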

Steps to Read a PDF Table in Java

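Configure the PDF table reader application to add Aspose.PDF from the Maven repository
Load the sample PDF file that contains the table using a Document class object
Instantiate and initialize a TableAbsorber object to fetch all tables from the selected PDF page
Iterate through all rows of the desired table
Iterate through all cells of each row and fetch all text fragments from each cell
Display the text retrieved from the cells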

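These steps show how to extract tables from a PDF in Java and which library you need to add to your project. They also give the order of operations for the task: first load the PDF, then access a specific page and fetch the desired table, and finally parse all rows and cells to retrieve the information.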
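The nested iteration in steps 4 to 6 can be modeled without Aspose.PDF. In the following stdlib-only sketch, plain nested lists are hypothetical stand-ins for the Aspose types: the outer list plays the role of the table's row list (AbsorbedRow), the middle list the cell list of a row (AbsorbedCell), and the innermost list the texts of a cell's fragments (what TextFragment.getText() would return). The class name TableWalkSketch is illustrative only; the sketch shows the loop order, not the library API.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of steps 4-6. The nested lists are hypothetical stand-ins:
// outer list = rows (AbsorbedRow), middle list = cells (AbsorbedCell),
// inner list = fragment texts of one cell (TextFragment.getText()).
class TableWalkSketch {

    // Visits every row, then every cell in the row, then every fragment in the
    // cell, and collects the fragment texts in the order they would be printed.
    static List<String> walk(List<List<List<String>>> rows) {
        List<String> output = new ArrayList<>();
        for (List<List<String>> row : rows) {        // step 4: each row
            for (List<String> cell : row) {           // step 5: each cell
                for (String fragment : cell) {        // step 5: each fragment in the cell
                    output.add(fragment);             // step 6: "display" the text
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // A 2x2 table whose lower-left cell consists of two fragments
        List<List<List<String>>> table = List.of(
            List.of(List.of("Name"), List.of("Qty")),
            List.of(List.of("Red", "apple"), List.of("3")));
        walk(table).forEach(System.out::println);
    }
}
```

Running main prints Name, Qty, Red, apple and 3, each on its own line. The cell made of two fragments spans two output lines, so printing fragment by fragment does not show where one cell ends and the next begins.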

Code to Read a PDF Table in Java

	import com.aspose.pdf.AbsorbedCell;
	import com.aspose.pdf.AbsorbedRow;
	import com.aspose.pdf.AbsorbedTable;
	import com.aspose.pdf.Document;
	import com.aspose.pdf.License;
	import com.aspose.pdf.TableAbsorber;
	import com.aspose.pdf.TextFragment;
	import com.aspose.pdf.TextFragmentCollection;

	public class ReadPDFTableInJava {

		public static void main(String[] args) throws Exception {

			// To avoid the trial version limitations, load the Aspose.PDF license before reading table data
			License pdfLicense = new License();
			pdfLicense.setLicense("Aspose.Pdf.lic");

			// Load a source PDF document that contains a table
			Document pdfDocument = new Document("PdfWithTable.pdf");

			// Instantiate the TableAbsorber object for extracting PDF tables
			TableAbsorber tableAbsorber = new TableAbsorber();

			// Search for tables on the first page (Aspose.PDF page indexes start at 1)
			tableAbsorber.visit(pdfDocument.getPages().get_Item(1));

			// Stop early if the page has no table, instead of failing on get(0)
			if (tableAbsorber.getTableList().isEmpty()) {
				System.out.println("No table found on page 1");
				return;
			}

			// Access the first table (the table list is zero-based)
			AbsorbedTable absorbedTable = tableAbsorber.getTableList().get(0);

			// Iterate over all rows of the table
			for (AbsorbedRow pdfTableRow : absorbedTable.getRowList()) {
				// Iterate over all cells of the current row
				for (AbsorbedCell pdfTableCell : pdfTableRow.getCellList()) {
					// Get the text fragments that make up the cell's text
					TextFragmentCollection textFragmentCollection = pdfTableCell.getTextFragments();

					// Display the text of each fragment in the cell
					for (TextFragment textFragment : textFragmentCollection) {
						System.out.println(textFragment.getText());
					}
				}
			}

			System.out.println("Done");
		}
	}


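The Java code above extracts tables from a PDF using the TableAbsorber and AbsorbedTable classes. It walks through the rows and cells of a table with the AbsorbedRow and AbsorbedCell classes, and then uses the TextFragment class to fetch the cell data. Aspose.PDF also provides many other absorber classes for different elements of a document, such as fonts, paragraphs, text, and text fragments.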
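Because a cell can consist of several text fragments, and the sample prints each fragment on its own line, the row and cell boundaries are lost in the output. If you need tabular output, one option is to collect textFragment.getText() into lists inside the loops instead of printing it, and then render those lists. The following stdlib-only sketch assumes the text has already been collected as nested lists (rows, then cells, then fragment texts); the class and method names are hypothetical and not part of Aspose.PDF. It joins each cell's fragments with a space (an assumption) and writes each row as one CSV line, quoting fields as RFC 4180 describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper; not part of the Aspose.PDF API.
class TableToCsvSketch {

    // Joins the fragment texts of one cell into a single string.
    // Using a space as the separator is an assumption; adjust it to your PDFs.
    static String cellText(List<String> fragments) {
        return String.join(" ", fragments).trim();
    }

    // Quotes a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // rows -> cells -> fragment texts, rendered as one CSV line per table row.
    static List<String> toCsv(List<List<List<String>>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<List<String>> row : rows) {
            List<String> fields = new ArrayList<>();
            for (List<String> cell : row) {
                fields.add(csvField(cellText(cell)));
            }
            lines.add(String.join(",", fields));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<List<List<String>>> table = List.of(
            List.of(List.of("Item"), List.of("Note")),
            List.of(List.of("Red", "apple"), List.of("fresh,", "\"local\"")));
        toCsv(table).forEach(System.out::println);
    }
}
```

Running main prints Item,Note on the first line and Red apple,"fresh, ""local""" on the second. If the fragments in your PDF already carry their own spaces, joining with a space may double them; joining with an empty string is the alternative in that case.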

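In this article, we showed that tables can be extracted from a PDF in a few steps with Aspose.PDF for Java. If you want to learn how to read text or images from PDF files, see the article on how to read PDF files in Java.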

Aspose Knowledge Base
