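Data Extraction from Invoices using Python

This tutorial explains how to extract data from invoices using Python. It covers setting up the IDE for development, a list of steps that define the program flow, and sample code that runs invoice OCR in Python. You will also learn how to customize the detection process for images such as PNG, JPEG, BMP, TIFF, and GIF to suit your requirements.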

Steps for Invoice OCR using Python

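Set up the environment to use Aspose.OCR for Python via .NET for invoice data extraction
Create an instance of the AsposeOcr class for OCR processing
Create an instance of the OcrInput class to hold the receipts
Add the receipts to the OcrInput collection
Create the receipt recognition settings and set the recognition language
Run OCR with the recognize_receipt method to recognize text from the input receipts
Display the text recognized from each receipt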

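These steps describe how to apply OCR to receipts using Python. Create an instance of the AsposeOcr object, initialize an OcrInput object to hold the receipts, and create a ReceiptRecognitionSettings object to define the parameters for invoice OCR. Finally, call the recognize_receipt() method with the list of receipts and the text extraction settings.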

Code for Invoice Data Extraction using Python

	import aspose.ocr as api
	from aspose.ocr import License

	# Instantiate and apply the license for Aspose.OCR to enable full functionality.
	ocr_license = License()
	ocr_license.set_license("License.lic")

	# Create an instance of the AsposeOcr class for OCR processing.
	ocr_engine = api.AsposeOcr()

	# Initialize an OcrInput object to hold the input image(s) for OCR processing.
	receipts = api.OcrInput(api.InputType.SINGLE_IMAGE)

	# Add images (receipts) to the OcrInput object for recognition.
	receipts.add("Receipt1.png")
	receipts.add("Receipt2.png")

	# Set up receipt recognition settings.
	recognition_settings = api.ReceiptRecognitionSettings()
	recognition_settings.language = api.Language.ENG  # Specify the language as English.

	# Perform OCR to recognize text from the input receipts using the specified settings.
	results = ocr_engine.recognize_receipt(receipts, recognition_settings)

	# Get the number of recognized results (one result per input image).
	length = results.length

	# Loop through each result and print the recognized text for each input image.
	for i in range(length):
	    print(results[i].recognition_text)
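
The code above ends by printing the raw recognized text; pulling a specific field out of that text is a separate step. The following is a minimal sketch of one way to do it with the standard library only, assuming the receipt contains a line such as "Total: $12.50". The function name extract_total and the pattern are hypothetical illustrations and are not part of Aspose.OCR.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: a line that starts with "Total", followed by any
# non-digit characters (colon, spaces, currency symbol) and then an amount.
TOTAL_PATTERN = re.compile(
    r"^\s*total\b[^\d\n]*(\d[\d,]*(?:\.\d{2})?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_total(recognized_text: str) -> Optional[Decimal]:
    """Return the amount on the first 'Total' line, or None if there is none."""
    match = TOTAL_PATTERN.search(recognized_text)
    if match is None:
        return None
    # Drop thousands separators before converting to Decimal.
    return Decimal(match.group(1).replace(",", ""))


sample_text = "Subtotal: 10.00\nTax: 2.50\nTotal: $12.50\n"
print(extract_total(sample_text))
```

In the loop above, the same function could be applied to each results[i].recognition_text. Real receipts vary widely in layout, so the pattern would likely need adjusting for each vendor.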


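This sample code demonstrates how to use the invoice OCR API with Python. You can set the input type to PDF, TIFF, URL, directory, Zip, and so on, and choose the detection language from the large list of language names in the Language enumeration. The ReceiptRecognitionSettings class exposes many properties, including the allowed character set, a flag for automatic color inversion, and a blacklist of characters to ignore.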
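
Because the input type is fixed when the OcrInput object is created, it can help to decide the type from the source string beforehand. The sketch below is one possible approach using only the standard library. The helper guess_input_type is hypothetical, and the returned names (PDF, TIFF, ZIP, URL, DIRECTORY, SINGLE_IMAGE) are assumed to match the members of the InputType enumeration; check them against the API reference before relying on them.

```python
import os
from urllib.parse import urlparse

# Extension-to-name map. The names are assumed to match members of
# api.InputType; anything not listed falls back to SINGLE_IMAGE.
_EXTENSION_TYPES = {".pdf": "PDF", ".tif": "TIFF", ".tiff": "TIFF", ".zip": "ZIP"}


def guess_input_type(source: str) -> str:
    """Guess an input type name from a URL, directory path, or file name."""
    if urlparse(source).scheme in ("http", "https"):
        return "URL"
    if os.path.isdir(source):
        return "DIRECTORY"
    extension = os.path.splitext(source)[1].lower()
    return _EXTENSION_TYPES.get(extension, "SINGLE_IMAGE")


for source in ("Receipt1.png", "invoices.pdf", "https://example.com/receipt.png"):
    print(source, "->", guess_input_type(source))
```

With the library installed, the name could then be resolved with getattr(api.InputType, guess_input_type(source)) and passed to api.OcrInput, assuming the member names line up as described.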

This article described the steps for extracting text from invoices. To convert handwritten text into editable, searchable text, refer to the article on converting handwriting to text using Python.
