Ethical Data Extraction from Invoices Using Large Language Models: A JSON-Based Approach

Mayesha, Sabera Ryhana; Bishal, Shahnur Islam

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16715

Date: 2025-09-17

Abstract:

As companies increasingly process financial documents digitally, the need for secure and ethical extraction of data from invoices is greater than ever. Traditional approaches such as manual entry and OCR systems are often inaccurate, inflexible, and weak on data security. This thesis describes a technical approach to ethical and responsible automation of invoice information extraction using Large Language Models (LLMs). The proposed solution uses an LLM to read invoices and extract key fields such as date, vendor name, quantity, and invoice number, and presents the result in a clean, well-structured JSON format so that the information can be readily integrated into accounting systems and business applications. Beyond technical accuracy, the approach addresses serious ethical concerns. The process includes data anonymization, encryption, and bias monitoring, which are intended to help ensure compliance with international regulations. In testing, the model achieved more than 90 percent accuracy across various invoice formats and languages. The system handles variations in layout and terminology while still delivering quality output. In addition to performance, the model incorporates principles of ethical AI development, including fairness, transparency, and accountability. The work aims at a balanced approach to automated invoice processing that combines the power of machine learning with attention to ethics. It lays a foundation for versatile, resilient, and regulation-compliant financial data management solutions and can serve as a prototype for further AI-based automation efforts in this field.
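
To make the JSON output and the anonymization step more concrete, the following is a minimal Python sketch. It is not the thesis's implementation. The field names (`invoice_number`, `date`, `vendor_name`, `quantity`), the helper names `validate_invoice` and `pseudonymize`, and the use of a salted SHA-256 digest as a stand-in for anonymization are all assumptions made for illustration. The LLM call itself is omitted, and a hard-coded string plays the role of the model's response.

```python
import hashlib
import json
from datetime import date

# Assumed field names and types; the abstract only names the kinds of data extracted.
REQUIRED_FIELDS = {
    "invoice_number": str,
    "date": str,  # assumed ISO 8601 calendar date, e.g. "2025-09-17"
    "vendor_name": str,
    "quantity": int,
}


def validate_invoice(raw: str) -> dict:
    """Parse a model response and check it against the assumed fields."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        value = record[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {name}")
    date.fromisoformat(record["date"])  # raises ValueError on a malformed date
    return record


def pseudonymize(record: dict, salt: str) -> dict:
    """Return a copy with the vendor name replaced by a salted SHA-256 prefix.

    This is a simple pseudonymization stand-in, not the thesis's method.
    """
    digest = hashlib.sha256((salt + record["vendor_name"]).encode("utf-8")).hexdigest()
    return {**record, "vendor_name": digest[:16]}


# Hard-coded stand-in for an LLM response.
sample = (
    '{"invoice_number": "INV-001", "date": "2025-09-17", '
    '"vendor_name": "Acme Ltd", "quantity": 3}'
)
invoice = validate_invoice(sample)
masked = pseudonymize(invoice, salt="demo-salt")
print(json.dumps(masked, indent=2))
```

Validating the response before it reaches an accounting system means a malformed or incomplete model output fails loudly instead of being silently imported. The salted hash keeps records from the same vendor linkable without exposing the name, which is weaker than full anonymization and would need to be assessed against the applicable regulations.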