Optical Character Recognition (OCR) Using Tesseract.js

Miah, Md. Parvej

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17312

Date: 2025-01-13

Abstract:

This project develops a web-based optical character recognition (OCR) system using Tesseract.js, a JavaScript library that extracts text from images in both client-side and server-side environments. The goal is to convert image-based text into editable, searchable digital text efficiently and accurately. Tesseract.js builds on the Tesseract OCR engine and supports multiple languages. The system processes several image formats (JPEG, PNG, TIFF) and uses preprocessing techniques such as grayscale conversion to improve OCR accuracy on high-resolution images, especially those containing printed text. Tesseract.js also integrates easily with web platforms, so users can upload an image and receive the recognized text instantly. The system is effective on sharp images but struggles with handwritten or low-quality images. Planned improvements include better preprocessing, handwriting recognition, and expanded multi-language support for wider applications.
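The grayscale preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the report: the function name `toGrayscale` is hypothetical, and the luminance weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 coefficients, assumed here because the abstract does not say which conversion the project uses.

```javascript
// Hypothetical helper (not from the report): convert an RGBA pixel buffer,
// laid out as [R, G, B, A, R, G, B, A, ...], to grayscale.
// The weights are the BT.601 luma coefficients, an assumption on our part.
function toGrayscale(rgba) {
  if (rgba.length % 4 !== 0) {
    throw new RangeError("Expected an RGBA buffer whose length is a multiple of 4");
  }
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Weighted sum of the red, green and blue channels.
    const luma = Math.round(
      0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]
    );
    out[i] = luma;     // red
    out[i + 1] = luma; // green
    out[i + 2] = luma; // blue
    out[i + 3] = rgba[i + 3]; // alpha is copied unchanged
  }
  return out;
}

// Example input: one pure-red pixel followed by one white pixel.
const samplePixels = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]);
const grayPixels = toGrayscale(samplePixels);
```

In a browser, such a buffer would typically come from the `data` field of a canvas `getImageData` call. The converted pixels could then be written back to the canvas, and the canvas passed to the `recognize` function of Tesseract.js. Whether the project follows this exact flow is not stated in the abstract.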