Abstract:
This project develops a web-based optical character recognition (OCR) system using
Tesseract.js, a JavaScript library that extracts text from images in both client-side and
server-side environments. The goal is to convert image-based text into editable,
searchable digital text efficiently and accurately. Tesseract.js builds on the Tesseract
OCR engine and supports multiple languages. The system processes several image formats
(JPEG, PNG, TIFF) and applies preprocessing techniques such as grayscale conversion to
improve OCR accuracy, which is highest on high-resolution images of printed text.
Because Tesseract.js integrates easily with web platforms, users can upload an image and
receive the recognized text immediately. The system is effective on sharp images but
struggles with handwritten or low-quality images. Future improvements include better
preprocessing, handwriting recognition, and broader multi-language support for a wider
range of applications.
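
The grayscale preprocessing step mentioned above can be illustrated with a minimal sketch. It is not taken from the project's code: the function name toGrayscale is hypothetical, and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are an assumed choice, since the abstract does not say which conversion the system uses. The function works on a flat RGBA byte array, the layout a browser canvas returns from getImageData.

```javascript
// Hypothetical helper (not from the original project): converts RGBA pixel
// data to grayscale. The BT.601 luma weights below are an assumption.
function toGrayscale(rgba) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Weighted sum of the red, green, and blue channels.
    const luma = Math.round(
      0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]
    );
    out[i] = luma;            // red
    out[i + 1] = luma;        // green
    out[i + 2] = luma;        // blue
    out[i + 3] = rgba[i + 3]; // alpha is left unchanged
  }
  return out;
}

// Two sample pixels: opaque pure red and half-transparent white.
const sample = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 128]);
const gray = toGrayscale(sample);
console.log(Array.from(gray));
```

In a browser, one plausible way to use this is to read the uploaded image's pixels from a canvas with getImageData, write the converted data back with putImageData, and then pass the canvas to Tesseract.js, for example through Tesseract.recognize(canvas, "eng"). Whether grayscale conversion alone improves accuracy depends on the image, so this is a starting point rather than a complete preprocessing pipeline.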