| dc.description.abstract |
Automatic License Plate Recognition (ALPR) systems are critical for intelligent transportation and security infrastructure, yet they remain challenging for scripts with complex characters such as Bangla. Bangla license plates have a complex structure, including curved glyphs, complex conjuncts and area-specific layouts, which causes traditional OCR-based pipelines to struggle in the presence of occlusion, motion blur and low-resolution surveillance footage. This paper presents a novel multi-layer end-to-end Bangla ALPR system built on YOLOv12's attention-centric architecture. The proposed pipeline uses a lightweight family of YOLOv12 models so that feature representations are optimized consistently across vehicle, plate and character detection, improving robustness to scale variation against urban backgrounds. We introduce a three-layer model approach: (1) a YOLOv12-based vehicle detection model (0.975 mAP@0.50, 0.924 mAP@0.50:0.95, 2.3 ms/inference), (2) a YOLOv12n license plate detection model (0.975 mAP@0.50, 2.3 ms/inference), and (3) a specialized YOLOv12 character recognizer for Bangla glyphs (0.986 mAP@0.50, 0.750 mAP@0.50:0.95), eliminating OCR dependencies. All layers are trained on real images of Bangladeshi traffic scenes covering varied illumination, cluttered urban backgrounds, diverse viewpoints and multiple plate layouts, so that the system generalizes to real roads in Bangladesh. The system processes 640×640 resolution images on a consumer-grade GPU. The character recognition model handles 102 classes, including conjuncts such as ক্ষ (kkho) and জ্ঞ (gya), through coordinate-based reconstruction, achieving reliable detection and recognition of Bangla license plate numbers in unconstrained traffic scenes. This study proposes a fast and reliable Bangla license plate recognition solution for real-life traffic scenes and establishes a YOLOv12-based pipeline capable of complex-script ALPR. |
en_US |