SQLi Attack Detection Using Machine Learning Techniques for Web Application Security

Hasan, Md. Siam

dc.contributor.author	Hasan, Md. Siam
dc.date.accessioned	2026-04-27T04:25:02Z
dc.date.available	2026-04-27T04:25:02Z
dc.date.issued	2025-12-27
dc.identifier.citation	SWT	en_US
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/17073
dc.description	Thesis Report	en_US
dc.description.abstract	This thesis addresses the persistent threat of SQL injection (SQLi) attacks, which remain one of the most critical vulnerabilities in web applications despite the widespread use of firewalls and input filters. Such traditional defenses often fail to generalize to previously unseen attack patterns. To address this limitation, we develop and evaluate a machine-learning-based SQLi detection framework designed for integration into web application security. Incoming queries are first preprocessed and transformed into TF-IDF feature vectors that capture both benign and malicious query patterns. On these features, we train and compare six supervised classifiers: Logistic Regression, Linear Support Vector Machine, Decision Tree, Random Forest, Complement Naive Bayes and XGBoost. Models are assessed using ROC-AUC, Precision-Recall AUC (PR-AP), confusion matrices, and class-wise precision, recall and F1-score on a validation set of 3,981 samples. All models achieve strong validation performance (ROC-AUC ≥ 99.57%, PR-AP ≥ 98.91%), with Random Forest and Logistic Regression showing particularly high accuracy. Logistic Regression is selected as the primary model because it has the best validation PR-AP (99.90%) and consistently high F1-scores for both classes. On an independent test set of 4,280 requests, the selected model attains a ROC-AUC of 99.97% and a PR-AP of 99.99%. After the decision threshold is optimized using an F2-score constraint and a cost-sensitive objective that heavily penalizes missed attacks, the deployed configuration reaches 99.93% overall accuracy and a macro-F1 of 99.64%, detecting 4,057 of 4,058 SQLi queries and misclassifying only two benign requests as attacks. These results demonstrate that a carefully tuned, interpretable linear model on TF-IDF features can deliver near-perfect SQLi detection performance, offering a practical and easily deployable enhancement to existing web security mechanisms.	en_US
dc.description.sponsorship	DIU	en_US
dc.language.iso	en_US	en_US
dc.publisher	Daffodil International University	en_US
dc.subject	SQL Injection	en_US
dc.subject	Attack	en_US
dc.subject	Detection	en_US
dc.subject	Machine learning	en_US
dc.subject	Web Security	en_US
dc.title	SQLi Attack Detection Using Machine Learning Techniques for Web Application Security	en_US
dc.type	Thesis	en_US
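
The test-set figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the thesis; it rebuilds a confusion matrix from the stated counts (4,280 requests, 4,058 SQLi queries, 4,057 detected, 2 benign requests flagged) and assumes that every request that is not an SQLi query is benign, so the benign total of 222 is derived rather than reported. Variable and function names are illustrative.

```python
# Counts stated in the abstract.
TOTAL = 4280      # requests in the independent test set
ATTACKS = 4058    # SQLi queries in the test set
TP = 4057         # SQLi queries detected
FP = 2            # benign requests flagged as attacks

# Derived counts (assumption: all non-SQLi requests are benign).
FN = ATTACKS - TP          # missed attacks
BENIGN = TOTAL - ATTACKS   # benign requests
TN = BENIGN - FP           # benign requests passed through


def f1(tp: int, fp: int, fn: int) -> float:
    """F1-score for one class from its true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


accuracy = (TP + TN) / TOTAL
f1_attack = f1(TP, FP, FN)
# With benign as the positive class, the roles of FP and FN swap.
f1_benign = f1(TN, FN, FP)
macro_f1 = (f1_attack + f1_benign) / 2

print(f"accuracy = {accuracy:.2%}")
print(f"macro-F1 = {macro_f1:.2%}")
```

Under that assumption the derived values round to the 99.93% accuracy and 99.64% macro-F1 reported in the abstract.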