Static Malware Detection Using Machine Learning: A Feature-Based Approach

Islam, Ariful

URI: http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/16588

Date: 2025-09-17

Abstract:

Malware poses a serious threat to digital security because adversaries now use obfuscation, polymorphism, and encryption to evade conventional signature-based detection systems. To address these shortcomings, this work proposes a static malware detection framework based on machine learning and lightweight feature-based techniques. In the experiments, the proposed system uses the CICMalMem2022 dataset, which contains 58,596 benign and malicious Portable Executable (PE) files, and applies Mutual Information Gain to reduce the 55 extracted features to the 20 most discriminative ones. Five supervised algorithms (Random Forest, XGBoost, Logistic Regression, Support Vector Machine, and Artificial Neural Network) are trained on an 80:20 train/test split and rigorously evaluated under attack scenarios that incorporate Gaussian noise, feature scaling, and permutation. The findings show that all five models achieved 100 percent accuracy, precision, recall, and F1-score on the balanced test set, which indicates the high discriminative power of the selected features. In addition, the robustness analysis indicated that the models were resilient to these evasion strategies, and Local Interpretable Model-Agnostic Explanations (LIME) provided interpretability by highlighting the features that contribute most to each classification decision. These results show that, besides achieving state-of-the-art detection accuracy, the proposed framework also incorporates resilience and transparency, which are essential for real-world deployment. This study contributes a reproducible, high-performing malware identification pipeline that may be adopted in security operations centers (SOCs), in resource-constrained systems such as IoT devices, and in teaching or training settings. The combination of high detection performance, explainability, and robustness fills significant gaps in state-of-the-art malware detection methods and offers a promising direction for making digital infrastructure more secure and trustworthy.
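
The feature-selection and noise-perturbation steps named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it uses only the Python standard library, synthetic data in place of the CICMalMem2022 dataset, and a simple equal-width binning estimate of mutual information. All function names, column names, and parameter values below are hypothetical. In practice a library estimator such as scikit-learn's `mutual_info_classif` would likely be used instead.

```python
import math
import random
from collections import Counter


def mutual_information(feature_bins, labels):
    """Estimate I(X; Y) in nats between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi


def equal_width_bins(values, n_bins=10):
    """Discretize continuous values into n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_top_k(columns, labels, k):
    """Rank features by estimated mutual information and keep the best k."""
    scores = {
        name: mutual_information(equal_width_bins(vals), labels)
        for name, vals in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def add_gaussian_noise(values, sigma, rng):
    """Perturb a feature column with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


# Synthetic stand-in data: one strongly informative feature, one weak, one pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(1000)]  # balanced benign (0) / malicious (1)
columns = {
    "informative": [y * 5.0 + rng.gauss(0, 1) for y in labels],
    "weak": [y * 0.5 + rng.gauss(0, 1) for y in labels],
    "noise": [rng.gauss(0, 1) for _ in labels],
}

selected, scores = select_top_k(columns, labels, k=2)

# Re-score the strongest feature after perturbation to see how much signal survives.
noisy = add_gaussian_noise(columns["informative"], sigma=0.5, rng=rng)
noisy_score = mutual_information(equal_width_bins(noisy), labels)

print("selected:", selected)
print("MI before noise:", scores["informative"], "after noise:", noisy_score)
```

The same pattern scales to the setting described above by ranking all 55 features and keeping `k=20`. A fuller robustness check would retrain or re-evaluate each classifier on the perturbed features rather than only re-scoring a single column.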