Abstract:
Adversarial manipulation of static malware features can derail learned detectors. This
research builds and audits a complete pipeline that measures and strengthens
robustness for Windows executable screening under explicit, realistic threat models.
Windows binaries were represented as fixed-length feature vectors and classified by
three complementary learners: a feed-forward neural network (FNN) with bounded
inputs, a one-dimensional convolutional neural network (1D-CNN) with
standardized inputs, and a gradient-boosted tree model (LightGBM) on raw features.
Evasion was evaluated with the Fast Gradient Sign Method (FGSM) and Projected
Gradient Descent (PGD) for the neural models and with a decision-guided routine for
the tree model. Defenses combined adversarial training, feature-squeezing detectors
calibrated at a fixed 1% false-positive rate, and simple probability-averaging
ensembling. All experiments used fixed seeds and produced saved artifacts for audit.
Clean baselines on the full test set were strong: the gradient-boosted trees reached
97.63% accuracy and 99.62% area under the receiver operating characteristic curve
(ROC-AUC), the convolutional network reached 96.37% and 99.24%, and the
feed-forward network reached 95.68% and 98.48%. Under iterative adversarial attacks,
the convolutional network fell to 0.23% accuracy at a large PGD budget and the
feed-forward network to 28.43% at a moderate budget. Adversarial training
improved robustness only marginally while reducing clean accuracy; the detectors
recovered up to a 44.00% true-positive rate at a 1% false-positive rate; and
ensembling stabilized predictions but did not neutralize strong attacks.