Machine Learning Approach for Software Defect Prediction

Hossen, Md Anwar; Islam, Md. Shariful; Yusof, Nurhafizah Abu Talip; Rahman, Md. Sakib; Siddika, Fatema; Rahman, Mostafijur; Khatun, Sabira; Karim, Mohamad Shaiful Abdul; Mahmud, S. M. Hasan

dc.contributor.author	Hossen, Md Anwar
dc.contributor.author	Islam, Md. Shariful
dc.contributor.author	Yusof, Nurhafizah Abu Talip
dc.contributor.author	Rahman, Md. Sakib
dc.contributor.author	Siddika, Fatema
dc.contributor.author	Rahman, Mostafijur
dc.contributor.author	Khatun, Sabira
dc.contributor.author	Karim, Mohamad Shaiful Abdul
dc.contributor.author	Mahmud, S. M. Hasan
dc.date.accessioned	2021-11-17T10:29:05Z
dc.date.available	2021-11-17T10:29:05Z
dc.date.issued	2020-03-24
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/6388
dc.description.abstract	Software has become an indispensable part of human life. In the current computing era, large-scale complex network systems and millions of modern technological devices produce huge amounts of data every second. A relatively large share of these data is imbalanced, and imbalanced data mislead machine learning models. Software Defect Prediction (SDP) is one of the most helpful activities during the testing phase. The cost of finding and fixing defects is estimated at billions of pounds per year. Software defect prediction has emerged to reduce this cost, but it needs fine-tuning to reach the expected efficiency. In this chapter, we propose a new model based on a machine learning approach to predict software defects and to identify the key factors that may help software engineers locate the most defect-prone parts of a system. The proposed model works as follows. First, highly correlated features are removed and all features are brought to the same scale using feature scaling. Second, the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic (ADASYN) sampling and a hybrid sampling method are used to balance highly imbalanced datasets. Third, the Random Forest Importance and Chi-square algorithms are used to find the factors that have a strong effect on software defects. Cross-validation is used to reduce overfitting. The scikit-learn library is used for the machine learning algorithms, the pandas library for data processing, and Matplotlib with PyPlot for graphs and data visualization. The combination of the hybrid sampling method and the Random Forest (RF) algorithm achieved the highest prediction accuracy, about 93.26%.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Lecture Notes in Electrical Engineering, Springer	en_US
dc.subject	Software defect prediction	en_US
dc.subject	Machine learning	en_US
dc.subject	Imbalanced dataset	en_US
dc.subject	Chi square	en_US
dc.subject	Random forest importance	en_US
dc.title	Machine Learning Approach for Software Defect Prediction	en_US
dc.type	Article	en_US
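
The first two steps named in the abstract (removing highly correlated features, scaling, and SMOTE-style oversampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states that scikit-learn and pandas were used, whereas this sketch uses only the Python standard library so that it stays self-contained. The function names, the 0.9 correlation threshold, the choice of min-max scaling, and the toy data are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(rows, threshold=0.9):
    """Keep a column only if |r| <= threshold against every column kept so far.

    The threshold is an assumed value, not one taken from the source.
    """
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= threshold for k in kept):
            kept.append(j)
    return [[row[j] for j in kept] for row in rows], kept


def min_max_scale(rows):
    """Rescale every column to the range [0, 1] (one possible scaling choice)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Invented toy data: column 1 is exactly twice column 0, so it is redundant.
X = [[10, 20, 1], [12, 24, 0], [14, 28, 3], [16, 32, 1], [18, 36, 4], [20, 40, 2]]
y = [0, 0, 0, 0, 1, 1]  # 1 = defective (minority class)

X_reduced, kept = drop_correlated(X)
X_scaled = min_max_scale(X_reduced)

minority = [row for row, label in zip(X_scaled, y) if label == 1]
n_missing = y.count(0) - y.count(1)
synthetic = smote(minority, n_missing)

X_balanced = X_scaled + synthetic
y_balanced = y + [1] * len(synthetic)
```

In a real pipeline the oversampling would normally be applied only to the training folds inside cross-validation, so that synthetic samples do not leak into the data used for evaluation.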