
Selectively Oversampling Difficult Positive Samples from Imbalanced Data for Preprocessing


dc.contributor.author Mahin, Md.
dc.contributor.author Rukhsara, Lamia
dc.contributor.author Kabir, Md. Yasin
dc.contributor.author Rahman, H M Mostafizur
dc.contributor.author Islam, Md Jahidul
dc.contributor.author Khatun, Ayesha
dc.contributor.author Kabir, Sumaiya
dc.date.accessioned 2021-11-01T08:08:39Z
dc.date.available 2021-11-01T08:08:39Z
dc.date.issued 2020-03-19
dc.identifier.uri http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/6308
dc.description.abstract Oversampling is a procedure that has traditionally been applied when training machine learning classifiers to improve their performance in the presence of class imbalance. This work offers a new perspective on oversampling imbalanced data. In the literature, oversampling mainly targets Borderline samples. However, because the positive class contains few samples, a large percentage of them can be labeled as Rare or Outliers. These samples are often overlooked by traditional oversampling methods, or their nearest negative samples are removed to increase the positive prediction rate at the cost of the negative prediction rate. This work demonstrates that by oversampling only the Borderline, Rare and Outlier samples, each at a different rate, better performance can be achieved than with the other pre-processing methods compared. The proposed method is applied to four datasets (Abalone, CMC, Solar Flare and Seismic Bumps) collected from the UCI Machine Learning Repository, and compared with four traditional pre-processing methods from the imbalanced-learn Python toolkit: ADASYN, SMOTE, and Borderline-SMOTE 1 and 2. The result analysis shows that, with fine tuning, better performance can be achieved on all of the reported performance measures: Accuracy, True Positive Rate, True Negative Rate, Geometric Mean, Area Under the Curve and F-measure. en_US
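The abstract does not state how positive samples are assigned to the Borderline, Rare and Outlier categories, nor how the per-category rates are chosen. The sketch below is a minimal illustration under assumptions that are not taken from the paper: a k = 5 nearest-neighbour rule (4-5 positive neighbours: safe; 2-3: borderline; 1: rare; 0: outlier), and SMOTE-style interpolation between a seed sample and one of its nearest positive neighbours. The function names `neighbour_types`, `selective_oversample` and `g_mean` and the `rates` values are hypothetical; this is not the authors' implementation.

```python
import numpy as np


# Assumption: k = 5, with the thresholds below (4-5 positive neighbours -> safe,
# 2-3 -> borderline, 1 -> rare, 0 -> outlier). Not taken from the paper.
def neighbour_types(X, y, k=5, positive=1):
    """Label each positive sample by how many of its k nearest neighbours
    (itself excluded) also belong to the positive class."""
    pos_idx = np.flatnonzero(y == positive)
    # Euclidean distances from every positive sample to every sample.
    d = np.linalg.norm(X[pos_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(pos_idx)), pos_idx] = np.inf  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    same = (y[nn] == positive).sum(axis=1)
    types = np.where(same >= 4, "safe",
                     np.where(same >= 2, "borderline",
                              np.where(same == 1, "rare", "outlier")))
    return pos_idx, types


def selective_oversample(X, y, rates, k=5, positive=1, seed=0):
    """SMOTE-style oversampling restricted to the categories listed in `rates`
    (hypothetical parameter: synthetic samples generated per seed sample)."""
    rng = np.random.default_rng(seed)
    pos_idx, types = neighbour_types(X, y, k, positive)
    Xp = X[pos_idx]
    # Interpolation partners are drawn from each seed's nearest positive neighbours.
    dp = np.linalg.norm(Xp[:, None, :] - Xp[None, :, :], axis=2)
    np.fill_diagonal(dp, np.inf)
    pos_nn = np.argsort(dp, axis=1)[:, :min(k, len(Xp) - 1)]
    synthetic = []
    for i, t in enumerate(types):
        # Categories absent from `rates` (e.g. "safe") get no synthetic samples.
        for _ in range(rates.get(t, 0)):
            j = rng.choice(pos_nn[i])
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(Xp[i] + gap * (Xp[j] - Xp[i]))
    if not synthetic:
        return X, y
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), positive, dtype=y.dtype)])
    return X_new, y_new


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tpr = np.mean(y_pred[y_true == positive] == positive)
    tnr = np.mean(y_pred[y_true != positive] != positive)
    return float(np.sqrt(tpr * tnr))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, (95, 2)), rng.normal(1.5, 1.0, (5, 2))])
    y = np.array([0] * 95 + [1] * 5)
    _, types = neighbour_types(X, y)
    print(dict(zip(*np.unique(types, return_counts=True))))
    X_res, y_res = selective_oversample(
        X, y, rates={"borderline": 2, "rare": 3, "outlier": 4})
    print(np.bincount(y_res))
```

Giving each category its own entry in `rates` mirrors the abstract's idea of oversampling Borderline, Rare and Outlier samples at different rates while leaving safe samples alone; the values shown are placeholders that would need the kind of fine tuning the abstract mentions. For the baselines, imbalanced-learn provides `SMOTE`, `ADASYN` and `BorderlineSMOTE` (with `kind="borderline-1"` or `kind="borderline-2"`), all used through `fit_resample(X, y)`.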
dc.language.iso en_US en_US
dc.publisher 22nd International Conference on Computer and Information Technology, ICCIT 2019, IEEE en_US
dc.subject Data analysis en_US
dc.subject Artificial intelligence en_US
dc.subject Pattern classification en_US
dc.subject Sampling methods en_US
dc.title Selectively Oversampling Difficult Positive Samples from Imbalanced Data for Preprocessing en_US
dc.type Article en_US

