Abstract:
Breast cancer is a common disease among
women globally. Past studies have used machine
learning techniques to speed up the prediction of the
disease using labeled datasets. This study proposed a
supervised machine learning approach for the
classification of breast cancer. The model was built
using the Random Forest algorithm. The dataset chosen
for this study is the Wisconsin Breast Cancer (Diagnostic)
dataset. The breast cancer dataset was originally
released by the University of Wisconsin Hospitals,
Madison. The Python programming language and several of
its libraries were used for the experimental analyses.
The dataset was split 75:25 into training and
testing sets, respectively. The metrics used
to evaluate the model were accuracy, precision,
recall, F1-score, and Cohen’s kappa statistic. In the
experimental analyses, the model achieved an accuracy
of 96%, a precision of 98%, a recall of 96%, an
F1-score of 97%, and a Cohen’s kappa of 91%.
These results indicate that the model provides strong
classification performance on the chosen evaluation metrics.
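
The workflow summarized above can be sketched in Python roughly as follows. This is a minimal illustration, not the study's actual code: it assumes scikit-learn is the library used, loads the copy of the Wisconsin Diagnostic dataset bundled with scikit-learn, and picks the number of trees, the random seed, and the stratified split arbitrarily. The function name `evaluate_random_forest` is hypothetical. Precision, recall, and F1-score are computed here for scikit-learn's default positive class (label 1, benign in this copy of the dataset), which may differ from the convention used in the study, so the resulting values need not match the reported figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split


def evaluate_random_forest(test_size=0.25, seed=0):
    """Train a Random Forest on a 75:25 split and return the five metrics.

    The seed, the tree count, and the stratified split are assumptions
    made for this sketch; the abstract does not specify them.
    """
    # scikit-learn's bundled copy of the Wisconsin Diagnostic dataset.
    X, y = load_breast_cancer(return_X_y=True)

    # 75% training, 25% testing; stratify keeps class proportions similar.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Precision, recall, and F1 use the default positive label (1).
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "kappa": cohen_kappa_score(y_test, y_pred),
    }


if __name__ == "__main__":
    for name, value in evaluate_random_forest().items():
        print(f"{name}: {value:.3f}")
```

Cohen's kappa is included alongside accuracy because it corrects for agreement expected by chance, which matters when the two classes are not equally represented.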