Abstract:
A stroke occurs when a blood vessel that supplies the brain with oxygen and nutrients is blocked by a clot or ruptures. Stroke is one of the leading causes of death worldwide. Every four minutes, someone dies of a stroke, yet 80% of stroke deaths may be avoidable if the condition could be recognized or anticipated before it occurs. Early detection can help reduce the severity of the condition. Data science has played a significant role in medical research in recent years, and several machine learning approaches have been developed to predict the likelihood of a stroke from a patient's physical and physiological data. The most significant risk factors for stroke include age, heart disease, average blood glucose level, and hypertension. In this study, we apply five machine learning algorithms, Decision Tree, XGBoost, Light Gradient Boosting Machine (LGBM), Random Forest, and K-Nearest Neighbors, to a dataset obtained from Kaggle in order to determine which model most accurately predicts the risk of stroke. The test results indicate that Random Forest achieves the highest accuracy of the five algorithms, at 96%.
Keywords: Ischemic stroke · Hemorrhagic stroke · Light Gradient Boosting Machine · Precision-Recall Curve · Random Forest · ROC Curve · Stroke Prediction
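
The models in this study are ranked by test accuracy, and the keywords also refer to precision-recall analysis. As a minimal sketch of how these quantities can be computed from binary predictions and used to pick the most accurate model, the following uses only the Python standard library. The labels, the predictions, and the names `model_a` and `model_b` are invented for illustration; they are not taken from the study's dataset or results.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 = stroke and 0 = no stroke."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a model never predicts the positive class.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy test labels and predictions (hypothetical, for illustration only).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
toy_predictions = {
    "model_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # always predicts "no stroke"
    "model_b": [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
}

scores = {name: metrics(y_true, pred) for name, pred in toy_predictions.items()}

# Select the model with the highest test accuracy.
best = max(scores, key=lambda name: scores[name]["accuracy"])

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
print("most accurate:", best)
```

In this toy example, `model_a` never predicts a stroke, so its recall is zero even though most of its predictions are correct. This is one reason precision and recall are commonly examined alongside accuracy.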