Prediction of Patient’s Stroke Vulnerability Status Using Logistic Regression Machine Learning Model


Published: 2024-06-20

Page: 70-82

Okpe Anthony Okwori *

Department of Computer Science, Federal University Wukari, Nigeria.

Moses Adah Agana

Department of Computer Science, University of Calabar, Nigeria.

Ofem Ajah Ofem

Department of Computer Science, University of Calabar, Nigeria.

Obono I. Ofem

Department of Computer Science, University of Calabar, Nigeria.

*Author to whom correspondence should be addressed.


In recent times, machine learning has been widely used in healthcare because it can predict diseases and medical conditions accurately, helping physicians diagnose diseases at an early stage. Machine learning models are also used to handle large volumes of complex, high-dimensional and constantly evolving medical data, improving the accuracy and efficiency of disease prediction and diagnosis. This paper applies a machine learning model to the prediction of stroke vulnerability among individuals. In particular, a Logistic Regression (LR) based stroke prediction model is described and developed in the Python programming language to predict the likelihood of stroke occurrence. Stroke usually occurs when blood flow to the brain is blocked, causing brain cells to die from lack of oxygen and nutrients. It is a medical emergency that may result in lasting brain damage, permanent disability and mortality across all ages. To reduce stroke occurrence, there is an urgent need for stroke prediction and lifestyle changes. The logistic regression-based stroke prediction model was developed using the healthcare stroke dataset obtained from the Kaggle machine learning dataset repository. The dataset was preprocessed to improve prediction performance using techniques such as feature selection, feature encoding, missing-value correction, class balancing, outlier detection and correction, and feature scaling, and the model's hyperparameters were tuned. The preprocessed dataset was used for training, validation and testing of the model, which was evaluated using Python scikit-learn evaluation metrics such as accuracy, precision, recall, F1-score, specificity and the area under the receiver operating characteristic curve (AUC-ROC).
On evaluation, the model achieved a classification accuracy of 81% and an AUC-ROC of 90%. This indicates that the logistic regression model is effective for stroke classification on the healthcare dataset, and that the proposed model improves on some existing stroke prediction models that use logistic regression.
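The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the labels and predicted probabilities below are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical. The metrics are written out in plain Python to make their definitions explicit. scikit-learn provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `roc_auc_score` for the same quantities, and specificity can be obtained there as the recall of the negative class.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = stroke)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, specificity and F1 from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall of the negative class
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: invented labels and model probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

# Assumed decision threshold of 0.5 on the predicted probability.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC-ROC:", auc)
```

Accuracy alone can be misleading when stroke cases are rare, which is one reason to report recall, specificity and AUC-ROC alongside it, as the abstract does.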

Keywords: Stroke, learning, supervised, regression, dataset, preprocessing

How to Cite

Okwori, O. A., Agana, M. A., Ofem, O. A., & Ofem, O. I. (2024). Prediction of Patient’s Stroke Vulnerability Status Using Logistic Regression Machine Learning Model. Asian Basic and Applied Research Journal, 6(1), 70–82. Retrieved from



