Explainable machine learning for antimalarial activity prediction in drug discovery

Date
2025
Authors
Namulinda, Hellen
Publisher
Makerere University
Abstract
Malaria remains a significant global health burden, causing substantial morbidity and mortality, particularly in tropical and subtropical regions. Although effective antimalarial drugs exist, such as quinine, chloroquine, antifolates and artemisinin, the emergence of drug-resistant strains of Plasmodium falciparum emphasises the need for continued drug discovery. One of the primary challenges in drug discovery is the high failure rate, with over 90% of candidate drugs failing to reach clinical trials. To address this, the pharmaceutical industry and research institutions have explored alternative approaches to drug discovery, including artificial intelligence (AI) and machine learning (ML). Despite the growing array of ML methods for drug discovery, these techniques often demand specialised expertise. Furthermore, the rationale behind predictions has received limited exploration, even though it is essential for understanding why a specific compound shows potential as an antimalarial agent. To refine molecules and design more effective drugs, researchers need to understand which molecular representations are used and how chemical structure relates to predicted activity. This dissertation explored the application of ML models for predicting the antimalarial activity of chemical compounds, with a focus on enhancing the interpretability of these models through Explainable Artificial Intelligence (XAI) techniques. A key contribution of this work is the development of the XAI4Chem tool, which integrates interpretability into the ML workflow for cheminformatics, allowing researchers to better understand the factors influencing predictions. Using data from the ChEMBL database, models were trained on molecular descriptors (RDKit, Datamol, and Mordred) and fingerprints (RDKit and Morgan) to predict percentage inhibition and to classify compounds as active or inactive.
In regression, models trained on 64 selected RDKit descriptors achieved higher performance (R² = 0.563) than models trained on Morgan fingerprints (R² = 0.5012). In classification, models trained on RDKit descriptors and on Morgan fingerprints both achieved 97% test accuracy. SHAP (SHapley Additive exPlanations) value analysis identified key molecular features, such as the compound's lipophilicity (MolLogP), topological polar surface area (TPSA), number of amide functional groups (fr_amide), and quantitative estimate of drug-likeness (QED), as significant drivers of predictions.
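The abstract scores the regression task (percentage inhibition) with the coefficient of determination, R² = 1 − SS_res / SS_tot, and the classification task (active or inactive) with test accuracy. The following is a minimal, dependency-free sketch of both metrics. The percentage-inhibition values and labels in the usage lines are made up for illustration; they are not ChEMBL data or results from this work. In practice, a library routine such as scikit-learn's `r2_score` or `accuracy_score` would normally be used instead.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def accuracy(labels: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of compounds whose active/inactive label is predicted correctly."""
    if len(labels) != len(predicted) or not labels:
        raise ValueError("need two equal-length, non-empty label sequences")
    return sum(a == b for a, b in zip(labels, predicted)) / len(labels)


if __name__ == "__main__":
    # Hypothetical measured vs predicted percentage inhibition for four compounds.
    measured = [10.0, 40.0, 70.0, 90.0]
    predicted = [20.0, 35.0, 60.0, 95.0]
    print(f"R^2 = {r_squared(measured, predicted):.3f}")  # R^2 = 0.932
    # Hypothetical activity labels for the same four compounds.
    print(accuracy(["active", "inactive", "inactive", "active"],
                   ["active", "inactive", "active", "active"]))  # 0.75
```

An R² of 0 corresponds to always predicting the mean inhibition. An R² of 0.563 therefore means the descriptor model explains roughly 56% of the variance in percentage inhibition, relative to that mean-only baseline.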
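SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. With only a handful of features, these values can be computed exactly by enumerating every feature coalition. In the sketch below, a feature outside a coalition is replaced by its value in a reference (baseline) compound; SHAP's explainers estimate the same kind of quantity against a background dataset. This is a stand-alone illustration under that assumption. `toy_inhibition_model`, its coefficients and the descriptor values are hypothetical and do not come from the dissertation's models or data. Only the descriptor names are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature absent from a coalition takes its baseline value. The loop
    makes O(2^n) model calls, so it is only practical for a few features.
    """
    names: List[str] = list(x)
    n = len(names)
    phi: Features = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 (coalitions exclude i)
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = {f: x[f] if f in present else baseline[f] for f in names}
                with_i = {**without_i, i: x[i]}
                # Marginal contribution of feature i to this coalition.
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


def toy_inhibition_model(f: Features) -> float:
    """Hypothetical percentage-inhibition score with made-up coefficients.

    Not the dissertation's trained model; the MolLogP * QED term is there
    only so that two features interact.
    """
    return (8.0 * f["MolLogP"] - 0.1 * f["TPSA"] + 5.0 * f["fr_amide"]
            + 20.0 * f["QED"] + 2.0 * f["MolLogP"] * f["QED"])


if __name__ == "__main__":
    compound = {"MolLogP": 3.0, "TPSA": 80.0, "fr_amide": 2.0, "QED": 0.6}
    reference = {"MolLogP": 2.0, "TPSA": 90.0, "fr_amide": 1.0, "QED": 0.5}
    phi = shapley_values(toy_inhibition_model, compound, reference)
    for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:>9}: {value:+.2f}")
    # Efficiency: the contributions sum to the change in prediction.
    gap = toy_inhibition_model(compound) - toy_inhibition_model(reference)
    print(f"sum = {sum(phi.values()):.2f}, model gap = {gap:.2f}")
```

The final check reflects the efficiency property: the contributions always sum to the difference between the prediction for the compound and the prediction for the baseline. The interaction term's effect is shared between MolLogP and QED. Because the cost grows as 2^n, real descriptor sets such as the 64 selected RDKit features call for SHAP's approximate or model-specific explainers rather than exhaustive enumeration.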
Description
A dissertation submitted to the Directorate of Research and Graduate Training in partial fulfillment of the requirements for the award of the Degree of Master of Science in Computer Science of Makerere University
Citation
Namulinda, H. (2025). Explainable machine learning for antimalarial activity prediction in drug discovery (Unpublished master's dissertation). Makerere University, Kampala, Uganda.