Document Type
Thesis
Degree Name
Master of Applied Computing
Department
Physics and Computer Science
Program Name/Specialization
Applied Computing
Faculty/School
Faculty of Science
First Advisor
Dr. Yang Liu
Advisor Role
Thesis Supervisor
Abstract
Alzheimer’s disease (AD) is a neurodegenerative condition that progressively impairs memory and cognitive abilities, posing a significant global health challenge. Deep learning networks applied to structural magnetic resonance imaging (MRI) have achieved high diagnostic accuracy, but their decision-making processes remain opaque "black boxes". This opacity limits clinical adoption, because high predictive accuracy does not ensure that a model’s decisions rest on genuine neuropathological biomarkers rather than on artifacts. To bridge this gap between complex deep learning models and their trustworthy clinical use, this thesis proposes a clinically grounded evaluation framework for explainable AI (XAI) in 3D neuroimaging, aimed at enabling reliable and interpretable validation of model explanations for AD diagnosis.
To motivate this framework, a structured pipeline was first developed to benchmark the predictive performance and interpretability of multiple architectures, including custom 3D DenseNet and ResNet-18 models, using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. This stage involved both qualitative and preliminary quantitative assessments of eight XAI methods, including gradient-based approaches such as Grad-CAM, HiRes-CAM, and Guided Backpropagation, as well as model-agnostic methods such as Kernel SHAP, LIME, and RISE. Comparative analyses revealed substantial variation in spatial attribution patterns, highlighting the limitations of existing evaluation practices and the need for standardized, pathology-aware criteria.
Building on these observations, the proposed framework introduces two complementary quantitative evaluation protocols. First, an anatomy-aware stress test evaluates attribution stability by simulating controlled, clinically meaningful regional atrophy, addressing the limitations of conventional random perturbation methods. Second, a clinical alignment scoring system quantifies the agreement between model saliency maps and established AD pathology by weighting regions according to their clinical importance and evaluating both distributional alignment and ordinal consistency.
Experimental results reveal significant disparities in XAI reliability that standard accuracy metrics fail to detect. While widely used methods like Grad-CAM++ exhibited considerable instability under anatomical changes, layer-independent gradient-based methods such as Guided Backpropagation demonstrated stronger spatial consistency and clinical alignment. More broadly, the results highlight that no single evaluation metric is sufficient, and that commonly used validation approaches may produce misleading conclusions about explanation faithfulness.
By integrating these components, this work presents a reproducible and systematic evaluation framework for assessing the reliability, robustness, and clinical relevance of XAI methods in 3D neuroimaging, supporting the development of trustworthy AI-assisted diagnostic systems.
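As an illustration of the clinical alignment idea described in the abstract, the following is a minimal sketch, not the thesis's actual scoring system. It assumes a saliency volume, an integer-labelled atlas of the same shape, and a hypothetical vector of clinical importance weights per region. The specific choices here (Jensen-Shannon divergence for distributional alignment, Spearman rank correlation for ordinal consistency) and all function names are assumptions made for illustration only.

```python
import numpy as np

def region_saliency_mass(saliency, atlas, region_ids):
    """Fraction of total absolute saliency that falls inside each atlas region."""
    sal = np.abs(saliency)
    mass = np.array([sal[atlas == r].sum() for r in region_ids], dtype=float)
    total = mass.sum()
    if total == 0:
        # No saliency in any listed region: fall back to a uniform distribution.
        return np.full(len(region_ids), 1.0 / len(region_ids))
    return mass / total

def distributional_alignment(mass, weights):
    """1 - Jensen-Shannon divergence (base 2); 1.0 means identical distributions."""
    p = np.asarray(mass, dtype=float)
    q = np.asarray(weights, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m)))

def ordinal_consistency(mass, weights):
    """Spearman rank correlation, assuming no tied values in either vector."""
    def rank(x):
        return np.argsort(np.argsort(x))
    return float(np.corrcoef(rank(mass), rank(weights))[0, 1])

# Toy 4x4x4 volume with three hypothetical regions labelled 1, 2, 3.
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2], atlas[2], atlas[3] = 1, 2, 3
saliency = np.where(atlas == 1, 3.0, np.where(atlas == 2, 2.0, 1.0))

region_ids = [1, 2, 3]
clinical_weights = np.array([0.6, 0.3, 0.1])  # hypothetical importance values

mass = region_saliency_mass(saliency, atlas, region_ids)
dist_score = distributional_alignment(mass, clinical_weights)
rank_score = ordinal_consistency(mass, clinical_weights)
print(mass, dist_score, rank_score)
```

Reporting the two scores separately reflects the abstract's point that no single metric is sufficient: a saliency map can rank regions in the clinically expected order while still spreading its attribution mass very differently from the reference weights, and vice versa.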
Recommended Citation
Chakroborty, Tamal, "Beyond Accuracy: Explainable Deep Learning for Alzheimer’s Disease Detection Using Structural MRI Data" (2026). Theses and Dissertations (Comprehensive). 2921.
https://scholars.wlu.ca/etd/2921
Convocation Year
2026
Convocation Season
Fall