Document Type
Thesis
Degree Name
Master of Applied Computing
Department
Physics and Computer Science
Program Name/Specialization
Applied Computing
Faculty/School
Faculty of Science
First Advisor
Dr. Emad Mohammed
Advisor Role
Supervision, Formal analysis, Investigation, Project administration, Validation, Editing, Review
Second Advisor
Dr. Sukhjit Singh Sehra
Advisor Role
Visualization, Validation, Writing – review and editing
Abstract
Effective and interpretable classification of medical images remains a critical challenge in computer-aided diagnosis, particularly in data-scarce and resource-constrained clinical settings where traditional deep learning models prove impractical. This study addresses the fundamental barrier to Vision Transformer adoption in medical imaging—massive parameter counts and data requirements—through a systematic two-phase methodology. Phase 1 evaluates three spline-based Kolmogorov–Arnold Network (KAN) variants to identify the optimal nonlinear approximation function for parameter-efficient medical image classification: SBTAYLOR-KAN (B-splines with Taylor series), SBRBF-KAN (B-splines with Radial Basis Functions), and SBWAVELET-KAN (B-splines with Morlet wavelets). Comprehensive experiments across brain MRI, chest X-ray, and tuberculosis datasets—without any image preprocessing beyond the datasets' original published forms—establish SBTAYLOR-KAN as the superior architecture, achieving up to 98.93% accuracy with only 2,872 parameters (>99% reduction versus ResNet50's ~24.18M) while maintaining 86% accuracy using merely 30% of the training data. Statistical validation through kappa coefficients, the Matthews correlation coefficient, and cross-dataset generalization confirms its robustness and data efficiency.
Building directly on these findings, Phase 2 integrates the optimal Taylor-KAN formulation into a Vision Transformer architecture, yielding TaylorKAN-ViT—an ultra-compact model that fundamentally reimagines efficient transformer design through a KAN-first approach. Unlike conventional ViTs that retrofit parameter reduction techniques while retaining MLP-heavy architectures, TaylorKAN-ViT replaces all nonlinear feed-forward mappings with the Taylor-series-approximated KAN modules identified in Phase 1.
This enables dual-scale feature learning: KAN transformations capture fine-grained local patterns within image patches, while self-attention mechanisms model long-range global dependencies across the entire image. Despite comprising only 88.9K parameters and 4.9 GFLOPs—representing ~99.3% parameter reduction compared to recent medical ViTs (MedKAFormer-T: 12.47M, MedViTV2-T: 12.3M)—TaylorKAN-ViT achieves competitive performance across four diverse benchmarks: 94.36% accuracy on PneumoniaMNIST, 95.90% on CPNX-ray, 61.00% on PAD-UFES-20, and 70.50% on Kvasir. The model demonstrates stable generalization even under limited-data and class-imbalanced conditions. These results demonstrate that clinical-grade medical image classification is achievable without large-scale models, establishing TaylorKAN-ViT as a practical, deployable solution for edge devices, mobile platforms, and resource-limited healthcare environments.
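To make the abstract's core idea concrete—learnable univariate edge functions approximated by truncated Taylor series in place of an MLP's fixed activations—the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the thesis implementation: the class name `TaylorKANLayer`, the coefficient initialization, and the expansion order are all hypothetical choices for demonstration.

```python
import numpy as np

class TaylorKANLayer:
    """Illustrative sketch (not the thesis code) of a Taylor-series KAN layer.

    Following the general KAN formulation, each input-output edge carries a
    learnable univariate function, here approximated by a truncated Taylor
    expansion:  phi(x) = c_0 + c_1*x + ... + c_K*x^K.
    Each output is the sum of its incoming edge functions.
    """

    def __init__(self, in_dim, out_dim, order=3, seed=0):
        rng = np.random.default_rng(seed)
        # One coefficient vector per (output, input) edge: (out_dim, in_dim, order+1)
        self.coeffs = rng.normal(0.0, 0.1, size=(out_dim, in_dim, order + 1))
        self.order = order

    def __call__(self, x):
        # x: (batch, in_dim) -> stacked powers x^0..x^K: (batch, in_dim, order+1)
        powers = np.stack([x ** k for k in range(self.order + 1)], axis=-1)
        # Evaluate every edge polynomial and sum over inputs: (batch, out_dim)
        return np.einsum('bik,oik->bo', powers, self.coeffs)

# A ViT block's two-layer MLP could, in this hypothetical scheme, be swapped
# for a pair of such layers (hidden width chosen for illustration only):
kan_ffn = lambda x: TaylorKANLayer(8, 16, seed=1)(TaylorKANLayer(16, 8, seed=2)(x))
```

Because each edge stores only `order + 1` coefficients, parameter count scales as `in_dim * out_dim * (order + 1)`, which is how low expansion orders keep such layers compact relative to wide MLP blocks.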
Recommended Citation
Fatema, Kaniz; Mohammed, Dr. Emad; and Sehra, Dr. Sukhjit Singh, "TaylorKAN-ViT: Parameter-Efficient Vision Transformers for Medical Image Classification" (2026). Theses and Dissertations (Comprehensive). 2901.
https://scholars.wlu.ca/etd/2901
Convocation Year
2026
Convocation Season
Spring