Document Type

Thesis

Degree Name

Master of Applied Computing

Department

Physics and Computer Science

Program Name/Specialization

Applied Computing

Faculty/School

Faculty of Science

First Advisor

Dariush Ebrahimi

Advisor Role

Supervisor

Abstract

This thesis offers a comprehensive exploration of Reinforcement Learning (RL), beginning with fundamental theoretical constructs (Markov Decision Processes, Dynamic Programming, Monte Carlo methods, and Temporal Difference learning) and extending to state-of-the-art deep RL approaches such as Deep Q-Networks (DQN) and policy-gradient algorithms. Through analytical experiments in controlled environments, the thesis demonstrates how distinct algorithmic choices (e.g., exploration techniques, eligibility traces, or network architectures) influence convergence and stability. These foundational insights pave the way for two in-depth case studies that apply RL techniques to critical, real-world scheduling and routing challenges.
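As a minimal illustration of the tabular machinery surveyed above (a sketch for orientation, not code reproduced from the thesis), the following Python snippet implements one-step Q-learning with epsilon-greedy exploration; all names and parameter values are illustrative assumptions.

    import random

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        # Explore with probability epsilon; otherwise act greedily on Q.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

    def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
        # One-step temporal-difference (Q-learning) update:
        # Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        td_error = r + gamma * best_next - Q.get((s, a), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error

Here Q is a plain dictionary keyed by (state, action) pairs; the epsilon, alpha, and gamma settings are the kinds of algorithmic choices whose effect on convergence and stability the thesis examines.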

The first case study tackles the Electric Vehicle (EV) routing and charging problem, in which vehicles must navigate a road network populated with limited charging stations under battery and time constraints. To address this, a heuristic-based Q-learning (TQL) strategy is first proposed that extends classical tabular Q-learning with temporal features of traffic and charging demand. A variant of deep Q-learning (DQL) is then introduced to accommodate larger-scale networks and continuous state parameters. Extensive simulations reveal that RL-based methods, particularly TQL, outperform baseline heuristics by reducing total travel distance and charging overhead. Notably, DQL demonstrates scalability, effectively handling more nodes and dynamic charging-station placements, an essential capability for complex, real-time EV routing scenarios.
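To make the continuous state representation concrete, the sketch below shows one plausible encoding of an EV routing state for a deep Q-network; the specific features (battery fraction, cyclic time of day, per-charger distances and queue lengths) are assumptions chosen for illustration, not the thesis's exact design.

    import numpy as np

    def encode_ev_state(battery_frac, hour_of_day, dist_to_goal_km,
                        charger_dists_km, charger_queue_lens):
        # Encode an EV routing state as a fixed-length feature vector.
        # All features here are illustrative assumptions.
        time_feats = [np.sin(2 * np.pi * hour_of_day / 24),
                      np.cos(2 * np.pi * hour_of_day / 24)]  # cyclic time of day
        return np.array([battery_frac, dist_to_goal_km, *time_feats,
                         *charger_dists_km, *charger_queue_lens],
                        dtype=np.float32)

A DQN would map such a vector to one Q-value per candidate action (e.g., next road segment or charge/no-charge decision), which is what lets the deep variant scale past the enumerable state spaces required by tabular methods.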

The second case study examines fleet management through a vehicle patrol scheduling problem in which multiple patrol vehicles must repeatedly traverse a set of critical locations subject to diverse constraints, including patrol frequency, travel times, and emergency calls. This study thoroughly compares classical optimization (e.g., mixed-integer linear programming), heuristic (e.g., genetic algorithms, adaptive hill-climbing), and RL-based approaches. Of particular interest is the proposed $\mathcal{P}^4\mathcal{O}$ RL framework, which integrates a policy-based architecture with real-time decision-making to handle unpredictable emergency events and evolving traffic conditions. By combining RL and heuristic strategies, the system maintains efficient patrol coverage while dynamically reallocating vehicles to emerging hotspots. Simulation-based evaluations demonstrate shorter response times, reduced travel distance, and a more balanced distribution of vehicle workloads, even under high uncertainty. In addition, sensitivity analyses examine how variations in patrol duration, rest times, and emergency rates affect each algorithm's performance, providing actionable guidelines for deploying RL-driven patrol strategies in real-world settings.
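The $\mathcal{P}^4\mathcal{O}$ framework is described as policy-based; without reproducing its internals, the following sketch shows the generic REINFORCE-style policy-gradient update that such architectures build on, using a softmax policy over discrete dispatch actions (all shapes, names, and hyperparameters are illustrative assumptions).

    import numpy as np

    def softmax(z):
        z = z - z.max()  # numerical stability
        e = np.exp(z)
        return e / e.sum()

    def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
        # theta: (n_states, n_actions) logits of a softmax policy.
        # episode: list of (state_index, action_index, reward) tuples.
        # Monte Carlo policy gradient: ascend G_t * grad log pi(a|s).
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G
            probs = softmax(theta[s])
            grad_log = -probs
            grad_log[a] += 1.0  # d/dtheta log pi(a|s) for softmax logits
            theta[s] += lr * G * grad_log
        return theta

In a patrol setting, the reward entries would reflect coverage and response-time objectives, which is how a policy-based learner can trade routine patrolling against reallocation toward emerging hotspots.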

By systematically integrating classical RL algorithms with modern deep learning architectures, this thesis reveals how an agent’s learning process can be significantly improved when critical constraints, such as energy or resource limitations, timing requirements, or rapidly changing system states, are carefully embedded into RL reward structures, state representations, and update mechanisms. Extensive experiments demonstrate that tabular approaches excel in managing smaller, more structured problems, whereas deep methods scale more effectively in complex domains with high-dimensional or continuous state spaces. Crucially, hybrid RL solutions bridge these strengths by incorporating domain-oriented heuristics, adaptive exploration, and targeted function approximation, resulting in faster convergence and more robust behavior in the presence of uncertainty. Collectively, these findings show that thoughtfully designed RL systems, whether tabular, deep, or hybrid, can quickly adapt to new conditions, consistently surpassing conventional heuristics or purely optimization-based methods while maintaining computational feasibility. Ultimately, this work underscores the transformative potential of RL for dynamic scheduling, routing, and resource management, illustrating a practical pathway to achieve higher-quality decision-making in real-world deployments.
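As a final hedged illustration of embedding constraints directly into the reward (the penalty terms and weights below are assumptions for exposition, not the thesis's reward design), a shaped reward might penalize energy and timing violations explicitly:

    def shaped_reward(base_reward, battery_frac, deadline_slack_min,
                      battery_floor=0.1, w_energy=5.0, w_time=0.5):
        # Subtract constraint penalties from a task reward.
        # Terms and weights are illustrative assumptions.
        penalty = 0.0
        if battery_frac < battery_floor:      # energy constraint violated
            penalty += w_energy * (battery_floor - battery_frac)
        if deadline_slack_min < 0:            # timing constraint violated
            penalty += w_time * (-deadline_slack_min)
        return base_reward - penalty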

Convocation Year

2025

Convocation Season

Spring
