Deep reinforcement learning Hyper-heuristics for online marine port truck dispatching

  • Yuchang ZHANG

Student thesis: PhD Thesis


Real-world combinatorial optimisation problems (COPs) faced in ports are often subject to dynamics and uncertainties, i.e. the problem instance is not completely known in advance. Such problems are commonly called online combinatorial optimisation problems, in which uncertainties are revealed sequentially over time during decision-making. In such situations, solutions generated a priori often encounter various issues, including inferior service quality, increased costs, and solution infeasibility, all of which can lead to substantial losses. This thesis is concerned with using Deep Reinforcement Learning (DRL)-based hyper-heuristics to solve online container truck dispatching problems in ports. The proposed method is comparatively better at both exploiting the special structure of the problem and discovering hidden patterns in the problem uncertainties from historical data. Specifically, two real-world online truck routing problems of progressively increasing difficulty, drawn from a container truck terminal at an international port, are addressed.

In this thesis, we first apply traditional DRL directly to solve online container truck dispatching problems. Then, we combine DRL with hyper-heuristics (HH) to create a new framework, namely a DRL-based hyper-heuristic (DRL-HH). Compared to traditional DRL, the proposed DRL-HH offers the advantage of reusing existing well-performing rules and experience in the form of low-level heuristics, while also enhancing the interpretability of solutions and exhibiting improved convergence properties.
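The idea of a learning agent that selects among low-level heuristics, rather than choosing dispatching actions directly, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the `Task` fields, the two dispatching rules, the coarse state encoding, and the tabular epsilon-greedy selector with a one-step value update, which stands in for a deep network.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    distance: float  # hypothetical travel distance from the idle truck
    waiting: float   # hypothetical time the task has been waiting


# Low-level heuristics: each maps the currently known tasks to one task.
def nearest_first(tasks):
    return min(tasks, key=lambda t: t.distance)


def longest_waiting_first(tasks):
    return max(tasks, key=lambda t: t.waiting)


LOW_LEVEL_HEURISTICS = [nearest_first, longest_waiting_first]


class HeuristicSelector:
    """Epsilon-greedy value table over heuristics (stand-in for a deep network)."""

    def __init__(self, n_actions, epsilon=0.1, lr=0.1, seed=0):
        self.q = {}
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, state):
        values = self.q.setdefault(state, [0.0] * self.n_actions)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore
        return max(range(self.n_actions), key=values.__getitem__)  # exploit

    def update(self, state, action, reward):
        # One-step (bandit-style) update; no bootstrapping from the next state.
        values = self.q.setdefault(state, [0.0] * self.n_actions)
        values[action] += self.lr * (reward - values[action])


def encode_state(tasks):
    # Hypothetical coarse state: has any task waited more than 10 time units?
    return any(t.waiting > 10.0 for t in tasks)


def dispatch(selector, tasks):
    """Pick a heuristic for the current state and let it choose the task."""
    state = encode_state(tasks)
    action = selector.select(state)
    chosen = LOW_LEVEL_HEURISTICS[action](tasks)
    return state, action, chosen
```

Because the agent's action is the index of a rule, each decision can be read back as "rule k was applied in state s", which is one way to see the interpretability benefit mentioned above. In use, `dispatch` would be called whenever a truck becomes idle, and `update` would be called once the resulting cost is observed, with the reward being a negative cost.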

The final research work of this thesis addresses a more challenging multi-scenario online container truck routing problem. In a multi-scenario problem, the problem uncertainties (service time of trucks) switch between different distributions in different problem-solving stages. A new framework, called DRL-GA-HH, that combines DRL-HH with a genetic algorithm (GA) is then proposed. It takes advantage of DRL's ability to recognise specific environmental patterns, so that a priori plans obtained by the GA in more stable scenarios (i.e. semi-deterministic scenarios) can be selected to improve the performance of the algorithm. The new method performs better overall than the DRL-HH framework. Several effective techniques, such as GA with variable visions and GA with surrogate networks, are developed during the research. They help the GA perform better in an online environment and can also be useful for other meta-heuristics when interacting with DRL in online environments.
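The switch between following an a priori plan and acting reactively can be sketched as follows. This is an illustrative sketch only, under stated assumptions: a deliberately tiny, mutation-only genetic algorithm builds a plan from expected service times, and a simple coefficient-of-variation threshold stands in for the scenario recognition that the thesis assigns to DRL. The objective (total completion time), the threshold, and all function names are hypothetical. The variable-vision and surrogate-network techniques are not reproduced here.

```python
import random
import statistics


def plan_cost(order, expected_service):
    """Total completion time when tasks are served one after another."""
    elapsed, total = 0.0, 0.0
    for task in order:
        elapsed += expected_service[task]
        total += elapsed
    return total


def ga_plan(expected_service, generations=50, pop_size=20, seed=0):
    """Tiny permutation GA: swap mutation with elitist truncation selection.

    Assumes at least two tasks.
    """
    rng = random.Random(seed)
    n = len(expected_service)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda o: plan_cost(o, expected_service))
        survivors = population[: pop_size // 2]  # keep the better half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]  # swap two positions
            children.append(child)
        population = survivors + children
    return min(population, key=lambda o: plan_cost(o, expected_service))


def is_stable(recent_service_times, cv_threshold=0.2):
    """Hypothetical stability test: low coefficient of variation."""
    mean = statistics.mean(recent_service_times)
    return statistics.pstdev(recent_service_times) / mean < cv_threshold


def choose_mode(recent_service_times):
    """Follow the a priori plan only when the scenario looks stable."""
    if is_stable(recent_service_times):
        return "follow_ga_plan"
    return "reactive_heuristic"
```

In a fuller system, the `"reactive_heuristic"` branch would hand control to a heuristic selector, and the plan would be recomputed whenever the recognised scenario changes.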
Date of Award: Mar 2024
Original language: English
Awarding Institution
  • University of Nottingham
Supervisors: Ruibin Bai & Rong Qu


  • Hyper-heuristics
  • Deep Reinforcement Learning
  • online combinatorial optimisation problems
