Deep reinforcement learning for truck task dispatching optimization problem in a real-life marine container terminal

Student thesis: PhD Thesis

Abstract

In recent decades, dynamic real-time task dispatching has emerged as an essential area of study in modern logistics and global supply chain development. Challenges concerning response timeliness, uncertainty, and solution generalization have gradually surfaced across a variety of real-life cases. This thesis focuses on the truck task dispatching optimization problem in marine container terminals, the hubs of ocean transportation. Approaching the problem from a practical application perspective, it identifies several key bottlenecks in a real container terminal and provides corresponding solutions.

In the early stage of this research, a Real2Sim simulation framework is developed that reproduces the most relevant details and operational logic of a real-world container terminal. Mechanisms that close the performance gap between reality and simulation are designed, making the obtained solutions as practical as possible. In the first primary research work, a spatial attention-based deep reinforcement learning (DRL) approach is applied to the examined problem. The DRL method demonstrates effective performance and the capability to cope with multi-scenario settings. At this stage, the feasibility of DRL-based methods for online optimization problems is confirmed, establishing a solid foundation for the subsequent research.
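The core idea of attention-based dispatching can be illustrated with a minimal sketch (not taken from the thesis; all names, feature dimensions, and values are hypothetical): each candidate task is scored against a truck's state embedding via scaled dot-product attention, and the highest-scoring task is dispatched.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_dispatch(truck_feat, task_feats):
    """Score candidate tasks for one truck via scaled dot-product
    attention over spatial feature embeddings, then pick the best task."""
    scores = task_feats @ truck_feat / np.sqrt(truck_feat.size)
    weights = softmax(scores)
    return int(np.argmax(weights)), weights

# Hypothetical embeddings: truck state vs. three candidate tasks.
truck = np.array([1.0, 0.0])
tasks = np.array([[0.9, 0.1],
                  [0.1, 0.9],
                  [1.0, 0.0]])
idx, w = attention_dispatch(truck, tasks)  # task 2 aligns best with the truck
```

In the actual DRL approach the embeddings would be produced by learned networks and the action sampled from the attention weights rather than taken greedily; this sketch only shows the scoring mechanism.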

The examined problem is extended to a multi-objective optimization (MOO) version in the second stage of the research. Demand for dynamic MOO is growing rapidly due to severe market competition and ever-increasing requirements for customization and agility in business services. However, most existing evolutionary MOO approaches generate a finite set of trade-off solutions and usually cannot efficiently capture the user's most desired preference. To tackle this issue, a preference-agile multi-objective optimization (PAMOO) methodology is proposed that permits users to dynamically adjust and interactively assign preferences. To achieve this, a novel uniform network is designed that can handle arbitrary user preferences. Benefiting from this attribute, a preference calibration method is then developed to further enhance the quality of the policy set.
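How a single network can serve arbitrary preferences can be sketched as follows (a simplified illustration, not the thesis's architecture; the linear weights, dimensions, and function names are hypothetical): the preference vector is fed into the policy as part of its input, so changing the preference changes the action distribution without retraining.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def preference_conditioned_policy(state, preference, W):
    """One policy handles arbitrary preferences by conditioning on the
    preference vector: the input is the concatenation [state, preference]."""
    x = np.concatenate([state, preference])
    return softmax(W @ x)  # action probabilities

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 5))          # 4 actions; 3 state dims + 2 objective weights
state = np.array([0.2, -0.1, 0.5])
p1 = preference_conditioned_policy(state, np.array([1.0, 0.0]), W)  # favour objective 1
p2 = preference_conditioned_policy(state, np.array([0.0, 1.0]), W)  # favour objective 2
```

Because the same weights `W` serve every preference vector, a user can adjust trade-offs interactively at run time, which is the property the preference calibration step builds on.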

For complex real-world optimization problems, it is costly for a DRL agent to learn a sophisticated policy from scratch, so techniques that accelerate training are valuable in real-life applications. This thesis explores mechanisms that tackle this issue by introducing prior expert knowledge. For the single-objective case, an expert network-assisted dispatching model is designed, which shows high convergence efficiency and the ability to handle high levels of uncertainty. Following a similar principle, a policy fusion approach is proposed for the MOO problem, which reduces the training cost and demonstrates its potential for solving complex real-life optimization problems.
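One common way to inject expert knowledge, which policy-fusion-style methods build on, is to blend the expert's action distribution with the learning agent's and decay the blend weight over training. The sketch below illustrates that general idea only; it is not the thesis's algorithm, and the distributions and decay schedule are invented for illustration.

```python
import numpy as np

def fuse_policies(expert_probs, learned_probs, beta):
    """Blend an expert policy with the learning policy; beta decays from
    1 toward 0 over training so the learned policy gradually takes over."""
    fused = beta * expert_probs + (1.0 - beta) * learned_probs
    return fused / fused.sum()

expert = np.array([0.7, 0.2, 0.1])   # e.g. a hand-crafted dispatching heuristic
learned = np.array([0.1, 0.3, 0.6])  # current DRL policy output
early = fuse_policies(expert, learned, beta=0.9)  # start of training: mostly expert
late = fuse_policies(expert, learned, beta=0.1)   # end of training: mostly learned
```

Early in training the fused policy follows the expert's preferred action, while late in training it follows the learned policy, which is what cuts the cost of learning from scratch.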
Date of Award: 15 Jul 2025
Original language: English
Awarding Institution
  • University of Nottingham
Supervisors: Ruibin Bai & Rong Qu
