Abstract
Jigsaw puzzle solving, once regarded primarily as a recreational activity, has evolved into a significant research area within computer vision and artificial intelligence. Early computational approaches, dating back to the 1960s, relied mainly on boundary information, leveraging geometric features (shape, curvature) and color gradients along piece edges to guide puzzle assembly. Over the following decades, these boundary-based methods were extended with techniques such as curve matching, statistical modeling, and combinatorial optimization, enabling the reconstruction of fragmented objects and shredded documents. In recent years, however, the focus has shifted toward exploiting the semantic information embedded within each piece. Modern methods, powered by deep learning, reinforcement learning, and advanced feature extraction, utilize high-level visual context and semantic cues to solve puzzles, enabling robust assembly even in the presence of eroded gaps, missing pieces, or ambiguous boundaries. The developed techniques have found wide use in applications beyond puzzle solving.

This thesis focuses on solving jigsaw puzzles with eroded gaps in an integrated framework: puzzle reassembly is formulated as a combinatorial optimization problem and solved with a series of combinatorial optimization methods.

Firstly, the first work proposes Siamese-Discriminant Deep Reinforcement Learning (SD$^2$RL) to solve the Jigsaw Puzzle of Large Eroded Gaps (JPLEG). A Deep Q-Network (DQN) is designed to visually understand the puzzles, consisting of two sets of Siamese discriminant networks: one perceives the pairwise relations between vertical neighbors, and the other between horizontal neighbors. The proposed DQN considers not only the evidence from the incumbent fragment but also the support from its four neighbors.
The DQN is trained using experience replay with carefully designed rewards to guide the search for a sequence of fragment swaps that reaches the correct puzzle solution. Two JPLEG datasets are constructed to evaluate the proposed method, and the experimental results show that SD$^2$RL significantly outperforms state-of-the-art methods.
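As a toy illustration of the fragment-swapping search space underlying such a solver (not the thesis's actual networks or rewards), the puzzle state can be modeled as a permutation of fragment slots, with each action swapping two fragments. The scoring function below uses ground-truth positions purely for demonstration; a real solver must rely on learned pairwise compatibility instead, since the ground truth is unknown at test time.

```python
def swap(perm, i, j):
    """Apply one fragment-swapping action: exchange the pieces at slots i and j."""
    p = list(perm)
    p[i], p[j] = p[j], p[i]
    return p

def correct_fraction(perm):
    """Toy score: fraction of fragments sitting in their ground-truth slot.
    Stands in for the learned Q-values, which do not see the ground truth."""
    return sum(k == v for k, v in enumerate(perm)) / len(perm)

def greedy_solve(perm, max_steps=100):
    """Greedily accept whichever single swap raises the score the most."""
    perm = list(perm)
    for _ in range(max_steps):
        best = perm
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = swap(perm, i, j)
                if correct_fraction(cand) > correct_fraction(best):
                    best = cand
        if best == perm:  # no swap improves the score: stop
            break
        perm = best
    return perm
```

Because every non-identity permutation admits a score-improving swap, this greedy search always terminates at the solved arrangement; the hard part, which the learned DQN addresses, is scoring candidate swaps without access to the ground truth.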
Secondly, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To address them, the second work proposes Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. To perceive the puzzle with gaps, a Multi-head Puzzle Perception Network (MPPN) with a shared encoder is designed, where multiple puzzlet heads comprehensively perceive the local assembly status and a discriminator head provides a global assessment of the puzzle. To explore the large swapping-action space efficiently, an Evolutionary Reinforcement Learning (EvoRL) agent is designed: an actor recommends a set of suitable swapping actions from the large action space based on the perceived puzzle status, a critic updates the actor using the estimated rewards and the puzzle status, and an evaluator coupled with evolutionary strategies evolves the actions to align with the historical assembly experience. ERL-MPP is comprehensively evaluated on the JPLEG-5 dataset, which has large gaps, and the MIT dataset, which contains large-scale puzzles; it significantly outperforms all state-of-the-art models on both datasets.
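The evolutionary side of such an agent can be caricatured as a (1+λ)-style evolution over fixed-length swap sequences. All names and parameters below are hypothetical stand-ins, and the fitness counts correctly placed fragments using ground truth only for demonstration; the thesis's EvoRL agent instead scores states through its learned perception network.

```python
import random

def apply_actions(perm, actions):
    """Apply a sequence of fragment-swapping actions to a puzzle permutation."""
    p = list(perm)
    for i, j in actions:
        p[i], p[j] = p[j], p[i]
    return p

def fitness(perm):
    """Toy fitness: fraction of fragments in their ground-truth slot."""
    return sum(k == v for k, v in enumerate(perm)) / len(perm)

def evolve_actions(perm, length=6, pop=20, gens=60, seed=0):
    """(1+lambda)-style evolution: keep the elite sequence, mutate it, reselect."""
    rng = random.Random(seed)
    n = len(perm)
    best = [(0, 0)] * length  # start from a no-op sequence, so fitness never degrades
    for _ in range(gens):
        # elite plus `pop` mutants, each action kept with probability 0.7
        cands = [best] + [
            [a if rng.random() < 0.7 else (rng.randrange(n), rng.randrange(n))
             for a in best]
            for _ in range(pop)
        ]
        best = max(cands, key=lambda acts: fitness(apply_actions(perm, acts)))
    return best
```

Because the elite sequence is always retained, fitness is monotonically non-decreasing across generations, a property the evaluator in ERL-MPP similarly exploits when evolving actions against historical assembly experience.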
Finally, existing solvers often overlook puzzles with missing pieces. Missing pieces, together with gaps between pieces, pose significant challenges, amplified by the large solution space. To tackle these challenges, the third work proposes Co-Evolutionary Agents for Reassembling and Inpainting (CEARI): one agent inpaints missing content and the other reassembles the puzzle, with a shared perception network to perceive the puzzle status. The reassembly agent utilizes an evolutionary algorithm to explore the large solution space and discover a sequence of fragment-swapping actions that efficiently reassembles the puzzle, while the inpainting agent evolves from a local outpainting network at the early stage to a global inpainting network at the later stage. Furthermore, a co-evolutionary training paradigm is designed to iteratively evolve the two agents in a coherent and collaborative manner, improving reassembly accuracy and inpainting quality simultaneously. Experimental results on three datasets show that CEARI largely outperforms state-of-the-art methods in both reassembly accuracy and inpainting quality.
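The co-evolutionary paradigm can be sketched as alternating optimization, where each agent takes a step conditioned on the other's latest output. The toy below (all names hypothetical, with `None` marking an empty slot) alternates a "reassembly" step that swaps one misplaced fragment into place with an "inpainting" step that fills one empty slot whose content is unambiguous; neither step alone can finish some states, which is precisely why the two must alternate.

```python
def reassemble_step(perm):
    """Swap one misplaced (non-empty) fragment into its correct slot."""
    p = list(perm)
    for i, v in enumerate(p):
        if v is not None and v != i:
            p[i], p[v] = p[v], p[i]  # move fragment v to slot v
            break
    return p

def inpaint_step(perm):
    """Fill one empty slot whose ground-truth fragment is genuinely missing
    (toy stand-in for the inpainting network generating plausible content)."""
    p = list(perm)
    present = {v for v in p if v is not None}
    for i, v in enumerate(p):
        if v is None and i not in present:
            p[i] = i
            break
    return p

def co_evolve(state, rounds=10):
    """Alternate the two steps until the puzzle is complete and ordered."""
    for _ in range(rounds):
        state = reassemble_step(state)
        state = inpaint_step(state)
        if state == list(range(len(state))):
            break
    return state
```

For instance, on `[None, 0]` inpainting alone stalls (slot 0's fragment is not missing, it sits in slot 1), but one reassembly swap frees the right slot and the next inpainting step completes the puzzle.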
Across the thesis, puzzle solving is formulated as a combinatorial optimization problem and addressed with learning-based methods that tightly couple visual perception with reassembly strategies under uncertainty. In practice, these methods support reconstruction of fragmented or eroded artifacts and documents in archaeology, cultural heritage restoration, and digital forensics, where gaps and missing content are common and boundary cues are unreliable.
| Date of Award | 15 Jul 2026 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Ruibin Bai (Supervisor), Jianfeng Ren (Supervisor) & Xin CHEN (Supervisor) |
Keywords
- Deep Learning
- Reinforcement Learning
- Combinatorial Optimization
- Computer Vision