Multi-task learning applications in deep learning

  • Chang SHU

Student thesis: PhD Thesis


Over the last decade, deep learning has made significant progress in single-task learning, mimicking aspects of human intelligence and achieving state-of-the-art performance across a range of tasks. In contrast to single-task learning, Multi-task Learning (MTL) more accurately reflects the human learning process: it mitigates the laborious task of designing individual systems and makes it easier to capture diverse or high-level features from varied training data. This thesis addresses the challenges of designing MTL frameworks and creating better feature representations that can effectively tackle different tasks and data modalities across various deep learning-based applications.
Specifically, the MTL applications discussed in this thesis fall into two main groups: single-modal and multi-modal works.

Firstly, a novel MTL approach for early rumor detection is introduced. Unlike most rumor detection frameworks, the proposed approach dynamically determines the checkpoint for each event, learning when it is appropriate to classify the event as a rumor.

Secondly, a novel open-domain dialogue generation technique is devised that incorporates two distinct tasks: keyword extraction based on multi-source alignment guides a two-stage transformer for the final generation process.

Thirdly, MITTER, a joint pre-training framework for medical images and text, is proposed, leveraging multi-level contrastive learning. MITTER pre-trains the models using objectives derived from both paired and unpaired image and text data. It also employs a novel uni-modal pre-training procedure that exploits, in a self-supervised manner, the natural relationships within uni-modal medical data (e.g., frontal and lateral views, FINDINGS and IMPRESSION sections). Additionally, a Momentum Contrast (MoCo) dictionary expands the pool of negative samples, whose number is adjusted dynamically during training by leveraging the Alignment and Uniformity properties.

Fourthly, an adaptive optimization objective is devised to enhance visual-semantic embeddings. This framework selects multiple in-batch negative samples according to model quality during training, encompassing both image-to-text and text-to-image objectives.

Lastly, for multi-modal abstractive summarization, a modified recurrent alignment layer is proposed for the encoder to address the complexity of semantics. This layer incorporates a cross-modal attention module for aligning information and a renovation addition module for aggregating knowledge. Similarly, two auxiliary contrastive losses are utilized in the encoder, with separate objectives for image-to-text and text-to-image alignment.
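The paired image-to-text and text-to-image contrastive objectives that recur across these multi-modal contributions can be sketched as a symmetric InfoNCE loss over in-batch negatives. The sketch below is illustrative only, under assumed function names and toy embeddings; the thesis's actual objectives additionally draw negatives from a MoCo dictionary and adapt their number via the Alignment and Uniformity properties.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit hypersphere before comparison.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive (InfoNCE) loss over a batch.

    Matched (image_i, text_i) pairs are positives; all other in-batch
    pairings serve as negatives. Separate image-to-text and
    text-to-image objectives are computed and averaged.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    logits = img @ txt.T / temperature           # (B, B) similarity matrix
    labels = np.arange(len(logits))              # diagonal entries are positives

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    loss_i2t = cross_entropy(logits, labels)     # image -> text direction
    loss_t2i = cross_entropy(logits.T, labels)   # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)

# Toy batch: perfectly aligned pairs should yield a lower loss
# than deliberately mismatched (shuffled) pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = symmetric_info_nce(emb, emb)
shuffled = symmetric_info_nce(emb, emb[::-1])
print(aligned < shuffled)  # True
```

Averaging the two directional losses keeps the objective symmetric, so neither modality's encoder dominates the shared embedding space during joint training.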
In summary, this thesis presents several MTL frameworks specially designed for single-modal and multi-modal applications. The experimental results demonstrate the superior performance of these techniques compared with state-of-the-art benchmarks on popular datasets for each application. They show that, by utilizing the MTL paradigm, each individual task can be tackled effectively through a single MTL model with better feature representations obtained from joint training.
Date of Award: Jul 2024
Original language: English
Awarding Institution
  • University of Nottingham
Supervisors: Qian Zhang (Supervisor) & Zheng LU (Supervisor)