Learning regular expressions for interpretable medical text classification using a pool-based simulated annealing and word-vector models

Chaofan TU; Ruibin Bai; Zheng LU; Uwe Aickelin; Peiming Ge; Jianshuang Zhao

School of Computer Science

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose a rule-based engine composed of high-quality, interpretable regular expressions for medical text classification. The regular expressions are generated automatically by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods achieve high performance in most Natural Language Processing (NLP) applications, their solutions are regarded by humans as uninterpretable black boxes. Rule-based methods are therefore often introduced when interpretable solutions are needed, especially in the medical field. However, constructing regular expressions by hand can be extremely labor-intensive for large data sets. This research aims to reduce this manual effort while maintaining high-quality solutions.
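
The abstract names the ingredients of the method but not their details. As a rough illustration only, the Python sketch below shows the general shape of learning a regex rule by pool-based simulated annealing, under loudly stated assumptions: a "rule" is a single case-insensitive, whole-word keyword alternation; a text is labelled positive if and only if the rule matches; fitness is F1 on a hypothetical six-sentence toy corpus; and the search is a generic pool-based annealing loop with Metropolis acceptance and geometric cooling. The corpus, vocabulary, function names (`build_regex`, `f1_score`, `neighbour`, `pool_sa`) and all parameters are hypothetical. The paper's constructive heuristic, its actual PSA variant and its word-vector models are not reproduced here.

```python
import math
import random
import re

# HYPOTHETICAL toy corpus of (text, label) pairs, 1 = positive class.
# Illustrative only; not data from the paper.
CORPUS = [
    ("patient reports chest pain and shortness of breath", 1),
    ("severe chest pain radiating to left arm", 1),
    ("shortness of breath on exertion", 1),
    ("routine check-up, no complaints", 0),
    ("mild headache after long flight", 0),
    ("skin rash on forearm", 0),
]

# HYPOTHETICAL vocabulary the search may combine into a rule.
VOCAB = ["chest", "pain", "breath", "headache", "rash", "arm", "routine"]


def build_regex(keywords):
    """Compile a keyword set into one whole-word alternation, or None if empty."""
    if not keywords:
        return None
    alternation = "|".join(re.escape(k) for k in sorted(keywords))
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def f1_score(keywords, corpus):
    """Label a text positive iff the rule matches it; return F1 over the corpus."""
    pattern = build_regex(keywords)
    tp = fp = fn = 0
    for text, label in corpus:
        predicted = pattern is not None and pattern.search(text) is not None
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def neighbour(keywords, rng):
    """Perturb a rule by toggling one randomly chosen vocabulary word in or out."""
    return keywords ^ {rng.choice(VOCAB)}


def pool_sa(corpus, pool_size=4, iterations=500, t0=1.0, cooling=0.99, seed=0):
    """Generic pool-based simulated annealing (an ASSUMED variant, not the paper's).

    Each step picks one pool member, perturbs it, and accepts the move if it
    does not lower F1, or otherwise with probability exp(delta / T).
    The best rule seen so far is tracked separately from the pool.
    """
    rng = random.Random(seed)
    pool = [frozenset(rng.sample(VOCAB, 2)) for _ in range(pool_size)]
    scores = [f1_score(rule, corpus) for rule in pool]
    best_i = max(range(pool_size), key=scores.__getitem__)
    best, best_score = pool[best_i], scores[best_i]
    temperature = t0
    for _ in range(iterations):
        i = rng.randrange(pool_size)
        candidate = neighbour(pool[i], rng)
        cand_score = f1_score(candidate, corpus)
        delta = cand_score - scores[i]  # we maximise F1
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            pool[i], scores[i] = candidate, cand_score
            if cand_score > best_score:
                best, best_score = candidate, cand_score
        temperature *= cooling  # geometric cooling schedule
    return best, best_score


if __name__ == "__main__":
    rule, score = pool_sa(CORPUS)
    print(build_regex(rule).pattern if rule else None, round(score, 3))
```

Running the script prints the best rule's pattern and its F1 on the toy corpus. Because the toy search space has only 2^7 keyword subsets, the result says nothing about how such a search behaves on real medical text; the sketch only makes the "interpretable rule plus stochastic search" idea concrete.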

Original language	English
Title of host publication	Proceedings of 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, 12-15 December 2019, Ningbo, China
Publication status	Published - 2019
Event	9th Multi-disciplinary International Scheduling Conference: Theory and Applications - Ningbo, China
Duration	12 Dec 2019 → 15 Dec 2019
Conference number	9th

Conference

Conference	9th Multi-disciplinary International Scheduling Conference: Theory and Applications
Abbreviated title	MISTA2019
Country/Territory	China
City	Ningbo
Period	12/12/19 → 15/12/19

Cite this

@inproceedings{686a89b1a79945239acf7135ba73939c,
  title = "Learning regular expressions for interpretable medical text classification using a pool-based simulated annealing and word-vector models",
  abstract = "In this paper, we propose a rule-based engine composed of high quality and interpretable regular expressions for medical text classification. The regular expressions are auto generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods present high quality performance in most Natural Language Processing (NLP) applications, the solutions are regarded as uninterpretable black boxes to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual efforts while maintaining high-quality solutions",
  author = "Chaofan TU and Ruibin Bai and Zheng LU and Uwe Aickelin and Peiming Ge and Jianshuang Zhao",
  year = "2019",
  language = "English",
  booktitle = "Proceedings of 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, 12-15 December 2019, Ningbo, China",
  note = "9th Multi-disciplinary International Scheduling Conference: Theory and Applications, MISTA2019; Conference date: 12-12-2019 Through 15-12-2019",
}

TU, C, Bai, R, LU, Z, Aickelin, U, Ge, P & Zhao, J 2019, Learning regular expressions for interpretable medical text classification using a pool-based simulated annealing and word-vector models. in Proceedings of 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, 12-15 December 2019, Ningbo, China. 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, Ningbo, China, 12/12/19.

Learning regular expressions for interpretable medical text classification using a pool-based simulated annealing and word-vector models. / TU, Chaofan; Bai, Ruibin; LU, Zheng et al.
Proceedings of 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, 12-15 December 2019, Ningbo, China. 2019.


TY  - GEN
T1  - Learning regular expressions for interpretable medical text classification using a pool-based simulated annealing and word-vector models
AU  - TU, Chaofan
AU  - Bai, Ruibin
AU  - LU, Zheng
AU  - Aickelin, Uwe
AU  - Ge, Peiming
AU  - Zhao, Jianshuang
N1  - Conference code: 9th
PY  - 2019
Y1  - 2019
N2  - In this paper, we propose a rule-based engine composed of high quality and interpretable regular expressions for medical text classification. The regular expressions are auto generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods present high quality performance in most Natural Language Processing (NLP) applications, the solutions are regarded as uninterpretable black boxes to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual efforts while maintaining high-quality solutions
AB  - In this paper, we propose a rule-based engine composed of high quality and interpretable regular expressions for medical text classification. The regular expressions are auto generated by a constructive heuristic method and optimized using a Pool-based Simulated Annealing (PSA) approach. Although existing Deep Neural Network (DNN) methods present high quality performance in most Natural Language Processing (NLP) applications, the solutions are regarded as uninterpretable black boxes to humans. Therefore, rule-based methods are often introduced when interpretable solutions are needed, especially in the medical field. However, the construction of regular expressions can be extremely labor-intensive for large data sets. This research aims to reduce the manual efforts while maintaining high-quality solutions
M3  - Conference contribution
BT  - Proceedings of 9th Multi-disciplinary International Scheduling Conference: Theory and Applications, 12-15 December 2019, Ningbo, China
T2  - 9th Multi-disciplinary International Scheduling Conference: Theory and Applications
Y2  - 12 December 2019 through 15 December 2019
ER  - 
