Regular expression based medical text classification using constructive heuristic approach

Menglin Cui, Ruibin Bai, Zheng Lu, Xiang Li, Uwe Aickelin, Peiming Ge

Research output: Journal Publication › Article › peer-review

25 Citations (Scopus)


Medical text classification assigns medical-related text to categories such as topics or disease types. Machine-learning-based techniques have been widely used for such tasks despite the obvious drawback of the 'black box' approach: it leaves no easy way to fine-tune the resulting model for better performance. We propose a novel constructive heuristic approach to generate a set of regular expressions that can be used as effective text classifiers. The main innovation is a regular-expression-based text classifier that offers both satisfactory classification performance and excellent interpretability. We evaluate our framework on real-world medical data provided by our collaborator, one of the largest online healthcare providers in the market, and observe high performance and consistency. Experimental results show that the machine-generated regular expressions can be used effectively in conjunction with machine learning techniques to perform medical text classification tasks. The proposed methodology improves the performance of the baseline methods (Naive Bayes and Support Vector Machines) by 9% in precision and 4.5% in recall. We also evaluate the performance of regular expressions modified by human experts and demonstrate the method's potential for practical applications.
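To make the idea of a regular expression acting as an interpretable text classifier concrete, here is a minimal Python sketch. It is not the authors' method: the category names, the patterns in `RULES`, and the `fallback` callable are all hypothetical, and the patterns are hand-written here, whereas the article generates its expressions with a constructive heuristic. The ordering (regex rules first, then a fallback standing in for a machine learning model such as Naive Bayes or an SVM) is an assumption, since the abstract does not say how the two are combined.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical, hand-written rules for illustration only; the article's
# expressions are machine-generated and are not reproduced here.
RULES: Dict[str, List[re.Pattern]] = {
    "cardiology": [re.compile(r"chest pain|palpitation", re.IGNORECASE)],
    "dermatology": [re.compile(r"\b(rash|itch(y|ing)?)\b", re.IGNORECASE)],
}


def regex_classify(text: str) -> Optional[str]:
    """Return the first category with a matching expression, or None."""
    for category, patterns in RULES.items():
        if any(p.search(text) for p in patterns):
            return category
    return None


def classify(text: str, fallback: Callable[[str], str]) -> str:
    """Try the regex rules first; defer to a fallback model if none match.

    The fallback is an assumed stand-in for a trained classifier.
    """
    label = regex_classify(text)
    return label if label is not None else fallback(text)
```

Because each rule is a readable pattern, a domain expert could inspect or edit an entry in `RULES` directly, which is the kind of interpretability the abstract contrasts with 'black box' models.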

Original language: English
Article number: 8864974
Pages (from-to): 147892-147904
Number of pages: 13
Journal: IEEE Access
Publication status: Published - 2019


Keywords

  • Regular expressions
  • constructive heuristic method
  • text classification

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)

