Supporting the curation of biological databases with reusable text mining.

Olivo Miotto; Tin Wee Tan; Vladimir Brusic

Research output: Journal Publication › Article › peer-review

21 Citations (Scopus)

Abstract

Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.

Original language	English
Pages (from-to)	32-44
Number of pages	13
Journal	Genome informatics. International Conference on Genome Informatics
Volume	16
Issue number	2
Publication status	Published - 2005
Externally published	Yes

ASJC Scopus subject areas

General Medicine

Cite this

@article{dbd426bade264eb58f1825069c7ea408,

title = "Supporting the curation of biological databases with reusable text mining.",

abstract = "Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.",

author = "Miotto, Olivo and Tan, {Tin Wee} and Brusic, Vladimir",

note = "Copyright: This record is sourced from MEDLINE{\textregistered}/PubMed{\textregistered}, a database of the U.S. National Library of Medicine",

year = "2005",

language = "English",

volume = "16",

pages = "32--44",

journal = "Genome informatics. International Conference on Genome Informatics",

issn = "0919-9454",

publisher = "Universal Academy Press",

number = "2",

}

TY - JOUR

T1 - Supporting the curation of biological databases with reusable text mining.

AU - Miotto, Olivo

AU - Tan, Tin Wee

AU - Brusic, Vladimir

N1 - Copyright: This record is sourced from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

PY - 2005

Y1 - 2005

AB - Curators of biological databases transfer knowledge from scientific publications, a laborious and expensive manual process. Machine learning algorithms can reduce the workload of curators by filtering relevant biomedical literature, though their widespread adoption will depend on the availability of intuitive tools that can be configured for a variety of tasks. We propose a new method for supporting curators by means of document categorization, and describe the architecture of a curator-oriented tool implementing this method using techniques that require no computational linguistic or programming expertise. To demonstrate the feasibility of this approach, we prototyped an application of this method to support a real curation task: identifying PubMed abstracts that contain allergen cross-reactivity information. We tested the performance of two different classifier algorithms (CART and ANN), applied to both composite and single-word features, using several feature scoring functions. Both classifiers exceeded our performance targets, the ANN classifier yielding the best results. These results show that the method we propose can deliver the level of performance needed to assist database curation.

UR - http://www.scopus.com/inward/record.url?scp=33748521231&partnerID=8YFLogxK

M3 - Article

C2 - 16901087

AN - SCOPUS:33748521231

SN - 0919-9454

VL - 16

SP - 32

EP - 44

JO - Genome informatics. International Conference on Genome Informatics

JF - Genome informatics. International Conference on Genome Informatics

IS - 2

ER -
