Extraction by example: Induction of structural rules for the analysis of molecular sequence data from heterogeneous sources

Olivo Miotto, Tin Wee Tan, Vladimir Brusic

Research output: Journal Publication › Conference article › peer-review

3 Citations (Scopus)

Abstract

Biological research requires information from multiple data sources that use a variety of database-specific formats. Manual gathering of information is time-consuming and error-prone, making automated data aggregation a compelling option for large studies. We describe a method for extracting information from diverse sources that involves structural rules specified by example. We developed a system for aggregation of biological knowledge (ABK) and used it to conduct an epidemiological study of dengue virus (DENV) sequences. Additional information on geographical origin and isolation date is critical for understanding evolutionary relationships, but this data is inconsistently structured in database entries. Using three public databases, we found that structural rules can be used successfully even when applied to inconsistently structured data that is distributed across multiple fields. High reusability, combined with the ability to integrate analysis tools, makes this method suitable for a wide variety of large-scale studies involving viral sequences.
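
The abstract does not describe how ABK represents or induces its structural rules, so the following is only a minimal illustrative sketch of the general idea of "rules specified by example", not the authors' implementation. It assumes records are already parsed into nested dictionaries of strings. The field names (`source`, `note`), the record contents, and the helpers `walk`, `induce_rule`, and `apply_rule` are all hypothetical. A rule is induced from one record in which the user points at the wanted value, and is then reused on other records with the same layout.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Path = Tuple[str, ...]


def walk(node: Any, path: Path = ()) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) for every string leaf in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, str):
        yield path, node


def induce_rule(example: Dict[str, Any], value: str) -> Optional[Dict[str, Any]]:
    """Build a rule from one example record and the value the user marked.

    The rule stores where the value was found (the field path), the text
    that preceded it in that field, and the single character that followed it.
    """
    for path, text in walk(example):
        start = text.find(value)
        if start >= 0:
            end = start + len(value)
            return {
                "path": path,
                "prefix": text[:start],
                "terminator": text[end:end + 1],  # "" if value ends the field
            }
    return None  # the marked value does not occur in the example


def apply_rule(rule: Dict[str, Any], record: Dict[str, Any]) -> Optional[str]:
    """Apply an induced rule to another record; return None if it does not fit."""
    node: Any = record
    for key in rule["path"]:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if not isinstance(node, str) or not node.startswith(rule["prefix"]):
        return None
    rest = node[len(rule["prefix"]):]
    if rule["terminator"]:
        rest = rest.split(rule["terminator"], 1)[0]
    return rest.strip() or None


# Hypothetical records, invented for illustration only.
example = {"source": {"note": "country=Thailand; year=1998"}}
other = {"source": {"note": "country=Viet Nam; year=2001"}}

country_rule = induce_rule(example, "Thailand")
print(apply_rule(country_rule, other))
```

A rule this simple fails as soon as the value moves to a different field or the label text changes, which is the kind of inconsistency the abstract reports in real database entries. A practical system would therefore need several rules per attribute, or a fallback order among them.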

Original language: English
Pages (from-to): 398-405
Number of pages: 8
Journal: Lecture Notes in Computer Science
Volume: 3578
Publication status: Published - 2005
Externally published: Yes
Event: 6th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2005 - Brisbane, Australia
Duration: 6 Jul 2005 to 8 Jul 2005

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
