The use of weighted graphs for large-scale genome analysis

Fang Zhou; Hannu Toivonen; Ross D. King

doi:10.1371/journal.pone.0089618

School of Computer Science

Research output: Journal Publication › Article › peer-review

Abstract

There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.
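The abstract contrasts pair-wise network comparison with a single weighted graph that summarizes many genomes at once. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each genome's metabolic network is given as a set of undirected enzyme-enzyme edges, and it uses the number of genomes containing an edge as the weight. The function name `aggregate_networks`, the toy genome names, and this particular weighting are illustrative assumptions; the paper's taxonomic, isoenzymatic, and sequence-similarity weights are not defined in this record.

```python
from collections import Counter


def aggregate_networks(networks):
    """Merge per-genome edge sets into one weighted graph.

    `networks` maps a genome name to a set of undirected edges, where each
    edge is a frozenset of two enzyme identifiers. The weight of an edge is
    the number of genomes whose network contains it. This weighting is an
    assumption made for illustration only.
    """
    weights = Counter()
    for edges in networks.values():
        # Each genome holds a set, so it adds at most 1 to any edge weight.
        weights.update(edges)
    return weights


# Hypothetical toy data: three genomes, two distinct edges.
networks = {
    "genome_A": {
        frozenset({"EC 1.1.1.1", "EC 2.7.1.1"}),
        frozenset({"EC 2.7.1.1", "EC 4.1.2.13"}),
    },
    "genome_B": {
        frozenset({"EC 1.1.1.1", "EC 2.7.1.1"}),
    },
    "genome_C": {
        frozenset({"EC 1.1.1.1", "EC 2.7.1.1"}),
        frozenset({"EC 2.7.1.1", "EC 4.1.2.13"}),
    },
}

graph = aggregate_networks(networks)
```

Building the graph takes one pass over the genomes, so the cost grows linearly with the number of genomes rather than quadratically as in an all-pairs comparison.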

Original language	English
Pages (from-to)	e89618/1-e89618/12
Journal	PLoS ONE
Volume	9
Issue number	3
DOIs	https://doi.org/10.1371/journal.pone.0089618
Publication status	Published - 11 Mar 2014

Cite this

@article{a5811183dfff4b3aa5541f0bbe20effe,

title = "The use of weighted graphs for large-scale genome analysis",

abstract = "There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.",

author = "Fang Zhou and Hannu Toivonen and King, {Ross D.}",

year = "2014",

month = mar,

day = "11",

doi = "10.1371/journal.pone.0089618",

language = "English",

volume = "9",

pages = "e89618/1--e89618/12",

journal = "PLoS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "3",

}

TY - JOUR

T1 - The use of weighted graphs for large-scale genome analysis

AU - Zhou, Fang

AU - Toivonen, Hannu

AU - King, Ross D.

PY - 2014/3/11

Y1 - 2014/3/11

N2 - There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.

U2 - 10.1371/journal.pone.0089618

DO - 10.1371/journal.pone.0089618

M3 - Article

SN - 1932-6203

VL - 9

SP - e89618/1-e89618/12

JO - PLoS ONE

JF - PLoS ONE

IS - 3

ER -
