LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis

Holger Husi; Richard J. Skipworth; Kenneth C.H. Fearon; James A. Ross

doi:10.1016/j.jprot.2013.04.006

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)

Abstract

Many sequence analysis and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools available, and a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format has been lacking. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a compressed string, thereby listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.

Biological significance: Large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools presently available. This work introduces a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and which can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is the ability to display the alignment output as a compressed string listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.

Original language	English
Pages (from-to)	185-189
Number of pages	5
Journal	Journal of Proteomics
Volume	84
DOIs	https://doi.org/10.1016/j.jprot.2013.04.006
Publication status	Published - 2 Jun 2013
Externally published	Yes

Keywords

Sequence alignment
Sequence clustering

ASJC Scopus subject areas

Biophysics
Biochemistry

Access to Document

10.1016/j.jprot.2013.04.006

Cite this

@article{9adb56aafcdd4191999bf93182430562,

title = "LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis",

abstract = "Many sequence analysis and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools available, and a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format has been lacking. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a compressed string, thereby listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org. Biological significance: Large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools presently available. This work introduces a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and which can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is the ability to display the alignment output as a compressed string listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.",

keywords = "Sequence alignment, Sequence clustering",

author = "Husi, Holger and Skipworth, {Richard J.} and Fearon, {Kenneth C.H.} and Ross, {James A.}",

year = "2013",

month = jun,

day = "2",

doi = "10.1016/j.jprot.2013.04.006",

language = "English",

volume = "84",

pages = "185--189",

journal = "Journal of Proteomics",

issn = "1874-3919",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis

AU - Husi, Holger

AU - Skipworth, Richard J.

AU - Fearon, Kenneth C.H.

AU - Ross, James A.

PY - 2013/6/2

Y1 - 2013/6/2

N2 - Many sequence analysis and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools available, and a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format has been lacking. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a compressed string, thereby listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org. Biological significance: Large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools presently available. This work introduces a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a compressed format. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and which can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is the ability to display the alignment output as a compressed string listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.

KW - Sequence alignment

KW - Sequence clustering

UR - http://www.scopus.com/inward/record.url?scp=84877835534&partnerID=8YFLogxK

U2 - 10.1016/j.jprot.2013.04.006

DO - 10.1016/j.jprot.2013.04.006

M3 - Article

C2 - 23587666

AN - SCOPUS:84877835534

SN - 1874-3919

VL - 84

SP - 185

EP - 189

JO - Journal of Proteomics

JF - Journal of Proteomics

ER -
