TY - JOUR
T1 - LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis
AU - Husi, Holger
AU - Skipworth, Richard J.
AU - Fearon, Kenneth C.H.
AU - Ross, James A.
PY - 2013/6/2
Y1 - 2013/6/2
N2 - Many sequence analyses and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools available, and a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a deprecated format has been lacking. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a deprecated string thereby listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org. Biological significance: Large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools presently available. This work introduces a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a deprecated format. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping which can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is the ability to display the alignment output as a deprecated string listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.
AB - Many sequence analyses and multiple sequence alignment tools are widely used in biological research and are well described. However, large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools available, and a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a deprecated format has been lacking. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping, and can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is its ability to display the alignment output as a deprecated string thereby listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org. Biological significance: Large-scale proteome-wide analysis to identify potential splice-variants, describe the sequence differences compared to a progenitor sequence and cluster those sequences into individual groups for further analysis is a difficult task with the tools presently available. This work introduces a desktop-based, stand-alone search engine with the capabilities to align and cluster thousands of sequences and present the output in a deprecated format. We have developed a novel software named LSCluster (Large-Scale CLUSTERing) which allows users to group tens of thousands of sequences based on sequence alignments or partial identity mapping which can be used specifically for the detection of splicing variants and other pairs of sequences sharing identical fragments. One of the unique features of LSCluster is the ability to display the alignment output as a deprecated string listing only differences in aligned sequences. The software (current version 2.0) is freely available through the PADB (Proteomic Analysis DataBase) initiative at www.PADB.org.
KW - Sequence alignment
KW - Sequence clustering
UR - http://www.scopus.com/inward/record.url?scp=84877835534&partnerID=8YFLogxK
U2 - 10.1016/j.jprot.2013.04.006
DO - 10.1016/j.jprot.2013.04.006
M3 - Article
C2 - 23587666
AN - SCOPUS:84877835534
SN - 1874-3919
VL - 84
SP - 185
EP - 189
JO - Journal of Proteomics
JF - Journal of Proteomics
ER -