Part 1: Document Description
|
Citation |
|
---|---|
Title: |
GraphML files for protein sequence networks of expansin homologues |
Identification Number: |
doi:10.18419/darus-624 |
Distributor: |
DaRUS |
Date of Distribution: |
2020-01-30 |
Version: |
1 |
Bibliographic Citation: |
Lohoff, Caroline, 2020, "GraphML files for protein sequence networks of expansin homologues", https://doi.org/10.18419/darus-624, DaRUS, V1 |
Citation |
|
Title: |
GraphML files for protein sequence networks of expansin homologues |
Identification Number: |
doi:10.18419/darus-624 |
Authoring Entity: |
Lohoff, Caroline (Universität Stuttgart) |
Distributor: |
DaRUS |
Access Authority: |
Pleiss, Jürgen |
Depositor: |
Buchholz, Patrick C. F. |
Date of Deposit: |
2020-01-27 |
Holdings Information: |
https://doi.org/10.18419/darus-624 |
Study Scope |
|
Keywords: |
Medicine, Health and Life Sciences, protein sequence, graph, network, amino acid sequence, alignment |
Abstract: |
GraphML files for undirected weighted graphs with nodes that represent protein sequences of expansin homologues. Protein sequences were clustered by a threshold of sequence identity to derive representative sequences. Pairwise sequence identity between two sequences was derived from global Needleman-Wunsch alignment. Protein sequence networks were generated with edge weights of pairwise sequence identity, filtered by a predefined threshold. Metadata of the nodes (e.g. annotations) and of the edges (the edge weights) were summarized in GraphML files. |
Notes: |
The GraphML attributes for the edges comprise the edge weights (pairwise sequence identity, "weight"). The GraphML attributes for the nodes comprise the identifiers from the ExED ("sequence_id", "protein_id", "hfam_id", and "sfam_id" for sequence, protein, homologous family and superfamily identifiers, respectively), the NCBI taxonomy ID ("tax_id"), the annotated (organism) source name ("tax_name"), the taxonomic lineage of the source organism ("lineage", with taxa separated by "<--"), and the length of the amino acid sequence ("sequence_length"). In addition, suggested color names are given for both fill color and border color of each node ("color" and "color_border"). |
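The attribute layout described in the notes can be read with a generic GraphML parser. The following is a minimal sketch using only the Python standard library. The embedded sample document, its key ids ("d0" to "d3"), the node ids, and all values are hypothetical and are not taken from the deposited files; only the attribute names "tax_name", "lineage", "sequence_length", and "weight" and the "<--" separator come from the notes. Whether the weights are stored as fractions or percentages is not stated in this record, so the value 0.62 is an assumption.

```python
import xml.etree.ElementTree as ET

# Default GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny made-up GraphML document (hypothetical ids and values).
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="tax_name" attr.type="string"/>
  <key id="d1" for="node" attr.name="lineage" attr.type="string"/>
  <key id="d2" for="node" attr.name="sequence_length" attr.type="int"/>
  <key id="d3" for="edge" attr.name="weight" attr.type="double"/>
  <graph edgedefault="undirected">
    <node id="n0">
      <data key="d0">Organism A</data>
      <data key="d1">Taxon A3&lt;--Taxon A2&lt;--Taxon A1</data>
      <data key="d2">220</data>
    </node>
    <node id="n1">
      <data key="d0">Organism B</data>
      <data key="d1">Taxon B2&lt;--Taxon B1</data>
      <data key="d2">245</data>
    </node>
    <edge source="n0" target="n1"><data key="d3">0.62</data></edge>
  </graph>
</graphml>"""

# Converters for the numeric GraphML attribute types; others stay strings.
CASTS = {"int": int, "long": int, "float": float, "double": float}


def read_graphml(text):
    """Return (nodes, edges) with attributes keyed by their attr.name."""
    root = ET.fromstring(text)
    # Map key ids such as "d0" to (attribute name, attribute type).
    keys = {
        k.get("id"): (k.get("attr.name"), k.get("attr.type"))
        for k in root.findall("g:key", NS)
    }

    def attrs(elem):
        out = {}
        for d in elem.findall("g:data", NS):
            name, typ = keys[d.get("key")]
            out[name] = None if d.text is None else CASTS.get(typ, str)(d.text)
        return out

    graph = root.find("g:graph", NS)
    nodes = {n.get("id"): attrs(n) for n in graph.findall("g:node", NS)}
    edges = [
        (e.get("source"), e.get("target"), attrs(e))
        for e in graph.findall("g:edge", NS)
    ]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
# Split the lineage string into individual taxa at the "<--" separator.
lineage_n0 = nodes["n0"]["lineage"].split("<--")
```

For a deposited file, the same function could be applied to the file contents read as text. Graph libraries that ship a GraphML reader are likely the more convenient choice for actual network analysis; the sketch only illustrates how the node and edge attributes map onto the GraphML key/data elements.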
Methodology and Processing |
|
Sources Statement |
|
Data Sources: |
Expansin Engineering Database (https://exed.biocatnet.de/)
Carbohydrate-Active enZYmes Database (http://www.cazy.org/)
Pfam Database (https://pfam.xfam.org/)
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Lohoff C., Buchholz P. C. F., Le Roes-Hill M. & Pleiss J. (2020). The Expansin Engineering Database: a navigation and classification tool for expansins and homologues. Proteins: Structure, Function, and Bioinformatics 89:2. |
Identification Number: |
10.1002/prot.26001 |
Bibliographic Citation: |
Lohoff C., Buchholz P. C. F., Le Roes-Hill M. & Pleiss J. (2020). The Expansin Engineering Database: a navigation and classification tool for expansins and homologues. Proteins: Structure, Function, and Bioinformatics 89:2. |
Label: |
CBM63_Sfams123_210-300_90_50.graphml |
Text: |
Protein sequence network for the bacterial, fungal, and plant superfamily from the Expansin Engineering Database (for sequences with length between 210 and 300 residues) including members of the CBM63 family (downloaded from the CAZy database on June 3, 2019). The GraphML file contains representative nodes (clustered by 0.9 in CD-Hit) connected by at least 50% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments). |
Notes: |
text/xml-graphml |
Label: |
GH45_Sfam_1234_Ndomain_90_30.graphml |
Text: |
Protein sequence network for the bacterial, fungal, plant and N-terminal domains superfamily from the Expansin Engineering Database including members of the GH45 family (from Pfam, version 32.0, accession PF02015). The GraphML file contains representative nodes (clustered by 0.9 in CD-Hit) connected by at least 30% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments). |
Notes: |
text/xml-graphml |
Label: |
Ndomains_1234_CBM_09_60identity.graphml |
Text: |
Protein sequence network for N-terminal expansin domains from the bacterial, fungal, plant and N-terminal domains superfamily from the Expansin Engineering Database. The GraphML file contains representative nodes (clustered by 0.9 in USEARCH/UCLUST) connected by at least 60% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments). |
Notes: |
text/xml-graphml |
Label: |
Sfams_123_210-300_90_50.graphml |
Text: |
Protein sequence network for N-terminal expansin domains from the bacterial, fungal, and plant superfamily from the Expansin Engineering Database, for sequences with length between 210 and 300 residues. The GraphML file contains representative nodes (clustered by 0.9 in USEARCH/UCLUST) connected by at least 50% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments). |
Notes: |
text/xml-graphml |
Label: |
Sfam_123_Cdomain_90_60.graphml |
Text: |
Protein sequence network for C-terminal expansin domains from the bacterial, fungal, and plant superfamily from the Expansin Engineering Database. The GraphML file contains representative nodes (clustered by 0.9 in USEARCH/UCLUST) connected by at least 60% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments). |
Notes: |
text/xml-graphml |
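The file descriptions above state that nodes are connected when their pairwise sequence identity, derived from Needleman-Wunsch alignments, reaches a threshold. The following sketch illustrates that idea in plain Python. It is not the pipeline used to produce the files: the scoring scheme (match 1, mismatch -1, linear gap -1), the definition of identity as identical aligned positions divided by alignment length, and the toy sequences are all assumptions made for illustration, since this record does not specify the alignment parameters or the identity denominator.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return one optimal global alignment of a and b as two gapped strings.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(a, b):
    """Fraction of alignment columns with identical residues (assumed definition)."""
    al_a, al_b = needleman_wunsch(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return matches / len(al_a)


def build_edges(seqs, threshold):
    """Return (id_a, id_b, weight) for every pair at or above the threshold."""
    edges = []
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        w = identity(a, b)
        if w >= threshold:
            edges.append((id_a, id_b, w))
    return edges


# Hypothetical toy sequences; real inputs would be the representative sequences.
toy = {"s1": "ACDEFG", "s2": "ACDEFG", "s3": "WWWWWW"}
toy_edges = build_edges(toy, threshold=0.5)
```

With these toy inputs only the pair of identical sequences passes the 0.5 threshold. The quadratic-time pure Python alignment is meant to make the filtering step concrete, not to scale to full sequence sets.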