Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Sequence cross-references and taxonomic lineage for glycoside hydrolase family 19 |
Identification Number: |
doi:10.18419/darus-1163 |
Distributor: |
DaRUS |
Date of Distribution: |
2021-05-20 |
Version: |
1 |
Bibliographic Citation: |
Buchholz, Patrick C. F., 2021, "Sequence cross-references and taxonomic lineage for glycoside hydrolase family 19", https://doi.org/10.18419/darus-1163, DaRUS, V1, UNF:6:zi8TRxkq1C/pCN14pXTA0Q== [fileUNF] |
Citation |
|
Title: |
Sequence cross-references and taxonomic lineage for glycoside hydrolase family 19 |
Identification Number: |
doi:10.18419/darus-1163 |
Authoring Entity: |
Buchholz, Patrick C. F. (Universität Stuttgart) |
Distributor: |
DaRUS |
Access Authority: |
Pleiss, Jürgen |
Depositor: |
Buchholz, Patrick C. F. |
Date of Deposit: |
2020-11-30 |
Holdings Information: |
https://doi.org/10.18419/darus-1163 |
Study Scope |
|
Keywords: |
Medicine, Health and Life Sciences, protein sequence, protein structure, taxonomy, lineage, source organism, amino acid sequence |
Abstract: |
The Glycoside Hydrolase 19 Engineering Database (GH19ED) contains information on protein sequences and structures of glycoside hydrolases from family 19. This dataset lists cross-references to the National Center for Biotechnology Information (NCBI), cross-references to the Protein Data Bank (PDB) and the taxonomic lineage for each sequence entry in the GH19ED. |
Notes: |
The tab-separated tabular file comprises nine columns: (1) the sequence identifier from the GH19ED, integer (Sequence_id), (2) the protein sequence accessions from the NCBI, semicolon-separated (NCBI_accessions), (3) the PDB accessions, semicolon-separated (PDB_accessions), (4) the name of the source or source organism (Source_name), (5) the NCBI taxonomy identifier for the source (NCBI_taxonomy_id), (6) the taxonomic lineage from the lowest to the highest rank, as inferred from NCBI taxonomy (Lineage), (7) the "protein" identifier from the GH19ED, integer (Protein_id), (8) the "homologous family" (or group) identifier from the GH19ED, integer (Homologous_family_id), (9) the "superfamily" (or subfamily) identifier from the GH19ED, integer (Superfamily_id). For sequence entries assigned to more than one source organism name, only the first taxonomic lineage found in the GH19ED is listed. |
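The nine-column layout described in the notes can be read with a few lines of Python. The following is only a minimal sketch under stated assumptions: it assumes the file starts with a header row carrying the column names given in parentheses above, and the `Gh19Record` class, the helper names, and the sample row are hypothetical placeholders, not values taken from the dataset. The lineage is kept as a raw string because the notes do not state how its ranks are delimited.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Column names as listed in the dataset notes (assumed to be the header row).
COLUMNS = [
    "Sequence_id", "NCBI_accessions", "PDB_accessions", "Source_name",
    "NCBI_taxonomy_id", "Lineage", "Protein_id",
    "Homologous_family_id", "Superfamily_id",
]


@dataclass
class Gh19Record:
    """One row of the table (hypothetical container for illustration)."""
    sequence_id: int
    ncbi_accessions: List[str]
    pdb_accessions: List[str]
    source_name: str
    ncbi_taxonomy_id: int
    lineage: str  # kept verbatim; the rank delimiter is not documented
    protein_id: int
    homologous_family_id: int
    superfamily_id: int


def split_accessions(field: str) -> List[str]:
    """Split a semicolon-separated field, dropping empty entries."""
    return [part.strip() for part in field.split(";") if part.strip()]


def read_records(handle) -> List[Gh19Record]:
    """Parse a tab-separated text stream into typed records."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        Gh19Record(
            sequence_id=int(row["Sequence_id"]),
            ncbi_accessions=split_accessions(row["NCBI_accessions"]),
            pdb_accessions=split_accessions(row["PDB_accessions"]),
            source_name=row["Source_name"],
            ncbi_taxonomy_id=int(row["NCBI_taxonomy_id"]),
            lineage=row["Lineage"],
            protein_id=int(row["Protein_id"]),
            homologous_family_id=int(row["Homologous_family_id"]),
            superfamily_id=int(row["Superfamily_id"]),
        )
        for row in reader
    ]


# Invented sample row, for illustration only (not real GH19ED content).
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    + "\t".join([
        "1", "ACC0001.1;ACC0002.1", "", "Example organism",
        "12345", "Example organism; Example genus", "1", "2", "1",
    ]) + "\n"
)

records = read_records(io.StringIO(SAMPLE))
```

To read the actual file, the same function could be called on `open("GH19ED.tab", newline="", encoding="utf-8")`, assuming the download is UTF-8 encoded.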
Methodology and Processing |
|
Sources Statement |
|
Data Sources: |
https://gh19ed.biocatnet.de/ |
https://www.ncbi.nlm.nih.gov/protein |
https://www.rcsb.org/ |
https://www.ncbi.nlm.nih.gov/taxonomy |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
The GH19 Engineering Database: an extended classification system for exploring the properties of sequence space and protein evolution |
Bibliographic Citation: |
Orlando M., Buchholz P. C. F., Lotti M. & Pleiss J. (2020). The GH19 Engineering Database: an extended classification system for exploring the properties of sequence space and protein evolution. (submitted) |
File Description--f33717 |
|
File: GH19ED.tab |
|
|
|
Notes: |
UNF:6:zi8TRxkq1C/pCN14pXTA0Q== |
List of Variables: |
|
Variables |
|
Sequence_id f33717 Location:
Summary Statistics: Valid 22461.0; Min. 1.0; Max. 23858.0; Mean 11820.976759716817; StDev 6835.113144441923; Variable Format: numeric; Notes: UNF:6:tr2WNacahxpTOY90CWiKWA==
NCBI_accessions f33717 Location:
Variable Format: character; Notes: UNF:6:sw7YrE4Qu4aalqUUXlKShQ==
PDB_accessions f33717 Location:
Variable Format: character; Notes: UNF:6:8paabhcg0KMzrnxHqu4s2Q==
Source_name f33717 Location:
Variable Format: character; Notes: UNF:6:NLBXar37N1eA9yj+BHlpmw==
NCBI_taxonomy_id f33717 Location:
Summary Statistics: Valid 22461.0; Min. 7.0; Max. 2563569.0; Mean 522380.47415521316; StDev 761131.9421858811; Variable Format: numeric; Notes: UNF:6:Q3rSJ34TQIMAew2tbzorZg==
Lineage f33717 Location:
Variable Format: character; Notes: UNF:6:iIpo3pjBbeWpoM2yG8pCUA==
Protein_id f33717 Location:
Summary Statistics: Valid 22461.0; Min. 1.0; Max. 23856.0; Mean 10857.60647344091; StDev 7332.244548359417; Variable Format: numeric; Notes: UNF:6:dAFBJ8MUPKrA+JwUeitddA==
Homologous_family_id f33717 Location:
Summary Statistics: Valid 22461.0; Min. 2.0; Max. 55.0; Mean 20.40844129825125; StDev 16.810820731203552; Variable Format: numeric; Notes: UNF:6:cthdMl/FqSE+EfUrrftkRA==
Superfamily_id f33717 Location:
Summary Statistics: Valid 22461.0; Min. 1.0; Max. 3.0; Mean 1.7500556520191046; StDev 0.6702829784262895; Variable Format: numeric; Notes: UNF:6:OG748kLSgJ+BDkFY2y3vCg==
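The summary statistics listed for the numeric variables (Valid, Min., Max., Mean, StDev) could be recomputed from a downloaded copy along the following lines. This is a sketch, not the repository's own procedure: the `summarize` helper is hypothetical, and it assumes StDev is the sample standard deviation (n - 1 denominator), which the record does not state.

```python
import statistics
from typing import Dict, Iterable, Optional


def summarize(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Return Valid/Min/Max/Mean/StDev for the non-missing values.

    Assumption: StDev is the sample standard deviation; the codebook
    does not say which formula was used.
    """
    valid = [float(v) for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values to summarize")
    return {
        "Valid": float(len(valid)),
        "Min": min(valid),
        "Max": max(valid),
        "Mean": statistics.mean(valid),
        # stdev needs at least two data points; report 0.0 otherwise.
        "StDev": statistics.stdev(valid) if len(valid) > 1 else 0.0,
    }


# Toy input for illustration only (not taken from the dataset).
summary = summarize([1, 2, None, 3, 4])
```

Applied to a full numeric column of the table, the result can be compared against the values in the variable list; a mismatch in StDev alone would suggest the other (population) formula was used.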