{"dcterms:modified":"2023-12-08","dcterms:creator":"DaRUS","@type":"ore:ResourceMap","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/datasets/export?exporter=OAI_ORE&persistentId=https://doi.org/10.18419/darus-624","ore:describes":{"citation:datasetContact":{"citation:datasetContactName":"Pleiss, Jürgen","citation:datasetContactAffiliation":"Universität Stuttgart"},"process:processMethods":[{"processMethodsName":"UCLUST","process:processMethodsPars":"sequence identity threshold","process:processMethodsDescription":"sequence clustering using the cluster_fast command from USEARCH"},{"processMethodsName":"CD-HIT","process:processMethodsPars":"sequence identity threshold, word length","process:processMethodsDescription":"sequence clustering"},{"processMethodsName":"Needleman-Wunsch alignment","process:processMethodsPars":"gap opening penalty, gap extension penalty","process:processMethodsDescription":"pairwise global sequence alignment"}],"citation:keyword":[{"citation:keywordValue":"protein sequence","citation:keywordVocabulary":"EDAM","citation:keywordVocabularyURI":"http://edamontology.org/data_2976"},{"citation:keywordValue":"graph"},{"citation:keywordValue":"network"},{"citation:keywordValue":"amino acid sequence","citation:keywordVocabulary":"NCIT","citation:keywordVocabularyURI":"http://purl.obolibrary.org/obo/NCIT_C13187"},{"citation:keywordValue":"alignment","citation:keywordVocabulary":"EDAM","citation:keywordVocabularyURI":"http://edamontology.org/data_1916"}],"citation:dsDescription":{"citation:dsDescriptionValue":"GraphML files for undirected weighted graphs with nodes that represent protein sequences of expansin homologues. Protein sequences were clustered at a sequence identity threshold to derive representative sequences. Pairwise sequence identity between two sequences was derived from global Needleman-Wunsch alignments. Protein sequence networks were generated with pairwise sequence identities as edge weights; only edges at or above a predefined identity threshold were retained. 
Metadata of the nodes (e.g. annotations) and of the edges (the edge weights) were summarized in GraphML files."},"processSoftware":[{"processSoftwareName":"USEARCH","processSoftwareCitation":"Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461. https://doi.org/10.1093/bioinformatics/btq461","processSoftwareVersion":"11.0.667","processSoftwareURL":"https://www.drive5.com/usearch/"},{"processSoftwareName":"CD-HIT","processSoftwareCitation":"Li, W., & Godzik, A. (2006). Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 22(13):1658-1659. https://doi.org/10.1093/bioinformatics/btl158","processSoftwareVersion":"4.7","processSoftwareURL":"http://weizhongli-lab.org/cd-hit/"},{"processSoftwareName":"EMBOSS","processSoftwareCitation":"Rice, P., Longden, L., & Bleasby, A. (2000). EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. 16(6):276-277. https://doi.org/10.1016/S0168-9525(00)02024-2","processSoftwareVersion":"6.6.0","processSoftwareURL":"http://emboss.sourceforge.net/"},{"processSoftwareName":"NetworkX","processSoftwareCitation":"Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. 7th Python in Science Conference (SciPy 2008). http://conference.scipy.org/proceedings/scipy2008/paper_2/","processSoftwareVersion":"1.9","processSoftwareURL":"https://networkx.github.io/"},{"processSoftwareName":"Pfam","processSoftwareCitation":"Sara El-Gebali, Jaina Mistry, Alex Bateman, Sean R Eddy, Aurélien Luciani, Simon C Potter, Matloob Qureshi, Lorna J Richardson, Gustavo A Salazar, Alfredo Smart, Erik L L Sonnhammer, Layla Hirsh, Lisanna Paladin, Damiano Piovesan, Silvio C E Tosatto, Robert D Finn, The Pfam protein families database in 2019, Nucleic Acids Research, Volume 47, Issue D1, 08 January 2019, Pages D427-D432. 
https://doi.org/10.1093/nar/gky995","processSoftwareVersion":"32.0","processSoftwareURL":"https://pfam.xfam.org/"},{"processSoftwareName":"Carbohydrate-Active enZYmes Database (CAZy)","processSoftwareCitation":"Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, Volume 42, Issue D1, 1 January 2014, Pages D490-D495. https://doi.org/10.1093/nar/gkt1178","processSoftwareURL":"http://www.cazy.org/"}],"process:processMethodsPar":[{"process:processMethodsParName":"sequence identity threshold"},{"process:processMethodsParName":"gap opening penalty","process:processMethodsParValue":"10"},{"process:processMethodsParName":"gap extension penalty","process:processMethodsParValue":"0.5"},{"process:processMethodsParName":"word length (CD-Hit)","process:processMethodsParValue":"5"}],"author":{"citation:authorName":"Lohoff, Caroline","citation:authorAffiliation":"Universität Stuttgart"},"publication":{"publicationCitation":"Lohoff C., Buchholz P. C. F., Le Roes-Hill M. & Pleiss J. (2020). The Expansin Engineering Database: a navigation and classification tool for expansins and homologues. Proteins: Structure, Function, and Bioinformatics 89:2.","publicationIDType":"doi","publicationIDNumber":"10.1002/prot.26001","publicationURL":"https://doi.org/10.1002/prot.26001"},"title":"GraphML files for protein sequence networks of expansin homologues","dateOfDeposit":"2020-01-27","dataSources":["Expansin Engineering Database (https://exed.biocatnet.de/)","Carbohydrate-Active enZYmes Database (http://www.cazy.org/)","Pfam Database (https://pfam.xfam.org/)"],"subject":"Medicine, Health and Life Sciences","citation:depositor":"Buchholz, Patrick C. 
F.","language":"English","citation:notesText":"The GraphML attributes for the edges comprise the edge weights (pairwise sequence identity, \"weight\").\r\nThe GraphML attributes for the nodes comprise the identifiers from the ExED (\"sequence_id\", \"protein_id\", \"hfam_id\", and \"sfam_id\" for sequence, protein, homologous family and superfamily identifiers, respectively), the NCBI taxonomy ID (\"tax_id\"), the annotated (organism) source name (\"tax_name\"), the taxonomic lineage of the source organism (\"lineage\", with taxa separated by \"<--\"), and the length of the amino acid sequence (\"sequence_length\"). In addition, suggested color names are given for both fill color and border color of each node (\"color\" and \"color_border\").","@id":"https://doi.org/10.18419/darus-624","@type":["ore:Aggregation","schema:Dataset"],"schema:version":"1.1","schema:name":"GraphML files for protein sequence networks of expansin homologues","schema:dateModified":"Mon May 03 09:35:21 CEST 2021","schema:datePublished":"2020-01-30","schema:license":"http://creativecommons.org/licenses/by/4.0","dvcore:fileTermsOfAccess":{"dvcore:fileRequestAccess":false},"schema:includedInDataCatalog":"DaRUS","schema:isPartOf":{"schema:name":"Expansin Engineering Database","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/dataverse/ibtb_ExED","schema:description":"Supporting information and original files for bioinformatic investigations using the Expansin Engineering Database (https://exed.biocatnet.de/)","schema:isPartOf":{"schema:name":"Bioinformatics","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/dataverse/ibtb_BI","schema:isPartOf":{"schema:name":"Department of Technical Biochemistry","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/dataverse/ibtb_TB","schema:isPartOf":{"schema:name":"Institute of Biochemistry and Technical 
Biochemistry","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/dataverse/ibtb","schema:isPartOf":{"schema:name":"DaRUS","@id":"https://nfldevdataverse2.rus.uni-stuttgart.de/dataverse/darus","schema:description":"This is the data repository of the University of Stuttgart."}}}}},"ore:aggregates":[{"schema:description":"Protein sequence network for the bacterial, fungal, and plant superfamilies of the Expansin Engineering Database (for sequences of 210 to 300 residues), including members of the CBM63 family (downloaded from the CAZy database on June 3, 2019). The GraphML file contains representative nodes (clustered with CD-HIT at a sequence identity threshold of 0.9) connected by at least 50% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments).","schema:name":"CBM63_Sfams123_210-300_90_50.graphml","dvcore:restricted":false,"schema:version":1,"dvcore:datasetVersionId":301,"@id":"doi:10.18419/darus-624/3","schema:sameAs":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-624/3","@type":"ore:AggregatedResource","schema:fileFormat":"text/xml-graphml","dvcore:filesize":74017190,"dvcore:storageIdentifier":"s3://fokus-dv-prod-1:16ff08ceab1-8d8a4999280c","dvcore:rootDataFileId":-1,"dvcore:checksum":{"@type":"MD5","@value":"51c315706ee9f262b816ee2654172d0e"}},{"schema:description":"Protein sequence network for the bacterial, fungal, plant, and N-terminal domains superfamilies of the Expansin Engineering Database, including members of the GH45 family (from Pfam, version 32.0, accession PF02015).\r\nThe GraphML file contains representative nodes (clustered with CD-HIT at a sequence identity threshold of 0.9) connected by at least 30% pairwise sequence identity (edge weights derived from Needleman-Wunsch 
alignments).","schema:name":"GH45_Sfam_1234_Ndomain_90_30.graphml","dvcore:restricted":false,"schema:version":1,"dvcore:datasetVersionId":301,"@id":"doi:10.18419/darus-624/1","schema:sameAs":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-624/1","@type":"ore:AggregatedResource","schema:fileFormat":"text/xml-graphml","dvcore:filesize":187334563,"dvcore:storageIdentifier":"s3://fokus-dv-prod-1:16ff08c30ae-e7ab9cf8940e","dvcore:rootDataFileId":-1,"dvcore:checksum":{"@type":"MD5","@value":"afaa6dd751225c3dfb4a7702ad979993"}},{"schema:description":"Protein sequence network for N-terminal expansin domains from the bacterial, fungal, plant, and N-terminal domains superfamilies of the Expansin Engineering Database.\r\nThe GraphML file contains representative nodes (clustered with USEARCH/UCLUST at a sequence identity threshold of 0.9) connected by at least 60% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments).","schema:name":"Ndomains_1234_CBM_09_60identity.graphml","dvcore:restricted":false,"schema:version":1,"dvcore:datasetVersionId":301,"@id":"doi:10.18419/darus-624/4","schema:sameAs":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-624/4","@type":"ore:AggregatedResource","schema:fileFormat":"text/xml-graphml","dvcore:filesize":26408406,"dvcore:storageIdentifier":"s3://fokus-dv-prod-1:16ff098cdbe-77c95e3f4d2c","dvcore:rootDataFileId":-1,"dvcore:checksum":{"@type":"MD5","@value":"dd85b921f70e99ae9cbffc2156be55c0"}},{"schema:description":"Protein sequence network for C-terminal expansin domains from the bacterial, fungal, and plant superfamilies of the Expansin Engineering Database. 
The GraphML file contains representative nodes (clustered with USEARCH/UCLUST at a sequence identity threshold of 0.9) connected by at least 60% pairwise sequence identity (edge weights derived from Needleman-Wunsch alignments).","schema:name":"Sfam_123_Cdomain_90_60.graphml","dvcore:restricted":false,"schema:version":1,"dvcore:datasetVersionId":301,"@id":"doi:10.18419/darus-624/2","schema:sameAs":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-624/2","@type":"ore:AggregatedResource","schema:fileFormat":"text/xml-graphml","dvcore:filesize":21810184,"dvcore:storageIdentifier":"s3://fokus-dv-prod-1:16ff08c6f4c-fe2358042882","dvcore:rootDataFileId":-1,"dvcore:checksum":{"@type":"MD5","@value":"1bcf30f501153632979c2e032fe5a497"}},{"schema:description":"Protein sequence network for N-terminal expansin domains from the bacterial, fungal, and plant superfamilies of the Expansin Engineering Database, for sequences of 210 to 300 residues. The GraphML file contains representative nodes (clustered with USEARCH/UCLUST at a sequence identity threshold of 0.9) connected by at least 50% pairwise sequence identity (edge weights derived from Needleman-Wunsch 
alignments).","schema:name":"Sfams_123_210-300_90_50.graphml","dvcore:restricted":false,"schema:version":1,"dvcore:datasetVersionId":301,"@id":"doi:10.18419/darus-624/5","schema:sameAs":"https://nfldevdataverse2.rus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-624/5","@type":"ore:AggregatedResource","schema:fileFormat":"text/xml-graphml","dvcore:filesize":90050475,"dvcore:storageIdentifier":"s3://fokus-dv-prod-1:16ff22df9fb-f0a9b48681e4","dvcore:rootDataFileId":-1,"dvcore:checksum":{"@type":"MD5","@value":"6b7c3ee5702bf65837900503ef335e54"}}],"schema:hasPart":["doi:10.18419/darus-624/3","doi:10.18419/darus-624/1","doi:10.18419/darus-624/4","doi:10.18419/darus-624/2","doi:10.18419/darus-624/5"]},"@context":{"author":"http://purl.org/dc/terms/creator","citation":"https://dataverse.org/schema/citation/","dataSources":"https://www.w3.org/TR/prov-o/#wasDerivedFrom","dateOfDeposit":"http://purl.org/dc/terms/dateSubmitted","dcterms":"http://purl.org/dc/terms/","dvcore":"https://dataverse.org/schema/core#","language":"http://purl.org/dc/terms/language","ore":"http://www.openarchives.org/ore/terms/","process":"https://nfldevdataverse2.rus.uni-stuttgart.de/schema/process#","processMethodsName":"https://schema.org/measurementTechnique","processSoftware":"https://schema.org/SoftwareApplication","processSoftwareCitation":"https://schema.org/citation","processSoftwareName":"https://schema.org/name","processSoftwareURL":"https://schema.org/downloadUrl","processSoftwareVersion":"https://schema.org/version","publication":"http://purl.org/dc/terms/isReferencedBy","publicationCitation":"http://purl.org/dc/terms/bibliographicCitation","publicationIDNumber":"http://purl.org/spar/datacite/ResourceIdentifier","publicationIDType":"http://purl.org/spar/datacite/ResourceIdentifierScheme","publicationURL":"https://schema.org/distribution","schema":"http://schema.org/","subject":"http://purl.org/dc/terms/subject","title":"http://purl.org/dc/terms/title"}}