Persistent Identifier
|
doi:10.18419/DARUS-3898 |
Publication Date
|
2024-02-13 |
Title
| CNVVE Dataset clean audio samples |
Alternative URL
| https://doi.org/10.6084/m9.figshare.23301608.v1 |
Author
| Hedeshy, Ramin (Universität Stuttgart), ORCID: 0000-0001-5854-4033
Menges, Raphael (Semanux), ORCID: 0000-0002-2112-7065
Staab, Steffen (Universität Stuttgart), ORCID: 0000-0002-0780-4154 |
Point of Contact
|
Hedeshy, Ramin (Universität Stuttgart)
Analytical Computing (Universität Stuttgart) |
Description
| The CNVVE Dataset contains clean audio samples covering six distinct classes of voice expressions: “Uh-huh” or “mm-hmm”, “Uh-uh” or “mm-mm”, “Hush” or “Shh”, “Psst”, “Ahem”, and continuous humming, e.g., “hmmm”. The audio samples of each class are stored in the corresponding folder (a loading sketch follows this description).
These audio samples have undergone a thorough cleaning process; the raw samples are published at https://doi.org/10.18419/darus-3897. First, the Google WebRTC voice activity detection (VAD) algorithm was applied to the audio files to remove noise and silence from the collected voice signals, with the intensity set to 2 on a scale from 1 to 3 (a minimal reproduction sketch follows this description). Because of variations in the data, some files required additional manual cleaning: outliers characterized by sharp click sounds, such as those occurring at the end of recordings, were removed by hand.
The samples were recorded through a dedicated data-collection website that communicated the purpose and type of voice data by providing participants with example recordings as well as each expression’s written equivalent, e.g., “Uh-huh”. Audio recordings were automatically saved in the .wav format and kept anonymous, with a sampling rate of 48 kHz and a bit depth of 32 bits.
For more information, please refer to the related publication or contact the authors with any inquiries. |
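
The following is a minimal sketch of the silence/noise trimming step described above, assuming the py-webrtcvad Python binding of the Google WebRTC VAD together with NumPy and the soundfile library. The file paths, the 30 ms frame length, and the conversion to 16-bit PCM (which the VAD requires) are illustrative assumptions, not the authors' exact pipeline.

    # Sketch of trimming non-speech frames with WebRTC VAD (assumptions noted above).
    import numpy as np
    import soundfile as sf
    import webrtcvad

    FRAME_MS = 30          # WebRTC VAD accepts 10, 20, or 30 ms frames
    SAMPLE_RATE = 48000    # dataset sampling rate; also a rate the VAD supports

    def trim_non_speech(in_path, out_path, mode=2):
        """Keep only frames that the VAD classifies as speech (mode 2, as reported above)."""
        audio, sr = sf.read(in_path, dtype="float32")
        assert sr == SAMPLE_RATE, f"expected 48 kHz, got {sr}"
        if audio.ndim > 1:                      # down-mix to mono if needed
            audio = audio.mean(axis=1)
        # The VAD expects 16-bit mono PCM, so the 32-bit recordings are converted here.
        pcm16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

        vad = webrtcvad.Vad(mode)
        frame_len = SAMPLE_RATE * FRAME_MS // 1000
        voiced = []
        for start in range(0, len(pcm16) - frame_len + 1, frame_len):
            frame = pcm16[start:start + frame_len]
            if vad.is_speech(frame.tobytes(), SAMPLE_RATE):
                voiced.append(frame)

        if voiced:
            sf.write(out_path, np.concatenate(voiced), SAMPLE_RATE, subtype="PCM_16")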
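
Likewise, a minimal sketch for iterating over the cleaned dataset, assuming one subfolder per voice-expression class as described above; the local folder name is a placeholder, not part of the published dataset.

    # Sketch of walking the per-class folders and checking the reported sampling rate.
    import pathlib
    import soundfile as sf

    dataset_root = pathlib.Path("cnvve_clean")   # hypothetical local copy of this dataset

    for class_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        for wav_path in sorted(class_dir.glob("*.wav")):
            audio, sr = sf.read(wav_path)
            # Recordings were saved at 48 kHz; verify before feeding a model.
            assert sr == 48000, f"unexpected sampling rate {sr} in {wav_path}"
            print(class_dir.name, wav_path.name, f"{len(audio) / sr:.2f} s")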
Subject
| Computer and Information Science |
Keyword
| Human-Computer Interaction http://www.wikidata.org/entity/Q207434 (Wikidata)
Speech Impairment http://www.wikidata.org/entity/Q1282114 (Wikidata)
Dysarthric Speech http://www.wikidata.org/entity/Q50508701 (Wikidata)
Data Augmentation http://www.wikidata.org/entity/Q85014143 (Wikidata) |
Topic Classification
| Human-Computer Interaction (Wikidata) http://www.wikidata.org/entity/Q207434 |
Related Publication
| Hedeshy, R., Menges, R., and Staab, S. (2023). CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions. Proc. Interspeech 2023, Dublin, Ireland, August 20-24, 2023. doi: 10.21437/Interspeech.2023-201, https://doi.org/10.21437/Interspeech.2023-201 |
Language
| English |
Production Date
| 2023-06-06 |
Funding Information
| BMWK/ESF: 03EFRBW231
BMBF: 16DHBKI041 |
Depositor
| Bhattacharya, Mrityunjoy |
Deposit Date
| 2024-01-29 |
Related Dataset
| Hedeshy, Ramin; Menges, Raphael; Staab, Steffen, 2024, "Code for Training and Testing CNVVE", https://doi.org/10.18419/darus-3896, DaRUS, V1.
Hedeshy, Ramin; Menges, Raphael; Staab, Steffen, 2024, "Raw audio samples of the CNVVE dataset", https://doi.org/10.18419/darus-3897, DaRUS, V1. |
Data Source
| Hedeshy, Ramin; Menges, Raphael; Staab, Steffen, 2024, "Raw audio samples of the CNVVE dataset", https://doi.org/10.18419/darus-3897, DaRUS, V1. |