Persistent Identifier
|
doi:10.18419/DARUS-4776 |
Publication Date
|
2025-02-28 |
Title
| Code for Improving Video Caption Accuracy with LLMs |
Subtitle
| Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models |
Alternative Title
| Improving the Quality of Video Captions for the DHH Community Using LLM |
Alternative URL
| https://github.com/monikabhole001/Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM |
Other Identifier
| Software Heritage: swh:1:snp:09f89cf03dccad2d5918dd55a88eba57904c03ed;origin=https://github.com/monikabhole001/Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM |
Author
| Fathallah, Nadeen (ROR: https://ror.org/04vnq7t77; ORCID: https://orcid.org/0000-0001-7921-034X) |
Point of Contact
|
Fathallah, Nadeen (University of Stuttgart) |
Description
| As part of the IKILeUS project at the University of Stuttgart, research was conducted to explore how Large Language Models (LLMs) can enhance the accuracy and contextual relevance of automatic speech recognition (ASR)-generated captions. While ASR tools provide a foundation for accessibility, they often produce grammatical errors, misinterpret homophones, and struggle with domain-specific terminology. To address these challenges, experiments were conducted using LLMs such as GPT-3.5 and Llama2-13B to refine and correct captioning errors. The models were evaluated using standard NLP metrics such as Word Error Rate (WER), BLEU, and ROUGE scores, demonstrating notable improvements in caption accuracy. The findings suggest that LLMs can effectively enhance the readability, coherence, and precision of automatically generated captions, offering a promising direction for improving video accessibility for the Deaf and Hard of Hearing (DHH) community. (2024-02-13) |
Subject
| Computer and Information Science |
Keyword
| Accessibility http://www.wikidata.org/entity/Q555097 (Wikidata) http://www.wikidata.org/
Assistive Technologies https://vocabs.acdh.oeaw.ac.at/oefosdisciplines/211902 (ÖFOS) |
Topic Classification
| Artificial Intelligence and Machine Learning Methods (DFGFO) https://w3id.org/dfgfo/2024/443-04 |
Related Publication
| Is Supplement To: Fathallah, N., Bhole, M., & Staab, S. (2024, November 30). Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models. In Proceedings of the 11th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2024. arXiv 2412.00342 https://arxiv.org/abs/2412.00342 |
Producer
| High Performance Computing Center (HLRS) (University of Stuttgart) |
Funding Information
| German Federal Ministry of Education and Research (BMBF): IKILeUS: 16DHBKI041 |
Distributor
| Fathallah, Nadeen (University of Stuttgart) |
Distribution Date
| 2025-02-13 |
Depositor
| Fathallah, Nadeen |
Deposit Date
| 2025-02-13 |
Time Period
| Start Date: 2022-08-01; End Date: 2024-11-30 |
Date of Collection
| Start Date: 2022-08-01; End Date: 2024-11-30 |
Data Type
| Automatic speech recognition (ASR) transcriptions, large language model (LLM)-corrected subtitle datasets, word error rate (WER) evaluation data, NLP-processed text outputs, captioning quality metrics (BLEU, ROUGE scores). |
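Illustrative note on the WER metric named in the Description and Data Type fields: the evaluation scripts of the deposited code are not reproduced in this record, so the following is only a minimal sketch of how a word error rate is commonly computed, namely as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, the case-sensitive matching, and the example sentences are all assumptions made for illustration and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level edit distance / number of reference words."""
    ref = reference.split()    # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a homophone error in an ASR caption and a corrected caption.
reference = "the weather is nice today"
asr_caption = "the whether is nice today"
corrected_caption = "the weather is nice today"

wer_before = word_error_rate(reference, asr_caption)
wer_after = word_error_rate(reference, corrected_caption)
```

In this made-up example, one of five reference words is substituted in the ASR caption, so a lower value after correction would indicate an improvement. A real evaluation would typically also normalize case and punctuation before scoring.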