Code for Improving Video Caption Accuracy with LLMs

Version 1.0

Fathallah, Nadeen, 2025, "Code for Improving Video Caption Accuracy with LLMs", https://doi.org/10.18419/DARUS-4776, DaRUS, V1

Subtitle	Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models
Description	As part of the IKILeUS project at the University of Stuttgart, research was conducted to explore how Large Language Models (LLMs) can enhance the accuracy and contextual relevance of automatic speech recognition (ASR)-generated captions. While ASR tools provide a foundation for accessibility, they often produce grammatical errors, misinterpret homophones, and struggle with domain-specific terminology. To address these challenges, experiments were conducted using LLMs such as GPT-3.5 and Llama2-13B to refine and correct captioning errors. The models were evaluated using standard NLP metrics such as Word Error Rate (WER), BLEU, and ROUGE scores, demonstrating notable improvements in caption accuracy. The findings suggest that LLMs can effectively enhance the readability, coherence, and precision of automatically generated captions, offering a promising direction for improving video accessibility for the Deaf and Hard of Hearing (DHH) community. (2024-02-13)
Subject	Computer and Information Science
Keyword	Accessibility, Assistive Technologies
Related Publication	Is Supplement To: Fathallah, N., Bhole, M., & Staab, S. (2024, November 30). Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models. In Proceedings of the 11th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2024. arXiv: 2412.00342
License/Data Use Agreement	MIT License
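
The Description above names Word Error Rate (WER) as one of the evaluation metrics. As a rough illustration of what that metric measures, the following is a minimal sketch and not the code from Evaluation.ipynb: the function name `word_error_rate`, the lowercasing, and the whitespace tokenization are all assumptions made for this example. WER is computed here as the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference caption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption for this sketch: captions are lowercased and split on
    whitespace; the deposited notebooks may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical captions with a homophone confusion of the kind ASR produces.
reference = "their results were published there"
asr_caption = "there results were published their"
print(word_error_rate(reference, asr_caption))
```

A lower value is better: a caption identical to the reference scores 0.0, and comparing the score of the raw ASR caption with the score of the LLM-corrected caption shows whether the correction helped.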

Files (10 of 11 shown)
File	Type	Size	Published	Downloads	MD5
LICENSE	Plain Text	1.0 KB	Feb 28, 2025	1	aadc2e2eeb8a75df7a5843133be16dd6
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/Download_YT_Video.ipynb	Jupyter Notebook	7.0 KB	Feb 28, 2025	0	5e884e1584616b21d000203d27683c3a
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/Evaluation.ipynb	Jupyter Notebook	19.4 KB	Feb 28, 2025	0	c24b839b69635b4bf4b11a386188bed3
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/Gemini.ipynb	Jupyter Notebook	11.9 KB	Feb 28, 2025	0	f6e97b6eeb114cd95822523bbc65ad9c
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/GPT2.ipynb	Jupyter Notebook	7.8 KB	Feb 28, 2025	0	58be18532e56dbbe2ea70f2dd564f3f8
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/GPT3_5_openai_api_Azure.ipynb	Jupyter Notebook	4.8 KB	Feb 28, 2025	0	0d371e20b73fe73d51366becd673c5ca
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/llama2_7b_ggml.ipynb	Jupyter Notebook	4.4 KB	Feb 28, 2025	0	1e3fe7a495cd36dc8328ee1c464d4b3c
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/llama2_replicate.ipynb	Jupyter Notebook	7.4 KB	Feb 28, 2025	0	27167e9740456c66174d9070e08c35f6
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/README.md	Markdown Text	3.8 KB	Feb 28, 2025	0	d9c31ba29a0a95e82ec2fc6602db412f
Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM-main/T5.ipynb	Jupyter Notebook	6.2 KB	Feb 28, 2025	0	cd98c305f14bf313951e8a011ba587b9
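
Each file in the listing above carries an MD5 checksum, which can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical and chosen for this example, and the local file path is whatever location you saved the download to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, expected_md5):
    """Compare a local file against the checksum shown in the file listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, after saving README.md locally, `matches_listed_md5("README.md", "d9c31ba29a0a95e82ec2fc6602db412f")` should return `True` if the download matches the published file. MD5 is suitable here as an integrity check against transfer errors, not as protection against deliberate tampering.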

Citation Metadata

Persistent Identifier	doi:10.18419/DARUS-4776
Publication Date	2025-02-28
Title	Code for Improving Video Caption Accuracy with LLMs
Alternative Title	Improving the Quality of Video Captions for the DHH Community Using LLM
Alternative URL	https://github.com/monikabhole001/Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM
Other Identifier	Software Heritage: swh:1:snp:09f89cf03dccad2d5918dd55a88eba57904c03ed;origin=https://github.com/monikabhole001/Improving-the-Quality-of-Video-Captions-for-the-DHH-Community-Using-LLM
Author	Fathallah, Nadeen (University of Stuttgart, https://ror.org/04vnq7t77); ORCID: https://orcid.org/0000-0001-7921-034X
Point of Contact	Fathallah, Nadeen (University of Stuttgart)
Keyword	Accessibility (Wikidata: http://www.wikidata.org/entity/Q555097); Assistive Technologies (ÖFOS: https://vocabs.acdh.oeaw.ac.at/oefosdisciplines/211902)
Topic Classification	Artificial Intelligence and Machine Learning Methods (DFGFO) https://w3id.org/dfgfo/2024/443-04
Related Publication	Is Supplement To: Fathallah, N., Bhole, M., & Staab, S. (2024, November 30). Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models. In Proceedings of the 11th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2024. arXiv 2412.00342 https://arxiv.org/abs/2412.00342
Producer	High Performance Computing Center (HLRS) (University of Stuttgart)
Funding Information	German Federal Ministry of Education and Research (BMBF): IKILeUS: 16DHBKI041
Distributor	Fathallah, Nadeen (University of Stuttgart)
Distribution Date	2025-02-13
Depositor	Fathallah, Nadeen
Deposit Date	2025-02-13
Time Period	Start Date: 2022-08-01; End Date: 2024-11-30
Date of Collection	Start Date: 2022-08-01; End Date: 2024-11-30
Data Type	Automatic speech recognition (ASR) transcriptions, large language model (LLM)-corrected subtitle datasets, word error rate (WER) evaluation data, NLP-processed text outputs, captioning quality metrics (BLEU, ROUGE scores).

Privacy Metadata

Personal Data	no

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

MIT License
