Jul 8, 2024
Tilli, Pascal, 2024, "Data for: HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities", https://doi.org/10.18419/DARUS-4341, DaRUS, V1
Image-Text-Matching (ITM) is one of the de facto methods of learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between the web-collected image–text pairs, models fail to show fine-grained understanding of the combined semantics of these modalities. To this end, we propose Hard N... |
Jul 8, 2024 -
Data for: HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
JSON - 7.6 GB -
MD5: d2ed063924be0ead2b1475deeb001cf6
HNC training set, automatically generated. |
Jul 8, 2024 -
Data for: HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
JSON - 1.1 GB -
MD5: 6fe56c1128fa3615bcd41dd281636578
HNC validation set, automatically generated. |
Jul 8, 2024 -
Data for: HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
JSON - 411.7 KB -
MD5: 22dcb8f1f3ab6497b5c4db5f3d756726
HNC test set, annotated by humans. |
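
Because each file above is listed with an MD5 digest, a downloaded copy can be checked against it before use. The following is a minimal sketch, not part of the dataset's own tooling: the local file names (`hnc_train.json`, `hnc_val.json`, `hnc_test.json`) are hypothetical placeholders, since the listing does not show the actual file names, and the helper names `md5_of_file` and `verify` are illustrative. The file is read in chunks so that the 7.6 GB training file does not need to fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the listing above.
# ASSUMPTION: the keys are hypothetical local file names; the listing does
# not show the real names, so adjust them to match your downloads.
EXPECTED_MD5 = {
    "hnc_train.json": "d2ed063924be0ead2b1475deeb001cf6",
    "hnc_val.json": "6fe56c1128fa3615bcd41dd281636578",
    "hnc_test.json": "22dcb8f1f3ab6497b5c4db5f3d756726",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists and its digest matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.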