Jul 8, 2024
Tilli, Pascal, 2024, "Data for: HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities", https://doi.org/10.18419/DARUS-4341, DaRUS, V1
Image-Text-Matching (ITM) is one of the de facto methods of learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between the web-collected image–text pairs, models fail to show fine-grained understanding of the combined semantics of these modalities. To this end, we propose Hard N...