Cootie Catcher Large Image Instruction

Assessing Robustness of Multi-Modal Large Language Models in Image Classification through Hierarchical WordNet-Based Evaluation

Abstract: The advancement of multi-modal large language models (MLLMs) has significantly enhanced their capability to process and understand diverse data types, integrating text, images, and other ...

GitHub

Describe Anything: Detailed Localized Image and Video Captioning

TL;DR: Our Describe Anything Model (DAM) takes in a region of an image or a video in the form of points/boxes/scribbles/masks and outputs detailed descriptions to the ...
