[MM] InternVL-1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
[VTG] HPCVTG: Toward Human Perception-Centric Video Thumbnail Generation
[LG] PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
[IR] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
[MM] Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks