[MM] Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
[MM] Dense Connector for MLLMs
[Layout] VLT: Interactively Optimizing Layout Transfer for Vector Graphics
[MM] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
[Layout] LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer