[Agent] VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
[Agent] PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
[Agent] BannerAgency: Advertising Banner Design with Multimodal LLM Agents
[MM] GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[MobileUI] From Perception to Reasoning: Enhancing Vision-Language Models for Mobile UI Understanding