InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
[MLLM] InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
[WebAgent] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
[WebAgent] UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning
[Chart] CHARTEDIT: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs’ Capability via Chart Editing
[WebAgent] A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models