OmniParser for Pure Vision Based GUI Agent
[WebAgent] OmniParser: OmniParser for Pure Vision Based GUI Agent
[WebAgent] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
[MLLM] InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
[WebAgent] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
[WebAgent] UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning