BloomBerry.ai

[MM] CogAgent: A Visual Language Model for GUI Agents

2 minute read

[MM] CogAgent: A Visual Language Model for GUI Agents

1 minute read

[MM] VLM2VEC: Training Vision-Language Models for Massive Multimodal Embedding Tasks

3 minute read

[MM] UniCode: Learning a Unified Codebook for Multimodal Large Language Models

1 minute read

[MM] SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

2 minute read

[MM] Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution