SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
[MM] SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
[Retrieval] ColPali: Efficient Document Retrieval with Vision Language Models
[Retrieval] RECO: Retrieval-Enhanced Contrastive Vision-Text Models
[LLM] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning paper: https://arxiv.org/pdf/2501.12948 github: https://github....
[LLM] DeepSeek-V3 Technical Report