[MM] Making LLaMA SEE and Draw with SEED Tokenizer
[SSL][CLS][SS] BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
[MM] LaViT: Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
[MM] Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference