[AR] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
[SDU] UDOP: Unifying Vision, Text, and Layout for Universal Document Processing
[CTTA][CLS][SS] ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
[SDU] ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents
[SDU] LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding