[SDU] ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents
[SDU] LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
[KD][AR] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
[OD] SAM-DETR: Accelerating DETR Convergence via Semantic-Aligned Matching
[SSL2][AR] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training