[AR] TimeSformer: Is Space-Time Attention All You Need for Video Understanding?
[OVD][VLM][KD] RKD: Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
[TTA] CAFA: Class-Aware Feature Alignment for Test-Time Adaptation
[SSL2][OD][SS][OT] DoRA: Is ImageNet Worth 1 Video? Learning Strong Image Encoders from 1 Long Unlabelled Video
[SSL][OD] Efficient Teacher: Semi-Supervised Object Detection for YOLOv5