[DG][MM] USING LANGUAGE TO EXTEND TO UNSEEN DOMAINS
- paper : https://arxiv.org/pdf/2210.09520.pdf
- git : https://github.com/lisadunlap/LADS
- Published as a conference paper at ICLR 2023 (‘22.10; 5 citations as of ‘23.08.14)
- Downstream task: Domain Generalization for classification
- Contribution
  - Proposes a network that, given only a text description of an unseen domain, performs augmentation in the latent space of a Vision-Language Model (CLIP)
  - Achieves SOTA on the DG task via linear probing, training only the head without fine-tuning the backbone
  - Addresses spurious correlation bias present in the dataset
    - e.g., waterbird on land vs. landbird on water
Latent Augmentation Using Domain Descriptions
Domain Extension with Language
- Training dataset & labels: $\{x_i, y_i\}_{i=1}^n$
- Class names: $t_y$
- Training domain written description: $t_{training}$
- Unseen domain written descriptions: $\{t_{unseen}^i\}_{i=1}^k$
LADS (Latent Augmentation using Domain descriptionS)
- Trains only the classification head via linear probing
- Input (assembled as in the sketch below):
  - image embedding (by CLIP image encoder)
  - class labels
  - text description of each class
  - text description of each domain
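A minimal sketch of assembling these inputs with the OpenAI CLIP package, assuming the ViT-L/14 backbone (whose 768-dim embeddings match the $f_{aug}$ dimensions below); the prompt strings are illustrative, based on the waterbird example above, not the paper's exact prompts:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)  # 768-dim joint space

# Hypothetical class / domain descriptions for the waterbird setting
class_texts = ["a photo of a landbird", "a photo of a waterbird"]
domain_texts = {"training": "a photo of a bird on water",
                "unseen": "a photo of a bird on land"}

with torch.no_grad():
    class_emb = model.encode_text(clip.tokenize(class_texts).to(device))
    dom_emb = {name: model.encode_text(clip.tokenize([t]).to(device))
               for name, t in domain_texts.items()}
    # image embeddings come from model.encode_image(preprocess(img).unsqueeze(0))
```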
Two-stage Training approach
Augmentation network training
- $f_{aug}$ : 2-layer MLP with input and output dimensions of 768 and a hidden dimension of 384
- Transforms training-domain image embeddings into the unseen domain
- While preserving the class semantics
- Applies a CLIP loss (see the sketch after this list)
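A minimal PyTorch sketch of $f_{aug}$ and the two objectives above; the exact loss formulation, the temperature, and the names `domain_align_loss` / `class_consistency_loss` are assumptions rather than the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FAug(nn.Module):
    """2-layer MLP: 768 -> 384 -> 768, matching the dimensions above."""
    def __init__(self, dim=768, hidden=384):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)

def domain_align_loss(aug_emb, t_unseen):
    """Pull augmented embeddings toward the unseen-domain text embedding."""
    aug_emb = F.normalize(aug_emb, dim=-1)
    t_unseen = F.normalize(t_unseen, dim=-1)
    return (1 - aug_emb @ t_unseen.T).mean()

def class_consistency_loss(aug_emb, class_emb, labels, temp=0.07):
    """Keep each augmented embedding closest to its own class text embedding."""
    logits = F.normalize(aug_emb, dim=-1) @ F.normalize(class_emb, dim=-1).T
    return F.cross_entropy(logits / temp, labels)
```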
LADS Final Loss
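A hedged reconstruction: assuming the final objective simply combines the two terms above with a weighting hyperparameter $\lambda$ (this notation is an assumption, not the paper's exact form):

\[\mathcal{L}_{LADS} = \mathcal{L}_{domain} + \lambda \, \mathcal{L}_{class}\]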
Fine tuning
- Linear probing: trains only the head, on the augmented embeddings together with the original image embeddings (see the sketch below)
- Fine-tuning down to the CLIP backbone risks overfitting, so only the head is trained
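A minimal sketch of the probing stage under the same assumptions; `head`, `loader`, and `num_classes` are illustrative, and `f_aug` is an instance of `FAug` from the sketch above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Train a single linear head on original + augmented embeddings; CLIP stays frozen.
head = nn.Linear(768, num_classes)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

for img_emb, labels in loader:            # precomputed CLIP image embeddings
    with torch.no_grad():
        aug_emb = f_aug(img_emb)          # augmentation in latent space, not pixels
    x = torch.cat([img_emb, aug_emb])     # train on both views
    y = torch.cat([labels, labels])
    loss = F.cross_entropy(head(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```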
Dataset Bias
- Uses CLIP so that if the source-domain background is water, the augmented domain is land, and vice versa (sketch below)
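A hedged sketch of how that direction could be chosen, reusing `model`, `clip`, `device`, and `F` from the sketches above; zero-shot classifying each image's background and targeting the opposite one is an assumption, not necessarily the paper's exact procedure:

```python
# Pick the augmentation direction per image: detect the background with CLIP,
# then use the opposite background's text embedding as the unseen-domain target.
bg_texts = ["a photo of a bird on water", "a photo of a bird on land"]
with torch.no_grad():
    bg_emb = F.normalize(model.encode_text(clip.tokenize(bg_texts).to(device)), dim=-1)

def opposite_domain(img_emb):
    sims = F.normalize(img_emb, dim=-1) @ bg_emb.T   # similarity to each background
    return bg_emb[1 - sims.argmax(dim=-1)]           # target the other background
```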