[TTA][CLS] DEYO: Entropy is Not Enough for Test-Time Adaptation : From the Perspective of Disentangled Factors

2 minute read

[TTA][CLS] DEYO: Entropy is Not Enough for Test-Time Adaptation : From the Perspective of Disentangled Factors

paper: https://openreview.net/pdf?id=9w3iw8wDuE
X
ICLR 2024 in review (인용수: 0회, ‘23.12.13 기준)
downstream task : TTA for CLS

1. Motivation

TTA에서 entropy를 사용해서 sample을 adaptation할지 판단하는 confidence metric으로 활용함
하지만, biased scenario (spurious correlation)에서 entropy는 unreliable한 양상을 나타냄을 발견
- biased scenario : waterbird dataset
  - waterbird class : 배경이 물에서 95% 나타남. 배경이 땅에서 5% 나타남
  - landbird class : 배경이 땅에서 95% 나타남. 배경이 물에서 5% 나타남
    
    waterbird, landbird를 학습하는데 필요한 (CPR: Commonly Positively-coRelated) factor 보다, 배경과 같은 (TRAP: TRAin-time only Positively-corelated) factor로 학습을 하게되어 그러함.
    
    validation은 spurious correlation이 안된 데이터셋으로 검증 수행
  - Entropy기반으로 학습 시, 성능이 떨어짐. 틀린 원인 분석 (Grad-CAM)결과 배경을 보고 판단했기 때문 (DG에서 Multi-View Discrimantor의 causal, non-causal과 개념이 비슷해 보임)

2. Contribution

Entropy만 가지고 confidence metric으로 삼기엔 불충분함을 보임
harmful sample을 identify하는 새로운 confidence metric인 PLPD (Pseudo Label Probability Difference)기반의 DeYO (Destroy Your Object)를 제안함
여러 TTA benchmark에서 SOTA

3. DeYO

Overview
- 4사분면에 해당하는 sample만 selection해서 adaptation하고자 함

3.1 Entropy is Not Enough

Entropy
- Disentangled Latent Vector v(x)
  - train, test set에 대해 CPR factor, TRP factor를 정의한 가상의 vector
  - CPR에 해당하는 latent vector $v_{pp}$ : train set, test set 모두에서 CPR factor. ex. object shape, object pattern, etc
  - TRP에 해당하는 latent vector $v_{pn}, v_{np}, v_{nn}$
    - $v_{pn}$: trainset 에서는 CPR인데 testset에서는 TRP factor. ex. water background
    - $v_{np}$: trainset 에서는 TRP인데 testset에서는 CPR factor ex. land background
    - $v_{nn}$: trainset 에서는 TRP이고 testset에서도 TRP factor ex. image aspect ratio?
- CPR factor 판단 기준
  - 여기서는 CPR인지 여부를 test label $y^{test}$와의 correlation이 0보다 크면 CPR라고 함
  - Appendix를 살펴보면 linear classifier 대해 probability score가 0.5 이상이면 CRP, 미만이면 TRP로 처리하는 이진 분류기를 정의함
  - $a_{\theta}$: sigmoid 통과 전 linear classifer의 output. 이 값이 양수란 것은 CPR과 양의 상관관계를 갖는다는 의미
  - $M_{\theta}$: disentangled latent vector v를 입력으로 하는 linear classifier. TTA에서는 DNN 모델에 해당.
  - $\theta$: learnable parameters
  - $\hat{y}$: input $x$에 대해 disentangled latent vector v와 linear classifier간의 similarity를 pseudo label로 표현한 값. probability score가 0.5보다 크면 1, 작으면 -1
Harmful sample 판단 기준
- $X^{test}_{+1}$: test dataset X에서 CRP factor를 보존한 이미지
- $X^{test}_{-1}$: test dataset X에서 CRP factor를 훼손한 이미지
- Appendix의 증명: CPR 을 판단할때 사용한 $p_{\theta}$에 대해 entropy를 계산하고, model weight를 두 pseudo label class간의 discriminality를 가깝게하므로 위 (5)식을 만족하면 harmful하다고 판단함
- $\theta_{pp}$를 제거하기 전 (1)과 후 (-1)로 나타낼 수도 있음

3.2 Sample Selection

Spurious Correlation 이 높은 경우, entropy기반의 sample selection은 TRAP factor에 영향을 받게 되므로 reliable하지 않다.
때문에 본 논문에서는 CPR factor중 하나인 “shape”에 따른 영향을 받게하는 PLPD (Pseudo Label Probability Difference)를 제안한다.
- $\hat{y}$: pseudo-label. argmax취한 값.
- PLPD를 로 나타낼 수도 있음
- $\tau_{ent}$: entropy threshold
- $\tau_{PLPD}$: PLPD threshold
- $x’$: shuffled image. 즉 CRP factor중 하나인 “object shape”를 destroy함으로써 변형전과 prediction score 차이를 극명하게 나타내고자 사용
- $x$: original image

3.3 Sample Weighting

Reliablity에 따라 각 single sample 마다 weight를 조정함. 즉, more reliable한 sample에 weight를 많이 줘서 학습하고자 함
- $\alpha_{\theta}$: weight for image x
- $Ent_0$: entropy normalization factor. hyper-parameter
DeYO Loss
DeYO Algorithm