[DG][GM] EBTSA: ENERGY-BASED TEST SAMPLE ADAPTATION FOR DOMAIN GENERALIZATION


1. Motivation

Adapting model parameters with a single sample in the deployed environment provides only limited information, so it can cause problems when the domain gap is large.
Instead, what if the target domain sample is adapted to the source data, so that the model trained on source data can perform well?
- (green) t-SNE cluster distribution of target samples (2-class)
- (red) t-SNE cluster distribution of source samples (2-class)
Model the complex data distribution with energy-based models, which have recently attracted attention! $\to$ Langevin dynamics

To mimic unseen target samples, samples from one source domain are adapted to a different source domain

Model the probability distribution with an energy function
\[p_{\theta}(x)=\frac{\exp(-E_{\theta}(x))}{Z_{\theta}}\]
Because the denominator term above is intractable, training instead maximizes the log-likelihood $\log p_{\theta}(x)=-E_{\theta}(x)-\log Z_{\theta}$
- 1st term: data distribution ($p_d(x)$). The energy must be minimized for maximum likelihood estimation
- 2nd term: model distribution ($p_{\theta}(x)$). The energy must be maximized for maximum likelihood estimation
- The most naive way to approximate the model $p_{\theta}$ is MCMC $\to$ replaced with Stochastic Gradient Langevin Dynamics
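For reference, the two terms above match the standard gradient of the expected log-likelihood of an energy-based model, and a commonly used form of the Langevin update is written out below. The step size $\lambda$, the step index $k$, and the noise symbol $\omega^{k}$ are notation assumed here, not taken from the original post.

\[\nabla_{\theta}\,\mathbb{E}_{x\sim p_d}[\log p_{\theta}(x)] = -\mathbb{E}_{x\sim p_d}[\nabla_{\theta}E_{\theta}(x)] + \mathbb{E}_{x\sim p_{\theta}}[\nabla_{\theta}E_{\theta}(x)]\]
\[x^{k+1} = x^{k} - \frac{\lambda}{2}\nabla_{x}E_{\theta}(x^{k}) + \omega^{k},\qquad \omega^{k}\sim\mathcal{N}(0,\lambda)\]

The first expectation pushes the energy down on data samples, and the second pushes it up on samples drawn from the model, which is where the Langevin chain is needed.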
Another way to maximize $\log p_{\theta}(x)=-E_{\theta}(x)-\log Z_{\theta}$ $\to$ minimize the KL divergence $\mathbb{D}_{KL}(p_d(x)\,\|\,p_{\theta}(x))$
- minimizing contrastive divergence
  - $q_{\theta}(x)=\Pi_{\theta}^{t}p_d(x)$: the distribution after t sequential MCMC steps starting from $p_d(x)$
- The above can be implemented by minimizing the following expression
3.2. Energy Based Test Sample Adaptation
- initial value: uniform distribution $\to$ replaced with the target sample
- The goal is to adapt the target sample to the source samples
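A minimal sketch of this idea, assuming a toy quadratic energy centered on a hypothetical source feature mean (the real energy function is a learned network). The names `energy`, `adapt_sample`, `source_mean`, and the separate `noise_scale` are illustrative assumptions; the only point shown is that the chain starts from the target sample rather than from uniform noise.

```python
import numpy as np

def energy(x, source_mean):
    """Toy quadratic energy: low near the (assumed) source-domain feature mean."""
    return 0.5 * np.sum((x - source_mean) ** 2)

def energy_grad(x, source_mean):
    """Gradient of the toy energy with respect to x."""
    return x - source_mean

def adapt_sample(x_target, source_mean, steps=20, step_size=0.1,
                 noise_scale=0.01, seed=0):
    """Run Langevin-style updates that start from the target sample itself."""
    rng = np.random.default_rng(seed)
    # Initial value is the target sample, not a draw from a uniform distribution.
    x = np.array(x_target, dtype=float)
    for _ in range(steps):
        # Noise scale is set independently of the step size (a practical simplification).
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        x = x - 0.5 * step_size * energy_grad(x, source_mean) + noise
    return x

source_mean = np.array([0.0, 0.0])   # hypothetical source-domain statistics
x_target = np.array([3.0, -2.0])     # hypothetical shifted target feature
x_adapted = adapt_sample(x_target, source_mean)

print(energy(x_target, source_mean), energy(x_adapted, source_mean))
```

With only a few steps, the adapted feature moves toward the low-energy (source-like) region while still depending on where the target sample started.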

Overall diagram

$x^i$: a sample belonging to the i-th source domain
$x^j$: a sample belonging to the j-th source domain
There is one energy-based model per source domain; contrastive training treats real samples from domain i as positives and adapted samples from domain j as negatives
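The positive/negative scheme can be sketched as a contrastive-divergence-style update on a toy 1-D problem. Everything here is an assumption for illustration: the energy $E_{\theta}(x)=\frac{1}{2}(x-\theta)^2$ with a single scalar parameter, the Gaussian toy domains, and the helper names. Negatives are produced by a short Langevin-style chain started from domain j samples and are treated as constants when taking the parameter gradient.

```python
import numpy as np

def energy_grad_x(x, theta):
    """d/dx of the toy energy E_theta(x) = 0.5 * (x - theta)^2."""
    return x - theta

def energy_grad_theta(x, theta):
    """d/dtheta of the same toy energy."""
    return theta - x

def langevin(x0, theta, rng, steps=10, step_size=0.1, noise_scale=0.01):
    """Adapt samples x0 toward low energy under the current parameter theta."""
    x = x0.copy()
    for _ in range(steps):
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        x = x - 0.5 * step_size * energy_grad_x(x, theta) + noise
    return x

rng = np.random.default_rng(0)
x_i = rng.normal(2.0, 0.5, size=256)    # real samples of source domain i (positives)
x_j = rng.normal(-1.0, 0.5, size=256)   # samples of another source domain j

theta = 0.0  # parameter of the energy model for domain i
for _ in range(500):
    x_neg = langevin(x_j, theta, rng)   # adapted domain-j samples (negatives)
    # Lower the energy on positives, raise it on negatives.
    grad = energy_grad_theta(x_i, theta).mean() - energy_grad_theta(x_neg, theta).mean()
    theta -= 0.1 * grad

x_check = langevin(x_j, theta, rng)
print(theta, x_check.mean(), x_i.mean())
```

In this toy setting the update stops moving `theta` once the adapted domain-j samples have the same mean as the real domain-i samples, so the energy ends up shaped for the short adaptation chain rather than centered exactly on the domain-i mean.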

Discriminative Energy-based Model

$p_{\theta, \phi}(x, y)$: composed of a classification model ($\phi$) and an energy-based model ($\theta$)
- x: backbone feature of image I
- y: label space vector
Minimized with the contrastive divergence loss (Eq. 6)
- In summary
- $q_{\theta}(x)=\Pi_{\theta}^{t}p_d(x)$: the distribution after t sequential MCMC steps starting from $p_d(x)$

Label-preserving adaptation with categorical latent variable

Naive Langevin dynamics samples only in the input feature space X and samples randomly, independent of the starting point, so it carries no categorical information
For label-preserving adaptation, a categorical latent vector z is introduced: sampling moves from the X space $\to$ the X $\times$ Z space
- $\phi$: parameters of the classification model, which predicts from x and z
- $\theta$: parameters of the energy function network, used to model the maximum probability of z given x
- z is kept fixed when used
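A toy sketch of adapting a feature while a categorical latent z stays fixed. The class prototypes, the nearest-prototype stand-in for inferring z, and the quadratic conditional energy are all assumptions made for illustration; in the method described here, z is inferred by a learned model and the energy is a network.

```python
import numpy as np

# Hypothetical class prototypes in the source feature space (one row per class).
prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])

def predict_z(x):
    """Stand-in for latent inference: one-hot z for the nearest prototype."""
    z = np.zeros(len(prototypes))
    z[np.argmin(np.linalg.norm(prototypes - x, axis=1))] = 1.0
    return z

def cond_energy_grad(x, z):
    """Gradient in x of the toy energy E(x | z) = 0.5 * ||x - z @ prototypes||^2."""
    return x - z @ prototypes

def adapt_with_fixed_z(x_target, steps=30, step_size=0.2, noise_scale=0.01, seed=0):
    rng = np.random.default_rng(seed)
    z = predict_z(x_target)              # inferred once, then kept fixed
    x = np.array(x_target, dtype=float)
    for _ in range(steps):
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        x = x - 0.5 * step_size * cond_energy_grad(x, z) + noise
    return x, z

x_target = np.array([2.6, 3.0])          # hypothetical shifted target feature
x_adapted, z = adapt_with_fixed_z(x_target)
print(z, x_adapted)
```

Because the energy is conditioned on the fixed z, the chain is pulled toward the region of the class selected at the start instead of wandering to whichever mode happens to be closest later.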
Lower bound of the log-likelihood $\log p_{\theta, \phi}(x, y)$ (Eq. 9)
Substituting Eq. (9) into Eq. (6) and rearranging gives
- z is inferred by variational inference ($q(z \mid d_x)$)
  - $d_x$: average representation of the source-domain samples in the category predicted for x $\to$ similar to a class prototype
- 1st term: trains the classification model $\phi$ on source data
- 2nd term: trains the energy-based model $\theta$ on source data
- 3rd term: trains the adaptation process on the adapted samples

Ensemble Inference

The ensemble of the S source-domain energy and classifier models is used

$p(z^n \mid x_t)$: prior probability distribution of the n-th sample given the target sample feature
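A minimal sketch of the ensemble step, assuming a uniform average over the S source models and over N sampled latents $z^n$. The array layout `(S, N, C)`, the function name `ensemble_predict`, and the example numbers are hypothetical.

```python
import numpy as np

def ensemble_predict(probs):
    """Average class probabilities over S source models and N latent samples.

    probs has shape (S, N, C): for each source model and each sampled z^n,
    a probability vector over C classes.
    """
    probs = np.asarray(probs, dtype=float)
    return probs.mean(axis=(0, 1))

# Hypothetical outputs: S=2 source models, N=2 latent samples, C=3 classes.
probs = np.array([
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.5, 0.4, 0.1], [0.2, 0.7, 0.1]],
])
p_y = ensemble_predict(probs)
predicted_class = int(np.argmax(p_y))
print(p_y, predicted_class)
```

A weighted average, for example weighting each source model by how low its energy is for the adapted sample, would be a drop-in change to the `mean` call; the uniform version is used here only because it is the simplest assumption.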