[OD] detrex: Benchmarking Detection Transformers

less than 1 minute read

[OD] detrex: Benchmarking Detection Transformers

github: https://github.com/IDEA-Research/detrex
paper : https://arxiv.org/pdf/2306.07265.pdf
한줄요약 : DETR기반의 Object detection Transformer 계열 19개 최신 구조를 Detectron기반으로 modulable하게 구현한 project를 제안함
기존 OD project의 한계
- mmdetection : backbone/neck/head/assigner의 고정된 구조는 Convolutional계열의 기존 모델들만 통합하며, Transformer기반의 encoder/decoder, 그리고 query matcher 등의 최신 모델을 구현하기에 시간이 많이 듦
- mmdet은 string 기반의 정형화된 config를 제공하는데 비해, detrex는 Detectron2기반의 lazy config를 채택해서 python 호환이 가능한 custom config 생성이 가능함 (user-friendly)
Detrex library 구성도 (7개의 모듈)
Detrex vs. mmdet / Detectron
- 제공하는 DETR계열의 모델이 압도적으로 많음
  - Mask2Former, MaskDINO, ED-Pose 등 Object Detection뿐만 아니라 Instance Segmentation, Pose Estimation등의 downstream task관련 documentation / tutorial 구현됨
- 기존에 비해 core code dependency를 벗어나도록 구현함 (mmdet 3.0이상, detectron부터는 동일함)
성능 지표
- 정확도 : mAP (Original 모델 대비 +0.2AP ~ 1.1AP의 향상수준)
- 학습속도 : GPU-hour
- GPU memory cost : GB
- FPS: FPS (A100에서 수행함 (A100은 80Gb 1500만원 / A6000은 48Gb 600만원))
참고할 다양한 report자료들이 많음
- 초기 initial weight에 따른 성능 분석 (ImageNet 22K pretrained weight backbone 공유)
- freezing layer 갯수에 따른 성능
- hyper-parameter tuning (Learning rate & classification loss weight)에 따른 성능비교
- 추가적인 최적화 진행 (NMS적용 등)