OSNet (Omni-Scale Network) Overview

츄98 2025. 3. 19. 13:09

Paper: [Omni-Scale Feature Learning for Person Re-Identification](https://arxiv.org/abs/1905.00953)
Paper review: https://medium.com/@moncefboujou96/omni-scale-feature-learning-for-person-re-identification-6e09df1c9a1a
GitHub: [deep-person-reid](https://github.com/KaiyangZhou/deep-person-reid)

What is OSNet?

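OSNet is short for Omni-Scale Network, a deep neural network architecture designed specifically for the person re-identification (ReID) problem.
As the name ‘Omni-Scale’ suggests, it focuses on learning information extracted at multiple spatial scales at the same time and fusing it into the final embedding.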

Key Ideas of OSNet

#1. Omni-Scale Feature

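The paper uses the term omni-scale features for features that cover both “homogeneous scale” (a single scale) and “heterogeneous scale” (a mixture of several scales).
In other words, when the network looks at an image it learns features that span multiple scales at once,
so that no resolution-level or region-level information it needs is missed.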

#2. Depthwise Separable Convolutions

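A single convolution is split into a depthwise convolution (a k×k×1 filter applied to each channel separately) and
a pointwise (1x1) convolution.
This reduces both the computation and the number of parameters.
MobileNet and similar networks typically apply depthwise → pointwise,
but the paper uses the pointwise → depthwise order.
The authors report that this order is more effective for omni-scale feature learning, the core of OSNet.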
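
To make the savings concrete, here is a small sketch that counts weights for a standard k×k convolution versus the pointwise → depthwise factorization. The function names are made up for illustration, bias terms and normalization layers are ignored, and the depthwise step is assumed to run on the output channels because the pointwise step comes first.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def pointwise_depthwise_params(k, c_in, c_out):
    """Weights when a 1x1 pointwise conv is followed by a k x k depthwise conv.

    Assumption: the depthwise conv runs on the c_out channels produced by the
    pointwise conv, with one k x k filter per channel.
    """
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    depthwise = k * k * c_out  # one k x k filter per output channel
    return pointwise + depthwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 64
    full = standard_conv_params(k, c_in, c_out)
    lite = pointwise_depthwise_params(k, c_in, c_out)
    print(f"standard: {full}, pointwise -> depthwise: {lite}")
    # prints "standard: 36864, pointwise -> depthwise: 4672"
```

For 64 input and output channels with a 3x3 kernel, that is roughly an 8x reduction in weights.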

#3. Residual Block with Multiple Convolutional Streams

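The core of OSNet is placing several parallel streams inside a residual block, each with a different kernel size or receptive field.
For example, convolutions such as 1x1, 3x3, 5x5, or convolutions with different dilation rates, can be placed in parallel and their outputs combined. In the paper itself, the different receptive fields come from stacking a different number of lightweight 3x3 layers in each stream.
This makes it possible to extract multi-scale features within a single block.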
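
One way to get streams with different receptive fields without using large kernels is to stack a different number of 3x3 convolutions per stream. The sketch below computes the receptive field of such a stack, assuming stride 1 and no dilation. `receptive_field` is an illustrative helper, not something from the deep-person-reid code.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1, dilation-1 convs."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1  # each layer widens the field by k - 1
    return rf


if __name__ == "__main__":
    # Four parallel streams with 1, 2, 3 and 4 stacked 3x3 layers.
    for depth in range(1, 5):
        size = receptive_field(depth)
        print(f"stream with {depth} layer(s): {size}x{size}")
```

With one to four layers, the streams see 3x3, 5x5, 7x7 and 9x9 regions, so a single block covers several scales at once.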

#4. Unified Aggregation Gate

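Instead of simply combining the parallel stream outputs with `add` or `concat`, OSNet assigns channel-wise weights and fuses the streams in an input-dependent way,
picking the most useful scale features for each input.
This lets the network adapt flexibly: for some input images the large scales matter most, while for others the fine scales do.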
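
The gating idea can be sketched in plain Python. This is a toy version under several assumptions: feature maps are lists of channels with flattened spatial values, the gate is global average pooling followed by a tiny two-layer network with a sigmoid, the same gate parameters are shared by every stream, and the weighted streams are summed. All names and the parameter layout (`w1`, `b1`, `w2`, `b2`) are made up for illustration and are not taken from the deep-person-reid code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_avg_pool(feature):
    """feature: list of channels, each a flat list of spatial activations."""
    return [sum(channel) / len(channel) for channel in feature]


def linear(vec, weights, bias):
    """Dense layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def gate(feature, params):
    """Return one weight in (0, 1) per channel, computed from the input."""
    pooled = global_avg_pool(feature)
    hidden = [max(0.0, h) for h in linear(pooled, params["w1"], params["b1"])]
    return [sigmoid(z) for z in linear(hidden, params["w2"], params["b2"])]


def aggregate(streams, params):
    """Fuse streams as a sum of channel-wise gated feature maps."""
    num_channels = len(streams[0])
    size = len(streams[0][0])
    fused = [[0.0] * size for _ in range(num_channels)]
    for feature in streams:
        weights = gate(feature, params)  # same params for every stream
        for c in range(num_channels):
            for i in range(size):
                fused[c][i] += weights[c] * feature[c][i]
    return fused


if __name__ == "__main__":
    # Two hypothetical streams, 2 channels, 2 spatial positions each.
    small_scale = [[1.0, 1.0], [2.0, 2.0]]
    large_scale = [[3.0, 3.0], [4.0, 4.0]]
    demo_params = {
        "w1": [[0.5, -0.5]], "b1": [0.0],          # 2 channels -> 1 hidden
        "w2": [[1.0], [-1.0]], "b2": [0.0, 0.0],   # 1 hidden -> 2 channels
    }
    print(aggregate([small_scale, large_scale], demo_params))
```

Because the weights are computed from each stream's own pooled activations, two different input images can end up emphasizing different streams even though the gate parameters are fixed after training.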

Network Architecture Example

Summary of the Paper Abstract

As an instance-level recognition problem, person re-identification (re-ID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep re-ID CNN is designed, termed omni-scale network (OSNet), for omni-scale feature learning. This is achieved by designing a residual block composed of multiple convolutional streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses pointwise and depthwise convolutions. By stacking such block layer-by-layer, our OSNet is extremely lightweight and can be trained from scratch on existing re-ID benchmarks. Despite its small model size, OSNet achieves state-of-the-art performance on six person re-ID datasets, outperforming most large-sized models, often by a clear margin. Code and models are available at: https://github.com/KaiyangZhou/deep-person-reid.

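ReID is an instance-level recognition problem, so it needs discriminative features extracted at different scales.
Omni-Scale Features:
- Not only individual single scales (homogeneous scales) but also combinations of several scales (heterogeneous scales) are considered necessary.
Omni-Scale Network (OSNet):
- Places multiple convolutional streams in parallel, each detecting features at a particular scale,
- stacks them up in the form of residual blocks,
- and fuses the multi-scale features with the Unified Aggregation Gate, a dynamic channel-wise weighting mechanism.
- The features from each parallel stream are combined with learned, input-dependent weights rather than a plain sum.
Lightweight design:
- Using depthwise + pointwise convolutions keeps the total parameter count small
  while still providing strong representational power.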