Motivation

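In contrastive loss, the difficulty is choosing the positive and negative pairs.

Choosing pairs is quadratic in the number of samples, so in practice only the samples inside a mini-batch are considered, which keeps the cost constant. With this approach, it is unknown whether this is optimal or whether another method would work better.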

What is the most effective and principled approach to optimizing the contrastive loss when utilizing mini-batches?

What’s new?

Prove that

mini-batch and full-batch training are equivalent under some mild conditions

all ${N \choose B} = \mathcal{O}(N^B)$ mini-batches must be considered.

An improved selection algorithm that chooses $\mathcal{O}(N)$ mini-batches during training
Mini-batch and full-batch training are equivalent if and only if all ${N\choose B}$ mini-batches are selected.
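
To get a feel for why considering all ${N \choose B}$ mini-batches is impractical, the short sketch below counts them for a few dataset sizes. The helper name `count_mini_batches` and the chosen values of $N$ and $B$ are illustrative assumptions, not taken from the paper.

```python
import math


def count_mini_batches(n, b):
    """Number of distinct size-b mini-batches from n samples (hypothetical helper)."""
    return math.comb(n, b)


# Illustrative sizes only: batch size B = 4, dataset size N doubling each time.
counts = {n: count_mini_batches(n, 4) for n in (8, 16, 32)}

for n, c in counts.items():
    print(f"N={n:2d}, B=4: {c} mini-batches")
```

Even at these toy sizes the count grows much faster than $N$, which is the gap an $\mathcal{O}(N)$ selection scheme is meant to close.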

Proposed algorithm

Proposed results

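In the common case where the feature dimension exceeds the number of classes, neural collapse occurs.
In a simplex ETF of $N$ vectors, every vector has unit norm and the inner product of every distinct pair is fixed at ${-1\over N-1}$.
Accordingly, in the full-batch setting, where all data points are selected, the optimal solution is the simplex ETF.
With mini-batches, if all mini-batches are considered, the simplex ETF is likewise the optimal solution.
If only a subset of the mini-batches can be considered, the simplex ETF is not the optimal solution.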
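
The simplex ETF property noted above (unit norms, pairwise inner product ${-1\over N-1}$) can be checked numerically. The sketch below uses one standard construction, centering the standard basis vectors and rescaling them. It is an illustration, not the paper's code, and the helper names `simplex_etf` and `dot` are assumptions.

```python
import math


def simplex_etf(n):
    """Return n unit vectors in R^n forming a simplex ETF (hypothetical helper).

    Construction: v_i = sqrt(n / (n - 1)) * (e_i - (1/n) * ones).
    """
    scale = math.sqrt(n / (n - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)]
        for i in range(n)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


n = 4
vectors = simplex_etf(n)

# Norm of each vector; each should be 1 up to floating-point error.
norms = [math.sqrt(dot(v, v)) for v in vectors]

# Inner products of all distinct pairs; each should be -1 / (n - 1).
pair_products = [
    dot(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)
]

print(norms)
print(pair_products)
```

Note that these $N$ vectors sum to zero, so they span only an $(N-1)$-dimensional subspace.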