Introduction

MLM은 인간관점에서 그럴듯한 결과를 예측하는 것이다. 즉 인간이 알아들을 수 있는 단어들 자체를 예측하는 explicit한 형태로하는 것.

이런 학습 방식 말고 다른 제안되었던 방식들

non zero probability를 여러 token이 가지도록

non zero probability를 여러 토큰이 가지도록
- label smooth를 말하는 듯
semantic을 예측하도록 : external knowledge incorporation
- explicit하게 학습하는 것은 비용이 크다.
- implicit = sparse coding
  - hidden repre에서 sparse coding을 한다는 것.
  - 이러한 방식이 가능한게 hidden repre가 인간관점에서 해석이 가능하기 때문(직관과 비슷한 형태를 보인다는 것)

Key point

efficiency

Methodology

Determining semantic information

어떻게 implicit하게 할까?

Knowledge distilation 방식 사용

teacher modeld이 각 hidden repre에 대한 sparse coding 생성

어떻게 sparse coding 생성?

dictionary learning phase
sparse coefficient determination