Each token has two kinds of representations: a context-independent embedding, and a context-dependent dense representation that is derived from that embedding and carries context information.
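A toy illustration of this distinction (not a real model; the averaging "contextualizer" below is a hypothetical stand-in for a transformer layer): the same token always has one static embedding, but its contextualized representation changes with the surrounding tokens.

```python
import numpy as np

# Toy sketch: static embedding lookup vs. a context-dependent representation.
# The "contextualize" function is a hypothetical stand-in for a transformer.
rng = np.random.default_rng(0)
vocab = {"river": 0, "bank": 1, "money": 2}
E = rng.normal(size=(len(vocab), 4))      # static (context-independent) table

def contextualize(ids):
    # stand-in for a model layer: mix each token with the sentence average
    X = E[ids]
    return (X + X.mean(axis=0)) / 2

s1 = [vocab["river"], vocab["bank"]]      # "river bank"
s2 = [vocab["money"], vocab["bank"]]      # "money bank"

static_1, static_2 = E[vocab["bank"]], E[vocab["bank"]]
ctx_1, ctx_2 = contextualize(s1)[1], contextualize(s2)[1]

assert np.allclose(static_1, static_2)    # same token, same static embedding
assert not np.allclose(ctx_1, ctx_2)      # but different contextualized reps
```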

MLM optimizes two properties: the alignment of contextualized representations with the static embeddings of masked tokens, and the uniformity of static embeddings in the representation space.
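This decomposition can be seen directly in the MLM cross-entropy at a masked position. A minimal numeric sketch (hypothetical toy shapes, not the paper's actual notation): the loss splits into a term pulling the contextualized vector toward the target's static embedding (alignment) and a log-sum-exp term over all static embeddings (uniformity).

```python
import numpy as np

# Sketch: decompose MLM cross-entropy at one masked position into
# an alignment term and a uniformity term. Shapes are toy values.
rng = np.random.default_rng(0)
V, d = 10, 4                      # toy vocabulary size and hidden size
E = rng.normal(size=(V, d))       # static (context-independent) embeddings
h = rng.normal(size=d)            # contextualized representation at the mask
target = 3                        # index of the masked-out token

logits = E @ h
# standard MLM loss: -log softmax(logits)[target]
loss = -logits[target] + np.log(np.exp(logits).sum())

alignment = -logits[target]                 # pulls h toward E[target]
uniformity = np.log(np.exp(logits).sum())   # pushes h away from all embeddings

assert np.isclose(loss, alignment + uniformity)
```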

In the alignment property, the sampled embeddings of masked tokens serve as anchors that align contextualized representations. Although such local anchors are essential for modeling local dependencies, the lack of global anchors brings several limitations.

Proposes to directly align the global information hidden in contextualized representations at all positions of a natural sentence, encouraging models to attend to the same global semantics when generating contextualized representations.
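One common way to implement this kind of global alignment is an InfoNCE-style contrastive loss; the sketch below is a generic version under assumed shapes, not the paper's exact objective. Two views of the same sentence's global representation form a positive pair, and other sentences in the batch act as negatives.

```python
import numpy as np

# Hedged sketch: InfoNCE-style contrastive loss over sentence-level
# ("global") representations. g1[i] and g2[i] are two views of sentence i.
def info_nce(g1, g2, tau=0.1):
    g1 = g1 / np.linalg.norm(g1, axis=1, keepdims=True)
    g2 = g2 / np.linalg.norm(g2, axis=1, keepdims=True)
    sim = g1 @ g2.T / tau                 # pairwise cosine similarities
    # positives on the diagonal; other sentences serve as negatives
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
g = rng.normal(size=(8, 16))
# matched views (small perturbation) vs. unrelated random vectors
loss_aligned = info_nce(g, g + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce(g, rng.normal(size=(8, 16)))
assert loss_aligned < loss_random
```

Minimizing this loss pushes the two global views of the same sentence together while keeping different sentences apart, which is the "same global semantics" constraint described above.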


In other words, the elements that should carry the same meaning, the global context dependency, are implemented via contrastive learning.

What MLM learns:

(1) the alignment between contextualized representations of surrounding tokens and the context-independent embedding of the target token

Embeddings are thus divided into two kinds, and when conveying context as input, the context-dependent representation is the one that gets used.