Each token has two kinds of representations: a context-independent embedding, and a context-dependent dense representation that is derived from that embedding and carries context information.
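A toy illustration of this distinction (not a real model; the averaging "contextualizer" below is a hypothetical stand-in for a transformer layer): the same token always has one static embedding, but its contextualized representation changes with the surrounding tokens.

```python
import numpy as np

# Toy sketch: static embedding lookup vs. a context-dependent representation.
# The "contextualize" function is a hypothetical stand-in for a transformer.
rng = np.random.default_rng(0)
vocab = {"river": 0, "bank": 1, "money": 2}
E = rng.normal(size=(len(vocab), 4))      # static (context-independent) table

def contextualize(ids):
    # stand-in for a model layer: mix each token with the sentence average
    X = E[ids]
    return (X + X.mean(axis=0)) / 2

s1 = [vocab["river"], vocab["bank"]]      # "river bank"
s2 = [vocab["money"], vocab["bank"]]      # "money bank"

static_1, static_2 = E[vocab["bank"]], E[vocab["bank"]]
ctx_1, ctx_2 = contextualize(s1)[1], contextualize(s2)[1]

assert np.allclose(static_1, static_2)    # same token, same static embedding
assert not np.allclose(ctx_1, ctx_2)      # but different contextualized reps
```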

MLM optimizes two properties: the alignment of contextualized representations with the static embeddings of masked tokens, and the uniformity of static embeddings in the representation space.
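This decomposition can be seen directly in the MLM cross-entropy at a masked position. A minimal numeric sketch (hypothetical toy shapes, not the paper's actual notation): the loss splits into a term pulling the contextualized vector toward the target's static embedding (alignment) and a log-sum-exp term over all static embeddings (uniformity).

```python
import numpy as np

# Sketch: decompose MLM cross-entropy at one masked position into
# an alignment term and a uniformity term. Shapes are toy values.
rng = np.random.default_rng(0)
V, d = 10, 4                      # toy vocabulary size and hidden size
E = rng.normal(size=(V, d))       # static (context-independent) embeddings
h = rng.normal(size=d)            # contextualized representation at the mask
target = 3                        # index of the masked-out token

logits = E @ h
# standard MLM loss: -log softmax(logits)[target]
loss = -logits[target] + np.log(np.exp(logits).sum())

alignment = -logits[target]                 # pulls h toward E[target]
uniformity = np.log(np.exp(logits).sum())   # pushes h away from all embeddings

assert np.isclose(loss, alignment + uniformity)
```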

In the alignment property, the sampled embeddings of masked tokens serve as anchors that align contextualized representations. Although such local anchors are essential for modeling local dependencies, the lack of global anchors brings several limitations.

Proposes to directly align the global information hidden in contextualized representations at all positions of a natural sentence, encouraging models to attend to the same global semantics when generating contextualized representations.
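One common way to implement this kind of global alignment is an InfoNCE-style contrastive loss; the sketch below is a generic version under assumed shapes, not the paper's exact objective. Two views of the same sentence's global representation form a positive pair, and other sentences in the batch act as negatives.

```python
import numpy as np

# Hedged sketch: InfoNCE-style contrastive loss over sentence-level
# ("global") representations. g1[i] and g2[i] are two views of sentence i.
def info_nce(g1, g2, tau=0.1):
    g1 = g1 / np.linalg.norm(g1, axis=1, keepdims=True)
    g2 = g2 / np.linalg.norm(g2, axis=1, keepdims=True)
    sim = g1 @ g2.T / tau                 # pairwise cosine similarities
    # positives on the diagonal; other sentences serve as negatives
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
g = rng.normal(size=(8, 16))
# matched views (small perturbation) vs. unrelated random vectors
loss_aligned = info_nce(g, g + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce(g, rng.normal(size=(8, 16)))
assert loss_aligned < loss_random
```

Minimizing this loss pushes the two global views of the same sentence together while keeping different sentences apart, which is the "same global semantics" constraint described above.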


In other words, the elements that should carry the same meaning, the global context dependency, are implemented via contrastive learning.

What MLM learns:

(1) the alignment between contextualized representations of surrounding tokens and the context-independent embedding of the target token

Embeddings are thus divided into two kinds, and when conveying context as input, the context-dependent representation is the one that gets used.