BERT
1. Self-supervised Learning
The system learns to predict one part of its input from other parts of the same input; the withheld portion serves as the supervisory signal, so no human annotation is needed.
2. Masking Input

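BERT's pre-training masks a fraction of the input tokens and asks the model to recover them. Below is a minimal plain-Python sketch of the standard BERT recipe (mask roughly 15% of tokens; of those, 80% become `[MASK]`, 10% become a random token, 10% are left unchanged). The function name and the toy vocabulary are illustrative, not from the original notes.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking for masked language modeling.

    Returns (masked_tokens, labels): labels hold the original token at
    masked positions and None elsewhere (only masked positions are scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")           # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep unchanged
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, vocab=tokens, seed=1)
```

The 80/10/10 split keeps the model from relying on literally seeing `[MASK]` at test time, since fine-tuning inputs never contain it.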
3. Next Sentence Prediction & Sentence Order Prediction
Next Sentence Prediction (NSP) turns out not to be helpful in practice.
Sentence Order Prediction (SOP, used in ALBERT) works better.

Note: [CLS] is a special token whose output representation is used for classification.
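The difference between the two tasks is how the negative training pairs are built. A sketch, with hypothetical helper names: NSP draws the second sentence at random from the corpus (often a different topic, so the task is easy), while SOP swaps the order of two genuinely consecutive sentences, forcing the model to learn discourse order rather than topic.

```python
import random

def nsp_pairs(sentences, rng):
    """NSP: positive = consecutive sentences; negative = random second sentence."""
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        pairs.append((a, b, 1))                      # actually consecutive
        pairs.append((a, rng.choice(sentences), 0))  # random, likely unrelated
    return pairs

def sop_pairs(sentences):
    """SOP: negative = the same consecutive sentences in swapped order."""
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        pairs.append((a, b, 1))   # correct order
        pairs.append((b, a, 0))   # swapped order
    return pairs

sentences = ["Sentence one.", "Sentence two.", "Sentence three."]
nsp = nsp_pairs(sentences, random.Random(0))
sop = sop_pairs(sentences)
```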
4. Pre-train & Fine-tune

How is the pre-trained model used in downstream tasks?
4.1 Text Classification

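For text classification, a small randomly initialized head is trained on top of the [CLS] output while the pre-trained encoder is fine-tuned. The sketch below shows only the head, with a random vector standing in for the real encoder output; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(cls_vector, W, b):
    """Linear classification head on top of BERT's [CLS] output.

    cls_vector: (hidden,) embedding of the [CLS] token from the encoder.
    Returns class probabilities via a numerically stable softmax.
    """
    logits = W @ cls_vector + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

hidden, num_classes = 768, 3
W = rng.normal(scale=0.02, size=(num_classes, hidden))  # randomly initialized head
b = np.zeros(num_classes)
cls_vector = rng.normal(size=hidden)  # stand-in for the real encoder output
probs = classify(cls_vector, W, b)
```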
4.2 Extraction-based Q&A


- Only two new vectors need to be randomly initialized: one for predicting the start position of the answer span and one for the end.
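The idea above can be sketched as follows: each document-token representation from BERT is dotted with the start vector and the end vector, and the answer span runs from the highest-scoring start position to the highest-scoring end position. Sizes and names are illustrative.

```python
import numpy as np

def extract_span(token_reps, start_vec, end_vec):
    """Extraction-based QA head.

    token_reps: (seq_len, hidden) document-token outputs from BERT.
    Scores each position against the start/end vectors and takes the
    argmax (softmax is monotonic, so it is omitted here).
    """
    start_scores = token_reps @ start_vec
    end_scores = token_reps @ end_vec
    return int(np.argmax(start_scores)), int(np.argmax(end_scores))

rng = np.random.default_rng(0)
seq_len, hidden = 10, 16
token_reps = rng.normal(size=(seq_len, hidden))  # stand-in for encoder outputs
start_vec = rng.normal(size=hidden)  # the two randomly initialized vectors
end_vec = rng.normal(size=hidden)
s, e = extract_span(token_reps, start_vec, end_vec)
```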
4.3 seq2seq

Ways of corruption:

These corruption strategies are compared in the T5 paper, which pre-trains on the C4 corpus.
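The corruption strategies explored for seq2seq pre-training (e.g. in BART) can be sketched in a few lines; the decoder is then trained to reconstruct the original sequence. The function below is a toy illustration, not any library's API.

```python
import random

def corrupt(tokens, method, rng):
    """Apply one input-corruption strategy to a token sequence."""
    t = list(tokens)
    if method == "mask":      # replace a random token with [MASK]
        t[rng.randrange(len(t))] = "[MASK]"
    elif method == "delete":  # drop a random token entirely
        del t[rng.randrange(len(t))]
    elif method == "permute": # shuffle the token order
        rng.shuffle(t)
    elif method == "rotate":  # rotate so a random token comes first
        k = rng.randrange(len(t))
        t = t[k:] + t[:k]
    return t

tokens = "the cat sat on the mat".split()
```

Deletion is harder than masking because the model must also infer where something is missing, not just what it was.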
5. General Language Understanding Evaluation
https://gluebenchmark.com/
6. BERT Embryology
At what point during pre-training does BERT learn POS tagging, syntactic parsing, and semantics?
7. Features & Interesting things
7.1 Contextualized word embedding
7.2 Protein

7.3 Multi-lingual BERT
A BERT model trained on text from many different languages.

7.3.1 Language Information
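One observation about multilingual BERT's embedding space is that the average embedding difference between two languages behaves like a "language vector": adding it to a representation shifts it from one language toward the other. A toy numpy sketch under that assumption (the synthetic data below builds in the offset so the effect is exact; real embeddings are noisier):

```python
import numpy as np

def language_shift(embeddings_a, embeddings_b):
    """Mean embedding difference between two languages ('language vector')."""
    return embeddings_b.mean(axis=0) - embeddings_a.mean(axis=0)

rng = np.random.default_rng(0)
hidden = 8
lang_offset = rng.normal(size=hidden)  # toy systematic language offset
en = rng.normal(size=(50, hidden))     # stand-in English embeddings
zh = en + lang_offset                  # same content, shifted language
shift = language_shift(en, zh)
shifted = en[0] + shift                # move one English vector toward Chinese
```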

Last update:
June 16, 2023
Authors: