BERT
1. Self-supervised Learning
The system learns to predict one part of its input from other parts of the same input; the withheld portion serves as the supervisory signal, so no human annotation is needed.
2. Masking Input

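BERT's pre-training masks a fraction of the input tokens and asks the model to recover them. Below is a minimal plain-Python sketch of the standard BERT recipe (mask roughly 15% of tokens; of those, 80% become `[MASK]`, 10% become a random token, 10% are left unchanged). The function name and the toy vocabulary are illustrative, not from the original notes.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking for masked language modeling.

    Returns (masked_tokens, labels): labels hold the original token at
    masked positions and None elsewhere (only masked positions are scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")           # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep unchanged
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, vocab=tokens, seed=1)
```

The 80/10/10 split keeps the model from relying on literally seeing `[MASK]` at test time, since fine-tuning inputs never contain it.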
3. Next Sentence Prediction & Sentence Order Prediction
Next Sentence Prediction (NSP) turns out not to be helpful in practice.
Sentence Order Prediction (SOP, used in ALBERT) works better.

Note: [CLS] is a special token whose output representation is used for classification.
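The difference between the two tasks is how the negative training pairs are built. A sketch, with hypothetical helper names: NSP draws the second sentence at random from the corpus (often a different topic, so the task is easy), while SOP swaps the order of two genuinely consecutive sentences, forcing the model to learn discourse order rather than topic.

```python
import random

def nsp_pairs(sentences, rng):
    """NSP: positive = consecutive sentences; negative = random second sentence."""
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        pairs.append((a, b, 1))                      # actually consecutive
        pairs.append((a, rng.choice(sentences), 0))  # random, likely unrelated
    return pairs

def sop_pairs(sentences):
    """SOP: negative = the same consecutive sentences in swapped order."""
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        pairs.append((a, b, 1))   # correct order
        pairs.append((b, a, 0))   # swapped order
    return pairs

sentences = ["Sentence one.", "Sentence two.", "Sentence three."]
nsp = nsp_pairs(sentences, random.Random(0))
sop = sop_pairs(sentences)
```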
4. Pre-train & Fine-tune

How is the pre-trained model used in downstream tasks?
4.1 Text Classification

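For text classification, a small randomly initialized head is trained on top of the [CLS] output while the pre-trained encoder is fine-tuned. The sketch below shows only the head, with a random vector standing in for the real encoder output; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(cls_vector, W, b):
    """Linear classification head on top of BERT's [CLS] output.

    cls_vector: (hidden,) embedding of the [CLS] token from the encoder.
    Returns class probabilities via a numerically stable softmax.
    """
    logits = W @ cls_vector + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

hidden, num_classes = 768, 3
W = rng.normal(scale=0.02, size=(num_classes, hidden))  # randomly initialized head
b = np.zeros(num_classes)
cls_vector = rng.normal(size=hidden)  # stand-in for the real encoder output
probs = classify(cls_vector, W, b)
```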
4.2 Extraction-based Q&A


- Only two new vectors need to be randomly initialized: one for predicting the start position of the answer span and one for the end.
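The idea above can be sketched as follows: each document-token representation from BERT is dotted with the start vector and the end vector, and the answer span runs from the highest-scoring start position to the highest-scoring end position. Sizes and names are illustrative.

```python
import numpy as np

def extract_span(token_reps, start_vec, end_vec):
    """Extraction-based QA head.

    token_reps: (seq_len, hidden) document-token outputs from BERT.
    Scores each position against the start/end vectors and takes the
    argmax (softmax is monotonic, so it is omitted here).
    """
    start_scores = token_reps @ start_vec
    end_scores = token_reps @ end_vec
    return int(np.argmax(start_scores)), int(np.argmax(end_scores))

rng = np.random.default_rng(0)
seq_len, hidden = 10, 16
token_reps = rng.normal(size=(seq_len, hidden))  # stand-in for encoder outputs
start_vec = rng.normal(size=hidden)  # the two randomly initialized vectors
end_vec = rng.normal(size=hidden)
s, e = extract_span(token_reps, start_vec, end_vec)
```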
4.3 seq2seq

Ways of corruption:

These corruption strategies are compared in the T5 paper, which pre-trains on the C4 corpus.
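The corruption strategies explored for seq2seq pre-training (e.g. in BART) can be sketched in a few lines; the decoder is then trained to reconstruct the original sequence. The function below is a toy illustration, not any library's API.

```python
import random

def corrupt(tokens, method, rng):
    """Apply one input-corruption strategy to a token sequence."""
    t = list(tokens)
    if method == "mask":      # replace a random token with [MASK]
        t[rng.randrange(len(t))] = "[MASK]"
    elif method == "delete":  # drop a random token entirely
        del t[rng.randrange(len(t))]
    elif method == "permute": # shuffle the token order
        rng.shuffle(t)
    elif method == "rotate":  # rotate so a random token comes first
        k = rng.randrange(len(t))
        t = t[k:] + t[:k]
    return t

tokens = "the cat sat on the mat".split()
```

Deletion is harder than masking because the model must also infer where something is missing, not just what it was.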
5. General Language Understanding Evaluation
https://gluebenchmark.com/
6. BERT Embryology
At what point during pre-training does BERT learn POS tagging, syntactic parsing, and semantics?
7. Features & Interesting things
7.1 Contextualized word embedding
7.2 Protein

7.3 Multi-lingual BERT
A BERT model trained on text from many different languages.

7.3.1 Language Information
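One observation about multilingual BERT's embedding space is that the average embedding difference between two languages behaves like a "language vector": adding it to a representation shifts it from one language toward the other. A toy numpy sketch under that assumption (the synthetic data below builds in the offset so the effect is exact; real embeddings are noisier):

```python
import numpy as np

def language_shift(embeddings_a, embeddings_b):
    """Mean embedding difference between two languages ('language vector')."""
    return embeddings_b.mean(axis=0) - embeddings_a.mean(axis=0)

rng = np.random.default_rng(0)
hidden = 8
lang_offset = rng.normal(size=hidden)  # toy systematic language offset
en = rng.normal(size=(50, hidden))     # stand-in English embeddings
zh = en + lang_offset                  # same content, shifted language
shift = language_shift(en, zh)
shifted = en[0] + shift                # move one English vector toward Chinese
```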

Last update:
June 16, 2023
Authors: