BERT
1. Self-supervised Learning
The model learns to predict one part of its input from other parts of the input: a portion of the input itself serves as the supervisory signal, so no manually labeled data is needed.
2. Masking Input
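BERT's masked language modeling randomly selects 15% of the input tokens as prediction targets; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. Below is a minimal PyTorch sketch of that masking recipe (the function name is illustrative; -100 is PyTorch's default ignore index for cross-entropy loss):

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style corruption of token ids for masked language modeling."""
    labels = input_ids.clone()
    # Pick 15% of positions as prediction targets.
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100  # positions ignored by the cross-entropy loss

    # 80% of the picked positions become [MASK].
    to_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[to_mask] = mask_token_id

    # 10% become a random token; the remaining 10% stay unchanged.
    to_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~to_mask
    input_ids[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]

    # The model is trained to recover the original token at every
    # masked position, via cross-entropy against `labels`.
    return input_ids, labels
```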
3. Next Sentence Prediction & Sentence Order Prediction
Next Sentence Prediction (NSP) turns out not to be helpful (RoBERTa drops it entirely), whereas Sentence Order Prediction (SOP, used in ALBERT) does work: deciding whether two sentences appear in the correct order is a harder, more informative task than deciding whether they are adjacent at all.
Note: [CLS] is a special token prepended to the input; its output embedding is used for classification.
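For these sentence-level tasks the two sentences are packed as [CLS] A [SEP] B [SEP], and the classifier reads the output at [CLS]. A small sketch, assuming the Hugging Face transformers library:

```python
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")

# Pack a sentence pair as [CLS] A [SEP] B [SEP]; token_type_ids mark
# which segment each token belongs to.
enc = tok("The cat sat down.", "It fell asleep.", return_tensors="pt")
print(tok.convert_ids_to_tokens(enc["input_ids"][0].tolist()))
print(enc["token_type_ids"])
# The [CLS] output embedding feeds a binary classifier
# (NSP: is B the next sentence? / SOP: are A and B in the right order?).
```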
4. Pre-train & Fine-tune
How is the pre-trained model used? Fine-tune it, together with a small task-specific head, on each downstream task:
4.1 Text Classification
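For classification, a randomly initialized linear head is placed on the [CLS] output and the whole network is fine-tuned end to end. A minimal sketch, assuming the Hugging Face transformers library (the class name is illustrative):

```python
import torch.nn as nn
from transformers import BertModel

class BertClassifier(nn.Module):
    """Pre-trained BERT plus a randomly initialized linear head on [CLS]."""
    def __init__(self, num_classes, name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)  # pre-trained weights
        self.head = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # embedding of the [CLS] token
        return self.head(cls)              # logits for each class
```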
4.2 Extraction-based Q&A
- Only two task-specific vectors need to be randomly initialized: one scores each document token as the beginning of the answer span, the other as its end (see the sketch below).
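Each vector is dot-producted with every document token's output embedding, and the highest-scoring positions give the span. A toy sketch (shapes and names illustrative; a real implementation scores only document positions, applies a softmax, and enforces start ≤ end):

```python
import torch

def answer_span(hidden_states, start_vec, end_vec):
    # hidden_states: (seq_len, hidden) BERT outputs for
    # "[CLS] question [SEP] document [SEP]"
    start_scores = hidden_states @ start_vec  # one dot product per token
    end_scores = hidden_states @ end_vec
    return start_scores.argmax().item(), end_scores.argmax().item()

hidden = torch.randn(128, 768)  # stand-in for real BERT outputs
start_vec = torch.randn(768)    # randomly initialized, learned in fine-tuning
end_vec = torch.randn(768)
s, e = answer_span(hidden, start_vec, end_vec)
print(s, e)  # token positions of the predicted answer span
```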
4.3 seq2seq
To pre-train a seq2seq model, the encoder input is corrupted and the decoder must reconstruct the original text. Ways of corruption include token masking, token deletion, text infilling, sentence permutation, and document rotation (the schemes explored in BART); see the sketch below.
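A toy illustration of three of these corruption schemes (the helper functions are purely illustrative):

```python
import random

def token_masking(tokens, p=0.15, mask="[MASK]"):
    """Replace random tokens with a mask symbol (as in BERT)."""
    return [mask if random.random() < p else t for t in tokens]

def token_deletion(tokens, p=0.15):
    """Drop random tokens; the model must also infer where they were."""
    return [t for t in tokens if random.random() >= p]

def sentence_permutation(sentences):
    """Shuffle sentence order; the decoder restores the original order."""
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled

print(token_masking("the cat sat on the mat".split()))
```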
Comparison of these ways: T5 (Text-to-Text Transfer Transformer) benchmarks such design choices, pre-training on the C4 (Colossal Clean Crawled Corpus) dataset.
5. General Language Understanding Evaluation (GLUE)
https://gluebenchmark.com/
6. BERT Embryology
At what point during pre-training does BERT acquire POS tagging, syntactic parsing, and semantic knowledge?
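Questions like these are typically answered with probing: a small classifier is trained on frozen representations taken from successive pre-training checkpoints, and the checkpoint where probe accuracy jumps marks when the ability emerges. A hypothetical PyTorch sketch:

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """A linear probe mapping frozen BERT hidden states to POS tags."""
    def __init__(self, hidden_size=768, num_tags=17):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_tags)

    def forward(self, hidden_states):
        # hidden_states come from a frozen BERT checkpoint (no gradients
        # flow into BERT), so probe accuracy reflects only what that
        # checkpoint already encodes.
        return self.linear(hidden_states)

probe = LinearProbe()
logits = probe(torch.randn(2, 128, 768))  # (batch, seq_len, num_tags)
```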
7. Features & Interesting things
7.1 Contextualized word embedding
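Unlike static word vectors, BERT gives the same word a different embedding in each context. The classic demonstration compares "bank" in a financial sentence versus a river sentence; a sketch assuming the Hugging Face transformers library:

```python
import torch
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    idx = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist()).index(word)
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, idx]

a = embed_word("he deposited cash at the bank", "bank")
b = embed_word("she sat by the river bank", "bank")
# The two embeddings differ noticeably (cosine similarity well below 1.0).
print(float(torch.cosine_similarity(a, b, dim=0)))
```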
7.2 Protein
7.3 Multi-lingual BERT
Training a single BERT model on text from many different languages; remarkably, it can then be fine-tuned on a task in one language and evaluated in another.
7.3.1 Language Information
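Even though multilingual BERT aligns languages well enough for cross-lingual transfer, its embeddings still encode which language a token came from: the average embedding difference between two languages behaves like a "language vector", and adding it can shift a representation from one language's region toward another's. A toy sketch of that idea (data and names purely illustrative):

```python
import torch

def language_vector(src_embs, tgt_embs):
    """Mean embedding difference between two languages."""
    return tgt_embs.mean(dim=0) - src_embs.mean(dim=0)

def shift_language(hidden, lang_vec, alpha=1.0):
    """Move a hidden state toward the target language's region."""
    return hidden + alpha * lang_vec

en = torch.randn(1000, 768)  # stand-in for English token embeddings
zh = torch.randn(1000, 768)  # stand-in for Chinese token embeddings
v = language_vector(en, zh)
shifted = shift_language(torch.randn(768), v)
```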