
BERT


1. Self-supervised Learning

The system learns to predict one part of its input from other parts of the input: a portion of the input itself serves as the supervisory signal, so no human-annotated labels are needed.

2. Masking Input

(Figure: randomly masking input tokens and training BERT to predict the originals)
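
As a sketch of what masked-token prediction looks like in practice, the snippet below fills in a masked word with a pre-trained BERT. The Hugging Face `transformers` library is an assumption of this example, not something the notes prescribe:

```python
# Minimal masked-language-model demo (Hugging Face `transformers` is
# an assumption of this sketch, not part of the original notes).
from transformers import pipeline

# BERT was pre-trained by masking random input tokens and predicting them.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```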

3. Next Sentence Prediction & Sentence Order Prediction

NSP (Next Sentence Prediction) is not helpful: deciding whether two sentences appeared next to each other is too easy to teach the model much, and later work such as RoBERTa drops it.

SOP (Sentence Order Prediction), used in ALBERT, works: the two sentences are always consecutive, and the model must decide whether they are in the original order, which is a harder task.

(Figure: next sentence prediction using the [CLS] token)

Note: [CLS] is a special token prepended to the input; the output embedding at its position is used for classification.
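
A hedged sketch of the NSP head, again assuming Hugging Face `transformers`: the [CLS] output feeds a binary classifier that decides whether the second sentence follows the first.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Input layout: [CLS] sentence_a [SEP] sentence_b [SEP]
inputs = tokenizer("The cat sat down.", "It fell asleep at once.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # index 0: "is next", index 1: "is not"
print("P(is next) =", torch.softmax(logits, dim=-1)[0, 0].item())
```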

4. Pre-train & Fine-tune

(Figure: pre-training followed by fine-tuning on a downstream task)

How is BERT used for downstream tasks?

4.1 Text Classification

(Figure: sentence-level classification with BERT)
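
A minimal fine-tuning sketch: a randomly initialized classification head sits on top of the pre-trained encoder, and the whole network is updated. The example sentences and labels are made up for illustration, and the library choice is an assumption.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Only the classification head on top of [CLS] starts from scratch.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])  # hypothetical sentiment labels

# One gradient step: encoder and head are fine-tuned together.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```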

4.2 Extraction-based Q&A

(Figures: extraction-based QA, predicting the start and end positions of the answer span within the document)

  • Only two task-specific vectors need to be randomly initialized: one scores each token as the start of the answer span and the other as the end (a sketch follows below); the rest of the network is the pre-trained BERT.
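
A PyTorch sketch of those two vectors; the hidden size 768 follows bert-base, and the random tensors stand in for real BERT outputs:

```python
import torch

hidden = torch.randn(1, 32, 768)  # stand-in for BERT outputs over 32 tokens
start_vec = torch.randn(768)      # randomly initialized "start" vector
end_vec = torch.randn(768)        # randomly initialized "end" vector

# Dot each token embedding with the two vectors, softmax over positions.
start_scores = torch.softmax(hidden @ start_vec, dim=-1)
end_scores = torch.softmax(hidden @ end_vec, dim=-1)

# The answer span runs from the best start position to the best end position.
span = (start_scores.argmax(dim=-1).item(), end_scores.argmax(dim=-1).item())
print("predicted span:", span)
```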

4.3 seq2seq

(Figure: pre-training a seq2seq model by corrupting the encoder input and reconstructing it in the decoder)

Ways of corrupting the input (e.g., masking spans, deleting tokens, permuting or rotating the sentence):

(Figure: ways of corrupting the input)

These corruption strategies were compared systematically in T5, which pre-trains on the C4 (Colossal Clean Crawled Corpus) dataset; simple sketches of a few of them are shown below.
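
Simplified illustrations of three corruption strategies (the function names and probabilities are this sketch's own, not from the notes):

```python
import random

def mask_tokens(tokens, p=0.15):
    """Replace random tokens with a [MASK] placeholder."""
    return [t if random.random() > p else "[MASK]" for t in tokens]

def delete_tokens(tokens, p=0.15):
    """Drop random tokens; the model must also infer where they were."""
    return [t for t in tokens if random.random() > p]

def permute_sentence(tokens):
    """Shuffle the token order; the model must restore it."""
    shuffled = tokens[:]
    random.shuffle(shuffled)
    return shuffled

tokens = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(tokens))
print(delete_tokens(tokens))
print(permute_sentence(tokens))
```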

5. General Language Understanding Evaluation

https://gluebenchmark.com/
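
GLUE bundles nine English sentence-understanding tasks (sentiment, paraphrase, inference, and so on) with a public leaderboard. A sketch of loading one task with the Hugging Face `datasets` library, which is an assumption of this example:

```python
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")  # SST-2: one of the nine GLUE tasks
print(sst2["train"][0])              # {'sentence': ..., 'label': ..., 'idx': ...}
```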

6. BERT Embryology

At what point during pre-training does BERT acquire part-of-speech tagging, syntactic parsing, and semantic knowledge? "BERT embryology" probes checkpoints saved throughout pre-training to answer this.

7. Features & Interesting things

7.1 Contextualized word embedding
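
Unlike static word embeddings, BERT assigns the same word different vectors in different contexts. A hedged sketch comparing the two embeddings of "apple" (Hugging Face `transformers` assumed; the sentences are made up):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

fruit = word_embedding("I ate an apple for lunch.", "apple")
company = word_embedding("Apple released a new phone.", "apple")
print(torch.cosine_similarity(fruit, company, dim=0).item())  # noticeably < 1
```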

7.2 Protein

A BERT model pre-trained on English text can be fine-tuned to classify protein, DNA, or music sequences (with each sequence symbol mapped to an arbitrary English token), and it still outperforms a randomly initialized model.

(Figure: classification results on protein, DNA, and music sequences)

7.3 Multi-lingual BERT

Training one BERT model on text from many different languages. Fine-tuned on QA data in English, it can then answer questions posed in Chinese: zero-shot cross-lingual transfer.

(Figure: multi-lingual BERT results)
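
A small sketch showing a single multi-lingual checkpoint handling more than one language (`bert-base-multilingual-cased` is the standard Hugging Face checkpoint, assumed here):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
print(fill("Paris is the capital of [MASK].")[0]["token_str"])      # English
print(fill("Paris est la capitale de la [MASK].")[0]["token_str"])  # French
```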

7.3.1 Language Information

Even though multi-lingual BERT maps different languages into one embedding space, the embeddings still carry language information: averaging all embeddings per language and adding the difference of the two means to a sentence's embeddings shifts it from one language toward the other.

(Figure: language information in multi-lingual BERT embeddings)
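
A sketch of the mean-shift arithmetic with placeholder tensors; in real use the means would be averaged BERT embeddings over many English and Chinese sentences, so the random tensors here only illustrate the operation:

```python
import torch

english_embs = torch.randn(1000, 768)  # stand-ins for English token embeddings
chinese_embs = torch.randn(1000, 768)  # stand-ins for Chinese token embeddings

# The difference of the means captures a "language direction".
shift = chinese_embs.mean(dim=0) - english_embs.mean(dim=0)

# Adding the shift to an English sentence's embeddings moves them toward
# the Chinese region of the embedding space.
english_sentence = torch.randn(10, 768)
pseudo_chinese = english_sentence + shift
print(pseudo_chinese.shape)
```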


Last update: June 16, 2023
Authors: Colin