Transformer
(Ref: The Illustrated Transformer)
1. Seq2seq
Input a sequence, output a sequence.
The output length is determined by the model.
Applications are numerous; selected examples: syntactic parsing, multi-label classification, and even object detection.
2. Encoder
Summary:
- The input first goes through positional encoding (see the sketch after this list);
- it then passes through several identically structured feature-extraction blocks, each consisting of:
    - multi-head self-attention
    - residual addition & norm
    - feed forward
    - residual addition & norm
- and finally the output is produced.
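The positional encoding in the original paper is the fixed sinusoidal one; below is a minimal PyTorch sketch (the function name is ours):

```python
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = torch.arange(max_len).unsqueeze(1).float()        # (max_len, 1)
    i = torch.arange(0, d_model, 2).float()                 # (d_model/2,)
    angle = pos / torch.pow(10000.0, i / d_model)           # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

x = torch.randn(1, 50, 512)                   # (batch, seq_len, d_model) embeddings
x = x + sinusoidal_positional_encoding(50, 512)  # added before the first block
```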
Structure of the encoder:
Details of one encoder block:
- Self-attention: builds connections between the positions of the input sequence
- FC: first increases the dimension and then decreases (recovers) it, to increase expressive power
- Better design: move Layer Normalization in front of each sub-layer (pre-LN) instead of after it (see the sketch below)
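A minimal PyTorch sketch of one encoder block; the `pre_ln` flag switches between the post-LN ordering of the original paper and the pre-LN "better design" (class and argument names are ours):

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 d_ff: int = 2048, pre_ln: bool = True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # FC sub-layer: expand the dimension, then recover it.
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.pre_ln = pre_ln

    def forward(self, x):
        if self.pre_ln:   # pre-LN: norm -> sub-layer -> residual addition
            h = self.norm1(x)
            x = x + self.attn(h, h, h)[0]
            x = x + self.ffn(self.norm2(x))
        else:             # post-LN, as in the original paper
            x = self.norm1(x + self.attn(x, x, x)[0])
            x = self.norm2(x + self.ffn(x))
        return x
```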
3. Decoder
Summary:
- The input is fed in step by step: the decoder first receives \<BOS> at the bottom, outputs one token, and then takes that token as its second input, and so on (see the decoding sketch after this list).
- The input first goes through positional encoding.
- It then passes through several identically structured blocks, each consisting of:
    - masked multi-head attention (with residual addition & norm)
    - multi-head cross-attention (with residual addition & norm)
    - FFN (with residual addition & norm)
- Finally, a softmax outputs a predicted probability for every token.
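A minimal greedy-decoding sketch of this loop; `model(src, tgt)` returning per-position logits is an assumed interface, and all names are hypothetical:

```python
import torch

def greedy_decode(model, src, bos_id: int, eos_id: int, max_len: int = 50):
    """Autoregressive decoding: start from <BOS>, repeatedly feed the
    sequence generated so far back into the decoder."""
    tgt = torch.tensor([[bos_id]])                    # (1, 1), starts with <BOS>
    for _ in range(max_len):
        logits = model(src, tgt)                      # assumed shape (1, t, vocab)
        next_id = logits[0, -1].argmax().item()       # greedy pick at the last position
        tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:                         # stop once <EOS> is produced
            break
    return tgt[0, 1:]                                 # generated tokens, <BOS> dropped
```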
3.1 Autoregressive
3.1.1 Overall
3.1.2 Masked Self-attention
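Each decoder position may attend only to itself and earlier positions, since later tokens do not yet exist at decoding time. A minimal sketch of the mask applied to the attention scores:

```python
import torch

def causal_mask(t: int) -> torch.Tensor:
    """Boolean mask that is True above the diagonal: position i may attend
    only to positions <= i."""
    return torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)

t = 5
scores = torch.randn(t, t)                                  # raw attention scores
scores = scores.masked_fill(causal_mask(t), float("-inf"))  # hide future positions
attn = scores.softmax(dim=-1)                               # masked entries get weight 0
```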
3.2 Non-autoregressive
NAT (the non-autoregressive Transformer) generates all output tokens in parallel, so the output length can be controlled explicitly (e.g., by a separate length predictor).
Multi-modality: because the tokens are predicted independently, NAT may blend several valid outputs together, which usually hurts quality.
4. Encoder-Decoder
4.1 Cross Attention
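In cross attention, the queries come from the decoder, while the keys and values come from the encoder output. A minimal sketch using PyTorch's `nn.MultiheadAttention`:

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

enc_out = torch.randn(1, 20, d_model)   # encoder output (batch, src_len, d_model)
dec_h   = torch.randn(1,  7, d_model)   # decoder hidden states (batch, tgt_len, d_model)

# Queries from the decoder; keys and values from the encoder.
out, weights = cross_attn(query=dec_h, key=enc_out, value=enc_out)
# out: (1, 7, d_model); weights: (1, 7, 20), one attention row per target position
```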
5. Training
5.1 Teacher Forcing
Use the ground truth as the decoder input during training (see the sketch below).
- loss: the sum of per-token cross entropy
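A minimal sketch of one teacher-forced training step; `model(src, tgt_in)` returning logits is an assumed interface:

```python
import torch
import torch.nn.functional as F

def training_step(model, src, tgt):
    """Teacher forcing: the decoder input is the ground-truth sequence shifted
    right; the loss is the sum of per-token cross entropy."""
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]   # <BOS> w1 w2 ... vs. w1 w2 ... <EOS>
    logits = model(src, tgt_in)                 # assumed shape (batch, t, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt_out.reshape(-1), reduction="sum")
    return loss
```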
5.2 Copy Mechanism
Copy parts of the input directly into the output instead of generating them.
Applications: chatbots, article summarization, ...
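One common realization is a pointer-style mixture of the generation distribution and the attention over the source, as in pointer-generator networks; a sketch, with all tensor names ours:

```python
import torch

def copy_distribution(vocab_dist, attn, src_ids, p_gen):
    """Mix generating from the vocabulary with copying from the source.

    vocab_dist: (batch, vocab)   softmax over the output vocabulary
    attn:       (batch, src_len) attention weights over the source tokens
    src_ids:    (batch, src_len) vocabulary ids of the source tokens
    p_gen:      (batch, 1)       probability of generating vs. copying
    """
    copy_dist = torch.zeros_like(vocab_dist)
    copy_dist.scatter_add_(1, src_ids, attn)   # route attention mass to source ids
    return p_gen * vocab_dist + (1 - p_gen) * copy_dist
```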
5.3 Guided Attention
Guide the attention pattern to avoid obvious mistakes (e.g., in speech synthesis the attention should move monotonically over the input):
- Monotonic attention
- Location-aware attention
5.4 Beam Search
Note: randomness is needed in the decoder when generating sequences for some (creative) tasks, so the single highest-scoring path found by beam search is not always the best choice.
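A minimal beam-search sketch: keep the `beam_size` best partial sequences by total log probability instead of the single greedy choice. `model(src, tgt)` returning logits is again an assumed interface:

```python
import torch

def beam_search(model, src, bos_id, eos_id, beam_size=4, max_len=50):
    beams = [(0.0, [bos_id])]                          # (log prob, token list)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:                      # finished beams stay as-is
                candidates.append((score, seq))
                continue
            logits = model(src, torch.tensor([seq]))   # assumed shape (1, t, vocab)
            logp = logits[0, -1].log_softmax(dim=-1)
            topv, topi = logp.topk(beam_size)
            for v, i in zip(topv.tolist(), topi.tolist()):
                candidates.append((score + v, seq + [i]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(seq[-1] == eos_id for _, seq in beams):
            break
    return beams[0][1]                                 # best-scoring sequence
```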
5.5 Optimizing Evaluation Metrics
5.5.1 BLEU score
- numerator: the number of n-grams in the given candidate that also appear in the reference
- denominator: the total number of n-grams in the candidate
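A sketch of this (modified, count-clipped) n-gram precision; full BLEU additionally combines n = 1..4 and applies a brevity penalty:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: matched candidate n-grams over all
    candidate n-grams, with matches clipped by reference counts."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    matched = sum(min(c, ref[g]) for g, c in cand.items())
    return matched / max(sum(cand.values()), 1)

print(ngram_precision("the cat sat on the mat".split(),
                      "the cat is on the mat".split(), n=2))  # 3 of 5 bigrams -> 0.6
```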
We can use the BLEU score at validation time to select the best model.
However, it cannot be used for training, because it is not differentiable, so we cannot optimize it as a loss function.
Rule: When you don’t know how to optimize, just use reinforcement learning (RL)!
5.6 Scheduled Sampling
5.6.1 Exposure Bias
There is a mismatch caused by teacher forcing: during training the decoder only ever sees ground-truth inputs, but at test time it sees its own (possibly wrong) predictions.
Scheduled sampling: during training, occasionally feed the model's own predictions to the decoder instead of the ground truth, increasing this probability over time (see the sketch below).
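A sketch of scheduled sampling as a single-pass approximation often used with Transformers (the original formulation samples token by token); the `model(src, tgt)` interface is assumed as before:

```python
import torch

def scheduled_sampling_inputs(model, src, tgt_in, sample_prob: float):
    """With probability `sample_prob`, replace each ground-truth input token
    with the model's own prediction from the previous step."""
    logits = model(src, tgt_in)                       # assumed shape (batch, t, vocab)
    preds = logits.argmax(dim=-1)                     # model's own predictions
    # Shift right so position i sees the prediction made for position i - 1;
    # the first column (<BOS>) is always kept.
    preds = torch.cat([tgt_in[:, :1], preds[:, :-1]], dim=1)
    use_pred = torch.rand_like(tgt_in, dtype=torch.float) < sample_prob
    return torch.where(use_pred, preds, tgt_in)
```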