首页 > 技术文章 > BERT的双向编码与BiLSTM的编码的不同之处

hisi-tech 2022-04-19 17:21 原文

感觉会有用,先记录下来,如果大家看了有帮助,深感荣幸,若不幸点开了,万分抱歉。
Instead of predicting the next word in a sequence, BERT makes use of a novel technique called Masked LM (MLM): it randomly masks words in the sentence and then it tries to predict them. Masking means that the model looks in both directions and it uses the full context of the sentence, both left and right surroundings, in order to predict the masked word. Unlike the previous language models, it takes both the previous and next tokens into account at the same time. The existing combined left-to-right and right-to-left LSTM based models were missing this “same-time part”. (It might be more accurate to say that BERT is non-directional though.)
[link]

推荐阅读