BERT, introduced by Devlin et al. (2019), is pre-trained with a masked-language-modelling objective: it hides random tokens and learns to predict them from both left and right context. This bidirectional reading made it exceptionally strong at language understanding and reset the state of the art across many NLP benchmarks.
Unlike GPT, BERT is an encoder built for comprehension rather than open-ended generation — still the backbone of many search and classification systems.