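Srivastava et al. (2014) introduced dropout: during each training step, a random fraction of units is temporarily removed, along with their incoming and outgoing connections, so the network cannot rely on any single neuron. Each step therefore trains a different "thinned" network drawn from an exponentially large family that shares the same weights. The effect resembles training a large ensemble, and it markedly reduces overfitting.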
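To make the mechanism concrete, here is a minimal sketch in plain Python. It assumes a layer's activations are a flat list of floats, and the function name `dropout` and its parameters are hypothetical. The sketch uses "inverted dropout", a common variant: survivors are scaled up by `1 / (1 - drop_prob)` during training, so nothing needs rescaling at test time. The original paper instead leaves training activations unscaled and multiplies the weights by the retention probability at test time. Each call samples a fresh mask, which corresponds to training one thinned network.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], drop_prob: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical inverted-dropout helper for a flat list of activations.

    Training: each unit is zeroed independently with probability drop_prob,
    and surviving units are scaled by 1 / (1 - drop_prob) so the expected
    activation matches the no-dropout value.
    Evaluation: the input is returned unchanged (the full network is used).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # A new binary mask on every call: one "thinned" network per training step.
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in activations]
    # Zero the dropped units and rescale the kept ones (inverted dropout).
    return [a * m / keep_prob for a, m in zip(activations, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0] * 10
    # Training: some entries become 0.0, the rest become 2.0 (1.0 / 0.5).
    print(dropout(hidden, 0.5, training=True, rng=rng))
    # Evaluation: activations pass through untouched.
    print(dropout(hidden, 0.5, training=False))
```

Scaling at training time is a design choice. It keeps inference identical to an ordinary forward pass, and in expectation it matches the paper's approach of rescaling weights at test time.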
Dropout became a standard ingredient in deep networks and is still used, in various forms, in modern architectures.