AUBER: Automated BERT regularization.
How can we effectively regularize BERT? Although BERT has proven effective in various NLP tasks, it often overfits when there are only a small number of training instances. A promising direction for regularizing BERT is to prune its attention heads using a proxy score for head importance. However, these methods are usua