ROUGE (Recall-Oriented Understudy for Gisting Evaluation), introduced by Lin (2004), scores a summary by its overlap with human reference summaries, counted as shared n-grams or as the longest common subsequence. It emphasises recall, i.e. how much of the reference content the summary captures, and it became the standard automatic metric for summarisation.
Like BLEU, ROUGE measures surface overlap rather than meaning, so it complements rather than replaces human review.
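
To make the recall idea concrete, the following is a minimal sketch of n-gram recall and longest-common-subsequence recall against a single reference. The function names (`rouge_n_recall`, `rouge_l_recall`) and the lowercase, whitespace-split tokenisation are assumptions made for illustration. The sketch omits refinements such as stemming and multi-reference aggregation that full implementations may apply, so its scores should not be expected to match a published ROUGE toolkit exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams also found in the candidate (clipped counts)."""
    # Assumption: naive lowercase + whitespace tokenisation, for illustration only.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by reference length (the recall side of an LCS-based score)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


reference = "the cat sat on the mat"
close_copy = "the cat lay on the mat"
paraphrase = "a feline rested on a rug"

print(rouge_n_recall(close_copy, reference, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(close_copy, reference, n=2))  # 3 of 5 reference bigrams matched
print(rouge_l_recall(close_copy, reference))       # LCS of 5 tokens over 6 reference tokens
print(rouge_n_recall(paraphrase, reference, n=1))  # only "on" matches: 1 of 6
```

The last call illustrates the surface-overlap limitation: a paraphrase that a human might judge acceptable shares only one word with the reference, so its recall is low.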