Extrinsic Evaluation of Automatic Metrics for Summarization

Title: Extrinsic Evaluation of Automatic Metrics for Summarization
Publication Type: Reports
Year of Publication: 2004
Authors: Dorr BJ, Monz C, Oard D, President S, Zajic D, Schwartz R
Date Published: 2004/07/20
Institution: Institute for Advanced Computer Studies, Univ of Maryland, College Park
Keywords: *ABSTRACTS, *ACCURACY, *DATA SUMMARIZATION, *DOCUMENT SUMMARIES, *DOCUMENTS, *READING MACHINES, *SOFTWARE METRICS, *STATISTICAL ANALYSIS, ABSTRACT SELECTION, ANNOTATIONS, Automation, COMPUTER PROGRAMMING AND SOFTWARE, COMPUTER SYSTEMS, CORRELATION TECHNIQUES, DATA PROCESSING, DOCUMENT SUMMARIES, EXTRINSIC TASKS, Information retrieval, INFORMATION SCIENCE, PERFORMANCE(HUMAN), PRECISION, RELEVANCY, ROUGE-1 COMPUTER PROGRAM, STATISTICS AND PROBABILITY, TEST AND EVALUATION
Abstract

This paper describes extrinsic-task evaluation of summarization. We show that it is possible to save time using summaries for relevance assessment without adversely impacting the degree of accuracy that would be possible with full documents. In addition, we demonstrate that the extrinsic task we have selected exhibits a high degree of interannotator agreement, i.e., consistent relevance decisions across subjects. We also conducted a composite experiment that better reflects the actual document selection process and found that using a surrogate improves the processing speed over reading the entire document. Finally, we have found a small yet statistically significant correlation between some of the intrinsic measures and a user's performance in an extrinsic task. The overall conclusion we can draw at this point is that ROUGE-1 does correlate with precision and to a somewhat lesser degree with accuracy, but that it remains to be investigated how stable these correlations are and how differences in ROUGE-1 translate into significant differences in human performance in an extrinsic task.
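For readers unfamiliar with the intrinsic measure named above, the following is a minimal sketch of ROUGE-1 as unigram-overlap recall against a single reference, plus a Pearson correlation helper for relating per-system metric scores to per-system human scores. It is an illustration under stated assumptions, not the authors' setup: the tokenization (lowercase, whitespace split, no stemming or stopword removal), the single-reference setting, and the choice of Pearson's r are all assumptions, since the abstract does not specify the ROUGE configuration or which correlation coefficient was used. The function names `rouge_1_recall` and `pearson` are hypothetical. Note also that "precision" in the abstract refers to users' relevance-assessment precision, not to any ROUGE variant.

```python
from collections import Counter
import math


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference unigram count.

    Assumption: lowercase + whitespace tokenization, one reference, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token,
    # so a candidate cannot get credit for a word more often than the reference has it.
    overlap = sum((cand & ref).values())
    total_ref = sum(ref.values())
    return overlap / total_ref if total_ref else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the report.
    print(rouge_1_recall("the cat sat", "the cat sat on the mat"))  # 3 of 6 reference unigrams matched
```

In an extrinsic study of this kind, one would score each summarization system with the intrinsic metric, measure human performance (for example, relevance-judgment precision) for the same systems, and pass the two aligned lists to the correlation function; whether the resulting coefficient is statistically significant is a separate test not shown here.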

URL: http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA448065