Navigating Through Summary Space: Selecting Summaries, Not Sentences
(English) Manuscript (preprint) (Other academic)
We present a novel method for extraction-based summarization using statistical lexical semantics. It attempts to give an overview by selecting the summary most similar to the source text from a set of possible candidates. It evaluates whole summaries at once, making no judgments on, for instance, individual sentences. A simple greedy search strategy can be used to search through a space of possible summaries. Starting the search with the leading sentences of the source text is a powerful heuristic, but we also evaluate other search strategies. The aim has been to construct a summarizer that can be quickly assembled, using only a few basic language tools. The proposed method is largely language independent and can be used even for languages that lack large amounts of structured or annotated data, or advanced tools for linguistic processing. When evaluated on English abstracts from the Document Understanding Conferences it performs well, though better language-specific systems are available. It performs better than several of the systems evaluated there, but worse than the best systems. We have also evaluated our method on a corpus of human-made extracts in Swedish. It performed poorly compared to a traditional extraction-based summarizer. However, since these human-made extracts were not produced to reflect the whole contents of the texts, but rather to cover only the main topic, this was expected.
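The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain bag-of-words cosine similarity stands in for the statistical lexical semantics, and the function names (`bag_of_words`, `cosine`, `score`, `greedy_summary`), the fixed summary size in sentences, and the swap-one-sentence neighbourhood are all assumptions made for illustration. The sketch starts from the leading sentences and keeps swapping one sentence at a time as long as the whole candidate summary becomes more similar to the source text.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Assumption: lowercased word counts stand in for the paper's
    # statistical lexical-semantic representation.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(sentences, selected, source_vec):
    # Score the candidate summary as a whole, not sentence by sentence.
    summary = " ".join(sentences[i] for i in sorted(selected))
    return cosine(bag_of_words(summary), source_vec)


def greedy_summary(sentences, size):
    # Hypothetical greedy search: start from the lead sentences and accept
    # the first single-sentence swap that raises the summary's similarity
    # to the source; stop when no swap improves it.
    source_vec = bag_of_words(" ".join(sentences))
    current = set(range(min(size, len(sentences))))
    best = score(sentences, current, source_vec)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for inn in range(len(sentences)):
                if inn in current:
                    continue
                candidate = (current - {out}) | {inn}
                s = score(sentences, candidate, source_vec)
                if s > best:
                    current, best, improved = candidate, s, True
                    break
            if improved:
                break
    return [sentences[i] for i in sorted(current)], best
```

Calling `greedy_summary(sentences, 2)` on a list of sentence strings returns the selected sentences in source order together with their similarity score. Because a swap is accepted only when the score strictly increases, the search ends in a local optimum that scores at least as high as the lead-sentence starting point.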
Identifiers
URN: urn:nbn:se:kth:diva-14086
OAI: oai:DiVA.org:kth-14086
DiVA: diva2:329603
QC 20100712
2010-07-12
Bibliographically approved