kth.se Publications
Improving abstraction in text summarization
KTH, School of Electrical Engineering and Computer Science (EECS). Salesforce Research.
Salesforce Research.
Salesforce Research.
Salesforce Research.
2018 (English). In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Association for Computational Linguistics, 2018, p. 1808-1817. Conference paper, published paper (refereed).
Abstract [en]

Abstractive text summarization aims to shorten long text documents into a human-readable form that contains the most important facts from the original document. However, the level of actual abstraction, as measured by novel phrases that do not appear in the source document, remains low in existing approaches. We propose two techniques to improve the level of abstraction of generated summaries. First, we decompose the decoder into a contextual network that retrieves relevant parts of the source document, and a pretrained language model that incorporates prior knowledge about language generation. Second, we propose a novelty metric that is optimized directly through policy learning to encourage the generation of novel phrases. Our model achieves results comparable to state-of-the-art models, as determined by ROUGE scores and human evaluations, while achieving a significantly higher level of abstraction as measured by n-gram overlap with the source document.
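The abstraction measure the abstract refers to, n-gram overlap between summary and source, can be illustrated with a short sketch. This is not the authors' released code; the function names and the whitespace tokenization are illustrative assumptions, showing only the general idea of counting summary n-grams that never occur in the source.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_ratio(source, summary, n):
    """Fraction of summary n-grams that do not appear in the source.

    A higher ratio indicates a more abstractive summary, since more
    of its phrasing is novel rather than copied from the document.
    Tokenization here is plain whitespace splitting (an assumption).
    """
    source_ngrams = ngrams(source.split(), n)
    summary_ngrams = ngrams(summary.split(), n)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - source_ngrams
    return len(novel) / len(summary_ngrams)
```

For example, a summary copied verbatim from the source scores 0.0 at every n, while a fully paraphrased summary approaches 1.0; the paper's novelty reward pushes the model toward the latter without sacrificing ROUGE.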

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2018. p. 1808-1817
Keywords [en]
Abstracting, Text processing, Contextual network, Human evaluation, Language generation, Level of abstraction, Policy learning, Prior knowledge, State of the art, Text summarization, Natural language processing systems
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-301118
ISI: 000865723401104
Scopus ID: 2-s2.0-85075219572
OAI: oai:DiVA.org:kth-301118
DiVA id: diva2:1593593
Conference
2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, 31 October 2018 - 4 November 2018, Brussels, Belgium
Note

Part of ISBN 978-194808784-1

QC 20230921

Available from: 2021-09-13. Created: 2021-09-13. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Scopus

Search in DiVA

By author/editor
Kryściński, Wojciech
By organisation
School of Electrical Engineering and Computer Science (EECS)
Natural Language Processing
