  • 1.
    Dalianis, Hercules
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Hassel, Martin
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Development of a Swedish Corpus for Evaluating Summarizers and other IR-tools. 2001. Report (Other academic)
    Abstract [en]

    We are presenting the construction of a Swedish corpus aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. We will also present the results of evaluating our Swedish text summarizer SweSum with this corpus. The corpus has been constructed by using Internet agents downloading Swedish newspaper text from various sources. A small part of this corpus has then been manually annotated. To evaluate our text summarizer SweSum we let ten students execute our text summarizer with increasing compression rates on the 100 manually annotated texts to find answers to predefined questions. The results showed that at 40 percent summarization/compression rate the correct answer rate was 84 percent.

  • 2.
    Hassel, Martin
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Evaluation of automatic text summarization: a practical implementation. 2004. Licentiate thesis, comprehensive summary (Other scientific)
  • 3.
    Hassel, Martin
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Exploitation of Named Entities in Automatic Text Summarization for Swedish. 2003. In: Proceedings of NODALIDA’03 – 14th Nordic Conference on Computational Linguistics, Reykjavik, Iceland, May 30–31 2003, 2003, p. 9-. Conference paper (Other academic)
    Abstract [en]

    Named Entities are often seen as important cues to the topic of a text. They are among the most information dense tokens of the text and largely define the domain of the text. Therefore, Named Entity Recognition should greatly enhance the identification of important text segments when used by an (extraction based) automatic text summarizer. We have compared Gold Standard summaries produced by majority votes over a number of manually created extracts with extracts created with our extraction based summarization system, SweSum. Furthermore we have taken an in-depth look at how overweighting of named entities affects the resulting summary and come to the conclusion that weighting of named entities should be carefully considered when used in a naïve fashion.

  • 4.
    Hassel, Martin
    KTH, Superseded Departments, Numerical Analysis and Computer Science, NADA.
    Internet as Corpus: Automatic Construction of a Swedish News Corpus. 2001. Report (Other academic)
    Abstract [en]

    This paper describes the automatic building of a corpus of short Swedish news texts from the Internet, its application and possible future use. The corpus is aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. The corpus has been constructed by using an Internet agent, the so-called newsAgent, downloading Swedish news text from various sources. A small part of this corpus has then been manually tagged with keywords and named entities. The newsAgent is also used as a workbench for processing the abundant flows of news texts for various users in a customized format in the application Nyhetsguiden.

  • 5.
    Hassel, Martin
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Resource Lean and Portable Automatic Text Summarization. 2007. Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    Today, with digitally stored information available in abundance, even for many minor languages, this information must by some means be filtered and extracted in order to avoid drowning in it. Automatic summarization is one such technique, where a computer summarizes a longer text to a shorter non-redundant form. Apart from the major languages of the world there are a lot of languages for which large bodies of data aimed at language technology research to a high degree are lacking. There might also not be resources available to develop such bodies of data, since it is usually time consuming and requires substantial manual labor, hence being expensive. Nevertheless, there will still be a need for automatic text summarization for these languages in order to subdue this constantly increasing amount of electronically produced text.

    This thesis thus sets the focus on automatic summarization of text and the evaluation of summaries using as few human resources as possible. The resources that are used should to as high extent as possible be already existing, not specifically aimed at summarization or evaluation of summaries and, preferably, created as part of natural literary processes. Moreover, the summarization systems should be able to be easily assembled using only a small set of basic language processing tools, again, not specifically aimed at summarization/evaluation. The summarization system should thus be near language independent as to be quickly ported between different natural languages.

    The research put forth in this thesis mainly concerns three computerized systems, one for near language independent summarization – The HolSum summarizer; one for the collection of large-scale corpora – The KTH News Corpus; and one for summarization evaluation – The KTH eXtract Corpus. These three systems represent three different aspects of transferring the proposed summarization method to a new language.

    One aspect is the actual summarization method and how it relates to the highly irregular nature of human language and to the difference in traits among language groups. This aspect is discussed in detail in Chapter 3. This chapter also presents the notion of “holistic summarization”, an approach to self-evaluative summarization that weighs the fitness of the summary as a whole, by semantically comparing it to the text being summarized, before presenting it to the user. This approach is embodied as the text summarizer HolSum, which is presented in this chapter and evaluated in Paper 5.

    A second aspect is the collection of large-scale corpora for languages where few or none such exist. This type of corpora is on the one hand needed for building the language model used by HolSum when comparing summaries on semantic grounds, on the other hand a large enough set of (written) language use is needed to guarantee the randomly selected subcorpus used for evaluation to be representative. This topic is briefly touched upon in Chapter 4, and detailed in Paper 1.

    The third aspect is, of course, the evaluation of the proposed summarization method on a new language. This aspect is investigated in Chapter 4. Evaluations of HolSum have been run on English as well as on Swedish, using both well established data and evaluation schemes (English) as well as with corpora gathered “in the wild” (Swedish). During the development of the latter corpora, which is discussed in Paper 4, evaluations of a traditional sentence ranking text summarizer, SweSum, have also been run. These can be found in Papers 2 and 3.

    This thesis thus contributes a novel approach to highly portable automatic text summarization, coupled with methods for building the needed corpora, both for training and evaluation on the new language.

  • 6.
    Hassel, Martin
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Dalianis, Hercules
    DSV-KTH / Stockholm University.
    Generation of Reference Summaries. 2005. In: Proceedings of 2nd Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznan, Poland, April 21–23 2005, 2005, p. 6-. Conference paper (Other academic)
    Abstract [en]

    We have constructed an integrated web-based system for collection of extract-based corpora and for evaluation of summaries and summarization systems. During evaluation and examination of the collected and generated data we found that in a situation of low agreement among the informants the corpus unduly favors summarization systems that use sentence position as a central weighting feature. The problem is discussed and a possible solution is outlined.

  • 7.
    Hassel, Martin
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Sjöbergh, Jonas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Navigating Through Summary Space: Selecting Summaries, Not Sentences. Manuscript (preprint) (Other academic)
    Abstract [en]

    We present a novel method for extraction based summarization using statistical lexical semantics. It attempts to give an overview by selecting the summary most similar to the source text from a set of possible candidates. It evaluates whole summaries at once, making no judgments on for instance individual sentences. A simple greedy search strategy can be used to search through a space of possible summaries. Starting the search with the leading sentences of the source text is a powerful heuristic, but we also evaluate other search strategies. The aim has been to construct a summarizer that can be quickly assembled, with the use of only a very few basic language tools. The proposed method is largely language independent and can be used even for languages that lack large amounts of structured or annotated data, or advanced tools for linguistic processing. When evaluated on English abstracts from the Document Understanding Conferences it performs well, though better language specific systems are available. It performs better than several of the systems evaluated there, but worse than the best systems. We have also evaluated our method on a corpus of human made extracts in Swedish. It performed poorly compared to a traditional extraction-based summarizer. However, since these man-made extracts were not produced to reflect the whole contents of the texts, but rather to cover only the main topic, this was expected.

  • 8.
    Rosell, Magnus
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Hassel, Martin
    KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV.
    Kann, Viggo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Global Evaluation of Random Indexing through Swedish Word Clustering Compared to the People’s Dictionary of Synonyms. 2009. In: Proceedings of the International Conference RANLP-2009, 2009, p. 376-380. Conference paper (Refereed)
    Abstract [en]

    Evaluation of word space models is usually local in the sense that it only considers words that are deemed very similar by the model. We propose a global evaluation scheme based on clustering of the words. A clustering of high quality in an external evaluation against a semantic resource, such as a dictionary of synonyms, indicates a word space model of high quality. We use Random Indexing to create several different models and compare them by clustering evaluation against the People's Dictionary of Synonyms, a list of Swedish synonyms that are graded by the public. Most notably we get better results for models based on syntagmatic information (words that appear together) than for models based on paradigmatic information (words that appear in similar contexts). This is quite contrary to previous results that have been presented for local evaluation. Clusterings to ten clusters result in a recall of 83% for a syntagmatic model, compared to 34% for a comparable paradigmatic model, and 10% for a random partition.
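
The greedy search over whole summaries described in the abstract of entry 7 (Navigating Through Summary Space) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, and a plain bag-of-words cosine similarity stands in for the statistical lexical semantics the abstract refers to. The search starts from the leading sentences and swaps one sentence at a time for as long as the whole summary becomes more similar to the source text.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; an assumed stand-in for a semantic vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_summary(sentences, size):
    """Return sorted sentence indices of a summary found by greedy swapping."""
    source = bag_of_words(" ".join(sentences))

    def score(indices):
        # Score the candidate summary as a whole against the full source text.
        summary = " ".join(sentences[i] for i in sorted(indices))
        return cosine(bag_of_words(summary), source)

    # Start from the leading sentences of the source text.
    current = set(range(min(size, len(sentences))))
    best = score(current)

    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for new in range(len(sentences)):
                if new in current:
                    continue
                candidate = (current - {out}) | {new}
                candidate_score = score(candidate)
                # Accept the first swap that strictly improves the summary.
                if candidate_score > best + 1e-12:
                    current, best, improved = candidate, candidate_score, True
                    break
            if improved:
                break
    return sorted(current)
```

Calling `greedy_summary(sentences, 3)` on a list of sentence strings returns the indices of three sentences. Because every accepted swap strictly raises the score, the loop always terminates, though only at a local optimum.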
