Internet as Corpus: Automatic Construction of a Swedish News Corpus
2001 (English). Report (Other academic)
This paper describes the automatic building of a corpus of short Swedish news texts from the Internet, its application and possible future use. The corpus is aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. The corpus has been constructed by using an Internet agent, the so-called newsAgent, which downloads Swedish news text from various sources. A small part of this corpus has then been manually tagged with keywords and named entities. The newsAgent is also used as a workbench for processing the abundant flows of news texts for various users in a customized format in the application Nyhetsguiden.
Place, publisher, year, edition, pages
Stockholm: KTH, 2001. 5 p.
IPLab, 195
News text, Corpus, Swedish, Internet
Identifiers
URN: urn:nbn:se:kth:diva-14075
OAI: oai:DiVA.org:kth-14075
DiVA: diva2:329580
QC 20100712. Also published in "Proceedings of NODALIDA’01 – 13th Nordic Conference on Computational Linguistics, Uppsala, Sweden, May 21–22 2001."
2010-07-12 2010-07-12 2010-07-16 Bibliographically approved