kth.sePublications
  • 1. Allan, James
    et al.
    Aslam, Jay
    Azzopardi, Leif
    Belkin, Nick
    Borlund, Pia
    Bruza, Peter
    Callan, Jamie
    Carman, Mark
    Clarke, Charles L.A.
    Craswell, Nick
    Croft, W. Bruce
    Culpepper, J. Shane
    Diaz, Fernando
    Dumais, Susan
    Ferro, Nicola
    Geva, Shlomo
    Gonzalo, Julio
    Hawking, David
    Jarvelin, Kalervo
    Jones, Gareth
    Jones, Rosie
    Kamps, Jaap
    Kando, Noriko
    Kanoulas, Evangelos
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Kelly, Diane
    Lease, Matthew
    Lin, Jimmy
    Mizzaro, Stefano
    Moffat, Alistair
    Murdock, Vanessa
    Oard, Douglas W.
    Rijke, Maarten de
    Sakai, Tetsuya
    Sanderson, Mark
    Scholer, Falk
    Si, Luo
    Thom, James A.
    Thomas, Paul
    Trotman, Andrew
    Turpin, Andrew
    Vries, Arjen P. de
    Webber, William
    Zhang, Xiuzhen (Jenny)
    Zhang, Yi
    Frontiers, Challenges, and Opportunities for Information Retrieval – Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne. 2012. In: SIGIR Forum, ISSN 0163-5840, Vol. 46, no 1, p. 2-32. Article in journal (Refereed)
    Abstract [en]

    During a three-day workshop in February 2012, 45 Information Retrieval researchers met to discuss long-range challenges and opportunities within the field. The result of the workshop is a diverse set of research directions, project ideas, and challenge areas. This report describes the workshop format, provides summaries of broad themes that emerged, includes brief descriptions of all the ideas, and provides detailed discussion of six proposals that were voted "most interesting" by the participants. Key themes include the need to: move beyond ranked lists of documents to support richer dialog and presentation, represent the context of search and searchers, provide richer support for information seeking, enable retrieval of a wide range of structured and unstructured content, and develop new evaluation methodologies.

  • 2. Alonso, O.
    et al.
    Kamps, J.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. Gavagai.
    Foreword. 2014. In: ESAIR 2014 - Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval, co-located with CIKM 2014, Association for Computing Machinery (ACM), 2014. Conference paper (Refereed)
  • 3. Alonso, O.
    et al.
    Kamps, J.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Seventh workshop on exploiting semantic annotations in information retrieval (ESAIR’14). 2014. In: CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2014, p. 2094-2095. Conference paper (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, emerging robust NLP tools, and an ever growing volume of linked data. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today’s systems. The goal of the ESAIR’14 workshop remains to advance the general research agenda on this core problem, with an explicit focus on one of the most challenging aspects to address in the coming years. The main remaining challenge is on the user’s side: the potential of rich document annotations can only be realized if matched by more articulate queries exploiting these powerful retrieval cues, and a more dynamic approach is emerging by exploiting new forms of query autosuggest. How can the query suggestion paradigm be used to encourage searchers to articulate longer queries, with concepts and relations linking their statement of request to existing semantic models? How do entity results and social network data in "graph search" change the classic division between searchers and information and lead to extreme personalization (are you the query)? How to leverage transaction logs and recommendation, and how adaptive should we make the system? What are the privacy ramifications and the UX aspects (how to not creep out users)?

  • 4. Alonso, Omar
    et al.
    Kamps, Jaap
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Report on the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 11). 2012. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 46, no 1, p. 56-64. Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by increasing the depth of analysis of today’s systems. Currently, we have only started to explore the possibilities and only begun to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive format consisting of keynotes, boasters and posters, breakout groups and reports, and a final discussion, which was prolonged into the evening. There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, annotations and use cases come in many different shapes and forms depending on the domain at hand, but at a higher level there are remarkable commonalities in annotation tools, indexing methods, user interfaces, and general methodology. Second, we gained insight into the "exploitation" aspects, leading to a clear separation between the low-level annotations giving context or meaning to small units of information (e.g., NLP, sentiments, entities), and annotations bringing out the structure inherent in the data (e.g., sources, data schemas, document genres). Third, the plan to enrich ClueWeb with various document-level (e.g., pagerank and spam scores, but also reading level) and lower-level (e.g., named entities or sentiments) annotations was embraced by the workshop as a concrete next step to promote research in semantic annotations.

  • 5. Amundin, Mats
    et al.
    Eklund, Robert
    Hållsten, Henrik
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Molinder, Lars
    A proposal to use distributional models to analyse dolphin vocalization. 2017. In: 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots, 2017. Conference paper (Refereed)
    Abstract [en]

    This paper gives a brief introduction to the starting points of an experimental project to study dolphin communicative behaviour using distributional semantics, with methods implemented for the large scale study of human language.

  • 6. Andersdotter, Amelia
    et al.
    Bylund, Markus
    Ferm, Maria
    Häglund, Kjell
    Jardenberg, Joakim
    de Kaminski, Marcin
    Karlberg, Peter
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Larsson, Hanna
    Sundberg, Sam
    Sundin, Mathias
    Godtyckligt regelverk hotar friheten på nätet. 2013. In: Dagens Nyheter, ISSN 1101-2447, no 2013-09-03. Article in journal (Other (popular science, discussion, etc.))
    Abstract [sv, translated to English]

    The rules that make it possible to shut down websites on the internet are marked by arbitrariness and vagueness. But creating a new regulatory framework that upholds the rule of law need not be particularly difficult. Sweden's EU Commissioner Cecilia Malmström has an important role here. The question is whether she will take responsibility, write politicians and internet debaters.

  • 7. Andersson-Schwarz, Jonas
    et al.
    Christensen, Christian
    Eellend, Beate
    Hadley Kamptz, Isobel
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Thorslund, Ewa
    Wormbs, Nina
    KTH, School of Architecture and the Built Environment (ABE), Philosophy and History, History of Science, Technology and Environment.
    Transaktionsdimman på nätet hotar digitaliseringen. 2017. In: Dagens Nyheter, ISSN 1101-2447. Article in journal (Other (popular science, discussion, etc.))
    Abstract [sv, translated to English]

    On the internet we are no longer only citizens or customers. We are also commodities. The data we give out about ourselves is what others make money from. But we do not know what it is worth or what we could ask for in payment. The transaction fog on the internet should be dispelled and replaced by transaction transparency, write seven media and IT debaters.

  • 8. Argaw, A. A.
    et al.
    Asker, L.
    Cöster, R.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Dictionary based Amharic - English information retrieval. 2004. In: CEUR Workshop Proceedings, CEUR-WS, 2004. Conference paper (Refereed)
    Abstract [en]

    We present two approaches to the Amharic - English bilingual track in CLEF 2004. Both experiments use a dictionary based approach to translate the Amharic queries into English Bags-of-words, but while one approach removes non-content bearing words from the Amharic queries based on their IDF value, the other uses a list of English stop words to perform the same task. The resulting translated (English) terms are then submitted to a retrieval engine that supports the Boolean and vector-space models. In our experiments, the second approach (based on a list of English stop words) performs slightly better than the one based on IDF values for the Amharic terms.

  • 9. Belkin, Nicholas J
    et al.
    Clarke, Charles L A
    Kamps, Jaap
    Gao, Ning
    Karlgren, Jussi
    Report on the SIGIR workshop on "entertain me": supporting complex search tasks. 2011. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 45, no 2, p. 51-59. Article in journal (Refereed)
    Abstract [en]

    Searchers with a complex information need typically slice-and-dice their problem into several queries and subqueries, and laboriously combine the answers post hoc to solve their tasks. Consider planning a social event on the last day of SIGIR, in the unknown city of Beijing, factoring in distances, timing, and preferences on budget, cuisine, and entertainment. A system supporting the entire search episode should "know" a lot, either from profiles or implicit information, or from explicit information in the query or from feedback. This may lead to the (interactive) construction of a complexly structured query, but sometimes the most obvious query for a complex need is dead simple: entertain me. Rather than returning ten-blue-lines in response to a 2.4-word query, the desired system should support searchers during their whole task or search episode, by iteratively constructing a complex query or search strategy, by exploring the result-space at every stage, and by combining the partial answers into a coherent whole.

    The workshop brought together a varied group of researchers covering both user and system centered approaches, who worked together on the problem and potential solutions. There was a strong feeling that we made substantial progress. First, there was general optimism on the wealth of contextual information that can be derived from context or natural interactions without the need for obtrusive explicit feedback. Second, the task of "contextual suggestions"--matching specific types of results against rich profiles--was identified as a manageable first step, and concrete plans for such a track were discussed in the aftermath of the workshop. Third, the identified dimensions of variation--such as the level of engagement, or user versus system initiative--give clear suggestions of the types of input a searcher is willing or able to give and the type of response expected from a system.

  • 10. Bennett, Paul
    et al.
    Gabrilovich, Evgeniy
    Kamps, Jaap
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Report on the Sixth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR '13). 2014. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 48, no 1, p. 13-20. Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the web as a result of modern web languages, user tagging and annotation, emerging robust NLP tools, and an ever growing volume of linked data. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today's systems. Currently, we have only started exploring the possibilities and only begun to understand how these valuable semantic cues can be put to fruitful use.

    ESAIR'13 focuses on two of the most challenging aspects to address in the coming years. First, there is a need to include the currently emerging knowledge resources (such as DBpedia, Freebase) as an underlying semantic model giving access to an unprecedented scope and detail of factual information. Second, there is a need to include annotations beyond the topical dimension (think of sentiment, reading level, prerequisite level, etc.) that contain vital cues for matching the specific needs and profile of the searcher at hand.

    There was a strong feeling that we made substantial progress. Specifically, the discussion contributed to our understanding of the way forward. First, emerging large scale knowledge bases form a crucial component for semantic search, providing a unified framework with zillions of entities and relations. Second, in addition to low level factual annotation, non-topical annotation of larger chunks of text can provide powerful cues on the expertise of the search and (un)suitability of information. Third, novel user interfaces are key to unleash powerful structured querying enabled by semantic annotation (the potential of rich document annotations can only be realized if matched by more articulate queries exploiting these powerful retrieval cues), and a more dynamic approach is emerging by exploiting new forms of query autosuggest.

  • 11.
    Bergren, Max
    et al.
    Gavagai.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Östling, Robert
    Stockholms universitet.
    Parkvall, Mikael
    Stockholms universitet.
    Inferring the location of authors from words in their texts. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Linköping University Electronic Press, 2015. Conference paper (Refereed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.

  • 12. Boman, Magnus
    Abstrakta maskiner och formella språk. 1996. Book (Other academic)
  • 13.
    Catarci, Tiziana
    et al.
    Sapienza University of Rome.
    Ferro, Nicola
    University of Padua.
    Forner, Pamela
    CELCT.
    Hiemstra, Djoerd
    University of Twente.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. Gavagai.
    Peñas, Anselmo
    UNED.
    Santucci, Giuseppe
    Sapienza University of Rome.
    Womser-Hacker, Christa
    University of Hildesheim.
    CLEF 2012: Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics. 2012. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 46, no 2, p. 29-33. Article in journal (Refereed)
  • 14. Catarci, Tiziana
    et al.
    Ferro, Nicola
    Forner, Pamela
    Hiemstra, Djoerd
    Karlgren, Jussi
    Gavagai, Sweden.
    Peñas, Anselmo
    Santucci, Giuseppe
    Womser-Hacker, Christa
    CLEF 2012: Information Access meets Multilinguality, Multimodality, and Visual Analytics. 2012. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 46, no 2, p. 29-33. Article in journal (Refereed)
  • 15.
    Cornell, Filip
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. Gavagai, Stockholm, Sweden.
    Karlgren, Jussi
    Gavagai, Stockholm, Sweden.
    Sachan, Animesh
    Indian Institute of Technology, Kharagpur, Centre of Excellence in Artificial Intelligence, Kharagpur, India.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. Gavagai, Stockholm, Sweden.
    Symbolic Hyperdimensional Vectors with Sparse Graph Convolutional Neural Networks. 2022. In: 2022 International Joint Conference on Neural Networks (IJCNN), Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a novel way of representing graphs for processing in Graph Neural Networks. We reduce the dimensionality of the input data by using Random Indexing, a Vector Symbolic Architectural framework; we implement a new trainable neural layer, also inspired by Vector Symbolic Architectures; we leverage the sparseness of the incoming data in a Sparse Neural Network framework. Our experiments on a number of publicly available datasets and standard benchmarks demonstrate that we can reduce the number of parameters by up to two orders of magnitude. We show how this parsimonious approach not only delivers competitive results but even improves performance for node classification and link prediction. We find that this holds in particular for cases where the graph lacks node features.

  • 16. Eriksson, Gunnar
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Features for modelling characteristics of conversations: Notebook for PAN at CLEF 2012. 2012. In: CLEF 2012 Evaluation Labs and Workshop Online Working Notes, 2012. Conference paper (Refereed)
    Abstract [en]

    In this experiment, we find that features which model interaction and conversational behaviour contribute well to identifying sexual grooming behaviour in chat and forum text. Together with the obviously useful lexical features (which we find are more valuable if separated by who generates them), we achieve very successful results in identifying behavioural patterns which may characterise sexual grooming. We conjecture that the general framework can be used for other purposes than this specific case if the lexical features are exchanged for other topical models; the conversational features characterise interaction and behaviour rather than topical choice.

  • 17. Espinoza, Fredrik
    et al.
    Hamfors, Ola
    Karlgren, Jussi
    Olsson, Fredrik
    Persson, Per
    Hamberg, Lars
    Sahlgren, Magnus
    Analysis of Open Answers to Survey Questions through Interactive Clustering and Theme Extraction. 2018. In: Proceedings of Conference on Human Information Interaction & Retrieval, ACM Digital Library, 2018, p. 317-320. Conference paper (Refereed)
    Abstract [en]

    This paper describes design principles for and the implementation of Gavagai Explorer, a new application which builds on interactive text clustering to extract themes from topically coherent text sets such as open text answers to surveys or questionnaires. An automated system is quick, consistent, and has full coverage over the study material. A system allows an analyst to analyze more answers in a given time period; provides the same initial results regardless of who does the analysis, reducing the risks of inter-rater discrepancy; and does not risk missing responses due to fatigue or boredom. These factors reduce the cost and increase the reliability of the service. The most important feature, however, is relieving the human analyst from the frustrating aspects of the coding task, freeing the effort to the central challenge of understanding themes. Gavagai Explorer is available on-line at http://explorer.gavagai.se

  • 18. Fano, E.
    et al.
    Karlgren, Jussi
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Nivre, J.
    Uppsala University and Gavagai at CLEF Erisk: Comparing word embedding models. 2019. In: CEUR Workshop Proceedings, CEUR-WS, 2019, Vol. 2380. Conference paper (Refereed)
    Abstract [en]

    This paper describes an experiment to evaluate the performance of three different types of semantic vectors or word embeddings (random indexing, GloVe, and ELMo) and two different classification architectures (linear regression and multi-layer perceptrons) for the specific task of identifying authors with eating disorders from writings they publish on a discussion forum. The task requires the classifier to process texts written by the authors in the sequence they were published, and to identify authors likely to be at risk of suffering from eating disorders as early as possible. The data are part of the eRISK evaluation task of CLEF 2019 and evaluated according to the eRISK metrics. Contrary to our expectations, we did not observe a clear-cut advantage using the recently popular contextualized ELMo vectors over the commonly used and much more light-weight GloVe vectors, or the more handily learnable random indexing vectors.

  • 19. Forner, Pamela
    et al.
    Bentivogli, Luisa
    Braschler, Martin
    Choukri, Khalid
    Ferro, Nicola
    Hanbury, Allan
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Müller, Henning
    PROMISE Technology Transfer Day: Spreading the Word on Information Access Evaluation at an Industrial Event: Workshop Report. 2013. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 47, no 1, p. 53-58. Article in journal (Refereed)
    Abstract [en]

    The Technology Transfer Day was held at CeBIT 2013 from March 5 to March 9, at the Deutsche Messe in Hannover, Germany. PROMISE presented three events at CeBIT: a panel in the CeBIT Global Conference (CGC) - Power Stage, a one-day workshop hosted in the CeBIT Convention Center, and a stand "EU Language & Big Data Projects" in Hall 9. The whole program included 4 panelists, 12 invited talks, and discussions among the speakers and with the public. This report overviews the aims and contents of the events and outlines the major outcomes.

  • 20. Gey, Frederic
    et al.
    Karlgren, Jussi
    Swedish Institute of Computer Science, Sweden.
    Kando, Noriko
    Information Access in a Multilingual World: Transitioning from Research to Real-World Applications. 2009. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 43, no 2, p. 24-28. Article in journal (Refereed)
  • 21. Hansen, Preben
    et al.
    Järvelin, Anni
    Eriksson, Gunnar
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    A Use Case Framework for Information Access Evaluation. 2014. In: Professional Search in the Modern World: COST Action IC1002 on Multilingual and Multifaceted Interactive Information Access / [ed] Paltoglou, Georgios, Loizides, Fernando, Hansen, Preben, Springer, 2014, p. 6-22. Chapter in book (Refereed)
    Abstract [en]

    Information access is no longer only a question of retrieving topical text documents in a work-task related context. Information search has become one of the most common uses of personal computers; a daily task for millions of individual users searching for information motivated by information needs they experience for some reason, momentarily or continuously. Instead of professionally edited text documents, multilingual and multimedia content from a variety of sources of varying quality needs to be accessed. Even the scope of the research efforts in the field must therefore be broadened to better capture the mechanisms for the systems’ impact, take-up and success in the marketplace. Much work has been carried out in this direction: graded relevance, and new evaluation metrics, more varied document collections used in evaluation and different search tasks evaluated. The research in the field is however fragmented. Although the need for a common evaluation framework is widely acknowledged, such a framework is still not in place. IR system evaluation results are not regularly validated in Interactive IR or field studies; the infrastructure for generalizing Interactive IR results over tasks, users and collections is still missing. This chapter presents a use case-based framework for experimental design in the field of interactive information access. Use cases in general connect system design and evaluation to interaction and user goals, and help identify test cases for different user groups of a system. We suggest that use cases can provide a useful link even between information access system usage and evaluation mechanisms and thus bring together research from the different related research fields. In this chapter we discuss how use cases can guide the developments of rich models of users, domains, environments, and interaction, and make explicit how the models are connected to benchmarking mechanisms. We give examples of the central features of the different models. The framework is highlighted by examples that sketch out how the framework can be productively used in experimental design and reporting with a minimal threshold for adoption.

  • 22. Hansen, Preben
    et al.
    Karlgren, Jussi
    Effects of Foreign Language and Task Scenario on Relevance Assessment. 2005. In: Journal of Documentation, ISSN 0022-0418, E-ISSN 1758-7379, Vol. 61, no 5, p. 623-639. Article in journal (Refereed)
    Abstract [en]

    Purpose: This paper aims to investigate how readers assess relevance of retrieved documents in a foreign language they know well compared with their native language, and whether work-task scenario descriptions have effect on the assessment process. Design/methodology/approach: Queries, test collections, and relevance assessments were used from the 2002 Interactive CLEF. Swedish first-language speakers, fluent in English, were given simulated information-seeking scenarios and presented with retrieval results in both languages. Twenty-eight subjects in four groups were asked to rate the retrieved text documents by relevance. A two-level work-task scenario description framework was developed and applied to facilitate the study of context effects on the assessment process. Findings: Relevance assessment takes longer in a foreign language than in the user’s first language. The quality of assessments by comparison with pre-assessed results is inferior to those made in the users’ first language. Work-task scenario descriptions had an effect on the assessment process, both by measured access time and by self-report by subjects. However, effects on results by traditional relevance ranking were detectable. This may be an argument for extending the traditional IR experimental topical relevance measures to cater for context effects. Originality/value: An extended two-level work-task scenario description framework was developed and applied. Contextual aspects had an effect on the relevance assessment process. English texts took longer to assess than Swedish and were assessed less well, especially for the most difficult queries. The IR research field needs to close this gap and to design information access systems with users’ language competence in mind.

  • 23. Hulth, Anette
    et al.
    Karlgren, Jussi
    SICS.
    Jonsson, Anna
    Boström, Henrik
    Asker, Lars
    Automatic Keyword Extraction Using Domain Knowledge. 2008. In: Computational Linguistics and Intelligent Text Processing, Berlin / Heidelberg: Springer, 2008, 1. Chapter in book (Refereed)
    Abstract [en]

    Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By including a hierarchically organised domain specific thesaurus as a second knowledge source the quality of such keywords was improved considerably, as measured by match to previously manually assigned keywords. In the presented experiment, the combination of the evidence from frequency analysis and the hierarchically organised thesaurus was done using inductive logic programming.

  • 24. Höök, Kristina
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Some Principles for Route Descriptions Derived from Human Advisers. 1991. In: Proceedings of the 13th Annual Meeting of the Cognitive Science Society, 1991. Conference paper (Refereed)
  • 25. Höök, Kristina
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Waern, Annika
    Inferring complex plans. 1993. In: 1st International Workshop on Intelligent User Interfaces, 1993. Conference paper (Refereed)
    Abstract [en]

    We examine the need for plan inference in intelligent help mechanisms. We argue that previous approaches have drawbacks that need to be overcome to make plan inference useful. Firstly, plans have to be inferred, not extracted from the users' help requests. Secondly, the plans inferred must be more than a single goal or solitary user command.

  • 26. Höök, Kristina
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Waern, Annika
    Dahlbäck, Nils
    Jansson, Carl Gustaf
    Karlgren, Klas
    Lemaire, Benoit
    A glass box approach to adaptive hypermedia. 1996. In: User Modeling and User-Adapted Interaction, Vol. 6, p. 157-184. Article in journal (Refereed)
    Abstract [en]

    Utilising adaptive interface techniques in the design of systems introduces certain risks. An adaptive interface is not static, but will actively adapt to the perceived needs of the user. Unless carefully designed, these changes may lead to an unpredictable, obscure and uncontrollable interface. Therefore the design of adaptive interfaces must ensure that users can inspect the adaptivity mechanisms, and control their results. One way to do this is to rely on the user’s understanding of the application and the domain, and relate the adaptivity mechanisms to domain-specific concepts. We present an example of an adaptive hypertext help system, POP, which is being built according to these principles, and discuss the design considerations and empirical findings that led to this design.

  • 27. Iatropoulos, G.
    et al.
    Herman, Pawel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Lansner, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Karlgren, Jussi
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. Gavagai, Slussplan 9, Stockholm, Sweden.
    Larsson, M.
    Olofsson, J. K.
    The language of smell: Connecting linguistic and psychophysical properties of odor descriptors. 2018. In: Cognition, ISSN 0010-0277, E-ISSN 1873-7838, Vol. 178, p. 37-49. Article in journal (Refereed)
    Abstract [en]

    The olfactory sense is a particularly challenging domain for cognitive science investigations of perception, memory, and language. Although many studies show that odors often are difficult to describe verbally, little is known about the associations between olfactory percepts and the words that describe them. Quantitative models of how odor experiences are described in natural language are therefore needed to understand how odors are perceived and communicated. In this study, we develop a computational method to characterize the olfaction-related semantic content of words in a large text corpus of internet sites in English. We introduce two new metrics: olfactory association index (OAI, how strongly a word is associated with olfaction) and olfactory specificity index (OSI, how specific a word is in its description of odors). We validate the OAI and OSI metrics using psychophysical datasets by showing that terms with high OAI have high ratings of perceived olfactory association and are used to describe highly familiar odors. In contrast, terms with high OSI have high inter-individual consistency in how they are applied to odors. Finally, we analyze Dravnieks's (1985) dataset of odor ratings in terms of OAI and OSI. This analysis reveals that terms that are used broadly (applied often but with moderate ratings) tend to be olfaction-unrelated and abstract (e.g., “heavy” or “light”; low OAI and low OSI) while descriptors that are used selectively (applied seldom but with high ratings) tend to be olfaction-related (e.g., “vanilla” or “licorice”; high OAI). Thus, OAI and OSI provide behaviorally meaningful information about olfactory language. These statistical tools are useful for future studies of olfactory perception and cognition, and might help integrate research on odor perception, neuroimaging, and corpus-based linguistic models of semantic organization.

  • 28. Kamps, Jaap
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Mika, Peter
    Murdock, Vanessa
    Report on the Fifth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’12): CIKM WORKSHOP REPORT. 2013. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 47, no 1, p. 38-45. Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the web as a result of modern web languages, user tagging and annotation, emerging robust NLP tools, and an ever growing volume of linked data. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today’s systems. Currently, we have only started exploring the possibilities and only begin to understand how these valuable semantic cues can be put to fruitful use. To complicate matters, standard text search excels at shallow information needs expressed by short keyword queries, and here semantic annotation contributes very little, if anything. The main questions for the workshop are how to leverage the rich context currently available, especially in a mobile search scenario, giving powerful new handles to exploit semantic annotations. And how can we fruitfully combine information retrieval and knowledge intensive approaches, and for the first time work actively toward a unified view on exploiting semantic annotations.

    There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, there is a need for further integration of symbolic and statistical methods with each adopting parts of the other’s strengths, by focusing on types of annotations that are informed by and meaningful for the task at hand, and relying on automatic information extraction and annotation based on web scale observations. Second, the discussion contributed to the creation of a concrete shared corpus with state of the art semantic annotation—in particular a web crawl annotated with Freebase concepts—that will benefit research in this area for years to come. 

  • 29. Kamps, Jaap
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Schenkel, Ralf
    Report on the Third Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR), Toronto, Canada. 2011. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 45, no 1, p. 33-41. Article in journal (Refereed)
    Abstract [en]

    There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by enhancing the depth of analysis of today's systems. Currently, we have only started exploring the possibilities and only begin to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive format consisting of keynotes, boasters and posters, breakout groups and reports, and a final discussion, which was prolonged into the evening. There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, annotations and use cases come in many different shapes and forms depending on the domain at hand, but at a higher level there are commonalities in annotation tools, indexing methods, user interfaces, and general methodology. Second, there is a framework emerging to view annotation as (1) a linking procedure, connecting (2) an analysis of information objects with (3) a semantic model of some sort, expressing relations that contribute to (4) a task of interest to end users. Third, we should look at complex tasks that cannot be comprehensibly articulated in a few keywords, and embrace interaction both to incrementally refine the search request and to explore the results at various stages, guided by the semantic structure.

  • 30.
    Kann, Viggo
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Ahrenberg, Lars
    Linköping University.
    Domeij, Rickard
    Språkrådet.
    Karlsson, Ola
    Språkrådet.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. Gavagai.
    Nilsson, Henrik
    Terminologicentrum.
    Nivre, Joakim
    Uppsala University.
    En rekommenderad svensk språkteknologisk terminologi. 2016. In: Proc. Sixth Swedish Language Technology Conference, Umeå: Svenska språkteknologitermgruppen, 2016. Conference paper (Refereed)
    Abstract [en]

    In 2014 the Swedish Language Technology Terminology Group was created, with representatives from different parts of the language technology community: higher education and research, industry, and governmental agencies. In 2016 we recommended Swedish terms for the 270 language technological concepts in the Bank of Finnish Terminology in Arts and Sciences. The language technology terms are published on folkets-lexikon.csc.kth.se/LTterminology, where anyone can look up Swedish and English terms interactively and read the full list of terms. We also try to enter the most important Swedish terminology into the Swedish Wikipedia. We encourage use of these Swedish terms and welcome suggestions for improvements of the Swedish terminology.

  • 31. Kanoulas, Evangelos
    et al.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. Gavagai, Sweden.
    Practical Issues in Information Access System Evaluation. 2017. In: SIGIR Forum, ISSN 0163-5840, E-ISSN 1558-0229, Vol. 51, no 1, p. 67-72. Article in journal (Refereed)
    Abstract [en]

    This paper is a report from a workshop on Evaluation of Information Systems in Commercial Settings, inspired by the industrial day at SIGIR 2016. Small and medium size enterprises often lack the resources needed to develop proper evaluation infrastructures, and also to follow the research development in the field of evaluation. Similarly, academics lag behind in (a) understanding real practical issues raised when it comes to the evaluation of real systems (e.g. even depth-k pooling is often infeasible when an SME has a single ranking algorithm developed), and (b) sensing the breadth of applications and tasks on which systems require evaluation and the challenges they pose. Large enterprises with the necessary resources and the data sets and flows to work with are hesitant to make their tests public, for both commercial and legal reasons. This workshop brought together representatives from technology companies, large and small, media houses, industrial consultants and academic research in information access for a discussion on practical issues and solutions to these issues.

  • 32.
    Karlgren, Jussi
    Stockholm University.
    A Computer Program for Recognizing Blazons. 1988. Independent thesis Basic level (degree of Bachelor), 20 HE credits. Student thesis
    Abstract [en]

    This candidate of philosophy thesis describes a computer program which analyzes so-called blazons, i.e., classic descriptions of heraldic coats-of-arms. If an expression is recognized as an acceptable blazon, the program produces a graphic representation of the coat-of-arms in question on screen.

  • 33.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Adopting systematic evaluation benchmarks in operational settings. 2019. In: Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF / [ed] Nicola Ferro and Carol Peters, Cham: Springer Berlin/Heidelberg, 2019, p. 583-590. Chapter in book (Refereed)
    Abstract [en]

    Evaluation of information systems in commercial and industrial settings differs from academic evaluation of methodology in important ways. Those differences have to do with differing organisational priorities between practice and research. Some of those priorities can be adjusted, others must be taken into account, to be able to include evaluation into an operational development pipeline.

  • 34.
    Karlgren, Jussi
    KTH, Superseded Departments (pre-2005), Computer and Systems Sciences, DSV. Stockholm University.
    An algebra for recommendations: Using reader data as a basis for measuring document proximity. 1990. Report (Other academic)
    Abstract [en]

    A measure for proximity between documents is defined, based on data from readers. This proximity measure can be further investigated as a tool for document retrieval, and as a source of data for concept formation experiments.

  • 35.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Geoblockering lätt att kringgå. 2015. Other (Other (popular science, discussion, etc.))
    Abstract [translated from sv]

    On Wednesday 6 May, the European Commission is expected to present an action plan for the digital market, and one issue that has come up for debate is geoblocking, which the Commission's Vice-President Andrus Ansip has clearly taken a stand against. Jussi Karlgren, professor of language technology at KTH, explains what the debate is about.

  • 36.
    Karlgren, Jussi
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    How Lexical Gold Standards Have Effects on the Usefulness of Text Analysis Tools for Digital Scholarship. 2019. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 10th International Conference of the CLEF Association, CLEF 2019, Lugano, Switzerland, September 9–12, 2019, Proceedings, Springer, 2019, Vol. 11696, p. 178-184. Conference paper (Refereed)
    Abstract [en]

    This paper describes how the current lexical similarity and analogy gold standards are built to conform to certain ideas about what the models they are designed to evaluate are used for. Topical relevance has always been the most important target notion for information access tools and related language technologies, and while this has proven a useful starting point for much of what information technology is used for, it does not always align well with other uses to which technologies are being put, most notably use cases from digital scholarship in the humanities or social sciences. This paper argues for more systematic formulation of requirements from the digital humanities and social sciences and more explicit description of the assumptions underlying model design.

  • 37. Karlgren, Jussi
    Informationsåtkomst på flera språk. 1999. In: Språk i Norden / [ed] Lindgren, Birgitta, Svenska språknämnden, 1999. Chapter in book (Other academic)
    Abstract [translated from sv]

    Finding information can be tricky. The person seeking information may know exactly what they want but not quite where it is; or the seeker may not really know what is available but have a feeling that some kind of help is to be had, if only the question is put correctly. Over the past millennia people have stored information on external storage media of various kinds: there is more and more information available, but of varying quality, with unclear ownership and uncertain provenance, and it is less and less clear whom the reader can ask for advice to find the right thing. There are many different techniques for helping people find information. Shelves and properly labelled book spines are a good first step, alphabetical or some other systematic shelf order a further one, and card catalogues, for example subject indexes with handwritten keywords that give sorting criteria other than those of the shelves, a third. The more kinds of index, the easier it is to find things, and the more laborious they are to administer and maintain. This is of course where computers come in. Libraries today work with technical aids for catalogue management, and information technology is used for exactly what it is best at: administering large amounts of information and distributing it at very low marginal cost, all of which is usually considered good.

  • 38.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    New Measures to Investigate Term Typology by Distributional Data. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 / [ed] Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen, Linköping: Linköping University Electronic Press, 2013. Conference paper (Refereed)
    Abstract [en]

    This report describes a series of exploratory experiments to establish whether terms of different semantic type can be distinguished in useful ways in a semantic space constructed from distributional data. The hypotheses explored in this paper are that some words are more variant in their distribution than others; that the varying semantic character of words will be reflected in their distribution; and this distributional difference is encoded in current distributional models, but that the information is not accessible through the methods typically used in application of them. This paper proposes some new measures to explore variation encoded in distributional models but not usually put to use in understanding the character of words represented in them. These exploratory findings show that some proposed measures show a wide range of variation across words of various types.

  • 39.
    Karlgren, Jussi
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Regulation of Unpredictable Effects of Decision Making Systems is Non-trivial. 2018. In: 50 Years of Law and IT: The Swedish Law and Informatics Research Institute 1968-2018 / [ed] Peter Wahlgren, Stockholm: The Stockholm University Law Faculty, 2018, p. 127-132. Chapter in book (Other academic)
    Abstract [en]

    Technical advances are rapidly delegating decision making in new arenas of human activity to information systems through the application of new classification mechanisms from machine learning research. How to manage technology-induced change and its effects through legislative systems, in order to encourage and support behaviour and activities which are desirable and beneficial to the public good and dissuade from such which are not, is non-trivial. In general, legislation to cover new technical advances will be based on existing technology and existing practice. This may seem a reasonable basis to build from and adds legitimacy to regulation and its application, but regulation of technology too often stumbles at the balancing line between understanding and promoting future change productively and protecting past practice. This paper argues that more thought must be put into the aims of regulatory activities.

  • 40. Karlgren, Jussi
    Reply to Fraser and Wrigley or Definitely Not The Last Word On Language Varieties. 1994. In: Interacting with computers, ISSN 0953-5438, E-ISSN 1873-7951, Vol. 6, no 1, p. 109-110. Article in journal (Refereed)
  • 41.
    Karlgren, Jussi
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Språket avslöjar hur vi röstar. 2014. In: Språktidningen, ISSN 1654-5028, no 6, p. 16-22. Article in journal (Other (popular science, discussion, etc.))
    Abstract [translated from sv]

    What does the state of political opinion look like? One can of course ask the voters. But it may be better to look at what they write. Now a computer program is on the trail of voters' sympathies.

  • 42.
    Karlgren, Jussi
    Stockholm University, SICS.
    Stylistic Experiments for Information Retrieval. 2000. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate if stylistic information is distinguishable using simple language engineering methods, and if in that case this type of information can be used to improve information retrieval systems.

    A first set of experiments shows that simple measures of stylistic variation can be used to distinguish genres from each other quite adequately; how well depends on what the genres in question are.

    A second set of experiments evaluates the utility of stylistic measures for the purposes of information retrieval, to identify common characteristics of relevant and non-relevant documents. The conclusion is that the requests for information as typically expressed to retrieval systems are too terse and unspecific for non-topical information to improve retrieval results. Systems for information access need to be designed from the beginning to handle richer information about the texts and documents at hand: information about stylistic variation cannot easily be added to an existing system.

    A third set of experiments explores how an interactive system can be designed to incorporate stylistic information in the interface between user and system. These experiments resulted in the design of an interface for categorizing retrieval results by genre, and displaying the retrieval results using this categorization. This interface is integrated into a prototype for retrieving information from the World Wide Web.

  • 43.
    Karlgren, Jussi
    Natural Language Processing Group, SICS.
    Sublanguages and Registers: A Note On Terminology. 1993. In: Interacting with computers, ISSN 0953-5438, E-ISSN 1873-7951, Vol. 5, no 3, p. 348-350. Article in journal (Refereed)
    Abstract [en]

    The term sublanguage from mathematical linguistics confuses interaction researchers and leads them to believe that implementing natural language interfaces is easier than it is. The term register from sociolinguistics is proposed instead.

  • 44.
    Karlgren, Jussi
    KTH, Superseded Departments (pre-2005), Computer and Systems Sciences, DSV. Stockholms universitet.
    The Interaction of Discourse Modality and User Expectations in Human-Computer Dialog. 1992. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    This study discusses the behavior of people towards natural language interfaces. It draws parallels to the behavior of people towards other people, and discusses how far these parallels can be stretched. A small experimental study of users performing tasks using a natural language interface to a database is presented, and the results related to the discussion.

    The main points made are

    1) that new modalities like the one used in typical human computer interaction - written interactive communication - are problematic for new users, from lack of conventions; and

    2) that users' attitudes towards computers and of the system's linguistic and other competence shape much of the interaction, and that these attitudes change, and that thus the important factor to take into account in system design is not what the initial attitudes are but rather what the process of changing them is and how to utilize the process of change to teach the user the system language and interaction modality.

  • 45.
    Karlgren, Jussi
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Bohman, Martin
    Ekgren, Ariel
    Isheden, Gabriel
    Kullmann, Emelie
    Nilsson, David
    Semantic Topology. 2014. In: Proceedings of the 23rd ACM International Conference on Information & Knowledge Management (CIKM '14), New York: Association for Computing Machinery (ACM), 2014, p. 1939-1942. Conference paper (Refereed)
    Abstract [en]

    A reasonable requirement (among many others) for a lexical or semantic component in an information system is that it should be able to learn incrementally from the linguistic data it is exposed to, that it can distinguish between the topical impact of various terms, and that it knows if it knows stuff or not.

    We work with a specific representation framework – semantic spaces – which well accommodates the first requirement; in this short paper, we investigate the global qualities of semantic spaces by a topological procedure – mapper – which gives an indication of topical density of the space; we examine the local context of terms of interest in the semantic space using another topologically inspired approach which gives an indication of the neighbourhood of the terms of interest. Our aim is to be able to establish the qualities of the semantic space under consideration without resorting to inspection of the data used to build it.

  • 46.
    Karlgren, Jussi
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. Gavagai, Sweden.
    Callin, Jimmy
    Collins-Thompson, Kevyn
    Gyllensten, Amaru Cuba
    Ekgren, Ariel
    Jurgens, David
    Korhonen, Anna
    Olsson, Fredrik
    Sahlgren, Magnus
    Schütze, Hinrich
    Evaluating learning language representations. 2015. Conference paper (Refereed)
    Abstract [en]

    Machine learning offers significant benefits for systems that process and understand natural language: (a) lower maintenance and upkeep costs than when using manually-constructed resources, (b) easier portability to new domains, tasks, or languages, and (c) robust and timely adaptation to situation-specific settings. However, the behaviour of an adaptive system is less predictable than when using an edited, stable resource, which makes quality control a continuous issue. This paper proposes an evaluation benchmark for measuring the quality, coverage, and stability of a natural language system as it learns word meaning. Inspired by existing tests for human vocabulary learning, we outline measures for the quality of semantic word representations, such as when learning word embeddings or other distributed representations. These measures highlight differences between the types of underlying learning processes as systems ingest progressively more data.

  • 47.
    Karlgren, Jussi
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Cutting, Douglass
    Recognizing Text Genres with Simple Metrics Using Discriminant Analysis. 1994. In: Proceedings of the 15th International Conference on Computational Linguistics, 1994, Vol. 2, p. 1071-1075. Conference paper (Refereed)
    Abstract [en]

    A simple method for categorizing texts into pre-determined text genre categories using the standard statistical technique of discriminant analysis is demonstrated with application to the Brown corpus. Discriminant analysis makes it possible to use a large number of parameters that may be specific for a certain corpus or information stream, and combine them into a small number of functions, with the parameters weighted on the basis of how useful they are for discriminating text genres. An application to information retrieval is discussed.

  • 48.
    Karlgren, Jussi
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Ericsson, Linus
    Semantic Space Models for Profiling Reputation of Corporate Entities. 2013. In: CLEF 2013 Evaluation Labs and Workshop: Online Working Notes, CLEF, 2013. Conference paper (Refereed)
    Abstract [en]

    Gavagai used its commercially available system for the filtering and polarity tasks in the evaluation campaign for online reputation management systems at CLEF 2013. The system is built for large scale analysis of streaming text and, as part of the services Gavagai provides, it measures the public attitude vis-à-vis targets of interest. This mechanism, with no adjustment for this specific task, was used for polarisation, and the experiments performed this year tested a number of settings for how an attitude might be learned from the data rather than given by editorial intervention.

  • 49.
    Karlgren, Jussi
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. Gavagai, Stockholm, Sweden.
    Esposito, L.
    Gratton, C.
    Kanerva, P.
    Authorship profiling without using topical information: Notebook for PAN at CLEF 2018. 2018. In: CLEF 2018 Working Notes, CEUR-WS, 2018, Vol. 2125. Conference paper (Refereed)
    Abstract [en]

    This paper describes an experiment made for the PAN 2018 shared task on author profiling. The task is to distinguish female from male authors of microblog posts published on Twitter using no extraneous information except what is in the posts; this experiment focusses on using non-topical information from the posts, rather than gender differences in referential content.

  • 50. Karlgren, Jussi
    et al.
    Höök, Kristina
    Lantz, Ann
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Palme, Jakob
    Pargman, Daniel
    The glass box user model for filtering. 1994. In: / [ed] A. Kobsa and D. Litman, 1994. Conference paper (Refereed)
    Abstract [en]

    The first requirement on an interactive system in a domain such as information filtering is to be an interface to knowledge, rather than just a knowledgeable interface. We borrow the computation instruction metaphor of a system as "a black box in a glass box" as a means to conceptualize the problem of giving a user control over the actions of an interactive system. The application domain we work in is that of information filtering. In the "black box", we hide complex knowledge of the domain objects such as facts and assumptions about text genre identification, while the "glass box", which is what the user sees, only shows the neat top level knowledge of the domain conceptual categories, such as categorization rules.
