Change search
ReferencesLink to record
Permanent link

Direct link
Learning to rank: Regret lower bounds and efficient algorithms
KTH, School of Electrical Engineering (EES), Automatic Control.ORCID iD: 0000-0003-2988-8464
KTH, School of Electrical Engineering (EES), Automatic Control.
KTH, School of Electrical Engineering (EES), Automatic Control.
2015 (English)In: Performance Evaluation Review, ISSN 0163-5999, E-ISSN 1557-9484, Vol. 43, no 1, 231-244 p.Article in journal (Refereed) PublishedText
Abstract [en]

Algorithms for learning to rank Web documents, display ads, or other types of items constitute a fundamental component of search engines and more generally of online services. In such systems, when a user makes a request or visits a web page, an ordered list of items (e.g. documents or ads) is displayed; the user scans this list in order, and clicks on the first relevant item if any. When the user clicks on an item, the reward collected by the system typically decreases with the position of the item in the displayed list. The main challenge in the design of sequential list selection algorithms stems from the fact that the probabilities with which the user clicks on the various items are unknown and need to be learned. We formulate the design of such algorithms as a stochastic bandit optimization problem. This problem differs from the classical bandit framework: (1) the type of feedback received by the system depends on the actual relevance of the various items in the displayed list (if the user clicks on the last item, we know that none of the previous items in the list are relevant); (2) there are inherent correlations between the average relevance of the items (e.g. the user may be interested in a specific topic only). We assume that items are categorized according to their topic and that users are clustered, so that users of the same cluster are interested in the same topic. We investigate several scenarios depending on the available sideinformation on the user before selecting the displayed list: (a) we first treat the case where the topic the user is interested in is known when she places a request; (b) we then study the case where the user cluster is known but the mapping between user clusters and topics is unknown. For both scenarios, we derive regret lower bounds and devise algorithms that approach these fundamental limits.

Place, publisher, year, edition, pages
ACM Press, 2015. Vol. 43, no 1, 231-244 p.
Keyword [en]
Ad-display optimization, Learning, Multi-armed bandits, Search engines
National Category
Computer Science
URN: urn:nbn:se:kth:diva-187384DOI: 10.1145/2745844.2745852ScopusID: 2-s2.0-84955606870OAI: diva2:931278

QC 20160527

Available from: 2016-05-27 Created: 2016-05-23 Last updated: 2016-05-27Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Magureanu, StefanProutière, AlexandreLaroche, Cyrille
By organisation
Automatic Control
In the same journal
Performance Evaluation Review
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 9 hits
ReferencesLink to record
Permanent link

Direct link