Observatorio de I+D+i UPM

Research reports
Communications at congresses:
A Possibilistic Reward Method for the Multi-Armed Bandit Problem
Year: 2017
Research Areas
  • Stochastic and control systems
  • Communication, information
  • Stochastic procedures
Information
Abstract
Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, under either a frequentist view or a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the arms' expected rewards; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization, which implies that neither previous knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: a Bernoulli distribution with very low success probabilities; a Bernoulli distribution with success probabilities close to 0.5; success probabilities close to 0.5 with Gaussian rewards; and Poisson and exponential distributions truncated in [0,10].
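The three steps named in the abstract (possibility distribution per arm, pignistic transformation, simulation to choose the arm) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not give the form of the possibilistic reward distribution, so the Hoeffding-style shape in `possibility`, the grid `GRID`, the assumption that rewards lie in [0, 1], and the "draw one value per arm and pull the largest" reading of the simulation step are all assumptions made here for illustration. Only `pignistic` follows a standard definition: for possibility values sorted in decreasing order, p_(i) = sum over j >= i of (pi_(j) - pi_(j+1)) / j, with pi_(n+1) = 0.

```python
import numpy as np

# ASSUMPTION: expected rewards lie in [0, 1] and are discretised on this grid.
GRID = np.linspace(0.0, 1.0, 101)


def possibility(mean, pulls):
    """Possibility of each grid value being the arm's expected reward.

    ASSUMPTION: a Hoeffding-style shape, chosen only for illustration;
    the record does not state the distribution used in the paper.
    """
    if pulls == 0:
        return np.ones_like(GRID)          # total ignorance
    poss = np.exp(-2.0 * pulls * (GRID - mean) ** 2)
    return poss / poss.max()               # normalise so the maximum is 1


def pignistic(poss):
    """Pignistic transformation of a discrete possibility distribution."""
    poss = np.asarray(poss, dtype=float)
    order = np.argsort(-poss)              # indices by decreasing possibility
    sorted_poss = poss[order]
    following = np.append(sorted_poss[1:], 0.0)   # pi_(j+1), pi_(n+1) = 0
    ranks = np.arange(1, len(poss) + 1)
    shares = (sorted_poss - following) / ranks
    tail = np.cumsum(shares[::-1])[::-1]   # sum of shares over j >= i
    prob = np.empty_like(poss)
    prob[order] = tail                     # back to the original ordering
    return prob


def select_arm(means, pulls, rng):
    """Draw one candidate expected reward per arm and return the best arm."""
    draws = []
    for mean, n in zip(means, pulls):
        prob = pignistic(possibility(mean, n))
        prob = prob / prob.sum()           # guard against rounding error
        draws.append(rng.choice(GRID, p=prob))
    return int(np.argmax(draws))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = [0.45, 0.55]                  # hypothetical Bernoulli arms
    sums, pulls = np.zeros(2), np.zeros(2, dtype=int)
    for _ in range(500):
        means = np.divide(sums, pulls, out=np.zeros(2), where=pulls > 0)
        arm = select_arm(means, pulls, rng)
        sums[arm] += rng.random() < true_p[arm]
        pulls[arm] += 1
    print("pulls per arm:", pulls)
```

Sampling from the transformed distributions, rather than always pulling the arm with the highest sample mean, keeps some exploration of arms whose uncertainty is still large. The parametric transformation and dynamic optimization mentioned in the abstract are not covered by this sketch.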
International
Yes
Congress
6th International Conference on Operations Research and Enterprise Systems
960
Place
Oporto, Portugal
Reviewers
Yes
ISBN/ISSN
978-989-758-218-9
Start Date
23/02/2017
End Date
25/02/2017
From page
75
To page
84
Proceedings of the 6th International Conference on Operations Research and Enterprise Systems
Participants
  • Author: Miguel Martín Blanco
  • Author: Antonio Jimenez Martin (UPM)
  • Author: Alfonso Mateos Caballero (UPM)
Related Research Groups, Departments and Institutes
  • Creator: Research Group: Grupo de análisis de decisiones y estadística
  • Department: Inteligencia Artificial
S2i 2020 Observatorio de investigación @ UPM, with the collaboration of the Consejo Social UPM
Co-funded by MINECO under the INNCIDE 2011 Programme (OTR-2011-0236)
Co-funded by MINECO under the INNPACTO Programme (IPT-020000-2010-22)