UPM R&D&I Observatory

Research reports
Book chapter:
Allocation Strategies based on Possibilistic Rewards for the Multi-Armed Bandit Problem: A Numerical Study and Regret Analysis
Year: 2018
Research areas
  • Operations research and mathematical programming,
  • Statistics
Details
Description
In this paper, we propose a novel allocation strategy based on possibilistic rewards for the multi-armed bandit problem. First, we use possibilistic reward distributions to model the uncertainty about the expected reward of each arm; these distributions are derived from an infinite set of confidence intervals nested around the expected value. They are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios covering Bernoulli, Poisson and exponential reward distributions. The regret analysis of the proposed methods suggests logarithmic asymptotic convergence for the original possibilistic reward method, whereas a polynomial regret could be associated with the parametric extension and the dynamic optimization.
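The three steps in the description (possibility distribution, conversion to a probability distribution, simulation to pick the arm) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the formulas, so the sketch assumes Bernoulli rewards in [0, 1], a Hoeffding-style possibility function min(1, 2 exp(-2n(x - mean)^2)), and plain normalisation over a grid as a simple stand-in for the pignistic transformation. All function and parameter names (`possibility`, `sample_expected_reward`, `run_bandit`, `grid_size`) are hypothetical.

```python
import math
import random


def possibility(x, mean, n):
    """Assumed Hoeffding-style possibility that x is the true expected reward."""
    if n == 0:
        return 1.0  # no observations yet: every value in [0, 1] is fully possible
    return min(1.0, 2.0 * math.exp(-2.0 * n * (x - mean) ** 2))


def sample_expected_reward(mean, n, rng, grid_size=201):
    """Draw one candidate expected reward from the normalised possibility values.

    Normalising over a grid is an assumption standing in for the
    pignistic transformation named in the description.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [possibility(x, mean, n) for x in grid]
    return rng.choices(grid, weights=weights, k=1)[0]


def run_bandit(true_means, horizon, seed=0):
    """Simulate Bernoulli arms and return the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    for _ in range(horizon):
        # Simulation step: one draw per arm from its current distribution.
        samples = []
        for a in range(n_arms):
            mean = totals[a] / pulls[a] if pulls[a] else 0.5
            samples.append(sample_expected_reward(mean, pulls[a], rng))
        # Pull the arm whose simulated expected reward is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.1, 0.9], horizon=500))
```

In this sketch the distribution of an arm narrows around its sample mean as the arm is pulled more often, so exploration of clearly worse arms fades over time. The parametric transformation and the dynamic optimization mentioned in the description are not covered.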
International
Yes
DOI
Book edition
Book publisher
G. Parlier, F. Liberatore, M. Demange (eds.), Springer
ISBN
978-3-319-94766-2
Series
Communications in Computer and Information Science 884
Book title
Operations Research and Enterprise Systems
First page
186
Last page
209
This activity belongs to the research reports
Participants
  • Author: Miguel Carlos Martin Blanco (UPM)
  • Author: Antonio Jimenez Martin (UPM)
  • Author: Alfonso Mateos Caballero (UPM)
Related research groups, departments, centres and R&D&I institutes
  • Creator: Research group: Decision Analysis and Statistics Group (Grupo de análisis de decisiones y estadística)
  • Department: Artificial Intelligence