Research reports
Book chapter:
Allocation Strategies based on Possibilistic Rewards for the Multi-Armed Bandit Problem: A Numerical Study and Regret Analysis
Year: 2018

Research areas
  • Operations research and mathematical programming
  • Statistics

Data
Description
In this paper, we propose a novel allocation strategy based on possibilistic rewards for the multi-armed bandit problem. First, we use possibilistic reward distributions to model the uncertainty about the expected rewards from the arms, derived from an infinite set of confidence intervals nested around the expected value. These distributions are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios accounting for Bernoulli, Poisson and exponential distributions for the rewards. The regret analysis of the proposed methods suggests a logarithmic asymptotic convergence for the original possibilistic reward method, whereas a polynomial regret could be associated with the parametric extension and the dynamic optimization.
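The three steps in the description (possibilistic reward distribution, pignistic transformation, simulation and pull) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes rewards bounded in [0, 1] and builds the nested confidence intervals from Hoeffding's inequality, which is a choice made here for illustration only. It samples from the pignistic transformation in the usual way for a continuous possibility distribution: draw a level alpha uniformly, then draw a point uniformly from the corresponding alpha-cut. The names `sample_expected_reward` and `run_bandit` are hypothetical.

```python
import math
import random


def sample_expected_reward(mean, pulls, rng):
    """Draw one plausible expected reward for an arm with rewards in [0, 1].

    Assumption: the possibility of a value x is min(1, 2*exp(-2*pulls*(x-mean)**2)),
    i.e. nested Hoeffding confidence intervals around the sample mean.
    """
    if pulls == 0:
        # No observations yet: every value in [0, 1] is fully possible.
        return rng.random()
    alpha = 1.0 - rng.random()  # uniform on (0, 1]
    # Alpha-cut of the possibility function: |x - mean| <= half_width.
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * pulls))
    low = max(0.0, mean - half_width)
    high = min(1.0, mean + half_width)
    # Pignistic sampling: uniform draw inside the alpha-cut.
    return rng.uniform(low, high)


def run_bandit(success_probs, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k
    totals = [0.0] * k
    for _ in range(horizon):
        # One simulated expected reward per arm.
        samples = []
        for arm in range(k):
            mean = totals[arm] / pulls[arm] if pulls[arm] else 0.0
            samples.append(sample_expected_reward(mean, pulls[arm], rng))
        # Pull the arm whose simulated expected reward is highest.
        best = max(range(k), key=samples.__getitem__)
        reward = 1.0 if rng.random() < success_probs[best] else 0.0
        pulls[best] += 1
        totals[best] += reward
    return pulls
```

For example, `run_bandit([0.2, 0.8], 2000)` returns a two-element list of pull counts. With a gap this large, the second arm should receive most of the pulls, because the alpha-cuts shrink as observations accumulate.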
International
Yes
DOI
Book edition
Book publisher
G. Parlier, F. Liberatore, M. Demange (eds.), Springer
ISBN
978-3-319-94766-2
Series
Communications in Computer and Information Science 884
Book title
Operations Research and Enterprise Systems
From page
186
To page
209

This activity belongs to the research reports

Participants

Related research groups, departments, centres and R&D&I institutes
  • Creator: Research group: Grupo de análisis de decisiones y estadística
  • Department: Inteligencia Artificial