Research reports
Journal articles:
The Possibilistic Reward Method and a Dynamic Extension for the Multi-armed Bandit Problem: A Numerical Study
Year: 2017

Research areas
  • Computer science and information technology

Details
Description
Different allocation strategies can be found in the literature for the multi-armed bandit problem, from either a frequentist or a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected rewards of the arms, and these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to find the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced together with a dynamic optimization, which means that neither prior knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: Bernoulli distributions with very low success probabilities; Bernoulli distributions with success probabilities close to 0.5; Bernoulli distributions with success probabilities close to 0.5 combined with Gaussian rewards; and Poisson and exponential distributions truncated to [0, 10].
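The allocation loop described above (possibilistic distribution per arm, pignistic transformation, simulation, pull the best arm) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the Gaussian-shaped possibility function exp(-2n(x - m)^2) (a Hoeffding-style choice for rewards in [0, 1]), drawing a single simulated value per arm at each step, and the names `possibility`, `sample_pignistic`, `choose_arm` and `run_bernoulli` are all hypothetical. The pignistic step relies on the fact that the pignistic transform of a continuous, unimodal possibility distribution can be sampled by drawing a level alpha uniformly in (0, 1] and then a point uniformly from the alpha-cut {x : possibility(x) >= alpha}.

```python
import math
import random


def possibility(x, mean, pulls):
    """Hypothetical possibilistic reward distribution for one arm.

    ASSUMPTION: Gaussian-shaped possibility motivated by the Hoeffding bound
    for rewards in [0, 1]; the paper's exact form may differ.
    """
    return math.exp(-2.0 * pulls * (x - mean) ** 2)


def sample_pignistic(mean, pulls, rng):
    """Draw one value from the pignistic transform of possibility().

    Draw a level alpha ~ U(0, 1], then a point uniformly from the alpha-cut,
    which for exp(-2n(x - m)^2) >= alpha is [m - h, m + h] with
    h = sqrt(-ln(alpha) / (2n)).
    """
    alpha = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    half_width = math.sqrt(-math.log(alpha) / (2.0 * pulls))
    return rng.uniform(mean - half_width, mean + half_width)


def choose_arm(means, pulls, rng):
    """Simulate one value per arm and return the arm with the largest draw."""
    for arm, n in enumerate(pulls):
        if n == 0:  # pull every arm once before using the distributions
            return arm
    samples = [sample_pignistic(m, n, rng) for m, n in zip(means, pulls)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bernoulli(probs, horizon, seed=0):
    """Play a Bernoulli bandit for `horizon` steps; return (pulls, total reward)."""
    rng = random.Random(seed)
    pulls = [0] * len(probs)
    means = [0.0] * len(probs)
    total = 0.0
    for _ in range(horizon):
        arm = choose_arm(means, pulls, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]  # incremental mean
        total += reward
    return pulls, total
```

For example, `run_bernoulli([0.45, 0.5, 0.55], horizon=5000)` returns the pull counts per arm and the accumulated reward. As an arm is pulled more often its alpha-cuts shrink, so exploration of that arm fades while rarely pulled arms keep wide cuts and are still occasionally selected.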
International
Yes
ISI JCR
Yes
Journal title
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON OPERATIONS RESEARCH AND ENTERPRISE SYSTEMS (ICORES)
ISBN
978-989-758-218-9
JCR impact factor
Impact information
Volume
0
DOI
10.5220/0006186400750084
Issue number
First page
75
Last page
84
Month
Not specified
Ranking

This activity belongs to the research reports.

Participants

Related research groups, departments, and R&D&I centres and institutes
  • Creator: Department: Matemática e Informática Aplicadas a las Ingenierías Civil y Naval