Description | In this paper, we propose a novel allocation strategy based on possibilistic rewards for the multi-armed bandit problem. First, we use possibilistic reward distributions to model the uncertainty about the expected rewards of the arms; these distributions are derived from an infinite set of confidence intervals nested around the expected value. They are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced, together with a dynamic optimization. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios covering Bernoulli, Poisson and exponential reward distributions. The regret analysis of the proposed methods suggests logarithmic asymptotic convergence for the original possibilistic reward method, whereas a polynomial regret could be associated with the parametric extension and the dynamic optimization. |
---|---|
International | Yes |
DOI | |
Book edition | |
Book publisher | G. Palmier, F. Liberatore, M. Demange (eds.), Springer |
ISBN | 978-3-319-94766-2 |
Series | Communications in Computer and Information Science 884 |
Book title | Operations Research and Enterprise Systems |
First page | 186 |
Last page | 209 |
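
The three steps summarized in the description (possibilistic reward distributions, conversion to probabilities, simulation to pick an arm) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not appear in the record: rewards are bounded in [0, 1], the nested confidence intervals come from the two-sided Hoeffding bound, the expected reward is discretized on a fixed grid, and simple normalization of the possibility values stands in for the pignistic transformation. All names (`possibility`, `sample_expected_reward`, `choose_arm`, `run`) are hypothetical.

```python
import math
import random

# Assumed discretization of the expected reward over [0, 1].
GRID = [i / 200 for i in range(201)]


def possibility(x, mean, n):
    """Possibility that the true expected reward equals x.

    Assumption: the nested confidence intervals are Hoeffding intervals,
    which gives pi(x) = min(1, 2 * exp(-2 * n * (x - mean)^2)).
    """
    if n == 0:
        return 1.0  # no observations yet: every value is fully possible
    return min(1.0, 2.0 * math.exp(-2.0 * n * (x - mean) ** 2))


def sample_expected_reward(mean, n, rng):
    """Draw one plausible expected reward for an arm.

    Assumption: the possibility values are normalized into probabilities;
    random.choices performs that normalization through its weights.
    """
    weights = [possibility(x, mean, n) for x in GRID]
    return rng.choices(GRID, weights=weights, k=1)[0]


def choose_arm(sums, counts, rng):
    """Simulate one expected reward per arm and return the best arm's index."""
    draws = []
    for total, n in zip(sums, counts):
        mean = total / n if n else 0.5
        draws.append(sample_expected_reward(mean, n, rng))
    return max(range(len(draws)), key=draws.__getitem__)


def run(true_means, horizon, seed=0):
    """Play Bernoulli arms for `horizon` rounds and return the pull counts."""
    rng = random.Random(seed)
    sums = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(horizon):
        arm = choose_arm(sums, counts, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    print(run([0.2, 0.8], horizon=2000))
```

With two Bernoulli arms whose means are far apart, the pull counts should concentrate on the better arm as observations accumulate, because the possibility function of each arm narrows around its sample mean. The parametric extension and the dynamic optimization mentioned in the description are not covered by this sketch.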