Description
|
|
---|---|
Different allocation strategies can be found in the literature to deal with the multi-armed bandit problem, under either a frequentist view or a Bayesian perspective. In this paper, we propose a novel allocation strategy, the possibilistic reward method. First, possibilistic reward distributions are used to model the uncertainty about the expected reward of each arm; these are then converted into probability distributions using a pignistic probability transformation. Finally, a simulation experiment is carried out to identify the arm with the highest expected reward, which is then pulled. A parametric probability transformation of the proposed method is then introduced together with a dynamic optimization, which implies that neither previous knowledge nor a simulation of the arm distributions is required. A numerical study shows that the proposed method outperforms other policies in the literature in five scenarios: a Bernoulli distribution with very low success probabilities; a Bernoulli distribution with success probabilities close to 0.5; success probabilities close to 0.5 with Gaussian rewards; and Poisson and exponential distributions truncated to [0,10]. | |
International
|
Yes |
Conference name
|
6th International Conference on Operations Research and Enterprise Systems |
Type of participation
|
960 |
Conference venue
|
Oporto, Portugal |
Peer reviewed
|
Yes |
ISBN or ISSN
|
978-989-758-218-9 |
DOI
|
|
Conference start date
|
23/02/2017 |
Conference end date
|
25/02/2017 |
First page
|
75 |
Last page
|
84 |
Proceedings title
|
Proceedings of the 6th International Conference on Operations Research and Enterprise Systems |