Observatorio de I+D+i UPM

| Otras actividades
HOME

Proyectos Internacionales Art�culos Patentes UPM Software UPM Empresas UPM Otras actividades Memorias de investigaci�n

Memorias de investigación

Ponencias en congresos:

The Possibilistic Reward Methods for the Multi-Armed Bandit Problem

A�o:2017

�reas de investigaci�n

Investigaci�n operativa y programaci�n matem�tica,
Ciencias de la computaci�n y tecnolog�a inform�tica

Datos

Descripci�n
The multi-armed bandit problem has been at great depth studied in statistics, becoming fundamental in different areas of economics, statistics or artificial intelligence, such as reinforcement learning and evolutionary programming. Two families of bandit settings can be distinguished. In the first, the distribution of rewards is assumed to belong to a family of probability distributions whereas in the second, the rewards are only assumed to be bounded (say, between 0 and 1), and policies rely directly on the estimates of the expected rewards for each arm. Almost all the policies or allocation strategies in the literature focus on the first family and they can be separated, in two distinct approaches: the frequentist view and the Bayesian approach. In the frequentist view, the expected mean rewards corresponding to all arms are considered as unknown deterministic quantities and the aim of the algorithm is to reach the best parameter-dependent performance. In the Bayesian approach, each arm is characterized by a parameter which is endowed with a prior distribution. In this work, we propose a set of allocation strategies to deal with the multi-armed bandit problem that accounts for the Bayesian perspective, the possibilistic reward methods. First, we use possibilistic reward distributions to model the uncertainty about the expected rewards for arm, derived from a set of infinite confidence intervals nested around the expected value. Depending on the inequity (Chenoff-Hoeffding, Chernoff or Bernstein) used to compute the confidence intervals we have three PR methods with different features. Next, we use a pignistic probability transformation borrowed from decision theory and the transferable belief model to convert these possibilistic functions into probability distributions following the insufficient reason principle. Finally, Thompson sampling techniques are used to identify the arm with the higher expected reward and play that arm. For this, we carry out a simulation experiment by sampling from each arm according to their distributions. Finally, the picked arm is pulled/played and a real reward is output. Then, the possibilistic function corresponding to the picked arm is updated and started again. A numerical study analyses the performance of the proposed methods with respect to other policies in the literature. For this, five scenarios are considered accounting for a Bernoulli distribution with very low success probabilities, success probabilities close to 0.5 and success probabilities close to 0.5 and Gaussian rewards; a truncated Poisson distribution in [0,10]; and a truncated exponential distribution in [0,10].
Internacional	Si
Nombre congreso	5th Symposium on Games and Decisions in Reliability and Risk
Tipo de participaci�n	960
Lugar del congreso	Madrid, Espa�a
Revisores	No
ISBN o ISSN
DOI
Fecha inicio congreso	07/06/2017
Fecha fin congreso	09/06/2017
Desde la p�gina
Hasta la p�gina
T�tulo de las actas

Esta actividad pertenece a memorias de investigaci�n

Participantes

Autor: Miguel Mart�n Blanco
Autor: Antonio Jimenez Martin UPM
Autor: Alfonso Mateos Caballero UPM

Grupos de investigaci�n, Departamentos, Centros e Institutos de I+D+i relacionados

Creador: Grupo de Investigaci�n: Grupo de an�lisis de decisiones y estad�stica
Departamento: Inteligencia Artificial