Research Reports
Conference contributions:
Diffusion Gradient Temporal Difference for Cooperative Reinforcement Learning with Linear Function Approximation
Year:2012

Research Areas
  • Signal processing and analysis

Information
Abstract
We introduce a diffusion-based algorithm in which multiple agents cooperate to predict a common, global state-value function by sharing local estimates and local gradient information with their neighbors. The algorithm is a fully distributed implementation of gradient temporal-difference (GTD) learning with linear function approximation, which makes it applicable to multi-agent settings. Simulations illustrate the benefit of cooperative learning that the proposed algorithm makes possible.
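The abstract does not give the update equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's exact algorithm. Each agent runs a local GTD2 step (the linear-feature form from Sutton et al., 2009) on its own samples, then replaces its weight vectors with a uniform average of its neighbors' intermediate estimates (an adapt-then-combine diffusion step). The 3-state cyclic Markov chain, one-hot features, noisy rewards, step sizes, ring topology and the function names `gtd2_local_step`, `combine` and `diffusion_gtd2` are all hypothetical choices made for illustration.

```python
import random


def gtd2_local_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update with linear features (assumed local 'adapt' step)."""
    v = sum(t * f for t, f in zip(theta, phi))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v           # TD error
    phi_w = sum(f * x for f, x in zip(phi, w))     # phi^T w
    # theta <- theta + alpha * (phi - gamma * phi') * (phi^T w)
    new_theta = [t + alpha * (f - gamma * fn) * phi_w
                 for t, f, fn in zip(theta, phi, phi_next)]
    # w <- w + beta * (delta - phi^T w) * phi
    new_w = [x + beta * (delta - phi_w) * f for x, f in zip(w, phi)]
    return new_theta, new_w


def combine(vectors, neighbors):
    """Diffusion 'combine' step: uniform average over each agent's neighborhood."""
    out = []
    for k, nbrs in enumerate(neighbors):
        weight = 1.0 / len(nbrs)
        out.append([weight * sum(vectors[l][i] for l in nbrs)
                    for i in range(len(vectors[k]))])
    return out


def one_hot(s, n):
    return [1.0 if i == s else 0.0 for i in range(n)]


def diffusion_gtd2(num_agents=4, num_states=3, gamma=0.5,
                   alpha=0.05, beta=0.1, steps=5000, seed=0):
    """Hypothetical toy setup: every agent observes the same cyclic chain."""
    rng = random.Random(seed)
    # Ring topology; each neighborhood includes the agent itself.
    neighbors = [sorted({k, (k - 1) % num_agents, (k + 1) % num_agents})
                 for k in range(num_agents)]
    thetas = [[0.0] * num_states for _ in range(num_agents)]
    ws = [[0.0] * num_states for _ in range(num_agents)]
    states = [k % num_states for k in range(num_agents)]
    for _ in range(steps):
        psi_theta, psi_w = [], []
        for k in range(num_agents):
            s = states[k]
            s_next = (s + 1) % num_states              # 0 -> 1 -> ... -> 0
            reward = 1.0 + rng.uniform(-0.2, 0.2)      # noisy reward, mean 1
            t, w = gtd2_local_step(thetas[k], ws[k],
                                   one_hot(s, num_states),
                                   one_hot(s_next, num_states),
                                   reward, gamma, alpha, beta)
            psi_theta.append(t)
            psi_w.append(w)
            states[k] = s_next
        thetas = combine(psi_theta, neighbors)
        ws = combine(psi_w, neighbors)
    return thetas


thetas = diffusion_gtd2()
for k, theta in enumerate(thetas):
    print(k, [round(v, 3) for v in theta])
```

In this toy chain every transition has mean reward 1, so the true value of each state is 1/(1 - γ) = 2 for γ = 0.5, and each agent's estimate should end up close to that value. Because the combine step averages the agents' intermediate estimates, reward noise is smoothed across the network, which is the kind of cooperative benefit the abstract describes.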
International
Yes
Congress
2012 3rd International Workshop on Cognitive Information Processing (CIP)
960
Place
Reviewers
Yes
ISBN/ISSN
978-1-4673-1878-5
Start Date
28/05/2012
End Date
30/05/2012
From page
1
To page
6
3rd International Workshop on Cognitive Information Processing (CIP)
Participants
  • Author: Sergio Valcarcel Macua UPM
  • Author: Pavle Belanovic UPM
  • Author: Santiago Zazo Bello UPM

Related Research Groups, Departments and Institutes
  • Creator: Research Group: Grupo de Aplicaciones del Procesado de Señal (GAPS)
  • Department: Señales, Sistemas y Radiocomunicaciones