M1R Internship 2017

Pointers

RL

M1 course: MDPs and planning, RL
David Silver's course: http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
Sutton's book, updated draft: https://webdocs.cs.ualberta.ca/~sutton/book/bookdraft2016sep.pdf

Multi-Agent RL:
- first, read chapter 4 of https://tel.archives-ouvertes.fr/file/index/docid/362529/filename/these_matignon.pdf
- then read http://liris.cnrs.fr/laetitia.matignon/index/matignon2012KER.pdf

Work by De Hauwere: Learning multi-agent state space representations
- http://www.aamas-conference.org/Proceedings/aamas2010/pdf/01%20Full%20Papers/15_02_FP_0421.pdf
- https://ai.vub.ac.be/ALA2012/downloads/paper5.pdf

Building representations in RL

Tile Coding and adaptive variants: adaptative_tile_coding [Whiteson,2007] and evolutionary_tile_coding [Lin,2010]
Combining Growing Neural Gas (GNG) and Q-Learning for adaptive discretization of the state space: http://liris.cnrs.fr/sasem/lib/exe/fetch.php?media=m1r2017:vieira2013tdgngoriginal.pdf
* Self-Organizing Distinctive-State Abstraction (SODA) [Kuipers,2006]

=== Constructivist Learning ===
* S. Mazac's thesis: https://tel.archives-ouvertes.fr/tel-01310583/file/TH2015MazacSebastien.pdf

=== RL and Constructivist Inspirations ===
* Intrinsically Motivated RL [Singh2005] https://web.eecs.umich.edu/~baveja/Papers/FinalNIPSIMRL.pdf

===== Notes =====

To read:
* https://ai.vub.ac.be/ALA2012/downloads/paper4.pdf
* http://ir.library.oregonstate.edu/xmlui/bitstream/handle/1957/39192/HolmesParkerChristopherG2013.pdf;sequence=1

==== Constructivist Learning ====
* State of the art (S. Mazac's thesis)

==== RL ====

=== Multi-agents ===
* Learning multi-agent state space representations (CQLearning)
* Processus décisionnels de Markov et systèmes multiagents (L. Matignon's thesis)
* Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems
* Context-Sensitive Reward Shaping for Sparse Interaction Multi-Agent Systems

=== Constructivist Inspirations ===
* Intrinsically Motivated RL [Singh2005]

==== Value function approximation ====
* Some notes

==== Temporal Difference - Growing Neural Gas ====
* TD-GNG

===== Implementations =====
* SOM

===== Thoughts =====
* CQ-Learning and TD-GNG

===== Meeting minutes =====

Folder containing the slides presented at the meetings: slides
* 02/03/17
* 14/03/17
* 24/03/17