Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation

dc.contributor.author Martinez Gil, Francisco Antonio
dc.contributor.author Lozano Ibáñez, Miguel
dc.contributor.author García-Fernández, Ignacio
dc.contributor.author Romero, Pau
dc.contributor.author Serra, Dolors
dc.contributor.author Sebastián Aguilar, Rafael
dc.date.accessioned 2020-10-05T14:06:33Z
dc.date.available 2020-10-05T14:06:33Z
dc.date.issued 2020
dc.identifier.uri https://hdl.handle.net/10550/75739
dc.description.abstract Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, the learned policy sometimes does not meet expectations, and authoring it is difficult and unsafe because modifying a single value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This rules out direct manipulation of the learned value function as a method for modifying the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
dc.language.iso eng
dc.relation.ispartof Mathematics, 2020, vol. 8, num. 1479
dc.rights.uri info:eu-repo/semantics/openAccess
dc.source Martinez Gil, Francisco Antonio; Lozano Ibáñez, Miguel; García-Fernández, Ignacio; Romero, Pau; Serra, Dolors; Sebastián Aguilar, Rafael (2020). Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation. Mathematics, vol. 8, num. 1479
dc.subject Learning
dc.subject Computer science
dc.title Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulation
dc.type info:eu-repo/semantics/article
dc.date.updated 2020-10-05T14:06:33Z
dc.identifier.doi https://doi.org/10.3390/math8091479
dc.identifier.idgrec 140666
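
The abstract contrasts a greedy policy derived from a learned value function with Soft Q-learning, which follows the maximum causal entropy principle. The sketch below illustrates that general distinction on a single state's Q-values. It is not the paper's implementation and does not involve MARL-Ped; the function names and the `temperature` parameter are hypothetical, chosen only for illustration.

```python
import math


def greedy_action(q_values):
    """Return the index of the action with the highest Q-value.

    Ties are resolved in favor of the first such action.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def soft_value(q_values, temperature=1.0):
    """Soft state value: temperature * log(sum(exp(Q / temperature))).

    The maximum is subtracted before exponentiating for numerical stability.
    `temperature` is an assumed parameter, not taken from the record.
    """
    m = max(q_values)
    total = sum(math.exp((q - m) / temperature) for q in q_values)
    return m + temperature * math.log(total)


def soft_policy(q_values, temperature=1.0):
    """Maximum-entropy (Boltzmann) policy: pi(a) = exp((Q(a) - V) / temperature)."""
    v = soft_value(q_values, temperature)
    return [math.exp((q - v) / temperature) for q in q_values]


# Hypothetical Q-values for three actions in one state.
q = [1.0, 2.0, 0.5]
best = greedy_action(q)   # a single deterministic choice
probs = soft_policy(q)    # a probability distribution over all actions
```

The greedy rule commits to one action, whereas the soft policy keeps every action possible with probability that grows with its Q-value. In this sketch, lowering `temperature` moves the soft policy toward the greedy one.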
