Paper Notes on GRP

Title: Learning Generalized Reactive Policies using Deep Neural Networks

Idea:

Randomly generate a set of problems, then run an off-the-shelf planner on each one to obtain a plan and its trajectory, together with the corresponding observations and actions. Use bootstrapping to increase the amount of training data.
Given the training data, use imitation learning (IL) to learn a generalized reactive policy (GRP), which is represented by a deep neural network.
Furthermore, the GRP can be used as a heuristic function for guided search.
Leapfrogging: iteratively use the current GRP to generate data for learning a more complex GRP.
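
The data-generation step can be sketched roughly as follows. This is only an illustration, not the paper's code: a breadth-first search stands in for the off-the-shelf planner, the grid, the action names, and the helpers `bfs_plan`, `make_pairs`, and `bootstrap_pairs` are all hypothetical, and the "bootstrap" step is read here as relabeling intermediate states of a plan as goals, which is an assumption about what the note means.

```python
from collections import deque

# Hypothetical 4-connected moves; the action names are illustrative only.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def bfs_plan(grid, start, goal):
    """Stand-in for an off-the-shelf planner: shortest path on a grid.

    grid is a list of strings where '#' marks an obstacle. Returns a list
    of (state, action) steps, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for name, (dr, dc) in ACTIONS.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] != "#" and nxt not in parent):
                parent[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk back from the goal to recover the (state, action) sequence.
    steps, cell = [], goal
    while parent[cell] is not None:
        prev, name = parent[cell]
        steps.append((prev, name))
        cell = prev
    steps.reverse()
    return steps


def make_pairs(grid, steps, goal):
    """Turn one plan into (observation, action) pairs for imitation learning."""
    return [((tuple(grid), state, goal), action) for state, action in steps]


def bootstrap_pairs(grid, steps, goal):
    """Assumed augmentation: treat every later state on the plan as a goal.

    A prefix of a shortest path is itself a shortest path to its endpoint,
    so the relabeled pairs are still optimal demonstrations.
    """
    states = [state for state, _ in steps] + [goal]
    pairs = []
    for end in range(1, len(states)):
        pairs.extend(make_pairs(grid, steps[:end], states[end]))
    return pairs


GRID = ["....",
        ".##.",
        "...."]
plan = bfs_plan(GRID, (0, 0), (2, 3))
pairs = make_pairs(GRID, plan, (2, 3))
augmented = bootstrap_pairs(GRID, plan, (2, 3))
```

The resulting pairs are what a network would be trained on with a supervised loss. The observation here is just the raw layout plus the current and goal cells, whereas an image-like encoding would be the more likely input for a deep network.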

Contribution:

Experiments:

Sokoban
- 9k distinct obstacle configurations × 5 random start/goal locations each
- Evaluate performance both on test domains of the same size the GRPs were trained on (9×9 grids) and on
  larger problems
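
A rough sketch of how such a problem set could be assembled, assuming each problem is one obstacle layout plus one start/goal pair (9k layouts × 5 pairs would give 45k problems). The function names, the obstacle count, and the small sizes used here are placeholders, and box-pushing dynamics are ignored entirely.

```python
import random


def random_layout(size, n_obstacles, rng):
    """Sample one obstacle configuration as a frozenset of blocked cells."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    return frozenset(rng.sample(cells, n_obstacles))


def make_problem_set(n_layouts, pairs_per_layout, size, n_obstacles, seed=0):
    """Build n_layouts distinct layouts, each with several start/goal pairs."""
    rng = random.Random(seed)
    layouts = set()
    while len(layouts) < n_layouts:  # the set keeps only distinct layouts
        layouts.add(random_layout(size, n_obstacles, rng))
    problems = []
    for layout in sorted(layouts, key=sorted):  # fixed order for reproducibility
        free = [(r, c) for r in range(size) for c in range(size)
                if (r, c) not in layout]
        for _ in range(pairs_per_layout):
            start, goal = rng.sample(free, 2)  # two different free cells
            problems.append((layout, start, goal))
    return problems


# Scaled-down placeholder sizes, not the figures from the paper.
problems = make_problem_set(n_layouts=20, pairs_per_layout=5,
                            size=9, n_obstacles=10)
```

Nothing here checks that a goal is reachable from its start. In the pipeline described above, the planner would presumably filter out unsolvable instances.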
TSP