Paper Note on GRP

Title: Learning Generalized Reactive Policies using Deep Neural Networks

Idea:

  1. Randomly generate a problem set, then use an off-the-shelf planner to produce plans/trajectories, and record the resulting observations & actions as training data. Use bootstrapping to increase the amount of training data.
  2. Given the training data, use imitation learning (IL) to learn a generalized reactive policy (GRP) represented by a deep neural network (see the first sketch after this list).
  3. Furthermore, the GRP can be used as a heuristic function for guided search (second sketch below).
  4. Leapfrogging: iteratively use the current GRP to generate training data for learning a more capable GRP on harder problems (third sketch below).
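
A minimal sketch of steps 1-2 (planner-generated data plus imitation learning), assuming PyTorch and hypothetical problem/state helpers (`sample_problem`, `run_planner`, `initial_state`, `observation`, `apply`) that are not from the paper; the small convolutional network is only a stand-in for the architecture actually used.

```python
# Sketch of the data-generation + imitation-learning loop (steps 1-2).
# `sample_problem` and `run_planner` are hypothetical stand-ins for the
# random problem generator and the off-the-shelf planner.
import random
import torch
import torch.nn as nn

def collect_dataset(num_problems, sample_problem, run_planner):
    """Roll out the planner on random problems and record (observation, action) pairs."""
    data = []
    for _ in range(num_problems):
        problem = sample_problem()
        plan = run_planner(problem)              # list of action indices from the planner
        state = problem.initial_state()
        for action in plan:
            data.append((state.observation(), action))
            state = state.apply(action)
    return data

class GRP(nn.Module):
    """Generalized reactive policy: maps an observation grid to action logits."""
    def __init__(self, num_actions, channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, num_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def train_grp(policy, data, epochs=10, lr=1e-3):
    """Behavior cloning: cross-entropy between policy logits and planner actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        random.shuffle(data)
        for obs, action in data:
            logits = policy(obs.unsqueeze(0))    # obs assumed to be a (1, H, W) tensor
            loss = loss_fn(logits, torch.tensor([action]))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```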
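
For step 3, one simple way to turn the GRP into search guidance is to score successors by the policy's log-probability of the generating action; this scoring rule is an illustrative assumption, not the paper's exact heuristic construction, and the state methods are hypothetical.

```python
# Sketch of step 3: using the learned GRP to guide a best-first search.
import heapq
import itertools
import torch

def grp_guided_search(start_state, policy, max_expansions=10000):
    """Greedy best-first search ordered by a GRP-derived score."""
    counter = itertools.count()                  # tie-breaker for the heap
    frontier = [(0.0, next(counter), start_state)]
    visited = set()
    while frontier and max_expansions > 0:
        score, _, state = heapq.heappop(frontier)
        if state.is_goal():
            return state.extract_plan()          # hypothetical: recover actions to this state
        if state in visited:
            continue
        visited.add(state)
        max_expansions -= 1
        with torch.no_grad():
            logits = policy(state.observation().unsqueeze(0))
            logp = torch.log_softmax(logits, dim=-1)[0]
        for action in state.legal_actions():
            child = state.apply(action)
            # Prefer successors whose generating action the GRP considers likely.
            heapq.heappush(frontier, (score - logp[action].item(), next(counter), child))
    return None
```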
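
Step 4 (leapfrogging) then alternates between solving harder problems with the current GRP-guided search and retraining on the self-generated trajectories; a schematic loop reusing the helpers above, with problem counts and sizes as placeholders.

```python
# Sketch of step 4 (leapfrogging): the GRP-guided search replaces the
# off-the-shelf planner as the data source on progressively harder problems.
def leapfrog(policy, sample_problem_of_size, sizes, train_grp, grp_guided_search):
    for size in sizes:                           # e.g. increasing grid sizes
        data = []
        for _ in range(1000):                    # problems per round is illustrative
            problem = sample_problem_of_size(size)
            plan = grp_guided_search(problem.initial_state(), policy)
            if plan is None:
                continue                         # skip problems the current GRP cannot solve
            state = problem.initial_state()
            for action in plan:
                data.append((state.observation(), action))
                state = state.apply(action)
        policy = train_grp(policy, data)         # retrain on self-generated trajectories
    return policy
```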

Contribution:

  • use deep learning to obtain a GRP, reducing handcrafted knowledge/features
  • automatically learn a heuristic function
  • the approach can be used in an end-to-end system

Experiments:

  • Sokoban
    • 9k distinct obstacle configurations × 5 random start/goal locations (see the sketch after this list)
    • evaluate performance both on test problems of the same size the GRPs were trained on (9×9 grids) and on larger problems
  • TSP (Travelling Salesman Problem)
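
A rough sketch of how the Sokoban problem set described above could be assembled (9k obstacle layouts × 5 random start/goal placements on 9×9 grids); `random_obstacles` and `place_start_goal` are hypothetical helpers, not from the paper.

```python
# Sketch of the Sokoban problem-set construction: ~9k distinct obstacle
# layouts, each paired with 5 random start/goal placements.
def build_problem_set(random_obstacles, place_start_goal,
                      num_layouts=9000, placements_per_layout=5, grid_size=9):
    problems = []
    for _ in range(num_layouts):
        layout = random_obstacles(grid_size)
        for _ in range(placements_per_layout):
            problems.append(place_start_goal(layout))
    return problems
```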