Title: Learning Generalized Reactive Policies using Deep Neural Networks
Idea:
- Randomly generate a set of problems, then use an off-the-shelf planner to produce a plan and trajectory for each one, recorded as observation/action pairs. Use bootstrapping to increase the amount of training data.
- Given the training data, use imitation learning (IL) to learn a generalized reactive policy (GRP), which is represented by a deep neural network.
- Furthermore, the GRP can be used as a heuristic function for guided search.
- Leapfrogging: iteratively use a learned GRP to generate training data for learning a more complex GRP.
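
The data-generation step above can be sketched roughly as follows. This is only an illustration under assumptions, not the paper's pipeline: a plain grid-navigation domain replaces the real planning domain, a breadth-first search (bfs_plan) stands in for the off-the-shelf planner, and "bootstrap" is read as reusing every prefix of a shortest plan as an extra problem whose goal is the prefix endpoint. All function names are hypothetical.

```python
import random
from collections import deque

# Grid moves; a simplified navigation domain used only for illustration.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def random_problem(size, n_obstacles, rng):
    """Sample an obstacle layout plus distinct start and goal cells."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(cells)
    obstacles = frozenset(cells[:n_obstacles])
    return obstacles, cells[n_obstacles], cells[n_obstacles + 1]

def bfs_plan(size, obstacles, start, goal):
    """Stand-in planner (assumption): shortest path by breadth-first search.
    Returns a list of (state, action) steps, or None if the goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for name, (dr, dc) in ACTIONS.items():
            nxt = (state[0] + dr, state[1] + dc)
            in_grid = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_grid and nxt not in obstacles and nxt not in parent:
                parent[nxt] = (state, name)
                queue.append(nxt)
    if goal not in parent:
        return None
    steps, node = [], goal
    while parent[node] is not None:  # walk back from goal to start
        prev, name = parent[node]
        steps.append((prev, name))
        node = prev
    steps.reverse()
    return steps

def bootstrap_pairs(obstacles, steps, goal):
    """Assumed bootstrapping rule: each prefix of a shortest path is itself a
    shortest path, so every intermediate state can serve as an extra goal."""
    endpoints = [state for state, _ in steps[1:]] + [goal]
    pairs = []
    for j, subgoal in enumerate(endpoints, start=1):
        for state, action in steps[:j]:
            observation = (obstacles, state, subgoal)
            pairs.append((observation, action))
    return pairs

def build_dataset(n_problems, size=9, n_obstacles=10, seed=0):
    """Generate (observation, action) pairs for imitation learning."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_problems):
        obstacles, start, goal = random_problem(size, n_obstacles, rng)
        steps = bfs_plan(size, obstacles, start, goal)
        if steps is not None:  # skip unsolvable problems
            data.extend(bootstrap_pairs(obstacles, steps, goal))
    return data
```

The resulting pairs are what a supervised imitation learner would consume: the observation is the network input and the planner's action is the label.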
Contribution:
- use deep learning to obtain the GRP, reducing the need for handcrafted knowledge and features
- automatically learn a heuristic function
- the approach could be used in an end-to-end system
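
One possible way to use a policy as a search heuristic is sketched below. It is an assumption-laden illustration, not the paper's algorithm: a best-first search orders nodes by the accumulated negative log-probability the policy assigns to the actions along the path. stub_policy is a hypothetical hand-written placeholder for a trained GRP network, and the grid domain is again a simplified stand-in.

```python
import heapq
import math

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def stub_policy(state, goal):
    """Placeholder for a trained GRP (assumption): a softmax over the negative
    Manhattan distance to the goal after each move. Returns action -> probability."""
    scores = {}
    for name, (dr, dc) in ACTIONS.items():
        nxt = (state[0] + dr, state[1] + dc)
        scores[name] = -(abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1]))
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def policy_guided_search(size, obstacles, start, goal, policy, max_expansions=10000):
    """Best-first search where a path's priority is the sum of -log p(action)
    under the policy, so paths the policy finds likely are expanded first."""
    frontier = [(0.0, start, [])]
    visited = set()
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        if state in visited:
            continue
        visited.add(state)
        expansions += 1
        probs = policy(state, goal)
        for name, (dr, dc) in ACTIONS.items():
            nxt = (state[0] + dr, state[1] + dc)
            in_grid = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_grid and nxt not in obstacles and nxt not in visited:
                step_cost = -math.log(probs[name])
                heapq.heappush(frontier, (cost + step_cost, nxt, plan + [name]))
    return None  # goal unreachable or expansion budget exhausted

def apply_plan(start, plan):
    """Replay a plan and return the sequence of visited states."""
    states = [start]
    for name in plan:
        dr, dc = ACTIONS[name]
        states.append((states[-1][0] + dr, states[-1][1] + dc))
    return states
```

For example, policy_guided_search(5, frozenset({(1, 2), (2, 2), (3, 2)}), (2, 0), (2, 4), stub_policy) searches around a wall that the greedy policy alone would run into; the search falls back to less likely actions once the preferred ones are blocked.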
Experiments:
- Sokoban
- 9k distinct obstacle configurations, each with 5 random start/goal locations
- evaluate performance both on test domains of the same size the GRPs were trained on (9×9 grids) and on larger problems
- TSP