Learning complex manipulation tasks in realistic, obstructed
environments is challenging due to hard exploration in the presence
of obstacles and high-dimensional visual observations. Prior work tackles the
exploration problem by integrating motion planning and reinforcement learning.
However, the motion planner augmented policy requires access to state
information, which is often unavailable in real-world settings. To this
end, we propose to distill a state-based motion planner augmented policy into a
visual control policy via (1) visual behavioral cloning to remove the motion
planner dependency along with its jittery motion, and (2) vision-based
reinforcement learning guided by the smoothed trajectories of the
behavioral cloning agent. We evaluate our method on three manipulation tasks
in obstructed environments and compare it against various reinforcement
learning and imitation learning baselines. The results demonstrate that our
framework is highly sample-efficient and outperforms state-of-the-art
algorithms. Moreover, coupled with domain randomization, our policy is capable
of zero-shot transfer to unseen environment settings with distractors. Code
and videos are available at https://clvrai.com/mopa-pd.
Accepted to the Conference on Robot Learning (CoRL), 2021.
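
To make the two-stage distillation concrete, here is a minimal sketch, not the authors' released code (see the repository linked above). It assumes demonstrations are batches of (image, action) pairs collected by rolling out the state-based motion planner augmented policy; all names (VisualPolicy, behavioral_cloning) are hypothetical, and the second, vision-based RL stage is only stubbed.

    # Stage 1 of the distillation: visual behavioral cloning of the
    # state-based expert. All names here are illustrative placeholders.
    import torch
    import torch.nn as nn

    class VisualPolicy(nn.Module):
        """Small CNN mapping 64x64 RGB observations to continuous actions."""
        def __init__(self, action_dim: int):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            # For 64x64 inputs the conv stack yields a 64 x 6 x 6 feature map.
            self.head = nn.Sequential(
                nn.Linear(64 * 6 * 6, 256), nn.ReLU(),
                nn.Linear(256, action_dim), nn.Tanh(),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.head(self.encoder(obs))

    def behavioral_cloning(policy, demos, epochs=50, lr=3e-4):
        """Regress the visual policy onto expert actions with an MSE loss."""
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        for _ in range(epochs):
            for obs, expert_action in demos:
                loss = nn.functional.mse_loss(policy(obs), expert_action)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return policy

    # Stage 2 would fine-tune `policy` with a vision-based RL algorithm
    # whose exploration is guided by rollouts of this BC agent; a full
    # RL training loop is omitted from this sketch.

    if __name__ == "__main__":
        # Smoke test with random tensors standing in for real demonstrations.
        policy = VisualPolicy(action_dim=4)
        demos = [(torch.randn(8, 3, 64, 64), torch.randn(8, 4))
                 for _ in range(2)]
        behavioral_cloning(policy, demos, epochs=1)

The BC stage also serves the smoothing role described above: regressing onto full expert trajectories averages out the jittery motion-planner segments, and the resulting smoothed rollouts then guide the vision-based RL stage.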