Google, in collaboration with DeepMind, just introduced their Deep Planning Network (PlaNet) agent, an AI agent that both companies hope will spur progress in reinforcement learning research.
Unlike Google and DeepMind’s current AI agents, the PlaNet agent can reportedly learn a world model from image inputs and plan efficiently with it to gather new experiences.
Google has long believed that reinforcement learning (RL) can significantly improve an artificial agent’s decision-making capabilities. The technique lets an AI agent observe a stream of sensory inputs, such as images, while deciding which actions to take.
Reinforcement Learning
There are two main types of RL: model-free RL and model-based RL.
Model-free RL focuses on predicting good actions directly from the sensory inputs the agent observes. This is the approach DeepMind’s DQN used to play Atari games.
Meanwhile, model-based RL teaches an artificial agent how the world behaves in general, enabling it to plan ahead rather than map observations directly to actions. An excellent example of this approach is the AlphaGo agent that DeepMind developed.
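To make the distinction concrete, here is a minimal Python sketch contrasting the two approaches. The function names, the linear policy, and the one-step lookahead are hypothetical illustrations, not code from DQN, AlphaGo, or PlaNet.

```python
import numpy as np

# Model-free sketch: a policy maps an observation directly to an action.
# It is improved purely from trial-and-error reward signals, with no
# explicit model of how the world works.
def model_free_policy(observation, weights):
    scores = weights @ observation        # one score per possible action
    return int(np.argmax(scores))         # act greedily on the scores

# Model-based sketch: a learned dynamics model lets the agent "imagine"
# the outcome of each candidate action before committing to one.
def plan_one_step(state, candidate_actions, dynamics_model, reward_model):
    best_action, best_reward = None, -np.inf
    for action in candidate_actions:
        imagined_next = dynamics_model(state, action)   # predicted next state
        imagined_reward = reward_model(imagined_next)   # predicted reward
        if imagined_reward > best_reward:
            best_action, best_reward = action, imagined_reward
    return best_action
```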
Both of these RL approaches have their challenges.
The model-free approach requires weeks of simulated interaction to learn through trial and error, which limits the agent’s usefulness in practice.
On the other hand, the model-based approach requires an accurate model of the environment’s rules or dynamics, which must itself be learned from experience, making it difficult to apply in unknown environments.
PlaNet Agent to Spur Progress
As a solution to these dilemmas, Google and DeepMind have launched the PlaNet agent. This new AI agent can reportedly learn a world model from sensory inputs and successfully use it for planning.
“PlaNet solves a variety of image-based control tasks, competing with advanced model-free agents in terms of final performance while being 5000% more data efficient on average,” Danijar Hafner, a student researcher at Google AI, wrote in a blog post.
According to Hafner, PlaNet uses a new approach called a latent dynamics model, which enables the AI agent to predict the latent state forward in time instead of predicting directly from one image to the next.
“The image and reward at each step are then generated from the corresponding latent state,” Hafner further explained.
“By compressing the images in this way, the agent can automatically learn more abstract representations, such as positions and velocities of objects, making it easier to predict forward without having to generate images along the way.”
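The toy sketch below illustrates that idea, with randomly initialized linear maps standing in for the learned encoder, transition model, and decoder. The dimensions and names are made up for illustration, and PlaNet’s actual model is a trained recurrent latent-state network rather than these linear stand-ins; the key point is that the rollout happens entirely in the compact latent space, and an image or reward is generated from a latent state only when needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for the sketch.
IMAGE_DIM, LATENT_DIM, ACTION_DIM = 64 * 64, 32, 4

# Randomly initialized linear maps stand in for the learned networks.
W_enc = rng.normal(scale=0.01, size=(LATENT_DIM, IMAGE_DIM))                 # encoder
W_trans = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))  # transition model
W_dec = rng.normal(scale=0.01, size=(IMAGE_DIM, LATENT_DIM))                 # image decoder
w_rew = rng.normal(scale=0.1, size=LATENT_DIM)                               # reward head

def encode(image):
    # Compress a raw image into a compact latent state.
    return W_enc @ image

def predict_forward(latent, action):
    # Predict the next latent state from the current latent state and the
    # action taken, without generating any intermediate images.
    return W_trans @ np.concatenate([latent, action])

def decode(latent):
    # The image and reward are generated from the latent state only when needed.
    return W_dec @ latent, float(w_rew @ latent)

# Roll the model forward for a few imagined steps, entirely in latent space.
latent = encode(rng.normal(size=IMAGE_DIM))
for _ in range(3):
    latent = predict_forward(latent, rng.normal(size=ACTION_DIM))
predicted_image, predicted_reward = decode(latent)
```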
PlaNet reportedly works without a policy network, which means that it decides its actions purely through planning.
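A rough sketch of what such planning could look like follows: sample candidate action sequences, roll each one forward with a latent dynamics model (such as the predict_forward function sketched above), score the imagined trajectories by predicted reward, and execute the first action of the best sequence. This random-shooting planner is a simplification for illustration; PlaNet itself relies on a more refined population-based search over action sequences.

```python
import numpy as np

def plan(latent, predict_forward, reward_from_latent, action_dim,
         horizon=12, num_candidates=1000, seed=0):
    # Choose an action by search alone, with no policy network.
    rng = np.random.default_rng(seed)
    best_return, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        # Sample a random candidate sequence of actions over the horizon.
        candidate = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        state, total_reward = latent, 0.0
        for action in candidate:
            state = predict_forward(state, action)      # imagine the next latent state
            total_reward += reward_from_latent(state)   # accumulate imagined reward
        if total_reward > best_return:
            best_return, best_first_action = total_reward, candidate[0]
    return best_first_action  # execute only this action, then re-plan next step
```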
During tests, the PlaNet agent was able to outperform established model-free agents such as A3C and D4PG on image-based tasks. Furthermore, PlaNet was also able to learn six previously unseen tasks in different environments in as little as 2,000 attempts.
“Our results showcase the promise of learning dynamics models for building autonomous RL agents,” Hafner added.
“We are excited about the possibilities that model-based reinforcement learning opens up, including multi-task learning, hierarchical planning and active exploration using uncertainty estimates.”
The source code for the PlaNet agent is also available on GitHub.