Training Robots to Understand What Humans Want

Getting a robot to do its job should just be a matter of giving it the right technical instructions, right? It may seem easy, but it’s more complicated than it sounds.

Instructing a robot is really about teaching it to understand the task it is being asked to perform.

There are different ways to make autonomous robots understand what you want, such as asking users about their preferences or showing the robot what to do and how to do it through demonstrations.

Stanford researchers have combined the two, demonstrations and preferences, into one system that trains autonomous systems more efficiently than either method alone.

Training Robots Better, Faster, Smarter

Imagine you tell a racing car in a video game to optimize for speed, and it ends up spinning wildly in circles. Now imagine the same thing happening with a real-world autonomous vehicle.

Setting aside the sharp difference in consequences between the two scenarios, the underlying problem is the same: if the cars aren’t explicitly instructed to drive in a straight line, nothing stops them from driving in circles.

Examples like this one led Dorsa Sadigh, an assistant professor of computer science and electrical engineering at Stanford, and her lab teammates to develop a more efficient system for setting goals for robots.

The new system produces the instructions given to robots, known as reward functions, by combining two sources of data: demonstrations, in which humans physically show the robot what to do, and user preference surveys, in which people answer questions about the robot’s mission.

Sadigh explained:

“Demonstrations are informative but they can be noisy. On the other hand, preferences provide, at most, one bit of information, but are way more accurate. Our goal is to get the best of both worlds, and combine data coming from both of these sources more intelligently to better learn about humans’ preferred reward function.”
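
To make the idea of combining the two data sources concrete, here is a minimal Python sketch. It is not the researchers' actual algorithm, which this article does not describe. It assumes, purely for illustration, that the reward is a weighted sum of two made-up trajectory features (speed and staying on track), that a demonstration is a noisy choice among a few candidate trajectories, and that each preference answer follows a logistic choice model. All function names, feature values and the `beta` noise parameter are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_unit_vectors(n, dim, rng):
    """Draw n candidate reward-weight vectors uniformly from the unit sphere."""
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        samples.append([x / norm for x in v])
    return samples


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def update_with_demo(candidates, belief, demo, options, beta=5.0):
    """Noisy demonstration: the human picks `demo` out of `options` with
    probability proportional to exp(beta * reward). `options` includes `demo`."""
    new_belief = []
    for w, b in zip(candidates, belief):
        scores = [math.exp(beta * dot(w, o)) for o in options]
        likelihood = math.exp(beta * dot(w, demo)) / sum(scores)
        new_belief.append(b * likelihood)
    return normalize(new_belief)


def update_with_preference(candidates, belief, preferred, rejected):
    """One-bit survey answer: logistic likelihood that `preferred` beats `rejected`."""
    new_belief = []
    for w, b in zip(candidates, belief):
        diff = dot(w, preferred) - dot(w, rejected)
        new_belief.append(b / (1.0 + math.exp(-diff)))
    return normalize(new_belief)


def posterior_mean(candidates, belief):
    dim = len(candidates[0])
    return [sum(b * w[i] for w, b in zip(candidates, belief)) for i in range(dim)]


# Hypothetical features per trajectory: [speed, stays_on_track]
rng = random.Random(0)
candidates = sample_unit_vectors(2000, 2, rng)
belief = normalize([1.0] * len(candidates))  # start with no opinion

# Step 1: a demonstration narrows the belief (informative but noisy).
demo = [0.6, 0.9]
options = [demo, [1.0, 0.1], [0.2, 0.95]]
belief = update_with_demo(candidates, belief, demo, options)

# Step 2: a survey answer refines it (one bit, but accurate).
belief = update_with_preference(candidates, belief, [0.5, 1.0], [1.0, 0.0])

print("estimated reward weights [speed, stays_on_track]:",
      posterior_mean(candidates, belief))
```

In this toy setup the demonstration does most of the work of ruling out unlikely reward functions, and each survey answer then nudges the remaining belief, which mirrors the "best of both worlds" idea in the quote above.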