Training Robots to Understand What Humans Want

Stanford scientists combined demonstrations and user preferences into a single system that takes on the seemingly impossible task of training robots to know what humans want.

Image courtesy of Shutterstock

Training a robot to do its job is just a matter of giving it the right technical guidance, right? It may sound easy, but it’s more complicated than that.

Instructing a robot is really about teaching it to understand the task it is asked to carry out.

There are different ways to make autonomous robots understand what you want, such as stating your preferences or showing them what to do and how to do it through demonstrations.

Stanford researchers combined the two, demos and preferences, into one system that trains autonomous systems more efficiently than either method alone.

Training Robots Better, Faster, Smarter

Imagine you set a racing car in a video game to optimize for speed, and you see it speeding crazily in circles. Now, imagine the same but with a real-world autonomous vehicle.

Setting aside the sharp difference in consequences between the two scenarios, the underlying problem is the same: if the cars aren’t explicitly told to drive in a straight line, nothing prevents them from driving in circles.
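To make the racing-car problem concrete, here is a minimal sketch of a toy car scored by two different reward functions. Everything in it is an illustrative assumption rather than anything taken from the Stanford work: the `simulate` function, the numbers, and the "progress" bonus are all hypothetical.

```python
import math

def simulate(speed, turn_rate, steps=100, dt=0.1):
    """Roll out a toy car and return its list of (x, y, speed) states."""
    x = y = heading = 0.0
    states = []
    for _ in range(steps):
        heading += turn_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        states.append((x, y, speed))
    return states

def speed_only_reward(states):
    """Reward that only counts speed, as in the video game example."""
    return sum(s for _, _, s in states)

def speed_and_progress_reward(states):
    """Hypothetical fix: also reward distance covered along the x axis."""
    final_x = states[-1][0]
    return sum(s for _, _, s in states) + 10.0 * final_x

straight = simulate(speed=5.0, turn_rate=0.0)  # drives in a straight line
circling = simulate(speed=5.0, turn_rate=2.0)  # spins in tight circles

# The speed-only reward cannot tell the two behaviors apart.
print(speed_only_reward(straight), speed_only_reward(circling))
# Once progress is rewarded, driving straight scores higher.
print(speed_and_progress_reward(straight), speed_and_progress_reward(circling))
```

Under the speed-only reward, both cars earn the same score, so nothing pushes the car away from circling. The behavior changes only when the reward says something about where the car should go.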

It’s from examples like this one that Dorsa Sadigh, an assistant professor of computer science and electrical engineering at Stanford, and her lab teammates devised a more efficient way to set goals for robots.

The new system is an ingenious way of providing robots with instructions, which are known as reward functions. It combines demonstrations, in which humans physically show the robot what to do, with user preference surveys, in which people answer questions about the robot’s mission.

Sadigh explained:

“Demonstrations are informative but they can be noisy. On the other hand, preferences provide, at most, one bit of information, but are way more accurate. Our goal is to get the best of both worlds, and combine data coming from both of these sources more intelligently to better learn about humans’ preferred reward function.”
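One way to picture the combination Sadigh describes is as a belief over candidate reward functions that is updated first by a noisy demonstration and then by a sharper preference answer. The sketch below is only an illustration under stated assumptions, not the Stanford team’s implementation: the trajectories, feature values, candidate weights, and `beta` noise parameters are all hypothetical.

```python
import math

# Hypothetical (speed, lane_keeping) features for three candidate trajectories.
TRAJECTORIES = {
    "fast_circles": (1.0, 0.0),
    "fast_straight": (0.9, 1.0),
    "slow_straight": (0.3, 1.0),
}

# Hypothetical candidate reward weights over the two features.
CANDIDATE_WEIGHTS = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)]

def reward(w, phi):
    """Linear reward: weighted sum of trajectory features."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def demo_likelihood(w, demo, beta=1.0):
    """Noisy demonstration: chosen with probability proportional to exp(beta * reward)."""
    scores = {name: math.exp(beta * reward(w, phi))
              for name, phi in TRAJECTORIES.items()}
    return scores[demo] / sum(scores.values())

def preference_likelihood(w, preferred, rejected, beta=5.0):
    """One-bit answer: logistic in the reward difference, assumed less noisy."""
    diff = reward(w, TRAJECTORIES[preferred]) - reward(w, TRAJECTORIES[rejected])
    return 1.0 / (1.0 + math.exp(-beta * diff))

def posterior(demos, preferences):
    """Belief over CANDIDATE_WEIGHTS, starting from a uniform prior."""
    belief = [1.0] * len(CANDIDATE_WEIGHTS)
    for i, w in enumerate(CANDIDATE_WEIGHTS):
        for demo in demos:
            belief[i] *= demo_likelihood(w, demo)
        for preferred, rejected in preferences:
            belief[i] *= preference_likelihood(w, preferred, rejected)
    total = sum(belief)
    return [b / total for b in belief]

after_demo = posterior(demos=["fast_straight"], preferences=[])
after_both = posterior(demos=["fast_straight"],
                       preferences=[("fast_straight", "fast_circles")])
print(after_demo)
print(after_both)
```

In this toy setup the single demonstration shifts the belief only slightly, while the one preference answer sharply reduces the weight on the speed-only reward, which mirrors the "noisy but informative" versus "one bit but accurate" trade-off in the quote.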

The present research builds on previous work by Sadigh and her colleagues, in which they focused on user preference surveys alone. They found the preference-based method too slow, so they came up with a way to speed it up. Demonstrations alone fall short as well, because the robot “often struggles to determine what parts of the demonstration are important.”

The team ultimately found that combining demos and preference surveys is the best way to train robots to interact with humans safely and efficiently.

Robots trained with the combined approach performed better than those trained with either method alone, in both simulations and real-world experiments.

“Being able to design reward functions for autonomous systems is a big, important problem that hasn’t received quite the attention in academia as it deserves.”

The Stanford team presented their work at the Robotics: Science and Systems conference last June 24th.


Zayan Guedim

Trilingual poet, investigative journalist, and novelist. Zed loves tackling the big existential questions and all-things quantum.
