Imagine a future where computer models create animated movies directly from a script: rather than going through the tedious process of animating by hand, the model would simply translate text into computer-generated physical movements.
Well, thanks to the researchers at Carnegie Mellon University, that future may be nearer than you think.
Scientists have made considerable strides in computer modeling within the past decade. Along with getting machines to understand natural language, computers can now generate a series of physical poses to create realistic animations.
There’s just one problem. While both capabilities exist, scientists had never managed to link natural language to physical poses.
That’s why Louis-Philippe Morency, an associate professor in the Language Technologies Institute (LTI), teamed up with LTI Ph.D. student Chaitanya Ahuja to bring these two worlds together. The team is using a neural architecture that they call Joint Language-to-Pose, or JL2P.
Using the JL2P model, the scientists embed sentences and physical motions in a joint space, so the model learns how language relates to action, movement, and gesture.
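The idea of a joint embedding can be illustrated with a minimal sketch: two encoders map text features and pose features into one shared space, where paired sentences and motions can be compared by cosine similarity. All names, dimensions, and the random linear "encoders" below are hypothetical stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration; not the real model's sizes).
TEXT_DIM, POSE_DIM, EMBED_DIM = 16, 12, 8

# Two "encoders": random linear projections standing in for the
# sentence encoder and the pose-sequence encoder.
W_text = rng.normal(size=(TEXT_DIM, EMBED_DIM))
W_pose = rng.normal(size=(POSE_DIM, EMBED_DIM))

def embed_text(x):
    """Map a text feature vector into the shared embedding space."""
    v = x @ W_text
    return v / np.linalg.norm(v)

def embed_pose(p):
    """Map a pose feature vector into the same shared space."""
    v = p @ W_pose
    return v / np.linalg.norm(v)

def similarity(x, p):
    """Cosine similarity between a sentence and a motion in joint space.
    Training would push paired (sentence, motion) similarities toward 1."""
    return float(embed_text(x) @ embed_pose(p))

sentence = rng.normal(size=TEXT_DIM)  # stands in for "A person walks forward"
motion = rng.normal(size=POSE_DIM)    # stands in for the matching pose sequence
score = similarity(sentence, motion)  # a value between -1 and 1
```

Because both embeddings are normalized to unit length, the similarity is bounded in [-1, 1]; a training loss would then pull matching sentence–motion pairs together in this shared space.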
Speaking on the project, Morency said:
“Right now, we’re talking about animating virtual characters. Eventually, this link between language and gestures could be applied to robots; we might be able to simply tell a personal assistant robot what we want it to do.”
According to the researcher, the link could also run in the other direction, from animation to language. In such a case, a computer would be able to describe what’s happening in a video.
Creating the JL2P Computer Model
To develop the JL2P model, the researchers used a curriculum learning approach.
The computer model started with short, easy sequences like “A person walks forward.” It eventually progressed to more challenging sequences such as “A person steps forward, then turns around and steps forward again,” or “A person jumps over an obstacle while running.”
Verbs and adverbs describe the action as well as its speed or acceleration, while nouns and adjectives describe locations and directions.
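The curriculum idea described above, starting with easy sequences and moving to harder ones, can be sketched in a few lines. Here word count is used as a crude proxy for motion complexity; that proxy, and the helper names, are assumptions for illustration, not the paper's actual criterion.

```python
# Hypothetical training sentences, unordered until the curriculum sorts them.
data = [
    "A person steps forward, then turns around and steps forward again",
    "A person walks forward",
    "A person jumps over an obstacle while running",
]

def complexity(sentence):
    # Word count as a rough stand-in for how hard a motion is to animate.
    return len(sentence.split())

def curriculum_stages(samples, n_stages=2):
    """Split samples into stages of increasing complexity."""
    ordered = sorted(samples, key=complexity)
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

stages = curriculum_stages(data)
# Stage 1 holds the shortest sentences; later stages add harder ones.
seen = []
for stage in stages:
    seen.extend(stage)  # a real training loop would fit the model here
```

The point of staging the data this way is that the model masters simple language-to-motion mappings before it is asked to compose them into multi-action sequences.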
According to Ahuja, the model’s ultimate goal is to animate complex sequences with multiple actions that are either happening in sequence or simultaneously.
At the moment, the computer-generated animations are stick figures. Even so, the researchers point out that animating them is complicated, because many movements have to happen at the same time.
Morency noted:
“Every time you move your legs, you also move your arms, your torso and possibly your head. The body animations need to coordinate these different components, while at the same time achieving complex actions.”