Robotics and artificial intelligence researchers have been working for years on autonomous robots with a humanoid appearance and almost human-like abilities. Socially interactive robots focus on social interaction with human users and need to provide a transparent interface that is as close to human-human interaction as possible, in order to make the interaction fast and intuitive. As facial expressions play a central role in human-human communication, socially interactive robots need to possess a face of some sort in addition to speech recognition and speech synthesis abilities. Researchers have implemented robot faces in several ways that vary in human-likeness and expressiveness. We propose a way to implement parameterized facial animation software that renders a facial animation to be displayed on a robotic guide via a screen or projection. A prototype was developed that can be controlled through interprocess communication. The virtual face and its expressions are designed to raise potential users’ willingness to engage in interaction with the robotic guide, and the face displays signs of “being alive” to show that it is active.
My field of study is media and communication-related computer science. I worked on my Bachelor’s thesis, titled “Parameterized Facial Animation for Human Robot Interaction”, from September 1, 2014 until February 27, 2015 at Reutlingen University (Germany) in cooperation with Prof. Dr. rer. nat. Uwe Kloos and Prof. Dr. rer. nat. Matthias Rätsch.
I’m currently working on papers based on the work I began for my thesis, and I publish my results here so they may help other students working on similar subjects. I also hope to get feedback and suggestions from other researchers who may stumble upon them.
I plan on releasing the prototype software shortly. Right now you can read a short paper I wrote for the Informatics Inside conference at Reutlingen University in 2015.
Feel free to leave a comment at the bottom or send me an e-mail. I am always interested in professional discussions (and am aware that there is a lot of room for improvement).
This short video explains the idea and demonstrates the capabilities of the software prototype as a proof of concept:
I am continuing this project under the name emofani (Emotion Model based Face Animation) on GitHub. Feel free to contact me with questions, suggestions, bug fixes, etc.: https://github.com/steffenwittig/emofani
To parameterize the facial expressions, Russell’s circumplex model of emotion is used, as it offers enough precision for the use case while being very simple to implement with a two-dimensional animation blend tree, which is available as part of Unity3D’s animation system.
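In the circumplex model every emotional state is a point in a plane spanned by pleasure and arousal: the direction from the neutral centre selects the emotion, and the distance from it encodes intensity. The minimal Python sketch below shows that decomposition. The function name `to_polar` and the axis orientation (pleasure on x, arousal on y, neutral at the origin) are assumptions for illustration, not taken from the prototype.

```python
import math


def to_polar(pleasure: float, arousal: float) -> tuple[float, float]:
    """Split a (pleasure, arousal) point into (intensity, angle).

    Assumes pleasure on the x-axis, arousal on the y-axis and the
    neutral expression at the origin. The intensity is the vector
    length; the angle (degrees, counter-clockwise from the positive
    pleasure axis, normalised to lie between 0 and 360) selects
    the emotion.
    """
    intensity = math.hypot(pleasure, arousal)
    angle = math.degrees(math.atan2(arousal, pleasure)) % 360.0
    return intensity, angle
```

For example, `to_polar(0, 100)` describes a high-arousal, neutral-pleasure state at full intensity, pointing straight up the arousal axis.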
Unity3D’s blend tree allows animation resources to be placed in a one- or two-dimensional coordinate system, based on one or two variables, respectively, that are declared when setting up the related animation controller. Scripts holding a reference to the animation controller can read and set these variables, which define the position of a point inside the coordinate space.
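To make the role of that point concrete, here is a minimal Python sketch of how a single position in the coordinate space can be turned into mixing weights for the poses placed around it. It is a stand-in, not Unity’s algorithm: Unity computes the weights of a 2D blend tree internally with its own interpolation scheme, and a script only writes the two parameters (for example with `Animator.SetFloat`). Inverse-distance weighting, the function name `blend_weights` and the `power` parameter are assumptions chosen for illustration.

```python
import math


def blend_weights(samples: dict[str, tuple[float, float]],
                  point: tuple[float, float],
                  power: float = 2.0) -> dict[str, float]:
    """Inverse-distance weights of each placed pose for one point.

    Illustrative stand-in only: Unity's 2D blend trees use their own
    interpolation, not inverse-distance weighting.
    """
    px, py = point
    raw = {}
    for name, (sx, sy) in samples.items():
        d = math.hypot(px - sx, py - sy)
        if d == 0.0:
            # The point sits exactly on a pose: show that pose alone.
            return {n: (1.0 if n == name else 0.0) for n in samples}
        raw[name] = 1.0 / d ** power
    total = sum(raw.values())
    # Normalise so that the weights of all poses sum to 1.
    return {name: w / total for name, w in raw.items()}
```

Closer poses receive larger weights, the weights always sum to 1, and a point placed exactly on a pose yields that pose unmixed, which is the behaviour one expects from a blend tree at a sample position.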
All expressions, each in the form of a pose (happy, excited, frustrated, sad, sleepy, relaxed), have to be placed at the same positions as the corresponding emotion words in the circumplex model. The use case also requires an attentive facial expression for when the robot is listening to voice commands. Since attentiveness is not an emotion specified in the circumplex model, a fitting position had to be found: attentive can be described as a state of high arousal and neutral pleasure, and a corresponding position was chosen. This results in the following reduced and adapted version of Russell’s circumplex model.
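A layout like this can be expressed as a small table of blend-tree coordinates. In the Python sketch below only two placements follow from the text: neutral at the centre and attentive at high arousal with neutral pleasure (straight up the arousal axis). The other angles are hypothetical approximations of Russell’s model, the radius of 100 matches the intensity limit used for the emotion space, and the helper names `expression_positions` and `nearest_expression` are illustrative.

```python
import math

# Hypothetical layout: degrees counter-clockwise from the positive
# pleasure axis (pleasure = x, arousal = y). Only "attentive" at 90
# degrees follows from the text; the other angles are illustrative.
EXPRESSION_ANGLES = {
    "happy": 20.0,
    "excited": 55.0,
    "attentive": 90.0,   # high arousal, neutral pleasure
    "frustrated": 135.0,
    "sad": 210.0,
    "sleepy": 270.0,
    "relaxed": 330.0,
}


def expression_positions(radius: float = 100.0) -> dict[str, tuple[float, float]]:
    """Blend-tree coordinates: neutral at the centre, the emotion
    poses on the rim of the circular emotion space."""
    positions = {"neutral": (0.0, 0.0)}
    for name, degrees in EXPRESSION_ANGLES.items():
        rad = math.radians(degrees)
        positions[name] = (radius * math.cos(rad), radius * math.sin(rad))
    return positions


def nearest_expression(pleasure: float, arousal: float) -> str:
    """Name of the pose closest to the given (pleasure, arousal) point."""
    positions = expression_positions()
    return min(positions,
               key=lambda n: math.hypot(pleasure - positions[n][0],
                                        arousal - positions[n][1]))
```

`nearest_expression(5, 95)`, for instance, picks the attentive pose, since that point lies almost on the arousal axis near the rim.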
Sending a value of 100 for both arousal and pleasure would place the current position in the blend tree outside the emotion space: because the emotion space is a circle of radius 100 around the neutral expression, the point (100, 100) lies about 141 units from its centre. At this position the user would expect the most pleasured and aroused expression. This could be achieved by transforming the circular emotion space into a square. However, the vector from neutral to, e.g., excited would then be distinctly longer than the vector from neutral to happy, even though both expressions lie near the edge of the emotion space. Since the vector length indicates the intensity of an expression, this is not an acceptable solution: the intensity would depend on the angle at which an expression is placed.
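The arithmetic behind both problems, assuming pleasure p and arousal a each range from -100 to 100 and the emotion space is a circle of radius 100 around the neutral pose:

```latex
\begin{aligned}
\text{circular emotion space:}\quad & \lVert (p, a) \rVert = \sqrt{p^2 + a^2} \le 100 \\
\text{requested point } (100, 100):\quad & \sqrt{100^2 + 100^2} = 100\sqrt{2} \approx 141.4 > 100 \\
\text{square, edge midpoint at } 0^\circ:\quad & \lVert (100, 0) \rVert = 100 \\
\text{square, corner at } 45^\circ:\quad & \lVert (100, 100) \rVert = 100\sqrt{2} \approx 141.4
\end{aligned}
```

With a square emotion space, a fully intense expression in a diagonal direction would therefore have a vector about 41% longer than a fully intense expression along one of the axes.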
Another solution would be to extrapolate beyond the borders of the emotion space by amplifying the closest expression within it. However, this would result in nonsensical expressions (what should 120% excited look like?). The solution deemed most reasonable is to show the closest expression that is still within the emotion space (or a mix of the two closest expressions) whenever the currently specified position of the blending sample lies outside the borders of the emotion space. This means that the vector from neutral to the blending sample cannot increase the intensity beyond a length of 100.
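One way to implement this rule, sketched in Python under the assumption that the clamping is applied before the values are written to the blend tree: for a circular emotion space, the closest point to any position outside it is found by pulling the position back along its direction onto the rim. The function name `clamp_to_emotion_space` and the constant `EMOTION_RADIUS` are hypothetical.

```python
import math

EMOTION_RADIUS = 100.0  # rim of the circular emotion space (hypothetical name)


def clamp_to_emotion_space(pleasure: float, arousal: float,
                           radius: float = EMOTION_RADIUS) -> tuple[float, float]:
    """Return the point of the emotion disc closest to (pleasure, arousal).

    Points inside the disc are returned unchanged. Points outside are
    scaled back towards neutral (the origin) onto the rim, which is the
    closest point of a disc to any outside point. The direction, and
    hence the mix of neighbouring expressions, is preserved; only the
    intensity is capped at the radius.
    """
    length = math.hypot(pleasure, arousal)
    if length <= radius:
        return pleasure, arousal
    scale = radius / length
    return pleasure * scale, arousal * scale
```

A request of (100, 100) thus becomes a point of length 100 at 45 degrees, showing the most pleasured and aroused expression the model allows instead of an extrapolated one.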