RingNet

(Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision)

Soubhik Sanyal, Timo Bolkart, Haiwen Feng and Michael J. Black
Computer Vision and Pattern Recognition (CVPR) 2019, Long Beach, CA

Abstract

The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which, by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual’s face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally, we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes.
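The loss described in the abstract (similar shape for the same identity, different shape for different people) can be illustrated with a triplet-style hinge on shape codes. The following is a minimal sketch under stated assumptions, not the released RingNet training code: the function names, the squared Euclidean distance, the margin value of 0.5, and the way pairs are enumerated are all illustrative choices.

```python
import numpy as np


def shape_consistency_loss(same_a, same_b, other, margin=0.5):
    """Hinge loss on shape codes (hypothetical helper, not the authors' code).

    same_a, same_b: shape vectors predicted from two images of one person.
    other: shape vector predicted from an image of a different person.
    margin: assumed value; the loss is zero once the different-identity
            distance exceeds the same-identity distance by this margin.
    """
    d_same = np.sum((same_a - same_b) ** 2)  # should be small
    d_diff = np.sum((same_a - other) ** 2)   # should be large
    return max(0.0, d_same - d_diff + margin)


def ring_loss(same_shapes, other_shape, margin=0.5):
    """Sum the hinge loss over all ordered pairs of same-identity shapes."""
    total = 0.0
    n = len(same_shapes)
    for i in range(n):
        for j in range(n):
            if i != j:
                total += shape_consistency_loss(
                    same_shapes[i], same_shapes[j], other_shape, margin
                )
    return total
```

With this formulation, the loss vanishes when all same-identity codes agree and the other identity is far away, and it grows when two images of the same person yield different shape codes.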

Video

NoW Challenge

The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions. For the current leaderboards, see the non-metrical evaluation (scale invariant) and the metrical evaluation pages. For information about the NoW dataset, see the data overview. If you want to participate in the NoW challenge, follow the instructions on the challenge front page and the download page (requires registration and agreement to the license), where you can download the data.

Update: The NoW dataset is divided into a validation set (20 subjects) and a test set (80 subjects). Reference scans are available for the validation set.

More Information

The PDF preprint can be downloaded from here.
The arXiv version is also available here.
The inference code for RingNet, the monocular 3D face reconstruction method, can be found on GitHub.
Pre-trained models are available on the downloads page.
The NoW challenge can be found here.
RingNet project page at MPI-IS
For questions, please contact ringnet@tue.mpg.de

News

04/22: The NoW website moved to https://now.is.tue.mpg.de.

Referencing RingNet and the NoW Dataset

@inproceedings{RingNet:CVPR:2019,
title = {Learning to Regress {3D} Face Shape and Expression from an Image without {3D} Supervision},
author = {Sanyal, Soubhik and Bolkart, Timo and Feng, Haiwen and Black, Michael},
booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
month = jun,
pages = {7763--7772},
year = {2019},
month_numeric = {6} 
}