Content Assessment: Captivatingly Terrifying? One-Shot Megapixel Neural Head Avatars
Information - 94%
Insight - 95%
Relevance - 88%
Objectivity - 89%
Authority - 90%
Overall - 91% (Excellent)
A short percentage-based assessment of the qualitative benefit of the report highlighting a new and unique approach to image rendering from one-shot megapixel portraits.
Editor’s Note: From time to time, ComplexDiscovery highlights publicly available or privately purchasable announcements, content updates, and research from cyber, data, and legal discovery providers, research organizations, and ComplexDiscovery community members. While ComplexDiscovery regularly highlights this information, it does not assume any responsibility for content assertions.
To submit recommendations for consideration and inclusion in ComplexDiscovery’s cyber, data, and legal discovery-centric service, product, or research announcements, contact us today.
Background Note: Shared for the non-commercial educational benefit of cybersecurity, information governance, and legal professionals, this recently published research report explains the first system for creating megapixel avatars from single portrait images. The report may be beneficial for investigators and litigators monitoring potential image-based rendering tools and technologies that may be used in deepfake creation.
Publication from arXiv*
MegaPortraits: One-shot Megapixel Neural Head Avatars
By Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov
Abstract
In this work, we advance neural head avatar technology to megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We demonstrate that the suggested architectures and methods produce convincing high-resolution neural avatars, outperforming the competitors in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model that runs in real time and locks the identities of neural avatars to several dozen pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.
Introduction
Neural head avatars offer a fascinating new way of creating virtual head models. They bypass the complexity of realistic physics-based modeling of human avatars by learning the shape and appearance directly from videos of talking people. Over the last several years, methods that can create realistic avatars from a single photograph (one-shot) have been developed. They leverage extensive pre-training on large datasets of videos of different people to create the avatars in the one-shot mode using generic knowledge about human appearance.
Despite the impressive results obtained by this class of methods, their quality is severely limited by the resolution of the training datasets. This limitation cannot be easily bypassed by collecting a higher-resolution dataset, since such a dataset needs to be simultaneously large-scale and diverse, i.e., include thousands of humans with multiple frames per person and diverse demographics, lighting, backgrounds, facial expressions, and head poses. To the best of our knowledge, all public datasets that meet these criteria are limited in resolution. As a result, even the most recent one-shot avatar systems learn the avatars at resolutions up to 512 × 512.
In our work, we make three main contributions. First, we propose a new model for one-shot neural avatars that achieves state-of-the-art cross-reenactment quality in up to 512 × 512 resolution. In our architecture, we utilize the idea of representing the appearance of the avatars as a latent 3D volume and propose a new way to combine it with the latent motion representations, which includes a novel contrastive loss that allows our system to achieve higher degrees of disentanglement between the latent motion and appearance representations. On top of that, we add a problem-specific gaze loss that increases the realism and accuracy of eye animation.
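As an illustration only: the excerpt does not spell out the form of the contrastive loss, so the following is a generic InfoNCE-style sketch and not the authors' formulation. The function names, the temperature value, and the toy two-dimensional vectors standing in for latent motion representations are all assumptions made for this example. The idea it shows is that the loss is small when an anchor vector is closer to its matching (positive) vector than to the non-matching (negative) ones.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not from the paper).

    Low when the anchor is more similar to the positive than to any negative.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy stand-ins for latent motion representations (assumed values).
anchor = [1.0, 0.0]
matching = [0.9, 0.1]            # nearly the same direction as the anchor
others = [[0.0, 1.0], [-1.0, 0.0]]

# Correct pairing: the matching vector is treated as the positive.
good = contrastive_loss(anchor, matching, others)
# Wrong pairing: an unrelated vector is treated as the positive.
bad = contrastive_loss(anchor, others[0], [matching, others[1]])

print(good < bad)  # the correct pairing gives the smaller loss
```

Minimizing a loss of this kind pushes matching representations together and non-matching ones apart, which is one common way to encourage the sort of disentanglement the authors describe.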
Our second and crucial contribution is showing how a model trained on medium-resolution videos can be “upgraded” to the megapixel (1024 × 1024) resolution using an additional dataset of high-resolution still images. As a result, our proposed method, while using the same training dataset, outperforms the baseline super-resolution approach for the task of cross-reenactment. We are thus the first to demonstrate neural head avatars in proper megapixel resolution.
Lastly, since many practical applications for human avatar creation require real-time or faster than real-time rendering, we distill our megapixel model into a ten times faster student model that runs at 130 FPS on a modern GPU. This significant speedup is possible since the student is trained for specific appearances (unlike the main model that can create new avatars for previously unseen people). Furthermore, the applications based on such a student model “locked” to predefined identities can prevent its misuse for creating “deep fakes” while at the same time achieving low rendering latency.
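To put the quoted figures in perspective, the arithmetic below converts frame rates into per-frame rendering time. It is a rough sketch under stated assumptions: "ten times faster" is read as a tenfold difference in frames per second, which implies about 13 FPS for the main model, and the 30 FPS real-time target is an assumed reference point, not a number from the report.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


student_fps = 130.0                   # stated in the report
speedup = 10.0                        # "ten times faster", stated in the report
teacher_fps = student_fps / speedup   # implied rate, assuming speedup is in FPS
realtime_fps = 30.0                   # assumed real-time target, not from the report

student_ms = frame_time_ms(student_fps)    # about 7.7 ms per frame
teacher_ms = frame_time_ms(teacher_fps)    # about 76.9 ms per frame
budget_ms = frame_time_ms(realtime_fps)    # about 33.3 ms per frame

print(f"student: {student_ms:.1f} ms, teacher: {teacher_ms:.1f} ms, budget: {budget_ms:.1f} ms")
print("student fits budget:", student_ms <= budget_ms)
print("teacher fits budget:", teacher_ms <= budget_ms)
```

Under these assumptions the student model renders a frame well within a 30 FPS budget, while the implied rate of the main model does not.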
Complete Report: MegaPortraits: One-shot Megapixel Neural Head Avatars (PDF)
Read the original publication.
*Shared with permission based on educational and non-commercial distribution under Creative Commons 4.0 International license.
Publication Source:
Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov. 2022. MegaPortraits: One-shot Megapixel Neural Head Avatars. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3503161.3547838
Additional Reading
- [Samsung Labs] MegaPortraits: One-shot Megapixel Neural Head Avatars
- Defining Cyber Discovery? A Definition and Framework
Source: ComplexDiscovery