
Content Assessment: Captivatingly Terrifying? One-Shot Megapixel Neural Head Avatars

Information - 94%
Insight - 95%
Relevance - 88%
Objectivity - 89%
Authority - 90%

Overall Rating: 91% (Excellent)

A short percentage-based assessment of the qualitative benefit of this report, which highlights a new approach to rendering megapixel neural head avatars from single portrait images.

Editor’s Note: From time to time, ComplexDiscovery highlights publicly available or privately purchasable announcements, content updates, and research from cyber, data, and legal discovery providers, research organizations, and ComplexDiscovery community members. While ComplexDiscovery regularly highlights this information, it does not assume any responsibility for content assertions.

To submit recommendations for consideration and inclusion in ComplexDiscovery’s cyber, data, and legal discovery-centric service, product, or research announcements, contact us today.


Background Note: Shared for the non-commercial educational benefit of cybersecurity, information governance, and legal professionals, this recently published research report describes the first system for creating megapixel avatars from single portrait images. The report may be beneficial for investigators and litigators monitoring image-based rendering tools and technologies that can be used in deepfake creation.

Publication from arXiv*

MegaPortraits: One-shot Megapixel Neural Head Avatars

By Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov

Abstract

In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We demonstrate that suggested architectures and methods produce convincing high-resolution neural avatars, outperforming the competitors in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model which runs in real-time and locks the identities of neural avatars to several dozens of pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.



Introduction

Neural head avatars offer a fascinating new way of creating virtual head models. They bypass the complexity of realistic physics-based modeling of human avatars by learning the shape and appearance directly from videos of talking people. Over the last several years, methods that can create realistic avatars from a single photograph (one-shot) have been developed. They leverage extensive pre-training on large datasets of videos of different people to create avatars in the one-shot mode using generic knowledge about human appearance.

Despite the impressive results obtained by this class of methods, their quality is severely limited by the resolution of the training datasets. This limitation cannot be easily bypassed by collecting a higher-resolution dataset, since such a dataset needs to be simultaneously large-scale and diverse, i.e., include thousands of humans with multiple frames per person and diverse demographics, lighting, backgrounds, facial expressions, and head poses. To the best of our knowledge, all public datasets that meet these criteria are limited in resolution. As a result, even the most recent one-shot avatar systems learn avatars at resolutions up to 512 × 512.

In our work, we make three main contributions. First, we propose a new model for one-shot neural avatars that achieves state-of-the-art cross-reenactment quality at resolutions up to 512 × 512. In our architecture, we utilize the idea of representing the appearance of the avatars as a latent 3D volume and propose a new way to combine it with the latent motion representations, including a novel contrastive loss that allows our system to achieve a higher degree of disentanglement between the latent motion and appearance representations. On top of that, we add a problem-specific gaze loss that increases the realism and accuracy of eye animation.
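
For readers who want a concrete picture of the disentanglement idea, the sketch below shows, in PyTorch, how an appearance encoder might produce a latent 3D volume, how a motion encoder might produce a compact motion descriptor, and how an InfoNCE-style contrastive loss can push motion descriptors of the same pose/expression together across different identities. All module names, shapes, and hyperparameters here are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical minimal sketch of the latent 3D appearance volume plus
# contrastive motion disentanglement described above. Module names and
# sizes are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceEncoder(nn.Module):
    """Maps a source image to a latent 3D appearance volume (C, D, h, w)."""
    def __init__(self, volume_channels=64, depth=16):
        super().__init__()
        self.conv = nn.Conv2d(3, volume_channels * depth,
                              kernel_size=7, stride=4, padding=3)
        self.depth = depth

    def forward(self, img):                  # img: (B, 3, H, W)
        feat = self.conv(img)                # (B, C*D, H/4, W/4)
        b, cd, h, w = feat.shape
        # Reshape 2D features into a latent 3D volume.
        return feat.view(b, cd // self.depth, self.depth, h, w)

class MotionEncoder(nn.Module):
    """Maps a driving frame to a compact, unit-norm motion descriptor."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, img):
        return F.normalize(self.net(img), dim=-1)

def motion_contrastive_loss(z_anchor, z_pos, z_negs, tau=0.1):
    """InfoNCE-style loss: descriptors of the same pose/expression
    (rendered on different identities) should match; others should not."""
    pos = (z_anchor * z_pos).sum(-1, keepdim=True) / tau   # (B, 1)
    neg = z_anchor @ z_negs.T / tau                        # (B, N)
    logits = torch.cat([pos, neg], dim=1)
    target = torch.zeros(len(z_anchor), dtype=torch.long,
                         device=z_anchor.device)           # positives at index 0
    return F.cross_entropy(logits, target)
```

Intuitively, because the positive pair shares motion but not identity, the motion descriptor is rewarded for discarding appearance information, which is the disentanglement the contrastive loss is meant to encourage.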

Our second and crucial contribution is showing how a model trained on medium-resolution videos can be “upgraded” to the megapixel (1024 × 1024) resolution using an additional dataset of high-resolution still images. As a result, our proposed method, while using the same training dataset, outperforms the baseline super-resolution approach for the task of cross-reenactment. We are thus the first to demonstrate neural head avatars at proper megapixel resolution.
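
The “upgrade” can be pictured as a two-stage pipeline: a base model trained on medium-resolution video produces a 512 × 512 render, and a separate enhancement module, trained only on high-resolution still images, lifts it to 1024 × 1024. The PyTorch sketch below illustrates that staging; the module, losses, and training loop are assumptions for illustration, and the paper's actual architecture and objectives differ in detail.

```python
# Hypothetical minimal sketch of stage-two training on high-res stills,
# with the stage-one (medium-resolution) base model kept frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighResEnhancer(nn.Module):
    """Upsamples a 512x512 render to 1024x1024 with learned residual detail."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, render_512):
        up = F.interpolate(render_512, scale_factor=2, mode='bilinear',
                           align_corners=False)
        return up + self.net(up)             # predict only residual detail

def highres_stage_step(enhancer, base_model, hi_res_img, optimizer):
    """One training step on a high-res still: reconstruct the photo from
    the frozen base model's 512x512 self-reenactment render of it."""
    with torch.no_grad():                    # stage-one weights stay fixed
        low = F.interpolate(hi_res_img, size=(512, 512), mode='bilinear',
                            align_corners=False)
        render = base_model(low)             # assumed (B, 3, 512, 512) output
    pred = enhancer(render)
    loss = F.l1_loss(pred, hi_res_img)       # real systems add perceptual/GAN terms
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point this illustrates is that high-resolution stills, which lack motion, can still supervise the final rendering quality because the motion-dependent part of the model was already learned from medium-resolution video.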

Lastly, since many practical applications for human avatar creation require real-time or faster-than-real-time rendering, we distill our megapixel model into a ten-times-faster student model that runs at 130 FPS on a modern GPU. This significant speedup is possible since the student is trained for specific appearances (unlike the main model, which can create new avatars for previously unseen people). Furthermore, applications based on such a student model, “locked” to predefined identities, can prevent misuse for creating “deep fakes” while at the same time achieving low rendering latency.
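
The identity lock can be illustrated as a student network that receives only a driving frame plus an integer index into a fixed embedding table of approved avatars, so arbitrary new faces cannot be requested at inference time. The following PyTorch sketch is a hypothetical minimal version of that idea, not the authors' distillation recipe; the teacher call signature and all names are assumptions.

```python
# Hypothetical sketch of distillation with an identity lock: identities
# exist only as rows in a fixed embedding table, not as input images.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LockedStudent(nn.Module):
    def __init__(self, num_identities=50, embed_dim=256):
        super().__init__()
        # The "lock": only these pre-defined identities can be rendered.
        self.identities = nn.Embedding(num_identities, embed_dim)
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + embed_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, driver_img, identity_idx):
        b, _, h, w = driver_img.shape
        emb = self.identities(identity_idx)              # (B, embed_dim)
        emb = emb[:, :, None, None].expand(-1, -1, h, w)
        return self.backbone(torch.cat([driver_img, emb], dim=1))

def distill_step(student, teacher, driver_img, source_imgs, identity_idx, opt):
    """Train the student to match the frozen teacher's render for one of
    the pre-defined identities (teacher signature assumed for illustration)."""
    with torch.no_grad():
        target = teacher(source_imgs[identity_idx], driver_img)
    pred = student(driver_img, identity_idx)
    loss = F.l1_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Because the student never consumes a source photograph, it cannot be pointed at a new person's image; the speedup and the misuse protection come from the same design decision.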

Read the original post.


Complete Report: MegaPortraits: One-shot Megapixel Neural Head Avatars (PDF)

Read the original publication.

*Shared with permission based on educational and non-commercial distribution under Creative Commons 4.0 International license.

Publication Source:

Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov. 2022. MegaPortraits: One-shot Megapixel Neural Head Avatars. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3503161.3547838



Source: ComplexDiscovery

 

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Midjourney, and DALL-E, to assist, augment, and accelerate the development and publication of both new and revised content in its posts and pages, a practice initiated in late 2022.

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.

 

Have a Request?

If you have a request regarding our information or offerings, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.