We present LiveFace, a modular neural rendering system that achieves photorealistic talking-head animation at 30 fps on low-end mobile devices with as little as ~10 GFLOPS of compute (e.g., Qualcomm Snapdragon 439). Prior photorealistic facial animation systems either require cloud infrastructure with 100M+ parameter models (HeyGen, D-ID, Synthesia) or demand desktop-class GPUs (MetaHuman, Audio2Face), while on-device alternatives sacrifice realism for stylized cartoon aesthetics (Apple Memoji, Samsung AR Emoji). LiveFace bridges this gap through three key contributions: (1) a decomposed per-avatar decoder architecture that factorizes the face into four independently rendered regions (mouth, eyes, hair, and body), each handled by a compact neural decoder (1.3-5.7M parameters) augmented with a 128-dimensional learnable identity embedding; (2) a universal compositor-upscaler (~7M parameters) shared across all avatars that composites the decoded patches onto a 9:16 portrait canvas and upscales to 360×640 (or 384×672) in a single forward pass; and (3) a video-driven knowledge distillation pipeline that uses RAVDESS emotional speech videos as driving sources for LivePortrait (~300M parameters) to generate diverse, naturalistic training data for the student decoders. The MouthDecoder supports dual-input conditioning, both viseme-based (audio-driven) and landmark-based (MediaPipe Face Mesh), enabling flexible integration with different upstream pipelines. A per-frame quality filter employing Haar cascade face detection, Laplacian blur scoring, and SSIM comparison ensures training data integrity by rejecting approximately 0.6% of generated frames. A working V3 prototype has been trained and validated, demonstrating that the architecture successfully produces photorealistic output from compact per-avatar models.
The full system comprises ~20M INT8 parameters with a total inference latency of ~19 ms per frame, enabling real-time, fully offline operation on commodity mobile hardware without any cloud dependency.
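The per-frame quality filter can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: face detection (a Haar cascade, typically via OpenCV's `CascadeClassifier`) is stubbed as a boolean input, windowed SSIM is simplified to a single global window, and all function names and thresholds are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of the per-frame quality filter. Face detection is
# stubbed (the paper uses a Haar cascade); blur scoring and a simplified
# global SSIM are implemented directly in NumPy. Thresholds are assumed,
# not taken from the paper.

def laplacian_blur_score(gray: np.ndarray) -> float:
    """Variance of the 3x3 Laplacian response; low values indicate blur."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = 255.0) -> float:
    """SSIM computed over the whole image as one window (a simplification
    of the usual sliding-window SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)))

def keep_frame(gray: np.ndarray, reference: np.ndarray, face_found: bool,
               blur_thresh: float = 100.0, ssim_thresh: float = 0.35) -> bool:
    """Reject a generated frame if no face was detected, the frame is too
    blurry, or it drifted too far from the reference identity frame."""
    if not face_found:
        return False
    if laplacian_blur_score(gray) < blur_thresh:
        return False
    if global_ssim(gray, reference) < ssim_thresh:
        return False
    return True
```

Chaining the three cheap checks in this order (detection, blur, similarity) lets most rejections short-circuit early, which matters when filtering every frame of a large distillation set.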