Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation

ECCV 2026

ETH Zürich

Multi4D teaser: dynamic reconstruction and 4D segmentation results
Multi4D enables (1) high-quality, efficient dynamic scene reconstruction via competitive multi-level specialization, and (2) compact, high-accuracy 4D segmentation with fast inference.

Overview

Abstract

Dynamic 3D Gaussian splatting faces a fundamental tension between motion consistency and visual fidelity. Deformation-based approaches preserve temporal correspondence but suffer from motion over-factorization, oversmoothing high-frequency dynamics. In contrast, 4D-primitive methods capture fine visual details yet incur temporal over-parameterization, breaking object identity and leading to severe storage overhead. To resolve this, we introduce Multi4D, a framework for high-fidelity dynamic Gaussian splatting based on multi-level competitive allocation. Instead of a monolithic representation, we distribute modeling capacity across three structured levels: static structure, persistent dynamic geometry, and transient appearance primitives. Through shared rasterization and residual-driven optimization, these levels dynamically compete to explain photometric error, enabling adaptive specialization without pre-assigned decomposition. This allocation preserves long-term motion consistency while capturing fine dynamic detail, achieving state-of-the-art rendering quality and real-time performance with significantly fewer dynamic primitives. Furthermore, because our representation explicitly tracks compact persistent Gaussians over time, semantic features can be embedded afterward, enabling Multi4D to achieve state-of-the-art 4D segmentation accuracy with an order-of-magnitude speedup.

Method

Overview of the Multi4D pipeline. A bottom-up, self-regularized training scheme enables competitive allocation across multi-level Gaussian subsets via cross-set self-supervision. After optimization, the persistent subset is frozen and reused for efficient downstream 4D segmentation.

Multi4D decomposes a dynamic scene into three functionally specialized Gaussian subsets that compete under a shared photometric objective: Static Gaussians anchor the time-invariant structure; Persistent Dynamic Gaussians model long-term, trackable motion through a geometry-only deformation field; and Transient Gaussians (4D primitives) absorb high-frequency appearance residuals. All subsets are rendered in a single differentiable pass. Shared transmittance couples their gradients and induces competition: once one subset explains a region, residual-driven densification in the others is suppressed. A bottom-up training strategy with velocity-aware periodical lifting and mask-aware, utility-based pruning yields compact, specialized representations, and the persistent subset can then be frozen for fast, accurate 4D semantic embedding.
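The coupling through shared transmittance can be illustrated with a toy example. The following is a minimal sketch of front-to-back alpha compositing along a single ray, not the Multi4D rasterizer: the function name `composite`, the scalar colors, and all numeric values are assumptions made for illustration. Under a photometric loss, the gradient reaching a primitive's color scales with its blending weight, so a primitive hidden behind an already-opaque one receives little signal.

```python
# Toy illustration (hypothetical, not the authors' code): primitives from
# different subsets are depth-sorted and blended with ONE shared transmittance.

def composite(primitives):
    """Blend depth-sorted (subset, alpha, color) primitives front to back.

    Returns the blended scalar color and the total blending weight
    accumulated by each subset.
    """
    transmittance = 1.0
    color = 0.0
    weight_per_subset = {}
    for subset, alpha, c in primitives:
        w = transmittance * alpha        # blending weight of this primitive
        color += w * c
        weight_per_subset[subset] = weight_per_subset.get(subset, 0.0) + w
        transmittance *= 1.0 - alpha     # shared by every primitive behind it
    return color, weight_per_subset


# Made-up ray: a nearly opaque persistent Gaussian sits in front.
ray = [
    ("persistent", 0.9, 0.8),
    ("transient", 0.5, 0.2),
    ("static", 0.5, 0.1),
]
color, weights = composite(ray)
print(color, weights)
```

In this made-up ray the persistent primitive takes a weight of 0.9, leaving 0.05 and 0.025 for the transient and static primitives behind it. That is the sense in which a region already explained by one subset leaves little residual for the others to absorb.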

Novel View Synthesis

Comparisons against state-of-the-art deformation-based and 4D-primitive baselines across three datasets. Each clip shows Ground Truth, a baseline, and Ours (left → right) with per-frame metrics rendered in-video; some clips play at reduced speed for clarity.

Theater — GT · E-D3DGS · Ours
Theater — GT · STG · Ours
Train — GT · E-D3DGS · Ours
Train — GT · STG · Ours
Painter — GT · E-D3DGS · Ours
Painter — GT · STG · Ours
Quantitative comparison on Technicolor (mean over scenes).

| Method | PSNR ↑ | DSSIM ↓ | LPIPS ↓ | FPS ↑ |
|---|---|---|---|---|
| DyNeRF | 31.80 | - | 0.1400 | 0.02 |
| HyperReel | 32.70 | 0.047 | 0.1090 | 4.0 |
| 4DGaussians | 30.86 | 0.071 | 0.1647 | 35 |
| Def-3DGS | 30.95 | 0.070 | 0.1553 | 76 |
| E-D3DGS | 32.89 | 0.049 | 0.1114 | 79 |
| 4DGS | 32.07 | 0.054 | 0.1189 | 55 |
| STG | 33.35 | 0.040 | 0.0846 | 86 |
| Multi4D (Ours) | 34.30 | 0.037 | 0.0704 | 161 |
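For readers unfamiliar with the metric columns, the sketch below shows how PSNR and DSSIM are commonly derived. The PSNR formula is standard. The DSSIM definition, (1 - SSIM) / 2, is an assumption: the page does not state which DSSIM variant or SSIM data range is used, and the helper names `psnr` and `dssim` are illustrative.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf                  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dssim(ssim):
    """Structural dissimilarity, ASSUMING DSSIM = (1 - SSIM) / 2."""
    return (1.0 - ssim) / 2.0


print(psnr(1e-3))    # MSE of 1e-3 on images scaled to [0, 1]
print(dssim(0.926))
```

An MSE of 1e-3 on images in [0, 1] corresponds to 30 dB. Under the assumed definition, a DSSIM of 0.037 would correspond to an SSIM of 0.926.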
Cook Spinach — GT · 4DGS · Ours
Cook Spinach — GT · 4DGaussian · Ours
Flame Steak — GT · 4DGS · Ours
Flame Steak — GT · 4DGaussian · Ours
Sear Steak — GT · 4DGS · Ours
Sear Steak — GT · 4DGaussian · Ours
Quantitative comparison on Neu3D (mean over scenes).

| Method | PSNR ↑ | DSSIM ↓ | LPIPS ↓ | FPS ↑ |
|---|---|---|---|---|
| NeRFPlayer | 30.69 | 0.034 | 0.1110 | 0.05 |
| HyperReel | 31.10 | 0.036 | 0.0985 | 2.0 |
| HexPlane | 31.71 | - | 0.0750 | 0.56 |
| Def-3DGS | 30.98 | 0.033 | 0.0594 | 29 |
| 4DGaussian | 31.12 | 0.032 | 0.0588 | 53 |
| DeGauss | 31.52 | 0.029 | 0.0475 | 157 |
| E-D3DGS | 31.20 | 0.026 | 0.0369 | 70 |
| 4DGS | 31.57 | 0.029 | 0.0573 | 114 |
| STG | 32.04 | 0.026 | 0.0441 | 140 |
| Multi4D (Ours) | 32.30 | 0.026 | 0.0440 | 217 |

Monocular setting. Clips show four panels: GT · 4DGaussian · 4DGS · Ours.

Bell
Sieve
Cup
Quantitative comparison on monocular NeRF-DS (mean over scenes).

| Method | PSNR ↑ | DSSIM ↓ | LPIPS ↓ |
|---|---|---|---|
| NeRF-DS | 23.24 | 0.081 | 0.2402 |
| HyperNeRF | 19.01 | 0.092 | 0.2615 |
| Def-3DGS | 23.43 | 0.086 | 0.2201 |
| 4DGaussian | 22.79 | 0.088 | 0.2115 |
| 4DGS | 21.51 | 0.108 | 0.3390 |
| STG | 22.54 | 0.089 | 0.3145 |
| Multi4D (Ours) | 23.69 | 0.077 | 0.1903 |

4D Segmentation & Tracking

Because Multi4D explicitly tracks a compact set of persistent Gaussians, semantic features can be embedded afterward, yielding state-of-the-art 4D segmentation with an order-of-magnitude speedup. Segmentation results are compared against TRASE; for tracking, 2D Co-Tracker serves as a point-based reference.
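Why persistent identity helps can be sketched in a few lines. The example below is hypothetical and does not reproduce the Multi4D segmentation pipeline: the page does not specify how features are learned or queried, so the cosine-similarity query, the threshold, and the names `cosine`, `select_ids`, `features`, and `positions` are all assumptions. It only illustrates that once Gaussians keep their identity over time, a selection made once by index carries over to every timestamp without re-matching.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def select_ids(features, query, threshold=0.8):
    """Indices of Gaussians whose feature is similar enough to the query."""
    return [i for i, f in enumerate(features) if cosine(f, query) >= threshold]


# Made-up per-Gaussian features (one vector per persistent Gaussian).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

# positions[t][i]: made-up center of Gaussian i at time step t.
positions = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.5, 0.0), (0.1, 0.5, 0.0), (2.0, 0.0, 0.0)],
]

ids = select_ids(features, query=[1.0, 0.0])   # selection happens once
segment_over_time = [[frame[i] for i in ids] for frame in positions]
print(ids, segment_over_time)
```

Here the first two Gaussians are selected, and the same index list recovers the segment at both time steps.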

Semantic Feature Rendering

Cook Spinach — feature PCA: TRASE · Ours
Sear Steak — feature PCA: TRASE · Ours

Persistent All Points Tracking

Cook Spinach — Co-Tracker ref · TRASE · Ours
Sear Steak — Co-Tracker ref · TRASE · Ours
Semantic segmentation on the Neu3D-Mask benchmark (average over scenes).

| Method | mIoU ↑ | mAcc ↑ |
|---|---|---|
| OpenGaussian | 0.8178 | 0.9899 |
| SA4D | 0.8832 | 0.9931 |
| TRASE | 0.8932 | 0.9938 |
| Multi4D (Ours) | 0.9142 | 0.9952 |
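As a reference for the two columns, the sketch below computes IoU and pixel accuracy for binary masks and averages them over mask pairs. The benchmark's exact protocol (how masks are grouped and averaged, and the precise mAcc definition) is not stated on this page, so this is an assumed, common formulation with illustrative helper names.

```python
def iou_and_acc(pred, gt):
    """IoU and pixel accuracy for two equal-length binary masks (0/1 lists)."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    correct = sum(1 for p, g in zip(pred, gt) if p == g)
    iou = inter / union if union else 1.0   # two empty masks agree fully
    return iou, correct / len(gt)


def mean_scores(pairs):
    """Average IoU and accuracy over (prediction, ground truth) pairs."""
    scores = [iou_and_acc(p, g) for p, g in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),   # IoU 0.5, accuracy 0.75
    ([0, 1, 1, 0], [0, 1, 1, 0]),   # perfect match
]
miou, macc = mean_scores(pairs)
print(miou, macc)
```

Accuracy counts the correctly labeled background as well, which is one reason mAcc values tend to sit much closer to 1 than mIoU values.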

BibTeX

The BibTeX entry will be added once the paper is published.