Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation

Teaser figure: dynamic reconstruction and 4D segmentation results. **Multi4D** enables (1) high-quality, efficient dynamic scene reconstruction via competitive multi-level specialization, and (2) compact, high-accuracy 4D segmentation with fast inference.

Overview

Abstract

Dynamic 3D Gaussian splatting faces a fundamental tension between motion consistency and visual fidelity. Deformation-based approaches preserve temporal correspondence but suffer from motion over-factorization, oversmoothing high-frequency dynamics. In contrast, 4D-primitive methods capture fine visual details yet incur temporal over-parameterization, breaking object identity and leading to severe storage overhead. To resolve this, we introduce Multi4D, a framework for high-fidelity dynamic Gaussian Splatting based on multi-level competitive allocation. Instead of a monolithic representation, we distribute modeling capacity across three structured levels: static structure, persistent dynamic geometry, and transient appearance primitives. Through shared rasterization and residual-driven optimization, these levels dynamically compete to explain photometric error, enabling adaptive specialization without pre-assigned decomposition. This allocation preserves long-term motion consistency while capturing fine dynamic detail, achieving state-of-the-art rendering quality and real-time performance with significantly fewer dynamic primitives. Furthermore, because our representation explicitly tracks compact persistent Gaussians over time, semantic features can be embedded afterward, enabling Multi4D to achieve state-of-the-art 4D segmentation accuracy with an order-of-magnitude speedup.

Dynamic Gaussian Splatting
4D Scene Reconstruction
Novel View Synthesis
4D Segmentation

Method

**Overview of the Multi4D pipeline.** A bottom-up, self-regularized training scheme enables competitive allocation across multi-level Gaussian subsets via cross-set self-supervision. After optimization, the persistent subset is frozen and reused for efficient downstream 4D segmentation.

Multi4D decomposes a dynamic scene into three functionally specialized Gaussian subsets that compete under a shared photometric objective. Static Gaussians anchor the time-invariant structure; Persistent Dynamic Gaussians model long-term, trackable motion through a geometry-only deformation field; and Transient Gaussians (4D primitives) absorb high-frequency appearance residuals. All subsets are rendered in a single differentiable pass, so shared transmittance couples their gradients and induces competition: once one subset explains a region, residual-driven densification in the others is suppressed. A bottom-up training strategy with velocity-aware periodic lifting and mask-aware utility-based pruning yields compact, specialized representations, and the persistent subset can then be frozen for fast, accurate 4D semantic embedding.
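The coupling through shared transmittance can be illustrated with standard front-to-back alpha compositing along a single pixel ray. This is a minimal sketch, not the Multi4D rasterizer: the `Sample` record, the subset labels, and the numeric values are hypothetical, and it assumes the usual Gaussian splatting blending rule in which each depth-sorted sample receives a weight equal to its opacity times the transmittance left by the samples in front of it.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    subset: str   # "static", "persistent", or "transient" (illustrative labels)
    depth: float  # distance along the ray, used only for sorting
    alpha: float  # opacity of the projected Gaussian at this pixel
    color: float  # grayscale value, for brevity


def composite(samples):
    """Blend samples from all subsets in one depth-sorted pass.

    Returns the pixel color and the total blending weight per subset.
    """
    transmittance = 1.0
    color = 0.0
    weight_per_subset = {}
    for s in sorted(samples, key=lambda s: s.depth):
        w = s.alpha * transmittance          # contribution of this sample
        color += w * s.color
        weight_per_subset[s.subset] = weight_per_subset.get(s.subset, 0.0) + w
        transmittance *= 1.0 - s.alpha       # light remaining for samples behind
    return color, weight_per_subset


# Hypothetical ray: a nearly opaque static Gaussian in front of a transient one.
ray = [
    Sample("static", depth=1.0, alpha=0.9, color=0.5),
    Sample("transient", depth=2.0, alpha=0.5, color=1.0),
]
pixel, weights = composite(ray)
```

Because the derivative of the pixel color with respect to a sample's color equals that sample's blending weight, the transient sample here receives only about 5% of the photometric gradient, while the static sample in front of it receives 90%. Under these assumptions, a subset that already explains a region leaves little residual signal to drive densification in the others.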

Novel View Synthesis

Comparisons against state-of-the-art deformation-based and 4D-primitive baselines across three datasets. Each clip shows Ground Truth, a baseline, and Ours (left → right) with per-frame metrics rendered in-video; some clips play at reduced speed for clarity.

Theater — GT · E-D3DGS · Ours

Theater — GT · STG · Ours

Train — GT · E-D3DGS · Ours

Train — GT · STG · Ours

Painter — GT · E-D3DGS · Ours

Painter — GT · STG · Ours

Quantitative comparison on Technicolor (mean over scenes). **Best** in bold.
Method	PSNR ↑	DSSIM ↓	LPIPS ↓	FPS ↑
DyNeRF	31.80	—	0.1400	0.02
HyperReel	32.70	0.047	0.1090	4.0
4DGaussian	30.86	0.071	0.1647	35
Def-3DGS	30.95	0.070	0.1553	76
E-D3DGS	32.89	0.049	0.1114	79
4DGS	32.07	0.054	0.1189	55
STG	33.35	0.040	0.0846	86
Multi4D (Ours)	**34.30**	**0.037**	**0.0704**	**161**
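For readers unfamiliar with the metric columns, the sketch below shows how PSNR is computed from mean squared error, together with one common DSSIM convention. It is illustrative only: the page does not state which DSSIM definition or data range was used for these tables, so `dssim_from_ssim` assumes DSSIM = (1 - SSIM) / 2, and both function names are hypothetical.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    if mse <= 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dssim_from_ssim(ssim):
    """Structural dissimilarity under the assumed (1 - SSIM) / 2 convention."""
    return (1.0 - ssim) / 2.0
```

Because PSNR is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.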

Cook Spinach — GT · 4DGS · Ours

Cook Spinach — GT · 4DGaussian · Ours

Flame Steak — GT · 4DGS · Ours

Flame Steak — GT · 4DGaussian · Ours

Sear Steak — GT · 4DGS · Ours

Sear Steak — GT · 4DGaussian · Ours

Quantitative comparison on Neu3D (mean over scenes). **Best** in bold.
Method	PSNR ↑	DSSIM ↓	LPIPS ↓	FPS ↑
NeRFPlayer	30.69	0.034	0.1110	0.05
HyperReel	31.10	0.036	0.0985	2.0
HexPlane	31.71	—	0.0750	0.56
Def-3DGS	30.98	0.033	0.0594	29
4DGaussian	31.12	0.032	0.0588	53
DeGauss	31.52	0.029	0.0475	157
E-D3DGS	31.20	**0.026**	**0.0369**	70
4DGS	31.57	0.029	0.0573	114
STG	32.04	**0.026**	0.0441	140
Multi4D (Ours)	**32.30**	**0.026**	0.0440	**217**

Monocular setting. Clips show four panels: GT · 4DGaussian · 4DGS · Ours.

Bell

Sieve

Cup

Quantitative comparison on monocular NeRF-DS (mean over scenes). **Best** in bold.
Method	PSNR ↑	DSSIM ↓	LPIPS ↓
NeRF-DS	23.24	0.081	0.2402
HyperNeRF	19.01	0.092	0.2615
Def-3DGS	23.43	0.086	0.2201
4DGaussian	22.79	0.088	0.2115
4DGS	21.51	0.108	0.3390
STG	22.54	0.089	0.3145
Multi4D (Ours)	**23.69**	**0.077**	**0.1903**

4D Segmentation & Tracking

Because Multi4D explicitly tracks a compact set of persistent Gaussians, semantic features can be embedded after reconstruction, yielding state-of-the-art 4D segmentation with an order-of-magnitude speedup. Comparisons are against TRASE; tracking results use 2D Co-Tracker as a point-based reference.
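One way to see why a tracked, persistent subset makes segmentation cheap: if every persistent Gaussian keeps a fixed identity and carries a single semantic feature vector, a query needs to be matched against the Gaussians only once, and the resulting selection holds at every timestep. The following is a hypothetical illustration, not the Multi4D segmentation code; the feature values, the cosine-similarity rule, and the threshold are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_gaussians(features, query, threshold=0.8):
    """Return the IDs of Gaussians whose feature matches the query."""
    return {gid for gid, f in features.items() if cosine(f, query) >= threshold}


# Hypothetical per-Gaussian features, keyed by a persistent Gaussian ID.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
selected = select_gaussians(features, query=[1.0, 0.0])
```

In this toy setup the selection is computed once over Gaussian IDs; a per-frame mask would then come from rendering only the selected Gaussians at their deformed positions for that frame, with no per-frame feature matching.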