Over++: Generative Video Compositing for Layer Interaction Effects

Luchao Qi¹,³,* Jiaye Wu² Jun Myeong Choi¹ Cary Phillips³ Roni Sengupta¹ Dan B Goldman³

¹University of North Carolina at Chapel Hill ²University of Maryland ³Industrial Light & Magic

* Work done during an internship at Industrial Light & Magic

I. Effect Generation

II. Effect Editing

III. Keyframe Masking

IV. Background Swapping

We introduce Over++, a framework for generating environmental effects and enabling effect editing through mask- or prompt-guided control. Explore the sections below for more details:

Baseline Comparisons

Our Framework

Naively compositing the foreground over the background layer (copy-paste: $\mathcal{I}_{\text{over}} = \mathcal{I}_{\text{fg}} \oplus \mathcal{I}_{\text{bg}}$) produces a video that lacks environmental effects such as shadows or wakes. Given such an input composite and an optional binary mask ($\mathcal{M}_{\text{effect}}$) indicating the target effect regions, our model generates the desired effects within those regions.
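The naive composite can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that $\oplus$ denotes the standard alpha "over" operator, that the foreground layer comes with a per-pixel alpha matte, and that colors are non-premultiplied RGB values in [0, 1]. The names `over_pixel`, `over_frame`, and `over_video` are hypothetical, and plain lists stand in for image tensors.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]   # RGB values in [0, 1]
Frame = List[List[Pixel]]   # frame[y][x]
Matte = List[List[float]]   # alpha[y][x] in [0, 1]


def over_pixel(fg: Pixel, alpha: float, bg: Pixel) -> Pixel:
    """Blend one foreground pixel over one background pixel.

    Assumes non-premultiplied color: out = alpha * fg + (1 - alpha) * bg.
    """
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def over_frame(fg: Frame, alpha: Matte, bg: Frame) -> Frame:
    """Apply the over operator to every pixel of a single frame."""
    return [
        [over_pixel(f, a, b) for f, a, b in zip(fg_row, a_row, bg_row)]
        for fg_row, a_row, bg_row in zip(fg, alpha, bg)
    ]


def over_video(fg: List[Frame], alpha: List[Matte], bg: List[Frame]) -> List[Frame]:
    """Composite a foreground video over a background video, frame by frame."""
    return [over_frame(f, a, b) for f, a, b in zip(fg, alpha, bg)]


# Toy 1x2 frame: a red foreground over a blue background.
fg = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
bg = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
alpha = [[1.0, 0.0]]  # left pixel opaque, right pixel transparent
composite = over_frame(fg, alpha, bg)
```

Each output pixel depends only on the foreground, its matte, and the background at that location, so nothing in this operation can darken the background under the subject or disturb the water around it. That gap is what the generated effects are meant to fill.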

Our method is trained on both paired and unpaired data. For unpaired data, we zero out the latent codes of $\mathcal{I}_{\text{over}}$ and $\mathcal{M}_{\text{effect}}$. (Text prompts $\mathcal{T}$ are not shown here for simplicity.)
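The handling of unpaired samples can be illustrated with a small sketch. It assumes the conditioning signals are already encoded as latent codes and uses flat Python lists as stand-ins for latent tensors; `build_conditioning` and `zeros_like` are hypothetical names, and how the model actually consumes the conditioning is not described here, so the sketch stops at producing the two codes.

```python
from typing import List, Sequence, Tuple

Latent = List[float]  # toy stand-in for a latent tensor


def zeros_like(latent: Sequence[float]) -> Latent:
    """Return an all-zero latent with the same length as the input."""
    return [0.0] * len(latent)


def build_conditioning(
    z_over: Sequence[float],
    z_mask: Sequence[float],
    paired: bool,
) -> Tuple[Latent, Latent]:
    """Select the conditioning latents for one training sample.

    Paired sample: keep the latent codes of the input composite and the
    effect mask. Unpaired sample: zero out both codes, so the sample
    carries no composite or mask conditioning.
    """
    if paired:
        return list(z_over), list(z_mask)
    return zeros_like(z_over), zeros_like(z_mask)


# Toy latents for one sample.
z_over = [0.3, -1.2]
z_mask = [1.0, 0.0]
paired_cond = build_conditioning(z_over, z_mask, paired=True)
unpaired_cond = build_conditioning(z_over, z_mask, paired=False)
```

Zeroing the codes keeps the conditioning shape identical for both kinds of sample, so paired and unpaired data can share the same batches.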

References

Baseline comparisons

Ku et al. AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks. TMLR, 2024.
Gao et al. LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning. arXiv, 2025.
Jiang et al. VACE: All-in-One Video Creation and Editing. ICCV, 2025.
Runway. Runway Aleph. 2025.

Data collection

Gillman et al. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals. NeurIPS, 2025.
Ruiz et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. CVPR, 2023.
Lu et al. Omnimatte: Associating Objects and Their Effects in Video. CVPR, 2021.
Lee et al. Generative Omnimatte: Learning to Decompose Video into Layers. CVPR, 2025.
Lin et al. OmnimatteRF: Robust Omnimatte with 3D Background Modeling. ICCV, 2023.
Greff et al. Kubric: A Scalable Dataset Generator. CVPR, 2022.

Failure cases

Sadat et al. Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models. ICLR, 2025.


Societal Impact

We acknowledge that powerful video editing tools, including ours, may raise ethical considerations depending on their context of use. While our work is intended to augment video compositing and professional workflows, such capabilities could potentially be misused. We therefore encourage responsible use aligned with community guidelines and emphasize transparency regarding any applied edits.

Acknowledgements

Thank you to all ILM staff who assisted in preparing this work, especially Miguel Perez Senent for the 3D boat and ocean elements used in Figure 3 (row 2) and Figure 6 (row 3), and ILM leaders Rob Bredow, Francois Chardavoine, and Greg Grusby for their assistance in clearing this work for publication.
The views and conclusions contained herein are those of the authors and do not represent the official policies or endorsements of these institutions.

BibTeX

@misc{qi2025overgenerativevideocompositing,
  title={Over++: Generative Video Compositing for Layer Interaction Effects}, 
  author={Luchao Qi and Jiaye Wu and Jun Myeong Choi and Cary Phillips and Roni Sengupta and Dan B Goldman},
  year={2025},
  eprint={2512.19661},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.19661}, 
}

TL;DR: Generate environmental effects between any foreground and background layers.
