UniVA: Universal Video Agents towards Next-Generation Video Intelligence

Zhengyang Liang*1 Daoan Zhang*2 Huichi Zhou3 Rui Huang4 Bobo Li4 Yuechen Zhang5 Shengqiong Wu4
Xiaohan Wang6 Jiebo Luo2 Lizi Liao1 Hao Fei4
1Singapore Management University
2University of Rochester
3Imperial College London
4National University of Singapore
5The Chinese University of Hong Kong
6Stanford University
*Equal Contribution

「UniVA」Universal Video Agents

We introduce UniVA, an agentic framework that achieves Breadth by unifying a vast suite of video tools on a single platform, and Depth through its core innovation, "Agentic Synergy." This synergy is enabled by a dual-agent, memory-augmented architecture that dynamically manages information flow, allowing tools such as Understanding to actively guide Editing and Segmentation. By turning isolated functions into a collaborative workflow, UniVA addresses complex planning and consistency problems that single models struggle to solve.


Technical Details

Teaser Figure

Revolutionary User Experience

Through its agentic framework, UniVA aims to combine an interactive, easy-to-use experience with comprehensive, industrial-grade video production capabilities.

Pipeline Figure

Dual-Agent Architecture

The Plan Agent decomposes user input into subtasks, drawing on global and user memory, while the Act Agent executes those subtasks by calling tools through the Model Context Protocol (MCP) to generate versatile multimodal outputs.
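The plan-then-act split can be illustrated with a minimal Python sketch. This is not UniVA's implementation: `PlanAgent`, `ActAgent`, `Memory`, `Subtask`, and the `tools` dictionary are hypothetical names, a plain dictionary of Python functions stands in for MCP tool servers, and a keyword-based planner stands in for an LLM-driven one.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool signature: (instruction, shared context) -> result.
# A dict of such functions stands in for MCP tool servers in this sketch.
ToolFn = Callable[[str, dict], str]


@dataclass
class Subtask:
    tool: str         # name of the tool the Act Agent should call
    instruction: str  # what the tool is asked to do


@dataclass
class Memory:
    global_memory: dict = field(default_factory=dict)  # shared knowledge
    user_memory: dict = field(default_factory=dict)    # per-user preferences


class PlanAgent:
    """Hypothetical planner: turns a request into an ordered list of subtasks."""

    def plan(self, request: str, memory: Memory) -> list[Subtask]:
        # Toy keyword rules; a real planner would prompt an LLM with the
        # request plus global and user memory.
        steps = []
        if "understand" in request or "edit" in request:
            steps.append(Subtask("understand", request))
        if "segment" in request:
            steps.append(Subtask("segment", request))
        if "edit" in request:
            style = memory.user_memory.get("style", "default")
            steps.append(Subtask("edit", f"{request} [style={style}]"))
        return steps


class ActAgent:
    """Hypothetical executor: runs subtasks in order against the tool table."""

    def __init__(self, tools: dict[str, ToolFn]):
        self.tools = tools

    def run(self, subtasks: list[Subtask], context: dict) -> list[str]:
        outputs = []
        for st in subtasks:
            result = self.tools[st.tool](st.instruction, context)
            context[st.tool] = result  # later tools can read earlier results
            outputs.append(result)
        return outputs


# Stub tools; segmentation and editing read the understanding result.
tools: dict[str, ToolFn] = {
    "understand": lambda instr, ctx: "scene: a dog running on a beach",
    "segment": lambda instr, ctx: f"mask for subject in ({ctx.get('understand', 'no description')})",
    "edit": lambda instr, ctx: f"edited video using ({ctx.get('understand', '')}): {instr}",
}

memory = Memory(user_memory={"style": "cinematic"})
subtasks = PlanAgent().plan("segment the dog and edit the clip", memory)
outputs = ActAgent(tools).run(subtasks, context={})
```

Keeping the plan as an explicit list of subtasks separates deciding what to do from executing it, which mirrors the Plan/Act division above; the shared `context` dictionary is one simple way to let later tools consume earlier results.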

Task Figure

Synergistic Components

UniVA's components work in synergy, demonstrating both depth in handling complex autonomous tasks and breadth in supporting interactive, multi-tool creation.

Task Figure

Memory Mechanism

We build a three-level Memory Mechanism that dynamically manages the information flow between tools, so that the output of one tool (for example, Understanding) can actively guide others such as Editing and Segmentation.
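A minimal Python sketch of how layered memory could route information from an understanding tool to segmentation and editing, under stated assumptions. Only "global" and "user" memory are named on this page, so "task" memory as the third level, the task-then-user-then-global lookup order, and every class and function name here (`ThreeLevelMemory`, `understand`, `segment`, `edit`) are hypothetical illustrations, not UniVA's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeLevelMemory:
    # "global" and "user" follow the Plan Agent description; "task" is an
    # ASSUMED name for the third level (per-task intermediate results).
    global_mem: dict = field(default_factory=dict)
    task_mem: dict = field(default_factory=dict)
    user_mem: dict = field(default_factory=dict)

    def read(self, key, default=None):
        # Assumed priority: task memory, then user memory, then global memory.
        for level in (self.task_mem, self.user_mem, self.global_mem):
            if key in level:
                return level[key]
        return default

    def reset_task(self):
        # Discard intermediate results when a task finishes.
        self.task_mem.clear()


def understand(video, mem):
    # Hypothetical understanding tool: records its findings in task memory.
    mem.task_mem["subject"] = "red car"
    mem.task_mem["subject_frames"] = (12, 48)
    return "a red car drives along a coastal road"


def segment(video, mem):
    # Segmentation is guided by what understanding found (if anything).
    subject = mem.read("subject", "foreground")
    start, end = mem.read("subject_frames", (0, -1))
    return {"target": subject, "frames": (start, end)}


def edit(video, mem, instruction):
    # Editing uses the segmentation target plus the user's style preference.
    mask = segment(video, mem)
    style = mem.read("style", "neutral")
    first, last = mask["frames"]
    return f"{instruction} on {mask['target']} in frames {first}-{last} ({style} style)"


mem = ThreeLevelMemory(global_mem={"style": "neutral"}, user_mem={"style": "film noir"})
understand("clip.mp4", mem)
result = edit("clip.mp4", mem, "recolor to blue")
# → "recolor to blue on red car in frames 12-48 (film noir style)"
```

Because the understanding step writes to a shared layer rather than returning only to its caller, segmentation and editing pick up its findings without being wired to it directly; clearing task memory afterwards keeps one task's intermediate results from leaking into the next.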