Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists

Jun-15-2026, 02:32:23 GMT - Neural Information Processing Systems

Video content editing has a wide range of applications. With the advancement of diffusion-based generative models, video editing techniques have made remarkable progress, yet they remain far from practical usability. Existing inversion-based video editing methods are time-consuming and struggle to maintain consistency in unedited regions. Although instruction-based methods have high theoretical potential, they face significant challenges in constructing high-quality training datasets: current datasets suffer from problems with editing correctness, frame consistency, and sample diversity. To bridge these gaps, we introduce the Señorita-2M dataset, a large-scale, diverse, and high-quality video editing dataset.

art style, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Industry:
- Leisure & Entertainment (1.00)
- Media > Photography (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.93)
