AITopics | controllability

Collaborating Authors

controllability

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Compositional Transformers for Scene Generation Supplementary Material

Neural Information Processing SystemsApr-25-2026, 20:32:14 GMT

Figure 10: A visualization of the layouts and unsupervised depth maps produced by GANformer2's planning stage while synthesizing varied images, making the generative process more structured and interpretable. GANformer2 creates the layout sequentially, segment-by-segment, to capture the scene's compositionality, effectively allowing us to add or remove objects from the resulting images. Since GANformer2 creates each scene as a composition of interacting segments, it supports adding and removal of objects while respecting various dependencies with their surroundings: Amodal completion of occluded objects is denoted by pink, updates of shadows and especially reflections by cyan, and other object removals cases by yellow. Shape manipulation is denoted by green, while position changes by yellow. Color manipulation is denoted by pink, while updates of material by cyan.

artificial intelligence, layout, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Add feedback

VideoComposer Compositional Video Synthesis with Motion

Neural Information Processing SystemsApr-25-2026, 08:49:14 GMT

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis. However, achieving controllable video synthesis remains challenging due to the large variation of temporal dynamics and the requirement of cross-frame temporal consistency. Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions. Specifically, considering the characteristic of video data, we introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics. In addition, we develop a Spatio-Temporal Condition encoder (STCencoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs, with which the model could make better use of temporal conditions and hence achieve higher inter-frame consistency. Extensive experimental results suggest that VideoComposer is able to control the spatial and temporal patterns simultaneously within a synthesized video in various forms, such as text description, sketch sequence, reference video, or even simply hand-crafted motions. The code and models are publicly available at https://videocomposer.github.io.

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs

Neural Information Processing SystemsMar-22-2026, 15:30:47 GMT

In this paper, we approach an overlooked yet critical task Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the need for controllability in graph conditions. To address these challenges, we propose a graph context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling by combining personalized page rank and re-ranking based on vision-language features. Then, a graph QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process of diffusion. Finally, we propose graph classifier-free guidance, enabling controllable generation by varying the strength of graph guidance and multiple connected edges to a node. Extensive experiments conducted on three datasets from different domains demonstrate the effectiveness and controllability of our approach.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.58)

Add feedback

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

Neural Information Processing SystemsMar-21-2026, 22:54:14 GMT

World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback