Graphics
VideoGUI: A Benchmark for GUI Automation from Instructional Videos Kevin Qinghong Lin
Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing for identification of the specific levels at which they may fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on visual state (i.e., screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, typing, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the SoTA large multimodal model GPT-4o performs poorly on visual-centric GUI tasks, especially for high-level planning.
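To make the atomic-action level concrete, here is a minimal sketch of how per-type scores (clicking, dragging, typing, scrolling) could be aggregated; the action dictionary layout, field names, and the 10-pixel drag tolerance are assumptions for illustration, not the benchmark's official metric code.

```python
# Illustrative sketch (not the official VideoGUI metric code): per-type scoring
# of predicted atomic actions against ground truth, assuming simple action dicts.
from typing import Dict, List

def score_action(pred: Dict, gt: Dict) -> float:
    """Return 1.0 if the predicted atomic action matches the ground truth."""
    if pred.get("type") != gt["type"]:
        return 0.0
    if gt["type"] == "click":
        x0, y0, x1, y1 = gt["bbox"]                      # target element bounds
        px, py = pred["point"]
        return float(x0 <= px <= x1 and y0 <= py <= y1)
    if gt["type"] == "type":
        return float(pred["text"].strip() == gt["text"].strip())
    if gt["type"] == "scroll":
        return float(pred["direction"] == gt["direction"])
    if gt["type"] == "drag":                             # both endpoints near GT
        (sx, sy), (ex, ey) = pred["start"], pred["end"]
        (gsx, gsy), (gex, gey) = gt["start"], gt["end"]
        tol = 10  # pixel tolerance, an arbitrary choice for this sketch
        return float(abs(sx - gsx) <= tol and abs(sy - gsy) <= tol
                     and abs(ex - gex) <= tol and abs(ey - gey) <= tol)
    return 0.0

def per_type_accuracy(preds: List[Dict], gts: List[Dict]) -> Dict[str, float]:
    """Aggregate scores separately for click / drag / type / scroll actions."""
    totals, hits = {}, {}
    for p, g in zip(preds, gts):
        t = g["type"]
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + score_action(p, g)
    return {t: hits[t] / totals[t] for t in totals}
```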
NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos Alexander Gao
We present a method for learning 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video input. To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF) which serves as a reference frame, along with a time-conditioned deformation field. We further bridge this neural geometry representation with a differentiable physics simulator by designing a two-way conversion between the neural field and its corresponding hexahedral mesh, enabling us to estimate physics parameters from the source video by minimizing a cycle consistency loss. Our method also allows a user to interactively edit 3D objects from the source video by modifying the recovered hexahedral mesh and propagating the operation back to the neural field representation. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing Neural Field approaches, and we provide extensive examples which demonstrate its ability to extract useful 3D representations from videos captured with consumer-grade cameras.
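The cycle-consistency idea can be sketched as follows; `field_to_mesh` and `mesh_to_field` are hypothetical placeholders standing in for the paper's two-way conversion, and the mean-squared loss shown is only one plausible instantiation, not the authors' actual implementation.

```python
# A minimal PyTorch sketch of the cycle-consistency idea described above; the
# conversion functions and SDF network are placeholders, not the paper's code.
import torch

def cycle_consistency_loss(sdf_net, field_to_mesh, mesh_to_field, pts):
    """Penalize disagreement between the neural SDF and the SDF recovered from
    the hexahedral mesh produced by the (hypothetical) two-way conversion."""
    sdf_pred = sdf_net(pts)                 # SDF values queried from the neural field
    mesh = field_to_mesh(sdf_net)           # neural field -> hexahedral mesh (placeholder)
    sdf_cycled = mesh_to_field(mesh, pts)   # mesh -> SDF values at the same points (placeholder)
    return torch.mean((sdf_pred - sdf_cycled) ** 2)
```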
Learning 3D Garment Animation from Trajectories of A Piece of Cloth Chen Change Loy
Garment animation is ubiquitous in various applications, such as virtual reality, gaming, and film production. Learning-based approaches achieve compelling performance in animating diverse garments under versatile scenarios. Nevertheless, to mimic the deformations of observed garments, data-driven methods often require large-scale garment data, which are both resource-expensive and time-consuming to collect. In addition, forcing models to match the dynamics of observed garment animation may hinder their ability to generalize to unseen cases. In this paper, instead of using garment-wise supervised learning, we adopt a disentangled scheme to learn how to animate observed garments: 1) learning constitutive behaviors from the observed cloth; 2) dynamically animating various garments constrained by the learned constitutive laws. Specifically, we propose an Energy Unit network (EUNet) to model the constitutive relations in the form of energy.
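A hedged sketch of what an energy-based constitutive model in this spirit could look like is given below: a small MLP maps per-edge strain to a scalar energy whose negative gradient yields elastic forces. The network architecture, the strain feature, and all layer sizes are illustrative assumptions, not EUNet's actual design.

```python
# Illustrative PyTorch sketch of an energy-unit-style constitutive model: an MLP
# predicts per-edge energy from strain; forces follow as the negative gradient.
import torch
import torch.nn as nn

class EdgeEnergyNet(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, verts: torch.Tensor, edges: torch.Tensor,
                rest_len: torch.Tensor) -> torch.Tensor:
        """Sum per-edge energies computed from edge strain (relative length change)."""
        d = verts[edges[:, 0]] - verts[edges[:, 1]]                    # edge vectors
        strain = (d.norm(dim=-1) - rest_len).unsqueeze(-1) / rest_len.unsqueeze(-1)
        return self.mlp(strain).sum()

# Elastic forces on vertices are the negative gradient of the learned energy:
#   verts.requires_grad_(True)
#   energy = net(verts, edges, rest_len)
#   forces = -torch.autograd.grad(energy, verts)[0]
```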
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Human image animation involves generating videos from a character photo, allowing user control and unlocking the potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance of camera motions in videos, leading to limited control and unstable video generation. To demystify the training data, we present HumanVid, the first large-scale high-quality dataset tailored for human image animation, which combines crafted real-world and synthetic data. For the real-world data, we compile a vast collection of real-world videos from the internet.
Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?
In Section 4.1, we briefly introduced how humans annotate the reconstructed images for different datasets. In the supplementary material, we include the graphical user interface (GUI) used by the annotators. Figure 1 displays the GUI, where (A) and (B) were specifically designed for annotating different datasets. To minimize the influence of subjective bias, we use a relatively objective formulation: whether the reconstructed image can be correctly labeled.
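As a rough illustration of that formulation, the snippet below aggregates annotator labels into a leakage score: the fraction of reconstructions whose majority human label matches the true class. The function name and the majority-vote rule are assumptions made for this sketch, not the paper's evaluation code.

```python
# Illustrative sketch: human-perception leakage score for reconstructed images,
# counting an image as leaked when the annotators' majority label is correct.
from typing import List

def human_leakage_score(annotator_labels: List[List[int]],
                        true_labels: List[int]) -> float:
    """annotator_labels[i] holds each annotator's label for reconstruction i."""
    leaked = 0
    for labels, truth in zip(annotator_labels, true_labels):
        majority = max(set(labels), key=labels.count)   # majority vote over annotators
        leaked += int(majority == truth)
    return leaked / len(true_labels)
```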
DMesh: A Differentiable Mesh Representation Yang Zhou
We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first obtain a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate the probability that a face exists on the actual surface in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topologies in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point clouds and multi-view images, using gradient-based optimization.
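The differentiable face-selection idea can be caricatured as follows; the sigmoid-over-logits probability and the soft coverage loss are toy stand-ins for DMesh's actual WDT-based formulation, included only to show how gradients can flow through face existence probabilities during reconstruction from a point cloud.

```python
# Toy PyTorch sketch of differentiable face selection: each candidate face gets
# an existence probability, and a point-cloud coverage loss is weighted by those
# probabilities so gradients reach both geometry and face logits.
import torch

def soft_face_loss(face_centers: torch.Tensor,   # (F, 3) candidate face centroids
                   face_logits: torch.Tensor,    # (F,)  learnable existence logits
                   target_pts: torch.Tensor) -> torch.Tensor:  # (N, 3) observed points
    prob = torch.sigmoid(face_logits)                            # face existence probabilities
    d = torch.cdist(face_centers, target_pts)                    # (F, N) pairwise distances
    cover = (prob.unsqueeze(1) * torch.exp(-d)).max(dim=0).values  # soft coverage per point
    return (1.0 - cover).mean()                                  # favor probable faces near data
```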