AITopics

GLOBER: Coherent Non-autoregressive Video Generation via Global Guided Video Decoder

Neural Information Processing SystemsMay-25-2025, 16:09:17 GMT

The goal of this work is to advance research on video generation methods. All experiments are conducted without conditional inputs. Figure 1: Genetated long videos with 128 frames on the Sky Time-lapse and UCF-101 datasets (4 frames skipped). A.5 Settings of Hyper Parameters The detailed settings of model hyper parameters are presented in Table 4. Table 4: Hyper-parameters of the video auto-encoder and the quantitative results on video reconstruction. Experimental settings on the UCF-101 dataset are the same for both conditional and unconditional video generation except given video descriptions.

artificial intelligence, dataset, machine learning, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

Add feedback

efe36e55d80a94d1726f660b8d237a0f-Paper-Conference.pdf

Neural Information Processing SystemsMay-25-2025, 16:09:14 GMT

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

efcb5b06ce8bb672ffa26b9dc5cdd0f9-Paper-Conference.pdf

Neural Information Processing SystemsMay-25-2025, 16:08:56 GMT

machine learning, polynomial, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation (Supplementary Material)

Neural Information Processing SystemsMay-25-2025, 16:08:40 GMT

The prediction module of vanilla EPN is designed for global SE(3)-equivariance for all input points. For this purpose, we further devise two heads for rigid segmentation and motion estimation. Figure 1 demonstrates the detailed design of our EPN feature extractor on SAPIEN, OGC-DR, and OGC-DRSV, and that on KITTI-SF shares the same structure but has larger numbers of output dimensions accordingly. Figure 1: Structure of our feature extractor based on EPN. "EPNConv" is the SE(3)-equivariant convolution proposed in the vanilla EPN network.

artificial intelligence, machine learning, our-sup, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Add feedback

Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

Neural Information Processing SystemsMay-25-2025, 16:08:36 GMT

A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from SE(3) equivariant features, all without the need for category information. Our training strategy is unified and can be implemented online, which jointly optimizes the predicted segmentation and motion by leveraging the interrelationships among scene flow, segmentation mask, and rigid transformations. We conduct experiments on four datasets to demonstrate the superiority of our method. The results show that our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

Neural Information Processing SystemsMay-25-2025, 16:08:14 GMT

Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structurefrom-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose'NAVI': a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation.

artificial intelligence, machine learning, pattern recognition, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Industry: Media > Photography (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Communications (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.35)

Add feedback

efbba7719cc5172d175240f24be11280-Paper-Conference.pdf

Neural Information Processing SystemsMay-25-2025, 16:07:54 GMT

Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex mechanism that activates across diverse contexts.

machine learning, mlp 10, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

efb9629755e598c4f261c44aeb6fde5e-Paper-Conference.pdf

Neural Information Processing SystemsMay-25-2025, 16:07:40 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe (0.14)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

efb2072a358cefb75886a315a6fcf880-Supplemental-Conference.pdf

Neural Information Processing SystemsMay-25-2025, 16:07:23 GMT

artificial intelligence, planning & scheduling, province, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback

On the Planning Abilities of Large Language Models: A Critical Investigation

Neural Information Processing SystemsMay-25-2025, 16:07:19 GMT

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs as a source of heuristic guidance for other agents (AI planners) in their planning tasks. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of 12% across the domains.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (1.00)

Technology: