AITopics

A Benchmark for Compositional Visual Reasoning

Neural Information Processing SystemsMar-27-2025, 15:12:13 GMT

A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years with state-of-the-art systems now reaching human accuracy on some of these benchmarks. Yet, there remains a major gap between humans and AI systems in terms of the sample efficiency with which they learn new visual reasoning tasks. Humans' remarkable efficiency at learning has been at least partially attributed to their ability to harness compositionality - allowing them to efficiently take advantage of previously gained knowledge when learning new tasks. Here, we introduce a novel visual reasoning benchmark, Compositional Visual Relations (CVR), to drive progress towards the development of more data-efficient learning algorithms.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

CaptainCook4D: A Dataset for Understanding Errors in Procedural Activities [ Supplementary Material ]

Neural Information Processing SystemsMar-27-2025, 15:12:13 GMT

To facilitate these annotations, we used both our user application and Label Studio.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Overview (1.00)
Workflow (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

CaptainCook4D: A Dataset for Understanding Errors in Procedural Activities

Neural Information Processing SystemsMar-27-2025, 15:12:06 GMT

Following step-by-step procedures is an essential component of various activities carried out by individuals in their daily lives. These procedures serve as a guiding framework that helps to achieve goals efficiently, whether it is assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and the ability to reason about the structure of the activity. To this end, we collect a new egocentric 4D dataset CaptainCook4D comprising 384 recordings (94.5 hours) of people performing recipes in real kitchen environments. This dataset consists of two distinct types of activities: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. We provide 5.3K step annotations and 10K finegrained action annotations and benchmark the dataset for the following tasks: error recognition, multi-step localization and procedure learning

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Workflow (1.00)
Overview (1.00)

Industry:

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

Neural Information Processing SystemsMar-27-2025, 15:11:53 GMT

Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps with 128 agents in the standard setting and support for up to. Version 2.0 is a complete rewrite of its predecessor with three-fold improved performance and compatibility with CleanRL. We release the platform as free and open-source software with comprehensive documentation available at neuralmmo.github.io

artificial intelligence, machine learning, neural mmo 2, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization

Neural Information Processing SystemsMar-27-2025, 15:11:43 GMT

Transformers have demonstrated great power in the recent development of large foundational models. In particular, the Vision Transformer (ViT) has brought revolutionary changes to the field of vision, achieving significant accomplishments on the experimental side. However, their theoretical capabilities, particularly in terms of generalization when trained to overfit training data, are still not fully understood. To address this gap, this work delves deeply into the benign overfitting perspective of transformers in vision. To this end, we study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model. By developing techniques that address the challenges posed by softmax and the interdependent nature of multiple weights in transformer optimization, we successfully characterized the training dynamics and achieved generalization in post-training. Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model. The theoretical results are further verified by experimental simulation. To the best of our knowledge, this is the first work to characterize benign overfitting for Transformers.

artificial intelligence, exp, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > China (0.27)
Europe > Switzerland > Zürich > Zürich (0.13)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

c06f788963f0ce069f5b2dbf83fe7822-Supplemental-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 15:11:35 GMT

artificial intelligence, eqn, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

c057cb81b8d3c67093427bf1c16a4e9f-Supplemental-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 15:09:13 GMT

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

Add feedback

c017e92288b5056c578bb6b0b69d9e76-Supplemental-Conference.pdf

Neural Information Processing SystemsMar-27-2025, 15:08:52 GMT

artificial intelligence, decision tree learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Depth is More Powerful than Width with Prediction Concatenation in Deep Forest

Neural Information Processing SystemsMar-27-2025, 15:08:47 GMT

Random Forest (RF) is an ensemble learning algorithm proposed by Breiman [1] that constructs a large number of randomized decision trees individually and aggregates their predictions by naive averaging. Zhou and Feng [2] further propose Deep Forest (DF) algorithm with multi-layer feature transformation, which significantly outperforms random forest in various application fields. The prediction concatenation (PreConc) operation is crucial for the multi-layer feature transformation in deep forest, though little has been known about its theoretical property. In this paper, we analyze the influence of Preconc on the consistency of deep forest. Especially when the individual tree is inconsistent (as in practice, the individual tree is often set to be fully grown, i.e., there is only one sample at each leaf node), we find that the convergence rate of two-layer DF w.r.t. the number of trees M can reach O(1/M