Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Wang, Chao, Zhang, Luning, Wang, Zheng, Zhou, Yang
Combining multiple perceptual inputs and performing combinatorial reasoning in complex scenarios is a sophisticated human cognitive function. With advances in multi-modal large language models, recent benchmarks tend to evaluate visual understanding across multiple images, yet they often overlook the need for combinatorial reasoning over multiple sources of perceptual information. To explore the ability of advanced models to integrate multiple perceptual inputs for combinatorial reasoning in complex scenarios, we introduce two benchmarks: Clue-Visual Question Answering (CVQA), with three task types that assess visual comprehension and synthesis, and Clue of Password-Visual Question Answering (CPVQA), with two task types focused on accurate interpretation and application of visual data. For our benchmarks, we present three plug-and-play approaches: utilizing model input for reasoning, enhancing reasoning through minimum margin decoding with randomness generation, and retrieving semantically relevant visual information for effective data integration. The combined results reveal current models' poor performance on combinatorial reasoning benchmarks: even the state-of-the-art (SOTA) closed-source model achieves only 33.04% accuracy on CVQA, and drops to 7.38% on CPVQA. Notably, our approach improves model performance on combinatorial reasoning, with a 22.17% boost on CVQA and a 9.40% boost on CPVQA over the SOTA closed-source model, demonstrating its effectiveness at enhancing combinatorial reasoning over multiple perceptual inputs in complex scenarios. The code will be publicly available.
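The abstract does not specify how "minimum margin decoding with randomness generation" is implemented; as a rough sketch of the general idea of margin-based candidate selection, the following illustrative code samples several stochastic decodes, scores each by the average gap between its top-two token probabilities, and keeps the candidate with the smallest margin. All function names and the exact selection criterion are assumptions for illustration, not the authors' method.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_top2_margin(step_logits):
    """Average gap between the top-1 and top-2 token probabilities
    over a candidate's decoding steps; step_logits has shape (T, V)."""
    probs = softmax(np.asarray(step_logits, dtype=float))
    sorted_p = np.sort(probs, axis=-1)
    return float((sorted_p[..., -1] - sorted_p[..., -2]).mean())

def minimum_margin_select(candidates):
    """Pick the candidate whose steps have the smallest mean top-2 margin.

    `candidates` is a list of (text, step_logits) pairs, each produced by a
    separate temperature-sampled ("randomness generation") decode.
    """
    scored = [(mean_top2_margin(logits), text) for text, logits in candidates]
    return min(scored)[1]

# Toy example: a confidently peaked decode vs. a near-uniform one.
candidates = [
    ("peaked", np.array([[10.0, 0.0, 0.0]])),
    ("flat",   np.array([[0.1, 0.0, 0.0]])),
]
print(minimum_margin_select(candidates))  # "flat" has the smaller margin
```

Selecting by margin rather than raw likelihood is one way to reconcile several stochastic decodes; the paper may instead maximize the margin or combine it with other signals.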
Review for NeurIPS paper: PLANS: Neuro-Symbolic Program Learning from Videos
Relation to Prior Work: The relation to Ellis 2018 (which the authors discuss) should be reframed. That work also learns to infer specifications from noisy perceptual input, which are then fed to a downstream symbolic solver, and it also addresses the challenge of uncertainty over specifications, albeit in a Bayesian way rather than via the heuristics proposed here. Could you similarly situate your system in a probabilistic framework and resolve the ambiguity over specs in a less heuristic manner? Would that fare better or worse on your data sets? I feel this is the main substantive difference, rather than the details presently emphasized in the text.
Imitation Learning of Factored Multi-agent Reactive Models
Teng, Michael, Le, Tuan Anh, Scibior, Adam, Wood, Frank
We apply recent advances in deep generative modeling to the task of imitation learning from biological agents. Specifically, we apply variations of the variational recurrent neural network model to a multi-agent setting where we learn policies of individual uncoordinated agents acting based on their perceptual inputs and their hidden belief state. We learn stochastic policies for these agents directly from observational data, without constructing a reward function. An inference network learned jointly with the policy allows for efficient inference over the agent's belief state given a sequence of its current perceptual inputs and the prior actions it performed, which lets us extrapolate observed sequences of behavior into the future while maintaining uncertainty estimates over future trajectories. We test our approach on a dataset of flies interacting in a 2D environment, where we demonstrate better predictive performance than existing approaches which learn deterministic policies with recurrent neural networks. We further show that the uncertainty estimates over future trajectories we obtain are well calibrated, which makes them useful for a variety of downstream processing tasks.
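The core mechanism in the abstract above is extrapolating observed behavior with calibrated uncertainty by sampling many futures from a learned stochastic policy. The sketch below illustrates that rollout pattern only: `stochastic_policy`, the linear belief update, and the displacement dynamics are toy stand-ins invented for illustration, not the paper's VRNN model.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_policy(belief, obs):
    """Toy stand-in for the learned policy: returns the mean and std of a
    Gaussian action distribution given belief state and perceptual input."""
    mean = 0.9 * belief + 0.1 * obs
    std = 0.1 + 0.05 * abs(obs)
    return mean, std

def rollout(obs0, horizon=20, n_samples=100):
    """Sample many future trajectories from the stochastic policy and
    summarize them by per-step mean and std across samples."""
    trajs = np.zeros((n_samples, horizon))
    for i in range(n_samples):
        belief, obs = 0.0, obs0
        for t in range(horizon):
            mean, std = stochastic_policy(belief, obs)
            action = rng.normal(mean, std)       # stochastic, not deterministic
            obs = obs + action                   # toy dynamics: action displaces the agent
            belief = 0.5 * belief + 0.5 * obs    # toy recurrent belief update
            trajs[i, t] = obs
    return trajs.mean(axis=0), trajs.std(axis=0)

means, stds = rollout(obs0=1.0)
```

Because each rollout resamples the actions, the per-step std across samples grows with the horizon, which is the uncertainty estimate over future trajectories that a deterministic RNN policy cannot provide.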