Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Wang, Chao, Zhang, Luning, Wang, Zheng, Zhou, Yang
Combining multiple perceptual inputs and performing combinatorial reasoning in complex scenarios is a sophisticated human cognitive function. With advances in multi-modal large language models, recent benchmarks tend to evaluate visual understanding across multiple images, yet they often overlook the need for combinatorial reasoning over multiple sources of perceptual information. To explore the ability of advanced models to integrate multiple perceptual inputs for combinatorial reasoning in complex scenarios, we introduce two benchmarks: Clue-Visual Question Answering (CVQA), with three task types that assess visual comprehension and synthesis, and Clue of Password-Visual Question Answering (CPVQA), with two task types focused on accurate interpretation and application of visual data. For our benchmarks, we present three plug-and-play approaches: utilizing model input for reasoning, enhancing reasoning through minimum margin decoding with randomness generation, and retrieving semantically relevant visual information for effective data integration. The combined results reveal current models' poor performance on combinatorial reasoning benchmarks: even the state-of-the-art (SOTA) closed-source model achieves only 33.04% accuracy on CVQA, and drops to 7.38% on CPVQA. Notably, our approach improves model performance on combinatorial reasoning, with a 22.17% boost on CVQA and a 9.40% boost on CPVQA over the SOTA closed-source model, demonstrating its effectiveness at enhancing combinatorial reasoning over multiple perceptual inputs in complex scenarios. The code will be publicly available.
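The abstract does not specify how "minimum margin decoding with randomness generation" is implemented; as a rough sketch of the general idea of margin-based candidate selection, the following illustrative code samples several stochastic decodes, scores each by the average gap between its top-two token probabilities, and keeps the candidate with the smallest margin. All function names and the exact selection criterion are assumptions for illustration, not the authors' method.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_top2_margin(step_logits):
    """Average gap between the top-1 and top-2 token probabilities
    over a candidate's decoding steps; step_logits has shape (T, V)."""
    probs = softmax(np.asarray(step_logits, dtype=float))
    sorted_p = np.sort(probs, axis=-1)
    return float((sorted_p[..., -1] - sorted_p[..., -2]).mean())

def minimum_margin_select(candidates):
    """Pick the candidate whose steps have the smallest mean top-2 margin.

    `candidates` is a list of (text, step_logits) pairs, each produced by a
    separate temperature-sampled ("randomness generation") decode.
    """
    scored = [(mean_top2_margin(logits), text) for text, logits in candidates]
    return min(scored)[1]

# Toy example: a confidently peaked decode vs. a near-uniform one.
candidates = [
    ("peaked", np.array([[10.0, 0.0, 0.0]])),
    ("flat",   np.array([[0.1, 0.0, 0.0]])),
]
print(minimum_margin_select(candidates))  # "flat" has the smaller margin
```

Selecting by margin rather than raw likelihood is one way to reconcile several stochastic decodes; the paper may instead maximize the margin or combine it with other signals.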
Review for NeurIPS paper: PLANS: Neuro-Symbolic Program Learning from Videos
Relation to Prior Work: The relation to Ellis 2018 (which the authors discuss) should be reframed. That work also learns to infer specifications from noisy perceptual input, which are then fed to a downstream symbolic solver, and it also addresses the challenge of uncertainty over specifications, albeit in a Bayesian way rather than via the heuristics proposed here. Could you similarly situate your system in a probabilistic framework and resolve the ambiguity over specs in a less heuristic manner? Would that fare better or worse on your data sets? I feel this is the main substantive difference, rather than the details presently emphasized in the text.
Imitation Learning of Factored Multi-agent Reactive Models
Teng, Michael, Le, Tuan Anh, Scibior, Adam, Wood, Frank
We apply recent advances in deep generative modeling to the task of imitation learning from biological agents. Specifically, we apply variations of the variational recurrent neural network model to a multi-agent setting where we learn policies of individual uncoordinated agents acting based on their perceptual inputs and their hidden belief state. We learn stochastic policies for these agents directly from observational data, without constructing a reward function. An inference network learned jointly with the policy allows for efficient inference over the agent's belief state given a sequence of its current perceptual inputs and the prior actions it performed, which lets us extrapolate observed sequences of behavior into the future while maintaining uncertainty estimates over future trajectories. We test our approach on a dataset of flies interacting in a 2D environment, where we demonstrate better predictive performance than existing approaches which learn deterministic policies with recurrent neural networks. We further show that the uncertainty estimates over future trajectories we obtain are well calibrated, which makes them useful for a variety of downstream processing tasks.
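The core mechanism in the abstract above is extrapolating observed behavior with calibrated uncertainty by sampling many futures from a learned stochastic policy. The sketch below illustrates that rollout pattern only: `stochastic_policy`, the linear belief update, and the displacement dynamics are toy stand-ins invented for illustration, not the paper's VRNN model.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_policy(belief, obs):
    """Toy stand-in for the learned policy: returns the mean and std of a
    Gaussian action distribution given belief state and perceptual input."""
    mean = 0.9 * belief + 0.1 * obs
    std = 0.1 + 0.05 * abs(obs)
    return mean, std

def rollout(obs0, horizon=20, n_samples=100):
    """Sample many future trajectories from the stochastic policy and
    summarize them by per-step mean and std across samples."""
    trajs = np.zeros((n_samples, horizon))
    for i in range(n_samples):
        belief, obs = 0.0, obs0
        for t in range(horizon):
            mean, std = stochastic_policy(belief, obs)
            action = rng.normal(mean, std)       # stochastic, not deterministic
            obs = obs + action                   # toy dynamics: action displaces the agent
            belief = 0.5 * belief + 0.5 * obs    # toy recurrent belief update
            trajs[i, t] = obs
    return trajs.mean(axis=0), trajs.std(axis=0)

means, stds = rollout(obs0=1.0)
```

Because each rollout resamples the actions, the per-step std across samples grows with the horizon, which is the uncertainty estimate over future trajectories that a deterministic RNN policy cannot provide.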