

Commonsense Reasoning


Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs

Yu, Jiaao, Li, Shenwei, Han, Mingjie, Yin, Yifei, Song, Wenzheng, Jia, Chenghao, Lan, Man

arXiv.org Artificial Intelligence

Recent breakthroughs in reasoning models have markedly advanced the reasoning capabilities of large language models, particularly via training on tasks with verifiable rewards. Yet a significant gap persists in their adaptation to real-world multimodal scenarios, most notably vision-language tasks, due to a heavy focus on single-modal language settings. While efforts to transplant reinforcement learning techniques from NLP to Vision-Language Models (VLMs) have emerged, these approaches often remain confined to perception-centric tasks or reduce images to textual summaries, failing to fully exploit visual context and commonsense knowledge, and ultimately constraining the generalization of reasoning capabilities across diverse multimodal environments. To address this limitation, we introduce a novel fine-tuning task, Masked Prediction via Context and Commonsense (MPCC), which forces models to integrate visual context and commonsense reasoning by reconstructing semantically meaningful content from occluded images, thereby laying the foundation for generalized reasoning. To systematically evaluate the model's performance in generalized reasoning, we developed a specialized evaluation benchmark, MPCC-Eval, and employed various fine-tuning strategies to guide reasoning. Among these, we introduced an innovative training method, Reinforcement Fine-Tuning with Prior Sampling, which not only enhances model performance but also improves its generalized reasoning capabilities in out-of-distribution (OOD) and cross-task scenarios. Code and data are available at yjainqdc.
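The core of the MPCC setup is occluding a semantically meaningful image region so the model must infer the hidden content from the surrounding visual context. Below is a minimal sketch of constructing such a masked input; the function name, patch coordinates, and fill value are illustrative assumptions, not the paper's actual masking scheme.

```python
# Sketch: occlude a rectangular region of an image array so a VLM must
# reconstruct the hidden content from context. All parameters here are
# illustrative, not taken from the MPCC paper.
import numpy as np

def mask_region(image: np.ndarray, top: int, left: int,
                height: int, width: int, fill: int = 0) -> np.ndarray:
    """Return a copy of `image` (H, W, C) with one rectangular region occluded."""
    masked = image.copy()
    masked[top:top + height, left:left + width, :] = fill
    return masked

# Dummy grey 224x224 RGB image; a real pipeline would load an actual photo.
img = np.full((224, 224, 3), 128, dtype=np.uint8)
occluded = mask_region(img, top=80, left=80, height=64, width=64)
```

The model is then prompted to describe what plausibly lies inside the occluded region, which requires commonsense knowledge rather than pure perception.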


Culturally Grounded Physical Commonsense Reasoning in Italian and English: A Submission to the MRL 2025 Shared Task

De Santis, Marco, Alazraki, Lisa

arXiv.org Artificial Intelligence

This paper presents our submission to the MRL 2025 Shared Task on Multilingual Physical Reasoning Datasets. The objective of the shared task is to create manually-annotated evaluation data in the physical commonsense reasoning domain, for languages other than English, following a format similar to PIQA. Our contribution, FormaMentis, is a novel benchmark for physical commonsense reasoning that is grounded in Italian language and culture. The data samples in FormaMentis are created by expert annotators who are native Italian speakers and are familiar with local customs and norms. The samples are additionally translated into English, while preserving the cultural elements unique to the Italian context.


A Community-driven vision for a new Knowledge Resource for AI

Chaudhri, Vinay K, Baru, Chaitan, Bennett, Brandon, Bhatt, Mehul, Cassel, Darion, Cohn, Anthony G, Dechter, Rina, Erdem, Esra, Ferrucci, Dave, Forbus, Ken, Gelfond, Gregory, Genesereth, Michael, Gordon, Andrew S., Grosof, Benjamin, Gupta, Gopal, Hendler, Jim, Israni, Sharat, Josephson, Tyler R., Kyllonen, Patrick, Lierler, Yuliya, Lifschitz, Vladimir, McFate, Clifton, McGinty, Hande K., Morgenstern, Leora, Oltramari, Alessandro, Paritosh, Praveen, Roth, Dan, Shepard, Blake, Shimizu, Cogan, Vrandečić, Denny, Whiting, Mark, Witbrock, Michael

arXiv.org Artificial Intelligence

The Cyc project, started in 1984, created the first large-scale database of commonsense knowledge. The initiative continues to this day, with its aim to provide a comprehensive ontology and knowledge base of commonsense knowledge that enables human-like reasoning for AI systems. In the concluding paragraph of his 1995 Communications of the Association for Computing Machinery (CACM) article, A Large-Scale Investment in Knowledge Infrastructure [52], Cyc's founder Douglas B. Lenat wrote: "Is Cyc necessary? How far would a user get with something simpler than Cyc but that lacks everyday commonsense knowledge? Nobody knows; the question will be settled empirically. Our guess is most of these applications will eventually tap the synergy in a suite of sources (including neural nets and decision theory), one of which will be Cyc." Although 30 years have passed since that article was written, the AI research community has not conclusively settled [10] the question "How far would a user get with something simpler than Cyc but that lacks everyday commonsense knowledge?" However, it is clear that significant strides have been made in addressing many of the tasks that were among the original Cyc use cases, including information retrieval, semi-automatic linking of multiple heterogeneous external information sources, spelling and grammar correction, machine translation, natural language understanding, and speech understanding.



Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models

Neural Information Processing Systems

There are two updating strategies: 1) a mimicking strategy that generates similar samples based on the original data, preserving their stylistic and contextual essence, and 2) an extending strategy that further expands existing samples at varying cognitive levels, adapting Bloom's taxonomy of educational objectives.
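The two strategies above can be pictured as prompt builders for a generator model. This is a minimal sketch under assumptions: the function names, prompt wording, and use of all six Bloom levels are illustrative, not taken from the paper.

```python
# Sketch of the two dataset-updating strategies as prompt templates.
# Names and wording are illustrative assumptions, not the paper's prompts.

BLOOM_LEVELS = ["remember", "understand", "apply",
                "analyze", "evaluate", "create"]

def mimicking_prompt(sample: str) -> str:
    """Ask a generator LLM for a new item in the same style and context."""
    return (
        "Rewrite the following evaluation item as a NEW item testing the "
        "same skill, preserving its style and context but changing the "
        f"surface content:\n\n{sample}"
    )

def extending_prompt(sample: str, level: str) -> str:
    """Ask a generator LLM to expand the item to a target cognitive level."""
    if level not in BLOOM_LEVELS:
        raise ValueError(f"unknown Bloom level: {level}")
    return (
        f"Extend the following evaluation item to the '{level}' level of "
        f"Bloom's taxonomy, increasing its cognitive demand accordingly:"
        f"\n\n{sample}"
    )

prompt = extending_prompt("What is the boiling point of water?", "analyze")
```

In practice the returned prompts would be sent to a generator model and the outputs filtered for quality before being added to the benchmark.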


Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning

Neural Information Processing Systems

Recently, commonsense reasoning in text generation has attracted much attention. Generative commonsense reasoning is the task that requires machines, given a group of keywords, to compose a single coherent sentence with commonsense plausibility. While existing datasets targeting generative commonsense reasoning focus on everyday scenarios, it is unclear how well machines reason under specific geographical and temporal contexts.