AITopics | Commonsense Reasoning

Commonsense Reasoning

Knowledge that Everyone Knows. "People do not walk on their heads." The assertion comes about 900 statements deep into the 527,308 items that comprise the Open Mind common sense database. It's after "Laws are the rules of society" and before "The sky is blue during the day." This collection of mundane facts, which would take more than 20,000 pages to print out, consists entirely of statements so unremarkable they are barely worth stating. Most of us would correctly dismiss them as common sense.
– from D.C. Denison, Guess who's smarter. Boston Globe Online (page hosted at MIT), May 26, 2003.

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Jiang, Bowen, Zhuang, Zhijun, Taylor, Camillo Jose

arXiv.org Artificial Intelligence, Nov-21-2023

This work presents an enhanced approach to generating scene graphs by incorporating a relationship hierarchy and commonsense knowledge. Specifically, we propose a Bayesian classification head that exploits an informative hierarchical structure. It jointly predicts the super-category or type of relationship between the two objects, along with the detailed relationship under each super-category. We design a commonsense validation pipeline that uses a large language model to critique the results from the scene graph prediction system and then use that feedback to enhance the model performance. The system requires no external large language model assistance at test time, making it more convenient for practical applications. Experiments on the Visual Genome and the OpenImage V6 datasets demonstrate that harnessing hierarchical relationships enhances the model performance by a large margin. The proposed Bayesian head can also be incorporated as a portable module in existing scene graph generation algorithms to improve their results. In addition, the commonsense validation enables the model to generate an extensive set of reasonable predictions beyond dataset annotations.
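
The joint prediction described above can be illustrated with a small sketch. It is not the authors' implementation: the hierarchy, the relationship names, and the logits are hypothetical, and the only idea taken from the abstract is that the probability of a detailed relationship is the probability of its super-category multiplied by the conditional probability of the relationship within that super-category.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-level relationship hierarchy (names chosen for illustration).
HIERARCHY = {
    "geometric": ["above", "behind", "under"],
    "possessive": ["has", "wearing"],
    "semantic": ["eating", "riding"],
}


def joint_relationship_probs(super_logits, fine_logits):
    """Return P(rel) = P(super) * P(rel | super) for every detailed relationship."""
    p_super = softmax(super_logits)
    joint = {}
    for name, p_s in zip(HIERARCHY, p_super):
        p_fine = softmax(fine_logits[name])  # distribution within one super-category
        for rel, p_r in zip(HIERARCHY[name], p_fine):
            joint[rel] = p_s * p_r
    return joint


# Made-up logits standing in for the outputs of a classification head.
probs = joint_relationship_probs(
    super_logits=[0.2, 0.1, 2.0],
    fine_logits={
        "geometric": [1.0, 0.0, 0.0],
        "possessive": [0.5, 0.5],
        "semantic": [0.1, 3.0],
    },
)
best = max(probs, key=probs.get)
```

Because each conditional distribution sums to one, the joint probabilities over all detailed relationships also sum to one, so they can be ranked directly.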

language model, prediction, scene graph generation, (9 more...)

arXiv:2311.12889

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning?

Qin, Chengwei, Li, Qian, Zhao, Ruochen, Joty, Shafiq

arXiv.org Artificial Intelligence, Nov-19-2023

Prompt tuning (PT), which only tunes the embeddings of an additional sequence of tokens per task, keeping the pre-trained language model (PLM) frozen, has shown remarkable performance in few-shot learning. Despite this, PT has been shown to rely heavily on good initialization of the prompt embeddings. In this work, we study meta prompt tuning (MPT) to systematically explore how meta-learning can help improve (if it can) cross-task generalization in PT through learning to initialize the prompt embeddings from other relevant tasks. We empirically analyze a representative set of meta-learning algorithms in a wide range of adaptation settings with different source/target task configurations on a large set of few-shot tasks. With extensive experiments and analysis, we demonstrate the effectiveness of MPT. We find the improvement to be significant particularly on classification tasks. For other kinds of tasks such as question answering, we observe that while MPT can outperform PT in most cases, it does not always outperform multi-task learning. We further provide an in-depth analysis from the perspective of task similarity.

computational linguistic, proceedings, target task, (13 more...)

arXiv:2302.08143

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > Washington > King County > Seattle (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(24 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

Peng, Xiangyu, Li, Siyan, Wiegreffe, Sarah, Riedl, Mark

arXiv.org Artificial Intelligence, Nov-17-2023

Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning. Furthermore, existing methods generally focus only on single-character stories, or fail to track characters at all. To improve the coherence of generated narratives and to expand the scope of character-centric narrative generation, we introduce Commonsense-inference Augmented neural StoryTelling (CAST), a framework for introducing commonsense reasoning into the generation process with the option to model the interaction between multiple characters. We find that our CAST method produces significantly more coherent, on-topic, enjoyable and fluent stories than existing models in both the single-character and two-character settings in three storytelling domains.

bob, inference, proceedings, (16 more...)

arXiv:2105.01311

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (1.00)
Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives

Chen, Jiaju, Lu, Yuxuan, Zhang, Shao, Yao, Bingsheng, Dong, Yuanzhe, Xu, Ying, Li, Yunyao, Wang, Qianwen, Wang, Dakuo, Sun, Yuling

arXiv.org Artificial Intelligence, Nov-16-2023

AI models (including LLMs) often rely on narrative question-answering (QA) datasets to provide customized QA functionalities to support downstream children's education applications; however, existing datasets only include QA pairs that are grounded within the given storybook content, whereas children can learn more when teachers relate the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce the FairytaleCQA dataset, which is annotated by children's education experts, to supplement 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narrative but also contain commonsense knowledge grounded in an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned with FairytaleCQA reliably outperforms much larger prompt-engineered LLMs (e.g., GPT-4) in this new QA-pair generation task (QAG). This result suggests that: 1) our dataset brings novel challenges to existing LLMs, and 2) human experts' data annotation is still critical, as experts have nuanced knowledge of the children's education domain that LLMs lack.

knowledge, pipeline, triplet, (16 more...)

arXiv:2311.09756

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(11 more...)

Genre:

Workflow (0.95)
Research Report > New Finding (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

An Overview Of Temporal Commonsense Reasoning and Acquisition

Wenzel, Georg, Jatowt, Adam

arXiv.org Artificial Intelligence, Nov-16-2023

Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events, and use it to reason over problems requiring such knowledge. This trait is essential in temporal natural language processing tasks, with possible applications such as timeline summarization, temporal question answering, and temporal natural language inference. Recent research on the performance of large language models suggests that, although they are adept at generating syntactically correct sentences and solving classification tasks, they often take shortcuts in their reasoning and fall prey to simple linguistic traps. This article provides an overview of research in the domain of temporal commonsense reasoning, particularly focusing on enhancing language model performance through a variety of augmentations and their evaluation across a growing number of datasets. However, these augmented models still struggle to approach human performance on reasoning tasks over temporal common sense properties, such as the typical occurrence times, orderings, or durations of events. We further emphasize the need for careful interpretation of research to guard against overpromising evaluation results in light of the shallow reasoning present in transformers. This can be achieved by appropriately preparing datasets and suitable evaluation metrics.

dataset, proceedings, reasoning, (14 more...)

arXiv:2308.00002

Country:

North America > United States (0.67)
Europe > Austria > Tyrol > Innsbruck (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.46)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

Li, Bohan, Dou, Longxu, Hou, Yutai, Feng, Yunlong, Mu, Honglin, Zhu, Qingfu, Sun, Qinghua, Che, Wanxiang

arXiv.org Artificial Intelligence, Nov-11-2023

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template. This approach demonstrates its effectiveness, especially in few-shot learning scenarios, where the model is trained on a scarce amount of data. Despite its successes, the limited templates and text in few-shot prompt-based learning scenarios leave significant room for performance improvement. Moreover, existing methods sometimes resort to model ensembles, which, while effective, could potentially hamper model efficiency due to increased computational demands. To address these issues, we introduce MixPro, an augmentation method designed to augment both the vanilla input text and the templates. We implement this through the token-level, the sentence-level, and the template-level Mixup strategies. The experimental results on five few-shot datasets show that MixPro outperforms other augmentation baselines, improving model performance by an average of 5.08% compared to before augmentation.
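
For readers unfamiliar with Mixup, the general technique the abstract builds on, here is a minimal sketch. It shows only the generic interpolation step between two embedding vectors. How MixPro applies it at the token, sentence, and template levels is not specified in the abstract, so the function names, vectors, and the `alpha` parameter below are illustrative assumptions.

```python
import random


def sample_lambda(alpha=0.5, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha), as in standard Mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(vec_a, vec_b, lam):
    """Linearly interpolate two equal-length vectors: lam * a + (1 - lam) * b."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]


# Hypothetical 2-dimensional embeddings of two training examples.
mixed = mixup([1.0, 0.0], [0.0, 1.0], lam=0.25)  # [0.25, 0.75]

# In training, the coefficient would normally be sampled rather than fixed.
lam = sample_lambda()
```

In standard Mixup the labels are interpolated with the same coefficient as the inputs, so the mixed example carries a soft target.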

augmentation, proceedings, template, (13 more...)

arXiv:2304.09402

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

InfoEntropy Loss to Mitigate Bias of Learning Difficulties for Generative Language Models

Su, Zhenpeng, Wu, Xing, Bai, Xue, Lin, Zijia, Chen, Hui, Ding, Guiguang, Zhou, Wei, Hu, Songlin

arXiv.org Artificial Intelligence, Nov-10-2023

Generative language models are usually pretrained on a large text corpus by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in the text corpus during training, i.e., the imbalance between frequent tokens and infrequent ones. This imbalance can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose an Information Entropy Loss (InfoEntropy Loss) function. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed InfoEntropy Loss can gain consistent performance improvement on downstream benchmarks.
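
The idea of scaling a token's loss by the entropy of the predicted distribution can be sketched as follows. The abstract does not give the exact formula, so the form used here (cross-entropy multiplied by entropy raised to a hypothetical exponent `gamma`) is an assumption made for illustration, not the paper's definition.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_scaled_loss(probs, target_index, gamma=1.0):
    """Cross-entropy for one token, weighted by the entropy of the prediction.

    Assumed form: entropy(probs) ** gamma * cross_entropy. A flat, uncertain
    prediction (high entropy) receives a larger weight than a peaked one.
    """
    cross_entropy = -math.log(max(probs[target_index], 1e-12))  # guard against log(0)
    return (entropy(probs) ** gamma) * cross_entropy


# Two made-up predictions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # easy token: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]  # hard token: maximum entropy, log(4)

easy_loss = entropy_scaled_loss(confident, target_index=0)
hard_loss = entropy_scaled_loss(uncertain, target_index=0)
```

Under this assumed form, the uniform prediction has entropy log(4) and cross-entropy log(4), so its scaled loss is log(4) squared, while the confident prediction contributes far less.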

focal loss, infoentropy loss, language model, (15 more...)

arXiv:2310.19531

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)
(19 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)

BRAINTEASER: Lateral Thinking Puzzles for Large Language Models

Jiang, Yifan, Ilievski, Filip, Ma, Kaixin, Sourati, Zhivar

arXiv.org Artificial Intelligence, Nov-9-2023

The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking and defy default commonsense associations. We design a three-step procedure for creating the first lateral thinking benchmark, consisting of data collection, distractor generation, and generation of adversarial examples, leading to 1,100 puzzles with high-quality annotations. To assess the consistency of lateral reasoning by models, we enrich BRAINTEASER based on a semantic and contextual reconstruction of its questions. Our experiments with state-of-the-art instruction- and commonsense language models reveal a significant gap between human and model performance, which is further widened when consistency across adversarial formats is considered. We make all of our code and data available to stimulate work on developing and evaluating lateral thinking models.

computational linguistic, lateral, puzzle, (16 more...)

arXiv:2310.05057

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(13 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems

Ghazarian, Sarik, Shao, Yijia, Han, Rujun, Galstyan, Aram, Peng, Nanyun

arXiv.org Artificial Intelligence, Nov-3-2023

Commonsense reasoning is omnipresent in human communications and thus is an important feature for open-domain dialogue systems. However, evaluating commonsense in dialogue systems is still an open challenge. We take the first step by focusing on event commonsense that considers events and their relations, and is crucial in both dialogues and general commonsense reasoning. We propose ACCENT, an event commonsense evaluation metric empowered by commonsense knowledge bases (CSKBs). ACCENT first extracts event-relation tuples from a dialogue, and then evaluates the response by scoring the tuples in terms of their compatibility with the CSKB. To evaluate ACCENT, we construct the first public event commonsense evaluation dataset for open-domain dialogues. Our experiments show that ACCENT is an efficient metric for event commonsense evaluation, which achieves higher correlations with human judgments than existing baselines.

computational linguistic, relation, tuple, (16 more...)

arXiv:2305.07797

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.04)
(10 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.46)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.89)

What's In My Big Data?

Elazar, Yanai, Bhagia, Akshita, Magnusson, Ian, Ravichander, Abhilasha, Schwenk, Dustin, Suhr, Alane, Walsh, Pete, Groeneveld, Dirk, Soldaini, Luca, Singh, Sameer, Hajishirzi, Hanna, Smith, Noah A., Dodge, Jesse

arXiv.org Artificial Intelligence, Oct-31-2023

Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corpora. WIMBD builds on two basic capabilities -- count and search -- at scale, which allows us to analyze more than 35 terabytes on a standard compute node. We apply WIMBD to ten different corpora used to train popular language models, including C4, The Pile, and RedPajama. Our analysis uncovers several surprising and previously undocumented findings about these corpora, including the high prevalence of duplicate, synthetic, and low-quality content, personally identifiable information, toxic language, and benchmark contamination. For instance, we find that about 50% of the documents in RedPajama and LAION-2B-en are duplicates. In addition, several datasets used for benchmarking models trained on such corpora are contaminated with respect to important benchmarks, including the Winograd Schema Challenge and parts of GLUE and SuperGLUE. We open-source WIMBD's code and artifacts to provide a standard set of evaluations for new text-based corpora and to encourage more analyses and transparency around them: github.com/allenai/wimbd.
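
The two capabilities named in the abstract, count and search, can be shown on a toy scale with a short sketch. This is not the WIMBD code or its API: the function names, the in-memory corpus, and the definition of the duplicate fraction are all assumptions made for illustration, and a real system would need indexing to handle terabytes of text.

```python
from collections import Counter


def count_ngrams(documents, n=2):
    """Count whitespace-tokenized, lowercased n-grams across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        # zip over n shifted views of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def search(documents, query):
    """Return indices of documents containing the query (case-insensitive)."""
    q = query.lower()
    return [i for i, doc in enumerate(documents) if q in doc.lower()]


def duplicate_fraction(documents):
    """Fraction of documents whose exact text occurs more than once (assumed definition)."""
    counts = Counter(documents)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(documents) if documents else 0.0


# Tiny made-up corpus.
corpus = [
    "The sky is blue",
    "the sky is blue during the day",
    "The sky is blue",
]

bigrams = count_ngrams(corpus, n=2)
hits = search(corpus, "during the day")  # [1]
dup = duplicate_fraction(corpus)         # 2 of 3 documents are exact duplicates
```

The same search primitive gives a crude contamination check: querying the corpus for the text of a benchmark example reports which documents contain it.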

corpora, dataset, laion-2b-en, (15 more...)

arXiv:2310.20707

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > India (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(21 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Education (1.00)
Leisure & Entertainment (0.92)
Information Technology > Services (0.67)
Health & Medicine > Therapeutic Area (0.45)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)
