Chhaya, Niyati
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs
Haq, Saiful, Chhaya, Niyati, Pandey, Piyush, Bhattacharyya, Pushpak
In this paper, we present an investigative study on how Mental Sets influence the reasoning capabilities of LLMs. LLMs have excelled in diverse natural language processing (NLP) tasks, driven by advancements in parameter-efficient fine-tuning (PEFT) and emergent capabilities like in-context learning (ICL). For complex reasoning tasks, selecting the right model for PEFT or ICL is critical, and this choice often relies on scores on benchmarks such as MMLU, MATH, and GSM8K. However, current evaluation methods, based on metrics like F1 score or reasoning-chain assessments by larger models, overlook a key dimension: adaptability to unfamiliar situations and the ability to overcome entrenched thinking patterns. In cognitive psychology, a Mental Set refers to the tendency to persist with previously successful strategies even when they become inefficient, a challenge for problem solving and reasoning. We compare the performance of LLMs such as Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, and GPT-4o in the presence of mental sets. To the best of our knowledge, this is the first study to integrate cognitive psychology concepts into the evaluation of LLMs for complex reasoning tasks, providing deeper insights into their adaptability and problem-solving efficacy.
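The notion of a Mental Set can be illustrated with the classic Luchins water-jar setup: prime a solver with problems that all yield to one fixed recipe, then test on a problem that also admits a much simpler answer. The sketch below is a hypothetical illustration of that idea, not the paper's actual protocol; `generate` stands in for an unspecified LLM client.

```python
# Illustrative sketch (not the paper's protocol) of probing for a mental set.
# The priming examples all follow the fixed recipe B - A - 2*C; the test item
# is also solvable by the much simpler A - C. Persisting with the primed
# recipe is the mental-set effect being studied.
# `generate` is a placeholder for whatever LLM client is in use.
PRIMING_EXAMPLES = [
    "Jars A=21, B=127, C=3; target 100. Solution: B - A - 2*C = 100.",
    "Jars A=14, B=163, C=25; target 99. Solution: B - A - 2*C = 99.",
    "Jars A=18, B=43, C=10; target 5. Solution: B - A - 2*C = 5.",
]
# Solvable by the primed recipe (49 - 23 - 2*3 = 20), but A - C = 20 is simpler.
TEST_ITEM = "Jars A=23, B=49, C=3; target 20. Solution:"

def build_probe() -> str:
    return "\n".join(PRIMING_EXAMPLES + [TEST_ITEM])

def is_trapped(model_answer: str) -> bool:
    # Crude heuristic: did the model reuse the primed three-jar recipe
    # instead of the shorter A - C solution?
    return "B" in model_answer and "2*C" in model_answer

# response = generate(build_probe())   # placeholder LLM call
# print(is_trapped(response))
```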
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Kalarani, Abisek Rajakumar, Bhattacharyya, Pushpak, Chhaya, Niyati, Shekhar, Sumit
Well-formed, context-aware image captions and tags in enterprise content such as marketing material are critical to ensure brand presence and content recall. Manually creating and updating them is non-trivial given the scale and tedium of the task. We propose a new unified Vision-Language (VL) model based on the One For All (OFA) model, with a focus on context-assisted image captioning, where the caption is generated from both the image and its context. Our approach aims to overcome the context-independent nature (image and text are treated independently) of existing approaches. We exploit context by pretraining our model on datasets for three tasks: news image captioning, where the news article is the context; contextual visual entailment; and keyword extraction from the context. The second pretraining task is a new VL task, and we construct and release two datasets for it with 1.1M and 2.2K data instances. Our system achieves state-of-the-art results with an improvement of up to 8.34 CIDEr points on the benchmark news image captioning datasets. To the best of our knowledge, ours is the first effort to incorporate contextual information in pretraining models for VL tasks.
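The three context-based pretraining tasks lend themselves to a single text-to-text formulation in the spirit of OFA-style unified models. The sketch below is an illustrative guess at such preprocessing; the prompt templates and field names are assumptions, not the paper's actual pipeline.

```python
# Schematic sketch: cast the three context-based pretraining tasks into one
# unified sequence-to-sequence format. Prompts and field names are
# illustrative assumptions.
def to_seq2seq(example):
    task = example["task"]
    if task == "news_captioning":          # context = news article, target = caption
        src = f"caption the image given the article: {example['article']}"
        tgt = example["caption"]
    elif task == "contextual_entailment":  # does the context entail a statement about the image?
        src = (f"does the context entail the statement? "
               f"context: {example['context']} statement: {example['hypothesis']}")
        tgt = example["label"]             # e.g. "yes" / "no"
    elif task == "keyword_extraction":     # pull salient keywords from the context
        src = f"extract keywords from the context: {example['context']}"
        tgt = ", ".join(example["keywords"])
    else:
        raise ValueError(f"unknown task: {task}")
    return {"image": example.get("image"), "source": src, "target": tgt}
```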
Variational Fusion for Multimodal Sentiment Analysis
Majumder, Navonil, Poria, Soujanya, Krishnamurthy, Gangeshwar, Chhaya, Niyati, Mihalcea, Rada, Gelbukh, Alexander
This is important, as more and more enterprises tend to make business decisions based on the user sentiment behind their products as expressed through these videos. Multimodal fusion is considered a key step in multimodal sentiment analysis. Most recent work on multimodal fusion (Poria et al., 2017; Zadeh et al., 2018c) has focused on the strategy of obtaining a multimodal representation from the independent unimodal representations. Our approach takes this strategy one step further, by also requiring that the original unimodal representations be reconstructed from the unified multimodal representation. The motivation behind this is the intuition that different modalities are an expression of the state of the mind. Hence, if we assume that the fused representation is the mind-state/sentiment/emotion, then in our approach we are ensuring that the fused representation can be mapped back to the unimodal representations, which should improve the quality of the multimodal representation. In this paper, we empirically argue that this is the case by showing that this approach outperforms the state-of-the-art in multimodal fusion. We employ a variational autoencoder (VAE) (Kingma and Welling, 2014), where the encoder network generates a latent representation from the unimodal representations.
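The reconstruction constraint described here can be made concrete with a small sketch. The following minimal PyTorch module illustrates the general idea (fuse unimodal features into a VAE latent, reconstruct them, and predict sentiment from the fused code); the feature sizes, layer widths, and loss weighting are arbitrary assumptions rather than the paper's architecture.

```python
# Minimal PyTorch sketch of VAE-style multimodal fusion with a reconstruction
# objective. Dimensions and loss weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalFusion(nn.Module):
    def __init__(self, dims=(300, 74, 35), latent_dim=64):
        super().__init__()
        total = sum(dims)                         # text + audio + visual feature sizes (assumed)
        self.enc = nn.Linear(total, 256)
        self.mu = nn.Linear(256, latent_dim)      # mean of q(z | x)
        self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z | x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, total))
        self.clf = nn.Linear(latent_dim, 1)       # sentiment head on the fused code

    def forward(self, text, audio, visual):
        x = torch.cat([text, audio, visual], dim=-1)
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(z)                       # reconstruct the unimodal features
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        recon_loss = F.mse_loss(recon, x)
        return self.clf(mu), recon_loss + kl      # prediction + auxiliary VAE loss
```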
Reports of the Workshops of the 32nd AAAI Conference on Artificial Intelligence
Bouchard, Bruno (Université du Québec à Chicoutimi) | Bouchard, Kevin (Université du Québec à Chicoutimi) | Brown, Noam (Carnegie Mellon University) | Chhaya, Niyati (Adobe Research, Bangalore) | Farchi, Eitan (IBM Research, Haifa) | Gaboury, Sebastien (Université du Québec à Chicoutimi) | Geib, Christopher (Smart Information Flow Technologies) | Gyrard, Amelie (Wright State University) | Jaidka, Kokil (University of Pennsylvania) | Keren, Sarah (Technion – Israel Institute of Technology) | Khardon, Roni (Tufts University) | Kordjamshidi, Parisa (Tulane University) | Martinez, David (MIT Lincoln Laboratory) | Mattei, Nicholas (IBM Research, TJ Watson) | Michalowski, Martin (University of Minnesota School of Nursing) | Mirsky, Reuth (Ben Gurion University) | Osborn, Joseph (Pomona College) | Sahin, Cem (MIT Lincoln Laboratory) | Shehory, Onn (Bar Ilan University) | Shaban-Nejad, Arash (University of Tennessee Health Science Center) | Sheth, Amit (Wright State University) | Shimshoni, Ilan (University of Haifa) | Shrobe, Howie (Massachusetts Institute of Technology) | Sinha, Arunesh (University of Southern California) | Sinha, Atanu R. (Adobe Research, Bangalore) | Srivastava, Biplav (IBM Research, Yorktown Heights) | Streilein, William (MIT Lincoln Laboratory) | Theocharous, Georgios (Adobe Research, San Jose) | Venable, K. Brent (Tulane University and IHMC) | Wagner, Neal (MIT Lincoln Laboratory) | Zamansky, Anna (University of Haifa)
The AAAI-18 workshop program included 15 workshops covering a wide range of topics in AI. Workshops were held Friday and Saturday, February 2–3, 2018, at the Hilton New Orleans Riverside in New Orleans, Louisiana, USA. This report contains summaries of the Affective Content Analysis workshop; the Artificial Intelligence Applied to Assistive Technologies and Smart Environments workshop; the AI and Marketing Science workshop; the Artificial Intelligence for Cyber Security workshop; the AI for Imperfect-Information Games workshop; the Declarative Learning Based Programming workshop; the Engineering Dependable and Secure Machine Learning Systems workshop; the Health Intelligence workshop; the Knowledge Extraction from Games workshop; the Plan, Activity, and Intent Recognition workshop; the Planning and Inference workshop; the Preference Handling workshop; the Reasoning and Learning for Human-Machine Dialogues workshop; and the AI Enhanced Internet of Things Data Processing for Intelligent Applications workshop.
Aff2Vec: Affect-Enriched Distributional Word Representations
Khosla, Sopan, Chhaya, Niyati, Chawla, Kushal
Human communication includes information, opinions, and reactions. Reactions are often captured by the affective messages in written as well as verbal communication. While there has been work on affect modeling and, to some extent, affective content generation, the area of affective word distributions is not well studied. Synsets and lexica capture semantic relationships across words; however, these models do not encode affective or emotional interpretations of words. Our proposed model, Aff2Vec, provides a method for enriched word embeddings that are representative of affective interpretations of words. Aff2Vec outperforms the state-of-the-art in intrinsic word-similarity tasks. Further, Aff2Vec representations outperform baseline embeddings in downstream natural language understanding tasks including sentiment analysis, personality detection, and frustration prediction.
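As a rough illustration of affect enrichment, the sketch below appends lexicon-based affect scores to pretrained word vectors. Aff2Vec itself explores richer enrichment strategies; this shows only the simplest concatenation variant, and the lexicon format is an assumption.

```python
# Minimal sketch: enrich word embeddings by appending affect lexicon scores
# (e.g. valence, arousal, dominance). Lexicon format is an assumption.
import numpy as np

def affect_enrich(embeddings, affect_lexicon, affect_dim=3):
    """embeddings: {word: np.ndarray}; affect_lexicon: {word: (valence, arousal, dominance)}."""
    neutral = np.full(affect_dim, 0.5)            # fallback for words missing from the lexicon
    enriched = {}
    for word, vec in embeddings.items():
        affect = np.asarray(affect_lexicon.get(word, neutral), dtype=vec.dtype)
        enriched[word] = np.concatenate([vec, affect])   # d -> d + affect_dim
    return enriched

# Example with toy vectors:
emb = {"joy": np.random.rand(300).astype(np.float32),
       "table": np.random.rand(300).astype(np.float32)}
lex = {"joy": (0.95, 0.70, 0.60)}
assert affect_enrich(emb, lex)["joy"].shape == (303,)
```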
Editorial for the AAAI-18 Workshop on Affective Content Analysis
Chhaya, Niyati (Big Data Experience Lab, Adobe Research) | Jaidka, Kokil (University of Pennsylvania) | Ungar, Lyle H. (University of Pennsylvania)
The first AAAI-18 Workshop on Affective Content Analysis was an interdisciplinary platform that focused on the analysis of emotions, sentiments, and attitudes in textual, visual, and multimodal content for applications in psychology, consumer behavior, language understanding, and computer vision. The program comprised interdisciplinary keynotes, original research presentations, a poster session, and short pitches for datasets and pre-published work.