AITopics | Pitcairn

Pitcairn

Cross-Task Inconsistency Based Active Learning (CTIAL) for Emotion Recognition

arXiv.org Artificial Intelligence, Dec-2-2024

Emotion recognition is a critical component of affective computing. Training accurate machine learning models for emotion recognition typically requires a large amount of labeled data. Because emotions are subtle and complex, multiple evaluators are usually needed for each affective sample to obtain its ground-truth label, which is expensive. To reduce the labeling cost, this paper proposes an inconsistency-based active learning approach for cross-task transfer between emotion classification and estimation. Affective norms are utilized as prior knowledge to connect the label spaces of categorical and dimensional emotions. Then, the prediction inconsistency on the two tasks for the unlabeled samples is used to guide sample selection in active learning for the target task. Experiments on within-corpus and cross-corpus transfers demonstrated that cross-task inconsistency could be a very valuable metric in active learning. To our knowledge, this is the first work that utilizes prior knowledge on affective norms and data in a different task to facilitate active learning for a new task, even when the two tasks are from different datasets.
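The selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the norm values, the two-dimensional (valence, arousal) space, the Euclidean distance, and the names `NORMS`, `expected_dimensions`, `inconsistency`, and `select_for_labeling` are all assumptions made for illustration. Class probabilities from a classifier are projected into the dimensional space through the norms, and the samples whose projection disagrees most with a regressor's output are picked for labeling.

```python
import math

# Hypothetical affective norms: category -> (valence, arousal).
# These numbers are illustrative placeholders, not published norm values.
NORMS = {
    "happy": (0.9, 0.7),
    "sad": (0.1, 0.3),
    "angry": (0.2, 0.9),
}


def expected_dimensions(probs):
    """Project class probabilities into dimensional space via the norms."""
    n_dims = len(next(iter(NORMS.values())))
    return tuple(
        sum(p * NORMS[label][d] for label, p in probs.items())
        for d in range(n_dims)
    )


def inconsistency(probs, predicted_dims):
    """Distance between the norm-implied point and the regressor output."""
    return math.dist(expected_dimensions(probs), predicted_dims)


def select_for_labeling(pool, k):
    """Return ids of the k unlabeled samples with the largest inconsistency."""
    ranked = sorted(
        pool,
        key=lambda s: inconsistency(s["probs"], s["dims"]),
        reverse=True,
    )
    return [s["id"] for s in ranked[:k]]


# Toy unlabeled pool: "probs" stands in for classifier output,
# "dims" for the emotion-estimation model's output on the same sample.
pool = [
    {"id": "a", "probs": {"happy": 1.0, "sad": 0.0, "angry": 0.0}, "dims": (0.9, 0.7)},
    {"id": "b", "probs": {"happy": 1.0, "sad": 0.0, "angry": 0.0}, "dims": (0.1, 0.3)},
    {"id": "c", "probs": {"happy": 0.5, "sad": 0.5, "angry": 0.0}, "dims": (0.5, 0.8)},
]

print(select_for_labeling(pool, 2))
```

In this toy pool, sample "a" is fully consistent across the two predictions, so it ranks last; the two models disagree on "b" and "c", which makes them the candidates for human labeling.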

dataset, dee, iemocap, (17 more...)

doi: 10.1109/TAFFC.2024.3366767

2412.01171

Country:

Oceania > Pitcairn (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(14 more...)

Genre: Research Report (1.00)

Industry:

Media (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Orgad, Hadas, Toker, Michael, Gekhman, Zorik, Reichart, Roi, Szpektor, Idan, Kotek, Hadas, Belinkov, Yonatan

arXiv.org Artificial Intelligence, Oct-28-2024

Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as "hallucinations". Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this property significantly enhances error detection performance. Yet, we show that such error detectors fail to generalize across datasets, implying that -- contrary to prior claims -- truthfulness encoding is not universal but rather multifaceted. Next, we show that internal representations can also be used for predicting the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies. Lastly, we reveal a discrepancy between LLMs' internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model's internal perspective, which can guide future research on enhancing error analysis and mitigation.

computational linguistic, large language model, machine learning, (16 more...)

2410.02707

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Connecticut (0.04)
North America > United States > Missouri (0.04)
(26 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Journalists, Emotions, and the Introduction of Generative AI Chatbots: A Large-Scale Analysis of Tweets Before and After the Launch of ChatGPT

Lewis, Seth C., Markowitz, David M., Bunquin, Jon Benedik

arXiv.org Artificial Intelligence, Sep-13-2024

As part of a broader look at the impact of generative AI, this study investigated the emotional responses of journalists to the release of ChatGPT at the time of its launch. By analyzing nearly 1 million Tweets from journalists at major U.S. news outlets, we tracked changes in emotional tone and sentiment before and after the introduction of ChatGPT in November 2022. Using various computational and natural language processing techniques to measure emotional shifts in response to ChatGPT's release, we found an increase in positive emotion and a more favorable tone post-launch, suggesting initial optimism toward AI's potential. This research underscores the pivotal role of journalists as interpreters of technological innovation and disruption, highlighting how their emotional reactions may shape public narratives around emerging technologies. The study contributes to understanding the intersection of journalism, emotion, and AI, offering insights into the broader societal impact of generative AI tools.

chatgpt, emotion, journalist, (14 more...)

2409.08761

Country:

North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Ireland (0.04)
(12 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Don't Kill the Baby: The Case for AI in Arbitration

Broyde, Michael, Mei, Yiyang

arXiv.org Artificial Intelligence, Aug-21-2024

Since the introduction of Generative AI (GenAI) in 2022, its ability to simulate human intelligence and generate content has sparked both enthusiasm and concern. While much of the criticism focuses on AI's potential to perpetuate bias, create emotional dissonance, displace jobs, and raise ethical questions, these concerns often overlook the practical benefits of AI, particularly in legal contexts. This article examines the integration of AI into arbitration, arguing that the Federal Arbitration Act (FAA) allows parties to contractually choose AI-driven arbitration, despite traditional reservations. The article makes three key contributions: (1) It shifts the focus from debates over AI's personhood to the practical aspects of incorporating AI into arbitration, asserting that AI can effectively serve as an arbitrator if both parties agree; (2) It positions arbitration as an ideal starting point for broader AI adoption in the legal field, given its flexibility and the autonomy it grants parties to define their standards of fairness; and (3) It outlines future research directions, emphasizing the importance of empirically comparing AI and human arbitration, which could lead to the development of distinct systems. By advocating for the use of AI in arbitration, this article underscores the importance of respecting contractual autonomy and creating an environment that allows AI's potential to be fully realized. Drawing on the insights of Judge Richard Posner, the article argues that the ethical obligations of AI in arbitration should be understood within the context of its technological strengths and the voluntary nature of arbitration agreements. Ultimately, it calls for a balanced, open-minded approach to AI in arbitration, recognizing its potential to enhance the efficiency, fairness, and flexibility of dispute resolution.

arbitration, broyde & yiyang mei, emory, (9 more...)

2408.11608

Country:

Europe > Italy (0.27)
Europe > France (0.04)
North America > United States > Texas (0.04)
(19 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law > Alternative Dispute Resolution (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(3 more...)

MIRAI: Evaluating LLM Agents for Event Forecasting

Ye, Chenchen, Hu, Ziniu, Deng, Yihe, Huang, Zijie, Ma, Mingyu Derek, Zhu, Yanqiao, Wang, Wei

arXiv.org Artificial Intelligence, Jul-1-2024

Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and reason over it to solve complex problems. Given this capability, there is increasing interest in employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is no rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge from diverse formats and times to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.

cameocode, isocode, relation, (15 more...)

2407.01231

Country:

Asia > North Korea (0.14)
Oceania > Australia > Australian Indian Ocean Territories > Territory of Cocos (Keeling) Islands (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(234 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Law (1.00)
Government > Foreign Policy (1.00)
Government > Military (0.93)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

Shieh, Evan, Vassel, Faye-Marie, Sugimoto, Cassidy, Monroe-White, Thema

arXiv.org Artificial Intelligence, Apr-16-2024

The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this "laissez-faire" setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner compared to representative or empowering portrayals. We also document a prevalence of stereotypes (e.g. perpetual foreigner) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and invest in critical AI education programs tailored towards empowering diverse consumers.

chatgpt3, claude2, dataset, (15 more...)

2404.07475

Country:

North America > Haiti (0.27)
Europe (0.14)
Asia > Timor-Leste (0.14)
(55 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PetKaz at SemEval-2024 Task 3: Advancing Emotion Classification with an LLM for Emotion-Cause Pair Extraction in Conversations

Kazakov, Roman, Petukhova, Kseniia, Kochmar, Ekaterina

arXiv.org Artificial Intelligence, Apr-8-2024

In this paper, we present our submission to SemEval-2024 Task 3, "The Competition of Multimodal Emotion Cause Analysis in Conversations", focusing on extracting emotion-cause pairs from dialogs. Specifically, our approach combines a fine-tuned GPT-3.5 for emotion classification with a BiLSTM-based neural network to detect causes. We rank 2nd in Subtask 1, demonstrating the effectiveness of our approach with one of the highest weighted-average proportional F1 scores recorded, 0.264.

emotion, extraction, utterance, (14 more...)

2404.05502

Country:

Oceania > Pitcairn (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems

Greene, Michelle R., Josyula, Mariam, Si, Wentao, Hart, Jennifer A.

arXiv.org Artificial Intelligence, Jan-23-2024

Computer-based scene understanding has influenced fields ranging from urban planning to autonomous vehicle performance, yet little is known about how well these technologies work across social differences. We investigate the biases of deep convolutional neural networks (dCNNs) in scene classification, using nearly one million images from global and US sources, including user-submitted home photographs and Airbnb listings. We applied statistical models to quantify the impact of socioeconomic indicators such as family income, Human Development Index (HDI), and demographic factors from public data sources (CIA and US Census) on dCNN performance. Our analyses revealed significant socioeconomic bias, where pretrained dCNNs demonstrated lower classification accuracy, lower classification confidence, and a higher tendency to assign labels that could be offensive when applied to homes (e.g., "ruin", "slum"), especially in images from homes with lower socioeconomic status (SES). This trend is consistent across two datasets of international images and within the diverse economic and racial landscapes of the United States. This research contributes to understanding biases in computer vision, emphasizing the need for more inclusive and representative training datasets. By mitigating the bias in the computer vision pipelines, we can ensure fairer and more equitable outcomes for applied computer vision, including home valuation and smart home security systems. There is urgency in addressing these biases, which can significantly impact critical decisions in urban development and resource allocation. Our findings also motivate the development of AI systems that better understand and serve diverse communities, moving towards technology that equitably benefits all sectors of society.

classification, classification entropy, dataset, (13 more...)

2401.13097

Country:

North America > United States (0.67)
Oceania > Samoa (0.04)
Oceania > Pitcairn (0.04)
(204 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Smart Houses & Appliances (0.54)
Health & Medicine > Public Health (0.48)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution

Lu, Yining, Yu, Haoping, Khashabi, Daniel

arXiv.org Artificial Intelligence, Jul-17-2023

Augmenting large language models (LLMs) to use external tools enhances their performance across a variety of tasks. However, prior works over-rely on task-specific demonstrations of tool use, which limits their generalizability and raises computational cost because of the many calls made to large-scale LLMs. We introduce GEAR, a computationally efficient query-tool grounding algorithm that generalizes to various tasks that require tool use without relying on task-specific demonstrations. GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLMs) and LLMs, respectively, while leveraging semantic and pattern-based evaluation at both question and answer levels for generalizable tool grounding. We evaluate GEAR on 14 datasets across 6 downstream tasks, demonstrating its strong generalizability to novel tasks, tools, and different SLMs. Despite being more efficient, GEAR achieves higher precision in tool grounding than prior strategies that use LLM prompting, thus improving downstream accuracy at a reduced computational cost. For example, we demonstrate that GEAR-augmented GPT-J and GPT-3 outperform counterpart tool-augmented baselines because of better tool use.

accuracy, large language model, machine learning, (17 more...)

2307.08775

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Africa > Ghana (0.05)
Oceania > Pitcairn (0.04)
(8 more...)

Genre:

Research Report (0.82)
Questionnaire & Opinion Survey (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Transformer-based Text Classification on Unified Bangla Multi-class Emotion Corpus

Sourav, Md Sakib Ullah, Wang, Huidong, Mahmud, Mohammad Sultan, Zheng, Hua

arXiv.org Artificial Intelligence, Jun-13-2023

While considerable research has been done on identifying emotions from visual and auditory data, emotion recognition from textual data is still a new and active study topic [4]. WeChat, Twitter, YouTube, Instagram, and Facebook, as well as other Web 2.0 platforms or social networks (SNs), have recently emerged as the most important platforms for social communication [32], education [23], information exchange [31], and other purposes [2, 9, 10] among a variety of people. SN users connect, share their thoughts, feelings, and ideas, and participate in discussion groups. Analyzing text conversations, or more specifically emotion classification (EC), is essential to comprehending people's activities, since the internet's invisible nature has made it possible for a single user to engage in violent speech on SNs [19]. EC is a subset of sentiment analysis (SA).

artificial intelligence, machine learning, natural language, (15 more...)

2210.06405

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Oceania > Pitcairn (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
