Beyond Semantic Similarity: Reducing Unnecessary API Calls via Behavior-Aligned Retriever

arXiv.org Artificial Intelligence

Tool-augmented large language models (LLMs) leverage external functions to extend their capabilities, but inaccurate function calls can lead to inefficiencies and increased costs. Existing methods address this challenge by fine-tuning LLMs or using demonstration-based prompting, yet they often suffer from high training overhead and fail to account for inconsistent demonstration samples, which misguide the model's invocation behavior. In this paper, we train a behavior-aligned retriever (BAR), which provides behaviorally consistent demonstrations to help LLMs make more accurate tool-using decisions. To train the BAR, we construct a corpus including different function-calling behaviors, i.e., calling or non-calling. We use the contrastive learning framework to train the BAR with customized positive/negative pairs and a dual-negative contrastive loss, ensuring robust retrieval of behaviorally consistent examples. Experiments demonstrate that our approach significantly reduces erroneous function calls while maintaining high task performance, offering a cost-effective and efficient solution for tool-augmented LLMs.


Security Concerns for Large Language Models: A Survey

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing (NLP), including text generation, translation, summarization, and code synthesis, thereby revolutionizing a wide range of AI applications [10, 56, 45]. Models such as OpenAI's ChatGPT series, Google's Gemini, and Anthropic's Claude have been widely deployed in commercial systems, including search engines, customer support, software development tools, and personal assistants [45, 55, 3]. However, as their capabilities grow, so do their attack surfaces and the potential for misuse [51, 77, 50]. While the scale and specific nature of these vulnerabilities are new, the fundamental challenge of ensuring that powerful AI systems operate safely and align with human intent is a longstanding concern in the AI community. Foundational work, such as the identification of concrete problems in AI safety long before the current LLM era, laid the groundwork for understanding issues like reward hacking and negative side effects that remain highly relevant today [1]. The susceptibility arises because the models are trained on vast, yet imperfectly curated, datasets containing potentially harmful content, and because they interact with users through open-ended prompts that can be manipulated [48, 17, 16]. Researchers and practitioners are increasingly concerned that these systems can be manipulated, misused, or even behave in misaligned and potentially deceptive ways [25, 42, 6]. Consequently, the security and alignment of LLMs have become critical areas of study, requiring an understanding of emergent threats and robust, multi-faceted defenses [17, 70, 43].


X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

arXiv.org Artificial Intelligence

Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet, the vast majority of prior work has focused on single-turn safety, while adaptability and diversity remain among the key challenges of multi-turn red-teaming. To address these challenges, we present X-Teaming, a scalable framework that systematically explores how seemingly harmless interactions escalate into harmful outcomes and generates corresponding attack scenarios. X-Teaming employs collaborative agents for planning, attack optimization, and verification, achieving state-of-the-art multi-turn jailbreak effectiveness and diversity with success rates up to 98.1% across representative leading open-weight and closed-source models. In particular, X-Teaming achieves a 96.2% attack success rate against the latest Claude 3.7 Sonnet model, which has been considered nearly immune to single-turn attacks. Building on X-Teaming, we introduce XGuard-Train, an open-source multi-turn safety training dataset that is 20x larger than the previous best resource, comprising 30K interactive jailbreaks, designed to enable robust multi-turn safety alignment for LMs. Our work offers essential tools and insights for mitigating sophisticated conversational attacks, advancing the multi-turn safety of LMs.


Elon Musk's xAI Sues Apple and OpenAI Over App Store Rankings

WIRED

Elon Musk's xAI filed a lawsuit against Apple and OpenAI on Monday, accusing the companies of behaving like monopolies and claiming Apple deprioritized ChatGPT rivals like Grok in the App Store. "This is a tale of two monopolists joining forces to ensure their continued dominance in a world rapidly driven by the most powerful technology humanity has ever created: artificial intelligence," the lawsuit alleges. "Working in tandem, Defendants Apple and OpenAI have locked up markets to maintain their monopolies and prevent innovators like X and xAI from competing." Grok is currently ranked third in the App Store for free productivity apps, behind only ChatGPT and Gmail. The 'uncensored' chatbot is also integrated into Musk's social platform X, which is the number one free news app in the App Store.


Paul McCartney uses AI to 'extricate' John Lennon's voice from two more old demos - following the number 1 success of the 'last Beatles song'

Daily Mail - Science & tech

Paul McCartney enlisted a little help from artificial intelligence to complete the 'last Beatles song' two years ago. The track, 'Now and Then', became the first Beatles music to reach number 1 in the UK for 64 years. Now, in an apparent effort to repeat its success, McCartney has once again used AI – on two more songs. The sophisticated tool called 'MAL' is the creation of WingNut Films, the production company headed by Lord of the Rings director Peter Jackson. MAL has managed to extricate John Lennon's voice from two poor-quality demos he made shortly before his death.


A.I. Is Coming for Culture

The New Yorker

I often wake up before dawn, ahead of my wife and kids, so that I can enjoy a little solitary time. I creep downstairs to the silent kitchen, drink a glass of water, and put in my AirPods. Then I choose some music, set up the coffee maker, and sit and listen while the coffee brews. It's in this liminal state that my encounter with the algorithm begins. Groggily, I'll scroll through some dad content on Reddit, or watch photography videos on YouTube, or check Apple News.


A Text-Based Recommender System that Leverages Explicit Affective State Preferences

arXiv.org Artificial Intelligence

The affective attitude of liking a recommended item reflects just one category in a wide spectrum of affective phenomena that also includes emotions such as entranced or intrigued, moods such as cheerful or buoyant, as well as more fine-grained affective states, such as "pleasantly surprised by the conclusion". In this paper, we introduce a novel recommendation task that can leverage a virtually unbounded range of affective states sought explicitly by the user in order to identify items that, upon consumption, are likely to induce those affective states. Correspondingly, we create a large dataset of user preferences containing expressions of fine-grained affective states that are mined from book reviews, and propose a Transformer-based architecture that leverages such affective expressions as input. We then use the resulting dataset of affective states preferences, together with the linked users and their histories of book readings, ratings, and reviews, to train and evaluate multiple recommendation models on the task of matching recommended items with affective preferences. Experiments show that the best results are obtained by models that can utilize textual descriptions of items and user affective preferences.


Transfer Learning via Lexical Relatedness: A Sarcasm and Hate Speech Case Study

arXiv.org Artificial Intelligence

Detecting hate speech in non-direct forms, such as irony, sarcasm, and innuendos, remains a persistent challenge for social networks. Although sarcasm and hate speech are regarded as distinct expressions, our work explores whether integrating sarcasm as a pre-training step improves implicit hate speech detection and, by extension, explicit hate speech detection. Incorporating samples from ETHOS, Sarcasm on Reddit, and Implicit Hate Corpus, we devised two training strategies to compare the effectiveness of sarcasm pre-training on a CNN+LSTM and BERT+BiLSTM model. The first strategy is a single-step training approach, where a model trained only on sarcasm is then tested on hate speech. The second strategy uses sequential transfer learning to fine-tune models for sarcasm, implicit hate, and explicit hate. Our results show that sarcasm pre-training improved the BERT+BiLSTM's recall by 9.7%, AUC by 7.8%, and F1-score by 6% on ETHOS. On the Implicit Hate Corpus, precision increased by 7.8% when tested only on implicit samples. By incorporating sarcasm into the training process, we show that models can more effectively detect both implicit and explicit hate. Note: This paper contains offensive and derogatory language shown only for demonstration. A key challenge in specialized machine learning is the lack of sufficient data for a given task.


What makes an entity salient in discourse?

arXiv.org Artificial Intelligence

Entities in discourse vary broadly in salience: main participants, objects and locations are noticeable and memorable, while tangential ones are less important and quickly forgotten, raising questions about how humans signal and infer relative salience. Using a graded operationalization of salience based on summary-worthiness in multiple summaries of a discourse, this paper explores data from 24 spoken and written genres of English to extract a multifactorial complex of overt and implicit linguistic cues, such as recurring subjecthood or definiteness, discourse relations and hierarchy across utterances, as well as pragmatic functional inferences based on genre and communicative intent. Tackling the question 'how is the degree of salience expressed for each and every entity mentioned?', our results show that while previous approaches to salience all correlate with our salience scores to some extent, no single generalization is without exceptions, and the phenomenon cuts across all levels of linguistic representation.
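
One way to picture a graded, summary-based salience score is to count how many independent summaries of a discourse mention each entity. The sketch below is only an illustration under that assumption: the function name `salience_scores`, the case-insensitive substring matching, and the toy summaries are all hypothetical and are not taken from the paper, which may align entity mentions quite differently.

```python
from typing import Dict, List


def salience_scores(entities: List[str], summaries: List[str]) -> Dict[str, int]:
    """For each entity, count how many summaries mention it.

    Assumption: a "mention" is a case-insensitive substring match,
    which is far cruder than real entity or coreference alignment.
    """
    lowered = [s.lower() for s in summaries]
    return {e: sum(e.lower() in s for s in lowered) for e in entities}


# Toy data, invented for illustration only.
summaries = [
    "Maria repaired the bridge.",
    "Maria and the mayor reopened the bridge.",
    "The bridge reopened after repairs.",
]
scores = salience_scores(["Maria", "bridge", "mayor", "river"], summaries)
print(scores)  # {'Maria': 2, 'bridge': 3, 'mayor': 1, 'river': 0}
```

With three summaries, scores range from 0 (never summary-worthy) to 3 (mentioned in every summary), which gives a graded rather than binary notion of salience.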


Dac-Fake: A Divide and Conquer Framework for Detecting Fake News on Social Media

arXiv.org Artificial Intelligence

With the rapid evolution of technology and the Internet, the proliferation of fake news on social media has become a critical issue, leading to widespread misinformation that can cause societal harm. Traditional fact checking methods are often too slow to prevent the dissemination of false information. Therefore, the need for rapid, automated detection of fake news is paramount. We introduce DaCFake, a novel fake news detection model using a divide and conquer strategy that combines content and context based features. Our approach extracts over eighty linguistic features from news articles and integrates them with either a continuous bag of words or a skipgram model for enhanced detection accuracy. We evaluated the performance of DaCFake on three datasets including Kaggle, McIntire + PolitiFact, and Reuter achieving impressive accuracy rates of 97.88%, 96.05%, and 97.32%, respectively. Additionally, we employed a ten-fold cross validation to further enhance the model's robustness and accuracy. These results highlight the effectiveness of DaCFake in early detection of fake news, offering a promising solution to curb misinformation on social media platforms.
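
For readers unfamiliar with ten-fold cross validation, the sketch below shows only the index partitioning step: the data is split into ten folds, and each fold serves once as the held-out test set while the other nine are used for training. The helper `k_fold_indices` is a hypothetical name, the folds are contiguous and unshuffled for simplicity, and nothing here reproduces the DaCFake pipeline or its features.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 25 samples split into 10 folds: five folds of size 3, then five of size 2.
folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

In practice the sample order would usually be shuffled (or stratified by label) before splitting, and the reported score would be the average over the ten held-out folds.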