Law
OneShield -- the Next Generation of LLM Guardrails
DeLuca, Chad, Gentile, Anna Lisa, Asthana, Shubhi, Zhang, Bing, Chowdhary, Pawan, Cheng, Kellen, Shbita, Basel, Li, Pengyuan, Ren, Guang-Jie, Gopisetty, Sandeep
The rise of Large Language Models has created a general excitement about the great potential for a myriad of applications. While LLMs offer many possibilities, questions about safety, privacy, and ethics have emerged, and all the key actors are working to address these issues with protective measures for their own models and standalone solutions. The constantly evolving nature of LLMs makes it extremely challenging to universally shield users against their potential risks, and one-size-fits-all solutions are unfeasible. In this work, we propose OneShield, our stand-alone, model-agnostic and customizable solution to safeguard LLMs. OneShield aims to provide facilities for defining risk factors, expressing and declaring contextual safety and compliance policies, and mitigating LLM risks, with a focus on each specific customer. We describe the implementation of the framework, discuss scalability considerations, and provide usage statistics of OneShield since its initial deployment.
On Gradual Semantics for Assumption-Based Argumentation
Rapberger, Anna, Russo, Fabrizio, Rago, Antonio, Toni, Francesca
In computational argumentation, gradual semantics are fine-grained alternatives to extension-based and labelling-based semantics. They ascribe a dialectical strength to (components of) arguments sanctioning their degree of acceptability. Several gradual semantics have been studied for abstract, bipolar and quantitative bipolar argumentation frameworks (QBAFs), as well as, to a lesser extent, for some forms of structured argumentation. However, this has not been the case for assumption-based argumentation (ABA), despite it being a popular form of structured argumentation with several applications where gradual semantics could be useful. In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. To do so, we use bipolar set-based argumentation frameworks as an abstraction of (potentially non-flat) ABA frameworks and generalise state-of-the-art modular gradual semantics for QBAFs. We show that our gradual ABA semantics satisfy suitable adaptations of desirable properties of gradual QBAF semantics, such as balance and monotonicity. We also explore an argument-based approach that leverages established QBAF modular semantics directly, and use it as baseline. Finally, we conduct experiments with synthetic ABA frameworks to compare our gradual ABA semantics with their argument-based counterpart and assess convergence.
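For readers unfamiliar with modular gradual semantics on QBAFs, the general pattern is: aggregate the strengths of an argument's attackers and supporters, then combine the result with the argument's base score, repeating until strengths stabilise. The sketch below illustrates that QBAF baseline using DF-QuAD, one well-known semantics that fits this pattern; it is not the paper's gradual ABA semantics, and the function names and dictionary encoding of the framework are hypothetical choices made for illustration.

```python
from typing import Dict, List


def aggregate(strengths: List[float]) -> float:
    """DF-QuAD aggregation: 1 - prod(1 - s), which is 0 when there are no inputs."""
    result = 0.0
    for s in strengths:
        result = result + s - result * s
    return result


def combine(base: float, v_att: float, v_sup: float) -> float:
    """DF-QuAD influence: move the base score down or up by the net attack/support."""
    if v_att >= v_sup:
        return base - base * (v_att - v_sup)
    return base + (1 - base) * (v_sup - v_att)


def dfquad_strengths(
    base: Dict[str, float],
    attackers: Dict[str, List[str]],
    supporters: Dict[str, List[str]],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate the modular update from the base scores until no strength changes by >= tol."""
    strength = dict(base)
    for _ in range(max_iter):
        new = {}
        for arg in base:
            v_att = aggregate([strength[b] for b in attackers.get(arg, [])])
            v_sup = aggregate([strength[c] for c in supporters.get(arg, [])])
            new[arg] = combine(base[arg], v_att, v_sup)
        delta = max((abs(new[a] - strength[a]) for a in base), default=0.0)
        strength = new
        if delta < tol:
            return strength
    # Convergence is not guaranteed on cyclic graphs, hence the iteration cap.
    raise RuntimeError("strengths did not converge")


# Example: a is attacked by b, which is attacked by c (all base scores 0.5).
chain = dfquad_strengths({"a": 0.5, "b": 0.5, "c": 0.5}, {"a": ["b"], "b": ["c"]}, {})
```

On acyclic frameworks the update settles after a number of rounds bounded by the longest attack/support chain (in the example, c = 0.5, b = 0.25, a = 0.375); on cyclic frameworks it may oscillate, which is why convergence is worth checking empirically.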
EmissionNet: Air Quality Pollution Forecasting for Agriculture
Saligram, Prady, Bhathal, Tanvir
Air pollution from agricultural emissions is a significant yet often overlooked contributor to environmental and public health challenges. Traditional air quality forecasting models rely on physics-based approaches, which struggle to capture complex, nonlinear pollutant interactions. In this work, we explore forecasting agricultural N₂O emissions by evaluating popular architectures and proposing two novel deep learning architectures, EmissionNet (ENV) and EmissionNet-Transformer (ENT). These models leverage convolutional and transformer-based architectures to extract spatial-temporal dependencies from high-resolution emissions data.
Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories
Lisker, Mareike, Gottschalk, Christina, Mihaljević, Helena
Counterspeech is a key strategy against harmful online content, but scaling expert-driven efforts is challenging. Large Language Models (LLMs) present a potential solution, though their use in countering conspiracy theories is under-researched. Unlike for hate speech, no datasets exist that pair conspiracy theory comments with expert-crafted counterspeech. We address this gap by evaluating the ability of GPT-4o, Llama 3, and Mistral to effectively apply counterspeech strategies derived from psychological research provided through structured prompts. Our results show that the models often generate generic, repetitive, or superficial results. Additionally, they over-acknowledge fear and frequently hallucinate facts, sources, or figures, making their prompt-based use in practical applications problematic.
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
Mauro, Giovanni, Minici, Marco, Pappalardo, Luca
Next-venue recommender systems are increasingly embedded in location-based services, shaping individual mobility decisions in urban environments. While their predictive accuracy has been extensively studied, less attention has been paid to their systemic impact on urban dynamics. In this work, we introduce a simulation framework to model the human-AI feedback loop underpinning next-venue recommendation, capturing how algorithmic suggestions influence individual behavior, which in turn reshapes the data used to retrain the models. Our simulations, grounded in real-world mobility data, systematically explore the effects of algorithmic adoption across a range of recommendation strategies. We find that while recommender systems consistently increase individual-level diversity in visited venues, they may simultaneously amplify collective inequality by concentrating visits on a limited subset of popular places. This divergence extends to the structure of social co-location networks, revealing broader implications for urban accessibility and spatial segregation. Our framework operationalizes the feedback loop in next-venue recommendation and offers a novel lens through which to assess the societal impact of AI-assisted mobility, providing a computational tool to anticipate future risks, evaluate regulatory interventions, and inform the design of ethical algorithmic systems.
Fox News AI Newsletter: Your own personal 'superintelligence'
AI FOR ALL: Meta CEO Mark Zuckerberg on Wednesday announced the tech giant will focus on developing a personal superintelligence for everyone, which will further enable creative and leisurely pursuits. PUSHING BACK: Tech giant Nvidia said on Thursday that its chips do not contain any "backdoors" that would allow others to remotely access or control them, following concerns from China over the security of the company's H20 artificial intelligence chip. EXCLUSIVE CLUB: Microsoft touched $4 trillion in market cap Thursday, making it and Nvidia the only two companies to reach this level. REGULATORY RECALL: The Trump administration's DOGE developed a new tool that leverages artificial intelligence (AI) to review federal regulations for potential elimination, according to a new report. ROBOT RAMPAGE: A jaw-dropping video showing a Unitree H1 humanoid robot flailing violently during a test has captured the internet's attention and sparked a new wave of concern about the safety of advanced robotics.
It would begin with a first date and end with him pinning, raping his victims
A serial rapist who used dating apps to meet his victims was sentenced to 111 years to life in state prison on Thursday, according to a statement from the Ventura County district attorney's office. Dustin Ronald Alba, a 31-year-old from Oxnard, was found guilty of the rape and sexual assault of five women last month. He committed his offenses from 2012 to 2020 in the cities of Thousand Oaks, Oxnard and Los Angeles, the release said. Multiple victims of Alba said they met him online through dating apps and social media. After meeting in person, they said he would use his body weight to confine and then assault them, the statement said.
Automating AI Failure Tracking: Semantic Association of Reports in AI Incident Database
Russo, Diego, Orlando, Gian Marco, La Gatta, Valerio, Moscato, Vincenzo
Artificial Intelligence (AI) systems are transforming critical sectors such as healthcare, finance, and transportation, enhancing operational efficiency and decision-making processes. However, their deployment in high-stakes domains has exposed vulnerabilities that can result in significant societal harm. To systematically study and mitigate these risks, initiatives like the AI Incident Database (AIID) have emerged, cataloging over 3,000 real-world AI failure reports. Currently, associating a new report with the appropriate AI Incident relies on manual expert intervention, limiting scalability and delaying the identification of emerging failure patterns. To address this limitation, we propose a retrieval-based framework that automates the association of new reports with existing AI Incidents through semantic similarity modeling. We formalize the task as a ranking problem, where each report, comprising a title and a full textual description, is compared to previously documented AI Incidents based on embedding cosine similarity. Benchmarking traditional lexical methods, cross-encoder architectures, and transformer-based sentence embedding models, we find that the latter consistently achieve superior performance. Our analysis further shows that combining titles and descriptions yields substantial improvements in ranking accuracy compared to using titles alone. Moreover, retrieval performance remains stable across variations in description length, highlighting the robustness of the framework. Finally, we find that retrieval performance consistently improves as the training set expands. Our approach provides a scalable and efficient solution for supporting the maintenance of the AIID.
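The ranking formulation above can be illustrated with a minimal sketch: given an embedding for a new report and embeddings for known incidents, score each incident by cosine similarity and sort in descending order. This assumes the embeddings have already been produced by some sentence embedding model (for example, from the report's title and description concatenated), which is not shown; the function names, incident IDs, and toy vectors are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_incidents(
    report_embedding: Sequence[float],
    incident_embeddings: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (incident_id, score) pairs sorted from most to least similar."""
    scored = [
        (incident_id, cosine_similarity(report_embedding, emb))
        for incident_id, emb in incident_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy 2-D "embeddings" standing in for real sentence-embedding outputs.
incidents = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
ranking = rank_incidents([1.0, 0.1], incidents)
```

Here `ranking` orders the incidents A, C, B. In a realistic setup the top-ranked candidates would be shown to a curator for confirmation rather than linked automatically.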
Transparent AI: The Case for Interpretability and Explainability
Ramachandram, Dhanesh, Joshi, Himanshu, Zhu, Judy, Gandhi, Dhari, Hartman, Lucas, Raval, Ananya
As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI research and enabling industry adoption, we present key insights and lessons learned from practical interpretability applications across diverse domains. This paper offers actionable strategies and implementation guidance tailored to organizations at varying stages of AI maturity, emphasizing the integration of interpretability as a core design principle rather than a retrospective add-on.
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Shi, Yao, Liang, Rongkeng, Xu, Yong
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs from major AI organizations (OpenAI, Meta, Google, Anthropic, and others) on 1,498 questions spanning 13 disciplines and 10 difficulty levels reveals that teaching effectiveness does not correlate linearly with model scale or general reasoning capabilities, with some smaller open-source models outperforming larger commercial counterparts in teaching contexts. This finding highlights a critical gap in current evaluations that prioritize knowledge recall over interactive pedagogy. Our mixed-methods evaluation, combining quantitative metrics with qualitative analysis and expert case studies, identifies distinct pedagogical strengths employed by top-performing models (e.g., sophisticated questioning strategies, adaptive feedback mechanisms). Human expert evaluations show 78% agreement with our automated qualitative analysis of effective teaching behaviors, validating our methodology. EducationQ demonstrates that LLMs-as-teachers require specialized optimization beyond simple scaling, suggesting that next-generation educational AI should prioritize targeted enhancement of specific pedagogical effectiveness.