Goto

Collaborating Authors

 Law


RobEthiChor: Automated Context-aware Ethics-based Negotiation for Autonomous Robots

arXiv.org Artificial Intelligence

The presence of autonomous systems is growing at a fast pace and it is impacting many aspects of our lives. Designed to learn and act independently, these systems operate and perform decision-making without human intervention. However, they lack the ability to incorporate users' ethical preferences, which are unique for each individual in society and are required to personalize the decision-making processes. This reduces user trust and prevents autonomous systems from behaving according to the moral beliefs of their end-users. When multiple systems interact with differing ethical preferences, they must negotiate to reach an agreement that satisfies the ethical beliefs of all the parties involved and adjust their behavior consequently. To address this challenge, this paper proposes RobEthiChor, an approach that enables autonomous systems to incorporate user ethical preferences and contextual factors into their decision-making through ethics-based negotiation. RobEthiChor features a domain-agnostic reference architecture for designing autonomous systems capable of ethic-based negotiating. The paper also presents RobEthiChor-Ros, an implementation of RobEthiChor within the Robot Operating System (ROS), which can be deployed on robots to provide them with ethics-based negotiation capabilities. To evaluate our approach, we deployed RobEthiChor-Ros on real robots and ran scenarios where a pair of robots negotiate upon resource contention. Experimental results demonstrate the feasibility and effectiveness of the system in realizing ethics-based negotiation. RobEthiChor allowed robots to reach an agreement in more than 73% of the scenarios with an acceptable negotiation time (0.67s on average). Experiments also demonstrate that the negotiation approach implemented in RobEthiChor is scalable.


Many LLMs Are More Utilitarian Than One

arXiv.org Artificial Intelligence

Moral judgment is integral to large language models' (LLMs) social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function when collaborating compared to operating as individual agents. In human moral judgment, group deliberation leads to a Utilitarian Boost: a tendency to endorse norm violations that inflict harm but maximize benefits for the greatest number of people. We study whether a similar dynamic emerges in multi-agent LLM systems. We test six models on well-established sets of moral dilemmas across two conditions: (1) Solo, where models reason independently, and (2) Group, where they engage in multi-turn discussions in pairs or triads. In personal dilemmas, where agents decide whether to directly harm an individual for the benefit of others, all models rated moral violations as more acceptable when part of a group, demonstrating a Utilitarian Boost similar to that observed in humans. However, the mechanism for the Boost in LLMs differed: While humans in groups become more utilitarian due to heightened sensitivity to decision outcomes, LLM groups showed either reduced sensitivity to norms or enhanced impartiality. We report model differences in when and how strongly the Boost manifests. We also discuss prompt and agent compositions that enhance or mitigate the effect. We end with a discussion of the implications for AI alignment, multi-agent design, and artificial moral reasoning. Code available at: https://github.com/baltaci-r/MoralAgents


CANDI: Hybrid Discrete-Continuous Diffusion Models

arXiv.org Machine Learning

While continuous diffusion has shown remarkable success in continuous domains such as image generation, its direct application to discrete data has underperformed compared to purely discrete formulations. This gap is counterintuitive, given that continuous diffusion learns score functions that enable joint evolution across multiple positions. To understand this gap, we introduce token identifiability as an analytical framework for understanding how Gaussian noise corrupts discrete data through two mechanisms: discrete identity corruption and continuous rank degradation. We reveal that these mechanisms scale differently with vocabulary size, creating a temporal dissonance: at noise levels where discrete corruption preserves enough structure for conditional learning, continuous denoising is trivial; at noise levels where continuous denoising is meaningful, discrete corruption destroys nearly all conditional structure. To solve this, we propose CANDI (Continuous ANd DIscrete diffusion), a hybrid framework that decouples discrete and continuous corruption, enabling simultaneous learning of both conditional structure and continuous geometry. We empirically validate the temporal dissonance phenomenon and demonstrate that CANDI successfully avoids it. This unlocks the benefits of continuous diffusion for discrete spaces: on controlled generation, CANDI enables classifier-based guidance with off-the-shelf classifiers through simple gradient addition; on text generation, CANDI outperforms masked diffusion at low NFE, demonstrating the value of learning continuous gradients for discrete spaces. We include the code on the project page available here: https://patrickpynadath1.github.io/candi-lander


Nvidia becomes first 5 trillion firm as AI rally picks up steam

The Japan Times

Nvidia CEO Jensen Huang speaks during an event in Washington on Tuesday. Nvidia achieved a historic $5 trillion market capitalization on Wednesday as CEO Jensen Huang's spree of deals catapults the artificial intelligence frenzy to new heights. The shares closed 3.1% higher at $207.16, propelling Nvidia just over the milestone. It's only been four months since the company cracked the $4 trillion barrier, and the rally has accelerated as Huang forges new agreements to supply companies from Nokia Oyj to Samsung Electronics and Hyundai Motor Group with chips. Nvidia has become the most-important stock in a bull market that's been driven by optimism for AI to revolutionize the global economy.


Character.AI bans users under 18 after being sued over child's suicide

The Guardian

Character.AI bans users under 18 after being sued over child's suicide Move comes as lawmakers move to bar minors from using AI companions and require companies to verify users' age The chatbot company Character.AI will ban users 18 and under from conversing with its virtual companions beginning in late November after months of legal scrutiny. The announced change comes after the company, which enables its users to create characters with which they can have open-ended conversations, faced tough questions over how these AI companions can affect teen and general mental health, including a lawsuit over a child's suicide and a proposed bill that would ban minors from conversing with AI companions. "We're making these changes to our under-18 platform in light of the evolving landscape around AI and teens," the company wrote in its announcement. "We have seen recent news reports raising questions, and have received questions from regulators, about the content teens may encounter when chatting with AI and about how open-ended AI chat in general might affect teens, even when content controls work perfectly." Last year, the company was sued by the family of 14-year-old Sewell Setzer III, who took his own life after allegedly developing an emotional attachment to a character he created on Character.AI.


Character.ai to ban teens from talking to its AI chatbots

BBC News

Character.ai to ban teens from talking to its AI chatbots The platform, founded in 2021, is used by millions to talk to chatbots powered by artificial intelligence (AI). But it is facing several lawsuits in the US from parents, including one over the death of a teenager, with some branding it a clear and present danger to young people. Online safety campaigners have welcomed the move but said the feature should never have been available to children in the first place. Character.ai said it was making the changes after reports and feedback from regulators, safety experts, and parents, which have highlighted concerns about its chatbots' interactions with teens. Experts have previously warned the potential for AI chatbots to make things up, be overly-encouraging, and feign empathy can pose risks to young and vulnerable people.


Scammers target retirees as major 401(k) rule changes loom for 2026 tax year ahead nationwide

FOX News

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by Refinitiv Lipper .


Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

arXiv.org Artificial Intelligence

The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns under potential malicious inputs. However, existing multimodal safety evaluations primarily focus on model vulnerabilities exposed by static image inputs, ignoring the temporal dynamics of video that may induce distinct safety risks. To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories, each pairing a synthesized video with either a harmful query, which contains explicit malice, or a benign query, which appears harmless but triggers harmful behavior when interpreted alongside the video. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images (what is shown) and motion text (how it moves), which jointly guide the synthesis of query-relevant videos. To effectively evaluate uncertain or borderline harmful outputs, we propose RJScore, a novel LLM-based metric that incorporates the confidence of judge models and human-aligned decision threshold calibration. Extensive experiments show that benign-query video composition achieves average attack success rates of 67.2%, revealing consistent vulnerabilities to video-induced attacks. We believe Video-SafetyBench will catalyze future research into video-based safety evaluation and defense strategies.


Law in Silico: Simulating Legal Society with LLM-Based Agents

arXiv.org Artificial Intelligence

Since real-world legal experiments are often costly or infeasible, simulating legal societies with Artificial Intelligence (AI) systems provides an effective alternative for verifying and developing legal theory, as well as supporting legal administration. Large Language Models (LLMs), with their world knowledge and role-playing capabilities, are strong candidates to serve as the foundation for legal society simulation. However, the application of LLMs to simulate legal systems remains underexplored. In this work, we introduce Law in Silico, an LLM-based agent framework for simulating legal scenarios with individual decision-making and institutional mechanisms of legislation, adjudication, and enforcement. Our experiments, which compare simulated crime rates with real-world data, demonstrate that LLM-based agents can largely reproduce macro-level crime trends and provide insights that align with real-world observations. At the same time, micro-level simulations reveal that a well-functioning, transparent, and adaptive legal system offers better protection of the rights of vulnerable individuals.


Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content

arXiv.org Artificial Intelligence

Large language models are increasingly used for Islamic guidance, but risk misquoting texts, misapplying jurisprudence, or producing culturally inconsistent responses. We pilot an evaluation of GPT-4o, Ansari AI, and Fanar on prompts from authentic Islamic blogs. Our dual-agent framework uses a quantitative agent for citation verification and six-dimensional scoring (e.g., Structure, Islamic Consistency, Citations) and a qualitative agent for five-dimensional side-by-side comparison (e.g., Tone, Depth, Originality). GPT-4o scored highest in Islamic Accuracy (3.93) and Citation (3.38), Ansari AI followed (3.68, 3.32), and Fanar lagged (2.76, 1.82). Despite relatively strong performance, models still fall short in reliably producing accurate Islamic content and citations -- a paramount requirement in faith-sensitive writing. GPT-4o had the highest mean quantitative score (3.90/5), while Ansari AI led qualitative pairwise wins (116/200). Fanar, though trailing, introduces innovations for Islamic and Arabic contexts. This study underscores the need for community-driven benchmarks centering Muslim perspectives, offering an early step toward more reliable AI in Islamic knowledge and other high-stakes domains such as medicine, law, and journalism.