 Law


Mitigating Societal Cognitive Overload in the Age of AI: Challenges and Directions

arXiv.org Artificial Intelligence

Societal cognitive overload, driven by the deluge of information and complexity in the AI age, poses a critical challenge to human well-being and societal resilience. This paper argues that mitigating cognitive overload is not only essential for improving present-day life but also a crucial prerequisite for navigating the potential risks of advanced AI, including existential threats. We examine how AI exacerbates cognitive overload through various mechanisms, including information proliferation, algorithmic manipulation, automation anxieties, deregulation, and the erosion of meaning. The paper reframes the AI safety debate to center on cognitive overload, highlighting its role as a bridge between near-term harms and long-term risks. It concludes by discussing potential institutional adaptations, research directions, and policy considerations that arise from adopting an overload-resilient perspective on human-AI alignment, suggesting pathways for future exploration rather than prescribing definitive solutions. We stand at a precipice. Human societies are increasingly struggling to process the sheer volume and complexity of information in the digital age, a condition dramatically amplified by the rapid proliferation of artificial intelligence (AI). While Toffler (1970) foresaw "future shock" from accelerating change and Eppler & Mengis (2004) and Bawden & Robinson (2009) analyzed individual information overload, Byung-Chul Han, in his critique of neoliberalism and technological domination (Han, 2017), argues that contemporary society faces a regime of technological domination that exploits and overwhelms the psyche. This exploitation and overwhelming of the psyche, now dramatically amplified by AI-driven information and complexity, elevates information overload to a systemic crisis: societal cognitive overload.


From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

arXiv.org Artificial Intelligence

Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. However, the landscape remains fragmented and lacks a unified taxonomy or comprehensive survey. Therefore, we present a side-by-side comparison of benchmarks developed between 2019 and 2025 that evaluate these models and agents across multiple domains. In addition, we propose a taxonomy of approximately 60 benchmarks that cover general and academic knowledge reasoning, mathematical problem-solving, code generation and software engineering, factual grounding and retrieval, domain-specific evaluations, multimodal and embodied tasks, task orchestration, and interactive assessments. Furthermore, we review AI-agent frameworks introduced between 2023 and 2025 that integrate large language models with modular toolkits to enable autonomous decision-making and multi-step reasoning. Moreover, we present real-world applications of autonomous AI agents in materials science, biomedical research, academic ideation, software engineering, synthetic data generation, chemical reasoning, mathematical problem-solving, geographic information systems, multimedia, healthcare, and finance. We then survey key agent-to-agent collaboration protocols, namely the Agent Communication Protocol (ACP), the Model Context Protocol (MCP), and the Agent-to-Agent Protocol (A2A). Finally, we discuss recommendations for future research, focusing on advanced reasoning strategies, failure modes in multi-agent LLM systems, automated scientific discovery, dynamic tool integration via reinforcement learning, integrated search capabilities, and security vulnerabilities in agent protocols.


Ethical Challenges of Using Artificial Intelligence in Judiciary

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has emerged as a ubiquitous concept in numerous domains, including the legal system. AI has the potential to revolutionize the functioning of the judiciary and the dispensation of justice. Incorporating AI into the legal system offers the prospect of enhancing decision-making for judges, lawyers, and legal professionals, while concurrently providing the public with more streamlined, efficient, and cost-effective services. The integration of AI into the legal landscape offers manifold benefits, encompassing tasks such as document review, legal research, contract analysis, case prediction, and decision-making. By automating laborious and error-prone procedures, AI has the capacity to alleviate the burden associated with these arduous tasks. Consequently, courts around the world have begun embracing AI technology as a means to enhance the administration of justice. However, alongside its potential advantages, the use of AI in the judiciary poses a range of ethical challenges. These ethical quandaries must be duly addressed to ensure the responsible and equitable deployment of AI systems. This article delineates the principal ethical challenges entailed in employing AI within the judiciary and provides recommendations to effectively address these issues.


Navigating AI Policy Landscapes: Insights into Human Rights Considerations Across IEEE Regions

arXiv.org Artificial Intelligence

This paper explores the integration of human rights considerations into AI regulatory frameworks across different IEEE regions - specifically the United States (Region 1-6), Europe (Region 8), China (part of Region 10), and Singapore (part of Region 10). While all acknowledge the transformative potential of AI and the necessity of ethical guidelines, their regulatory approaches significantly differ. Europe exhibits a rigorous framework with stringent protections for individual rights, while the U.S. promotes innovation with less restrictive regulations. China emphasizes state control and societal order in its AI strategies. In contrast, Singapore's advisory framework encourages self-regulation and aligns closely with international norms. This comparative analysis underlines the need for ongoing global dialogue to harmonize AI regulations that safeguard human rights while promoting technological advancement, reflecting the diverse perspectives and priorities of each region.


Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around core competencies such as knowledge, reasoning, instruction following, multi-modal understanding, and safety; and (ii) from manual to automated evaluation, encompassing dynamic dataset curation and "LLM-as-a-judge" scoring. Yet, even with these transitions, a crucial obstacle persists: the evaluation generalization issue. Bounded test sets cannot scale alongside models whose abilities grow seemingly without limit. We will dissect this issue, along with the core challenges of the above two transitions, from the perspectives of methods, datasets, evaluators, and metrics. Because this field evolves quickly, we will maintain a living GitHub repository (links are in each section) to crowd-source updates and corrections, and warmly invite contributors and collaborators.


Technical Challenges in Maintaining Tax Prep Software with Large Language Models

arXiv.org Artificial Intelligence

As US tax law evolves to adapt to ever-changing politico-economic realities, tax preparation software plays a significant role in helping taxpayers navigate these complexities. The dynamic nature of tax regulations poses a significant challenge to maintaining tax software artifacts accurately and on time. The state of the art in maintaining tax prep software is time-consuming and error-prone, as it involves manual code analysis combined with an expert interpretation of tax law amendments. We posit that the rigor and formality of tax amendment language, as expressed in IRS publications, make it amenable to automatic translation to executable specifications (code). Our research efforts focus on identifying, understanding, and tackling technical challenges in leveraging Large Language Models (LLMs), such as ChatGPT and Llama, to faithfully extract code differentials from IRS publications and automatically integrate them with the prior version of the code to automate tax prep software maintenance.


Optimizing the Privacy-Utility Balance using Synthetic Data and Configurable Perturbation Pipelines

arXiv.org Artificial Intelligence

The Banking, Financial Services, and Insurance (BFSI) sector operates on vast volumes of highly sensitive customer data, creating an enduring tension between the drive for data-driven insights and the imperative to comply with strict privacy and security regulations such as GDPR [1] and CCPA [2]. Traditional anonymization methods like masking, aggregation, k-anonymity, L-diversity, and T-closeness often degrade data quality to the point where sophisticated analytics, fraud detection, risk modeling, and machine learning applications suffer significant performance drops. Moreover, these legacy approaches can remain vulnerable to linkage and inference attacks, undermining both privacy guarantees and competitive innovation in financial institutions. The need for advanced techniques that can create privacy-preserving datasets without sacrificing analytical utility is paramount. In response, advanced techniques for creating privacy-preserving datasets have emerged, broadly categorized as purely synthetic data generation and advanced data perturbation. Purely synthetic data, often created using deep generative models (like GANs), aims to capture the statistical patterns of real data without any one-to-one mapping to real individuals. Advanced data perturbation applies carefully calibrated noise, transformations, and privacy-enhancing techniques like differential privacy to original datasets, seeking to obscure sensitive information while retaining analytical value. These methods can include context-aware transformations, where the nature of the data and its intended use inform the perturbation strategy, ensuring that the resulting dataset remains useful for specific tasks. However, the challenge remains to balance privacy and utility effectively. Traditional methods often fail to provide sufficient privacy guarantees or result in datasets that are too noisy for practical use.
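
As a rough illustration of the "carefully calibrated noise" mentioned above, the following minimal sketch applies the standard Laplace mechanism to a counting query. It is not the pipeline described in the paper; the function names (`laplace_noise`, `private_count`), the threshold query, and the sample amounts are hypothetical and chosen only for illustration. The sketch assumes each customer contributes at most one record, so the count has sensitivity 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, threshold, epsilon, rng=None):
    """Return a noisy count of values above `threshold` (hypothetical helper).

    Assumes adding or removing one record changes the count by at most 1
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if v > threshold)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical transaction amounts; three of them exceed 1000.
    amounts = [120.0, 2500.0, 40.0, 1800.0, 990.0, 3100.0]
    print(private_count(amounts, threshold=1000.0, epsilon=0.5))
```

A smaller `epsilon` gives stronger privacy but a noisier answer, which is the privacy-utility tension the abstract describes.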


Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes

arXiv.org Artificial Intelligence

The de-identification of private information in medical data is a crucial process to mitigate the risk of confidentiality breaches, particularly when patient personal details are not adequately removed before the release of medical records. Although rule-based and learning-based methods have been proposed, they often struggle with limited generalizability and require substantial amounts of annotated data for effective performance. Recent advancements in large language models (LLMs) have shown significant promise in addressing these issues due to their superior language comprehension capabilities. However, LLMs present challenges, including potential privacy risks when using commercial LLM APIs and high computational costs for deploying open-source LLMs locally. In this work, we introduce LPPA, an LLM-empowered Privacy-Protected PHI Annotation framework for clinical notes, targeting the English language. By fine-tuning LLMs locally with synthetic notes, LPPA ensures strong privacy protection and high PHI annotation accuracy. Extensive experiments demonstrate LPPA's effectiveness in accurately de-identifying private information, offering a scalable and efficient solution for enhancing patient privacy protection.


Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have exhibited impressive natural language processing capabilities but often perpetuate social biases inherent in their training data. To address this, we introduce MultiLingual Augmented Bias Testing (MLA-BiTe), a framework that improves prior bias evaluation methods by enabling systematic multilingual bias testing. MLA-BiTe leverages automated translation and paraphrasing techniques to support comprehensive assessments across diverse linguistic settings. In this study, we evaluate the effectiveness of MLA-BiTe by testing four state-of-the-art LLMs in six languages -- including two low-resource languages -- focusing on seven sensitive categories of discrimination.


Elon Musk's Doge conflicts of interest worth $2.37bn, Senate report says

The Guardian

Elon Musk and his companies face at least $2.37bn in legal exposure from federal investigations, litigation and regulatory oversight, according to a new report from Senate Democrats. The report attempts to put a number to Musk's many conflicts of interest through his work with his so-called "department of government efficiency" (Doge), warning that he may seek to use his influence to avoid legal liability. The report, which was published on Monday by Democratic members of the Senate homeland security committee's permanent subcommittee on investigations, looked at 65 actual or potential actions against Musk across 11 separate agencies. Investigators calculated the financial liabilities Musk and his companies, such as Tesla, SpaceX and Neuralink, may face in 45 of those actions. Since Donald Trump won re-election last year and Musk took on the role of de facto head of Doge in January, ethics watchdogs and Democratic officials have warned that the Tesla CEO could use his power to oust regulators and quash investigations into his companies.