
 Law


LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources

arXiv.org Artificial Intelligence

Abstract: Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. To improve execution efficiency, an optimizer agent rewrites and merges multiple DAGs, eliminating redundant operations and minimizing computation and communication overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies by achieving higher execution plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting. The rapid development of Large Language Models (LLMs) has offered unprecedented capabilities in natural language understanding, reasoning, and planning [1], significantly transforming the landscape of data analytics. LLMs can interpret complex analytical intents, generate structured code, and orchestrate multi-step tasks by interacting with external environments such as databases and computation sandboxes. These capabilities have led to the emergence of LLM-based agents that decompose high-level queries, plan analytical workflows, and execute or verify results through tool interactions.
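The abstract's idea of merging multiple DAGs to eliminate redundant operations can be illustrated with a minimal sketch. This is not LAFA's optimizer: it is a generic common-subexpression merge, and the plan format, the function name `merge_plans`, and the operation names (`load:age`, `secure_sum`, `secure_count`, `divide`) are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical plan format: node name -> (operation, names of input nodes).
Plan = Dict[str, Tuple[str, Tuple[str, ...]]]


def merge_plans(plans: List[Plan]) -> Set[tuple]:
    """Merge plans, keeping one copy of each structurally identical operation.

    Each node is reduced to a canonical key (operation, keys of its inputs),
    so two nodes that compute the same thing collapse to the same key even
    if they appear in different plans under different names.
    """
    merged: Set[tuple] = set()

    def canonical(plan: Plan, node: str) -> tuple:
        op, inputs = plan[node]
        key = (op, tuple(canonical(plan, i) for i in inputs))
        merged.add(key)
        return key

    for plan in plans:
        for node in plan:
            canonical(plan, node)
    return merged


# Two illustrative query plans that share a load and a sum over the same column.
plan_a: Plan = {
    "ages": ("load:age", ()),
    "sum_age": ("secure_sum", ("ages",)),
    "n": ("secure_count", ("ages",)),
    "mean": ("divide", ("sum_age", "n")),
}
plan_b: Plan = {
    "a": ("load:age", ()),
    "total": ("secure_sum", ("a",)),
}

ops_before = len(plan_a) + len(plan_b)
ops_after = len(merge_plans([plan_a, plan_b]))
```

In this toy case the two plans contain six operations in total, and the merged set contains four, because the shared load and sum are kept only once. A real federated analytics optimizer would also need to check that merged operations run over the same data sources and privacy settings, which this sketch ignores.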


Algorithmic Collusion of Pricing and Advertising on E-commerce Platforms

arXiv.org Artificial Intelligence

When online sellers use AI learning algorithms to automatically compete on e-commerce platforms, there is concern that they will learn to coordinate on higher than competitive prices. However, this concern was primarily raised in single-dimension price competition. We investigate whether this prediction holds when sellers make pricing and advertising decisions together, i.e., two-dimensional decisions. We analyze competition in multi-agent reinforcement learning, and use a large-scale dataset from Amazon.com to provide empirical evidence. We show that when consumers have high search costs, learning algorithms can coordinate on prices lower than competitive prices, facilitating a win-win-win for consumers, sellers, and platforms. This occurs because algorithms learn to coordinate on lower advertising bids, which lower advertising costs, leading to lower prices and enlarging demand on the platform. We also show that our results generalize to any learning algorithm that uses exploration of price and advertising bids. Consistent with our predictions, an empirical analysis shows that price levels exhibit a negative interaction between estimated consumer search costs and algorithm usage index. We analyze the platform's strategic response and find that reserve price adjustments will not increase platform profits, but commission adjustments will, while maintaining the beneficial outcomes for both sellers and consumers.


LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring

arXiv.org Artificial Intelligence

Trustworthy evaluations of dangerous capabilities are increasingly crucial for determining whether an AI system is safe to deploy. One empirically demonstrated threat is sandbagging - the strategic underperformance on evaluations by AI models or their developers. A promising defense is to monitor a model's chain-of-thought (CoT) reasoning, as this could reveal its intentions and plans. In this work, we measure the ability of models to sandbag on dangerous capability evaluations against a CoT monitor by prompting them to sandbag while being either monitor-oblivious or monitor-aware. We show that both frontier models and small open-sourced models can covertly sandbag against CoT monitoring 0-shot without hints. However, they cannot yet do so reliably: they bypass the monitor 16-36% of the time when monitor-aware, conditioned on sandbagging successfully. We qualitatively analyzed the uncaught CoTs to understand why the monitor failed. We reveal a rich attack surface for CoT monitoring and contribute five covert sandbagging policies generated by models. These results inform potential failure modes of CoT monitoring and may help build more diverse sandbagging model organisms.


Are YOU addicted to ChatGPT? Scientists warn something strange is happening to people who use AI too often

Daily Mail - Science & tech

People who use AI too often are experiencing a strange and concerning new psychological condition, experts have warned. Psychologists say that fans of popular chatbots like ChatGPT, Claude, and Replika are at risk of becoming addicted to AI.


The Great Tree Test: Best Artificial Christmas Trees 2025

WIRED

We brought 10 of the most popular artificial Christmas trees into a studio, had volunteers assemble them, then got three interior designers to pick the best through blind judging. All products featured on WIRED are independently selected by our editors. However, we may receive compensation from retailers and/or from purchases of products through these links. You can spend hours scrolling through lists of the best artificial Christmas trees and still end up wondering what to buy. How real does it look? Are the branches strong enough to hold that lopsided homemade macaroni ornament you've hung on your tree since 2004? We decided to settle the debate once and for all by bringing the best-selling artificial trees from three leading brands into a studio for a blind-judged contest. We got 10 trees from Balsam Hill, King of Christmas, and National Tree Company, then found 10 assemblers to put the trees together and fluff them.


Meta Claims Downloaded Porn at Center of AI Lawsuit Was for 'Personal Use'

WIRED

In a motion to dismiss filed earlier this week, Meta denied claims that employees had downloaded pornography from Strike 3 Holdings to train its artificial intelligence models. This week, Meta asked a US district court to toss a lawsuit alleging that the tech giant illegally torrented pornography to train AI. The move comes after Strike 3 Holdings discovered illegal downloads of some of its adult films on Meta corporate IP addresses, as well as other downloads that Meta allegedly concealed using a "stealth network" of 2,500 "hidden IP addresses." Accusing Meta of stealing porn to secretly train an unannounced adult version of its AI model powering Movie Gen, Strike 3 sought damages that could have exceeded $350 million, TorrentFreak reported. Strike 3 also cited "no facts to suggest that Meta has ever trained an AI model on adult images or video, much less intentionally so," Meta claimed.


Senate Republican demands Google shut down AI model over false rape allegation

FOX News

Sen. Marsha Blackburn, R-Tenn., accused Google's AI Gemma of generating false sexual assault allegations against her and other conservatives in a letter to CEO Sundar Pichai.


DHS rule expands facial recognition to all US ports of entry for foreign travelers

FOX News



Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction

arXiv.org Artificial Intelligence

With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). However, most LLMs generate paragraph-style answers that are chaotic, disorganized, and untraceable. To bridge this gap, we introduce the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark with data and documents of varying lengths designed to systematically evaluate the ability of LLMs to comprehend fragmented documents and reconstruct isolated information into one organized table. Unlike conventional text-to-table tasks, which rely on fixed schema and narrow task domains, AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries. In the experiment, we evaluated both open-source and closed-source state-of-the-art LLMs. The results show that even the most advanced models struggled significantly. The benchmark is available at https://anonymous.4open.science/r/AOE-Benchmark/.


The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

arXiv.org Artificial Intelligence

With the rapid advancement of artificial intelligence, Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), including content generation, human-computer interaction, machine translation, and code generation. However, their widespread deployment has also raised significant safety concerns. In particular, LLM-generated content can exhibit unsafe behaviors such as toxicity, bias, or misinformation, especially in adversarial contexts, which has attracted increasing attention from both academia and industry. Although numerous studies have attempted to evaluate these risks, a comprehensive and systematic survey on safety evaluation of LLMs is still lacking. This work aims to fill this gap by presenting a structured overview of recent advances in safety evaluation of LLMs. Specifically, we propose a four-dimensional taxonomy: (i) Why to evaluate, which explores the background of safety evaluation of LLMs, how it differs from general LLM evaluation, and the significance of such evaluation; (ii) What to evaluate, which examines and categorizes existing safety evaluation tasks based on key capabilities, including dimensions such as toxicity, robustness, ethics, bias and fairness, truthfulness, and related aspects; (iii) Where to evaluate, which summarizes the evaluation metrics, datasets, and benchmarks currently used in safety evaluations; (iv) How to evaluate, which reviews existing mainstream evaluation methods based on the roles of the evaluators and some evaluation frameworks that integrate the entire evaluation pipeline. Finally, we identify the challenges in safety evaluation of LLMs and propose promising research directions to promote further advancement in this field. We emphasize the necessity of prioritizing safety evaluation to ensure the reliable and responsible deployment of LLMs in real-world applications.