120 court cases have been caught with AI hallucinations, according to new database
Lawyers representing Anthropic recently got busted for using a false attribution generated by Claude in expert testimony. But that's one of more than 20 court cases containing AI hallucinations in the past month alone, according to a new database created by French lawyer and data scientist Damien Charlotin. And those were just the ones that were caught in the act. In 2024, which was the first full year of tracking cases, Charlotin found 36 instances. That jumped up to 48 in 2025, and the year is only halfway over.
Generalizing Nonlinear ICA Beyond Structural Sparsity
Nonlinear independent component analysis (ICA) aims to uncover the true latent sources from their observable nonlinear mixtures. Despite its significance, the identifiability of nonlinear ICA is known to be impossible without additional assumptions. Recent advances have proposed conditions on the connective structure from sources to observed variables, known as Structural Sparsity, to achieve identifiability in an unsupervised manner. However, the sparsity constraint may not hold universally for all sources in practice. Furthermore, the assumptions of bijectivity of the mixing process and independence among all sources, which arise from the setting of ICA, may also be violated in many real-world scenarios. To address these limitations and generalize nonlinear ICA, we propose a set of new identifiability results in the general settings of undercompleteness, partial sparsity and source dependence, and flexible grouping structures. Specifically, we prove identifiability when there are more observed variables than sources (undercomplete), and when certain sparsity and/or source independence assumptions are not met for some changing sources. Moreover, we show that even in cases with flexible grouping structures (e.g., part of the sources can be divided into irreducible independent groups with various sizes), appropriate identifiability results can also be established. Theoretical claims are supported empirically on both synthetic and real-world datasets.
Report: Tesla has not prepared Austin for robotaxi launch next week
Elon Musk has said that he's now going all-in with his companies after shifting focus from his role as a special government employee for the Trump administration. And there's no shortage of problems for Musk to attend to, including Tesla's recent abysmal quarterly report and crashing Tesla sales numbers in Europe. Now, another big Tesla project may be in jeopardy. Tesla is set to launch its long-awaited robotaxi program in Austin, Texas next week. However, according to a new report from Fortune, the city of Austin is not ready for Tesla's robotaxis just yet. A small fleet of Tesla robotaxis is already up and running in Austin and San Francisco, serving an "early set of employees" in the two cities as part of an initial testing phase.
Ad Auctions for LLMs via Retrieval Augmented Generation
In the field of computational advertising, the integration of ads into the outputs of large language models (LLMs) presents an opportunity to support these services without compromising content integrity. This paper introduces novel auction mechanisms for ad allocation and pricing within the textual outputs of LLMs, leveraging retrieval-augmented generation (RAG). We propose a segment auction where an ad is probabilistically retrieved for each discourse segment (paragraph, section, or entire output) according to its bid and relevance, following the RAG framework, and priced according to competing bids. We show that our auction maximizes logarithmic social welfare, a new notion of welfare that balances allocation efficiency and fairness, and we characterize the associated incentive-compatible pricing rule. These results are extended to multi-ad allocation per segment. An empirical evaluation validates the feasibility and effectiveness of our approach over several ad auction scenarios, and exhibits inherent tradeoffs in metrics as we allow the LLM more flexibility to allocate ads.
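The allocation step of a segment auction can be illustrated with a minimal sketch. It assumes that an ad's retrieval probability is proportional to the product of its bid and its relevance score; the abstract only says the retrieval depends on "bid and relevance", so this functional form, the function names, and the sample numbers are illustrative assumptions rather than the paper's exact mechanism. Pricing is deliberately left out.

```python
import random
from typing import Dict


def allocation_probabilities(bids: Dict[str, float],
                             relevance: Dict[str, float]) -> Dict[str, float]:
    """Return a retrieval probability per ad.

    Assumption (not taken from the paper): the probability is
    proportional to bid * relevance, normalized over all ads.
    """
    weights = {ad: bids[ad] * relevance[ad] for ad in bids}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one ad needs a positive bid * relevance")
    return {ad: w / total for ad, w in weights.items()}


def sample_ad(bids: Dict[str, float],
              relevance: Dict[str, float],
              rng: random.Random) -> str:
    """Sample one ad for a single discourse segment."""
    probs = allocation_probabilities(bids, relevance)
    ads = list(probs)
    return rng.choices(ads, weights=[probs[a] for a in ads], k=1)[0]


if __name__ == "__main__":
    # Hypothetical bids and relevance scores for one segment.
    bids = {"ad_a": 2.0, "ad_b": 1.0}
    relevance = {"ad_a": 0.5, "ad_b": 1.0}
    print(allocation_probabilities(bids, relevance))
    print(sample_ad(bids, relevance, random.Random(0)))
```

Under this assumed rule, a high bid cannot fully compensate for low relevance: in the hypothetical numbers above, the two ads end up equally likely to be retrieved even though one bids twice as much.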
Fairness and Efficiency in Online Class Matching
MohammadTaghi Hajiaghayi, Shayan Chashm Jahan, Mohammad Sharifi (University of Maryland; University of Maryland; Sharif University of Technology), Suho Shin
The online bipartite matching problem, extensively studied in the literature, deals with the allocation of online arriving vertices (items) to a predetermined set of offline vertices (agents). However, little attention has been given to the concept of class fairness, where agents are categorized into different classes, and the matching algorithm must ensure equitable distribution across these classes. Here we focus on randomized algorithms for the fair matching of indivisible items, subject to various definitions of fairness. Our main contribution is the first (randomized) non-wasteful algorithm that achieves a 1/2-approximation to class envy-freeness (CEF) while simultaneously ensuring an equivalent approximation to the class proportionality (CPROP) and utilitarian social welfare (USW) objectives. We supplement this result by demonstrating that no non-wasteful algorithm can achieve an α-CEF guarantee for α > 0.761. In a similar vein, we provide a novel input instance for deterministic divisible matching that demonstrates a nearly tight CEF approximation. Lastly, we define the "price of fairness," which represents the trade-off between optimal and fair matching. We demonstrate that increasing the level of fairness in the approximation of the solution leads to a decrease in the objective of maximizing USW, following an inverse proportionality relationship.
WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding
We conduct extensive experiments on two new real-world KG datasets, i.e., Freebase and FB15K-237. For Freebase, the nodes and edges are extracted according to [5]. A large portion of the books is labeled with one of eight genres of literature, and each labeled book has exactly one label. FB15K-237 is a standard dataset in the knowledge graph community, which contains 310,116 triples with 14,541 entities and 237 relation types. Since we did not manually label the nodes, we only predicted whether a triple is correct or not on this dataset.