Goto

Collaborating Authors

 Law


INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling

arXiv.org Artificial Intelligence

Large Language Model (LLM) routing is a pivotal technique for navigating a diverse landscape of LLMs, aiming to select the best-performing LLMs tailored to the domains of user queries, while managing computational resources. However, current routing approaches often face limitations in scalability when dealing with a large pool of specialized LLMs, or in their adaptability to extending model scope and evolving capability domains. To overcome those challenges, we propose InferenceDynamics, a flexible and scalable multi-dimensional routing framework by modeling the capability and knowledge of models. We operate it on our comprehensive dataset RouteMix, and demonstrate its effectiveness and generalizability in group-level routing using modern benchmarks including MMLU-Pro, GPQA, BigGenBench, and LiveBench, showcasing its ability to identify and leverage top-performing models for given tasks, leading to superior outcomes with efficient resource utilization. The broader adoption of Inference Dynamics can empower users to harness the full specialized potential of the LLM ecosystem, and our code will be made publicly available to encourage further research.


Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

arXiv.org Artificial Intelligence

Iterative RAG for multi-hop question answering faces challenges with lengthy contexts and the buildup of irrelevant information. This hinders a model's capacity to process and reason over retrieved content and limits performance. While recent methods focus on compressing retrieved information, they are either restricted to single-round RAG, require finetuning or lack scalability in iterative RAG. To address these challenges, we propose Notes Writing, a method that generates concise and relevant notes from retrieved documents at each step, thereby reducing noise and retaining only essential information. This indirectly increases the effective context length of Large Language Models (LLMs), enabling them to reason and plan more effectively while processing larger volumes of input text. Notes Writing is framework agnostic and can be integrated with different iterative RAG methods. We demonstrate its effectiveness with three iterative RAG methods, across two models and four evaluation datasets. Notes writing yields an average improvement of 15.6 percentage points overall, with minimal increase in output tokens.


Signals of Provenance: Practices & Challenges of Navigating Indicators in AI-Generated Media for Sighted and Blind Individuals

arXiv.org Artificial Intelligence

AI-Generated (AIG) content has become increasingly widespread by recent advances in generative models and the easy-to-use tools that have significantly lowered the technical barriers for producing highly realistic audio, images, and videos through simple natural language prompts. In response, platforms are adopting provable provenance with platforms recommending AIG to be self-disclosed and signaled to users. However, these indicators may be often missed, especially when they rely solely on visual cues and make them ineffective to users with different sensory abilities. To address the gap, we conducted semi-structured interviews (N=28) with 15 sighted and 13 BLV participants to examine their interaction with AIG content through self-disclosed AI indicators. Our findings reveal diverse mental models and practices, highlighting different strengths and weaknesses of content-based (e.g., title, description) and menu-aided (e.g., AI labels) indicators. While sighted participants leveraged visual and audio cues, BLV participants primarily relied on audio and existing assistive tools, limiting their ability to identify AIG. Across both groups, they frequently overlooked menu-aided indicators deployed by platforms and rather interacted with content-based indicators such as title and comments. We uncovered usability challenges stemming from inconsistent indicator placement, unclear metadata, and cognitive overload. These issues were especially critical for BLV individuals due to the insufficient accessibility of interface elements. We provide practical recommendations and design implications for future AIG indicators across several dimensions.


BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law

arXiv.org Artificial Intelligence

This paper presents BR-TaxQA-R, a novel dataset designed to support question answering with references in the context of Brazilian personal income tax law. The dataset contains 715 questions from the 2024 official Q&A document published by Brazil's Internal Revenue Service, enriched with statutory norms and administrative rulings from the Conselho Administrativo de Recursos Fiscais (CARF). We implement a Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings for searching and GPT-4o-mini for answer generation. We compare different text segmentation strategies and benchmark our system against commercial tools such as ChatGPT and Perplexity.ai using RAGAS-based metrics. Results show that our custom RAG pipeline outperforms commercial systems in Response Relevancy, indicating stronger alignment with user queries, while commercial models achieve higher scores in Factual Correctness and fluency . These findings highlight a trade-off between legally grounded generation and linguistic fluency. Crucially, we argue that human expert evaluation remains essential to ensure the legal validity of AI-generated answers in high-stakes domains such as taxation.


One Bad NOFO? AI Governance in Federal Grantmaking

arXiv.org Artificial Intelligence

Much scholarship considers how U.S. federal agencies govern artificial intelligence (AI) through rulemaking and their own internal use policies. But agencies have an overlooked AI governance role: setting discretionary grant policy when directing billions of dollars in federal financial assistance. These dollars enable state and local entities to study, create, and use AI. This funding not only goes to dedicated AI programs, but also to grantees using AI in the course of meeting their routine grant objectives. As discretionary grantmakers, agencies guide and restrict what grant winners do -- a hidden lever for AI governance. Agencies pull this lever by setting program objectives, judging criteria, and restrictions for AI use. Using a novel dataset of over 40,000 non-defense federal grant notices of funding opportunity (NOFOs) posted to the U.S. federal grants website between 2009 and 2024, we analyze how agencies regulate the use of AI by grantees. We select records mentioning AI and review their stated goals and requirements. We find agencies promoting AI in notice narratives, shaping adoption in ways other records of grant policy might fail to capture. Of the grant opportunities that mention AI, we find only a handful of AI-specific judging criteria or restrictions. This silence holds even when agencies fund AI uses in contexts affecting people's rights and which, under an analogous federal procurement regime, would result in extra oversight. These findings recast grant notices as a site of AI policymaking -- albeit one that is developing out of step with other regulatory efforts and incomplete in its consideration of transparency, accountability, and privacy protections. The paper concludes by drawing lessons from AI procurement scholarship, while identifying distinct challenges in grantmaking that invite further study.


ATR-Bench: A Federated Learning Benchmark for Adaptation, Trust, and Reasoning

arXiv.org Artificial Intelligence

Federated Learning (FL) has emerged as a promising paradigm for collaborative model training while preserving data privacy across decentralized participants. As FL adoption grows, numerous techniques have been proposed to tackle its practical challenges. However, the lack of standardized evaluation across key dimensions hampers systematic progress and fair comparison of FL methods. In this work, we introduce ATR-Bench, a unified framework for analyzing federated learning through three foundational dimensions: Adaptation, Trust, and Reasoning. We provide an in-depth examination of the conceptual foundations, task formulations, and open research challenges associated with each theme. We have extensively benchmarked representative methods and datasets for adaptation to heterogeneous clients and trustworthiness in adversarial or unreliable environments. Due to the lack of reliable metrics and models for reasoning in FL, we only provide literature-driven insights for this dimension. ATR-Bench lays the groundwork for a systematic and holistic evaluation of federated learning with real-world relevance. We will make our complete codebase publicly accessible and a curated repository that continuously tracks new developments and research in the FL literature.


Congress Passed a Sweeping Free-Speech Crackdown--and No One's Talking About It

Slate

Sign up for the Slatest to get the most insightful analysis, criticism, and advice out there, delivered to your inbox daily. Had you scanned any of the latest headlines around the TAKE IT DOWN Act, legislation that President Donald Trump signed into law Monday, you would have come away with a deeply mistaken impression of the bill and its true purpose. The surface-level pitch is that this is a necessary law for addressing nonconsensual intimate images--known more widely as revenge porn. Obfuscating its intent with a classic congressional acronym (Tools to Address Known Exploitation by Immobilizing Technological Deepfakes on Websites and Networks), the TAKE IT DOWN Act purports to help scrub the internet of exploitative, nonconsensual sexual media, whether real or digitally mocked up, at a time when artificial intelligence tools and automated image generators have supercharged its spread. Enforcement is delegated to the Federal Trade Commission, which will give online communities that specialize primarily in user-generated content (e.g., social media, message boards) a heads-up and a 48-hour takedown deadline whenever an appropriate example is reported.


Politico's Newsroom Is Starting a Legal Battle With Management Over AI

WIRED

Politico became one of the first newsrooms last year to win a union contract that included rules on how the media outlet can deploy artificial intelligence. The PEN Guild, which represents Politico and its sister publication, environment and energy site E&E News, is now gearing up for another first. The union's members allege that the AI provisions in their contract have been violated, and they're preparing for a groundbreaking legal dispute with management. The outcome could set a precedent for how much input journalists ultimately have over how AI is used in their newsrooms. Last year, Politico began publishing AI-generated live news summaries during big political events like the Democratic National Convention and the US vice presidential debates.


Who's to Blame When AI Agents Screw Up?

WIRED

Over the past year, veteran software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could, in the near future, order meals and engineer mobile apps almost entirely on their own. His agents, while surprisingly capable, have also exposed new legal questions that await companies trying to capitalize on Silicon Valley's hottest new technology. Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills upon request, Microsoft and other tech giants expect that agents will tackle more complex functions--and most importantly, do it with little human oversight. The tech industry's most ambitious plans involve multi-agent systems, with dozens of agents someday teaming up to replace entire workforces.


Interview with Gillian Hadfield: Normative infrastructure for AI alignment

AIHub

During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, Gillian Hadfield. We spoke about her interdisciplinary research, career trajectory, path into AI alignment, law, and general thoughts on AI systems. Transcript: Note: the transcript has been lightly edited for clarity. This is an interview with Professor Gillian Hadfield who was a keynote speaker at IJCAI 2024. She gave a very insightful talk about normative infrastructures and how they can guide our search for AI alignment. Kumar Kshitij Patel (KKP): Could you talk a bit about your background and career trajectory? I want our readers to understand how much interdisciplinary work you've done over the years. Gillian Hadfield (GH): I did a PhD in economics and a law degree, a JD, at Stanford, originally motivated by wanting to think about the big questions about the world. So I read John Rawls' theory of justice when I was an undergraduate, and those are the big questions: how do we organize the world and just institutions, but I was very interested in using more formal methods and social scientific approaches. That's why I decided to do that joint degree. So, this is in the 1980s, and in the early days of starting to use a lot of game theory. I studied information theory, a student of Canaro and Paul Milgram at the economics department at Stanford. I did work on contract theory, bargaining theory, but I was still very interested in going to law school, not to practice law, but to learn about legal institutions and how those work. I was a member of this emerging area of law and economics early in my career, which of course, was interdisciplinary, using economics to think about law and legal institutions.