Law


RKEFino1: A Regulation Knowledge-Enhanced Large Language Model

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) hold great promise for financial applications but introduce critical accuracy and compliance challenges in Digital Regulatory Reporting (DRR). To address these issues, we propose RKEFino1, a regulation knowledge-enhanced financial reasoning model built upon Fino1 and fine-tuned with domain knowledge from XBRL, CDM, and MOF. We formulate two QA tasks, knowledge-based and mathematical reasoning, and introduce a novel Numerical NER task covering financial entities in both sentences and tables. Experimental results demonstrate the effectiveness and generalization capacity of RKEFino1 in compliance-critical financial tasks. The financial industry increasingly leverages reinforcement learning (RL) techniques, giving rise to the interdisciplinary field known as Financial Reinforcement Learning (FinRL), which includes applications in portfolio management, algorithmic trading, and option pricing.


A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition

arXiv.org Artificial Intelligence

When language models are trained on textual data, they acquire both knowledge about the structure of language and knowledge of facts about the world. At inference time, their knowledge of facts can be leveraged to solve interesting problems and perform useful knowledge work for users. It is well known that language models can verbatim memorize long sequences from their training data. However, it is much less well understood how language models memorize facts seen during training. In this work, we propose a new dataset to specifically empower researchers to study the dual processes of fact memorization and verbatim sequence memorization. The dataset consists of synthetically-generated, webtext-like documents about fictional events, as well as question-answer pairs about the events. We conduct training experiments showing how synthetic data about fictional events can be effective in teasing apart different forms of memorization. We also document the challenges in effectively building realistic, fictional synthetic data.


Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society

arXiv.org Artificial Intelligence

Scenario building is an established method to anticipate the future of emerging technologies. Its primary goal is to use narratives to map future trajectories of technology development and sociotechnical adoption. Following this process, risks and benefits can be identified early on, and strategies can be developed that strive for desirable futures. In recent years, computer science has adopted this method and applied it to various technologies, including Artificial Intelligence (AI). Because computing technologies play such an important role in shaping modern societies, it is worth exploring how scenarios are being used as an anticipatory tool in the field, and which traditional uses of scenarios are not yet covered but have the potential to enrich it. We address this gap by conducting a systematic literature review on the use of scenario building methods in computer science over the last decade (n = 59). We guide the review along two main questions. First, we aim to uncover how scenarios are used in computing literature, focusing especially on the rationale for why scenarios are used. Second, given the potential of scenario building to enhance inclusivity in research, we dive deeper into the participatory element of the existing scenario building literature in computer science.


Designing DSIC Mechanisms for Data Sharing in the Era of Large Language Models

arXiv.org Artificial Intelligence

Training large language models (LLMs) requires vast amounts of high-quality data from institutions that face legal, privacy, and strategic constraints. Existing data procurement methods often rely on unverifiable trust or ignore heterogeneous provider costs. We introduce a mechanism-design framework for truthful, trust-minimized data sharing that ensures dominant-strategy incentive compatibility (DSIC), individual rationality, and weak budget balance, while rewarding data based on both quality and learning utility. We formalize a model where providers privately know their data cost and quality, and value arises solely from the data's contribution to model performance. Based on this, we propose the Quality-Weighted Marginal-Incentive Auction (Q-MIA), which ranks providers using a virtual cost metric and uses Myerson-style payments to ensure DSIC and budget feasibility. To support settings with limited liquidity or long-term incentives, we introduce the Marginal Utility Token (MUT), which allocates future rights based on marginal contributions. We unify these in Mixed-MIA, a hybrid mechanism balancing upfront payments and deferred rewards. All mechanisms support verifiable, privacy-preserving implementation. Theoretically and empirically, they outperform volume-based and trust-based baselines, eliciting higher-quality data under budget constraints while remaining robust to misreporting and collusion. This establishes a principled foundation for sustainable and fair data markets for future LLMs.
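The abstract does not spell out Q-MIA's virtual cost metric or its exact payment rule. As a rough illustration of the general idea it names (rank providers by a virtual cost, then pay Myerson-style threshold prices), here is a minimal Python sketch of a generic reverse auction. It assumes that data quality is publicly verifiable, uses reported cost divided by quality as a hypothetical virtual cost, and buys a fixed number k of datasets. The names `Bid`, `virtual_cost` and `threshold_auction` are illustrative only; this is not the paper's Q-MIA, MUT, or Mixed-MIA, and it ignores the budget constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    provider: str
    cost: float     # reported (privately known) cost of supplying the data
    quality: float  # quality score; ASSUMED verifiable and > 0 in this sketch


def virtual_cost(bid: Bid) -> float:
    # Hypothetical virtual cost: reported cost per unit of quality.
    return bid.cost / bid.quality


def threshold_auction(bids: list[Bid], k: int) -> dict[str, float]:
    """Buy from the k providers with the lowest virtual cost and pay each one
    its critical (threshold) value: the highest cost it could have reported
    and still been selected, given everyone else's reports."""
    if k <= 0 or k >= len(bids):
        raise ValueError("need 0 < k < number of bids")
    ranked = sorted(bids, key=virtual_cost)  # stable sort: ties keep input order
    # The first losing bid's virtual cost sets the threshold for every winner.
    threshold = virtual_cost(ranked[k])
    return {b.provider: threshold * b.quality for b in ranked[:k]}


if __name__ == "__main__":
    bids = [
        Bid("A", 10.0, 2.0),  # virtual cost 5.0
        Bid("B", 9.0, 1.0),   # virtual cost 9.0
        Bid("C", 12.0, 3.0),  # virtual cost 4.0
        Bid("D", 20.0, 2.0),  # virtual cost 10.0
    ]
    print(threshold_auction(bids, k=2))  # C and A win; threshold is B's 9.0
```

Selection is monotone in the reported cost and a winner's payment does not depend on its own report, so overstating cost can only lose the sale and understating it can only win a sale below cost; this is the standard argument for truthfulness. Each payment also covers the reported cost, because a winner's virtual cost is at most the threshold.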


The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

arXiv.org Artificial Intelligence

This position paper states that AI Alignment in Multi-Agent Systems (MAS) should be considered a dynamic and interaction-dependent process that heavily depends on the social environment where agents are deployed, whether collaborative, cooperative, or competitive. While AI alignment with human values and preferences remains a core challenge, the growing prevalence of MAS in real-world applications introduces a new dynamic that reshapes how agents pursue goals and interact to accomplish various tasks. As agents engage with one another, they must coordinate to accomplish both individual and collective goals. However, this complex social organization may unintentionally misalign some or all of these agents with human values or user preferences. Drawing on social sciences, we analyze how social structure can deter or shatter group and individual values. Based on these analyses, we call on the AI community to treat human, preferential, and objective alignment as interdependent concepts rather than isolated problems. Finally, we emphasize the urgent need for simulation environments, benchmarks, and evaluation frameworks that allow researchers to assess alignment in these interactive multi-agent contexts before such dynamics grow too complex to control.


Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models

arXiv.org Artificial Intelligence

The protection of cyber Intellectual Property (IP) such as web content is an increasingly critical concern. The rise of large language models (LLMs) with online retrieval capabilities enables convenient access to information but often undermines the rights of original content creators. As users increasingly rely on LLM-generated responses, their direct engagement with original information sources gradually diminishes, which will significantly reduce the incentives for IP creators to contribute and lead to a cyberspace saturated with AI-generated content. In response, we propose a novel defense framework that empowers web content creators to safeguard their web-based IP from unauthorized real-time extraction and redistribution by LLMs, by leveraging the semantic understanding capability of LLMs themselves. Our method follows principled motivations and effectively addresses an intractable black-box optimization problem. Real-world experiments demonstrate that our methods improve defense success rates from 2.5% to 88.6% on different LLMs, outperforming traditional defenses such as configuration-based restrictions.


UK ministers delay AI regulation amid plans for more 'comprehensive' bill

The Guardian

The more comprehensive bill will not be ready before the next king's speech, and the delay is likely to trigger concerns about how long it is taking to regulate the technology. The date for the next king's speech has not been set, but several sources said it could take place in May 2026. Labour had originally planned to introduce a short, narrowly drafted AI bill within months of entering office that would have focused on large language models, such as ChatGPT. The legislation would have required companies to hand over their models for testing by the UK's AI Security Institute. It was intended to address concerns that AI models could become so advanced that they posed a risk to humanity.


Government drones used in 'runaway spying operation' to peek into backyards in Sonoma County, lawsuit says

Los Angeles Times

Three residents filed a lawsuit this week against Sonoma County seeking to block code enforcement from using drones to take aerial images of their homes in what the American Civil Liberties Union is calling a "runaway spying operation." The lawsuit, filed by the ACLU Wednesday on behalf of the three residents, alleges that the county began using drones with high-powered cameras and zoom lenses in 2019 to track illegal cannabis cultivation, but in the years since, officials have used the devices more than 700 times to find other code violations on private property without first seeking a warrant. "For too long, Sonoma County code enforcement has used high-powered drones to warrantlessly sift through people's private affairs and initiate charges that upend lives and livelihoods. All the while, the county has hidden these unlawful searches from the people they have spied on, the community, and the media," Matt Cagle, a senior staff attorney with the ACLU Foundation of Northern California, said in a statement. A spokesperson for Sonoma County said the county is reviewing the complaint and takes "the allegations very seriously."


On board the driverless lorries hoping to transform China's transport industry

BBC News

They rumble down the highway between Beijing and Tianjin port: big lorries, loaded up and fully able to navigate themselves. Sure, there is a safety driver in the seat, as per government regulations, but these lorries don't require one, and many analysts say it won't take long before the drivers are gone. When "safety driver" Huo Kangtian, 32, first takes his hands off the wheel and lets the lorry drive itself, it is somehow impressive and disconcerting in equal measure. For the initial stages of the journey, he is in full control. Then, at a certain point, he hits a few buttons, and the powerful, heavy machine is driving itself, moving at speed along a public road to Tianjin.


High court tells UK lawyers to stop misuse of AI after fake case-law citations

The Guardian

The high court has told senior lawyers to take urgent action to prevent the misuse of artificial intelligence after dozens of fake case-law citations were put before the courts that were either completely fictitious or contained made-up passages. Lawyers are increasingly using AI systems to help them build legal arguments, but two cases this year were blighted by made-up case-law citations that were either confirmed or suspected to have been generated by AI. In an £89m damages case against the Qatar National Bank, the claimants made 45 case-law citations, 18 of which turned out to be fictitious, with quotes in many of the others also bogus. The claimant admitted using publicly available AI tools, and his solicitor accepted he cited the sham authorities. When Haringey Law Centre challenged the London borough of Haringey over its alleged failure to provide its client with temporary accommodation, its lawyer cited phantom case law five times.