 Law


Activation Space Interventions Can Be Transferred Between Large Language Models

arXiv.org Artificial Intelligence

The study of representation universality in AI models reveals growing convergence across domains, modalities, and architectures. However, the practical applications of representation universality remain largely unexplored. We bridge this gap by demonstrating that safety interventions can be transferred between models through learned mappings of their shared activation spaces. We demonstrate this approach on two well-established AI safety tasks: backdoor removal and refusal of harmful prompts, showing successful transfer of steering vectors that alter the models' outputs in a predictable way. Additionally, we propose a new task, corrupted capabilities, where models are fine-tuned to embed knowledge tied to a backdoor. This tests their ability to separate useful skills from backdoors, reflecting real-world challenges. Extensive experiments across Llama, Qwen and Gemma model families show that our method enables using smaller models to efficiently align larger ones. Furthermore, we demonstrate that autoencoder mappings between base and fine-tuned models can serve as reliable "lightweight safety switches", allowing dynamic toggling between model behaviors.
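To make the idea of a learned mapping between activation spaces concrete, here is a minimal sketch. It is not the paper's method: the abstract mentions autoencoder mappings, whereas this example substitutes a plain linear least-squares map, and all activations are synthetic. The names `acts_source`, `acts_target`, `true_map`, and `steering_source` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, d_source, d_target = 200, 8, 12

# Synthetic stand-ins for paired activations of two models on the same prompts.
# Assumption: the target activations are an exact linear function of the source ones.
true_map = rng.normal(size=(d_source, d_target))
acts_source = rng.normal(size=(n_samples, d_source))
acts_target = acts_source @ true_map

# Fit a linear map from source space to target space by least squares.
learned_map, *_ = np.linalg.lstsq(acts_source, acts_target, rcond=None)

# Transfer a (hypothetical) steering vector found in the source model.
steering_source = rng.normal(size=d_source)
mapped_steering = steering_source @ learned_map
```

In this toy setting the fitted map recovers the synthetic one, so the mapped vector matches what the true relationship would give. Real model activations would not be related this cleanly, which is presumably why a more expressive mapping is used in practice.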


Talking Back -- human input and explanations to interactive AI systems

arXiv.org Artificial Intelligence

While XAI focuses on providing AI explanations to humans, can the reverse - humans explaining their judgments to AI - foster richer, synergistic human-AI systems? This paper explores various forms of human inputs to AI and examines how human explanations can guide machine learning models toward automated judgments and explanations that align more closely with human concepts.


The Challenge of Identifying the Origin of Black-Box Large Language Models

arXiv.org Artificial Intelligence

The tremendous commercial potential of large language models (LLMs) has heightened concerns about their unauthorized use. Third parties can customize LLMs through fine-tuning and offer only black-box API access, effectively concealing unauthorized usage and complicating external auditing processes. This practice not only exacerbates unfair competition, but also violates licensing agreements. In response, identifying the origin of black-box LLMs is an intrinsic solution to this issue. In this paper, we first reveal the limitations of state-of-the-art passive and proactive identification methods with experiments on 30 LLMs and two real-world black-box APIs. Then, we propose the proactive technique, PlugAE, which optimizes adversarial token embeddings in a continuous space and proactively plugs them into the LLM for tracing and identification. The experiments show that PlugAE can achieve substantial improvement in identifying fine-tuned derivatives. We further advocate for legal frameworks and regulations to better address the challenges posed by the unauthorized use of LLMs.


AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County

arXiv.org Artificial Intelligence

Legal reform can be challenging in light of the volume, complexity, and interdependence of laws, codes, and records. One salient example of this challenge is the effort to restrict and remove racially restrictive covenants, clauses in property deeds that historically barred individuals of specific races from purchasing homes. Despite the Supreme Court holding such racial covenants unenforceable in 1948, they persist in property records across the United States. Many jurisdictions have moved to identify and strike these provisions, including California, which mandated in 2021 that all counties implement such a process. Yet the scale can be overwhelming, with Santa Clara County (SCC) alone having over 24 million property deed documents, making purely manual review infeasible. We present a novel approach to addressing this pressing issue, developed through a partnership with the SCC Clerk-Recorder's Office. First, we leverage an open large language model, finetuned to detect racial covenants with high precision and recall. We estimate that this system reduces manual efforts by 86,500 person hours and costs less than 2% of the cost for a comparable off-the-shelf closed model. Second, we illustrate the County's integration of this model into responsible operational practice, including legal review and the creation of a historical registry, and release our model to assist the hundreds of jurisdictions engaged in similar efforts. Finally, our results reveal distinct periods of utilization of racial covenants, sharp geographic clustering, and the disproportionate role of a small number of developers in maintaining housing discrimination. We estimate that by 1950, one in four properties across the County were subject to racial covenants.


Knowledge Augmentation in Federation: Rethinking What Collaborative Learning Can Bring Back to Decentralized Data

arXiv.org Artificial Intelligence

Data, as an observable form of knowledge, has become one of the most important factors of production for the development of Artificial Intelligence (AI). Meanwhile, increasing legislation and regulation of private and proprietary information result in scattered data sources, also known as "data islands". Although some collaborative learning paradigms such as Federated Learning (FL) can enable privacy-preserving training over decentralized data, they have inherent deficiencies in fairness, costs and reproducibility because they are learning-centric, which greatly limits how participants cooperate with each other. In light of this, we present a knowledge-centric paradigm termed Knowledge Augmentation in Federation (KAF), with a focus on how to enhance local knowledge through collaborative effort. We provide the suggested system architecture, formulate the prototypical optimization objective, and review emerging studies that employ methodologies suitable for KAF. On our roadmap, with a three-way categorization, we describe the methods for knowledge expansion, knowledge filtering, and label and feature space correction in the federation. Further, we highlight several challenges and open questions that deserve more attention from the community. With our investigation, we intend to offer new insights into what collaborative learning can bring back to decentralized data.


Physics-consistent machine learning: output projection onto physical manifolds

arXiv.org Artificial Intelligence

Data-driven machine learning models often require extensive datasets, which can be costly or inaccessible, and their predictions may fail to comply with established physical laws. Current approaches for incorporating physical priors mitigate these issues by penalizing deviations from known physical laws, as in physics-informed neural networks, or by designing architectures that automatically satisfy specific invariants. However, penalization approaches do not guarantee compliance with physical constraints for unseen inputs, and invariant-based methods lack flexibility and generality. We propose a novel physics-consistent machine learning method that directly enforces compliance with physical principles by projecting model outputs onto the manifold defined by these laws. This procedure ensures that predictions inherently adhere to the chosen physical constraints, improving reliability and interpretability. Our method is demonstrated on two systems: a spring-mass system and a low-temperature reactive plasma. Compared to purely data-driven models, our approach significantly reduces errors in physical law compliance, enhances predictive accuracy of physical quantities, and outperforms alternatives when working with simpler models or limited datasets. The proposed projection-based technique is versatile and can function independently or in conjunction with existing physics-informed neural networks, offering a powerful, general, and scalable solution for developing fast and reliable surrogate models of complex physical systems, particularly in resource-constrained scenarios.
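As an illustration of output projection, the sketch below handles the simplest case, a linear constraint A y = b, where the orthogonal projection has a closed form. This is an assumption made for illustration: the constraints in the paper (for example, those of a spring-mass system or a plasma model) may be nonlinear and need an iterative solver. The function name and the mass-fraction example are hypothetical.

```python
import numpy as np

def project_onto_linear_constraint(y, A, b):
    """Orthogonal projection of y onto {z : A z = b}.

    Assumes A has full row rank, so A @ A.T is invertible.
    """
    residual = A @ y - b                       # how far y is from satisfying the law
    correction = A.T @ np.linalg.solve(A @ A.T, residual)
    return y - correction                      # closest point that satisfies A z = b

# Hypothetical example: three predicted fractions that must sum to 1.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
y_pred = np.array([0.5, 0.3, 0.4])             # raw model output, sums to 1.2
y_proj = project_onto_linear_constraint(y_pred, A, b)
```

The projected output satisfies the constraint exactly, whatever the raw prediction was. A penalty term in the loss only encourages compliance during training and offers no such guarantee on new inputs.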


Shh, don't say that! Domain Certification in LLMs

arXiv.org Machine Learning

Large language models (LLMs) are often deployed to perform constrained tasks with narrow domains. For example, customer support bots can be built on top of LLMs, relying on their broad language understanding and capabilities to enhance performance. However, these LLMs are adversarially susceptible, potentially generating outputs outside the intended domain. To formalize, assess, and mitigate this risk, we introduce domain certification, a guarantee that accurately characterizes the out-of-domain behavior of language models. We then propose a simple yet effective approach, which we call VALID, that provides adversarial bounds as a certificate. Finally, we evaluate our method across a diverse set of datasets, demonstrating that it yields meaningful certificates, which bound the probability of out-of-domain samples tightly with minimum penalty to refusal behavior.


Fox News AI Newsletter: Judge denies Musk's move against OpenAI

FOX News

Gladstone A.I. co-founders and CEOs Edouard Harris and Jeremie Harris explain the major role that A.I. will play in national security and warfare on 'The Will Cain Show.' Elon Musk met with members of the Senate DOGE caucus at the White House. MUSK'S MOVE BLOCKED: A California judge denied Elon Musk's move to halt OpenAI's efforts to convert it into a for-profit entity, saying in a ruling that the SpaceX and Tesla CEO hadn't met "the high burden required for a preliminary injunction." 'DOWNFALLS' OF AI: A federal judge has declined to impose sanctions on an attorney who submitted a brief that contained incorrect case citations and quotes generated by artificial intelligence. DEFEND YOUR DATA: Windows has always been a favorite target for hackers, but it seems they have now figured out how to actively target Macs as well. We've seen an alarming rise in malware affecting Mac computers, stealing personal data and cryptocurrency.


Judge denies Musk's initial bid to halt OpenAI's for-profit shift but sets trial for fall

The Guardian

A US judge on Tuesday denied Elon Musk's request for a preliminary injunction to pause OpenAI's transition to a for-profit model but agreed to hear a trial in the fall of this year, the latest turn in the high-stakes legal fight. The tech billionaire has not met "the high burden required for a preliminary injunction" to block the conversion of OpenAI, said Yvonne Gonzalez Rogers, a US district judge in Oakland, California. But Rogers wrote in the order that she wanted to resolve the lawsuit quickly given "the public interest at stake and potential for harm if a conversion contrary to law occurred". Musk and OpenAI, which he co-founded as a non-profit in 2015 but left before it took off, have been embroiled in a yearlong legal battle. The CEO of Tesla and X, formerly Twitter, accuses OpenAI of straying from its founding mission to develop artificial intelligence for the good of humanity, not corporate profit.


Court denies Elon Musk's attempt to block OpenAI's for-profit transformation

Engadget

US federal judge Yvonne Gonzalez Rogers has denied Elon Musk's request for an injunction that would have immediately stopped OpenAI's conversion into a for-profit entity. Musk filed for an injunction late last year after suing OpenAI and Microsoft and accusing them of telling investors not to fund rival AI companies, such as his own xAI. According to the Financial Times, the judge dismissed his request based on that claim of anticompetitive behavior. Gonzalez Rogers cited a previous statement by OpenAI CEO Sam Altman, saying that the company only warned certain investors who were granted access to sensitive information that their rights would be terminated if they made a non-passive investment in rival companies. The judge also reportedly rejected the request based on Musk's claim that OpenAI and Altman broke their contract with him and violated the company's founding mission of building AI "for the benefit of humanity."