Goto

Collaborating Authors

 Law


Elon Musk's Doge conflicts of interest worth 2.37bn, Senate report says

The Guardian

Elon Musk and his companies face at least 2.37bn in legal exposure from federal investigations, litigation and regulatory oversight, according to a new report from Senate Democrats. The report attempts to put a number to Musk's many conflicts of interest through his work with his so-called "department of government efficiency" (Doge), warning that he may seek to use his influence to avoid legal liability. The report, which was published on Monday by Democratic members of the Senate homeland security committee's permanent subcommittee on investigations, looked at 65 actual or potential actions against Musk across 11 separate agencies. Investigators calculated the financial liabilities Musk and his companies, such as Tesla, SpaceX and Neuralink, may face in 45 of those actions. Since Donald Trump won re-election last year and Musk took on the role of de facto head of Doge in January, ethics watchdogs and Democratic officials have warned that the Tesla CEO could use his power to oust regulators and quash investigations into his companies.


Commissioner calls for ban on apps that make deepfake nude images of children

The Guardian

Artificial intelligence "nudification" apps that create deepfake sexual images of children should be immediately banned, amid growing fears among teenage girls that they could fall victim, the children's commissioner for England is warning. Girls said they were stopping posting images of themselves on social media out of a fear that generative AI tools could be used to digitally remove their clothes or sexualise them, according to the commissioner's report on the tools, drawing on children's experiences. Although it is illegal to create or share a sexually explicit image of a child, the technology enabling them remains legal, the report noted. "Children have told me they are frightened by the very idea of this technology even being available, let alone used. They fear that anyone โ€“ a stranger, a classmate, or even a friend โ€“ could use a smartphone as a way of manipulating them by creating a naked image using these bespoke apps," the commissioner, Dame Rachel de Souza, said.


TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation

arXiv.org Artificial Intelligence

Generating images from prompts containing specific entities requires models to retain as much entity-specific knowledge as possible. However, fully memorizing such knowledge is impractical due to the vast number of entities and their continuous emergence. To address this, we propose Text-based Intelligent Generation with Entity prompt Refinement (TextTIGER), which augments knowledge on entities included in the prompts and then summarizes the augmented descriptions using Large Language Models (LLMs) to mitigate performance degradation from longer inputs. To evaluate our method, we introduce WiT-Cub (WiT with Captions and Uncomplicated Background-explanations), a dataset comprising captions, images, and an entity list. Experiments on four image generation models and five LLMs show that TextTIGER improves image generation performance in standard metrics (IS, FID, and CLIPScore) compared to caption-only prompts. Additionally, multiple annotators' evaluation confirms that the summarized descriptions are more informative, validating LLMs' ability to generate concise yet rich descriptions. These findings demonstrate that refining prompts with augmented and summarized entity-related descriptions enhances image generation capabilities. The code and dataset will be available upon acceptance.


Local Statistical Parity for the Estimation of Fair Decision Trees

arXiv.org Artificial Intelligence

Given the high computational complexity of decision tree estimation, classical methods construct a tree by adding one node at a time in a recursive way. To facilitate promoting fairness, we propose a fairness criterion local to the tree nodes. We prove how it is related to the Statistical Parity criterion, popular in the Algorithmic Fairness literature, and show how to incorporate it into standard recursive tree estimation algorithms. We present a tree estimation algorithm called Constrained Logistic Regression Tree (C-LRT), which is a modification of the standard CART algorithm using locally linear classifiers and imposing restrictions as done in Constrained Logistic Regression. Finally, we evaluate the performance of trees estimated with C-LRT on datasets commonly used in the Algorithmic Fairness literature, using various classification and fairness metrics. The results confirm that C-LRT successfully allows to control and balance accuracy and fairness.


Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family

arXiv.org Artificial Intelligence

We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their size and ease of deployment on constrained infrastructure and higher factuality by design, the models unlock a range of new use cases for generative AI.


Aligning Language Models for Icelandic Legal Text Summarization

arXiv.org Artificial Intelligence

The integration of language models in the legal domain holds considerable promise for streamlining processes and improving efficiency in managing extensive workloads. However, the specialized terminology, nuanced language, and formal style of legal texts can present substantial challenges. This study examines whether preference-based training techniques, specifically Reinforcement Learning from Human Feedback and Direct Preference Optimization, can enhance models' performance in generating Icelandic legal summaries that align with domain-specific language standards and user preferences. We compare models fine-tuned with preference training to those using conventional supervised learning. Results indicate that preference training improves the legal accuracy of generated summaries over standard fine-tuning but does not significantly enhance the overall quality of Icelandic language usage. Discrepancies between automated metrics and human evaluations further underscore the importance of qualitative assessment in developing language models for the legal domain.


NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation

arXiv.org Artificial Intelligence

Large Language Models (LLM) are typically trained on vast amounts of data from various sources. Even when designed modularly (e.g., Mixture-of-Experts), LLMs can leak privacy on their sources. Conversely, training such models in isolation arguably prohibits generalization. Large Language Models have brought much disruption in the field of Artificial Intelligence and have transformed various use-cases, from intelligent assistants (Dong et al., 2023) and code copilots (Chen et al., 2021) to agentic web browsing (Zheng et al., 2024) and enhanced tutoring (Ko-talwar et al., 2024). They have shown great scaling potential, devouring terabytes of raw textual or multi-modal data (Kaplan et al., 2020) without their performance plateauing. As this trend continues, all public resources will eventually be consumed. Therefore, tapping into private data silos will become the next significant source of information (Shumailov et al., 2024; Iacob et al., 2024). This introduces the need to orchestrate model training that is somehow separated per region or source. Maintaining separate models, though, quickly becomes intractable and burdensome. Private organizations can own data they want to use for their custom LLM but not expose it publicly Carlini et al. (2021); OpenAI (2023). For instance, client institutions may wish to train domain-specific Copilots (GitHub, 2024) without leaking proprietary information (Niu et al., 2023) to the public domain. To approach this problem, we draw from Modular Learning (Pfeiffer et al., 2023) for routing knowledge across parts of a neural network and adaptively serve to different domains. While off-the-shelf Mixture-of-Experts (MoE) models (Cai et al., 2024) adopt an architecture where different domains can share common parameters - thus enabling knowledge transfer. However, they can introduce privacy risks (Carlini et al., 2019) exactly because of this sharing. In addition, training an entire MoE model under Differential Privacy (DP) significantly reduces its utility as training a large shared backbone network over multiple domains requires adding large amounts of DP noise.


AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How

arXiv.org Artificial Intelligence

Using LLMs in healthcare, Computer-Supported Cooperative Work, and Social Computing requires the examination of ethical and social norms to ensure safe incorporation into human life. We conducted a mixed-method study, including an online survey with 111 participants and an interview study with 38 experts, to investigate the AI ethics and social norms in ChatGPT as everyday life tools. This study aims to evaluate whether ChatGPT in an empirical context operates following ethics and social norms, which is critical for understanding actions in industrial and academic research and achieving machine ethics. The findings of this study provide initial insights into six important aspects of AI ethics, including bias, trustworthiness, security, toxicology, social norms, and ethical data. Significant obstacles related to transparency and bias in unsupervised data collection methods are identified as ChatGPT's ethical concerns.


RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models

arXiv.org Artificial Intelligence

Efforts to ensure the safety of large language models (LLMs) include safety fine-tuning, evaluation, and red teaming. However, despite the widespread use of the Retrieval-Augmented Generation (RAG) framework, AI safety work focuses on standard LLMs, which means we know little about how RAG use cases change a model's safety profile. We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. We find that RAG can make models less safe and change their safety profile. We explore the causes of this change and find that even combinations of safe models with safe documents can cause unsafe generations. In addition, we evaluate some existing red teaming methods for RAG settings and show that they are less effective than when used for non-RAG settings. Our work highlights the need for safety research and red-teaming methods specifically tailored for RAG LLMs.


LRAGE: Legal Retrieval Augmented Generation Evaluation Tool

arXiv.org Artificial Intelligence

Recently, building retrieval-augmented generation (RAG) systems to enhance the capability of large language models (LLMs) has become a common practice. Especially in the legal domain, previous judicial decisions play a significant role under the doctrine of stare decisis which emphasizes the importance of making decisions based on (retrieved) prior documents. However, the overall performance of RAG system depends on many components: (1) retrieval corpora, (2) retrieval algorithms, (3) rerankers, (4) LLM backbones, and (5) evaluation metrics. Here we propose LRAGE, an open-source tool for holistic evaluation of RAG systems focusing on the legal domain. LRAGE provides GUI and CLI interfaces to facilitate seamless experiments and investigate how changes in the aforementioned five components affect the overall accuracy. We validated LRAGE using multilingual legal benches including Korean (KBL), English (LegalBench), and Chinese (LawBench) by demonstrating how the overall accuracy changes when varying the five components mentioned above. The source code is available at https://github.com/hoorangyee/LRAGE.