Goto

Collaborating Authors

 Law


Explaining Categorical Feature Interactions Using Graph Covariance and LLMs

arXiv.org Machine Learning

Modern datasets often consist of numerous samples with abundant features and associated timestamps. Analyzing such datasets to uncover underlying events typically requires complex statistical methods and substantial domain expertise. A notable example, and the primary data focus of this paper, is the global synthetic dataset from the Counter Trafficking Data Collaborative (CTDC) -- a global hub of human trafficking data containing over 200,000 anonymized records spanning from 2002 to 2022, with numerous categorical features for each record. In this paper, we propose a fast and scalable method for analyzing and extracting significant categorical feature interactions, and querying large language models (LLMs) to generate data-driven insights that explain these interactions. Our approach begins with a binarization step for categorical features using one-hot encoding, followed by the computation of graph covariance at each time. This graph covariance quantifies temporal changes in dependence structures within categorical data and is established as a consistent dependence measure under the Bernoulli distribution. We use this measure to identify significant feature pairs, such as those with the most frequent trends over time or those exhibiting sudden spikes in dependence at specific moments. These extracted feature pairs, along with their timestamps, are subsequently passed to an LLM tasked with generating potential explanations of the underlying events driving these dependence changes. The effectiveness of our method is demonstrated through extensive simulations, and its application to the CTDC dataset reveals meaningful feature pairs and potential data stories underlying the observed feature interactions.


Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

arXiv.org Artificial Intelligence

We aim to redefine robust ego-motion estimation and photorealistic 3D reconstruction by addressing a critical limitation: the reliance on noise-free data in existing models. While such sanitized conditions simplify evaluation, they fail to capture the unpredictable, noisy complexities of real-world environments. Dynamic motion, sensor imperfections, and synchronization perturbations lead to sharp performance declines when these models are deployed in practice, revealing an urgent need for frameworks that embrace and excel under real-world noise. To bridge this gap, we tackle three core challenges: scalable data generation, comprehensive benchmarking, and model robustness enhancement. First, we introduce a scalable noisy data synthesis pipeline that generates diverse datasets simulating complex motion, sensor imperfections, and synchronization errors. Second, we leverage this pipeline to create Robust-Ego3D, a benchmark rigorously designed to expose noise-induced performance degradation, highlighting the limitations of current learning-based methods in ego-motion accuracy and 3D reconstruction quality. Third, we propose Correspondence-guided Gaussian Splatting (CorrGS), a novel test-time adaptation method that progressively refines an internal clean 3D representation by aligning noisy observations with rendered RGB-D frames from clean 3D map, enhancing geometric alignment and appearance restoration through visual correspondence. Extensive experiments on synthetic and real-world data demonstrate that CorrGS consistently outperforms prior state-of-the-art methods, particularly in scenarios involving rapid motion and dynamic illumination.


PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures

arXiv.org Artificial Intelligence

Writing comprehensive and accurate descriptions of technical drawings in patent documents is crucial to effective knowledge sharing and enabling the replication and protection of intellectual property. However, automation of this task has been largely overlooked by the research community. To this end, we introduce PatentDesc-355K, a novel large-scale dataset containing ~355K patent figures along with their brief and detailed textual descriptions extracted from more than 60K US patent documents. In addition, we propose PatentLMM - a novel multimodal large language model specifically tailored to generate high-quality descriptions of patent figures. Our proposed PatentLMM comprises two key components: (i) PatentMME, a specialized multimodal vision encoder that captures the unique structural elements of patent figures, and (ii) PatentLLaMA, a domain-adapted version of LLaMA fine-tuned on a large collection of patents. Extensive experiments demonstrate that training a vision encoder specifically designed for patent figures significantly boosts the performance, generating coherent descriptions compared to fine-tuning similar-sized off-the-shelf multimodal models. PatentDesc-355K and PatentLMM pave the way for automating the understanding of patent figures, enabling efficient knowledge sharing and faster drafting of patent documents. We make the code and data publicly available.


Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective

arXiv.org Machine Learning

Training machine learning models for fair decisions faces two key challenges: The \emph{fairness-accuracy trade-off} results from enforcing fairness which weakens its predictive performance in contrast to an unconstrained model. The incompatibility of different fairness metrics poses another trade-off -- also known as the \emph{impossibility theorem}. Recent work identifies the bias within the observed data as a possible root cause and shows that fairness and predictive performance are in fact in accord when predictive performance is measured on unbiased data. We offer a causal explanation for these findings using the framework of the FiND (fictitious and normatively desired) world, a "fair" world, where protected attributes have no causal effects on the target variable. We show theoretically that (i) classical fairness metrics deemed to be incompatible are naturally satisfied in the FiND world, while (ii) fairness aligns with high predictive performance. We extend our analysis by suggesting how one can benefit from these theoretical insights in practice, using causal pre-processing methods that approximate the FiND world. Additionally, we propose a method for evaluating the approximation of the FiND world via pre-processing in practical use cases where we do not have access to the FiND world. In simulations and empirical studies, we demonstrate that these pre-processing methods are successful in approximating the FiND world and resolve both trade-offs. Our results provide actionable solutions for practitioners to achieve fairness and high predictive performance simultaneously.


Envisioning Stakeholder-Action Pairs to Mitigate Negative Impacts of AI: A Participatory Approach to Inform Policy Making

arXiv.org Artificial Intelligence

The potential for negative impacts of AI has rapidly become more pervasive around the world, and this has intensified a need for responsible AI governance. While many regulatory bodies endorse risk-based approaches and a multitude of risk mitigation practices are proposed by companies and academic scholars, these approaches are commonly expert-centered and thus lack the inclusion of a significant group of stakeholders. Ensuring that AI policies align with democratic expectations requires methods that prioritize the voices and needs of those impacted. In this work we develop a participative and forward-looking approach to inform policy-makers and academics that grounds the needs of lay stakeholders at the forefront and enriches the development of risk mitigation strategies. Our approach (1) maps potential mitigation and prevention strategies of negative AI impacts that assign responsibility to various stakeholders, (2) explores the importance and prioritization thereof in the eyes of laypeople, and (3) presents these insights in policy fact sheets, i.e., a digestible format for informing policy processes. We emphasize that this approach is not targeted towards replacing policy-makers; rather our aim is to present an informative method that enriches mitigation strategies and enables a more participatory approach to policy development.


Unlocking the Black Box: Analysing the EU Artificial Intelligence Act's Framework for Explainability in AI

arXiv.org Artificial Intelligence

Published in Law, Innovation and Technology. Published by Taylor & Francis. This AAM (author accepted manuscript/ pre - print) is provided for your own personal use only. It may not be used for resale, reprinting, systematic distribution, emailing, or for any other commercial purpose without the permission of the publisher. Abstract: The lack of explainability of Artificial Intelligence (AI) is one of the first obstacles that the industry and regulators must overcome to mitigate the risks associated with the technology . The need for'eXplainable AI' (XAI) is evident in fields where accountability, ethics and fairness are critical, such as healthcare, credit scoring, policing and the criminal justice system. At the EU level, the notion of explainability is one of the fund amental principles that underpin the AI Act, though the exact XAI techn iques and requirements are still to be determined and tested in practice. This paper explores various approaches and techniques that promise to advance XAI, as well as the challenges of implementing the principle of explainability in AI governance and poli cies. Finally, the paper examines the integration of XAI into EU law, emphasising the issues of standard setting, oversight, and enforcement. Jean Monnet Chair and UNESCO Chair, Associate Professor of International and EU Law, NUP Cyprus, Director of the Jean Monnet Centre of Excellence AI - 2 - TRACE - CRIME (EU - funded), email: g.pavlidis@nup.ac.cy 1. Artificial intelligence (AI) has emerged as a fascinating and influential force in today's technological and business worlds. AI has already started to streamline mundane tasks, advance critical domains of scientific research and disrupt professions and in dustries.


OpenAI Goes MAGA

The Atlantic - Technology

Things were not looking great for OpenAI at the end of last year. The company had been struggling with major delays on its long-awaited GPT-5 and hemorrhaging key talent--notably, Chief Scientist Ilya Sutskever, Chief Technology Officer Mira Murati, and Alec Radford, the researcher who'd set the company on the path of developing GPTs in the first place. Several people who left either joined OpenAI competitors or launched new ones. The start-up's relationship with Microsoft, its biggest backer and a crucial provider of the computing infrastructure needed to train and deploy its AI models, was being investigated by the Federal Trade Commission. And then there was Elon Musk.


Machine Learning-Driven Convergence Analysis in Multijurisdictional Compliance Using BERT and K-Means Clustering

arXiv.org Artificial Intelligence

Digital data continues to grow, there has been a shift towards using effective regulatory mechanisms to safeguard personal information. The CCPA of California and the General Data Protection Regulation (GDPR) of the European Union are two of the most important privacy laws. The regulation is intended to safeguard consumer privacy, but it varies greatly in scope, definitions, and methods of enforcement. This paper presents a fresh approach to adaptive compliance, using machine learning and emphasizing natural language processing (NLP) as the primary focus of comparison between the GDPR and CCPA. Using NLP, this study compares various regulations to identify areas where they overlap or diverge. This includes the "right to be forgotten" provision in the GDPR and the "opt-out of sale" provision under CCPA. International companies can learn valuable lessons from this report, as it outlines strategies for better enforcement of laws across different nations. Additionally, the paper discusses the challenges of utilizing NLP in legal literature and proposes methods to enhance the model-ability of machine learning models for studying regulations. The study's objective is to "bridge the gap between legal knowledge and technical expertise" by developing regulatory compliance strategies that are more efficient in operation and more effective in data protection.


Identifying relevant indicators for monitoring a National Artificial Intelligence Strategy

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has been one of the main drivers for the development of cutting-edge technologies that are impacting society at different levels [1-3]. To harness the benefits of AI, while mitigating the risks, governments are developing National Strategies, seeking geopolitical protagonism and leveraging economic, social and cultural progress [4]. Launched in 2017, the Pan-Canadian Artificial Intelligence Strategy [5] was the first national strategy with the goal of guiding the priorities of AI policy at the country level [6]. Finland also developed its national AI strategy in 2017, closely followed by Japan, France, Germany, and the United Kingdom in 2018.


CoPERLex: Content Planning with Event-based Representations for Legal Case Summarization

arXiv.org Artificial Intelligence

Legal professionals often struggle with lengthy judgments and require efficient summarization for quick comprehension. To address this challenge, we investigate the need for structured planning in legal case summarization, particularly through event-centric representations that reflect the narrative nature of legal case documents. We propose our framework, CoPERLex, which operates in three stages: first, it performs content selection to identify crucial information from the judgment; second, the selected content is utilized to generate intermediate plans through event-centric representations modeled as Subject-Verb-Object tuples; and finally, it generates coherent summaries based on both the content and the structured plan. Our experiments on four legal summarization datasets demonstrate the effectiveness of integrating content selection and planning components, highlighting the advantages of event-centric plans over traditional entity-centric approaches in the context of legal judgements.