Goto

Collaborating Authors

 Law


Reasoning-Based AI for Startup Evaluation (R.A.I.S.E.): A Memory-Augmented, Multi-Step Decision Framework

arXiv.org Artificial Intelligence

We present a novel framework that bridges the gap between the interpretability of decision trees and the advanced reasoning capabilities of large language models (LLMs) to predict startup success. Our approach leverages chain-of-thought prompting to generate detailed reasoning logs, which are subsequently distilled into structured, human-understandable logical rules. The pipeline integrates multiple enhancements - efficient data ingestion, a two-step refinement process, ensemble candidate sampling, simulated reinforcement learning scoring, and persistent memory - to ensure both stable decision-making and transparent output. Experimental evaluations on curated startup datasets demonstrate that our combined pipeline improves precision by 54% from 0.225 to 0.346 and accuracy by 50% from 0.46 to 0.70 compared to a standalone OpenAI o3 model. Notably, our model achieves over 2x the precision of a random classifier (16%). By combining state-of-the-art AI reasoning with explicit rule-based explanations, our method not only augments traditional decision-making processes but also facilitates expert intervention and continuous policy refinement. This work lays the foundation for the implementation of interpretable LLM-powered decision frameworks in high-stakes investment environments and other domains that require transparent and data-driven insights.


Seeking and leveraging alternative variable dependency concepts in gray-box-elusive bimodal land-use allocation problems

arXiv.org Artificial Intelligence

Solving land-use allocation problems can help us to deal with some of the most urgent global environmental issues. Since these problems are NP-hard, effective optimizers are needed to handle them. The knowledge about variable dependencies allows for proposing such tools. However, in this work, we consider a real-world multi-objective problem for which standard variable dependency discovery techniques are inapplicable. Therefore, using linkage-based variation operators is unreachable. To address this issue, we propose a definition of problem-dedicated variable dependency. On this base, we propose obtaining masks of dependent variables. Using them, we construct three novel crossover operators. The results concerning real-world test cases show that introducing our propositions into two well-known optimizers (NSGA-II, MOEA/D) dedicated to multi-objective optimization significantly improves their effectiveness.


Perceptions of Agentic AI in Organizations: Implications for Responsible AI and ROI

arXiv.org Artificial Intelligence

As artificial intelligence (AI) systems rapidly gain autonomy, the need for robust responsible AI frameworks becomes paramount. This paper investigates how organizations perceive and adapt such frameworks amidst the emerging landscape of increasingly sophisticated agentic AI. Employing an interpretive qualitative approach, the study explores the lived experiences of AI professionals. Findings highlight that the inherent complexity of agentic AI systems and their responsible implementation, rooted in the intricate interconnectedness of responsible AI dimensions and the thematic framework (an analytical structure developed from the data), combined with the novelty of agentic AI, contribute to significant challenges in organizational adaptation, characterized by knowledge gaps, a limited emphasis on stakeholder engagement, and a strong focus on control. These factors, by hindering effective adaptation and implementation, ultimately compromise the potential for responsible AI and the realization of ROI.


A Framework for the Private Governance of Frontier Artificial Intelligence

arXiv.org Artificial Intelligence

This paper presents a proposal for the governance of frontier AI systems through a hybrid public-private system. Private bodies, authorized and overseen by government, provide certifications to developers of frontier AI systems on an opt-in basis. In exchange for opting in, frontier AI firms receive protections from tort liability for customer misuse of their models. Before detailing the proposal, the paper explores more commonly discussed approaches to AI governance, analyzing their strengths and flaws. It also examines the nature of frontier AI governance itself. The paper includes consideration of the political economic, institutional, legal, safety, and other merits and tradeoffs inherent in the governance system it proposes.


Images of AI – between fiction and function

AIHub

In this blog post, Dominik Vrabič Dežman provides a summary of his recent research article, 'Promising the future, encoding the past: AI hype and public media imagery'. Dominik also draws attention to the algorithms which perpetuate the dominance of familiar and sensationalist visuals and calls for movements which reshape media systems to make better images of AI more visible in public discourse. The full paper is published in the AI and Ethics Journal's special edition on'The Ethical Implications of AI Hype, a collection edited by We and AI. AI promises innovation, yet its imagery remains trapped in the past. Deep-blue, sci-fi-inflected visuals have flooded public media, saturating our collective imagination with glowing, retro-futuristic interfaces and humanoid robots.


Brazilian butt lift ads banned by UK regulator

BBC News

The advertising watchdog says it has been using AI to proactively search for online ads that might break the rules. Three of the clinics - Beautyjenics, Bomb Doll Aesthetics and Ccskinlondondubai -did not respond to the ASA's inquiries. Rejuvenate Clinics said it has reviewed ASA guidance and will remove all references to time-limited offers and state in ads that the surgery is carried out by a medical professional with ultrasound, to minimise risks and enhance safety. EME Aesthetics said all its clients are given a full consultation and are under no obligation to book any procedures, and it therefore considers that its ad had not pressured consumers or trivialised the risks of cosmetic procedures. Dr Ducu said it will ensure it follows the ASA's rules and guidance, that the time-limited Black Friday offer was intended to provide consumers with an opportunity to access the company's services at a discounted rate, and it always encourages consumers to make informed decisions without pressure.


Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications

arXiv.org Machine Learning

In many applications, especially those involving prediction, models may yield near-optimal performance yet significantly disagree on individual-level outcomes. This phenomenon, known as predictive multiplicity, has been formally defined in binary, probabilistic, and multi-target classification, and undermines the reliability of predictive systems. However, its implications remain unexplored in the context of survival analysis, which involves estimating the time until a failure or similar event while properly handling censored data. We frame predictive multiplicity as a critical concern in survival-based models and introduce formal measures -- ambiguity, discrepancy, and obscurity -- to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling, where precise individual risk estimates are essential. Understanding and reporting predictive multiplicity helps build trust in models deployed in high-stakes environments. We apply our methodology to benchmark datasets from predictive maintenance, extending the notion of multiplicity to survival models. Our findings show that ambiguity steadily increases, reaching up to 40-45% of observations; discrepancy is lower but exhibits a similar trend; and obscurity remains mild and concentrated in a few models. These results demonstrate that multiple accurate survival models may yield conflicting estimations of failure risk and degradation progression for the same equipment. This highlights the need to explicitly measure and communicate predictive multiplicity to ensure reliable decision-making in process health management.


The Robotability Score: Enabling Harmonious Robot Navigation on Urban Streets

arXiv.org Artificial Intelligence

This paper introduces the Robotability Score ($R$), a novel metric that quantifies the suitability of urban environments for autonomous robot navigation. Through expert interviews and surveys, we identify and weigh key features contributing to R for wheeled robots on urban streets. Our findings reveal that pedestrian density, crowd dynamics and pedestrian flow are the most critical factors, collectively accounting for 28% of the total score. Computing robotability across New York City yields significant variation; the area of highest R is 3.0 times more "robotable" than the area of lowest R. Deployments of a physical robot on high and low robotability areas show the adequacy of the score in anticipating the ease of robot navigation. This new framework for evaluating urban landscapes aims to reduce uncertainty in robot deployment while respecting established mobility patterns and urban planning principles, contributing to the discourse on harmonious human-robot environments.


Poly-Vector Retrieval: Reference and Content Embeddings for Legal Documents

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) has emerged as an effective paradigm for generating contextually accurate answers by integrating Large Language Models (LLMs) with retrieval mechanisms. However, in legal contexts, users frequently reference norms by their labels or nicknames (e.g., Article 5 of the Constitution or Consumer Defense Code (CDC)), rather than by their content, posing challenges for traditional RAG approaches that rely solely on semantic embeddings of text. Furthermore, legal texts themselves heavily rely on explicit cross-references (e.g., "pursuant to Article 34") that function as pointers. Both scenarios pose challenges for traditional RAG approaches that rely solely on semantic embeddings of text, often failing to retrieve the necessary referenced content. This paper introduces Poly-Vector Retrieval, a method assigning multiple distinct embeddings to each legal provision: one embedding captures the content (the full text), another captures the label (the identifier or proper name), and optionally additional embeddings capture alternative denominations. Inspired by Frege's distinction between Sense and Reference, this poly-vector retrieval approach treats labels, identifiers and reference markers as rigid designators and content embeddings as carriers of semantic substance. Experiments on the Brazilian Federal Constitution demonstrate that Poly-Vector Retrieval significantly improves retrieval accuracy for label-centric queries and potential to resolve internal and external cross-references, without compromising performance on purely semantic queries. The study discusses philosophical and practical implications of explicitly separating reference from content in vector embeddings and proposes future research directions for applying this approach to broader legal datasets and other domains characterized by explicit reference identifiers.


Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA

arXiv.org Artificial Intelligence

Checkboxes are critical in real-world document processing where the presence or absence of ticks directly informs data extraction and decision-making processes. Yet, despite the strong performance of Large Vision and Language Models across a wide range of tasks, they struggle with interpreting checkable content. This challenge becomes particularly pressing in industries where a single overlooked checkbox may lead to costly regulatory or contractual oversights. To address this gap, we introduce the CheckboxQA dataset, a targeted resource designed to evaluate and improve model performance on checkbox-related tasks. It reveals the limitations of current models and serves as a valuable tool for advancing document comprehension systems, with significant implications for applications in sectors such as legal tech and finance. The dataset is publicly available at: https://github.com/Snowflake-Labs/CheckboxQA