Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees
Greenberg, Craig, Hall, Patrick, Jensen, Theodore, Greene, Kristen, Amironesei, Razvan
This paper introduces measurement trees, a novel class of metrics designed to combine various constructs into an interpretable multi-level representation of a measurand. Unlike conventional metrics that yield single values, vectors, surfaces, or categories, measurement trees produce a hierarchical directed graph in which each node summarizes its children through user-defined aggregation methods. In response to recent calls to expand the scope of AI system evaluation, measurement trees enhance metric transparency and facilitate the integration of heterogeneous evidence, such as agentic, business, energy-efficiency, sociotechnical, or security signals. We present definitions and examples, demonstrate practical utility through a large-scale measurement exercise, and provide accompanying open-source Python code. By operationalizing a transparent approach to measurement of complex constructs, this work offers a principled foundation for broader and more interpretable AI evaluation.
- Oceania > Australia > Queensland (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Africa > Eswatini > Manzini > Manzini (0.04)
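The idea of a tree in which each node summarizes its children through a user-defined aggregation can be illustrated with a minimal Python sketch. This is not the paper's released code: the `Node` class, the `render` helper, the example constructs, and all scores below are hypothetical and chosen only to show the mechanism.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical measurement tree (names are illustrative)."""
    name: str
    value: Optional[float] = None  # leaves carry a measured value
    children: List["Node"] = field(default_factory=list)
    aggregate: Callable[[Sequence[float]], float] = mean  # user-defined summary

    def evaluate(self) -> float:
        """Summarize children bottom-up and cache the result on each node."""
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf {self.name!r} has no measured value")
            return self.value
        self.value = self.aggregate([child.evaluate() for child in self.children])
        return self.value


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per node; call after evaluate()."""
    lines = [f"{'  ' * depth}{node.name}: {node.value:.2f}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up scores: accuracy is averaged, security takes the worst case (min).
tree = Node("system quality", children=[
    Node("accuracy", children=[Node("benchmark A", 0.9), Node("benchmark B", 0.7)]),
    Node("security", aggregate=min, children=[
        Node("prompt injection", 0.6),
        Node("data leakage", 0.8),
    ]),
])
root_score = tree.evaluate()
print("\n".join(render(tree)))
```

Because every intermediate node keeps its own value, a reader can trace the root score back to the leaves that produced it, which is the kind of transparency the abstract describes.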
Beyond Explainability: The Case for AI Validation
Feldman, Dalit Ken-Dror, Benoliel, Daniel
Artificial Knowledge (AK) systems are transforming decision-making across critical domains such as healthcare, finance, and criminal justice. However, their growing opacity presents governance challenges that current regulatory approaches, focused predominantly on explainability, fail to address adequately. This article argues for a shift toward validation as a central regulatory pillar. Validation, ensuring the reliability, consistency, and robustness of AI outputs, offers a more practical, scalable, and risk-sensitive alternative to explainability, particularly in high-stakes contexts where interpretability may be technically or economically unfeasible. We introduce a typology based on two axes, validity and explainability, classifying AK systems into four categories and exposing the trade-offs between interpretability and output reliability. Drawing on comparative analysis of regulatory approaches in the EU, US, UK, and China, we show how validation can enhance societal trust, fairness, and safety even where explainability is limited. We propose a forward-looking policy framework centered on pre- and post-deployment validation, third-party auditing, harmonized standards, and liability incentives. This framework balances innovation with accountability and provides a governance roadmap for responsibly integrating opaque, high-performing AK systems into society.
- Asia > China (0.25)
- Asia > Middle East > Israel > Haifa District > Haifa (0.05)
- North America > United States > California (0.05)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance (1.00)
AI Literacy for Legal AI Systems: A practical approach
Legal AI systems are increasingly being adopted by judicial and legal system deployers and providers worldwide to support a range of applications. While they offer potential benefits such as reducing bias, increasing efficiency, and improving accountability, they also pose significant risks, requiring a careful balance between seizing opportunities and ensuring legal and ethical development and deployment. AI literacy, as a legal requirement under the EU AI Act and a critical enabler of ethical AI for deployers and providers, could be a tool to achieve this. The article introduces the term "legal AI systems" and then analyzes the concept of AI literacy and the benefits and risks associated with these systems. This analysis is linked to a broader AI-L concept for organizations that deal with legal AI systems. The outcome of the article, a roadmap questionnaire as a practical tool for developers and providers to assess risks, benefits, and stakeholder concerns, could be useful in meeting societal and regulatory expectations for legal AI.
- North America > United States > Alabama (0.04)
- Europe > Hungary > Csongrád-Csanád County > Szeged (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Law > Criminal Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting (0.93)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Applied AI (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
#AAAI2025 social media round-up: part one
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025) is currently in full swing in Philadelphia. So far, delegates have been treated to tutorials, the first few of the invited talks, and an exciting variety of oral and poster presentations. We take a look at what attendees have been getting up to during the opening days of the event.
Comparative Global AI Regulation: Policy Perspectives from the EU, China, and the US
Chun, Jon, de Witt, Christian Schroeder, Elkins, Katherine
As a powerful and rapidly advancing dual-use technology, AI offers both immense benefits and worrisome risks. In response, governing bodies around the world are developing a range of AI laws and regulatory policies. This paper compares three distinct approaches taken by the EU, China and the US. Within the US, we explore AI regulation at both the federal and state level, with a focus on California's pending Senate Bill 1047. Each regulatory system reflects distinct cultural, political and economic perspectives. Each also highlights differing regional perspectives on regulatory risk-benefit tradeoffs, with divergent judgments on the balance between safety and innovation and between cooperation and competition. Finally, differences between regulatory frameworks reflect contrasting stances on trust in centralized authority versus trust in a more decentralized free market of self-interested stakeholders. Taken together, these varied approaches to AI innovation and regulation influence each other, the broader international community, and the future of AI regulation.
- Europe > France (0.14)
- Asia > Philippines (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Law > Statutes (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Europe Government (1.00)
- Government > Regional Government > Asia Government > China Government (0.68)
Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator
Hokenmaier, W, Jurasek, R, Bowen, E, Granger, R, Odom, D
Why do security cameras, sensors, and Siri use cloud servers instead of on-board computation? The lack of very-low-power, high-performance chips greatly limits the ability to field untethered edge devices. We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with dramatic reduction in energy consumption (> 100X), via many parallel combined processor-memory units, i.e., a drastically non-von-Neumann architecture, allowing very large numbers of independent processing streams without bottlenecks due to typical monolithic memory. The current initial prototype fab arises from a successful co-development effort between algorithm- and software-driven architectural design and VLSI design realities. An innovative communication protocol minimizes power usage, and data transport costs among nodes were vastly reduced by eliminating the address bus, through local target address matching. Throughout the development process, the software and architecture teams were able to innovate alongside the circuit design team's implementation effort. A digital twin of the proposed hardware was developed early on to ensure that the technical implementation met the architectural specifications, and indeed the predicted performance metrics have now been thoroughly verified in real hardware test data. The resulting device is currently being used in a fielded edge sensor application; additional proofs of principle are in progress to demonstrate this extremely low-power, high-performance ASIC device in real-world use.
- Information Technology (0.72)
- Energy > Power Industry (0.34)
EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition
Doulfoukar, Youssef, Mertens, Laurent, Vennekens, Joost
Convolutional Neural Networks are particularly suited for image analysis tasks, such as Image Classification, Object Recognition or Image Segmentation. Like all Artificial Neural Networks, however, they are "black box" models, and suffer from poor explainability. This work is concerned with the specific downstream task of Emotion Recognition from images, and proposes a framework that combines CAM-based techniques with Object Detection on a corpus level to better understand on which image cues a particular model, in our case EmoNet, relies to assign a specific emotion to an image. We demonstrate that the model mostly focuses on human characteristics, but also explore the pronounced effect of specific image modifications.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.06)
- North America > United States > New York > New York County > New York City (0.04)
Time Machine GPT
Drinkall, Felix, Rahimikia, Eghbal, Pierrehumbert, Janet B., Zohren, Stefan
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora, reflecting the lack of datasets with temporal metadata. This approach is not aligned with the evolving nature of language. Conventional methods for creating temporally adapted language models often depend on further pre-training static models on time-specific data. This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT), specifically designed to be nonprognosticative. This ensures they remain uninformed about future factual information and linguistic changes. This strategy is beneficial for understanding language evolution and is of critical importance when applying models in dynamic contexts, such as time-series forecasting, where foresight of future information can prove problematic. We provide access to both the models and training datasets.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Energy (0.47)
- Information Technology (0.46)
- Health & Medicine (0.32)
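One way to picture the point-in-time idea is a training corpus restricted to documents published on or before a cutoff date, so that a model trained on it cannot see later information. The sketch below is an assumption about how such a filter could look, not the TiMaGPT pipeline: the `point_in_time_corpus` function, the `published` field, and the sample documents are all hypothetical.

```python
from datetime import date
from typing import Dict, List


def point_in_time_corpus(docs: List[Dict], cutoff: date) -> List[Dict]:
    """Keep only documents dated on or before the cutoff (hypothetical schema)."""
    return [doc for doc in docs if doc["published"] <= cutoff]


# Made-up documents, used only to exercise the filter.
docs = [
    {"text": "report from early 2019", "published": date(2019, 3, 1)},
    {"text": "report from late 2020", "published": date(2020, 11, 15)},
    {"text": "report from 2022", "published": date(2022, 6, 30)},
]

corpus_2020 = point_in_time_corpus(docs, date(2020, 12, 31))
print([doc["text"] for doc in corpus_2020])
```

Building one such corpus per cutoff year would give a series of training sets, each blind to everything after its own cutoff.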
Dynamic AGV Task Allocation in Intelligent Warehouses
Dehghan, Arash, Cevik, Mucahit, Bodur, Merve
This paper explores the integration of Automated Guided Vehicles (AGVs) in warehouse order picking, a crucial and cost-intensive aspect of warehouse operations. The booming AGV industry, accelerated by the COVID-19 pandemic, is witnessing widespread adoption due to its efficiency, reliability, and cost-effectiveness in automating warehouse tasks. This paper focuses on enhancing the picker-to-parts system, prevalent in small to medium-sized warehouses, through the strategic use of AGVs. We discuss the benefits and applications of AGVs in various warehouse tasks, highlighting their transformative potential in improving operational efficiency. We examine the deployment of AGVs by leading companies in the industry, showcasing their varied functionalities in warehouse management. Addressing the gap in research on optimizing operational performance in hybrid environments where humans and AGVs coexist, our study delves into a dynamic picker-to-parts warehouse scenario. We propose a novel Neural Approximate Dynamic Programming approach for coordinating a mixed team of human and AGV workers, aiming to maximize order throughput and operational efficiency. This involves innovative solutions for non-myopic decision making, order batching, and battery management. We also discuss the integration of advanced robotics technology in automating the complete order-picking process. Through a comprehensive numerical study, our work offers valuable insights for managing a heterogeneous workforce in a hybrid warehouse setting, contributing significantly to the field of warehouse automation and logistics.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Andalusia > Córdoba Province > Córdoba (0.04)
- Europe > Poland (0.04)
- Overview (1.00)
- Research Report > New Finding (0.93)
- Transportation (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
- Health & Medicine > Therapeutic Area > Immunology (0.34)
Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains
Sarvazyan, Areg Mikael, González, José Ángel, Franco-Salvador, Marc, Rangel, Francisco, Chulvi, Berta, Rosso, Paolo
This paper presents an overview of the AuTexTification shared task, part of the IberLEF 2023 workshop (Iberian Languages Evaluation Forum) held within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our AuTexTification 2023 dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles). A total of 114 teams signed up to participate, of which 36 submitted 175 runs and 20 submitted working notes. In this overview, we present the AuTexTification dataset and task, the submitted participating systems, and the results.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Spain > Andalusia > Jaén Province > Jaén (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)