A Position Paper on the Automatic Generation of Machine Learning Leaderboards

Timmer, Roelien C, Hou, Yufang, Wan, Stephen

arXiv.org Artificial Intelligence

An important task in machine learning (ML) research is comparing prior work, which is often done via ML leaderboards: tabular overviews of experiments run under comparable conditions (e.g., same task, dataset, and metric). However, the growing volume of literature makes these leaderboards hard to create and maintain. To ease this burden, researchers have developed methods that extract leaderboard entries from research papers for automated leaderboard curation. Yet prior work varies in problem framing, which complicates comparisons and limits real-world applicability. In this position paper, we present the first overview of Automatic Leaderboard Generation (ALG) research, identifying fundamental differences in assumptions, scope, and output formats. We propose a unified conceptual framework for ALG to standardise how the task is defined. We offer ALG benchmarking guidelines, including recommendations for datasets and metrics that promote fair, reproducible evaluation. Lastly, we outline challenges and new directions for ALG, such as advocating for broader coverage by including all reported results and richer metadata.
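
The notion of a leaderboard as results grouped by comparable conditions can be illustrated with a minimal sketch. The `Result` record, the `build_leaderboards` helper, and all task, dataset, and model names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One reported result (hypothetical schema for illustration)."""
    task: str
    dataset: str
    metric: str
    model: str
    score: float


def build_leaderboards(results, higher_is_better=True):
    """Group results by (task, dataset, metric) and rank each group by score."""
    boards = defaultdict(list)
    for r in results:
        boards[(r.task, r.dataset, r.metric)].append((r.model, r.score))
    return {
        key: sorted(rows, key=lambda row: row[1], reverse=higher_is_better)
        for key, rows in boards.items()
    }


# Made-up entries; only results sharing all three conditions are compared.
results = [
    Result("QA", "ToyQA", "F1", "model-a", 71.0),
    Result("QA", "ToyQA", "F1", "model-b", 78.5),
    Result("NER", "ToyNER", "F1", "model-a", 90.1),
]
leaderboards = build_leaderboards(results)
```

Keying on the full (task, dataset, metric) triple keeps results from different experimental conditions out of the same ranking, which is the comparability requirement the abstract describes.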


Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs

de Martim, Hudson

arXiv.org Artificial Intelligence

Representing the temporal evolution of legal norms is a critical challenge for automated processing. While foundational frameworks exist, they lack a formal pattern for granular, component-level versioning, hindering the deterministic point-in-time reconstruction of legal texts required by reliable AI applications. This paper proposes a structured, temporal modeling pattern grounded in the LRMoo ontology. Our approach models a norm's evolution as a diachronic chain of versioned F1 Works, distinguishing between language-agnostic Temporal Versions (TV), each being a distinct Work, and their monolingual Language Versions (LV), modeled as F2 Expressions. The legislative amendment process is formalized through event-centric modeling, allowing changes to be traced precisely. Using the Brazilian Constitution as a case study, we demonstrate that our architecture enables the exact reconstruction of any part of a legal text as it existed on a specific date. This provides a verifiable semantic backbone for legal knowledge graphs, offering a deterministic foundation for trustworthy legal AI.
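
Point-in-time reconstruction over versioned components can be sketched in a few lines. This is a simplified illustration and not the paper's LRMoo model: it assumes each component version carries a plain validity interval, and the `ComponentVersion` class, the `version_at` helper, the identifiers, texts, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ComponentVersion:
    """One version of one component of a norm (hypothetical schema)."""
    component_id: str
    text: str
    valid_from: date                 # inclusive
    valid_to: Optional[date] = None  # exclusive; None means still in force


def version_at(versions, component_id, when):
    """Return the version of a component in force on `when`, or None."""
    for v in versions:
        if v.component_id != component_id:
            continue
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
            return v
    return None


# Made-up amendment history for a single article.
versions = [
    ComponentVersion("art1", "Original wording.", date(1990, 1, 1), date(2000, 1, 1)),
    ComponentVersion("art1", "Amended wording.", date(2000, 1, 1)),
]

print(version_at(versions, "art1", date(1995, 6, 1)).text)
```

Because the intervals are half-open and do not overlap, at most one version matches any date, which is what makes the lookup deterministic.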


Appendix Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation A Code

Neural Information Processing Systems

In Figure 2, we examine the probability of acquiring a '7' as a function of the number of acquired We see that XWED initially focuses on 7s but then diversifies. The XWED behavior is preferable: we are initially unsure about the loss of these points, but once the loss is well characterized for the 7s we should explore other areas as well. B.2 Constant π Fails for Distribution Shift. Figure B.1 (a) shows that, for LURE suffered high variance in Figure 3. In Figure B.1 (b), we observe that ASE continues to Figure B.2 demonstrates that ASEs continue to outperform all other baselines for the task of This result highlights the importance of the adaptive nature of both ASE- and LURE-based active testing. Figure B.2: Variant of the experiments of 7.3 where we estimate the accuracy of the main model. We here investigate a variation of the experiments in 7.3: reducing the size of the training set to Despite this, Figure B.3 demonstrates that ASEs continue to outperform all baselines.


RegionE: Adaptive Region-Aware Generation for Efficient Image Editing

Chen, Pengtao, Zeng, Xianfang, Zhao, Maosen, Shen, Mingzhu, Ye, Peng, Xiang, Bangyin, Wang, Zhibo, Cheng, Wei, Yu, Gang, Chen, Tao

arXiv.org Artificial Intelligence

Recently, instruction-based image editing (IIE) has received widespread attention. In practice, IIE often modifies only specific regions of an image, while the remaining areas largely remain unchanged. Although these two types of regions differ significantly in generation difficulty and computational redundancy, existing IIE models do not account for this distinction, instead applying a uniform generation process across the entire image. This motivates us to propose RegionE, an adaptive, region-aware generation framework that accelerates IIE tasks without additional training. Specifically, the RegionE framework consists of three main components: 1) Adaptive Region Partition. We observed that the trajectory of unedited regions is straight, allowing for multi-step denoised predictions to be inferred in a single step. Therefore, in the early denoising stages, we partition the image into edited and unedited regions based on the difference between the final estimated result and the reference image. 2) Region-Aware Generation. After distinguishing the regions, we replace multi-step denoising with one-step prediction for unedited areas. For edited regions, the trajectory is curved, requiring local iterative denoising. To improve the efficiency and quality of local iterative generation, we propose the Region-Instruction KV Cache, which reduces computational cost while incorporating global information. 3) Adaptive Velocity Decay Cache. Observing that adjacent timesteps in edited regions exhibit strong velocity similarity, we further propose an adaptive velocity decay cache to accelerate the local denoising process. We applied RegionE to state-of-the-art IIE base models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit. RegionE achieved acceleration factors of 2.57, 2.41, and 2.06. Evaluations by GPT-4o confirmed that semantic and perceptual fidelity were well preserved.
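
The claim that a straight trajectory lets many denoising steps collapse into one can be checked numerically. The sketch below is not the RegionE implementation: it assumes a rectified-flow style parameterization x_t = (1 - t) * x_0 + t * noise with constant velocity v = noise - x_0, and the array shapes, the `euler_denoise` helper, and the 0.5 threshold are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))      # hypothetical clean latent
noise = rng.normal(size=(4, 4))   # hypothetical noise sample
t = 0.8

# Assumed straight path between x0 and noise, with constant velocity.
x_t = (1 - t) * x0 + t * noise
v = noise - x0


def euler_denoise(x, velocity, t_start, steps):
    """Integrate from t_start down to 0 with `steps` equal Euler steps."""
    dt = t_start / steps
    for _ in range(steps):
        x = x - dt * velocity
    return x


one_step = euler_denoise(x_t, v, t, steps=1)
many_steps = euler_denoise(x_t, v, t, steps=20)

# Partition sketch: pixels where the estimated result departs from a
# reference image are treated as edited (the threshold is an assumption).
reference = one_step.copy()
reference[:2, :2] += 1.0
edited_mask = np.abs(one_step - reference) > 0.5
```

With a constant velocity, one step and twenty steps land on the same point, so the extra steps are redundant. When the velocity changes along the path, as the abstract says it does for edited regions, this shortcut no longer holds and iterative denoising is needed.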


Deep Literature Survey Automation with an Iterative Workflow

Zhang, Hongbo, Cui, Han, Wang, Yidong, Tian, Yijian, Guo, Qi, Wang, Cunxiang, Wu, Jian, Song, Chiyu, Zhang, Yue

arXiv.org Artificial Intelligence

Automatic literature survey generation has attracted increasing attention, yet most existing systems follow a one-shot paradigm, where a large set of papers is retrieved at once and a static outline is generated before drafting. This design often leads to noisy retrieval, fragmented structures, and context overload, ultimately limiting survey quality. Inspired by the iterative reading process of human researchers, we propose IterSurvey, a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline to ensure both exploration and coherence. To provide faithful paper-level grounding, we design paper cards that distill each paper into its contributions, methods, and findings, and introduce a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and tables. Experiments on both established and emerging topics show that IterSurvey substantially outperforms state-of-the-art baselines in content coverage, structural coherence, and citation quality, while producing more accessible and better-organized surveys. To provide a more reliable assessment of such improvements, we further introduce Survey-Arena, a pairwise benchmark that complements absolute scoring and more clearly positions machine-generated surveys relative to human-written ones. The code is available at https://github.com/HancCui/IterSurvey_Autosurveyv2.


An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach

de Martim, Hudson

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms. We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for (i) point-in-time retrieval, (ii) hierarchical impact analysis, and (iii) auditable provenance reconstruction. Through a case study on the Brazilian Constitution, we demonstrate how this approach provides a verifiable, temporally correct substrate for LLMs, enabling higher-order analytical capabilities while drastically reducing the risk of factual errors. The result is a practical framework for building more trustworthy and explainable legal AI systems.


Comparative Analysis of Document-Level Embedding Methods for Similarity Scoring on Shakespeare Sonnets and Taylor Swift Lyrics

Kramer, Klara

arXiv.org Artificial Intelligence

Document similarity assessment plays an important role in various natural language processing (NLP) applications, such as information retrieval, plagiarism detection, recommendation systems, and question answering [11, 19]. For instance, in recommendation systems, document similarity helps personalise suggestions by finding content that closely matches user preferences. These tasks rely on accurate measurements of how similar documents are in structure, content, and meaning, which in turn depend on how each document is represented computationally. This representation is usually a vector obtained via a document embedding method. Various methodologies can be employed to obtain document-level embeddings, and the choice of method directly impacts the accuracy and usefulness of the resulting similarity scores [14, 19].
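
As a minimal illustration of scoring similarity between vector representations, the sketch below uses word-count vectors as a stand-in embedding and compares them with cosine similarity. Both choices are assumptions made for illustration; the paper's own embedding methods are not shown in this excerpt, and the `embed` and `cosine` helpers and the example strings are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a sparse word-count vector (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = embed("shall I compare thee to a summer day")
doc_b = embed("a summer day by the sea")
doc_c = embed("quarterly revenue report")

print(cosine(doc_a, doc_b), cosine(doc_a, doc_c))
```

Swapping `embed` for a different embedding method changes the scores while the comparison step stays the same, which is why the choice of representation matters for the result.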


Designing a Mobile Social and Vocational Reintegration Assistant for Burn-out Outpatient Treatment

Gebhard, Patrick, Schneeberger, Tanja, Dietz, Michael, André, Elisabeth, Bajwa, Nida ul Habib

arXiv.org Artificial Intelligence

Using Social Agents as health-care assistants or trainers is one focus area of IVA research. While their use as physical health-care agents is well established, their employment in the field of psychotherapeutic care comes with daunting challenges. This paper presents our mobile Social Agent EmmA in the role of a vocational reintegration assistant for burn-out outpatient treatment. We follow a typical participatory design approach that includes experts and patients in order to address requirements from both sides. Since the success of such treatments is related to a patient's emotion regulation capabilities, we employ real-time social signal interpretation together with a computational simulation of emotion regulation that influences the agent's social behavior as well as the situational selection of verbal treatment strategies. Overall, our interdisciplinary approach enables a novel integrative concept for Social Agents as assistants for burn-out patients.


A Linked Aggregate Code for Processing Faces (Revised Version)

Lyons, Michael, Morikawa, Kazunori

arXiv.org Artificial Intelligence

A model of face representation, inspired by the biology of the visual system, is compared to experimental data on the perception of facial similarity. The face representation model uses aggregate primary visual cortex (V1) cell responses topographically linked to a grid covering the face, allowing comparison of shape and texture at corresponding points in two facial images. When a set of relatively similar faces was used as stimuli, this Linked Aggregate Code (LAC) predicted human performance in similarity judgement experiments. When faces of perceivable categories were used, dimensions such as apparent sex and race emerged from the LAC model without training. The dimensional structure of the LAC similarity measure for the mixed category task displayed some psychologically plausible features but also highlighted differences between the model and the human similarity judgements. The human judgements exhibited a racial perceptual bias that was not shared by the LAC model. The results suggest that the LAC-based similarity measure may offer a fertile starting point for further modelling studies of face representation in higher visual areas, including studies of the development of biases in face perception.