Gradient-Guided Exploration of Generative Model's Latent Space for Controlled Iris Image Augmentations

arXiv.org Artificial Intelligence

Developing reliable iris recognition and presentation attack detection methods requires diverse datasets that capture realistic variations in iris features and a wide spectrum of anomalies. Because of the rich texture of iris images, which spans a wide range of spatial frequencies, synthesizing same-identity iris images while controlling specific attributes remains challenging. In this work, we introduce a new iris image augmentation strategy that traverses a generative model's latent space toward latent codes that represent same-identity samples but with some desired iris image properties manipulated. The latent space traversal is guided by a gradient of specific geometrical, textural, or quality-related iris image features (e.g., sharpness, pupil size, iris size, or pupil-to-iris ratio) and preserves the identity represented by the image being manipulated. The proposed approach can be easily extended to manipulate any attribute for which a differentiable loss term can be formulated. Additionally, our approach can start from either random images generated by a pre-trained GAN model or real-world iris images. We can utilize GAN inversion to project any given iris image into the latent space and obtain its corresponding latent code.
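The idea of traversing a latent space along the gradient of an attribute loss while penalizing identity drift can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `generate`, `attribute`, and `identity` are hypothetical toy stand-ins for a GAN generator, a differentiable image attribute (such as sharpness), and an identity embedding, and the gradient is taken by finite differences rather than backpropagation.

```python
# Toy sketch of gradient-guided latent traversal.
# All functions are hypothetical stand-ins, not a real GAN or iris matcher.

def generate(z):
    """Stand-in generator: maps a 2-D latent code to a 3-value 'image'."""
    return [z[0] + 0.5 * z[1], z[0] - z[1], 2.0 * z[1]]

def attribute(img):
    """Stand-in differentiable attribute (e.g. sharpness): mean intensity."""
    return sum(img) / len(img)

def identity(img):
    """Stand-in identity embedding: a contrast between two image values."""
    return img[0] - img[2]

def loss(z, z_start, target, lam=1.0):
    """Attribute term pulls toward the target; identity term penalizes drift."""
    img, ref = generate(z), generate(z_start)
    attr_term = (attribute(img) - target) ** 2
    id_term = (identity(img) - identity(ref)) ** 2
    return attr_term + lam * id_term

def grad(f, z, eps=1e-5):
    """Central finite-difference gradient of f at z."""
    g = []
    for i in range(len(z)):
        hi, lo = list(z), list(z)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def traverse(z_start, target, steps=300, lr=0.1):
    """Plain gradient descent on the latent code, starting from z_start."""
    z = list(z_start)
    for _ in range(steps):
        g = grad(lambda v: loss(v, z_start, target), z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z

z0 = [0.2, -0.1]               # latent code of the image to manipulate
z1 = traverse(z0, target=1.0)  # latent code with the attribute moved to 1.0
```

In this toy setting the attribute and identity functions depend on different directions of the latent code, so the traversal can reach the target attribute value while leaving the identity value essentially unchanged. With a real generator the two terms generally compete, and the weight `lam` (an assumed hyperparameter) would trade one against the other.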


Element2Vec: Build Chemical Element Representation from Text for Property Prediction

arXiv.org Artificial Intelligence

Accurate property data for chemical elements is crucial for materials design and manufacturing, but many properties are difficult to measure directly due to equipment constraints. While traditional methods use the properties of other elements or related properties for prediction via numerical analyses, they often fail to model complex relationships. After all, not all characteristics can be represented as scalars. Recent efforts have been made to explore advanced AI tools such as language models for property estimation, but they still suffer from hallucinations and a lack of interpretability. In this paper, we investigate Element2Vec to effectively represent chemical elements from natural languages to support research in the natural sciences. Given the text parsed from Wikipedia pages, we use language models to generate both a single general-purpose embedding (Global) and a set of attribute-highlighted vectors (Local). Beyond the complicated relationships across elements, computational challenges also arise from 1) the discrepancy in text distribution between common descriptions and specialized scientific texts, and 2) the extremely limited data, i.e., with only 118 known elements, data for specific properties is often highly sparse and incomplete. Thus, we also design a test-time training method based on self-attention to clearly mitigate the prediction error caused by vanilla regression. We hope this work could pave the way for advancing AI-driven discovery in materials science.



Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study

arXiv.org Artificial Intelligence

With the advent of large language models (LLMs), software engineering agents (SWE agents) have emerged as a powerful paradigm for automating a range of software tasks -- from code generation and repair to test case synthesis. These agents operate autonomously by interpreting user input and responding to environmental feedback. While various agent architectures have demonstrated strong empirical performance, the internal decision-making workflows that drive their behavior remain poorly understood. Deeper insight into these workflows holds promise for improving both agent reliability and efficiency. In this work, we present the first systematic study of SWE agent behavior through the lens of execution traces. Our contributions are as follows: (1) we propose the first taxonomy of decision-making pathways across five representative agents; (2) using this taxonomy, we identify three core components essential to agent success -- bug localization, patch generation, and reproduction test generation -- and study each in depth; (3) we study the impact of test generation on successful patch production and analyze strategies that can lead to successful test generation; (4) we further conduct the first large-scale code clone analysis comparing agent-generated and developer-written patches and provide a qualitative study revealing structural and stylistic differences in patch content. Together, these findings offer novel insights into agent design and open avenues for building agents that are both more effective and more aligned with human development practices.


WHERE and WHICH: Iterative Debate for Biomedical Synthetic Data Augmentation

arXiv.org Artificial Intelligence

In Biomedical Natural Language Processing (BioNLP) tasks, such as Relation Extraction, Named Entity Recognition, and Text Classification, the scarcity of high-quality data remains a significant challenge. This limitation hinders large language models from correctly understanding relationships between biological entities, such as molecules and diseases, or drug interactions, and can further result in misinterpretation of biomedical documents. To address this issue, current approaches generally adopt Synthetic Data Augmentation, which involves similarity computation followed by word replacement, but these methods usually generate counterfactual data. As a result, they disrupt meaningful word sets or produce sentences with meanings that deviate substantially from the original context, rendering them ineffective in improving model performance. To this end, this paper proposes a rationale-based synthetic data augmentation method dedicated to the biomedical domain. Beyond naive lexical similarity, a specific bio-relation similarity is measured to ensure that each augmented instance remains strongly correlated with the bio-relation, instead of simply increasing the diversity of the augmented data. Moreover, a multi-agent reflection mechanism helps the model iteratively distinguish different usages of similar entities and thereby avoid the mis-replacement trap. We evaluate our method on the BLURB and BigBIO benchmarks, which include 9 common datasets spanning four major BioNLP tasks. Our experimental results demonstrate consistent performance improvements across all tasks, highlighting the effectiveness of our approach in addressing the challenges associated with data scarcity and enhancing the overall performance of biomedical NLP models.


Palette of Language Models: A Solver for Controlled Text Generation

arXiv.org Artificial Intelligence

Recent advancements in large language models have revolutionized text generation with their remarkable capabilities. These models can produce controlled texts that closely adhere to specific requirements when prompted appropriately. However, designing an optimal prompt to control multiple attributes simultaneously can be challenging. A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. This method has been adapted for the single-attribute control scenario and is termed the Palette of Language Models due to its theoretical linkage between attribute strength and generation style, akin to blending colors on an artist's palette. Moreover, positive correlation and attribute enhancement are advanced as theoretical properties to guide a rational combination strategy design. We conduct experiments on both single-control and multiple-control settings and achieve superior results.
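The "common approach" mentioned in the abstract, a linear combination of single-attribute models, can be sketched as a weighted sum of per-model next-token logits followed by a softmax. This is only a sketch of that baseline, not of the Palette method itself; the three-token vocabulary, the logit values, and the weights are made-up assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_linear(logit_sets, weights):
    """Weighted sum of per-model logits, normalized into one distribution."""
    vocab_size = len(logit_sets[0])
    combined = [
        sum(w * logits[i] for w, logits in zip(weights, logit_sets))
        for i in range(vocab_size)
    ]
    return softmax(combined)

# Hypothetical next-token logits over a toy vocabulary.
vocab = ["good", "bad", "formal"]
sentiment_logits = [2.0, -1.0, 0.0]   # assumed "positive sentiment" model
formality_logits = [0.0, 0.0, 2.0]    # assumed "formal style" model

probs = combine_linear([sentiment_logits, formality_logits], [1.0, 0.5])
```

Because each model contributes independently to the sum, nothing in this scheme accounts for attributes that overlap or conflict, which is the limitation the abstract points to.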


Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

Humans leverage rich internal models of the world to reason about the future, imagine counterfactuals, and adapt flexibly to new situations. In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions, facilitating planning and generalization. However, typical world models directly operate on the environment variables (e.g. pixels, physical attributes), which can make their training slow and cumbersome; instead, it may be advantageous to rely on high-level latent dimensions that capture relevant multimodal variables. Global Workspace (GW) Theory offers a cognitive framework for multimodal integration and information broadcasting in the brain, and recent studies have begun to introduce efficient deep learning implementations of GW. Here, we evaluate the capabilities of an RL system combining GW with a world model. We compare our GW-Dreamer with various versions of the standard PPO and the original Dreamer algorithms. We show that performing the dreaming process (i.e., mental simulation) inside the GW latent space allows for training with fewer environment steps. As an additional emergent property, the resulting model (but not its comparison baselines) displays strong robustness to the absence of one of its observation modalities (images or simulation attributes). We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.


ASP-driven User-interaction with Clinguin

arXiv.org Artificial Intelligence

The growing popularity of Answer Set Programming (ASP; [13]) in both academia and industry necessitates the development of user-friendly graphical interfaces to cater to end users. This is especially critical for interactive applications where users engage in iterative feedback loops with ASP systems. Examples include timetabling or product configuration tools. This leads to challenges in frontend development and requires skills in areas beyond ASP development. In addition, custom solutions have a limited reach, as they cannot be easily adapted. Clinguin addresses this challenge and streamlines User Interface (UI) development for ASP developers by letting them build interactive prototypes directly in ASP, eliminating the need for separate frontend languages. To this end, clinguin uses a few dedicated predicates to define UIs and the treatment of user-triggered events.


Abduction of Domain Relationships from Data for VQA

arXiv.org Artificial Intelligence

Visual Question Answering (VQA) is an AI task designed to reason about images. Commonly, the image is transformed into a "scene graph" that enables the deployment of more formal reasoning tools. For example, in recent work, both the scene graph and associated query were represented as an ASP Program [2, 1]; however, notably the scene graph itself only contains information about the scene, but lacks commonsense knowledge - in particular, knowledge about the domains of attributes identified by the scene. Existing work to address this shortcoming relies on leveraging large commonsense knowledge graphs for obtaining domain knowledge [5, 6, 7]. However, such approaches require the ability to accurately align the language of the knowledge graph with the language of the scene graph. Further, for some applications, alignment does not guarantee that the aligned knowledge graph will improve VQA performance (e.g., if the domain knowledge relevant to the queries is not contained in the knowledge graph). In this paper, we provide an orthogonal and complementary approach that leverages logical representations of the scene graph and query to abduce domain relationships that can improve query answering performance. We frame the abduction problem and provide a simple algorithm that provides a valid solution. We also provide an implementation and show on a standard dataset that we can improve question answering accuracy from 59.98% to 81.01%, and provide comparable results with few historical examples.