AITopics | Hengel, Anton van den

Collaborating Authors

Hengel, Anton van den

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analytic DAG Constraints for Differentiable DAG Learning

Zhang, Zhen, Ng, Ignavier, Gong, Dong, Liu, Yuhang, Gong, Mingming, Huang, Biwei, Zhang, Kun, Hengel, Anton van den, Shi, Javen Qinfeng

arXiv.org Artificial IntelligenceMar-24-2025

Recovering the underlying Directed Acyclic Graph (DAG) structures from observational data presents a formidable challenge, partly due to the combinatorial nature of the DAG-constrained optimization problem. Recently, researchers have identified gradient vanishing as one of the primary obstacles in differentiable DAG learning and have proposed several DAG constraints to mitigate this issue. By developing the necessary theory to establish a connection between analytic functions and DAG constraints, we demonstrate that analytic functions from the set $\{f(x) = c_0 + \sum_{i=1}^{\infty}c_ix^i | \forall i > 0, c_i > 0; r = \lim_{i\rightarrow \infty}c_{i}/c_{i+1} > 0\}$ can be employed to formulate effective DAG constraints. Furthermore, we establish that this set of functions is closed under several functional operators, including differentiation, summation, and multiplication. Consequently, these operators can be leveraged to create novel DAG constraints based on existing ones. Using these properties, we design a series of DAG constraints and develop an efficient algorithm to evaluate them. Experiments in various settings demonstrate that our DAG constraints outperform previous state-of-the-art comparators. Our implementation is available at https://github.com/zzhang1987/AnalyticDAGLearning.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2503.19218

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Interactive Medical Image Analysis with Concept-based Similarity Reasoning

Huy, Ta Duc, Tran, Sen Kim, Nguyen, Phan, Tran, Nguyen Hoang, Sam, Tran Bao, Hengel, Anton van den, Liao, Zhibin, Verjans, Johan W., To, Minh-Son, Phan, Vu Minh Hieu

arXiv.org Artificial IntelligenceMar-11-2025

The ability to interpret and intervene model decisions is important for the adoption of computer-aided diagnosis methods in clinical workflows. Recent concept-based methods link the model predictions with interpretable concepts and modify their activation scores to interact with the model. However, these concepts are at the image level, which hinders the model from pinpointing the exact patches the concepts are activated. Alternatively, prototype-based methods learn representations from training image patches and compare these with test image patches, using the similarity scores for final class prediction. However, interpreting the underlying concepts of these patches can be challenging and often necessitates post-hoc guesswork. To address this issue, this paper introduces the novel Concept-based Similarity Reasoning network (CSR), which offers (i) patch-level prototype with intrinsic concept interpretation, and (ii) spatial interactivity. First, the proposed CSR provides localized explanation by grounding prototypes of each concept on image regions. Second, our model introduces novel spatial-level interaction, allowing doctors to engage directly with specific image areas, making it an intuitive and transparent tool for medical imaging. CSR improves upon prior state-of-the-art interpretable methods by up to 4.5\% across three biomedical datasets. Our code is released at https://github.com/tadeephuy/InteractCSR.

artificial intelligence, image understanding, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.06873

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.35)

Add feedback

I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?

Liu, Yuhang, Gong, Dong, Gao, Erdun, Zhang, Zhen, Huang, Biwei, Gong, Mingming, Hengel, Anton van den, Shi, Javen Qinfeng

arXiv.org Artificial IntelligenceMar-11-2025

The remarkable achievements of large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. This is as opposed to explanations of their capabilities based on their ability to perform relatively simple manipulations of vast volumes of data. To illuminate the distinction between these explanations, we introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables. Under mild conditions, even when the mapping from the latent space to the observed space is non-invertible, we establish an identifiability result: the representations learned by LLMs through next-token prediction can be approximately modeled as the logarithm of the posterior probabilities of these latent discrete concepts, up to an invertible linear transformation. This theoretical finding not only provides evidence that LLMs capture underlying generative factors, but also strongly reinforces the linear representation hypothesis, which posits that LLMs learn linear representations of human-interpretable concepts. Empirically, we validate our theoretical results through evaluations on both simulation data and the Pythia, Llama, and DeepSeek model families.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.0898

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

RandLoRA: Full-rank parameter-efficient fine-tuning of large models

Albert, Paul, Zhang, Frederic Z., Saratchandran, Hemanth, Rodriguez-Opazo, Cristian, Hengel, Anton van den, Abbasnejad, Ehsan

arXiv.org Artificial IntelligenceFeb-2-2025

Low-Rank Adaptation (LoRA) and its variants have shown impressive results in reducing the number of trainable parameters and memory requirements of large transformer networks while maintaining fine-tuning performance. This raises a critical question: when a performance gap between LoRA and standard fine-tuning is observed, is it due to the reduced number of trainable parameters or the rank deficiency? This paper aims to answer this question by introducing RandLoRA, a parameter-efficient method that performs full-rank updates using a learned linear combinations of low-rank, non-trainable random matrices. Our method limits the number of trainable parameters by restricting optimization to diagonal scaling matrices applied to the fixed random matrices. This allows us to effectively overcome the low-rank limitations while maintaining parameter and memory efficiency during training. Through extensive experimentation across vision, language, and vision-language benchmarks, we systematically evaluate the limitations of LoRA and existing random basis methods. Our findings reveal that full-rank updates are beneficial across vision and language tasks individually, and even more so for vision-language tasks, where RandLoRA significantly reduces-- and sometimes eliminates--the performance gap between standard fine-tuning and LoRA, demonstrating its efficacy. Large pre-trained models that leverage broad data have demonstrated significantly improved generalization capabilities and remarkable versatility across diverse tasks. However, the resultant high parameter count also leads to a significant increase in the computational resources required to finetune such models on downstream tasks. To tackle this issue, parameter-efficient fine-tuning (PEFT) approaches such as low-rank adaptation (LoRA) (Hu et al., 2022), draw inspiration from the low intrinsic dimensionality of pre-trained models (Li et al., 2018; Aghajanyan et al., 2021) and characterize the weight updates as the product of two low-rank matrices, substantially reducing the number of trainable parameters and memory requirements during training. This formulation leads to an adaptable number of trainable parameters, as one modifies the rank of the matrices, providing great flexibility under various resource constraints.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.00987

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss

Shu, Yangyang, Xu, Haiming, Zhou, Ziqin, Hengel, Anton van den, Liu, Lingqiao

arXiv.org Artificial IntelligenceJul-5-2024

Automatically generating symbolic music-music scores tailored to specific human needs-can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the ability to manage finer details, such as control at the level of individual bars. While fine-tuning a pre-trained symbolic music generation model might seem like a straightforward method for achieving this finer control, our research indicates challenges in this approach. The model often fails to respond adequately to new, fine-grained bar-level control signals. To address this, we propose two innovative solutions. First, we introduce a pre-training task designed to link control signals directly with corresponding musical tokens, which helps in achieving a more effective initialization for subsequent fine-tuning. Second, we implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts. Together, these techniques significantly enhance our ability to control music generation at the bar level, showing a 13.06\% improvement over conventional methods. Our subjective evaluations also confirm that this enhanced control does not compromise the musical quality of the original pre-trained generative model.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.04331

Country: North America > United States (0.14)

Genre:

Research Report > Promising Solution (0.68)
Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Zhang, Frederic Z., Albert, Paul, Rodriguez-Opazo, Cristian, Hengel, Anton van den, Abbasnejad, Ehsan

arXiv.org Artificial IntelligenceJul-3-2024

Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters. Furthermore, composition of parameter blocks leverages the already learned representations, thereby reducing the dependency on large amounts of data. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives. In particular, we show that (1) learned anisotropic scaling allows task vectors to be more disentangled, causing less interference in composition; (2) task vector composition excels with scarce or no labeled data and is less prone to domain shift, thus leading to better generalisability; (3) mixing the most informative parameter blocks across different task vectors prior to training can reduce the memory footprint and improve the flexibility of knowledge transfer. Moreover, we show the potential of aTLAS as a PEFT method, particularly with less data, and demonstrate that its scalibility.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2407.0288

Country:

Europe (0.92)
North America > United States > California > Santa Clara County (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Rodriguez-Opazo, Cristian, Abbasnejad, Ehsan, Teney, Damien, Marrese-Taylor, Edison, Damirchi, Hamed, Hengel, Anton van den

arXiv.org Artificial IntelligenceMay-27-2024

Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various architectures, from vision transformers (ViTs) to convolutional networks (ResNets) have been trained with CLIP to serve as general solutions to diverse vision tasks. This paper explores the differences across various CLIP-trained vision backbones. Despite using the same data and training objective, we find that these architectures have notably different representations, different classification performance across datasets, and different robustness properties to certain types of image perturbations. Our findings indicate a remarkable possible synergy across backbones by leveraging their respective strengths. In principle, classification accuracy could be improved by over 40 percentage with an informed selection of the optimal backbone per test example. Using this insight, we develop a straightforward yet powerful approach to adaptively ensemble multiple backbones. The approach uses as few as one labeled example per class to tune the adaptive combination of backbones. On a large collection of datasets, the method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone, well beyond traditional ensembles.

backbone, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.17139

Country:

North America > United States (0.27)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Identifiable Latent Neural Causal Models

Liu, Yuhang, Zhang, Zhen, Gong, Dong, Gong, Mingming, Huang, Biwei, Hengel, Anton van den, Zhang, Kun, Shi, Javen Qinfeng

arXiv.org Machine LearningMar-23-2024

Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly good at predictions under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence leveraging {seen} distribution shifts becomes a natural strategy to help identifying causal representations, which in turn benefits predictions where distributions are previously {unseen}. Determining the types (or conditions) of such distribution shifts that do contribute to the identifiability of causal representations is critical. This work establishes a {sufficient} and {necessary} condition characterizing the types of distribution shifts for identifiability in the context of latent additive noise models. Furthermore, we present partial identifiability results when only a portion of distribution shifts meets the condition. In addition, we extend our findings to latent post-nonlinear causal models. We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations. Our algorithm, guided by our underlying theory, has demonstrated outstanding performance across a diverse range of synthetic and real-world datasets. The empirical observations align closely with the theoretical findings, affirming the robustness and effectiveness of our approach.

artificial intelligence, latent causal, machine learning, (17 more...)

arXiv.org Machine Learning

2403.15711

Country:

Oceania > Australia (0.46)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

McDonnell, Mark D., Gong, Dong, Abbasnejad, Ehsan, Hengel, Anton van den

arXiv.org Artificial IntelligenceMar-12-2024

Continual learning requires a model to adapt to ongoing changes in the data distribution, and often to the set of tasks to be performed. It is rare, however, that the data and task changes are completely unpredictable. Given a description of an overarching goal or data theme, which we call a realm, humans can often guess what concepts are associated with it. We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time. We use the large language model to generate text descriptions of semantically related classes that might potentially appear in the data stream in future. These descriptions are then rendered using Stable Diffusion to generate new labelled image samples. The resulting synthetic dataset is employed for supervised pre-training, but is discarded prior to commencing continual learning, along with the pre-training classification head. We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem, thus becoming a valuable input to any existing continual learning method. Although there are complexities arising from the domain gap between real and synthetic images, we show that pre-training models in this manner improves multiple Class Incremenal Learning (CIL) methods on fine-grained image classification benchmarks.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.07356

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models

Liu, Yuhang, Zhang, Zhen, Gong, Dong, Huang, Biwei, Gong, Mingming, Hengel, Anton van den, Zhang, Kun, Shi, Javen Qinfeng

arXiv.org Artificial IntelligenceFeb-9-2024

One promising methods have proven successful across a range strategy in this context is to use data from one modality, e.g., of domains, partly due to their ability to generate text data, as a supervision signal in the interpretation of another, meaningful shared representations of complex e.g., image data (Mori et al., 1999; Wang et al., 2009; phenomena. To enhance the depth of analysis Ramanathan et al., 2013; He & Peng, 2017; Radford et al., and understanding of these acquired representations, 2021). The primary approach for achieving this is known we introduce a unified causal model specifically as multimodal contrastive representation learning, which designed for multimodal data. By examining focuses on optimizing a symmetric contrastive loss (Zhang this model, we show that multimodal contrastive et al., 2022; Radford et al., 2021), e.g., a symmetric adaptation representation learning excels at identifying latent of the standard contrastive loss (Wu et al., 2018; Tian coupled variables within the proposed unified et al., 2020; He et al., 2020; Chen et al., 2020). The learned model, up to linear or permutation transformations representations, guided by the symmetric contrastive loss, resulting from different assumptions. Our have been applied in a variety of applications, including findings illuminate the potential of pre-trained zero/few-shot learning (Radford et al., 2021; Zhou et al., multimodal models, e.g., CLIP, in learning disentangled 2022a), domain generalization (Zhou et al., 2022a;b), and representations through a surprisingly robustness to adversarial examples (Ban & Dong, 2022).

artificial intelligence, machine learning, representation, (12 more...)

arXiv.org Artificial Intelligence

2402.06223

Country:

Oceania > Australia (0.46)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.62)

Add feedback