Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation

arXiv.org Artificial Intelligence

Autoregressive (AR) models have demonstrated impressive capabilities in generating high-fidelity music. However, the conventional next-token prediction paradigm in AR models does not align with the human creative process in music composition, potentially compromising the musicality of generated samples. To overcome this limitation, we introduce MusiCoT, a novel chain-of-thought (CoT) prompting technique tailored for music generation. MusiCoT empowers the AR model to first outline an overall music structure before generating audio tokens, thereby enhancing the coherence and creativity of the resulting compositions. By leveraging the contrastive language-audio pretraining (CLAP) model, we establish a chain of "musical thoughts", making MusiCoT scalable and independent of human-labeled data, in contrast to conventional CoT methods. Moreover, MusiCoT allows for in-depth analysis of music structure, such as instrumental arrangements, and supports music referencing -- accepting variable-length audio inputs as optional style references. This innovative approach effectively addresses copying issues, positioning MusiCoT as a vital practical method for music prompting. Our experimental results indicate that MusiCoT consistently achieves superior performance across both objective and subjective metrics, producing music quality that rivals state-of-the-art generation models. Our samples are available at https://MusiCoT.github.io/.


Continual Learning With Quasi-Newton Methods

arXiv.org Artificial Intelligence

Received 17 February 2025, accepted 5 March 2025, date of publication 13 March 2025, date of current version 21 March 2025.
Continual Learning with Quasi-Newton Methods
STEVEN VANDER EECKT and HUGO VAN HAMME (Senior, IEEE)
Department Electrical Engineering ESAT-PSI, KU Leuven, B-3001 Leuven, Belgium
Corresponding author: Steven Vander Eeckt (e-mail: steven.vandereeckt@esat.kuleuven.be).
ABSTRACT Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation where the Hessian is simplified to the diagonal of the Fisher information matrix, assuming uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting its effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50% and improves its performance by 8% on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.
INDEX TERMS artificial neural networks, catastrophic forgetting, continual learning, quasi-Newton methods
I. INTRODUCTION Since the 2010s, Artificial Neural Networks (ANNs) have been able to match or even surpass human performance on a wide variety of tasks. However, when presented with a set of tasks to be learned sequentially--a setting referred to as Continual Learning (CL)--ANNs suffer from catastrophic forgetting [1].
Unlike humans, ANNs struggle to retain previously learned knowledge when extending their knowledge. Naively adapting an ANN to a new task generally leads to a deterioration in the network's performance on previous tasks. Many CL methods have been proposed to alleviate catastrophic forgetting. One of the most well-known is Elastic Weight Consolidation (EWC) [2], which approaches CL from a Bayesian perspective. After training on a task, EWC uses Laplace approximation [3] to estimate a posterior distribution over the model parameters for that task. When training on the next task, this posterior is used via a regularization loss to prevent the model from catastrophically forgetting the previous task. To estimate the Hessian, which is needed in the Laplace approximation to measure the (un)certainty of the model parameters, EWC uses the Fisher Information Matrix (FIM). Furthermore, to simplify the computation, EWC assumes that the FIM is approximately diagonal.
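The diagonal-Fisher regularizer described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `ewc_penalty`, the strength `lam`, and the toy numbers are assumptions, and the penalty uses the commonly cited form (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

```python
from typing import Sequence


def ewc_penalty(theta: Sequence[float],
                theta_star: Sequence[float],
                fisher_diag: Sequence[float],
                lam: float = 1.0) -> float:
    """Quadratic EWC-style penalty under a diagonal Fisher approximation.

    theta       : current parameters while training on the new task
    theta_star  : parameters saved after training on the previous task
    fisher_diag : diagonal Fisher entries (one importance weight per parameter)
    lam         : regularization strength (hypothetical default)
    """
    if not (len(theta) == len(theta_star) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    # Each parameter is penalized independently; this is exactly the
    # "uncorrelated parameters" assumption criticized in the abstract.
    return 0.5 * lam * sum(
        f * (t - ts) ** 2
        for t, ts, f in zip(theta, theta_star, fisher_diag)
    )


# Toy numbers (made up): the first parameter has a large Fisher value,
# so moving it away from theta_star costs far more than moving the second.
penalty_move_first = ewc_penalty([1.0, 0.0], [0.0, 0.0], [10.0, 0.1])
penalty_move_second = ewc_penalty([0.0, 1.0], [0.0, 0.0], [10.0, 0.1])
```

Because the sum runs over individual parameters, no cross-parameter curvature enters the penalty; a fuller Hessian approximation, as CSQN is said to compute, would add such terms.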


FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have significantly enhanced interactions between users and models. These advancements concurrently underscore the need for rigorous safety evaluations due to the manifestation of social biases, which can lead to harmful societal impacts. Despite these concerns, existing benchmarks may overlook the intrinsic weaknesses of LLMs, which can generate biased responses even with simple adversarial instructions. To address this critical gap, we introduce a new benchmark, Fairness Benchmark in LLM under Extreme Scenarios (FLEX), designed to test whether LLMs can sustain fairness even when exposed to prompts constructed to induce bias. To thoroughly evaluate the robustness of LLMs, we integrate prompts that amplify potential biases into the fairness assessment. Comparative experiments between FLEX and existing benchmarks demonstrate that traditional evaluations may underestimate the inherent risks in models. This highlights the need for more stringent LLM evaluation benchmarks to guarantee safety and fairness.


Taxonomy Inference for Tabular Data Using Large Language Models

arXiv.org Artificial Intelligence

Taxonomy inference for tabular data is a critical task of schema inference, aiming at discovering entity types (i.e., concepts) of the tables and building their hierarchy. It can play an important role in data management, data exploration, ontology learning, and many data-centric applications. Existing schema inference systems focus more on XML, JSON or RDF data, and often rely on lexical formats and structures of the data for calculating similarities, with limited exploitation of the semantics of the text across a table. Motivated by recent works on taxonomy completion and construction using Large Language Models (LLMs), this paper presents two LLM-based methods for taxonomy inference for tables: (i) EmTT which embeds columns by fine-tuning with contrastive learning encoder-alone LLMs like BERT and utilises clustering for hierarchy construction, and (ii) GeTT which generates table entity types and their hierarchy by iterative prompting using a decoder-alone LLM like GPT-4. Extensive evaluation on three real-world datasets with six metrics covering different aspects of the output taxonomies has demonstrated that EmTT and GeTT can both produce taxonomies with strong consistency relative to the Ground Truth.


GENIUS: A Generative Framework for Universal Multimodal Search

arXiv.org Artificial Intelligence

Generative retrieval is an emerging approach in information retrieval that generates identifiers (IDs) of target data based on a query, providing an efficient alternative to traditional embedding-based retrieval methods. However, existing models are task-specific and fall short of embedding-based retrieval in performance. This paper proposes GENIUS, a universal generative retrieval framework supporting diverse tasks across multiple modalities and domains. At its core, GENIUS introduces modality-decoupled semantic quantization, transforming multimodal data into discrete IDs encoding both modality and semantics. Moreover, to enhance generalization, we propose a query augmentation that interpolates between a query and its target, allowing GENIUS to adapt to varied query forms. Evaluated on the M-BEIR benchmark, it surpasses prior generative methods by a clear margin. Unlike embedding-based retrieval, GENIUS consistently maintains high retrieval speed across database size, with competitive performance across multiple benchmarks. With additional re-ranking, GENIUS often achieves results close to those of embedding-based methods while preserving efficiency.


An Overview of Low-Rank Structures in the Training and Adaptation of Large Models

arXiv.org Machine Learning

The rise of deep learning has revolutionized data processing and prediction in signal processing and machine learning, yet the substantial computational demands of training and deploying modern large-scale deep models present significant challenges, including high computational costs and energy consumption. Recent research has uncovered a widespread phenomenon in deep networks: the emergence of low-rank structures in weight matrices and learned representations during training. These implicit low-dimensional patterns provide valuable insights for improving the efficiency of training and fine-tuning large-scale models. Practical techniques inspired by this phenomenon--such as low-rank adaptation (LoRA) and low-rank training--enable significant reductions in computational cost while preserving model performance. In this paper, we present a comprehensive review of recent advances in exploiting low-rank structures for deep learning and shed light on their mathematical foundations. Mathematically, we present two complementary perspectives on understanding the low-rankness in deep networks: (i) the emergence of low-rank structures throughout the whole optimization dynamics of gradient descent and (ii) the implicit regularization effects that induce such low-rank structures at convergence. From a practical standpoint, studying the low-rank learning dynamics of gradient descent offers a mathematical foundation for understanding the effectiveness of LoRA in fine-tuning large-scale models and inspires parameter-efficient low-rank training strategies. Furthermore, the implicit low-rank regularization effect helps explain the success of various masked training approaches in deep neural networks, ranging from dropout to masked self-supervised learning.
In summary, this tutorial provides researchers and practitioners with a deeper understanding of low-rank structures in the training and adaptation of large-scale deep learning models, highlighting both the theoretical foundations and practical applications of low-rank methods, and outlining promising directions for future research.
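To make the low-rank adaptation idea concrete, here is a small pure-Python sketch. It is not taken from the tutorial: the helper names, the `alpha / r` scaling convention, and the toy matrices are assumptions chosen for illustration. The frozen weight `w` is adjusted by a rank-`r` product `b @ a`, so only `r * (d_out + d_in)` values would be trained instead of `d_out * d_in`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small nested-list matrices."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("shape mismatch")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix,
                          alpha: float) -> Matrix:
    """Return w + (alpha / r) * (b @ a), where r is the number of rows of a."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # shape (d_out, d_in), rank at most r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def parameter_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning parameters, low-rank adapter parameters)."""
    return d_out * d_in, r * (d_out + d_in)


# Toy rank-1 update of a 2x3 weight matrix (all numbers are made up).
w = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [[1.0],
     [2.0]]           # shape (2, 1)
a = [[1.0, 0.0, 1.0]]  # shape (1, 3)
w_adapted = lora_effective_weight(w, b, a, alpha=1.0)
full_params, adapter_params = parameter_counts(1024, 1024, r=8)
```

For a hypothetical 1024 x 1024 layer with `r = 8`, the adapter holds 16,384 values against 1,048,576 for the full matrix, which is the kind of saving the abstract refers to.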


Membership Inference Attacks on Large-Scale Models: A Survey

arXiv.org Artificial Intelligence

The adoption of Large Language Models (LLMs) has accelerated dramatically since OpenAI's ChatGPT went online in November 2022. Recent advances in Large Multimodal Models (LMMs), which process diverse data types and enable interaction through various channels, have expanded beyond the text-to-text limitations of early LLMs, attracting significant attention from both researchers and industry. As LLMs and LMMs spread, concerns about their privacy risks are increasing as well. Membership Inference Attacks (MIAs), techniques used to determine whether a particular data point was part of a model's training set, serve as a key metric for assessing the privacy vulnerabilities of machine learning models. Hu et al. show that various machine learning algorithms are vulnerable to MIAs. Despite extensive studies on MIAs in traditional models, there remains a lack of systematic surveys addressing their effectiveness and implications in modern large-scale models like LLMs and LMMs. In this paper, we systematically review recent studies of MIAs against LLMs and LMMs. We analyze and categorize each attack by its methodology and scenario and discuss the limitations of existing research. Additionally, we examine privacy concerns associated with the fine-tuning process. Finally, we provide suggestions for future research in this direction.
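As a rough illustration of what a membership inference attack decides, the sketch below implements a simple loss-threshold baseline: a sample is guessed to be a training member when the model's loss on it is below a threshold. This is one well-known baseline, not a method attributed to the survey; the function names, the threshold, and the loss values are all made up for the example.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float],
                          threshold: float) -> List[bool]:
    """Guess 'member' (True) for every sample whose loss is below threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(predictions: Sequence[bool],
                    is_member: Sequence[bool]) -> float:
    """Fraction of samples whose membership was guessed correctly."""
    if len(predictions) != len(is_member) or not is_member:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == m for p, m in zip(predictions, is_member))
    return correct / len(is_member)


# Hypothetical per-sample losses from some target model, with ground truth.
losses = [0.1, 0.2, 1.5, 0.7, 2.0, 0.4]
is_member = [True, True, False, True, False, False]

predictions = loss_threshold_attack(losses, threshold=0.5)
accuracy = attack_accuracy(predictions, is_member)
```

On this toy data the attack misjudges two of six samples: a member with an unusually high loss and a non-member with an unusually low one.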


Dom, cars don't fly! -- Or do they? In-Air Vehicle Maneuver for High-Speed Off-Road Navigation

arXiv.org Artificial Intelligence

When pushing the speed limit for aggressive off-road navigation on uneven terrain, it is inevitable that vehicles may become airborne from time to time. During time-sensitive tasks, being able to fly over challenging terrain can also save time, instead of cautiously circumventing or slowly negotiating through. However, most off-road autonomy systems operate under the assumption that the vehicles are always on the ground and therefore limit operational speed. In this paper, we present a novel approach for in-air vehicle maneuver during high-speed off-road navigation. Based on a hybrid forward kinodynamic model using both physics principles and machine learning, our fixed-horizon, sampling-based motion planner ensures accurate vehicle landing poses and their derivatives within a short airborne time window using vehicle throttle and steering commands. We test our approach in extensive in-air experiments both indoors and outdoors, compare it against an error-driven control method, and demonstrate that precise and timely in-air vehicle maneuver is possible through existing ground vehicle controls. Off-road navigation presents various challenges that sharply contrast those encountered in on-road or indoor scenarios. In unstructured off-road environments, robots must detect and avoid obstacles, evaluate the traversability of varied terrain, and continuously adapt to complex vehicle-terrain interactions. Tackling all these challenges is essential to prevent terminal states that can jeopardize the mission and damage the robot, such as vehicle rollover and getting stuck.


Near-optimal Active Reconstruction

arXiv.org Artificial Intelligence

With the growing practical interest in vision-based tasks for autonomous systems, the need for efficient and complex methods keeps increasing. In the rush to develop new methods that outperform the current state of the art, an analysis of the underlying theory is often neglected and simply replaced with empirical evaluations in simulated or real-world experiments. While such methods might yield favorable performance in practice, they are often less well understood, which prevents them from being applied in safety-critical systems. The goal of this work is to design an algorithm for the Next Best View (NBV) problem in the context of active object reconstruction, for which we can provide qualitative performance guarantees with respect to true optimality. To the best of our knowledge, no previous work in this field provides such an analysis of its proposed methods. Based on existing work on Gaussian process optimization, we rigorously derive sublinear bounds for the cumulative regret of our algorithm, which guarantees near-optimality. Complementing this, we evaluate the performance of our algorithm empirically within our simulation framework. We further provide additional insights through an extensive study of potential objective functions and analyze how our results differ from those of related work.


SE-GNN: Seed Expanded-Aware Graph Neural Network with Iterative Optimization for Semi-supervised Entity Alignment

arXiv.org Artificial Intelligence

Entity alignment aims to use pre-aligned seed pairs to find other equivalent entities from different knowledge graphs (KGs) and is widely used in graph fusion-related fields. However, as the scale of KGs increases, manually annotating pre-aligned seed pairs becomes difficult. Existing research utilizes entity embeddings obtained by aggregating a single type of structural information to identify potential seed pairs, thus reducing the reliance on pre-aligned seed pairs. However, due to the structural heterogeneity of KGs, the quality of potential seed pairs obtained from only a single type of structural information is not ideal. In addition, although existing research improves the quality of potential seed pairs through semi-supervised iteration, it underestimates the impact on alignment of the embedding distortion produced by noisy seed pairs. To solve these problems, we propose a seed expanded-aware graph neural network with iterative optimization for semi-supervised entity alignment, named SE-GNN. First, we utilize the semantic attributes and structural features of entities, combined with a conditional filtering mechanism, to obtain high-quality initial potential seed pairs. Next, we design a local and global awareness mechanism. It introduces initial potential seed pairs and combines local and global information to obtain a more comprehensive entity embedding representation, which alleviates the impact of the structural heterogeneity of KGs and lays the foundation for the optimization of initial potential seed pairs. Then, we design the threshold nearest neighbor embedding correction strategy. It combines the similarity threshold and the bidirectional nearest neighbor method as a filtering mechanism to select iterative potential seed pairs and also uses an embedding correction strategy to eliminate the embedding distortion.
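The filtering step that combines a similarity threshold with bidirectional nearest neighbors can be sketched as follows. The abstract does not give the exact procedure, so this is one plausible reading under stated assumptions: `sim[i][j]` is a hypothetical similarity between entity `i` of one KG and entity `j` of the other, and a pair is kept only if each entity is the other's nearest neighbor and their similarity reaches the threshold. The function name and all numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def mutual_nearest_pairs(sim: Sequence[Sequence[float]],
                         threshold: float) -> List[Tuple[int, int]]:
    """Select (i, j) pairs that are mutual nearest neighbors above a threshold."""
    if not sim or not sim[0]:
        return []
    n_rows, n_cols = len(sim), len(sim[0])
    # Nearest neighbor in KG2 for every entity of KG1 (row-wise argmax).
    best_col = [max(range(n_cols), key=lambda j: row[j]) for row in sim]
    # Nearest neighbor in KG1 for every entity of KG2 (column-wise argmax).
    best_row = [max(range(n_rows), key=lambda i: sim[i][j])
                for j in range(n_cols)]
    return [(i, j) for i, j in enumerate(best_col)
            if best_row[j] == i and sim[i][j] >= threshold]


# Made-up similarity matrix between three entities in each KG.
sim = [[0.9, 0.2, 0.1],
       [0.3, 0.6, 0.5],
       [0.8, 0.1, 0.4]]

strict_pairs = mutual_nearest_pairs(sim, threshold=0.7)
loose_pairs = mutual_nearest_pairs(sim, threshold=0.5)
```

Entity 2 of the first KG prefers entity 0 of the second, but that entity's own nearest neighbor is entity 0, so the pair is rejected regardless of the threshold; this is how the bidirectional check can filter one-sided, potentially noisy matches.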