 Transfer Learning


Review for NeurIPS paper: Online Multitask Learning with Long-Term Memory

Neural Information Processing Systems

The paper concerns a multi-task version of a well-known online learning problem: switching with long-term memory. It considers two types of hypothesis space: a finite space and an RKHS of functions. In both cases, the authors first provide a regret bound for an exponential-time algorithm based on a reduction to a single-task problem using the idea of "meta-experts". These algorithms are then followed by efficient (polynomial-time) versions, which achieve the same bound up to a small overhead. The paper received a very mixed set of scores, ranging from "reject" to "top 15% of accepted papers". The main strength of the paper is its novel, efficient long-term memory algorithms for a multi-task version of the prediction with expert advice problem, as well as for kernel linear classification (with hinge loss, but written in terms of 0/1 loss by only considering interpolants on an instance sequence). In particular, the second part seems to be a significant extension of the "switching with long-term memory" framework to an infinite hypothesis space (even leaving the multitask extension aside).
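For readers unfamiliar with the single-task setting the review refers to, here is a minimal sketch of the classic Fixed Share update for prediction with expert advice under switching. It is not the paper's algorithm (it has no long-term memory and no multitask component); the function name `fixed_share` and the parameters `eta` and `alpha` are illustrative assumptions.

```python
import math

def fixed_share(losses, eta=1.0, alpha=0.05):
    """Classic Fixed Share over a finite set of experts (illustrative sketch).

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    eta:    learning rate of the exponential-weights step (assumed value).
    alpha:  share rate that lets weight flow back to all experts (assumed value).
    Returns the learner's expected loss per round and the final weights.
    """
    n = len(losses[0])
    weights = [1.0 / n] * n
    learner_losses = []
    for round_losses in losses:
        # Expected loss when an expert is sampled according to the weights.
        learner_losses.append(sum(w * l for w, l in zip(weights, round_losses)))
        # Exponential-weights update, then renormalise.
        updated = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
        total = sum(updated)
        updated = [u / total for u in updated]
        # Share step: mix with the uniform distribution so that an expert
        # that was bad earlier can recover quickly after a switch.
        weights = [(1 - alpha) * u + alpha / n for u in updated]
    return learner_losses, weights

# Two experts; the best expert switches halfway through the sequence.
losses = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
learner_losses, final_weights = fixed_share(losses)
print(round(sum(learner_losses), 2), [round(w, 3) for w in final_weights])
```

The share step is what keeps the cost of a switch small: without it, the weight of the second expert would decay exponentially during the first 20 rounds and take correspondingly long to recover.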


Provable Sample-Efficient Transfer Learning Conditional Diffusion Models via Representation Learning

arXiv.org Machine Learning

While conditional diffusion models have achieved remarkable success in various applications, they require abundant data to train from scratch, which is often infeasible in practice. To address this issue, transfer learning has emerged as an essential paradigm in small data regimes. Despite its empirical success, the theoretical underpinnings of transfer learning conditional diffusion models remain unexplored. In this paper, we take the first step towards understanding the sample efficiency of transfer learning conditional diffusion models through the lens of representation learning. Inspired by practical training procedures, we assume that there exists a low-dimensional representation of conditions shared across all tasks. Our analysis shows that with a well-learned representation from source tasks, the sample complexity of target tasks can be reduced substantially. In addition, we investigate the practical implications of our theoretical results in several real-world applications of conditional diffusion models. Numerical experiments are also conducted to verify our results.


A Theoretical Framework for Data Efficient Multi-Source Transfer Learning Based on Cramér-Rao Bound

arXiv.org Artificial Intelligence

Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? Specifically, we introduce a generalization error measure that aligns with cross-entropy loss, and minimize it based on the Cramér-Rao Bound to determine the optimal transfer quantity for each source task. Additionally, we develop an architecture-agnostic and data-efficient algorithm OTQMS to implement our theoretical results for training deep multi-source transfer learning models. Experimental studies on diverse architectures and two real-world benchmark datasets show that our proposed algorithm significantly outperforms state-of-the-art approaches in both accuracy and data efficiency. The code and supplementary materials are available at https://anonymous.4open.science/r/Materials.


Towards Unified Music Emotion Recognition across Dimensional and Categorical Models

arXiv.org Artificial Intelligence

One of the most significant challenges in Music Emotion Recognition (MER) comes from the fact that emotion labels can be heterogeneous across datasets with regard to the emotion representation, including categorical (e.g., happy, sad) versus dimensional labels (e.g., valence-arousal). In this paper, we present a unified multitask learning framework that combines these two types of labels and is thus able to be trained on multiple datasets. This framework uses an effective input representation that combines musical features (i.e., key and chords) and MERT embeddings. Moreover, knowledge distillation is employed to transfer the knowledge of teacher models trained on individual datasets to a student model, enhancing its ability to generalize across multiple tasks. To validate our proposed framework, we conducted extensive experiments on a variety of datasets, including MTG-Jamendo, DEAM, PMEmo, and EmoMusic. According to our experimental results, the inclusion of musical features, multitask learning, and knowledge distillation significantly enhances performance. In particular, our model outperforms the state-of-the-art models on the MTG-Jamendo dataset. Our work makes a significant contribution to MER by allowing the combination of categorical and dimensional emotion labels in one unified framework, thus enabling training across datasets. I. INTRODUCTION: Music plays an essential role in influencing human emotions [36]. In the past decades, numerous Music Emotion Recognition (MER) models have been developed.
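The abstract does not spell out its distillation objective. As a rough illustration of the general idea only, the sketch below implements the common temperature-scaled KL distillation loss for categorical outputs; the function names, the temperature value, and the example logits are assumptions, and the paper's actual loss (in particular for dimensional valence-arousal labels) may differ.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions.

    The T**2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures. Hypothetical helper, not from the paper.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Made-up logits for one clip over three emotion categories.
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 0.2, 0.3]])
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss(student, teacher)
print(loss_same, loss_diff)
```

In a multi-dataset setup such as the one described, a term like this would typically be added to the ordinary supervised loss, with one teacher per source dataset.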


Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and Temporal Fine Structure

arXiv.org Artificial Intelligence

Brain-Computer Interfaces (BCIs) can decode imagined speech from neural activity. However, these systems typically require extensive training sessions in which participants repeat words in their imagination, leading to mental fatigue and difficulties identifying the onset of words, especially when imagining sequences of words. This paper addresses these challenges by transferring a classifier trained on overt speech data to covert speech classification. We derived electroencephalogram (EEG) features from the Hilbert envelope and temporal fine structure, and used them to train a bidirectional long short-term memory (BiLSTM) model for classification. Our method reduces the burden of extensive training and achieves state-of-the-art classification accuracy: 86.44% for overt speech and 79.82% for covert speech using the overt speech classifier.
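The two features named in the abstract both come from the analytic signal: the Hilbert envelope is its magnitude, and the temporal fine structure (TFS) is the cosine of its phase. The sketch below shows this decomposition on a synthetic amplitude-modulated tone using only NumPy. The function names and the test signal are assumptions, and the paper's actual pipeline (band-pass filtering, channel selection, windowing) is not reproduced here.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies and
    double the positive ones (the standard discrete Hilbert-transform construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC component
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist component
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def envelope_and_tfs(x):
    """Split a (narrow-band) signal into Hilbert envelope and fine structure."""
    z = analytic_signal(x)
    envelope = np.abs(z)               # slow amplitude modulation
    tfs = np.cos(np.angle(z))          # unit-amplitude fast oscillation
    return envelope, tfs

# Synthetic example: a 50 Hz carrier modulated at 3 Hz, sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
true_envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
carrier = np.cos(2 * np.pi * 50 * t)
signal = true_envelope * carrier
envelope, tfs = envelope_and_tfs(signal)
```

On real EEG, the decomposition is only meaningful per frequency band, so a band-pass filter would normally precede this step.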


Review for NeurIPS paper: Co-Tuning for Transfer Learning

Neural Information Processing Systems

I am changing my score up a bit. General comments: 1) The process seems to be two-step (as opposed to learning end to end): first derive the connection between source and target labels (train a separate network to do this), and then, using this connection, train a target model while requiring the output (target labels) to conform to this derived connection. Are both steps happening on the same target dataset? It is not clear whether it works when the number of target classes is larger than the number of source classes. 3) The authors state that their setting is one where source data is not available, but their calibration actually requires the source data. Alternatively, the neural net g should be able to learn the calibration in theory, as long as it has enough complexity. Experiments: 1) A reasonable baseline would be just the full source model plus one or several new layers for the target.


Review for NeurIPS paper: Co-Tuning for Transfer Learning

Neural Information Processing Systems

This paper presents a simple method which seems to work well in practice. Some reviewers would have preferred to see more discussion on limitations of the method. However, the contribution was deemed clear enough without this discussion, because of the intuitive, novel take on the popular fine-tuning task, as well as the strong performance demonstrated on popular vision tasks. Overall, the paper is expected to be of interest to the community.


Reviews: Transfusion: Understanding Transfer Learning for Medical Imaging

Neural Information Processing Systems

The authors investigate the current transfer learning scheme for deep learning applications to medical imaging. They thoroughly assess and compare the performance of standard architectures that were originally designed for natural image classification tasks with their in-house-developed lightweight and simple models on medical imaging tasks. In this regard, the study demonstrates that the latter models can perform comparably with computationally expensive state-of-the-art models. The second finding of the study is that transfer learning does not have a significant benefit for performance. The authors validate the claim by comparing the latent representations of the networks learned with the pretrained weights and training from scratch, and by measuring representational similarity with canonical correlation analysis (CCA).
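To make the CCA comparison concrete, here is a minimal sketch of plain CCA similarity between two activation matrices recorded on the same inputs. It is not necessarily the variant used in the paper (SVCCA-style preprocessing is omitted), the function name is an assumption, and the code assumes more samples than features and full column rank.

```python
import numpy as np

def mean_cca_similarity(X, Y):
    """Mean canonical correlation between two representations.

    X: (n_samples, d1) activations of one network layer.
    Y: (n_samples, d2) activations of another layer on the same inputs.
    Assumes n_samples > max(d1, d2) and full column rank (sketch only).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred activations.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Canonical correlations are the singular values of Qx^T Qy.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
Z = rng.normal(size=(200, 5))                        # unrelated representation

sim_rotated = mean_cca_similarity(X, X @ rotation)   # same subspace, new basis
sim_unrelated = mean_cca_similarity(X, Z)
```

Because CCA is invariant to invertible linear maps, the rotated copy scores close to 1 while the unrelated representation scores much lower, which is the property that makes it usable for comparing pretrained and from-scratch networks whose neurons are not aligned.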


Reviews: Transfusion: Understanding Transfer Learning for Medical Imaging

Neural Information Processing Systems

This work studied the merits of current transfer learning in medical imaging. It presents a strong empirical analysis of current state-of-the-art approaches, and leads to some rather surprising conclusions. Overall, the reviewers agreed the work was strong and merited acceptance.