AITopics | pre-train

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

Neural Information Processing Systems, Dec-25-2025, 00:46:47 GMT

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task and does not reflect the belief that our knowledge of the source task should affect the locations and shape of optima on the downstream task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks, serving as a drop-in replacement for standard pre-training strategies. These highly informative priors can also be saved for future use, similar to pre-trained weights, and stand in contrast to the zero-mean isotropic uninformative priors that are typically used in Bayesian deep learning.
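
The core idea, turning a source-task posterior into a prior that reshapes the downstream loss instead of only choosing a starting point, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the source posterior is approximated by a diagonal Gaussian fitted to weight snapshots collected during source training (a simplification; richer covariance structures are possible), and the helper names `fit_diagonal_gaussian`, `neg_log_prior`, `downstream_objective` and the `scale` knob are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_diagonal_gaussian(
    snapshots: Sequence[Sequence[float]], eps: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Fit a per-parameter mean and variance to source-task weight snapshots.

    Assumption: the source posterior is approximated by a diagonal Gaussian;
    `eps` keeps every variance strictly positive.
    """
    n = len(snapshots)
    dim = len(snapshots[0])
    mean = [sum(s[i] for s in snapshots) / n for i in range(dim)]
    var = [sum((s[i] - mean[i]) ** 2 for s in snapshots) / n + eps for i in range(dim)]
    return mean, var


def neg_log_prior(
    theta: Sequence[float], mean: Sequence[float], var: Sequence[float], scale: float = 1.0
) -> float:
    """-log N(theta; mean, scale * diag(var)); `scale` (hypothetical) widens or narrows the prior."""
    total = 0.0
    for t, m, v in zip(theta, mean, var):
        s = scale * v
        total += 0.5 * ((t - m) ** 2 / s + math.log(2.0 * math.pi * s))
    return total


def downstream_objective(
    nll: float,
    theta: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    n_data: int,
    scale: float = 1.0,
) -> float:
    """Per-example negative log posterior: mean data NLL plus the prior term spread over n_data."""
    return nll + neg_log_prior(theta, mean, var, scale) / n_data


if __name__ == "__main__":
    # Three hypothetical snapshots of a two-parameter model taken during source training.
    snapshots = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]]
    mean, var = fit_diagonal_gaussian(snapshots)
    # Moving the low-variance (second) parameter costs more than moving the first by the same amount.
    print(neg_log_prior([1.0, 0.1], mean, var) > neg_log_prior([1.1, 0.0], mean, var))  # True
```

Unlike fine-tuning from `mean` alone, the penalty makes directions the source task was confident about (small variance) expensive to move along, while leaving uncertain directions comparatively free.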

easy bayesian transfer learning, name change, source task, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

Neural Information Processing Systems, May-27-2025, 19:56:27 GMT

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task and does not reflect the belief that our knowledge of the source task should affect the locations and shape of optima on the downstream task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks, serving as a drop-in replacement for standard pre-training strategies. These highly informative priors can also be saved for future use, similar to pre-trained weights, and stand in contrast to the zero-mean isotropic uninformative priors that are typically used in Bayesian deep learning.

artificial intelligence, easy bayesian transfer learning, machine learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

Neural Information Processing Systems, Jan-18-2025, 14:39:57 GMT

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task and does not reflect the belief that our knowledge of the source task should affect the locations and shape of optima on the downstream task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks, serving as a drop-in replacement for standard pre-training strategies. These highly informative priors can also be saved for future use, similar to pre-trained weights, and stand in contrast to the zero-mean isotropic uninformative priors that are typically used in Bayesian deep learning.

downstream task, easy bayesian transfer learning, source task, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models

Wang, Yuhao, Pan, Junwei, Zhao, Xiangyu, Jia, Pengyue, Wang, Wanyu, Wang, Yuan, Liu, Yue, Liu, Dapeng, Jiang, Jie

arXiv.org Artificial Intelligence, Dec-5-2024

Sequential recommendation (SR) aims to model the sequential dependencies in users' historical interactions to better capture their evolving interests. However, existing SR approaches primarily rely on collaborative data, which leads to limitations such as the cold-start problem and sub-optimal performance. Meanwhile, despite the success of large language models (LLMs), their application in industrial recommender systems is hindered by high inference latency, inability to capture all distribution statistics, and catastrophic forgetting. To this end, we propose a novel Pre-train, Align, and Disentangle (PAD) paradigm to empower recommendation models with LLMs. Specifically, we first pre-train both the SR and LLM models to get collaborative and textual embeddings. Next, a characteristic recommendation-anchored alignment loss is proposed using multi-kernel maximum mean discrepancy with Gaussian kernels. Finally, a triple-experts architecture, consisting of aligned and modality-specific experts with disentangled embeddings, is fine-tuned in a frequency-aware manner. Experiments conducted on three public datasets demonstrate the effectiveness of PAD, showing significant improvements and compatibility with various SR backbone models, especially on cold items. The implementation code and datasets will be publicly available.
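
The alignment step relies on multi-kernel maximum mean discrepancy (MK-MMD) with Gaussian kernels. A minimal sketch of the biased squared MK-MMD estimate between a set of collaborative embeddings and a set of textual embeddings follows; the equal-weight kernel mixture, the bandwidths, and the function names are illustrative assumptions, and the paper's recommendation-anchored weighting is not modeled.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_gaussian_kernel(a: Vector, b: Vector, bandwidths: Sequence[float]) -> float:
    """Equal-weight mixture of Gaussian (RBF) kernels, one per bandwidth."""
    d2 = squared_distance(a, b)
    return sum(math.exp(-d2 / (2.0 * s * s)) for s in bandwidths) / len(bandwidths)


def mk_mmd2(
    xs: Sequence[Vector], ys: Sequence[Vector], bandwidths: Sequence[float] = (0.5, 1.0, 2.0)
) -> float:
    """Biased (V-statistic) estimate of squared MK-MMD between samples xs and ys."""

    def mean_kernel(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        total = sum(multi_gaussian_kernel(u, v, bandwidths) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    collab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # hypothetical collaborative embeddings
    text_near = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]  # textual embeddings close to them
    text_far = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]   # textual embeddings far away
    print(mk_mmd2(collab, text_near) < mk_mmd2(collab, text_far))  # True
```

Using several bandwidths makes the discrepancy sensitive to mismatches at several scales at once, which is the usual motivation for the multi-kernel variant over a single RBF kernel.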

alignment loss, proceedings, recommendation, (11 more...)

arXiv.org Artificial Intelligence

2412.04107

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.05)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Learning Successor Features the Simple Way

Chua, Raymond, Ghosh, Arna, Kaplanis, Christos, Richards, Blake A., Precup, Doina

arXiv.org Artificial Intelligence, Oct-30-2024

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in 2D (Minigrid) and 3D (Miniworld) mazes and in Mujoco, in both single-task and continual learning scenarios. Our technique is also efficient, reaching higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
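
The two losses follow from the textbook definition of successor features: with features φ and reward weights w, the SFs satisfy ψ(s, a) = φ(s, a) + γ·ψ(s′, a′) in expectation, rewards are modeled as r ≈ φ·w, and Q(s, a) = ψ(s, a)·w. The sketch below evaluates both losses on a single transition with plain lists. It assumes these standard definitions rather than the paper's exact architecture, and the function names and `reward_coef` weighting are hypothetical; in a real agent the bootstrap target would be held fixed (stop-gradient) while ψ, φ, and w are trained by gradient descent.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sf_td_loss(
    phi: Sequence[float],
    psi: Sequence[float],
    psi_next: Sequence[float],
    gamma: float,
    done: bool = False,
) -> float:
    """Squared TD error of the SF vector; the target phi + gamma * psi_next is treated as a constant."""
    discount = 0.0 if done else gamma
    return sum((p + discount * pn - q) ** 2 for p, pn, q in zip(phi, psi_next, psi))


def reward_prediction_loss(phi: Sequence[float], w: Sequence[float], reward: float) -> float:
    """Squared error of the linear reward model r ~ phi . w."""
    return (reward - dot(phi, w)) ** 2


def combined_loss(
    phi: Sequence[float],
    psi: Sequence[float],
    psi_next: Sequence[float],
    w: Sequence[float],
    reward: float,
    gamma: float,
    done: bool = False,
    reward_coef: float = 1.0,
) -> float:
    """TD loss plus a (hypothetically weighted) reward prediction loss."""
    return sf_td_loss(phi, psi, psi_next, gamma, done) + reward_coef * reward_prediction_loss(phi, w, reward)


def q_value(psi: Sequence[float], w: Sequence[float]) -> float:
    """Action value recovered from successor features: Q = psi . w."""
    return dot(psi, w)


if __name__ == "__main__":
    phi, psi_next, w = [1.0, 0.0], [2.0, 4.0], [3.0, 1.0]
    psi = [2.0, 2.0]  # equals phi + 0.5 * psi_next, so the TD error is zero
    print(combined_loss(phi, psi, psi_next, w, reward=3.0, gamma=0.5))  # 0.0
    print(q_value(psi, w))  # 8.0, which equals 3.0 + 0.5 * q_value(psi_next, w)
```

When both losses are zero, Q = ψ·w automatically satisfies the ordinary Bellman equation, which is why these two terms together are enough to pin down what the SFs should represent.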

agent, successor feature, successor representation, (13 more...)

arXiv.org Artificial Intelligence

2410.22133

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Information Technology (0.67)
Education > Educational Setting (0.45)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Domain Specific Data Distillation and Multi-modal Embedding Generation

Peddiraju, Sharadind, Rajagopal, Srini

arXiv.org Artificial Intelligence, Oct-26-2024

The challenge of creating domain-centric embeddings arises from the abundance of unstructured data and the scarcity of domain-specific structured data. Conventional embedding techniques often rely on only one of these modalities, limiting their applicability and efficacy. This paper introduces a novel modeling approach that leverages structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction. The proposed model operates within a Hybrid Collaborative Filtering (HCF) framework, where generic entity representations are fine-tuned through relevant item prediction tasks. Our experiments, focusing on the cloud computing domain, demonstrate that HCF-based embeddings outperform AutoEncoder-based embeddings (using purely unstructured data), achieving a 28% lift in precision and an 11% lift in recall for domain-specific attribute prediction.
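
The results are reported as a "lift" in precision and recall for attribute prediction. As an illustration of how such figures are commonly computed, the sketch below scores predicted attribute sets against ground truth and reports relative lift, (new − baseline) / baseline. It assumes micro-averaged, set-based precision and recall and a relative (not percentage-point) definition of lift; the abstract specifies neither, and the entities and attributes are hypothetical.

```python
from typing import Dict, Set, Tuple


def micro_precision_recall(
    predicted: Dict[str, Set[str]], actual: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of predicted attribute sets, pooled over all entities."""
    tp = fp = fn = 0
    for entity in actual.keys() | predicted.keys():
        pred = predicted.get(entity, set())
        truth = actual.get(entity, set())
        tp += len(pred & truth)   # correctly predicted attributes
        fp += len(pred - truth)   # predicted but not true
        fn += len(truth - pred)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def relative_lift(new: float, baseline: float) -> float:
    """Relative improvement over a baseline, e.g. 0.28 for a 28% lift."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    actual = {"vm-a": {"gpu", "autoscale"}, "db-b": {"managed", "backup"}}
    predicted = {"vm-a": {"gpu"}, "db-b": {"managed", "backup", "gpu"}}
    p, r = micro_precision_recall(predicted, actual)
    print(p, r)  # 0.75 0.75
    print(round(relative_lift(0.64, 0.50), 2))  # 0.28
```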

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.20325

Country: Europe > Bulgaria > Varna Province > Varna (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.93)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair

Sakai, Yusuke, Makinae, Mana, Kamigaito, Hidetaka, Watanabe, Taro

arXiv.org Artificial Intelligence, Apr-18-2024

In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems. However, it is very challenging to curate such a corpus due to limitations in the abilities of annotators, and hence, existing SI corpora are limited. Therefore, we propose a method that uses Large Language Models to convert existing speech translation corpora into interpretation-style data that maintains the original word order and preserves the entire source content; we call the resulting corpus the LLM-SI-Corpus. We demonstrate that fine-tuning SiMT models in text-to-text and speech-to-text settings with the LLM-SI-Corpus reduces latency while maintaining the same level of quality as models trained on offline datasets. The LLM-SI-Corpus is available at https://github.com/yusuke1997/LLM-SI-Corpus.

computational linguistic, gpt-3, translation, (13 more...)

arXiv.org Artificial Intelligence

2404.12299

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(24 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Pre-train, Prompt, and Predict – Part1 – Towards AI

#artificialintelligence, Mar-4-2023, 12:55:13 GMT

Originally published on Towards AI. I came across this wonderful paper on Prompting while going through this amazing course on Advanced NLP (UMass). Being a survey paper, it gives a holistic explanation of this latest paradigm in NLP. Over multiple articles, we will discuss the key highlights from the paper and learn why Prompting is considered to be "The Second Sea Change in NLP". To appreciate what prompting is and to get started, Part 1 discusses the four major paradigms that NLP has gone through over the past years.

engineering, paradigm, pre-train, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

#artificialintelligence, May-23-2022, 00:23:22 GMT

Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-supervised approaches, which then serve as the basis for priors that modify the whole loss surface on the downstream task. This simple modular approach enables significant performance gains and more data-efficient learning on a variety of downstream classification and segmentation tasks, serving as a drop-in replacement for standard pre-training strategies. These highly informative priors can also be saved for future use, similar to pre-trained weights, and stand in contrast to the zero-mean isotropic uninformative priors that are typically used in Bayesian deep learning.

easy bayesian transfer learning, pre-train

#artificialintelligence

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

The Joy of Neural Painting

Diaz-Aviles, Ernesto, Orellana-Rodriguez, Claudia, Jochim, Beth

arXiv.org Artificial Intelligence, Nov-22-2021

Neural Painters is a class of models that follows a GAN framework to generate brushstrokes, which are then composed to create paintings. GANs are powerful generative models for AI art, but they are notoriously difficult to train. To overcome GANs' limitations and to speed up Neural Painter training, we applied Transfer Learning to the process, reducing training time from days to only hours while achieving the same level of visual aesthetics in the final generated paintings. We report our approach and results in this work.
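
One simple way to compose generated brushstrokes into a painting is to alpha-blend each stroke onto the canvas in order. The sketch below shows that "over" compositing step on a small grayscale canvas; it assumes each stroke arrives as a single color plus a per-pixel alpha mask, which is an illustrative assumption rather than the authors' renderer, and the GAN that would generate the strokes is not shown. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Canvas = List[List[float]]


def blank_canvas(height: int, width: int, value: float = 1.0) -> Canvas:
    """A grayscale canvas filled with `value` (1.0 = white)."""
    return [[value] * width for _ in range(height)]


def composite_stroke(canvas: Canvas, alpha: Canvas, color: float) -> Canvas:
    """Blend one stroke over the canvas: out = alpha * color + (1 - alpha) * canvas."""
    return [
        [a * color + (1.0 - a) * c for c, a in zip(row, alpha_row)]
        for row, alpha_row in zip(canvas, alpha)
    ]


def paint(height: int, width: int, strokes: Sequence[Tuple[Canvas, float]]) -> Canvas:
    """Apply strokes in order; later strokes cover earlier ones where their alpha is high."""
    canvas = blank_canvas(height, width)
    for alpha, color in strokes:
        canvas = composite_stroke(canvas, alpha, color)
    return canvas


if __name__ == "__main__":
    # A 1x2 white canvas: an opaque black stroke on the left pixel, then a half-transparent black wash.
    strokes = [([[1.0, 0.0]], 0.0), ([[0.5, 0.5]], 0.0)]
    print(paint(1, 2, strokes))  # [[0.0, 0.5]]
```

Because compositing is order-dependent, the sequence in which strokes are applied is part of the painting, not just the set of strokes.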

brushstroke, generator, neural painter, (14 more...)

arXiv.org Artificial Intelligence

2111.10283

Genre: Research Report (0.40)

Technology: