AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Neural Information Processing Systems, Aug-16-2025, 10:02:20 GMT

Response to Reviews of "Co-Tuning for Transfer Learning"

We thank all reviewers for their detailed reviews. However, the major technique is still feature fine-tuning (a.k.a. ...). In the following, we respond to common questions first and then to the major concerns of each reviewer. Each dataset has a train/test split. Each method has access to the same set of training data.

co-tuning, dataset, fine-tuning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.43)

Neural Information Processing Systems, Aug-15-2025, 17:40:18 GMT

Finding the most relevant auxiliary forecasting tasks for pre-training and knowledge transferring to a given primary

We thank the reviewers for their valuable and timely comments. We'd like to first emphasize the challenges and contributions: ... Section 3.2 explains how to calculate this hyper-gradient ... (... Framework for BackPropagation, LeCun, 1988), and widely adopted in the literature [14, 15, 35]. We would like to further polish the notation to be more consistent. 'Pretrain (Top)' is much better than 'Pretrain (Down)'.

pre-training and knowledge, relevant auxiliary forecasting task, target task

Industry: Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.32)

Neural Information Processing Systems, Aug-15-2025, 14:06:13 GMT

Transfer Learning via $\ell_1$ Regularization

However, environments are nonstationary in many real-world applications.

concept drift, sgn, transfer lasso

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > Canada > Ontario > Toronto (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing Systems, Aug-14-2025, 21:54:54 GMT

Supplementary Materials for LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

As presented in Section 3.2, our side networks are built on Transformer blocks (same as the backbone ...

... | ... | ... | Accuracy on GLUE (%)
Adapter block + gates | 2.07 | 6.5 | 83.1
Transformer block + cross attention | 2.68 | 10.4 | 83.0
Transformer block + gates (current design) | 2.29 | 7.0 | 83.8

Table 2: Hyper-parameters used for NLP experiments. Batch size is 100 for all methods.
Method | Learning Rate | Other Hyper-parameters
Full fine-tuning | 3 10 ...

Batch size is 300 for all methods.
Method | Learning Rate | Other Hyper-parameters
Full fine-tuning | 3 10 ...

ladder side-tuning, side network, supplementary material

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.42)

arXiv.org Artificial Intelligence, Aug-14-2025

Dynamic Mixture-of-Experts for Incremental Graph Learning

Kong, Lecheng, Vasiloudis, Theodore, Yun, Seongjun, Xie, Han, Song, Xiang

Graph incremental learning is a learning paradigm that aims to adapt trained models to continuously incremented graphs and data over time without the need for retraining on the full dataset. However, regular graph machine learning methods suffer from catastrophic forgetting when applied to incremental learning settings, where previously learned knowledge is overridden by new knowledge. Previous approaches have tried to address this by treating the previously trained model as an inseparable unit and using techniques to maintain old behaviors while learning new knowledge. These approaches, however, do not account for the fact that previously acquired knowledge at different timestamps contributes differently to learning new tasks. Some prior patterns can be transferred to help learn new data, while others may deviate from the new data distribution and be detrimental. To address this, we propose a dynamic mixture-of-experts (DyMoE) approach for incremental learning. Specifically, a DyMoE GNN layer adds new expert networks specialized in modeling the incoming data blocks. We design a customized regularization loss that utilizes data sequence information so existing experts can maintain their ability to solve old tasks while helping the new expert learn the new data effectively. As the number of data blocks grows over time, the computational cost of the full mixture-of-experts (MoE) model increases. To address this, we introduce a sparse MoE approach, where only the top-$k$ most relevant experts make predictions, significantly reducing the computation time. Our model achieved a 4.92% relative accuracy increase compared to the best baselines on class incremental learning, showing the model's exceptional power.
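The sparse routing idea in this abstract can be illustrated with a minimal sketch. It is not the DyMoE implementation: the linear gate, the linear experts, and the names `top_k_moe`, `gate_weights`, and `expert_weights` are all assumptions made for illustration. Only the `k` highest-scoring experts are evaluated, and their outputs are mixed with a softmax taken over the selected scores.

```python
import numpy as np

def top_k_moe(x, expert_weights, gate_weights, k):
    """Mix the outputs of the k highest-scoring experts for one input vector.

    x:              (d,) input features
    expert_weights: (n_experts, d_out, d) one linear expert per row (assumed form)
    gate_weights:   (n_experts, d) linear gate producing one score per expert
    """
    scores = gate_weights @ x                    # one gating score per expert
    top = np.argsort(scores)[-k:]                # indices of the k largest scores
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()                         # softmax over the selected experts only
    # Experts outside `top` are never evaluated, which is where the saving comes from.
    outputs = np.stack([expert_weights[i] @ x for i in top])
    return probs @ outputs, top


x = np.array([1.0, 0.0])
gate = np.array([[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]])    # scores: 1, 3, 2
experts = np.stack([(i + 1) * np.eye(2) for i in range(3)])
mixed, chosen = top_k_moe(x, experts, gate, k=2)
print(sorted(chosen.tolist()), mixed)
```

With `k` fixed, the per-input cost stays roughly constant as more experts are appended for new data blocks, apart from the gating scores themselves.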

analogical reasoning, artificial intelligence, machine learning

2508.09974

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Analogical Reasoning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.34)

arXiv.org Artificial Intelligence, Aug-13-2025

Multi-level Collaborative Distillation Meets Global Workspace Model: A Unified Framework for OCIL

Su, Shibin, Liang, Guoqiang, Cheng, De, Zhang, Shizhou, Ran, Lingyan, Zhang, Yanning

Online Class-Incremental Learning (OCIL) enables models to learn continuously from non-i.i.d. ... However, OCIL faces two key challenges: maintaining model stability under strict memory constraints and ensuring adaptability to new tasks. Under stricter memory constraints, current replay-based methods are less effective. While ensemble methods improve adaptability (plasticity), they often struggle with stability. To overcome these challenges, we propose a novel approach that enhances ensemble learning through a Global Workspace Model (GWM), a shared, implicit memory that guides the learning of multiple student models. The GWM is formed by fusing the parameters of all students within each training batch, capturing the historical learning trajectory and serving as a dynamic anchor for knowledge consolidation. This fused model is then redistributed periodically to the students to stabilize learning and promote cross-task consistency. In addition, we introduce a multi-level collaborative distillation mechanism. This approach enforces peer-to-peer consistency among students and preserves historical knowledge by aligning each student with the GWM. As a result, student models remain adaptable to new tasks while maintaining previously learned knowledge, striking a better balance between stability and plasticity. Extensive experiments on three standard OCIL benchmarks show that our method delivers significant performance improvement for several OCIL models across various memory budgets. Class-Incremental Learning is designed to integrate the knowledge of classes from a stream of data with an evolved distribution [1].
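The fuse-then-redistribute loop described in this abstract can be sketched as follows. The abstract does not state the fusion rule, so plain parameter averaging is an assumption here, as are the function names `fuse_students` and `redistribute` and the representation of each student as a dict of NumPy arrays.

```python
import numpy as np

def fuse_students(students):
    """Build the shared model by averaging each named parameter across students.

    Averaging is an assumed fusion rule, chosen only for illustration.
    """
    names = students[0].keys()
    return {n: np.mean([s[n] for s in students], axis=0) for n in names}

def redistribute(students, workspace):
    """Give every student its own copy of the fused parameters."""
    return [{n: w.copy() for n, w in workspace.items()} for _ in students]


students = [
    {"w": np.array([1.0, 3.0])},
    {"w": np.array([3.0, 5.0])},
]
workspace = fuse_students(students)            # would run within each training batch
students = redistribute(students, workspace)   # would run only every few steps
print(workspace["w"])
```

Copying the arrays in `redistribute` keeps the students independent after the reset, so they can diverge again before the next fusion step.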

artificial intelligence, machine learning, student

2508.08677

Country:

Asia > China (0.46)
North America (0.28)

Genre:

Research Report (0.84)
Instructional Material > Online (0.36)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence, Aug-12-2025

From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping

Long, Judy, Liu, Tao, Woznicki, Sean Alexander, Marković, Miljana, Marko, Oskar, Sears, Molly

Crop mapping involves identifying and classifying crop types using spatial data, primarily derived from remote sensing imagery. This study presents the first comprehensive review of large-scale, pixel-wise crop mapping workflows, encompassing both conventional supervised methods and emerging transfer learning approaches. To identify the optimal time-series generation approaches and supervised crop mapping models, we conducted systematic experiments, comparing six widely adopted satellite image-based preprocessing methods, alongside eleven supervised pixel-wise classification models. Additionally, we assessed the synergistic impact of varied training sample sizes and variable combinations. Moreover, we identified optimal transfer learning techniques for different magnitudes of domain shift. The evaluation of optimal methods was conducted across five diverse agricultural sites. Landsat 8 served as the primary satellite data source. Labels come from CDL trusted pixels and field surveys. Our findings reveal three key insights. First, fine-scale interval preprocessing paired with Transformer models consistently delivered optimal performance for both supervised and transferable workflows. RF offered rapid training and competitive performance in conventional supervised learning and direct transfer to similar domains. Second, transfer learning techniques enhanced workflow adaptability, with UDA being effective for homogeneous crop classes while fine-tuning remains robust across diverse scenarios. Finally, workflow choice depends heavily on the availability of labeled samples. With a sufficient sample size, supervised training typically delivers more accurate and generalizable results. Below a certain threshold, transfer learning that matches the level of domain shift is a viable alternative to achieve crop mapping. All code is publicly available to encourage reproducibility practice.

accuracy, artificial intelligence, machine learning

2507.1259

Country:

North America > United States (1.00)
Asia (0.67)
Europe > Serbia > Vojvodina (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Aug-12-2025

Can Multitask Learning Enhance Model Explainability?

Najjar, Hiba, Alshbib, Bushra, Dengel, Andreas

Remote sensing provides satellite data in diverse types and formats. The usage of multimodal learning networks exploits this diversity to improve model performance, except that the complexity of such networks comes at the expense of their interpretability. In this study, we explore how modalities can be leveraged through multitask learning to intrinsically explain model behavior. In particular, instead of additional inputs, we use certain modalities as additional targets to be predicted along with the main task. The success of this approach relies on the rich information content of satellite data, which remains as input modalities. We show how this modeling context provides numerous benefits: (1) in case of data scarcity, the additional modalities do not need to be collected for model inference at deployment, (2) the model performance remains comparable to the multimodal baseline performance, and in some cases achieves better scores, (3) prediction errors in the main task can be explained via the model behavior in the auxiliary task(s). We demonstrate the efficiency of our approach on three datasets, including segmentation, classification, and regression tasks.
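Predicting a modality as an extra target rather than feeding it in as an input can be sketched as a combined training loss. This is a minimal illustration, not the authors' objective: the squared-error losses, the weighted sum, and the names `multitask_loss` and `aux_weight` are assumptions.

```python
import numpy as np

def multitask_loss(main_pred, main_target, aux_pred, aux_target, aux_weight=0.5):
    """Return (total, main, aux) losses for one main task and one auxiliary task.

    The auxiliary target plays the role of a modality that is predicted
    instead of being supplied as an input.
    """
    main = np.mean((main_pred - main_target) ** 2)
    aux = np.mean((aux_pred - aux_target) ** 2)
    return main + aux_weight * aux, main, aux


total, main, aux = multitask_loss(
    np.array([1.0, 2.0]), np.array([1.0, 4.0]),   # main task prediction and target
    np.array([0.0, 0.0]), np.array([2.0, 2.0]),   # auxiliary prediction and target
)
print(total, main, aux)
```

Returning the two terms separately makes it possible to inspect the auxiliary error alongside the main one for a given sample, which is the kind of comparison the abstract relies on for explanation.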

artificial intelligence, machine learning, natural language

2508.06966

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Aug-7-2025

Data-Driven Spectrum Demand Prediction: A Spatio-Temporal Framework with Transfer Learning

Farajzadeh, Amin, Zheng, Hongzhao, Dumoulin, Sarah, Ha, Trevor, Yanikomeroglu, Halim, Ghasemi, Amir

Accurate spectrum demand prediction is crucial for informed spectrum allocation, effective regulatory planning, and fostering sustainable growth in modern wireless communication networks. It supports governmental efforts, particularly those led by the international telecommunication union (ITU), to establish fair spectrum allocation policies, improve auction mechanisms, and meet the requirements of emerging technologies such as advanced 5G, forthcoming 6G, and the internet of things (IoT). This paper presents an effective spatio-temporal prediction framework that leverages crowdsourced user-side key performance indicators (KPIs) and regulatory datasets to model and forecast spectrum demand. The proposed methodology achieves superior prediction accuracy and cross-regional generalizability by incorporating advanced feature engineering, comprehensive correlation analysis, and transfer learning techniques. Unlike traditional ITU models, which are often constrained by arbitrary inputs and unrealistic assumptions, this approach exploits granular, data-driven insights to account for spatial and temporal variations in spectrum utilization. Comparative evaluations against ITU estimates, as the benchmark, underscore our framework's capability to deliver more realistic and actionable predictions. Experimental results validate the efficacy of our methodology, highlighting its potential as a robust approach for policymakers and regulatory bodies to enhance spectrum management and planning.

artificial intelligence, machine learning, spectrum demand

2508.03863

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.28)

Genre: Research Report (0.82)

Industry:

Telecommunications (1.00)
Government (0.69)
Information Technology > Networks (0.35)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)