AITopics | mcl

dee254cdacbab59f17dc6a8fbdffa59f-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 00:52:22 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Recasting Continual Learning as Sequence Modeling

Neural Information Processing Systems | Feb-17-2026, 13:13:49 GMT

That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
(15 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

Neural Information Processing Systems | Feb-9-2026, 16:06:00 GMT

However, when the size of labeled data is very small (say a few labeled samples per class), SSL performs poorly and unstably, possibly due to the low quality of learned pseudo labels. In this paper, we propose a new SSL method called DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Different from existing DP methods that rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach to automatically generate LFs from scratch in SSL style. With the noisy labels produced by the LFs, we design a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples.
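To make the idea of turning conflicting labeling-function outputs into probabilistic labels concrete, here is a minimal sketch. It is not the DP-SSL label model: it uses plain normalized vote counts as a stand-in, and the names `ABSTAIN` and `probabilistic_labels` are hypothetical, introduced only for illustration.

```python
from collections import Counter

ABSTAIN = -1  # hypothetical marker: this labeling function gives no vote


def probabilistic_labels(votes, num_classes):
    """Convert per-sample LF votes into class distributions.

    Assumption: a simple normalized vote count stands in for a learned
    label model. Conflicting votes split the probability mass.
    """
    result = []
    for sample_votes in votes:
        counts = Counter(v for v in sample_votes if v != ABSTAIN)
        total = sum(counts.values())
        if total == 0:
            # Every LF abstained: fall back to a uniform distribution.
            result.append([1.0 / num_classes] * num_classes)
        else:
            result.append([counts.get(c, 0) / total for c in range(num_classes)])
    return result


# Three unlabeled samples, three labeling functions, two classes.
votes = [
    [0, 0, 1],                    # LFs conflict
    [1, ABSTAIN, 1],              # LFs agree, one abstains
    [ABSTAIN, ABSTAIN, ABSTAIN],  # no LF fires
]
probs = probabilistic_labels(votes, num_classes=2)
for row in probs:
    print(row)
```

A learned label model would typically weight each LF by an estimated accuracy rather than counting all votes equally; the sketch only shows the input and output shapes involved.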

artificial intelligence, machine learning, probabilistic label, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.36)

Short-Context Dominance: How Much Local Context Natural Language Actually Needs?

Vakilian, Vala, Wang, Zimeng, Rawat, Ankit Singh, Thrampoulidis, Christos

arXiv.org Artificial Intelligence | Dec-10-2025

We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences with 1-7k tokens from long-context documents, we consistently find that 75-80% require only the last 96 tokens at most. Given the dominance of short-context tokens, we then ask whether it is possible to detect challenging long-context sequences for which a short local prefix does not suffice for prediction. We introduce a practical proxy to MCL, called Distributionally Aware MCL (DaMCL), that does not require knowledge of the actual next-token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that simple thresholding of the metric defining DaMCL achieves high performance in detecting long vs. short context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that leverages our detector to identify and boost tokens that are long-range-relevant. Across Q&A tasks and model architectures, we confirm that mitigating the bias improves performance.
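The notion of minimum context length can be illustrated with a small sketch. The assumptions are loud ones: the "oracle" below is a toy copy-rule predictor rather than a language model, and MCL is taken to be the shortest suffix whose prediction matches the full-context prediction, which may differ from the exact criterion used in the paper. The names `toy_oracle` and `minimum_context_length` are hypothetical.

```python
def toy_oracle(context):
    """Toy stand-in for an LLM: predict the token that followed the most
    recent earlier occurrence of the last token, or "<unk>" if none."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1]
    return "<unk>"


def minimum_context_length(tokens, oracle):
    """Smallest k such that the last k tokens yield the same prediction
    as the full sequence (assumes a non-empty token list)."""
    target = oracle(tokens)
    for k in range(1, len(tokens) + 1):
        if oracle(tokens[-k:]) == target:
            return k
    return len(tokens)


short_dep = ["x", "y", "a", "b", "a"]  # the needed "a" is 3 tokens back
long_dep = ["a", "b", "c", "d", "a"]   # the needed "a" is 5 tokens back

print(minimum_context_length(short_dep, toy_oracle))
print(minimum_context_length(long_dep, toy_oracle))
```

With a real model, the equality check would usually be replaced by a comparison of predicted tokens or distributions, and the linear scan over suffix lengths by a coarser grid to keep the number of forward passes manageable.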

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.08082

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

959f70ee50044bed305e48e3484005a7-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-17-2025, 02:46:53 GMT

artificial intelligence, kuzushiji, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

854d6fae5ee42911677c739ee1734486-Supplemental.pdf

Neural Information Processing Systems | Aug-15-2025, 15:18:26 GMT

artificial intelligence, inductive learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

854d6fae5ee42911677c739ee1734486-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 15:18:22 GMT

artificial intelligence, inductive learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs

Zhang, Dingkun, Qi, Shuhan, Xiao, Xinyu, Chen, Kehai, Wang, Xuan

arXiv.org Artificial Intelligence | Mar-8-2025

Recent advances in Multimodal Large Language Models (MLLMs) have enhanced their versatility as they integrate a growing number of modalities. Considering the heavy cost of training MLLMs, it is necessary to reuse the existing ones and further extend them to more modalities through Modality-incremental Continual Learning (MCL). However, this often comes with a performance degradation in the previously learned modalities. In this work, we revisit MCL and investigate a more severe issue it faces in contrast to traditional continual learning: its degradation comes not only from catastrophic forgetting but also from the misalignment between the modality-agnostic and modality-specific components. To address this problem, we propose an elegantly simple MCL paradigm called "MErge then ReAlign" (MERA). Our method avoids introducing heavy training overhead or modifying the model architecture, and is therefore easy to deploy and highly reusable in the MLLM community. Extensive experiments demonstrate that, despite its simplicity, MERA shows impressive performance, holding up to a 99.84% Backward Relative Gain when extending to four modalities and achieving nearly lossless MCL performance.

mera, modality, relative gain, (15 more...)

arXiv.org Artificial Intelligence

2503.07663

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning Mamba as a Continual Learner

Zhao, Chongyang, Gong, Dong

arXiv.org Artificial Intelligence | Dec-1-2024

Continual learning (CL) aims to efficiently learn and accumulate knowledge from a data stream with different distributions. By formulating CL as a sequence prediction task, meta-continual learning (MCL) makes it possible to meta-learn an efficient continual learner based on recent advanced sequence models, e.g., Transformers. Although attention-free models (e.g., Linear Transformers) can ideally match CL's essential objective and efficiency requirements, they usually do not perform well in MCL. Considering that the attention-free Mamba achieves excellent performance matching that of Transformers on general sequence modeling tasks, in this paper we aim to answer a question: can attention-free Mamba perform well on MCL? By formulating Mamba with a selective state space model (SSM) for MCL tasks, we propose to meta-learn Mamba as a continual learner, referred to as MambaCL. By incorporating a selectivity regularization, we can effectively train MambaCL. Through comprehensive experiments across various CL tasks, we also explore how Mamba and other models perform in different MCL scenarios. Our experiments and analyses highlight the promising performance and generalization capabilities of Mamba in MCL. Continual learning (CL) aims to efficiently learn and accumulate knowledge in a non-stationary data stream (De Lange et al., 2021; Wang et al., 2024) containing different tasks. To ensure computational and memory efficiency, CL methods are explored for learning from data streams while minimizing the storage of historical data or limiting running memory growth, such as restricting the increase rate to be constant or sub-linear (De Lange et al., 2021; Ostapenko et al., 2021). The data stream can also be seen as a context of the tasks for performing prediction for a new query. (D. Gong is the corresponding author.)

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.00776

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales (0.04)
North America > United States > California (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Multiple Choice Learning for Efficient Speech Separation with Many Speakers

Perera, David, Derrida, François, Mariotte, Théo, Richard, Gaël, Essid, Slim

arXiv.org Machine Learning | Nov-27-2024

Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performance of PIT, while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting.
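The contrast between the two training objectives can be sketched on toy data. This is an illustration under stated assumptions, not the authors' implementation: signals are short lists of floats, the per-pair loss is mean squared error, PIT is computed by brute force over all permutations, and MCL is represented by a generic winner-takes-all loss in which each target is scored against its best-matching prediction. The function names are hypothetical.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(preds, targets):
    """Permutation invariant loss: best one-to-one assignment of
    predictions to targets, found by trying all n! permutations."""
    n = len(targets)
    best = min(
        sum(mse(preds[p], targets[i]) for i, p in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n


def wta_loss(preds, targets):
    """Winner-takes-all loss: each target picks its closest prediction
    independently, which needs only n * n pairwise comparisons."""
    return sum(min(mse(p, t) for p in preds) for t in targets) / len(targets)


# Two predicted sources, listed in the "wrong" order relative to the targets.
preds = [[1.0, 1.0], [0.0, 0.0]]
targets = [[0.0, 0.1], [1.0, 0.9]]

pit = pit_loss(preds, targets)
wta = wta_loss(preds, targets)
print(pit, wta)
```

Because the winner-takes-all assignment does not have to be one-to-one, its value is never larger than the PIT value on the same inputs, and its cost grows quadratically rather than factorially with the number of sources in this naive formulation.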

mcl, separation, speech separation, (14 more...)

arXiv.org Machine Learning

2411.18497

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.82)

Industry: Education (0.62)

Technology:

Information Technology > Artificial Intelligence > Speech (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
