The Meta-Learning Gap: Combining Hydra and Quant for Large-Scale Time Series Classification
Time series classification faces a fundamental trade-off between accuracy and computational efficiency. While comprehensive ensembles like HIVE-COTE 2.0 achieve state-of-the-art accuracy, their 340-hour training time on the UCR benchmark renders them impractical for large-scale datasets. We investigate whether targeted combinations of two efficient algorithms from complementary paradigms can capture ensemble benefits while maintaining computational feasibility. We combine Hydra (competing convolutional kernels) and Quant (hierarchical interval quantiles) in six ensemble configurations and evaluate performance on 10 large-scale MONSTER datasets (7,898 to 1,168,774 training instances). Our strongest configuration improves mean accuracy from 0.829 to 0.836, succeeding on 7 of 10 datasets. However, prediction-combination ensembles capture only 11% of theoretical oracle potential, revealing a substantial meta-learning optimization gap. Feature-concatenation approaches exceeded oracle bounds by learning novel decision boundaries, while prediction-level complementarity shows moderate correlation with ensemble gains. The central finding: the challenge has shifted from ensuring algorithms are different to learning how to combine them effectively. Current meta-learning strategies struggle to exploit the complementarity that oracle analysis confirms exists. Improved combination strategies could potentially double or triple ensemble gains across diverse time series classification applications.
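The oracle bound mentioned in the abstract can be made concrete with a small sketch: an instance counts as correct for the oracle if either ensemble member classifies it correctly, and the "captured" fraction compares a real ensemble's gain against that headroom. All predictions and the ensemble accuracy below are illustrative toy values, not the paper's data.

```python
import numpy as np

# Toy predictions from two classifiers (e.g. a Hydra-like and a Quant-like
# model) on 8 instances; values are illustrative only.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
pred_a = np.array([0, 1, 0, 0, 1, 1, 0, 1])  # classifier A
pred_b = np.array([0, 0, 1, 0, 1, 0, 1, 1])  # classifier B

acc_a = np.mean(pred_a == y_true)
acc_b = np.mean(pred_b == y_true)

# Oracle upper bound: a perfect per-instance selector that is correct
# whenever *either* member is correct.
oracle = np.mean((pred_a == y_true) | (pred_b == y_true))

# Fraction of the oracle headroom a real ensemble captures
# (hypothetical ensemble accuracy, for illustration).
ens_acc = 0.80
best_single = max(acc_a, acc_b)
captured = (ens_acc - best_single) / (oracle - best_single)
```

With these toy values both members reach 0.75 accuracy, the oracle reaches 1.0, and the hypothetical ensemble captures 20% of the available headroom, mirroring the kind of gap the abstract reports.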
Redundancy-optimized Multi-head Attention Networks for Multi-View Multi-Label Feature Selection
Liu, Yuzhou, Liu, Jiarui, Gao, Wanfu
Multi-view multi-label data offers richer perspectives for artificial intelligence, but simultaneously presents significant challenges for feature selection due to the inherent complexity of interrelations among features, views, and labels. Attention mechanisms provide an effective way to analyze these intricate relationships: they can compute importance weights for information by aggregating correlations between Query and Key matrices to focus on pertinent values. However, existing attention-based feature selection methods predominantly focus on intra-view relationships, neglecting the complementarity of inter-view features and the critical feature-label correlations. Moreover, they often fail to account for feature redundancy, potentially leading to suboptimal feature subsets. To overcome these limitations, we propose a novel method based on Redundancy-optimized Multi-head Attention Networks for Multi-view Multi-label Feature Selection (RMAN-MMFS). Specifically, we employ each individual attention head to model intra-view feature relationships and use the cross-attention mechanisms between different heads to capture inter-view feature complementarity. Furthermore, we design static and dynamic feature redundancy terms: the static term mitigates redundancy within each view, while the dynamic term explicitly models redundancy between unselected and selected features across the entire selection process, thereby promoting feature compactness. Comprehensive evaluations on six real-world datasets, compared against six multi-view multi-label feature selection methods, demonstrate the superior performance of the proposed method.
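The Query/Key attention scoring described above can be sketched in a few lines: pairwise feature affinities come from a scaled dot product, a row-wise softmax turns them into attention weights, and aggregating the attention a feature receives gives an importance score for selection. This is a generic single-head sketch with random stand-in embeddings, not the RMAN-MMFS architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d = 6, 4

# Hypothetical learned Query/Key projections of per-feature embeddings
# for one attention head within one view (stand-in random values).
Q = rng.normal(size=(n_features, d))
K = rng.normal(size=(n_features, d))

# Scaled dot-product affinities between every pair of features.
scores = Q @ K.T / np.sqrt(d)

# Row-wise softmax -> attention weights (each row sums to 1).
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)

# Score a feature by how much attention it receives from all features,
# then keep the top-k as the selected subset.
importance = attn.sum(axis=0)
top_k = np.argsort(importance)[::-1][:3]
```

In the paper's setting one such head would run per view, with cross-attention between heads adding the inter-view complementarity this sketch omits.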
We would like to thank all reviewers for their time and effort invested in reviewing our work and for the valuable feedback.
We now turn to address each of the reviewers' individual comments. We do not share your feelings regarding the claim that "the potential audience in the NeurIPS community is limited". Regarding your comment "presentation is unusually technical for machine learning venues", we would like to point out that the example in Table 2 demonstrates exactly this quite nicely. Reviewer #2: Thank you for your positive feedback and for finding our results significant. We will add an appropriate discussion to make this point clearer.
Merging Continual Pretraining Models for Domain-Specialized LLMs: A Case Study in Finance
Ueda, Kentaro, Portet, François, Suwa, Hirohiko, Yasumoto, Keiichi
While LLMs excel at general tasks, they struggle in specialized domains like finance, which require diverse skills in domain knowledge, mathematical reasoning, and multilingual processing. Merging domain-specific Continual Pre-training (CPT) "experts" offers a practical alternative to costly and unstable multi-skill training. However, unlike the established practice of merging Supervised Fine-Tuning (SFT) models, CPT model merging remains largely unexplored. We address this gap by creating financial LLMs from experts in finance, math, and Japanese. We propose a three-stage evaluation focusing on knowledge recovery, complementarity, and emergence, and assess three merging methods (Task Arithmetic, TIES, and DARE-TIES) on a comprehensive financial benchmark curated from 18 tasks across 8 established datasets. Results show that merging an expert with its base model recovers general knowledge lost during CPT, while merging experts improves performance and can yield emergent cross-domain skills. Among the methods, Task Arithmetic performs strongly but is hyperparameter-sensitive, whereas TIES is more robust. Our findings also suggest that while model similarity correlates with merging success, emergent skills depend on more complex factors. This work presents the first foundational analysis of CPT model merging, establishing a principled framework and providing clear guidance for building multi-skill LLMs from existing assets.
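The Task Arithmetic method evaluated above can be sketched in a few lines: each expert defines a task vector (its weights minus the base model's), and merging adds scaled task vectors back onto the base. The toy weight vectors and scaling coefficient below are illustrative; real LLM merging operates per-tensor over full state dicts.

```python
import numpy as np

# Toy model "weights" as flat vectors (stand-ins for full state dicts).
base = np.array([1.0, 2.0, 3.0])
expert_finance = np.array([1.5, 2.0, 2.5])  # hypothetical CPT experts
expert_math    = np.array([1.0, 3.0, 3.0])

# Task vectors: what each CPT run changed relative to the base model.
tau_fin  = expert_finance - base
tau_math = expert_math - base

# Task Arithmetic: add the scaled sum of task vectors to the base.
lam = 0.5  # scaling coefficient -- the hyperparameter TA is sensitive to
merged = base + lam * (tau_fin + tau_math)
```

TIES and DARE-TIES refine this same recipe by trimming small-magnitude entries and resolving sign conflicts between task vectors before summing, which is one reason they are less sensitive to the choice of `lam`.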