extratree
Appendix 1 Proofs
Let B denote the batch size chosen for MABSplit. There are at most n/B rounds in the main while loop (Line 6) of Algorithm 1, and hence at most nmT/B ≤ nmT confidence intervals are computed across all arms and all steps of the algorithm. Since the main while loop can run at most n/B times, the algorithm must terminate. Furthermore, if all confidence intervals throughout the algorithm are correct, it is impossible for (f, t) to be removed from the set of candidate arms. Finally, we consider the complexity of Algorithm 1. Let n_used be the total number of arm pulls computed for each arm remaining in the set of candidate arms at a given point in the algorithm.
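The counting argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the bounds read as n/B rounds and nmT/B confidence intervals, with n data points, m features, and T candidate thresholds per feature; the function name and the rounding up to whole rounds are illustrative assumptions, not part of Algorithm 1.

```python
def mabsplit_round_bounds(n, batch_size, m, num_thresholds):
    """Upper bounds on while-loop rounds and confidence intervals.

    Assumption: each round draws one batch of `batch_size` points and
    updates one confidence interval per (feature, threshold) arm.
    """
    # Each round consumes a fresh batch, so at most ceil(n / B) rounds.
    rounds = (n + batch_size - 1) // batch_size
    # One interval per arm per round; there are m * T arms.
    intervals = rounds * m * num_thresholds
    return rounds, intervals


rounds, intervals = mabsplit_round_bounds(
    n=10_000, batch_size=100, m=20, num_thresholds=10
)
print(rounds, intervals)
```

With these hypothetical sizes the interval count stays well below the looser nmT bound, which is the quantity a union bound over all intervals would use.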
The Meta-Learning Gap: Combining Hydra and Quant for Large-Scale Time Series Classification
Time series classification faces a fundamental trade-off between accuracy and computational efficiency. While comprehensive ensembles like HIVE-COTE 2.0 achieve state-of-the-art accuracy, their 340-hour training time on the UCR benchmark renders them impractical for large-scale datasets. We investigate whether targeted combinations of two efficient algorithms from complementary paradigms can capture ensemble benefits while maintaining computational feasibility. Combining Hydra (competing convolutional kernels) and Quant (hierarchical interval quantiles) across six ensemble configurations, we evaluate performance on 10 large-scale MONSTER datasets (7,898 to 1,168,774 training instances). Our strongest configuration improves mean accuracy from 0.829 to 0.836, succeeding on 7 of 10 datasets. However, prediction-combination ensembles capture only 11% of theoretical oracle potential, revealing a substantial meta-learning optimization gap. Feature-concatenation approaches exceeded oracle bounds by learning novel decision boundaries, while prediction-level complementarity shows moderate correlation with ensemble gains. The central finding: the challenge has shifted from ensuring algorithms are different to learning how to combine them effectively. Current meta-learning strategies struggle to exploit the complementarity that oracle analysis confirms exists. Improved combination strategies could potentially double or triple ensemble gains across diverse time series classification applications.
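One plausible way to read "share of oracle potential captured" is sketched below. The definitions are assumptions, since the abstract does not spell them out: the oracle counts an instance as correct when either base classifier is correct, and the captured share is the ensemble's gain over the best base classifier divided by the oracle's gain over it. The function name and toy labels are hypothetical.

```python
def oracle_capture(y_true, pred_a, pred_b, pred_ens):
    """Fraction of the oracle headroom that an ensemble realises."""
    n = len(y_true)

    def accuracy(pred):
        return sum(p == y for p, y in zip(pred, y_true)) / n

    best_base = max(accuracy(pred_a), accuracy(pred_b))
    # Oracle: correct whenever at least one base classifier is correct.
    oracle = sum(
        (a == y) or (b == y) for a, b, y in zip(pred_a, pred_b, y_true)
    ) / n
    headroom = oracle - best_base
    ensemble = accuracy(pred_ens)
    captured = (ensemble - best_base) / headroom if headroom > 0 else float("nan")
    return {"best_base": best_base, "oracle": oracle,
            "ensemble": ensemble, "captured": captured}


# Toy labels (not from the paper).
y_true   = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
pred_a   = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
pred_b   = [0, 1, 1, 0, 0, 1, 0, 1, 0, 1]
pred_ens = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
result = oracle_capture(y_true, pred_a, pred_b, pred_ens)
print(result)
```

Under this definition a feature-level ensemble can exceed a captured share of 1, because it may classify correctly an instance that both base classifiers get wrong.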
Quality analysis and evaluation prediction of RAG retrieval based on machine learning algorithms
Zhang, Ruoxin, Wen, Zhizhao, Wang, Chao, Tang, Chenchen, Xu, Puyang, Jiang, Yifan
With the rapid evolution of large language models, retrieval-augmented generation (RAG) has been widely adopted for its ability to integrate external knowledge and improve output accuracy. However, system performance depends heavily on the quality of the retrieval module. If the retrieved results have low relevance to the user's needs or contain noisy information, the generated content is directly distorted. In response to the performance bottleneck of existing models in processing tabular features, this paper proposes an XGBoost machine learning regression model based on feature engineering and particle swarm optimization. Correlation analysis shows that answer_quality is positively correlated with doc_delevance (0.66), indicating that document relevance has a significant positive effect on answer quality and that improving document relevance may enhance answer quality. Semantic similarity and redundancy are both strongly negatively correlated with diversity (-0.89 and -0.88, respectively), indicating a tradeoff: as the former two increase, diversity decreases significantly. Experimental comparisons with decision trees, AdaBoost, and other baselines show that the VMD PSO BiLSTM model is superior on all evaluation indicators, with significantly lower MSE, RMSE, MAE, and MAPE than the comparison models. Its R² value is higher, indicating better prediction accuracy, stability, and explanatory power. This result provides an effective path for optimizing retrieval quality and improving the generation quality of RAG systems, and is valuable for promoting the deployment of related technologies.
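For reference, the five evaluation indicators named in the abstract can be computed as follows. This is a minimal sketch using the standard textbook definitions; the function name and the sample values are hypothetical, and the paper's own evaluation code is not shown in the abstract.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard MSE, RMSE, MAE, MAPE (in percent), and R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when any true value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "mape": mape, "r2": r2}


# Made-up quality scores, for illustration only.
metrics = regression_metrics([2.0, 4.0, 5.0, 8.0], [2.5, 3.5, 5.0, 7.0])
print(metrics)
```

Lower values are better for the four error metrics, while R² closer to 1 means the model explains more of the variance in the target.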
Synheart Emotion: Privacy-Preserving On-Device Emotion Recognition from Biosignals
Ademtew, Henok, Goytom, Israel
Human emotions fundamentally shape decision-making, social interactions, and cognitive processes. Modern human-computer interaction (HCI) systems, however, remain largely oblivious to users' affective states, relying exclusively on explicit inputs such as touch, speech, or gaze. The proliferation of consumer wearables such as smartwatches, fitness trackers, and health monitors has democratized access to continuous physiological data, creating unprecedented opportunities for emotionally intelligent computing [1, 2]. Physiological signals offer several advantages over traditional modalities (facial expressions, voice) for emotion recognition: they are continuous, difficult to consciously manipulate, and unaffected by environmental factors such as lighting or occlusion [3]. Among these signals, heart rate variability (HRV), the temporal variation between consecutive heartbeats, has emerged as a robust biomarker of autonomic nervous system activity and emotional states [4, 5]. Despite significant research advances in affective computing, most emotion recognition systems exhibit two critical limitations: 1. Privacy vulnerabilities: Cloud-based inference requires transmitting sensitive biometric data to external servers, exposing users to data breaches, surveillance, and loss of autonomy [6].
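To make "temporal variation between consecutive heartbeats" concrete, the sketch below computes two widely used time-domain HRV statistics, RMSSD and SDNN, from a list of RR intervals. The choice of these two features, the function name, and the sample intervals are assumptions for illustration; the text above does not say which HRV features the system uses.

```python
import math


def hrv_features(rr_ms):
    """Time-domain HRV features from RR intervals in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Successive differences between consecutive beat-to-beat intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_rr = sum(rr_ms) / len(rr_ms)
    # Sample standard deviation of the intervals (n - 1 denominator).
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (len(rr_ms) - 1))
    return {"rmssd": rmssd, "sdnn": sdnn, "mean_hr_bpm": 60_000.0 / mean_rr}


# Hypothetical RR intervals, not recorded data.
features = hrv_features([800, 810, 790, 805, 795])
print(features)
```

Because these statistics need only a short window of intervals and a few arithmetic operations, they can be computed entirely on the device without sending raw biosignals elsewhere.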
Chunked TabPFN: Exact Training-Free In-Context Learning for Long-Context Tabular Data
Sergazinov, Renat, Yin, Shao-An
TabPFN v2 achieves better results than tree-based models on several tabular benchmarks, which is notable since tree-based models are usually the strongest choice for tabular data. However, it cannot handle more than 10K context tokens because transformers have quadratic computation and memory costs. Unlike existing approaches that rely on context compression, such as selecting representative samples via K-nearest neighbors (KNN), we introduce a tiled-block strategy to compute attention within the TabPFN framework. This design is compatible with standard GPU setups and, to the best of our knowledge, is the first to enable TabPFN to process long contexts without any pre-processing. We demonstrate the effectiveness of our approach on the standard TabArena benchmark, with code available at chunk tabpfn.
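The idea of computing exact attention in blocks can be illustrated with a running (online) softmax over key/value chunks. This is a minimal pure-Python sketch of the general technique, not the authors' implementation: it tiles only over keys, the function names and chunk size are hypothetical, and a practical version would run on GPU tensors.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_full(Q, K, V):
    """Reference softmax attention that sees all keys at once."""
    out = []
    for q in Q:
        scores = [_dot(q, k) / math.sqrt(len(q)) for k in K]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def attention_chunked(Q, K, V, chunk=2):
    """Same result, but only `chunk` keys are scored at a time."""
    out = []
    for q in Q:
        running_max, denom = -math.inf, 0.0
        numer = [0.0] * len(V[0])
        for start in range(0, len(K), chunk):
            ks, vs = K[start:start + chunk], V[start:start + chunk]
            scores = [_dot(q, k) / math.sqrt(len(q)) for k in ks]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial sums to the new reference maximum.
            scale = math.exp(running_max - new_max)
            denom *= scale
            numer = [x * scale for x in numer]
            for s, v in zip(scores, vs):
                w = math.exp(s - new_max)
                denom += w
                numer = [x + w * vj for x, vj in zip(numer, v)]
            running_max = new_max
        out.append([x / denom for x in numer])
    return out


Q = [[1.0, 0.0], [0.5, -1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5], [2.0, -1.0], [0.3, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
full = attention_full(Q, K, V)
tiled = attention_chunked(Q, K, V, chunk=2)
```

The two functions agree up to floating-point rounding, so chunking changes peak memory use rather than the result, which is what makes the approach exact rather than a compression of the context.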
Selective Cascade of Residual ExtraTrees
We propose a novel tree-based ensemble method named Selective Cascade of Residual ExtraTrees (SCORE). SCORE draws inspiration from representation learning, incorporates regularized regression with variable selection features, and utilizes boosting to improve prediction and reduce generalization errors. We also develop a variable importance measure to increase the explainability of SCORE. Our computer experiments show that SCORE provides comparable or superior performance in prediction against ExtraTrees, random forest, gradient boosting machine, and neural networks; and the proposed variable importance measure for SCORE is comparable to studied benchmark methods. Finally, the predictive performance of SCORE remains stable across hyper-parameter values, suggesting potential robustness to hyperparameter specification.