AdaBoost


Isomorphic Functionalities between Ant Colony and Ensemble Learning: Part II-On the Strength of Weak Learnability and the Boosting Paradigm

Fokoué, Ernest, Babbitt, Gregory, Levental, Yuval

arXiv.org Machine Learning

In Part I of this series, we established a rigorous mathematical isomorphism between ant colony decision-making and random forest learning, demonstrating that variance reduction through decorrelation is a universal principle shared by biological and computational ensembles. Here we turn to the complementary mechanism: bias reduction through adaptive weighting. Just as boosting algorithms sequentially focus on difficult instances, ant colonies dynamically amplify successful foraging paths through pheromone-mediated recruitment. We prove that these processes are mathematically isomorphic, establishing that the fundamental theorem of weak learnability has a direct analog in colony decision-making. We develop a formal mapping between AdaBoost's adaptive reweighting and ant recruitment dynamics, show that the margin theory of boosting corresponds to the stability of quorum decisions, and demonstrate through comprehensive simulation that ant colonies implementing adaptive recruitment achieve the same bias-reduction benefits as boosting algorithms. This completes a unified theory of ensemble intelligence, revealing that both variance reduction (Part I) and bias reduction (Part II) are manifestations of the same underlying mathematical principles governing collective intelligence in biological and computational systems.
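
For readers unfamiliar with the boosting side of the mapping, the adaptive reweighting the abstract refers to is AdaBoost's weight update: instances the current ensemble misclassifies gain weight, much as successful foraging paths gain pheromone. The sketch below is a generic, minimal AdaBoost with decision stumps; the stump learner and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np

def weighted_stump(X, y, w):
    """Tiny weak learner: best single-feature threshold stump under instance weights w."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda Z: sign * np.where(Z[:, j] > thr, 1, -1)

def adaboost(X, y, rounds=10):
    """Minimal AdaBoost (labels in {-1, +1}) illustrating adaptive reweighting:
    misclassified instances gain weight, as successful paths gain pheromone."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # uniform initial instance weights
    ensemble = []
    for _ in range(rounds):
        h = weighted_stump(X, y, w)
        pred = h(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak hypothesis
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes, down-weight hits
        w /= w.sum()
        ensemble.append((alpha, h))
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))

# Illustrative use on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
clf = adaboost(X, y, rounds=20)
print((clf(X) == y).mean())   # training accuracy of the boosted ensemble
```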








A Boosting-Type Convergence Result for AdaBoost.MH with Factorized Multi-Class Classifiers

Neural Information Processing Systems

AdaBoost is a well-known boosting algorithm. Schapire and Singer proposed an extension of AdaBoost, named AdaBoost.MH, for multi-class classification problems. Kégl showed empirically that AdaBoost.MH works better when the classical one-against-all base classifiers are replaced by factorized base classifiers consisting of a binary classifier and a vote (or code) vector. However, the factorization makes it much more difficult to provide a convergence result for the factorized version of AdaBoost.MH, and Kégl therefore posed an open problem at COLT 2014 asking for such a convergence result. In this work, we resolve this open problem by presenting a convergence result for AdaBoost.MH with factorized multi-class classifiers.
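
The factorized base classifiers discussed above pair a single binary classifier phi(x) in {-1, +1} with a fixed vote (or code) vector v in {-1, +1}^K, so the K-dimensional output is h(x) = phi(x) * v. A minimal sketch, with a placeholder binary classifier rather than anything from Kégl's implementation:

```python
import numpy as np

def factorized_base_classifier(binary_classifier, vote_vector):
    """Factorized multi-class base classifier: a binary decision phi(x) in {-1, +1}
    is broadcast to all K classes with signs given by the vote (or code) vector v,
    i.e. h(x) = phi(x) * v."""
    v = np.asarray(vote_vector)
    def h(x):
        return binary_classifier(x) * v
    return h

# Illustrative use with a trivial threshold classifier and K = 3 classes.
phi = lambda x: 1 if x[0] > 0.5 else -1
h = factorized_base_classifier(phi, [+1, -1, -1])
print(h(np.array([0.7, 0.2])))   # -> [ 1 -1 -1]
```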


Tight Margin-Based Generalization Bounds for Voting Classifiers over Finite Hypothesis Sets

Larsen, Kasper Green, Schalburg, Natascha

arXiv.org Artificial Intelligence

Ensemble learning is a powerful machine learning tool: it enables us to transform weak learners, hypothesis classes that are barely better than guessing, into learners with state-of-the-art performance. In essence, ensemble methods take a set of base classifiers, weigh those classifiers according to their performance on the training set, and obtain the final prediction by aggregating according to those weights. An important historical example is AdaBoost (Freund and Schapire [1997]), a type of voting classifier, which builds the ensemble sequentially: new base classifiers are added to correct the mistakes of the current ensemble. AdaBoost was the first efficient and practical implementation of a boosting algorithm, and the relevance of ensemble learners is hence often attributed to AdaBoost. Much theoretical research has been done to explain the impressive practical performance of AdaBoost and other ensemble methods.
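
The margins such bounds are stated in terms of are, for a binary voting classifier, the normalized weighted votes y_i * f(x_i) with f(x) = sum_t alpha_t h_t(x) / sum_t alpha_t. A small illustrative computation, with hypothetical voter weights and predictions:

```python
import numpy as np

def normalized_margins(weights, predictions, y):
    """Normalized voting margins for a binary voting classifier.

    weights     : shape (T,), nonnegative classifier weights alpha_t
    predictions : shape (T, n), entries in {-1, +1}, h_t(x_i)
    y           : shape (n,), true labels in {-1, +1}

    The margin of example i is y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t,
    which lies in [-1, 1]; margin-based bounds relate the distribution of
    these margins on the training set to generalization error.
    """
    weights = np.asarray(weights, dtype=float)
    f = weights @ np.asarray(predictions) / weights.sum()   # weighted vote f(x_i)
    return np.asarray(y) * f

# Tiny hypothetical example: 3 voters, 4 training points.
alpha = [0.5, 0.3, 0.2]
H = np.array([[+1, -1, +1, +1],
              [+1, +1, -1, +1],
              [-1, -1, +1, +1]])
y = np.array([+1, -1, +1, +1])
print(normalized_margins(alpha, H, y))   # approximately [0.6, 0.4, 0.4, 1.0]
```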


Quality analysis and evaluation prediction of RAG retrieval based on machine learning algorithms

Zhang, Ruoxin, Wen, Zhizhao, Wang, Chao, Tang, Chenchen, Xu, Puyang, Jiang, Yifan

arXiv.org Artificial Intelligence

With the rapid evolution of large language models, retrieval-augmented generation (RAG) has been widely adopted because of its ability to integrate external knowledge and improve output accuracy. However, system performance depends heavily on the quality of the retrieval module: if the retrieval results have low relevance to user needs or contain noisy information, the generated content is directly distorted. To address the performance bottleneck of existing models on tabular features, this paper proposes an XGBoost machine learning regression model based on feature engineering and particle swarm optimization. Correlation analysis shows that answer_quality is positively correlated with doc_relevance (correlation 0.66), indicating that document relevance has a significant positive effect on answer quality and that improving document relevance may enhance answer quality. The correlations of semantic similarity and redundancy with diversity are strongly negative (-0.89 and -0.88, respectively), indicating a trade-off: as the former two increase, diversity decreases significantly. Experiments comparing decision trees, AdaBoost, and other baselines show that the VMD-PSO-BiLSTM model is superior on all evaluation indicators, with significantly lower MSE, RMSE, MAE, and MAPE than the comparison models and a higher R2, indicating better prediction accuracy, stability, and ability to explain the data. This work provides an effective path for optimizing retrieval quality and improving the generation quality of RAG systems, and has value for promoting the deployment and application of related technologies.
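
As an illustration of the kind of tabular regression pipeline the abstract describes, the sketch below fits an XGBoost regressor to synthetic RAG-retrieval features and reports standard error metrics. The feature names, data, and hyperparameters are placeholders; the PSO hyperparameter search and the VMD-PSO-BiLSTM comparison model from the paper are not reproduced.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic stand-in for the tabular retrieval-quality features named in the
# abstract; real data and PSO-tuned hyperparameters would replace these.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "doc_relevance":       rng.uniform(0, 1, n),
    "semantic_similarity": rng.uniform(0, 1, n),
    "redundancy":          rng.uniform(0, 1, n),
    "diversity":           rng.uniform(0, 1, n),
})
df["answer_quality"] = 0.66 * df["doc_relevance"] + 0.1 * rng.normal(size=n)

print(df.corr()["answer_quality"])   # feature/target correlations on the synthetic data

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="answer_quality"), df["answer_quality"],
    test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
print("R2 :", r2_score(y_test, pred))
```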