AITopics

2502.07971

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry:

Media (0.45)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.88)

van Arem, Koen W., Goes-Smit, Floris, Söhl, Jakob

Forecasting the future development in quality and value of professional football players for applications in team management

arXiv.org Artificial Intelligence, Feb-11-2025

Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players' historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. Using a historical data set of data-driven indicators describing player quality and the transfer value of a football player, the models are trained to forecast player quality and player value one year ahead. These two prediction problems demonstrate the efficacy of tree-based models, particularly random forest and XGBoost, in making accurate predictions. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model. Additionally, our research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. Our research provides models to help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value.
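The abstract above attributes the random forest's uncertainty estimate to its bagging procedure. A minimal sketch of that idea follows, assuming one-split regression stumps in place of full trees and a hypothetical toy data set (the function names and numbers are illustrative and do not come from the paper). Each ensemble member is fit on a bootstrap resample, and the spread of the members' predictions serves as a rough uncertainty measure.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    # Candidate thresholds exclude the largest x so the right side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Every x in this resample is identical: predict the mean everywhere.
        m = statistics.fmean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def bagged_forecast(xs, ys, x_new, n_trees=200, seed=0):
    """Return (mean prediction, std. dev. across bootstrap members)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical data: a quality indicator this year (x) and one year later (y).
ratings_now = [60, 62, 65, 70, 72, 75, 80, 82]
ratings_next = [61, 63, 66, 70, 71, 74, 78, 80]
mean, spread = bagged_forecast(ratings_now, ratings_next, x_new=68)
print(f"forecast: {mean:.2f}, ensemble spread: {spread:.2f}")
```

The spread here reflects only disagreement between bootstrap members; it is not a calibrated prediction interval, and the paper's own uncertainty quantification method may differ in its details.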

artificial intelligence, decision tree learning, machine learning, (17 more...)

2502.07528

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.66)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Football (0.94)
Leisure & Entertainment > Sports > Hockey (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.76)

Zhang, Ying, Wen, Congcong, Sornette, Didier, Zhan, Chengxiang

Integrating Artificial Intelligence and Geophysical Insights for Earthquake Forecasting: A Cross-Disciplinary Review

arXiv.org Artificial Intelligence, Feb-10-2025

Earthquake forecasting remains a significant scientific challenge, with current methods falling short of achieving the performance necessary for meaningful societal benefits. Traditional models, primarily based on past seismicity and geomechanical data, struggle to capture the complexity of seismic patterns and often overlook valuable non-seismic precursors such as geophysical, geochemical, and atmospheric anomalies. The integration of such diverse data sources into forecasting models, combined with advancements in AI technologies, offers a promising path forward. AI methods, particularly deep learning, excel at processing complex, large-scale datasets, identifying subtle patterns, and handling multidimensional relationships, making them well-suited for overcoming the limitations of conventional approaches. This review highlights the importance of combining AI with geophysical knowledge to create robust, physics-informed forecasting models. It explores current AI methods, input data types, loss functions, and practical considerations for model development, offering guidance to both geophysicists and AI researchers. While many AI-based studies oversimplify earthquake prediction, neglecting critical features such as data imbalance and spatio-temporal clustering, the integration of specialized geophysical insights into AI models can address these shortcomings. We emphasize the importance of interdisciplinary collaboration, urging geophysicists to experiment with AI architectures thoughtfully and encouraging AI experts to deepen their understanding of seismology. By bridging these disciplines, we can develop more accurate, reliable, and societally impactful earthquake forecasting tools.

data mining, machine learning, natural language, (25 more...)

2502.12161

Country:

Asia > Middle East (0.92)
North America > United States > California (0.46)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Ayad, Célia Wafa, Bonnier, Thomas, Bosch, Benjamin, Parbhoo, Sonali, Read, Jesse

Feature Importance Depends on Properties of the Data: Towards Choosing the Correct Explanations for Your Data and Decision Trees based Models

arXiv.org Artificial Intelligence, Feb-10-2025

To ensure the reliability of the explanations of machine learning models, it is crucial to establish the advantages and limits of explanation methods and the cases in which each method outperforms the others. However, the current understanding of when and how each method of explanation can be used is insufficient. To fill this gap, we perform a comprehensive empirical evaluation by synthesizing multiple datasets with the desired properties. Our main objective is to assess the quality of feature importance estimates provided by local explanation methods, which are used to explain predictions made by decision tree-based models. By analyzing the results obtained from synthetic datasets as well as publicly available binary classification datasets, we observe notable disparities in the magnitude and sign of the feature importance estimates generated by these methods. Moreover, we find that these estimates are sensitive to specific properties present in the data. Although some model hyper-parameters do not significantly influence feature importance assignment, it is important to recognize that each method of explanation has limitations in specific contexts. Our assessment highlights these limitations and provides valuable insight into the suitability and reliability of different explanatory methods in various scenarios.

artificial intelligence, decision tree learning, machine learning, (15 more...)

2502.07153

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Yang, Ivory, Ma, Weicheng, Zhang, Chunhui, Vosoughi, Soroush

Is It Navajo? Accurate Language Detection in Endangered Athabaskan Languages

arXiv.org Artificial Intelligence, Feb-10-2025

Endangered languages, such as Navajo - the most widely spoken Native American language - are significantly underrepresented in contemporary language technologies, exacerbating the challenges of their preservation and revitalization. This study evaluates Google's Language Identification (LangID) tool, which does not currently support any Native American languages. To address this, we introduce a random forest classifier trained on Navajo and twenty languages that LangID erroneously suggested in its place. Despite its simplicity, the classifier achieves near-perfect accuracy (97-100%). Additionally, the model demonstrates robustness across other Athabaskan languages - a family of Native American languages spoken primarily in Alaska, the Pacific Northwest, and parts of the Southwestern United States - suggesting its potential for broader application. Our findings underscore the pressing need for NLP systems that prioritize linguistic diversity and adaptability over centralized, one-size-fits-all solutions, especially in supporting underrepresented languages in a multicultural world. This work directly contributes to ongoing efforts to address cultural biases in language models and advocates for the development of culturally localized NLP tools that serve diverse linguistic communities.

machine learning, natural language, navajo, (19 more...)

2501.15773

Country:

North America > United States > Alaska (0.24)
Europe > Germany > Saxony > Leipzig (0.05)
North America > United States > Arizona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.35)

Balaji Lakshminarayanan, Daniel M. Roy, Yee Whye Teh

Mondrian Forests: Efficient Online Random Forests

Neural Information Processing Systems, Feb-9-2025, 20:45:06 GMT

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterpart to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve competitive predictive performance comparable with existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs accuracy tradeoff.

artificial intelligence, machine learning, random forest, (19 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Neural Information Processing Systems, Feb-9-2025, 08:37:09 GMT

Multi-Class Deep Boosting

Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multiclass classification setting expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble's mixture weights.
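The complexity term described above can be sketched in notation. The symbols below are assumptions for illustration (the abstract does not fix them): $H_1, \dots, H_p$ are the sub-families of the base classifier set, $\mathfrak{R}_m(H_k)$ is the Rademacher complexity of $H_k$ on a sample of size $m$, and $\alpha_k$ is the total mixture weight the convex ensemble places on $H_k$. Constants and the remaining terms of the bound are not given in the abstract and are omitted here.

```latex
\[
  \sum_{k=1}^{p} \alpha_k \, \mathfrak{R}_m(H_k)
  \;\le\; \max_{1 \le k \le p} \mathfrak{R}_m(H_k),
  \qquad \alpha_k \ge 0, \quad \sum_{k=1}^{p} \alpha_k = 1 .
\]
```

The inequality holds for any convex weights, which suggests why a weighted average is the more favorable term: an ensemble that puts little weight on its most complex sub-families, such as deep trees, pays correspondingly little for them.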

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)

Woonhyun Nam, Piotr Dollar, Joon Hee Han

Local Decorrelation For Improved Pedestrian Detection

Neural Information Processing Systems, Feb-9-2025, 02:11:38 GMT

Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.

artificial intelligence, detection, machine learning, (16 more...)

Country: North America > Canada > Ontario > Toronto (0.14)

Industry:

Transportation > Ground > Road (0.42)
Automobiles & Trucks (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Neural Information Processing Systems, Feb-8-2025, 15:57:30 GMT

Review for NeurIPS paper: Model Class Reliance for Random Forests

This is a relevant and timely paper that has been reviewed by four knowledgeable referees, who also thoroughly considered the authors' response to their initial reviews. Three of these reviewers recommend acceptance, providing detailed suggestions on how to improve this work before its final submission. The remaining reviewer, R3, dissented and upheld this opinion after discussion with the other referees. In my opinion, R3 correctly points out that if the proposed approach aims to improve runtime with an approximate algorithm, this must be sufficiently demonstrated in experiments against straightforward alternatives (such as retraining-based methods). That has been done neither in the original submission nor in the rebuttal.

model class reliance, neurips paper, random forest, (4 more...)

Genre: Personal > Opinion (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Neural Information Processing Systems, Feb-8-2025, 08:08:37 GMT

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Heavy review Summary: The paper concerns multi-class problems with a large number of classes. It introduces a novel label tree classifier that learns and predicts in logarithmic time in the number of classes. Theoretical guarantees in terms of a boosting-like theorem have been proven. Moreover, not only node classifiers, but also the structure of the tree is trained online. Additionally, the authors show a simple subtree swapping procedure that ensures proper balancing of the tree.
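To illustrate why a label tree predicts in time logarithmic in the number of classes, here is a minimal sketch. It assumes a balanced binary tree with fixed, hand-written threshold routers at the internal nodes; the reviewed paper instead learns both the node classifiers and the tree structure online, which this sketch does not attempt. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Node:
    label: Optional[int] = None  # set only at leaves
    router: Optional[Callable[[Sequence[float]], bool]] = None  # True -> go right
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_balanced(labels):
    """Build a balanced tree over sorted labels with placeholder routers."""
    if len(labels) == 1:
        return Node(label=labels[0])
    mid = len(labels) // 2
    threshold = labels[mid]
    return Node(
        # Placeholder router: stands in for a learned node classifier.
        router=lambda x, t=threshold: x[0] >= t,
        left=build_balanced(labels[:mid]),
        right=build_balanced(labels[mid:]),
    )


def predict(root, x):
    """Descend from root to a leaf; return (label, number of router calls)."""
    node, steps = root, 0
    while node.label is None:
        node = node.right if node.router(x) else node.left
        steps += 1
    return node.label, steps


root = build_balanced(list(range(1024)))
label, steps = predict(root, [700.0])
print(label, steps)  # 1024 classes, but only 10 router evaluations
```

A flat one-vs-all classifier would evaluate one scorer per class; here the number of router calls equals the depth of the leaf, which is log2 of the number of classes when the tree is balanced. That is why the balancing procedure mentioned in the review matters.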

author feedback and meta-review, confirmation measure, learning, (11 more...)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)