Cross-validation fold


OpenFPL: An open-source forecasting method rivaling state-of-the-art Fantasy Premier League services

Groos, Daniel

arXiv.org Artificial Intelligence

Fantasy Premier League engages the football community in selecting the Premier League players who will perform best from gameweek to gameweek. Access to accurate performance forecasts gives participants an edge over competitors by guiding expectations about player outcomes and reducing uncertainty in squad selection. However, high-accuracy forecasts are currently limited to commercial services whose inner workings are undisclosed and that rely on proprietary data. This paper aims to democratize access to highly accurate forecasts of player performance by presenting OpenFPL, an open-source Fantasy Premier League forecasting method developed exclusively from public data. Comprising position-specific ensemble models optimized on Fantasy Premier League and Understat data from four previous seasons (2020-21 to 2023-24), OpenFPL achieves accuracy comparable to a leading commercial service when tested prospectively on data from the 2024-25 season. OpenFPL also surpasses the commercial benchmark for high-return players (> 2 points), which are most influential for rank gains. These findings hold across one-, two-, and three-gameweek forecast horizons, supporting long-term planning of transfers and strategies while also informing final-day decisions.
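The position-specific ensemble idea can be illustrated with a toy sketch. Everything here is hypothetical (the base "models", features, and coefficients are invented for illustration and are not OpenFPL's actual components); the point is only that each position gets its own set of base regressors whose point forecasts are averaged.

```python
import numpy as np

def ensemble_forecast(models, features):
    """Average point forecasts from a list of fitted base models."""
    preds = np.array([m(features) for m in models])
    return preds.mean(axis=0)

# Toy base "models" for midfielders: callables mapping a per-player
# feature vector to expected fantasy points (coefficients are made up).
mid_models = [
    lambda x: 0.5 * x[0] + 2.0,   # e.g. a goals-based regressor
    lambda x: 0.3 * x[1] + 1.5,   # e.g. an xG-based regressor
]

features = np.array([4.0, 6.0])   # illustrative per-player features
forecast = ensemble_forecast(mid_models, features)
print(forecast)                   # roughly 3.65, the mean of 4.0 and 3.3
```

In practice one such ensemble would be fitted per position (GK, DEF, MID, FWD), since the point-scoring rules differ by position.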


The Harmonic Structure of Information Contours

Tsipidi, Eleftheria, Kiegeland, Samuel, Nowak, Franz, Xu, Tianyang, Wilcox, Ethan, Warstadt, Alex, Cotterell, Ryan, Giulianelli, Mario

arXiv.org Artificial Intelligence

The uniform information density (UID) hypothesis proposes that speakers aim to distribute information evenly throughout a text, balancing production effort and listener comprehension difficulty. However, language typically does not maintain a strictly uniform information rate; instead, it fluctuates around a global average. These fluctuations are often explained by factors such as syntactic constraints, stylistic choices, or audience design. In this work, we explore an alternative perspective: that these fluctuations may be influenced by an implicit linguistic pressure towards periodicity, where the information rate oscillates at regular intervals, potentially across multiple frequencies simultaneously. We apply harmonic regression and introduce a novel extension called time scaling to detect and test for such periodicity in information contours. Analyzing texts in English, Spanish, German, Dutch, Basque, and Brazilian Portuguese, we find consistent evidence of periodic patterns in information rate. Many dominant frequencies align with discourse structure, suggesting these oscillations reflect meaningful linguistic organization. Beyond highlighting the connection between information rate and discourse structure, our approach offers a general framework for uncovering structural pressures at various levels of linguistic granularity.
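The core of harmonic regression can be sketched in a few lines: regress an information contour (e.g. a per-word surprisal series) on sine/cosine pairs at a candidate frequency and check how much variance the harmonic terms explain. This is a minimal illustration on synthetic data, not the paper's actual pipeline or its time-scaling extension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
freq = 1 / 25                                # one cycle every 25 words
signal = 5 + 2 * np.sin(2 * np.pi * freq * t)
surprisal = signal + rng.normal(0, 0.5, n)   # synthetic information contour

# Design matrix: intercept + sine + cosine at the candidate frequency.
X = np.column_stack([
    np.ones(n),
    np.sin(2 * np.pi * freq * t),
    np.cos(2 * np.pi * freq * t),
])
beta, *_ = np.linalg.lstsq(X, surprisal, rcond=None)
fitted = X @ beta
r2 = 1 - ((surprisal - fitted) ** 2).sum() / ((surprisal - surprisal.mean()) ** 2).sum()
print(f"R^2 at period 25: {r2:.3f}")  # a high R^2 suggests periodicity
```

Scanning a grid of candidate frequencies and keeping those with significant fit is the basic recipe for detecting the dominant periodicities the abstract describes.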


MONSTER: Monash Scalable Time Series Evaluation Repository

Dempster, Angus, Foumani, Navid Mohammadi, Tan, Chang Wei, Miller, Lynn, Mishra, Amish, Salehi, Mahsa, Pelletier, Charlotte, Schmidt, Daniel F., Webb, Geoffrey I.

arXiv.org Artificial Intelligence

We introduce Monster--the MONash Scalable Time Series Evaluation Repository--a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequence, they favour a narrow subspace of models that are optimised to achieve low classification error on a wide variety of smaller datasets, that is, models that minimise variance and give little weight to computational issues such as scalability. Our hope is to diversify the field by introducing benchmarks using larger datasets. We believe that there is enormous potential for new progress in the field by engaging with the theoretical and practical challenges of learning effectively from larger quantities of data.


Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction

Gustafsson, Fredrik K., Rantalainen, Mattias

arXiv.org Artificial Intelligence

Prediction of mRNA gene-expression profiles directly from routine whole-slide images (WSIs) using deep learning models could potentially offer cost-effective and widely accessible molecular phenotyping. While such WSI-based gene-expression prediction models have recently emerged within computational pathology, the high-dimensional nature of the corresponding regression problem offers numerous design choices which remain to be analyzed in detail. This study provides recommendations on how deep regression models should be trained for WSI-based gene-expression prediction. For example, we conclude that training a single model to simultaneously regress all 20530 genes is a computationally efficient yet very strong baseline.
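The "single model regressing all genes simultaneously" baseline can be illustrated with multi-output ridge regression on synthetic data. This is only a conceptual sketch under assumed toy dimensions (the paper targets 20530 genes and deep WSI models, not a linear closed form): one solve covers every gene column because all genes share the same input representation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_slides, n_features, n_genes = 64, 32, 100   # toy sizes for illustration
X = rng.normal(size=(n_slides, n_features))   # e.g. pooled WSI embeddings
W_true = rng.normal(size=(n_features, n_genes))
Y = X @ W_true + 0.1 * rng.normal(size=(n_slides, n_genes))

lam = 1.0  # ridge penalty
# Closed-form multi-output ridge: a single linear solve fits all genes
# at once, instead of training one model per gene.
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)
Y_hat = X @ W
mse = float(((Y - Y_hat) ** 2).mean())
print("per-gene training MSE:", mse)
```

The computational argument carries over to deep models: a shared backbone with one output head per gene amortises the expensive feature extraction across all 20530 targets.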


The CAST package for training and assessment of spatial prediction models in R

Meyer, Hanna, Ludwig, Marvin, Milà, Carles, Linnenbrink, Jan, Schumacher, Fabian

arXiv.org Machine Learning

One key task in environmental science is to map environmental variables continuously in space, or even in space and time. Machine learning algorithms are frequently used to learn from local field observations and make spatial predictions by estimating the value of the variable of interest in places where it has not been measured. However, the application of machine learning strategies to spatial mapping involves additional challenges compared to "non-spatial" prediction tasks, challenges that often originate from spatial autocorrelation and from training data that are not independent and identically distributed. In the past few years, we have developed a number of methods to support the application of machine learning to spatial data, including suitable cross-validation strategies for performance assessment and model selection, spatial feature selection, and methods to assess the area of applicability of trained models. The intention of the CAST package is to support the application of machine learning strategies for predictive mapping by implementing such methods and making them available for easy integration into modelling workflows. Here we introduce the CAST package and its core functionalities. Using the case study of mapping plant species richness, we go through the different steps of the modelling workflow and show how CAST can be used to support more reliable spatial predictions.
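The idea behind spatial cross-validation can be sketched conceptually. CAST itself is an R package, so the Python below is only an illustration of the principle, with invented block sizes and coordinates: fold membership follows spatial blocks rather than random rows, so nearby, autocorrelated observations never end up on both sides of a train/test split.

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.uniform(0, 100, size=(500, 2))     # x/y sample locations

def spatial_block_folds(coords, block_size=25.0):
    """Assign each point to a fold based on the grid cell it falls in."""
    cells = np.floor(coords / block_size).astype(int)
    # Map each unique (row, col) grid cell to a fold id.
    _, fold_id = np.unique(cells, axis=0, return_inverse=True)
    return fold_id

folds = spatial_block_folds(coords)
for k in np.unique(folds)[:2]:                  # leave-one-block-out
    train, test = folds != k, folds == k
    print(f"fold {k}: {train.sum()} train / {test.sum()} test points")
```

Random k-fold CV on such data would place spatially adjacent points in both train and test sets, inflating the apparent accuracy; block-wise folds give a more honest estimate of performance in unsampled regions.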


Seeing Numbers: Bayesian Optimisation of a LightGBM model

#artificialintelligence

In a classic case of "be careful what you search for," reading a couple of online articles on model hyper-parameter optimisation has led to my news feed being bombarded with how-to guides guaranteeing "the most powerful model possible" "in a few easy steps." What I do notice, however, is that few articles actually mention that hyper-parameter tuning is only part of the process and is not a silver-bullet solution for predictive power. Even fewer articles mention that the gains in predictive power from hyper-parameter optimisation are modest, and likely smaller than the gains from decent feature engineering. LightGBM is a gradient boosting framework which uses tree-based learning algorithms. It is an example of an ensemble technique, combining weak individual models into a single accurate model.
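The "weak learners combined into a strong model" idea behind gradient boosting can be shown in plain NumPy. This is a heavily simplified sketch of the principle, not LightGBM's actual algorithm (no leaf-wise tree growth, histograms, or regularisation): each round fits a one-split "stump" to the current residuals, and the ensemble prediction is the learning-rate-weighted sum of all stumps.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

def fit_stump(x, residual):
    """Best single-threshold split minimising squared error."""
    best = None
    for thr in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = residual[x <= thr].mean(), residual[x > thr].mean()
        err = ((residual - np.where(x <= thr, left, right)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, left, right)
    _, thr, left, right = best
    return lambda z: np.where(z <= thr, left, right)

pred, lr, stumps = np.zeros_like(y), 0.3, []
for _ in range(50):                    # boosting rounds
    stump = fit_stump(x, y - pred)     # fit the residuals (MSE gradient)
    stumps.append(stump)
    pred += lr * stump(x)

mse = float(((y - pred) ** 2).mean())
print("train MSE after 50 rounds:", mse)
```

Note that the learning rate and number of rounds here are exactly the kind of hyper-parameters the article says tuning can refine, but no amount of tuning substitutes for informative input features.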