Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences
Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availability of large-scale viral sequence data presents significant opportunities for computational analysis; however, existing approaches face notable limitations. Phylogenetic tree-based methods are computationally intensive and do not scale efficiently to today's multi-million-sequence datasets. Similarly, current embedding-based techniques often rely on aligned sequences or exhibit suboptimal predictive performance and high runtime costs, creating barriers to practical large-scale analysis. In this study, we focus on the most prevalent SARS-CoV-2 lineages associated with the spike protein region and introduce a scalable embedding method that leverages hashing to generate compact, low-dimensional representations of spike sequences. These embeddings are subsequently used to train a variety of machine learning models for supervised lineage classification. We conduct an extensive evaluation comparing our approach with multiple baseline and state-of-the-art biological sequence embedding methods across diverse metrics. Our results demonstrate that the proposed embeddings offer substantial improvements in efficiency, achieving up to 86.4\% classification accuracy while reducing embedding generation time by as much as 99.81\%. This highlights the method's potential as a fast, effective, and scalable solution for large-scale viral sequence analysis.
- Asia > China > Guangdong Province (0.14)
- North America > United States > California > Alameda County > Fremont (0.04)
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
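The Murmur2Vec abstract above describes hashing k-mers of spike sequences into compact fixed-size vectors. A minimal sketch of that idea follows; it uses `zlib.crc32` as a stand-in for MurmurHash2 (which is not in the Python standard library), and the k-mer size, dimension, and normalization are illustrative assumptions, not the paper's exact parameters.

```python
import zlib

def hashed_kmer_embedding(seq, k=3, dim=64):
    """Feature-hashing sketch: slide a window of size k over the
    sequence and hash each k-mer into one of `dim` buckets.
    zlib.crc32 stands in for MurmurHash2 here."""
    vec = [0.0] * dim
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        h = zlib.crc32(kmer.encode())
        vec[h % dim] += 1.0          # count k-mers per bucket
    total = sum(vec) or 1.0
    return [v / total for v in vec]  # normalize to bucket frequencies

# toy spike-protein prefix, embedded into 16 dimensions
emb = hashed_kmer_embedding("MFVFLVLLPLVSSQCVNLT", k=3, dim=16)
```

Because the output size is fixed regardless of sequence length and no alignment is needed, embedding generation is a single linear pass per sequence, which is where the reported runtime savings come from.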
Benchmark Datasets for Lead-Lag Forecasting on Social Platforms
Kazemian, Kimia, Liu, Zhenzhen, Yang, Yangfanyu, Luo, Katie Z, Gu, Shuhan, Du, Audrey, Yang, Xinyu, Jansons, Jack, Weinberger, Kilian Q, Thickstun, John, Yin, Yian, Dean, Sarah
Social and collaborative platforms emit multivariate time-series traces in which early interactions--such as views, likes, or downloads--are followed, sometimes months or years later, by higher-impact outcomes like citations, sales, or reviews. We formalize this setting as Lead-Lag Forecasting (LLF): given an early usage channel (the lead), predict a correlated but temporally shifted outcome channel (the lag). Despite the ubiquity of such patterns, LLF has not been treated as a unified forecasting problem within the time-series community, largely due to the absence of standardized datasets. To anchor research in LLF, here we present two high-volume benchmark datasets--arXiv (accesses → citations of 2.3M papers) and GitHub (pushes/stars → forks of 3M repositories)--and outline additional domains with analogous lead-lag dynamics, including Wikipedia (page-views → edits), Spotify (streams → concert attendance), e-commerce (click-throughs → purchases), and LinkedIn (profile views → messages). Our datasets provide ideal testbeds for lead-lag forecasting, by capturing long-horizon dynamics across years, spanning the full spectrum of outcomes, and avoiding survivorship bias in sampling. We documented all technical details of data curation and cleaning, verified the presence of lead-lag dynamics through statistical and classification tests, and benchmarked parametric and non-parametric baselines for regression. Our study establishes LLF as a novel forecasting paradigm and lays an empirical foundation for its systematic exploration in social and usage data. The success of human activities is often measured by their collective impact, ranging from music streams and movie box office revenues to product sales and social media popularity.
These impact metrics typically follow heavy-tailed distributions (Clauset et al., 2009) and slow decay patterns across timescales (Candia et al., 2019), making early identification of future hits fundamentally challenging (Cheng et al., 2014; Martin et al., 2016). At the same time, digital platforms increasingly log online user interactions--searches, views, downloads, likes, and shares--that often precede these long-term dynamics. These temporal lead-lag dynamics are remarkably ubiquitous, spanning domains as diverse as science (Haque & Ginsparg, 2009), economics (Wu & Brynjolfsson, 2015), arts (Goel et al., 2010), culture (Gruhl et al., 2005), and social movements (Johnson et al., 2016). A systematic understanding of such lead-lag dynamics is not only crucial for anticipating and optimizing impact in digital ecosystems, but also essential for designing effective strategies that identify and promote emerging innovations and products.
- North America > United States > California (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Information Technology (1.00)
- Government > Regional Government (0.68)
- Energy > Power Industry (0.68)
- (2 more...)
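The LLF formalization above asks for a lag outcome predicted from an early lead channel. The simplest parametric baseline of the kind the paper benchmarks can be sketched as a one-parameter least-squares fit; the numbers below are illustrative, not taken from the arXiv or GitHub benchmarks.

```python
def fit_lead_lag(pairs):
    """Minimal LLF baseline: predict the lag outcome as a scaled
    version of early lead activity, y ~ b * x, via least squares.
    `pairs` holds (lead_early_total, lag_final_total) per item,
    e.g. (first-year accesses, citations years later)."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx if sxx else 0.0

# hypothetical (accesses, citations) pairs for four papers
train = [(120, 10), (300, 28), (50, 3), (900, 95)]
b = fit_lead_lag(train)
predict = lambda lead: b * lead
```

Richer baselines would condition on the full lead trajectory rather than a single total, but even this scalar fit exposes whether a lead-lag correlation exists at all.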
The Road Less Scheduled
Defazio, Aaron, Yang, Xingyu (Alice)
Fundamental AI Research Team, Meta
Recently, Zamani and Glineur (2023) and Defazio et al. (2023) analyzed the exact worst-case convergence rate in this setting. Our approach uses an alternative form of momentum that replaces traditional momentum. From this viewpoint, the Schedule-Free updates can be seen as a version of momentum that has the same immediate effect, but with a greater delay before the remainder of the gradient is added in.
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.67)
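The passage above describes Schedule-Free updates as momentum with the same immediate effect but a delayed contribution of the rest of the gradient. A sketch of that update rule, as commonly stated for Schedule-Free SGD (interpolation point y, fast iterate z, equal-weight average x), is below; the learning rate, beta, and toy objective are illustrative assumptions.

```python
def schedule_free_sgd(grad, w0, lr=0.1, beta=0.9, steps=500):
    """Schedule-Free SGD sketch: gradients are evaluated at an
    interpolation y of the fast iterate z and the running average x,
    so part of each gradient enters x only with a delay."""
    z = x = w0
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient-evaluation point
        z = z - lr * grad(y)            # fast, SGD-like iterate
        c = 1.0 / t                     # equal-weight averaging
        x = (1 - c) * x + c * z         # x = running mean of z's
    return x                            # x is used at evaluation time

# toy quadratic f(w) = (w - 3)^2, minimized at w = 3
w = schedule_free_sgd(lambda w: 2 * (w - 3), w0=0.0)
```

Note that no learning-rate schedule appears anywhere in the loop: the averaging weight c = 1/t plays the role a decaying schedule would otherwise play.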
90fd4f88f588ae64038134f1eeaa023f-AuthorFeedback.pdf
Thank you for all the helpful comments. Several related works were raised by the reviewers, which we discuss here. We note that the authors have marked their arXiv submission as containing errors. Each of their inner loops uses SGD to solve the distance-regularized objectives. First, we use the EMA of slow weights to adjust the training parameters during optimization.
A. Algorithms. Algorithm 1: Training DHRL. 1: sample D
Within T time-steps, this upper bound on the error rate is also satisfied on all paths from s to g. As shown in the table above, the wider the initial distribution, the easier it is for the agent to explore the map; a 'fixed initial state distribution' requires less prior information about the state space. Figure 12: Changes in the graph level over training; DHRL can explore long tasks with a 'fixed initial state distribution'. The results are averaged over 4 random seeds and smoothed equally.
Categorical Classification of Book Summaries Using Word Embedding Techniques
Keskin, Kerem, Keleş, Mümine Kaya
In this study, book summaries and categories collected from book websites were classified using word embedding methods, natural language processing techniques, and machine learning algorithms. One-hot encoding, Word2Vec, and Term Frequency-Inverse Document Frequency (TF-IDF), which are frequently used word embedding methods, were applied and their success was compared. Additionally, a table showing the combinations of pre-processing methods used is provided. Looking at the results, it was observed that the Support Vector Machine, Naive Bayes, and Logistic Regression models, together with the TF-IDF and one-hot encoding techniques, gave more successful results for Turkish texts. The study also illustrates using Word2Vec to process large text data.
- Europe > Kosovo > District of Pristina > Pristina (0.06)
- Asia > Middle East > Republic of Türkiye > Adana Province > Adana (0.04)
- Asia > Afghanistan > Kabul Province > Kabul (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
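TF-IDF, one of the weighting schemes the study compares, scales a term's in-document frequency by its inverse document frequency so that words common to every summary are down-weighted. A toy computation follows; real experiments would use a library implementation (e.g. scikit-learn's `TfidfVectorizer`), and the example documents are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Toy TF-IDF: tf(t, d) * (log(N / df(t)) + 1), with whitespace
    tokenization. Returns one sparse dict of weights per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

vecs = tfidf_vectors(["a gripping mystery novel",
                      "a practical cooking guide",
                      "a mystery set in istanbul"])
```

In the first summary, "gripping" (unique to one document) outweighs "mystery" (shared by two), which in turn outweighs "a" (shared by all) -- exactly the discrimination property that makes TF-IDF effective for category classification.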
Underrepresentation, Label Bias, and Proxies: Towards Data Bias Profiles for the EU AI Act and Beyond
Ceccon, Marina, Cornacchia, Giandomenico, Pezze, Davide Dalle, Fabris, Alessandro, Susto, Gian Antonio
Undesirable biases encoded in the data are key drivers of algorithmic discrimination. Their importance is widely recognized in the algorithmic fairness literature, as well as legislation and standards on anti-discrimination in AI. Despite this recognition, data biases remain understudied, hindering the development of computational best practices for their detection and mitigation. In this work, we present three common data biases and study their individual and joint effect on algorithmic discrimination across a variety of datasets, models, and fairness measures. We find that underrepresentation of vulnerable populations in training sets is less conducive to discrimination than conventionally affirmed, while combinations of proxies and label bias can be far more critical. Consequently, we develop dedicated mechanisms to detect specific types of bias, and combine them into a preliminary construct we refer to as the Data Bias Profile (DBP). This initial formulation serves as a proof of concept for how different bias signals can be systematically documented. Through a case study with popular fairness datasets, we demonstrate the effectiveness of the DBP in predicting the risk of discriminatory outcomes and the utility of fairness-enhancing interventions. Overall, this article bridges algorithmic fairness research and anti-discrimination policy through a data-centric lens.
- Europe > Austria > Vienna (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (19 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- (5 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.92)
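The abstract above combines bias signals into a Data Bias Profile. Two of the simplest signals it names, underrepresentation and label bias, can be sketched as per-group statistics; this is an illustrative reduction under assumed dictionary-shaped rows, not the paper's DBP construction.

```python
from collections import defaultdict

def bias_profile(rows, group_key, label_key):
    """Record two per-group signals: representation (share of the
    dataset) and positive-label rate. A small representation flags
    underrepresentation; large gaps in positive rate between groups
    can indicate label bias (or genuine base-rate differences)."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for r in rows:
        g = r[group_key]
        counts[g] += 1
        positives[g] += 1 if r[label_key] else 0
    n = len(rows)
    return {g: {"representation": counts[g] / n,
                "positive_rate": positives[g] / counts[g]}
            for g in counts}

# hypothetical toy dataset with a majority group A and minority group B
data = [{"group": "A", "y": 1}, {"group": "A", "y": 1},
        {"group": "A", "y": 0}, {"group": "B", "y": 0}]
profile = bias_profile(data, "group", "y")
```

As the paper stresses, neither signal alone predicts discrimination well; it is their combination (plus proxy detection) that the DBP documents.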
Explaining Concept Shift with Interpretable Feature Attribution
Lyu, Ruiqi, Turcan, Alistair, Wilder, Bryan
Regardless of the amount of data a machine learning (ML) model is trained on, there will inevitably be data that differs from its training set, lowering model performance. Concept shift occurs when the distribution of labels conditioned on the features changes, causing even a well-tuned ML model to have learned a fundamentally incorrect representation. Identifying these shifted features provides unique insight into how one dataset differs from another, since the difference may lie along a scientifically relevant dimension, such as time, disease status, or population. In this paper, we propose SGShift, a model for detecting concept shift in tabular data and attributing reduced model performance to a sparse set of shifted features. SGShift models concept shift with a Generalized Additive Model (GAM) and performs subsequent feature selection to identify shifted features. We propose further extensions of SGShift that incorporate knockoffs to control false discoveries and an absorption term to account for models with poor fit to the data. We conduct extensive experiments on synthetic and real data across various ML models and find SGShift can identify shifted features with AUC $>0.9$ and recall $>90\%$, often 2 or 3 times as high as baseline methods.
- North America > United States > California (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
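The core intuition behind attributing concept shift to specific features can be sketched very simply: if the conditional distribution of labels has shifted along feature j, the source model's residuals on target data remain predictable from feature j. The helper below is an illustrative reduction of that idea (a covariance screen), not SGShift's GAM-based estimator; the model and data are invented.

```python
def shifted_feature_scores(model, target_X, target_y):
    """Score each feature by |covariance| between that feature and
    the source model's residuals on target data. Under concept
    shift, shifted features keep explaining the residuals."""
    preds = [model(x) for x in target_X]
    resid = [y - p for y, p in zip(target_y, preds)]
    d = len(target_X[0])
    scores = []
    for j in range(d):
        col = [x[j] for x in target_X]
        mc = sum(col) / len(col)
        mr = sum(resid) / len(resid)
        cov = sum((c - mc) * (r - mr)
                  for c, r in zip(col, resid)) / len(col)
        scores.append(abs(cov))
    return scores

# toy shift: source model learned y = x0, target truth is y = x0 + 2*x1
model = lambda x: x[0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [x[0] + 2 * x[1] for x in X]
scores = shifted_feature_scores(model, X, y)
```

SGShift replaces this screen with a sparse GAM fit to the residual structure, adding knockoffs to control the false discovery rate among selected features.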
Sepsyn-OLCP: An Online Learning-based Framework for Early Sepsis Prediction with Uncertainty Quantification using Conformal Prediction
Zhou, Anni, Beyah, Raheem, Kamaleswaran, Rishikesan, Xie, Yao
Sepsis is a life-threatening syndrome with high morbidity and mortality in hospitals. Early prediction of sepsis plays a crucial role in facilitating early interventions for septic patients. However, early sepsis prediction systems with uncertainty quantification and adaptive learning are scarce. This paper proposes Sepsyn-OLCP, a novel online learning algorithm for early sepsis prediction that integrates conformal prediction for uncertainty quantification and Bayesian bandits for adaptive decision-making. By combining the robustness of Bayesian models with the statistical uncertainty guarantees of conformal prediction methodologies, this algorithm delivers accurate and trustworthy predictions, addressing the critical need for reliable and adaptive systems in high-stakes healthcare applications such as early sepsis prediction. We evaluate the performance of Sepsyn-OLCP in terms of regret in a stochastic bandit setting, the area under the receiver operating characteristic curve (AUROC), and F-measure. Our results show that Sepsyn-OLCP outperforms existing individual models, increasing the AUROC of a neural network from 0.64 to 0.73 without retraining or high computational costs, and that the model-selection policy converges to the optimal strategy in the long run. In summary, we propose a novel reinforcement learning-based framework integrated with conformal prediction techniques to provide uncertainty quantification for early sepsis prediction, a critical need in high-stakes healthcare applications.
- North America > United States (0.46)
- Europe > Iceland > Capital Region > Reykjavik (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
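The conformal-prediction ingredient the abstract describes can be illustrated with a standard split-conformal sketch for binary risk scores; this is a simplified textbook version, not Sepsyn-OLCP itself, and the calibration scores below are invented.

```python
import math

def conformal_threshold(cal_scores, cal_labels, alpha=0.1):
    """Split conformal calibration: pick a nonconformity threshold q
    so that, with probability >= 1 - alpha, a new patient's true
    label has nonconformity <= q. Nonconformity here is 1 - score
    for the positive class and the raw score for the negative."""
    nonconf = sorted(1 - s if y == 1 else s
                     for s, y in zip(cal_scores, cal_labels))
    n = len(nonconf)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    return nonconf[k]

def prediction_set(score, q):
    """Conformal prediction set for a new risk score: each label is
    kept if its nonconformity is within the calibrated threshold."""
    labels = []
    if 1 - score <= q: labels.append(1)  # sepsis plausible
    if score <= q:     labels.append(0)  # no-sepsis plausible
    return labels

cal_scores = [0.9, 0.8, 0.95, 0.1, 0.2, 0.15, 0.85, 0.05]
cal_labels = [1, 1, 1, 0, 0, 0, 1, 0]
q = conformal_threshold(cal_scores, cal_labels, alpha=0.1)
```

Ambiguous scores yield larger (or empty) sets, which is precisely the uncertainty signal an adaptive clinical system can act on.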
Evaluating link prediction: New perspectives and recommendations
Kalyani, Bhargavi I, Mathi, A Rama Prasad, Sett, Niladri
Link prediction (LP) is an important problem in network science and machine learning research. The state-of-the-art LP methods are usually evaluated in a uniform setup, ignoring several factors associated with the data and application-specific needs. We identify a number of such factors, such as network type, problem type, geodesic distance between the end nodes and its distribution over the classes, the nature and applicability of LP methods, class imbalance and its impact on early retrieval, and the choice of evaluation metric, and present an experimental setup which allows us to evaluate LP methods in a rigorous and controlled manner. We perform extensive experiments with a variety of LP methods over real network datasets in this controlled setup, and gather valuable insights on the interactions of these factors with the performance of LP through an array of carefully designed hypotheses. Following the insights, we provide recommendations to be followed as best practice for evaluating LP methods.
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > Greece > Attica > Athens (0.04)
- Asia > India > Telangana (0.04)
- Asia > India > Andhra Pradesh (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (0.70)
- Energy > Power Industry (0.68)
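One of the factors the LP evaluation paper highlights is class imbalance and its impact on early retrieval: with vastly more non-edges than edges, ranking-head metrics matter. A minimal precision-at-k helper illustrates the kind of metric involved (an illustrative helper under toy data, not the paper's exact protocol).

```python
def precision_at_k(scored_pairs, true_edges, k):
    """Early-retrieval metric: of the top-k node pairs ranked by an
    LP method's score, what fraction are actual edges?"""
    top = sorted(scored_pairs, key=lambda p: p[1], reverse=True)[:k]
    hits = sum(1 for pair, _ in top if pair in true_edges)
    return hits / k

# hypothetical LP scores over four candidate pairs
scored = [(("a", "b"), 0.9), (("a", "c"), 0.7),
          (("b", "d"), 0.4), (("c", "d"), 0.2)]
truth = {("a", "b"), ("c", "d")}
p_at_2 = precision_at_k(scored, truth, 2)
```

Unlike AUC, which averages over all pairs, precision@k is dominated by the extreme head of the ranking and so responds very differently to class imbalance -- the interaction the paper's controlled setup is designed to expose.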