
Measuring all the noises of LLM Evals

Wang, Sida

arXiv.org Machine Learning

Separating signal from noise is central to experimental science. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteristics. We clearly define and measure three types of noise: prediction noise from generating different answers to a given question, data noise from sampling questions, and their combined total noise following the law of total variance. To emphasize relative comparisons and gain statistical power, we propose the all-pairs paired method, which applies paired analysis to all pairs of LLMs and measures all the noise components based on millions of question-level predictions across many evals and settings. These measurements reveal clear patterns. First, each eval exhibits a characteristic and highly predictable total noise level across all model pairs. Second, paired prediction noise typically exceeds paired data noise, which means reducing prediction noise by averaging can significantly increase statistical power. These findings enable practitioners to assess significance without custom testing and to detect much smaller effects in controlled experiments.
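The noise decomposition the abstract describes can be illustrated with a short simulation. The setup below (Bernoulli per-question scores, 16 sampled generations per question) is a hypothetical sketch of the law-of-total-variance split, not the paper's actual measurement pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical eval: a latent per-question accuracy (the data-noise source)
# and several sampled generations per question (the prediction-noise source).
n_questions, n_samples = 200, 16
p = rng.beta(2, 2, size=n_questions)                    # latent accuracy per question
scores = rng.binomial(1, p[:, None], size=(n_questions, n_samples))

per_q_mean = scores.mean(axis=1)                        # E[score | question]
per_q_var = scores.var(axis=1)                          # Var[score | question]

prediction_noise = per_q_var.mean()                     # E[ Var(score | question) ]
data_noise = per_q_mean.var()                           # Var( E[score | question] )
total_noise = prediction_noise + data_noise             # law of total variance

# Sanity check: the decomposition matches the pooled variance exactly.
assert np.isclose(total_noise, scores.var())
```

With equal sample counts per question and population variances (ddof=0), the identity holds exactly, which is why the final assertion passes without tolerance tricks.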


Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

Li, Jiping, Sonthalia, Rishi

arXiv.org Machine Learning

This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression using spiked covariance data models. The paper characterizes how varying spike strengths and target-spike alignments can affect risk, especially in overparameterized settings. The study presents an exact expression for the generalization error, leading to a comprehensive classification of benign, tempered, and catastrophic overfitting regimes based on spike strength, the aspect ratio $c=d/n$ (particularly as $c \to \infty$), and target alignment. Notably, in well-specified aligned problems, increasing spike strength can surprisingly induce catastrophic overfitting before achieving benign overfitting. The paper also reveals that target-spike alignment is not always advantageous, identifying specific, sometimes counterintuitive, conditions for its benefit or detriment. Alignment with the spike being detrimental is empirically demonstrated to persist in nonlinear models.
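To make the setting concrete, here is a minimal numerical sketch of a minimum-norm interpolating solution on spiked covariance data; the dimensions, spike strength, target alignment, and noise level are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 400                            # overparameterized: aspect ratio c = d/n = 4
spike_strength = 25.0

# Spiked covariance: identity plus one strong direction u (here, the first axis).
u = np.zeros(d)
u[0] = 1.0
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(1.0 + spike_strength)   # inflate variance along the spike

# Target aligned with the spike (alignment is a key knob in the paper).
w_star = u
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum-norm interpolating solution: w = X^+ y.
w_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ w_hat, y)           # interpolates the training data

# Excess risk under the spiked covariance Sigma = I + spike * u u^T:
# (w_hat - w_star)^T Sigma (w_hat - w_star).
diff = w_hat - w_star
risk = diff @ diff + spike_strength * diff[0] ** 2
print(f"excess risk: {risk:.3f}")
```

Sweeping `spike_strength` and the alignment of `w_star` with `u` in a loop reproduces, qualitatively, the kind of regime transitions the paper classifies analytically.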


MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding

Guo, Hanghui, Di, Shimin, De Meo, Pasquale, Chen, Zhangze, Zhu, Jia

arXiv.org Artificial Intelligence

As a critical task in data quality control, claim verification aims to curb the spread of misinformation by assessing the truthfulness of claims against a wide range of evidence. However, traditional methods often overlook the complex interactions between pieces of evidence, leading to unreliable verification results. A straightforward solution represents the claim and evidence as a fully connected graph, which we define as the Claim-Evidence Graph (C-E Graph). Nevertheless, claim verification methods based on fully connected graphs face two primary confounding challenges: Data Noise and Data Biases. To address these challenges, we propose a novel framework, Multi-Path Causal Optimization (MuPlon). In the front-door path, MuPlon extracts highly relevant subgraphs and constructs reasoning paths, further applying counterfactual reasoning to eliminate data biases within these paths. Experimental results demonstrate that MuPlon outperforms existing methods and achieves state-of-the-art performance.
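As a rough illustration of the fully connected C-E Graph construction described above (the node naming and data layout are our own assumptions, not MuPlon's implementation):

```python
from itertools import combinations

def build_ce_graph(claim, evidence):
    """Fully connected Claim-Evidence (C-E) graph: the claim and every
    evidence sentence become nodes, with an edge between every pair."""
    nodes = [("claim", claim)] + [(f"evidence_{i}", e) for i, e in enumerate(evidence)]
    ids = [nid for nid, _ in nodes]
    edges = list(combinations(ids, 2))     # complete graph over all nodes
    return dict(nodes), edges

nodes, edges = build_ce_graph(
    "X cures Y",
    ["study A ...", "report B ...", "blog C ..."],
)
assert len(edges) == 4 * 3 // 2            # complete graph on 4 nodes has 6 edges
```

The confounding challenges the abstract names arise precisely because every such pair gets an edge, relevant or not; subgraph extraction prunes this structure.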


Method                     MAE(R)  R2(R)   MAE(t)  R2(t)
Random sampling            1.689   0.927   0.011   0.997
Closeness to other points  2.109   0.861   0.013   0.995

Neural Information Processing Systems

We thank the reviewers for taking the time to consider our NeurIPS submission. Table 2 shows that PRNet consistently outperforms PointNetLK in all settings. PRNet is on a par with PointNetLK while being slower than DCP. We will add "Deep Part Induction from Articulated Object Pairs" to the related works and discuss it. We believe these comments will help to make the work stronger.


From Data to Action: Charting A Data-Driven Path to Combat Antimicrobial Resistance

Fu, Qian, Zhang, Yuzhe, Shu, Yanfeng, Ding, Ming, Yao, Lina, Wang, Chen

arXiv.org Artificial Intelligence

Antibiotics are often grouped by their mechanisms of action, such as blocking protein synthesis, disrupting folate biosynthesis, changing cell wall construction, compromising cell membrane integrity, and affecting DNA replication [93, 25]. These antibiotics, whether created in labs or found in nature, serve as the primary defence against bacterial infections. However, bacteria employ a range of strategies to resist these antibiotics, including inactivating them through enzymatic degradation, altering the antibiotic target, modifying cell membrane permeability, and using efflux pumps to keep intracellular antibiotic concentrations below inhibitory levels [25]. Moreover, gene transfer among antibiotic-resistant bacteria (ARB) further aggravates this challenge [92].


ROSS:RObust decentralized Stochastic learning based on Shapley values

Wang, Lina, Yuan, Yunsheng, Li, Feng, Duan, Lingjie

arXiv.org Artificial Intelligence

In the paradigm of decentralized learning, a group of agents collaborates to learn a global model over a distributed dataset without a central server; nevertheless, this paradigm is severely challenged by the heterogeneity of the data distribution across the agents. For example, the data may be distributed non-independently and non-identically, and may even be noisy or poisoned. To address these data challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to the datasets of its neighbors, to update its local model in a momentum-like manner, and we innovate by weighting the derivatives according to their contributions as measured by Shapley values. We provide a thorough theoretical analysis revealing the linear convergence speedup of the ROSS algorithm, and we verify its efficacy through extensive experiments on public datasets. Our results demonstrate that, in the face of the above variety of data challenges, ROSS has clear advantages over existing state-of-the-art proposals in terms of both convergence and prediction accuracy.
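The Shapley-weighting idea can be sketched with an exact Shapley computation over a small neighbor set. The toy utility below (alignment of the averaged gradients with a reference direction) is our own stand-in, not the contribution measure ROSS actually uses:

```python
import itertools
import math
import numpy as np

def shapley_weights(grads, utility):
    """Exact Shapley value of each gradient's contribution to `utility`,
    enumerated over all coalitions (tractable only for small neighbor counts)."""
    n = len(grads)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (utility(list(S) + [i]) - utility(list(S)))
    return phi

# Toy utility: alignment of the averaged coalition gradient with a reference
# direction, so gradients pointing the "right" way earn larger Shapley values.
reference = np.array([1.0, 0.0])
grads = [np.array([0.9, 0.1]),
         np.array([1.0, -0.1]),
         np.array([-1.0, 0.5])]           # last gradient plays a "poisoned" neighbor

def utility(idx):
    if not idx:
        return 0.0
    return float(np.mean([grads[j] for j in idx], axis=0) @ reference)

phi = shapley_weights(grads, utility)
assert phi.argmin() == 2                   # the poisoned gradient gets the smallest weight
```

Down-weighting the derivative with the lowest Shapley value is the robustness mechanism in miniature: a poisoned neighbor's cross-gradient contributes little to the momentum-like update.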


How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation

Meng, Yan, Wu, Di, Monz, Christof

arXiv.org Artificial Intelligence

The massive amounts of web-mined parallel data contain large amounts of noise. Semantic misalignment, as the primary source of this noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world, hard-to-detect misalignment noise by proposing a process that simulates realistic misalignment controlled by semantic similarity. After quantitatively analyzing the impact of simulated misalignment on machine translation, we show the limited effectiveness of widely used pre-filters in improving translation performance, underscoring the necessity of more fine-grained ways to handle data noise. Observing the increasing reliability of the model's self-knowledge for distinguishing misaligned from clean data at the token level, we propose a self-correction approach that leverages the model's prediction distribution to revise the training supervision from the ground-truth data over the course of training. Through comprehensive experiments, we show that our self-correction method not only improves translation performance in the presence of simulated misalignment noise but also proves effective for real-world noisy web-mined datasets across eight translation tasks.
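A minimal sketch of such token-level self-correction might blend the one-hot gold labels with the model's own prediction distribution, with trust in the model ramping up over training. The linear schedule and 0.5 cap below are assumptions for illustration, not necessarily the paper's formulation:

```python
import numpy as np

def self_corrected_targets(model_probs, gold_ids, step, total_steps, max_trust=0.5):
    """Blend one-hot gold labels with the model's own token distribution.
    Trust in the model's self-knowledge grows linearly over training
    (an illustrative schedule; the paper's exact schedule may differ)."""
    vocab = model_probs.shape[-1]
    one_hot = np.eye(vocab)[gold_ids]
    alpha = max_trust * step / total_steps          # ramps 0 -> max_trust
    return (1 - alpha) * one_hot + alpha * model_probs

# Toy example: vocabulary of 4, two target tokens, mid-training (alpha = 0.25).
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.05, 0.05, 0.80, 0.10]])
targets = self_corrected_targets(probs, gold_ids=np.array([0, 1]),
                                 step=500, total_steps=1000)
assert np.allclose(targets.sum(axis=-1), 1.0)       # still valid distributions
```

Training then minimizes cross-entropy against `targets` instead of the raw one-hot labels, so tokens the model confidently disputes (likely misaligned supervision) are revised toward its own distribution.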


Denoising-Aware Contrastive Learning for Noisy Time Series

Zhou, Shuang, Zha, Daochen, Shen, Xiao, Huang, Xiao, Zhang, Rui, Chung, Fu-Lai

arXiv.org Artificial Intelligence

Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods before model training. However, this pre-processing approach may not fully eliminate the effect of noise in SSL for two reasons: (i) the diverse types of noise in time series make it difficult to automatically determine suitable denoising methods; (ii) noise can be amplified after mapping raw data into latent space. In this paper, we propose denoising-aware contrastive learning (DECL), which uses contrastive learning objectives to mitigate the noise in the representation and automatically selects suitable denoising methods for every sample. Extensive experiments on various datasets verify the effectiveness of our method. The code is open-sourced.
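For context, a generic contrastive objective over noisy and denoised views can be sketched as follows; this is plain InfoNCE on toy embeddings, not DECL's actual objective or its denoiser-selection mechanism:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Minimal InfoNCE loss on L2-normalized embeddings. Each anchor's
    positive is its paired (e.g., denoised) view; the other samples in
    the batch serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # match anchor i with positive i

rng = np.random.default_rng(3)
series_emb = rng.standard_normal((8, 16))
denoised_emb = series_emb + 0.05 * rng.standard_normal((8, 16))  # views agree
loss_aligned = info_nce(series_emb, denoised_emb)
loss_random = info_nce(series_emb, rng.standard_normal((8, 16)))
assert loss_aligned < loss_random
```

The gap between the two losses is the signal such an objective optimizes: representations where a sample agrees with its denoised view score low, while noise-dominated representations score near chance.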


The Observer-Observation Dilemma in Neuro-Forecasting

Neural Information Processing Systems

Human beings believe that they are able to solve a psychological version of the Observer-Observation Dilemma. On the one hand, they use their observations to constitute an understanding of the laws of the world; on the other hand, they use this understanding to evaluate the correctness of the incoming pieces of information. Of course, as everybody knows, human beings are not free from making mistakes in this psychological dilemma. We encounter a similar situation when we try to build a mathematical model using data. Learning relationships from the data is only one part of the model building process.


Three Ways AIOps can Strengthen Enterprise Security

#artificialintelligence

The rapid surge in the number of cyber-attacks has exposed the vulnerabilities present in organizations' infrastructure. One way for organizations to deal with these attacks is to incorporate AIOps, which provides better visibility into performance and system data at scale. Accelerating digital transformation initiatives enabled organizations to keep their operations alive, but it came at the cost of ignoring vulnerabilities in their infrastructure, allowing threat actors to capitalize on the opportunity and execute their malicious intent. Additionally, a security incident cost on average USD 4.24 million in 2021, a 10% increase from 2020, according to IBM's "Cost of a Data Breach Report 2021".