
Measuring all the noises of LLM Evals

Wang, Sida

arXiv.org Machine Learning

Separating signal from noise is central to experimental science. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteristics. We clearly define and measure three types of noise: prediction noise from generating different answers to a given question, data noise from sampling questions, and their combined total noise following the law of total variance. To emphasize relative comparisons and gain statistical power, we propose the all-pairs paired method, which applies paired analysis to all pairs of LLMs and measures all the noise components based on millions of question-level predictions across many evals and settings. These measurements reveal clear patterns. First, each eval exhibits a characteristic and highly predictable total noise level across all model pairs. Second, paired prediction noise typically exceeds paired data noise, which means reducing prediction noise by averaging can significantly increase statistical power. These findings enable practitioners to assess significance without custom testing and to detect much smaller effects in controlled experiments.
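The noise decomposition the abstract describes can be illustrated with a short simulation. The setup below (Bernoulli per-question scores, 16 sampled generations per question) is a hypothetical sketch of the law-of-total-variance split, not the paper's actual measurement pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical eval: a latent per-question accuracy (the data-noise source)
# and several sampled generations per question (the prediction-noise source).
n_questions, n_samples = 200, 16
p = rng.beta(2, 2, size=n_questions)                    # latent accuracy per question
scores = rng.binomial(1, p[:, None], size=(n_questions, n_samples))

per_q_mean = scores.mean(axis=1)                        # E[score | question]
per_q_var = scores.var(axis=1)                          # Var[score | question]

prediction_noise = per_q_var.mean()                     # E[ Var(score | question) ]
data_noise = per_q_mean.var()                           # Var( E[score | question] )
total_noise = prediction_noise + data_noise             # law of total variance

# Sanity check: the decomposition matches the pooled variance exactly.
assert np.isclose(total_noise, scores.var())
```

With equal sample counts per question and population variances (ddof=0), the identity holds exactly, which is why the final assertion passes without tolerance tricks.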


Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

Li, Jiping, Sonthalia, Rishi

arXiv.org Machine Learning

This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression using spiked covariance data models. The paper characterizes how varying spike strengths and target-spike alignments can affect risk, especially in overparameterized settings. The study presents an exact expression for the generalization error, leading to a comprehensive classification of benign, tempered, and catastrophic overfitting regimes based on spike strength, the aspect ratio $c=d/n$ (particularly as $c \to \infty$), and target alignment. Notably, in well-specified aligned problems, increasing spike strength can surprisingly induce catastrophic overfitting before achieving benign overfitting. The paper also reveals that target-spike alignment is not always advantageous, identifying specific, sometimes counterintuitive, conditions for its benefit or detriment. Alignment with the spike being detrimental is empirically demonstrated to persist in nonlinear models.
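To make the setting concrete, here is a minimal numerical sketch of a minimum-norm interpolating solution on spiked covariance data; the dimensions, spike strength, target alignment, and noise level are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 400                            # overparameterized: aspect ratio c = d/n = 4
spike_strength = 25.0

# Spiked covariance: identity plus one strong direction u (here, the first axis).
u = np.zeros(d)
u[0] = 1.0
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(1.0 + spike_strength)   # inflate variance along the spike

# Target aligned with the spike (alignment is a key knob in the paper).
w_star = u
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum-norm interpolating solution: w = X^+ y.
w_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ w_hat, y)           # interpolates the training data

# Excess risk under the spiked covariance Sigma = I + spike * u u^T:
# (w_hat - w_star)^T Sigma (w_hat - w_star).
diff = w_hat - w_star
risk = diff @ diff + spike_strength * diff[0] ** 2
print(f"excess risk: {risk:.3f}")
```

Sweeping `spike_strength` and the alignment of `w_star` with `u` in a loop reproduces, qualitatively, the kind of regime transitions the paper classifies analytically.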


MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding

Guo, Hanghui, Di, Shimin, De Meo, Pasquale, Chen, Zhangze, Zhu, Jia

arXiv.org Artificial Intelligence

As a critical task in data quality control, claim verification aims to curb the spread of misinformation by assessing the truthfulness of claims against a wide range of evidence. However, traditional methods often overlook the complex interactions between pieces of evidence, leading to unreliable verification results. A straightforward solution represents the claim and evidence as a fully connected graph, which we define as the Claim-Evidence Graph (C-E Graph). Nevertheless, claim verification methods based on fully connected graphs face two primary confounding challenges: Data Noise and Data Biases. To address these challenges, we propose a novel framework, Multi-Path Causal Optimization (MuPlon). In the front-door path, MuPlon extracts highly relevant subgraphs and constructs reasoning paths, further applying counterfactual reasoning to eliminate data biases within these paths. Experimental results demonstrate that MuPlon outperforms existing methods and achieves state-of-the-art performance.
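As a rough illustration of the fully connected C-E Graph construction described above (the node naming and data layout are our own assumptions, not MuPlon's implementation):

```python
from itertools import combinations

def build_ce_graph(claim, evidence):
    """Fully connected Claim-Evidence (C-E) graph: the claim and every
    evidence sentence become nodes, with an edge between every pair."""
    nodes = [("claim", claim)] + [(f"evidence_{i}", e) for i, e in enumerate(evidence)]
    ids = [nid for nid, _ in nodes]
    edges = list(combinations(ids, 2))     # complete graph over all nodes
    return dict(nodes), edges

nodes, edges = build_ce_graph(
    "X cures Y",
    ["study A ...", "report B ...", "blog C ..."],
)
assert len(edges) == 4 * 3 // 2            # complete graph on 4 nodes has 6 edges
```

The confounding challenges the abstract names arise precisely because every such pair gets an edge, relevant or not; subgraph extraction prunes this structure.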


Method                     MAE(R)  R2(R)   MAE(t)  R2(t)
Random sampling            1.689   0.927   0.011   0.997
Closeness to other points  2.109   0.861   0.013   0.995

Neural Information Processing Systems

We thank the reviewers for taking the time to consider our NeurIPS submission. Table 2 shows that PRNet consistently outperforms PointNetLK in all settings. PRNet is on a par with PointNetLK while being slower than DCP. We will add "Deep Part Induction from Articulated Object Pairs" to the related works and discuss it. We believe these comments will help to make the work stronger.


From Data to Action: Charting A Data-Driven Path to Combat Antimicrobial Resistance

Fu, Qian, Zhang, Yuzhe, Shu, Yanfeng, Ding, Ming, Yao, Lina, Wang, Chen

arXiv.org Artificial Intelligence

Antibiotics are often grouped by their mechanisms of action, such as blocking protein synthesis, disrupting folate biosynthesis, changing cell wall construction, compromising cell membrane integrity, and affecting DNA replication [93, 25]. These antibiotics, whether created in labs or found in nature, serve as the primary defence against bacterial infections. However, bacteria employ a range of strategies to resist these antibiotics, including inactivating them through enzymatic degradation, altering the antibiotic target, modifying cell membrane permeability, and using efflux pumps to keep intracellular antibiotic concentrations below inhibitory levels [25]. Moreover, gene transfer among antibiotic-resistant bacteria (ARB) further aggravates this challenge [92].


ROSS:RObust decentralized Stochastic learning based on Shapley values

Wang, Lina, Yuan, Yunsheng, Li, Feng, Duan, Lingjie

arXiv.org Artificial Intelligence

In the paradigm of decentralized learning, a group of agents collaborates to learn a global model over a distributed dataset without a central server; nevertheless, this paradigm is severely challenged by the heterogeneity of the data distribution across the agents. For example, the data may be distributed non-independently and non-identically, and may even be noisy or poisoned. To address these data challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to the datasets of its neighbors, to update its local model in a momentum-like manner, and we innovate by weighting the derivatives according to their contributions as measured by Shapley values. We provide a thorough theoretical analysis revealing the linear convergence speedup of the ROSS algorithm, and we verify its efficacy through extensive experiments on public datasets. Our results demonstrate that, in the face of the above variety of data challenges, ROSS has clear advantages over existing state-of-the-art proposals in terms of both convergence and prediction accuracy.
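The Shapley-weighting idea can be sketched with an exact Shapley computation over a small neighbor set. The toy utility below (alignment of the averaged gradients with a reference direction) is our own stand-in, not the contribution measure ROSS actually uses:

```python
import itertools
import math
import numpy as np

def shapley_weights(grads, utility):
    """Exact Shapley value of each gradient's contribution to `utility`,
    enumerated over all coalitions (tractable only for small neighbor counts)."""
    n = len(grads)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                phi[i] += w * (utility(list(S) + [i]) - utility(list(S)))
    return phi

# Toy utility: alignment of the averaged coalition gradient with a reference
# direction, so gradients pointing the "right" way earn larger Shapley values.
reference = np.array([1.0, 0.0])
grads = [np.array([0.9, 0.1]),
         np.array([1.0, -0.1]),
         np.array([-1.0, 0.5])]           # last gradient plays a "poisoned" neighbor

def utility(idx):
    if not idx:
        return 0.0
    return float(np.mean([grads[j] for j in idx], axis=0) @ reference)

phi = shapley_weights(grads, utility)
assert phi.argmin() == 2                   # the poisoned gradient gets the smallest weight
```

Down-weighting the derivative with the lowest Shapley value is the robustness mechanism in miniature: a poisoned neighbor's cross-gradient contributes little to the momentum-like update.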


How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation

Meng, Yan, Wu, Di, Monz, Christof

arXiv.org Artificial Intelligence

The massive amounts of web-mined parallel data contain large amounts of noise. Semantic misalignment, as the primary source of this noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world, hard-to-detect misalignment noise by proposing a process that simulates realistic misalignment controlled by semantic similarity. After quantitatively analyzing the impact of simulated misalignment on machine translation, we show the limited effectiveness of widely used pre-filters in improving translation performance, underscoring the necessity of more fine-grained ways to handle data noise. Observing the increasing reliability of the model's self-knowledge for distinguishing misaligned from clean data at the token level, we propose a self-correction approach that leverages the model's prediction distribution to revise the training supervision from the ground-truth data over the course of training. Through comprehensive experiments, we show that our self-correction method not only improves translation performance in the presence of simulated misalignment noise but also proves effective for real-world noisy web-mined datasets across eight translation tasks.
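A minimal sketch of such token-level self-correction might blend the one-hot gold labels with the model's own prediction distribution, with trust in the model ramping up over training. The linear schedule and 0.5 cap below are assumptions for illustration, not necessarily the paper's formulation:

```python
import numpy as np

def self_corrected_targets(model_probs, gold_ids, step, total_steps, max_trust=0.5):
    """Blend one-hot gold labels with the model's own token distribution.
    Trust in the model's self-knowledge grows linearly over training
    (an illustrative schedule; the paper's exact schedule may differ)."""
    vocab = model_probs.shape[-1]
    one_hot = np.eye(vocab)[gold_ids]
    alpha = max_trust * step / total_steps          # ramps 0 -> max_trust
    return (1 - alpha) * one_hot + alpha * model_probs

# Toy example: vocabulary of 4, two target tokens, mid-training (alpha = 0.25).
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.05, 0.05, 0.80, 0.10]])
targets = self_corrected_targets(probs, gold_ids=np.array([0, 1]),
                                 step=500, total_steps=1000)
assert np.allclose(targets.sum(axis=-1), 1.0)       # still valid distributions
```

Training then minimizes cross-entropy against `targets` instead of the raw one-hot labels, so tokens the model confidently disputes (likely misaligned supervision) are revised toward its own distribution.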


Denoising-Aware Contrastive Learning for Noisy Time Series

Zhou, Shuang, Zha, Daochen, Shen, Xiao, Huang, Xiao, Zhang, Rui, Chung, Fu-Lai

arXiv.org Artificial Intelligence

Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods before model training. However, this pre-processing approach may not fully eliminate the effect of noise in SSL for two reasons: (i) the diverse types of noise in time series make it difficult to automatically determine suitable denoising methods; (ii) noise can be amplified after mapping raw data into latent space. In this paper, we propose denoising-aware contrastive learning (DECL), which uses contrastive learning objectives to mitigate the noise in the representation and automatically selects suitable denoising methods for every sample. Extensive experiments on various datasets verify the effectiveness of our method. The code is open-sourced.
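For context, a generic contrastive objective over noisy and denoised views can be sketched as follows; this is plain InfoNCE on toy embeddings, not DECL's actual objective or its denoiser-selection mechanism:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Minimal InfoNCE loss on L2-normalized embeddings. Each anchor's
    positive is its paired (e.g., denoised) view; the other samples in
    the batch serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # match anchor i with positive i

rng = np.random.default_rng(3)
series_emb = rng.standard_normal((8, 16))
denoised_emb = series_emb + 0.05 * rng.standard_normal((8, 16))  # views agree
loss_aligned = info_nce(series_emb, denoised_emb)
loss_random = info_nce(series_emb, rng.standard_normal((8, 16)))
assert loss_aligned < loss_random
```

The gap between the two losses is the signal such an objective optimizes: representations where a sample agrees with its denoised view score low, while noise-dominated representations score near chance.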


The Observer-Observation Dilemma in Neuro-Forecasting

Neural Information Processing Systems

Human beings believe that they are able to solve a psychological version of the Observer-Observation Dilemma. On the one hand, they use their observations to constitute an understanding of the laws of the world; on the other hand, they use this understanding to evaluate the correctness of the incoming pieces of information. Of course, as everybody knows, human beings are not free from making mistakes in this psychological dilemma. We encounter a similar situation when we try to build a mathematical model using data. Learning relationships from the data is only one part of the model building process.


Three Ways AIOps can Strengthen Enterprise Security

#artificialintelligence

The rapid surge in the number of cyber-attacks has exposed the vulnerabilities present in organizations' infrastructure. One way for organizations to deal with these attacks is to incorporate AIOps, which provides better visibility into performance and system data at scale. Accelerating digital transformation initiatives enabled organizations to keep their operations alive, but it came at the cost of ignoring vulnerabilities in their infrastructure, allowing threat actors to capitalize on the opportunity and execute their malicious intent. Additionally, a security incident cost on average USD 4.24 million in 2021, a 10% increase from 2020, according to IBM's "Cost of a Data Breach Report 2021".