Measuring all the noises of LLM Evals
Separating signal from noise is central to experimental science. Applying well-established statistical methods effectively to LLM evals requires accounting for their unique noise characteristics. We define and measure three types of noise: prediction noise, from a model generating different answers to a given question; data noise, from sampling the questions; and their combined total noise, which follows the law of total variance. To emphasize relative comparisons and gain statistical power, we propose the all-pairs paired method, which applies paired analysis to all pairs of LLMs and measures every noise component from millions of question-level predictions across many evals and settings. These measurements reveal clear patterns. First, each eval exhibits a characteristic and highly predictable total noise level across all model pairs. Second, paired prediction noise typically exceeds paired data noise, which means that reducing prediction noise by averaging can significantly increase statistical power. These findings enable practitioners to assess significance without custom testing and to detect much smaller effects in controlled experiments.
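The variance decomposition the abstract relies on can be sketched numerically. Everything below (array shapes, the simulated 0/1 scores) is illustrative stand-in data, not the paper's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
# scores[q, r]: score of resampled run r on question q (e.g., 0/1 correctness);
# hypothetical data standing in for an eval's question-level predictions
scores = (rng.random((100, 8)) < rng.random((100, 1))).astype(float)

# Prediction noise: average within-question variance across resampled answers
prediction_noise = scores.var(axis=1, ddof=0).mean()
# Data noise: variance of per-question mean scores across sampled questions
data_noise = scores.mean(axis=1).var(ddof=0)
# Law of total variance: total = prediction + data (exact with ddof=0 and
# an equal number of runs per question)
total_noise = scores.var(ddof=0)
print(prediction_noise, data_noise, total_noise)
```

With population variances (`ddof=0`) and a balanced number of runs per question, the identity holds exactly, which is what lets the paper report the two components separately and sum them.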
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
This paper analyzes the generalization error of minimum-norm interpolating solutions in linear regression under spiked covariance data models. It characterizes how varying spike strengths and target-spike alignments affect risk, especially in overparameterized settings. The study presents an exact expression for the generalization error, leading to a comprehensive classification of benign, tempered, and catastrophic overfitting regimes based on spike strength, the aspect ratio $c=d/n$ (particularly as $c \to \infty$), and target alignment. Notably, in well-specified aligned problems, increasing spike strength can, surprisingly, induce catastrophic overfitting before benign overfitting is reached. The paper also shows that target-spike alignment is not always advantageous, identifying specific, sometimes counterintuitive, conditions under which it helps or hurts. The detrimental effect of spike alignment is empirically shown to persist in nonlinear models.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
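A minimal simulation of the setting the abstract studies, assuming a diagonal spiked covariance and a spike-aligned target; all constants here are hypothetical illustrations, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 1000                    # overparameterized: aspect ratio c = d/n = 10
spike_strength, sigma = 25.0, 0.5   # illustrative spike and label-noise levels

# Spiked covariance: identity plus one strong direction e_1
sqrt_cov = np.ones(d)
sqrt_cov[0] = np.sqrt(spike_strength)
X = rng.standard_normal((n, d)) * sqrt_cov

beta = np.zeros(d)
beta[0] = 1.0                       # target perfectly aligned with the spike
y = X @ beta + sigma * rng.standard_normal(n)

# Minimum-norm interpolator: beta_hat = X^T (X X^T)^{-1} y
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)

# Excess risk under the (diagonal) spiked covariance: (b - b*)^T Sigma (b - b*)
diff = beta_hat - beta
risk = np.sum((diff * sqrt_cov) ** 2)
print(risk)
```

Sweeping `spike_strength` in a loop over such a simulation is one way to observe empirically the benign/tempered/catastrophic transitions the paper derives exactly.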
MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding
Guo, Hanghui, Di, Shimin, De Meo, Pasquale, Chen, Zhangze, Zhu, Jia
Abstract: As a critical task in data quality control, claim verification aims to curb the spread of misinformation by assessing the truthfulness of claims against a wide range of evidence. However, traditional methods often overlook the complex interactions between pieces of evidence, leading to unreliable verification results. A straightforward solution represents the claim and evidence as a fully connected graph, which we define as the Claim-Evidence Graph (C-E Graph). Nevertheless, claim verification methods based on fully connected graphs face two primary confounding challenges: data noise and data biases. To address these challenges, we propose a novel framework, Multi-Path Causal Optimization (MuPlon). In the front-door path, MuPlon extracts highly relevant subgraphs and constructs reasoning paths, then applies counterfactual reasoning to eliminate data biases within these paths. The experimental results demonstrate that MuPlon outperforms existing methods and achieves state-of-the-art performance.
- Research Report > Strength High (0.46)
- Research Report > Experimental Study (0.46)
| Method | MAE (R) | R² (R) | MAE (t) | R² (t) |
|---|---|---|---|---|
| Random sampling | 1.689 | 0.927 | 0.011 | 0.997 |
| Closeness to other points | 2.109 | 0.861 | 0.013 | 0.995 |
We thank the reviewers for taking the time to consider our NeurIPS submission. Table 2 shows that PRNet consistently outperforms PointNetLK in all settings. PRNet is on a par with PointNetLK while being slower than DCP. We will add "Deep Part Induction from Articulated Object Pairs" to the related works and discuss it. We believe these comments will help make the work stronger.
From Data to Action: Charting A Data-Driven Path to Combat Antimicrobial Resistance
Fu, Qian, Zhang, Yuzhe, Shu, Yanfeng, Ding, Ming, Yao, Lina, Wang, Chen
Antibiotics are often grouped by their mechanisms of action, such as blocking protein synthesis, disrupting folate biosynthesis, changing cell wall construction, compromising cell membrane integrity, and affecting DNA replication [93, 25]. These antibiotics, whether created in labs or found in nature, serve as the primary defence against bacterial infections. However, bacteria employ a series of strategies to resist these antibiotics, including inactivating antibiotics through enzymatic degradation, altering the antibiotic target, modifying cell membrane permeability, and using efflux pumps to keep intracellular antibiotic concentrations below inhibitory levels [25]. Moreover, gene transfer among antibiotic-resistant bacteria (ARB) further aggravates this challenge [92].
- North America > United States (0.67)
- Europe > United Kingdom (0.14)
- Asia > Thailand (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
ROSS:RObust decentralized Stochastic learning based on Shapley values
Wang, Lina, Yuan, Yunsheng, Li, Feng, Duan, Lingjie
In the paradigm of decentralized learning, a group of agents collaborates to learn a global model over a distributed dataset without a central server; this paradigm, however, is severely challenged by heterogeneity in the data distribution across agents. For example, the data may be distributed non-independently and non-identically, and may even be noisy or poisoned. To address these data challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates the cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to its neighbors' datasets, to update its local model in a momentum-like manner; our key innovation is to weight these derivatives according to their contributions as measured by Shapley values. We provide a theoretical analysis revealing the linear convergence speedup of ROSS, and we verify its efficacy through extensive experiments on public datasets. Our results demonstrate that, in the face of the above data challenges, ROSS has clear advantages over existing state-of-the-art proposals in terms of both convergence and prediction accuracy.
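The Shapley-weighted aggregation idea can be sketched for a handful of neighbors, where exact enumeration over coalitions is feasible. The utility function and every name below are illustrative stand-ins, not the paper's construction:

```python
import itertools
import math

import numpy as np

def shapley_values(players, utility):
    """Exact Shapley values by enumerating all coalitions (fine for few neighbors)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        rest = [q for q in players if q != p]
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                w = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
                phi[p] += w * (utility(S + (p,)) - utility(S))
    return phi

# Toy setting: each neighbor supplies a cross-gradient; a coalition's utility is a
# proxy for validation quality after one step with the coalition's mean gradient.
rng = np.random.default_rng(2)
w_local = rng.standard_normal(5)
grads = {f"agent{i}": rng.standard_normal(5) for i in range(3)}
grads["agent2"] += 5.0  # a noisy/poisoned neighbor

def utility(coalition):
    if not coalition:
        return 0.0
    step = np.mean([grads[a] for a in coalition], axis=0)
    return -np.linalg.norm(w_local - 0.1 * step)  # crude stand-in for -val_loss

phi = shapley_values(tuple(grads), utility)
# Turn contributions into aggregation weights, clipping harmful contributors
weights = {a: max(v, 0.0) for a, v in phi.items()}
total = sum(weights.values()) or 1.0
weights = {a: v / total for a, v in weights.items()}
print(phi, weights)
```

The exact enumeration costs $O(2^n)$ utility evaluations, which is why it only suits the small neighborhoods typical of decentralized topologies; larger settings would need sampled approximations.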
How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation
Meng, Yan, Wu, Di, Monz, Christof
Web-mined parallel data contains large amounts of noise, and semantic misalignment, as the primary source of this noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world, hard-to-detect misalignment noise by proposing a process to simulate realistic misalignment controlled by semantic similarity. After quantitatively analyzing the impact of simulated misalignment on machine translation, we show the limited effectiveness of widely used pre-filters for improving translation performance, underscoring the necessity of more fine-grained ways to handle data noise. Observing that the model's self-knowledge becomes increasingly reliable for distinguishing misaligned from clean data at the token level, we propose a self-correction approach that leverages the model's prediction distribution to gradually revise the training supervision derived from the ground-truth data over the course of training. Through comprehensive experiments, we show that our self-correction method not only improves translation performance in the presence of simulated misalignment noise but also proves effective for real-world noisy web-mined datasets across eight translation tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
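One way to picture the self-correction idea above is as a schedule that mixes the one-hot reference token with the model's own prediction distribution, trusting the model more as training progresses. The linear schedule and all names below are a hypothetical simplification, not the paper's exact rule:

```python
import numpy as np

def self_corrected_target(one_hot, model_probs, step, total_steps, max_trust=0.5):
    """Revise the training target by mixing in the model's own distribution,
    with trust growing linearly over training (illustrative schedule)."""
    trust = max_trust * step / total_steps
    return (1.0 - trust) * one_hot + trust * model_probs

vocab = 4
one_hot = np.eye(vocab)[2]                     # (possibly misaligned) reference token
model_probs = np.array([0.1, 0.6, 0.2, 0.1])   # the model favors token 1 instead
early = self_corrected_target(one_hot, model_probs, step=0, total_steps=100)
late = self_corrected_target(one_hot, model_probs, step=100, total_steps=100)
print(early, late)  # early training trusts the reference; late training mixes
```

Both mixtures remain valid probability distributions (convex combinations of distributions), so they can be dropped into a standard cross-entropy loss unchanged.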
Denoising-Aware Contrastive Learning for Noisy Time Series
Zhou, Shuang, Zha, Daochen, Shen, Xiao, Huang, Xiao, Zhang, Rui, Chung, Fu-Lai
Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods before model training. However, this pre-processing approach may not fully eliminate the effect of noise in SSL for two reasons: (i) the diverse types of noise in time series make it difficult to automatically determine suitable denoising methods; (ii) noise can be amplified after mapping raw data into latent space. In this paper, we propose denoising-aware contrastive learning (DECL), which uses contrastive learning objectives to mitigate the noise in the representation and automatically selects suitable denoising methods for every sample. Extensive experiments on various datasets verify the effectiveness of our method. The code is open-sourced.
- North America > United States > Minnesota (0.04)
- Asia > China > Hong Kong (0.04)
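A generic contrastive objective of the kind such SSL methods build on can be sketched by treating a denoised series as the positive view of its noisy counterpart; the NT-Xent form below is a standard choice, not necessarily DECL's exact loss, and the data are random stand-ins:

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """Normalized-temperature cross-entropy: pull each sample toward its paired
    view, push it away from all other samples in the batch."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarities
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    n = len(z1)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(2 * n), targets].mean()

rng = np.random.default_rng(3)
clean = rng.standard_normal((8, 16))                    # stand-in latent vectors
noisy = clean + 0.1 * rng.standard_normal((8, 16))      # their noisy views
loss = nt_xent(clean, noisy)
print(loss)
```

In a DECL-like setup, the candidate denoisers would produce the positive views, and the per-sample loss could score which denoiser suits each series; here that selection step is omitted.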
The Observer-Observation Dilemma in Neuro-Forecasting
Human beings believe that they are able to solve a psychological version of the Observer-Observation Dilemma. On the one hand, they use their observations to constitute an understanding of the laws of the world; on the other hand, they use this understanding to evaluate the correctness of the incoming pieces of information. Of course, as everybody knows, human beings are not free from making mistakes in this psychological dilemma. We encounter a similar situation when we try to build a mathematical model using data. Learning relationships from the data is only one part of the model building process.
Three Ways AIOps can Strengthen Enterprise Security
The rapid surge in the number of cyber-attacks has exposed vulnerabilities in organizations' infrastructure. One way for organizations to deal with these attacks is to incorporate AIOps, which gives them better visibility into performance and system data at scale. Accelerating digital transformation initiatives enabled organizations to keep their operations alive, but it came at the cost of ignoring vulnerabilities in their infrastructure, allowing threat actors to capitalize on the opportunity and execute their malicious intent. Additionally, a security incident cost USD 4.24 million on average in 2021, a 10% increase from 2020, according to IBM's "Cost of a Data Breach Report 2021".
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.55)