
Supplementary Material 7 Elements of Group and Representation Theory

Neural Information Processing Systems

In this section, we provide a brief introduction to the concepts from Group Theory that we need in our derivations. A group is a pair $(G, \cdot)$ containing a set $G$ and a binary operation $\cdot: G \times G \to G$, $(h, g) \mapsto h \cdot g$, which satisfies the group axioms:
Associativity: $\forall a, b, c \in G:\ a \cdot (b \cdot c) = (a \cdot b) \cdot c$
Identity: $\exists e \in G: \forall g \in G:\ g \cdot e = e \cdot g = g$
Inverse: $\forall g \in G\ \exists g^{-1} \in G:\ g \cdot g^{-1} = g^{-1} \cdot g = e$
The operation $\cdot$ is the group law of $G$. The inverse $g^{-1}$ of an element $g$ and the identity element $e$ are unique. In addition, if the group law is also commutative, $G$ is an abelian group. To simplify the notation, we commonly write $ab$ instead of $a \cdot b$. It is also common to denote the group $(G, \cdot)$ just by the name of its underlying set $G$. The order of a group $G$ is the cardinality of its set and is denoted by $|G|$. A group $G$ is finite when $|G| \in \mathbb{N}$, i.e., when it has a finite number of elements. A compact group is a group that is also a compact topological space with continuous group operation. Given a group $G$, its action on a set $X$ is a map $.: G \times X \to X$, $(g, y) \mapsto g.y$, compatible with the group law, i.e., $e.y = y$ and $g.(h.y) = (gh).y$ for all $g, h \in G$ and $y \in X$. A simple example of group action is the group law itself, $\cdot: G \times G \to G$, which defines an action of $G$ on its own elements ($X = G$). Another important action is the one defined on signals over the group $G$. Given a signal $x: G \to \mathbb{R}$, the action of an element $g \in G$ maps $x \mapsto g.x$, where $[g.x](h) := x(g^{-1}h)$.
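To make the axioms and the signal action concrete, here is a minimal Python sketch that brute-forces the group axioms for two small finite groups, the cyclic group $\mathbb{Z}_4$ (addition mod 4, abelian) and the symmetric group $S_3$ (permutation composition, non-abelian), and implements $[g.x](h) = x(g^{-1}h)$ for signals stored as dictionaries. The choice of groups and all function names (`cyclic_group`, `symmetric_group`, `check_group_axioms`, `is_abelian`, `act_on_signal`) are illustrative assumptions, not part of the original text.

```python
from itertools import permutations, product


def cyclic_group(n):
    """Z_n under addition mod n: returns (elements, law, identity, inverse)."""
    elements = list(range(n))

    def law(a, b):
        return (a + b) % n

    def inverse(g):
        return (-g) % n

    return elements, law, 0, inverse


def symmetric_group(k):
    """S_k as permutation tuples; the group law is composition (p q)(i) = p(q(i))."""
    elements = list(permutations(range(k)))

    def law(p, q):
        return tuple(p[q[i]] for i in range(k))

    def inverse(p):
        inv = [0] * k
        for i, p_i in enumerate(p):
            inv[p_i] = i  # p maps i -> p_i, so the inverse maps p_i -> i
        return tuple(inv)

    return elements, law, tuple(range(k)), inverse


def check_group_axioms(elements, law, identity, inverse):
    """Brute-force check of closure, associativity, identity, and inverse."""
    members = set(elements)
    closure = all(law(a, b) in members for a, b in product(elements, repeat=2))
    associativity = all(
        law(a, law(b, c)) == law(law(a, b), c)
        for a, b, c in product(elements, repeat=3)
    )
    has_identity = all(law(g, identity) == g == law(identity, g) for g in elements)
    has_inverse = all(
        law(g, inverse(g)) == identity == law(inverse(g), g) for g in elements
    )
    return closure and associativity and has_identity and has_inverse


def is_abelian(elements, law):
    """True if the group law is commutative."""
    return all(law(a, b) == law(b, a) for a, b in product(elements, repeat=2))


def act_on_signal(g, x, law, inverse):
    """Action on a signal x: G -> R (a dict): [g.x](h) = x(g^{-1} h)."""
    g_inv = inverse(g)
    return {h: x[law(g_inv, h)] for h in x}


Z4 = cyclic_group(4)
S3 = symmetric_group(3)
print(check_group_axioms(*Z4), is_abelian(Z4[0], Z4[1]))  # True True
print(check_group_axioms(*S3), is_abelian(S3[0], S3[1]))  # True False
print(act_on_signal(1, {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}, Z4[1], Z4[3]))
# {0: 4.0, 1: 1.0, 2: 2.0, 3: 3.0}
```

For $\mathbb{Z}_n$ the signal action is just a cyclic shift: with $g = 1$, $[1.x](h) = x(h - 1 \bmod n)$. The associativity check costs $O(|G|^3)$ evaluations of the group law, so this brute-force approach is only meant for very small groups.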


Spectrally-normalized margin bounds for neural networks

Neural Information Processing Systems

This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the mnist and cifar10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task and that the presented bound is sensitive to this complexity.
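The central quantity in the bound, the product of the spectral norms of the weight matrices, is easy to compute. The sketch below assumes a bias-free ReLU network (ReLU is 1-Lipschitz), so that this product upper-bounds the network's Lipschitz constant with respect to the Euclidean norm, and also computes the multiclass margin $f(x)_y - \max_{j \neq y} f(x)_j$. It deliberately omits the correction factor, which is not specified here, so the final ratio is only a crude stand-in for margin-normalized spectral complexity; the architecture, the random weights, and the function names are assumptions.

```python
import numpy as np


def spectral_product(weights):
    """Product of spectral norms (largest singular values) of the weight matrices."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))


def forward(weights, x):
    """Bias-free ReLU network: ReLU after every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h


def multiclass_margin(scores, y):
    """Margin f(x)_y - max_{j != y} f(x)_j; positive exactly when y is the unique top class."""
    others = np.delete(scores, y)
    return float(scores[y] - others.max())


rng = np.random.default_rng(0)
sizes = [20, 64, 64, 10]  # input dim, two hidden layers, 10 classes
weights = [
    rng.normal(scale=1 / np.sqrt(m), size=(n, m))
    for m, n in zip(sizes[:-1], sizes[1:])
]

lip_bound = spectral_product(weights)
x, x_prime = rng.normal(size=20), rng.normal(size=20)
lhs = np.linalg.norm(forward(weights, x) - forward(weights, x_prime))
rhs = lip_bound * np.linalg.norm(x - x_prime)
print(lhs <= rhs)  # True: the spectral-norm product bounds the Lipschitz constant

scores = forward(weights, x)
y = int(np.argmax(scores))  # use the predicted class as the label so the margin is nonnegative
margin = multiclass_margin(scores, y)
print(lip_bound / margin)  # crude margin-normalized complexity (no correction factor)
```

Dividing by the margin is what makes the quantity scale-aware: multiplying all logits by a constant scales both the spectral product and the margin, leaving the ratio unchanged.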


What Do Neural Networks Learn When Trained With Random Labels?

Neural Information Processing Systems

We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite the popularity of this setup in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels. We study this alignment effect by investigating neural networks pre-trained on randomly labelled image data and subsequently fine-tuned on disjoint datasets with random or real labels. We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream than networks trained from scratch, even after accounting for simple effects such as weight scaling. We analyze how competing effects, such as specialization at later layers, may hide the positive transfer. These effects are studied in several network architectures, including VGG16 and ResNet18, on CIFAR10 and ImageNet.
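One simple way to quantify the kind of alignment described above is to ask how much of the first-layer weights' second-moment matrix $W^\top W$ falls inside the top-$k$ principal subspace of the data. The sketch below implements that score; it is a hypothetical measure chosen for illustration, not necessarily the one the authors analyze, and the names `principal_components` and `alignment_score` are assumptions. Isotropic weights score about $k/d$, while weights whose rows lie in the data's top principal subspace score close to 1.

```python
import numpy as np


def principal_components(X):
    """Eigenvalues/eigenvectors of the sample covariance of the rows of X, largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def alignment_score(W, X, k):
    """Fraction of trace(W^T W) captured by the top-k principal subspace of X.

    W has one filter per row (shape m x d); X has one sample per row (shape n x d).
    """
    _, U = principal_components(X)
    U_k = U[:, :k]
    second_moment = W.T @ W
    return float(np.trace(U_k.T @ second_moment @ U_k) / np.trace(second_moment))


rng = np.random.default_rng(0)
d, k = 5, 2
# Anisotropic toy data: most variance along the first two coordinates.
X = rng.normal(size=(5000, d)) * np.array([5.0, 4.0, 0.1, 0.1, 0.1])

W_isotropic = np.eye(d)  # no preferred direction
# Rows restricted to span(e1, e2), i.e., the data's dominant directions.
W_aligned = rng.normal(size=(8, d)) * np.array([1.0, 1.0, 0.0, 0.0, 0.0])

print(alignment_score(W_isotropic, X, k))  # k / d = 0.4 (up to rounding)
print(alignment_score(W_aligned, X, k))    # close to 1
```

In a real experiment one would compare this score for weights before and after training on randomly labelled data; the toy matrices here only show the two extremes of the scale.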


Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts

arXiv.org Artificial Intelligence

Modeling complex subjective tasks in Natural Language Processing, such as recognizing emotion and morality, is particularly challenging due to significant variation in human annotations. This variation often reflects reasonable differences in semantic interpretations rather than mere noise, necessitating methods to distinguish between legitimate subjectivity and error. We address this challenge by exploring label verification in these contexts using Large Language Models (LLMs). First, we propose a simple In-Context Learning binary filtering baseline that estimates the reasonableness of a document-label pair. We then introduce the Label-in-a-Haystack setting: the query and its label(s) are included in the demonstrations shown to LLMs, which are prompted to predict the label(s) again while receiving task-specific instructions (e.g., emotion recognition) rather than instructions to copy the labels. We show that the LLM's failures to copy the label(s) to its output are task-relevant and informative. Building on this, we propose the Label-in-a-Haystack Rectification (LiaHR) framework for subjective label correction: when the model outputs diverge from the reference gold labels, we assign the generated labels to the example instead of discarding it. This approach can be integrated into annotation pipelines to enhance signal-to-noise ratios. Comprehensive analyses, human evaluations, and ecological validity studies verify the utility of LiaHR for label correction. Code is available at https://github.com/gchochla/liahr.
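The Label-in-a-Haystack setup and the LiaHR correction rule can be sketched without committing to any particular LLM API. In the Python below, the prompt template, the choice to place the query last among the demonstrations, and every name (`build_haystack_prompt`, `rectify`, `relabel`, `toy_llm`, the `llm` callable) are hypothetical; `llm` stands for any function that maps a prompt string to a tuple of predicted labels. The rule in `rectify` follows the description above: when the generated labels diverge from the gold labels, the example is kept with the generated labels instead of being discarded.

```python
from typing import Callable, List, Sequence, Tuple

Labels = Tuple[str, ...]
Example = Tuple[str, Labels]


def build_haystack_prompt(instruction: str, demos: Sequence[Example], query: Example) -> str:
    """Put the query *and its gold labels* among the demonstrations, then ask for its labels again."""
    lines = [instruction, ""]
    for text, labels in list(demos) + [query]:  # assumption: the query is the last demonstration
        lines += [f"Text: {text}", f"Labels: {', '.join(labels)}", ""]
    lines += [f"Text: {query[0]}", "Labels:"]
    return "\n".join(lines)


def rectify(gold: Labels, generated: Labels) -> Labels:
    """Keep the gold labels if the model copied them; otherwise assign the generated labels."""
    if set(generated) == set(gold):
        return gold
    return tuple(sorted(generated))


def relabel(
    dataset: Sequence[Example],
    demos: Sequence[Example],
    instruction: str,
    llm: Callable[[str], Labels],
) -> List[Example]:
    """Run the Label-in-a-Haystack prompt on every example and apply the correction rule."""
    corrected = []
    for text, gold in dataset:
        prompt = build_haystack_prompt(instruction, demos, (text, gold))
        corrected.append((text, rectify(gold, llm(prompt))))
    return corrected


def toy_llm(prompt: str) -> Labels:
    """Stand-in for a real LLM: reads the final query line and guesses an emotion."""
    query_line = prompt.splitlines()[-2]
    return ("sadness",) if "lost" in query_line else ("joy",)


data = [("I won the match!", ("joy",)), ("I lost my keys again.", ("joy",))]
demos = [("What a lovely morning.", ("joy",))]
print(relabel(data, demos, "Label the emotions expressed in each text.", toy_llm))
# [('I won the match!', ('joy',)), ('I lost my keys again.', ('sadness',))]
```

A filtering approach would drop the second example because the model did not reproduce its gold label; the rectification rule instead keeps it with the label the model produced.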