Goto

Collaborating Authors

 unsupervised




ResiDual Transformer Alignment with Spectral Decomposition

Basile, Lorenzo, Maiorca, Valentino, Bortolussi, Luca, Rodolà, Emanuele, Locatello, Francesco

arXiv.org Artificial Intelligence

When examined through the lens of their residual streams, a puzzling property emerges in transformer networks: residual contributions (e.g., attention heads) sometimes specialize in specific tasks or input attributes. In this paper, we analyze this phenomenon in vision transformers, focusing on the spectral geometry of residuals, and explore its implications for modality alignment in vision-language models. First, we link it to the intrinsically low-dimensional structure of visual head representations, zooming into their principal components and showing that they encode specialized roles across a wide variety of input data distributions. Then, we analyze the effect of head specialization in multimodal models, focusing on how improved alignment between text and specialized heads impacts zero-shot classification performance. This specialization-performance link consistently holds across diverse pre-training data, network sizes, and objectives, demonstrating a powerful new mechanism for boosting zero-shot classification through targeted alignment. Ultimately, we translate these insights into actionable terms by introducing ResiDual, a technique for spectral alignment of the residual stream. Much like panning for gold, it lets the noise from irrelevant unit principal components (i.e., attributes) wash away to amplify task-relevant ones. Remarkably, this dual perspective on modality alignment yields fine-tuning level performances on different data distributions while modeling an extremely interpretable and parameter-efficient transformation, as we extensively show on more than 50 (pre-trained network, dataset) pairs.


Cluster-norm for Unsupervised Probing of Knowledge

Laurito, Walter, Maiya, Sharan, Dhimoïla, Grégoire, Owen, null, Yeung, null, Hänni, Kaarel

arXiv.org Artificial Intelligence

The deployment of language models brings challenges in generating reliable information, especially when these models are fine-tuned using human preferences. To extract encoded knowledge without (potentially) biased human labels, unsupervised probing techniques like Contrast-Consistent Search (CCS) have been developed (Burns et al., 2022). However, salient but unrelated features in a given dataset can mislead these probes (Farquhar et al., 2023). Addressing this, we propose a cluster normalization method to minimize the impact of such features by clustering and normalizing activations of contrast pairs before applying unsupervised probing techniques. While this approach does not address the issue of differentiating between knowledge in general and simulated knowledge - a major issue in the literature of latent knowledge elicitation (Christiano et al., 2021) - it significantly improves the ability of unsupervised probes to identify the intended knowledge amidst distractions.


ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision

Sedova, Anastasiia, Roth, Benjamin

arXiv.org Artificial Intelligence

A cost-effective alternative to manual data labeling is weak supervision (WS), where data samples are automatically annotated using a predefined set of labeling functions (LFs), rule-based mechanisms that generate artificial labels for the associated classes. In this work, we investigate noise reduction techniques for WS based on the principle of k-fold cross-validation. We introduce a new algorithm ULF for Unsupervised Labeling Function correction, which denoises WS data by leveraging models trained on all but some LFs to identify and correct biases specific to the held-out LFs. Specifically, ULF refines the allocation of LFs to classes by re-estimating this assignment on highly reliable cross-validated samples. Evaluation on multiple datasets confirms ULF's effectiveness in enhancing WS learning without the need for manual labeling.


Unsupervised clustering of disturbances in power systems via deep convolutional autoencoders

Islam, Md Maidul, Faruque, Md Omar, Butterfield, Joshua, Singh, Gaurav, Cooke, Thomas A.

arXiv.org Artificial Intelligence

Power quality (PQ) events are recorded by PQ meters whenever anomalous events are detected on the power grid. Using neural networks with machine learning can aid in accurately classifying the recorded waveforms and help power system engineers diagnose and rectify the root causes of problems. However, many of the waveforms captured during a disturbance in the power system need to be labeled for supervised learning, leaving a large number of data recordings for engineers to process manually or go unseen. This paper presents an autoencoder and K-means clustering-based unsupervised technique that can be used to cluster PQ events into categories like sag, interruption, transients, normal, and harmonic distortion to enable filtering of anomalous waveforms from recurring or normal waveforms. The method is demonstrated using three-phase, field-obtained voltage waveforms recorded in a distribution grid. First, a convolutional autoencoder compresses the input signals into a set of lower feature dimensions which, after further processing, is passed to the K-means algorithm to identify data clusters. Using a small, labeled dataset, numerical labels are then assigned to events based on a cosine similarity analysis. Finally, the study analyzes the clusters using the t-distributed stochastic neighbor embedding (t-SNE) visualization tool, demonstrating that the technique can help investigate a large number of captured events in a quick manner.


Beyond Words: A Comprehensive Survey of Sentence Representations

Kashyap, Abhinav Ramesh, Nguyen, Thanh-Tung, Schlegel, Viktor, Winkler, Stefan, Ng, See-Kiong, Poria, Soujanya

arXiv.org Artificial Intelligence

Sentence representations have become a critical component in natural language processing applications, such as retrieval, question answering, and text classification. They capture the semantics and meaning of a sentence, enabling machines to understand and reason over human language. In recent years, significant progress has been made in developing methods for learning sentence representations, including unsupervised, supervised, and transfer learning approaches. In this paper, we provide an overview of the different methods for sentence representation learning, including both traditional and deep learning-based techniques. We provide a systematic organization of the literature on sentence representation learning, highlighting the key contributions and challenges in this area. Overall, our review highlights the progress made in sentence representation learning, the importance of this area in natural language processing, and the challenges that remain. We conclude with directions for future research, suggesting potential avenues for improving the quality and efficiency of sentence representations in NLP applications.


Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

Neural Information Processing Systems

The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism's lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism's lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle (1978), that a qualitatively similar learning algorithm acts throughout primary sensory cortices.


Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

Zhong, Yiwu, Yu, Licheng, Bai, Yang, Li, Shangwen, Yan, Xueting, Li, Yin

arXiv.org Artificial Intelligence

The abundance of instructional videos and their narrations over the Internet offers an exciting avenue for understanding procedural activities. In this work, we propose to learn video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. Our method jointly learns a video representation to encode individual step concepts, and a deep probabilistic model to capture both temporal dependencies and immense individual variations in the step ordering. We empirically demonstrate that learning temporal ordering not only enables new capabilities for procedure reasoning, but also reinforces the recognition of individual steps. Our model significantly advances the state-of-the-art results on step classification (+2.8% / +3.3% on COIN / EPIC-Kitchens) and step forecasting (+7.4% on COIN). Moreover, our model attains promising results in zero-shot inference for step classification and forecasting, as well as in predicting diverse and plausible steps for incomplete procedures. Our code is available at https://github.com/facebookresearch/ProcedureVRL.


A match made in transportation heaven: AI and self-driving cars

#artificialintelligence

Artificial intelligence (AI) has the potential to revolutionize the way we drive and transport goods and people. Self-driving cars, also known as autonomous vehicles, are a type of vehicle that use AI and other advanced technologies to navigate roads and highways without the need for a human driver. There are several benefits to self-driving cars. For one, they have the potential to significantly reduce the number of accidents caused by human error. This could lead to fewer deaths and injuries on the road.