AITopics

2404.00415

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Israel (0.04)
(16 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (1.00)
Law (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Jain, Nishant, Suggala, Arun S., Shenoy, Pradeep

Improving Generalization via Meta-Learning on Hard Samples

arXiv.org Artificial IntelligenceMar-29-2024

Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using VIT-B on Imagenet. We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.

classifier, dataset, validation, (14 more...)

2403.12236

Country:

South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Min, Yimeng, Gomes, Carla P.

On Size and Hardness Generalization in Unsupervised Learning for the Travelling Salesman Problem

arXiv.org Artificial IntelligenceMar-29-2024

We study the generalization capability of Unsupervised Learning in solving the Travelling Salesman Problem (TSP). We use a Graph Neural Network (GNN) trained with a surrogate loss function to generate an embedding for each node. We use these embeddings to construct a heat map that indicates the likelihood of each edge being part of the optimal route. We then apply local search to generate our final predictions. Our investigation explores how different training instance sizes, embedding dimensions, and distributions influence the outcomes of Unsupervised Learning methods. Our results show that training with larger instance sizes and increasing embedding dimensions can build a more effective representation, enhancing the model's ability to solve TSP. Furthermore, in evaluating generalization across different distributions, we first determine the hardness of various distributions and explore how different hardnesses affect the final results. Our findings suggest that models trained on harder instances exhibit better generalization capabilities, highlighting the importance of selecting appropriate training instances in solving TSP using Unsupervised Learning.

different distribution, size and hardness generalization, unsupervised learning, (11 more...)

2403.20212

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

arXiv.org Artificial IntelligenceMar-28-2024

OpenGraph: Towards Open Graph Foundation Models

Xia, Lianghao, Kao, Ben, Huang, Chao

Graph learning has become indispensable for interpreting and harnessing relational data in diverse fields, ranging from recommendation systems to social network analysis. In this context, a variety of GNNs have emerged as promising methodologies for encoding the structural information of graphs. By effectively capturing the graph's underlying structure, these GNNs have shown great potential in enhancing performance in graph learning tasks, such as link prediction and node classification. However, despite their successes, a significant challenge persists: these advanced methods often face difficulties in generalizing to unseen graph data that significantly differs from the training instances. In this work, our aim is to advance the graph learning paradigm by developing a general graph foundation model. This model is designed to understand the complex topological patterns present in diverse graph data, enabling it to excel in zero-shot graph learning tasks across different downstream datasets. To achieve this goal, we address several key technical challenges in our OpenGraph model. Firstly, we propose a unified graph tokenizer to adapt our graph model to generalize well on unseen graph data, even when the underlying graph properties differ significantly from those encountered during training. Secondly, we develop a scalable graph transformer as the foundational encoder, which effectively captures node-wise dependencies within the global topological context. Thirdly, we introduce a data augmentation mechanism enhanced by a LLM to alleviate the limitations of data scarcity in real-world scenarios. Extensive experiments validate the effectiveness of our framework. By adapting our OpenGraph to new graph characteristics and comprehending the nuances of diverse graphs, our approach achieves remarkable zero-shot graph learning performance across various settings and domains.

dataset, graph, node, (15 more...)

2403.01121

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.46)

Industry:

Information Technology (0.88)
Media (0.67)
Consumer Products & Services > Restaurants (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Djahnine, Aissam, Popoff, Alexandre, Jupin-Delevaux, Emilien, Cottin, Vincent, Nempont, Olivier, Boussel, Loic

CT-3DFlow : Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scans

Unsupervised pathology detection can be implemented by training a model on healthy data only and measuring the deviation from the training set upon inference, for example with CNN-based feature extraction and one-class classifiers, or reconstruction-score-based methods such as AEs, GANs and Diffusion models. Normalizing Flows (NF) have the ability to directly learn the probability distribution of training examples through an invertible architecture. We leverage this property in a novel 3D NF-based model named CT-3DFlow, specifically tailored for patient-level pulmonary pathology detection in chest CT data. Our model is trained unsupervised on healthy 3D pulmonary CT patches, and detects deviations from its log-likelihood distribution as anomalies. We aggregate patches-level likelihood values from a patient's CT scan to provide a patient-level 'normal'/'abnormal' prediction. Out-of-distribution detection performance is evaluated using expert annotations on a separate chest CT test dataset, outperforming other state-of-the-art methods.

artificial intelligence, inductive learning, machine learning, (4 more...)

2403.18514

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)

Virgo, Felix, Cheng, Fei, Pereira, Lis Kanashiro, Asahara, Masayuki, Kobayashi, Ichiro, Kurohashi, Sadao

AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QA

We propose a voting-driven semi-supervised approach to automatically acquire the typical duration of an event and use it as pseudo-labeled data. The human evaluation demonstrates that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. In the temporal commonsense QA task, experimental results show that using only pseudo examples of 400 events, we achieve performance comparable to the existing BERT-based weakly supervised approaches that require a significant amount of training examples. When compared to the RoBERTa baselines, our best approach establishes state-of-the-art performance with a 7% improvement in Exact Match.

artificial intelligence, computational linguistic, machine learning, (18 more...)

2403.18504

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (0.70)
Media (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)

MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck

Wen, Liangjian, Wang, Xiasi, Liu, Jianzhuang, Xu, Zenglin

Self-supervised learning aims to learn representation that can be effectively generalized to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signals, assuming that either view contains the same task-relevant information and the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information, termed minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed multi-view entropy bottleneck (MVEB) to learn minimal sufficient representation effectively. MVEB simplifies the minimal sufficient learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves top-1 accuracy of 76.9\% on ImageNet with a vanilla ResNet-50 backbone on linear evaluation. To the best of our knowledge, this is the new state-of-the-art result with ResNet-50.

information, learning, representation, (17 more...)

2403.19078

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Sichuan Province > Chengdu (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Mørk, Jacob, Bovbjerg, Holger Severin, Kiss, Gergely, Tan, Zheng-Hua

Noise-Robust Keyword Spotting through Self-supervised Pretraining

Voice assistants are now widely available, and to activate them a keyword spotting (KWS) algorithm is used. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve a good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase the accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, which is under-explored. Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions, and superior to supervised MTR for testing conditions of SNR above 5 dB. This indicates that pretraining alone can increase the model's robustness. Finally, it is found that using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions.

data2vec, noise type, robustness, (15 more...)

2403.1856

Country: Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning

Liu, Wenzhuo, Zhu, Fei, Liu, Cheng-Lin

Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning rather than complete retraining. This poses a challenge in striking a balance between stability and plasticity when adapting to new information. In this paper, we employ Centered Kernel Alignment for quantitatively analyzing model stability and plasticity, revealing the critical roles of batch normalization layers for stability and convolutional layers for plasticity. Motivated by this, we propose Branch-tuning, an efficient and straightforward method that achieves a balance between stability and plasticity in continual SSL. Branch-tuning consists of branch expansion and compression, and can be easily applied to various SSL methods without the need of modifying the original methods, retaining old data or models. We validate our method through incremental experiments on various benchmark datasets, demonstrating its effectiveness and practical value in real-world scenarios. We hope our work offers new insights for future continual self-supervised learning research. The code will be made publicly available.

computer vision, learning, plasticity, (13 more...)

2403.18266

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Education (0.93)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)

Leger, Victor, Couillet, Romain

Asymptotic Bayes risk of semi-supervised learning with uncertain labeling

arXiv.org Machine LearningMar-27-2024

This article considers a semi-supervised classification setting on a Gaussian mixture model, where the data is not labeled strictly as usual, but instead with uncertain labels. Our main aim is to compute the Bayes risk for this model. We compare the behavior of the Bayes risk and the best known algorithm for this model. This comparison eventually gives new insights over the algorithm.

algorithm, tanh, unlabeled data, (15 more...)

arXiv.org Machine Learning

2403.17767

Country: Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.06)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.42)