Inductive Learning
Generative vs. Discriminative modeling under the lens of uncertainty quantification
Argouarc'h, Elouan, Desbouvries, François, Barat, Eric, Kawasaki, Eiji
Learning a parametric model from a given dataset makes it possible to capture intrinsic dependencies between random variables via a parametric conditional probability distribution and, in turn, to predict the value of a label variable given observed variables. In this paper, we undertake a comparative analysis of generative and discriminative approaches, which differ in their construction and in the structure of the underlying inference problem. Our objective is to compare the ability of both approaches to leverage information from various sources in an epistemic-uncertainty-aware inference via the posterior predictive distribution. We assess the role of a prior distribution, explicit in the generative case and implicit in the discriminative case, leading to a discussion about discriminative models suffering from imbalanced datasets. We next examine the double role played by the observed variables in the generative case, and discuss the compatibility of both approaches with semi-supervised learning. We also provide practical insights and examine how the modeling choice impacts sampling from the posterior predictive distribution. In this regard, we propose a general sampling scheme enabling supervised learning for both approaches, as well as semi-supervised learning when compatible with the considered modeling approach. Throughout this paper, we illustrate our arguments and conclusions using the example of affine regression, and validate our comparative analysis through classification simulations using neural-network-based models.
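To make "sampling from the posterior predictive distribution" concrete for affine regression, here is a minimal sketch of the discriminative case only. It is not the sampling scheme proposed in the paper. It assumes a conjugate setup that the abstract does not specify: a zero-mean Gaussian prior on the intercept and slope (variance `tau2`) and a known noise variance `sigma2`. All function and parameter names are illustrative.

```python
import math
import random

def posterior(xs, ys, sigma2=1.0, tau2=10.0):
    """Gaussian posterior over (intercept, slope) for y = w0 + w1*x + noise.

    Assumes a N(0, tau2*I) prior on the weights and known noise variance sigma2.
    Returns the posterior mean (2-tuple) and covariance (2x2 nested tuple).
    """
    n = len(xs)
    # Posterior precision matrix A = I/tau2 + X^T X / sigma2, with rows of X = [1, x].
    a = 1.0 / tau2 + n / sigma2
    b = sum(xs) / sigma2
    d = 1.0 / tau2 + sum(x * x for x in xs) / sigma2
    det = a * d - b * b
    # Covariance is the inverse of the 2x2 precision matrix.
    cov = ((d / det, -b / det), (-b / det, a / det))
    # Right-hand side X^T y / sigma2.
    r0 = sum(ys) / sigma2
    r1 = sum(x * y for x, y in zip(xs, ys)) / sigma2
    mean = (cov[0][0] * r0 + cov[0][1] * r1,
            cov[1][0] * r0 + cov[1][1] * r1)
    return mean, cov

def sample_predictive(x_new, mean, cov, sigma2=1.0, rng=random):
    """Draw one label from the posterior predictive at x_new.

    Step 1: sample weights from the posterior (epistemic uncertainty).
    Step 2: sample a label given those weights (observation noise).
    """
    # Cholesky factor of the 2x2 covariance.
    l00 = math.sqrt(cov[0][0])
    l10 = cov[1][0] / l00
    l11 = math.sqrt(cov[1][1] - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    w0 = mean[0] + l00 * z0
    w1 = mean[1] + l10 * z0 + l11 * z1
    return rng.gauss(w0 + w1 * x_new, math.sqrt(sigma2))

if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
    mean, cov = posterior(xs, ys, sigma2=0.25)
    draws = [sample_predictive(4.0, mean, cov, sigma2=0.25) for _ in range(5)]
    print(mean, draws)
```

The two-step draw (weights first, then label) is what separates the posterior predictive from a plug-in prediction at the posterior mean: with few observations, the spread of the sampled weights widens the predictive distribution.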
Young woman breaks fishing record set in place for nearly half a century
A 21-year-old woman from Georgia recently broke a statewide fishing record, officials say. The Georgia Department of Natural Resources announced the new state record in a press release on June 5. St. Marys resident Lauren E. Harden caught a 33-pound crevalle jack on May 24 while fishing on Cumberland Island.
Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
Katsios, Gregorios A, Sa, Ning, Strzalkowski, Tomek
The identification of Figurative Language (FL) features in text is crucial for various Natural Language Processing (NLP) tasks, where understanding of the author's intended meaning and its nuances is key for successful communication. At the same time, the use of a specific blend of various FL forms most accurately reflects a writer's style, rather than the use of any single construct, such as just metaphors or irony. Thus, we postulate that FL features could play an important role in Authorship Attribution (AA) tasks. We believe that ours is the first computational study of AA based on FL use. Accordingly, we propose a Multi-task Figurative Language Model (MFLM) that learns to detect multiple FL features in text at once. We demonstrate, through detailed evaluation across multiple test sets, that our model tends to match or outperform specialized binary models in FL detection. Subsequently, we evaluate the predictive capability of joint FL features towards the AA task on three datasets, observing improved AA performance through the integration of MFLM embeddings.
Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks
Niu, Peizhi, Pan, Chao, Chen, Siheng, Milenkovic, Olgica
Graph neural networks (GNNs) have become instrumental in diverse real-world applications, offering powerful graph learning capabilities for tasks such as social networks and medical data analysis. Despite their successes, GNNs are vulnerable to adversarial attacks, including membership inference attacks (MIA), which threaten privacy by identifying whether a record was part of the model's training data. While existing research has explored MIA in GNNs under graph inductive learning settings, the more common and challenging graph transductive learning setting remains understudied in this context. This paper addresses this gap and proposes an effective two-stage defense, Graph Transductive Defense (GTD), tailored to graph transductive learning characteristics. The gist of our approach is a combination of a train-test alternate training schedule and flattening strategy, which successfully reduces the difference between the training and testing loss distributions. Extensive empirical results demonstrate the superior performance of our method (a decrease in attack AUROC by $9.42\%$ and an increase in utility performance by $18.08\%$ on average compared to LBP), highlighting its potential for seamless integration into various classification models with minimal overhead.
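The link between the gap in training and testing loss distributions and the attack AUROC can be illustrated with a generic loss-threshold membership inference attack. This is a minimal sketch, not GTD and not necessarily the attack evaluated in the paper. The function name and the loss values are hypothetical. The attacker scores a record as "member" when its loss is low, and the AUROC is the probability that a random training record has a lower loss than a random held-out record.

```python
def attack_auroc(member_losses, nonmember_losses):
    """AUROC of an attack that ranks records by low loss (low loss = likely member).

    Computed as the fraction of (member, non-member) pairs in which the member
    has the smaller loss, counting ties as half a win.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))

if __name__ == "__main__":
    # Hypothetical per-node losses: an overfit model separates the two groups.
    overfit_train = [0.1, 0.2, 0.3]
    overfit_test = [0.9, 1.1, 1.4]
    print(attack_auroc(overfit_train, overfit_test))

    # Hypothetical losses after a defense has brought the distributions closer.
    defended_train = [0.8, 1.0, 1.2]
    defended_test = [0.9, 1.1, 1.4]
    print(attack_auroc(defended_train, defended_test))
```

An AUROC near 0.5 means the attacker does no better than chance, which is why a defense that makes the two loss distributions overlap lowers the attack's success.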
A deep cut into Split Federated Self-supervised Learning
Przewięźlikowski, Marcin, Osial, Marcin, Zieliński, Bartosz, Śmieja, Marek
Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for network division at the initial layers, which decreases the protection of the client data and increases communication overhead. In this paper, we demonstrate that splitting depth is crucial for maintaining privacy and communication efficiency in distributed training. We also show that MocoSFL suffers from a catastrophic quality deterioration at the minimal communication overhead. As a remedy, we introduce Momentum-Aligned contrastive Split Federated Learning (MonAcoSFL), which aligns online and momentum client models during the training procedure. Consequently, we achieve state-of-the-art accuracy while significantly reducing the communication overhead, making MonAcoSFL more practical in real-world scenarios.
SEGAN: semi-supervised learning approach for missing data imputation
Pan, Xiaohua, Wu, Weifeng, Liu, Peiran, Li, Zhen, Lu, Peng, Cao, Peijian, Zhang, Jianfeng, Qiu, Xianfei, Wu, YangYang
In many practical real-world applications, missing data is a very common phenomenon, making the development of data-driven artificial intelligence theory and technology increasingly difficult. Data completion is an important method for missing data preprocessing. Most existing missing data completion models directly use the known information in the incomplete data set but ignore the impact of the label information contained in the data set on the completion model. To this end, this paper proposes SEGAN, a missing data completion model based on semi-supervised learning, which comprises three main modules: a generator, a discriminator and a classifier. In the SEGAN model, the classifier enables the generator to make fuller use of known data and its label information when predicting missing data values. In addition, the SEGAN model introduces a missing hint matrix to allow the discriminator to more effectively distinguish between known data and data filled in by the generator. This paper theoretically proves that the SEGAN model, with its classifier and missing hint matrix, can learn the distribution of the real known data when reaching Nash equilibrium. Finally, extensive experiments show that, compared with the current state-of-the-art multivariate data completion method, the performance of the SEGAN model is improved by more than 3%.
Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training
Xie, Ming-Kun, Xiao, Jia-Hao, Peng, Pei, Niu, Gang, Sugiyama, Masashi, Huang, Sheng-Jun
The key to multi-label image classification (MLC) is to improve model performance by leveraging label correlations. Unfortunately, it has been shown that overemphasizing co-occurrence relationships can cause the overfitting issue of the model, ultimately leading to performance degradation. In this paper, we provide a causal inference framework to show that the correlative features caused by the target object and its co-occurring objects can be regarded as a mediator, which has both positive and negative impacts on model predictions. On the positive side, the mediator enhances the recognition performance of the model by capturing co-occurrence relationships; on the negative side, it has the harmful causal effect that causes the model to make an incorrect prediction for the target object, even when only co-occurring objects are present in an image. To address this problem, we propose a counterfactual reasoning method to measure the total direct effect, achieved by enhancing the direct effect caused only by the target object. Due to the unknown location of the target object, we propose patching-based training and inference to accomplish this goal, which divides an image into multiple patches and identifies the pivot patch that contains the target object. Experimental results on multiple benchmark datasets with diverse configurations validate that the proposed method can achieve state-of-the-art performance.
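The patching step described above (divide an image into patches, then identify the pivot patch) can be sketched as follows. This is only an illustration of the splitting and selection mechanics, not the authors' training or inference procedure. The image is a plain nested list, and `score` is a hypothetical stand-in for whatever per-patch target-object score a model would provide.

```python
def split_into_patches(image, patch_h, patch_w):
    """Split a 2D image (list of rows) into non-overlapping patches, row-major order."""
    patches = []
    for top in range(0, len(image), patch_h):
        for left in range(0, len(image[0]), patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches

def pivot_patch_index(patches, score):
    """Return the index of the patch with the highest score.

    `score` is a hypothetical callable mapping a patch to a number, standing in
    for a model's confidence that the patch contains the target object.
    """
    return max(range(len(patches)), key=lambda i: score(patches[i]))

if __name__ == "__main__":
    image = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 9, 0, 0],
        [0, 9, 0, 0],
    ]
    patches = split_into_patches(image, 2, 2)
    # Toy score: total intensity inside the patch.
    total = lambda patch: sum(sum(row) for row in patch)
    print(pivot_patch_index(patches, total))
```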
Sustainable self-supervised learning for speech representations
Lugo, Luis, Vielzeuf, Valentin
Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive, generating environmental concerns because of their high energy consumption. Thus, we propose a sustainable self-supervised model to learn speech representation, combining optimizations in neural layers and training to reduce computing costs. The proposed model improves over a resource-efficient baseline, reducing both memory usage and computing cost estimations. It pretrains using a single GPU in less than a day. On top of that, it improves the error rate performance of the baseline in downstream task evaluations. When comparing it to large speech representation approaches, there is an order of magnitude reduction in memory usage, while computing cost reductions represent almost three orders of magnitude improvement.
Arbitrary-Length Generalization for Addition in a Tiny Transformer
The Transformer architecture, as introduced by Vaswani et al. (2017), appears sufficiently robust to learn how to generalize addition, a fundamental operation (a+b=c) taught in elementary school. However, Nogueira et al. (2021) demonstrated that Transformers struggle to generalize this simple procedure effectively. Although some researchers have explored the use of both simplified and complex scratchpads to aid in training Transformers (Nye et al., 2021; Lee et al., 2024), they have not achieved generalization to numbers with arbitrary digit lengths. Recently, McLeish et al. (2024) argue that, by integrating an embedding for each digit that encodes its position relative to the start of the number, it is possible to train Transformers on 20-digit numbers and achieve approximately 99% accuracy on addition problems involving up to 100 digits. However, the authors do not study the accuracy for numbers exceeding 100 digits, which leaves an open question about the scalability of this approach to even larger numbers. This gap presents a significant opportunity for future research to explore the limits of Transformer generalization in arithmetic operations. I would like to thank Fernanda Cristiane de Oliveira for helping me to make parts of this work clearer.
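The idea of giving each digit a position "relative to the start of the number" can be illustrated by computing the index that such an embedding would be looked up with. This is a minimal sketch of the indexing only. It does not reproduce the embedding or training setup of McLeish et al. (2024), and the function name and the convention of using 0 for non-digit characters are assumptions made for illustration.

```python
def digit_positions(text):
    """For each character, return its 1-based position within its number.

    Non-digit characters (operators, '=', spaces) get 0 and reset the counter,
    so every number in the string starts counting again from 1.
    """
    positions = []
    current = 0
    for ch in text:
        if ch.isdigit():
            current += 1
        else:
            current = 0
        positions.append(current)
    return positions

if __name__ == "__main__":
    print(digit_positions("123+45=168"))
    # → [1, 2, 3, 0, 1, 2, 0, 1, 2, 3]
```

Because the index depends only on where a digit sits inside its own number, digits of equal index in the two operands receive the same position signal regardless of where the numbers appear in the full sequence.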
Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems
Vlachos, Christos, Stafylakis, Themos, Androutsopoulos, Ion
Creating effective and reliable task-oriented dialog systems (ToDSs) is challenging, not only because of the complex structure of these systems, but also due to the scarcity of training data, especially when several modules need to be trained separately, each one with its own input/output training examples. Data augmentation (DA), whereby synthetic training examples are added to the training data, has been successful in other NLP systems, but has not been explored as extensively in ToDSs. We empirically evaluate the effectiveness of DA methods in an end-to-end ToDS setting, where a single system is trained to handle all processing stages, from user inputs to system outputs. We experiment with two ToDSs (UBAR, GALAXY) on two datasets (MultiWOZ, KVRET). We consider three types of DA methods (word-level, sentence-level, dialog-level), comparing eight DA methods that have shown promising results in ToDSs and other NLP systems. We show that all DA methods considered are beneficial, and we highlight the best ones, also providing advice to practitioners. We also introduce a more challenging few-shot cross-domain ToDS setting, reaching similar conclusions.