 Inductive Learning


Improved architectures and training algorithms for deep operator networks

arXiv.org Machine Learning

Operator learning techniques have recently emerged as a powerful tool for learning maps between infinite-dimensional Banach spaces. Trained under appropriate constraints, they can also be effective in learning the solution operator of partial differential equations (PDEs) in an entirely self-supervised manner. In this work we analyze the training dynamics of deep operator networks (DeepONets) through the lens of Neural Tangent Kernel (NTK) theory, and reveal a bias that favors the approximation of functions with larger magnitudes. To correct this bias we propose to adaptively re-weight the importance of each training example, and demonstrate how this procedure can effectively balance the magnitude of back-propagated gradients during training via gradient descent. We also propose a novel network architecture that is more resilient to vanishing gradient pathologies. Taken together, our developments provide new insights into the training of DeepONets and consistently improve their predictive accuracy by a factor of 10-50x, demonstrated in the challenging setting of learning PDE solution operators in the absence of paired input-output observations. All code and data accompanying this manuscript are publicly available at https://github.com/PredictiveIntelligenceLab/ImprovedDeepONets.


Google AI Introduces FLAN, A Language Model with Instruction Fine-Tuning

#artificialintelligence

Google AI recently introduced a new Natural Language Processing (NLP) model, Fine-tuned LAnguage Net (FLAN), which explores a simple technique called instruction fine-tuning, or instruction tuning for short. In general, fine-tuning requires a large number of training examples, along with stored model weights for each downstream task, which is not always practical, particularly for large models. FLAN's instruction fine-tuning technique involves fine-tuning a model not to solve a specific task, but to make it more amenable to solving NLP tasks in general. FLAN is fine-tuned on a large set of varied instructions that use a simple and intuitive description of the task, such as "Classify this movie review as positive or negative," or "Translate this sentence to Danish." Creating a dataset of instructions from scratch to fine-tune the model would take a considerable amount of resources.
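As a rough illustration of what an instruction-formatted training example might look like, the sketch below wraps a labeled example in a natural-language instruction. The template wording, the `TEMPLATES` dictionary, and the `to_instruction_example` helper are hypothetical names chosen for this example; they are not FLAN's actual templates or code.

```python
# Hypothetical instruction templates (illustrative wording, not FLAN's own).
TEMPLATES = {
    "sentiment": (
        "Classify this movie review as positive or negative.\n\n"
        "Review: {text}\nAnswer:"
    ),
    "translate_da": (
        "Translate this sentence to Danish.\n\n"
        "Sentence: {text}\nTranslation:"
    ),
}


def to_instruction_example(task, text, target):
    """Wrap one labeled example in a natural-language instruction."""
    prompt = TEMPLATES[task].format(text=text)
    return {"input": prompt, "target": target}


example = to_instruction_example("sentiment", "A warm, funny film.", "positive")
print(example["input"])
print(example["target"])
```

Under this framing, many different tasks share one input/target format, so a single model can be fine-tuned on all of them together.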


Automated Feature-Specific Tree Species Identification from Natural Images using Deep Semi-Supervised Learning

arXiv.org Machine Learning

Prior work on plant species classification predominantly focuses on building models from isolated plant attributes. Hence, there is a need for tools that can assist in species identification in the natural world. We present a novel and robust two-fold approach capable of identifying trees in a real-world natural setting. Further, we leverage unlabelled data through deep semi-supervised learning and demonstrate superior performance to supervised learning. Our single-GPU implementation for feature recognition uses minimal annotated data and achieves accuracies of 93.96% and 93.11% for leaves and bark, respectively. Further, we extract feature-specific datasets of 50 species by employing this technique. Finally, our semi-supervised species classification method attains 94.04% top-5 accuracy for leaves and 83.04% top-5 accuracy for bark.
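For readers unfamiliar with the top-5 metric quoted above, here is a minimal sketch of how top-k accuracy is commonly computed: a prediction counts as correct when the true class is among the k highest-scoring classes. The function name and the toy scores are assumptions for illustration, not the authors' evaluation code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for two examples over three classes (made-up numbers).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 1]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```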


What is Hybrid Natural Language Understanding?

#artificialintelligence

We find language in everything from emails to videos to business documents and beyond. However, as pervasive as language data is to the enterprise, organizations struggle to maximize its value. Not only is there an incredible amount of language data available to and contained within organizations, but its volume is also increasing exponentially. There is no ignoring the importance of language to the enterprise ecosystem. Organizations are listening, as 42% have already adopted natural language processing (NLP) systems while 26% plan to within the next year, according to IBM's Global AI Adoption Index 2021.


How AI is helping improve the healthcare experience -- three use cases

#artificialintelligence

The amount of unstructured and structured data being generated within healthcare has increased significantly due to factors such as an aging population and the rise of telehealth -- virtual consultations -- as a method of delivering healthcare. This has only increased during the pandemic. In this article, we explore, through various use cases, how healthcare organisations can leverage the increasing amount of data available using artificial intelligence (AI), machine learning (ML) and analytics to improve the patient care experience and drive operational efficiencies. Unstructured data in healthcare refers to anything from clinicians' handwritten notes to prescription forms and patient call center logs. This information is increasing in volume, and new ways of capturing and analyzing this data are needed.


SSFL: Tackling Label Deficiency in Federated Learning via Personalized Self-Supervision

arXiv.org Artificial Intelligence

Federated Learning (FL) is transforming the ML training ecosystem from a centralized over-the-cloud setting to distributed training over edge devices in order to strengthen data privacy. An essential but rarely studied challenge in FL is label deficiency at the edge. This problem is even more pronounced in FL compared to centralized training due to the fact that FL users are often reluctant to label their private data. Furthermore, due to the heterogeneous nature of the data at edge devices, it is crucial to develop personalized models. In this paper we propose self-supervised federated learning (SSFL), a unified self-supervised and personalized federated learning framework, and a series of algorithms under this framework which work towards addressing these challenges. First, under the SSFL framework, we demonstrate that the standard FedAvg algorithm is compatible with recent breakthroughs in centralized self-supervised learning such as SimSiam networks. Moreover, to deal with data heterogeneity at the edge devices in this framework, we have innovated a series of algorithms that broaden existing supervised personalization algorithms into the setting of self-supervised learning. We further propose a novel personalized federated self-supervised learning algorithm, Per-SSFL, which balances personalization and consensus by carefully regulating the distance between the local and global representations of data. To provide a comprehensive comparative analysis of all proposed algorithms, we also develop a distributed training system and related evaluation protocol for SSFL. Our findings show that the gap of evaluation accuracy between supervised learning and unsupervised learning in FL is both small and reasonable. The performance comparison indicates the representation regularization-based personalization method is able to outperform other variants.
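To make the aggregation step concrete, the sketch below shows the weighted parameter averaging at the core of standard FedAvg, together with one possible way to penalize the distance between local and global representations. The squared-distance penalty and the `lam` coefficient are assumptions made for illustration; the abstract does not specify the exact form of the Per-SSFL regularizer.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def distance_penalty(local_repr, global_repr, lam):
    """Assumed regularizer: lam times the squared L2 distance between
    a local representation and the corresponding global one."""
    return lam * sum((a - b) ** 2 for a, b in zip(local_repr, global_repr))


# Two clients holding 1 and 3 samples respectively (toy values).
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

A larger `lam` would pull each client's representation toward the global one (more consensus); a smaller value leaves more room for personalization.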


Co-training an Unsupervised Constituency Parser with Weak Supervision

arXiv.org Artificial Intelligence

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results. (For code or data, please contact the authors.)
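The two views used by the inside and outside classifiers can be pictured with a small sketch: given a candidate span, one view contains the tokens inside it and the other contains everything to its left and right. The `split_span` helper and the half-open `[start, end)` indexing are assumptions for illustration, not the authors' implementation.

```python
def split_span(tokens, start, end):
    """Split a sentence into the inside view (the span itself) and the
    outside view (left and right context) for the span [start, end)."""
    inside = tokens[start:end]
    left = tokens[:start]
    right = tokens[end:]
    return inside, left, right


tokens = "the cat sat on the mat".split()
inside, left, right = split_span(tokens, 3, 6)
print(inside)
print(left, right)
```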


The Power of Contrast for Feature Learning: A Theoretical Analysis

arXiv.org Machine Learning

Deep supervised learning has achieved great success in various applications, including computer vision (Krizhevsky et al., 2012), natural language processing (Devlin et al., 2018), and scientific computing (Han et al., 2018). However, its dependence on manually assigned labels, which are usually difficult and costly to obtain, has motivated research into alternative approaches that exploit unlabeled data. Self-supervised learning is a promising approach that leverages the unlabeled data itself as supervision and learns representations that are beneficial to potential downstream tasks. At a high level, there are two common approaches for feature extraction in self-supervised learning: generative and contrastive (Liu et al., 2021). Both approaches aim to learn latent representations of the original data; the difference is that the generative approach focuses on minimizing the reconstruction error from latent representations, whereas the contrastive approach aims to decrease the similarity between the representations of contrastive pairs. Recent works have shown the benefits of contrastive learning in practice (Chen et al., 2020a,b,c; He et al., 2020).


Hypernetworks for Continual Semi-Supervised Learning

arXiv.org Machine Learning

Learning from data sequentially arriving, possibly in a non-i.i.d. way, with changing task distribution over time is called continual learning. Much of the work thus far in continual learning focuses on supervised learning and some recent works on unsupervised learning. In many domains, each task contains a mix of labelled (typically very few) and unlabelled (typically plenty) training examples, which necessitates a semi-supervised learning approach. To address this in a continual learning setting, we propose a framework for semi-supervised continual learning called Meta-Consolidation for Continual Semi-Supervised Learning (MCSSL). Our framework has a hypernetwork that learns the meta-distribution that generates the weights of a semi-supervised auxiliary classifier generative adversarial network (Semi-ACGAN) as the base network. We consolidate the knowledge of sequential tasks in the hypernetwork, and the base network learns the semi-supervised learning task. Further, we present Semi-Split CIFAR-10, a new benchmark for continual semi-supervised learning, obtained by modifying the Split CIFAR-10 dataset, in which the tasks with labelled and unlabelled data arrive sequentially. Our proposed model yields significant improvements in the continual semi-supervised learning setting. We compare the performance of several existing continual learning approaches on the proposed continual semi-supervised learning benchmark of the Semi-Split CIFAR-10 dataset.


Inductive learning for product assortment graph completion

arXiv.org Artificial Intelligence

Global retailers have assortments that contain hundreds of thousands of products that can be linked by several types of relationships like style compatibility, "bought together", "watched together", etc. Graphs are a natural representation for assortments, where products are nodes and relations are edges. Relations like style compatibility are often produced by a manual process and therefore do not uniformly cover the whole graph. We propose to use inductive learning to enhance a graph encoding style compatibility of a fashion assortment, leveraging rich node information comprising textual descriptions and visual data. Then, we show how the proposed graph enhancement substantially improves the performance on transductive tasks with a minor impact on graph sparsity.
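The graph view of an assortment can be sketched in a few lines: products are nodes, and each relation type has its own set of undirected edges. The product names, relation labels, and the `coverage` helper below are hypothetical and only illustrate how a manually curated relation can leave parts of the graph uncovered.

```python
from collections import defaultdict


def build_graph(edges):
    """Build one undirected adjacency map per relation type."""
    graph = defaultdict(lambda: defaultdict(set))
    for u, v, relation in edges:
        graph[relation][u].add(v)
        graph[relation][v].add(u)
    return graph


def coverage(graph, relation, products):
    """Fraction of products with at least one edge of the given relation."""
    linked = sum(1 for p in products if graph[relation].get(p))
    return linked / len(products)


# Toy assortment (made-up products and relations).
products = ["shirt", "jeans", "belt", "scarf"]
edges = [
    ("shirt", "jeans", "style_compatibility"),
    ("jeans", "belt", "bought_together"),
]
graph = build_graph(edges)
print(coverage(graph, "style_compatibility", products))
```

Here only half of the products have a style-compatibility edge, which is the kind of gap that a link-completion method would aim to fill.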