

Inductive Learning


Mining Functionally Related Genes with Semi-Supervised Learning

arXiv.org Artificial Intelligence

The study of biological processes can greatly benefit from tools that automatically predict gene functions or directly cluster genes based on shared functionality. Existing data mining methods predict protein functionality by exploiting data obtained from high-throughput experiments or meta-scale information from public databases. Most existing prediction tools target protein functions that are described in the Gene Ontology (GO). However, in many cases biologists wish to discover functionally related genes for which GO terms are inadequate. In this paper, we introduce a rich set of features and use them in conjunction with semi-supervised learning approaches in order to expand an initial set of seed genes to a larger cluster of functionally related genes. Among all the semi-supervised methods that were evaluated, the framework of learning with positive and unlabeled examples (LPU) is shown to be especially appropriate for mining functionally related genes. When evaluated on experimentally validated benchmark data, the LPU approaches significantly outperform a standard supervised learning algorithm as well as an established state-of-the-art method. Given an initial set of seed genes, our best-performing approach could be used to mine functionally related genes in a wide range of organisms.
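To make the LPU setting concrete, here is a minimal sketch of one well-known positive-unlabeled technique, the Elkan and Noto (2008) rescaling approach. It is not necessarily one of the LPU methods evaluated in the paper. The method has three steps. First, train a classifier to separate labeled seed genes (s=1) from unlabeled genes (s=0). Second, estimate c = P(s=1 | y=1) as the mean score on held-out seeds. Third, divide the classifier's scores by c. The two-dimensional "gene features" are synthetic, and the helper names (`fit_logistic`, `elkan_noto_scores`) are hypothetical.

```python
import numpy as np

def fit_logistic(X, t, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression; returns weights incl. bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - t) / len(t)   # gradient of the mean log loss
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def elkan_noto_scores(X_seed, X_unlabeled, X_holdout_seed):
    """Elkan-Noto PU scoring: estimate P(y=1 | x) for every unlabeled gene."""
    # Step 1: separate labeled seed genes (s=1) from unlabeled genes (s=0).
    X = np.vstack([X_seed, X_unlabeled])
    s = np.concatenate([np.ones(len(X_seed)), np.zeros(len(X_unlabeled))])
    w = fit_logistic(X, s)
    # Step 2: c = P(s=1 | y=1), estimated as the mean score on held-out seeds.
    c = predict_proba(w, X_holdout_seed).mean()
    # Step 3: P(y=1 | x) = P(s=1 | x) / c, clipped to a valid probability.
    return np.clip(predict_proba(w, X_unlabeled) / c, 0.0, 1.0), c

# Synthetic features: related genes near (2, 2), unrelated genes near (-2, -2).
rng = np.random.default_rng(0)
related = rng.normal(2.0, 1.0, size=(100, 2))
unrelated = rng.normal(-2.0, 1.0, size=(100, 2))
seeds, holdout, hidden = related[:30], related[30:50], related[50:]
unlabeled = np.vstack([hidden, unrelated])   # first 50 rows are hidden positives

scores, c = elkan_noto_scores(seeds, unlabeled, holdout)
print(f"c = {c:.2f}; hidden={scores[:50].mean():.2f}, unrelated={scores[50:].mean():.2f}")
```

The unlabeled pool deliberately mixes hidden related genes with unrelated ones. This mirrors the seed-expansion setting, in which candidate genes are not known to be negatives.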


Fairness without Demographics through Adversarially Reweighted Learning

arXiv.org Machine Learning

Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore, we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.
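The adversarial reweighting idea can be sketched as a min-max objective. The learner minimises a weighted loss, and an adversary that sees only (x, y) chooses the weights to maximise it. The normalisation used below is our reading of the ARL formulation and should be treated as an assumption: weights are 1 + n·f/Σf for adversary outputs f in (0, 1). The "focused" adversary scores are computed directly from the losses for illustration. They are not a trained adversary network, and all names are hypothetical.

```python
import numpy as np

def arl_weights(adv_scores):
    """Turn adversary outputs f_phi(x, y) in (0, 1) into example weights.

    lambda_i = 1 + n * f_i / sum_j f_j: every example keeps weight >= 1 and the
    adversary can only shift an extra total weight of n toward examples it picks.
    """
    n = len(adv_scores)
    return 1.0 + n * adv_scores / adv_scores.sum()

def log_losses(p, y, eps=1e-12):
    """Per-example binary cross-entropy of predicted probabilities p."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def arl_objective(p, y, adv_scores):
    """Weighted loss: the learner minimises it, the adversary maximises it."""
    return np.mean(arl_weights(adv_scores) * log_losses(p, y))

p = np.array([0.9, 0.8, 0.3, 0.6])   # learner's predicted P(y=1)
y = np.array([1, 1, 1, 0])           # task labels
losses = log_losses(p, y)

uniform = np.full(4, 0.5)            # an adversary that flags no one
focused = 1.0 / (1.0 + np.exp(-5.0 * (losses - losses.mean())))  # flags high-loss examples
print(arl_objective(p, y, uniform), arl_objective(p, y, focused))
```

An adversary that upweights poorly served examples raises the objective. This is the pressure that pushes the learner toward the worst-off regions without ever seeing group labels.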


Simulating and classifying behavior in adversarial environments based on action-state traces: an application to money laundering

arXiv.org Artificial Intelligence

Many business applications involve adversarial relationships in which both sides adapt their strategies to optimize their opposing benefits. A key characteristic of these applications is the wide range of strategies an adversary may choose as it adapts dynamically to sustain benefits and evade authorities. In this paper, we present a novel way of approaching these types of applications, in particular in the context of Anti-Money Laundering. We provide a mechanism through which diverse, realistic, and new unobserved behavior may be generated to discover potential unobserved adversarial actions, enabling organizations to preemptively mitigate these risks. In this regard, we make three main contributions. (a) A novel behavior-based model, as opposed to the individual transaction-based models currently used by financial institutions. We introduce behavior traces as an enriched relational representation of observed human behavior. (b) A modelling approach that observes these traces and is able to accurately infer the goals of actors by classifying the behavior as money laundering or standard behavior despite significant unobserved activity. (c) A synthetic behavior simulator that can generate new, previously unseen traces. The simulator incorporates a high level of flexibility in the behavioral parameters so that we can challenge the detection algorithm. Finally, we provide experimental results showing that the learning module (automated investigator), which has only partial observability, can still successfully infer the type of behavior, and thus the simulated goals, followed by customers based on their traces: a key aspiration for many applications today.


Learning to Optimise General TSP Instances

arXiv.org Artificial Intelligence

The Travelling Salesman Problem (TSP) is a classical combinatorial optimisation problem. Deep learning has been successfully extended to meta-learning, where previous solving efforts assist in learning how to optimise future problem instances. In recent years, learning-to-optimise approaches have shown success in solving TSP problems. However, they focus on one type of TSP problem, namely instances whose points are uniformly distributed in a Euclidean space, and they have difficulty generalising to other embedding spaces, e.g., spherical distance spaces, and to TSP instances whose points are distributed non-uniformly. One aim of learning to optimise is to train once and solve across a broad spectrum of (TSP) problems. Supervised learning approaches have been shown to achieve better solutions than unsupervised approaches. However, they require generating training data and running a solver to obtain solutions to learn from. This can be time-consuming, and reasonable solutions are difficult to find for harder TSP instances. Hence, this paper introduces a new learning-based approach that solves a variety of common TSP problems while being trained on easier instances, which are faster to train on and for which better solutions are easier to obtain. We name this approach the non-Euclidean TSP network (NETSP-Net). The approach is evaluated on various TSP instances using the benchmark TSPLIB dataset and a popular instance generator from the literature. Extensive experiments indicate that our approach generalises across many types of instances and scales to instances larger than those used during training.
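The abstract contrasts Euclidean instances with spherical distance spaces. In both cases the tour objective is the same, and only the metric changes. The following sketch (hypothetical helper names) scores a closed tour under either a Euclidean metric or a great-circle (haversine) metric on (latitude, longitude) points. The Earth radius is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for the spherical metric

def euclidean(a, b):
    return math.dist(a, b)

def great_circle(a, b, radius=EARTH_RADIUS_KM):
    """Haversine distance between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))

def tour_length(points, order, dist):
    """Length of the closed tour that visits points in the given order."""
    return sum(dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tour_length(square, [0, 1, 2, 3], euclidean))   # the perimeter
print(tour_length(square, [0, 2, 1, 3], euclidean))   # crosses both diagonals

equator = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0), (0.0, 270.0)]  # (lat, lon)
print(tour_length(equator, [0, 1, 2, 3], great_circle))  # one full circumference
```

A model trained only on the Euclidean case implicitly learns the first metric. Passing `great_circle` instead changes which tours are short, which is one source of the generalisation gap described above.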


Hybrid method of Data Envelopment Analysis with Supervised Learning

#artificialintelligence

I look forward to any suggestions regarding my paper on convenience store performance measurement. Background problem: Convenience stores have recently become a popular place for Indonesians to shop for daily necessities. This trend has increased the number of convenience stores and encourages management to improve performance in order to face tight business competition. Meanwhile, the performance of convenience stores is actually determined by the efficiency of various product categories. In this context, benchmarking through Data Envelopment Analysis (DEA) is one of the well-known methods for measuring a company's efficiency, and it can be used to measure firm performance. However, DEA has limitations in handling large amounts of data, and supervised learning techniques can be used as an alternative method to overcome them.
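For readers unfamiliar with DEA, here is a minimal sketch of the input-oriented CCR (constant returns to scale) efficiency score. It covers only the special case of one input and one output per store. In that case the CCR linear program reduces to each unit's output/input ratio divided by the best ratio in the sample. The general multi-input, multi-output case requires solving one linear program per unit, and the supervised-learning hybrid from the post is not shown. The store names, inputs, and outputs are hypothetical.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR DEA efficiency for one input and one output per decision-making unit.

    Each unit's output/input ratio is divided by the best ratio in the sample,
    so efficient units score 1.0 and the rest score below 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

stores = ["A", "B", "C"]              # hypothetical stores
floor_area = [100.0, 200.0, 150.0]    # input (hypothetical units)
sales = [50.0, 80.0, 75.0]            # output (hypothetical units)
scores = ccr_efficiency_single(floor_area, sales)
for name, score in zip(stores, scores):
    print(name, round(score, 3))
```

Stores A and C lie on the efficient frontier, and store B is scored relative to them. With many product categories as inputs and outputs, one linear program per store is needed. This per-store cost is the scaling issue the post proposes to address with supervised learning.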


Instance based Generalization in Reinforcement Learning

arXiv.org Machine Learning

Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs) and formalize the dynamics of training levels as instances. We prove that, independently of the exploration strategy, reusing instances introduces significant changes to the effective Markov dynamics the agent observes during training. Maximizing expected rewards impacts the learned belief state of the agent by inducing undesired, instance-specific speed-running policies instead of generalizable ones, which are sub-optimal on the training set. We provide generalization bounds on the value gap between train and test environments based on the number of training instances, and use insights based on these bounds to improve performance on unseen levels. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance-specific exploitation. We experimentally validate our theory, observations, and the proposed computational solution on the CoinRun benchmark.


Learning Output Embeddings in Structured Prediction

arXiv.org Machine Learning

A powerful and flexible approach to structured prediction consists of embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then solving a regression problem in this output space. A prediction in the original space is computed by solving a pre-image problem. In such an approach, the embedding, linked to the target loss, is defined prior to the learning phase. In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space. For that purpose, we leverage a priori information on the outputs as well as unexploited unsupervised output data, both of which are often available in structured prediction problems. We prove that the resulting structured predictor is a consistent estimator and derive an excess risk bound. Moreover, the novel structured prediction tool has a significantly smaller computational complexity than former output kernel methods. Empirical tests on various structured prediction problems show that the approach is versatile and able to handle large datasets.
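The baseline pipeline described above has three stages: a fixed output kernel, ridge regression in the output feature space, and a pre-image search. It can be sketched as follows. This is the standard output-kernel approach that the abstract starts from, not the paper's learned finite embedding. Assumptions: Gaussian kernels on both sides, a candidate set equal to the training outputs, toy 2-D vectors standing in for structured objects, and hypothetical helper names.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_okr(X, lam=1e-4, gamma_x=10.0):
    """Return the regularised input Gram matrix K + n * lam * I."""
    n = len(X)
    return gaussian_gram(X, X, gamma_x) + n * lam * np.eye(n)

def predict_okr(x, X, Y, candidates, G, gamma_x=10.0, gamma_y=10.0):
    """Ridge regression in output feature space followed by a pre-image search.

    The estimate is sum_i alpha_i psi(y_i) with alpha = G^{-1} k_x. The pre-image
    is the candidate minimising ||psi(y) - sum_i alpha_i psi(y_i)||^2, which up to
    a constant equals k(y, y) - 2 * sum_i alpha_i k(y, y_i).
    """
    alpha = np.linalg.solve(G, gaussian_gram(X, x[None, :], gamma_x)[:, 0])
    cross = gaussian_gram(candidates, Y, gamma_y)   # k(y_c, y_i)
    self_sim = np.ones(len(candidates))             # k(y, y) = 1 for a Gaussian kernel
    return candidates[np.argmin(self_sim - 2.0 * cross @ alpha)]

# Toy outputs: points on a circular arc stand in for structured objects.
X = np.linspace(0.0, 3.0, 31).reshape(-1, 1)
Y = np.hstack([np.cos(X), np.sin(X)])
G = fit_okr(X)
pred = predict_okr(np.array([1.5]), X, Y, candidates=Y, G=G)
print(pred, "vs true", np.cos(1.5), np.sin(1.5))
```

Here the output kernel is fixed before learning, which is exactly the restriction the paper relaxes by learning a finite approximation of the embedding.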


Emergent Communication Pretraining for Few-Shot Machine Translation

arXiv.org Artificial Intelligence

While state-of-the-art models that rely upon massively multilingual pretrained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. However, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, for the first time we pretrain neural networks via emergent communication from referential games. Our key assumption is that grounding communication on images (as a crude approximation of real-world environments) inductively biases the model towards learning natural languages. On the one hand, we show that this substantially benefits machine translation in few-shot settings. On the other hand, it also provides an extrinsic evaluation protocol to probe the properties of emergent languages ex vitro. Intuitively, the closer these languages are to natural languages, the higher the gains from pretraining on them should be. For instance, in this work we measure the influence of communication success and maximum sequence length on downstream performance. Finally, we introduce a customised adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. These turn out to be crucial to facilitate knowledge transfer and prevent catastrophic forgetting. Compared to a recurrent baseline, our method yields gains of $59.0\%$ to $147.6\%$ in BLEU score with only $500$ NMT training instances and $65.1\%$ to $196.7\%$ with $1,000$ NMT training instances across four language pairs. These proof-of-concept results reveal the potential of emergent communication pretraining both for natural language processing tasks in resource-poor settings and for extrinsic evaluation of artificial languages.


MixKD: Towards Efficient Distillation of Large-scale Language Models

arXiv.org Machine Learning

Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results come at the price of bigger models, more power consumption, and slower inference, which hinder their applicability to low-resource (memory and computation) platforms. Knowledge distillation (KD) has been demonstrated to be an effective framework for compressing such big models. However, large-scale neural network systems are prone to memorizing training instances, and thus tend to make inconsistent predictions when the data distribution is altered slightly. Moreover, the student model has few opportunities to request useful information from the teacher model when limited task-specific data are available. To address these issues, we propose MixKD, a data-agnostic distillation framework that leverages mixup, a simple yet efficient data augmentation approach, to endow the resulting model with stronger generalization ability. Concretely, in addition to the original training examples, the student model is encouraged to mimic the teacher's behavior on linear interpolations of example pairs as well. We prove, from a theoretical perspective, that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error. To verify its effectiveness, we conduct experiments on the GLUE benchmark, where MixKD consistently leads to significant gains over standard KD training and outperforms several competitive baselines. Experiments in a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
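The abstract states that the student mimics the teacher both on the original examples and on linear interpolations of example pairs. A minimal sketch of such an objective follows. Several choices are assumptions, not the paper's exact setup: interpolation is applied to fixed feature vectors (standing in for pooled text embeddings), the mixing coefficient is drawn from Beta(alpha, alpha) as in standard mixup, the loss is temperature-softened KL, and the teacher and student are random linear maps. All names are hypothetical.

```python
import numpy as np

def log_softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)            # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) between temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1))

def mixup(X, rng, alpha=0.4):
    """Interpolate each example with a random partner, lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    partner = rng.permutation(len(X))
    return lam * X + (1.0 - lam) * X[partner]

def mixkd_objective(X, teacher, student, rng, temperature=2.0):
    """Distillation loss on the original batch plus on its mixup interpolations."""
    X_mix = mixup(X, rng)
    return (kd_loss(teacher(X), student(X), temperature)
            + kd_loss(teacher(X_mix), student(X_mix), temperature))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))           # stand-in for pooled sentence embeddings
W_teacher = rng.normal(size=(4, 3))   # hypothetical linear teacher (3 classes)
W_student = rng.normal(size=(4, 3))   # hypothetical linear student

def teacher(A):
    return A @ W_teacher

def student(A):
    return A @ W_student

print(mixkd_objective(X, teacher, student, rng))
```

The mixed batch gives the student extra points at which to query the teacher. This is how the abstract frames the benefit when task-specific data are scarce.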


Pointwise Binary Classification with Pairwise Confidence Comparisons

arXiv.org Machine Learning

Ordinary (pointwise) binary classification aims to learn a binary classifier from pointwise labeled data. However, such pointwise labels may not be directly accessible due to privacy, confidentiality, or security considerations. In this case, can we still learn an accurate binary classifier? This paper proposes a novel setting, namely pairwise comparison (Pcomp) classification. In this setting, instead of pointwise labeled data, we are given only pairs of unlabeled data for which we know that one instance is more likely to be positive than the other. Pcomp classification is useful for private or subjective classification tasks. To solve this problem, we present a mathematical formulation for the generation process of pairwise comparison data. Based on this formulation, we exploit an unbiased risk estimator (URE) to train a binary classifier by empirical risk minimization and establish an estimation error bound. We first prove that a URE can be derived and then improve it using correction functions. Next, we start from the noisy-label learning perspective to introduce a progressive URE and improve it by imposing consistency regularization. Finally, experiments validate the effectiveness of our proposed solutions for Pcomp classification.