
### See, Hear, Explore: curiosity via audio-visual association

To compute audio features, we take an audio clip spanning 4 time steps (1/15th of a second for these 60-frames-per-second environments) and apply a Fast Fourier Transform (FFT). The FFT output is downsampled with max pooling to a 512-dimensional feature vector, which is fed to the discriminator together with a 512-dimensional visual feature vector.
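A minimal NumPy sketch of this audio pipeline follows. The sample rate (44.1 kHz), the use of the FFT magnitude, contiguous pooling windows produced by `np.array_split`, and plain concatenation with the visual vector are all assumptions for illustration; the text does not specify them.

```python
import numpy as np

def audio_features(clip: np.ndarray, out_dim: int = 512) -> np.ndarray:
    """FFT magnitude of a mono clip, max-pooled down to `out_dim` bins (sketch)."""
    spectrum = np.abs(np.fft.rfft(clip))  # one-sided magnitude spectrum (assumption)
    if spectrum.size < out_dim:
        raise ValueError("clip too short for the requested feature size")
    # Split the frequency bins into `out_dim` contiguous groups and keep each group's max.
    pooled = [chunk.max() for chunk in np.array_split(spectrum, out_dim)]
    return np.asarray(pooled)

# Assumed sample rate: 4 frames at 60 fps correspond to 1/15 s of audio.
SAMPLE_RATE = 44_100
clip = np.random.default_rng(0).standard_normal(SAMPLE_RATE // 15)  # 2940 samples
audio_vec = audio_features(clip)                       # shape (512,)
visual_vec = np.zeros(512)                             # placeholder visual features
disc_input = np.concatenate([audio_vec, visual_vec])   # hypothetical discriminator input, shape (1024,)
```

Max pooling keeps the strongest response in each frequency band, so short tonal events survive the downsampling even when most bins in a band are quiet.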

### Probing Pre-Trained Language Models for Disease Knowledge

Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as mapping symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To better understand the medical reasoning capabilities of existing language models, in this paper we introduce DisKnE, a new benchmark for Disease Knowledge Evaluation. To construct this benchmark, we annotated each positive MedNLI example with the types of medical reasoning that are needed. We then created negative examples by corrupting these positive examples in an adversarial way. Furthermore, we define training-test splits per disease, ensuring that no knowledge about test diseases can be learned from the training data, and we canonicalize the formulation of the hypotheses to avoid the presence of artefacts. This leads to a number of binary classification problems, one for each type of reasoning and each disease. When analysing pre-trained models for the clinical/biomedical domain on the proposed benchmark, we find that their performance drops considerably.
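The per-disease split can be sketched as a leave-one-disease-out loop. The example schema (an `id` and a set of `diseases` per example) is hypothetical, and the benchmark's actual filtering may be stricter (for instance, also excluding closely related diseases); this only shows the basic guarantee that a test disease never appears in its training split.

```python
from typing import Any, Dict, Iterator, List, Set, Tuple

Example = Dict[str, Any]  # hypothetical schema: {"id", "label", "diseases": set of names}

def per_disease_splits(examples: List[Example]) -> Iterator[Tuple[str, List[Example], List[Example]]]:
    """Yield (disease, train, test): test holds every example mentioning `disease`,
    train holds only examples that never mention it."""
    all_diseases: Set[str] = set()
    for ex in examples:
        all_diseases.update(ex["diseases"])
    for disease in sorted(all_diseases):
        test = [ex for ex in examples if disease in ex["diseases"]]
        train = [ex for ex in examples if disease not in ex["diseases"]]
        yield disease, train, test

data = [
    {"id": 0, "label": 1, "diseases": {"sepsis"}},
    {"id": 1, "label": 0, "diseases": {"sepsis", "pneumonia"}},
    {"id": 2, "label": 1, "diseases": {"diabetes"}},
]
splits = {d: (train, test) for d, train, test in per_disease_splits(data)}
```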

We focus on the problem of domain adaptation when the goal is to shift the model towards the target distribution, rather than to learn domain-invariant representations. It has been shown that, under two assumptions, (a) access to samples from intermediate distributions and (b) samples annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution. We hypothesize that (a) alone is enough to enable iterative self-training to slowly adapt the model to the target distribution, by making use of an implicit curriculum. When (a) does not hold, we observe that iterative self-training falls short. We propose GIFT, a method that creates virtual samples from intermediate distributions by interpolating representations of examples from the source and target domains. We evaluate iterative self-training on datasets with natural distribution shifts and show that, when applied on top of other domain adaptation methods, it improves the model's performance on the target dataset. An analysis on a synthetic dataset shows that, in the presence of (a), iterative self-training naturally forms a curriculum of samples. Furthermore, we show that when (a) does not hold, GIFT outperforms iterative self-training.
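The interpolation step behind GIFT can be sketched as a convex combination of source and target representations with a mixing coefficient that grows over self-training iterations. Random source/target pairing and the linear schedule are assumptions here, and the surrounding loop (pseudo-labeling each virtual batch with the current model, then retraining on it) is not shown.

```python
import numpy as np

def gift_virtual_samples(z_source, z_target, lam, rng):
    """Virtual intermediate samples: mix each target representation with a
    randomly paired source representation (pairing strategy is an assumption)."""
    pair = rng.integers(0, len(z_source), size=len(z_target))
    return (1.0 - lam) * z_source[pair] + lam * z_target

rng = np.random.default_rng(0)
z_src = rng.standard_normal((100, 16))
z_tgt = rng.standard_normal((80, 16)) + 3.0      # shifted target domain
schedule = np.linspace(0.0, 1.0, 6)[1:]          # lam = 0.2, 0.4, ..., 1.0 (hypothetical schedule)
virtual_batches = [gift_virtual_samples(z_src, z_tgt, lam, rng) for lam in schedule]
```

Each successive batch sits closer to the target domain, so a model pseudo-labeling them in order is adapted gradually rather than in one jump.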

### Multi-output Gaussian Processes for Uncertainty-aware Recommender Systems

Recommender systems are often designed based on a collaborative filtering approach, where user preferences are predicted by modelling interactions between users and items. Many common approaches to solve the collaborative filtering task are based on learning representations of users and items, including simple matrix factorization, Gaussian process latent variable models, and neural-network-based embeddings. While matrix factorization approaches fail to model nonlinear relations, neural networks can potentially capture such complex relations with unprecedented predictive power and are highly scalable. However, neither of them is able to model predictive uncertainties. In contrast, Gaussian Process based models can generate a predictive distribution, but cannot scale to large amounts of data.

A database describing such user-item interactions often takes the form of a matrix, where each entry describes the interaction between one user and one item. The overall rating or purchasing pattern of a user can therefore be described by the corresponding row in such a matrix. However, since there are typically large numbers of users and items in the database, and each user is usually only interested in a small subset of items, this user-item matrix is often large and sparse. It is therefore inefficient to define the similarity between users in the high-dimensional feature space defined by all items. Instead, it is more advantageous to derive abstract feature vectors that represent users and items, which inspired a large variety of low-rank matrix decomposition models such as non-negative matrix decomposition [Zhang et al., 2006], biased matrix decomposition [Koren et al., 2009] and non-parametric decomposition [Yu et al., 2009]. These methods aim at learning low-dimensional representations for all users and items, allowing for the prediction of the unobserved interaction between a new pair of user and item.
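To make the uncertainty-versus-scalability trade-off concrete, here is a minimal single-output Gaussian process regression sketch in NumPy. It is not the paper's multi-output model; the RBF kernel, noise level, and toy 1-D inputs are assumptions. It shows both halves of the argument: the model returns a predictive variance, and the Cholesky factorization costs O(n³) in the number of observations.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=0.1):
    """Exact GP regression: predictive mean and variance at X_test."""
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    L = np.linalg.cholesky(K)            # O(n^3) time, O(n^2) memory: the scaling bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var

X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X[:, 0])
mean, var = gp_predict(X, y, np.array([[1.0], [10.0]]))
```

Near the training data the predictive variance is small; far from it the variance reverts to the prior variance (1.0 here), which is exactly the signal a point-estimate factorization cannot provide.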

### DAMSL: Domain Agnostic Meta Score-based Learning

In this paper, we propose Domain Agnostic Meta Score-based Learning (DAMSL), a novel and versatile method that significantly outperforms state-of-the-art methods for cross-domain few-shot learning (CD-FSL). We identify two key problems in prior work: meta-learning methods over-fit to the source domain, and transfer-learning methods under-utilize the structure of the support set. The core idea behind our method is that, instead of directly using the scores from a fine-tuned feature encoder, we use these scores to create input coordinates for a domain-agnostic metric space. A graph neural network is then applied to learn an embedding and relation function over these coordinates, processing all information contained in the score distribution of the support set. We test our model on both established CD-FSL benchmarks and new domains and show that our method overcomes the limitations of previous meta-learning and transfer-learning methods, delivering substantial accuracy improvements across both smaller and larger domain shifts.
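A rough NumPy sketch of the "scores as coordinates" idea: encoder features are mapped to per-class score vectors by a classifier head, and one message-passing layer runs over a fully connected graph of support and query nodes. The head weights, the Gaussian edge weights, the mean-distance temperature, and the single layer are all hypothetical choices, not DAMSL's actual architecture.

```python
import numpy as np

def score_coordinates(features, classifier_weights):
    """Map encoder features to score vectors (one logit per source class)."""
    return features @ classifier_weights.T

def gnn_layer(nodes, W, temperature=None):
    """One message-passing step over a fully connected graph of score vectors."""
    diff = nodes[:, None, :] - nodes[None, :, :]
    d2 = np.sum(diff**2, axis=-1)                 # pairwise squared distances
    if temperature is None:
        temperature = d2.mean()                   # heuristic scale (assumption)
    adj = np.exp(-d2 / temperature)               # similarity-based edge weights
    adj /= adj.sum(axis=1, keepdims=True)         # each row sums to 1
    return np.maximum(adj @ nodes @ W, 0.0)       # aggregate neighbours, project, ReLU

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 32))     # 5 support examples (5-way 1-shot) + 1 query
clf_w = rng.standard_normal((64, 32))    # hypothetical fine-tuned head over 64 source classes
coords = score_coordinates(feats, clf_w)                    # (6, 64) score coordinates
hidden = gnn_layer(coords, rng.standard_normal((64, 16)))   # (6, 16) node embeddings
```

Because the graph operates on score vectors rather than raw features, its input space is defined by the source classifier and does not depend on the target domain's feature statistics.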

### Statistical embedding: Beyond principal components

There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. The first part covers nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and kernel-based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data set with tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension, making it amenable to traditional techniques such as clustering and classification. The final part of the survey deals with embedding in $\mathbb{R}^2$, i.e., visualization. Three methods are presented, $t$-SNE, UMAP and LargeVis, based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets: one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity with two types of nodes.
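As a concrete reference point for the multidimensional scaling family mentioned above, here is a minimal sketch of classical (Torgerson) MDS in NumPy. With Euclidean input distances it coincides with PCA; nonlinear variants such as ISOMAP reuse the same eigendecomposition but feed in geodesic (graph shortest-path) distances instead.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed n points in R^k from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D**2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

For points that truly live in $\mathbb{R}^2$, the embedding recovers all pairwise distances exactly (up to rotation and reflection of the coordinates).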

### Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection

Metric space magnitude, an active field of research in algebraic topology, is a scalar quantity that summarizes the effective number of distinct points in a general metric space. The *weighting vector* is a closely related concept that captures, in a nontrivial way, much of the underlying geometry of the original metric space. Recent work has demonstrated that when the metric space is Euclidean, the weighting vector serves as an effective tool for boundary detection. We recast this result and show that the weighting vector may be viewed as a solution to a kernelized SVM. As one consequence, we apply this insight to outlier detection and demonstrate performance that is competitive with, or exceeds, that of state-of-the-art techniques on benchmark data sets. Under mild assumptions, we show that the weighting vector, whose exact computation costs a matrix inversion, can be efficiently approximated in linear time. We also show how nearest neighbor methods can approximate solutions to the minimization problems defined by SVMs.
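The exact weighting vector can be sketched directly from its standard definition: with similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$, the weighting $w$ solves $Zw = \mathbf{1}$, and the magnitude is $\sum_i w_i$. The scale $t = 1$ and the toy point set are assumptions. This is the O(n³) computation that the linear-time approximation avoids.

```python
import numpy as np

def weighting_vector(X, t=1.0):
    """Solve Z w = 1 with Z_ij = exp(-t * ||x_i - x_j||) (exact, O(n^3))."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Z = np.exp(-t * D)
    return np.linalg.solve(Z, np.ones(len(X)))

X = np.arange(6, dtype=float)[:, None]   # 6 evenly spaced points on a line, spacing 1
w = weighting_vector(X)
magnitude = w.sum()
```

On this line, the two endpoints receive the largest weights and all interior points share a smaller weight, which is the boundary-detection behaviour in its simplest form. For finite subsets of $\mathbb{R}$ the magnitude also matches the known closed form $1 + \sum_i \tanh(d_i/2)$ over consecutive gaps $d_i$.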

### Open-world Machine Learning: Applications, Challenges, and Opportunities

Traditional machine learning, especially supervised learning, follows the closed-world assumption, i.e., every class encountered at test time was available during training. However, such models fail to identify classes that were not available at training time; these are referred to as unseen classes. Open-world machine learning, in contrast, deals with arbitrary inputs (including data from unseen classes). Moreover, traditional machine learning is static, which is not appropriate for a dynamic environment where the sources, perspective, and/or volume of data change rapidly. In this paper, we first present an overview of open-world learning with an emphasis on its real-world relevance. Next, we explore and discuss different dimensions of open-world learning. Open-world learning has attracted the attention of the research community only in the last decade; we searched through different online digital libraries and scrutinized the work done in that period. This paper presents a systematic review of techniques for open-world machine learning, together with research gaps, challenges, and future directions. It is intended to help researchers understand the developments in open-world learning and the opportunities to extend research in suitable areas, and to select applicable methodologies and datasets for further exploration.

### Learning to Bridge Metric Spaces: Few-shot Joint Learning of Intent Detection and Slot Filling

In this paper, we investigate few-shot joint learning for dialogue language understanding. Most existing few-shot models learn a single task at a time from only a few examples. However, dialogue language understanding comprises two closely related tasks, intent detection and slot filling, and often benefits from learning them jointly. This calls for new few-shot learning techniques that can capture task relations from only a few examples and learn multiple tasks jointly. To this end, we propose a similarity-based few-shot learning scheme, the Contrastive Prototype Merging network (ConProm), which learns to bridge the metric spaces of intents and slots on data-rich domains and then adapts the bridged metric space to the specific few-shot domain. Experiments on two public datasets, Snips and FewJoint, show that our model significantly outperforms strong baselines in one-shot and five-shot settings.
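Similarity-based few-shot schemes of this kind typically build on prototype classification: each class (an intent, or a slot label for tokens) is represented by the mean embedding of its few support examples, and queries go to the nearest prototype. The sketch below shows only that generic building block with toy 2-D embeddings (an assumption); ConProm's bridging of the intent and slot metric spaces and its contrastive merging are not shown.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototype = mean embedding of that class's support examples."""
    return np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the prototype with the smallest squared Euclidean distance."""
    d2 = np.sum((query_emb[:, None, :] - protos[None, :, :])**2, axis=-1)
    return d2.argmin(axis=1)

support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 4.8]])
labels = np.array([0, 0, 1, 1])            # e.g. two intents, two shots each
queries = np.array([[0.1, 0.3], [4.7, 5.1]])
pred = classify(queries, prototypes(support, labels, n_classes=2))
```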

### A Deep Metric Learning Approach to Account Linking

We consider the task of automatically linking social media accounts that belong to the same author, based on the content and metadata of their corresponding document streams. We focus on learning an embedding that maps variable-sized samples of user activity, ranging from single posts to entire months of activity, to a vector space where samples by the same author map to nearby points. The approach does not require human-annotated training data, which allows us to leverage large amounts of social media content. The proposed model outperforms several competitive baselines under a novel evaluation framework modeled after established recognition benchmarks in other domains. Our method achieves high linking accuracy even with small samples from accounts not seen at training time, a prerequisite for practical applications of the proposed linking framework.
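A minimal sketch of the metric-learning side: a generic triplet margin loss that pulls samples by the same author together and pushes other authors apart, plus linking by cosine-similarity ranking. The loss choice, the margin, and the assumption that an encoder has already produced fixed-size embeddings are illustrative; the paper's actual training objective may differ.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Mean hinge loss: same-author pairs should be closer than different-author pairs by `margin`."""
    d_pos = np.sum((anchor - positive)**2, axis=-1)
    d_neg = np.sum((anchor - negative)**2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def link(query_emb, candidate_embs):
    """Rank candidate accounts by cosine similarity to the query sample's embedding."""
    sims = l2_normalize(candidate_embs) @ l2_normalize(query_emb)
    return np.argsort(-sims)
```

Because training only needs to know which samples come from the same account, triplets can be mined from unlabeled streams, which is what makes the approach free of human annotation.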