Goto

Collaborating Authors

 Representation Of Examples


Human-machine cooperation for semantic feature listing

arXiv.org Artificial Intelligence

A central goal in cognitive science is to characterize human knowledge of concepts and their properties. Many have used human-generated feature lists as norms for establishing the structural relationship between concepts in the human mind (McRae et al., 2005; Devereux et al., 2014; De Deyne et al., 2008; Buchanan et al., 2019), but this requires extensive human labor. Large language models (LLMs) have recently shown impressive capabilities when generating properties of objects (Hansen & Hebart, 2022) or answering questions(Ouyang et al., 2022; Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022; Wei et al., 2021) and thus suggest an avenue for more efficient characterization of human knowledge structures, but even state-of-the-art models can routinely fail on many common-sense questions of fact. GTP3-davinci, for instance, will deny that alligators are green, while asserting that they can be used to suck dust up from surfaces. Thus, human effort can generate high-quality norms, but with prohibitive costs, while LLMs can produce norms with little human effort, but with considerably less accuracy. This paper considers whether human and machine effort can combine to efficiently estimate high-quality semantic feature vectors.


Classification in Non-Metric Spaces

Neural Information Processing Systems

A key question in vision is how to represent our knowledge of previously encountered objects to classify new ones. The answer depends on how we determine the similarity of two objects. Similarity tells us how relevant each previously seen object is in determining the category to which a new object belongs. Complex notions of similar(cid:173) ity appear necessary for cognitive models and applications, while simple notions of similarity form a tractable basis for current computational ap(cid:173) proaches to classification. We explore the nature of this dichotomy and why it calls for new approaches to well-studied problems in learning. We begin this process by demonstrating new computational methods for supervised learning that can handle complex notions of similarity.


Joint Probabilistic Curve Clustering and Alignment

Neural Information Processing Systems

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature- vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learn- ing algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint cluster- ing and alignment of gene expression time-course data.


From Lasso regression to Feature vector machine

Neural Information Processing Systems

Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors.


Robust Nonparametric Regression with Metric-Space Valued Output

Neural Information Processing Systems

Motivated by recent developments in manifold-valued regression we propose a family of nonparametric kernel-smoothing estimators with metric-space valued output including a robust median type estimator and the classical Frechet mean. Depending on the choice of the output space and the chosen metric the estimator reduces to partially well-known procedures for multi-class classification, multivariate regression in Euclidean space, regression with manifold-valued output and even some cases of structured output learning. In this paper we focus on the case of regression with manifold-valued input and output. We show pointwise and Bayes consistency for all estimators in the family for the case of manifold-valued output and illustrate the robustness properties of the estimator with experiments.


Multi-armed bandits on implicit metric spaces

Neural Information Processing Systems

The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives ("arms"), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit -- it is defined by the available structure but not revealed to the algorithm directly.


PWESuite: Phonetic Word Embeddings and Tasks They Facilitate

arXiv.org Artificial Intelligence

Word embeddings that map words into a fixed-dimensional vector space are the backbone of modern NLP. Most word embedding methods encode semantic information. However, phonetic information, which is important for some tasks, is often overlooked. In this work, we develop several novel methods which leverage articulatory features to build phonetically informed word embeddings, and present a set of phonetic word embeddings to encourage their community development, evaluation and use. While several methods for learning phonetic word embeddings already exist, there is a lack of consistency in evaluating their effectiveness. Thus, we also proposes several ways to evaluate both intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and extrinsic performances, such as rhyme and cognate detection and sound analogies. We hope that our suite of tasks will promote reproducibility and provide direction for future research on phonetic word embeddings.


BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

arXiv.org Artificial Intelligence

Representation learning is an important step in the machine learning pipeline. Given the current biological sequencing data volume, learning an explicit representation is prohibitive due to the dimensionality of the resulting feature vectors. Kernel-based methods, e.g., SVM, are a proven efficient and useful alternative for several machine learning (ML) tasks such as sequence classification. Three challenges with kernel methods are (i) the computation time, (ii) the memory usage (storing an $n\times n$ matrix), and (iii) the usage of kernel matrices limited to kernel-based ML methods (difficult to generalize on non-kernel classifiers). While (i) can be solved using approximate methods, challenge (ii) remains for typical kernel methods. Similarly, although non-kernel-based ML methods can be applied to kernel matrices by extracting principal components (kernel PCA), it may result in information loss, while being computationally expensive. In this paper, we propose a general-purpose representation learning approach that embodies kernel methods' qualities while avoiding computation, memory, and generalizability challenges. This involves computing a low-dimensional embedding of each sequence, using random projections of its $k$-mer frequency vectors, significantly reducing the computation needed to compute the dot product and the memory needed to store the resulting representation. Our proposed fast and alignment-free embedding method can be used as input to any distance (e.g., $k$ nearest neighbors) and non-distance (e.g., decision tree) based ML method for classification and clustering tasks. Using different forms of biological sequences as input, we perform a variety of real-world classification tasks, such as SARS-CoV-2 lineage and gene family classification, outperforming several state-of-the-art embedding and kernel methods in predictive performance.


A stability theorem for bigraded persistence barcodes

arXiv.org Artificial Intelligence

We define the bigraded persistent homology modules and the bigraded barcodes of a finite pseudo-metric space X using the ordinary and double homology of the moment-angle complex associated with the Vietoris-Rips filtration of X. We prove the stability theorem for the bigraded persistent double homology modules and barcodes.


Universal approximation and model compression for radial neural networks

arXiv.org Artificial Intelligence

We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks, extending previous work on rotation equivariant networks that considers rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including in the more difficult cases of bounded widths and unbounded domains. Our proof techniques are novel, distinct from those in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimization of the compressed model by gradient descent is equivalent to projected gradient descent for the full model.