Goto

Collaborating Authors

 Representation Of Examples


Learning Weakly Convex Sets in Metric Spaces

arXiv.org Artificial Intelligence

We introduce the notion of weak convexity in metric spaces, a generalization of ordinary convexity commonly used in machine learning. It is shown that weakly convex sets can be characterized by a closure operator and have a unique decomposition into a set of pairwise disjoint connected blocks. We give two generic efficient algorithms, an extensional and an intensional one for learning weakly convex concepts and study their formal properties. Our experimental results concerning vertex classification clearly demonstrate the excellent predictive performance of the extensional algorithm. Two non-trivial applications of the intensional algorithm to polynomial PAC-learnability are presented. The first one deals with learning $k$-convex Boolean functions, which are already known to be efficiently PAC-learnable. It is shown how to derive this positive result in a fairly easy way by the generic intensional algorithm. The second one is concerned with the Euclidean space equipped with the Manhattan distance. For this metric space, weakly convex sets are a union of pairwise disjoint axis-aligned hyperrectangles. We show that a weakly convex set that is consistent with a set of examples and contains a minimum number of hyperrectangles can be found in polynomial time. In contrast, this problem is known to be NP-complete if the hyperrectangles may be overlapping.


Approximate Fr\'echet Mean for Data Sets of Sparse Graphs

arXiv.org Machine Learning

To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fr\'echet mean. In this work, we equip a set of graph with the pseudometric defined by the $\ell_2$ norm between the eigenvalues of their respective adjacency matrix . Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems on sets of graphs. We describe an algorithm to compute an approximation to the Fr\'echet mean of a set of undirected unweighted graphs with a fixed size.


Efficient Retrieval of Matrix Factorization-Based Top-k Recommendations: A Survey of Recent Approaches

Journal of Artificial Intelligence Research

Top-k recommendation seeks to deliver a personalized list of k items to each individual user. An established methodology in the literature based on matrix factorization (MF), which usually represents users and items as vectors in low-dimensional space, is an effective approach to recommender systems, thanks to its superior performance in terms of recommendation quality and scalability. A typical matrix factorization recommender system has two main phases: preference elicitation and recommendation retrieval. The former analyzes user-generated data to learn user preferences and item characteristics in the form of latent feature vectors, whereas the latter ranks the candidate items based on the learnt vectors and returns the top-k items from the ranked list. For preference elicitation, there have been numerous works to build accurate MF-based recommendation algorithms that can learn from large datasets. However, for the recommendation retrieval phase, naively scanning a large number of items to identify the few most relevant ones may inhibit truly real-time applications. In this work, we survey recent advances and state-of-the-art approaches in the literature that enable fast and accurate retrieval for MF-based personalized recommendations. Also, we include analytical discussions of approaches along different dimensions to provide the readers with a more comprehensive understanding of the surveyed works.


Aggregation over Metric Spaces: Proposing and Voting in Elections, Budgeting, and Legislation

Journal of Artificial Intelligence Research

We present a unifying framework encompassing a plethora of social choice settings. Viewing each social choice setting as voting in a suitable metric space, we offer a general model of social choice over metric spaces, in which--similarly to the spatial model of elections--each voter specifies an ideal element of the metric space. The ideal element acts as a vote, where each voter prefers elements that are closer to her ideal element. But it also acts as a proposal, thus making all participants equal not only as voters but also as proposers. We consider Condorcet aggregation and a continuum of solution concepts, ranging from minimizing the sum of distances to minimizing the maximum distance. We study applications of our abstract model to various social choice settings, including single-winner elections, committee elections, participatory budgeting, and participatory legislation. For each setting, we compare each solution concept to known voting rules and study various properties of the resulting voting rules. Our framework provides expressive aggregation for a broad range of social choice settings while remaining simple for voters; and may enable a unified and integrated implementation for all these settings, as well as unified extensions such as sybil-resiliency, proxy voting, and deliberative decision making.


Online learning with exponential weights in metric spaces

arXiv.org Machine Learning

The problem of online convex optimization (Cesa-Bianchi and Lugosi, 2006, Shalev-Shwartz, 2012, Hazan, 2016) has become a strandard model of online learning. Its simple and flexible formulation as a repeated game, devoid of distributional assumptions on the data, has proven effective in framing theoretically a number of online prediction tasks including online recommendation systems, online portfolio selection or network routing problems. Traditionally studied in the context of Euclidean spaces, less seems to be known when the decision space is a more general metric space, with potentially no linear structure. In this paper, we extend the analysis of the exponentially weighted average (ewa) forecaster to some geodesic metric spaces. Motivations for this level of generality arise, for example, when the decision space is a smooth manifold. Such a scenario is routinely encountered in directional or shape statistics (Mardia, 1999) where observations take values in spheres, projective spaces or shape spaces.


Small Sample Spaces for Gaussian Processes

arXiv.org Machine Learning

It is known that the membership in a given reproducing kernel Hilbert space (RKHS) of the samples of a Gaussian process $X$ is controlled by a certain nuclear dominance condition. However, it is less clear how to identify a "small" set of functions (not necessarily a vector space) that contains the samples. This article presents a general approach for identifying such sets. We use scaled RKHSs, which can be viewed as a generalisation of Hilbert scales, to define the sample support set as the largest set which is contained in every element of full measure under the law of $X$ in the $\sigma$-algebra induced by the collection of scaled RKHS. This potentially non-measurable set is then shown to consist of those functions that can be expanded in terms of an orthonormal basis of the RKHS of the covariance kernel of $X$ and have their squared basis coefficients bounded away from zero and infinity, a result suggested by the Karhunen-Lo\`{e}ve theorem.


Foundations of Population-Based SHM, Part IV: The Geometry of Spaces of Structures and their Feature Spaces

arXiv.org Machine Learning

One of the requirements of the population-based approach to Structural Health Monitoring (SHM) proposed in the earlier papers in this sequence, is that structures be represented by points in an abstract space. Furthermore, these spaces should be metric spaces in a loose sense; i.e. there should be some measure of distance applicable to pairs of points; similar structures should then be close in the metric. However, this geometrical construction is not enough for the framing of problems in data-based SHM, as it leaves undefined the notion of feature spaces. Interpreting the feature values on a structure-by-structure basis as a type of field over the space of structures, it seems sensible to borrow an idea from modern theoretical physics, and define feature assignments as sections in a vector bundle over the structure space. With this idea in place, one can interpret the effect of environmental and operational variations as gauge degrees of freedom, as in modern gauge field theories. This paper will discuss the various geometrical structures required for an abstract theory of feature spaces in SHM, and will draw analogies with how these structures have shown their power in modern physics. In the second part of the paper, the problem of determining the normal condition cross section of a feature bundle is addressed. The solution is provided by the application of Graph Neural Networks (GNN), a versatile non-Euclidean machine learning algorithm which is not restricted to inputs and outputs from vector spaces. In particular, the algorithm is well suited to operating directly on the sort of graph structures which are an important part of the proposed framework for PBSHM. The solution of the normal section problem is demonstrated for a heterogeneous population of truss structures for which the feature of interest is the first natural frequency.


Moreau-Yosida $f$-divergences

arXiv.org Machine Learning

Another is the family of optimal transport central to many machine learning algorithms, with distances (Villani, 2008), including the Wasserstein-1 metric. Lipschitz constrained variants recently gaining In general, variational representations are supremums attention. Inspired by this, we generalize the of integral formulas taken over sets of functions, such as the so-called tight variational representation of f-Donsker-Varadhan formula (Donsker & Varadhan, 1976) divergences in the case of probability measures for the Kullback-Leibler divergence or the Kantorovich-on compact metric spaces to be taken over the Rubinstein formula (Villani, 2008) for the Wasserstein-1 space of Lipschitz functions vanishing at an arbitrary metric. Informally speaking, one can implement (Nowozin base point, characterize functions achieving et al., 2016; Arjovsky et al., 2017) such a formula by constructing the supremum in the variational representation, a real-valued neural network taking samples from propose a practical algorithm to calculate the the two probability measures as inputs, which is then trained tight convex conjugate of f-divergences compatible to maximize the integral formula in order to approximate with automatic differentiation frameworks, the supremum, resulting in a learned proxy to the actual define the Moreau-Yosida approximation of f-divergence of said probability measures. Implementing the divergences with respect to the Wasserstein-1 metric, Kantorovich-Rubinstein formula in such a way involves and derive the corresponding variational formulas, restricting the Lipschitz constant of the neural network (Gulrajani providing a generalization of a number et al., 2017; Petzka et al., 2018; Miyato et al., 2018), of recent results, novel special cases of interest which effectively stabilizes the approximation procedure.


Linear Classifiers in Mixed Constant Curvature Spaces

arXiv.org Machine Learning

Embedding methods for mixed-curvature spaces are powerful techniques for low-distortion and low-dimensional representation of complex data structures. Nevertheless, little is known regarding downstream learning and optimization in the embedding space. Here, we address for the first time the problem of linear classification in a product space form -- a mix of Euclidean, spherical, and hyperbolic spaces with different dimensions. First, we revisit the definition of a linear classifier on a Riemannian manifold by using geodesics and Riemannian metrics which generalize the notions of straight lines and inner products in vector spaces, respectively. Second, we prove that linear classifiers in $d$-dimensional constant curvature spaces can shatter exactly $d+1$ points: Hence, Euclidean, hyperbolic and spherical classifiers have the same expressive power. Third, we formalize linear classifiers in product space forms, describe a novel perceptron classification algorithm, and establish rigorous convergence results. We support our theoretical findings with simulation results on several datasets, including synthetic data, MNIST and Omniglot. Our results reveal that learning methods applied to small-dimensional embeddings in product space forms significantly outperform their algorithmic counterparts in Euclidean spaces.


Disambiguation of weak supervision with exponential convergence rates

arXiv.org Artificial Intelligence

In many applications of machine learning, such as recommender systems, where an input characterizing a user should be matched with a target representing an ordering of a large number of items, accessing fully supervised data (,) is not an option. Instead, one should expect weak information on the target, which could be a list of previously taken (if items are online courses), watched (if items are plays), etc., items by a user characterized by the feature vector. This motivates weakly supervised learning, aiming at learning a mapping from inputs to targets in such a setting where tools from supervised learning can not be applied off-the-shelves. Recent applications of weakly supervised learning showcase impressive results in solving complex tasks such as action retrieval on instructional videos (Miech et al., 2019), image semantic segmentation (Papandreou et al., 2015), salient object detection (Wang et al., 2017), 3D pose estimation (Dabral et al., 2018), text-to-speech synthesis (Jia et al., 2018), to name a few. However, those applications of weakly supervised learning are usually based on clever heuristics, and theoretical foundations of learning from weakly supervised data are scarce, especially when compared to statistical learning literature on supervised learning (Vapnik, 1995; Boucheron et al., 2005; Steinwart and Christmann, 2008). We aim to provide a step in this direction. In this paper, we focus on partial labelling, a popular instance of weak supervision, approached with a structured prediction point of view Ciliberto et al. (2020). We detail this setup in Section 2. Our contributions are organized as follows.