Representation Of Examples
Task-Oriented Over-the-Air Computation for Multi-Device Edge AI
Wen, Dingzhu, Jiao, Xiang, Liu, Peixi, Zhu, Guangxu, Shi, Yuanming, Huang, Kaibin
Departing from the classic paradigm of data-centric designs, the 6G networks for supporting edge AI features task-oriented techniques that focus on effective and efficient execution of AI task. Targeting end-to-end system performance, such techniques are sophisticated as they aim to seamlessly integrate sensing (data acquisition), communication (data transmission), and computation (data processing). Aligned with the paradigm shift, a task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for multi-device split-inference system. In the considered system, local feature vectors, which are extracted from the real-time noisy sensory data on devices, are aggregated over-the-air by exploiting the waveform superposition in a multiuser channel. Then the aggregated features as received at a server are fed into an inference model with the result used for decision making or control of actuators. To design inference-oriented AirComp, the transmit precoders at edge devices and receive beamforming at edge server are jointly optimized to rein in the aggregation error and maximize the inference accuracy. The problem is made tractable by measuring the inference accuracy using a surrogate metric called discriminant gain, which measures the discernibility of two object classes in the application of object/event classification. It is discovered that the conventional AirComp beamforming design for minimizing the mean square error in generic AirComp with respect to the noiseless case may not lead to the optimal classification accuracy. The reason is due to the overlooking of the fact that feature dimensions have different sensitivity towards aggregation errors and are thus of different importance levels for classification. This issue is addressed in this work via a new task-oriented AirComp scheme designed by directly maximizing the derived discriminant gain.
Opening the Black Box of wav2vec Feature Encoder
Self-supervised models, namely, wav2vec and its variants, have shown promising results in various downstream tasks in the speech domain. However, their inner workings are poorly understood, calling for in-depth analyses on what the model learns. In this paper, we concentrate on the convolutional feature encoder where its latent space is often speculated to represent discrete acoustic units. To analyze the embedding space in a reductive manner, we feed the synthesized audio signals, which is the summation of simple sine waves. Through extensive experiments, we conclude that various information is embedded inside the feature encoder representations: (1) fundamental frequency, (2) formants, and (3) amplitude, packed with (4) sufficient temporal detail. Further, the information incorporated inside the latent representations is analogous to spectrograms but with a fundamental difference: latent representations construct a metric space so that closer representations imply acoustic similarity.
Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM
Aubin-Frankowski, Pierre-Cyril, Korba, Anna, Léger, Flavien
Many problems in machine learning can be formulated as optimizing a convex functional over a vector space of measures. This paper studies the convergence of the mirror descent algorithm in this infinite-dimensional setting. Defining Bregman divergences through directional derivatives, we derive the convergence of the scheme for relatively smooth and convex pairs of functionals. Such assumptions allow to handle non-smooth functionals such as the Kullback--Leibler (KL) divergence. Applying our result to joint distributions and KL, we show that Sinkhorn's primal iterations for entropic optimal transport in the continuous setting correspond to a mirror descent, and we obtain a new proof of its (sub)linear convergence. We also show that Expectation Maximization (EM) can always formally be written as a mirror descent. When optimizing only on the latent distribution while fixing the mixtures parameters -- which corresponds to the Richardson--Lucy deconvolution scheme in signal processing -- we derive sublinear rates of convergence.
The magnitude vector of images
Adamer, Michael F., De Brouwer, Edward, O'Bray, Leslie, Rieck, Bastian
The topology community has recently invested much effort in studying a newly introduced quantity called magnitude [1]. While it originates from category theory, where it can be seen as a generalisation of the Euler characteristic to metric spaces, the magnitude of a metric space is most intuitively understood as an attempt to measure the effective size of a metric space [2]. As a descriptive scalar, this quantity extends the set of other well known descriptors such as the rank, diameter or dimension. However, unlike those descriptors, the properties and potential use cases of magnitude are still under-explored. Because the metric space structure of datasets is a natural object of study when it comes to the understanding of fundamental machine learning concepts such as regularization, magnitude appears like a promising and powerful concept in machine learning: next to its abilities to describe the metric space of whole datasets, the magnitude can also be studied at the sample level, by considering each sample as its own metric space. Following this line of thought, magnitude vectors were introduced as a way to characterise the contribution of each data sample to the overall magnitude of the dataset, such that the sum of the elements of the magnitude vector amounts to the magnitude. This allowed to assess the individual contribution of each data point and their relative connectivity in the whole dataset. Indeed, magnitude vectors have been shown to detect boundaries of metric spaces, with boundary points exhibiting larger contributions to the magnitude [3].
Efficient search of active inference policy spaces using k-means
Kiefer, Alex B., Albarracin, Mahault
We develop an approach to policy selection in active inference that allows us to efficiently search large policy spaces by mapping each policy to its embedding in a vector space. We sample the expected free energy of representative points in the space, then perform a more thorough policy search around the most promising point in this initial sample. We consider various approaches to creating the policy embedding space, and propose using k-means clustering to select representative points. We apply our technique to a goal-oriented graph-traversal problem, for which naive policy selection is intractable for even moderately large graphs.
A Recommendation Approach based on Similarity-Popularity Models of Complex Networks
Alhadlaq, Abdullah, Kerrache, Said, Aboalsamh, Hatim
Recommender systems have become an essential tool for providers and users of online services and goods, especially with the increased use of the Internet to access information and purchase products and services. This work proposes a novel recommendation method based on complex networks generated by a similarity-popularity model to predict ones. We first construct a model of a network having users and items as nodes from observed ratings and then use it to predict unseen ratings. The prospect of producing accurate rating predictions using a similarity-popularity model with hidden metric spaces and dot-product similarity is explored. The proposed approach is implemented and experimentally compared against baseline and state-of-the-art recommendation methods on 21 datasets from various domains. The experimental results demonstrate that the proposed method produces accurate predictions and outperforms existing methods. We also show that the proposed approach produces superior results in low dimensions, proving its effectiveness for data visualization and exploration.
First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces
Jordan, Michael I., Lin, Tianyi, Vlatakis-Gkaragkounis, Emmanouil-Vasileios
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative-we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance
Yuan, Zhuoning, Wu, Yuexin, Qiu, Zi-Hao, Du, Xianzhi, Zhang, Lijun, Zhou, Denny, Yang, Tianbao
In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. In order to remove such requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with small batch size (e.g., 256) can achieve similar performance as SimCLR with large batch size (e.g., 8192) on self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
HyP$^2$ Loss: Beyond Hypersphere Metric Space for Multi-label Image Retrieval
Xu, Chengyin, Chai, Zenghao, Xu, Zhengzhuo, Yuan, Chun, Fan, Yanbo, Wang, Jue
Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper, we carried out in-depth investigations on metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios, where the pair loss suffers high computational overhead and converge difficulty, while the proxy loss is theoretically incapable of expressing the profound label dependencies and exhibits conflicts in the constructed hypersphere space. To address the problems, we propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP$^2$ Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP$^2$ Loss focuses on optimizing the hypersphere space by learnable proxies and excavating data-to-data correlations of irrelevant pairs, which integrates sufficient data correspondence of pair-based methods and high-efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks justify the proposed method outperforms the state-of-the-art, is robust among different hash bits and achieves significant performance gains with a faster, more stable convergence speed. Our code is available at https://github.com/JerryXu0129/HyP2-Loss.
A Learned Index for Exact Similarity Search in Metric Spaces
Tian, Yao, Yan, Tingyun, Zhao, Xi, Huang, Kai, Zhou, Xiaofang
Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively explored to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains to be an open challenge. In this paper, we propose a novel indexing approach called LIMS that uses data clustering, pivot-based data transformation techniques and learned indexes to support efficient similarity query processing in metric spaces. In LIMS, the underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.