 Chen, Lin


Sequential Attention for Feature Selection

arXiv.org Artificial Intelligence

Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.

Feature selection is a classic problem in machine learning and statistics where one is asked to find a subset of a larger set of features such that the prediction quality of the model trained on that subset is maximized. Finding a small, high-quality feature subset is desirable for many reasons: improving model interpretability, reducing inference latency, decreasing model size, regularization, and removing redundant or noisy features to improve generalization. We direct the reader to Li et al. (2017b) for a comprehensive survey of the role of feature selection in machine learning. The widespread success of deep learning has prompted an intense study of feature selection algorithms for neural networks, especially in the supervised setting. While many methods have been proposed, we focus on a line of work that studies the use of attention for feature selection.
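
The linear-regression connection is concrete enough to sketch. Below is a minimal numpy implementation of Orthogonal Matching Pursuit, the classical algorithm that the paper shows its linear-regression adaptation is equivalent to; the function name and toy data are ours, and this is not the paper's neural-network algorithm.

    import numpy as np

    def omp(X, y, k):
        """Greedy forward selection via Orthogonal Matching Pursuit.

        At each step, pick the feature most correlated with the current
        residual, then refit least squares on the selected support.
        Illustrative sketch; not the paper's Sequential Attention code.
        """
        support, residual = [], y.astype(float).copy()
        for _ in range(k):
            scores = np.abs(X.T @ residual)      # correlation with residual
            scores[support] = -np.inf            # exclude chosen features
            support.append(int(np.argmax(scores)))
            beta, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            residual = y - X[:, support] @ beta  # orthogonalize the residual
        return support, beta

    # Toy usage: recover a 3-sparse signal.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    coef = np.zeros(50)
    coef[[3, 17, 42]] = [2.0, -1.5, 1.0]
    y = X @ coef + 0.01 * rng.standard_normal(200)
    support, _ = omp(X, y, 3)
    print(sorted(support))                       # expected: [3, 17, 42]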


HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion

arXiv.org Artificial Intelligence

Recently, cross-source point cloud registration from different sensors has become a significant research focus. However, traditional methods confront challenges due to the varying density and structure of cross-source point clouds. To solve these problems, we propose a cross-source point cloud fusion algorithm called HybridFusion. It can register cross-source dense point clouds from different viewing angles in large outdoor scenes. The entire registration process is a coarse-to-fine procedure. First, the point cloud is divided into small patches, and a matching patch set is selected based on global descriptors and spatial distribution, which constitutes the coarse matching process. To achieve fine matching, 2D registration is performed by extracting 2D boundary points from patches, followed by 3D adjustment. Finally, the pose estimates from multiple patches are clustered and fused to determine the final pose. The proposed approach is evaluated comprehensively through qualitative and quantitative experiments. To assess robustness, the proposed method is compared with the generalized iterative closest point method. Furthermore, a metric describing the degree of point cloud filling is proposed. The experimental results demonstrate that our approach achieves state-of-the-art performance in cross-source point cloud registration.
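
Whatever the surrounding pipeline, the fine-matching stage ultimately has to turn point correspondences into a rigid pose. A standard least-squares solution for that core step is the Kabsch/Umeyama SVD alignment, sketched below; this is a generic building block under our own assumptions, not HybridFusion's specific procedure.

    import numpy as np

    def rigid_align(P, Q):
        """Least-squares rigid transform (R, t) mapping points P onto Q.

        Classic Kabsch/Umeyama solution via SVD; P and Q are (n, 3)
        arrays of corresponding points. Illustrative sketch only.
        """
        cP, cQ = P.mean(axis=0), Q.mean(axis=0)
        H = (P - cP).T @ (Q - cQ)                 # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # proper rotation, det = +1
        t = cQ - R @ cP
        return R, t

    # Toy usage: recover a known rotation and translation.
    rng = np.random.default_rng(1)
    P = rng.standard_normal((100, 3))
    th = 0.3
    R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                       [np.sin(th),  np.cos(th), 0.0],
                       [0.0, 0.0, 1.0]])
    Q = P @ R_true.T + np.array([0.5, -1.0, 2.0])
    R, t = rigid_align(P, Q)
    print(np.allclose(R, R_true), t)             # True [ 0.5 -1.   2. ]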


ClusterFusion: Real-time Relative Positioning and Dense Reconstruction for UAV Cluster

arXiv.org Artificial Intelligence

As robotics technology advances, dense point cloud maps are increasingly in demand. However, dense reconstruction using a single unmanned aerial vehicle (UAV) suffers from limitations in flight speed and battery power, resulting in slow reconstruction and low coverage. Clustered UAV systems offer greater flexibility and wider coverage for map building. Existing cluster-UAV methods face challenges with accurate relative positioning, scale drift, and high-speed dense point cloud map generation. To address these issues, we propose a cluster framework for large-scale dense reconstruction and real-time collaborative localization. The front-end of the framework is an improved visual odometry that effectively handles large-scale scenes. Collaborative localization between UAVs is enabled through a two-stage joint optimization algorithm and a relative pose optimization algorithm, achieving accurate relative positioning of UAVs and mitigating scale drift. Estimated poses are used to achieve real-time dense reconstruction and fusion of point cloud maps. To evaluate the performance of our proposed method, we conduct qualitative and quantitative experiments on real-world data. The results demonstrate that our framework can effectively suppress scale drift and generate large-scale dense point cloud maps in real time, with reconstruction speed increasing as more UAVs are added to the system.
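
Once collaborative localization has produced per-UAV poses, map fusion reduces to transforming each local cloud into the world frame and merging. A minimal sketch of that step is below; the function names, the 4x4 pose convention, and the voxel filter are our assumptions, not the paper's implementation.

    import numpy as np

    def fuse_clouds(clouds, poses):
        """Fuse per-UAV point clouds into one global map.

        clouds: list of (n_i, 3) arrays in each UAV's local frame.
        poses:  list of 4x4 homogeneous world-from-local transforms
                (e.g., from the collaborative localization stage).
        Illustrative sketch only.
        """
        fused = []
        for pts, T in zip(clouds, poses):
            R, t = T[:3, :3], T[:3, 3]
            fused.append(pts @ R.T + t)           # local -> world frame
        return np.vstack(fused)

    def voxel_downsample(points, voxel=0.1):
        """Keep one point per voxel to bound map size in real time."""
        keys = np.floor(points / voxel).astype(np.int64)
        _, idx = np.unique(keys, axis=0, return_index=True)
        return points[np.sort(idx)]

    # Toy usage: two clouds, identity pose and a translated pose.
    c1, c2 = np.random.rand(100, 3), np.random.rand(100, 3)
    T1, T2 = np.eye(4), np.eye(4)
    T2[:3, 3] = [10.0, 0.0, 0.0]
    world = voxel_downsample(fuse_clouds([c1, c2], [T1, T2]))
    print(world.shape)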


HalluAudio: Hallucinating Frequency as Concepts for Few-Shot Audio Classification

arXiv.org Artificial Intelligence

Few-shot audio classification is an emerging topic that has attracted increasing attention from the research community. Most existing work ignores the specific structure of the audio spectrogram and focuses largely on embedding spaces borrowed from image tasks. In this work, we instead take advantage of this special audio format and propose a new method that hallucinates the high-frequency and low-frequency parts of a spectrogram as structured concepts. Extensive experiments on ESC-50 and our curated balanced Kaggle18 dataset show that the proposed method outperforms the baseline by a notable margin. The way our method hallucinates high-frequency and low-frequency parts also makes it interpretable and opens up new potential for few-shot audio classification.
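
The frequency "hallucination" can be illustrated by the band-splitting step itself: cut a spectrogram into low- and high-frequency parts that are then treated as separate structured concepts. The sketch below uses an arbitrary 50% cutoff and our own naming; the paper's actual concept construction may differ.

    import numpy as np

    def split_bands(spec, cutoff_frac=0.5):
        """Split a (freq_bins, time) spectrogram into low/high-frequency
        parts. The two bands can then be embedded separately and used as
        structured concepts alongside the full spectrogram.
        Illustrative sketch; cutoff_frac is our arbitrary choice.
        """
        cutoff = int(spec.shape[0] * cutoff_frac)
        return spec[:cutoff], spec[cutoff:]

    # Toy usage on a random spectrogram with 128 mel bins.
    spec = np.random.rand(128, 431)
    low, high = split_bands(spec)
    print(low.shape, high.shape)        # (64, 431) (64, 431)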


Collaboration in Participant-Centric Federated Learning: A Game-Theoretical Perspective

arXiv.org Artificial Intelligence

Federated learning (FL) is a promising distributed framework for collaborative artificial intelligence model training while protecting user privacy. A bootstrapping component that has attracted significant research attention is the design of incentive mechanisms to stimulate user collaboration in FL. The majority of works adopt a broker-centric approach, helping the central operator attract participants and obtain a well-trained model. Few works consider forging participant-centric collaboration, where participants pursue an FL model for their common interests; this induces incentive mechanism design dramatically different from that of broker-centric FL. To coordinate selfish and heterogeneous participants, we propose a novel analytic framework for incentivizing effective and efficient collaboration in participant-centric FL. Specifically, we propose two novel game models for contribution-oblivious FL (COFL) and contribution-aware FL (CAFL), where the latter implements a minimum contribution threshold mechanism. We further analyze the existence and uniqueness of Nash equilibria for both the COFL and CAFL games and design efficient algorithms to achieve equilibrium solutions. Extensive performance evaluations show that a free-riding phenomenon exists in COFL, and that it can be greatly alleviated by adopting the CAFL model with an optimized minimum threshold.
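
To make the free-riding contrast concrete, here is a toy best-response simulation under our own assumed utility (logarithmic benefit from total contribution minus linear cost) with and without a minimum contribution threshold; this is an illustrative stand-in, not the paper's COFL/CAFL game models.

    import numpy as np

    def best_response(x, i, a, c, tau):
        """Participant i's best contribution in a toy FL contribution game.

        Toy utility: u_i = a_i * log(1 + total contribution) - c_i * x_i,
        so the unconstrained optimum is x_i = a_i/c_i - 1 - others.
        With a minimum threshold tau (a CAFL-style rule), i must
        contribute at least tau to receive the model at all.
        """
        others = x.sum() - x[i]
        xi = max(a[i] / c[i] - 1.0 - others, 0.0)
        if tau > 0.0:
            xi = max(xi, tau)
            u_in = a[i] * np.log1p(xi + others) - c[i] * xi
            return xi if u_in > 0.0 else 0.0  # drop out if not worthwhile
        return xi                             # COFL: x_i = 0 still gets the model

    def equilibrium(a, c, tau=0.0, iters=200):
        """Round-robin best-response dynamics to an approximate fixed point."""
        x = np.zeros(len(a))
        for _ in range(iters):
            for i in range(len(a)):
                x[i] = best_response(x, i, a, c, tau)
        return x

    a, c = np.array([4.0, 3.0, 2.0]), np.ones(3)
    print(equilibrium(a, c, tau=0.0))   # COFL-style: [3, 0, 0], two free riders
    print(equilibrium(a, c, tau=0.5))   # CAFL-style: [2, 0.5, 0.5]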


Feature Cross Search via Submodular Optimization

arXiv.org Artificial Intelligence

In this paper, we study feature cross search as a fundamental primitive in feature engineering. The importance of feature cross search, especially for linear models, has long been recognized, with well-known textbook examples. In this problem, the goal is to select a small subset of features and combine them, via their Cartesian product, into a new feature (called the crossed feature) so as to learn an accurate model. In particular, we study the problem of maximizing a normalized Area Under the Curve (AUC) of the linear model trained on the crossed feature column. First, we show that it is not possible to provide an $n^{1/\log\log n}$-approximation algorithm for this problem unless the exponential time hypothesis fails. This result also rules out the possibility of solving the problem in polynomial time unless $\mathsf{P}=\mathsf{NP}$. On the positive side, under the naïve Bayes assumption, we show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem. This result is established by relating the AUC to the total variation of the commutator of two probability measures and showing that the total variation of the commutator is monotone and submodular. To show this, we relate the submodularity of this function to the positive semi-definiteness of a corresponding kernel matrix. Then, we use Bochner's theorem to prove positive semi-definiteness by showing that the kernel's inverse Fourier transform is non-negative everywhere. Our techniques and structural results might be of independent interest.
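
The positive result rests on the classical fact that greedy selection is a $(1-1/e)$-approximation for maximizing a monotone submodular function under a cardinality constraint. A generic greedy sketch follows, with a toy coverage function standing in for the paper's AUC objective.

    def greedy_max(f, ground, k):
        """Greedy (1 - 1/e)-approximation for maximizing a monotone
        submodular set function f under a cardinality constraint k.
        Generic sketch; f below is a toy, not the paper's AUC objective.
        """
        S = set()
        for _ in range(k):
            gains = {e: f(S | {e}) - f(S) for e in ground if e not in S}
            best = max(gains, key=gains.get)
            if gains[best] <= 0:          # no marginal gain left
                break
            S.add(best)
        return S

    # Toy usage: weighted coverage, a classic monotone submodular function.
    sets = {0: {1, 2}, 1: {2, 3, 4}, 2: {4, 5}, 3: {1, 5, 6}}
    f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
    print(greedy_max(f, sets.keys(), 2))  # {1, 3}, covering 6 elements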


Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

arXiv.org Artificial Intelligence

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$, and approximating the value function up to an additive error $\varepsilon$ requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples. Note that this sample-complexity lower bound is exponential in $d$, and if $q=\gamma^{2}$, even infinite data do not suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples (where $\theta^{\pi}$ is the parameter of the policy under linear function approximation) and guarantees an approximation of the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.
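
For a feel of the hard regime, the snippet below checks the regime condition $d\gamma^{2}>1$ and evaluates the lower-bound formula for a few dimensions, treating the $\Theta(d\gamma^{2})$ exponent as exactly $d\gamma^{2}$ (the paper only pins it down up to constants); the choices of $\gamma$, $q$, and $\varepsilon$ are arbitrary illustrations of ours.

    import numpy as np

    # Illustration only: hidden constants in the Theta(.) are ignored.
    gamma, eps = 0.95, 0.1
    q = min(2 * gamma**2, 1.0)                    # any q in (gamma^2, 1]
    for d in [2, 10, 50, 100]:
        hard = d * gamma**2 > 1                   # hard-regime condition
        bound = d / (gamma**2 * (q - gamma**2) * eps**2) * np.exp(d * gamma**2)
        print(d, hard, f"{bound:.3e}")            # blows up exponentially in d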


Coarse-to-Fine Pseudo-Labeling Guided Meta-Learning for Inexactly-Supervised Few-Shot Classification

arXiv.org Machine Learning

Meta-learning has recently emerged as a promising technique for addressing the challenge of few-shot learning. However, most existing meta-learning algorithms require fine-grained supervision and thus incur prohibitive annotation costs. In this paper, we present a new problem named inexactly-supervised meta-learning to alleviate this limitation, focusing on tackling few-shot classification tasks with only coarse-grained supervision. Accordingly, we propose a Coarse-to-Fine (C2F) pseudo-labeling process that constructs pseudo-tasks from coarsely-labeled data by grouping each coarse class into pseudo-fine classes via similarity matching. Moreover, we develop a Bi-level Discriminative Embedding (BDE) to obtain a good image similarity measure, in both visual and semantic aspects, under inexact supervision. Experiments across representative benchmarks indicate that our approach shows substantial advantages over baseline models.
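
A minimal version of the coarse-to-fine pseudo-labeling step can be sketched with per-coarse-class clustering of embeddings; here k-means stands in for the paper's similarity matching, the embeddings are random placeholders, and the BDE component is not modeled.

    import numpy as np
    from sklearn.cluster import KMeans

    def coarse_to_fine(embeddings, coarse_labels, n_fine=5, seed=0):
        """Split each coarse class into pseudo-fine classes by clustering
        image embeddings, so few-shot pseudo-tasks can be sampled from
        them. Returns pseudo-fine labels, unique across coarse classes.
        Illustrative sketch; k-means replaces the paper's matching.
        """
        pseudo = np.zeros(len(coarse_labels), dtype=int)
        offset = 0
        for c in np.unique(coarse_labels):
            idx = np.where(coarse_labels == c)[0]
            km = KMeans(n_clusters=n_fine, n_init=10, random_state=seed)
            pseudo[idx] = km.fit_predict(embeddings[idx]) + offset
            offset += n_fine
        return pseudo

    # Toy usage: 2 coarse classes, 64-d placeholder embeddings.
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((300, 64))
    coarse = np.repeat([0, 1], 150)
    print(np.unique(coarse_to_fine(emb, coarse)))   # pseudo-fine labels 0..9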


Multiple Descent: Design Your Own Generalization Curve

arXiv.org Machine Learning

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover, the locations of those peaks can be explicitly controlled. Our results highlight the fact that neither the classical U-shaped generalization curve nor the recently observed double descent curve is an intrinsic property of the model family. Instead, their emergence is due to the interaction between the properties of the data and the inductive biases of learning algorithms.
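
The phenomenon is easy to reproduce in its simplest form. The sketch below measures the test risk of minimum-norm least squares as the number of used features $d$ varies past the sample size $n$, producing the familiar peak at the interpolation threshold $d=n$; it is a standard double-descent experiment in the spirit of the paper, not the paper's multi-peak construction.

    import numpy as np

    def test_risk(n, d, d_total=200, trials=30, noise=0.5, seed=0):
        """Average test risk of min-norm least squares that uses only the
        first d of d_total features; the risk spikes near d = n.
        Illustrative experiment with our own arbitrary parameters.
        """
        rng = np.random.default_rng(seed)
        risks = []
        for _ in range(trials):
            beta = rng.standard_normal(d_total) / np.sqrt(d_total)
            X = rng.standard_normal((n, d_total))
            y = X @ beta + noise * rng.standard_normal(n)
            bhat = np.linalg.pinv(X[:, :d]) @ y   # minimum-norm solution
            Xt = rng.standard_normal((1000, d_total))
            yt = Xt @ beta                        # noiseless test targets
            risks.append(np.mean((Xt[:, :d] @ bhat - yt) ** 2))
        return np.mean(risks)

    n = 40
    for d in [10, 20, 38, 40, 42, 80, 160]:
        print(d, round(test_risk(n, d), 3))       # risk peaks near d = n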


Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS

arXiv.org Machine Learning

We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions when both kernels are restricted to the sphere $\mathbb{S}^{d-1}$. Additionally, we prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS, both when it is restricted to the sphere $\mathbb{S}^{d-1}$ and when it is defined on the entire $\mathbb{R}^d$.
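
The RKHS equivalence is a statement about spectral decay rather than pointwise values, but the two kernels are easy to evaluate side by side on the sphere, where both depend only on the inner product $\rho=\langle x,y\rangle$. The sketch below uses the standard layerwise NTK recursion for a fully-connected ReLU network; the depth, the Laplace width $c$, and the normalization are choices of ours.

    import numpy as np

    def relu_ntk(rho, depth=3):
        """NTK of a fully-connected ReLU net for unit-norm inputs with
        inner product rho, via the standard layerwise recursion;
        normalized so that the kernel equals 1 at rho = 1."""
        kappa1 = lambda r: (np.sqrt(1 - r**2) + (np.pi - np.arccos(r)) * r) / np.pi
        kappa0 = lambda r: (np.pi - np.arccos(r)) / np.pi
        sigma, theta = rho, rho
        for _ in range(depth):
            r = np.clip(sigma, -1.0, 1.0)
            sigma = kappa1(r)
            theta = sigma + theta * kappa0(r)
        return theta / (depth + 1)

    def laplace(rho, c=1.0):
        """Laplace kernel exp(-c * ||x - y||) restricted to the unit
        sphere, where ||x - y|| = sqrt(2 - 2 * rho)."""
        return np.exp(-c * np.sqrt(np.maximum(2 - 2 * rho, 0.0)))

    # Compare the two kernels as functions of the angle on the sphere.
    for rho in [1.0, 0.9, 0.5, 0.0, -0.5]:
        print(rho, round(relu_ntk(rho), 3), round(laplace(rho), 3))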