dictionary matrix
A Universal Analysis of Large-Scale Regularized Least Squares Solutions
A problem that has been of recent interest in statistical inference, machine learning and signal processing is that of understanding the asymptotic behavior of regularized least squares solutions under random measurement matrices (or dictionaries). The Least Absolute Shrinkage and Selection Operator (LASSO or least-squares with $\ell_1$ regularization) is perhaps one of the most interesting examples. Precise expressions for the asymptotic performance of LASSO have been obtained for a number of different cases, in particular when the elements of the dictionary matrix are sampled independently from a Gaussian distribution. It has also been empirically observed that the resulting expressions remain valid when the entries of the dictionary matrix are independently sampled from certain non-Gaussian distributions. In this paper, we confirm these observations theoretically when the distribution is sub-Gaussian. We further generalize the previous expressions for a broader family of regularization functions and under milder conditions on the underlying random, possibly non-Gaussian, dictionary matrix. In particular, we establish the universality of the asymptotic statistics (e.g., the average quadratic risk) of LASSO with non-Gaussian dictionaries.
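As a concrete illustration of the experiment this abstract describes (not code from the paper), here is a minimal Python sketch comparing the average quadratic risk of LASSO under a Gaussian dictionary and a Rademacher (sub-Gaussian) dictionary; the dimensions, noise level, and regularization weight are arbitrary assumptions, and universality predicts the two risks should nearly coincide.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 200, 400, 20          # measurements, dimension, sparsity (toy choices)

# Ground-truth k-sparse signal.
x0 = np.zeros(p)
x0[rng.choice(p, k, replace=False)] = rng.normal(size=k)

def avg_risk(sample_dict, trials=50, lam=0.1):
    """Average quadratic risk ||x_hat - x0||^2 / p over random dictionaries."""
    risks = []
    for _ in range(trials):
        A = sample_dict() / np.sqrt(n)   # normalize columns to ~unit norm
        y = A @ x0 + 0.1 * rng.normal(size=n)
        # sklearn's Lasso minimizes ||y - A x||^2 / (2 n) + alpha ||x||_1
        x_hat = Lasso(alpha=lam / n, max_iter=10000).fit(A, y).coef_
        risks.append(np.sum((x_hat - x0) ** 2) / p)
    return np.mean(risks)

gauss = avg_risk(lambda: rng.normal(size=(n, p)))
rademacher = avg_risk(lambda: rng.choice([-1.0, 1.0], size=(n, p)))
print(f"Gaussian dictionary risk:   {gauss:.4f}")
print(f"Rademacher dictionary risk: {rademacher:.4f}")  # nearly identical (universality)
```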
Recovery of Coherent Data via Low-Rank Dictionary Pursuit
The recently established RPCA [4] method provides a convenient way to restore low-rank matrices from grossly corrupted observations. While elegant in theory and powerful in practice, RPCA is not an ultimate solution to the low-rank matrix recovery problem. Indeed, its performance may be imperfect even when the data are strictly low-rank, because RPCA ignores the clustering structures of the data that are ubiquitous in applications. As the number of clusters grows, the coherence of the data keeps increasing, and accordingly the recovery performance of RPCA degrades.
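For context, here is a minimal sketch of the RPCA baseline under discussion, solved by the standard inexact augmented Lagrangian scheme for Principal Component Pursuit; the default lam = 1/sqrt(max(m, n)) is the standard choice from the RPCA literature, while the step-size heuristic, tolerance, and iteration cap are assumptions.

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal Component Pursuit via inexact ALM: M ~ L (low-rank) + S (sparse)."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))        # standard RPCA choice
    mu = mu or 0.25 * m * n / np.abs(M).sum()    # common heuristic (assumption)
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        # Singular value thresholding for the low-rank update.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(sig, 1.0 / mu)) @ Vt
        # Soft thresholding for the sparse update.
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy usage: a rank-5 matrix with gross sparse corruptions.
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 50))
mask = rng.random(M.shape) < 0.1
M[mask] += 10 * rng.standard_normal(mask.sum())
L, S = rpca(M)
```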
A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features
Zeger, Emi, Wang, Yifei, Mishkin, Aaron, Ergen, Tolga, Candès, Emmanuel, Pilanci, Mert
We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in ReLU networks a fourth layer creates features that represent reflections of the training data about themselves.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
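To convey the flavor of this correspondence, here is a minimal sketch, under the assumption (consistent with known results for 2-layer ReLU networks on 1-D data) that the fixed dictionary consists of ReLU ramps with breakpoints at the training points plus linear and bias terms; the exact dictionaries in the paper depend on activation and depth and may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 30))            # 1-D training inputs
y = np.sin(3 * x) + 0.05 * rng.normal(size=30)

# Fixed dictionary: ReLU ramps kinked at each training point, in both
# orientations, plus a linear term and a bias term (assumed form).
relu = lambda z: np.maximum(z, 0.0)
Phi = np.column_stack([relu(x[:, None] - x[None, :]),    # one feature per breakpoint
                       relu(-(x[:, None] - x[None, :])),
                       x, np.ones_like(x)])

# Training the network then reduces (in these regimes) to a convex Lasso fit.
model = Lasso(alpha=1e-3, max_iter=50000).fit(Phi, y)
print("active features:", np.sum(model.coef_ != 0), "of", Phi.shape[1])
```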
Federated Knowledge Graph Completion via Latent Embedding Sharing and Tensor Factorization
Wang, Maolin, Zeng, Dun, Xu, Zenglin, Guo, Ruocheng, Zhao, Xiangyu
Knowledge graphs (KGs), which consist of triples, are inherently incomplete and typically require a completion procedure to predict missing triples. In real-world scenarios, KGs are distributed across clients, complicating completion tasks due to privacy restrictions. Many frameworks have been proposed to address the issue of federated knowledge graph completion. However, the existing frameworks, including FedE, FedR, and FKGE, have certain limitations: FedE poses a risk of information leakage, FedR's optimization efficacy diminishes when there is minimal overlap among relations, and FKGE suffers from computational costs and mode collapse issues. To address these issues, we propose Federated Latent Embedding Sharing Tensor factorization (FLEST), a novel approach that applies federated tensor factorization to KG completion. FLEST decomposes the embedding matrix and enables sharing of latent dictionary embeddings to lower privacy risks. Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between performance and privacy. FLEST expands the application of federated tensor factorization to KG completion tasks.
- Asia > China > Hong Kong (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
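A minimal sketch of the latent-embedding-sharing idea described above; the split into a shared latent dictionary D and client-private mixing weights W, the DistMult-style scorer, and all names and shapes are illustrative assumptions rather than the FLEST protocol itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ent, n_rel, k, d = 100, 10, 16, 32   # entities, relations, latent dim, embedding dim

# Shared latent dictionary D (the quantity exchanged across clients in a
# FLEST-style scheme); each client keeps its private mixing weights W, so the
# raw entity embeddings E = W @ D never leave the client (all hypothetical).
D = rng.normal(scale=0.1, size=(k, d))          # shared latent dictionary
W = rng.normal(scale=0.1, size=(n_ent, k))      # client-private entity weights
R = rng.normal(scale=0.1, size=(n_rel, d))      # relation embeddings

def score(h, r, t):
    """DistMult-style trilinear score of a triple (head, relation, tail)."""
    E = W @ D                                    # reconstruct embeddings locally
    return float(np.sum(E[h] * R[r] * E[t]))

print(score(0, 3, 7))
```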
Semiparametric Latent Topic Modeling on Consumer-Generated Corpora
Dayta, Dominic B., Barrios, Erniel B.
The fields of natural language processing and information retrieval saw a productive past two decades, due largely to the emergence and worldwide adoption of two modern technologies: large-scale document indexing and storage facilities, of which perhaps the two most prominent brands are JSTOR and Google Books, and social networking sites that allow individual users to create and distribute various types of content, a considerable fraction of which exists in the form of text (status updates, blog posts, and tweets). All of this has led to relentless growth, in volume, velocity, and frequency, of information-rich but unstructured collections of text data, referred to as corpora in natural language terminology, such that manual approaches to document indexing and classification are quickly becoming obsolete. Outside the context of online archives, methods that enable automated classification and analysis of voluminous corpora would prove to be valuable technology. Such methods have been applied to legal research [Ravikumar and Raghuveer, 2012] and to analyzing patterns behind railroad accidents [Williams and Betak, 2018]. In the commercial space, companies can take advantage of the thousands of posts contributed daily by users about their products and services on social media and on review aggregator websites like Yelp and TripAdvisor.
- Asia > Philippines > Luzon > National Capital Region > City of Quezon (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Chad > Salamat (0.04)
Function Approximation via Sparse Random Features
Hashemi, Abolfazl, Schaeffer, Hayden, Shi, Robert, Topcu, Ufuk, Tran, Giang, Ward, Rachel
Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. We show that the sparse random feature method outperforms shallow networks for well-structured functions and applications to scientific machine learning tasks.
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
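A minimal sketch of the sparse random feature idea: draw random Fourier features, then fit a sparse coefficient vector; here scikit-learn's Lasso stands in for the compressive-sensing recovery step, and the target function, feature scale, and regularization weight are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
m, d, N = 300, 2, 2000            # samples, input dim, random features

# Target depends on only one coordinate: a structured case the bounds favor.
X = rng.uniform(-1, 1, size=(m, d))
y = np.sin(4 * X[:, 0])

# Random Fourier features: phi_j(x) = cos(<w_j, x> + b_j), w_j ~ N(0, s^2 I).
W = rng.normal(scale=4.0, size=(N, d))
b = rng.uniform(0, 2 * np.pi, size=N)
Phi = np.cos(X @ W.T + b)

# Sparse recovery of the coefficients: Lasso as a proxy for basis pursuit.
model = Lasso(alpha=1e-3, max_iter=50000).fit(Phi, y)
print("nonzero feature weights:", np.sum(model.coef_ != 0), "of", N)
```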
Metalearning: Sparse Variable-Structure Automata
Fekri, Pedram, Safavi, Ali Akbar, Zadeh, Mehrdad Hosseini, Setoodeh, Peyman
The dimension of the encoder output (i.e., the code layer) in an autoencoder is a key hyperparameter for representing the input data in a proper space, and it must be carefully selected to guarantee the desired reconstruction accuracy. Although an overcomplete representation can address this dimension issue, the computational complexity increases with dimension. Inspired by non-parametric methods, we propose a metalearning approach that increases the number of basis vectors used in dynamic sparse coding on the fly. An actor-critic algorithm is deployed to automatically choose an appropriate dimension for the feature vectors given the required level of accuracy. The proposed method builds on online dictionary learning and uses the fast iterative shrinkage-thresholding algorithm (FISTA) as the optimizer in the inference phase. It aims to choose the minimum number of bases for the overcomplete representation subject to the reconstruction error threshold. This method allows for online control of both the representation dimension and the reconstruction error in a dynamic framework.
- North America > United States > Michigan > Genesee County > Flint (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Iran > Fars Province > Shiraz (0.04)
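Since the abstract names FISTA as the inference-phase optimizer, here is a minimal, self-contained FISTA sketch for the sparse-coding subproblem min_a 0.5*||y - D a||^2 + lam*||a||_1; the dictionary size, sparsity level, and iteration count are toy assumptions.

```python
import numpy as np

def fista(D, y, lam, n_iter=200):
    """FISTA for sparse coding: min_a 0.5 ||y - D a||^2 + lam ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = z = np.zeros(D.shape[1]); t = 1.0
    shrink = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s, 0.0)
    for _ in range(n_iter):
        a_next = shrink(z - (D.T @ (D @ z - y)) / L, lam / L)  # proximal gradient step
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = a_next + ((t - 1) / t_next) * (a_next - a)          # momentum step
        a, t = a_next, t_next
    return a

# Toy usage: overcomplete dictionary, recover a 5-sparse code.
rng = np.random.default_rng(4)
D = rng.normal(size=(32, 128)); D /= np.linalg.norm(D, axis=0)
a0 = np.zeros(128); a0[rng.choice(128, 5, replace=False)] = 1.0
code = fista(D, D @ a0, lam=0.05)
print("recovered support size:", np.sum(np.abs(code) > 1e-3))
```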
Signal classification using weighted orthogonal regression method
In this paper, a new classifier based on the intrinsic properties of the data is proposed. Classification is an essential task in data mining applications, and it becomes challenging when the training set is small relative to the dimension of the problem. This paper proposes a new classification method that exploits the intrinsic structure of each class through the corresponding eigencomponents. Each component contributes to the learned span of its class with a specific weight, determined by the associated eigenvalue. This approach results in reliable learning that is robust when training data are limited. The proposed method uses the eigenvectors obtained from the SVD of the data of each class to select the bases for each subspace, and it applies an efficient weighting in the decision-making criterion to discriminate between two classes. In addition to high performance on artificial data, the method has improved upon the best result of an international competition.
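A minimal sketch of the kind of classifier the abstract describes: per-class SVD to obtain subspace bases, with projection energies weighted by the associated eigenvalues; the exact weighting and decision rule in the paper may differ, and all parameters here are illustrative.

```python
import numpy as np

def fit_class_subspaces(classes, r=5):
    """Per-class SVD: keep top-r singular vectors and eigenvalue weights."""
    models = []
    for Xc in classes:                      # Xc: (n_samples, n_features) per class
        U, s, Vt = np.linalg.svd(Xc - Xc.mean(0), full_matrices=False)
        models.append((Vt[:r], s[:r] ** 2, Xc.mean(0)))  # bases, weights, mean
    return models

def classify(x, models):
    """Score each class by eigenvalue-weighted projection onto its subspace."""
    scores = []
    for V, w, mu in models:
        proj = V @ (x - mu)                 # coordinates in the class eigenbasis
        scores.append(np.sum((w / w.sum()) * proj ** 2))  # weighted energy captured
    return int(np.argmax(scores))

# Toy usage: two Gaussian classes in 10-D with few training samples.
rng = np.random.default_rng(5)
classes = [rng.normal(size=(40, 10)), rng.normal(loc=2.0, size=(40, 10))]
models = fit_class_subspaces(classes, r=3)
print(classify(rng.normal(loc=2.0, size=10), models))     # expect class 1
```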