Tensor Network Models
Lower and Upper Bounds on the Pseudo-Dimension of Tensor Network Models
Tensor network methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the machine learning community for their ability to compactly represent very high-dimensional objects. Tensor network methods can, for example, be used to efficiently learn linear models in exponentially large feature spaces [Stoudenmire and Schwab, 2016]. In this work, we derive upper and lower bounds on the VC dimension and pseudo-dimension of a large class of tensor network models for classification, regression and completion. Our upper bounds hold for linear models parameterized by arbitrary tensor network structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low-rank matrices as well as linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC dimension of the matrix product state classifier introduced in [Stoudenmire and Schwab, 2016] as a function of the so-called bond dimension (i.e., the tensor train rank).
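Loosely speaking, bounds of this kind track the number of free parameters of the tensor network, up to logarithmic factors (a reading we assume here for illustration), and that count differs sharply across decompositions. A minimal sketch of the parameter counts (our own illustration, not code from the paper; the formulas assume a uniform mode size n and a uniform rank/bond dimension r):

```python
# Parameter counts for an order-d tensor with mode size n and rank r,
# compared with the n**d entries of the dense tensor.

def cp_params(d, n, r):      # CP: d factor matrices of shape (n, r)
    return d * n * r

def tt_params(d, n, r):      # Tensor Train / MPS: boundary cores (n, r), inner cores (r, n, r)
    return 2 * n * r + (d - 2) * n * r * r

def tucker_params(d, n, r):  # Tucker: core tensor with r**d entries plus d factor matrices (n, r)
    return r ** d + d * n * r

for d, n, r in [(10, 4, 2), (16, 4, 2)]:
    print(f"d={d} n={n} r={r}: CP={cp_params(d, n, r)}, TT={tt_params(d, n, r)}, "
          f"Tucker={tucker_params(d, n, r)}, dense={n ** d}")
```

The gap between these counts and the dense count n**d is what makes compact tensor network parameterizations, and hence parameter-based capacity bounds, informative.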
Entangling Machine Learning with Quantum Tensor Networks
van der Poel, Constantijn, Zhao, Dan
This paper examines the use of tensor networks, which can efficiently represent high-dimensional quantum states, in language modeling. It is a distillation and continuation of the work done in (van der Poel, 2023). To do so, we abstract the problem down to modeling Motzkin spin chains, which exhibit long-range correlations reminiscent of those found in language. The Matrix Product State (MPS), also known as the tensor train, has a bond dimension which scales with the length of the sequence it models. To combat this, we use the factored core MPS, whose bond dimension scales sub-linearly. We find that the tensor models reach near-perfect classification accuracy, and maintain a stable level of performance as the number of valid training examples is decreased.
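As a rough illustration of where the bond dimension enters, here is a minimal sketch of scoring a sequence with a plain MPS (our own toy example, not the authors' factored core model; the cores, boundary vectors, and example sequence are hypothetical random stand-ins). Motzkin chains use a 3-symbol alphabet (up, flat, down steps), and both the cost and the expressivity of the model are controlled by `bond`:

```python
import numpy as np

L, alphabet, bond = 12, 3, 8          # sequence length, symbol count, bond dimension
rng = np.random.default_rng(0)
cores = [rng.normal(size=(alphabet, bond, bond)) / np.sqrt(bond) for _ in range(L)]
left = np.ones(bond) / np.sqrt(bond)  # boundary vectors
right = np.ones(bond) / np.sqrt(bond)

def mps_score(seq):
    """Contract the left boundary through the symbol-selected core matrices."""
    v = left
    for core, s in zip(cores, seq):
        v = v @ core[s]               # (bond,) @ (bond, bond) -> (bond,)
    return v @ right

print(mps_score([0, 1, 2, 0, 1, 0, 2, 1, 0, 0, 1, 2]))
```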
Interaction Decompositions for Tensor Network Regression
Convy, Ian, Whaley, K. Birgitta
Tensor network regression has emerged as a promising and active area of machine learning research, having achieved impressive results on common benchmark tasks such as the MovieLens 100K [1], MNIST [2][3][4][5], and Fashion MNIST [3][4][5] datasets. The effectiveness of these models can be attributed to the tensor-product transformation that is applied to the data features, which maps the original feature vector into an exponentially large vector space. By performing linear operations on this expanded feature space, tensor network models are able to generate regression outputs that are highly non-linear functions of the original features. In most tensor network models, the tensor-product transformation is constructed from a set of vector-valued functions that each act on only a single data feature. The form of these functions is important to the operation of the model, as it determines how regression on the transformed space is related to regression on the original feature space. Conventional wisdom regarding the choice of these functions can be traced back to the parallel works of Stoudenmire and Schwab [2] and Novikov et al. [1], who each proposed a different transformation scheme.
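For concreteness, a small sketch of the two schemes referenced above: the trigonometric local map used by Stoudenmire and Schwab [2] and the polynomial map [1, x] of Novikov et al. [1], together with the tensor-product transformation they induce. This is a toy illustration; the full feature vector has length 2^d, so we only materialize it for tiny d:

```python
import numpy as np
from functools import reduce

def phi_trig(x):   # Stoudenmire & Schwab [2]: [cos(pi*x/2), sin(pi*x/2)]
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def phi_poly(x):   # Novikov et al. [1]: [1, x]
    return np.array([1.0, x])

def tensor_product_features(x, phi):
    # Kronecker product of the d local maps -> vector of length 2**d
    return reduce(np.kron, [phi(xi) for xi in x])

x = np.array([0.2, 0.7, 0.5])
print(tensor_product_features(x, phi_trig).shape)   # (8,)
print(tensor_product_features(x, phi_poly))         # all monomials of x
```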
The Bayesian Method of Tensor Networks
Bayesian learning is a powerful framework which combines external information about the data (background knowledge) with internal information (training data) in a logically consistent way for inference and prediction. By Bayes' rule, the external information (the prior distribution) and the internal information (the training-data likelihood) are combined coherently, and the posterior distribution and the posterior predictive (marginal) distribution obtained from Bayes' rule summarize the total information needed for inference and prediction, respectively. In this paper, we study the Bayesian framework of the Tensor Network from two perspectives. First, we place a prior distribution over the weights of the Tensor Network and predict the labels of new observations with the posterior predictive (marginal) distribution. Because the parameter integral in the normalization constant is intractable, we approximate the posterior predictive distribution by the Laplace approximation and obtain the outer-product approximation of the Hessian matrix of the posterior distribution of the Tensor Network model. Second, to estimate the parameters of the stationary mode, we propose a stable initialization trick which accelerates inference, letting the Tensor Network converge to the stationary path more efficiently and stably under gradient descent. We verify our approach on the MNIST, Phishing Website and Breast Cancer data sets. We study the Bayesian properties of the Bayesian Tensor Network by visualizing the model parameters and decision boundaries on a two-dimensional synthetic data set. In practical terms, our method reduces overfitting and improves the performance of the standard Tensor Network model.
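A minimal sketch of the two ingredients named above, the Laplace approximation and the outer-product approximation of the Hessian, using plain logistic regression on synthetic data as a stand-in for the Tensor Network weights (our illustration, not the paper's model or code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(size=200) > 0).astype(float)

w = np.zeros(5)
alpha = 1.0                                    # Gaussian prior precision
for _ in range(200):                           # MAP estimate by gradient descent
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * (X.T @ (p - y) + alpha * w) / len(y)

# Outer-product (generalized Gauss-Newton) approximation of the Hessian
# of the negative log posterior at the MAP estimate:
p = 1 / (1 + np.exp(-X @ w))
H = (X * (p * (1 - p))[:, None]).T @ X + alpha * np.eye(5)

def predictive(x_new):
    """Laplace posterior predictive via the standard probit-style approximation."""
    mu = x_new @ w
    var = x_new @ np.linalg.solve(H, x_new)    # predictive variance under Laplace
    return 1 / (1 + np.exp(-mu / np.sqrt(1 + np.pi * var / 8)))

print(predictive(X[0]))
```

The predictive variance term is what distinguishes this from a point estimate: inputs far from the training data get pushed toward probability 0.5, which is the mechanism by which the Bayesian treatment curbs overfitting.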
Entanglement and Tensor Networks for Supervised Image Classification
Martyn, John, Vidal, Guifre, Roberts, Chase, Leichenauer, Stefan
Tensor networks, originally designed to address computational problems in quantum many-body physics, have recently been applied to machine learning tasks. However, compared to quantum physics, where the reasons for the success of tensor network approaches over the last 30 years are well understood, very little is yet known about why these techniques work for machine learning. The goal of this paper is to investigate entanglement properties of tensor network models in a current machine learning application, in order to uncover general principles that may guide future developments. We revisit the use of tensor networks for supervised image classification using the MNIST data set of handwritten digits, as pioneered by Stoudenmire and Schwab [Adv. in Neur. Inform. Proc. Sys. 29, 4799 (2016)]. Firstly, we hypothesize about which state the tensor network might be learning during training. For that purpose, we propose a plausible candidate state $|\Sigma_{\ell}\rangle$ (built as a superposition of product states corresponding to images in the training set) and investigate its entanglement properties. We conclude that $|\Sigma_{\ell}\rangle$ is so robustly entangled that it cannot be approximated by the tensor network used in that work, which must therefore be representing a very different state. Secondly, we use tensor networks with a block product structure, in which entanglement is restricted within small blocks of $n \times n$ pixels/qubits. We find that these states are extremely expressive (e.g. training accuracy of $99.97 \%$ already for $n=2$), suggesting that long-range entanglement may not be essential for image classification. However, in our current implementation, optimization leads to over-fitting, resulting in test accuracies that are not competitive with other current approaches.
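The entanglement analysis above rests on a standard computation: the entanglement entropy across a bipartition, read off from the Schmidt (singular value) spectrum of the reshaped state vector. A minimal sketch on toy four-qubit states (our own illustration, not the paper's code):

```python
import numpy as np

def entanglement_entropy(psi, n_left, n_right):
    """Von Neumann entropy (in bits) of the left/right bipartition of psi."""
    M = psi.reshape(2 ** n_left, 2 ** n_right)
    s = np.linalg.svd(M, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)               # Schmidt spectrum
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

# |0000> is a product state; (|0000> + |1111>)/sqrt(2) is GHZ-like.
prod = np.zeros(16); prod[0] = 1.0
ghz = np.zeros(16); ghz[0] = ghz[-1] = 1 / np.sqrt(2)
print(entanglement_entropy(prod, 2, 2))  # ~0: no entanglement
print(entanglement_entropy(ghz, 2, 2))   # 1.0: one maximally entangled bit
```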
Tensor network approaches for learning non-linear dynamical laws
Goeßmann, A., Götte, M., Roth, I., Sweke, R., Kutyniok, G., Eisert, J.
Given observations of a physical system, identifying the underlying non-linear governing equation is a fundamental task, necessary both for gaining understanding and generating deterministic future predictions. Of most practical relevance are automated approaches to theory building that scale efficiently for complex systems with many degrees of freedom. To date, available scalable methods aim at a data-driven interpolation, without exploiting or offering insight into fundamental underlying physical principles, such as locality of interactions. In this work, we show that various physical constraints can be captured via tensor network based parameterizations for the governing equation, which naturally ensures scalability. In addition to providing analytic results motivating the use of such models for realistic physical systems, we demonstrate that efficient rank-adaptive optimization algorithms can be used to learn optimal tensor network models without requiring a priori knowledge of the exact tensor ranks. As such, we provide a physics-informed approach to recovering structured dynamical laws from data, which adaptively balances the need for expressivity and scalability.
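Rank-adaptive optimization of the kind mentioned above relies on a standard primitive: truncating a factorization to the smallest rank that meets an error tolerance, so ranks need not be fixed in advance. A minimal sketch of that primitive (our illustration, with an assumed relative Frobenius-norm criterion; not the authors' algorithm):

```python
import numpy as np

def adapt_rank(M, tol=1e-3):
    """Truncate the SVD of M to the smallest rank with relative error <= tol."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[r] = ||s[r:]||_2
    tail = np.append(tail, 0.0)                     # keeping everything -> no error
    rank = max(1, int(np.argmax(tail <= tol * tail[0])))
    return U[:, :rank] * s[:rank], Vt[:rank]        # M ~= A @ B at the adapted rank

# A nearly rank-2 matrix is compressed to rank 2 automatically:
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40)) + 1e-6 * rng.normal(size=(50, 40))
A, B = adapt_rank(M)
print(A.shape[1], np.linalg.norm(M - A @ B) / np.linalg.norm(M))
```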