Tarokh, Vahid
Learning Partial Differential Equations from Data Using Neural Networks
Hasan, Ali, Pereira, João M., Ravier, Robert, Farsiu, Sina, Tarokh, Vahid
We develop a framework for estimating unknown partial differential equations (PDEs) from noisy data using a deep learning approach. Given noisy samples of a solution to an unknown PDE, our method interpolates the samples using a neural network and extracts the PDE by equating derivatives of the neural network approximation. Our method applies to PDEs that are linear combinations of user-defined dictionary functions, and it generalizes previous methods that only consider parabolic PDEs. We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. We validate our method on simulated data generated from known PDEs with added Gaussian noise, and we study its behavior under different noise levels. We also compare the error of our method with a Cramér-Rao lower bound for an ordinary differential equation. Our results indicate that our method outperforms other methods in estimating PDEs, especially in the low signal-to-noise regime.
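As a rough illustration of this dictionary-based recipe, the sketch below fits a small network to noisy samples of a solution and regresses its time derivative onto a dictionary of candidate terms; the network size, the dictionary, and the penalty weight `lam` are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact training scheme): fit a small network to
# noisy samples of u(x, t), then recover PDE coefficients by regressing u_t onto
# a user-defined dictionary of derivative terms.
import torch

torch.manual_seed(0)

# Noisy observations of a solution u(x, t) on a grid (placeholder data).
x = torch.linspace(0.0, 1.0, 50)
t = torch.linspace(0.0, 1.0, 20)
X, T = torch.meshgrid(x, t, indexing="ij")
xt = torch.stack([X.flatten(), T.flatten()], dim=1)
u_obs = torch.sin(torch.pi * X).flatten() * torch.exp(-T).flatten()
u_obs = u_obs + 0.01 * torch.randn_like(u_obs)

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def derivatives(inp):
    inp = inp.clone().requires_grad_(True)
    u = net(inp).squeeze(-1)
    grads = torch.autograd.grad(u.sum(), inp, create_graph=True)[0]
    u_x, u_t = grads[:, 0], grads[:, 1]
    u_xx = torch.autograd.grad(u_x.sum(), inp, create_graph=True)[0][:, 0]
    return u, u_x, u_xx, u_t

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 1.0  # weight of the PDE-consistency penalty (assumed value)
for step in range(2000):
    u, u_x, u_xx, u_t = derivatives(xt)
    # Dictionary of candidate terms; the unknown PDE is modeled as a linear
    # combination of these columns.
    Theta = torch.stack([u, u_x, u_xx, u * u_x], dim=1)
    # Least-squares PDE coefficients for the current network approximation.
    xi = torch.linalg.lstsq(Theta.detach(), u_t.detach().unsqueeze(1)).solution
    data_loss = ((u - u_obs) ** 2).mean()
    pde_loss = ((u_t - Theta @ xi.squeeze(1)) ** 2).mean()
    opt.zero_grad()
    (data_loss + lam * pde_loss).backward()
    opt.step()

print("estimated PDE coefficients:", xi.squeeze().tolist())
```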
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Zhou, Yi, Yang, Junjie, Zhang, Huishuai, Liang, Yingbin, Tarokh, Vahid
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although long considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.
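The kind of empirical check described above can be sketched as follows on a toy nonconvex problem; the loss, the minibatch scheme, and the use of the final iterate as a surrogate global minimizer are illustrative assumptions rather than the paper's exact experimental protocol.

```python
# Rough empirical check of a star-convexity condition along an SGD path.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.normal(size=(n, d))
y = A @ rng.normal(size=d)                 # realizable, so (near) zero loss is attainable

def loss_and_grad(w, idx):
    u, v = w[:d], w[d:]
    r = A[idx] @ (u * v) - y[idx]          # diagonal two-layer model: nonconvex in (u, v)
    loss = 0.5 * np.mean(r ** 2)
    g_core = A[idx].T @ r / len(idx)
    return loss, np.concatenate([g_core * v, g_core * u])

w = rng.normal(size=2 * d)
path, grads, batches = [], [], []
for k in range(5000):
    idx = rng.choice(n, size=16, replace=False)
    _, g = loss_and_grad(w, idx)
    path.append(w.copy()); grads.append(g); batches.append(idx)
    w = w - 0.02 * g

w_star = w.copy()   # the final, low-loss iterate as a stand-in for a global minimizer
violations = 0
for w_k, g_k, idx in zip(path, grads, batches):
    f_k, _ = loss_and_grad(w_k, idx)
    f_star, _ = loss_and_grad(w_star, idx)
    # One common form of the star-convex path condition:
    #   f_xi(x*) >= f_xi(x_k) + <grad f_xi(x_k), x* - x_k>
    if f_star < f_k + g_k @ (w_star - w_k) - 1e-8:
        violations += 1
print(f"star-convexity violated on {violations} of {len(path)} recorded steps")
```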
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Shahrampour, Shahin, Tarokh, Vahid
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features that lead to satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated with the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features that can scale poly-logarithmically with the data. Our empirical results show that, for a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost compared to the state of the art in data-dependent random features.
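A generic sketch of this greedy pipeline is given below: candidate random Fourier features are drawn from several kernels and selected by correlation with the current residual. The pool construction, the residual-correlation rule, and the ridge refit are stand-ins for the paper's exact criterion.

```python
# Greedy selection of explicit (random Fourier) features drawn from multiple kernels.
import numpy as np

rng = np.random.default_rng(0)
n, d, pool_size, k_select = 300, 5, 500, 30

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Candidate features from Gaussian kernels with several bandwidths.
bandwidths = [0.5, 1.0, 2.0]
W = np.concatenate([rng.normal(scale=1.0 / s, size=(pool_size // len(bandwidths), d))
                    for s in bandwidths])
b = rng.uniform(0, 2 * np.pi, size=len(W))
Phi = np.sqrt(2.0 / len(W)) * np.cos(X @ W.T + b)   # n x pool candidate features

selected, residual = [], y.copy()
for _ in range(k_select):
    corr = np.abs(Phi.T @ residual)      # correlation of each candidate with the residual
    corr[selected] = -np.inf             # do not reselect
    selected.append(int(np.argmax(corr)))
    # Refit on the selected features (ridge) and update the residual.
    S = Phi[:, selected]
    coef = np.linalg.solve(S.T @ S + 1e-6 * np.eye(len(selected)), S.T @ y)
    residual = y - S @ coef

print("selected feature indices:", selected[:10], "...")
print("training MSE with", k_select, "features:", float(np.mean(residual ** 2)))
```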
SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization
Wang, Zhe, Ji, Kaiyi, Zhou, Yi, Liang, Yingbin, Tarokh, Vahid
There has been extensive research on developing stochastic variance-reduced methods to solve large-scale optimization problems. More recently, a novel algorithm of this type named SPIDER was developed in \cite{Fang2018} and shown to outperform existing algorithms of the same type and to meet the lower bound in certain regimes. Though interesting in theory, SPIDER requires an $\epsilon$-level stepsize to guarantee convergence, and consequently runs slowly in practice. This paper proposes SpiderBoost as an improved SPIDER scheme, which comes with two major advantages over SPIDER. First, it allows a much larger stepsize without sacrificing the convergence rate, and hence runs substantially faster than SPIDER in practice. Second, it extends much more easily to proximal algorithms with guaranteed convergence for solving composite optimization problems, which appears challenging for SPIDER due to the stringent requirement on the per-iteration increment needed to guarantee its convergence. Both advantages can be attributed to the new convergence analysis we develop for SpiderBoost, which allows much more flexibility in choosing algorithm parameters. As a further generalization of SpiderBoost, we show that proximal SpiderBoost achieves a stochastic first-order oracle (SFO) complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-1},\epsilon^{-3/2}\})$ for composite optimization, which improves the existing best results by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/6}\})$.
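The core recursive estimator with a constant stepsize can be sketched as follows; the toy problem, batch sizes, epoch length, and stepsize value are illustrative placeholders rather than the parameter choices analyzed in the paper.

```python
# Minimal numpy sketch of the SPIDER-style variance-reduced gradient estimator with
# the constant stepsize that SpiderBoost permits.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
y = (A @ rng.normal(size=d) > 0).astype(float)

def grad(w, idx):
    # Gradient of a (nonconvex) squared sigmoid loss on a minibatch.
    z = A[idx] @ w
    p = 1.0 / (1.0 + np.exp(-z))
    return A[idx].T @ ((p - y[idx]) * p * (1 - p)) / len(idx)

w = np.zeros(d)
w_prev = w.copy()
eta, q, batch = 0.5, 50, 32            # constant stepsize, epoch length, minibatch size
v = np.zeros(d)
for k in range(2000):
    if k % q == 0:
        v = grad(w, np.arange(n))      # periodic full-gradient refresh
    else:
        idx = rng.choice(n, size=batch, replace=False)
        v = grad(w, idx) - grad(w_prev, idx) + v   # recursive variance-reduced estimator
    w_prev = w.copy()
    w = w - eta * v                    # SpiderBoost: fixed stepsize, not epsilon-level

print("final gradient norm:", float(np.linalg.norm(grad(w, np.arange(n)))))
```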
Model Selection Techniques -- An Overview
Ding, Jie, Tarokh, Vahid, Yang, Yuhong
In the era of "big data", analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on the theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection. Vast developments in hardware storage, precision instrument manufacturing, economic globalization, etc. have generated huge volumes of data that can be analyzed to extract useful information. Typical statistical inference or machine learning procedures learn from and make predictions on data by fitting parametric or nonparametric models (in a broad sense). However, there exists no model that is universally suitable for any data and goal. Therefore, a crucial step in a typical data analysis is to consider a set of candidate models (referred to as the model class), and then select the most appropriate one. In other words, model selection is the task of selecting a statistical model from a model class, given a set of data. There have been many overview papers on model selection scattered across the communities of signal processing [1], statistics [2], machine learning [3], epidemiology [4], chemometrics [5], and ecology and evolution [6]. Despite the abundant literature on model selection, existing overviews usually focus on derivations, descriptions, or applications of particular model selection principles.
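As a concrete, self-contained illustration of selecting from a model class (not tied to any particular result surveyed in the article), the snippet below ranks nested polynomial regression models with two classical information criteria, AIC and BIC; the data and model class are invented for the example.

```python
# Ranking a small class of nested polynomial regression models with AIC and BIC.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + 0.3 * rng.normal(size=n)   # true order: 2

def fit_and_score(order):
    X = np.vander(x, N=order + 1, increasing=True)             # [1, x, ..., x^order]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = order + 2                                               # coefficients + noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)       # Gaussian profile likelihood
    aic = 2 * k - 2 * loglik
    bic = np.log(n) * k - 2 * loglik
    return aic, bic

for order in range(6):
    aic, bic = fit_and_score(order)
    print(f"polynomial order {order}: AIC={aic:8.2f}  BIC={bic:8.2f}")
# The member of the model class minimizing the chosen criterion is selected.
```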
Stationary Geometric Graphical Model Selection
Soloveychik, Ilya, Tarokh, Vahid
We consider the problem of model selection in Gaussian Markov fields in the sample-deficient scenario. In many cases, the underlying networks are embedded into Euclidean spaces, which induces significant structure on them. Using this natural spatial structure, we introduce the notion of spatially stationary distributions over geometric graphs, directly generalizing the notion of stationary time series to a multidimensional setting that lacks a time axis. We show that the idea of spatial stationarity leads to a dramatic decrease in the sample complexity of model selection compared to abstract graphs with the same level of sparsity. For geometric graphs on randomly spread vertices with edges of bounded length, we develop tight information-theoretic bounds on the sample complexity and show that a finite number of independent samples is sufficient for consistent recovery. Finally, we develop an efficient technique capable of reliably and consistently reconstructing such graphs from a bounded number of measurements.
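The problem setting can be sketched as follows: a Gaussian Markov field on a random geometric graph with bounded-length edges, sampled fewer times than there are vertices, followed by a generic recovery baseline. The paper's stationarity-exploiting estimator is not reproduced here; all numbers below are illustrative.

```python
# Setup sketch: Gaussian Markov field on a random geometric graph, sample-deficient regime.
import numpy as np

rng = np.random.default_rng(0)
p, radius, n_samples = 60, 0.2, 30      # vertices, edge-length bound, samples (n < p)

# Random geometric graph: vertices spread uniformly in the unit square,
# edges only between points closer than `radius`.
pts = rng.uniform(size=(p, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
adj = (dist < radius) & ~np.eye(p, dtype=bool)

# Spatially "stationary" coupling: one common weight on every edge,
# diagonally dominant so the precision matrix is positive definite.
Theta = np.eye(p) * (adj.sum(axis=1).max() * 0.3 + 1.0) - 0.3 * adj
samples = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n_samples)

# Generic baseline (not the paper's method): regularized inverse covariance, thresholded.
emp_cov = samples.T @ samples / n_samples
Theta_hat = np.linalg.inv(emp_cov + 0.5 * np.eye(p))
adj_hat = np.abs(Theta_hat - np.diag(np.diag(Theta_hat))) > 0.1

print(f"true edges: {adj.sum() // 2}, "
      f"recovered true positives: {int(np.sum(adj_hat & adj)) // 2}, "
      f"false positives: {int(np.sum(adj_hat & ~adj)) // 2}")
```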
Region Detection in Markov Random Fields: Gaussian Case
Soloveychik, Ilya, Tarokh, Vahid
In this work we consider the problem of model selection in Gaussian Markov fields in the sample-deficient scenario. The benchmark information-theoretic results in the case of d-regular graphs require the number of samples to be at least proportional to the logarithm of the number of vertices to allow consistent graph recovery. When the number of samples is less than this amount, reliable detection of all edges is impossible. In many applications, it is more important to learn the distribution of the edge (coupling) parameters over the network than the specific locations of the edges. Assuming that the entire graph can be partitioned into a number of spatial regions with similar edge parameters and reasonably regular boundaries, we develop new information-theoretic sample complexity bounds and show that even a bounded number of samples can be enough to consistently recover these regions. We also introduce and analyze an efficient region-growing algorithm capable of recovering the regions with high accuracy. We show that it is consistent and demonstrate its performance benefits in synthetic simulations.
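A schematic region-growing pass over noisy local parameter estimates is sketched below; the grid layout, the two-region ground truth, and the tolerance rule are illustrative stand-ins for the algorithm analyzed in the paper.

```python
# Generic region growing over a grid of noisy edge-parameter estimates
# (noise stands in for finite-sample estimation error).
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
m = 20                                        # grid side; vertices are (i, j)
true_param = np.full((m, m), 0.6)
true_param[: m // 2] = 0.2                    # two spatial regions with different couplings
est_param = true_param + 0.05 * rng.normal(size=(m, m))   # noisy local estimates

labels = -np.ones((m, m), dtype=int)
tol, region = 0.15, 0
for si in range(m):
    for sj in range(m):
        if labels[si, sj] != -1:
            continue
        # Grow a new region from this unlabeled seed vertex, absorbing neighbors
        # whose local estimate is close to the region's running average.
        labels[si, sj] = region
        queue, total, count = deque([(si, sj)]), est_param[si, sj], 1
        while queue:
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= ni < m and 0 <= nj < m and labels[ni, nj] == -1 \
                        and abs(est_param[ni, nj] - total / count) < tol:
                    labels[ni, nj] = region
                    total += est_param[ni, nj]
                    count += 1
                    queue.append((ni, nj))
        region += 1

print("number of recovered regions:", region)
print("region sizes:", np.bincount(labels.ravel()))
```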
On Data-Dependent Random Features for Improved Generalization in Supervised Learning
Shahrampour, Shahin (Harvard University), Beirami, Ahmad (Harvard University), Tarokh, Vahid (Harvard University)
The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm, based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function, with high probability, recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error compared to the state of the art, while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters.
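A simplified data-dependent random-feature pipeline in this spirit is sketched below; the alignment score, pool size, and ridge fit are assumptions for illustration and differ in detail from the EERF score and sampling scheme.

```python
# Schematic data-dependent random features: score a candidate pool by empirical
# label alignment, keep the highest-scoring features, and fit a linear model.
import numpy as np

rng = np.random.default_rng(0)
n, d, pool, keep = 400, 5, 2000, 100

X = rng.normal(size=(n, d))
y = np.sign(np.sin(X[:, 0]) + X[:, 1])                       # toy labels in {-1, +1}

W = rng.normal(size=(pool, d))                               # base (Gaussian kernel) frequencies
b = rng.uniform(0, 2 * np.pi, size=pool)
Phi = np.cos(X @ W.T + b)                                    # n x pool candidate features

score = np.abs(Phi.T @ y) / n                                # empirical label alignment per feature
top = np.argsort(score)[-keep:]                              # exploit the highest-scoring features

Z = Phi[:, top] * np.sqrt(2.0 / keep)
coef = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(keep), Z.T @ y)   # ridge fit on kept features
pred = np.sign(Z @ coef)
print("training accuracy with", keep, "selected features:", float(np.mean(pred == y)))
```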