AITopics | Mathematical & Statistical Methods

Collaborating Authors

Mathematical & Statistical Methods

News Overviews Instructional Materials AI-Alerts Classics

Efficient Learning in Non-Stationary Linear Markov Decision Processes

arXiv.org Machine LearningOct-24-2020

We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov Decision Processes (MDPs), i.e, both the reward and transition kernel are linear with respect to a given feature map and are allowed to evolve either slowly or abruptly over time. For this problem setting, we propose OPT-WLSVI an optimistic model-free algorithm based on weighted least squares value iteration which uses exponential weights to smoothly forget data that are far in the past. We show that our algorithm, when competing against the best policy at each time, achieves a regret that is upped bounded by $\widetilde{\mathcal{O}}(d^{7/6}H^2 \Delta^{1/3} K^{2/3})$ where $d$ is the dimension of the feature space, $H$ is the planning horizon, $K$ is the number of episodes and $\Delta$ is a suitable measure of non-stationarity of the MDP. This is the first regret bound for non-stationary reinforcement learning with linear function approximation.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2010.1287

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Convergence and Stability of Graph Convolutional Networks on Large Random Graphs

Keriven, Nicolas, Bietti, Alberto, Vaiter, Samuel

arXiv.org Machine LearningOct-23-2020

We study properties of Graph Convolutional Networks (GCNs) by analyzing their behavior on standard models of random graphs, where nodes are represented by random latent variables and edges are drawn according to a similarity kernel. This allows us to overcome the difficulties of dealing with discrete notions such as isomorphisms on very large graphs, by considering instead more natural geometric aspects. We first study the convergence of GCNs to their continuous counterpart as the number of nodes grows. Our results are fully non-asymptotic and are valid for relatively sparse graphs with an average degree that grows logarithmically with the number of nodes. We then analyze the stability of GCNs to small deformations of the random graph model. In contrast to previous studies of stability in discrete settings, our continuous setup allows us to provide more intuitive deformation-based metrics for understanding stability, which have proven useful for explaining the success of convolutional representations on Euclidean domains.

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Machine Learning

2006.01868

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Kernel Smoothing, Mean Shift, and Their Learning Theory with Directional Data

Zhang, Yikun, Chen, Yen-Chi

arXiv.org Machine LearningOct-22-2020

Directional data consist of observations distributed on a (hyper)sphere, and appear in many applied fields, such as astronomy, ecology, and environmental science. This paper studies both statistical and computational problems of kernel smoothing for directional data. We generalize the classical mean shift algorithm to directional data, which allows us to identify local modes of the directional kernel density estimator (KDE). The statistical convergence rates of the directional KDE and its derivatives are derived, and the problem of mode estimation is examined. We also prove the ascending property of our directional mean shift algorithm and investigate a general problem of gradient ascent on the unit hypersphere. To demonstrate the applicability of our proposed algorithm, we evaluate it as a mode clustering method on both simulated and real-world datasets.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

2010.13523

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(14 more...)

Genre: Research Report (1.00)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Master Linear Algebra for Data Science & Machine Learning DL

#artificialintelligenceOct-21-2020, 10:32:08 GMT

Then, this course is for you. The Common mistake by a data scientist is Applying the tools without the intuition of how it works and behaves. Having the solid foundation of mathematics will help you to understand how each algorithm work, its limitations and its underlying assumptions. With this, you will have an edge over your peers and makes you more confident in all the applications of Machine Learning, Data Science, and Deep Learning. As a common saying: It always pays to know the machinery under the hood, rather than being a guy who is just behind the wheel with no knowledge about the car.

artificial intelligence, linear algebra, machine learning, (8 more...)

#artificialintelligence

Genre: Instructional Material (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Add feedback

On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm

Rodríguez-Gálvez, Borja, Bassi, Germán, Thobaben, Ragnar, Skoglund, Mikael

arXiv.org Machine LearningOct-21-2020

In this work, we unify several expected generalization error bounds based on random subsets using the framework developed by Hellstr\"om and Durisi [1]. First, we recover the bounds based on the individual sample mutual information from Bu et al. [2] and on a random subset of the dataset from Negrea et al. [3]. Then, we introduce their new, analogous bounds in the randomized subsample setting from Steinke and Zakynthinou [4], and we identify some limitations of the framework. Finally, we extend the bounds from Haghifam et al. [5] for Langevin dynamics to stochastic gradient Langevin dynamics and we refine them for loss functions with potentially large gradient norms.

algorithm, generalization error, proposition 6, (11 more...)

arXiv.org Machine Learning

2010.10994

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Characterizing Deep Gaussian Processes via Nonlinear Recurrence Systems

Tong, Anh, Choi, Jaesik

arXiv.org Machine LearningOct-20-2020

Recent advances in Deep Gaussian Processes (DGPs) show the potential to have more expressive representation than that of traditional Gaussian Processes (GPs). However, there exists a pathology of deep Gaussian processes that their learning capacities reduce significantly when the number of layers increases. In this paper, we present a new analysis in DGPs by studying its corresponding nonlinear dynamic systems to explain the issue. Existing work reports the pathology for the squared exponential kernel function. We extend our investigation to four types of common stationary kernel functions. The recurrence relations between layers are analytically derived, providing a tighter bound and the rate of convergence of the dynamic systems. We demonstrate our finding with a number of experimental results.

artificial intelligence, machine learning, recurrence relation, (18 more...)

arXiv.org Machine Learning

2010.09301

Country: Asia > South Korea > Ulsan > Ulsan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.73)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Add feedback

A Kernel Two-Sample Test for Functional Data

Wynne, George, Duncan, Andrew B.

arXiv.org Machine LearningOct-19-2020

Nonparametric two-sample tests for equality of distributions are widely studied in statistics, driven by applications in goodness-of-fit tests, anomaly and change-point detection and clustering. Classical examples of such tests include the Kolmogorov-Smirnov test [41, 69, 62] and Wald-Wolfowitz runs test [84] with subsequent multivariate extensions [25]. Due to advances in the ability to collect large amounts of real time or spatially distributed data there is a need to develop statistical methods appropriate for functional data, where each data sample is a discretised function. Such data has been studied for decades in the Functional Data Analysis (FDA) literature [32, 35] particularly in the context of analysing populations of time series, or in statistical shape analysis [45]. More recently, due to this modern abundance of functional data, increased study has been made in the machine learning literature for algorithms suited to such data [7, 15, 37, 12, 88].

health & medicine, kernel, us government, (21 more...)

arXiv.org Machine Learning

2008.11095

Country:

North America > United States (0.66)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.88)
Energy > Oil & Gas (0.65)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)

Add feedback

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

Zou, Difan, Xu, Pan, Gu, Quanquan

arXiv.org Machine LearningOct-19-2020

We establish a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in terms of the total variation distance, where $d$ is the problem dimension, which improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that provided an additional Hessian Lipschitz condition on the log-density function, SGLD is guaranteed to achieve $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin based algorithms, and sheds some light on the design of fast stochastic gradient based sampling algorithms.

algorithm 1, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2010.09597

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Add feedback

IBM Machine Learning

#artificialintelligenceOct-15-2020, 07:11:06 GMT

Offered by IBM. Machine Learning is one of the most in-demand skills for jobs related to modern AI applications, a field in which hiring has grown 74% annually for the last four years (LinkedIn). This Professional Certificate from IBM is intended for anyone interested in developing skills and experience to pursue a career in Machine Learning and leverage the main types of Machine Learning: Unsupervised Learning, Supervised Learning, Deep Learning, and Reinforcement Learning. It also complements your learning with special topics, including Time Series Analysis and Survival Analysis. This program consists of 6 courses providing you with solid theoretical understanding and considerable practice of the main algorithms, uses, and best practices related to Machine Learning . You will follow along and code your own projects using some of the most relevant open source frameworks and libraries. Although it is recommended that you have some background in Python programming, statistics, and linear algebra, this intermediate series is suitable for anyone who has some computer skills, interest in leveraging data, and a passion for self-learning. We start small, provide a solid theoretical background and code-along labs and demos, and build up to more complex topics. In addition to earning a Professional Certificate from Coursera, you will also receive a digital Badge from IBM recognizing your proficiency in Machine Learning.

artificial intelligence, ibm machine learning, machine learning

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Information Technology (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.60)
Education > Educational Setting > Online (0.60)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.44)

Add feedback

Joint Inference of Multiple Graphs from Matrix Polynomials

Navarro, Madeline, Wang, Yuhao, Marques, Antonio G., Uhler, Caroline, Segarra, Santiago

arXiv.org Machine LearningOct-15-2020

Inferring graph structure from observations on the nodes is an important and popular network science task. Departing from the more common inference of a single graph and motivated by social and biological networks, we study the problem of jointly inferring multiple graphs from the observation of signals at their nodes (graph signals), which are assumed to be stationary in the sought graphs. From a mathematical point of view, graph stationarity implies that the mapping between the covariance of the signals and the sparse matrix representing the underlying graph is given by a matrix polynomial. A prominent example is that of Markov random fields, where the inverse of the covariance yields the sparse matrix of interest. From a modeling perspective, stationary graph signals can be used to model linear network processes evolving on a set of (not necessarily known) networks. Leveraging that matrix polynomials commute, a convex optimization method along with sufficient conditions that guarantee the recovery of the true graphs are provided when perfect covariance information is available. Particularly important from an empirical viewpoint, we provide high-probability bounds on the recovery error as a function of the number of signals observed and other key problem parameters. Numerical experiments using synthetic and real-world data demonstrate the effectiveness of the proposed method with perfect covariance information as well as its robustness in the noisy regime.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

2010.0812

Country:

Asia > China > Shanghai > Shanghai (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.67)
Information Technology (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Add feedback