

Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration

Neural Information Processing Systems

D_n) → 1 as n → ∞, which means that most of the posterior mass falls in a neighbourhood of the true parameter. Remark on the notation: ν(·) is similar to the ν(·) defined in Section 2.1 of the main text. The notations used in this proof are the same as in the proof of Theorem 2.1. Theorem 2.2 implies that a faithful prediction interval can be constructed for the sparse neural network learned by the proposed algorithms. In practice, for a normal regression problem with noise ∼ N(0, σ²), to construct the prediction interval for a test point x₀, the terms σ² and Σ = ∇µ(β, x₀)ᵀ H⁻¹ ∇µ(β, x₀) in Theorem 2.2 need to be estimated from the data.
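As a concrete illustration of that last step, here is a minimal sketch of assembling such a normal-theory prediction interval once the pieces are estimated. The function and variable names (`prediction_interval`, `sigma2_hat`, `grad`, `hess`) and the plug-in Gaussian quantile are our illustrative assumptions, not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

def prediction_interval(mu_hat, grad, hess, sigma2_hat, alpha=0.05):
    """Approximate (1 - alpha) prediction interval for a test point x0.

    mu_hat     : point prediction mu(beta_hat, x0)
    grad       : estimated gradient of mu w.r.t. the parameters at x0
    hess       : estimated Hessian H (assumed positive definite)
    sigma2_hat : estimated noise variance
    """
    # Sigma = grad^T H^{-1} grad, the parameter-uncertainty term of Theorem 2.2;
    # solve() avoids forming H^{-1} explicitly
    Sigma = grad @ np.linalg.solve(hess, grad)
    # total predictive variance = noise variance + parameter uncertainty
    se = np.sqrt(sigma2_hat + Sigma)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mu_hat - z * se, mu_hat + z * se

# toy numbers, purely illustrative
g = np.array([0.5, -0.2])
H = np.array([[2.0, 0.1], [0.1, 1.5]])
lo, hi = prediction_interval(mu_hat=1.0, grad=g, hess=H, sigma2_hat=0.25)
```

The interval is centered at the point prediction and widens with both the estimated noise level and the curvature-weighted gradient term.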



Parallel and Efficient Hierarchical k-Median Clustering

Neural Information Processing Systems

In particular, standard metric formulations such as hierarchical k-center, k-means, and k-median have received a lot of attention, and these problems have been studied extensively in different models of computation.



∫ P(x, y) dy = 1 for all x ∈ K. We define the following states, which capture key properties of the quantum walk: |ψ_x⟩ := ∫ ⋯

Neural Information Processing Systems

In this section, we first define the quantum walk operators and introduce some spectral properties. Then, the eigenvalues of W are {λ_j ± i√(1 − λ_j²)}. Let {(λ_i, f_i)} be the set of eigenvalues and eigenfunctions of P, and |ψ_i⟩ be the eigenvectors of the corresponding quantum walk operator W. Let ρ₀ be a probability density that is a warm start for ρ and mixes up to TV-distance ε in t_ε steps of M. Furthermore, assume that ‖ρ/ρ₀‖ = R. Theorem 5 (Quantum walk implementation cost). Let M₀, M₁ be two ergodic reversible Markov chains with stationary distributions π₀, π₁, respectively. Suppose π₀ is β₀-warm with respect to M₁ and mixes up to total variation distance ε in t₀(ε) steps.
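The eigenvalue pairs quoted above follow from the standard spectral correspondence for Szegedy-type walks; the relation below is the textbook statement, recalled here for completeness rather than taken from this excerpt:

```latex
% If P f_j = \lambda_j f_j with \lambda_j \in (-1, 1), the walk operator W
% acts as a rotation by \theta_j in the two-dimensional invariant subspace
% attached to f_j, giving the eigenvalue pair
\lambda_j^{\pm} \;=\; \lambda_j \pm i\sqrt{1 - \lambda_j^{2}} \;=\; e^{\pm i\theta_j},
\qquad \cos\theta_j = \lambda_j .
% The classical spectral gap 1 - \lambda_j thus becomes a phase gap
% \theta_j \approx \sqrt{2(1 - \lambda_j)}, the usual source of the
% quadratic quantum speedup in mixing.
```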


The empirical median for estimating the common mean of heteroscedastic random variables

Louati, Sirine

arXiv.org Machine Learning

We study the problem of mean estimation in the heteroscedastic setting. In particular, we consider symmetric random variables having the same location parameter and different, unknown scale parameters. Our goal is then to estimate their unknown common location parameter. It is an elementary topic, yet not a well-studied one, since it is usually assumed that the random variables are independent and identically distributed. In this paper, we study the median estimator and establish upper and lower bounds on its estimation error that are of the same order and that generalize and improve recent results of Devroye et al. and Xia.
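A toy simulation makes the setting concrete. The Gaussian noise shapes and the uniform spread of scale parameters below are our illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# n symmetric observations sharing location theta but with wildly
# different (unknown) scales -- the heteroscedastic setting of the paper
theta = 2.0
n = 10_001                        # odd, so the median is a single sample
scales = rng.uniform(0.1, 50.0, size=n)
x = theta + scales * rng.standard_normal(n)

median_hat = np.median(x)         # the estimator studied in the paper
mean_hat = x.mean()               # naive baseline, dominated by large scales
```

Each observation is symmetric about θ, so every sample lands above or below θ with probability 1/2 regardless of its scale; the median therefore concentrates around θ at a rate driven by the best-scaled observations, while the sample mean's variance is inflated by the largest scales.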


Robust Gradient Descent for Phase Retrieval

Buna, Alex, Rebeschini, Patrick

arXiv.org Machine Learning

Recent progress in robust statistical learning has mainly tackled convex problems, like mean estimation or linear regression, with non-convex challenges receiving less attention. Phase retrieval exemplifies such a non-convex problem, requiring the recovery of a signal from only the magnitudes of its linear measurements, without phase (sign) information. While several non-convex methods, especially those involving the Wirtinger Flow algorithm, have been proposed for noiseless or mild noise settings, developing solutions for heavy-tailed noise and adversarial corruption remains an open challenge. In this paper, we investigate an approach that leverages robust gradient descent techniques to improve the Wirtinger Flow algorithm's ability to simultaneously cope with fourth moment bounded noise and adversarial contamination in both the inputs (covariates) and outputs (responses). We address two scenarios: known zero-mean noise and completely unknown noise. For the latter, we propose a preprocessing step that alters the problem into a new format that does not fit traditional phase retrieval approaches but can still be resolved with a tailored version of the algorithm for the zero-mean noise context.
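To make the setting concrete, here is a small sketch in the spirit of the abstract: Wirtinger-Flow-style gradient descent with a robust (median-of-means) gradient aggregate, run on data with heavy-tailed Student-t noise and a corrupted fraction of responses. The specific aggregation rule, step size, and near-signal initialization (which skips spectral initialization) are our simplifications, not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 2000, 10, 20            # samples, dimension, median-of-means blocks
x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)  # unit-norm true signal

A = rng.standard_normal((n, d))
y = (A @ x_star) ** 2 + 0.1 * rng.standard_t(5, size=n)  # heavy-tailed noise
y[:40] = 100.0                    # adversarial corruption of 2% of responses

def mom_wf_gradient(x):
    """Median-of-means aggregate of per-sample Wirtinger Flow gradients."""
    r = (A @ x) ** 2 - y                      # residuals of |a_i^T x|^2
    G = (r * (A @ x))[:, None] * A            # per-sample gradients, shape (n, d)
    blocks = G.reshape(k, n // k, d).mean(axis=1)
    return np.median(blocks, axis=0)          # coordinate-wise median of block means

x = x_star + 0.1 * rng.standard_normal(d)     # start near the signal
for _ in range(300):
    x -= 0.1 * mom_wf_gradient(x)

# phase retrieval identifies the signal only up to a global sign
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
```

The corrupted responses all fall into one of the twenty blocks, so the coordinate-wise median of block means discards their (huge) gradient contribution, and the iterates still contract toward ±x_star.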


Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks

Patel, Vivak, Varner, Christian

arXiv.org Artificial Intelligence

The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.


Theoretical analysis of deep neural networks for temporally dependent observations

Ma, Mingliang, Safikhani, Abolfazl

arXiv.org Artificial Intelligence

Deep neural networks are powerful tools for modeling observations over time with non-linear patterns. Despite the widespread use of neural networks in such settings, most theoretical developments assume independent observations, and theoretical results for temporally dependent observations are scarce. To bridge this gap, we study theoretical properties of deep neural networks for modeling non-linear time series data. Specifically, non-asymptotic bounds on the prediction error of (sparse) feed-forward neural networks with ReLU activation are established under mixing-type assumptions. These assumptions are mild enough to include a wide range of time series models, including auto-regressive models. Compared to the independent case, the established convergence rates carry additional logarithmic factors that compensate for the additional complexity due to dependence among data points. The theoretical results are supported by various numerical simulation settings as well as an application to a macroeconomic data set.
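The setting can be mimicked in a few lines: fit a small feed-forward ReLU network to one-step-ahead prediction on a simulated AR(1) series, an example of the auto-regressive models the theory covers. Everything concrete here, from the lag window to the hand-rolled training loop, is our illustrative choice rather than the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) series z_t = 0.7 z_{t-1} + eps_t -- a dependent (mixing) process
T = 2000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.7 * z[t - 1] + rng.standard_normal()

# one-step-ahead supervised pairs built from lagged windows
p = 5                                  # lag window length
X = np.stack([z[t - p:t] for t in range(p, T)])
Y = z[p:T]

# one-hidden-layer ReLU network trained by full-batch gradient descent
h = 16
W1 = 0.1 * rng.standard_normal((p, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal(h);      b2 = 0.0

losses, lr = [], 0.01
for _ in range(500):
    H = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer
    pred = H @ W2 + b2
    resid = pred - Y
    losses.append(np.mean(resid ** 2))
    # backprop for the mean-squared prediction error
    g_pred = 2 * resid / len(Y)
    gW2 = H.T @ g_pred; gb2 = g_pred.sum()
    gH = np.outer(g_pred, W2) * (H > 0)
    gW1 = X.T @ gH; gb1 = gH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The training pairs overlap and are serially dependent, which is exactly the regime where the i.i.d. theory breaks down and the mixing-type analysis of the paper applies.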


A useful criterion on studying consistent estimation in community detection

Qing, Huan

arXiv.org Artificial Intelligence

In network analysis, developing a unified theoretical framework that can compare methods under different models is an interesting problem. This paper proposes a partial solution to this problem. We summarize the idea of using a separation condition for a standard network and the sharp threshold of the Erdős–Rényi random graph to study consistent estimation, and of comparing theoretical error rates and requirements on network sparsity of spectral methods under models that can degenerate to the stochastic block model, as a four-step criterion, SCSTC. Using SCSTC, we find some inconsistencies in the separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results of the SPACL algorithm, introduced to estimate network memberships under the mixed membership stochastic blockmodel, were sub-optimal. To identify the mechanism behind these inconsistencies, we re-establish the theoretical convergence rates of this algorithm by applying recent techniques on row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, weaker dependence on the number of communities, weaker requirements on network sparsity, and so forth. Furthermore, the separation condition and sharp threshold obtained from our theoretical results match classical results, which shows the usefulness of this criterion for studying consistent estimation.