Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration
D_n) → 1 as n → ∞, which means that most of the posterior mass falls in a neighbourhood of the true parameter. Remark on the notation: ν(·) is similar to the ν(·) defined in Section 2.1 of the main text. The notations used in this proof are the same as in the proof of Theorem 2.1. Theorem 2.2 implies that a faithful prediction interval can be constructed for the sparse neural network learned by the proposed algorithms. In practice, for a normal regression problem with noise N(0, σ²), to construct the prediction interval for a test point x_0, the terms σ² and Σ = ∇_γ μ(β, x_0)^T H^{-1} ∇_γ μ(β, x_0) in Theorem 2.2 need to be estimated from the data.
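As a numerical illustration of this plug-in construction, the sketch below computes a 95% interval μ̂(x_0) ± z·√(σ̂² + Σ̂). All numbers and shapes are made-up stand-ins: in practice μ̂, σ̂², ∇_γμ, and H would come from the fitted sparse network and the data.

```python
import numpy as np

# Hypothetical plug-in quantities for a test point x0 (illustrative values):
mu_hat = 2.3                          # network prediction mu(beta_hat, x0)
sigma2_hat = 0.25                     # estimated noise variance sigma^2
grad = np.array([0.4, -0.1, 0.2])     # estimated gradient  nabla_gamma mu(beta_hat, x0)
H = np.diag([5.0, 4.0, 6.0])          # estimated Hessian H

# Sigma = grad^T H^{-1} grad, via a linear solve rather than an explicit inverse
Sigma_hat = grad @ np.linalg.solve(H, grad)

# 95% prediction interval: mu_hat +/- z_{0.025} * sqrt(sigma^2 + Sigma)
half_width = 1.96 * np.sqrt(sigma2_hat + Sigma_hat)
interval = (mu_hat - half_width, mu_hat + half_width)
```

The solve-based computation of Σ̂ avoids forming H^{-1}, which matters when H is the (large) Hessian of the network loss.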
∫_K P(x, y) dy = 1 for all x ∈ K. We define the following states, which capture key properties of the quantum walk: |ψ_x⟩ := ∫_K √(P(x, y)) |x⟩|y⟩ dy.
In this section, we first define the quantum walk operators and introduce some of their spectral properties. The eigenvalues of W are then {1, λ_j ± i√(1 − λ_j²)}. Let {(λ_i, f_i)} be the set of eigenvalues and eigenfunctions of P, and let |ψ_i⟩ be the eigenvectors of the corresponding quantum walk operator W. Let ρ_0 be a probability density that is a warm start for ρ and mixes up to TV-distance ε in t_ε steps of M. Furthermore, assume that ‖ρ/ρ_0‖ = R.

Theorem 5 (Quantum walk implementation cost). Let M_0, M_1 be two ergodic reversible Markov chains with stationary distributions π_0, π_1, respectively. Suppose π_0 is β_0-warm with respect to M_1 and mixes up to total variation distance ε in t_0(ε) steps.
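The eigenvalue relation above can be checked numerically on a small discrete chain (the chain and its size are illustrative). For a symmetric transition matrix P, the discriminant equals P, and the walk operator W = S(2Π − I), with Π the projector onto span{|ψ_x⟩} and S the swap, has eigenvalues λ_j ± i√(1 − λ_j²):

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.4, 0.6]])            # symmetric reversible chain (illustrative)
n = P.shape[0]

# |psi_x> = sum_y sqrt(P[x,y]) |x>|y>, stacked as columns of an isometry
psi = np.zeros((n * n, n))
for x in range(n):
    for y in range(n):
        psi[x * n + y, x] = np.sqrt(P[x, y])

Pi = psi @ psi.T                      # projector onto span{|psi_x>}
S = np.zeros((n * n, n * n))          # swap operator |x>|y> -> |y>|x>
for x in range(n):
    for y in range(n):
        S[y * n + x, x * n + y] = 1.0

W = S @ (2 * Pi - np.eye(n * n))      # quantum walk operator
eigs = np.linalg.eigvals(W)

lam = np.linalg.eigvalsh(P)           # eigenvalues of P (here 0.2 and 1.0)
# each lambda_j of P should yield walk eigenvalues lambda_j +/- i sqrt(1 - lambda_j^2)
predicted = [l + 1j * np.sqrt(max(0.0, 1 - l ** 2)) for l in lam]
```

Each value in `predicted` (and its complex conjugate) appears in the spectrum of W; the eigenvalue λ = 1 of P maps to the eigenvalue 1 of W.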
The empirical median for estimating the common mean of heteroscedastic random variables
We study the problem of mean estimation in the heteroscedastic setting. In particular, we consider symmetric random variables that share the same location parameter but have different, unknown scale parameters, and our goal is to estimate their common location parameter. This is an elementary problem, yet not a well-studied one, since it is usually assumed that the random variables are independent and identically distributed. In this paper, we study the median estimator and establish upper and lower bounds on its estimation error that are of the same order, generalizing and improving recent results of Devroye et al. and Xia.
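A quick simulation conveys why the empirical median is the natural estimator here. The sketch below (Gaussian variables for concreteness; the paper's setting is more general, and all parameters are illustrative) compares the median with the sample mean when scales are wildly heterogeneous:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5                               # common location parameter
scales = np.linspace(0.1, 50.0, 201)      # heterogeneous, "unknown" scales
trials = 2000

med_err = np.empty(trials)
mean_err = np.empty(trials)
for t in range(trials):
    x = theta + scales * rng.standard_normal(scales.size)
    med_err[t] = abs(np.median(x) - theta)
    mean_err[t] = abs(x.mean() - theta)

# the median is driven by the many low-noise coordinates, while the mean
# is dominated by the largest scales
avg_med, avg_mean = np.mean(med_err), np.mean(mean_err)
```

With these scales the sample mean's error is inflated by the large-σ coordinates, whereas the median's error stays on the order of the typical small scales.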
Robust Gradient Descent for Phase Retrieval
Buna, Alex, Rebeschini, Patrick
Recent progress in robust statistical learning has mainly tackled convex problems, such as mean estimation or linear regression, with non-convex challenges receiving less attention. Phase retrieval exemplifies such a non-convex problem: it requires recovering a signal from only the magnitudes of its linear measurements, without phase (sign) information. While several non-convex methods, especially those based on the Wirtinger Flow algorithm, have been proposed for noiseless or mildly noisy settings, developing solutions for heavy-tailed noise and adversarial corruption remains an open challenge. In this paper, we investigate an approach that leverages robust gradient descent techniques to improve the Wirtinger Flow algorithm's ability to cope simultaneously with noise having bounded fourth moment and with adversarial contamination in both the inputs (covariates) and outputs (responses). We address two scenarios: known zero-mean noise and completely unknown noise. For the latter, we propose a preprocessing step that transforms the problem into a new form that does not fit traditional phase retrieval approaches but can still be solved with a tailored version of the algorithm for the zero-mean noise setting.
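The flavor of "robust gradient descent for phase retrieval" can be sketched generically. The code below is not the authors' algorithm: it is a minimal real-valued Wirtinger Flow whose sample-mean gradient is replaced by a coordinatewise median-of-means aggregate, and it assumes a warm initialization (in practice obtained by a spectral method). All parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 10, 600, 30                         # dimension, samples, MoM buckets
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2                         # phaseless measurements y_i = (a_i^T x)^2
y[rng.choice(m, 8, replace=False)] += 50.0    # a few adversarially corrupted responses

def mom_gradient(z):
    # per-sample Wirtinger Flow gradients ((a_i^T z)^2 - y_i)(a_i^T z) a_i,
    # aggregated by coordinatewise median-of-means instead of a plain average
    Az = A @ z
    g = ((Az ** 2 - y) * Az)[:, None] * A
    bucket_means = [b.mean(axis=0) for b in np.array_split(g, k)]
    return np.median(bucket_means, axis=0)

z = x_true + 0.03 * rng.standard_normal(n)    # assume a warm start near the signal
for _ in range(400):
    z -= 0.1 * mom_gradient(z)

# the signal is identifiable only up to a global sign
err = min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true))
```

Because the corruptions land in at most 8 of the 30 buckets, the coordinatewise median ignores them, and the iterates converge to ±x_true despite the gross outliers in y.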
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks
Patel, Vivak, Varner, Christian
The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining when they hold, and evaluate their applicability to training a deep linear neural network for binary classification.
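One condition discussed in this literature is (L0, L1)-smoothness, ‖∇²f(w)‖ ≤ L0 + L1‖∇f(w)‖, which relaxes global Lipschitz smoothness. The sketch below probes it empirically on a toy product-of-weights objective resembling a scalar deep linear network (the objective and evaluation points are my illustrative choices, not from the paper): the Hessian norm blows up with the weight scale, so no global L exists, yet the ratio ‖∇²f‖/(1 + ‖∇f‖) stays bounded.

```python
import numpy as np

# scalar "deep linear network" objective: f(w) = 0.5 * (w1*w2*w3 - 1)^2
def f(w):
    return 0.5 * (np.prod(w) - 1.0) ** 2

def grad(w):
    p = np.prod(w)
    return (p - 1.0) * p / w          # (p - 1) * prod_{j != i} w_j, valid for w_i != 0

def hess_norm(w, eps=1e-5):
    # spectral norm of a central finite-difference Hessian
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        H[i] = (grad(w + e) - grad(w - e)) / (2 * eps)
    return np.linalg.norm((H + H.T) / 2, 2)

scales = [0.5, 1.0, 2.0, 4.0, 8.0]
hn = [hess_norm(np.full(3, s)) for s in scales]
ratios = [h / (1.0 + np.linalg.norm(grad(np.full(3, s))))
          for s, h in zip(scales, hn)]
# hn grows without bound in the weight scale (no global L-smoothness),
# while the ratios stay bounded, consistent with an (L0, L1)-type condition
```

This is exactly the failure mode that motivates generalized smoothness conditions for deep (linear) networks: curvature grows polynomially in the weights, but so does the gradient.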
Theoretical analysis of deep neural networks for temporally dependent observations
Ma, Mingliang, Safikhani, Abolfazl
Deep neural networks are powerful tools for modeling observations over time with non-linear patterns. Despite the widespread use of neural networks in such settings, most theoretical developments for deep neural networks assume independent observations, and theoretical results for temporally dependent observations are scarce. To bridge this gap, we study theoretical properties of deep neural networks for modeling non-linear time series data. Specifically, non-asymptotic bounds on the prediction error of (sparse) feed-forward neural networks with ReLU activation are established under mixing-type assumptions. These assumptions are mild enough to include a wide range of time series models, including auto-regressive models. Compared to the case of independent observations, the established convergence rates carry additional logarithmic factors that compensate for the additional complexity due to dependence among data points. The theoretical results are supported by various numerical simulations as well as an application to a macroeconomic data set.
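The setting can be sketched concretely: generate a non-linear autoregressive series (temporally dependent, mixing) and fit a one-hidden-layer ReLU network to the one-step-ahead regression. The model g, the network size, and the training hyperparameters below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# illustrative nonlinear AR(1): y_t = g(y_{t-1}) + noise
def g(u):
    return 0.8 * np.tanh(2.0 * u)

T = 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = g(y[t - 1]) + 0.3 * rng.standard_normal()

X, Y = y[:-1, None], y[1:, None]          # one-step-ahead training pairs
N = X.shape[0]

# small feed-forward ReLU network trained by full-batch gradient descent
h = 32
W1 = 0.5 * rng.standard_normal((1, h)); b1 = np.zeros(h)
W2 = 0.5 * rng.standard_normal((h, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    Z = X @ W1 + b1
    Aact = np.maximum(Z, 0.0)             # ReLU activations
    err = Aact @ W2 + b2 - Y
    # backpropagation for the squared loss
    gW2 = Aact.T @ err / N; gb2 = err.mean(0)
    dA = (err @ W2.T) * (Z > 0)
    gW1 = X.T @ dA / N; gb1 = dA.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 + b2 - Y) ** 2))
```

The trained network's one-step-ahead MSE approaches the noise variance: the ReLU network learns g from dependent samples, which is the phenomenon the paper's prediction-error bounds quantify.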
A useful criterion for studying consistent estimation in community detection
In network analysis, developing a unified theoretical framework that can compare methods under different models is an interesting problem. This paper proposes a partial solution. We summarize the idea of using the separation condition for a standard network and the sharp threshold of the Erd\"os-R\'enyi random graph to study consistent estimation, comparing theoretical error rates and network-sparsity requirements of spectral methods under models that degenerate to the stochastic block model, as a four-step criterion, SCSTC. Using SCSTC, we find some inconsistencies in the separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results for the SPACL algorithm, introduced to estimate network memberships under the mixed membership stochastic blockmodel, were sub-optimal. To identify the source of these inconsistencies, we re-establish the theoretical convergence rates of this algorithm by applying recent techniques for row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, weaker dependence on the number of communities, weaker requirements on network sparsity, and so forth. Furthermore, the separation condition and sharp threshold obtained from our theoretical results match classical results, which shows the usefulness of this criterion for studying consistent estimation.
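The kind of spectral estimation being analyzed can be illustrated on the simplest case. The sketch below (a generic adjacency spectral method on a two-block stochastic block model, not the SPACL algorithm; all parameters illustrative) recovers communities from the sign pattern of the second eigenvector:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p_in, p_out = 200, 0.30, 0.05          # SBM size and edge probabilities
z = np.repeat([0, 1], n // 2)             # true community labels
P = np.where(z[:, None] == z[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency, no self-loops

# adjacency spectral method: the sign of the eigenvector for the second
# largest eigenvalue separates the two blocks
vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
v2 = vecs[:, -2]
pred = (v2 > 0).astype(int)

# accuracy up to label swap
acc = max(np.mean(pred == z), np.mean(pred != z))
```

With p_in − p_out well above the noise scale of the adjacency matrix, recovery is essentially exact; the error rates and sparsity requirements compared under SCSTC quantify how far p_in, p_out can shrink before such methods fail.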