AITopics | relu regression

d9d3837ee7981e8c064774da6cdd98bf-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 10:27:17 GMT

algorithm, regression, transformation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Revisiting Differentially Private ReLU Regression

Neural Information Processing SystemsDec-26-2025, 05:46:54 GMT

As one of the most fundamental non-convex learning problems, ReLU regression under differential privacy (DP) constraints, especially in high-dimensional settings, remains a challenging area in privacy-preserving machine learning. Existing results are limited to the assumptions of bounded norm $ \|\mathbf{x}\|_2 \leq 1$, which becomes meaningless with increasing data dimensionality. In this work, we revisit the problem of DP ReLU regression in high-dimensional regimes. We propose two innovative algorithms DP-GLMtron and DP-TAGLMtron that outperform the conventional DPSGD. DP-GLMtron is based on a generalized linear model perceptron approach, integrating adaptive clipping and Gaussian mechanism for enhanced privacy. To overcome the constraints of small privacy budgets in DP-GLMtron, represented by $\widetilde{O}(\sqrt{1/N})$ where $N$ is the sample size, we introduce DP-TAGLMtron, which utilizes a tree aggregation protocol to balance privacy and utility effectively, showing that DP-TAGLMtron achieves comparable performance with only an additional factor of $O(\log N)$ in the utility upper bound.Moreover, our theoretical analysis extends beyond Gaussian-like data distributions to settings with eigenvalue decay, showing how data distribution impacts learning in high dimensions. Notably, our findings suggest that the utility upper bound could be independent of the dimension $d$, even when $d \gg N$.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.76)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ReLU Regression with Massart Noise

Neural Information Processing SystemsDec-25-2025, 00:35:48 GMT

We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise. In this work, we focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, the label of every point is generated according to a function in the class, but an adversary is allowed to change this value arbitrarily with some probability, which is {\em at most} $\eta < 1/2$. We develop an efficient algorithm that achieves exact parameter recovery in this model under mild anti-concentration assumptions on the underlying distribution. Such assumptions are necessary for exact recovery to be information-theoretically possible. We demonstrate that our algorithm significantly outperforms naive applications of $\ell_1$ and $\ell_2$ regression on both synthetic and real data.

massart noise, name change, relu regression, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)

Add feedback

ReLU Regression with Massart Noise

Neural Information Processing SystemsAug-17-2025, 19:13:18 GMT

Such assumptions are necessary for exact recovery to be information-theoretically possible.

artificial intelligence, machine learning, regression, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

ReLU Regression with Massart Noise

Neural Information Processing SystemsJan-19-2025, 09:02:39 GMT

We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise. In this work, we focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, the label of every point is generated according to a function in the class, but an adversary is allowed to change this value arbitrarily with some probability, which is {\em at most} \eta 1/2 . We develop an efficient algorithm that achieves exact parameter recovery in this model under mild anti-concentration assumptions on the underlying distribution.

massart noise, noise model, relu regression, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Add feedback

Stochastic gradient descent for streaming linear and rectified linear systems with Massart noise

Jeong, Halyun, Needell, Deanna, Rebrova, Elizaveta

arXiv.org Machine LearningMar-2-2024

We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee result for robust ReLU regression in the streaming setting, and it shows the improved convergence rate over previous robust methods for $L_1$ linear regression due to a choice of an exponentially decaying step size, known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which could also be interesting on its own.

artificial intelligence, machine learning, regression, (16 more...)

arXiv.org Machine Learning

2403.01204

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Wu, Jingfeng, Zou, Difan, Chen, Zixiang, Braverman, Vladimir, Gu, Quanquan, Kakade, Sham M.

arXiv.org Artificial IntelligenceJun-26-2023

This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Our upper and lower risk bounds provide a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.

artificial intelligence, glm-tron, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2303.02255

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.34)

Add feedback

ReLU Regression with Massart Noise

Diakonikolas, Ilias, Park, Jongho, Tzamos, Christos

arXiv.org Machine LearningSep-9-2021

We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise. In this work, we focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, the label of every point is generated according to a function in the class, but an adversary is allowed to change this value arbitrarily with some probability, which is {\em at most} $\eta < 1/2$. We develop an efficient algorithm that achieves exact parameter recovery in this model under mild anti-concentration assumptions on the underlying distribution. Such assumptions are necessary for exact recovery to be information-theoretically possible. We demonstrate that our algorithm significantly outperforms naive applications of $\ell_1$ and $\ell_2$ regression on both synthetic and real data.

artificial intelligence, machine learning, transformation, (19 more...)

arXiv.org Machine Learning

2109.04623

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

Add feedback

Approximation Schemes for ReLU Regression

Diakonikolas, Ilias, Goel, Surbhi, Karmalkar, Sushrut, Klivans, Adam R., Soltanolkotabi, Mahdi

arXiv.org Machine LearningSep-28-2020

We consider the fundamental problem of ReLU regression, where the goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution. We give the first efficient, constant-factor approximation algorithm for this problem assuming the underlying distribution satisfies some weak concentration and anti-concentration conditions (and includes, for example, all log-concave distributions). This solves the main open problem of Goel et al., who proved hardness results for any exact algorithm for ReLU regression (up to an additive $\epsilon$). Using more sophisticated techniques, we can improve our results and obtain a polynomial-time approximation scheme for any subgaussian distribution. Given the aforementioned hardness results, these guarantees can not be substantially improved. Our main insight is a new characterization of surrogate losses for nonconvex activations. While prior work had established the existence of convex surrogates for monotone activations, we show that properties of the underlying distribution actually induce strong convexity for the loss, allowing us to relate the global minimum to the activation's Chow parameters.

artificial intelligence, machine learning, surr, (18 more...)

arXiv.org Machine Learning

2005.12844

Country: