AITopics | lipschitz-continuous

Country:

North America > United States > Massachusetts (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningDec-30-2025

On Admissible Rank-based Input Normalization Operators

Kim, Taeyun

Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, the ordering of feature values matters far more than their raw magnitudes - yet the structural conditions that a rank-based normalization operator must satisfy to remain stable under these invariances have never been formally pinned down. We show that widely used differentiable sorting and ranking operators fundamentally fail these criteria. Because they rely on value gaps and batch-level pairwise interactions, they are intrinsically unstable under strictly monotone transformations, shifts in mini-batch composition, and even tiny input perturbations. Crucially, these failures stem from the operators' structural design, not from incidental implementation choices. To address this, we propose three axioms that formalize the minimal invariance and stability properties required of rank-based input normalization. We prove that any operator satisfying these axioms must factor into (i) a feature-wise rank representation and (ii) a scalarization map that is both monotone and Lipschitz-continuous. We then construct a minimal operator that meets these criteria and empirically show that the resulting constraints are non-trivial in realistic setups. Together, our results sharply delineate the design space of valid rank-based normalization operators and formally separate them from existing continuous-relaxation-based sorting methods.

artificial intelligence, machine learning, operator, (18 more...)

2512.22587

Country:

North America > United States > California (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing SystemsDec-27-2025, 18:34:06 GMT

Reciprocal Learning

These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge.

artificial intelligence, data mining, machine learning, (16 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye

rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions

Neural Information Processing SystemsNov-20-2025, 20:38:12 GMT

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem--a belief MDP--and exploiting the piece-wise linearity and convexity

artificial intelligence, lipschitz constant, machine learning, (17 more...)

Country:

North America > United States > Massachusetts (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Neural Information Processing SystemsOct-9-2025, 17:14:03 GMT

0337b41b4e8b2eb5d7ab161ffd42cf3b-Paper-Conference.pdf

data mining, machine learning, natural language, (16 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Machine LearningJun-16-2025

Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters

Vidyasagar, Mathukumalli

In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the true gradient at the previous iteration. Our formulation includes the Stochastic Heavy Ball (SHB) and the Stochastic Nesterov Accelerated Gradient (SNAG) algorithms as special cases. In addition, in our formulation, the momentum term is allowed to vary as a function of time (i.e., the iteration counter). The assumptions on the stochastic gradient are the most general in the literature, in that it can be biased, and have a conditional variance that grows in an unbounded fashion as a function of time. This last feature is crucial in order to make the theory applicable to "zero-order" methods, where the gradient is estimated using just two function evaluations. We present a set of sufficient conditions for the convergence of the unified algorithm. These conditions are natural generalizations of the familiar Robbins-Monro and Kiefer-Wolfowitz-Blum conditions for standard stochastic gradient descent. We also analyze another method from the literature for the SHB algorithm with a time-varying momentum parameter, and show that it is impracticable.

algorithm, artificial intelligence, machine learning, (16 more...)

2506.11904

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > India > Telangana (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)

Rodemann, Julian, Jansen, Christoph, Schollmeyer, Georg

Reciprocal Learning

arXiv.org Machine LearningAug-12-2024

We demonstrate that a wide array of machine learning algorithms are specific instances of one single paradigm: reciprocal learning. These instances range from active learning over multi-armed bandits to self-training. We show that all these algorithms do not only learn parameters from data but also vice versa: They iteratively alter training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms using the language of decision theory. This allows us to study under what conditions they converge. The key is to guarantee that reciprocal learning contracts such that the Banach fixed-point theorem applies. In this way, we find that reciprocal learning algorithms converge at linear rates to an approximately optimal model under relatively mild assumptions on the loss function, if their predictions are probabilistic and the sample adaption is both non-greedy and either randomized or regularized. We interpret these findings and provide corollaries that relate them to specific active learning, self-training, and bandit algorithms.

learning, lipschitz-continuous, reciprocal learning, (12 more...)

2408.06257

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Aziznejad, Shayan, Gupta, Harshit, Campos, Joaquim, Unser, Michael

Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

arXiv.org Machine LearningJan-17-2020

We introduce a variational framework to learn the activation functions of deep neural networks. The main motivation is to control the Lipschitz regularity of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization. We numerically compare our scheme with standard ReLU network and its variations, PReLU and LeakyReLU.

artificial intelligence, deep learning, machine learning, (17 more...)

2001.06263

Country:

Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fehr, Mathieu, Buffet, Olivier, Thomas, Vincent, Dibangoye, Jilles

rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions

Neural Information Processing SystemsDec-31-2018

Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem—a belief MDP—and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs—i.e., for information-oriented criteria—when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into “fully observable” problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ ρ -Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems.

artificial intelligence, lipschitz constant, machine learning, (16 more...)