AITopics | cosh

cosh

Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis

Neural Information Processing Systems | Jun-16-2026, 07:48:25 GMT

We study nonlinearly preconditioned gradient methods for smooth nonconvex optimization problems, focusing on sigmoid preconditioners that inherently perform a form of gradient clipping akin to the widely used gradient clipping technique. Building upon this idea, we introduce a novel heavy ball-type algorithm and provide convergence guarantees under a generalized smoothness condition that is less restrictive than traditional Lipschitz smoothness, thus covering a broader class of functions. Additionally, we develop a stochastic variant of the base method and study its convergence properties under different noise assumptions. We compare the proposed algorithms with baseline methods on diverse tasks from machine learning including neural network training.
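The clipping-like effect of a sigmoid-shaped preconditioner can be sketched in a few lines. The abstract does not state the exact preconditioner, step sizes, or momentum scheme, so everything below is an assumption made for illustration: `tanh` stands in for the sigmoid preconditioner, the update is a plain heavy ball recursion, and the names `tanh_preconditioned_heavy_ball` and `quartic_grad` are hypothetical.

```python
import math


def tanh_preconditioned_heavy_ball(grad, x0, step=0.1, momentum=0.5, scale=1.0, iters=200):
    """Assumed update rule (not taken from the paper):
    x_next = x - step * tanh(scale * grad(x)) + momentum * (x - x_prev)
    """
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Elementwise tanh keeps every coordinate of the direction in [-1, 1],
        # so a huge gradient cannot produce a huge step.
        direction = [math.tanh(scale * gi) for gi in g]
        x_next = [
            xi - step * di + momentum * (xi - xpi)
            for xi, di, xpi in zip(x, direction, x_prev)
        ]
        x_prev, x = x, x_next
    return x


def quartic_grad(x):
    # Gradient of f(x) = sum(x_i ** 4) / 4; this gradient is not globally Lipschitz.
    return [xi ** 3 for xi in x]


x_final = tanh_preconditioned_heavy_ball(quartic_grad, [10.0, -5.0])
print(x_final)
```

Far from the minimizer the gradient of the quartic is large, yet each coordinate moves by at most `step` per iteration before momentum is added, which is the behavior that gradient clipping is normally used to obtain.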

artificial intelligence, inequality, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Limitations of Learning Tanh Neural Networks with Finite Precision

Grohs, Philipp, Trödler, Matěj

arXiv.org Machine Learning | Jun-12-2026

We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel construction of sharply localized bump functions via iterated $\tanh$ activations. Using this mechanism, we show that, in a finite-precision setting, no adaptive randomized algorithm based on $m$ samples can achieve a convergence rate higher than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm, unless the sampling budget grows exponentially with the size of the network parameters and architecture. The results reveal fundamental limitations imposed by finite precision on the learnability of classes containing localized bump functions, extending previous results for ReLU networks to the $\tanh$ setting.
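To give a feel for how repeated $\tanh$ layers can produce a localized bump, here is a minimal sketch. It is not the construction from the paper, which the abstract does not spell out. It assumes a simple scheme in which a steep step is built by composing `tanh(gain * .)` several times and a bump is the difference of two shifted steps. The names `iterated_tanh` and `tanh_bump` and the parameters `gain`, `depth`, and `half_width` are hypothetical.

```python
import math


def iterated_tanh(x, gain=4.0, depth=3):
    """Compose y -> tanh(gain * y) `depth` times; with gain > 1 the result
    approaches a step function as depth grows."""
    y = x
    for _ in range(depth):
        y = math.tanh(gain * y)
    return y


def tanh_bump(x, center=0.0, half_width=0.1, gain=4.0, depth=3):
    """Difference of two shifted steps: close to 1 near `center`,
    close to 0 away from it."""
    left = iterated_tanh(x - (center - half_width), gain, depth)
    right = iterated_tanh(x - (center + half_width), gain, depth)
    return 0.5 * (left - right)


for x in (-1.0, -0.2, 0.0, 0.2, 1.0):
    print(x, tanh_bump(x))
```

Under these assumptions, increasing `depth` sharpens the edges of the bump without increasing `gain`, which hints at why such functions are hard to detect from a limited number of point samples.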

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2606.11104

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Response Time Enhances Alignment with Heterogeneous Preferences

Echenique, Federico, Fallah, Alireza, Huang, Baihe, Jordan, Michael I.

arXiv.org Machine Learning | May-11-2026

Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results bring promises and open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.
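Why response time carries extra information can be seen from two textbook identities for a symmetric drift-diffusion model. The sketch below is not the estimator proposed in the paper. It assumes unit diffusion, a start at zero, and absorbing barriers at plus and minus `barrier`, and it uses the standard closed forms for the expected signed choice, $\tanh(a v)$, and the expected response time, $(a/v)\tanh(a v)$. The function name `ddm_moments` and the example drifts are hypothetical.

```python
import math


def ddm_moments(drift, barrier=1.0):
    """Expected signed choice (+1 / -1) and expected response time for a
    symmetric DDM with unit diffusion, started at 0, barriers at +/- barrier."""
    if drift == 0.0:
        # Limit of the formulas below as drift -> 0.
        return 0.0, barrier ** 2
    expected_choice = math.tanh(barrier * drift)
    expected_time = (barrier / drift) * math.tanh(barrier * drift)
    return expected_choice, expected_time


# Two hypothetical labelers with different preference strengths (drifts).
drifts = [2.0, -0.5]
mean_drift = sum(drifts) / len(drifts)

# Choice-only view: pooling choices averages tanh(drift), not drift itself.
pooled_choice = sum(ddm_moments(v)[0] for v in drifts) / len(drifts)
choice_at_mean = ddm_moments(mean_drift)[0]

# Choice plus time: the ratio E[choice] / E[time] equals drift / barrier.
ratios = [ddm_moments(v)[0] / ddm_moments(v)[1] for v in drifts]

print(pooled_choice, choice_at_mean, ratios)
```

The pooled choice rate differs from the rate implied by the mean drift, which is the kind of distortion the abstract describes, while the choice-to-time ratio is linear in the drift. Since the abstract notes that a labeler may contribute only one choice, this per-labeler ratio is an illustration of the signal, not a usable procedure.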

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2605.06987

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Lifting Weak Supervision To Structured Prediction

Neural Information Processing Systems | Feb-12-2026, 20:57:37 GMT

For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator.

artificial intelligence, cosh, machine learning, (18 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.36)

a8901c5e85fb8e1823bbf0f755053672-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 05:42:09 GMT

Unlike in Metamath or Lean, we do not have access to a training set of human annotated proofs for this environment.

artificial intelligence, cosh, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.88)

2c15b0221da28bc6f4373a7e78b896dd-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 05:06:06 GMT

freedman, inequality, log 3, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Continual Image Captioning

Neural Information Processing Systems | Feb-10-2026, 04:06:29 GMT

These images are taken from the last task, so there is no catastrophic interference.

artificial intelligence, machine learning, validation, (14 more...)

Neural Information Processing Systems

Country:

Europe > Spain (0.05)
Europe > Italy (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

7c40c5050bd029a3ea7ff8b01412f735-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 03:44:14 GMT

Additional notation: For a matrix $A \in \mathbb{R}^{d_1 \times d_2}$, $\|A\|_{\mathrm{op}}$ is the operator norm (with respect to Euclidean norms), and $\|A\|_F$ is the Frobenius norm of $A$. The main intuition behind the HMM considered in this paper comes from the correlation decay phenomenon in graphical models. Informally, we expect that there is one sign flip (i.e., $S_i \neq S_{i+1}$) per $1/\delta$ samples. To begin with the analysis of the estimator in Figure 2, the following lemma is a simple, yet key tool for the proof. It establishes the variance of the random gain $S$.

artificial intelligence, divergence, xn1, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.47)

70431e77d378d760c3c5456519f06efe-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 21:02:23 GMT

graph, ising model, sinh 2, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

Convexity Certificates from Hessians (Supplementary Material)

Neural Information Processing Systems | Feb-8-2026, 03:15:15 GMT

The formal language for mathematical expressions to which our certification algorithm is applied is specified by the grammar depicted in Figure 1. The language is rich enough to cover all the examples in the main paper and this supplement. In this grammar, number is a placeholder for an arbitrary floating point number, variable is a placeholder for variable names starting with a Latin character, and function is a placeholder for the supported elementary differentiable functions like exp, log and sum. Here, ' is used for transposition and a preceding . marks an elementwise operation. Here are some examples from the language (the first example uses a transposition and the fifth and seventh example use elementwise operations):

2-norm $\|Xw - y\|^2$: (X*w-y)'*(X*w-y)
logistic $\log(1+\exp(x))$: log(1+exp(x))
quadratic $x^2$: x^2
relative entropy $x \log(x/y)$: x*log(x/y), x>0, y>0
logistic regression

Our implementation of the Hessian approach works on vectorized and normalized expression DAGs (directed acyclic graphs) for Hessians that contain every subexpression exactly once.
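The scalar example expressions listed above are all convex on their domains, and that can be sanity-checked numerically. The sketch below is only a finite-difference check on a grid, not a certificate and not the symbolic Hessian procedure the supplement describes. The helper `second_difference`, the grid, the tolerance, and the choice of fixing y = 1 in the relative entropy example are assumptions made for illustration.

```python
import math


def second_difference(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Scalar examples from the text, written as Python callables.
examples = {
    "logistic": lambda x: math.log(1.0 + math.exp(x)),
    "quadratic": lambda x: x ** 2,
    "relative entropy (y = 1)": lambda x: x * math.log(x / 1.0),
}

# Positive grid so that every example, including x*log(x/y), is defined.
grid = [0.1 * k for k in range(1, 51)]

nonnegative = {
    name: all(second_difference(f, x) >= -1e-6 for x in grid)
    for name, f in examples.items()
}
print(nonnegative)
```

A nonnegative second difference at every grid point is consistent with convexity but does not prove it, which is exactly the gap that a symbolic certificate is meant to close.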

artificial intelligence, exp, programming language, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence (0.55)
Information Technology > Software > Programming Languages (0.35)
