AITopics | theta

Collaborating Authors

theta

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

b125999bde7e80910cbdbd323087df8f-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 09:47:30 GMT

Foreachprompt, wecompare 6 pairs of models: Quark versus other baselines, as shown in Table 2. These agreement scores are moderate as result of subjectivity involved in ratings of text quality. PPLM (Plug and Play Language Model) uses one or more classifiers to control attributes of model generations. Figure 8: Screenshot of the mechanical turk interfaced used to gather human judgments for the sentimentevaluation. Unlikelihood represents a GPT-2 model fine-tuned with unlikelihoodobjective(Eqn.5)[79].

artificial intelligence, chatbot, natural language, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)

Add feedback

A Trichotomy for Transductive Online Learning

Neural Information Processing SystemsDec-24-2025, 19:53:29 GMT

This setting is similar to standard online learning, except that the adversary fixes a sequence of instances $x_1,\dots,x_n$ to be labeled at the start of the game, and this sequence is known to the learner. Qualitatively, we prove a \emph{trichotomy}, stating that the minimal number of mistakes made by the learner as $n$ grows can take only one of precisely three possible values: $n$, $\Theta\left(\log (n)\right)$, or $\Theta(1)$. Furthermore, this behavior is determined by a combination of the VC dimension and the Littlestone dimension. Quantitatively, we show a variety of bounds relating the number of mistakes to well-known combinatorial dimensions. In particular, we improve the known lower bound on the constant in the $\Theta(1)$ case from $\Omega\left(\sqrt{\log(d)}\right)$ to $\Omega(\log(d))$ where $d$ is the Littlestone dimension. Finally, we extend our results to cover multiclass classification and the agnostic setting.

name change, transductive online learning, trichotomy, (8 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

The Semi-Random Satisfaction of Voting Axioms

Neural Information Processing SystemsDec-23-2025, 23:11:45 GMT

We initiate the work towards a comprehensive picture of the worst average-case satisfaction of voting axioms in semi-random models, to provide a finer and more realistic foundation for comparing voting rules. We adopt the semi-random model and formulation in [Xia 2020], where an adversary chooses arbitrarily correlated ``ground truth'' preferences for the agents, on top of which random noises are added. We focus on characterizing the semi-random satisfaction of two well-studied voting axioms: Condorcet criterion and participation. We prove that for any fixed number of alternatives, when the number of voters $n$ is sufficiently large, the semi-random satisfaction of the Condorcet criterion under a wide range of voting rules is $1$, $1-\exp(-\Theta(n))$, $\Theta(n^{-0.5})$,

name change, semi-random satisfaction, theta, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

Stability and Deviation Optimal Risk Bounds with Convergence Rate O(1/n)

Neural Information Processing SystemsDec-23-2025, 22:11:11 GMT

The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman, Vondrak, NeurIPS 2018, COLT, 2019), (Bousquet, Klochkov, Zhivotovskiy, COLT, 2020) contain a generally inevitable sampling error term of order $\Theta(1/\sqrt{n})$. When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term $\Theta(1/\sqrt{n})$ can be avoided, and high probability excess risk bounds of order up to $O(1/n)$ are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses valid for \emph{any} empirical risk minimization method.

deviation optimal risk bound, high probability excess risk, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.58)

Add feedback

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Neural Information Processing SystemsDec-23-2025, 21:03:47 GMT

Models of many real-life applications, such as queueing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter $\theta\in\Theta$, and defined on a countably-infinite state-space $\mathcal X=\mathbb{Z}_+^d$, with finite action space $\mathcal A$, and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter $\boldsymbol{\theta}^*$ generated via a given fixed prior distribution on $\Theta$. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes' rule is used to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From this condition and using the solution of the average cost Bellman equation, we establish an $\tilde O(dh^d\sqrt{|\mathcal A|T})$ upper bound on the Bayesian regret of our algorithm, where $T$ is the time-horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queueing models with unknown dynamics, and show that our algorithm can be applied to develop approximately optimal control algorithms.

Add feedback

Beyond the Birkhoff Polytope: Convex Relaxations for Vector Permutation Problems

Neural Information Processing SystemsSep-30-2025, 09:26:27 GMT

The Birkhoff polytope (the convex hull of the set of permutation matrices), which is represented using $\Theta(n^2)$ variables and constraints, is frequently invoked in formulating relaxations of optimization problems over permutations. Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $\Theta(n \log n)$ in theory and $\Theta(n \log^2 n)$ in practice. We modify the recent convex formulation of the 2-SUM problem introduced by Fogel et al. (2013) to use this polytope, and demonstrate how we can attain results of similar quality in significantly less computational time for large $n$. To our knowledge, this is the first usage of Goemans' compact formulation of the permutahedron in a convex optimization problem. We also introduce a simpler regularization scheme for this convex formulation of the 2-SUM problem that yields good empirical results.

birkhoff polytope, convex relaxation, vector permutation problem, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.64)

Add feedback

A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise

Neural Information Processing SystemsMay-27-2025, 11:02:01 GMT

We study the problem of PAC learning \gamma -margin halfspaces in the presence of Massart noise. Without computational considerations, the sample complexity of this learning problem is known to be \widetilde{\Theta}(1/(\gamma 2 \epsilon)) . Prior computationally efficient algorithms for the problem incur sample complexity \tilde{O}(1/(\gamma 4 \epsilon 3)) and achieve 0-1 error of \eta \epsilon, where \eta 1/2 is the upper bound on the noise rate.Recent work gave evidence of an information-computation tradeoff, suggesting that a quadratic dependence on 1/\epsilon is required for computationally efficient algorithms. Our main result is a computationally efficient learner with sample complexity \widetilde{\Theta}(1/(\gamma 2 \epsilon 2)), nearly matching this lower bound. In addition, our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.

learning margin halfspace, massart noise, near-optimal algorithm, (4 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.86)

Add feedback

Reviews: On Neuronal Capacity

Neural Information Processing SystemsOct-7-2024, 22:12:38 GMT

The main result of the paper is giving tight bounds on the number of n-variable boolean functions that can be expressed as degree d-polynomial threshold functions for any fixed d (d growing very mildly with n) also works. To me the rest of the results while interesting seem to be mostly easy applications of the key result to other more fashionable neural network models. However, the tightness of the main result does not translate to tight bounds when other neuroidal models are considered, because of the kind of non-linearities or weight constraints involved. The main result is highly non-trivial, the proof quite lengthy though elegant, and resolves a 25 year old open problem. Although the proof uses a lot of heavy mathematics, the key contribution seems to be generalizing random matrix theory to a random tensor theory -- the key result being that a large number of stochastically independent random vectors in low-dimension (and hence clearly not linearly indpendent) still yield a high degree of linear independence when tensorized.

main result, neuronal capacity, review, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)

Add feedback

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Johnson, Daniel D., Tarlow, Daniel, Duvenaud, David, Maddison, Chris J.

arXiv.org Artificial IntelligenceFeb-13-2024

Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.

calibration, learning, probability, (16 more...)

arXiv.org Artificial Intelligence

2402.08733

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Portugal > Porto > Porto (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

Richemond, Pierre H., Tam, Allison, Tang, Yunhao, Strub, Florian, Piot, Bilal, Hill, Felix

arXiv.org Artificial IntelligenceFeb-9-2023

Self-predictive unsupervised learning methods such as BYOL or SimSiam have shown impressive results, and counter-intuitively, do not collapse to trivial representations. In this work, we aim at exploring the simplest possible mathematical arguments towards explaining the underlying mechanisms behind self-predictive unsupervised learning. We start with the observation that those methods crucially rely on the presence of a predictor network (and stop-gradient). With simple linear algebra, we show that when using a linear predictor, the optimal predictor is close to an orthogonal projection, and propose a general framework based on orthonormalization that enables to interpret and give intuition on why BYOL works. In addition, this framework demonstrates the crucial role of the exponential moving average and stop-gradient operator in BYOL as an efficient orthonormalization mechanism. We use these insights to propose four new \emph{closed-form predictor} variants of BYOL to support our analysis. Our closed-form predictors outperform standard linear trainable predictor BYOL at $100$ and $300$ epochs (top-$1$ linear accuracy on ImageNet).

artificial intelligence, machine learning, predictor, (19 more...)

arXiv.org Artificial Intelligence

2302.04817

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback