EGD


Aioli: A Unified Optimization Framework for Language Model Data Mixing

Chen, Mayee F., Hu, Michael Y., Lourie, Nicholas, Cho, Kyunghyun, Ré, Christopher

arXiv.org Machine Learning

Language model performance depends on identifying the optimal mixture of data groups to train on (e.g., law, code, math). Prior work has proposed a diverse set of methods to efficiently learn mixture proportions, ranging from fitting regression models over training runs to dynamically updating proportions throughout training. Surprisingly, we find that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity per group. In this paper, we study the cause of this inconsistency by unifying existing methods into a standard optimization framework. We show that all methods set proportions to minimize total loss, subject to a method-specific mixing law -- an assumption on how loss is a function of mixture proportions. We find that existing parameterizations of mixing laws can express the true loss-proportion relationship empirically, but the methods themselves often set the mixing law parameters inaccurately, resulting in poor and inconsistent performance. Finally, we leverage the insights from our framework to derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions. Empirically, Aioli outperforms stratified sampling on 6 out of 6 datasets by an average of 0.28 test perplexity points, whereas existing methods fail to consistently beat stratified sampling, doing up to 6.9 points worse. Moreover, in a practical setting where proportions are learned on shorter runs due to computational constraints, Aioli can dynamically adjust these proportions over the full training run, consistently improving performance over existing methods by up to 12.01 test perplexity points.
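The core loop described in the abstract (estimate how each group's loss responds to the mixture, then dynamically re-weight) can be conveyed with a generic multiplicative-weights update on the probability simplex. This is an illustrative sketch, not Aioli's actual update rule: the function name `update_proportions`, the step size, and the choice of raw per-group losses as the re-weighting signal are all assumptions.

```python
import numpy as np

def update_proportions(proportions, group_losses, step=0.5):
    """One exponentiated-gradient-style step on the simplex:
    upweight data groups whose current loss is above average.
    Illustrative only; Aioli's update is derived from estimated
    mixing-law parameters rather than raw losses."""
    centered = group_losses - group_losses.mean()
    logits = np.log(proportions) + step * centered
    w = np.exp(logits - logits.max())  # stabilise before normalising
    return w / w.sum()
```

For example, starting from uniform proportions over three groups with losses `[3, 1, 1]`, the update shifts mass toward the first (hardest) group while keeping the result on the simplex.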


Convex optimization over a probability simplex

Chok, James, Vasil, Geoffrey M.

arXiv.org Artificial Intelligence

We propose a new iteration scheme, the Cauchy-Simplex, to optimize convex problems over the probability simplex $\{w\in\mathbb{R}^n\ |\ \sum_i w_i=1\ \textrm{and}\ w_i\geq0\}$. Other works have taken steps to enforce positivity or unit normalization automatically but never simultaneously within a unified setting. This paper presents a natural framework for manifestly requiring the probability condition. Specifically, we map the simplex to the positive quadrant of a unit sphere, envisage gradient descent in latent variables, and map the result back in a way that only depends on the simplex variable. Moreover, proving rigorous convergence results in this formulation leads inherently to tools from information theory (e.g. cross entropy and KL divergence). Each iteration of the Cauchy-Simplex consists of simple operations, making it well-suited for high-dimensional problems. We prove that it has a convergence rate of ${O}(1/T)$ for convex functions, and numerical experiments of projection onto convex hulls show faster convergence than similar algorithms. Finally, we apply our algorithm to online learning problems and prove the convergence of the average regret for (1) Prediction with expert advice and (2) Universal Portfolios.
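The sphere mapping described above can be sketched concretely: write each simplex point as w = x^2 with x on the positive quadrant of the unit sphere, take a tangent-space gradient step in x, retract to the sphere, and map back. This is a hedged approximation of the idea, not the exact Cauchy-Simplex iteration; the step size and the abs-then-renormalise retraction are illustrative choices.

```python
import numpy as np

def sphere_simplex_step(w, grad, lr=0.1):
    """One descent step on the simplex via the square-root map.
    `grad` is the gradient of the objective with respect to w.
    Sketch only; the paper's update rule differs in detail."""
    x = np.sqrt(w)
    g = 2.0 * x * grad                # chain rule: df/dx_i = 2 x_i df/dw_i
    g_t = g - (g @ x) * x             # project onto the sphere's tangent space
    x_new = np.abs(x - lr * g_t)      # stay in the positive quadrant
    x_new /= np.linalg.norm(x_new)    # retract to the unit sphere
    return x_new ** 2                 # map back: sums to 1, nonnegative by construction
```

Note that positivity and unit normalization hold automatically after every step, which is the "manifestly requiring the probability condition" property the abstract emphasises.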


Depth Without the Magic: Inductive Bias of Natural Gradient Descent

Kerekes, Anna, Mészáros, Anna, Huszár, Ferenc

arXiv.org Machine Learning

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. However, natural gradient descent is approximately invariant to reparameterization: it always follows the same trajectory and finds the same optimum. A question naturally arises: if we eliminate the role of parameterization, which solution is found, and what new properties emerge? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and for deep matrix factorization. Some of our findings extend to nonlinear neural networks with sufficient but finite over-parametrization. We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
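The parameterization invariance discussed above comes from preconditioning the ordinary gradient by the inverse Fisher information matrix. A minimal sketch for logistic regression follows; the damping term is a numerical-stability assumption added here, not something taken from the paper.

```python
import numpy as np

def natural_gradient_step(theta, X, y, lr=0.5, damping=1e-4):
    """One natural-gradient step for logistic regression.
    The Fisher matrix for this model is X^T D X / n with
    D = diag(p(1-p)); preconditioning by its inverse makes the
    update (approximately) invariant to reparameterization."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (p - y) / len(y)
    fisher = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    fisher += damping * np.eye(len(theta))   # illustrative regularisation
    return theta - lr * np.linalg.solve(fisher, grad)
```

On separable data such as the paper studies, iterating this step drives the logistic loss toward zero while following a parameterization-independent trajectory.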


Harmless but Useful: Beyond Separable Equality Constraints in Datalog+/-

Bellomarini, Luigi, Sallinger, Emanuel

arXiv.org Artificial Intelligence

Ontological query answering is the problem of answering queries in the presence of schema constraints representing the domain of interest. Datalog+/- is a common family of languages for schema constraints, including tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). The interplay of TGDs and EGDs leads to undecidability or intractability of query answering when adding EGDs to tractable Datalog+/- fragments, like Warded Datalog+/-, for which, in the sole presence of TGDs, query answering is PTIME in data complexity. There have been attempts to limit the interaction of TGDs and EGDs and guarantee tractability, in particular with the introduction of separable EGDs, which make EGDs irrelevant for query answering as long as the set of constraints is satisfied. While tractable, separable EGDs have limited expressive power. We propose a more general class of EGDs, which we call ``harmless'', that subsume separable EGDs and allow us to model a much broader class of problems. Unlike separable EGDs, harmless EGDs, besides enforcing ground equality constraints, specialize the query answer by grounding or renaming the labelled nulls introduced by existential quantification in the TGDs. Harmless EGDs capture the cases when the answer obtained in the presence of EGDs is less general than the one obtained with TGDs only. We show that the theoretical problem of deciding whether a set of constraints contains only harmless EGDs is undecidable. We contribute a sufficient syntactic condition characterizing harmless EGDs, which is broad and useful in practice. We focus on Warded Datalog+/- with harmless EGDs and argue that, in such a fragment, query answering is decidable and PTIME in data complexity. We study chase-based techniques for query answering in Warded Datalog+/- with harmless EGDs, leading to an efficient algorithm that can be implemented in state-of-the-art reasoners.
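The notion that a harmless EGD "specializes the answer by grounding labelled nulls" can be illustrated with a toy chase. Everything below (the predicates `person`, `has_id`, `registered_id`, the fact representation, the single TGD and EGD) is invented for illustration and is far simpler than the paper's Warded Datalog+/- machinery.

```python
def chase(facts):
    """Toy chase: a TGD introduces a labelled null, then an EGD
    grounds that null to a constant, specialising the answer.

    TGD: person(X) -> exists N. has_id(X, N)   (N is a labelled null)
    EGD: has_id(X, Y), registered_id(X, Z) -> Y = Z
    """
    facts = set(facts)
    # Apply the TGD: invent a labelled null for each person lacking an id.
    for (pred, *args) in list(facts):
        if pred == "person":
            x = args[0]
            if not any(p == "has_id" and a[0] == x for (p, *a) in facts):
                facts.add(("has_id", x, f"_null_{x}"))
    # Apply the EGD: equate labelled nulls with known constants.
    subst = {}
    for (p1, *a1) in facts:
        if p1 == "has_id":
            x, y = a1
            for (p2, *a2) in facts:
                if p2 == "registered_id" and a2[0] == x and y.startswith("_null_"):
                    subst[y] = a2[1]   # ground the null; "harmless" behaviour
    return {(p, *[subst.get(a, a) for a in args]) for (p, *args) in facts}
```

Here the EGD never fails or blocks query answering; it only makes the chase result less general by replacing the invented null with `id42`, which is exactly the "specialization" behaviour the abstract describes.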


Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy

#artificialintelligence

Esophagogastroduodenoscopy (EGD) is the pivotal procedure in the diagnosis of upper gastrointestinal lesions [1]. High-quality endoscopy delivers better health outcomes [2]. However, there are significant variations in EGD performance among endoscopists, impairing the detection rate of gastric cancers (GC) and precursor lesions [3]. The diagnosis rate of early GC in China is still under 20%, and similar results are seen in most parts of the world [4, 5]. Beyond further expanding endoscopic technology, it is vital to raise the quality of everyday endoscopy. Ensuring competence is an essential prerequisite for discovering lesions during EGD [6]. Many guidelines and expert consensus statements have been issued to optimise EGD examination [7]. The American Society for Gastrointestinal Endoscopy (ASGE) and the American College of Gastroenterology (ACG) have developed safety and quality indicators for EGD [8–10].


Inference and Mixture Modeling with the Elliptical Gamma Distribution

Hosseini, Reshad, Sra, Suvrit, Theis, Lucas, Bethge, Matthias

arXiv.org Machine Learning

We study modeling and inference with the Elliptical Gamma Distribution (EGD). We consider maximum likelihood (ML) estimation for EGD scatter matrices, a task for which we develop new fixed-point algorithms. Our algorithms are efficient and converge to global optima despite nonconvexity. Moreover, they turn out to be much faster than both a well-known iterative algorithm of Kent & Tyler (1991) and sophisticated manifold optimization algorithms. Subsequently, we invoke our ML algorithms as subroutines for estimating parameters of a mixture of EGDs. We illustrate our methods by applying them to model natural image statistics---the proposed EGD mixture model yields the most parsimonious model among several competing approaches.
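The abstract does not spell out the fixed-point update it develops, but the flavour of fixed-point scatter estimation can be conveyed with the classical Kent & Tyler (1991)-style iteration that the paper uses as a baseline. This is a hedged sketch: the iteration count and the trace normalization are illustrative choices, and the EGD estimators use related but weighted updates.

```python
import numpy as np

def fixed_point_scatter(X, iters=100):
    """Tyler-type fixed-point iteration for a shape/scatter matrix:
    repeatedly re-weight each sample by the inverse of its current
    Mahalanobis-like quadratic form. Illustrative baseline only."""
    n, d = X.shape
    S = np.eye(d)
    for _ in range(iters):
        Si = np.linalg.inv(S)
        # quadratic forms x_i^T S^{-1} x_i for all samples at once
        q = np.einsum("ij,jk,ik->i", X, Si, X)
        S = d * (X * (1.0 / q)[:, None]).T @ X / n
        S /= np.trace(S) / d   # fix the scale (the estimator is scale-free)
    return S
```

Each step is a weighted sample covariance, so the per-iteration cost is dominated by one d-by-d inversion and one matrix product, which is why such fixed-point schemes are attractive compared with generic manifold optimization.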