AITopics | Kavis, Ali

Collaborating Authors

Kavis, Ali


Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting

Sanyal, Sunny, Prairie, Hayden, Das, Rudrajit, Kavis, Ali, Sanghavi, Sujay

arXiv.org Machine Learning, Feb-4-2025

Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities, a phenomenon known as "catastrophic forgetting". This is a particular problem when one does not have access to the data and recipe used to develop the pre-trained model. Under this constraint, most existing methods for mitigating forgetting are inapplicable. To address this challenge, we propose a sample weighting scheme for the fine-tuning data based solely on the pre-trained model's losses. Specifically, we upweight the easy samples, on which the pre-trained model's loss is low, and downweight the hard ones, to limit the drift from the pre-trained model. Our approach is orthogonal yet complementary to existing methods: while such methods mostly operate in parameter or gradient space, we concentrate on the sample space. We theoretically analyze the impact of fine-tuning with our method in a linear setting, showing that it stalls learning in a certain subspace, which inhibits overfitting to the target task. We empirically demonstrate the efficacy of our method on both language and vision tasks. As an example, when fine-tuning Gemma 2 2B on MetaMathQA, our method results in only a $0.8\%$ drop in accuracy on GSM8K (another math dataset) compared to standard fine-tuning, while preserving $5.4\%$ more accuracy on the pre-training datasets. Our code is publicly available at https://github.com/sanyalsunny111/FLOW_finetuning .
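The weighting idea in this abstract can be illustrated with a minimal sketch. The abstract only states that weights depend on the pre-trained model's per-sample losses and that low-loss samples are upweighted; the exponential form, the `temperature` parameter, and the function names below are assumptions made for illustration and may differ from the authors' implementation.

```python
import math


def easy_sample_weights(pretrained_losses, temperature=1.0):
    """Map per-sample pre-trained losses to normalized weights.

    Assumed form (not taken from the abstract): weight ~ exp(-loss / temperature),
    so samples with a low pre-trained loss receive a larger weight.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the minimum loss so the largest exponent is 0 (numerical stability).
    min_loss = min(pretrained_losses)
    raw = [math.exp(-(loss - min_loss) / temperature) for loss in pretrained_losses]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_finetuning_loss(current_losses, weights):
    """Weighted sum of the current model's per-sample losses."""
    return sum(w * loss for w, loss in zip(weights, current_losses))


# Hypothetical per-sample losses of the frozen pre-trained model.
weights = easy_sample_weights([0.1, 1.0, 3.0])
objective = weighted_finetuning_loss([0.2, 0.8, 2.5], weights)
```

The weights are computed once from the frozen pre-trained model and then held fixed, so the scheme needs no access to the pre-training data.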

large language model, machine learning, natural language, (19 more...)

arXiv:2502.02797

Country:

North America > United States > Texas (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Education (0.46)
Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)


Understanding Contrastive Learning via Gaussian Mixture Models

Bansal, Parikshit, Kavis, Ali, Sanghavi, Sujay

arXiv.org Artificial Intelligence, Nov-5-2024

Contrastive learning attempts to learn representations from unlabeled data; it does so via a loss function that encourages the embedding of a point to be close to that of its augmentations, and far from the embeddings of random other points. This simple idea performs remarkably well, yet why it does so is not precisely understood theoretically. In this paper we analyze contrastive learning (specifically, the InfoNCE loss) in a natural context: dimensionality reduction in Gaussian Mixture Models. Crucially, we define an augmentation of a data point as being another independent draw from the same underlying mixture component. We show that vanilla InfoNCE is able to find the optimal lower-dimensional subspace even when the Gaussians are not isotropic, something that vanilla spectral techniques cannot do. We further extend our analyses to multi-modal contrastive learning algorithms (e.g., CLIP). In this setting we show that contrastive learning learns a subset of the Fisher-optimal subspace, effectively filtering out all the noise from the learnt representations.
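For readers unfamiliar with InfoNCE, the following is a minimal sketch of one common form of the loss, in which each anchor is paired with its augmentation and the other points in the batch act as negatives. The in-batch negative sampling, the dot-product similarity, the `temperature` parameter, and the function names are assumptions for illustration; the paper's exact formulation may differ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchors, positives, temperature=1.0):
    """Average InfoNCE loss over a batch of embeddings.

    positives[i] is the embedding of an augmentation of anchors[i];
    positives[j] for j != i serve as negatives for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: matched pairs should give a lower loss than swapped pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

In the setting described in the abstract, an augmentation would be an independent draw from the same mixture component as its anchor rather than a transformed copy of it.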

artificial intelligence, machine learning, subspace, (11 more...)

arXiv:2411.03517

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)


Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

Jiang, Ruichen, Kavis, Ali, Jin, Qiujiang, Sanghavi, Sujay, Mokhtari, Aryan

arXiv.org Machine Learning, Jun-4-2024

We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, distinct from common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant where the step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we further design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
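As background for the "optimistic method" mentioned in this abstract, here is a minimal first-order sketch on the toy saddle problem $f(x, y) = xy$. It is not the paper's algorithm: it uses no second-order information and a fixed, hand-picked step size rather than the adaptive one described above. The objective, the step size, and the function names are illustrative assumptions.

```python
def plain_gda(x, y, step=0.1, iterations=2000):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(iterations):
        gx, gy = y, x  # df/dx = y, df/dy = x, both at the current point
        x, y = x - step * gx, y + step * gy
    return x, y


def optimistic_gda(x, y, step=0.1, iterations=2000):
    """Optimistic gradient descent-ascent on f(x, y) = x * y.

    Each update uses 2 * (current gradient) - (previous gradient),
    i.e. the current gradient plus a correction based on the last change.
    """
    prev_gx, prev_gy = y, x  # treat the initial gradient as the "previous" one
    for _ in range(iterations):
        gx, gy = y, x
        x = x - step * (2 * gx - prev_gx)
        y = y + step * (2 * gy - prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y


gda_point = plain_gda(1.0, 1.0)
optimistic_point = optimistic_gda(1.0, 1.0)
```

On this bilinear problem the plain iteration spirals away from the saddle point at the origin, while the optimistic correction makes the iterates contract toward it.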

artificial intelligence, inequality, machine learning, (19 more...)

arXiv:2406.02016

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)


Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods

Antonakopoulos, Kimon, Kavis, Ali, Cevher, Volkan

arXiv.org Artificial Intelligence, Dec-12-2022

This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1 / T^3)$ with deterministic oracles, where $T$ is the number of iterations. Our method also interpolates between these rates without knowing the nature of the oracle a priori, which is enabled by a parameter-free adaptive step-size that requires no knowledge of the smoothness modulus, the variance bounds, or the diameter of the constrained set. To our knowledge, this is the first universal algorithm with such global guarantees within the second-order optimization literature.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv:2211.01832

Country: Europe (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)


On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Mertikopoulos, Panayotis, Hallak, Nadav, Kavis, Ali, Cevher, Volkan

arXiv.org Machine Learning, Jun-19-2020

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $\Theta(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
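The $\Theta(1/n^p)$ step-size schedule discussed in this abstract can be illustrated with a minimal sketch of SGD on a one-dimensional non-convex function. The double-well objective $f(x) = x^4/4 - x^2/2$, the Gaussian gradient noise, and all constants (`gamma`, `p = 0.75`, `noise_std`, the seed) are assumptions chosen for illustration, not settings from the paper.

```python
import random


def sgd_vanishing_step(x0=2.0, gamma=0.1, p=0.75, iterations=20000,
                       noise_std=0.5, seed=0):
    """Run SGD with step size gamma / n**p on f(x) = x**4 / 4 - x**2 / 2.

    The function has minimizers at x = -1 and x = 1 and a local maximum at x = 0.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = x0
    for n in range(1, iterations + 1):
        grad = x ** 3 - x                          # exact gradient of f
        noisy_grad = grad + rng.gauss(0.0, noise_std)  # stochastic oracle
        x -= (gamma / n ** p) * noisy_grad         # vanishing step size
    return x


final_x = sgd_vanishing_step()
```

Because the step size shrinks over time, the influence of the gradient noise on the iterate also shrinks, which is the intuition behind the cool-down heuristic mentioned above.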

artificial intelligence, machine learning, probability 1, (15 more...)

arXiv:2006.11144

Country:

Europe (0.67)
North America > United States > New York (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)


Mirrored Langevin Dynamics

Hsieh, Ya-Ping, Kavis, Ali, Rolland, Paul, Cevher, Volkan

Neural Information Processing Systems, Dec-31-2018

We consider the problem of sampling from constrained distributions, which has posed significant challenges to both non-asymptotic analysis and algorithmic design. We propose a unified framework, which is inspired by the classical mirror descent, to derive novel first-order sampling schemes. We prove that, for a general target distribution with strongly convex potential, our framework implies the existence of a first-order algorithm achieving $\tilde{O}(\epsilon^{-2}d)$ convergence, suggesting that the state-of-the-art $\tilde{O}(\epsilon^{-6}d^5)$ can be vastly improved. With the important Latent Dirichlet Allocation (LDA) application in mind, we specialize our algorithm to sample from Dirichlet posteriors, and derive the first non-asymptotic $\tilde{O}(\epsilon^{-2}d^2)$ rate for first-order sampling. We further extend our framework to the mini-batch setting and prove convergence rates when only stochastic gradients are available. Finally, we report promising experimental results for LDA on real datasets.

artificial intelligence, langevin dynamic, machine learning, (14 more...)


Country:

North America > Canada (0.14)
Europe > Switzerland (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)



Efficient learning of smooth probability functions from Bernoulli tests with guarantees

Rolland, Paul, Kavis, Ali, Singla, Adish, Cevher, Volkan

arXiv.org Machine Learning, Dec-11-2018

We study the fundamental problem of learning an unknown, smooth probability function via point-wise Bernoulli tests. We provide the first scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the convergence rate of our posterior update rule to the true probability function in L2-norm. Moreover, we allow the Bernoulli tests to depend on contextual features, and provide a modified inference engine with provable guarantees for this novel setting. Numerical results show that the empirical convergence rates match the theory, and illustrate the superiority of our approach in handling contextual features over the state-of-the-art.

artificial intelligence, health & medicine, probability function, (19 more...)

arXiv:1812.04428

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
