AITopics | Dexter, Gregory

Collaborating Authors

Dexter, Gregory

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications

Behdin, Kayhan, Dai, Yun, Fatahibaarzi, Ata, Gupta, Aman, Song, Qingquan, Tang, Shao, Sang, Hejian, Dexter, Gregory, Zhu, Sirou, Zhu, Siyu, Dharamsi, Tejas, Sanjabi, Maziar, Kothapalli, Vignesh, Firooz, Hamed, Fu, Zhoutong, Cao, Yihan, Hsu, Pin-Lun, Borisyuk, Fedor, Wang, Zhipeng, Mazumder, Rahul, Pillai, Natesh, Simon, Luke

arXiv.org Artificial IntelligenceFeb-20-2025

Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendations to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present methods and insights for training small language models (SLMs) that deliver high performance and efficiency in deployment. We focus on two key techniques: (1) knowledge distillation and (2) model compression via quantization and pruning. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training, serving costs, and latency. We detail the impact of these techniques on a variety of use cases at a large professional social network platform and share deployment lessons - including hardware optimization strategies that enhance speed and throughput for both predictive and reasoning-based applications.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.14305

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The Space Complexity of Approximating Logistic Loss

Dexter, Gregory, Drineas, Petros, Khanna, Rajiv

arXiv.org Artificial IntelligenceDec-3-2024

We provide space complexity lower bounds for data structures that approximate logistic loss up to $\epsilon$-relative error on a logistic regression problem with data $\mathbf{X} \in \mathbb{R}^{n \times d}$ and labels $\mathbf{y} \in \{-1,1\}^d$. The space complexity of existing coreset constructions depend on a natural complexity measure $\mu_\mathbf{y}(\mathbf{X})$, first defined in (Munteanu, 2018). We give an $\tilde{\Omega}(\frac{d}{\epsilon^2})$ space complexity lower bound in the regime $\mu_\mathbf{y}(\mathbf{X}) = O(1)$ that shows existing coresets are optimal in this regime up to lower order factors. We also prove a general $\tilde{\Omega}(d\cdot \mu_\mathbf{y}(\mathbf{X}))$ space lower bound when $\epsilon$ is constant, showing that the dependency on $\mu_\mathbf{y}(\mathbf{X})$ is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that $\mu_\mathbf{y}(\mathbf{X})$ is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.02639

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices

Dexter, Gregory, Boutsikas, Christos, Ma, Linkai, Ipsen, Ilse C. F., Drineas, Petros

arXiv.org Artificial IntelligenceMar-18-2024

Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding \textit{implicitly regularizes} tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2403.12278

Country:

North America > United States > North Carolina (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

A Precise Characterization of SGD Stability Using Loss Surface Geometry

Dexter, Gregory, Ocejo, Borja, Keerthi, Sathiya, Gupta, Aman, Acharya, Ayan, Khanna, Rajiv

arXiv.org Artificial IntelligenceJan-22-2024

Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates. Several studies have investigated the linear stability property of SGD in the vicinity of a stationary point as a predictive proxy for sharpness and generalization error in overparameterized neural networks (Wu et al., 2022; Jastrzebski et al., 2019; Cohen et al., 2021). In this paper, we delve deeper into the relationship between linear stability and sharpness. More specifically, we meticulously delineate the necessary and sufficient conditions for linear stability, contingent on hyperparameters of SGD and the sharpness at the optimum. Towards this end, we introduce a novel coherence measure of the loss Hessian that encapsulates pertinent geometric properties of the loss function that are relevant to the linear stability of SGD. It enables us to provide a simplified sufficient condition for identifying linear instability at an optimum. Notably, compared to previous works, our analysis relies on significantly milder assumptions and is applicable for a broader class of loss functions than known before, encompassing not only mean-squared error but also cross-entropy loss.

artificial intelligence, machine learning, stability, (20 more...)

arXiv.org Artificial Intelligence

2401.12332

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Add feedback

Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

Dexter, Gregory, Drineas, Petros, Woodruff, David P., Yasuda, Taisuke

arXiv.org Artificial IntelligenceOct-29-2023

Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering. In particular, given a design matrix $\mathbf A\in\mathbb R^{n\times d}$ in a turnstile stream, we show an $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ space upper bound for $r$-sparse dictionary learning of size $k$, an $\tilde O(n/\epsilon^2 + dk/\epsilon)$ space upper bound for $k$-means clustering, as well as an $\tilde O(n)$ space upper bound for $k$-means clustering on random order row insertion streams with a natural "bounded sensitivity" assumption. On the lower bounds side, we obtain a general $\tilde\Omega(n/\epsilon + dk/\epsilon)$ lower bound for $k$-means clustering, as well as an $\tilde\Omega(n/\epsilon^2)$ lower bound for algorithms which can estimate the cost of a single fixed set of candidate centers.

artificial intelligence, machine learning, probability, (16 more...)

arXiv.org Artificial Intelligence

2310.19068

Country:

Europe (0.67)
North America > United States > California (0.67)
North America > Canada (0.67)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Add feedback

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Behdin, Kayhan, Song, Qingquan, Gupta, Aman, Keerthi, Sathiya, Acharya, Ayan, Ocejo, Borja, Dexter, Gregory, Khanna, Rajiv, Durfee, David, Mazumder, Rahul

arXiv.org Machine LearningSep-30-2023

Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to substantiate this theoretical advancement. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.

machine learning, msam, natural language, (17 more...)

arXiv.org Machine Learning

2302.09693

Country:

North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Feature Space Sketching for Logistic Regression

Dexter, Gregory, Khanna, Rajiv, Raheel, Jawad, Drineas, Petros

arXiv.org Artificial IntelligenceMar-24-2023

All three approaches can be thought of as sketching the logistic regression inputs. On the coreset construction front, we resolve open problems from prior work and present novel bounds for the complexity of coreset construction methods. On the feature selection and dimensionality reduction front, we initiate the study of forward error bounds for logistic regression. Our bounds are tight up to constant factors and our forward error bounds can be extended to Generalized Linear Models.

artificial intelligence, machine learning, regression, (18 more...)

arXiv.org Artificial Intelligence

2303.14284

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Inverse Reinforcement Learning in the Continuous Setting with Formal Guarantees

Dexter, Gregory, Bello, Kevin, Honorio, Jean

arXiv.org Machine LearningFeb-15-2021

Inverse Reinforcement Learning (IRL) is the problem of finding a reward function which describes observed/known expert behavior. IRL is useful for automated control in situations where the reward function is difficult to specify manually, which impedes reinforcement learning. We provide a new IRL algorithm for the continuous state space setting with unknown transition dynamics by modeling the system using a basis of orthonormal functions. We provide a proof of correctness and formal guarantees on the sample and time complexity of our algorithm.

artificial intelligence, reinforcement learning, reward function, (18 more...)

arXiv.org Machine Learning

2102.07937

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback