vaswani
AltGDmin: Alternating GD and Minimization for Partly-Decoupled (Federated) Optimization
This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin), that is useful for many problems for which alternating minimization (AltMin) is a popular solution. AltMin is a special case of the block coordinate descent algorithm that is useful for problems in which minimization w.r.t. one subset of variables, keeping the other fixed, has a closed-form solution or can otherwise be reliably solved. Denote the two blocks/subsets of the optimization variables Z by Za, Zb, i.e., Z = {Za, Zb}. AltGDmin is often a faster solution than AltMin for any problem for which (i) the minimization over one set of variables, Zb, is much quicker than that over the other set, Za; and (ii) the cost function is differentiable w.r.t. Za. Often, the reason for one minimization to be quicker is that the problem is "decoupled" for Zb and each of the decoupled problems is quick to solve. This decoupling is also what makes AltGDmin communication-efficient for federated settings. Important examples where this assumption holds include (a) low rank column-wise compressive sensing (LRCS) and low rank matrix completion (LRMC); (b) their outlier-corrupted extensions such as robust PCA, robust LRCS, and robust LRMC; (c) phase retrieval and its sparse and low-rank model based extensions; (d) tensor extensions of many of these problems such as tensor LRCS and tensor completion; and (e) many partly discrete problems where GD does not apply, such as clustering, unlabeled sensing, and mixed linear regression. LRCS finds important applications in multi-task representation learning and few shot learning, federated sketching, and accelerated dynamic MRI. LRMC and robust PCA find important applications in recommender systems, computer vision, and video analytics.
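As a rough illustration of the alternation described in this abstract, the sketch below applies the idea to a small noiseless LRCS instance. It assumes the model y_k = A_k U b_k, where the n x r matrix U plays the role of Za and the per-column coefficient vectors b_k play the role of Zb. The function names (solve_columns, altgdmin_lrcs), the random initialization, the QR re-orthonormalization after each GD step, and the conservative step size are all assumptions made for this example and are not taken from the article.

```python
import numpy as np

def solve_columns(A, Y, U):
    """Min step (hypothetical helper): for fixed U, the problem decouples
    into one small r-dimensional least-squares problem per column k."""
    q = A.shape[0]
    cols = [np.linalg.lstsq(A[k] @ U, Y[k], rcond=None)[0] for k in range(q)]
    return np.stack(cols, axis=1)  # shape (r, q)

def altgdmin_lrcs(A, Y, r, n_iters=50, seed=0):
    """A: (q, m, n) stack of sensing matrices; Y: (q, m) with Y[k] = A[k] @ x_k."""
    q, m, n = A.shape
    rng = np.random.default_rng(seed)
    # Assumption: random orthonormal start, chosen only to keep the sketch short.
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    # Largest squared spectral norm among the A_k, used to pick a safe step size.
    a_max = max(np.linalg.norm(A[k], 2) ** 2 for k in range(q))
    losses = []
    for _ in range(n_iters):
        B = solve_columns(A, Y, U)                       # minimize over Zb
        resid = np.einsum("kmn,nk->km", A, U @ B) - Y    # A_k U b_k - y_k
        losses.append(0.5 * np.sum(resid ** 2))
        # Gradient of the squared loss w.r.t. U: sum_k A_k^T resid_k b_k^T.
        grad = np.einsum("kmn,km,jk->nj", A, resid, B)
        # Conservative step: 1 / (bound on the gradient's Lipschitz constant).
        eta = 1.0 / (a_max * np.linalg.norm(B, 2) ** 2)
        # One GD step over Za, then QR to restore orthonormal columns.
        U, _ = np.linalg.qr(U - eta * grad)
    return U, solve_columns(A, Y, U), losses

# Small synthetic instance (all sizes are arbitrary choices for the demo).
rng = np.random.default_rng(1)
n, q, r, m = 30, 40, 2, 20
U_true, _ = np.linalg.qr(rng.standard_normal((n, r)))
X_true = U_true @ rng.standard_normal((r, q))
A = rng.standard_normal((q, m, n))
Y = np.einsum("kmn,nk->km", A, X_true)

U_hat, B_hat, losses = altgdmin_lrcs(A, Y, r)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The per-column least-squares solves are independent of one another, which is the decoupling the abstract refers to: in a federated setting each node could compute its own b_k locally and share only its contribution to the gradient with respect to U.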
Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
Lin, Jiabin, Moothedath, Shana, Vaswani, Namrata
We study how representation learning can improve the learning efficiency of contextual bandit problems. We consider the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r much smaller than d. We present a new estimator, based on alternating projected gradient descent (GD) and minimization, to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We present experiments and compare the performance of our algorithm against benchmark algorithms.
Noisy Low Rank Column-wise Sensing
Singh, Ankit Pratap, Vaswani, Namrata
This letter studies the AltGDmin algorithm for solving the noisy low rank column-wise sensing (LRCS) problem. Our sample complexity guarantee improves upon the best existing one by a factor $\max(r, \log(1/\epsilon))/r$ where $r$ is the rank of the unknown matrix and $\epsilon$ is the final desired accuracy. A second contribution of this work is a detailed comparison of guarantees from all work that studies the exact same mathematical problem as LRCS, but refers to it by different names.
Was Linguistic A.I. Created by Accident?
In the spring of 2017, in a room on the second floor of Google's Building 1965, a college intern named Aidan Gomez stretched out, exhausted. It was three in the morning, and Gomez and Ashish Vaswani, a scientist focussed on natural language processing, were working on their team's contribution to the Neural Information Processing Systems conference, the biggest annual meeting in the field of artificial intelligence. Along with the rest of their eight-person group at Google, they had been pushing flat out for twelve weeks, sometimes sleeping in the office, on couches by a curtain that had a neuron-like pattern. They were nearing the finish line, but Gomez didn't have the energy to go out to a bar and celebrate. He couldn't have even if he'd wanted to: he was only twenty, too young to drink in the United States.
ChatGPT Spawns Investor Gold Rush in AI
Before their startup had customers, a business plan or even a formal name, former Google AI researchers Niki Parmar and Ashish Vaswani were fielding interest from investors eager to back the next big thing in artificial intelligence. At Google, Ms. Parmar and Mr. Vaswani were among the co-authors of a seminal 2017 paper that helped pave the way for the boom in so-called generative AI. Earlier this year, only weeks after striking out on their own, they raised funds that valued their fledgling company--now called Essential AI--at around $50 million, people familiar with the company said.
"Attention Is All You Need": USC Alumni Paved Path for ChatGPT - USC Viterbi
Niki Parmar and Ashish Vaswani co-authored a seminal paper that set the groundwork for ChatGPT and other generative AI models. ChatGPT has taken the world by storm, but seeds of the groundbreaking technology were sown at the USC Viterbi School of Engineering. The seminal paper "Attention Is All You Need," which laid the foundation for ChatGPT and other generative AI systems, was co-authored by Ashish Vaswani, a PhD computer science graduate ('14), and Niki Parmar, a master's in computer science graduate ('15). The landmark paper was presented at the 2017 Conference on Neural Information Processing Systems (NeurIPS), one of the top conferences in AI and machine learning. In the paper, the researchers introduced the transformer architecture, a powerful type of neural network that has become widely used for natural language processing tasks, from text classification to language modeling.
What Is a Transformer Model?
If you want to ride the next big wave in AI, grab a transformer. A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect the subtle ways in which even distant data elements in a series influence and depend on each other. First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date. They're driving a wave of advances in machine learning some have dubbed transformer AI.
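To make the idea of self-attention concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the form introduced in the 2017 paper. The function name self_attention, the random projection matrices Wq, Wk and Wv, and the toy dimensions are assumptions chosen for illustration. A real transformer learns these matrices and adds multiple heads, masking, and further layers.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X: (seq_len, d_model) sequence of element embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices (random here, learned in practice).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Pairwise similarity between every pair of positions, near or distant.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors of all positions.
    return weights @ V, weights

# Toy input: 5 sequence elements with 8-dimensional embeddings, projected to 4 dims.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

Row i of weights shows how strongly position i attends to every other position, regardless of how far apart they sit in the sequence.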