AITopics | vaswani

Collaborating Authors

vaswani

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stand-Alone Self-Attention in Vision Models

Niki Parmar, Prajit Ramachandran, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jon Shlens

Neural Information Processing Systems | Feb-11-2026, 21:07:50 GMT

Detailed ablation studies demonstrate that self-attention is especially impactful when used in later layers. These results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox.

artificial intelligence, arxiv preprint arxiv, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Mixer

Neural Information Processing Systems | Feb-11-2026, 04:53:57 GMT

Recently, attention-based networks, such as the Vision Transformer, have also become popular.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems | Feb-7-2026, 14:23:56 GMT

To solve the problem, we propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly.

artificial intelligence, machine learning, mechanism, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

AltGDmin: Alternating GD and Minimization for Partly-Decoupled (Federated) Optimization

Vaswani, Namrata

arXiv.org Artificial Intelligence | Apr-22-2025

This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin), that is useful for many problems for which alternating minimization (AltMin) is a popular solution. AltMin is a special case of the block coordinate descent algorithm that is useful for problems in which minimization w.r.t. one subset of variables, keeping the other fixed, is closed form or otherwise reliably solved. Denote the two blocks/subsets of the optimization variables Z by Za, Zb, i.e., Z = {Za, Zb}. AltGDmin is often a faster solution than AltMin for any problem for which (i) the minimization over one set of variables, Zb, is much quicker than that over the other set, Za; and (ii) the cost function is differentiable w.r.t. Za. Often, the reason for one minimization to be quicker is that the problem is "decoupled" for Zb and each of the decoupled problems is quick to solve. This decoupling is also what makes AltGDmin communication-efficient for federated settings. Important examples where this assumption holds include (a) low rank column-wise compressive sensing (LRCS) and low rank matrix completion (LRMC); (b) their outlier-corrupted extensions such as robust PCA, robust LRCS and robust LRMC; (c) phase retrieval and its sparse and low-rank model based extensions; (d) tensor extensions of many of these problems such as tensor LRCS and tensor completion; and (e) many partly discrete problems where GD does not apply, such as clustering, unlabeled sensing, and mixed linear regression. LRCS finds important applications in multi-task representation learning and few shot learning, federated sketching, and accelerated dynamic MRI. LRMC and robust PCA find important applications in recommender systems, computer vision and video analytics.
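The alternation described in this abstract can be illustrated on a toy problem. The sketch below is not the article's algorithm for LRCS or LRMC. It is a minimal rank-1 example, assuming a fully observed matrix whose columns y_k are approximated by u * b_k, where u plays the role of Za (updated by one gradient step) and the per-column coefficients b_k play the role of Zb (each solved in closed form, independently of the others). The function name `altgdmin_rank1`, the starting vector `u0`, the step size `eta`, and the renormalization of u after each step are all assumptions made for illustration.

```python
import math


def altgdmin_rank1(Y, u0, iters=50, eta=0.4):
    """Toy alternating GD and minimization for
        minimize over (u, b) of  sum_k || y_k - u * b_k ||^2.

    Y  : list of q columns, each a list of n floats.
    u0 : nonzero starting vector of length n (hypothetical initialization).
    Returns (u, b) with u of unit norm.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def minimize_b(u):
        # Decoupled step: each b_k depends only on its own column y_k.
        # With ||u|| = 1 the least-squares solution is b_k = <u, y_k>.
        return [sum(ui * yi for ui, yi in zip(u, y)) for y in Y]

    u = normalize(u0)
    n = len(u)
    for _ in range(iters):
        b = minimize_b(u)                      # minimization over Zb
        scale = sum(bk * bk for bk in b)
        if scale == 0.0:
            break                              # gradient is zero, nothing to update
        # Gradient of the cost w.r.t. u: -2 * sum_k (y_k - u * b_k) * b_k
        grad = [
            -2.0 * sum((y[i] - u[i] * bk) * bk for y, bk in zip(Y, b))
            for i in range(n)
        ]
        # One GD step over Za, scaled by ||b||^2 so eta is unit-free,
        # followed by renormalization (an assumption of this sketch).
        u = normalize([ui - (eta / scale) * gi for ui, gi in zip(u, grad)])
    return u, minimize_b(u)
```

In a federated reading of this toy, each node would hold some of the columns y_k, compute its own b_k locally, and share only its contribution to the gradient with respect to u.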

altgdmin, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2504.14741

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Lin, Jiabin, Moothedath, Shana, Vaswani, Namrata

arXiv.org Machine Learning | Jan-6-2025

We study how representation learning can improve the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r much smaller than d. We present a new algorithm based on an alternating projected gradient descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We present experiments and compare the performance of our algorithm against benchmark algorithms.

algorithm, probability, sample efficient multi-task representation learning, (10 more...)

arXiv.org Machine Learning

2410.02068

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Iowa > Story County > Ames (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Noisy Low Rank Column-wise Sensing

Singh, Ankit Pratap, Vaswani, Namrata

arXiv.org Artificial Intelligence | Sep-12-2024

This letter studies the AltGDmin algorithm for solving the noisy low rank column-wise sensing (LRCS) problem. Our sample complexity guarantee improves upon the best existing one by a factor $\max(r, \log(1/\epsilon))/r$ where $r$ is the rank of the unknown matrix and $\epsilon$ is the final desired accuracy. A second contribution of this work is a detailed comparison of guarantees from all work that studies the exact same mathematical problem as LRCS, but refers to it by different names.

exp, lemma 3, sd 2, (14 more...)

arXiv.org Artificial Intelligence

2409.08384

Country:

North America > United States > Iowa > Story County > Ames (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Was Linguistic A.I. Created by Accident?

The New Yorker | Aug-23-2024, 10:00:00 GMT

In the spring of 2017, in a room on the second floor of Google's Building 1965, a college intern named Aidan Gomez stretched out, exhausted. It was three in the morning, and Gomez and Ashish Vaswani, a scientist focussed on natural language processing, were working on their team's contribution to the Neural Information Processing Systems conference, the biggest annual meeting in the field of artificial intelligence. Along with the rest of their eight-person group at Google, they had been pushing flat out for twelve weeks, sometimes sleeping in the office, on couches by a curtain that had a neuron-like pattern. They were nearing the finish line, but Gomez didn't have the energy to go out to a bar and celebrate. He couldn't have even if he'd wanted to: he was only twenty, too young to drink in the United States.

gomez, transformer, translation, (14 more...)

The New Yorker

Country:

North America > United States (0.24)
North America > Canada > Ontario > Toronto (0.14)

Industry:

Leisure & Entertainment (0.70)
Media (0.48)
Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

ChatGPT Spawns Investor Gold Rush in AI

WSJ.com: WSJD - Technology | May-8-2023, 15:11:00 GMT

Before their startup had customers, a business plan or even a formal name, former Google AI researchers Niki Parmar and Ashish Vaswani were fielding interest from investors eager to back the next big thing in artificial intelligence. At Google, Ms. Parmar and Mr. Vaswani were among the co-authors of a seminal 2017 paper that helped pave the way for the boom in so-called generative AI. Earlier this year, only weeks after striking out on their own, they raised funds that valued their fledgling company--now called Essential AI--at around $50 million, people familiar with the company said.

chatgpt spawn investor gold rush, vaswani

WSJ.com: WSJD - Technology

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

"Attention Is All You Need": USC Alumni Paved Path for ChatGPT - USC Viterbi

#artificialintelligence | Mar-12-2023, 06:30:15 GMT

Niki Parmar and Ashish Vaswani co-authored a seminal paper that set the groundwork for ChatGPT and other generative AI models. ChatGPT has taken the world by storm, but seeds of the groundbreaking technology were sown at the USC Viterbi School of Engineering. The seminal paper "Attention Is All You Need," which laid the foundation for ChatGPT and other generative AI systems, was co-authored by Ashish Vaswani, a PhD computer science graduate ('14) and Niki Parmar, a master's in computer science graduate ('15). The landmark paper was presented at the 2017 Conference on Neural Information Processing Systems (NeurIPS), one of the top conferences in AI and machine learning. In the paper, the researchers introduced the transformer architecture, a powerful type of neural network that has become widely used for natural language processing tasks, from text classification to language modeling.

chatgpt, parmar, vaswani, (11 more...)

#artificialintelligence

Country:

Oceania > Papua New Guinea (0.05)
North America > United States > Oregon (0.05)
Europe > Middle East (0.05)
(3 more...)

Genre: Overview (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)

What Is a Transformer Model?

#artificialintelligence | Mar-27-2022, 18:30:07 GMT

If you want to ride the next big wave in AI, grab a transformer. A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other. First described in a 2017 paper from Google, transformers are among the newest and one of the most powerful classes of models invented to date. They're driving a wave of advances in machine learning some have dubbed transformer AI.
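To make the idea of self-attention concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes identity query, key, and value projections, whereas real transformer models learn separate projection matrices and use multiple heads. The function name `self_attention` and the toy input are hypothetical and not taken from the article.

```python
import math


def self_attention(X):
    """Scaled dot-product self-attention over a sequence X.

    X is a list of token vectors (lists of floats, all of length d).
    Assumption: queries, keys and values are the inputs themselves;
    a trained model would first apply learned projections.
    """
    d = len(X[0])
    outputs = []
    for q in X:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X
        ]
        # Numerically stable softmax turns scores into weights summing to 1.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of all value vectors, so even
        # distant positions can contribute directly.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)]
        )
    return outputs
```

Calling `self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])` returns three vectors, each a weighted mix of the inputs. The first and third outputs are identical because their inputs match, even though the positions are not adjacent.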

neural network, transformer, transformer model, (15 more...)

#artificialintelligence

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Industry:

Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.97)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
