AITopics | Lakshminarayanan, Guna

Collaborating Authors

Lakshminarayanan, Guna

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Zhang, Buyun, Luo, Liang, Chen, Yuxin, Nie, Jade, Liu, Xi, Guo, Daifeng, Zhao, Yanli, Li, Shen, Hao, Yuchen, Yao, Yantao, Lakshminarayanan, Guna, Wen, Ellie Dingqiao, Park, Jongsoo, Naumov, Maxim, Chen, Wenlin

arXiv.org Artificial Intelligence, Jun-4-2024

Scaling laws play an instrumental role in the sustainable improvement of model quality. Unfortunately, recommendation models to date do not exhibit scaling laws like those observed in large language models, owing to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse interactions of any order simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models in quality. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior art falls short.
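For readers unfamiliar with factorization machines, the building block named in the abstract, the following is a minimal sketch of the classic second-order factorization-machine interaction term. It is not Wukong's actual layer, whose details are not given on this page. The function names and the toy embeddings are hypothetical, and the sketch uses only the Python standard library.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_naive(embeddings):
    """Sum of dot products over all unordered pairs of feature embeddings."""
    return sum(dot(u, v) for u, v in combinations(embeddings, 2))


def fm_pairwise_fast(embeddings):
    """Same quantity via 0.5 * (|sum_i e_i|^2 - sum_i |e_i|^2).

    This avoids enumerating pairs, so the cost is linear in the
    number of features instead of quadratic.
    """
    dim = len(embeddings[0])
    total = [sum(e[k] for e in embeddings) for k in range(dim)]
    square_of_sum = dot(total, total)
    sum_of_squares = sum(dot(e, e) for e in embeddings)
    return 0.5 * (square_of_sum - sum_of_squares)


# Toy example (assumed values): three features, each with a 2-d embedding.
embeddings = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
print(fm_pairwise_naive(embeddings), fm_pairwise_fast(embeddings))  # prints "3.0 3.0"
```

A single term like this captures only pairwise interactions. The abstract's claim is that stacking and widening such layers lets the model reach higher-order interactions.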

machine learning, natural language, wukong, (19 more...)

arXiv:2403.02545

Country: Europe > Austria > Vienna (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Luo, Liang, Zhang, Buyun, Tsang, Michael, Ma, Yinbin, Chu, Ching-Hsiang, Chen, Yuxin, Li, Shen, Hao, Yuchen, Zhao, Yanli, Lakshminarayanan, Guna, Wen, Ellie Dingqiao, Park, Jongsoo, Mudigere, Dheevatsa, Naumov, Maxim

arXiv.org Artificial Intelligence, May-2-2024

We study a mismatch between the flat architecture of deep learning recommendation models, the common distributed training paradigm, and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner that uses learned embeddings to systematically create towers with meaningful feature interactions and load-balanced assignments, preserving model quality and training throughput. We show that DMT can achieve up to 1.9x speedup compared to state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales. Since the embedding tables can be huge, state-of-the-art practice trains these models in a hybrid fashion: the sparse embedding tables are sharded across devices, while the dense parameters are replicated and synchronized through AllReduce operations. (Author affiliation footnote: Nvidia, work done while at Meta.)
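As a rough illustration of the AllReduce operations mentioned above, the sketch below simulates the semantics of a mean AllReduce in a single process: each worker contributes a gradient vector, and every worker ends up holding the same element-wise mean. This is an assumption-laden toy, not DMT's implementation. The function name `all_reduce_mean` is hypothetical, and no real communication library is involved.

```python
def all_reduce_mean(worker_grads):
    """Simulate a mean AllReduce across workers.

    worker_grads holds one gradient vector per worker. The result holds
    one vector per worker, each equal to the element-wise mean of the inputs.
    """
    num_workers = len(worker_grads)
    dim = len(worker_grads[0])
    mean = [sum(g[k] for g in worker_grads) / num_workers for k in range(dim)]
    # Every worker receives its own copy of the reduced result.
    return [list(mean) for _ in range(num_workers)]


# Toy example (assumed values): three workers, 2-d gradients.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
synced = all_reduce_mean(grads)
print(synced)  # prints "[[3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]"
```

In real training the reduction runs over the network between devices, which is why communication volume and data center topology matter for throughput.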

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv:2403.00877

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Cloud Computing (0.89)
