AITopics | sm3

Collaborating Authors

sm3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

8f1fa0193ca2b5d2fa0695827d8270e9-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-19-2026, 19:12:26 GMT

adafactor, final version, skm, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)

Add feedback

Memory Efficient Adaptive Optimization

Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Neural Information Processing SystemsFeb-12-2026, 22:21:05 GMT

Figure 2: Testlog-perplexityofa Transformer -Bigmodelon WMT'14 en!fr, whentrainingwith batchsizesof 384 (left) and 768 (right).

adagrad, artificial intelligence, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Add feedback

Memory Efficient Adaptive Optimization

Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Neural Information Processing SystemsOct-3-2025, 05:15:59 GMT

Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Add feedback

8f1fa0193ca2b5d2fa0695827d8270e9-AuthorFeedback.pdf

Neural Information Processing SystemsOct-3-2025, 05:13:44 GMT

adafactor, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Add feedback

SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

Park, Kwangryeol, Lee, Seulki

arXiv.org Artificial IntelligenceDec-12-2024

We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient optimizer that reduces the memory requirement of the widely used adaptive learning rate optimizers, such as Adam, by up to 96%. SMMF enables flexible and efficient factorization of an arbitrary rank (shape) of the first and second momentum tensors during optimization, based on the proposed square-matricization and one-time single matrix factorization. From this, it becomes effectively applicable to any rank (shape) of momentum tensors, i.e., bias, matrix, and any rank-d tensors, prevalent in various deep model architectures, such as CNNs (high rank) and Transformers (low rank), in contrast to existing memory-efficient optimizers that applies only to a particular (rank-2) momentum tensor, e.g., linear layers. We conduct a regret bound analysis of SMMF, which shows that it converges similarly to non-memory-efficient adaptive learning rate optimizers, such as AdamNC, providing a theoretical basis for its competitive optimization capability. In our experiment, SMMF takes up to 96% less memory compared to state-of-the-art memory efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance on various CNN and Transformer tasks.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.08894

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Monaco (0.04)
Europe > Germany > Berlin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Deep Learning-based Detection of Bacterial Swarm Motion Using a Single Image

Li, Yuzhu, Li, Hao, Chen, Weijie, O'Riordan, Keelan, Mani, Neha, Qi, Yuxuan, Liu, Tairan, Mani, Sridhar, Ozcan, Aydogan

arXiv.org Artificial IntelligenceOct-19-2024

Distinguishing between swarming and swimming, the two principal forms of bacterial movement, holds significant conceptual and clinical relevance. This is because bacteria that exhibit swarming capabilities often possess unique properties crucial to the pathogenesis of infectious diseases and may also have therapeutic potential. Here, we report a deep learning-based swarming classifier that rapidly and autonomously predicts swarming probability using a single blurry image. Compared with traditional video-based, manually-processed approaches, our method is particularly suited for high-throughput environments and provides objective, quantitative assessments of swarming probability. The swarming classifier demonstrated in our work was trained on Enterobacter sp. SM3 and showed good performance when blindly tested on new swarming (positive) and swimming (negative) test images of SM3, achieving a sensitivity of 97.44% and a specificity of 100%. Furthermore, this classifier demonstrated robust external generalization capabilities when applied to unseen bacterial species, such as Serratia marcescens DB10 and Citrobacter koseri H6. It blindly achieved a sensitivity of 97.92% and a specificity of 96.77% for DB10, and a sensitivity of 100% and a specificity of 97.22% for H6. This competitive performance indicates the potential to adapt our approach for diagnostic applications through portable devices or even smartphones. This adaptation would facilitate rapid, objective, on-site screening for bacterial swarming motility, potentially enhancing the early detection and treatment assessment of various diseases, including inflammatory bowel diseases (IBD) and urinary tract infections (UTI).

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.15229

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.30)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Water & Waste Management > Water Management > Constituents > Bacteria (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Monte Carlo is a good sampling strategy for polynomial approximation in high dimensions

Adcock, Ben, Brugiapaglia, Simone

arXiv.org Artificial IntelligenceNov-5-2023

This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering - notably, some of those arising from parametric modelling and computational uncertainty quantification. It is common to use Monte Carlo sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that such a strategy is theoretically suboptimal. Specifically, there are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically, i.e., like $c \cdot n^2 \cdot \log(n)$ as $n \rightarrow \infty$. This well-documented phenomenon has led to a concerted effort over the last decade to design improved, and moreover, near-optimal strategies, whose sample complexities scale log-linearly, or even linearly in $n$. In this work we demonstrate that Monte Carlo is actually a perfectly good strategy in high dimensions, despite its apparent suboptimality. We first document this phenomenon empirically via a systematic set of numerical experiments. Next, we present a theoretical analysis that rigorously justifies this fact in the case of holomorphic functions of infinitely-many variables. We show that there is a least-squares approximation based on $m$ Monte Carlo samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial subspace in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes.

approximation, dimension, polynomial approximation, (14 more...)

arXiv.org Artificial Intelligence

2208.09045

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)

Add feedback

Memory-Efficient Adaptive Optimization for Large-Scale Learning

Anil, Rohan, Gupta, Vineet, Koren, Tomer, Singer, Yoram

arXiv.org Machine LearningJan-30-2019

Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training very large deep models.

algorithm, batch size, sm3, (15 more...)

arXiv.org Machine Learning

1901.1115

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback