Budzinskiy, Stanislav
Numerical Error Analysis of Large Language Models
Budzinskiy, Stanislav, Fang, Wenyi, Zeng, Longbin, Petersen, Philipp
Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture, which yields fundamental bounds for these effects. In addition, we conduct a series of numerical experiments that demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.
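A minimal sketch of the kind of effect studied here, not the paper's analysis: a toy scaled dot-product attention block evaluated once in half precision and once in double precision, so the entrywise discrepancy between the two outputs is attributable to round-off. The dimensions, the Gaussian inputs, and the max-shift stabilization are assumptions made purely for illustration.

```python
# Illustrative only: one attention block in float16 vs float64.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 128                      # head dimension and sequence length (arbitrary)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

def attention(Q, K, V, dtype):
    Q, K, V = (M.astype(dtype) for M in (Q, K, V))
    scores = Q @ K.T / np.sqrt(np.array(d, dtype=dtype))
    scores -= scores.max(axis=1, keepdims=True)   # standard softmax stabilization
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

out_lo = attention(Q, K, V, np.float16)
out_hi = attention(Q, K, V, np.float64)
print("max entrywise deviation:", np.abs(out_lo - out_hi).max())
```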
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Budzinskiy, Stanislav
The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$. We provide a theoretical explanation of the numerical results presented in support of this argument, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the squared Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to low-rank tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of attention in transformer neural networks.
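A small illustrative sketch of the objects discussed above, not a reproduction of the paper's constructions: an $n \times n$ matrix generated by a function of the inner product of $m$-dimensional samples, with the entrywise error of its truncated SVD reported for a few ranks. The choice $f = \exp$, the Gaussian sampling, and the use of the SVD (which is optimal in the spectral and Frobenius norms, not entrywise) are assumptions for illustration only.

```python
# Illustrative only: entrywise error of low-rank approximations of f(<x_i, y_j>).
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 200                       # matrix size and variable dimension (arbitrary)
X = rng.standard_normal((n, m)) / np.sqrt(m)
Y = rng.standard_normal((n, m)) / np.sqrt(m)

A = np.exp(X @ Y.T)                   # matrix generated by a function of the inner product

U, s, Vt = np.linalg.svd(A, full_matrices=False)
for r in (2, 4, 8, 16):
    A_r = (U[:, :r] * s[:r]) @ Vt[:r]
    print(r, np.abs(A - A_r).max())   # entrywise (Chebyshev) error of the rank-r truncation
```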
Tensor train completion: local recovery guarantees via Riemannian optimization
Budzinskiy, Stanislav, Zamarashkin, Nikolai
The problem of recovering algebraically structured data from scarce measurements has already become a classic one. The data under consideration are typically sparse vectors or low-rank matrices and tensors, while the measurements are obtained by applying a linear operator that satisfies a variant of the so-called restricted isometry property (RIP) [1]. In this work, we focus on tensor completion, which consists in recovering a tensor in the tensor train (TT) format [2, 3] from a small subset of its entries. Specifically, we consider it as a Riemannian optimization problem [4, 5] on the smooth manifold of tensors with fixed TT ranks and derive sufficient conditions (essentially, the RIP) for local convergence of the Riemannian gradient descent. We further estimate the number of randomly selected entries of a tensor with low TT ranks that is sufficient for the RIP to hold with high probability and, as a consequence, for the Riemannian gradient descent to converge locally.
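The following sketch sets up only the measurement model described above, under assumed sizes and ranks: a third-order tensor with low TT ranks and a random subset of observed entries. The Riemannian gradient descent on the fixed-TT-rank manifold, which is the subject of the convergence analysis, is not reproduced here.

```python
# Illustrative only: a low-TT-rank tensor and a random sampling of its entries.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 20, 20, 20
r1, r2 = 3, 3                                   # TT ranks (arbitrary)

# TT cores G1 (n1 x r1), G2 (r1 x n2 x r2), G3 (r2 x n3)
G1 = rng.standard_normal((n1, r1))
G2 = rng.standard_normal((r1, n2, r2))
G3 = rng.standard_normal((r2, n3))
T = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)     # full tensor with TT ranks (r1, r2)

# Observe a small random subset of entries (the completion measurements)
p = 0.2
mask = rng.random(T.shape) < p
observed = T[mask]
print("observed fraction:", mask.mean(), "observed entries:", observed.size)
```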
Variational Bayesian inference for CP tensor completion with side information
Budzinskiy, Stanislav, Zamarashkin, Nikolai
We propose a message passing algorithm, based on variational Bayesian inference, for low-rank tensor completion with automatic rank determination in the canonical polyadic format when additional side information (SI) is given. The SI comes in the form of low-dimensional subspaces that contain the fiber spans of the tensor (columns, rows, tubes, etc.). We validate the regularization properties induced by SI with extensive numerical experiments on synthetic and real-world data and present results on tensor recovery and rank determination. The results show that the number of samples required for successful completion is significantly reduced in the presence of SI. We also discuss the origin of a bump in the phase transition curves that exists when the dimensionality of SI is comparable with that of the tensor.
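A minimal sketch of the model with side information, under assumed sizes and ranks: a CP tensor whose factor matrices lie in known low-dimensional subspaces, which is the form of SI described in the abstract. The variational Bayesian message passing solver itself is not shown, and all names and dimensions below are placeholders.

```python
# Illustrative only: a CP tensor with factors constrained to SI subspaces, U_k = A_k Z_k.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 30, 30, 30
R = 4                                            # CP rank (arbitrary)
d1, d2, d3 = 10, 10, 10                          # dimensions of the SI subspaces

# Side information: orthonormal bases of subspaces containing the factor columns
A1, _ = np.linalg.qr(rng.standard_normal((n1, d1)))
A2, _ = np.linalg.qr(rng.standard_normal((n2, d2)))
A3, _ = np.linalg.qr(rng.standard_normal((n3, d3)))

# Factor matrices expressed through the subspaces
U1 = A1 @ rng.standard_normal((d1, R))
U2 = A2 @ rng.standard_normal((d2, R))
U3 = A3 @ rng.standard_normal((d3, R))

T = np.einsum('ir,jr,kr->ijk', U1, U2, U3)       # rank-R CP tensor
mask = rng.random(T.shape) < 0.1                 # observed entries for completion
print("CP rank:", R, "observed fraction:", mask.mean())
```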