Sokolov, Vadim
Generative AI for Validating Physics Laws
Nareklishvili, Maria, Polson, Nicholas, Sokolov, Vadim
We present a generative artificial intelligence (AI) approach to empirically validate fundamental laws of physics, focusing on the Stefan-Boltzmann law linking stellar temperature and luminosity. Our approach simulates counterfactual luminosities under hypothetical temperature regimes for each individual star and iteratively refines the temperature-luminosity relationship in a deep learning architecture. We use Gaia DR3 data and find that, on average, temperature's effect on luminosity increases with stellar radius and decreases with absolute magnitude, consistent with theoretical predictions. By framing physics laws as causal problems, our method offers a novel, data-driven approach to refine theoretical understanding and inform evidence-based policy and practice.
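As a rough illustration of the counterfactual-simulation idea (a toy sketch with assumed synthetic radii and temperatures, not the paper's Gaia DR3 pipeline), one can simulate luminosities from the Stefan-Boltzmann law L = 4πR²σT⁴ and recover the temperature exponent by regression:

```python
# Minimal sketch (not the paper's implementation): simulate counterfactual
# luminosities under the Stefan-Boltzmann law L = 4*pi*R^2*sigma*T^4 and
# recover the temperature exponent by regressing log L on log T and log R.
import numpy as np

sigma = 5.670374419e-8                        # Stefan-Boltzmann constant, W m^-2 K^-4
rng = np.random.default_rng(0)

R = rng.uniform(0.5, 10.0, 1000) * 6.957e8    # hypothetical stellar radii (m)
T = rng.uniform(3000, 30000, 1000)            # hypothetical temperatures (K)
L = 4 * np.pi * R**2 * sigma * T**4           # counterfactual luminosities (W)
L *= np.exp(rng.normal(0, 0.05, L.size))      # multiplicative observation noise

X = np.column_stack([np.ones_like(T), np.log(T), np.log(R)])
beta, *_ = np.linalg.lstsq(X, np.log(L), rcond=None)
print(f"estimated temperature exponent: {beta[1]:.2f}")   # close to 4
```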
Kolmogorov GAM Networks are all you need!
Polson, Sarah, Sokolov, Vadim
Kolmogorov GAM (K-GAM) networks are shown to be an efficient architecture for training and inference. They are an additive model with an embedding that is independent of the function of interest, and they provide an alternative to the transformer architecture. K-GAM networks are the machine learning version of Kolmogorov's Superposition Theorem (KST), which provides an efficient representation of a multivariate function. Such representations have uses in machine learning for encoding dictionaries (a.k.a. "look-up" tables). KST theory also provides a representation based on translates of the Köppen function. The goal of our paper is to interpret this representation in a machine learning context for applications in Artificial Intelligence (AI). Our architecture is equivalent to a topological embedding that is independent of the function, together with an additive layer that uses a Generalized Additive Model (GAM). This provides a class of learning procedures with far fewer parameters than current deep learning algorithms. The implementation is parallelizable, which makes our algorithms computationally attractive. To illustrate our methodology, we use the Iris data from statistical learning. We also show that our additive model with a non-linear embedding provides an alternative to transformer architectures, which from a statistical viewpoint are kernel smoothers. Additive KAN models therefore provide a natural alternative to transformers. Finally, we conclude with directions for future research.
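A minimal sketch of what a K-GAM-style architecture could look like is given below; the inner map is a simple sigmoid stand-in for the Köppen function, and the class and parameter names are hypothetical, not the paper's code:

```python
# Illustrative K-GAM-style sketch: a fixed, function-independent inner embedding
# z_q = sum_p lambda_p * psi(x_p + q*a), followed by a learned additive (GAM)
# outer layer g(z) = sum_q g_q(z_q). Names and choices here are assumptions.
import torch
import torch.nn as nn

class KGAM(nn.Module):
    def __init__(self, d_in, a=0.1):
        super().__init__()
        self.q = torch.arange(2 * d_in + 1)              # 2d+1 outer terms, as in KST
        self.lam = torch.linspace(0.1, 1.0, d_in)        # fixed inner weights
        self.a = a
        # one small univariate network g_q per outer term (the GAM part)
        self.outer = nn.ModuleList(
            nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
            for _ in range(len(self.q))
        )

    def psi(self, t):                                    # stand-in for the Koeppen map
        return torch.sigmoid(t)

    def forward(self, x):                                # x: (batch, d_in)
        out = 0.0
        for k, q in enumerate(self.q):
            z_q = (self.lam * self.psi(x + q * self.a)).sum(dim=1, keepdim=True)
            out = out + self.outer[k](z_q)
        return out

model = KGAM(d_in=4)                                     # e.g., the four Iris features
print(model(torch.randn(8, 4)).shape)                    # torch.Size([8, 1])
```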
Generative Modeling: A Review
Polson, Nick, Sokolov, Vadim
Generative methods (Gen-AI) are reviewed with a particular goal of solving tasks in Machine Learning and Bayesian inference. Generative models require one to simulate a large training dataset and to use deep neural networks to solve a supervised learning problem. To do this, we require high-dimensional regression methods and tools for dimensionality reduction (a.k.a. feature selection). The main advantage of Gen-AI methods is their ability to be model-free and to use deep neural networks to estimate conditional densities or posterior quantiles of interest. To illustrate generative methods, we analyze the well-known Ebola dataset. Finally, we conclude with directions for future research.
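The simulate-then-regress idea can be illustrated with a toy conjugate model; the sketch below assumes a normal-normal model and a deep quantile network, and is not the paper's Ebola analysis:

```python
# Toy sketch of the generative (simulation-based) idea: simulate (theta, y) pairs
# from the prior and forward model, then fit a deep quantile regression that maps
# (y, tau) to the posterior tau-quantile of theta. Illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
N = 20000
theta = torch.randn(N, 1)                       # prior draws theta ~ N(0, 1)
y = theta + 0.5 * torch.randn(N, 1)             # forward model y | theta ~ N(theta, 0.25)
tau = torch.rand(N, 1)                          # random quantile levels

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    q = net(torch.cat([y, tau], dim=1))
    u = theta - q
    loss = torch.mean(torch.maximum(tau * u, (tau - 1) * u))   # pinball loss
    opt.zero_grad(); loss.backward(); opt.step()

# Posterior median of theta given y = 1.0 (closed form is 0.8 for this conjugate model)
print(net(torch.tensor([[1.0, 0.5]])).item())
```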
Deep Learning: A Tutorial
Polson, Nick, Sokolov, Vadim
Our goal is to provide a review of deep learning methods which provide insight into structured high-dimensional data. Rather than using the shallow additive architectures common to most statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features. Deep learning is one of the most widely used machine learning methods for the analysis of large-scale, high-dimensional data sets.
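A bare-bones sketch of the "layers of semi-affine transformations" view follows; it uses randomly drawn weights rather than trained ones and is an illustration only:

```python
# Minimal sketch: each layer applies a univariate nonlinearity to an affine map,
# and the layers are composed to produce features for a downstream statistical model.
import numpy as np

def semi_affine(x, W, b, f=np.tanh):
    return f(x @ W + b)

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 10))                    # 5 observations, 10 raw predictors
W1, b1 = rng.normal(size=(10, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)

features = semi_affine(semi_affine(x, W1, b1), W2, b2)   # learned attributes (features)
print(features.shape)                           # (5, 16)
```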
The Value of Chess Squares
Gupta, Aditya, Maharaj, Shiva, Polson, Nicholas, Sokolov, Vadim
We propose a neural network-based approach to calculate the value of a chess square-piece combination. Our model takes a triplet (Color, Piece, Square) as an input and calculates a value that measures the advantage/disadvantage of having this piece on this square. Our methods build on recent advances in chess AI, and can accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces (King = ∞, Queen = 9, Rook = 5, Bishop = 3, Knight = 3, Pawn = 1). We enhance this analysis by introducing marginal valuations. We use deep Q-learning to estimate the parameters of our model. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Finally, we conclude by suggesting potential avenues for future research.
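A hypothetical sketch of a value network over the (Color, Piece, Square) triplet is shown below; the embedding sizes, layer widths, and index conventions are assumptions, and the deep Q-learning training loop is omitted:

```python
# Hypothetical sketch (not the paper's model): embed the (Color, Piece, Square)
# triplet and map it to a scalar value V(color, piece, square).
import torch
import torch.nn as nn

class SquarePieceValue(nn.Module):
    def __init__(self, emb_dim=8):
        super().__init__()
        self.color = nn.Embedding(2, emb_dim)    # 0 = white, 1 = black (assumed ordering)
        self.piece = nn.Embedding(6, emb_dim)    # K, Q, R, B, N, P (assumed ordering)
        self.square = nn.Embedding(64, emb_dim)  # 0 = a1, ..., 63 = h8 (assumed ordering)
        self.head = nn.Sequential(nn.Linear(3 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, color, piece, square):
        z = torch.cat([self.color(color), self.piece(piece), self.square(square)], dim=-1)
        return self.head(z)

model = SquarePieceValue()
# value of a white knight on f3 (square index 2*8 + 5 = 21 under the ordering above)
v = model(torch.tensor([0]), torch.tensor([4]), torch.tensor([21]))
print(v.item())
```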
Quantum Bayesian Computation
Polson, Nick, Sokolov, Vadim, Xu, Jianeng
Quantum Bayesian Computation (QBC) is an emerging field that leverages the computational gains available from quantum computers to provide an exponential speed-up in Bayesian computation. Our paper adds to the literature in two ways. First, we show how von Neumann quantum measurement can be used to simulate machine learning algorithms such as Markov chain Monte Carlo (MCMC) and Deep Learning (DL) that are fundamental to Bayesian learning. Second, we describe data encoding methods needed to implement quantum machine learning, including the counterparts to traditional feature extraction and kernel embedding methods. Our goal then is to show how to apply quantum algorithms directly to statistical machine learning problems. On the theoretical side, we provide quantum versions of high-dimensional regression, Gaussian processes (Q-GP) and stochastic gradient descent (Q-SGD). On the empirical side, we apply a Quantum FFT model to Chicago housing data. Finally, we conclude with directions for future research.
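As one concrete data-encoding example, amplitude encoding writes a normalized feature vector into the amplitudes of a quantum state; the classical simulation below is an illustration only and is not the paper's implementation:

```python
# Illustrative amplitude-encoding sketch (classical simulation, no quantum backend):
# a length-2^n feature vector is normalized so its entries serve as amplitudes of an
# n-qubit state; measurement probabilities are the squared amplitudes (Born rule).
import numpy as np

x = np.array([3.0, 1.0, 2.0, 2.0])          # toy feature vector of length 2^2
amplitudes = x / np.linalg.norm(x)          # |psi> = sum_i (x_i / ||x||) |i>
probs = amplitudes**2                       # measurement probabilities

rng = np.random.default_rng(0)
samples = rng.choice(len(x), size=10000, p=probs)   # simulated measurements
print(np.bincount(samples) / 10000)         # ~ [0.50, 0.06, 0.22, 0.22]
```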
Bayesian Calibration for Activity Based Models
Schultz, Laura, Auld, Joshua, Sokolov, Vadim
Transportation activity-based simulators (ABMs) represent an individual traveler's activity patterns and trips throughout the day by using nested choice models. The generated trips are then simulated in a traffic flow simulator to learn system-level patterns. These behaviorally realistic models require a high-resolution representation of network flows and, thus, are computationally expensive. The very same flexibility that makes these simulation models appealing also makes their calibration problems intractable, with the number of simulations required to find an optimal solution growing exponentially as the input dimension increases [90, 70]. As a result, the use of these simulators is currently limited to what-if analysis. This paper focuses on calibrating the static choice model parameters used in activity-based simulators. The goal of calibration is to find values of the simulator's input parameters θ that minimize the deviance between observed data and the simulator's outputs.
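The calibration objective can be illustrated with a toy stochastic simulator; the Poisson "trip-count" simulator and the grid search below are stand-ins, not the activity-based model or the calibration algorithm used in the paper:

```python
# Toy calibration sketch: a stochastic "simulator" maps a parameter theta to synthetic
# trip counts, and we search for the theta that minimizes the deviance between
# simulated and observed outputs. Illustration only.
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n_trips=1000):
    """Stand-in for an activity-based model: mean Poisson trip counts in 4 time bins."""
    rates = np.array([1.0, 2.0, 1.5, 0.5]) * theta
    return rng.poisson(rates, size=(n_trips, 4)).mean(axis=0)

observed = simulator(theta=2.3)                       # pretend these are field data

def deviance(theta, n_rep=20):
    sims = np.stack([simulator(theta) for _ in range(n_rep)])
    return np.mean((sims - observed) ** 2)            # squared-error discrepancy

grid = np.linspace(0.5, 5.0, 46)
theta_hat = grid[np.argmin([deviance(t) for t in grid])]
print(f"calibrated theta ~ {theta_hat:.2f}")          # close to the true 2.3
```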
Merging Two Cultures: Deep and Statistical Learning
Bhadra, Anindya, Datta, Jyotishka, Polson, Nick, Sokolov, Vadim, Xu, Jianeng
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity perform nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a. factors) via sparse regularization and stochastic gradient optimization, and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (a.k.a. features) to which predictive statistical methods can be applied. Thus we achieve the best of both worlds: scalability and fast predictive rule construction together with uncertainty quantification. Sparse regularization with unsupervised or supervised learning finds the features. We clarify the duality between shallow and wide models such as PCA, PPR, RRR and deep but skinny architectures such as autoencoders, MLPs, CNNs, and LSTMs. The connection with data transformations is of practical importance for finding good network architectures. By incorporating probabilistic components at the output level we allow for predictive uncertainty. For interpolation we use deep Gaussian processes, and for classification, ReLU trees. We provide applications to regression, classification and interpolation. Finally, we conclude with directions for future research.
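A schematic sketch of the "deep features plus stochastic output layer" idea follows; the Gaussian output head, architecture, and names are illustrative assumptions, not the paper's models:

```python
# Schematic sketch: a deep network produces nonlinear features, and a Gaussian output
# layer supplies both a prediction and its predictive uncertainty. Illustration only.
import torch
import torch.nn as nn

class DeepGLM(nn.Module):
    def __init__(self, d_in, d_feat=32):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                      nn.Linear(64, d_feat), nn.ReLU())
        self.mean = nn.Linear(d_feat, 1)          # GLM-style linear predictor
        self.log_var = nn.Linear(d_feat, 1)       # heteroscedastic noise level

    def forward(self, x):
        z = self.features(x)                      # nonlinear feature engineering
        return self.mean(z), self.log_var(z)

def nll(y, mu, log_var):                          # Gaussian negative log-likelihood
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

model = DeepGLM(d_in=10)
x, y = torch.randn(16, 10), torch.randn(16, 1)
mu, log_var = model(x)
print(nll(y, mu, log_var).item())
```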
Solving Large-Scale 0-1 Knapsack Problems and its Application to Point Cloud Resampling
Li, Duanshun, Liu, Jing, Park, Noseong, Lee, Dongeun, Ramachandran, Giridhar, Seyedmazloom, Ali, Lee, Kookjin, Feng, Chen, Sokolov, Vadim, Ganesan, Rajesh
In this paper, we present a deep learning-based method to solve large-scale 0-1 knapsack problems (KPs), where the number of products (items) is large and/or the values of products are not necessarily predetermined but are decided by an external value-assignment function during the optimization process. Our solution is greatly inspired by the method of Lagrange multipliers and by some recent adoptions of game theory in deep learning. After formally defining our proposed method based on these ideas, we develop an adaptive gradient ascent method to stabilize its optimization process. In our experiments, the presented method solves all the large-scale benchmark KP instances in about a minute, whereas existing methods show fluctuating runtimes. We also show that our method can be used for other applications, including but not limited to point cloud resampling.
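For intuition, a classical Lagrangian-relaxation sketch for the 0-1 knapsack problem is shown below, with subgradient updates on the multiplier; this is in the spirit of the Lagrange-multiplier motivation, not the paper's deep learning method:

```python
# Sketch: relax the capacity constraint with a multiplier lam, select items with
# positive reduced value v_i - lam * w_i, and adjust lam by subgradient steps.
# The final selection may slightly violate capacity; a greedy repair step would fix it.
import numpy as np

rng = np.random.default_rng(0)
n, capacity = 10000, 2500.0
values = rng.uniform(1, 10, n)
weights = rng.uniform(0.1, 1.0, n)

lam, step = 0.0, 0.1
for _ in range(2000):
    x = (values - lam * weights > 0).astype(float)   # best response to current lam
    slack = weights @ x - capacity                   # constraint violation
    lam = max(0.0, lam + step * slack / capacity)    # subgradient update on the dual

print(f"selected weight {weights @ x:.1f} / {capacity}, value {values @ x:.1f}")
```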
Deep Learning: Computational Aspects
Polson, Nicholas, Sokolov, Vadim
In this article, we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key for both training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets.
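A minimal minibatch SGD loop for a linear model illustrates the batch-sampling pattern discussed in the review (toy data, illustration only):

```python
# Minibatch SGD sketch: sample a small batch, compute the stochastic gradient of the
# squared-error loss, and take a gradient step. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100000, 20))
beta_true = rng.normal(size=20)
y = X @ beta_true + 0.1 * rng.normal(size=100000)

beta, lr, batch = np.zeros(20), 0.01, 256
for _ in range(5000):
    idx = rng.integers(0, len(y), batch)              # sample a minibatch
    grad = 2 * X[idx].T @ (X[idx] @ beta - y[idx]) / batch
    beta -= lr * grad                                 # stochastic gradient step

print(np.max(np.abs(beta - beta_true)))               # small: SGD recovers beta
```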