AITopics | jnull

jnull

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers

Galanti, Tomer, György, András, Hutter, Marcus

arXiv.org Artificial Intelligence, Jul-16-2023

We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer a theoretical explanation for this behavior based on the recently discovered phenomenon of class-feature-variability collapse, that is, that during the training of deep classification networks the feature embeddings of samples belonging to the same class tend to concentrate around their class means. More specifically, we show that the few-shot error of the learned feature map on new classes (defined as the classification error of the nearest class-center classifier using centers learned from a small number of random samples from each new class) is small in case of class-feature-variability collapse, under the assumption that the classes are selected independently from a fixed distribution. This suggests that foundation models can provide feature maps that are transferable to new downstream tasks, even with very few samples; to our knowledge, this is the first performance bound for transfer-learning that is non-vacuous in the few-shot setting. Keywords: Generalization bounds, foundation models, few-shot learning, transfer learning, neural collapse, class-features variability collapse.
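The nearest class-center classifier described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the two-dimensional synthetic "features", the class names, and the helper names (`class_centers`, `nearest_center`, `sample_class`) are assumptions made for the example. The tight spread around each class mean stands in for the collapse of within-class variability.

```python
import random

def class_centers(support):
    """Average the few labelled feature vectors available for each new class."""
    centers = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        centers[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return centers

def nearest_center(x, centers):
    """Predict the label whose center is closest to x (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centers, key=lambda label: sq_dist(x, centers[label]))

def sample_class(mean, spread, n, rng):
    """Hypothetical feature map output: points concentrated around a class mean."""
    return [[m + rng.gauss(0.0, spread) for m in mean] for _ in range(n)]

rng = random.Random(0)
true_means = {"a": [0.0, 0.0], "b": [5.0, 5.0]}  # assumed new, unseen classes

# Few-shot step: estimate each center from only 5 samples per class.
support = {c: sample_class(m, 0.1, 5, rng) for c, m in true_means.items()}
centers = class_centers(support)

# Few-shot error: fraction of fresh samples assigned to the wrong center.
queries = [(c, x) for c, m in true_means.items() for x in sample_class(m, 0.1, 20, rng)]
few_shot_error = sum(nearest_center(x, centers) != c for c, x in queries) / len(queries)
```

When the within-class spread is small relative to the distance between class means, as in this toy setup, a handful of samples per class is enough to place the centers well.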

artificial intelligence, machine learning, neural collapse, (18 more...)

arXiv.org Artificial Intelligence

2212.12532

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.81)

Feature selection in stratification estimators of causal effects: lessons from potential outcomes, causal diagrams, and structural equations

Hahn, P. Richard, Herren, Andrew

arXiv.org Machine Learning, Sep-23-2022

What is the ideal regression (if any) for estimating average causal effects? We study this question in the setting of discrete covariates, deriving expressions for the finite-sample variance of various stratification estimators. This approach clarifies the fundamental statistical phenomena underlying many widely cited results. Our exposition combines insights from three distinct methodological traditions for studying causal effect estimation: potential outcomes, causal diagrams, and structural models with additive errors.
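For readers unfamiliar with stratification estimators, the basic form with discrete covariates can be sketched as follows. This is a textbook-style illustration under assumed data, not necessarily any of the specific estimators analysed in the paper: within each covariate stratum, take the difference between the treated and control outcome means, then average those differences weighted by stratum size. The function name `stratified_ate` and the toy records are hypothetical.

```python
from collections import defaultdict

def stratified_ate(records):
    """Stratification estimate of the average causal effect.

    records: list of (stratum, treated, outcome) tuples, with treated in {0, 1}.
    """
    # stratum -> [[outcome sum, count] for control, [outcome sum, count] for treated]
    cells = defaultdict(lambda: [[0.0, 0], [0.0, 0]])
    total = 0
    for stratum, treated, outcome in records:
        cell = cells[stratum][treated]
        cell[0] += outcome
        cell[1] += 1
        total += 1

    estimate = 0.0
    for stratum, (ctrl, trt) in cells.items():
        if ctrl[1] == 0 or trt[1] == 0:
            raise ValueError(f"stratum {stratum!r} lacks treated or control units")
        diff = trt[0] / trt[1] - ctrl[0] / ctrl[1]   # within-stratum effect
        weight = (ctrl[1] + trt[1]) / total          # empirical stratum share
        estimate += weight * diff
    return estimate

# Toy data (assumed): treatment is more common in the high-outcome stratum "x1".
records = [
    ("x0", 0, 1.0), ("x0", 0, 3.0), ("x0", 1, 4.0), ("x0", 1, 6.0),
    ("x1", 0, 10.0), ("x1", 1, 11.0), ("x1", 1, 11.0), ("x1", 1, 11.0),
]
ate = stratified_ate(records)
```

Which covariates to stratify on changes both the bias and the finite-sample variance of such an estimator, which is the feature selection question the abstract raises.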

artificial intelligence, jnull, machine learning, (14 more...)

arXiv.org Machine Learning

2209.114

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Variational quantum algorithm for Gaussian discrete solitons and their boson sampling

Conti, Claudio

arXiv.org Artificial Intelligence, Jul-21-2022

General methods for quantum solitons are still lacking, although solitons can act as entanglement generators or as self-organized quantum processors. We develop a computational approach that uses a neural network as a variational ansatz for quantum solitons in an array of waveguides. By training the resulting phase space quantum machine learning model, we find different soliton solutions by varying the number of particles and the interaction strength. We consider Gaussian states that enable measuring the degree of entanglement and sampling the probability distribution of many-particle events. We also determine the probability of generating particle pairs and show that soliton bound states emit correlated pairs. These results may have a role in boson sampling with nonlinear systems and in quantum processors for entangled nonlinear waves.

A soliton is a non-perturbative solution of a classical nonlinear wave equation; it may describe mean-field states of atoms (as in Bose-Einstein condensation) or photons (as in nonlinear optics) [1]. From a quantum mechanical perspective, a soliton may correspond to a coherent state; however, the nonlinearity may induce squeezing or non-Gaussianity [2]. The quantum properties of solitons have inspired experimental investigations, such as quantum non-demolition, squeezing [3-6], and photon bound states [7]. Theoretical studies have reported on quantum features of solitons, such as evaporation and breathing [8-13].

gaussian state, soliton, transformation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1103/PhysRevA.106.013518

2110.12379

Country:

Europe > Italy > Lazio > Rome (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

Fang, Cong, Lee, Jason D., Yang, Pengkun, Zhang, Tong

arXiv.org Machine Learning, Jul-2-2020

This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs), which can be used to analyze neural network training. In this framework, a DNN is represented by probability measures and functions over its features (that is, the function values of the hidden units over the training data) in the continuous limit, instead of over the neural network parameters as in most existing studies. This new representation overcomes the degenerate situation in which each middle layer essentially has only one meaningful hidden unit, and further leads to a simpler representation of DNNs, for which the training objective can be reformulated as a convex optimization problem via suitable re-parameterization. Moreover, we construct a non-linear dynamics called neural feature flow, which captures the evolution of an over-parameterized DNN trained by Gradient Descent. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures. Furthermore, we show that, for Res-Net, when the neural feature flow process converges, it reaches a global minimal solution under suitable conditions. Our analysis leads to the first global convergence proof for over-parameterized neural network training with more than $3$ layers in the mean-field regime.

artificial intelligence, machine learning, nullnull, (16 more...)

arXiv.org Machine Learning

2007.01452

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Estimating the Number of Components in Finite Mixture Models via the Group-Sort-Fuse Procedure

Manole, Tudor, Khalili, Abbas

arXiv.org Machine Learning, May-23-2020

Estimation of the number of components (or order) of a finite mixture model is a long-standing and challenging problem in statistics. We propose the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models. Unlike methods that fit and compare mixtures with varying orders using criteria involving model complexity, our approach directly penalizes a continuous function of the model parameters. More specifically, given a conservative upper bound on the order, the GSF groups and sorts mixture component parameters to fuse those that are redundant. For a wide range of finite mixture models, we show that the GSF is consistent in estimating the true mixture order and achieves the $n^{-1/2}$ convergence rate for parameter estimation up to polylogarithmic factors. The GSF is implemented for several univariate and multivariate mixture models in the R package GroupSortFuse. Its finite sample performance is supported by a thorough simulation study, and its application is illustrated on two real data examples.

artificial intelligence, machine learning, mixture model, (21 more...)

arXiv.org Machine Learning

2005.11641

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Modeling & Simulation (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

A Corrective View of Neural Networks: Representation, Memorization and Learning

Bresler, Guy, Nagaraj, Dheeraj

arXiv.org Machine Learning, Feb-1-2020

We develop a corrective mechanism for neural network approximation: the total available non-linear units are divided into multiple groups, and the first group approximates the function under consideration, the second group approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together, and so on. This technique yields several new representation and learning results for neural networks. First, we show that two-layer neural networks in the random features regime (RF) can memorize arbitrary labels for arbitrary points under a Euclidean distance separation condition using $\tilde{O}(n)$ ReLU or Step activation functions, which is optimal in $n$ up to logarithmic factors. Next, we give a powerful representation result for two-layer neural networks with ReLU and smoothed ReLU units, which can achieve a squared error of at most $\epsilon$ with $O(C(a,d)\epsilon^{-1/(a+1)})$ units for $a \in \mathbb{N}\cup\{0\}$ when the function is smooth enough (roughly when it has $\Theta(ad)$ bounded derivatives). In certain cases $d$ can be replaced with effective dimension $q \ll d$. Previous results of this type implement Taylor series approximation using deep architectures. We also consider three-layer neural networks and show that the corrective mechanism yields faster representation rates for smooth radial functions. Lastly, we obtain the first $O(\mathrm{subpoly}(1/\epsilon))$ upper bound on the number of neurons required for a two-layer network to learn low-degree polynomials up to squared error $\epsilon$ via gradient descent. Even though deep networks can express these polynomials with $O(\mathrm{polylog}(1/\epsilon))$ neurons, the best learning bounds on this problem require $\mathrm{poly}(1/\epsilon)$ neurons.
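The staged structure of the corrective mechanism (each group fits the error left by the groups before it) can be illustrated with a deliberately simplified stand-in. This sketch does not use the paper's ReLU or random-feature construction: each "group" here is assumed to be a piecewise-constant function on [0, 1) with a fixed number of equal bins, and the target function, bin counts, and function names are all hypothetical choices for the example.

```python
import math

def fit_group(xs, residuals, n_bins):
    """Fit one group: the per-bin mean of the current residual on [0, 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, r in zip(xs, residuals):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict_group(levels, x):
    """Evaluate a fitted group at x."""
    n_bins = len(levels)
    return levels[min(int(x * n_bins), n_bins - 1)]

def corrective_fit(xs, ys, bins_per_group):
    """Group 1 fits the target; every later group fits the remaining error."""
    groups, errors = [], []
    residuals = list(ys)
    for n_bins in bins_per_group:
        levels = fit_group(xs, residuals, n_bins)
        groups.append(levels)
        # Subtract this group's output so the next group sees only the leftover error.
        residuals = [r - predict_group(levels, x) for x, r in zip(xs, residuals)]
        errors.append(sum(r * r for r in residuals) / len(residuals))
    return groups, errors

def predict(groups, x):
    """The final approximation is the sum of all groups' outputs."""
    return sum(predict_group(levels, x) for levels in groups)

xs = [i / 200 for i in range(200)]
ys = [math.sin(2 * math.pi * x) for x in xs]
groups, errors = corrective_fit(xs, ys, [2, 4, 8, 16])
```

In this toy version the mean squared training error cannot increase from one group to the next, because each group subtracts the per-bin mean of the current residual; the rates quoted in the abstract concern the paper's own construction, not this stand-in.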

equation, fourier transform, neural network, (14 more...)

arXiv.org Machine Learning

2002.00274

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)

Inference in Multi-Layer Networks with Matrix-Valued Unknowns

Pandit, Parthe, Sahraee-Ardakan, Mojtaba, Rangan, Sundeep, Schniter, Philip, Fletcher, Alyson K.

arXiv.org Machine Learning, Jan-25-2020

We consider the problem of inferring the input and hidden variables of a stochastic multi-layer neural network from an observation of the output. The hidden variables in each layer are represented as matrices. This problem applies to signal recovery via deep generative prior models, multi-task and mixed regression, and learning certain classes of two-layer neural networks. A unified approximation algorithm for both MAP and MMSE inference is proposed by extending a recently developed Multi-Layer Vector Approximate Message Passing (ML-VAMP) algorithm to handle matrix-valued unknowns. It is shown that the performance of the proposed Multi-Layer Matrix VAMP (ML-Mat-VAMP) algorithm can be exactly predicted in a certain random large-system limit, where the dimensions $N\times d$ of the unknown quantities grow as $N\rightarrow\infty$ with $d$ fixed. In the two-layer neural-network learning problem, this scaling corresponds to the case where the number of input features and training samples grow to infinity but the number of hidden nodes stays fixed. The analysis enables a precise prediction of the parameter and test error of the learning.

jnull, knull, matrix, (17 more...)

arXiv.org Machine Learning

2001.09396

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > Kings County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Inference with Deep Generative Priors in High Dimensions

Pandit, Parthe, Sahraee-Ardakan, Mojtaba, Rangan, Sundeep, Schniter, Philip, Fletcher, Alyson K.

arXiv.org Machine Learning, Nov-8-2019

Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nature of the underlying optimization problems. This paper presents a novel algorithm, Multi-Layer Vector Approximate Message Passing (ML-VAMP), for inference in multi-layer stochastic neural networks. ML-VAMP can be configured to compute maximum a posteriori (MAP) or approximate minimum mean-squared error (MMSE) estimates for these networks. We show that the performance of ML-VAMP can be exactly predicted in a certain high-dimensional random limit. Furthermore, under certain conditions, ML-VAMP yields estimates that achieve the minimum (i.e., Bayes-optimal) MSE as predicted by the replica method. In this way, ML-VAMP provides a computationally efficient method for multi-layer inference with an exact performance characterization and testable conditions for optimality in the large-system limit.

algorithm, inference, knull, (15 more...)

arXiv.org Machine Learning

1911.03409

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.27)
Asia > Middle East > Jordan (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > Kings County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Optimal Clustering from Noisy Binary Feedback

Ariu, Kaito, Ok, Jungseul, Proutiere, Alexandre, Yun, Se-Young

arXiv.org Machine Learning, Oct-14-2019

We study the problem of recovering clusters from binary user feedback. Items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a random answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon K-means whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items that are hard to cluster, as well as relevant questions, more often. We compare numerically the performance of our algorithms with and without an adaptive selection strategy, and illustrate the gain achieved by being adaptive. Our inference problems are motivated by the problem of solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent CAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images, by optimally finding the best questions to present.

algorithm, error rate, inull, (16 more...)

arXiv.org Machine Learning

1910.06002

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

KL property of exponent $1/2$ of $\ell_{2,0}$-norm and DC regularized factorizations for low-rank matrix recovery

Bi, Shujun, Tao, Ting, Pan, Shaohua

arXiv.org Machine Learning, Aug-23-2019

This paper is concerned with the factorization form of the rank regularized loss minimization problem. To cater for the scenario in which only a coarse estimate is available for the rank of the true matrix, an $\ell_{2,0}$-norm regularized term is added to the factored loss function to reduce the rank adaptively; to account for the ambiguities in the factorization, a balanced term is then introduced. For the least squares loss, under a restricted condition number assumption on the sampling operator, we establish the KL property of exponent $1/2$ of the nonsmooth factored composite function and its equivalent DC reformulations in the set of their global minimizers. We also confirm the theoretical findings by applying a proximal linearized alternating minimization method to the regularized factorizations.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1908.09078

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
