AITopics | Baldi, Pierre

Baldi, Pierre

Sherpa: Robust Hyperparameter Optimization for Machine Learning

Hertel, Lars, Collado, Julian, Sadowski, Peter, Ott, Jordan, Baldi, Pierre

arXiv.org Machine Learning, May-8-2020

Sherpa is a hyperparameter optimization library for machine learning models. It is specifically designed for problems with computationally expensive, iterative function evaluations, such as the hyperparameter tuning of deep neural networks. With Sherpa, scientists can quickly optimize hyperparameters using a variety of powerful and interchangeable algorithms. Sherpa can be run on either a single machine or in parallel on a cluster. Finally, an interactive dashboard enables users to view the progress of models as they are trained, cancel trials, and explore which hyperparameter combinations are working best. Sherpa empowers machine learning practitioners by automating the more tedious aspects of model tuning. Its source code and documentation are available at https://github.com/sherpa-ai/sherpa.

deep learning, hyperparameter, neural network, (15 more...)

arXiv.org Machine Learning

2005.04048

Country: North America > United States > California (0.28)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Curiosity-Driven Multi-Criteria Hindsight Experience Replay

Lanier, John B., McAleer, Stephen, Baldi, Pierre

arXiv.org Artificial Intelligence, Jun-9-2019

Dealing with sparse rewards is a longstanding challenge in reinforcement learning. Recent hindsight methods have achieved success on a variety of sparse-reward tasks, but they fail on complex tasks such as stacking multiple blocks with a robot arm in simulation. Curiosity-driven exploration, which uses the prediction error of a learned dynamics model as an intrinsic reward, has been shown to be effective for exploring a number of sparse-reward environments. We present a method that combines hindsight with curiosity-driven exploration and curriculum learning in order to solve the challenging sparse-reward block stacking task. We are the first to stack more than two blocks using only sparse reward without human demonstrations.
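
The two ingredients named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tuple-based state representation, the 0/-1 sparse reward convention, and the choice of relabeling every transition with the final achieved state are all assumptions made for illustration.

```python
from typing import List, Tuple

State = Tuple[float, ...]


def sparse_reward(achieved: State, goal: State, tol: float = 1e-6) -> float:
    """Assumed convention: 0 when the goal is reached, -1 otherwise."""
    reached = all(abs(a - g) <= tol for a, g in zip(achieved, goal))
    return 0.0 if reached else -1.0


def curiosity_bonus(predicted_next: State, actual_next: State) -> float:
    """Intrinsic reward: squared prediction error of a dynamics model.

    The model itself is not shown; only its prediction is passed in.
    """
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def relabel_with_final_state(
    episode: List[Tuple[State, State]],
) -> List[Tuple[State, State, State, float]]:
    """Hindsight relabeling: pretend the final achieved state was the goal.

    Each input item is (state, next_state); each output item is
    (state, next_state, new_goal, reward under new_goal).
    """
    new_goal = episode[-1][1]
    return [
        (s, s_next, new_goal, sparse_reward(s_next, new_goal))
        for s, s_next in episode
    ]


if __name__ == "__main__":
    episode = [((0.0,), (1.0,)), ((1.0,), (2.0,))]
    for transition in relabel_with_final_state(episode):
        print(transition)
    print(curiosity_bonus((1.0, 0.0), (1.0, 2.0)))
```

In a full agent the curiosity bonus would be added to the (relabeled) sparse reward before the transition is stored for replay; how the two are weighted is a design choice not specified here.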

artificial intelligence, exploration, neural network, (16 more...)

arXiv.org Artificial Intelligence

1906.0371

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

The capacity of feedforward neural networks

Baldi, Pierre, Vershynin, Roman

arXiv.org Machine Learning, Jan-2-2019

A long-standing open problem in the theory of neural networks is the development of quantitative methods to estimate and compare the capabilities of different architectures. Here we define the capacity of an architecture by the binary logarithm of the number of functions it can compute, as the synaptic weights are varied. The capacity is an upper bound on the number of bits that can be "communicated" from the training data to the architecture over the learning channel. We study the capacity of layered, fully connected architectures of linear threshold neurons with $L$ layers of size $n_1, n_2, \ldots, n_L$ and show that in essence the capacity is given by a cubic polynomial in the layer sizes: $C(n_1,\ldots, n_L)=\sum_{k=1}^{L-1} \min(n_1,\ldots,n_k)\, n_k n_{k+1}$. In proving the main result, we also develop new techniques (multiplexing, enrichment, and stacking) as well as new bounds on the capacity of finite sets. We use the main result to identify architectures with maximal or minimal capacity under a number of natural constraints. This leads to the notion of structural regularization for deep architectures. While in general, everything else being equal, shallow networks compute more functions than deep networks, the functions computed by deep networks are more regular and "interesting".
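
The cubic polynomial quoted in the abstract above is easy to evaluate directly. The sketch below computes the sum exactly as written; the function name `capacity` is hypothetical, and because the abstract says the result holds "in essence", the returned value should be read as the leading-order estimate rather than an exact count of bits.

```python
from typing import Sequence


def capacity(layer_sizes: Sequence[int]) -> int:
    """Evaluate C(n_1..n_L) = sum_{k=1}^{L-1} min(n_1..n_k) * n_k * n_{k+1}."""
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input layer and an output layer")
    total = 0
    running_min = layer_sizes[0]
    # Pair each layer size n_k with its successor n_{k+1}.
    for n_k, n_next in zip(layer_sizes, layer_sizes[1:]):
        running_min = min(running_min, n_k)  # min(n_1, ..., n_k)
        total += running_min * n_k * n_next
    return total


if __name__ == "__main__":
    print(capacity([10, 5, 1]))  # 10*10*5 + 5*5*1 = 525
    print(capacity([3, 1]))      # single neuron with 3 inputs: 3*3*1 = 9
```

The running minimum makes the bottleneck effect visible: once a narrow layer appears, every later term is scaled by that narrow width.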

architecture, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1901.00434

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Neuronal Capacity

Baldi, Pierre, Vershynin, Roman

Neural Information Processing Systems, Dec-31-2018

We define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. We review known results, and derive new results, estimating the capacity of several neuronal models: linear and polynomial threshold gates, linear and polynomial threshold gates with constrained weights (binary weights, positive weights), and ReLU neurons. We also derive capacity estimates and bounds for fully recurrent networks and layered feedforward networks.
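
The definition above can be made concrete for the smallest interesting case: a linear threshold gate with two binary inputs. The following sketch enumerates the distinct Boolean functions such a gate realizes over a small grid of weights and biases and takes the logarithm of the count. The grid, the 0/1 input encoding, the base-2 logarithm, and the function name are assumptions for illustration only.

```python
from itertools import product
from math import log2


def count_threshold_functions(weights, biases):
    """Count distinct truth tables of x -> [w1*x1 + w2*x2 + b > 0]."""
    inputs = list(product((0, 1), repeat=2))
    truth_tables = set()
    for w1, w2, b in product(weights, weights, biases):
        table = tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in inputs)
        truth_tables.add(table)
    return len(truth_tables)


# Small grid that is rich enough to reach every linearly separable function
# of two binary inputs (all 16 Boolean functions except XOR and XNOR).
n_functions = count_threshold_functions(
    weights=(-1, 0, 1), biases=(-1.5, -0.5, 0.5, 1.5)
)
capacity_bits = log2(n_functions)

if __name__ == "__main__":
    print(n_functions)    # 14
    print(capacity_bits)  # about 3.81 bits
```

Brute-force enumeration like this only works for tiny gates; the point of the capacity estimates in the paper is to avoid it for realistic sizes.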

neural network, survey article, threshold function, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Overview (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Efficient Neutrino Oscillation Parameter Inference with Gaussian Process

Li, Lingge, Nayak, Nitish, Bian, Jianming, Baldi, Pierre

arXiv.org Machine Learning, Nov-16-2018

Neutrino oscillation studies involve inference from tiny samples of data that have complicated dependencies on multiple oscillation parameters simultaneously. This is typically carried out using the unified approach of Feldman and Cousins, which is very computationally expensive, on the order of tens of millions of CPU hours. In this work, we propose an iterative method using Gaussian processes to efficiently find a confidence contour for the oscillation parameters and show that it produces the same results at a fraction of the computation cost.

artificial intelligence, confidence contour, machine learning, (13 more...)

arXiv.org Machine Learning

1811.0705

Country: North America > United States > California > Orange County > Irvine (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Modeling & Simulation (0.73)
Information Technology > Artificial Intelligence > Machine Learning (0.49)

Solving the Rubik's Cube Without Human Knowledge

McAleer, Stephen, Agostinelli, Forest, Shmakov, Alexander, Baldi, Pierre

arXiv.org Artificial Intelligence, May-18-2018

A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In these environments, a reward is always received at the end of the game; however, for many combinatorial optimization environments, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration: a novel reinforcement learning algorithm that is able to teach itself how to solve the Rubik's Cube with no human assistance. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves -- less than or equal to solvers that employ human domain knowledge.

artificial intelligence, reinforcement learning, rubik's cube, (18 more...)

arXiv.org Artificial Intelligence

1805.0747

Country: North America > United States > California (0.15)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Rubik's Cube (0.76)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning in the Machine: Random Backpropagation and the Deep Learning Channel

Baldi, Pierre, Sadowski, Peter, Lu, Zhiqin

arXiv.org Artificial Intelligence, Dec-22-2017

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
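
The core substitution described above, a fixed random matrix in place of the transposed forward matrix, fits in a few lines of NumPy. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: the toy regression data, the network size, the learning rate, the initialization scales, and the function name `train_one_hidden_layer` are all hypothetical. The only difference between the two modes is the `feedback` matrix used to send the output error back to the hidden layer, and the derivative of the activation function is kept in the update.

```python
import numpy as np


def train_one_hidden_layer(use_random_feedback, steps=500, lr=0.1, seed=0):
    """Train a tiny tanh network; return (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 4))               # toy inputs
    y = np.tanh(X @ rng.normal(size=(4, 1)))   # toy regression targets
    W1 = rng.normal(scale=0.5, size=(4, 8))    # forward weights, layer 1
    W2 = rng.normal(scale=0.1, size=(8, 1))    # forward weights, layer 2
    B = rng.normal(scale=0.5, size=(1, 8))     # fixed random feedback matrix

    def loss():
        return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

    initial = loss()
    for _ in range(steps):
        h = np.tanh(X @ W1)                    # hidden activations
        err = h @ W2 - y                       # output error, shape (64, 1)
        # Backpropagation sends the error through W2.T; RBP uses B instead.
        feedback = B if use_random_feedback else W2.T
        delta_h = (err @ feedback) * (1.0 - h ** 2)  # tanh derivative kept
        W2 -= lr * (h.T @ err) / len(X)
        W1 -= lr * (X.T @ delta_h) / len(X)
    return initial, loss()


if __name__ == "__main__":
    print("backprop:", train_one_hidden_layer(use_random_feedback=False))
    print("random BP:", train_one_hidden_layer(use_random_feedback=True))
```

Note that the top-layer update is identical in both modes; only the signal reaching the hidden layer changes, which is why `B` never needs to be learned or kept symmetric with `W2`.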

deep learning, matrix, neural network, (18 more...)

arXiv.org Artificial Intelligence

1612.02734

Country: North America > United States > California (0.14)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)

Decorrelated Jet Substructure Tagging using Adversarial Neural Networks

Shimmin, Chase, Sadowski, Peter, Baldi, Pierre, Weik, Edison, Whiteson, Daniel, Goul, Edward, Søgaard, Andreas

arXiv.org Machine Learning, Mar-9-2017

We describe a strategy for constructing a neural network jet substructure tagger that powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using an adversarial strategy, resulting in a tagger that learns to balance classification accuracy with decorrelation. As a benchmark scenario, we consider the case where large-radius jets originating from a boosted resonance decay are discriminated from a background of nonresonant quark and gluon jets. We show that in the presence of systematic uncertainties on the background rate, our adversarially trained, decorrelated tagger considerably outperforms a conventionally trained neural network, despite having a slightly worse signal-background separation power. We generalize the adversarial training technique to include a parametric dependence on the signal hypothesis, training a single network that provides optimized, interpolatable decorrelated jet tagging across a continuous range of hypothetical resonance masses, after training on discrete choices of the signal mass.

artificial intelligence, jet mass, neural network, (17 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevD.96.074034

1703.03507

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks

Racah, Evan, Ko, Seyoon, Sadowski, Peter, Bhimji, Wahid, Tull, Craig, Oh, Sang-Yun, Baldi, Pierre, Prabhat

arXiv.org Machine Learning, Dec-6-2016

Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.

deep learning, neural network, representation, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/ICMLA.2016.0160

1601.07621

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.65)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
