

Debugging a Machine Learning model written in TensorFlow and Keras

#artificialintelligence

In this article, you get to look over my shoulder as I go about debugging a TensorFlow model. I did a lot of dumb things, so please don't judge. You can see the final (working) model on GitHub. I'm building a model to predict lightning 30 minutes into the future, and I plan to present it at the American Meteorological Society meeting. A model trained this way can predict lightning 30 minutes ahead in real time, given the current infrared and GLM data. I wrote up a convnet model, borrowing liberally from the training loop of the ResNet model written for the TPU, and adapted the input function (to read my data, not JPEGs) and the model (a simple convolutional network, not ResNet).
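To make the setup concrete, here is a minimal sketch of the kind of simple convnet the article describes, in Keras. The patch size, channel count, and binary "lightning within 30 minutes" label are illustrative assumptions, not the author's actual code (the working model is on GitHub).

```python
# A minimal sketch of the kind of Keras convnet described above.
# Assumptions (not from the article): 64x64 single-channel patches
# and a binary "lightning within 30 minutes" label.
import tensorflow as tf

def make_model(patch_size=64, channels=1):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(patch_size, patch_size, channels)),
        tf.keras.layers.Conv2D(32, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # P(lightning in 30 min)
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

model = make_model()
model.summary()
```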


Will Google's more-efficient 'Reformer' mitigate or accelerate the arms race in AI? ZDNet

#artificialintelligence

The promise of technology is always more for less: faster processors at lower prices, thanks to more circuits crammed into the same silicon area. Artificial intelligence has an analogue, it turns out, based on recent work by engineers at Google, who have found a way to take the "Transformer" language model and make a version of it run on a single graphics processing unit (GPU), rather than the multiple GPUs it normally requires. That presents users with an interesting choice. If you could get the top technology in AI in an easier-to-use form, would you opt for that, or would you instead stretch the power of your existing compute budget to do more? It's like asking: would you rather pay less for a PC, or get even more power for what you've been paying?


Leveraging Data In Chipmaking

#artificialintelligence

John Kibarian, president and CEO of PDF Solutions, sat down with Semiconductor Engineering to talk about the impact of data analytics on everything from yield and reliability to the inner structure of organizations, how the cloud and edge will work together, and where the big threats lie in the future. SE: When did you recognize that data would be so critical to hardware design and manufacturing? Kibarian: It goes back to 2014, when we realized that consolidation in foundries was part of a bigger shift toward fabless companies. Every fabless company was going to become a systems company, and many systems companies were rapidly becoming fabless. We had been using our analytics to help customers at advanced nodes, and one of them told me they were never going to build another factory again. Before that, our analytics had been used for materials review boards and for better control of the supply chain and packaging.


Opening the Black Box: How Neural Nets See our World

#artificialintelligence

Convolutional Neural Networks (CNNs) and other deep networks have enabled unprecedented breakthroughs in a variety of computer vision tasks, ranging from image classification (assign the image to a category from a given set), to semantic segmentation (segment the detected category), image captioning (describe the image in natural language), and, more recently, visual question answering (answer a natural-language question about an image). Despite their success, when these systems fail, they fail disgracefully, without any warning or explanation, leaving us staring at an incoherent output and wondering why the model said what it said. Their lack of decomposability into intuitive, understandable components makes them extremely hard to interpret. Consider a situation where we want to distinguish cars from ships in satellite imagery. Let's say we have the data and have trained a CNN to classify the images.
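To make "opening the black box" concrete, here is a minimal sketch of one common introspection technique: a gradient-based saliency map that highlights which input pixels the network's decision is most sensitive to. The model and input shapes are illustrative assumptions, not taken from the article.

```python
# Minimal gradient-saliency sketch for a Keras CNN classifier.
# Assumption: the model maps a (64, 64, 3) image to class scores;
# any trained classifier with this interface would do.
import numpy as np
import tensorflow as tf

def saliency_map(model, image):
    """Gradient of the top class score w.r.t. input pixels."""
    x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        scores = model(x)
        top_class = tf.argmax(scores[0])
        top_score = scores[0, top_class]
    grads = tape.gradient(top_score, x)[0]
    # Collapse channels; large values = pixels the decision depends on.
    return tf.reduce_max(tf.abs(grads), axis=-1).numpy()

# Demo model (untrained; stands in for any trained car-vs-ship CNN).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),  # car vs. ship scores
])
heat = saliency_map(model, np.random.rand(64, 64, 3).astype('float32'))
print(heat.shape)  # (64, 64)
```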


Safe Policy Improvement by Minimizing Robust Baseline Regret

Neural Information Processing Systems

An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and analyze a new model-based approach for computing a safe policy when we have access to an inaccurate dynamics model of the system with known accuracy guarantees. Our proposed robust method uses this (inaccurate) model to directly minimize the (negative) regret w.r.t. the baseline policy. In contrast to existing approaches, minimizing the regret allows one to improve on the baseline policy in states with accurate dynamics and to fall back seamlessly to the baseline policy elsewhere. We show that our formulation is NP-hard and propose an approximate algorithm.
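As a sketch of the objective, using assumed notation (not quoted from the abstract): let ρ(π, ξ) be the return of policy π under dynamics model ξ, π_B the baseline policy, and Ξ the set of models consistent with the known accuracy guarantees. The robust baseline-regret problem can then be written as:

```latex
% Notation is an assumption for illustration, not quoted from the paper:
% \rho(\pi,\xi): return of policy \pi under dynamics model \xi
% \pi_B: baseline policy;  \Xi: models within the known accuracy bounds
\max_{\pi \in \Pi} \ \min_{\xi \in \Xi} \ \bigl[ \rho(\pi,\xi) - \rho(\pi_B,\xi) \bigr]
```

Maximizing this worst-case improvement is equivalent to minimizing the negative regret mentioned above; any policy achieving a non-negative value performs at least as well as the baseline under the true dynamics, provided the true model lies in Ξ.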


Introduction to Neural Networks

#artificialintelligence

Neural networks are the heart of deep learning, and they roughly resemble the working of the human brain. Let us look at a classification example for better understanding. Consider two input features: age and income level. If the applicant is 18 or older, and their income level has been verified as sufficient to repay the debt, the credit card is approved; otherwise it is not.
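Here is a minimal sketch of that approve/decline rule expressed as a single artificial neuron, the building block of a neural network. The feature encoding, weights, and bias are illustrative assumptions, not from the article.

```python
# The approve/decline rule above, expressed as a single artificial neuron.
# Feature encoding and weights are illustrative assumptions.
import numpy as np

def neuron(x, w, b):
    """Step-activation neuron: fires (1) when w.x + b > 0."""
    return int(np.dot(w, x) + b > 0)

# Inputs: [age_ok, income_verified], each 0 or 1.
# Weights/bias chosen so the neuron fires only when BOTH are 1 (logical AND).
w, b = np.array([1.0, 1.0]), -1.5

for age_ok in (0, 1):
    for income_ok in (0, 1):
        decision = neuron(np.array([age_ok, income_ok]), w, b)
        print(f"age_ok={age_ok} income_ok={income_ok} -> "
              f"{'approved' if decision else 'not approved'}")
```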


An Homotopy Algorithm for the Lasso with Online Observations

Neural Information Processing Systems

In this paper we propose an algorithm to solve the Lasso with online observations. We introduce an optimization problem that allows us to compute a homotopy from the current solution to the solution after observing a new data point. We compare our method to LARS and present an application to compressed sensing with sequential observations. Our approach can also be easily extended to compute a homotopy from the current solution to the solution after removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation.
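The paper's homotopy derivation is not reproduced here, but the following sketch illustrates the underlying idea of reusing the current solution as new observations arrive, using scikit-learn's warm-started Lasso on synthetic data (this is plain warm-started coordinate descent, not the authors' homotopy algorithm).

```python
# Warm-started Lasso updates as new observations arrive.
# Illustrates reusing the current solution between fits; data is synthetic.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 10))
beta_true = np.array([3.0, -2.0] + [0.0] * 8)
y_all = X_all @ beta_true + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1, warm_start=True)  # keep coefficients between fits
for n in (50, 100, 150, 200):              # observations arrive in batches
    model.fit(X_all[:n], y_all[:n])        # each fit starts from previous coef_
    print(n, np.round(model.coef_[:3], 2))
```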


Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm

Neural Information Processing Systems

This paper considers the problem of estimating the distribution of returns in reinforcement learning (i.e., the distributional RL problem). It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing the probability distribution function of returns, one can represent their characteristic function, i.e., the Fourier transform of their distribution. We call the new representation the Characteristic Value Function (CVF), which can be interpreted as the frequency-domain representation of the probability distribution of returns. We show that the CVF satisfies a Bellman-like equation, and that its corresponding Bellman operator is a contraction with respect to certain metrics.
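As a sketch of the construction, with assumed notation: let Z^π(x) denote the random return from state x under policy π. The CVF is the characteristic function of Z^π(x), and the distributional Bellman recursion Z = R + γZ' becomes multiplicative in the frequency domain:

```latex
% Notation is assumed for illustration, not quoted from the paper.
\varphi^{\pi}(x;\omega) = \mathbb{E}\left[ e^{i\omega Z^{\pi}(x)} \right]
\quad \text{(characteristic value function)}
% Z = R + \gamma Z' yields a Bellman-like recursion
% (assuming the reward is determined by the transition):
\varphi^{\pi}(x;\omega) = \mathbb{E}\left[ e^{i\omega R}\, \varphi^{\pi}(X';\gamma\omega) \right]
```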


Sliced Gromov-Wasserstein

Neural Information Processing Systems

Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex, non-convex quadratic program that is usually very costly in both time and memory. Unlike GW, the Wasserstein distance (W) enjoys several properties (e.g., duality) that permit large-scale optimization. Among these, the closed-form solution of W on the real line, which only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW, akin to SW.
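For context, here is a minimal sketch of the standard Sliced Wasserstein computation the abstract refers to, built from random projections and 1D sorting; the paper's Sliced Gromov-Wasserstein variant itself is not reproduced here.

```python
# Monte Carlo Sliced Wasserstein distance between two point clouds.
# This is the standard SW construction the abstract refers to,
# not the paper's Sliced Gromov-Wasserstein variant.
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """SW_p between empirical distributions with equal sample counts."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random directions
    # Project to 1D; there, W_p only requires sorting the samples.
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    return (np.mean(np.abs(X_proj - Y_proj) ** p)) ** (1.0 / p)

X = np.random.default_rng(1).normal(size=(500, 3))
Y = np.random.default_rng(2).normal(loc=1.0, size=(500, 3))
print(sliced_wasserstein(X, Y))
```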


Computing Linear Restrictions of Neural Networks

Neural Information Processing Systems

A linear restriction of a function is the same function with its domain restricted to points on a given line. This paper addresses the problem of computing a succinct representation for a linear restriction of a piecewise-linear neural network. This primitive, which we call ExactLine, allows us to exactly characterize the result of applying the network to all of the infinitely many points on a line. In particular, ExactLine computes a partitioning of the given input line segment such that the network is affine on each partition. We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications.
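As an illustrative sketch of the core observation, for the simplest case of a single ReLU layer (not the paper's full ExactLine algorithm): along the line x(t) = (1 - t)a + t·b, each pre-activation is affine in t, so the points where one crosses zero are exactly where the network's affine behavior can change.

```python
# Partitioning a line segment so a one-ReLU-layer network is affine on
# each piece. This sketches the idea behind ExactLine for the simplest
# case; the paper handles deep networks with many layer types.
import numpy as np

def relu_layer_breakpoints(W, b, a, b_pt):
    """t in (0, 1) where any pre-activation crosses zero along
    x(t) = (1 - t) * a + t * b_pt."""
    za = W @ a + b        # pre-activations at t = 0
    zb = W @ b_pt + b     # pre-activations at t = 1
    ts = []
    for u, v in zip(za, zb):
        if (u < 0) != (v < 0):          # sign change => zero crossing
            ts.append(u / (u - v))      # solve (1 - t) * u + t * v = 0
    return sorted(set(ts))

W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.0, -1.0])
a, c = np.array([-2.0, 0.0]), np.array([2.0, 1.0])
print([0.0] + relu_layer_breakpoints(W, b, a, c) + [1.0])  # partition endpoints
```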