# compute

### Leveraging Data In Chipmaking

John Kibarian, president and CEO of PDF Solutions, sat down with Semiconductor Engineering to talk about the impact of data analytics on everything from yield and reliability to the inner structure of organizations, how the cloud and edge will work together, and where the big threats are in the future. SE: When did you recognize that data would be so critical to hardware design and manufacturing? Kibarian: It goes back to 2014, when we realized that consolidation in foundries was part of a bigger shift toward fabless companies. Every fabless company was going to become a systems company, and many systems companies were rapidly becoming fabless. We had been using our analytics to help customers with advanced nodes, and one of them told me that they were never going to build another factory again. Our analytics had been used for materials review board and better control of our supply chain and packaging before that.

### Opening the Black Box: How Neural Nets See our World

Convolutional Neural Networks (CNNs) and other deep networks have enabled unprecedented breakthroughs in a variety of Computer Vision tasks, ranging from Image Classification (classify the image into a category from a given set of categories), to semantic segmentation (segment the detected category), image captioning (describe the image in natural language), and more recently, visual question answering (answer a natural language question about an image). Despite their success, when these systems fail, they fail disgracefully, without any warning or explanation, leaving us staring at an incoherent output, wondering why it said what it said. Their lack of decomposability into intuitive and understandable components makes them extremely hard to interpret. Let us consider a situation where we want to classify between a car and a ship in satellite imagery. Let's say that we have the data, and trained a CNN to classify the images.

### Safe Policy Improvement by Minimizing Robust Baseline Regret

An important problem in sequential decision-making under uncertainty is to use limited data to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as a given baseline strategy. In this paper, we develop and analyze a new model-based approach to compute a safe policy when we have access to an inaccurate dynamics model of the system with known accuracy guarantees. Our proposed robust method uses this (inaccurate) model to directly minimize the (negative) regret w.r.t. the baseline policy. Contrary to the existing approaches, minimizing the regret allows one to improve the baseline policy in states with accurate dynamics and seamlessly fall back to the baseline policy, otherwise. We show that our formulation is NP-hard and propose an approximate algorithm.

### Introduction to Neural Networks

Neural Networks is the heart of Deep learning. Neural networks roughly resemble the working process of our brain. Let us look into an classification example for better understanding. Consider two parameters age and income level. If age is 18 or above and income level is verified so that you can repay the debt then credit card is approved else not approved.

### An Homotopy Algorithm for the Lasso with Online Observations

We propose in this paper an algorithm to solve the Lasso with online observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and present an application to compressed sensing with sequential observations. Our approach can also be easily extended to compute an homotopy from the current solution to the solution after removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. Papers published at the Neural Information Processing Systems Conference.

### Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm

This paper considers the problem of estimating the distribution of returns in reinforcement learning (i.e., distributional RL problem). It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing a probability distribution function of returns, one can represent their characteristic function instead, the Fourier transform of their distribution. We call the new representation Characteristic Value Function (CVF), which can be interpreted as the frequency domain representation of the probability distribution of returns. We show that the CVF satisfies a Bellman-like equation, and its corresponding Bellman operator is contraction with respect to certain metrics.

### Sliced Gromov-Wasserstein

Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non convex quadratic program which is most of the time very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties ({\em e.g.} duality) that permit large scale optimization. Among those, the solution of W on the real line, that only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW.

### Computing Linear Restrictions of Neural Networks

A linear restriction of a function is the same function with its domain restricted to points on a given line. This paper addresses the problem of computing a succinct representation for a linear restriction of a piecewise-linear neural network. This primitive, which we call ExactLine, allows us to exactly characterize the result of applying the network to all of the infinitely many points on a line. In particular, ExactLine computes a partitioning of the given input line segment such that the network is affine on each partition. We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications.

### k-Means Clustering of Lines for Big Data

The input to the \emph{$k$-mean for lines} problem is a set $L$ of $n$ lines in $\mathbb{R} d$, and the goal is to compute a set of $k$ centers (points) in $\mathbb{R} d$ that minimizes the sum of squared distances over every line in $L$ and its nearest center. This is a straightforward generalization of the $k$-mean problem where the input is a set of $n$ points instead of lines. We suggest the first PTAS that computes a $(1 \epsilon)$-approximation to this problem in time $O(n \log n)$ for any constant approximation error $\epsilon \in (0, 1)$, and constant integers $k, d \geq 1$. This is by proving that there is always a weighted subset (called coreset) of $dk {O(k)}\log (n)/\epsilon 2$ lines in $L$ that approximates the sum of squared distances from $L$ to \emph{any} given set of $k$ points. Using traditional merge-and-reduce technique, this coreset implies results for a streaming set (possibly infinite) of lines to $M$ machines in one pass (e.g.

### Distributed estimation of the inverse Hessian by determinantal averaging

In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum. An example of this occurs in distributed Newton's method, where we wish to compute (or implicitly work with) the inverse Hessian multiplied by the gradient. In this case, locally computed estimates are biased, and so taking a uniform average will not recover the correct solution. To address this, we propose determinantal averaging, a new approach for correcting the inversion bias. This approach involves reweighting the local estimates of the Newton's step proportionally to the determinant of the local Hessian estimate, and then averaging them together to obtain an improved global estimate.