to

### Bifidelity data-assisted neural networks in nonintrusive reduced-order modeling

In this paper, we present a new nonintrusive reduced basis method when a cheap low-fidelity model and expensive high-fidelity model are available. The method relies on proper orthogonal decomposition (POD) to generate the high-fidelity reduced basis and a shallow multilayer perceptron to learn the high-fidelity reduced coefficients. In contrast to other methods, one distinct feature of the proposed method is to incorporate the features extracted from the low-fidelity data as the input feature, this approach not only improves the predictive capability of the neural network but also enables the decoupling the high-fidelity simulation from the online stage. Due to its nonin-trusive nature, it is applicable to general parameterized problems. We also provide several numerical examples to illustrate the effectiveness and performance of the proposed method.

### Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computation efficiency and quality of approximations. We propose a new algorithm, LSTD($\lambda$)-RP, which leverages random projection techniques and takes eligibility traces into consideration to tackle the above two challenges. We carry out theoretical analysis of LSTD($\lambda$)-RP, and provide meaningful upper bounds of the estimation error, approximation error and total generalization error. These results demonstrate that LSTD($\lambda$)-RP can benefit from random projection and eligibility traces strategies, and LSTD($\lambda$)-RP can achieve better performances than prior LSTD-RP and LSTD($\lambda$) algorithms.

### Reducing Runtime by Recycling Samples

Contrary to the situation with stochastic gradient descent, we argue that when using stochastic methods with variance reduction, such as SDCA, SAG or SVRG, as well as their variants, it could be beneficial to reuse previously used samples instead of fresh samples, even when fresh samples are available. We demonstrate this empirically for SDCA, SAG and SVRG, studying the optimal sample size one should use, and also uncover be-havior that suggests running SDCA for an integer number of epochs could be wasteful.

### On b-bit min-wise hashing for large-scale regression and classification with sparse data

Large-scale regression problems where both the number of variables, $p$, and the number of observations, $n$, may be large and in the order of millions or more, are becoming increasingly more common. Typically the data are sparse: only a fraction of a percent of the entries in the design matrix are non-zero. Nevertheless, often the only computationally feasible approach is to perform dimension reduction to obtain a new design matrix with far fewer columns and then work with this compressed data. $b$-bit min-wise hashing (Li and Konig, 2011) is a promising dimension reduction scheme for sparse matrices which produces a set of random features such that regression on the resulting design matrix approximates a kernel regression with the resemblance kernel. In this work, we derive bounds on the prediction error of such regressions. For both linear and logistic models we show that the average prediction error vanishes asymptotically as long as $q \|\beta^*\|_2^2 /n \rightarrow 0$, where $q$ is the average number of non-zero entries in each row of the design matrix and $\beta^*$ is the coefficient of the linear predictor. We also show that ordinary least squares or ridge regression applied to the reduced data can in fact allow us fit more flexible models. We obtain non-asymptotic prediction error bounds for interaction models and for models where an unknown row normalisation must be applied in order for the signal to be linear in the predictors.

### Approximating Higher-Order Distances Using Random Projections

We provide a simple method and relevant theoretical analysis for efficiently estimating higher-order lp distances. While the analysis mainly focuses on l4, our methodology extends naturally to p = 6,8,10..., (i.e., when p is even). Distance-based methods are popular in machine learning. In large-scale applications, storing, computing, and retrieving the distances can be both space and time prohibitive. Efficient algorithms exist for estimating lp distances if 0 < p <= 2. The task for p > 2 is known to be difficult. Our work partially fills this gap.