Zhang, Yichi, Tao, Siyu, Chen, Wei, Apley, Daniel W.

Computer simulations often involve both qualitative and numerical inputs. Existing Gaussian process (GP) methods for handling this mainly assume a different response surface for each combination of levels of the qualitative factors and relate them via a multiresponse cross-covariance matrix. We introduce a substantially different approach that maps each qualitative factor to an underlying numerical latent variable (LV), with the mapped value for each level estimated similarly to the covariance lengthscale parameters. This provides a parsimonious GP parameterization that treats qualitative factors the same as numerical variables and views them as affecting the response via similar physical mechanisms. This has strong physical justification, as the effects of a qualitative factor in any physics-based simulation model must always be due to some underlying numerical variables. Even when the underlying variables are many, sufficient dimension reduction arguments imply that their effects can be represented by a low-dimensional LV. This conjecture is supported by the superior predictive performance observed across a variety of examples. Moreover, the mapped LVs provide substantial insight into the nature and effects of the qualitative factors.
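The latent-variable idea in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian (squared-exponential) correlation function and a two-dimensional latent vector per level; the function name `lv_correlation`, the level labels, and the latent values are hypothetical choices for illustration, not taken from the abstract. In practice the latent values would be estimated together with the lengthscales rather than fixed by hand.

```python
import math

def lv_correlation(x1, t1, x2, t2, lengthscales, latent):
    """Correlation between inputs (x1, t1) and (x2, t2).

    x1, x2       : sequences of numerical inputs
    t1, t2       : levels of a single qualitative factor
    lengthscales : one positive lengthscale per numerical input
    latent       : dict mapping each level to its latent vector
                   (hypothetical values; assumed to be estimated elsewhere)
    """
    # Scaled squared distance over the numerical inputs.
    d_num = sum(((a - b) / ell) ** 2
                for a, b, ell in zip(x1, x2, lengthscales))
    # Squared distance between the latent vectors of the two levels,
    # so the qualitative factor enters exactly like a numerical input.
    d_lat = sum((a - b) ** 2 for a, b in zip(latent[t1], latent[t2]))
    return math.exp(-(d_num + d_lat))

# Hypothetical latent positions for three levels of one factor.
latent = {"A": (0.0, 0.0), "B": (0.1, 0.0), "C": (1.5, 0.5)}

r_ab = lv_correlation([0.5], "A", [0.5], "B", [1.0], latent)
r_ac = lv_correlation([0.5], "A", [0.5], "C", [1.0], latent)
```

With these assumed values, levels A and B sit close together in the latent space and are therefore more strongly correlated than A and C, which is the kind of insight into the qualitative levels that the abstract describes.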

We argue that second-order Markov logic is ideally suited for this purpose and propose an approach based on it. Our algorithm discovers structural regularities in the source domain in the form of Markov logic formulas with predicate variables and instantiates these formulas with predicates from the target domain. Our approach has successfully transferred learned knowledge among molecular biology, web, and social network domains. For example, Wall Street firms often hire physicists to solve finance problems. Even though these two domains have superficially nothing in common, training as a physicist provides knowledge and skills that are highly applicable in finance (for example, solving differential equations and performing Monte Carlo simulations).

In any learning task, it is natural to incorporate latent or hidden variables which are not directly observed. For instance, in a social network, we can observe interactions among the actors but not their hidden interests or intents; in gene networks, we can measure gene expression levels but not the detailed regulatory mechanisms. I will present a broad framework for unsupervised learning of latent variable models, addressing both statistical and computational concerns. We show that higher-order relationships among observed variables have a low-rank representation under natural statistical constraints such as conditional independence relationships. These findings have implications in a number of settings, such as finding hidden communities in networks, discovering topics in text documents, and learning about gene regulation in computational biology.

Glueck, Blake (Bradley University) | Alvin, Chris (Furman University)

This paper presents a method for generating single-variable limit problems for an introductory Calculus course. Our method generates problems in two steps. The first step uses an evolutionary approach to construct unique functions $f$. The second step involves an analysis of $f$ to compute distinct ``approach'' values. Our experimental procedures demonstrate the limitations and utility of our approach.

Garrido-Merchán, Eduardo C., Hernández-Lobato, Daniel

Bayesian Optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression, and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. The acquisition function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continuous input variables. When this is not the case, for example when some of the input variables take categorical or integer values, one has to introduce extra approximations. Consider a suggested input location taking values in the real line. Before evaluating the objective, a common approach is to use a one-hot encoding approximation for categorical variables, or to round to the closest integer in the case of integer-valued variables. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are categorical or integer-valued. We illustrate in both synthetic and real experiments the utility of our approach, which significantly improves the results of standard BO methods using Gaussian processes on problems with categorical or integer-valued variables.
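The common approach mentioned in the abstract above (rounding integer-valued inputs and snapping a one-hot block to its largest entry before evaluating the objective) can be sketched as follows. This illustrates only that baseline step, not the more principled approach the authors propose; the function name `snap_to_valid` and the `spec` format are hypothetical.

```python
def snap_to_valid(suggestion, spec):
    """Map a real-valued suggestion to a valid input before evaluation.

    suggestion : list of floats proposed by the acquisition optimizer
    spec       : one entry per variable, either "real", "int", or
                 ("cat", k) for a categorical variable that occupies
                 k consecutive one-hot positions (hypothetical format)
    """
    out = []
    i = 0
    for kind in spec:
        if kind == "real":
            out.append(suggestion[i])           # left unchanged
            i += 1
        elif kind == "int":
            # Round to the closest integer (Python rounds exact .5 ties to even).
            out.append(float(round(suggestion[i])))
            i += 1
        else:
            _, k = kind
            block = suggestion[i:i + k]
            # Activate the position with the largest value, zero the rest.
            winner = max(range(k), key=lambda j: block[j])
            out.extend(1.0 if j == winner else 0.0 for j in range(k))
            i += k
    return out

spec = ["real", "int", ("cat", 3)]
x_a = snap_to_valid([0.3, 2.7, 0.2, 0.9, 0.4], spec)
x_b = snap_to_valid([0.3, 3.4, 0.1, 0.6, 0.5], spec)
```

In this sketch the two distinct suggestions are mapped to the same evaluated input, so the point actually evaluated can differ from the point the model was asked about. That is offered here only as one plausible way the problems mentioned in the abstract could arise.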