# Information Technology

### Distance Metric Learning for Kernel Machines

Recent work in metric learning has significantly improved the state-of-the-art in k-nearest neighbor classification. Support vector machines (SVM), particularly with RBF kernels, are amongst the most popular classification algorithms that uses distance metrics to compare examples. This paper provides an empirical analysis of the efficacy of three of the most popular Mahalanobis metric learning algorithms as pre-processing for SVM training. We show that none of these algorithms generate metrics that lead to particularly satisfying improvements for SVM-RBF classification. As a remedy we introduce support vector metric learning (SVML), a novel algorithm that seamlessly combines the learning of a Mahalanobis metric with the training of the RBF-SVM parameters. We demonstrate the capabilities of SVML on nine benchmark data sets of varying sizes and difficulties. In our study, SVML outperforms all alternative state-of-the-art metric learning algorithms in terms of accuracy and establishes itself as a serious alternative to the standard Euclidean metric with model selection by cross validation.

### Automated Variational Inference in Probabilistic Programming

We present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs. This method is efficient without restrictions on the probabilistic program; it is particularly practical for distributions which are not analytically tractable, including highly structured distributions that arise in probabilistic programs. We show how to automatically derive mean-field probabilistic programs and optimize them, and demonstrate that our perspective improves inference efficiency over other algorithms.

### Dynamical Models and Tracking Regret in Online Convex Programming

This paper describes a new online convex optimization method which incorporates a family of candidate dynamical models and establishes novel tracking regret bounds that scale with the comparator's deviation from the best dynamical model in this family. Previous online optimization methods are designed to have a total accumulated loss comparable to that of the best comparator sequence, and existing tracking or shifting regret bounds scale with the overall variation of the comparator sequence. In many practical scenarios, however, the environment is nonstationary and comparator sequences with small variation are quite weak, resulting in large losses. The proposed Dynamic Mirror Descent method, in contrast, can yield low regret relative to highly variable comparator sequences by both tracking the best dynamical model and forming predictions based on that model. This concept is demonstrated empirically in the context of sequential compressive observations of a dynamic scene and tracking a dynamic social network.

### Sparse Nonparametric Graphical Models

We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling.

### A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels $d$ to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in $d$ are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group $p$-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show the surprising result that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows the implementation of sampling from this distribution to run in $O(\log(d))$ time, making the total computational cost of the method to achieve an $\epsilon$-optimal solution to be $O(\log(d)/\epsilon^2)$, thereby allowing our method to operate for very large values of $d$. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.

### Generating Motion Patterns Using Evolutionary Computation in Digital Soccer

Dribbling an opponent player in digital soccer environment is an important practical problem in motion planning. It has special complexities which can be generalized to most important problems in other similar Multi Agent Systems. In this paper, we propose a hybrid computational geometry and evolutionary computation approach for generating motion trajectories to avoid a mobile obstacle. In this case an opponent agent is not only an obstacle but also one who tries to harden dribbling procedure. One characteristic of this approach is reducing process cost of online stage by transferring it to offline stage which causes increment in agents' performance. This approach breaks the problem into two offline and online stages. During offline stage the goal is to find desired trajectory using evolutionary computation and saving it as a trajectory plan. A trajectory plan consists of nodes which approximate information of each trajectory plan. In online stage, a linear interpolation along with Delaunay triangulation in xy-plan is applied to trajectory plan to retrieve desired action.

### Greedy Sparsity-Constrained Optimization

Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and compressive Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsity-constrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressive Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic data, where the algorithm is employed for sparse logistic regression with and without $\ell_2$-regularization.

### Supervised, semi-supervised and unsupervised inference of gene regulatory networks

Inference of gene regulatory network from expression data is a challenging task. Many methods have been developed to this purpose but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated expression data. The results reveal very low prediction accuracies for unsupervised techniques with the notable exception of the z-score method on knock-out data. In all other cases the supervised approach achieved the highest accuracies and even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.

### Recklessly Approximate Sparse Coding

It has recently been observed that certain extremely simple feature encoding techniques are able to achieve state of the art performance on several standard image classification benchmarks including deep belief networks, convolutional nets, factored RBMs, mcRBMs, convolutional RBMs, sparse autoencoders and several others. Moreover, these "triangle" or "soft threshold" encodings are ex- tremely efficient to compute. Several intuitive arguments have been put forward to explain this remarkable performance, yet no mathematical justification has been offered. The main result of this report is to show that these features are realized as an approximate solution to the a non-negative sparse coding problem. Using this connection we describe several variants of the soft threshold features and demonstrate their effectiveness on two image classification benchmark tasks.

### Similarity Assessment through blocking and affordance assignment in Textual CBR

It has been conceived that children learn new objects through their affordances, that is, the actions that can be taken on them. We suggest that web pages also have affordances defined in terms of the users' information need they meet. An assumption of the proposed approach is that different parts of a text may not be equally important / relevant to a given query. Judgment on the relevance of a web document requires, therefore, a thorough look into its parts, rather than treating it as a monolithic content. We propose a method to extract and assign affordances to texts and then use these affordances to retrieve the corresponding web pages. The overall approach presented in the paper relies on case-based representations that bridge the queries to the affordances of web documents. We tested our method on the tourism domain and the results are promising.