Asia
Feedback Solution to Optimal Switching Problems with Switching Cost
Many real-world control problems can be classified as switching problems in the sense that the system subject to control is comprised of several different modes (sometimes called subsystems) and at each instant only one of the modes can be active. A basic example of such a system is a plant equipped with on-off actuators [1]. The solution to such problems includes a switching schedule which determines the number of switching, the switching instants, and the order of the active subsystems. The developments in the field of optimal switching can be divided into different categories, two of which are nonlinear programming based methods and discretization based methods. Nonlinear programming based methods utilize the gradient of the cost with respect to the switching instants to calculate local optimal switching times using nonlinear programming [2]-[9]. In these methods, the sequence of active subsystems, known as the mode sequence, is typically selected a priori. The problem is then simplified to determining the switching instants between the modes. Discretization based methods, however, discretize the state and input space to end up with a finite number of choices [10], [11]. Among the intelligent approaches to the problem, genetic algorithm and neural networks were used in Refs.
Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
Jiang, Ke, Que, Qichao, Kulis, Brian
We present a simple but powerful reinterpretation of kernelized locality-sensitive hashing (KLSH), a general and popular method developed in the vision community for performing approximate nearest-neighbor searches in an arbitrary reproducing kernel Hilbert space (RKHS). Our new perspective is based on viewing the steps of the KLSH algorithm in an appropriately projected space, and has several key theoretical and practical benefits. First, it eliminates the problematic conceptual difficulties that are present in the existing motivation of KLSH. Second, it yields the first formal retrieval performance bounds for KLSH. Third, our analysis reveals two techniques for boosting the empirical performance of KLSH. We evaluate these extensions on several large-scale benchmark image retrieval data sets, and show that our analysis leads to improved recall performance of at least 12%, and sometimes much higher, over the standard KLSH method.
Joint modeling of multiple time series via the beta process with application to motion capture segmentation
Fox, Emily B., Hughes, Michael C., Sudderth, Erik B., Jordan, Michael I.
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our model discovers a latent set of dynamical behaviors shared among the sequences, and segments each time series into regions defined by a subset of these behaviors. Using a beta process prior, the size of the behavior set and the sharing pattern are both inferred from data. We develop Markov chain Monte Carlo (MCMC) methods based on the Indian buffet process representation of the predictive distribution of the beta process. Our MCMC inference algorithm efficiently adds and removes behaviors via novel split-merge moves as well as data-driven birth and death proposals, avoiding the need to consider a truncated model. We demonstrate promising results on unsupervised segmentation of human motion capture data.
On the Complexity of Learning with Kernels
Cesa-Bianchi, Nicolò, Mansour, Yishay, Shamir, Ohad
A well-recognized limitation of kernel learning is the requirement to handle a kernel matrix, whose size is quadratic in the number of training examples. Many methods have been proposed to reduce this computational cost, mostly by using a subset of the kernel matrix entries, or some form of low-rank matrix approximation, or a random projection method. In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix. We show that there are kernel learning problems where no such method will lead to non-trivial computational savings. Our results also quantify how the problem difficulty depends on parameters such as the nature of the loss function, the regularization parameter, the norm of the desired predictor, and the kernel matrix rank. Our results also suggest cases where more efficient kernel learning might be possible.
Distributed Policy Evaluation Under Multiple Behavior Strategies
Macua, Sergio Valcarcel, Chen, Jianshu, Zazo, Santiago, Sayed, Ali H.
We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).
A provable SVD-based algorithm for learning topics in dominant admixture corpus
Bansal, Trapit, Bhattacharyya, Chiranjib, Kannan, Ravindran
Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures, is NP-hard. Assuming separability, a strong assumption, [4] gave the first provable algorithm for inference. For LDA model, [6] gave a provable algorithm using tensor-methods. But [4,6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors). Our aim is to develop a model which makes intuitive and empirically supported assumptions and to design an algorithm with natural, simple components such as SVD, which provably solves the inference problem for the model with bounded $l_1$ error. A topic in LDA and other models is essentially characterized by a group of co-occurring words. Motivated by this, we introduce topic specific Catchwords, group of words which occur with strictly greater frequency in a topic than any other topic individually and are required to have high frequency together rather than individually. A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding, can provably recover the topics from a collection of documents drawn from Dominant admixtures. Dominant admixtures are convex combination of distributions in which one distribution has a significantly higher contribution than others. Apart from the simplicity of the algorithm, the sample complexity has near optimal dependence on $w_0$, the lowest probability that a topic is dominant, and is better than [4]. Empirical evidence shows that on several real world corpora, both Catchwords and Dominant admixture assumptions hold and the proposed algorithm substantially outperforms the state of the art [5].
Adaptive Learning in Cartesian Product of Reproducing Kernel Hilbert Spaces
We propose a novel adaptive learning algorithm based on iterative orthogonal projections in the Cartesian product of multiple reproducing kernel Hilbert spaces (RKHSs). The task is estimating/tracking nonlinear functions which are supposed to contain multiple components such as (i) linear and nonlinear components, (ii) high- and low- frequency components etc. In this case, the use of multiple RKHSs permits a compact representation of multicomponent functions. The proposed algorithm is where two different methods of the author meet: multikernel adaptive filtering and the algorithm of hyperplane projection along affine subspace (HYPASS). In a certain particular case, the sum space of the RKHSs is isomorphic to the product space and hence the proposed algorithm can also be regarded as an iterative projection method in the sum space. The efficacy of the proposed algorithm is shown by numerical examples.
A Heuristic Method for Solving the Problem of Partitioning Graphs with Supply and Demand
Jovanovic, Raka, Bousselham, Abdelkader, Voss, Stefan
Noname manuscript No. (will be inserted by the editor) Abstract In this paper we present a greedy algorithm for solving the problem of the maximum partitioning of graphs with supply and demand (MPGSD). The goal of the method is to solve the MPGSD for large graphs in a reasonable time limit. This is done by using a two stage greedy algorithm, with two corresponding types of heuristics. The solutions acquired in this way are improved by applying a computationally inexpensive, hill climbing like, greedy correction procedure. In our numeric experiments we analyze different heuristic functions for each stage of the greedy algorithm, and show that their performance is highly dependent on the properties of the specific instance. Our tests show that by exploring a relatively small number of solutions generated by combining different heuristic functions, and applying the proposed correction procedure we can find solutions within only a few percent of the optimal ones. Keywords Graph Partitioning · Greedy Algorithm · Demand vertex · Supply vertex 1 Introduction A wide range of practical problems can be efficiently represented by means of graph partitioning. Present address: Qatar Environment and Energy Research Institute (QEERI), PO Box 5825, Doha, Qatar Abdelkader Bousselham Qatar Environment and Energy Research Institute (QEERI), PO Box 5825, Doha, Qatar Email: abousselham@qf.org.qa In this paper the focus is on the problem of maximum partitioning of a graph with supply and demand (MPGSD). This problem is defined on a graph G, in which each node is either a supply or a demand node. Each vertex v has a corresponding positive number, which is called the supply of node v; otherwise, if v is a demand node, this value would be called demand.
A Multi-Heuristic Approach for Solving the Pre-Marshalling Problem
Jovanovic, Raka, Tuba, Milan, Voss, Stefan
Minimizing the number of reshuffling operations at maritime container terminals incorporates the Pre-Marshalling Problem (PMP) as an important problem. Based on an analysis of existing solution approaches we develop new heuristics utilizing specific properties of problem instances of the PMP. We show that the heuristic performance is highly dependent on these properties. We introduce a new method that exploits a greedy heuristic of four stages, where for each of these stages several different heuristics may be applied. Instead of using randomization to improve the performance of the heuristic, we repetitively generate a number of solutions by using a combination of different heuristics for each stage. In doing so, only a small number of solutions is generated for which we intend that they do not have undesirable properties, contrary to the case when simple randomization is used. Our experiments show that such a deterministic algorithm significantly outperforms the original nondeterministic method when the quality of found solutions is observed, with a much lower number of generated solutions.
Minimal Narrative Annotation Schemes and Their Applications
Rahimtoroghi, Elahe (University of California, Santa Cruz) | Corcoran, Thomas (University of California, Santa Cruz) | Swanson, Reid (University of California, Santa Cruz) | Walker, Marilyn A. (University of California, Santa Cruz) | Sagae, Kenji (Institute for Creative Technologies, University of Southern California) | Gordon, Andrew (Institute for Creative Technologies, University of Southern California)
The increased use of large corpora in narrative research has created new opportunities for empirical research and intelligent narrative technologies. To best exploit the value of these corpora, several research groups are eschewing complex discourse analysis techniques in favor of high-level minimalist narrative annotation schemes that can be quickly applied, achieve high inter-rater agreement, and are amenable to automation using machine-learning techniques. In this paper we compare different annotation schemes that have been employed by two groups of researchers to annotate large corpora of narrative text. Using a dual-annotation methodology, we investigate the correlation between narrative clauses distinguished by their structural role (orientation, action, evaluation), their subjectivity, and their narrative level within the discourse. We find that each simple narrative annotation scheme captures a structurally distinct characteristic of real-world narratives, and each combination of labels is evident in a corpus of 19 weblog narratives (951 narrative clauses). We discuss several potential applications of minimalist narrative annotation schemes, noting the combination of label across these two annotation schemes that best support each task.