Mathematical understanding of detailed balance condition violation and its application to Langevin dynamics
We develop an efficient sampling method that simulates Langevin dynamics with an artificial force rather than the natural force given by the gradient of the potential energy. The standard technique for sampling from a predetermined distribution, such as the Gibbs-Boltzmann distribution, operates under the detailed balance condition. In the present study, we propose a modified Langevin dynamics that violates the detailed balance condition in the transition-probability formulation. We confirm that a numerical implementation of the proposed method demonstrates two major improvements: accelerated relaxation to the predetermined distribution and a reduced correlation time between two different realizations in the steady state.
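As a rough illustration of the idea in the abstract above, the sketch below integrates overdamped Langevin dynamics for a two-dimensional quadratic potential and adds a rotational force built from an antisymmetric matrix. In continuous time, a term of this form leaves the Gibbs-Boltzmann distribution unchanged while breaking detailed balance. This is one textbook way to violate detailed balance, not necessarily the construction used by the authors; the potential, the matrix `J`, the parameter `gamma`, and the function names are all assumptions made for illustration.

```python
import math
import random


def drift(pos, gamma):
    """Drift for U(x, y) = (x^2 + y^2) / 2 plus a rotational term.

    The natural force is -grad U. The extra term -gamma * J @ grad U uses the
    antisymmetric matrix J = [[0, 1], [-1, 0]] (an illustrative assumption).
    """
    x, y = pos
    gx, gy = x, y          # gradient of the quadratic potential
    jx, jy = gy, -gx       # J @ grad U
    return (-(gx + gamma * jx), -(gy + gamma * jy))


def simulate(gamma, n_steps, dt=0.01, temperature=1.0, seed=0):
    """Euler-Maruyama integration; gamma = 0 recovers ordinary Langevin dynamics."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * temperature * dt)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        fx, fy = drift((x, y), gamma)
        x += fx * dt + noise_scale * rng.gauss(0.0, 1.0)
        y += fy * dt + noise_scale * rng.gauss(0.0, 1.0)
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    for gamma in (0.0, 1.0):
        pts = simulate(gamma, 50000)
        var_x = sum(p[0] ** 2 for p in pts) / len(pts)
        # Both runs should give a variance near temperature = 1, up to
        # discretization and sampling error.
        print(f"gamma={gamma}: sample variance of x = {var_x:.3f}")
```

Comparing the runs with `gamma = 0.0` and `gamma = 1.0` shows that the target variance is approximately preserved in both cases. Measuring relaxation or correlation times would require additional diagnostics that are not sketched here.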
On Machine Learning towards Predictive Sales Pipeline Analytics
Yan, Junchi (East China Normal University) | Zhang, Chao (Shanghai Jiao Tong University) | Zha, Hongyuan (IBM Research - China) | Gong, Min (East China Normal University) | Sun, Changhua (IBM Research - China) | Huang, Jin (IBM Research - China) | Chu, Stephen (IBM Research - China) | Yang, Xiaokang (IBM Research - China)
Sales pipeline win-propensity prediction is fundamental to effective sales management. In contrast to using subjective human rating, we propose a modern machine learning paradigm to estimate the win-propensity of sales leads over time. A profile-specific two-dimensional Hawkes process model is developed to capture the influence of sellers' activities on their leads toward the win outcome, coupled with the leads' personalized profiles. It is motivated by two observations: i) sellers tend to focus their selling activities and efforts on a few leads during a relatively short time, as evidenced by their concentrated interactions with the pipeline, including login, browsing, and updating the sales leads, all of which are logged by the system; ii) a pending opportunity is prone to reach its win outcome shortly after such temporally concentrated interactions. Our model is deployed and in continual use at a large, global, B2B multinational technology enterprise (Fortune 500), and we present a case study. Owing to its generality and flexibility, the model is also potentially applicable to other real-world problems.
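To make the second observation concrete, the sketch below evaluates a Hawkes-style conditional intensity for the win outcome, in which each past seller activity adds an exponentially decaying boost. It covers only the cross-excitation from activities to wins, not the full profile-specific two-dimensional model. The exponential kernel, the function name, and the parameters `mu`, `alpha`, and `beta` are illustrative assumptions rather than details taken from the paper.

```python
import math


def win_intensity(t, activity_times, mu=0.05, alpha=0.4, beta=1.5):
    """Intensity of the win event at time t (hypothetical parameterization).

    lambda(t) = mu + sum over activities s < t of alpha * exp(-beta * (t - s))

    mu    : baseline rate without any seller activity
    alpha : jump in intensity caused by one activity
    beta  : decay rate of that influence
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in activity_times if s < t
    )
    return mu + excitation


if __name__ == "__main__":
    concentrated = [9.0, 9.2, 9.5, 9.8]   # a burst of recent interactions
    spread_out = [1.0, 4.0, 7.0, 9.8]     # same count, spread over time
    print("concentrated:", win_intensity(10.0, concentrated))
    print("spread out:  ", win_intensity(10.0, spread_out))
```

With the same number of logged interactions, the temporally concentrated sequence yields the higher intensity at `t = 10.0`, which mirrors the motivation that wins tend to follow bursts of activity.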
Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation
Low, Kian Hsiang (National University of Singapore) | Yu, Jiangbo (National University of Singapore) | Chen, Jie (Singapore-MIT Alliance for Research and Technology) | Jaillet, Patrick (Massachusetts Institute of Technology)
The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.
A Sparse Combined Regression-Classification Formulation for Learning a Physiological Alternative to Clinical Post-Traumatic Stress Disorder Scores
Brown, Sarah Marie (Northeastern University and Charles Stark Draper Laboratory) | Webb, Andrea (Charles Stark Draper Laboratory) | Mangoubi, Rami (Charles Stark Draper Laboratory) | Dy, Jennifer (Northeastern University)
Current diagnostic methods for mental pathologies, including Post-Traumatic Stress Disorder (PTSD), involve a clinician-coded interview, which can be subjective. Heart rate and skin conductance, as well as other peripheral physiology measures, have previously shown utility in predicting binary diagnostic decisions. The binary decision problem is easier, but misses important information on the severity of the patient’s condition. This work utilizes a novel experimental set-up that exploits virtual reality videos and peripheral physiology for PTSD diagnosis. In pursuit of an automated physiology-based objective diagnostic method, we propose a learning formulation that integrates the description of the experimental data and expert knowledge on desirable properties of a physiological diagnostic score. From a list of desired criteria, we derive a new cost function that combines regression and classification while learning the salient features for predicting physiological score. The physiological score produced by Sparse Combined Regression-Classification (SCRC) is assessed with respect to three sets of criteria chosen to reflect design goals for an objective, physiological PTSD score: parsimony and context of selected features, diagnostic score validity, and learning generalizability. For these criteria, we demonstrate that Sparse Combined Regression-Classification performs better than more generic learning approaches.
Intelligent Agents for Rehabilitation and Care of Disabled and Chronic Patients
Kraus, Sarit (Bar-Ilan University)
The number of people with disabilities is continuously increasing. Providing patients who have disabilities with the rehabilitation and care necessary to allow them good quality of life creates overwhelming demands for health and rehabilitation services. We suggest that advancements in intelligent agent technology provide new opportunities for improving the provided services. We will discuss the challenges of building an agent for the health care domain and present four capabilities that are required for an agent in the health care domain: planning, monitoring, intervention and encouragement. We will discuss the importance of personalizing all of them and the need to facilitate cooperation between the automated agent and the human caregivers. We will review recent technology that can be used toward the development of agents that can have these capabilities and their promise in automating services such as physiotherapy, speech therapy and cognitive training.
Eigenvalues Ratio for Kernel Selection of Kernel Methods
Liu, Yong (Tianjin University) | Liao, Shizhong (Tianjin University)
The selection of the kernel function, which determines the mapping between the input space and the feature space, is of crucial importance to kernel methods. Existing kernel selection approaches commonly use some measure of generalization error, which is usually difficult to estimate and has a slow convergence rate. In this paper, we propose a novel measure, called eigenvalues ratio (ER), of the tight bound of generalization error for kernel selection. ER is the ratio between the sum of the main eigenvalues and that of the tail eigenvalues of the kernel matrix. Different from most existing measures, ER is defined on the kernel matrix, so it can be estimated easily from the available training data, which makes it usable for kernel selection. We establish tight ER-based generalization error bounds of order $O(\frac{1}{n})$ for several kernel-based methods under certain general conditions, while for most existing measures, the convergence rate is at most $O(\frac{1}{\sqrt{n}})$. Finally, to guarantee good generalization performance, we propose a novel kernel selection criterion by minimizing the derived tight generalization error bounds. Theoretical analysis and experimental results demonstrate that our kernel selection criterion is a good choice for kernel selection.
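A minimal sketch of the ratio described above, assuming the eigenvalues of the kernel matrix are already available (for example from `numpy.linalg.eigvalsh`). It reads "ratio between the main and the tail eigenvalues" as main divided by tail and treats the cutoff `t` between the two groups as a free parameter. Both choices, along with the function name, are assumptions and may differ from the paper's exact definition.

```python
def eigenvalues_ratio(eigenvalues, t):
    """Sum of the t largest eigenvalues divided by the sum of the rest.

    eigenvalues : eigenvalues of a (positive semidefinite) kernel matrix
    t           : hypothetical cutoff separating "main" from "tail"
    """
    if not 0 < t < len(eigenvalues):
        raise ValueError("t must satisfy 0 < t < number of eigenvalues")
    ordered = sorted(eigenvalues, reverse=True)
    main, tail = sum(ordered[:t]), sum(ordered[t:])
    if tail <= 0:
        raise ValueError("tail eigenvalues must have a positive sum")
    return main / tail


if __name__ == "__main__":
    # Spectrum 4, 2, 1, 1 with cutoff t = 2: (4 + 2) / (1 + 1)
    print(eigenvalues_ratio([1.0, 4.0, 2.0, 1.0], 2))
```

Because the quantity depends only on the spectrum of the kernel matrix, it can be evaluated from training data alone, which is the property the abstract emphasizes.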
Noise-Robust Semi-Supervised Learning by Large-Scale Sparse Coding
Lu, Zhiwu (Renmin University of China) | Gao, Xin (King Abdullah University of Science and Technology) | Wang, Liwei (Peking University) | Wen, Ji-Rong (Renmin University of China) | Huang, Songfang (IBM China Research Lab)
This paper presents a large-scale sparse coding algorithm to deal with the challenging problem of noise-robust semi-supervised learning over very large data with only a few noisy initial labels. By giving an L1-norm formulation of Laplacian regularization directly based upon the manifold structure of the data, we transform noise-robust semi-supervised learning into a generalized sparse coding problem so that noise reduction can be imposed upon the noisy initial labels. Furthermore, to preserve the scalability of noise-robust semi-supervised learning over very large data, we make use of both nonlinear approximation and dimension reduction techniques to solve this generalized sparse coding problem in linear time and space complexity. Finally, we evaluate the proposed algorithm in the challenging task of large-scale semi-supervised image classification with only a few noisy initial labels. The experimental results on several benchmark image datasets show the promising performance of the proposed algorithm.
Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation
Pirotta, Matteo (Politecnico di Milano) | Parisi, Simone (Politecnico di Milano) | Restelli, Marcello (Politecnico di Milano)
This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Differently from previous policy-gradient multi-objective algorithms, where n optimization routines are used to have n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier. The idea is to exploit a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy parameter space so that the corresponding image in the objective space gets as close as possible to the Pareto frontier. Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two interesting MOMDPs.
Sense-Aware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia
Wu, Zhaohui (The Pennsylvania State University) | Giles, C. Lee (The Pennsylvania State University)
Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy. The "sense-specific" prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness for both isolated words and words in sentential contexts and word sense induction demonstrate its effectiveness.
Low-Rank Similarity Metric Learning in High Dimensions
Liu, Wei (IBM T. J. Watson Research Center) | Mu, Cun (Columbia University) | Ji, Rongrong (Xiamen University) | Ma, Shiqian (The Chinese University of Hong Kong) | Smith, John R. (IBM T. J. Watson Research Center) | Chang, Shih-Fu (Columbia University)
Metric learning has become a widely used tool in machine learning. To reduce the expensive costs brought in by increasing dimensionality, low-rank metric learning arises as it can be more economical in storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, and is therefore applicable to high-dimensional data domains. A convex objective free of heuristics is formulated by leveraging trace norm regularization to promote low-rankness. Crucially, we prove that all globally optimal metric solutions must retain a certain low-rank structure, which enables our algorithm to decompose the high-dimensional learning task into two steps: an SVD-based projection and a metric learning problem with reduced dimensionality. The latter step can be tackled efficiently by employing a linearized Alternating Direction Method of Multipliers. The efficacy of the proposed algorithm is demonstrated through experiments performed on four benchmark datasets with tens of thousands of dimensions.
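The sketch below shows why a low-rank metric keeps a bilinear similarity cheap in high dimensions. It assumes the learned matrix can be written as M = U U^T with a d x r factor U, so that x^T M y can be evaluated through two r-dimensional projections without forming the d x d matrix. The factorization, the function names, and the toy numbers are illustrative assumptions; the learning procedure itself (trace norm regularization, linearized ADMM) is not shown.

```python
def project(U, x):
    """Compute U^T x for a d x r matrix U stored as a list of d rows."""
    r = len(U[0])
    return [sum(U[i][k] * x[i] for i in range(len(x))) for k in range(r)]


def bilinear_similarity(U, x, y):
    """Similarity x^T M y with M = U U^T, evaluated in O(d * r) time."""
    px, py = project(U, x), project(U, y)
    return sum(a * b for a, b in zip(px, py))


if __name__ == "__main__":
    # Toy example with d = 3 input dimensions and rank r = 2.
    U = [[1, 0],
         [0, 1],
         [1, 1]]
    x = [1, 2, 0]
    y = [0, 1, 1]
    print(bilinear_similarity(U, x, y))  # both projections equal [1, 2], so 1*1 + 2*2 = 5
```

Storing only `U` takes space proportional to d times r, which is consistent with the linear scaling in input dimensionality that the abstract claims for the overall method.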