Goto

Collaborating Authors

 Optimization


An Efficient, Expressive and Local Minima-Free Method for Learning Controlled Dynamical Systems

AAAI Conferences

We propose a framework for modeling and estimating the state of controlled dynamical systems, where an agent can affect the system through actions and receives partial observations. Based on this framework, we propose Predictive State Representation with Random Fourier Features (RFF-PSR). A key property in RFF-PSRs is that the state estimate is represented by a conditional distribution of future observations given future actions. RFFPSRs combine this representation with moment-matching, kernel embedding, and local optimization to achieve a method that enjoys several favorable qualities: It can represent controlled environments which can be affected by actions, it has an efficient and theoretically justified learning algorithm, it uses a non-parametric representation that has expressive power to represent continuous non-linear dynamics. We provide a detailed formulation, a theoretical analysis and an experimental evaluation that demonstrates the effectiveness of our method.


Approximate and Exact Enumeration of Rule Models

AAAI Conferences

In machine learning, rule models are one of the most popular choices when model interpretability is the primary concern. Ordinary, a single model is obtained by solving an optimization problem, and the resulting model is interpreted as the one that best explains the data. In this study, instead of finding a single rule model, we propose algorithms for enumerating multiple rule models. Model enumeration is useful in practice when (i) users want to choose a model that is particularly suited to their task knowledge, or (ii) users want to obtain several possible mechanisms that could be underlying the data to use as hypotheses for further scientific studies. To this end, we propose two enumeration algorithms: an approximate algorithm and an exact algorithm. We prove that these algorithms can enumerate models in a descending order of their objective function values approximately and exactly. We then confirm our theoretical results through experiments on real-world data. We also show that, by using the proposed enumeration algorithms, we can find several different models of almost equal quality.


Inexact Proximal Gradient Methods for Non-Convex and Non-Smooth Optimization

AAAI Conferences

In machine learning research, the proximal gradient methods are popular for solving various optimization problems with non-smooth regularization. Inexact proximal gradient methods are extremely important when exactly solving the proximal operator is time-consuming, or the proximal operator does not have an analytic solution. However, existing inexact proximal gradient methods only consider convex problems. The knowledge of inexact proximal gradient methods in the non-convex setting is very limited. To address this challenge, in this paper, we first propose three inexact proximal gradient algorithms, including the basic version and Nesterovโ€™s accelerated version. After that, we provide the theoretical analysis to the basic and Nesterovโ€™s accelerated versions. The theoretical results show that our inexact proximal gradient algorithms can have the same convergence rates as the ones of exact proximal gradient algorithms in the non-convex setting. Finally, we show the applications of our inexact proximal gradient algorithms on three representative non-convex learning problems. Empirical results confirm the superiority of our new inexact proximal gradient algorithms.


Incomplete Label Multi-Task Ordinal Regression for Spatial Event Scale Forecasting

AAAI Conferences

Event scales are commonly used by practitioners to gauge subjective feelings on the magnitude and significance of social events. For example, the Centers for Disease Control and Prevention (CDC) utilizes a 10-level scale to distinguish the severity of flu outbreaks and governments typically categorize violent outbreaks based on their intensity as reflected in multiple aspects. Effective forecasting of future event scales can be used qualitatively to determine reasonable resource allocations and facilitate accurate proactive actions by practitioners. Existing spatial event forecasting methods typically focus on the occurrence of events rather than their ordinal event scales as this is very challenging in several respects, including 1) the ordinal nature of the event scale, 2) the spatial heterogeneity of event scaling in different geo-locations, 3) the incompleteness of scale label data for some spatial locations, and 4) the spatial correlation of event scale patterns. In order to address all these challenges concurrently, a MultI-Task Ordinal Regression (MITOR) framework is proposed to effectively forecast the scale of future events. Our model enforces similar feature sparsity patterns for different tasks while preserving the heterogeneity in their scale patterns. In addition, based on the first law of geography, we proposed to enforce spatially-closed tasks to share similar scale patterns with theoretical guarantees. Optimizing the proposed model amounts to a new non-convex and non-smooth problem with an isotonicity constraint, which is then solved by our new algorithm based on ADMM and dynamic programming. Extensive experiments on ten real-world datasets demonstrate the effectiveness and efficiency of the proposed model.


Lagrangian Constrained Community Detection

AAAI Conferences

Semi-supervised or constrained community detection incorporates side information to findcommunities of interest in complex networks. The supervision is often represented as constraints such as known labels and pairwise constraints. Existing constrained community detection approaches often fail to fully benefit from the available side information. This results in poor performance for scenarios such as: when the constraints are required to be fully satisfied, when there is a high confidence about the correctness of the supervision information, and in situations where the side information is expensive or hard to achieve and is only available in a limited amount. In this paper, we propose a new constrained community detection algorithm based on Lagrangian multipliers to incorporate and fully satisfy the instance level supervisio nconstraints. Our proposed algorithm can more fully utilise available side information and find better quality solutions. Our experiments on real and synthetic data sets show our proposed LagCCD algorithm outperforms existing algorithms in terms of solution quality, ability to satisfy the constraints and noise resistance.


Decomposition Strategies for Constructive Preference Elicitation

AAAI Conferences

We tackle the problem of constructive preference elicitation, that is the problem of learning user preferences over verylarge decision problems, involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning userpreferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.


Graph Scan Statistics With Uncertainty

AAAI Conferences

Scan statistics is one of the most popular approaches for anomaly detection in spatial and network data. In practice, there are numerous sources of uncertainty in the observed data. However, most prior works have overlooked such uncertainty, which can affect the accuracy and inferences of such methods. In this paper, we develop the first systematic approach to incorporating uncertainty in scan statistics. We study two formulations for robust scan statistics, one based on the sample average approximation and the other using a max-min objective. We show that uncertainty significantly increases the computational complexity of these problems. Rigorous algorithms and efficient heuristics for both formulations are developed with justification of theoretical bounds. We evaluate our proposed methods on synthetic and real datasets, and we observe that our methods give significant improvement in the detection power as well as optimization objective, relative to a baseline.


Data Poisoning Attacks on Multi-Task Relationship Learning

AAAI Conferences

Multi-task learning (MTL) is a machine learning paradigm that improves the performance of each task by exploiting useful information contained in multiple related tasks. However, the relatedness of tasks can be exploited by attackers to launch data poisoning attacks, which has been demonstrated a big threat to single-task learning. In this paper, we provide the first study on the vulnerability of MTL. Specifically, we focus on multi-task relationship learning (MTRL) models, a popular subclass of MTL models where task relationships are quantized and are learned directly from training data. We formulate the problem of computing optimal poisoning attacks on MTRL as a bilevel program that is adaptive to arbitrary choice of target tasks and attacking tasks. We propose an efficient algorithm called PATOM for computing optimal attack strategies. PATOM leverages the optimality conditions of the subproblem of MTRL to compute the implicit gradients of the upper level objective function. Experimental results on real-world datasets show that MTRL models are very sensitive to poisoning attacks and the attacker can significantly degrade the performance of target tasks, by either directly poisoning the target tasks or indirectly poisoning the related tasks exploiting the task relatedness. We also found that the tasks being attacked are always strongly correlated, which provides a clue for defending against such attacks.


Latent Sparse Modeling of Longitudinal Multi-Dimensional Data

AAAI Conferences

We propose a tensor-based approach to analyze multi-dimensional data describing sample subjects. It simultaneously discovers patterns in features and reveals past temporal points that have impact on current outcomes. The model coefficient, a k-mode tensor, is decomposed into a summation of k tensors of the same dimension. To accomplish feature selection, we introduce the tensor '"atent L F,1 norm" as a grouped penalty in our formulation. Furthermore, the proposed model takes into account within-subject correlations by developing a tensor-based quadratic inference function. We provide an asymptotic analysis of our model when the sample size approaches to infinity. To solve the corresponding optimization problem, we develop a linearized block coordinate descent algorithm and prove its convergence for a fixed sample size. Computational results on synthetic datasets and real-file fMRI and EEG problems demonstrate the superior performance of the proposed approach over existing techniques.


Fair Inference on Outcomes

AAAI Conferences

In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are "sensitive," in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.