Goto

Collaborating Authors

 Optimization


Distant-Supervision of Heterogeneous Multitask Learning for Social Event Forecasting With Multilingual Indicators

AAAI Conferences

Open-source indicators such as social media can be very effective precursors for forecasting future societal events. As events are often preceded by social indicators generated by groups of people speaking many different languages, multiple languages need to be considered to ensure comprehensive event forecasting. However, this leads to several technical challenges for traditional models: 1) high dimension, sparsity, and redundancy of features; 2) translation correlation among the multilingual features. and 3) lack of language-wise supervision. In order to simultaneously address these issues, we present a novel model capable of distant-supervision of heterogeneous multitask learning (DHML) for multilingual spatial social event forecasting. This model maps the multilingual heterogeneous features into several latent semantic spaces and then enforces a similar sparsity pattern across them all, using distant supervision across all the languages involved. Optimizing this model creates a difficult problem that is nonconvex and nonsmooth that can then be decomposed into simpler subproblems using the Alternative Direction Multiplier of Methods (ADMM). A novel dynamic programming-based algorithm is proposed to solve one challenging subproblem efficiently. Theoretical properties of the proposed algorithm are analyzed. The results of extensive experiments on multiple real-world datasets are presented to demonstrate the effectiveness, efficiency, and interpretability of the proposed approach.


EMD Metric Learning

AAAI Conferences

Earth Mover's Distance (EMD), targeting at measuring the many-to-many distances, has shown its superiority and been widely applied in computer vision tasks, such as object recognition, hyperspectral image classification and gesture recognition. However, there is still little effort concentrated on optimizing the EMD metric towards better matching performance. To tackle this issue, we propose an EMD metric learning algorithm in this paper. In our method, the objective is to learn a discriminative distance metric for EMD ground distance matrix generation which can better measure the similarity between compared subjects. More specifically, given a group of labeled data from different categories, we first select a subset of training data and then optimize the metric for ground distance matrix generation. Here, both the EMD metric and the EMD flow-network are alternatively optimized until a steady EMD value can be achieved. This method is able to generate a discriminative ground distance matrix which can further improve the EMD distance measurement. We then apply our EMD metric learning method on two tasks, i.e., multi-view object classification and document classification. The experimental results have shown better performance of our proposed EMD metric learning method compared with the traditional EMD method and the state-of-the-art methods. It is noted that the proposed EMD metric learning method can be also used in other applications.


Training Set Debugging Using Trusted Items

AAAI Conferences

Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem.Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.


Beyond Link Prediction: Predicting Hyperlinks in Adjacency Space

AAAI Conferences

This paper addresses the hyperlink prediction problem in hypernetworks. Different from the traditional link prediction problem where only pairwise relations are considered as links, our task here is to predict the linkage of multiple nodes, i.e., hyperlink. Each hyperlink is a set of an arbitrary number of nodes which together form a multiway relationship. Hyperlink prediction is challenging---since the cardinality of a hyperlink is variable, existing classifiers based on a fixed number of input features become infeasible. Heuristic methods, such as the common neighbors and Katz index, do not work for hyperlink prediction, since they are restricted to pairwise similarities. In this paper, we formally define the hyperlink prediction problem, and propose a new algorithm called Coordinated Matrix Minimization (CMM), which alternately performs nonnegative matrix factorization and least square matching in the vertex adjacency space of the hypernetwork, in order to infer a subset of candidate hyperlinks that are most suitable to fill the training hypernetwork. We evaluate CMM on two novel tasks: predicting recipes of Chinese food, and finding missing reactions of metabolic networks. Experimental results demonstrate the superior performance of our method over many seemingly promising baselines.


Partial Multi-Label Learning

AAAI Conferences

It is expensive and difficult to precisely annotate objects with multiple labels. Instead, in many real tasks, annotators may roughly assign each object with a set of candidate labels. The candidate set contains at least one but unknown number of ground-truth labels, and is usually adulterated with some irrelevant labels. In this paper, we formalize such problems as a new learning framework called partial multi-label learning (PML). To solve the PML problem, a confidence value is maintained for each candidate label to estimate how likely it is a ground-truth label of the instance. On one hand, the relevance ordering of labels on each instance is optimized by minimizing a rank loss weighted by the confidences; on the other hand, the confidence values are optimized by further exploiting structure information in feature and label spaces.Experimental results on various datasets show that the proposed approach is effective for solving PML problems.


On the ERM Principle With Networked Data

AAAI Conferences

Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results of this problem only deal with the classical empirical risk minimization (ERM) principle that always weights every example equally, but this strategy leads to unsatisfactory bounds. We consider general weighted ERM and show new universal risk bounds for this problem. These new bounds naturally define an optimization problem which leads to appropriate weights for networked examples. Though this optimization problem is not convex in general, we devise a new fully polynomial-time approximation scheme (FPTAS) to solve it.


Bayesian Functional Optimization

AAAI Conferences

Bayesian optimization (BayesOpt) is a derivative-free approach for sequentially optimizing stochastic black-box functions. Standard BayesOpt, which has shown many successes in machine learning applications, assumes a finite dimensional domain which often is a parametric space. The parameter space is defined by the features used in the function approximations which are often selected manually. Therefore, the performance of BayesOpt inevitably depends on the quality of chosen features. This paper proposes a new Bayesian optimization framework that is able to optimize directly on the domain of function spaces. The resulting framework, Bayesian Functional Optimization (BFO), not only extends the application domains of BayesOpt to functional optimization problems but also relaxes the performance dependency on the chosen parameter space. We model the domain of functions as a reproducing kernel Hilbert space (RKHS), and use the notion of Gaussian processes on a real separable Hilbert space. As a result, we are able to define traditional improvement-based (PI and EI) and optimistic acquisition functions (UCB) as functionals. We propose to optimize the acquisition functionals using analytic functional gradients that are also proved to be functions in a RKHS. We evaluate BFO in three typical functional optimization tasks: i) a synthetic functional optimization problem, ii) optimizing activation functions for a multi-layer perceptron neural network, and iii) a reinforcement learning task whose policies are modeled in RKHS.


Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling

AAAI Conferences

Deep neural networks (DNNs) are powerful machine learning models and have succeeded in various artificial intelligence tasks. Although various architectures and modules for the DNNs have been proposed, selecting and designing the appropriate network structure for a target problem is a challenging task. In this paper, we propose a method to simultaneously optimize the network structure and weight parameters during neural network training. We consider a probability distribution that generates network structures, and optimize the parameters of the distribution instead of directly optimizing the network structure. The proposed method can apply to the various network structure optimization problems under the same framework. We apply the proposed method to several structure optimization problems such as selection of layers, selection of unit types, and selection of connections using the MNIST, CIFAR-10, and CIFAR-100 datasets. The experimental results show that the proposed method can find the appropriate and competitive network structures.


Compact Multi-Label Learning

AAAI Conferences

Embedding methods have shown promising performance in multi-label prediction, as they can discover the dependency of labels. Most embedding methods cannot well align the input and output, which leads to degradation in prediction performance. Besides, they suffer from expensive prediction computational costs when applied to large-scale datasets. To address the above issues, this paper proposes a Co-Hashing (CoH) method by formulating multi-label learning from the perspective of cross-view learning. CoH first regards the input and output as two views, and then aims to learn a common latent hamming space, where input and output pairs are compressed into compact binary embeddings. CoH enjoys two key benefits: 1) the input and output can be well aligned, and their correlations are explored; 2) the prediction is very efficient using fast cross-view kNN search in the hamming space. Moreover, we provide the generalization error bound for our method. Extensive experiments on eight real-world datasets demonstrate the superiority of the proposed CoH over the state-of-the-art methods in terms of both prediction accuracy and efficiency.


SAGA: A Submodular Greedy Algorithm for Group Recommendation

AAAI Conferences

In this paper, we propose a unified framework and an algorithm for the problem of group recommendation where a fixed number of items or alternatives can be recommended to a group of users. The problem of group recommendation arises naturally in many real world contexts, and is closely related to the budgeted social choice problem studied in economics. We frame the group recommendation problem as choosing a subgraph with the largest group consensus score in a completely connected graph defined over the item affinity matrix. We propose a fast greedy algorithm with strong theoretical guarantees, and show that the proposed algorithm compares favorably to the state-of-the-art group recommendation algorithms according to commonly used relevance and coverage performance measures on benchmark dataset.