Statistical Learning
Cross-Modal Similarity Learning via Pairs, Preferences, and Active Supervision
Zhen, Yi (Georgia Institute of Technology) | Rai, Piyush (Duke University) | Zha, Hongyuan (Georgia Institute of Technology) | Carin, Lawrence (Duke University)
We present a probabilistic framework for learning pairwise similarities between objects belonging to different modalities, such as drugs and proteins, or text and images. Our framework is based on learning a binary code based representation for objects in each modality, and has the following key properties: (i) it can leverage both pairwise as well as easy-to-obtain relative preference based cross-modal constraints, (ii) the probabilistic framework naturally allows querying for the most useful/informative constraints, facilitating an active learning setting (existing methods for cross-modal similarity learning do not have such a mechanism), and (iii) the binary code length is learned from the data. We demonstrate the effectiveness of the proposed approach on two problems that require computing pairwise similarities between cross-modal object pairs: cross-modal link prediction in bipartite graphs, and hashing based cross-modal similarity search.
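As a rough illustration of how binary codes support cross-modal similarity search, the sketch below scores two codes by the fraction of bits on which they agree and ranks candidates from one modality against a query from the other. The function names and the toy codes are hypothetical; the abstract does not specify the similarity function or how the codes are learned, so this only shows the lookup step under that assumption.

```python
from typing import List, Sequence


def hamming_similarity(code_a: Sequence[int], code_b: Sequence[int]) -> float:
    """Fraction of bit positions on which two equal-length binary codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    if len(code_a) == 0:
        raise ValueError("codes must be non-empty")
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


def rank_candidates(query_code: Sequence[int],
                    candidate_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return candidate indices ordered by decreasing similarity to the query.

    Ties are broken by the lower index so the ordering is deterministic.
    """
    scored = [(hamming_similarity(query_code, code), index)
              for index, code in enumerate(candidate_codes)]
    return [index for _, index in sorted(scored, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    # Hypothetical codes: one query (e.g. a text) and three candidates (e.g. images).
    query = [1, 0, 1, 1]
    candidates = [[0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 0, 1]]
    print(rank_candidates(query, candidates))
```

Because the comparison is bitwise, the cost per candidate grows only with the code length, which is one reason hashing-based retrieval is attractive for large collections.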
Active Manifold Learning via Gershgorin Circle Guided Sample Selection
Xu, Hongteng (Georgia Institute of Technology) | Zha, Hongyuan (Georgia Institute of Technology and East China Normal University) | Li, Ren-Cang (University of Texas at Arlington) | Davenport, Mark A. (Georgia Institute of Technology)
In this paper, we propose an interpretation of active learning from a purely algebraic view and combine it with semi-supervised manifold learning. The proposed active manifold learning algorithm aims to learn the low-dimensional parameter space of the manifold with high accuracy from carefully selected labeled samples. We demonstrate that this problem is equivalent to minimizing the condition number of the alignment matrix. Focusing on this problem, we first give a theoretical upper bound for the solution. Then we develop a heuristic but effective sample selection algorithm with the help of the Gershgorin circle theorem. We investigate the rationality, feasibility, universality, and complexity of the proposed method and demonstrate that it yields encouraging active learning results.
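To make the link between the Gershgorin circle theorem and condition numbers concrete, here is a minimal sketch. It is not the paper's sample selection algorithm. For a real symmetric matrix, every eigenvalue lies in some interval centered at a diagonal entry with radius equal to the sum of absolute off-diagonal entries in that row. If all intervals are strictly positive, the ratio of the largest upper end to the smallest lower end bounds the 2-norm condition number from above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def gershgorin_intervals(matrix: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return one (low, high) interval per row of a real symmetric matrix.

    Each interval is centered at the diagonal entry; its radius is the sum of
    the absolute values of the off-diagonal entries in the same row.
    """
    intervals = []
    for i, row in enumerate(matrix):
        radius = sum(abs(value) for j, value in enumerate(row) if j != i)
        intervals.append((row[i] - radius, row[i] + radius))
    return intervals


def condition_number_bound(matrix: Sequence[Sequence[float]]) -> float:
    """Upper bound on lambda_max / lambda_min for a symmetric matrix.

    Returns infinity when the intervals do not exclude zero, because the
    theorem then gives no finite bound.
    """
    intervals = gershgorin_intervals(matrix)
    lowest = min(low for low, _ in intervals)
    highest = max(high for _, high in intervals)
    if lowest <= 0:
        return float("inf")
    return highest / lowest


if __name__ == "__main__":
    example = [[4.0, 1.0], [1.0, 3.0]]
    print(gershgorin_intervals(example))
    print(condition_number_bound(example))
```

The bound is cheap to evaluate because it needs only row sums, which suggests why a Gershgorin-based heuristic can guide selection without repeated eigendecompositions.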
Clustering Longitudinal Clinical Marker Trajectories from Electronic Health Data: Applications to Phenotyping and Endotype Discovery
Schulam, Peter (Johns Hopkins University) | Wigley, Fredrick (Johns Hopkins School of Medicine) | Saria, Suchi (Johns Hopkins University)
Diseases such as autism, cardiovascular disease, and the autoimmune disorders are difficult to treat because of the remarkable degree of variation among affected individuals. Subtyping research seeks to refine the definition of such complex, multi-organ diseases by identifying homogeneous patient subgroups. In this paper, we propose the Probabilistic Subtyping Model (PSM) to identify subgroups based on clustering individual clinical severity markers. This task is challenging due to the presence of nuisance variability (variations in measurements that are not due to disease subtype), which, if not accounted for, generates biased estimates for the group-level trajectories. Measurement sparsity and irregular sampling patterns pose additional challenges in clustering such data. PSM uses a hierarchical model to account for these different sources of variability. Our experiments demonstrate that by accounting for nuisance variability, PSM is able to more accurately model the marker data. We also discuss novel subtypes discovered using PSM and the resulting clinical hypotheses that are now the subject of follow-up clinical experiments.
Obtaining Well Calibrated Probabilities Using Bayesian Binning
Naeini, Mahdi Pakdaman (University of Pittsburgh) | Cooper, Gregory (University of Pittsburgh) | Hauskrecht, Milos (University of Pittsburgh)
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in artificial intelligence. In this paper we present a new nonparametric calibration method called Bayesian Binning into Quantiles (BBQ) which addresses key limitations of existing calibration methods. The method post-processes the output of a binary classification algorithm; thus, it can be readily combined with many existing classification algorithms. However, model calibration and the learning of well-calibrated probabilistic models have not been studied in the machine learning literature as extensively as, for example, discriminative machine learning models that are built to achieve the best possible discrimination among classes of objects. One way to achieve a high level of model calibration is to develop methods for learning probabilistic ...
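For readers unfamiliar with binning-based calibration, the following sketch shows the simpler building block of post-processing classifier scores with a single equal-frequency (quantile) binning. It is not BBQ itself: the Bayesian treatment of binnings described in the paper is not reproduced here, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def fit_quantile_binning(scores: Sequence[float], labels: Sequence[int],
                         num_bins: int) -> Tuple[List[float], List[float]]:
    """Split sorted scores into equal-frequency bins and record each bin's positive rate.

    Returns (edges, rates): edges[k] is the smallest score in bin k + 1, and
    rates[k] is the fraction of positive labels among training points in bin k.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    if not 1 <= num_bins <= n:
        raise ValueError("num_bins must be between 1 and the number of samples")
    edges: List[float] = []
    rates: List[float] = []
    for b in range(num_bins):
        chunk = pairs[b * n // num_bins:(b + 1) * n // num_bins]
        rates.append(sum(label for _, label in chunk) / len(chunk))
        if b > 0:
            edges.append(chunk[0][0])  # lower boundary of this bin
    return edges, rates


def calibrate(score: float, edges: Sequence[float], rates: Sequence[float]) -> float:
    """Map a raw classifier score to the positive rate of the bin it falls into."""
    return rates[bisect_right(edges, score)]


if __name__ == "__main__":
    raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    true_labels = [0, 0, 1, 0, 1, 0, 1, 1]
    bin_edges, bin_rates = fit_quantile_binning(raw_scores, true_labels, num_bins=2)
    print(bin_edges, bin_rates, calibrate(0.35, bin_edges, bin_rates))
```

Because the mapping only consumes scores and labels, it can sit behind any binary classifier, which matches the post-processing property the abstract emphasizes. The choice of `num_bins` is the main sensitivity of this single-binning approach.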
Outlier-Robust Convex Segmentation
Katz, Itamar (Technion Israel Institute of Technology) | Crammer, Koby (Technion Israel Institute of Technology)
We derive a convex optimization problem for the task of segmenting sequential data, which explicitly treats the presence of outliers. We describe two algorithms for solving this problem, one exact and one a novel top-down approach, and we derive a consistency result for the case of two segments and no outliers. Robustness to outliers is evaluated on two real-world tasks related to speech segmentation. Our algorithms outperform baseline segmentation algorithms.
Spectral Clustering Using Multilinear SVD: Analysis, Approximations and Applications
Ghoshdastidar, Debarghya (Indian Institute of Science, Bangalore) | Dukkipati, Ambedkar (Indian Institute of Science, Bangalore)
Spectral clustering, a graph partitioning technique, has gained immense popularity in machine learning in the context of unsupervised learning. This is due to convincing empirical studies, the elegant approaches involved, and the theoretical guarantees provided in the literature. Recently, challenging problems in computer vision and other areas have created a need for spectral methods that incorporate multi-way similarity measures. This, in turn, leads to a hypergraph partitioning problem. In this paper, we formulate a criterion for partitioning uniform hypergraphs, and show that a relaxation of this problem is related to the multilinear singular value decomposition (SVD) of symmetric tensors. Using this, we provide a spectral technique for clustering based on higher order affinities, and derive a theoretical bound on the error incurred by this method. We also study the complexity of the algorithm and use Nyström's method and column sampling techniques to develop approximate methods with significantly reduced complexity. Experiments on geometric grouping and motion segmentation demonstrate the practical significance of the proposed methods.
Optimizing Bag Features for Multiple-Instance Retrieval
Fu, Zhouyu (University of Western Sydney, Kingswood) | Pan, Feifei (New York Institute of Technology) | Deng, Cheng (Xidian University) | Liu, Wei (IBM T. J. Watson Research Center)
Multiple-Instance (MI) learning is an important supervised learning technique which deals with collections of instances called bags. While existing research in MI learning has mainly focused on classification, in this paper we propose a new approach for MI retrieval to enable effective similarity retrieval of bags of instances, where training data is presented in the form of similar and dissimilar bag pairs. An embedding scheme is devised that encodes each bag into a single bag feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specific to similarity retrieval by leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.
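One common way to turn a bag into a single vector is to record, for each of a set of prototype instances, the highest similarity achieved by any instance in the bag. The sketch below assumes a Gaussian similarity and hand-picked prototypes. Both are illustrative assumptions, since the abstract does not state which similarity-based transformation the authors use or how the features are then optimized.

```python
import math
from typing import List, Sequence


def bag_feature(bag: Sequence[Sequence[float]],
                prototypes: Sequence[Sequence[float]],
                gamma: float = 1.0) -> List[float]:
    """Encode a bag of instances as one vector with one entry per prototype.

    Entry k is the largest Gaussian similarity exp(-gamma * squared distance)
    between prototype k and any instance in the bag.
    """
    if len(bag) == 0:
        raise ValueError("a bag must contain at least one instance")
    features = []
    for prototype in prototypes:
        best = max(
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(instance, prototype)))
            for instance in bag
        )
        features.append(best)
    return features


if __name__ == "__main__":
    # Hypothetical bag with two 2-D instances and two prototype instances.
    example_bag = [(0.0, 0.0), (1.0, 1.0)]
    example_prototypes = [(0.0, 0.0), (3.0, 3.0)]
    print(bag_feature(example_bag, example_prototypes))
```

Once every bag is a fixed-length vector, bag-to-bag retrieval reduces to comparing ordinary vectors, which is the single-instance conversion the abstract describes.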
Modelling Class Noise with Symmetric and Asymmetric Distributions
Du, Jun (China University of Geosciences) | Cai, Zhihua (China University of Geosciences)
In classification problems, we assume that the samples around the class boundary are more likely to be incorrectly annotated than others, and propose boundary-conditional class noise (BCN). Based on the BCN assumption, we use unnormalized Gaussian and Laplace distributions to directly model how class noise is generated, in symmetric and asymmetric cases. In addition, we demonstrate that logistic regression and probit regression can also be reinterpreted from this class noise perspective, and compare them with the proposed models. The empirical study shows that the proposed asymmetric models overall outperform the benchmark linear models, and the asymmetric Laplace-noise model achieves the best performance among all.
An Adaptive Gradient Method for Online AUC Maximization
Ding, Yi (Nanyang Technological University) | Zhao, Peilin (Institute for Infocomm Research) | Hoi, Steven C. H. (Singapore Management University) | Ong, Yew-Soon (Nanyang Technological University)
Learning to maximize AUC performance is an important research problem in machine learning. Unlike traditional batch learning methods for maximizing AUC, which often suffer from poor scalability, recent years have witnessed some emerging studies that attempt to maximize AUC by single-pass online learning approaches. Despite the encouraging results reported, the existing online AUC maximization algorithms often adopt simple stochastic gradient descent approaches, which fail to exploit the geometric knowledge of the data observed in the online learning process, and thus could suffer from relatively slow convergence. To overcome this limitation of the existing studies, we propose a novel algorithm, Adaptive Online AUC Maximization (AdaOAM), which applies an adaptive gradient method that exploits the knowledge of historical gradients to perform more informative online learning. The new adaptive updating strategy of AdaOAM is less sensitive to parameter settings due to its natural effect of tuning the learning rate. In addition, the time complexity of the new algorithm remains the same as that of the previous non-adaptive algorithms. To demonstrate the effectiveness of the proposed algorithm, we analyze its theoretical bound, and further evaluate its empirical performance on both public benchmark datasets and anomaly detection datasets. The encouraging empirical results clearly show the effectiveness and efficiency of the proposed algorithm.
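The idea of scaling each coordinate's step by its gradient history can be sketched with an AdaGrad-style update on a pairwise hinge loss, a standard surrogate for AUC. This is an illustrative assumption and not the AdaOAM algorithm: the loss, the explicit positive/negative pairing, and all names below are hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple


def adagrad_pairwise_step(weights: Sequence[float], grad_sq_sum: Sequence[float],
                          x_pos: Sequence[float], x_neg: Sequence[float],
                          lr: float = 0.5, eps: float = 1e-8
                          ) -> Tuple[List[float], List[float]]:
    """One adaptive update on the loss max(0, 1 - w . (x_pos - x_neg)).

    grad_sq_sum accumulates squared gradients per coordinate; each coordinate's
    step is divided by the square root of its accumulated value.
    """
    diff = [p - n for p, n in zip(x_pos, x_neg)]
    margin = sum(w * d for w, d in zip(weights, diff))
    if margin >= 1.0:
        # The pair is already ranked with enough margin: no update.
        return list(weights), list(grad_sq_sum)
    grad = [-d for d in diff]
    new_sq = [s + g * g for s, g in zip(grad_sq_sum, grad)]
    new_w = [w - lr * g / (math.sqrt(s) + eps)
             for w, g, s in zip(weights, grad, new_sq)]
    return new_w, new_sq


def auc(weights: Sequence[float], positives: Sequence[Sequence[float]],
        negatives: Sequence[Sequence[float]]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    def score(x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(weights, x))
    total = 0.0
    for p in positives:
        for n in negatives:
            sp, sn = score(p), score(n)
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(positives) * len(negatives))


if __name__ == "__main__":
    pos = [(2.0, 1.0), (1.5, 0.5)]
    neg = [(-1.0, 0.0), (-0.5, 1.0)]
    w, g2 = adagrad_pairwise_step([0.0, 0.0], [0.0, 0.0], pos[0], neg[0])
    print(w, g2, auc(w, pos, neg))
```

Dividing by the accumulated gradient magnitude gives coordinates with large historical gradients smaller steps, which is the sense in which adaptive methods reduce sensitivity to a single global learning rate.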
The Utility of Text: The Case of Amicus Briefs and the Supreme Court
Sim, Yanchuan (Language Technologies Institute) | Routledge, Bryan R (Carnegie Mellon University) | Smith, Noah A (Carnegie Mellon University)
We explore the idea that authoring a piece of text is an act of maximizing one's expected utility. To make this idea concrete, we consider the societally important decisions of the Supreme Court of the United States. Extensive past work in quantitative political science provides a framework for empirically modeling the decisions of justices and how they relate to text. We incorporate into such a model texts authored by amici curiae ("friends of the court" separate from the litigants) who seek to weigh in on the decision, then explicitly model their goals in a random utility model. We demonstrate the benefits of this approach in improved vote prediction and the ability to perform counterfactual analysis.