Supervised Learning
Learning to Rank based on Analogical Reasoning
Fahandar, Mohsen Ahmadi, Hüllermeier, Eyke
Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects $A,B,C,D$, if object $A$ is known to be preferred to $B$, and $C$ relates to $D$ as $A$ relates to $B$, then $C$ is (supposedly) preferred to $D$. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer.
On the ERM Principle with Networked Data
Wang, Yuanhong, Wang, Yuyi, Liu, Xingwu, Pu, Juhua
Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d.\ assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results of this problem only deal with the classical empirical risk minimization (ERM) principle that always weights every example equally, but this strategy leads to unsatisfactory bounds. We consider general weighted ERM and show new universal risk bounds for this problem. These new bounds naturally define an optimization problem which leads to appropriate weights for networked examples. Though this optimization problem is not convex in general, we devise a new fully polynomial-time approximation scheme (FPTAS) to solve it.
Inductive Representation Learning in Large Attributed Graphs
Ahmed, Nesreen K., Rossi, Ryan A., Zhou, Rong, Lee, John Boaz, Kong, Xiangnan, Willke, Theodore L., Eldardiry, Hoda
Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data lies at the heart and success of many machine learning tasks such as classification, anomaly detection, link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk, node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node. This has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Second, they are not space-efficient as a feature vector is learned for each node which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs. To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of attributed random walk that is not tied to node identity and is instead based on learning a function $\Phi : \mathrm{\rm \bf x} \rightarrow w$ that maps a node attribute vector $\mathrm{\rm \bf x}$ to a type $w$. This framework serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many other previous methods that leverage traditional random walks.
Training large margin host-pathogen protein-protein interaction predictors
Basit, Abdul Hannan, Abbasi, Wajid Arshad, Asif, Amina, Minhas, Fayyaz Ul Amir Afsar
Detection of protein-protein interactions (PPIs) plays a vital role in molecular biology. Particularly, infections are caused by the interactions of host and pathogen proteins. It is important to identify host-pathogen interactions (HPIs) to discover new drugs to counter infectious diseases. Conventional wet lab PPI prediction techniques have limitations in terms of large scale application and budget. Hence, computational approaches are developed to predict PPIs. This study aims to develop large margin machine learning models to predict interspecies PPIs with a special interest in host-pathogen protein interactions (HPIs). Especially, we focus on seeking answers to three queries that arise while developing an HPI predictor. 1) How should we select negative samples? 2) What should be the size of negative samples as compared to the positive samples? 3) What type of margin violation penalty should be used to train the predictor? We compare two available methods for negative sampling. Moreover, we propose a new method of assigning weights to each training example in weighted SVM depending on the distance of the negative examples from the positive examples. We have also developed a web server for our HPI predictor called HoPItor (Host Pathogen Interaction predicTOR) that can predict interactions between human and viral proteins. This webserver can be accessed at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#HoPItor.
North Dakota Museum Property Rights Case Set to Trial
The case was considered in district court in 2014. The next year, the North Dakota Legislature rejected a bill that would have sided with the historical society and allowed the museum to stay on the fairgrounds. The case returned to district court in 2015, but the original judge recused himself at the end of last year.
Traversing Knowledge Graph in Vector Space without Symbolic Space Guidance
Shen, Yelong, Huang, Po-Sen, Chang, Ming-Wei, Gao, Jianfeng
Recent studies on knowledge base completion, the task of recovering missing facts based on observed facts, demonstrate the importance of learning embeddings from multi-step relations. Due to the size of knowledge bases, previous works manually design relation paths of observed triplets in symbolic space (e.g. random walk) to learn multi-step relations during training. However, these approaches suffer some limitations as most paths are not informative, and it is prohibitively expensive to consider all possible paths. To address the limitations, we propose learning to traverse in vector space directly without the need of symbolic space guidance. To remember the connections between related observed triplets and be able to adaptively change relation paths in vector space, we propose Implicit ReasoNets (IRNs), that is composed of a global memory and a controller module to learn multi-step relation paths in vector space and infer missing facts jointly without any human-designed procedure. Without using any axillary information, our proposed model achieves state-of-the-art results on popular knowledge base completion benchmarks.
Entity Embeddings with Conceptual Subspaces as a Basis for Plausible Reasoning
Jameel, Shoaib, Schockaert, Steven
Conceptual spaces are geometric representations of conceptual knowledge, in which entities correspond to points, natural properties correspond to convex regions, and the dimensions of the space correspond to salient features. While conceptual spaces enable elegant models of various cognitive phenomena, the lack of automated methods for constructing such representations have so far limited their application in artificial intelligence. To address this issue, we propose a method which learns a vector-space embedding of entities from Wikipedia and constrains this embedding such that entities of the same semantic type are located in some lower-dimensional subspace. We experimentally demonstrate the usefulness of these subspaces as (approximate) conceptual space representations by showing, among others, that important features can be modelled as directions and that natural properties tend to correspond to convex regions.
Classification on Large Networks: A Quantitative Bound via Motifs and Graphons
Haupt, Andreas, Khatami, Mohammad, Schultz, Thomas, Tran, Ngoc Mai
When each data point is a large graph, graph statistics such as densities of certain subgraphs (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are expensive to compute and difficult to work with theoretically. Via graphon theory, we give an explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks under both generative and sampling noise. Furthermore, we give similar bounds for the graph spectrum and connect it to homomorphism densities of cycles. This results in an easily computable classifier on graph data with theoretical performance guarantee. Our method yields competitive results on classification tasks for the autoimmune disease Lupus Erythematosus.
Elliptical modeling and pattern analysis for perturbation models and classfication
Suthaharan, Shan, Shen, Weining
The characteristics (or numerical patterns) of a feature vector in the transform domain of a perturbation model differ significantly from those of its corresponding feature vector in the input domain. These differences - caused by the perturbation techniques used for the transformation of feature patterns - degrade the performance of machine learning techniques in the transform domain. In this paper, we proposed a nonlinear parametric perturbation model that transforms the input feature patterns to a set of elliptical patterns, and studied the performance degradation issues associated with random forest classification technique using both the input and transform domain features. Compared with the linear transformation such as Principal Component Analysis (PCA), the proposed method requires less statistical assumptions and is highly suitable for the applications such as data privacy and security due to the difficulty of inverting the elliptical patterns from the transform domain to the input domain. In addition, we adopted a flexible block-wise dimensionality reduction step in the proposed method to accommodate the possible high-dimensional data in modern applications. We evaluated the empirical performance of the proposed method on a network intrusion data set and a biological data set, and compared the results with PCA in terms of classification performance and data privacy protection (measured by the blind source separation attack and signal interference ratio). Both results confirmed the superior performance of the proposed elliptical transformation. 1 1. INTRODUCTION Feature vectors carry useful numerical patterns that characterize the original domain (or a sub original domain - input domain) formed by the feature vectors themselves. Machine learning algorithms generally utilize these patterns to generate classifiers, that can help make decisions from data, by using supervised or unsupervised learning techniques (Suthaharan, 2015).
Deep Feature Learning for Graphs
Rossi, Ryan A., Zhou, Rong, Ahmed, Nesreen K.
This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher-order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across-networks and therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(|E|)$, and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient requiring up to 6x less memory, (3) fast with up to 182x speedup in runtime performance, and (4) accurate with an average improvement of 20% or more on many learning tasks.