AITopics | Radivojac, Predrag

Plotting

Radivojac, Predrag

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Leveraging Structure for Improved Classification of Grouped Biased Data

Zeiberg, Daniel, Jain, Shantanu, Radivojac, Predrag

arXiv.org Artificial IntelligenceDec-7-2022

We consider semi-supervised binary classification for applications in which data points are naturally grouped (e.g., survey responses grouped by state) and the labeled data is biased (e.g., survey respondents are not representative of the population). The groups overlap in the feature space and consequently the input-output patterns are related across the groups. To model the inherent structure in such data, we assume the partition-projected class-conditional invariance across groups, defined in terms of the group-agnostic feature space. We demonstrate that under this assumption, the group carries additional information about the class, over the group-agnostic features, with provably improved area under the ROC curve. Further assuming invariance of partition-projected class-conditional distributions across both labeled and unlabeled data, we derive a semi-supervised algorithm that explicitly leverages the structure to learn an optimal, group-aware, probability-calibrated classifier, despite the bias in the labeled data. Experiments on synthetic and real data demonstrate the efficacy of our algorithm over suitable baselines and ablative models, spanning standard supervised and semi-supervised learning approaches, with and without incorporating the group directly as a feature.

artificial intelligence, machine learning, null, (18 more...)

arXiv.org Artificial Intelligence

2212.03697

Country: North America > United States (0.67)

Genre:

Questionnaire & Opinion Survey (0.54)
Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Explaining Deep Tractable Probabilistic Models: The sum-product network case

Karanam, Athresh, Mathur, Saurabh, Radivojac, Predrag, Haas, David M., Kersting, Kristian, Natarajan, Sriraam

arXiv.org Artificial IntelligenceSep-21-2022

We consider the problem of explaining a class of tractable deep probabilistic models, the Sum-Product Networks (SPNs) and present an algorithm EX SPN to generate explanations. To this effect, we define the notion of a context-specific independence tree(CSI-tree) and present an iterative algorithm that converts an SPN to a CSI-tree. The resulting CSI-tree is both interpretable and explainable to the domain expert. We achieve this by extracting the conditional independencies encoded by the SPN and approximating the local context specified by the structure of the SPN. Our extensive empirical evaluations on synthetic, standard, and real-world clinical data sets demonstrate that the CSI-tree exhibits superior explainability.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

2110.09778

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)

Add feedback

Classification in biological networks with hypergraphlet kernels

Lugo-Martinez, Jose, Radivojac, Predrag

arXiv.org Machine LearningMar-14-2017

Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins, drugs) and edges represent relational ties among these objects (binds-to, interacts-with, regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, often suffer from information loss when modeling physical systems due to their inability to accurately represent multiobject relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. In this paper, we present a hypergraph-based approach for modeling physical systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs in a semi-supervised setting. We introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of small simple hypergraphs, referred to as hypergraphlets, rooted at a vertex of interest. We extensively evaluate this method and show its potential use in a positive-unlabeled setting to estimate the number of missing and false positive links in protein-protein interaction networks.

artificial intelligence, health & medicine, hypergraph, (15 more...)

arXiv.org Machine Learning

1703.04823

Country: North America > United States > Indiana (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)

Add feedback

Recovering True Classifier Performance in Positive-Unlabeled Learning

Jain, Shantanu (Indiana University) | White, Martha (Indiana University) | Radivojac, Predrag (Indiana University)

AAAI ConferencesFeb-14-2017

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic curve, or the precision recall curve obtained on such data can be corrected with the knowledge of class priors; i.e., the proportions of the positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative and show that the correction also requires the knowledge of the proportion of noisy examples in the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the proportion of noise, we experimentally evaluate two correction approaches and demonstrate their efficacy on real-life data.

artificial intelligence, classifier, machine learning, (16 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Indiana (0.14)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Recovering True Classifier Performance in Positive-Unlabeled Learning

Jain, Shantanu, White, Martha, Radivojac, Predrag

arXiv.org Machine LearningFeb-1-2017

A common approach in positive-unlabeled learning is to train a classification model between labeled and unlabeled data. This strategy is in fact known to give an optimal classifier under mild conditions; however, it results in biased empirical estimates of the classifier performance. In this work, we show that the typically used performance measures such as the receiver operating characteristic curve, or the precision-recall curve obtained on such data can be corrected with the knowledge of class priors; i.e., the proportions of the positive and negative examples in the unlabeled data. We extend the results to a noisy setting where some of the examples labeled positive are in fact negative and show that the correction also requires the knowledge of the proportion of noisy examples in the labeled positives. Using state-of-the-art algorithms to estimate the positive class prior and the proportion of noise, we experimentally evaluate two correction approaches and demonstrate their efficacy on real-life data.

artificial intelligence, classifier, health & medicine, (17 more...)

arXiv.org Machine Learning

1702.00518

Country: North America > United States > Indiana (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Estimating the class prior and posterior from noisy positives and unlabeled data

Jain, Shantanu, White, Martha, Radivojac, Predrag

arXiv.org Machine LearningJan-31-2017

We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data. In recent years, several algorithms have been proposed to learn from positive-unlabeled data; however, many of these contributions remain theoretical, performing poorly on real high-dimensional data that is typically contaminated with noise. We build on this previous work to develop two practical classification algorithms that explicitly model the noise in the positive labels and utilize univariate transforms built on discriminative classifiers. We prove that these univariate transforms preserve the class prior, enabling estimation in the univariate space and avoiding kernel density estimation for high-dimensional data. The theoretical development and both parametric and nonparametric algorithms proposed here constitutes an important step towards wide-spread use of robust classification algorithms for positive-unlabeled data.

algorithm, artificial intelligence, health & medicine, (18 more...)

arXiv.org Machine Learning

1606.08561

Country: North America > United States > Indiana (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Estimating the class prior and posterior from noisy positives and unlabeled data

Jain, Shantanu, White, Martha, Radivojac, Predrag

Neural Information Processing SystemsDec-31-2016

We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data. In recent years, several algorithms have been proposed to learn from positive-unlabeled data; however, many of these contributions remain theoretical, performing poorly on real high-dimensional data that is typically contaminated with noise. We build on this previous work to develop two practical classification algorithms that explicitly model the noise in the positive labels and utilize univariate transforms built on discriminative classifiers. We prove that these univariate transforms preserve the class prior, enabling estimation in the univariate space and avoiding kernel density estimation for high-dimensional data. The theoretical development and parametric and nonparametric algorithms proposed here constitute an important step towards wide-spread use of robust classification algorithms for positive-unlabeled data.

algorithm, artificial intelligence, health & medicine, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Indiana (0.14)

Genre: Research Report (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data

Yamada, Makoto, Tang, Jiliang, Lugo-Martinez, Jose, Hodzic, Ermin, Shrestha, Raunak, Saha, Avishek, Ouyang, Hua, Yin, Dawei, Mamitsuka, Hiroshi, Sahinalp, Cenk, Radivojac, Predrag, Menczer, Filippo, Chang, Yi

arXiv.org Machine LearningAug-13-2016

Machine learning methods are used to discover complex nonlinear relationships in biological and medical data. However, sophisticated learning models are computationally unfeasible for data with millions of features. Here we introduce the first feature selection method for nonlinear learning problems that can scale up to large, ultra-high dimensional biological data. More specifically, we scale up the novel Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) to handle millions of features with tens of thousand samples. The proposed method is guaranteed to find an optimal subset of maximally predictive features with minimal redundancy, yielding higher predictive power and improved interpretability. Its effectiveness is demonstrated through applications to classify phenotypes based on module expression in human prostate cancer patients and to detect enzymes among protein structures. We achieve high accuracy with as few as 20 out of one million features --- a dimensionality reduction of 99.998%. Our algorithm can be implemented on commodity cloud computing platforms. The dramatic reduction of features may lead to the ubiquitous deployment of sophisticated prediction models in mobile health care applications.

feature selection, health & medicine, oncology, (18 more...)

arXiv.org Machine Learning

1608.04048

Country:

North America > United States > Indiana (0.14)
North America > United States > California (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

New metrics for learning and inference on sets, ontologies, and functions

Yang, Ruiyu, Jiang, Yuxiang, Hahn, Matthew W., Housworth, Elizabeth A., Radivojac, Predrag

arXiv.org Machine LearningMar-24-2016

We propose new metrics on sets, ontologies, and functions that can be used in various stages of probabilistic modeling, including exploratory data analysis, learning, inference, and result interpretation. These new functions unify and generalize some of the popular metrics on sets and functions, such as the Jaccard and bag distances on sets and Marczewski-Steinhaus distance on functions. We then introduce information-theoretic metrics on directed acyclic graphs drawn independently according to a fixed probability distribution and show how they can be used to calculate similarity between class labels for the objects with hierarchical output spaces (e.g., protein function). Finally, we provide evidence that the proposed metrics are useful by clustering species based solely on functional annotations available for subsets of their genes. The functional trees resemble evolutionary trees obtained by the phylogenetic analysis of their genomes.

annotation, artificial intelligence, health & medicine, (19 more...)

arXiv.org Machine Learning

1603.06846

Country: North America > United States > Indiana (0.15)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.90)

Add feedback

Nonparametric semi-supervised learning of class proportions

Jain, Shantanu, White, Martha, Trosset, Michael W., Radivojac, Predrag

arXiv.org Machine LearningJan-8-2016

The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data.

artificial intelligence, bayesian inference, proportion, (17 more...)

arXiv.org Machine Learning

1601.01944

Country: North America > United States > Indiana (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)
(2 more...)

Add feedback