AITopics | Performance Analysis

Performance Analysis

Redefining Cancer Treatment- The Memorial Sloan Way

#artificialintelligence | Sep-16-2021, 15:22:27 GMT

When a patient shows symptoms of cancer, the tumor is removed and sequenced. Genetic information in the tumor cell is stored as DNA, which is transcribed into RNA and then translated into proteins (chains of amino acids). A mutation, that is, a mistake in the DNA sequence, can change the resulting amino acid, giving rise to a variation of that gene. The sequence may contain thousands of genetic mutations, so we need to distinguish the malignant mutations (drivers that lead to tumor growth) from the benign (passenger) ones.
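The DNA to RNA to protein chain described above can be illustrated with a small sketch. The sequences, the function names, and the five-entry codon table are assumptions made for this toy example (the codon assignments themselves follow the standard genetic code); the sketch only shows how a single-base change may or may not alter an amino acid, and does not decide whether a mutation is a driver or a passenger.

```python
from typing import List, Tuple

# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {"AUG": "Met", "GAA": "Glu", "GAG": "Glu", "GUA": "Val", "UAA": "STOP"}


def transcribe(coding_dna: str) -> str:
    """Transcribe a coding-strand DNA sequence into mRNA by replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> List[str]:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


def amino_acid_changes(reference_dna: str, mutated_dna: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference, mutated) for every amino acid that differs."""
    reference = translate(transcribe(reference_dna))
    mutated = translate(transcribe(mutated_dna))
    return [
        (position, ref, mut)
        for position, (ref, mut) in enumerate(zip(reference, mutated), start=1)
        if ref != mut
    ]


# Hypothetical sequences chosen for illustration.
reference_dna = "ATGGAAGAGTAA"  # Met-Glu-Glu-stop
missense_dna = "ATGGTAGAGTAA"   # A->T in codon 2: GAA (Glu) becomes GTA (Val)
silent_dna = "ATGGAGGAGTAA"     # A->G in codon 2: GAA and GAG both encode Glu
```

Calling `amino_acid_changes(reference_dna, missense_dna)` reports one changed residue at position 2, while the silent variant yields an empty list because both codons encode the same amino acid.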

gene feature, probability, text feature, (17 more...)

#artificialintelligence

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.33)

Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models

Yilmaz, Yasin, Aktukmak, Mehmet, Hero, Alfred O.

arXiv.org Machine Learning | Sep-16-2021

The commonly used latent space embedding techniques, such as Principal Component Analysis, Factor Analysis, and manifold learning techniques, are typically used for learning effective representations of homogeneous data. However, they do not readily extend to heterogeneous data that are a combination of numerical and categorical variables, e.g., arising from linked GPS and text data. In this paper, we are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion. The learned generative model provides latent unified representations that capture the factors common to the multiple dimensions of the data, and thus enable fusing multimodal data for various machine learning tasks. Following a Bayesian approach, we propose a general framework that combines disparate data types through the natural parameterization of the exponential family of distributions. To scale the model inference to millions of instances with thousands of features, we use the Laplace-Bernstein approximation for posterior computations involving nonlinear link functions. The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features. Experiments on two high-dimensional and heterogeneous datasets (NYC Taxi and MovieLens-10M) demonstrate the scalability and competitive performance of the proposed algorithm on different machine learning tasks such as anomaly detection, data imputation, and recommender systems.
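For readers unfamiliar with the natural parameterization mentioned in the abstract, the standard textbook forms for the two feature types it names are sketched below. The notation is an assumption and may differ from the paper's; the Gaussian case additionally assumes a known variance $\sigma^2$, and the categorical case uses the last class as the reference.

```latex
% General exponential-family density with natural parameter \eta
p(x \mid \eta) = h(x)\,\exp\!\big(\eta^{\top} T(x) - A(\eta)\big)

% Real-valued feature: Gaussian with mean \mu and known variance \sigma^2
\eta = \frac{\mu}{\sigma^{2}}, \qquad
T(x) = x, \qquad
A(\eta) = \frac{\sigma^{2}\eta^{2}}{2}, \qquad
h(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\Big(-\frac{x^{2}}{2\sigma^{2}}\Big)

% Categorical feature with K classes, one-hot x, class K as reference
\eta_k = \log\frac{\pi_k}{\pi_K}, \qquad
T(x) = (x_1, \dots, x_{K-1}), \qquad
A(\eta) = \log\Big(1 + \sum_{k=1}^{K-1} e^{\eta_k}\Big), \qquad
h(x) = 1
```

Both data types share the same functional form once written this way, which is what allows a single latent representation to drive the natural parameters of heterogeneous features.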

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2108.12445

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(10 more...)

Genre:

Research Report (1.00)
Personal (0.93)

Industry:

Education (1.00)
Health & Medicine > Health Care Technology (0.67)
Transportation > Passenger (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)
(3 more...)

Non-parametric Semi-Supervised Learning in Many-body Hilbert Space with Rescaled Logarithmic Fidelity

Li, Wei-Ming, Ran, Shi-Ju

arXiv.org Artificial Intelligence | Sep-16-2021

In quantum and quantum-inspired machine learning, the very first step is to embed the data in the quantum space known as Hilbert space. Developing a quantum kernel function (QKF), which defines the distances among the samples in the Hilbert space, is among the fundamental topics of machine learning. In this work, we propose the rescaled logarithmic fidelity (RLF) and non-parametric semi-supervised learning in the quantum space, which we name RLF-NSSL. The rescaling takes advantage of the non-linearity of the kernel to tune the mutual distances of samples in the Hilbert space, and meanwhile avoids the exponentially small fidelities between quantum many-qubit states. Being non-parametric excludes the possible effects from the variational parameters, and clearly demonstrates the advantages of the space itself. We compare RLF-NSSL with several well-known non-parametric algorithms including naive Bayes classifiers, k-nearest neighbors, and spectral clustering. Our method exhibits better accuracy, particularly for the unsupervised case with no labeled samples and the few-shot cases with small numbers of labeled samples. With the visualizations by t-distributed stochastic neighbor embedding (t-SNE), our results imply that machine learning in the Hilbert space complies with the principles of maximal coding rate reduction, where the low-dimensional data exhibit within-class compressibility, between-class discrimination, and overall diversity. Our proposals can be applied to other quantum and quantum-inspired machine learning, including the methods using parametric models such as tensor networks, quantum circuits, and quantum neural networks.
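The remark about exponentially small fidelities can be made concrete with a short sketch. It assumes a common single-qubit feature map, x -> cos(pi*x/2)|0> + sin(pi*x/2)|1>, applied to each feature so that a sample becomes a product state; that map, the function names, and the sample values are assumptions for illustration. The sketch works with the plain logarithm of the fidelity and is not the paper's RLF definition, which the abstract does not spell out.

```python
import math
from typing import Sequence


def feature_angle(x: float) -> float:
    """Map a feature value in [0, 1] to a qubit rotation angle (assumed feature map)."""
    return 0.5 * math.pi * x


def log_fidelity(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Log of the fidelity between two product states.

    For product states the fidelity factorizes into one cos^2 term per qubit,
    so its logarithm is a sum and stays representable in floating point.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        overlap = math.cos(feature_angle(x) - feature_angle(y))
        total += math.log(overlap * overlap)
    return total


# Hypothetical samples: 2000 features that all differ by 0.5.
n_qubits = 2000
sample_a = [0.0] * n_qubits
sample_b = [0.5] * n_qubits

log_f = log_fidelity(sample_a, sample_b)
fidelity = math.exp(log_f)        # underflows in double precision
per_qubit_log_f = log_f / n_qubits  # remains a well-scaled quantity
```

Here each qubit contributes an overlap of cos^2(pi/4) = 1/2, so the raw fidelity is 2^(-2000) and cannot be represented as a double, whereas the logarithm per qubit stays near log(1/2).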

accuracy, fidelity, learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/math10060940

2107.00195

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Learning logic programs through divide, constrain, and conquer

Cropper, Andrew

arXiv.org Artificial Intelligence | Sep-16-2021

We introduce an inductive logic programming approach that combines classical divide-and-conquer search with modern constraint-driven search. Our anytime approach can learn optimal, recursive, and large programs and supports predicate invention. Our experiments on three domains (classification, inductive general game playing, and program synthesis) show that our approach can increase predictive accuracies and reduce learning times.

accuracy, constraint, hypothesis, (14 more...)

arXiv.org Artificial Intelligence

2109.07818

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

A Comparative Study of Machine Learning Methods for Predicting the Evolution of Brain Connectivity from a Baseline Timepoint

Aktı, Şeymanur, Kamar, Doğay, Özlü, Özgür Anıl, Soydemir, Ihsan, Akcan, Muhammet, Kul, Abdullah, Rekik, Islem

arXiv.org Artificial Intelligence | Sep-16-2021

Predicting the evolution of the brain network, also called connectome, by foreseeing changes in the connectivity weights linking pairs of anatomical regions makes it possible to spot connectivity-related neurological disorders in earlier stages and detect the development of potential connectomic anomalies. Remarkably, such a challenging prediction problem remains least explored in the predictive connectomics literature. It is a known fact that machine learning (ML) methods have proven their predictive abilities in a wide variety of computer vision problems. However, ML techniques specifically tailored for the prediction of brain connectivity evolution trajectory from a single timepoint are almost absent. To fill this gap, we organized a Kaggle competition where 20 competing teams designed advanced machine learning pipelines for predicting the brain connectivity evolution from a single timepoint. The competing teams developed their ML pipelines with a combination of data pre-processing, dimensionality reduction, and learning methods. Utilizing an inclusive evaluation approach, we ranked the methods based on two complementary evaluation metrics (mean absolute error (MAE) and Pearson Correlation Coefficient (PCC)) and their performances using different training and testing data perturbation strategies (single random split and cross-validation). The final rank was calculated using the rank product for each competing team across all evaluation measures and validation strategies. In support of open science, the developed 20 ML pipelines along with the connectomic dataset are made available on GitHub. The outcomes of this competition are anticipated to lead to the further development of predictive models that can foresee the evolution of brain connectivity over time, as well as other types of networks (e.g., genetic networks).
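The ranking scheme described above (per-metric ranks combined by a rank product) can be sketched as follows. The team names and scores are hypothetical, ties are not handled, and taking the geometric mean of the ranks rather than the raw product is an assumption; either choice produces the same final ordering.

```python
import math
from typing import Dict, List


def ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Assign rank 1 to the best team for one metric (ties are not handled)."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}


def rank_product(rank_tables: List[Dict[str, int]]) -> Dict[str, float]:
    """Geometric mean of each team's ranks across all metrics and strategies."""
    teams = rank_tables[0].keys()
    k = len(rank_tables)
    return {
        team: math.prod(table[team] for table in rank_tables) ** (1.0 / k)
        for team in teams
    }


# Hypothetical scores for three teams.
mae = {"team_a": 0.10, "team_b": 0.12, "team_c": 0.09}  # lower is better
pcc = {"team_a": 0.90, "team_b": 0.85, "team_c": 0.70}  # higher is better

tables = [ranks(mae, higher_is_better=False), ranks(pcc, higher_is_better=True)]
combined = rank_product(tables)
final_order = sorted(combined, key=combined.get)
```

In a full evaluation, one rank table would be added per combination of metric and validation strategy (for example MAE under cross-validation and MAE under a single random split), and `rank_product` would aggregate all of them.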

competition, pipeline, regressor, (14 more...)

arXiv.org Artificial Intelligence

2109.07739

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Beyond Average Performance -- exploring regions of deviating performance for black box classification models

Torgo, Luis, Azevedo, Paulo, Areosa, Ines

arXiv.org Machine Learning | Sep-16-2021

Machine learning models are becoming increasingly popular in different types of settings. This is mainly caused by their ability to achieve a level of predictive performance that is hard to match by human experts in this new era of big data. With this usage growth comes an increase of the requirements for accountability and understanding of the models' predictions. However, the degree of sophistication of the most successful models (e.g. ensembles, deep learning) is becoming a large obstacle to this endeavour as these models are essentially black boxes. In this paper we describe two general approaches that can be used to provide interpretable descriptions of the expected performance of any black box classification model. These approaches are of high practical relevance as they provide means to uncover and describe in an interpretable way situations where the models are expected to have a performance that deviates significantly from their average behaviour. This may be of critical relevance for applications where costly decisions are driven by the predictions of the models, as it can be used to warn end users against the usage of the models in some specific cases.
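The idea of finding situations where performance deviates from the average can be illustrated with a deliberately simple sketch. The abstract does not describe the two approaches in detail, so the following is not the authors' method: it just compares per-subgroup accuracy with overall accuracy and flags subgroups beyond a threshold, without any significance testing. The records, the `weather` feature, and the threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accuracy_by_subgroup(records: List[dict], feature: str) -> Dict[str, float]:
    """Accuracy of the black-box predictions within each value of one feature."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for record in records:
        value = record[feature]
        counts[value] += 1
        hits[value] += int(record["y_true"] == record["y_pred"])
    return {value: hits[value] / counts[value] for value in counts}


def deviating_subgroups(
    records: List[dict], feature: str, threshold: float
) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and the subgroups that differ from it by >= threshold."""
    overall = sum(r["y_true"] == r["y_pred"] for r in records) / len(records)
    per_group = accuracy_by_subgroup(records, feature)
    flagged = {
        value: acc for value, acc in per_group.items() if abs(acc - overall) >= threshold
    }
    return overall, flagged


# Hypothetical predictions from some black-box classifier.
records = (
    [{"weather": "clear", "y_true": 1, "y_pred": 1} for _ in range(6)]
    + [{"weather": "fog", "y_true": 1, "y_pred": 1}]
    + [{"weather": "fog", "y_true": 1, "y_pred": 0} for _ in range(3)]
)

overall, flagged = deviating_subgroups(records, feature="weather", threshold=0.4)
```

With these records the overall accuracy hides a subgroup (`fog`) in which the model is wrong most of the time, which is the kind of case an end user would want to be warned about.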

ant sup 0, pvalue 0, subgroup, (15 more...)

arXiv.org Machine Learning

2109.08216

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Portugal > Porto > Porto (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Air (0.81)
Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Tuna-AI: tuna biomass estimation with Machine Learning models trained on oceanography and echosounder FAD data

Precioso, Daniel, Navarro-García, Manuel, Gavira-O'Neill, Kathryn, Torres-Barrán, Alberto, Gordo, David, Gallego-Alcalá, Victor, Gómez-Ullate, David

arXiv.org Machine Learning | Sep-15-2021

Echo-sounder data registered by buoys attached to drifting fish aggregating devices (FADs) provide a very valuable source of information on populations of tuna and their behaviour. This value increases when these data are supplemented with oceanographic data from the Copernicus Marine Environment Monitoring Service (CMEMS). We use these sources to develop Tuna-AI, a Machine Learning model aimed at predicting tuna biomass under a given buoy, which uses a 3-day window of echo-sounder data to capture the daily spatio-temporal patterns characteristic of tuna schools. As the supervised signal for training, we employ more than 5000 set events with their corresponding tuna catch reported by the AGAC tuna purse seine fleet.

biomass, biomass estimation, buoy, (13 more...)

arXiv.org Machine Learning

2109.06732

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Europe > Spain > Galicia > Madrid (0.05)
Indian Ocean (0.05)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry: Food & Agriculture > Fishing (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Fake News Detection Using Machine Learning Ensemble Methods

#artificialintelligence | Sep-14-2021, 03:10:13 GMT

The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and bears no relation to reality. Automated classification of a text article as misinformation or disinformation is a challenging task. Even an expert in a particular domain has to explore multiple aspects before giving a verdict on the truthfulness of an article. In this work, we propose to use a machine learning ensemble approach for automated classification of news articles. Our study explores different textual properties that can be used to distinguish fake content from real. By using those properties, we train a combination of different machine learning algorithms using various ensemble methods and evaluate their performance on four real-world datasets. Experimental evaluation confirms the superior performance of our proposed ensemble learner approach in comparison to individual learners. Besides other use cases, news outlets benefitted from the widespread use of social media platforms by providing updated news in near real time to their subscribers. The news media evolved from newspapers, tabloids, and magazines to a digital form such as online news platforms, blogs, social media feeds, and other digital media formats [1]. It became easier for consumers to acquire the latest news at their fingertips.
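As a rough illustration of how an ensemble combines individual learners, the sketch below uses majority voting, which is one of several ensemble methods and is not necessarily the one the study found best. The three hand-written rules stand in for trained models and are purely hypothetical, as are the example articles and all function names.

```python
from collections import Counter
from typing import Callable, List

Learner = Callable[[str], str]


def by_exclamations(text: str) -> str:
    """Toy learner: many exclamation marks suggest a fake article."""
    return "fake" if text.count("!") >= 2 else "real"


def by_capitals(text: str) -> str:
    """Toy learner: a high share of all-caps words suggests a fake article."""
    words = text.split()
    if not words:
        return "real"
    caps = sum(1 for word in words if len(word) > 1 and word.isupper())
    return "fake" if caps / len(words) > 0.2 else "real"


def by_clickbait(text: str) -> str:
    """Toy learner: clickbait vocabulary suggests a fake article."""
    lowered = text.lower()
    return "fake" if any(term in lowered for term in ("shocking", "you won't believe")) else "real"


def majority_vote(labels: List[str]) -> str:
    """Return the label predicted by most learners."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(learners: List[Learner], article: str) -> str:
    """Collect one prediction per learner and combine them by majority vote."""
    return majority_vote([learner(article) for learner in learners])


learners = [by_exclamations, by_capitals, by_clickbait]

# Hypothetical articles.
sensational = "SHOCKING cure found!! Doctors HATE it!"
borderline = "Shocking result in local election, officials confirm."
```

For the `borderline` article a single learner (`by_clickbait`) votes "fake" but is outvoted by the other two, which shows how an ensemble can correct the mistake of an individual learner.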

accuracy, algorithm, dataset, (13 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.30)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
(2 more...)

Semantic Answer Type Prediction using BERT: IAI at the ISWC SMART Task 2020

Setty, Vinay, Balog, Krisztian

arXiv.org Artificial Intelligence | Sep-14-2021

A particular question we are interested in answering is how well neural methods, and specifically transformer models, such as BERT, perform on the answer type prediction task compared to traditional approaches. Our main finding is that coarse-grained answer types can be identified effectively with standard text classification methods, with over 95% accuracy, and BERT can bring only marginal improvements. For fine-grained type detection, on the other hand, BERT clearly outperforms previous retrieval-based approaches.

category classification, dbo, prediction, (10 more...)

arXiv.org Artificial Intelligence

2109.06714

Country:

Europe > Norway > Western Norway > Rogaland > Stavanger (0.04)
Asia > Azerbaijan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Targeted Cross-Validation

Zhang, Jiawei, Ding, Jie, Yang, Yuhong

arXiv.org Machine Learning | Sep-14-2021

In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose a targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that the TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region. Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with the setup of changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of the selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of the TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
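Selecting a model under a weighted $L_2$ loss can be sketched with ordinary K-fold cross-validation in which each held-out squared error is multiplied by a weight reflecting the region of interest. The abstract does not give the exact TCV procedure (for example, the splitting scheme), so the interleaved folds, the two candidate methods, the weight function, and the data below are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]
Fitter = Callable[[List[float], List[float]], Predictor]


def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate 1: predict the training mean everywhere."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate 2: simple least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def targeted_cv(
    xs: List[float],
    ys: List[float],
    candidates: Dict[str, Fitter],
    weight_fn: Callable[[float], float],
    n_folds: int = 4,
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the smallest weighted squared error on held-out folds."""
    n = len(xs)
    losses = {name: 0.0 for name in candidates}
    for fold in range(n_folds):
        held_out = set(range(fold, n, n_folds))  # interleaved folds
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for name, fit in candidates.items():
            predict = fit(train_x, train_y)
            losses[name] += sum(
                weight_fn(xs[i]) * (ys[i] - predict(xs[i])) ** 2 for i in held_out
            )
    return min(losses, key=losses.get), losses


# Hypothetical data: flat for x < 8, rising afterwards; interest lies in x >= 8.
xs = [float(i) for i in range(12)]
ys = [0.0] * 8 + [3.0, 6.0, 9.0, 12.0]
candidates = {"mean": fit_mean, "line": fit_line}
best, losses = targeted_cv(xs, ys, candidates, weight_fn=lambda x: 1.0 if x >= 8 else 0.0)
```

Note that every candidate is still trained on all available training data; only the assessment is restricted to the target region through `weight_fn`, which is what distinguishes this from fitting on local data alone.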

candidate method, procedure, tcv, (17 more...)

arXiv.org Machine Learning

2109.06949

Country: North America > United States > Minnesota (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.66)

Industry: Banking & Finance > Real Estate (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)
