AITopics

1905.07892

Country:

Asia > Russia (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Zhang, Xiaokang, Jonassen, Inge

A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-treated Atlantic Cod (Gadus morhua) Liver

arXiv.org Machine Learning, May-20-2019

Univariate and multivariate feature selection methods can be used for biomarker discovery in the analysis of toxicant exposure. Among the univariate methods, differential expression analysis (DEA) is often applied for its simplicity and interpretability. A characteristic of DEA methods is that they treat genes individually, disregarding the correlation that exists between them. Multivariate feature selection methods have also been proposed for biomarker discovery. Given the variety of available methods, choosing the most suitable one for a specific dataset becomes a problem. In this paper, we present a framework for comparing potential biomarker discovery methods: three methods that stem from different theories are compared by how stable they are and how well they improve classification accuracy. The three methods we consider are Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes; minimum Redundancy Maximum Relevance (mRMR), based on information theory; and Characteristic Direction (GeoDE), inspired by a graphical perspective. When the methods were tested on gene expression data from two experiments exposing Atlantic cod to two different toxicants (MeHg and PCB 153), different methods stood out in different cases, so the most suitable method should be chosen based on the dataset under study and the research interest.
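The abstract compares methods partly by how stable their selections are but does not name the stability measure here. A common choice, assumed for this sketch and not necessarily the one used in the paper, is the mean pairwise Jaccard similarity of the gene sets a method selects across resampled runs. The gene names and runs below are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two selected-feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selections: list) -> float:
    """Mean pairwise Jaccard similarity over repeated selections.

    Each element of `selections` is the set of genes that one run of a
    feature selection method picked (e.g. on one resample of the data).
    1.0 means every run selected exactly the same genes.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene sets from three resampled runs of one method.
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3"}]
print(round(selection_stability(runs), 3))  # → 0.667
```

Running the same procedure for SAM, mRMR, and GeoDE on identical resamples would give one comparable stability number per method.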

artificial intelligence, feature selection method, machine learning

1905.08048

Country: Europe (0.15)

Genre: Research Report > Experimental Study (0.73)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

A Distributionally Robust Boosting Algorithm

Blanchet, Jose, Kang, Yang, Zhang, Fan, Hu, Zhangyi

Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitably formulated data-driven DRO problems. In turn, this connection is used to select tuning parameters optimally, via a principled approach informed by robustness considerations. This paper contributes to this growing literature connecting DRO and statistics by showing how boosting algorithms can be studied via DRO. We propose a boosting-type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. In particular, our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost), showing that AdaBoost effectively solves a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction and find that our approach compares favorably to alternative boosting methods that are widely used in practice.
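Because DRO-Boosting is said to recover AdaBoost, a minimal sketch of plain discrete AdaBoost shows the exponential reweighting step that the DRO view reinterprets. This is standard AdaBoost with one-dimensional decision stumps on toy data, not the authors' DRO-Boosting procedure; the function names and data are illustrative.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity


def train_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    best = None
    for t in [values[0] - 1.0] + values:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, t, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost on 1-D inputs with labels in {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n          # uniform initial sample weights
    ensemble = []              # (alpha, threshold, polarity) per round
    for _ in range(rounds):
        err, t, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Exponential reweighting: misclassified points gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]      # no single stump separates these labels
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # → [1, 1, -1, -1, 1, 1]
```

No single stump fits the toy labels. After three rounds of reweighting, however, the weighted vote classifies every training point correctly.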

algorithm, artificial intelligence, machine learning

1905.07845

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Zhelezniak, Vitalii, Savkov, Aleksandar, Shen, April, Hammerla, Nils Y.

Correlation Coefficients and Semantic Textual Similarity

A large body of research into semantic textual similarity has focused on constructing state-of-the-art embeddings using sophisticated modelling, careful choice of learning signals and many clever tricks. By contrast, little attention has been devoted to similarity measures between these embeddings, with cosine similarity being used unquestioningly in the majority of cases. In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. We thoroughly characterise cases where Pearson correlation (and thus cosine similarity) is unfit as a similarity measure. Importantly, we show that Pearson correlation is appropriate for some word vectors but not others. When it is not appropriate, we illustrate how common non-parametric rank correlation coefficients can be used instead to significantly improve performance. We support our analysis with a series of evaluations on word-level and sentence-level semantic textual similarity benchmarks. On the latter, we show that even the simplest averaged word vectors compared by rank correlation easily rival the strongest deep representations compared by cosine similarity.
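The relationships the abstract relies on can be checked directly. Pearson's r is the cosine similarity of mean-centred vectors, so the two nearly coincide when a vector's components average close to zero. Spearman's rho, one of the common rank correlation coefficients, is Pearson's r applied to ranks. The plain-Python sketch below uses hypothetical vectors and illustrates only the definitions, not the paper's evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(u, v):
    """Pearson's r: cosine similarity of the mean-centred vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def ranks(xs):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(u, v):
    """Spearman's rho: Pearson's r computed on ranks."""
    return pearson(ranks(u), ranks(v))


u = [0.1, 0.2, 0.3, 10.0]   # one outlying dimension
v = [0.2, 0.4, 0.6, 0.8]
print(round(pearson(u, v), 3), round(spearman(u, v), 3))  # → 0.785 1.0
```

A single outlying dimension drags Pearson's r (and hence cosine similarity) well below 1. The rank-based coefficient, by contrast, still reports the perfectly monotone relationship.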

artificial intelligence, machine learning, natural language

1905.0779

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.76)

Brunner, Thomas, Diehl, Frederik, Le, Michael Truong, Knoll, Alois

Leveraging Semantic Embeddings for Safety-Critical Applications

Semantic embeddings are a popular way to represent knowledge in the field of zero-shot learning. We observe their interpretability and discuss their potential utility in a safety-critical context. Concretely, we propose using them to add introspection and error detection capabilities to neural network classifiers. First, we show how to create embeddings from symbolic domain knowledge. We discuss how to use them for interpreting mispredictions and propose a simple error detection scheme. We then introduce the concept of semantic distance: a real-valued score that measures confidence in the semantic space. We evaluate this score on a traffic sign classifier and find that it achieves near state-of-the-art performance while being significantly faster to compute than other confidence scores. Our approach requires no changes to the original network and is thus applicable to any task for which domain knowledge is available.
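The abstract does not spell out how semantic distance is computed. As a purely hypothetical illustration, assume that each class has an attribute embedding built from domain knowledge. The unchanged classifier's softmax output is mapped to the probability-weighted mean of those embeddings, and a prediction is scored by the distance from that mean to the predicted class's embedding. The function name, class embeddings, and probabilities below are all made up.

```python
import math


def semantic_distance(probs, class_embeddings):
    """Hypothetical confidence score computed in semantic space.

    probs: softmax output of an unchanged classifier, one value per class.
    class_embeddings: one attribute vector per class, in the same order.
    Returns (predicted class index, distance); a smaller distance means the
    probability mass sits on classes semantically close to the prediction.
    """
    dim = len(class_embeddings[0])
    # Probability-weighted mean of the class embeddings.
    mean = [sum(p * e[d] for p, e in zip(probs, class_embeddings))
            for d in range(dim)]
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return pred, math.dist(mean, class_embeddings[pred])


# Made-up attribute vectors: classes 0 and 1 are similar, class 2 is not.
emb = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
print(semantic_distance([0.5, 0.5, 0.0], emb))
print(semantic_distance([0.5, 0.0, 0.5], emb))
```

In the first call, the probability mass is split between two semantically similar classes, so the distance stays small. In the second, it is split between dissimilar classes, which this hypothetical score flags as less trustworthy.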

confidence score, machine learning, natural language

1905.07733

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Mothilal, Ramaravind Kommiya, Sharma, Amit, Tan, Chenhao

Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations

Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations consists of counterfactuals: hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on average distance and determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on three real-world datasets show that our framework can generate a set of counterfactuals that are diverse and approximate local decision boundaries well.
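The two diversity notions the abstract names can be sketched with NumPy. The exact formulation is assumed here rather than taken from the abstract. Average-distance diversity is the mean pairwise distance between counterfactuals, and DPP diversity is the determinant of a kernel matrix with entries K_ij = 1 / (1 + dist(c_i, c_j)). The choice of L1 distance and the toy counterfactuals are also assumptions.

```python
import numpy as np


def pairwise_l1(cfs):
    """Matrix of L1 distances between every pair of counterfactuals (rows)."""
    cfs = np.asarray(cfs, dtype=float)
    return np.abs(cfs[:, None, :] - cfs[None, :, :]).sum(axis=-1)


def avg_distance_diversity(cfs):
    """Mean distance over all ordered pairs i != j (the diagonal is zero)."""
    k = len(cfs)
    if k < 2:
        raise ValueError("need at least two counterfactuals")
    return float(pairwise_l1(cfs).sum() / (k * (k - 1)))


def dpp_diversity(cfs):
    """det(K) with K_ij = 1 / (1 + dist(c_i, c_j)).

    K has ones on the diagonal; the determinant approaches 0 when
    counterfactuals are redundant and tends to grow as they spread apart.
    """
    K = 1.0 / (1.0 + pairwise_l1(cfs))
    return float(np.linalg.det(K))


redundant = [[0.0, 0.0], [0.0, 0.0]]
diverse = [[0.0, 0.0], [5.0, 5.0]]
print(dpp_diversity(redundant), dpp_diversity(diverse))
```

Two identical counterfactuals give a singular kernel and a determinant of (numerically) zero. Well-separated ones push the determinant toward 1. Unlike the average distance, the determinant penalizes any near-duplicate in the set.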

artificial intelligence, machine learning, natural language

1905.07697

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Banking & Finance > Loans (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Reference-Based Sequence Classification

He, Zengyou, Xu, Guangyao, Sheng, Chaohua, Xu, Bo, Zou, Quan

Sequence classification is an important data mining task in many real-world applications. Over the past few decades, many sequence classification methods have been proposed from different perspectives. In particular, the pattern-based method is one of the most important and widely studied sequence classification methods in the literature. In this paper, we present a reference-based sequence classification framework that can unify existing pattern-based sequence classification methods under the same umbrella. More importantly, this framework can be used as a general platform for developing new sequence classification algorithms. Using this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions. Experimental results show that new methods developed under the proposed framework can achieve classification accuracy comparable to that of state-of-the-art sequence classification algorithms.
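The abstract does not detail the framework. The general idea of describing each sequence by its relationship to a set of reference sequences can nonetheless be sketched as follows. Everything specific here is an assumption for illustration: the similarity (longest common subsequence length normalised by the longer sequence), the 1-nearest-neighbour classifier, and the toy references and labels.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def to_features(seq, references):
    """Describe a sequence by its normalised LCS similarity to each reference."""
    return [lcs_length(seq, r) / max(len(seq), len(r)) for r in references]


def classify(seq, references, train):
    """1-nearest-neighbour label in the reference-similarity feature space.

    train: list of (sequence, label) pairs.
    """
    f = to_features(seq, references)

    def sq_dist(other):
        g = to_features(other, references)
        return sum((u - v) ** 2 for u, v in zip(f, g))

    return min(train, key=lambda item: sq_dist(item[0]))[1]


references = ["ABAB", "CDCD"]          # hypothetical reference sequences
train = [("ABABA", "x"), ("CDCDC", "y")]
print(classify("ABBA", references, train), classify("DCDC", references, train))  # → x y
```

Once sequences become fixed-length vectors of reference similarities, any standard vector classifier could replace the nearest-neighbour step. That separation is what makes such a framework reusable.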

artificial intelligence, machine learning, pattern recognition

1905.07188

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Herremans, Dorien, Martens, David, Sörensen, Kenneth

Dance Hit Song Prediction

Record companies invest billions of dollars in new talent around the globe each year. Gaining insight into what actually makes a hit song would provide tremendous benefits for the music industry. In this research we tackle this question by focussing on the dance hit song classification problem. A database of dance hit songs from 1985 to 2013 is built, including basic musical features as well as more advanced features that capture a temporal aspect. A number of different classifiers are used to build and test dance hit prediction models. The best resulting model performs well when predicting whether a song is a "top 10" dance hit versus a lower listed position.

dataset, music and machine learning, new music research

doi: 10.1080/09298215.2014.881888

1905.08076

Country:

Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
North America > United States > New York > New York County > New York City (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Cross-referencing using Fine-grained Topic Modeling

Lund, Jeffrey, Armstrong, Piper, Fearn, Wilson, Cowley, Stephen, Hales, Emily, Seppi, Kevin

Cross-referencing, which links passages of text to other related passages, can be a valuable study aid for facilitating comprehension of a text. However, cross-referencing requires, first, comprehensive thematic knowledge of the entire corpus and, second, a focused search through the corpus specifically to find such useful connections. As a result, cross-reference resources are prohibitively expensive and exist only for the most well-studied texts (e.g., religious texts). We develop a topic-based system for automatically producing candidate cross-references that can be easily verified by human annotators. Our system uses fine-grained topic modeling with thousands of highly nuanced and specific topics to identify verse pairs that are topically related. We demonstrate that our system can be cost-effective compared to having annotators acquire the expertise necessary to produce cross-reference resources unaided.
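Proposing candidate verse pairs for human verification can be sketched as ranking all pairs by the similarity of their topic distributions. The abstract does not say which similarity the system uses. Hellinger similarity is an assumption made here because it is a natural fit for comparing probability distributions, and the verse IDs and topic proportions are hypothetical.

```python
import math
from itertools import combinations


def hellinger_similarity(p, q):
    """1 minus the Hellinger distance of two topic distributions (1 = identical)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def candidate_pairs(verse_topics, k):
    """Return the k most topically similar verse pairs for human verification."""
    scored = [(hellinger_similarity(verse_topics[a], verse_topics[b]), a, b)
              for a, b in combinations(sorted(verse_topics), 2)]
    scored.sort(reverse=True)  # most similar first
    return [(a, b) for _, a, b in scored[:k]]


# Hypothetical per-verse topic proportions from some fitted topic model.
verse_topics = {
    "v1": [0.90, 0.10, 0.00],
    "v2": [0.85, 0.15, 0.00],
    "v3": [0.00, 0.10, 0.90],
}
print(candidate_pairs(verse_topics, 1))  # → [('v1', 'v2')]
```

Scoring every pair is quadratic in the number of verses. For a large corpus, some form of candidate pruning would likely be needed before this step.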

machine learning, natural language, topic model

1905.07508

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)

MOBA: A multi-objective bounded-abstention model for two-class cost-sensitive problems

Guan, Hongjiao

Abstaining classifiers have been widely used in cost-sensitive applications to avoid ambiguous classification and reduce the cost of misclassification. Previous abstaining classification models rely on cost information, such as a cost matrix or cost ratio. However, costs are difficult to obtain or estimate in practical applications. Furthermore, these abstention models are typically restricted to a single optimization metric, which may not be the desired indicator for evaluating classification performance. To overcome these problems, a multi-objective bounded-abstention (MOBA) model is proposed to optimize essential metrics. Specifically, the MOBA model minimizes the error rate of each class under class-dependent abstention constraints. The MOBA model is then solved using the non-dominated sorting genetic algorithm II (NSGA-II), a popular evolutionary multi-objective optimization algorithm. This yields a set of Pareto-optimal solutions, from which the best one can be selected according to the given conditions (whether costs are known) or performance demands (e.g., obtaining high accuracy or F-measure). Hence, the MOBA model is robust to variations in conditions and requirements. Compared to state-of-the-art abstention models, MOBA achieves lower expected costs when cost information is considered, and better performance-abstention trade-offs when it is not.
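To make the bounded-abstention idea concrete, the sketch below treats a scoring classifier with a reject band [low, high). It computes each class's error and abstention rates, discards threshold pairs that violate the class-dependent abstention bounds, and keeps the Pareto-optimal (non-dominated) pairs of class error rates. Note that it swaps NSGA-II for an exhaustive grid search over two thresholds, which is only practical for tiny problems. The scores, labels, grid, and bounds are hypothetical.

```python
def rates(scores, labels, low, high):
    """Per-class error and abstention rates for a reject band [low, high).

    Scores in [low, high) are rejected; otherwise predict 1 if score >= high,
    else 0. Assumes both classes (0 and 1) occur in `labels`.
    Returns (err0, err1, abstain0, abstain1).
    """
    stats = {0: [0, 0, 0], 1: [0, 0, 0]}  # label -> [count, errors, abstentions]
    for s, y in zip(scores, labels):
        c = stats[y]
        c[0] += 1
        if low <= s < high:
            c[2] += 1
        elif (1 if s >= high else 0) != y:
            c[1] += 1
    n0, n1 = stats[0][0], stats[1][0]
    return stats[0][1] / n0, stats[1][1] / n1, stats[0][2] / n0, stats[1][2] / n1


def pareto_front(scores, labels, grid, max_abstain0, max_abstain1):
    """Non-dominated (err0, err1) pairs among feasible threshold pairs."""
    feasible = []
    for low in grid:
        for high in grid:
            if high < low:
                continue
            e0, e1, a0, a1 = rates(scores, labels, low, high)
            if a0 <= max_abstain0 and a1 <= max_abstain1:  # abstention bounds
                feasible.append(((e0, e1), (low, high)))

    def dominates(q, p):
        return q[0] <= p[0] and q[1] <= p[1] and q != p

    return [(errs, th) for errs, th in feasible
            if not any(dominates(other, errs) for other, _ in feasible)]


scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
grid = [0.0, 0.3, 0.5, 0.7, 1.0]
print(pareto_front(scores, labels, grid, 0.5, 0.5))  # → [((0.0, 0.0), (0.3, 0.7))]
```

With at most half of each class allowed to abstain, only the band [0.3, 0.7) achieves zero error on both classes, so it dominates every other feasible pair. Tighter bounds would typically leave several trade-off points on the front, from which one could be picked by expected cost when costs are known.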

evolutionary algorithm, machine learning, moba model

1905.07297

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)