AITopics

doi: 10.1093/bib/bbac353

2201.09508

Country: Europe (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Artificial Intelligence, Jul-5-2022

Local Multi-Label Explanations for Random Forest

Mylonas, Nikolaos, Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios

Multi-label classification is a challenging task, particularly in domains where the number of labels to be predicted is large. Deep neural networks are often effective at multi-label classification of images and textual data. When dealing with tabular data, however, conventional machine learning algorithms, such as tree ensembles, appear to outperform the competition. Random forest, being a popular ensemble algorithm, has found use in a wide range of real-world problems. Such problems include fraud detection in the financial domain, crime hotspot detection in the legal sector, and, in the biomedical field, disease probability prediction when patient records are accessible. Since they have an impact on people's lives, these domains usually require decision-making systems to be explainable. Random forest falls short on this property, especially when a large number of tree predictors are used. This issue was addressed for single-label classification and regression in recent work named LionForests. In this work, we adapt this technique to multi-label classification problems, employing three different strategies regarding the labels that the explanation covers. Finally, we provide a set of qualitative and quantitative experiments to assess the efficacy of this approach.

artificial intelligence, explanation, machine learning

doi: 10.1007/978-3-031-23618-1_25

2207.01994

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Apr-13-2021

Conclusive Local Interpretation Rules for Random Forests

Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios

In critical situations involving discrimination, gender inequality, economic damage, and even the possibility of casualties, machine learning models must be able to provide clear interpretations for their decisions. Otherwise, their obscure decision-making processes can lead to socioethical issues as they interfere with people's lives. Random forest algorithms thrive in these sectors, so their ability to explain themselves is an obvious requirement. In this paper, we present LionForests, which builds on our preliminary work. LionForests is a random forest-specific interpretation technique that provides rules as explanations. It is applicable to binary classification, multi-class classification, and regression tasks, and it is supported by a stable theoretical background. We also perform experiments, including sensitivity analysis and comparison with state-of-the-art techniques, to demonstrate the efficacy of our contribution. Finally, we highlight a unique property of LionForests, called conclusiveness, which ensures the validity of its interpretations and distinguishes it from previous techniques.

decision tree learning, health & medicine, prediction

2104.0604

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Banking & Finance (0.67)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
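The LionForests abstract above describes a random forest-specific technique that provides rules as explanations. As a rough illustration of the general idea of turning tree paths into one rule, the sketch below intersects the split conditions of several root-to-leaf paths (one per tree that voted for the prediction) into a single conjunction of per-feature ranges. It is a minimal sketch under stated assumptions: the path format `(feature, is_le, threshold)` and the helper names `merge_paths` and `format_rule` are hypothetical, the paths are hard-coded toy data rather than read from a trained forest, and LionForests' own path-reduction steps are not shown.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical path format: (feature_name, is_le, threshold).
# is_le=True encodes "feature <= threshold"; is_le=False encodes "feature > threshold".
Condition = Tuple[str, bool, float]


def merge_paths(paths: List[List[Condition]]) -> Dict[str, Tuple[float, float]]:
    """Intersect the conditions of several root-to-leaf paths into one
    conjunction of per-feature intervals (low, high]."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for path in paths:
        for feature, is_le, threshold in path:
            low, high = ranges.get(feature, (-inf, inf))
            if is_le:
                high = min(high, threshold)  # tighten the upper bound
            else:
                low = max(low, threshold)  # tighten the lower bound
            ranges[feature] = (low, high)
    return ranges


def format_rule(ranges: Dict[str, Tuple[float, float]], prediction: str) -> str:
    """Render the merged intervals as a human-readable if-then rule."""
    parts = []
    for feature in sorted(ranges):
        low, high = ranges[feature]
        if low == -inf:
            parts.append(f"{feature} <= {high}")
        elif high == inf:
            parts.append(f"{feature} > {low}")
        else:
            parts.append(f"{low} < {feature} <= {high}")
    return "if " + " and ".join(parts) + f" then {prediction}"


# Toy paths followed by one instance in three trees that all voted "approved".
paths = [
    [("age", False, 30.0), ("income", True, 50000.0)],
    [("age", False, 35.0)],
    [("income", True, 42000.0), ("age", True, 60.0)],
]
print(format_rule(merge_paths(paths), "approved"))
# prints: if 35.0 < age <= 60.0 and income <= 42000.0 then approved
```

Because every path is satisfied by the same instance, the intersection can never be empty: the instance itself lies inside every merged interval.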

arXiv.org Artificial Intelligence, Oct-15-2020

Altruist: Argumentative Explanations through Local Interpretations of Predictive Models

Mollas, Ioannis, Bassiliades, Nick, Tsoumakas, Grigorios

Interpretable machine learning is an emerging field that provides ways to gain insight into the rationale of machine learning models. It has been put on the map of machine learning by suggesting ways to tackle key ethical and societal issues. However, existing interpretable machine learning techniques are far from being comprehensible and explainable to the end user. Another key issue in this field is the lack of evaluation and selection criteria, which makes it difficult for end users to choose the most appropriate interpretation technique for their use. In this study, we introduce a meta-explanation methodology that provides truthful interpretations, in terms of feature importance, to the end user through argumentation. At the same time, this methodology can be used as an evaluation or selection tool for multiple interpretation techniques based on feature importance.

decision tree learning, interpretation, neural network

2010.0765

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
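The Altruist abstract above speaks of truthful interpretations in terms of feature importance. One simple way to operationalise such a truthfulness check, shown here only as an assumption-laden sketch and not as Altruist's actual argumentation procedure, is to nudge each feature up and down by a hypothetical step `delta` and test whether the model's output moves in the direction implied by the sign of that feature's importance. The function name `untruthful_features` and the toy linear model are illustrative.

```python
from typing import Callable, List, Sequence


def untruthful_features(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    importances: Sequence[float],
    delta: float = 1.0,
) -> List[int]:
    """Return the indices of features whose importance sign disagrees with the
    model's response when that feature alone is increased or decreased by delta.
    Zero-importance features are skipped (there is no direction to check)."""
    base = predict(list(x))
    flagged = []
    for i, weight in enumerate(importances):
        if weight == 0:
            continue
        up = list(x)
        up[i] += delta
        down = list(x)
        down[i] -= delta
        change_up = predict(up) - base
        change_down = predict(down) - base
        sign = 1 if weight > 0 else -1
        # A positive importance should not lower the output when the feature rises,
        # nor raise it when the feature falls (and symmetrically for negative ones).
        if sign * change_up < 0 or sign * change_down > 0:
            flagged.append(i)
    return flagged


def model(v: List[float]) -> float:
    # Toy linear model: output rises with v[0] and falls with v[1].
    return 2.0 * v[0] - 3.0 * v[1]


# The second importance has the wrong sign for this model, so index 1 is flagged.
print(untruthful_features(model, [1.0, 1.0], [0.5, 0.4]))  # prints: [1]
```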

arXiv.org Machine Learning, Jun-11-2020

ETHOS: an Online Hate Speech Detection Dataset

Mollas, Ioannis, Chrysopoulou, Zoe, Karlos, Stamatis, Tsoumakas, Grigorios

Online hate speech is a recent problem in modern society that is growing at a steady rate, exploiting weaknesses in the regimes that govern several social media platforms. The phenomenon is mainly cultivated through comments, either during user interactions or on posted multimedia content. Nowadays, giant companies own platforms where many millions of users log in daily. Protecting these users from exposure to such phenomena therefore seems mandatory, both to comply with the relevant laws and to maintain a high quality of service. A robust and reliable mechanism for identifying such material and preventing its upload would have a substantial effect on many aspects of daily life. Conversely, its absence would seriously degrade the overall user experience, while its erroneous operation might raise several ethical issues. In this work, we present a protocol for creating a more suitable dataset, in terms of both informativeness and representativeness, that favours the safer capture of hate speech occurrences without restricting its applicability to other classification problems. Moreover, we produce and publish a textual dataset called 'ETHOS', with binary and multi-label variants, based on YouTube and Reddit comments validated through the Figure Eight crowdsourcing platform. We further investigate our assumption that this protocol produces more suitable datasets by applying various classification models and recording their behaviour on several appropriate metrics.

crowdsourcing, dataset, deep learning

2006.08328

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Media > News (0.49)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning, May-19-2020

Multi-Label Sampling based on Local Label Imbalance

Liu, Bin, Blekas, Konstantinos, Tsoumakas, Grigorios

Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods. One efficient and flexible strategy to deal with this problem is to employ sampling techniques before training a multi-label learning model. Although existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation. To address this issue, we propose a novel measure to assess the local label imbalance of multi-label datasets, as well as two multi-label sampling approaches based on the local label imbalance, namely MLSOL and MLUL. By considering all informative labels, MLSOL creates more diverse and better labeled synthetic instances for difficult examples, while MLUL eliminates instances that are harmful to their local region. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.

artificial intelligence, data mining, dataset

2005.0324

Country: Europe > Greece (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
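The MLSOL abstract above proposes a measure of local label imbalance without spelling out its definition. The sketch below shows one kNN-based formulation, assumed here for illustration: for a given label, each instance holding that label's minority value is scored by the fraction of its k nearest neighbours (Euclidean distance) whose value for the label differs from its own, and majority-value instances get no score. The helper names and the choice of Euclidean distance are assumptions, and the paper's exact definition may differ.

```python
import math
from typing import List, Optional, Sequence


def k_nearest(X: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i itself."""
    by_distance = sorted((math.dist(X[i], X[m]), m) for m in range(len(X)) if m != i)
    return [m for _, m in by_distance[:k]]


def local_imbalance(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]], j: int, k: int
) -> List[Optional[float]]:
    """For label j, score each minority-value instance by the share of its k nearest
    neighbours holding the opposite value; return None for majority-value instances."""
    positives = sum(row[j] for row in Y)
    minority = 1 if positives <= len(Y) - positives else 0
    scores: List[Optional[float]] = []
    for i in range(len(X)):
        if Y[i][j] != minority:
            scores.append(None)
            continue
        neighbours = k_nearest(X, i, k)
        differing = sum(1 for m in neighbours if Y[m][j] != Y[i][j])
        scores.append(differing / k)
    return scores


# Toy data: 7 instances, 2 features, 1 label (3 positives, so 1 is the minority value).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [6.0, 6.0]]
Y = [[1], [0], [0], [1], [1], [0], [0]]
print(local_imbalance(X, Y, j=0, k=2))
# prints: [1.0, None, None, 0.5, 0.5, None, None]
```

Instance 0 scores 1.0 because it is a minority example surrounded only by majority neighbours, the kind of "difficult example" the abstract says matters more than global imbalance.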

arXiv.org Machine Learning, Jun-15-2019

LioNets: Local Interpretation of Neural Networks through Penultimate Layer Decoding

Mollas, Ioannis, Bassiliades, Nikolaos, Tsoumakas, Grigorios

Technological breakthroughs in smart homes, self-driving cars, health care and robotic assistants, along with strengthened legal regulations, have critically influenced academic research on explainable machine learning. A considerable number of researchers have developed ways to explain any black-box classification model in a model-agnostic manner. A drawback of building agnostic explainers is that the neighbourhood generation process is universal and consequently does not guarantee true adjacency between the generated neighbours and the instance. This paper explores a methodology for explaining a neural network's decisions in a local scope, through a neighbourhood generation process that actively takes the neural network's architecture into consideration and thereby ensures adjacency between the generated neighbours and the instance.

artificial intelligence, ground transportation, neural network

1906.06566

Country: Europe > Greece (0.14)

Genre: Research Report (1.00)

Industry:

Transportation (0.76)
Information Technology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, May-2-2019

Synthetic Oversampling of Multi-Label Data based on Local Label Distribution

Liu, Bin, Tsoumakas, Grigorios

Class imbalance is an inherent characteristic of multi-label data that affects the prediction accuracy of most multi-label learning methods. One efficient strategy to deal with this problem is to employ resampling techniques before training the classifier. Existing multi-label sampling methods alleviate the (global) imbalance of multi-label datasets. However, performance degradation is mainly due to rare subconcepts and class overlap, which can be analysed by looking at the local characteristics of the minority examples rather than at the imbalance of the whole dataset. We propose a new method for synthetic oversampling of multi-label data that focuses on the local label distribution to generate more diverse and better labeled instances. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed approach across a variety of evaluation measures, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.

artificial intelligence, machine learning, multi-label data

1905.00609

Country:

Europe > Greece (0.28)
Asia (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
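The oversampling abstract above generates synthetic multi-label instances around minority examples. A common way to create the synthetic feature vector, and the one assumed here, is SMOTE-style interpolation between a seed instance and a reference neighbour. The sketch below shows only that step; its label assignment (copy the label set of the nearer endpoint) is a deliberately simple placeholder, not the local-label-distribution rule the paper proposes, and the function name `synthesize` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def synthesize(
    seed_x: Sequence[float],
    seed_y: Sequence[int],
    ref_x: Sequence[float],
    ref_y: Sequence[int],
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Create one synthetic instance on the segment between a seed and a reference
    neighbour. Features are interpolated SMOTE-style; the label set is copied from
    whichever endpoint the new point lies closer to (a placeholder labelling rule)."""
    gap = rng.random()  # position along the segment, in [0, 1)
    x = [s + gap * (r - s) for s, r in zip(seed_x, ref_x)]
    y = list(seed_y) if gap < 0.5 else list(ref_y)
    return x, y


rng = random.Random(0)
new_x, new_y = synthesize([0.0, 0.0], [1, 0, 1], [1.0, 2.0], [1, 1, 0], rng)
print(new_x, new_y)
```

Interpolating between an instance and one of its neighbours keeps the synthetic point inside the local region, which is where the abstract locates the real difficulty; a label rule that looks at the neighbourhood rather than a single endpoint is what would make the generated labels "better".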

arXiv.org Machine Learning, Jul-31-2018

Making Classifier Chains Resilient to Class Imbalance

Liu, Bin, Tsoumakas, Grigorios

Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with only a small number of training examples, far fewer than the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, to make better use of the majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches across a variety of evaluation metrics.

artificial intelligence, classifier chain, inductive learning

1807.11393

Country: Europe > Greece (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
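The classifier-chain abstract above first couples ECC with random undersampling. A minimal sketch of that coupling for a single label, assuming binary 0/1 label vectors: keep every example holding the label's minority value and draw an equally sized random sample from the majority, giving the balanced training set for that label's binary model. The function name is hypothetical, and the chain-specific details (earlier labels added as features, and the varying number of models and chain sizes of the two extensions) are not shown.

```python
import random
from typing import List, Sequence


def undersample_for_label(Y: Sequence[Sequence[int]], j: int, rng: random.Random) -> List[int]:
    """Indices of a balanced training set for the binary model of label j: all
    minority-value examples plus an equal-sized random sample of majority-value ones."""
    positives = [i for i, row in enumerate(Y) if row[j] == 1]
    negatives = [i for i, row in enumerate(Y) if row[j] == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = minority + rng.sample(majority, len(minority))  # sample without replacement
    return sorted(kept)


# Label 0 is positive for 2 of 6 examples, so 2 of the 4 negatives are kept.
Y = [[1, 0], [0, 1], [0, 1], [0, 0], [1, 1], [0, 1]]
print(undersample_for_label(Y, 0, random.Random(42)))
```

Because each label draws its own sample, different binary models in the ensemble see different majority examples; the extensions in the abstract aim to exploit more of those discarded majority examples within roughly the same budget.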

arXiv.org Machine Learning, Sep-16-2017

Subset Labeled LDA for Large-Scale Multi-Label Classification

Papanikolaou, Yannis, Tsoumakas, Grigorios

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm to multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, LLDA encounters scalability issues as label set sizes increase. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves on the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.

algorithm, artificial intelligence, text processing

1709.0548

Country:

Europe (0.46)
North America > United States (0.29)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)