Performance Analysis
ETHOS: an Online Hate Speech Detection Dataset
Mollas, Ioannis, Chrysopoulou, Zoe, Karlos, Stamatis, Tsoumakas, Grigorios
Online hate speech is a newborn problem in our modern society which is growing at a steady rate, exploiting weaknesses of the corresponding regimes that characterise several social media platforms. Therefore, this phenomenon is mainly cultivated through such comments, either during users' interaction or on posted multimedia context. Nowadays, giant companies own platforms where many millions of users log in daily. Thus, protecting their users from exposure to similar phenomena, both to keep up with the corresponding law and to retain a high quality of offered services, seems mandatory. Having a robust and reliable mechanism for identifying and preventing the uploading of related material would have a huge effect on our society regarding several aspects of our daily life. On the other hand, its absence would heavily degrade the overall user experience, while its erroneous operation might raise several ethical issues. In this work, we present a protocol for creating a more suitable dataset, with regard to both its informativeness and its representativeness, favouring the safer capture of hate speech occurrence without at the same time restricting its applicability to other classification problems. Moreover, we produce and publish a textual dataset with two variants, binary and multi-label, called 'ETHOS', based on YouTube and Reddit comments validated through the Figure-Eight crowdsourcing platform. Our assumption about the production of more compatible datasets is further investigated by applying various classification models and recording their behaviour over several appropriate metrics.
NADS: Neural Architecture Distribution Search for Uncertainty Awareness
Ardywibowo, Randy, Boluki, Shahin, Gong, Xinyu, Wang, Zhangyang, Qian, Xiaoning
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data. It becomes important for ML systems in critical applications to accurately quantify their predictive uncertainty and screen out these anomalous inputs. However, existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples. Unlike standard learning tasks, there is currently no well-established guiding principle for designing OoD detection architectures that can accurately quantify uncertainty. To address these problems, we first seek to identify guiding principles for designing uncertainty-aware architectures, by proposing Neural Architecture Distribution Search (NADS). NADS searches for a distribution of architectures that perform well on a given task, allowing us to identify common building blocks among all uncertainty-aware architectures. With this formulation, we are able to optimize a stochastic OoD detection objective and construct an ensemble of models to perform OoD detection. We perform multiple OoD detection experiments and observe that NADS performs favorably, with up to 57% improvement in accuracy compared to state-of-the-art methods among 15 different testing configurations.
On mistakes we made in prior Computational Psychiatry Data driven approach projects and how they jeopardize translation of those findings in clinical practice
Radenković, Milena Čukić, Pokrajac, David, Lopez, Victoria
In this work we compare our findings on the depression detection task with the methodologies applied in the present literature. Previously we showed that when an electrophysiological signal (in this case the electroencephalogram, EEG) is characterized by nonlinear measures, any of the seven most popular classifiers yields high accuracy on the task. For every step we took in this process, we compare it with other researchers' practice and comment on other findings, mainly from the analysis of electrical signals or nonlinear analysis, showing what would be optimal for further research. We focus on discussing various mistakes and differences that could potentially lead to unwarranted optimism and other misinterpretation of results. In the Conclusion we summarize recommendations for future research so that its findings can be applied in clinical practice.
Introduction
Unlike other medical disciplines, current clinical psychiatry lacks objective biochemical or electrophysiological tests for diagnosis. To diagnose depression, a clinician will typically rely on the patient's self-report and on their own experience in applying the DSM manual, a standardized list of symptoms to be checked in every case (in order to qualify as a certain disorder). It is perfectly possible for two persons diagnosed with the same disorder to have no overlapping symptoms, and for one person to have two distinct diagnoses. If someone has more than three episodes of depression, that is considered recurrent depression (after every episode the probability of the next one doubles). This is particularly hard to treat and requires managing therapy that is ongoing throughout the person's whole life. Apart from obsolete diagnostics, all antidepressants have serious side effects, the waiting lists are very long (in the Netherlands they are between 6 and 9 months) and the therapy can last for years or even decades. It is reported that only 11-30% of patients improve in the first year of therapy (Rush et al., 2008).
A Variational Approach to Privacy and Fairness
Rodríguez-Gálvez, Borja, Thobaben, Ragnar, Skoglund, Mikael
In this article, we propose a new variational approach to learn private and/or fair representations. This approach is based on the Lagrangians of a new formulation of the privacy and fairness optimization problems that we propose. In this formulation, we aim at generating representations of the data that keep a prescribed level of the relevant information that is not shared by the private or sensitive data, while minimizing the remaining information they keep. The proposed approach (i) exhibits the similarities of the privacy and fairness problems, (ii) allows us to control the trade-off between utility and privacy or fairness through the Lagrange multiplier parameter, and (iii) can be readily incorporated into common representation learning algorithms such as the VAE, the $\beta$-VAE, the VIB, or the nonlinear IB.
ClarQ: A large-scale and diverse dataset for Clarification Question Generation
Kumar, Vaibhav, Black, Alan W.
Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification questions. In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from stackexchange. The framework utilises a neural network based architecture for classifying clarification questions. It is a two-step method where the first step aims to increase the precision of the classifier and the second aims to increase its recall. We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering. The final dataset, ClarQ, consists of ~2M examples distributed across 173 domains of stackexchange. We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
Coronavirus: How air passengers can stay safe
Thermal-imaging cameras and swab tests for coronavirus are not "clinically valuable" in airports, according to a panel of aviation health experts. About one in every three infectious people would be missed, they say. Air systems and low humidity on planes already reduce virus spread through the cabin. But passengers should wear face coverings at all times, board and disembark one row at a time and be seated apart from others if possible. And those seated at the back should be the first on and last off.
Interpretable Random Forests via Rule Extraction
Bénard, Clément, Biau, Gérard, da Veiga, Sébastien, Scornet, Erwan
We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm which takes the form of a short and simple list of rules. State-of-the-art learning algorithms are often referred to as "black boxes" because of the high number of operations involved in their prediction process. Despite their powerful predictivity, this lack of interpretability may be highly restrictive for applications with critical decisions at stake. On the other hand, algorithms with a simple structure (typically decision trees, rule algorithms, or sparse linear models) are well known for their instability. This undesirable feature makes the conclusions of the data analysis unreliable and turns out to be a strong operational limitation. This motivates the design of SIRUS, which combines a simple structure with a remarkably stable behavior when data is perturbed. The algorithm is based on random forests, the predictive accuracy of which is preserved. We demonstrate the efficiency of the method both empirically (through experiments) and theoretically (with the proof of its asymptotic stability). Our R/C++ software implementation sirus is available from CRAN.
Fair Data Integration
Galhotra, Sainyam, Shanmugam, Karthikeyan, Sattigeri, Prasanna, Varshney, Kush R.
The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.
SANOM Results for OAEI 2019
Mohammadi, Majid, Atashin, Amir Ahooye, Hofman, Wout, Tan, Yao-Hua
Simulated annealing-based ontology matching (SANOM) participates for the second time at the ontology alignment evaluation initiative (OAEI) 2019. This paper contains the configuration of SANOM and its results on the anatomy and conference tracks. In comparison to the OAEI 2017, SANOM has improved significantly, and its results are competitive with the state-of-the-art systems. In particular, SANOM has the highest recall rate among the participating systems in the conference track, and is competitive with AML, the best performing system, in terms of F-measure. SANOM is also competitive with LogMap on the anatomy track, which is the best performing system in this track with no usage of particular biomedical background knowledge. SANOM has been adapted to the HOBBIT platform and is now available for the registered users.
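As a rough illustration of the general idea behind simulated annealing for matching, the sketch below searches for a one-to-one alignment between two small entity sets. It is a generic textbook-style annealer and not SANOM's actual configuration: the similarity matrix `sim`, the swap move, the cooling schedule, and the function name `anneal_alignment` are all assumptions made for this example.

```python
import math
import random

def anneal_alignment(sim, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Search for a one-to-one alignment maximising total similarity.

    sim[i][j] is an assumed similarity score between entity i of the first
    ontology and entity j of the second (square matrix, at least 2 x 2).
    """
    rng = random.Random(seed)
    n = len(sim)
    perm = list(range(n))  # perm[i] = entity currently matched to entity i
    rng.shuffle(perm)
    score = sum(sim[i][perm[i]] for i in range(n))
    best, best_score = perm[:], score
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        # Change in total similarity if the partners of i and j are swapped.
        delta = (sim[i][perm[j]] + sim[j][perm[i]]
                 - sim[i][perm[i]] - sim[j][perm[j]])
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            perm[i], perm[j] = perm[j], perm[i]
            score += delta
            if score > best_score:
                best, best_score = perm[:], score
        temp *= cooling  # geometric cooling (an illustrative choice)
    return best, best_score
```

Calling `anneal_alignment(sim)` returns the best alignment seen during the search together with its total similarity. A real matcher would also need a way to leave entities unmatched and a principled similarity measure, both of which are outside this sketch.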
Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors
Wang, Longshaokan, Fazel-Zarandi, Maryam, Tiwari, Aditya, Matsoukas, Spyros, Polymenakos, Lazaros
Speech-based virtual assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR) and feed the text to downstream dialog models for natural language understanding and response generation. The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors during inference time. To bridge the gap and make dialog models more robust to ASR errors, we leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data. Compared to other approaches for handling ASR errors, such as using ASR lattice or end-to-end methods, our data augmentation approach does not require any modification to the ASR or downstream dialog models; our approach also does not introduce any additional latency during inference time. We perform extensive experiments on benchmark data and show that our approach improves the performance of downstream dialog models in the presence of ASR errors, and it is particularly effective in low-resource situations where there are constraints on model size or the training data is scarce.
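The augmentation idea described above can be sketched with a toy word-level noise injector. This is only an illustration under stated assumptions: the abstract does not describe the authors' ASR error simulator, so the confusion table `CONFUSIONS`, the probabilities, and the helper names `inject_asr_noise` and `augment` are hypothetical.

```python
import random

# Hypothetical confusion table of words an ASR system might mix up.
# The pairs are illustrative only and are not taken from the paper.
CONFUSIONS = {
    "two": ["to", "too"],
    "four": ["for"],
    "weather": ["whether"],
    "play": ["pray"],
}

def inject_asr_noise(text, sub_prob=0.3, del_prob=0.1, rng=None):
    """Return a noisy copy of `text` with simulated word-level ASR errors."""
    if rng is None:
        rng = random.Random()
    noisy = []
    for word in text.split():
        r = rng.random()
        if r < del_prob:
            continue  # deletion error: drop the word
        if r < del_prob + sub_prob and word in CONFUSIONS:
            noisy.append(rng.choice(CONFUSIONS[word]))  # substitution error
        else:
            noisy.append(word)  # word passes through unchanged
    return " ".join(noisy)

def augment(utterances, copies=2, seed=0):
    """Keep each clean utterance and add `copies` noisy variants of it."""
    rng = random.Random(seed)
    augmented = []
    for utt in utterances:
        augmented.append(utt)
        augmented.extend(inject_asr_noise(utt, rng=rng) for _ in range(copies))
    return augmented
```

The output of `augment` would then be used as training text for the dialog model in place of the clean utterances alone, which leaves the ASR system and the model architecture untouched.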