 Accuracy


Loss-calibrated expectation propagation for approximate Bayesian decision-making

arXiv.org Machine Learning

Approximate Bayesian inference methods provide a powerful suite of tools for finding approximations to intractable posterior distributions. However, machine learning applications typically involve selecting actions, which -- in a Bayesian setting -- depend on the posterior distribution only via its contribution to expected utility. A growing body of work on loss-calibrated approximate inference methods has therefore sought to develop posterior approximations sensitive to the influence of the utility function. Here we introduce loss-calibrated expectation propagation (Loss-EP), a loss-calibrated variant of expectation propagation. This method resembles standard EP with an additional factor that "tilts" the posterior towards higher-utility decisions. We show applications to Gaussian process classification under binary utility functions with asymmetric penalties on False Negative and False Positive errors, and show how this asymmetry can have dramatic consequences on what information is "useful" to capture in an approximation.
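To make the role of the asymmetric penalties concrete, here is a minimal sketch of the plain Bayesian decision rule for a binary classifier. It is not Loss-EP itself; it only shows how unequal costs for False Negative and False Positive errors shift the decision. The function names and the cost values are hypothetical, chosen for illustration.

```python
def expected_loss(p_positive, action, cost_fn, cost_fp):
    """Expected loss of predicting `action` (1 or 0) when P(y=1) = p_positive."""
    if action == 1:
        # Predicting positive is wrong only when the true label is negative.
        return (1.0 - p_positive) * cost_fp
    # Predicting negative is wrong only when the true label is positive.
    return p_positive * cost_fn


def bayes_action(p_positive, cost_fn, cost_fp):
    """Pick the action with the lower expected loss (ties go to positive)."""
    loss_pos = expected_loss(p_positive, 1, cost_fn, cost_fp)
    loss_neg = expected_loss(p_positive, 0, cost_fn, cost_fp)
    return 1 if loss_pos <= loss_neg else 0


def decision_threshold(cost_fn, cost_fp):
    """Probability above which predicting positive minimizes expected loss."""
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical costs: a missed positive is nine times worse than a false alarm.
print(decision_threshold(cost_fn=9.0, cost_fp=1.0))
print(bayes_action(0.2, cost_fn=9.0, cost_fp=1.0))
print(bayes_action(0.2, cost_fn=1.0, cost_fp=1.0))
```

With symmetric costs the threshold sits at 0.5, so only the side of 0.5 on which the posterior probability falls matters. With asymmetric costs the threshold moves, and an approximation has to be accurate in a different region of the posterior.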


Uncovering the Source of Machine Bias

arXiv.org Machine Learning

We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find that two types of gender bias, preference-based bias and belief-based bias, are present in human evaluators' decisions. Both types of bias favor female applicants. Through counterfactual simulations, we quantify the effect of gender bias on loan granting outcomes and the welfare of the company and the borrowers. Our results imply that both the preference-based bias and the belief-based bias reduce the company's profits. When the preference-based bias is removed, the company earns more profits. When the belief-based bias is removed, the company's profits also increase. Both increases result from raising the approval probability for borrowers, especially male borrowers, who eventually pay back loans. For borrowers, the elimination of either bias decreases the gender gap in the true positive rates of the credit risk evaluation. We also train machine learning algorithms on both the real-world data and the data from the counterfactual simulations. We compare the decisions made by those algorithms to see how evaluators' biases are inherited by the algorithms and reflected in machine-based decisions. We find that machine learning algorithms can mitigate both the preference-based bias and the belief-based bias.


Weak Supervision for Affordable Modeling of Electrocardiogram Data

arXiv.org Artificial Intelligence

Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies using Machine Learning to automatically detect abnormal heartbeats so far depend on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. We explore the use of multiple weak supervision sources to learn diagnostic models of abnormal heartbeats via human-designed heuristics, without using ground truth labels on individual data points. Our work is among the first to define weak supervision sources directly on time series data. Results show that with as few as six intuitive time series heuristics, we are able to infer high-quality probabilistic label estimates for over 100,000 heartbeats with little human effort, and use the estimated labels to train competitive classifiers evaluated on held-out test data. Automatic analysis of electrocardiograms (ECGs) promises substantial improvements in critical care.


Evaluation metrics: leave your comfort zone and try MCC and Brier Score

#artificialintelligence

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply that knowledge across a broad range of application domains. Machine learning, by contrast, is the study of computer algorithms that improve automatically through experience and the use of data. It is seen as a part of artificial intelligence. Data scientists around the world apply machine learning to build models that predict future events, cluster people or objects into similar groups, and identify unexpected anomalies. Every enthusiastic data scientist knows that the most exciting part of machine learning is choosing the coolest algorithm capable of solving the problem of interest (supervised or unsupervised).
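The excerpt stops before it reaches the two metrics named in the title, so the following is only a sketch of their standard textbook definitions, not code from the article. It assumes binary labels encoded as 0 and 1; the function names `mcc` and `brier_score` are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted P(y=1) and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
print(brier_score([1, 0], [0.8, 0.3]))
```

MCC scores hard predictions and uses all four cells of the confusion matrix, while the Brier score evaluates predicted probabilities directly, so lower is better for Brier and higher is better for MCC.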


Implementing Naive Bayes From Scratch

#artificialintelligence

As stated in the general overview, we need to calculate the summary statistics for each class (and feature) as well as the prior. First of all, we need to gather some basic information about the dataset and create three zero-matrices to store the mean, the variance, and the prior for each class. Next, we iterate over all the classes, compute the statistics and update our zero-matrices accordingly. For example, assume we have two unique classes (0, 1) and two features in our dataset. The matrix storing the mean values will therefore have two rows and two columns (2x2). The prior is just a single vector (1x2), containing for each class the number of that class's samples divided by the total sample size.
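The fitting step described above can be sketched as follows. This is not the article's own code: it uses plain Python lists instead of a numerical library so that it runs without dependencies, the function name `fit_summary_statistics` is hypothetical, and the variance is assumed to be the population variance (dividing by the class sample count).

```python
def fit_summary_statistics(X, y):
    """Compute per-class feature means, variances and class priors."""
    classes = sorted(set(y))
    n_samples = len(X)
    n_features = len(X[0])
    n_classes = len(classes)

    # Zero-initialised containers: one row per class, one column per feature.
    mean = [[0.0] * n_features for _ in range(n_classes)]
    var = [[0.0] * n_features for _ in range(n_classes)]
    prior = [0.0] * n_classes

    for idx, c in enumerate(classes):
        # Select the rows that belong to the current class.
        rows = [x for x, label in zip(X, y) if label == c]
        for j in range(n_features):
            column = [row[j] for row in rows]
            mu = sum(column) / len(column)
            mean[idx][j] = mu
            # Population variance (assumption: no Bessel correction).
            var[idx][j] = sum((v - mu) ** 2 for v in column) / len(column)
        # Prior: share of the samples that belong to this class.
        prior[idx] = len(rows) / n_samples

    return classes, mean, var, prior


# Two classes and two features, so mean and var are 2x2 and prior has length 2.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 10.0]]
y = [0, 0, 1, 1]
classes, mean, var, prior = fit_summary_statistics(X, y)
print(mean, var, prior)
```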


Knowledge Tracing: A Survey

arXiv.org Artificial Intelligence

Humans' ability to transfer knowledge through teaching is one of the essential aspects of human intelligence. A human teacher can track the knowledge of students to tailor the teaching to students' needs. With the rise of online education platforms, there is a similar need for machines to track the knowledge of students and tailor their learning experience. This is known as the Knowledge Tracing (KT) problem in the literature. Effectively solving the KT problem would unlock the potential of computer-aided education applications such as intelligent tutoring systems, curriculum learning, and learning materials' recommendation. Moreover, from a more general viewpoint, a student may represent any kind of intelligent agent, including both human and artificial agents. Thus, the potential of KT can be extended to any machine teaching application scenario that seeks to customize the learning experience for a student agent (i.e., a machine learning model). In this paper, we provide a comprehensive and systematic review of the KT literature. We cover a broad range of methods starting from the early attempts to the recent state-of-the-art methods using deep learning, while highlighting the theoretical aspects of models and the characteristics of benchmark datasets. Besides these, we shed light on key modelling differences between closely related methods and summarize them in an easy-to-understand format. Finally, we discuss current research gaps in the KT literature and possible future research and application directions.


LoMar: A Local Defense Against Poisoning Attack on Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) provides a highly efficient decentralized machine learning framework, where the training data remains distributed at remote clients in a network. Though FL enables a privacy-preserving mobile edge computing framework using IoT devices, recent studies have shown that this approach is susceptible to poisoning attacks from the side of remote clients. To address the poisoning attacks on FL, we provide a two-phase defense algorithm called Local Malicious Factor (LoMar). In phase I, LoMar scores model updates from each remote client by measuring the relative distribution over their neighbors using a kernel density estimation method. In phase II, an optimal threshold is approximated to distinguish malicious and clean updates from a statistical perspective. Comprehensive experiments on four real-world datasets have been conducted, and the experimental results show that our defense strategy can effectively protect the FL system. Specifically, the defense performance on the Amazon dataset under a label-flipping attack indicates that, compared with FG+Krum, LoMar increases the target label testing accuracy from 96.0% to 98.8%, and the overall averaged testing accuracy from 90.1% to 97.0%.


Fake Hilsa Fish Detection Using Machine Vision

arXiv.org Artificial Intelligence

Hilsa is the national fish of Bangladesh. Bangladesh earns a lot of foreign currency by exporting this fish. Unfortunately, in recent days, some unscrupulous businessmen have been selling fake Hilsa fish to gain profit. Sardines and Sardinella are the fish most often sold in the market as Hilsa. A government agency of Bangladesh, the Bangladesh Food Safety Authority, has said that these fake Hilsa fish contain high levels of cadmium and lead, which are detrimental to humans. In this research, we propose a method that can readily distinguish original Hilsa fish from fake Hilsa fish. Based on the literature available online, we are the first to do research on identifying original Hilsa fish. We have collected more than 16,000 images of original and counterfeit Hilsa fish. To classify these images, we used several deep learning-based models and compared their performance. Among those models, DenseNet201 achieved the highest accuracy of 97.02%.


AnomMAN: Detect Anomaly on Multi-view Attributed Networks

arXiv.org Artificial Intelligence

Anomaly detection on attributed networks is widely used in web shopping, financial transactions, communication networks, and so on. However, most work tries to detect anomalies on attributed networks by considering only a single kind of interaction action, and so cannot account for the rich variety of interaction actions in multi-view attributed networks. In fact, it remains a challenging task to consider all different kinds of interaction actions uniformly and detect anomalous instances in multi-view attributed networks. In this paper, we propose a Graph Convolution based framework, AnomMAN, to detect Anomaly on Multi-view Attributed Networks. To consider the attributes and all interaction actions jointly, we use the attention mechanism to define the importance of all views in networks. Besides, the Graph Convolution operation cannot be simply applied in anomaly detection tasks on account of its low-pass characteristic. Therefore, AnomMAN uses a graph auto-encoder module to overcome this shortcoming and turn it into a strength. According to experiments on real-world datasets, AnomMAN outperforms state-of-the-art models and two variants of our proposed model. Besides, the Accuracy@50 indicator of AnomMAN reaches 1.000 on the dataset, which shows that the top 50 anomalous instances detected by AnomMAN are all anomalous ones.
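As a rough illustration of the Accuracy@50 figure, the sketch below computes an Accuracy@K style metric under the assumption that it means the fraction of the K highest-scored instances that are truly anomalous, which matches the interpretation given in the abstract. The function name `accuracy_at_k` and the toy scores are hypothetical; the paper's exact definition may differ.

```python
def accuracy_at_k(scores, labels, k):
    """Fraction of the k highest-scored instances whose label is 1 (anomalous)."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of instances")
    # Rank instance indices by anomaly score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / k


# Toy data: higher score means "more anomalous"; label 1 marks a true anomaly.
scores = [0.9, 0.1, 0.8, 0.3]
labels = [1, 0, 1, 0]
print(accuracy_at_k(scores, labels, k=2))
print(accuracy_at_k(scores, labels, k=4))
```

A value of 1.000 at K = 50 under this definition says nothing about anomalies ranked below the top 50, so it is usually read alongside a ranking metric computed over the whole dataset.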


Experts warn prenatal screening tests can lead to false positive results in some cases

FOX News

Non-invasive prenatal testing (NIPT) on pregnant women to detect the risk of a fetus having rare genetic abnormalities may often be wrong, according to recent reports. These tests, according to multiple health experts, can actually give false positives, which can create significant angst in expecting parents. Health experts explained to Fox News that NIPT works by taking blood samples from the pregnant mother and then analyzing fragments of free-floating cell-free DNA (cfDNA).