 Accuracy


Target-Focused Feature Selection Using a Bayesian Approach

arXiv.org Machine Learning

In many real-world scenarios where data is high-dimensional, test-time acquisition of features is a non-trivial task because of the costs associated with acquiring features and evaluating their value. The need for highly confident models with an extremely frugal acquisition of features can be addressed by allowing a feature selection method to become target aware. We introduce an approach to feature selection that is based on Bayesian learning, allowing us to report target-specific levels of uncertainty, false-positive rates, and false-negative rates. In addition, measuring uncertainty lifts the restriction on feature selection being target agnostic, allowing for feature acquisition based on a single target of focus out of many. We show that acquiring features for a specific target is at least as good as common linear feature selection approaches for small non-sparse datasets, and surpasses these when faced with real-world healthcare data that is larger in scale and sparser.


Identifying Malicious Players in GWAP-based Disaster Monitoring Crowdsourcing System

arXiv.org Artificial Intelligence

Disaster monitoring is challenging due to the lack of infrastructure in monitoring areas. Based on the theory of Game-With-A-Purpose (GWAP), this paper contributes a novel large-scale crowdsourcing disaster monitoring system. The system analyzes tagged satellite pictures from anonymous players, and then reports aggregated and evaluated monitoring results to its stakeholders. An algorithm based on directed graph centralities is presented to address the core issues of malicious user detection and disaster level calculation. Our method can be easily applied in other human computation systems. Finally, some open issues and possible solutions are discussed as future work.
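As a rough illustration of how a directed-graph centrality could flag suspicious players, the sketch below computes normalised in-degree centrality on a hypothetical "agreement" graph, where an edge (u, v) means that player u's tag matched one of player v's tags. The graph construction, the choice of in-degree centrality, the threshold and all names are assumptions made for illustration; the abstract does not specify the paper's actual algorithm.

```python
from collections import defaultdict


def in_degree_centrality(edges, nodes):
    """Normalised in-degree: fraction of other players whose tags agree with a player."""
    counts = defaultdict(int)
    for src, dst in set(edges):  # ignore duplicate edges
        if src != dst:           # ignore self-loops
            counts[dst] += 1
    denom = max(len(nodes) - 1, 1)
    return {node: counts[node] / denom for node in nodes}


def flag_suspicious(edges, nodes, threshold):
    """Return players whose centrality falls below the (hypothetical) threshold."""
    centrality = in_degree_centrality(edges, nodes)
    return sorted(node for node in nodes if centrality[node] < threshold)


# Toy data: a, b and c corroborate each other; nobody corroborates m.
players = ["a", "b", "c", "m"]
agreements = [
    ("a", "b"), ("b", "a"),
    ("a", "c"), ("c", "a"),
    ("b", "c"), ("c", "b"),
    ("m", "a"),
]
suspects = flag_suspicious(agreements, players, threshold=0.5)
```

In this toy graph only player m receives no corroborating edges, so it is the only one flagged. A real system would need a principled threshold and a richer centrality measure.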


Instagram Fake and Automated Account Detection

arXiv.org Machine Learning

Fake engagement, which is used to inflate the popularity of an account inorganically, is one of the significant problems in Online Social Networks (OSNs). Detecting fake engagement is crucial because it leads to loss of money for businesses, wrong audience targeting in advertising, wrong product prediction systems, and an unhealthy social network environment. This study addresses the detection of fake and automated accounts, which lead to fake engagement on Instagram. As far as we know, there is no publicly available dataset for fake and automated accounts. For this purpose, two datasets have been generated for the detection of fake and automated accounts. To detect these accounts, machine learning algorithms such as Naive Bayes, logistic regression, support vector machines and neural networks are applied. Additionally, for the detection of automated accounts, a cost-sensitive genetic algorithm is applied because of the unnatural bias in the dataset. To deal with the unevenness problem in the fake dataset, the SMOTE-NC algorithm is implemented. For the automated and fake account detection problems, scores of 86% and 96% are obtained, respectively.
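The oversampling step mentioned above can be sketched as follows. This is a simplified, standard-library rendering of the SMOTE-NC idea (interpolate continuous features towards a nearest minority neighbour, and take nominal features by majority vote among the neighbours), not the study's implementation. The function name, the toy rows and the parameter values are all hypothetical.

```python
import random
import statistics
from collections import Counter


def smote_nc_sample(minority, cont_idx, nom_idx, k=3, rng=None):
    """Create one synthetic minority row from mixed continuous/nominal data."""
    rng = rng or random.Random(0)
    # A nominal mismatch is penalised by the median of the standard
    # deviations of the continuous features in the minority class.
    med = statistics.median(
        statistics.pstdev([row[i] for row in minority]) for i in cont_idx
    )

    def dist(a, b):
        d = sum((a[i] - b[i]) ** 2 for i in cont_idx)
        d += sum(med ** 2 for i in nom_idx if a[i] != b[i])
        return d ** 0.5

    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
    partner = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)

    new_row = list(base)
    for i in cont_idx:
        new_row[i] = base[i] + gap * (partner[i] - base[i])
    for i in nom_idx:
        # Most frequent category among the k nearest neighbours.
        new_row[i] = Counter(row[i] for row in neighbours).most_common(1)[0][0]
    return new_row


# Toy minority class: two continuous features and one nominal feature.
minority_rows = [
    [1.0, 2.0, "a"],
    [1.5, 2.5, "a"],
    [2.0, 3.0, "b"],
    [1.2, 2.2, "a"],
]
synthetic = smote_nc_sample(minority_rows, cont_idx=[0, 1], nom_idx=[2],
                            k=3, rng=random.Random(42))
```

Each continuous value of a synthetic row lies between the base row and one of its neighbours, so it never leaves the range spanned by the minority class.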


Can graph machine learning identify hate speech in online social networks?

#artificialintelligence

Over three decades, the Internet has grown from a small network of computers used by research scientists to communicate and exchange data to a technology that has penetrated almost every aspect of our day-to-day lives. Today, it is hard to imagine a life without online access for business, shopping, and socialising. A technology that has connected humanity at a scale never before possible has also amplified some of our worst qualities. Online hate speech spreads virally across the globe with short- and long-term consequences for individuals and societies. These consequences are often difficult to measure and predict. Online social media websites and mobile apps have inadvertently become the platform for the spread and proliferation of hate speech.


Inspecting adversarial examples using the Fisher information

arXiv.org Artificial Intelligence

Adversarial examples are slight perturbations that are designed to fool artificial neural networks when fed as an input. In this work the usability of the Fisher information for the detection of such adversarial attacks is studied. We discuss various quantities whose computation scales well with the network size, study their behavior on adversarial examples and show how they can highlight the importance of single input neurons, thereby providing a visual tool for further analyzing (un-)reasonable behavior of a neural network. The potential of our methods is demonstrated by applications to the MNIST, CIFAR10 and Fruits-360 datasets.
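To make the idea of an input-space Fisher information concrete, here is a minimal sketch for a linear softmax classifier with logits W x. It computes the trace of the Fisher information of p(y|x) with respect to the input, using the identity grad_x log p(y|x) = W^T (e_y - p). The model, the weights and the function names are assumptions chosen for illustration, and this quantity is not necessarily one of those studied in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_trace_wrt_input(W, x):
    """Trace of E_y[g g^T], where g = grad_x log p(y|x), for logits W @ x."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p = softmax(logits)
    n_classes, n_inputs = len(W), len(x)
    trace = 0.0
    for y in range(n_classes):
        # e_y - p, the gradient of log p(y|x) with respect to the logits
        diff = [(1.0 if k == y else 0.0) - p[k] for k in range(n_classes)]
        # chain rule through the linear layer: W^T (e_y - p)
        grad = [sum(W[k][j] * diff[k] for k in range(n_classes))
                for j in range(n_inputs)]
        trace += p[y] * sum(g * g for g in grad)
    return trace


# Hypothetical two-class model with identity weights.
W_demo = [[1.0, 0.0], [0.0, 1.0]]
uncertain = fisher_trace_wrt_input(W_demo, [0.0, 0.0])   # p = [0.5, 0.5]
confident = fisher_trace_wrt_input(W_demo, [10.0, 0.0])  # p close to [1, 0]
```

For this toy model the trace equals 2·p0·p1, so it is largest where the prediction is most uncertain and shrinks as the classifier becomes confident.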


Vuno wins Korean approval for chest AI tool

#artificialintelligence

South Korean artificial intelligence (AI) developer Vuno has received class II approval for its Vuno Med Chest X-ray algorithm for identifying abnormal findings on chest x-rays. The algorithm was trained to find nodules, pneumothorax, effusions, and interstitial opacities that are commonly seen on chest x-ray images. The algorithm classifies lesions as normal or abnormal and highlights suspected abnormal regions. Results from clinical trials of the algorithm indicate it reduced the average reading time of medical staff by 50% while improving lesion detection performance by 5.8%. Sensitivity, specificity, and accuracy were all improved with Vuno Med Chest X-ray, and the probability of false positives in normal areas was reduced by 50%.
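For readers unfamiliar with the three metrics named above, the following sketch shows how sensitivity, specificity and accuracy are computed from confusion-matrix counts. The counts are invented for illustration only and are not results from the Vuno trials.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # abnormal cases correctly flagged
    specificity = tn / (tn + fp)            # normal cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Made-up counts: 100 abnormal and 200 normal images.
sens, spec, acc = screening_metrics(tp=90, fp=20, tn=180, fn=10)
```

Reducing false positives raises specificity without changing sensitivity, which is why the three figures are reported separately.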


FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency

arXiv.org Artificial Intelligence

Machine learning algorithms can take important decisions, sometimes legally binding, about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that these algorithms can cause, qualities such as fairness, accountability and transparency of predictive systems are of paramount importance. Recent literature suggested voluntary self-reporting on these aspects of predictive systems -- e.g., data sheets for data sets -- but their scope is often limited to a single component of a machine learning pipeline, and producing them requires manual labour. To resolve this impasse and ensure high-quality, fair, transparent and reliable machine learning systems, we developed an open source toolbox that can inspect selected fairness, accountability and transparency aspects of these systems to automatically and objectively report them back to their engineers and users. We describe design, scope and usage examples of this Python toolbox in this paper. The toolbox provides functionality for inspecting fairness, accountability and transparency of all aspects of the machine learning process: data (and their features), models and predictions. It is available to the public under the BSD 3-Clause open source licence.


Recognizing Variables from their Data via Deep Embeddings of Distributions

arXiv.org Machine Learning

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be more robustly addressed by leveraging the data values themselves rather than just relying on their arbitrarily selected variable names. Here, we present a computationally efficient method to identify high-confidence variable matches between a given set of data values and a large repository of previously encountered datasets. Our approach enjoys numerous advantages over distributional similarity based techniques because we leverage learned vector embeddings of datasets which adaptively account for natural forms of data variation encountered in practice. Based on the neural architecture of deep sets, our embeddings can be computed for both numeric and string data. In dataset search and schema matching tasks, our methods outperform standard statistical techniques and we find that the learned embeddings generalize well to new data sources.

INTRODUCTION

Emerging ideas in automated analytics [1] and meta-learning across many datasets [2] offer great promise for improving both performance and tedium in the data science pipeline. However, a major obstacle remains: such methods generally have no knowledge about what type of real-world entity (i.e. In contrast, human analysts presented with new data often utilize this knowledge to recall previously-encountered datasets that contain the same sort of variables. Reviewing past experience with how different algorithms fared on these same variables enables a person to quickly leverage methods that work well for this type of data (e.g.
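The deep sets architecture mentioned in the abstract embeds a set by applying a per-element map, pooling over the elements, and then applying a second map to the pooled vector, which makes the embedding independent of row order. The sketch below illustrates that structure with fixed, hand-written maps named phi and rho. In the paper's setting these would be learned networks, so every function and constant here is a stand-in for illustration.

```python
import math


def phi(value):
    """Per-element feature map (a fixed, hypothetical stand-in for a learned network)."""
    return [value, value * value, math.tanh(value)]


def rho(pooled):
    """Post-pooling map (again a hypothetical stand-in)."""
    return [math.tanh(p) for p in pooled]


def embed_column(values):
    """Deep-sets style embedding: rho(mean over elements of phi(element))."""
    features = [phi(v) for v in values]
    n = len(features)
    pooled = [sum(f[i] for f in features) / n for i in range(len(features[0]))]
    return rho(pooled)


# The same measurements in a different row order give the same embedding.
emb_a = embed_column([1.0, 2.0, 3.0])
emb_b = embed_column([3.0, 1.0, 2.0])
```

Because pooling is a mean over elements, shuffling the rows of a column leaves its embedding unchanged up to floating-point rounding.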


Trust and Cognitive Load During Human-Robot Interaction

arXiv.org Artificial Intelligence

This paper presents an exploratory study to understand the relationship between a human's cognitive load, trust, and anthropomorphism during human-robot interaction. To understand the relationship, we created a "Matching the Pair" game that participants could play collaboratively with one of two robot types, Husky or Pepper. The goal was to understand whether humans would trust the robot as a teammate in a game-playing situation that demanded a high level of cognitive load. Using a humanoid vs. a technical robot, we also investigated the impact of physical anthropomorphism, and we furthermore tested the impact of robot error rate on subsequent judgments and behavior. Our results showed an inverse relationship between trust and cognitive load: as participants' cognitive load increased, their ratings of trust decreased. We also found a three-way interaction between robot type, error rate and participants' ratings of trust. Participants perceived Pepper to be more trustworthy than the Husky robot after playing the game with both robots under the high error-rate condition. In contrast, Husky was perceived as more trustworthy than Pepper when it was depicted as having a low error rate. Our results call for further investigation of the impact of physical anthropomorphism in combination with variable robot error rates.