 Accuracy


The Problem With Using AI To Fight Terrorism On Social Media

Forbes - Tech

Social media has a terrorism problem. From Twitter's famous 2015 letter to Congress that it would never restrict the right of terrorists to use its platform, to its rapid about-face in the face of public and governmental outcry, Silicon Valley has had a change of heart in how it sees its role in curbing the use of its tools by those who wish to commit violence across the world. Today Facebook released a new transparency report that emphasizes its efforts to combat terroristic use of its platform and the role AI is playing in what it claims are significant successes. Yet, that narrative of AI success has been increasingly challenged, from academic studies suggesting that not only is content not being deleted, but that other Facebook tools may actually be assisting terrorists, to a Bloomberg piece last week that demonstrates just how readily terrorist content can still be found on Facebook. Can we really rely on AI to curb terroristic use of social media?


Why today's real-time economy needs machine learning

#artificialintelligence

"Machine learning" is not just a buzzword for futuristic applications; it is the concept of machines carrying out tasks on their own that would typically require human intelligence. Its emergence is very much happening now. It is at the top of Gartner's hype cycle. In fact, Gartner predicts that by 2022, more than half of data and analytics services will be performed by machines instead of human beings, up from 10 percent today. And while not all machine learning use cases include real-time analytics, there is a definitive growth trend in the market for real-time decision making powered by machine learning.


Classification of Household Materials via Spectroscopy

arXiv.org Machine Learning

Abstract-- Recognizing an object's material can inform a robot on how hard it may grasp the object during manipulation, or if the object may be safely heated up. To estimate an object's material during manipulation, many prior works have explored the use of haptic sensing. In this paper, we explore a technique for robots to estimate the materials of objects using spectroscopy. We demonstrate that spectrometers provide several benefits for material recognition, including fast sensing times and accurate measurements with low noise. Furthermore, spectrometers do not require direct contact with an object. To illustrate this, we collected a dataset of spectral measurements from two commercially available spectrometers during which a robotic platform interacted with 50 distinct objects, and we show that a residual neural network can accurately analyze these measurements. Due to the low variance in consecutive spectral measurements, our model achieved a material classification accuracy of 97.7% when given only one spectral sample per object. Similar to prior works with haptic sensors, we found that generalizing material recognition to new objects posed a greater challenge, for which we achieved an accuracy of 81.4% via leave-one-object-out cross-validation. From this work, we find that spectroscopy poses a promising approach for further research in material classification during robotic manipulation.
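The leave-one-object-out evaluation mentioned in the abstract can be sketched as a split generator. This is a minimal illustration, not the authors' code: the function name `leave_one_object_out` and the toy object labels are assumptions made for the example. Each fold holds out every measurement of one object, so the model is always tested on an object it has never seen, which is why this score (81.4%) is lower than the single-sample accuracy (97.7%).

```python
def leave_one_object_out(object_ids):
    """Yield (held_out_object, train_indices, test_indices) for each object.

    object_ids[i] names the physical object that measurement i came from.
    All measurements of the held-out object go to the test set, so no
    sample from that object is seen during training.
    """
    for held_out in sorted(set(object_ids)):
        train = [i for i, obj in enumerate(object_ids) if obj != held_out]
        test = [i for i, obj in enumerate(object_ids) if obj == held_out]
        yield held_out, train, test


# Toy data (invented for illustration): five measurements from three objects.
object_ids = ["cup", "cup", "spoon", "bowl", "spoon"]
folds = list(leave_one_object_out(object_ids))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Averaging a classifier's accuracy over these folds gives the leave-one-object-out estimate.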


Neural Classification of Malicious Scripts: A study with JavaScript and VBScript

arXiv.org Artificial Intelligence

Malicious scripts are an important computer infection threat vector. Our analysis reveals that the two most prevalent types of malicious scripts are JavaScript and VBScript. The percentage of detected JavaScript attacks is on the rise. To address these threats, we investigate two deep recurrent models, LaMP (LSTM and Max Pooling) and CPoLS (Convoluted Partitioning of Long Sequences), which process JavaScript and VBScript as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our models are trained in an end-to-end fashion, allowing discriminative training even for the sequential processing layers. Evaluating these models on a large corpus of 296,274 JavaScript files indicates that the best performing LaMP model has a 65.9% true positive rate (TPR) at a false positive rate (FPR) of 1.0%. Similarly, the best CPoLS model has a TPR of 45.3% at an FPR of 1.0%. LaMP and CPoLS yield a TPR of 69.3% and 67.9%, respectively, at an FPR of 1.0% on a collection of 240,504 VBScript files.
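The "TPR at an FPR of 1.0%" figures describe one operating point of a scoring classifier: the decision threshold is set so that at most 1% of benign files are flagged, and the TPR is the share of malicious files caught at that threshold. The sketch below shows one common way to compute such a figure. It is not taken from the paper; the function name `tpr_at_fpr` and the scores are invented for illustration.

```python
import math


def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Return (tpr, fpr, threshold) for the loosest threshold with FPR <= max_fpr.

    scores: higher means "more likely malicious".
    labels: 1 for malicious, 0 for benign.
    A file is flagged when its score is strictly greater than the threshold.
    """
    if not 0 <= max_fpr < 1:
        raise ValueError("max_fpr must be in [0, 1)")
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign files we may flag without exceeding max_fpr.
    # The small epsilon guards against floating-point round-down.
    allowed = math.floor(max_fpr * len(benign) + 1e-9)
    # Flagging everything above the (allowed + 1)-th highest benign score
    # flags at most `allowed` benign files.
    threshold = benign[allowed]

    tpr = sum(s > threshold for s in malicious) / len(malicious)
    fpr = sum(s > threshold for s in benign) / len(benign)
    return tpr, fpr, threshold


# Toy data (invented): 100 benign scores 0.00..0.99 and 4 malicious scores.
benign_scores = [i / 100 for i in range(100)]
malicious_scores = [0.995, 0.985, 0.5, 0.999]
scores = benign_scores + malicious_scores
labels = [0] * 100 + [1] * 4

tpr, fpr, threshold = tpr_at_fpr(scores, labels, max_fpr=0.01)
print(f"threshold={threshold}, FPR={fpr:.1%}, TPR={tpr:.1%}")
```

Fixing the FPR first makes models comparable at the same cost in false alarms.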


Man & Machine – A Mutually Beneficial Partnership in the Age of Artificial Intelligence

#artificialintelligence

Analysis predicting major societal problems caused by artificial intelligence (AI) surfaces every other day: how AI could be used to manipulate elections and launch drone attacks. The major fear seems to be that AI is set to make humans a redundant force in the workplace. Yes, AI, like any evolving technology, is set to change our jobs, but could it also be the key to unlocking creativity and productivity in the business sector? It's clear that nothing is holding AI back. Replacing an existing business process requires a clear investment case.


Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes

arXiv.org Machine Learning

We propose a confidence scoring mechanism for multi-layer neural networks based on a paradigm of a base model and a meta-model. The confidence score is learned by the meta-model using features derived from the base model -- a deep multi-layer neural network -- considered a whitebox. As features, we investigate linear classifier probes inserted between the various layers of the base model and trained using each layer's intermediate activations. Experiments show that this approach outperforms various baselines in a filtering task, i.e., the task of rejecting samples with low confidence. Experimental results are presented using the CIFAR-10 and CIFAR-100 datasets, with and without added noise, exploring various aspects of the method.


Model selection with lasso-zero: adding straw to the haystack to better find needles

arXiv.org Machine Learning

The high-dimensional linear model $y = X \beta^0 + \epsilon$ is considered and the focus is put on the problem of recovering the support $S^0$ of the sparse vector $\beta^0$. We introduce lasso-zero, a new $\ell_1$-based estimator whose novelty resides in an "overfit, then threshold" paradigm and the use of noise dictionaries for overfitting the response. The methodology is supported by theoretical results obtained in the special case where no noise dictionary is used. In this case, lasso-zero boils down to thresholding the basis pursuit solution. We prove that this procedure requires weaker conditions on $X$ and $S^0$ than the lasso for exact support recovery, and controls the false discovery rate for orthonormal designs when tuned by the quantile universal threshold. However, it requires a high signal-to-noise ratio, and the use of noise dictionaries addresses this issue. The threshold selection procedure is based on a pivotal statistic and does not require knowledge of the noise level. Numerical simulations show that lasso-zero performs well in terms of support recovery and provides a good trade-off between high true positive rate and low false discovery rate compared to competitors.


Large-Scale QA-SRL Parsing

arXiv.org Artificial Intelligence

We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost.


You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

arXiv.org Artificial Intelligence

Metadata are associated with most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still categorized as nonsensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that, through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter, but, more generally, for most social media platforms.
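The jump from 96.7% to 99.22% when "the 10 most likely candidates" are considered is the difference between top-1 and top-10 accuracy: a prediction counts as correct if the true account appears anywhere among the first k ranked candidates. A minimal sketch of that metric follows. The function name `top_k_accuracy` and the tiny ranked lists are assumptions for illustration only and have no connection to the paper's data.

```python
def top_k_accuracy(ranked_candidates, true_users, k):
    """Share of items whose true user is among the first k ranked candidates.

    ranked_candidates[i] lists candidate users for item i, most likely first.
    true_users[i] is the account that actually produced item i.
    """
    if len(ranked_candidates) != len(true_users) or not true_users:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_candidates, true_users)
    )
    return hits / len(true_users)


# Toy data (invented): four tweets, three candidate accounts each.
ranked_candidates = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "a", "b"],
    ["a", "c", "b"],
]
true_users = ["a", "a", "b", "b"]

top1 = top_k_accuracy(ranked_candidates, true_users, k=1)
top2 = top_k_accuracy(ranked_candidates, true_users, k=2)
top3 = top_k_accuracy(ranked_candidates, true_users, k=3)
print(top1, top2, top3)
```

Top-k accuracy can only stay flat or rise as k grows, which matches the direction of the reported figures.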


Metropolitan Police's facial recognition technology 98% inaccurate, figures show

The Independent - Tech

Facial recognition software used by the UK's biggest police force has returned false positives in more than 98 per cent of alerts generated, The Independent can reveal, with the country's biometrics regulator calling it "not yet fit for use". The Metropolitan Police's system has produced 104 alerts of which only two were later confirmed to be positive matches, a freedom of information request showed. In its response the force said it did not consider the inaccurate matches "false positives" because alerts were checked a second time after they occurred. Facial recognition technology scans people in a video feed and compares their images to pictures stored in a reference library or watch list. It has been used at large events like the Notting Hill Carnival and a Six Nations Rugby match. The system used by another force, South Wales Police, has returned more than 2,400 false positives in 15 deployments since June 2017.
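The "more than 98 per cent" figure follows directly from the two numbers reported for the Metropolitan Police: 104 alerts, of which 2 were confirmed. The short calculation below reproduces it; the function name `false_alert_share` is made up for this example. Note that this is the share of alerts that were wrong (one minus precision), not a false positive rate in the classical sense, which would need the number of non-matching faces scanned, a figure the article does not give.

```python
def false_alert_share(total_alerts, confirmed_matches):
    """Fraction of alerts that were not confirmed as true matches."""
    if total_alerts <= 0 or not 0 <= confirmed_matches <= total_alerts:
        raise ValueError("need 0 <= confirmed_matches <= total_alerts, total_alerts > 0")
    return (total_alerts - confirmed_matches) / total_alerts


# Figures reported in the article for the Metropolitan Police system.
met_share = false_alert_share(total_alerts=104, confirmed_matches=2)
print(f"{met_share:.1%}")  # 102 / 104, prints 98.1%
```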