Accuracy


Learning Deep Convolutional Neural Networks for X-Ray Protein Crystallization Image Analysis

AAAI Conferences

Obtaining a protein's 3D structure is crucial to understanding its functions and its interactions with other proteins. Accelerating the protein crystallization process, with improved accuracy, is critical for understanding cancer and designing drugs. Systematic high-throughput approaches to protein crystallization have been widely applied, generating a large number of protein crystallization-trial images. Efficient and effective automatic analysis of these images is therefore a top priority. In this paper, we present a novel system, CrystalNet, for automatically labeling the outcomes of protein crystallization-trial images. CrystalNet is a deep convolutional neural network that automatically extracts features from X-ray protein crystallization images for classification. We show that (1) CrystalNet can label crystallization images effectively in real time, requiring approximately 2 seconds to label all 1,536 images of the crystallization microassay on each plate; and (2) compared with state-of-the-art classification systems in crystallization image analysis, our technique yields an 8% improvement in accuracy, achieving 90.8% classification accuracy. As part of a high-throughput pipeline that generates millions of images a year, CrystalNet can substantially reduce labor-intensive screening.
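The abstract does not describe CrystalNet's layers, so the following is only a generic sketch of the operations a convolutional layer uses to extract features: a 2D convolution (computed, as in most CNN libraries, as cross-correlation), a ReLU, and 2x2 max pooling, written in pure Python. The toy image and the hand-set edge kernel are hypothetical; a trained network such as CrystalNet learns its kernels from labeled images instead.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation with 'valid' padding (output shrinks by kernel size - 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Hypothetical 6x6 grayscale "image" with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical hand-set vertical-edge kernel; a trained CNN would learn this.
edge_kernel = [[-1, 0, 1]] * 3

features = max_pool2x2(relu(conv2d_valid(image, edge_kernel)))
print(features)  # prints [[3, 3], [3, 3]]
```

Stacking several such layers with learned kernels, followed by a classifier, is the usual CNN pattern; practical systems use an optimized deep learning library rather than nested Python loops.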


Exploiting an Oracle That Reports AUC Scores in Machine Learning Contests

AAAI Conferences

In machine learning contests such as the ImageNet Large Scale Visual Recognition Challenge and the KDD Cup, contestants can submit candidate solutions and receive from an oracle (typically the organizers of the competition) the accuracy of their guesses compared to the ground-truth labels. One of the most commonly used accuracy metrics for binary classification tasks is the Area Under the Receiver Operating Characteristic Curve (AUC). In this paper we provide proofs-of-concept of how knowledge of the AUC of a set of guesses can be used, in two different kinds of attacks, to improve the accuracy of those guesses. On the other hand, we also demonstrate the intractability of one kind of AUC exploit by proving that, for every c in (0,1), the number of possible binary labelings of n examples for which a candidate solution obtains an AUC score of c grows exponentially in n.
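AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half (the Mann-Whitney formulation). The sketch below uses that definition, plus a brute-force counter over all 2^n labelings, to explore the counting question from the abstract for very small n. It is an illustration only, not the paper's attacks or proof, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Optional, Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> Optional[Fraction]:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    Returns None when one class is absent, because AUC is then undefined.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = Fraction(0)
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1
            elif p == q:
                wins += Fraction(1, 2)
    return wins / (len(pos) * len(neg))

def count_labelings_with_auc(scores: Sequence[float], target: Fraction) -> int:
    """Count binary labelings whose AUC equals target (exponential brute force)."""
    return sum(
        1
        for labels in product((0, 1), repeat=len(scores))
        if auc(scores, labels) == target
    )

# Explore how many labelings give AUC = 1/2 as n grows (tiny n only).
scores = list(range(1, 11))
for n in range(3, 11):
    print(n, count_labelings_with_auc(scores[:n], Fraction(1, 2)))
```

Exact `Fraction` arithmetic avoids floating-point mismatches when testing `auc(...) == target`. The enumeration itself is exponential in n, so it only serves to inspect small cases; the paper's result concerns how the count grows.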


Instilling Social to Physical: Co-Regularized Heterogeneous Transfer Learning

AAAI Conferences

Ubiquitous computing tasks, such as human activity recognition (HAR), are enabling a wide spectrum of applications, ranging from healthcare to environment monitoring. The success of a ubiquitous computing task relies on sufficient physical sensor data with ground-truth labels, which are scarce because the annotation process is expensive. Meanwhile, social media platforms provide abundant social and semantic context information: people frequently share what they are doing and where they are in the messages they post. This rich set of socially shared activities motivates us to transfer knowledge from social media to address the scarcity of labeled physical sensor data. To transfer the knowledge of social and semantic context, we propose a Co-Regularized Heterogeneous Transfer Learning (CoHTL) model, which builds a common semantic space derived from two heterogeneous domains. Our proposed method outperforms state-of-the-art methods on two ubiquitous computing tasks, namely human activity recognition and region function discovery.


Optimizing Personalized Email Filtering Thresholds to Mitigate Sequential Spear Phishing Attacks

AAAI Conferences

Highly targeted spear phishing attacks are increasingly common and have been implicated in many major security breaches. Email filtering systems are the first line of defense against such attacks. These filters are typically configured with uniform thresholds for deciding whether to deliver a message to a user. However, users differ significantly both in their susceptibility to phishing attacks and in their access to critical information and credentials that can cause damage if compromised. Recent work has considered setting personalized thresholds for individual users based on a Stackelberg game model. We consider two important extensions of the previous model. First, in our model user values can be substitutable, modeling cases where multiple users provide access to the same information or credential. Second, we consider attackers who make sequential attack plans based on the outcomes of previous attacks. Our analysis starts from scenarios with only one credential and then extends to more general scenarios with multiple credentials. For single-credential scenarios, we demonstrate that the optimal defense strategy can be found by solving a binary combinatorial optimization problem called PEDS. For multiple-credential scenarios, we formulate finding the optimal defense strategy as a bilevel optimization problem and then reduce it to a single-level optimization problem called PEMS using complementary slackness conditions. Experimental results show that both PEDS and PEMS lead to significantly higher defender utilities than two existing benchmarks across different parameter settings. Both PEDS and PEMS are also more robust to uncertainty than the existing benchmarks.
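As a much-simplified illustration of personalized thresholds in a Stackelberg setting, the sketch below lets a defender (leader) choose one threshold option per user and an attacker (follower) best-respond by targeting the user with the highest expected gain. It is not PEDS or PEMS: it ignores substitutable credentials and sequential attacks, solves the game by brute-force enumeration, and every value, option, and name in it is hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical threshold option for one user:
# (probability a spear-phishing email gets through, expected false-positive cost).
Option = Tuple[float, float]

def best_thresholds(values: List[float], options: List[List[Option]]) -> Tuple[float, Tuple[int, ...]]:
    """Toy Stackelberg game solved by brute-force enumeration.

    The defender (leader) commits to one option per user. The attacker (follower)
    sees the commitment and targets the single user with the highest expected gain
    p * value. The defender's loss is that gain plus the total false-positive cost;
    since the attack term is zero-sum, ties in the attacker's choice do not change it.
    """
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for choice in product(*(range(len(opts)) for opts in options)):
        fp_cost = sum(options[u][k][1] for u, k in enumerate(choice))
        attacker_gain = max(options[u][k][0] * values[u] for u, k in enumerate(choice))
        loss = attacker_gain + fp_cost
        if best is None or loss < best[0]:
            best = (loss, choice)
    assert best is not None  # requires at least one user with at least one option
    return best

values = [10.0, 2.0]  # hypothetical value to the attacker of compromising each user
lenient, strict = (0.5, 0.0), (0.1, 1.0)
options = [[lenient, strict], [lenient, strict]]
loss, choice = best_thresholds(values, options)
print(loss, choice)  # prints 2.0 (1, 0)
```

In this toy, the defender pays the false-positive cost of a strict filter only for the high-value user, which is the intuition behind personalized rather than uniform thresholds. Brute force is exponential in the number of users, which is one reason the paper formulates dedicated optimization problems instead.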


Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

AAAI Conferences

Psychological research has confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis; in particular, attempts have been made to analyze and predict people's emotional reactions to images. To this end, various hand-tuned features have been proposed, and the results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While many recent successes in computer vision tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the lack of confidently labeled, relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, built starting from more than 3 million weakly labeled images of different emotions, that is 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.


Supervised Hashing via Uncorrelated Component Analysis

AAAI Conferences

The Approximate Nearest Neighbor (ANN) search problem is important in applications such as information retrieval. Several hashing-based search methods that provide effective solutions to the ANN search problem have been proposed. However, most of these focus on similarity preservation and coding-error minimization and pay little attention to optimizing the precision-recall curve or the receiver operating characteristic curve. In this paper, we propose a novel projection-based hashing method that attempts to maximize precision and recall. We first introduce an uncorrelated component analysis (UCA) derived by examining precision and recall, and then propose a UCA-based hashing method. The proposed method is evaluated on a variety of datasets. The results show that UCA-based hashing outperforms state-of-the-art methods and has computationally efficient training and encoding processes.
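The sketch below shows the generic mechanics of projection-based hashing: each bit is the sign of a projection w . x, items are compared by Hamming distance, and precision and recall are measured for the items retrieved within a Hamming radius. The projection matrix `W` and the data are hypothetical fixed choices; UCA-based hashing would learn its projections, and nothing below reproduces the paper's objective.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]

def hash_code(x: Vector, W: Sequence[Vector]) -> List[int]:
    """One bit per projection direction w: 1 if w . x >= 0, else 0."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0 for w in W]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions where two codes differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical projections (a learned method would fit W to training data).
W = [[1.0, 0.0], [0.0, 1.0]]
database = [(1.0, 1.0), (2.0, 0.5), (-1.0, 1.0), (-1.0, -2.0)]
classes = ["A", "A", "B", "B"]
query, query_class = (1.5, 1.0), "A"

codes = [hash_code(x, W) for x in database]
q_code = hash_code(query, W)
relevant = {i for i, c in enumerate(classes) if c == query_class}

results: Dict[int, Tuple[float, float]] = {}
for radius in (0, 1, 2):
    retrieved = {i for i, c in enumerate(codes) if hamming(c, q_code) <= radius}
    results[radius] = precision_recall(retrieved, relevant)
    print(radius, results[radius])
```

Sweeping the radius trades precision against recall; that trade-off is the curve the abstract says most hashing methods do not optimize directly.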


"8 Amazing Secrets for Getting More Clicks": Detecting Clickbaits in News Streams Using Article Informality

AAAI Conferences

Clickbaits are articles with misleading titles that exaggerate the content on the landing page. Their goal is to entice users to click on the title in order to monetize the landing page, whose content is usually of low quality. Their presence in the homepage streams of news aggregator sites (e.g., Yahoo News, Google News) may adversely affect user experience, so it is important to identify them and demote or block them on homepages. In this paper, we present a machine-learning model to detect clickbaits. We use a variety of features and show that the degree of informality of a webpage (as measured by different metrics) is a strong indicator of whether it is a clickbait. We conduct extensive experiments to evaluate our approach and analyze the properties of clickbait and non-clickbait articles. Our model achieves high performance (an F1 score of 74.9%) in predicting clickbaits.
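The abstract does not list its informality metrics. As one hedged example of a readability-style score that such features could include, the sketch computes LIX (words per sentence plus the percentage of words longer than six letters). The tokenization rules and the sample headline are simplifying assumptions, and LIX is not claimed to be the paper's exact feature.

```python
import re

def lix(text: str) -> float:
    """LIX readability score: words/sentences + 100 * long_words/words.

    Simplifying assumptions: a word is a run of letters/apostrophes, a sentence
    ends at '.', '!' or '?', and a long word has more than 6 letters.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if sum(ch.isalpha() for ch in w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

headline = "You won't BELIEVE this! Click now!"  # hypothetical example text
print(round(lix(headline), 2))  # prints 19.67
```

A classifier would combine a score like this with the other features the abstract mentions; how strongly each metric separates clickbait from non-clickbait is the empirical question the paper studies.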


Face Behind Makeup

AAAI Conferences

In this work, we propose a novel framework for automatic makeup detection and removal. For makeup detection, a locality-constrained low-rank dictionary learning algorithm is used to determine whether cosmetics are used and to locate them. For the challenging task of makeup removal, a locality-constrained coupled dictionary learning (LC-CDL) framework is proposed to synthesize the non-makeup face, so that makeup can be erased according to its style. Moreover, we build a stepwise makeup dataset (SMU), which, to the best of our knowledge, is the first dataset that records the makeup procedure step by step. This technology has many practical applications, e.g., product recommendation for consumers, user-specified makeup tutorials, and security applications such as face verification with makeup. Finally, our system is evaluated on three existing makeup datasets (VMU, MIW, YMU) and one that we collected ourselves. Experimental results demonstrate the effectiveness of the dictionary learning (DL)-based method for makeup detection. The proposed LC-CDL shows very promising makeup-removal performance in terms of structural similarity. In addition, we compare face verification accuracy with and without makeup, which illustrates an application of our automatic makeup remover in the context of face verification with facial makeup.


MIT AI Researchers Make Breakthrough On Threat Detection

#artificialintelligence

Researchers with MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) believe a new artificial-intelligence platform can offer the security world a huge boost in incident response and preparation, and could eventually become a secret weapon for squeezing the most productivity from security analyst teams. Dubbed AI2, the technology has shown three times the predictive capability of today's analytics methods, with drastically fewer false positives. CSAIL gave a sneak peek at AI2 in a presentation to the academic community last week at the IEEE International Conference on Big Data Security, which detailed the specifics of a paper released to the public this morning. The driving force behind AI2 is its blending of artificial intelligence with what CSAIL researchers call "analyst intuition": essentially, an effective way to continuously model data with unsupervised machine learning while layering in periodic human feedback from skilled analysts to inform a supervised learning model. "You can think about the system as a virtual analyst," says CSAIL research scientist Kalyan Veeramachaneni, who developed AI2 with former CSAIL postdoc Ignacio Arnaldo, who is now a chief data scientist at PatternEx.
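The loop described above (unsupervised scoring, analyst review, supervised retraining) can be sketched in a few lines. Everything here is a toy assumption rather than AI2's design: a single numeric feature per event, an absolute z-score as the unsupervised detector, a midpoint threshold as the "supervised model", and a lambda standing in for the human analyst.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

def outlier_scores(xs: List[float]) -> List[float]:
    """Unsupervised step: absolute z-score of each event (higher = more unusual)."""
    m, s = mean(xs), pstdev(xs) or 1.0
    return [abs(x - m) / s for x in xs]

def fit_threshold(labels: Dict[int, bool], xs: List[float]) -> float:
    """Toy supervised step: midpoint between largest benign and smallest attack value."""
    attacks = [xs[i] for i, y in labels.items() if y]
    benign = [xs[i] for i, y in labels.items() if not y]
    if not attacks:
        return float("inf")  # no confirmed attacks yet: flag nothing
    if not benign:
        return min(attacks)
    return (max(benign) + min(attacks)) / 2

def feedback_loop(xs: List[float], analyst: Callable[[float], bool],
                  rounds: int = 2, k: int = 3) -> Tuple[float, List[int]]:
    labels: Dict[int, bool] = {}
    scores = outlier_scores(xs)
    threshold = float("inf")
    for _ in range(rounds):
        # Show the analyst the k most anomalous events that are still unlabeled.
        queue = sorted((i for i in range(len(xs)) if i not in labels),
                       key=lambda i: scores[i], reverse=True)[:k]
        for i in queue:
            labels[i] = analyst(xs[i])
        # Retrain the supervised model on all feedback collected so far.
        threshold = fit_threshold(labels, xs)
    flagged = [i for i, x in enumerate(xs) if x >= threshold]
    return threshold, flagged

events = [1, 2, 2, 3, 2, 40, 1, 35, 2, 3, 50, 2]  # hypothetical per-event feature values
threshold, flagged = feedback_loop(events, analyst=lambda x: x > 30)
print(threshold, flagged)  # prints 18.5 [5, 7, 10]
```

The first round surfaces only extreme events, so the model sees no benign examples until the analyst labels some in the next round; periodic feedback is what lets the supervised threshold settle between the two classes.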